Engineering | January 20, 2026 | 15 min read

How We Achieved 98.2% Accuracy in a Noisy Kitchen

A deep dive into our noise-separation architecture.

Olaoluwasubomi Olaoye
CEO & Founder
CONCYA.BLOG
EDITORIAL

A commercial restaurant kitchen during dinner service is an acoustic nightmare. Sizzling grills, clanging pans, shouted orders, running water, exhaust fans, background music from the dining room — all blending into a wall of noise that can exceed 90 dB. This is where our voice engine lives and thrives.

The Problem with Standard ASR

Standard automatic speech recognition (ASR) models are trained predominantly on clean audio: podcasts, audiobooks, phone calls in quiet rooms. Exposed to restaurant-level noise, their accuracy drops from 95%+ to below 70%. That is not a minor degradation but a complete system failure: at that error rate, traditional ASR is effectively unusable in a kitchen.
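The post reports accuracy figures without saying how they are measured. A common ASR convention, which we assume here, is word accuracy = 1 - WER, where the word error rate (WER) is the word-level edit distance (substitutions + deletions + insertions) between the recognized text and a reference transcript, divided by the number of reference words. A minimal sketch of that calculation; the function names are illustrative:

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    # prev[j] = edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,             # deletion: reference word dropped
                curr[j - 1] + 1,         # insertion: extra hypothesis word
                prev[j - 1] + (r != h),  # substitution (free if words match)
            )
        prev = curr
    return prev[len(hyp)] / max(len(ref), 1)


def word_accuracy(reference: str, hypothesis: str) -> float:
    """Accuracy as 1 - WER, clamped at 0 because WER can exceed 1."""
    return max(0.0, 1.0 - word_error_rate(reference, hypothesis))
```

For example, `word_accuracy("no onions on table five", "no onions on table nine")` sees one substitution in five reference words, so WER is 0.2 and accuracy 0.8. Because insertions also count as errors, WER can exceed 1, which is why the accuracy is clamped at zero.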

  • 98.2%: our accuracy in noisy conditions
  • <70%: standard ASR accuracy in noisy conditions
  • 90 dB: typical kitchen noise level
  • 400K+: hours of training data

Our Approach: Multi-Stage Noise Separation

Rather than trying to build a single model that handles all acoustic conditions, we developed a multi-stage pipeline where each stage specializes in a different aspect of the noise problem.
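One way to structure such a pipeline, sketched under our own assumptions: each stage is a function that takes and returns a shared context object, so an earlier result (such as the scene label from Stage 1 in the list below) is available to every later stage. The names `AudioContext`, `classify_scene`, `beamform`, `separate`, `recognize`, and `run_pipeline` are hypothetical, and each stage body is a trivial placeholder standing in for the real component, not the engine's implementation.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class AudioContext:
    """Hypothetical state object handed from one stage to the next."""
    channels: List[List[float]]             # raw samples, one list per microphone
    scene: Optional[str] = None             # written by scene classification
    enhanced: Optional[List[float]] = None  # single-channel signal after cleanup
    transcript: Optional[str] = None        # written by the recognizer
    trace: List[str] = field(default_factory=list)  # stages run, in order


Stage = Callable[[AudioContext], AudioContext]


def classify_scene(ctx: AudioContext) -> AudioContext:
    # Placeholder classifier: label the scene by mean power of microphone 0.
    samples = ctx.channels[0]
    power = sum(x * x for x in samples) / max(len(samples), 1)
    ctx.scene = "loud" if power > 0.1 else "quiet"
    return ctx


def beamform(ctx: AudioContext) -> AudioContext:
    # Placeholder: plain channel average (delay-and-sum with zero delays).
    n = len(ctx.channels)
    ctx.enhanced = [sum(col) / n for col in zip(*ctx.channels)]
    return ctx


def separate(ctx: AudioContext) -> AudioContext:
    # Placeholder for a trained separator; a real one could pick its
    # model or settings based on ctx.scene. Here it passes audio through.
    return ctx


def recognize(ctx: AudioContext) -> AudioContext:
    # Placeholder for the noise-robust ASR model.
    ctx.transcript = "<transcript>"
    return ctx


def run_pipeline(ctx: AudioContext, stages: List[Stage]) -> AudioContext:
    """Run the stages in order, recording which ones executed."""
    for stage in stages:
        ctx = stage(ctx)
        ctx.trace.append(stage.__name__)
    return ctx
```

Calling `run_pipeline(AudioContext(channels=mics), [classify_scene, beamform, separate, recognize])` runs all four stages in sequence. Keeping every stage behind the same signature means each one can be swapped, tuned, or tested on its own, which is the point of splitting the problem into specialized stages.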

We don't filter noise — we teach the system to hear through it.

  • Stage 1: Acoustic Scene Classification. Identifies the type of noise environment in real time.
  • Stage 2: Adaptive Beamforming. Dynamically focuses on the speech source.
  • Stage 3: Neural Noise Separation. Deep-learning-based source separation trained on restaurant audio.
  • Stage 4: Robust ASR. A fine-tuned recognition model that expects and handles residual noise.
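The post does not say which adaptive beamforming algorithm Stage 2 uses. As an illustration only, here is the textbook minimum variance distortionless response (MVDR) beamformer for two microphones and a single frequency bin. Its weights, w = R^-1 d / (d^H R^-1 d), pass sound from the talker's direction (steering vector d) with unit gain while minimizing the output power of noise with spatial covariance R. The "adaptive" part is that R is re-estimated from recent noise-only frames as the kitchen changes. The function names, the far-field two-microphone geometry, and the diagonal-loading value are all assumptions.

```python
import cmath
from typing import List, Sequence

Matrix2 = List[List[complex]]


def steering_vector(delay_s: float, freq_hz: float) -> List[complex]:
    """Far-field 2-mic steering vector; mic 1 hears the talker delay_s later."""
    return [1 + 0j, cmath.exp(-2j * cmath.pi * freq_hz * delay_s)]


def estimate_noise_cov(snapshots: Sequence[Sequence[complex]],
                       loading: float = 1e-3) -> Matrix2:
    """Sample covariance (1/N) * sum of x x^H over noise-only STFT snapshots,
    plus diagonal loading so the matrix stays invertible."""
    n = len(snapshots)
    cov = [[sum(x[i] * x[j].conjugate() for x in snapshots) / n
            for j in range(2)] for i in range(2)]
    for i in range(2):
        cov[i][i] += loading
    return cov


def inverse_2x2(r: Matrix2) -> Matrix2:
    """Closed-form inverse of a 2x2 complex matrix."""
    det = r[0][0] * r[1][1] - r[0][1] * r[1][0]
    if abs(det) < 1e-12:
        raise ValueError("covariance matrix is (nearly) singular")
    return [[r[1][1] / det, -r[0][1] / det],
            [-r[1][0] / det, r[0][0] / det]]


def mvdr_weights(noise_cov: Matrix2, steering: Sequence[complex]) -> List[complex]:
    """MVDR weights w = R^-1 d / (d^H R^-1 d) for one frequency bin."""
    r_inv = inverse_2x2(noise_cov)
    r_inv_d = [r_inv[i][0] * steering[0] + r_inv[i][1] * steering[1]
               for i in range(2)]
    denom = sum(steering[i].conjugate() * r_inv_d[i] for i in range(2))
    return [x / denom for x in r_inv_d]


def apply_beamformer(weights: Sequence[complex],
                     snapshot: Sequence[complex]) -> complex:
    """Beamformer output y = w^H x for one multi-microphone STFT snapshot."""
    return sum(w.conjugate() * x for w, x in zip(weights, snapshot))
```

Delay-and-sum weights (`[x / 2 for x in d]` for this two-mic steering vector) also give unit gain toward the talker, but MVDR attains the lowest noise output power of any weights that satisfy that constraint. The price is sensitivity to errors in R and d, which is why the covariance estimate adds diagonal loading.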

The result is a system that doesn't just tolerate noise — it thrives in it. And it gets better over time, as every call in a noisy environment adds to our proprietary training dataset. This is the kind of compounding advantage that becomes a genuine moat.


Building the operating system for physical spaces.