Sensory Taps NVIDIA AI for Voice and Vision Applications


You may not know Todd Mozer, but chances are you’ve experienced his company’s work: it has enabled voice and vision AI for billions of consumer electronics devices around the world.

Sensory, founded in 1994 in Silicon Valley, is a pioneer of the compact models used in mobile devices from the industry’s giants. Today, Sensory brings interactivity to all kinds of voice-enabled electronics. LG and Samsung have used Sensory not only in their cell phones, but also in refrigerators, remote controls and portable devices.

“What if I want my talking microwave to find me any recipe on the internet, to walk me through the recipe? That’s where the hybrid computing approach can come in,” said Mozer, Sensory’s CEO and founder.

Hybrid computing is the dual approach of using cloud and on-premises computing resources.
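
As a rough illustration of the hybrid idea, the sketch below decides whether a spoken request is handled by a local model or sent to a cloud service. The intent names, the `Request` type and the `choose_backend` function are hypothetical and chosen only for this example; they are not Sensory’s API.

```python
from dataclasses import dataclass

# Hypothetical set of simple commands a compact local model could handle.
LOCAL_INTENTS = {"wake_word", "set_timer", "start", "stop"}


@dataclass(frozen=True)
class Request:
    intent: str           # what the user asked for
    needs_internet: bool  # True if the answer depends on online content


def choose_backend(request: Request, cloud_reachable: bool) -> str:
    """Return "device", "cloud" or "unavailable" for a request."""
    # Simple commands that need no online content stay on the device.
    if request.intent in LOCAL_INTENTS and not request.needs_internet:
        return "device"
    # Everything else goes to the cloud when a connection exists.
    if cloud_reachable:
        return "cloud"
    return "unavailable"


print(choose_backend(Request("set_timer", False), cloud_reachable=True))
print(choose_backend(Request("find_recipe", True), cloud_reachable=True))
```

In this sketch, a timer command stays local, while a recipe search like the one Mozer describes is routed to the cloud.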

The company’s latest efforts rely on NVIDIA NeMo – a toolkit for building state-of-the-art conversational AI models – and Triton Inference Server for its Sensory Cloud hybrid compute unit.

Making Electronic Devices Smarter

Devices are getting more and more powerful. But even as special-purpose inference accelerators come to market, the best models tend to be bigger and require even more memory, so edge-based processing isn’t always the best solution.

Cloud connections for devices can improve the performance of these compact models. Live rollouts of updates can apply to wearables, cellphones, cars and more, Mozer said.

“Having a cloud connection offers updates for smaller, more accurate on-device models,” he said.

This pays off in improvements to many device features. Sensory offers its customers speech-to-text, text-to-speech, wake word verification, natural language understanding, face recognition, and speaker and sound identification.

Sensory is also working with NVIDIA Jetson edge AI modules to bring the power of its Sensory Cloud to larger device implementations.

Tapping Triton for Inference

The company’s Sensory Cloud runs voice and vision models with NVIDIA Triton. Sensory’s custom cloud model management infrastructure, built around Triton, allows different customers to run different model versions, deploy custom models, enable automatic updates, and monitor usage and errors.

Sensory Cloud can be deployed as a container by Sensory customers, either on premises or in the cloud. It can also be used completely privately, without any data being transmitted to Sensory.

Triton provides Sensory with a special-purpose machine learning task library for all Triton communications and for rapid deployment of new models with minimal coding. It also enables an asynchronous actor pipeline, which makes new pipelines easier to assemble and scale. Triton’s dynamic batching helps increase GPU throughput, and its performance analysis helps with inference optimization.
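
To give a feel for why batching raises GPU throughput, here is a toy simulation of the general idea: requests that arrive close together are grouped and run as one batch. It is a simplified sketch, not Triton’s scheduler. The function name `form_batches` is hypothetical; the parameter names are modeled on Triton’s `max_batch_size` and `max_queue_delay_microseconds` settings, and arrival times are assumed to be sorted.

```python
from typing import List


def form_batches(
    arrival_times_us: List[int],
    max_batch_size: int,
    max_queue_delay_us: int,
) -> List[List[int]]:
    """Group request indices into batches.

    A batch is closed when it is full, or when a new request arrives
    after the oldest queued request has waited longer than the delay.
    Assumes arrival_times_us is sorted in ascending order.
    """
    batches: List[List[int]] = []
    current: List[int] = []
    for idx, arrival in enumerate(arrival_times_us):
        # Close the open batch if its oldest request has waited too long.
        if current and arrival - arrival_times_us[current[0]] > max_queue_delay_us:
            batches.append(current)
            current = []
        current.append(idx)
        # Close the batch as soon as it reaches the size limit.
        if len(current) == max_batch_size:
            batches.append(current)
            current = []
    if current:
        batches.append(current)
    return batches


arrivals = [0, 10, 20, 500, 510, 520, 530, 540]
print(form_batches(arrivals, max_batch_size=4, max_queue_delay_us=100))
# prints [[0, 1, 2], [3, 4, 5, 6], [7]]
```

Eight requests become three model executions instead of eight, which is the kind of saving that lets a GPU serve more traffic.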

Sensory is a member of NVIDIA Inception, a global program designed to support cutting-edge startups.

Enlisting NeMo for Hybrid Cloud Models

Sensory has built on NVIDIA NeMo to provide improvements in accuracy and functionality for all of its cloud technologies.

The NeMo-enhanced features include Sensory’s proprietary feature extractor, audio streaming optimizations, customizable vocabularies, multilingual models and more.

The company now has 17 languages supported by NeMo models. And thanks to Sensory’s proprietary enhancements, its word error rates consistently beat the best in speech-to-text, according to the company.
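
Word error rate is the standard accuracy measure for speech-to-text: the number of word substitutions, deletions and insertions needed to turn the system’s transcript into the reference transcript, divided by the number of words in the reference. The sketch below computes it with a word-level edit distance. The function name and the sample sentences are illustrative only and do not reflect how Sensory measures its models.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return (substitutions + deletions + insertions) / reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] is the edit distance between the reference words processed
    # so far and the first j words of the hypothesis.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # reference word missing from hypothesis
                curr[j - 1] + 1,     # extra word in hypothesis
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


# One dropped word ("the") and one substitution ("light" -> "lights")
# against a five-word reference.
print(word_error_rate("turn on the kitchen light", "turn on kitchen lights"))
```

Lower is better: a perfect transcript scores 0, and the example above scores 2 errors over 5 reference words.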

“Sensory brings enhanced features and functionality with NVIDIA Triton hardware and NeMo software,” said Mozer. “This type of hybrid cloud setup gives customers new AI-powered capabilities.”

Image credit: Sensory

