Preface to a series of blogs on Automatic Speech Recognition with Dr Gaurav Goswami
Let us begin with a broad introduction before delving deeper.
Speech as a method of interfacing with software and apps has become very popular. Speech has many exciting applications: virtual assistants (such as Siri and Cortana), call routing, IVR, medical transcription, and dictation, among others.
Depending on the kind of speech application you are using, different components could be at work under the hood. These include, but are not limited to:
Jazz up your BOT
In my previous blog, we saw how to whip up a quick web-based BOT without breaking a sweat. In this one we’ll see how to spruce it up a bit.
Start by clicking on the Web Chat integration of your Assistant.
Use the options under the Style tab to set the Assistant’s name and the colours for the chat header, user bubbles and interactive elements. Interactive elements include
Watson Assistant (WA) in its initial versions (including Watson Conversation) required some complex UI to be built around it when integrating it with one’s website. Code patterns provided on IBM’s developer website gave developers a jumpstart. However, it was far from trivial and required a web developer with advanced skills. Picture this: you needed a UI that would
A quick-fix solution for placing divs next to each other and placing CSS grids within these divs.
Firefighting is what the day threw at me! I am not a novice to UI development, but at the same time I am not at the top of the game either.
I was thrown into a situation where I had to whip up three grids placed side by side. This is what I did with the aid of Stack Overflow and Google.
Problem 1: Place 3 divs next to each other.
Add the following styles in your page. …
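As a starting point, here is a minimal sketch of one way to do this with CSS Grid (the class names are illustrative, not necessarily the ones from the original post):

```css
/* Outer container: three equal columns side by side */
.row {
  display: grid;
  grid-template-columns: 1fr 1fr 1fr;
  gap: 10px;
}

/* Each inner div is itself a grid, e.g. with two columns */
.row > .panel {
  display: grid;
  grid-template-columns: repeat(2, 1fr);
  gap: 5px;
}
```

With this, any three divs of class `panel` inside a `row` div line up side by side, and each can lay out its own contents as a grid.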
In earlier parts of this series, we have covered an introduction to Automatic Speech Recognition (ASR), defined the problem, and briefly discussed the overall process. Moving on to the next important topic related to ASR, let us try to better understand the models that enable it.
Traditionally, ASR relies on two machine learning models to make sense of the audio input it receives, namely the Language Model and the Acoustic Model.
To get the best transcription quality, both of these models can be specialized for a given language, dialect, application domain, type of speech, and communication channel.
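As a toy illustration of the Language Model's job (this is not how production ASR language models are built), it assigns higher scores to more plausible word sequences. A tiny bigram count model in Node.js:

```javascript
// Toy bigram "language model": count adjacent word pairs in a small
// corpus, then score candidate sentences by how familiar their word
// pairs are. Illustrative only.
function trainBigrams(corpus) {
  const counts = {};
  for (const sentence of corpus) {
    const words = sentence.toLowerCase().split(/\s+/);
    for (let i = 0; i < words.length - 1; i++) {
      const pair = words[i] + ' ' + words[i + 1];
      counts[pair] = (counts[pair] || 0) + 1;
    }
  }
  return counts;
}

function score(counts, sentence) {
  const words = sentence.toLowerCase().split(/\s+/);
  let total = 0;
  for (let i = 0; i < words.length - 1; i++) {
    total += counts[words[i] + ' ' + words[i + 1]] || 0;
  }
  return total;
}

const lm = trainBigrams([
  'recognize speech with a language model',
  'the language model ranks word sequences',
]);

// "recognize speech" outscores the acoustically similar
// "wreck a nice beach" because its word pairs appear in the corpus.
console.log(score(lm, 'recognize speech') > score(lm, 'wreck a nice beach')); // true
```

This is why a language model helps an ASR system pick the right transcription among acoustically similar candidates.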
In the previous blog we briefly introduced you to the concept of Speech Recognition. Now let us dig a little deeper and follow a sound wave from the moment it is uttered to the point where it is transformed into the text it represents.
Audio input can come from different sources: an audio file, the microphone on a laptop or mobile phone, and so on.
Audio basically means sound waves. These sound waves are converted into electrical signals. These electrical signals are then converted into bit values by an Analog to Digital Converter (ADC). The ADC will sample the signal at…
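To make sampling concrete, here is a rough Node.js sketch of what an ADC does: read the amplitude of a (simulated) analog signal at a fixed rate and quantize each reading to a 16-bit integer. The signal and parameters are made up for illustration.

```javascript
// Simulate an ADC: sample an "analog" signal at a fixed rate and
// quantize each sample to a signed 16-bit value.
const SAMPLE_RATE = 16000; // samples per second, common for speech
const FREQ = 440;          // a 440 Hz tone standing in for speech

// The "analog" signal: amplitude as a continuous function of time.
function analogSignal(t) {
  return Math.sin(2 * Math.PI * FREQ * t);
}

// Take n samples, quantizing each reading to 16 bits.
function sample(n) {
  const samples = new Int16Array(n);
  for (let i = 0; i < n; i++) {
    const t = i / SAMPLE_RATE; // time of the i-th sample, in seconds
    samples[i] = Math.round(analogSignal(t) * 32767);
  }
  return samples;
}

const pcm = sample(SAMPLE_RATE); // one second of audio
console.log(pcm.length);         // 16000 samples
```

The higher the sample rate and bit depth, the more faithfully the digital samples represent the original wave.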
Code changes to shift from API V1 to API V2 of Watson Assistant
Many of my chatbots and voicebots built with Watson Assistant over the last few years used the Watson Assistant API V1. Digging up one of these oldies, I ran into some import and require conflicts in Node.js. After some googling, I felt that moving to API V2 would be the way to go.
P.S.: API V2 does not support modification of the workspace.
Here is a link to the official documentation on migrating to API V2.
However, here are the changes I had to make to my Node.js…
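For reference, the gist of the V1 → V2 shift in the ibm-watson Node.js SDK is that V2 is session-based. The helper below is hypothetical, only to show the shape of the V2 message parameters; the SDK calls in the comments are from the ibm-watson package.

```javascript
// V1 sent messages directly to a workspace:
//   const AssistantV1 = require('ibm-watson/assistant/v1');
//   assistant.message({ workspaceId, input: { text } });
//
// V2 requires a session: require the v2 module, create a session for
// an assistant, then send messages against that session:
//   const AssistantV2 = require('ibm-watson/assistant/v2');
//   const { result } = await assistant.createSession({ assistantId });
//   assistant.message({ assistantId, sessionId: result.session_id,
//                       input: { message_type: 'text', text } });

// Hypothetical helper illustrating the V2 message parameter shape:
function buildV2MessageParams(assistantId, sessionId, text) {
  return {
    assistantId,
    sessionId,
    input: { message_type: 'text', text },
  };
}

console.log(buildV2MessageParams('my-assistant-id', 'my-session-id', 'Hello'));
```

The practical consequence is that V2 code has to create (and eventually delete) a session, and pass `assistantId` plus `sessionId` on every message, instead of a single `workspaceId`.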
How to train Watson Speech to Text to identify new Names or Proper Nouns
Watson Speech to Text (STT) base models come trained with a predefined set of names / proper nouns. While using one of these models in an Indian context, Watson had trouble recognising Indian names in the audio that had to be transcribed.
When a model has to be trained to understand something new, the most important element is… of course, Data! So in my case, the data would be different utterances containing the names to be identified. This, I certainly didn’t have! I had one advantage though —…