Speech feedback

It's possible to get feedback while recording. After every sentence feedback is provided indicating whether or not the sentence was read well. To perform a recording with feedback the following calls have to be made:

Meet prerequisites
Prepare the speech feedback
Register audio procedure for streaming
Start listening for audio

Prerequisites

To be able to do speech feedback the following prerequisites needs to be met:

A speech challenge must be used so that needs to exist.
The speech challenges needs to have the language set
The user performing the speech feedback needs to have a valid profile with birthYear and language set.

Prepare

Prepare a new speech feedback. This RPC should be called for each new speech feedback. A unique id is generated for a speech feedback and a speech challenge is prepared.

URI

nl.itslanguage.feedback.prepare

Parameters

Name	Type	Description
challenge_id	`string`	Required The id of the speech challenge to prepare.

Response

The id of the new speech recording is returned as a string.

Start listening

In order to receive feedback the server needs to listen for audio on the registered audio rpc. While listening the server will reply using progressive results. The server will stop listening when the audio rpc returns.

Note

The server will only stop processing the audio when the audio rpc returns.

URI

nl.itslanguage.feedback.listen_and_reply

Parameters

Name	Type	Description
recording_id	`string`	Required The unique id of the speech recording.
rpc	`string`	Required The URI of a registered audio rpc.

Response

The rpc returns progressive results for realtime feadback. After every sentence the following json is sent as a progressive result:

{
  "feedback_id": "recording_1",
  "sentence": 0,
  "errors": 1,
  "confidence": -78.0,
  "currentFrame": 68,
  "eosFrame": 63,
  "sessionId": "test",
  "tokenType": "EOS"
  "words": [
    {
      "sentenceIndex": 0,
      "textIndex": 1,
      "expected": "funny",
      "recognized": "funny",
      "label": "CW",
      "description": "Correct.",
      "explanation": "The pronunciation matches the expected text."
    },
    {
      "sentenceIndex": 0,
      "textIndex": 1,
      "expected": "gif",
      "recognized": "gif-ERR",
      "label": "PC",
      "description": "Phonetic Change",
      "explanation": "One or more phones are changed."
    }
  ]
}

Name	Type	Description
feedback_id	`string`	The unique id of the feedback this results belongs to.
sentence	`int`	The index of the sentence, starting at `0`.
errors	`int`	Amount of errors made.
confidence	`float`	Confidence of the end of sentence detection.
currentFrame	`int`	Audio frame of end of sentence detection.
eosFrame	`int`	Audio frame of the end of sentence.
sessionId	`string`	Session identifier.
tokenType	`string`	The detected token.
words	`list`	All expected and recognized words in the sentence.

The list of words each contain the following fields:

Name	Type	Description
sentenceIndex	`int`	Index of the sentence.
textIndex	`int`	Index of the whole text.
expected	`string`	The word as it was expected to be pronounced.
recognized	`string`	The recognized result. If nothing is recognized this field is `null`.
label	`string`	Label describing what was recognized.
description	`string`	Description of the label.
explanation	`string`	in-depth explanation of the label.
startTiming	`int`	Beginning of the section in milliseconds.
endTiming	`int`	End of a section in milliseconds.

Other than a correct word labels can indicate errors in the pronunciation. These are all the possible labels that can currently be assigned to a word.

Label	Description	Explanation
`CW`	Correct	The pronunction matches the expected text.
`SL`	Silence	Anomalous silence in-between words.
`RW`	Repetition	Expected word was repeated.
`OW`	Omission	Expected word was omitted.
`PC`	Phonetic Change	One or more phones are changed.
`FS`	False Start	Partial in-prompt realisation.

When the recording is finished a recording with feedback is returned:

{
  "id": "recording_1",
  "created": "2014-01-28T21:25:10Z",
  "updated": "2014-01-28T21:25:10Z",
  "audioUrl": "https://api.itslanguage.nl/download/audio.wav",
  "sentences": [
    {
      "sentence": 0,
      "errors": 1,
      "confidence": -159.0,
      "currentFrame": 64,
      "eosFrame": 61,
      "sessionId": "test",
      "tokenType": "EOS"
      "words": [
        {
          "sentenceIndex": 0,
          "textIndex": 0,
          "expected": "hello",
          "recognized": "hello",
          "label": "CW",
          "description": "Correct",
          "explanation": "The pronunciation matches the expected text.",
          "startTiming": 10,
          "endTiming": 400
        },
        {
          "sentenceIndex": 0,
          "textIndex": 1,
          "expected": "there",
          "recognized": "there-ERR",
          "label": "PC",
          "description": "Phonetic Change",
          "explanation": "One or more phones are changed.",
          "startTiming": 400,
          "endTiming": 620
        }
      ]
    },
    {
      "sentence": 1,
      "errors": 0,
      "confidence": -124.1,
      "currentFrame": 87,
      "eosFrame": 84,
      "sessionId": "test",
      "tokenType": "EOS"
      "words": [
        {
          "sentenceIndex": 1,
          "textIndex": 2,
          "expected": "general",
          "recognized": "general",
          "label": "CW",
          "description": "Correct",
          "explanation": "The pronunciation matches the expected text.",
          "startTiming": 40,
          "endTiming": 320
        },
        {
          "sentenceIndex": 1,
          "textIndex": 3,
          "expected": "kenobi",
          "recognized": "kenobi",
          "label": "CW",
          "description": "Correct",
          "explanation": "The pronunciation matches the expected text.",
          "startTiming": 320,
          "endTiming": 500
        }
      ]
    }
  ]
}

Name	Type	Description
id	`string`	The id of the recording.
created	`string`	The timestamp when the recording was created.
updated	`string`	The timestamp when the recording was last updated.
audioUrl	`string`	The url to fetch the recorded audio.
sentences	`array`	A list containing the feedback per sentence.

Pause

When desired the feedback recording can be paused. Doing so will stop the server from processing the audio stream and returning feedback. Do note that the audio recording isn't paused when the feedback is paused. To pause the recording the audio RPC needs to stop sending data. For practical reasons it is recommended to stop sending audio when the feedback is paused. Also see the Resume docs for more info.

Note

Pausing the feedback will not stop the feedback. See the note on audio processing how to stop it.

URI

nl.itslanguage.feedback.pause

Parameters

Name	Type	Description
id	`string`	Required The id of the feedback to pause.

Resume

A paused feedback can be resumed using this RPC. After this the audio stream is again processed by the server. Currently it is required to re-send the audio header when resuming feedback. Failing to do so will fail the feedback as the server can't recognise the audio format without a header.

Note

It appears to be valid to have a wave file with multiple headers.

URI

nl.itslanguage.feedback.resume

Parameters

Name	Type	Description
id	`string`	Required The id of the feedback to resume.
sentence_id	`int`	Required The id of the sentence (starting with `0`) to resume on.