tpl-vad-lvcsr tnl

This template detects speech with a VAD and sends the segmented audio to the LVCSR or STT recognizer in slot 0.

tpl-vad-lvcsr has task-type==phrasespot.

Expected task types:

tpl-vad-lvcsr-3.17.0.snsr, tpl-opt-spot-vad-lvcsr

Operation

flowchart TD
  start((start))
  start --> fetch0

  subgraph slot0[**slot 0** (lvcsr)]
    startSTT((start))
    startSTTfinal((start))
    stopSTT((stop))
    stopSTTpartial((stop))
    processSTT[process]
    partialSTT(^result-partial)
    intentSTT(^nlu-intent)
    slotSTT(^nlu-slot)
    resultSTT(^result)
    nluSTT{NLU<br>match?}

    slmSTT{SLM<br>included?}
    generateSTT[generate]
    slmstartSTT(^slm-start)
    slmresultpartialSTT(^slm-result-partial)
    slmresultSTT(^slm-result)

    startSTT --> processSTT
    processSTT ---->|hypothesis| partialSTT
    partialSTT --> stopSTTpartial

    startSTTfinal --> nluSTT
    nluSTT -->|yes| intentSTT
    nluSTT -->|no| resultSTT
    intentSTT --> slotSTT
    slotSTT --> resultSTT
    slotSTT -->|more| intentSTT

    resultSTT --> slmSTT
    slmSTT -->|yes| slmstartSTT
    slmSTT -->|no| stopSTT
    slmstartSTT -->|OK| generateSTT
    slmstartSTT -->|STOP| stopSTT
    generateSTT -->|response| slmresultpartialSTT
    slmresultpartialSTT --> generateSTT
    generateSTT -->|done| slmresultSTT
    slmresultSTT --> stopSTT
  end

  fetch0[/samples from ->audio-pcm/]
  fetch1[/samples from ->audio-pcm/]
  audio0(^sample-count)
  audio1(^sample-count)

  silence(^silence)
  begin(^begin)
  END(^end)
  limit(^limit)

  process0[VAD process]
  process1[VAD process]

  final@{ shape: f-circ }
  listenEnd@{ shape: f-circ }

  fetch0 --> audio0
  audio0 --> process0
  process0 --> fetch0
  process0 -->|speech start| begin
  process0 -->|timeout| silence
  silence ~~~ final
  silence --> listenEnd

  begin --> fetch1
  fetch1 --> audio1
  audio1 --> process1

  process1 --> startSTT
  stopSTTpartial --> fetch1

  process1 -->|speech end| END
  process1 -->|speech limit| limit
  END --> final
  limit --> final

  final --> startSTTfinal
  stopSTT --> listenEnd
  listenEnd ----> fetch0
  1. Read audio data from ->audio-pcm.
  2. Invoke ^sample-count.
  3. If VAD processing does not detect the start of speech within the leading-silence timeout, invoke ^silence and continue at step 1.
  4. Invoke ^begin if processing detects the start of speech, else continue at step 1.
  5. Read audio data from ->audio-pcm.
  6. Invoke ^sample-count.
  7. If VAD processing detects an endpoint, invoke either ^limit or ^end and continue at step 9.
  8. Process VAD-segmented audio in the LVCSR or STT recognizer.
    • Invoke ^result-partial with interim recognition result hypothesis.
    • Continue at step 5.
  9. Produce a final LVCSR or STT recognition hypothesis.
    • Invoke ^nlu-intent and ^nlu-slot for each NLU intent found.
    • Invoke ^result with the final recognition hypothesis.
    • If there's no SLM, continue at step 1.
    • Invoke ^slm-start. If the callback returns STOP, continue at step 1.
    • Generate SLM result, invoking ^slm-result-partial on each generated token.
    • Invoke ^slm-result with complete SLM result.
    • Continue at step 1.
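
The steps above can be modeled as a small two-state loop. The sketch below is not SDK code: `run_template`, its frame labels ("quiet", "timeout", "start", "speech", "end", "limit"), and the `has_nlu` / `has_slm` flags are hypothetical names introduced only to illustrate the order in which events fire. It simplifies in three ways: it emits ^result-partial on every speech frame, reports a single NLU intent with one slot, and generates a single SLM token.

```python
from typing import Iterable, List


def run_template(frames: Iterable[str],
                 has_nlu: bool = False,
                 has_slm: bool = False) -> List[str]:
    """Return the event names invoked for a stream of hypothetical VAD labels."""
    events: List[str] = []
    listening = True  # True: steps 1-4 (waiting for speech); False: steps 5-8
    for label in frames:
        events.append("^sample-count")            # steps 2 and 6
        if listening:
            if label == "timeout":
                events.append("^silence")         # step 3
            elif label == "start":
                events.append("^begin")           # step 4
                listening = False
        elif label in ("end", "limit"):
            events.append("^" + label)            # step 7
            if has_nlu:                           # step 9, NLU match
                events += ["^nlu-intent", "^nlu-slot"]
            events.append("^result")              # step 9, final hypothesis
            if has_slm:                           # step 9, SLM included
                events += ["^slm-start", "^slm-result-partial", "^slm-result"]
            listening = True                      # back to step 1
        else:
            events.append("^result-partial")      # step 8
    return events


if __name__ == "__main__":
    for name in run_template(["quiet", "start", "speech", "end"]):
        print(name)
```

Feeding the loop a "limit" label instead of "end" produces ^limit in place of ^end. The rest of the sequence is unchanged, matching the two endpoint branches in the flowchart.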

Register callback handlers with setHandler only for those events you're interested in.
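
As a rough illustration of selective registration, the toy dispatcher below ignores any event that has no handler. It is a conceptual model only: `Dispatcher`, `set_handler`, `invoke`, and the `OK` / `STOP` constants are hypothetical stand-ins, not the SDK's setHandler API. It also assumes that an event without a handler behaves as if the handler had returned OK.

```python
from typing import Callable, Dict, List

OK, STOP = "OK", "STOP"  # hypothetical return codes for this sketch


class Dispatcher:
    """Toy event dispatcher: events without a handler are skipped."""

    def __init__(self) -> None:
        self._handlers: Dict[str, Callable[[str], str]] = {}

    def set_handler(self, event: str, fn: Callable[[str], str]) -> None:
        self._handlers[event] = fn

    def invoke(self, event: str, payload: str = "") -> str:
        handler = self._handlers.get(event)
        # Assumption: an unregistered event acts like a handler returning OK.
        return handler(payload) if handler else OK


dispatcher = Dispatcher()
results: List[str] = []


def on_result(text: str) -> str:
    results.append(text)
    return OK


# Register only the events of interest; ^sample-count and the rest are ignored.
dispatcher.set_handler("^result", on_result)
# Returning STOP from ^slm-start skips SLM generation in the flow above.
dispatcher.set_handler("^slm-start", lambda _text: STOP)
```

Leaving high-frequency events such as ^sample-count unregistered keeps the application code limited to the results it actually consumes.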

Settings

^begin, ^end, ^limit, ^nlu-intent, ^nlu-slot, ^result, ^result-partial, ^sample-count, ^silence, ^slm-result, ^slm-result-partial, ^slm-start

none

audio-stream, audio-stream-first, audio-stream-last

->audio-pcm, audio-stream-from, audio-stream-to

audio-stream-size, backoff, custom-vocab, hold-over, include-leading-silence, leading-silence, max-recording, partial-result-interval, samples-per-second, stt-profile

lvcsr, phrasespot

live-spot.c, snsr-eval.c, PhraseSpot.java

Notes

Use this template for command-and-control applications in which the user starts a command simply by speaking.

Examples

% cd ~/Sensory/TrulyNaturalSDK/7.6.1

% bin/snsr-edit -o vad-stt.snsr \
    -t model/tpl-vad-lvcsr-3.17.0.snsr \
    -f 0 model/stt-enUS-automotive-medium-2.3.15-pnc.snsr

# Say, for example: "Turn the air conditioning up all the way"
% snsr-eval -t vad-stt.snsr
P   1000   1040 T
P   1000   1600 Turn the egg
P   1040   2040 Turn the air conditioner
P   1040   2320 Turn the air conditioning up
P   1040   2760 Turn the air conditioning up all the way
NLU intent: set_fan (0.9547) = turn the air conditioning up 100%
NLU entity:   hvac (0.9744) = air conditioning
NLU entity:   percentage_value (0.8963) = 100%
  1040   2880 Turn the air conditioning up all the way.
^C
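
To post-process a transcript like the one above, the result lines can be split into fields. The sketch below is a minimal parser. It assumes that a leading `P` marks a partial hypothesis and that the two numeric columns are the start and end of the segment in milliseconds; the listing suggests both but the text does not state either. `parse_line` is a hypothetical helper, not part of the SDK.

```python
import re
from typing import Optional, Tuple

# Optional "P" flag, two integer columns, then the hypothesis text.
_RESULT_LINE = re.compile(r"^(P)?\s*(\d+)\s+(\d+)\s+(.*)$")


def parse_line(line: str) -> Optional[Tuple[bool, int, int, str]]:
    """Return (is_partial, start, end, text), or None for any other line."""
    match = _RESULT_LINE.match(line)
    if match is None:
        return None  # NLU lines, prompts, and ^C fall through here
    flag, start, end, text = match.groups()
    return (flag == "P", int(start), int(end), text)


if __name__ == "__main__":
    sample = [
        "P   1000   1040 T",
        "NLU intent: set_fan (0.9547) = turn the air conditioning up 100%",
        "  1040   2880 Turn the air conditioning up all the way.",
    ]
    for raw in sample:
        print(parse_line(raw))
```

Lines that do not begin with the two numeric columns (the `NLU intent:` and `NLU entity:` lines, for instance) return `None`, so they can be handled by a separate rule.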