Configuration¶
Configuration settings are both readable and writable and are part of task models; they are saved to Stream by dup and save, and restored by load.
Use these to change or fine-tune model behavior. Models have reasonable defaults, so there's usually no need to modify settings of this type.
The most frequently used are operating-point for wake words and command sets, leading-silence and trailing-silence for VAD templates, partial-result-interval for LVCSR and STT, and stt-profile for STT models.
Use the Session get and set functions that match the type of the setting. Use getInt, for example, to read the int value for operating-point.
0.¶
1.¶
configuration stream read-write
ac-prune-top-k¶
configuration int read-write tnl 7.5.0
Reduce LVCSR decoder CPU use
This setting trades CPU use for recognition accuracy.
A subset of recognizers optimized for low resource use, created by VoiceHub, allows reducing the CPU cycles used in search decoding at the expense of an increased recognition error rate.
Set to 0 to disable.
accuracy¶
configuration double read-write
am-size¶
audio-stream-size¶
configuration int read-write
Input audio buffer size.
The number of audio samples kept in a circular audio history buffer, accessible through audio-stream.
Use this buffer to retrieve segmented audio using alignments (begin-sample, begin-ms, end-sample, end-ms) obtained in the ^result.
Set to 0 to disable audio buffering.
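As a rough illustration of how a fixed-size history buffer supports retrieval by alignment, the Python sketch below keeps only the most recent samples and returns the span between two absolute sample indices. The AudioHistory class and its methods are hypothetical and not part of the SDK, where the buffer is read through audio-stream. Treating the end index as exclusive is also an assumption.

```python
from collections import deque


class AudioHistory:
    """Hypothetical stand-in for a circular audio history buffer."""

    def __init__(self, size):
        # size plays the role of audio-stream-size: samples retained.
        self.size = size
        self.samples = deque(maxlen=size) if size > 0 else None
        self.total = 0  # absolute count of samples seen so far

    def push(self, chunk):
        self.total += len(chunk)
        if self.samples is not None:
            self.samples.extend(chunk)

    def segment(self, begin_sample, end_sample):
        """Return samples in [begin_sample, end_sample), if still buffered."""
        if self.samples is None:
            raise LookupError("audio buffering is disabled (size 0)")
        oldest = self.total - len(self.samples)
        if begin_sample < oldest or end_sample > self.total or begin_sample > end_sample:
            raise LookupError("requested span is not in the history buffer")
        window = list(self.samples)
        return window[begin_sample - oldest:end_sample - oldest]


history = AudioHistory(size=8)
history.push(list(range(12)))   # samples 0..11; only 4..11 are retained
print(history.segment(6, 9))    # [6, 7, 8]
```

The sketch shows why the buffer must be large enough to cover the longest segment you intend to retrieve: a span that starts before the oldest retained sample can no longer be recovered.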
backoff¶
configuration int read-write
Start point back-off in ms.
Audio margin added before the start point found by a VAD.
cache-file¶
configuration string read-write
Continuous Adaptation cache file name.
When set, enrolled user data will be saved to, and loaded from, this file. If not set, enrolled user data are discarded when the spotter session is released.
This setting is only available in fixed-phrase spotters that support continuous adaptation.
If you need more control over how or when the enrollment context is saved you can do this from the ^adapted callback handler.
complete-only¶
configuration int read-write tnl
Controls whether incomplete LVCSR results are accepted.
The text result available in the ^result callback for LVCSR recognizers reports the recognition result that best matches the acoustic evidence the recognizer saw. The default behavior is to show incomplete results, even if they are not accepted by the grammar specification. For example, if a custom recognizer uses
and the audio contains only "1 2 3 4", then the final result will be "1 2 3 4". If this behavior is not desirable, setting complete-only to 1 will suppress such incomplete results. The ^result callback will still happen, but text will be <no-match/>. The ^nlu-intent and ^nlu-slot events will not be invoked.
ctx-enroll¶
configuration int read-write
Number of enrollments with trailing context.
The recommended number of enrollments where the phrase is followed by additional speech. For example: "Hey Sensory will it rain tomorrow?"
custom-vocab¶
configuration string read-write stt
Custom STT vocabulary.
STT recognizers occasionally lack full vocabulary coverage for low-frequency words, proper names, trademarks, and the like. Use this custom vocabulary setting to add new words to a recognizer.
Format:
- New vocabulary word or words,
- followed by zero or more mis-recognized examples, separated by commas.
- Vocabulary entries are separated by \r, \n, or ;
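The format above can be assembled programmatically. The following Python sketch is one way to do it; build_custom_vocab is a hypothetical helper, not an SDK function, and it always uses ; as the entry separator.

```python
def build_custom_vocab(entries):
    """Join (word, [misrecognitions]) pairs into a custom-vocab string.

    Hypothetical helper: follows the format described above, using ';'
    as the entry separator.
    """
    parts = []
    for word, examples in entries:
        fields = [word, *examples]
        for field in fields:
            # A separator inside a field would change how the value is parsed.
            if any(sep in field for sep in (",", ";", "\r", "\n")):
                raise ValueError(f"separator character in {field!r}")
        parts.append(", ".join(fields))
    return "; ".join(parts)


vocab = build_custom_vocab([("armadillo", ["an anlla"]), ("jackalope", [])])
print(vocab)  # armadillo, an anlla; jackalope
```

The resulting string is the value you would pass for custom-vocab, for instance with the string set function of the Session.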
Note
Use custom vocabulary to address minor recognition issues. For more than a couple of hundred entries you'll get better performance with a domain-specific STT model. Please contact your Sensory sales representative to explore options.
Example
% snsr-eval -t model/stt-enUS-automotive-medium-2.3.15-pnc.snsr \
-s partial-result-interval=0 \
data/enrollments/armadillo-1-4-c.wav
NLU intent: no_command = an anlla record a video
400 1720 an anlla record a video
% snsr-eval -t model/stt-enUS-automotive-medium-2.3.15-pnc.snsr \
-s partial-result-interval=0 \
-s 'custom-vocab="armadillo, an anlla; jackalope"' \
data/enrollments/armadillo-1-4-c.wav
NLU intent: no_command = armadillo record a video
400 1720 armadillo record a video
debug-log-file¶
configuration string read-write
Debug log filename.
The name of the log file tpl-spot-debug writes to. This value is required, and no default is defined in the template. The directory the log file is in must exist, and must be writable.
These optional and mutually exclusive character sequences are substituted with the time stamp when the log file is first opened:
- %@ - year-month-day_hour-minute-second.milliseconds (UTC)
- %# - milliseconds since the epoch.
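To make the substitution concrete, here is a small Python sketch that expands the two sequences in a file name. expand_log_name is hypothetical, and the zero-padded digit widths are an assumption; the SDK may format the time stamp differently.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def expand_log_name(template, now):
    """Substitute %@ or %# in a log file name (illustrative only).

    The exact digit widths the SDK uses are an assumption here.
    """
    if "%@" in template and "%#" in template:
        raise ValueError("%@ and %# are mutually exclusive")
    now = now.astimezone(timezone.utc)
    if "%@" in template:
        millis = now.microsecond // 1000
        stamp = now.strftime("%Y-%m-%d_%H-%M-%S") + f".{millis:03d}"
        return template.replace("%@", stamp)
    if "%#" in template:
        # Integer division of timedeltas avoids floating-point rounding.
        epoch_ms = (now - EPOCH) // timedelta(milliseconds=1)
        return template.replace("%#", str(epoch_ms))
    return template


when = datetime(2023, 11, 20, 8, 5, 9, 250000, tzinfo=timezone.utc)
print(expand_log_name("logs/spot-%@.log", when))  # logs/spot-2023-11-20_08-05-09.250.log
```

Using a time stamp in the name gives each debugging run its own log file instead of overwriting the previous one.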
delay¶
configuration int read-only 6.16.0
Phrase spotter delay in ms.
Deprecated
Support for this setting will be removed from the next major release of the TrulyNatural SDK.
First deprecated in release 6.16.0 (2021-06-06) and made read-only in 7.0.0 (2023-11-20).
The cumulative recognition score for a wake word or command recognizer can exceed the decision threshold before the end of the utterance. This setting controls how long the recognizer will wait while the recognition score is still increasing before reporting the event.
Longer delays can increase the time alignment accuracy of the end of the spotted phrase.
duration-ms¶
configuration double read-write
Low false-reject listening window.
Selects the time window, in ms, following a close false-reject during which smart wake words use low-fr-operating-point instead of operating-point.
Defaults to 10 seconds if not explicitly set.
enrollment-task-index¶
configuration int read-write
The index of the sub-task to enroll.
For enrollment tasks that contain multiple sub-tasks (for example, a user-defined trigger and an enrolled fixed trigger), this integer value selects which of the sub-tasks the enrollments should be applied to.
See the documentation delivered with the task file for the sub-task mapping.
Note
For most enrollment tasks the only supported task index is 0.
fex-hash¶
configuration string read-only pre-release
Feature extractor hash.
Pre-release
This is an experimental feature. Do not use unless recommended by Sensory.
This is a unique string that identifies the feature type used by the task.
hold-over¶
configuration int read-write
Endpoint hold-over.
Audio margin added after the endpoint found by a VAD. This is the amount of trailing silence to include in the segmentation.
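The effect of hold-over, together with backoff, on a segment can be shown with simple arithmetic. This Python sketch is illustrative only: padded_segment is a hypothetical helper, and it assumes hold-over is expressed in ms like backoff.

```python
def padded_segment(start_ms, end_ms, backoff_ms, hold_over_ms):
    """Widen a VAD segment by the backoff and hold-over margins.

    Illustrative arithmetic only; assumes hold-over is in ms like backoff
    and clamps the start at the beginning of the audio.
    """
    begin = max(0, start_ms - backoff_ms)
    end = end_ms + hold_over_ms
    return begin, end


print(padded_segment(400, 1720, backoff_ms=150, hold_over_ms=200))  # (250, 1920)
```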
include-leading-silence¶
configuration int read-write
Include leading silence in VAD output.
Set to 1 to include all audio up to the endpoint in the <-audio-pcm output stream. Set to 0 to return to the default behavior, which discards leading silence.
If this setting is used with a spot-VAD template such as tpl-spot-vad, tpl-spot-vad-lvcsr, or tpl-opt-spot-vad-lvcsr, the leading silence includes the trigger phrase.
include-model¶
configuration int read-write
Debug log includes a copy of the model.
This boolean value controls whether the debug-log-file includes a copy of the task model (the .snsr file).
The default value is 1. Set include-model=0 for smaller (but less complete) debug log files.
include-wake-word-audio¶
configuration int read-write 7.6.0 tnl
Include the wake word audio in VAD output
When set to 1, VAD templates tpl-spot-vad, tpl-spot-vad-lvcsr, and tpl-opt-spot-vad-lvcsr include the wake word in the audio output. Set to 0 to return to the default behavior, where the output does not include the wake word audio.
Note
This setting is a synonym for include-leading-silence when used with these templates. If you set both include-wake-word-audio and include-leading-silence, include-wake-word-audio takes precedence.
interactive¶
configuration int read-write
leading-silence¶
configuration int read-write
VAD leading silence time-out, in ms.
The VAD will invoke the ^silence event handler if no speech is detected during the first leading-silence ms of processed audio.
listen-window¶
configuration int read-write
Phrase spot listening window in seconds or milliseconds.
This is the duration that a spotter will listen for a command before timing out. Spotters with short listening windows are typically optimized to have lower false reject, but higher false accept rates.
If this value is 120 or less it is in seconds. Values larger than 120 are in ms. In wake word spotters tuned for continuous listening this value is 0.
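Because the unit depends on the magnitude of the value, it can help to normalize it before use. The Python sketch below applies the rule just described; listen_window_ms is a hypothetical helper, not an SDK function.

```python
def listen_window_ms(value):
    """Interpret a listen-window value as milliseconds.

    Values of 120 or less are seconds, larger values are already ms.
    Returns None for 0, which marks a continuously listening spotter.
    """
    if value < 0:
        raise ValueError("listen-window must not be negative")
    if value == 0:
        return None
    return value * 1000 if value <= 120 else value


print(listen_window_ms(3))     # 3000
print(listen_window_ms(2500))  # 2500
print(listen_window_ms(0))     # None
```

One consequence of this encoding is that windows between 1 ms and 120 ms cannot be expressed.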
Note
This value is only used when:
- Converting models to DSP format for embedded use.
- When the spotter is used in slot 1 of the tpl-spot-sequential spotter template model.
In all other cases spotters listen continuously, regardless of the value of listen-window.
lm-size¶
loop¶
configuration int read-write
Control template looping behavior.
In tpl-spot-sequential, setting this value to 1 changes when the listening focus returns to slot 0. Instead of immediately returning to slot 0 after a spot in slot 1, it resets the expiration timer, and only a timeout returns to slot 0.
This allows for a wake word followed by zero or more commands from a command set. The default behavior (loop = 0) is to allow at most one command before requiring another wake word utterance.
As of 7.6.0, setting loop = 2 pins the listening focus to slot 1. Use this, for example, if an application needs to gate a command set recognizer with a wake word or an external event such as a push-to-talk button.
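The three loop values can be compared with a toy state machine. This Python sketch models only the focus changes described above; next_slot, run, and the event names are hypothetical and do not correspond to SDK calls.

```python
def next_slot(slot, event, loop=0):
    """Return the slot that has listening focus after an event.

    Toy model of the behavior described above, not SDK code.
    event is "spot" (a phrase was spotted in the current slot) or
    "timeout" (the slot 1 listening window expired).
    """
    if loop == 2:
        return 1                      # focus is pinned to slot 1
    if slot == 0:
        return 1 if event == "spot" else 0
    if event == "timeout":
        return 0                      # a timeout returns focus to slot 0
    return 1 if loop == 1 else 0      # spot in slot 1


def run(events, loop):
    slot = 1 if loop == 2 else 0
    trace = []
    for event in events:
        slot = next_slot(slot, event, loop)
        trace.append(slot)
    return trace


events = ["spot", "spot", "spot", "timeout"]
print(run(events, loop=0))  # [1, 0, 1, 0]
print(run(events, loop=1))  # [1, 1, 1, 0]
print(run(events, loop=2))  # [1, 1, 1, 1]
```

With loop = 0 the second spot is a command and the third must be a wake word again; with loop = 1 both the second and third spots are commands.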
low-fr-operating-point¶
configuration int read-write
Low false-reject spotter operating point.
Selects the low false-reject fall-back operating point used by smart wake words. This low false-reject operating point is selected for duration-ms if a spot was rejected at operating-point but would have been accepted at low-fr-operating-point.
max-recording¶
max-users¶
nlu-match-max¶
configuration int read-write tnl
The maximum number of alternate NLU matches to consider
Limits the number of ^nlu-slot callbacks issued in case of multiple valid NLU matches to the recognition result. The default value is 1, limiting NLU results to the best-scoring match only.
nlu-size¶
operating-point¶
configuration int read-write
Spotter operating point.
Selects the trade-off between false accept and false reject errors for wake word and command set recognizers.
Higher-numbered points are more accepting.
- The valid range is from 1 to 21 inclusive.
- Lower-numbered points have a lower false accept rate at the expense of higher false reject fraction.
- The false accept rate is expressed as the expected number of false accepts (where the recognizer mistakenly spots the trigger phrase) per time unit. For example, 1.2 false accepts per day.
- The false reject rate is the percentage of times the actual trigger phrase is spoken, but not recognized. For example, 4.5%.
- The default operating point is selected by Sensory during trigger development for a good balance between these two error types.
- Not all operating points are necessarily valid. Use operating-point-iterator to find all the available points.
operating-point-iterator, low-fr-operating-point, duration-ms
partial-result-interval¶
configuration double read-write tnl stt
Partial result update interval.
The current preliminary result is emitted every partial-result-interval milliseconds. Set to 0 to disable partial result reporting.
Warning
Do not change partial-result-interval from an event handler, or while a model is running.
Note
In STT models this also sets the interval at which the model is evaluated. Less frequent updates trade preliminary result latency for lower average CPU use. Set to 0 for the lowest possible evaluation rate and CPU use.
pass-through¶
configuration int read-write
VAD audio pass-through behavior.
If set to 0, no audio from ->audio-pcm will be passed through to <-audio-pcm. The begin- and endpoint handlers will still be invoked. The default value, 1, passes speech-detected samples to <-audio-pcm.
push-buffer-backlog¶
configuration int read-write
Reports the number of bytes of deferred push data.
If push is used with a push-duration-limit, this setting reports the number of bytes deferred for processing in subsequent calls to push.
push-buffer-size¶
configuration int read-write
The size of the internal ring buffers used by push.
If push is used with a push-duration-limit, processing will require deferral if the duration limit is reached. In this case, push will allocate a ring buffer to hold these data. This setting configures the size of this buffer, in bytes.
The default buffer size is sufficient to defer up to 250 ms of audio data.
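If you change the buffer size, you need to convert a duration to bytes. The Python sketch below shows the arithmetic; buffer_bytes is a hypothetical helper, and the 16 kHz, 16-bit mono format is an assumption that may not match your model.

```python
def buffer_bytes(duration_ms, samples_per_second=16000, bytes_per_sample=2):
    """Bytes needed to hold duration_ms of mono audio.

    The 16 kHz, 16-bit defaults are assumptions for illustration; use the
    audio format your model actually expects.
    """
    return duration_ms * samples_per_second * bytes_per_sample // 1000


print(buffer_bytes(250))  # 8000
print(buffer_bytes(500))  # 16000
```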
push-duration-limit¶
configuration double read-write
Sets a limit to the maximum processing time push should consume.
This setting is the maximum number of milliseconds any call to push should spend processing data before returning control to the caller.
The default value is 0, which disables the processing limit.
Note
This requires a valid real-time clock function, see CONFIG_CLOCK_FUNC.
TrulyNatural SDK libraries for Android, Linux, macOS, iOS, and Java include real-time clock functions and require no additional configuration.
You should use a push-duration-limit if:
- You're using push, and
- you collect live audio on the same thread as the recognizer, and
- you will drop audio packets if you don't return from push before the next packet is available.
push-duration-limit adds a cap to the amount of CPU used in each call to push. This requires and allocates an additional input ring buffer that's push-buffer-size bytes in size.
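The deferral mechanism can be pictured with a toy model. In this Python sketch, BudgetedPusher is hypothetical and is not how the SDK is implemented: it uses an unbounded queue where the SDK uses a ring buffer of push-buffer-size bytes, and a fake clock so the example is deterministic.

```python
from collections import deque


class BudgetedPusher:
    """Toy model of push with a per-call processing budget (not SDK code)."""

    def __init__(self, process, clock, limit_ms):
        self.process = process      # callable that consumes one chunk
        self.clock = clock          # callable returning the time in ms
        self.limit_ms = limit_ms    # 0 disables the limit
        self.backlog = deque()      # chunks deferred to later calls

    def push(self, chunk):
        self.backlog.append(chunk)
        deadline = self.clock() + self.limit_ms
        while self.backlog:
            if self.limit_ms > 0 and self.clock() >= deadline:
                break               # budget used up; defer the rest
            self.process(self.backlog.popleft())
        return len(self.backlog)    # number of chunks still deferred


# Fake clock: every processed chunk costs 10 ms.
now = [0]
done = []


def process(chunk):
    done.append(chunk)
    now[0] += 10


pusher = BudgetedPusher(process, clock=lambda: now[0], limit_ms=15)
pusher.backlog.extend(["a", "b"])   # pretend two chunks were already deferred
print(pusher.push("c"))             # 1
print(done)                         # ['a', 'b']
```

The call returns once the 15 ms budget is exceeded, leaving the newest chunk queued for the next call. This is the trade the setting makes: bounded time per call in exchange for extra buffering and latency.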
If you have a separate thread, or interrupt-driven live audio recording and you want to maximize throughput, increase the size of the audio ring buffer instead of using a push-duration-limit.
Recommendations:
- Use 15 ms audio chunks.
- The audio recording buffer size determines the longest time the average recognizer throughput can fall behind real time.
- With a 30 ms buffer only two 15 ms blocks fit, which means that every SDK processing call must return within 15 ms, or we'll lose a block or partial block.
- Using a 300 ms buffer relaxes this. 20 blocks mean that we can fall up to 18 blocks (270 ms) behind before losing audio.
ram-limit¶
configuration double read-write tnl 7.5.0
Limit LVCSR decoder memory use
The amount of heap RAM to allocate to LVCSR search decoding, in bytes.
A subset of recognizers optimized for low resource use, created by VoiceHub, allows limiting the amount of heap RAM allocated to search decoding. This setting modifies this limit. Lower values can increase error rates, so we recommend that you set this to as large a value as constraints allow. Set to 0 to disable the limit.
req-enroll¶
configuration int read-write
Enrollment target.
The recommended number of enrollments for each user. Using either more or fewer enrollments will reduce overall spotter performance.
result-max¶
configuration int read-write tnl
The maximum number of alternate phrase results to consider
Limits the number of alternate phrases returned by LVCSR models.
If result-max > 1, phrase-iterator will return phrase-level recognition results in order of likelihood.
The default is result-max == 1, which returns only the most likely result.
Limitations
- word-iterator and phone-iterator are available for the most likely result only.
- Time alignments are accurate for the most likely result only.
- score values are not usable when result-max > 1.
- Silence markup is elided from all but the top scoring phrase. An empty text result indicates that silence was the best match to the acoustic input.
Warning
N-best processing is computationally expensive, frequently prohibitively so. Contact Sensory for guidance before using this feature in production.
samples-per-second¶
save-enroll-audio¶
configuration int read-write
Include enrollment audio in the enrollment context.
Set to 1 to include the enrollment audio in enrollment contexts, 0 to exclude.
score-offset¶
search.frame-nota¶
configuration double read-write
Out-of-vocabulary rejection sensitivity.
This setting controls out-of-vocabulary rejection in custom LVCSR recognizers.
Custom LVCSR recognizers report <no-match/> for words or phrases that are not in the grammar. With a search.frame-nota value of 0 the recognizer will never report <no-match/>; it will return the closest match instead. With search.frame-nota at 1.0, almost all input will return <no-match/>.
The optimal value for search.frame-nota depends on the vocabulary used. A reasonable value to start testing with is 0.2.
Note
Do not change search.frame-nota for models that include statistical language model components. These models typically have either -broad- or -background- in the model name, and are configured to use the language model to recognize utterances not covered by the custom grammar.
show-silence¶
configuration int read-write tnl
Include silence in recognizer results.
When set to 1, LVCSR recognition results include word-pause <wp>, sentence-begin <s>, and sentence-end </s> markup. The default value is 0, which elides these from results.
slm-enabled¶
configuration int read-write stt 7.4.0
Enable optional SLM component.
Set to 0 to turn the SLM component off, 1 to turn on.
^slm-start, ^slm-result, ^slm-result-partial, slm-turn-limit
slm-size¶
slm-turn-limit¶
configuration int read-write stt 7.4.0
Configure SLM history behavior.
If slm-turn-limit >= 0 the optional SLM component limits the number of conversational turns in the model history. The default is -1, which keeps all history.
Writing to slm-turn-limit discards existing history.
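A bounded history of this kind behaves much like a fixed-length queue. The Python sketch below is a toy model, not SDK code: TurnHistory is hypothetical, and dropping the oldest turn first is an assumption about how the limit is applied.

```python
from collections import deque


class TurnHistory:
    """Toy model of a turn-limited conversation history (not SDK code)."""

    def __init__(self, turn_limit=-1):
        self.set_turn_limit(turn_limit)

    def set_turn_limit(self, turn_limit):
        # Writing the limit discards existing history, as described above.
        self.turn_limit = turn_limit
        maxlen = None if turn_limit < 0 else turn_limit
        self.turns = deque(maxlen=maxlen)

    def add_turn(self, text):
        self.turns.append(text)   # the oldest turn drops out when full


history = TurnHistory(turn_limit=2)
for turn in ["one", "two", "three"]:
    history.add_turn(turn)
print(list(history.turns))        # ['two', 'three']
history.set_turn_limit(-1)
print(list(history.turns))        # []
```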
Note
Values larger than 0 increase the SLM result latency and CPU use.
slot¶
configuration string read-write
Template slot selector.
Use with tpl-spot-select and tpl-opt-spot-vad-lvcsr to select the active slot.
0, 1, phrasespot, lvcsr
stt-profile¶
configuration string read-write stt 7.4.0
Select STT speed vs accuracy trade-off.
Default value is accurate, set to fast to reduce CPU load at the expense of recognition accuracy.
sv-threshold¶
configuration double read-write
task-name¶
configuration string read-only 6.14.0
Task name.
Deprecated
Support for this setting will be removed from the next major release of this SDK.
Do not use this in new code.
task-type¶
configuration string read-only
Task type.
This, together with task-version, describes the model behavior: Which setting keys and streams it supports.
Examples include: enroll, lvcsr, phrasespot, phrasespot-vad, and vad.
task-type-and-version-list¶
configuration string write-only
Verifies that a model matches one of a list of types and versions.
When used with require, the value argument must be a semicolon-separated list of task-type and task-version values. This list must have at least one element.
A task will match the requirement if one of the task-type fields matches, and the corresponding task-version is satisfied.
If no task-type matches, require returns REQUIRE_MISMATCH.
If a task-type matches, but the associated task-version is not satisfied, require returns VERSION_MISMATCH.
Example
task-version¶
configuration string read-only
threshold¶
configuration int read-write 7.4.0
Dynamic operating point selection threshold.
Deprecated
Superseded by built-in support for smart wake words in TrulyNatural 7.4.0.
Selects the threshold used by tpl-spot-dynop-1.4.0.snsr to decide whether to select the low-fr-operating-point.
trailing-silence¶
configuration int read-write
VAD trailing silence time-out, in ms.
The VAD will invoke the ^end event handler once trailing-silence ms of silence has followed the last bit of speech.
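The interplay between trailing-silence and leading-silence can be sketched with a simplified frame-based model. This Python example is not the SDK's detector: vad_events is hypothetical, the per-frame speech flags are given as input rather than computed, and the 15 ms frame length is an assumption.

```python
def vad_events(frames, frame_ms, leading_silence, trailing_silence):
    """Yield (time_ms, event) pairs for a sequence of speech flags.

    Simplified model, not the SDK's detector. frames is an iterable of
    booleans, True where a frame of frame_ms milliseconds holds speech.
    """
    in_speech = False
    silence_ms = 0
    time_ms = 0
    for is_speech in frames:
        time_ms += frame_ms
        if is_speech:
            in_speech = True
            silence_ms = 0
            continue
        silence_ms += frame_ms
        if not in_speech and silence_ms >= leading_silence:
            yield time_ms, "silence"   # no speech in the leading window
            return
        if in_speech and silence_ms >= trailing_silence:
            yield time_ms, "end"       # enough silence after speech
            return


speech = [False] * 4 + [True] * 10 + [False] * 20
print(list(vad_events(speech, 15, leading_silence=150, trailing_silence=90)))
# [(300, 'end')]
print(list(vad_events([False] * 20, 15, leading_silence=150, trailing_silence=90)))
# [(150, 'silence')]
```

A shorter trailing-silence makes the endpoint fire sooner, at the risk of cutting off speakers who pause mid-sentence.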
user¶
configuration string read-write
Enrolling user tag.
Sets the tag for the current enrollment. This should be a unique alphanumeric phrase, without spaces. It is the phrase returned as a recognition result.
If enrolling more than one phrase for any of the users, the tag must contain one / that separates a user-specific part from the phrase part. For example: user1/phrase1, user2/phrase1, user2/phrase2.
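The tag rules above can be checked before starting an enrollment. The Python sketch below is a hypothetical validator, not an SDK function. It reads "alphanumeric" strictly, so it rejects characters such as - and _ that the SDK might accept.

```python
def split_user_tag(tag):
    """Split an enrollment tag into (user, phrase); phrase may be None.

    Hypothetical validator for the rules described above. Treating '-'
    and '_' as invalid is a strict reading of "alphanumeric".
    """
    parts = tag.split("/")
    # At most one '/', and every part must be non-empty and alphanumeric.
    if len(parts) > 2 or not all(part.isalnum() for part in parts):
        raise ValueError(f"invalid enrollment tag: {tag!r}")
    user = parts[0]
    phrase = parts[1] if len(parts) == 2 else None
    return user, phrase


print(split_user_tag("user2/phrase1"))  # ('user2', 'phrase1')
print(split_user_tag("alice"))          # ('alice', None)
```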