Voice SDK
Speaker recognition for stand-alone or Web applications
VeriSpeak voice identification technology is designed for biometric system developers and integrators. The text-dependent speaker recognition algorithm assures system security by checking both voice and phrase authenticity. Voiceprint templates can be matched in 1-to-1 (verification) and 1-to-many (identification) modes.
VeriSpeak is available as a software development kit (SDK) that enables the development of stand-alone and Web-based speaker recognition applications on Microsoft Windows, Linux, macOS, iOS and Android platforms.
VeriSpeak SDK is based on VeriSpeak voice recognition technology and is designed for biometric systems developers and integrators. The SDK allows rapid development of biometric applications using functions from the VeriSpeak algorithm. VeriSpeak can be easily integrated into the customer's security system. The integrator has complete control over SDK data input and output.
License Activation Options
The components are copy-protected. The following license activation options are available:
- Activation by serial number is not suitable for ARM-Linux, except BeagleBone Black and Raspberry Pi 3 devices.
- Activation by serial number is not suitable for virtual environments.
- Activating single computer licenses – An installation license for a VeriSpeak component will be activated for use on a particular computer. The number of available licenses in the license manager will be decreased by the number of activated licenses.
- Managing single computer licenses via a LAN or the Internet – The license manager allows the management of installation licenses for VeriSpeak components across multiple computers or mobile/embedded devices in a LAN or over the Internet. The number of managed licenses is limited by the number of licenses in the license manager. No license activation is required and the license quantity is not decreased. Once issued, the license is assigned to a specific computer or device on the network.
- Using license manager as a dongle – A volume license manager containing at least one license for a VeriSpeak component may be used as a dongle, allowing the VeriSpeak component to run on the particular computer where the dongle is attached.
Additional VeriSpeak component licenses for the license manager may be purchased at any time.
License Validity
All SDK and component licenses are perpetual and do not expire. There are no annual fees or any other fees apart from the license purchase fee. Licenses can be moved from one computer or device to another. Neurotechnology provides a way to renew a license if the computer undergoes changes due to technical maintenance.
The table below compares VeriSpeak 12.1 Standard SDK and VeriSpeak 12.1 Extended SDK.
VeriSpeak SDK components and licenses | ||
---|---|---|
Component types | VeriSpeak 12.1 Standard SDK | VeriSpeak 12.1 Extended SDK |
Voice component licenses included with a specific SDK: | ||
Voice Extractor | 1 single computer license | 1 single computer license |
Voice Matcher | 1 single computer license | 1 single computer license |
Voice Client | 3 single computer licenses | |
Mobile Voice Extractor | 1 single computer license | 1 single computer license |
Mobile Voice Matcher | 1 single computer license | 1 single computer license |
Mobile Voice Client | 3 single computer licenses | |
Matching Server | + |
VeriSpeak SDK includes programming samples and tutorials that show how to use the components of the SDK to perform voice template extraction or matching against other templates. The samples and tutorials are available for these programming languages and platforms:
 | Windows 32 & 64 bit | Linux 32 & 64 bit | macOS | Android | iOS |
---|---|---|---|---|---|
Programming samples | |||||
C/C++ | + | + | + | ||
Objective-C | + | ||||
C# | + | ||||
Visual Basic .NET | + | ||||
Java | + | + | + | + | |
Programming tutorials | |||||
C | + | + | + | ||
C++ | + | + | + | ||
C# | + | ||||
Visual Basic .NET | + | ||||
Java | + | + | + | + |
There are specific requirements for each platform which will run VeriSpeak-based applications.
Microsoft Windows Platform Requirements
- 2 GHz or better processor is recommended.
- x86 (32-bit) processors can still be used, but the algorithm will not provide the specified performance.
- AVX2 support is highly recommended. Processors that do not support AVX2 will still run the VeriSpeak algorithms, but in a mode that will not provide the specified performance. Most modern processors support this instruction set, but please check whether a particular processor model supports it.
- Microsoft SQL Server;
- MySQL;
- Oracle;
- PostgreSQL;
- SQLite.
- Microsoft Visual Studio 2012 or newer (for application development in C/C++, C# or Visual Basic .NET)
- Java SE JDK 8 or newer
Android Platform Requirements
- If you have a custom Android-based device or development board, contact us to find out if it is supported.
- Java SE JDK 8 (or higher)
- Android Studio 4.0 IDE
- Android SDK with API level 21 or newer
- Gradle 6.1.1 build automation system or newer
- Android Gradle Plugin 4.0.0
- Internet connection for activating VeriSpeak component licenses
iOS Platform Requirements
- iPhone 5S or newer iPhone.
- iPad Air or newer iPad models.
- A Mac running macOS 10.12.6 or newer.
- Xcode 9.x or newer.
macOS Platform Requirements
- 2 GHz or better processor is recommended.
- AVX2 support is highly recommended. Processors that do not support AVX2 will still run the VeriSpeak algorithms, but in a mode that will not provide the specified performance. Most modern processors support this instruction set, but please check whether a particular processor model supports it.
- Xcode 6.x or newer
- GNU Make 3.81 or newer (to build samples and tutorials)
- Java SE JDK 8 or newer
Linux x86-64 Platform Requirements
- 2 GHz or better processor is recommended.
- x86 (32-bit) processors can still be used, but the algorithm will not provide the specified performance.
- AVX2 support is highly recommended. Processors that do not support AVX2 will still run the VeriSpeak algorithms, but in a mode that will not provide the specified performance. Most modern processors support this instruction set, but please check whether a particular processor model supports it.
- MySQL;
- Oracle;
- PostgreSQL;
- SQLite.
- gcc 4.8 or newer
- GNU Make 3.81 or newer
- Java SE JDK 8 or newer
ARM Linux Platform Requirements
- ARMHF architecture (EABI 32-bit hard-float ARMv7) is required.
- Lower clock-rate processors may also be used, but voiceprint processing will take longer.
- gcc 4.8 or newer
- GNU Make 3.81 or newer
- Java SE JDK 8 or newer
- The speaker recognition accuracy of VeriSpeak depends on the audio quality during enrollment and identification.
- Voice samples at least 2 seconds long are recommended to assure speaker recognition quality.
- A passphrase should be kept secret and not spoken in an environment where others may hear it if the speaker recognition system is used in a scenario with unique phrases for each user.
- Text-independent speaker recognition may be vulnerable to an attack with a covertly recorded phrase from a person. Passphrase verification or two-factor authentication (e.g. a requirement to type a password) will increase the overall system security.
- The same microphone model is recommended (if possible) for use during both enrollment and recognition, as different models may produce different sound quality. Some models may also introduce specific noise or distortion into the audio, or may include certain hardware sound processing, which will not be present when using a different model. This is also the recommended procedure when using smartphones or tablets, as different device models may alter the recording of the voice in different ways.
- The same microphone position and distance are recommended during enrollment and recognition. Headsets provide an optimal distance between the user and the microphone; this distance is recommended when non-headset microphones are used.
- Web cam built-in microphones should be used with care, as they are usually positioned at a rather long distance from the user and may provide lower sound quality. The sound quality may be affected if users subsequently change their position relative to the web cam.
- Settings for clear sound must be ensured; some audio software, hardware or drivers may have sound modification enabled by default. For example, the Microsoft Windows OS usually has, by default, sound boost enabled.
- A minimum 11025 Hz sampling rate, with at least 16-bit depth, should be used during voice recording.
- A quiet environment for enrollment and recognition.
- Several samples of the same phrase recorded in different environments can be stored in a biometric template. Later the user will be matched against these samples with much higher recognition quality.
- Close-range microphones (like those in headsets or smartphones) that are not affected by distant sources of sound.
- Third-party or custom solutions for background noise reduction, such as using two separate microphones for recording user voice and background sound, and later subtracting the background noise from the recording.
- Natural voice changes may affect speaker recognition accuracy:
- a temporarily hoarse voice caused by a cold or other sickness;
- different emotional states that affect voice (e.g. a cheerful voice versus a tired voice);
- different pronunciation speeds during enrollment and identification.
- The aforementioned voice and user behavior changes can be managed in two ways:
- separate enrollments for the altered voice, storing the records in the same person's template;
- a controlled, neutral voice during enrollment and identification.
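The recording recommendations above (a sampling rate of at least 11025 Hz, at least 16-bit depth, and samples at least 2 seconds long) can be checked before a recording is passed on for enrollment. The following is a minimal sketch that uses only the Python standard library `wave` module and assumes uncompressed WAV input; the function name `check_recording` and the constants are illustrative and are not part of the VeriSpeak SDK.

```python
import wave

# Thresholds taken from the recommendations listed above.
MIN_SAMPLE_RATE_HZ = 11025
MIN_SAMPLE_WIDTH_BYTES = 2      # 2 bytes per sample = 16-bit depth
MIN_DURATION_SECONDS = 2.0


def check_recording(path):
    """Return a list of problems found in a WAV file (empty if none).

    Hypothetical helper for illustration; not a VeriSpeak SDK function.
    """
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
        width = wav.getsampwidth()
        duration = wav.getnframes() / rate

    problems = []
    if rate < MIN_SAMPLE_RATE_HZ:
        problems.append(f"sampling rate {rate} Hz is below {MIN_SAMPLE_RATE_HZ} Hz")
    if width < MIN_SAMPLE_WIDTH_BYTES:
        problems.append(f"bit depth {8 * width} is below 16")
    if duration < MIN_DURATION_SECONDS:
        problems.append(f"duration {duration:.2f} s is below {MIN_DURATION_SECONDS} s")
    return problems
```

An empty list means the file meets the three numeric recommendations; it says nothing about noise, microphone distance or sound-modification settings, which still need to be controlled separately.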
All voice templates should be loaded into RAM before identification, thus the maximum voice template database size is limited by the amount of available RAM.
The voiceprint template size has linear dependence on the voice sample length. For example, when using voice samples that are 2 times shorter, the template size values will be 2 times smaller.
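Because all templates are held in RAM and the record size scales linearly with sample length, the memory needed for a database can be roughly estimated in advance. The sketch below assumes the published record size of 3,500 to 4,500 bytes for 5-second samples and ignores any per-template or indexing overhead, which this page does not specify; the function name is illustrative only.

```python
# Published single-record size range (bytes) for 5-second voice samples.
REFERENCE_SAMPLE_SECONDS = 5.0
RECORD_BYTES_AT_REFERENCE = (3500, 4500)


def estimate_database_bytes(num_records, sample_seconds):
    """Estimate (low, high) RAM in bytes for num_records voiceprint records.

    Assumes record size is proportional to sample length, as stated above,
    and ignores storage overhead (an assumption of this sketch).
    """
    scale = sample_seconds / REFERENCE_SAMPLE_SECONDS
    low, high = RECORD_BYTES_AT_REFERENCE
    return (round(num_records * low * scale), round(num_records * high * scale))


low, high = estimate_database_bytes(100_000, 5.0)
print(f"{low / 1e6:.0f} - {high / 1e6:.0f} MB")
```

For 100,000 records made from 5-second samples this gives roughly 350 to 450 MB; halving the sample length halves the estimate.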
The VeriSpeak biometric template extraction and matching algorithm is designed to run on multi-core processors, which allows it to reach the maximum possible performance on the available hardware.
VeriSpeak 12.1 text-dependent voiceprint engine specifications | ||||
---|---|---|---|---|
 | Android-based platform | | PC-based platform | |
Template extraction components | Mobile Voice Extractor | Mobile Voice Client | Voice Extractor | Voice Client |
Template extraction time (seconds) | 1.34 (1) | 1.20 (1) | 1.34 (2) | 0.60 (2) |
Template matching components | Mobile Voice Matcher | | Voice Matcher | |
Template matching speed, fixed phrase mode (voiceprints per second) | 100 (1) | | 8,000 (2) | |
Template matching speed, unique phrase mode (voiceprints per second) | 20 (1) | | 1,700 (2) | |
Single voiceprint record size in a template, when 5-second voice samples are used (bytes) | 3,500 - 4,500 | | | |
Notes:
(1) Requires an Android device based on at least a Snapdragon S4 system-on-chip with a Krait 300 processor (4 cores, 1.51 GHz).
(2) Requires a PC or laptop with at least an Intel Core i7-8700K processor.
The VeriSpeak 12.1 algorithm has been tested with voice samples taken from the XM2VTS Database, as well as with voice samples from Neurotechnology's internal database.
These voice template matching experiments were performed with the VeriSpeak 12.1 text-dependent engine:
Receiver operating characteristic (ROC) curves are usually used to demonstrate the recognition quality of an algorithm. ROC curves show the dependence of the false rejection rate (FRR) on the false acceptance rate (FAR). Charts with ROC curves for each of the experiments are available above.
VeriSpeak 12.1 text-dependent algorithm tests with XM2VTS and Neurotechnology's internal databases | |||
---|---|---|---|
 | Exp. 1 | Exp. 2 | Exp. 3 |
Total voice samples in the database | 2360 | 309 | 305 |
Subjects in the database | 295 | 42 | 42 |
Recording sessions per subject | 8 | 1 - 10 | 1 - 10 |
Average voice sample length (seconds) | 7.112 | 4.975 | 6.214 |
EER | 0.5647 % | 1.1750 % | 0.1720 % |
FRR at 0.1 % FAR | 1.5630 % | 3.2000 % | 0.2854 % |
FRR at 0.01 % FAR | 4.1560 % | 6.9110 % | 0.3806 % |
FRR at 0.001 % FAR | 11.230 % | 8.3490 % | 0.4757 % |
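The metrics in the table above (EER and FRR at a fixed FAR) are standard quantities derived from genuine and impostor matching scores. The sketch below shows one common way to compute them; it is not Neurotechnology's evaluation code. It assumes that a higher score means a closer match and that an attempt is accepted when its score reaches the threshold. All function names and the score values are illustrative.

```python
def far_frr(genuine, impostor, threshold):
    """Return (FAR, FRR) when scores >= threshold are accepted."""
    far = sum(s >= threshold for s in impostor) / len(impostor)
    frr = sum(s < threshold for s in genuine) / len(genuine)
    return far, frr


def roc_points(genuine, impostor):
    """Return (threshold, FAR, FRR) for every distinct score used as threshold."""
    thresholds = sorted(set(genuine) | set(impostor))
    thresholds.append(float("inf"))  # reject everything: FAR = 0, FRR = 1
    return [(t, *far_frr(genuine, impostor, t)) for t in thresholds]


def equal_error_rate(genuine, impostor):
    """Approximate the EER as the mean of FAR and FRR where they are closest."""
    _, far, frr = min(roc_points(genuine, impostor), key=lambda p: abs(p[1] - p[2]))
    return (far + frr) / 2


def frr_at_far(genuine, impostor, max_far):
    """Return the lowest FRR among thresholds whose FAR does not exceed max_far."""
    return min(frr for _, far, frr in roc_points(genuine, impostor) if far <= max_far)


# Made-up scores for illustration only.
genuine_scores = [0.9, 0.8, 0.7, 0.4]
impostor_scores = [0.1, 0.2, 0.3, 0.5]
print(equal_error_rate(genuine_scores, impostor_scores))
print(frr_at_far(genuine_scores, impostor_scores, 0.0))
```

With real data, the number of impostor comparisons bounds the smallest FAR that can be measured meaningfully, so figures such as FRR at 0.001 % FAR need a correspondingly large set of impostor scores.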