Browse and download the latest ai-coustics inference artifacts. Pick the model that matches your version, language, and use case.
Last updated 2026-06-10 14:22 UTC
Pick the task that matches your use case. Enhancement also exposes VAD; VAD-only models are leaner. Audio Insight models extract additional signals from audio.
Quail Voice Focus is optimized for near-field, single-user voice interactions. It prioritizes speech close to the microphone and suppresses background noise and distant speech to improve downstream Speech-to-Text (STT) performance. It starts with a short warm-up period in which it listens to identify the primary speaker with confidence before full suppression kicks in, prioritizing accuracy over speed.
| Model | Version | Download | SHA-256 |
|---|---|---|---|
| quail-vf-l-16khz | v1 | quail_vf_l_16khz_jc5pk1aa_v17.aicmodel | 1509e36bd30ca296204515eb777dd3663c46a6d4531136599d9e8494e6e4fc27 |
| quail-vf-1.1-l-16khz | v1 | quail_vf_1_1_l_16khz_d00ghjzn_v15.aicmodel | e0337ec3388fd7be4cd0ab3758b62dd7afbd7817a15b406b17d0250b18d6340d |
| quail-vf-2.0-l-16khz | v2 | quail_vf_2_0_l_16khz_d42jls1e_v18.aicmodel | c33a73442e2598acfd2fdc88ca127d1e8ecea0941dc93e4d3e1169246941de6e |
| quail-vf-2.1-l-16khz | v3 | quail_vf_2_1_l_16khz_8xope536_v11.aicmodel | 04c1ef610feccf2e6f24af608c6cf3f73490ffd34c31a8000a898b24d70998a8 |
| quail-vf-2.1-l-16khz | v4 | quail_vf_2_1_l_16khz_8xope536_v11.aicmodel | fa1908c2a9557d019b16044cfcfb2cc63bea82811c598a2ebbc30411dff77fdd |
| quail-vf-2.1-l-16khz | v5 | quail_vf_2_1_l_16khz_8xope536_v11.aicmodel | e3f6cd3fda62767b601faf50d12dfb75cf1782960ec5a4df5cff32a00f331944 |
| quail-vf-2.1-s-16khz | v3 | quail_vf_2_1_s_16khz_5i8jb8of_v12.aicmodel | 4f46d47179a4505e0220d1c7e2ccc3041a59724dff0afdb9218507274b4b6616 |
| quail-vf-2.1-s-16khz | v4 | quail_vf_2_1_s_16khz_5i8jb8of_v12.aicmodel | cf251ecad62b043a9fbfe274fe25b8876c5ac1921bc9fb8a8c73c8cb9deada1d |
| quail-vf-2.1-s-16khz | v5 | quail_vf_2_1_s_16khz_5i8jb8of_v12.aicmodel | 7d0cc51114b77510601adfa0272ebd7918bf02fa36e7e32bdbf11c83b7af43d5 |
Standalone voice activity detection. Robust to noise, runs without a separate enhancement model.
| Model | Version | Download | SHA-256 |
|---|---|---|---|
| quail-vad-2.0-xxs-16khz | v4 | quail_vad_2_0_xxs_16khz_g6609la0_v21.aicmodel | 0caaa4b29864cf6b329e6fc92b3d626c93b79aff52dd4d45aa28d4862fecc2be |
| quail-vad-2.0-xxs-16khz | v5 | quail_vad_2_0_xxs_16khz_g6609la0_v21.aicmodel | 852005789c9b982f0786c896e6ea1f9ac72c8640f4ea456dceb58e5f54738929 |
Quail is purpose-built for far-field and multi-speaker environments, tuning its processing specifically to enhance downstream Speech-to-Text (STT) engine performance. It does not suppress distant-sounding speech, making it better suited for speakerphone setups, meeting rooms, or situations with multiple participants spread across a space.
| Model | Version | Download | SHA-256 |
|---|---|---|---|
| quail-l-16khz | v1 | quail_l_16khz_dtv5nvgu_v20.aicmodel | b3db85b6091949cf84d9d6cde1f150cbd74746252a0ae61241d739536b5ea362 |
| quail-l-16khz | v2 | quail_l_16khz_dtv5nvgu_v20.aicmodel | 3327953548d078d575394181f7df265fe5d32383a55e4fcdb8a08f2688c93549 |
| quail-l-16khz | v3 | quail_l_16khz_dtv5nvgu_v20.aicmodel | 8839706f3462adcef003181e70919cc5ca41fa62096634d005ba1fbc89f2f0f4 |
| quail-l-16khz | v4 | quail_l_16khz_dtv5nvgu_v20.aicmodel | 6572795dd68d41940b3bc78df4ccee243468e69db390343075d6574d229dc8f9 |
| quail-l-16khz | v5 | quail_l_16khz_dtv5nvgu_v20.aicmodel | 6f1aeeb545cef59055be1cb5f825889e84c8a16572c17c869acebd87fe1b1a3f |
| quail-l-8khz | v1 | quail_l_8khz_mp51agn0_v13.aicmodel | 9b5e3d094c32f417d7477b4acaa4ef26510dabbb20d6009a1795940dfee28a74 |
| quail-l-8khz | v2 | quail_l_8khz_mp51agn0_v13.aicmodel | fe26e46f03451791f592ff69972a7b845a9092ebc305062ed46a713571743738 |
| quail-l-8khz | v3 | quail_l_8khz_mp51agn0_v13.aicmodel | 689eeea2ddd54c1081e6c6bb5287060b631ea4d704fc285ef40af48d296cc445 |
| quail-l-8khz | v4 | quail_l_8khz_mp51agn0_v13.aicmodel | cdb0602d02e8c329fddd57eacc89fc2d823b33c1c2d477295f5d2afb460be85e |
| quail-l-8khz | v5 | quail_l_8khz_mp51agn0_v13.aicmodel | 6e7501e989c91e35d906112ffa9456d486515396c712675e3d79919594916bc5 |
| quail-s-16khz | v1 | quail_s_16khz_ilmexgyt_v8.aicmodel | d613cdd917a47296accb47c45bcc268994f032e1df636c504201099d61c51bf7 |
| quail-s-16khz | v2 | quail_s_16khz_ilmexgyt_v8.aicmodel | 027242aa1a5174a1eec7a77fdd60c0895c934bd24020cb57cd76117ab5c0f197 |
| quail-s-16khz | v3 | quail_s_16khz_ilmexgyt_v8.aicmodel | bbdcb6b0f43957ddc7b1ef150e31b906fd03e5732f41082bebfe81c52f690022 |
| quail-s-16khz | v4 | quail_s_16khz_ilmexgyt_v8.aicmodel | e6a6a873dadaa5e4d5df6135fd4e94be52b1c0d12a27fa24841a6f32addf651d |
| quail-s-16khz | v5 | quail_s_16khz_ilmexgyt_v8.aicmodel | 2f7da1c7d1e1e9b53220908b84d2fb25b0c34d1a3c8538db70080f4967a48c17 |
| quail-s-8khz | v1 | quail_s_8khz_9q57ojki_v20.aicmodel | 093103c9f17261d29d45de91e3506f412eb66c47ed5777eecdd8378a557d6746 |
| quail-s-8khz | v2 | quail_s_8khz_9q57ojki_v20.aicmodel | 80581fe9d63f1170357db24c2edd214e03701450d57b3befa5d92e59acdef2fb |
| quail-s-8khz | v3 | quail_s_8khz_9q57ojki_v20.aicmodel | 6e1f83c0ad0c341f253d77e5dda08088379e0937c349d9c8ccfbbf880a2e496e |
| quail-s-8khz | v4 | quail_s_8khz_9q57ojki_v20.aicmodel | de55b61a97e4febb7b9e5da7f472727be5ce98077962035a4a3c1fa745f95133 |
| quail-s-8khz | v5 | quail_s_8khz_9q57ojki_v20.aicmodel | e6096081912241879b5e97a5755b598faa0bbad1e18cfd7acb7507cf66515ca0 |
Speech enhancement for human-to-human communication. Reduces noise and reverberation while keeping speech natural.
| Model | Version | Download | SHA-256 |
|---|---|---|---|
| rook-l-16khz | v1 | rook_l_16khz_dtv5nvgu_v20.aicmodel | ec8550a4f41f58beac13d2a576fdb36f62a89410192ce0f52ed43b8289b7970c |
| rook-l-16khz | v2 | rook_l_16khz_dtv5nvgu_v20.aicmodel | e72c5a7d6cc6f396daeff83ffb25be41318cc87a962e22a607963bebf3cfdcfc |
| rook-l-16khz | v3 | rook_l_16khz_dtv5nvgu_v20.aicmodel | b90223799cac9f92bc92a1bb87c27b3b072a5c34897680b9953180fdca0610df |
| rook-l-16khz | v4 | rook_l_16khz_dtv5nvgu_v20.aicmodel | 2e25022720e41c15f3ec39b46a10dc117b4bf7803320fbc26bad1d4e8790c215 |
| rook-l-16khz | v5 | rook_l_16khz_dtv5nvgu_v20.aicmodel | b4df0e31be9e0bf1e8e29172be2ca1bbf7429abc42f8ce56cf61c1797fd0df64 |
| rook-l-48khz | v1 | rook_l_48khz_7n1r44ri_v23.aicmodel | 04a63bf298821874ebef7ac558e66be7964c21baf651fb9638d557766353bbcb |
| rook-l-48khz | v2 | rook_l_48khz_7n1r44ri_v23.aicmodel | ff8838cef6c6fa13681f2354f6f6d637f330a107c845854c09a4f2ed9837b7ac |
| rook-l-48khz | v3 | rook_l_48khz_7n1r44ri_v23.aicmodel | 51a41edd010809035deddaf3914abc6382fd2bb2592994ca8a514cc58de5deab |
| rook-l-48khz | v4 | rook_l_48khz_7n1r44ri_v23.aicmodel | 3ca83524c1499fc98399df5471dc209a3650cbf2977a34097683f7e10785d84d |
| rook-l-48khz | v5 | rook_l_48khz_7n1r44ri_v23.aicmodel | cb261875281ec668635c6ca7779a13dffde113c8f1033cdfce9155b0b55959dd |
| rook-l-8khz | v1 | rook_l_8khz_mp51agn0_v13.aicmodel | e916757e356dd299bde6842a31c2e50ea11d81bdca0434a09c9d6e25f2ab5038 |
| rook-l-8khz | v2 | rook_l_8khz_mp51agn0_v13.aicmodel | fe2b1c02e60b886d571ce40c05e3d215184450c6c7220acb2c3b4ded1b3f5e95 |
| rook-l-8khz | v3 | rook_l_8khz_mp51agn0_v13.aicmodel | 9a335d6ad7f456256e244d2594101b26f06c4cafd9a62f3d85e211782785515c |
| rook-l-8khz | v4 | rook_l_8khz_mp51agn0_v13.aicmodel | 111fecf8eaa5dea2aac2d41cd481b38edbe4df87c96633f915ee74a726396516 |
| rook-l-8khz | v5 | rook_l_8khz_mp51agn0_v13.aicmodel | 7cb2db7178c683fb720e9579f2bc6c88d683c8d864b8b7368d1e4c88888d60bb |
| rook-s-16khz | v1 | rook_s_16khz_ilmexgyt_v8.aicmodel | 43ef287f103839330e71f508f130339c5753b50ac1254436368168ab759faf2d |
| rook-s-16khz | v2 | rook_s_16khz_ilmexgyt_v8.aicmodel | 3eff7b5c6874f06a3ced70844fe31705c35c646ff198aa3590ece6007eb7bc25 |
| rook-s-16khz | v3 | rook_s_16khz_ilmexgyt_v8.aicmodel | 1952e587be74c2a409ab1795af1bb9e892f46ffcb254525a8109debae31f989e |
| rook-s-16khz | v4 | rook_s_16khz_ilmexgyt_v8.aicmodel | cdcb5e5b264301a341e6cdcf836112e8a02919106653e0ecd2da350e51c5e323 |
| rook-s-16khz | v5 | rook_s_16khz_ilmexgyt_v8.aicmodel | b20438883d3c21e97b73010da650f7faab104b7807773fa19b12eb29ee999cc4 |
| rook-s-48khz | v1 | rook_s_48khz_kg8kp9qm_v52.aicmodel | 291ca14c21b66e1acc3b6ca352e9452c5a04da2730f433ab9d49edf2528229ac |
| rook-s-48khz | v2 | rook_s_48khz_kg8kp9qm_v52.aicmodel | 0328e6f8f45a07c6b5a957a571a0565370622b9bf98c18b7ade45cbeb7cc24a1 |
| rook-s-48khz | v3 | rook_s_48khz_kg8kp9qm_v52.aicmodel | 35529310b22b132d3a30443a0d66a280cec407f3bb406a79dcb1d2dab7f975bb |
| rook-s-48khz | v4 | rook_s_48khz_kg8kp9qm_v52.aicmodel | bb60832a0954b09a7ddc9324b813005adf2794cc839684894bd74153cc48615c |
| rook-s-48khz | v5 | rook_s_48khz_kg8kp9qm_v52.aicmodel | 06ad93e8c6cfbb96819ec6b1de65e308d543f117d19f11b8a2e41c6f86e7168e |
| rook-s-8khz | v1 | rook_s_8khz_9q57ojki_v20.aicmodel | a97d221f192730532109125effdbb22da64cb07a66c473cf4e2f9a9559d94c68 |
| rook-s-8khz | v2 | rook_s_8khz_9q57ojki_v20.aicmodel | ccb1a220f1d1bc26426e03848e6bcb79183ab9f1022e9ae07e9ab531347809e2 |
| rook-s-8khz | v3 | rook_s_8khz_9q57ojki_v20.aicmodel | a8212ae3b9addffa06b56629c31dda4ad30fd9a30fbf7dece1f41319f3736d14 |
| rook-s-8khz | v4 | rook_s_8khz_9q57ojki_v20.aicmodel | 9cca1885bb2d8db024ef4f77fdf3bea6032f313a5f1eea380c49657ca9af3421 |
| rook-s-8khz | v5 | rook_s_8khz_9q57ojki_v20.aicmodel | abf4f50acb00a1c05d2a3be7848fb0152e8841379a16e77bbdd126d286ad4b5a |
| rook-xs-48khz | v1 | rook_xs_48khz_g4qwc6lv_v21.aicmodel | 63bffca7273040fc82b28a53eedfb69fdd4d9beacab5c3c13fa4d8a44fd65a1b |
| rook-xxs-48khz | v1 | rook_xxs_48khz_wsur2zkw_v7.aicmodel | 914b8deb45a49aa32e73828c2cc54472fda211df276a206d6691e7c428eb36aa |
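The model names in these tables appear to follow a common pattern: a family (such as `quail-vf` or `rook`), an optional model revision (such as `2.1`), a size tag (`l`, `s`, `xs`, `xxs`), and a sample rate in kHz. The sketch below parses a name into those parts, which can help when matching a model to the sample rate of your audio. The pattern is inferred from the names listed here and is an assumption, not a documented naming scheme; `ModelName` and `parse_model_name` are hypothetical helpers, not part of any SDK.

```python
import re
from typing import NamedTuple, Optional


class ModelName(NamedTuple):
    family: str              # e.g. "quail-vf", "rook"
    version: Optional[str]   # model revision such as "2.1", or None if absent
    size: str                # size tag as it appears in the name
    sample_rate_hz: int      # sample rate converted from kHz to Hz


# Assumed layout: <family>[-<major.minor>]-<size>-<rate>khz
_PATTERN = re.compile(
    r"^(?P<family>[a-z]+(?:-[a-z]+)*?)"
    r"(?:-(?P<version>\d+\.\d+))?"
    r"-(?P<size>xxs|xs|s|l)"
    r"-(?P<rate>\d+)khz$"
)


def parse_model_name(name: str) -> Optional[ModelName]:
    """Split a model name into its parts, or return None if it does not fit."""
    match = _PATTERN.match(name)
    if match is None:
        return None
    return ModelName(
        family=match["family"],
        version=match["version"],
        size=match["size"],
        sample_rate_hz=int(match["rate"]) * 1000,
    )
```

Names that do not fit the assumed layout, such as `bypass`, return `None` rather than raising, so callers can handle them separately.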
Tyto is a lightweight model that runs on your audio stream and predicts whether the audio reaching your agent will cause downstream failures. It outputs a single score plus a breakdown across seven dimensions, including noise, reverb, dropouts, and codec.
| Model | Version | Download | SHA-256 |
|---|---|---|---|
| tyto-l-16khz | v5 | tyto_l_16khz_yhlek4hc_v43.aicmodel | 8fb7cb977fb6f81b75fb27398eb1e4a5c619f7f95c252e17df75ef7c27faf336 |
Additional models that don't fit the categories above.
| Model | Version | Download | SHA-256 |
|---|---|---|---|
| bypass | v1 | bypass.aicmodel | 883fbd2d86e951cc4c4fac4af547c41dc61da385beed01084021c15f2c9f0c1e |
| bypass | v2 | bypass.aicmodel | 5f6808ec14d53534d78738d8e05bd2fa27953eb58b960240971307a19dfc9ecf |
| bypass | v3 | bypass.aicmodel | bcaffed11b4eaa6bf9ad656089b1192628c68df2ad56880d3aa8b1982e9f8d4a |
| bypass | v4 | bypass.aicmodel | f1041d8173dcbd7715bc668ae6baab6c43598bb6b95d63ddbbd6c19f2bd5f89b |
| bypass | v5 | bypass.aicmodel | 3239206fdbac28e89bae144a4f2c5a87528e6f4f912d470bb011c94f4b650e7a |
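Each row in the tables above lists a SHA-256 digest for its artifact. A minimal sketch of checking a downloaded file against that digest, using only the Python standard library, might look like the following. The helper names `sha256_of_file` and `verify_model` are illustrative and not part of any ai-coustics SDK.

```python
import hashlib
from pathlib import Path


def sha256_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the lowercase hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        # Read fixed-size chunks so large model files are not loaded at once.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_model(path: Path, expected_sha256: str) -> bool:
    """Compare a file's digest with the value copied from the SHA-256 column."""
    return sha256_of_file(path) == expected_sha256.strip().lower()
```

To use it, pass the path of the downloaded `.aicmodel` file and the digest from the matching row, making sure the row's Version matches the one you downloaded, since the same file name is listed with different digests across versions.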
Documentation for picking the right model and getting the most out of it.
Detailed information on model capabilities, supported sample rates, latency characteristics, and how to choose the right model for your use case.
Read the model guide
How to use our models to improve downstream Speech-to-Text (STT) accuracy, including integration patterns and tuning tips.
Improve STT performance