CPlusVoxType
System-Wide Voice to Text for Windows
Press a hotkey, speak, and your words are typed into whatever application is focused — text editors, browsers, chat apps, anything. By default, CPlusVoxType runs local AI speech recognition on your machine using Whisper models via whisper.cpp — no internet connection required, no usage fees, and your voice data never leaves your computer. Optional cloud integrations with OpenAI and Mistral are available if you prefer server-side transcription; when enabled, audio is sent to the selected provider and is subject to their terms of service.
Which version should I download?
Both installers are fully self-contained — no additional runtime, framework, or driver installation required.
CPU Edition
~22 MB download
Works on every Windows PC. Choose this if you do not have an NVIDIA GPU, or if you have an older or low-end NVIDIA card (GT/GTX 700 series or earlier).
Download CPU Edition
CUDA Edition
~427 MB download
Optimised for modern NVIDIA GPUs (GTX 900 series or newer). Transcription is typically 5–10x faster than CPU, particularly with larger models. Bundles the NVIDIA CUDA runtime libraries. Up-to-date NVIDIA drivers are required.
Download CUDA Edition
Not sure? Start with the CPU edition, which works on every machine. You can switch to CUDA later by installing over the top; the installer handles the upgrade automatically.
Getting Started
Up and running in four steps.
Install
Run the .msi installer. A standard Windows install wizard guides you through setup. After installation, a Start Menu shortcut is created and the app appears in Add/Remove Programs for easy management.
Download a Whisper Model
On first launch, open Settings > Models and download a model. Models run entirely on your machine via the whisper.cpp inference engine — once downloaded, they work offline with no API keys or accounts needed.
Set Up Your Microphone
Go to Settings > Microphone. Select your input device, verify audio capture with the built-in test, and adjust VAD sensitivity to suit your environment.
Start Dictating
Press Ctrl+Shift+Space and speak. Your speech is transcribed and delivered to the active application. Two modes are available: push-to-talk (hold to record) and toggle (press to start/stop). The hotkey is fully customisable in Settings.
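The difference between the two recording modes can be sketched as a small state machine. This is only an illustration of the behaviour described above, not CPlusVoxType's actual code: the `Recorder` class, the mode strings, and the `on_hotkey_down` / `on_hotkey_up` method names are all hypothetical.

```python
class Recorder:
    """Illustrative model of push-to-talk vs. toggle recording (hypothetical names)."""

    def __init__(self, mode: str) -> None:
        if mode not in ("push_to_talk", "toggle"):
            raise ValueError(f"unknown recording mode: {mode!r}")
        self.mode = mode
        self.recording = False

    def on_hotkey_down(self) -> None:
        if self.mode == "push_to_talk":
            # Holding the hotkey starts recording.
            self.recording = True
        else:
            # Each press flips the state: start on the first press, stop on the next.
            self.recording = not self.recording

    def on_hotkey_up(self) -> None:
        if self.mode == "push_to_talk":
            # Releasing the hotkey ends the recording.
            self.recording = False
        # In toggle mode, releasing the key changes nothing.
```

In push-to-talk mode the recording lasts exactly as long as the key is held; in toggle mode the key release is ignored and only presses matter.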
Choose Your Model
CPlusVoxType supports six Whisper models: five standard sizes from Tiny to Large v3, plus the distilled Distil Large v3. Models are stored locally and work entirely offline once downloaded.
| Model | Size | Best For |
|---|---|---|
| Tiny | 75 MB | Quick notes, simple dictation |
| Base | 142 MB | General everyday use |
| Small | 466 MB | Mixed accents, technical terms |
| Medium | 1.5 GB | Professional dictation |
| Large v3 | 3.1 GB | Maximum quality |
| Distil Large v3 (Recommended) | 800 MB | Best balance of speed and accuracy |
For CPU Users
- Tiny / Base — Best for older or low-end hardware. Typically 1–3 seconds per sentence.
- Small — Good middle ground on modern CPUs (Intel 10th gen / Ryzen 3000+). Typically 3–6 seconds per sentence with noticeably better accuracy.
- Distil Large v3 — Recommended. Its distilled architecture delivers speed comparable to the Small model with significantly better accuracy on modern CPUs.
- Medium / Large v3 — Usable but slower. A 10-second utterance may take 15–30+ seconds. Best where accuracy is the priority and wait times are acceptable.
For CUDA (GPU) Users
- Distil Large v3 — Recommended. Near real-time transcription on most NVIDIA GPUs. A 10-second clip typically processes in 1–2 seconds.
- Large v3 — Highest accuracy. With GPU acceleration the speed trade-off is manageable (typically 3–5 seconds for a 10-second clip). Ideal for complex vocabulary, heavy accents, or multilingual use.
- Smaller models — Functional, but there is little reason to use them with a GPU since larger models run fast enough with CUDA and deliver better results.
In short: Distil Large v3 is the best default for most setups, offering the best trade-off between speed and accuracy for both CPU and GPU users. Consider Tiny or Base only if you are on very old hardware, or Large v3 if you have a GPU and require the absolute highest accuracy.
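The guidance above can be condensed into a small decision function. This is a sketch of the recommendations on this page only; the function name and its parameters are made up for illustration and do not correspond to any setting or API in CPlusVoxType.

```python
def recommend_model(
    has_cuda_gpu: bool,
    very_old_hardware: bool = False,
    need_max_accuracy: bool = False,
) -> str:
    """Return a model name following the guidance on this page (illustrative only)."""
    if has_cuda_gpu and need_max_accuracy:
        # Large v3 is suggested only when a GPU makes its speed trade-off manageable.
        return "Large v3"
    if not has_cuda_gpu and very_old_hardware:
        # Tiny is the even lighter alternative on the weakest machines.
        return "Base"
    # Default recommendation for both CPU and GPU users.
    return "Distil Large v3"
```

For example, `recommend_model(has_cuda_gpu=False)` returns `"Distil Large v3"`, matching the default model listed in the settings table.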
System Requirements
CPU Edition
| Component | Minimum | Recommended |
|---|---|---|
| OS | Windows 10 64-bit | Windows 11 |
| CPU | Any x86-64 (AVX2) | Intel 10th gen / Ryzen 3000+ |
| RAM | 2 GB free | 4 GB+ (8 GB+ for large models) |
| Disk | 100 MB + model size | 1 GB+ |
| GPU | Not required | |
CUDA Edition
| Component | Minimum | Recommended |
|---|---|---|
| OS | Windows 10 64-bit | Windows 11 |
| CPU | Any x86-64 (AVX2) | Intel 10th gen / Ryzen 3000+ |
| RAM | 4 GB free | 8 GB+ |
| GPU | NVIDIA GTX 900+ | RTX 3060+ (6 GB+ VRAM) |
| VRAM | 2 GB | 4 GB+ (6 GB+ for Large v3) |
| Disk | 500 MB + model size | 1.5 GB+ |
| Drivers | Up-to-date NVIDIA (Game Ready or Studio) | |
VRAM note: The selected model must fit in your GPU's video memory. Distil Large v3 (800 MB) runs comfortably with 4 GB of VRAM, while the full Large v3 (3.1 GB) needs 6 GB or more. If a model does not fit, CPlusVoxType falls back to CPU processing automatically.
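As a rough way to reason about the fallback, the VRAM figures in the note above can be treated as thresholds. The sketch below is an assumption-laden illustration: the application's real fit check is not documented here, and the names `VRAM_GUIDANCE_GB` and `pick_backend` are hypothetical.

```python
# VRAM guidance in GB, taken from the note above. Only these two models
# have stated figures; the numbers are guidance, not measured minimums.
VRAM_GUIDANCE_GB = {
    "Distil Large v3": 4.0,
    "Large v3": 6.0,
}


def pick_backend(model: str, vram_gb: float) -> str:
    """Return "cuda" if the GPU meets the guidance figure, otherwise "cpu"."""
    if model not in VRAM_GUIDANCE_GB:
        raise ValueError(f"no VRAM guidance listed for {model!r}")
    # Mirror the documented behaviour: if the model does not fit, use the CPU.
    return "cuda" if vram_gb >= VRAM_GUIDANCE_GB[model] else "cpu"
```

Under these assumptions, a 4 GB card would run Distil Large v3 on the GPU, while Large v3 on the same card would fall back to the CPU.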
Cloud APIs (Optional)
Cloud APIs are entirely optional. The local Whisper models provide excellent transcription quality at no cost and with full privacy. You do not need an API key to use CPlusVoxType.
For users who prefer to offload processing to a server, integrations with OpenAI (Whisper API and GPT-4o Transcribe) and Mistral AI (Voxtral API) are available. Both providers charge per minute of audio — typical costs for casual dictation are in the region of $2–5 per month.
Configure cloud APIs in Settings > Cloud APIs. An API key from the respective provider is required. Cloud API usage is subject to OpenAI's and Mistral AI's own terms of service and pricing. CPlusVoxType is not affiliated with or endorsed by either provider.
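Because both providers bill per minute of audio, a monthly estimate is simple arithmetic: minutes dictated per day, times days, times the per-minute rate. The sketch below assumes a 30-day month and takes the rate as a parameter; the rate used in the example is a placeholder, so check each provider's current pricing page for real figures.

```python
def monthly_cost_usd(
    minutes_per_day: float,
    rate_per_minute_usd: float,
    days: int = 30,
) -> float:
    """Estimate the monthly cloud transcription cost in US dollars."""
    if minutes_per_day < 0 or rate_per_minute_usd < 0 or days < 0:
        raise ValueError("inputs must be non-negative")
    # Total audio minutes in the period multiplied by the per-minute rate.
    return round(minutes_per_day * days * rate_per_minute_usd, 2)


# Placeholder rate for illustration only, not a quoted provider price.
EXAMPLE_RATE_USD = 0.006
print(monthly_cost_usd(15, EXAMPLE_RATE_USD))  # 15 min/day for 30 days -> 2.7
```

At that placeholder rate, roughly 15 minutes of dictation per day lands within the $2–5 range mentioned above.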
Settings at a Glance
| Setting | Default |
|---|---|
| Hotkey | Ctrl+Shift+Space |
| Recording mode | Push-to-talk |
| Input device | System default |
| Whisper model | Distil Large v3 |
| Transcription source | Local |
| Theme | Dark |
| Sound feedback | Enabled |
All settings persist automatically to %APPDATA%\CPlusVoxType\CPlusVoxType.ini
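To inspect the saved settings from a script, the file can be read with Python's standard `configparser`. This is a minimal sketch that assumes the file is a conventional sectioned INI encoded as UTF-8; the section and key names inside it are not documented here, so the code simply lists whatever it finds. The helper names `settings_path` and `load_settings` are illustrative.

```python
import configparser
import os
from pathlib import Path


def settings_path() -> Path:
    """Build the settings path from %APPDATA% (set on Windows)."""
    # Falling back to the current directory is only for non-Windows testing.
    appdata = os.environ.get("APPDATA", ".")
    return Path(appdata) / "CPlusVoxType" / "CPlusVoxType.ini"


def load_settings(path: Path) -> dict[str, dict[str, str]]:
    """Return {section: {key: value}} for every section in the file."""
    # Interpolation is disabled so values containing "%" are read verbatim.
    parser = configparser.ConfigParser(interpolation=None)
    # A missing file is silently skipped by read(), giving an empty result.
    # "utf-8-sig" tolerates an optional byte-order mark (encoding is assumed).
    parser.read(path, encoding="utf-8-sig")
    # Note: configparser lowercases key names by default.
    return {section: dict(parser[section]) for section in parser.sections()}


if __name__ == "__main__":
    for section, values in load_settings(settings_path()).items():
        print(section, values)
```

If the file turns out not to use `[section]` headers, `configparser` will raise `MissingSectionHeaderError`, and a different parser would be needed.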
Legal Notices
© 2026 CPlusVoxType Contributors. All rights reserved.
This software is proprietary. The source code is not available for redistribution, modification, or derivative works.
CPlusVoxType is currently provided free of charge. Future updates may introduce a paid subscription or licensing model. Continued use of updated versions may require a paid plan.
NO WARRANTY. This software is provided "as is", without warranty of any kind, express or implied, including but not limited to the warranties of merchantability, fitness for a particular purpose, and noninfringement. In no event shall the authors or copyright holders be liable for any claim, damages, or other liability, whether in an action of contract, tort, or otherwise, arising from, out of, or in connection with the software or the use or other dealings in the software. You use this software entirely at your own risk.
Cloud API usage (OpenAI, Mistral) is subject to those providers' own terms of service and pricing. CPlusVoxType is not affiliated with or endorsed by OpenAI or Mistral AI.
CPlusVoxType uses OpenAI Whisper models via the whisper.cpp inference engine. These are open-weight models licensed under their respective terms.
Ready to dictate?
Download CPlusVoxType free for Windows. Local AI speech recognition — no accounts, no subscriptions, no data collection.