FREE v0.1.0

CPlusVoxType

System-Wide Voice to Text for Windows

Press a hotkey, speak, and your words are typed into whatever application is focused — text editors, browsers, chat apps, anything. By default, CPlusVoxType runs local AI speech recognition on your machine using Whisper models via whisper.cpp — no internet connection required, no usage fees, and your voice data never leaves your computer. Optional cloud integrations with OpenAI and Mistral are available if you prefer server-side transcription; when enabled, audio is sent to the selected provider and is subject to their terms of service.

Local AI Recognition · System-Wide Dictation · GPU Accelerated · Privacy First · Cloud API Fallback · Windows 10/11


Which version should I download?

Both installers are fully self-contained, with no additional runtime or framework to install. The only external requirement is an up-to-date NVIDIA driver for the CUDA Edition.

CPU Edition

~22 MB download

Works on every Windows PC. Choose this if you do not have an NVIDIA GPU, or if you have an older or low-end NVIDIA card (GT/GTX 700 series or earlier).


CUDA Edition

~427 MB download

Optimised for modern NVIDIA GPUs (GTX 900 series or newer). Transcription is typically 5–10x faster than CPU, particularly with larger models. Bundles the NVIDIA CUDA runtime libraries. Up-to-date NVIDIA drivers are required.


Not sure? Start with the CPU edition — it works on every machine. You can switch to CUDA later by installing over the top; the installer handles the upgrade automatically.

Getting Started

Up and running in four steps.

Install

Run the .msi installer. A standard Windows install wizard guides you through setup. After installation, a Start Menu shortcut is created and the app appears in Add/Remove Programs for easy management.

Download a Whisper Model

On first launch, open Settings > Models and download a model. Models run entirely on your machine via the whisper.cpp inference engine — once downloaded, they work offline with no API keys or accounts needed.

Set Up Your Microphone

Go to Settings > Microphone. Select your input device, verify audio capture with the built-in test, and adjust the voice activity detection (VAD) sensitivity to suit your environment.
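To illustrate what the sensitivity setting trades off, here is a minimal energy-threshold voice activity detector in Python. Energy thresholding is a generic technique chosen for illustration, not necessarily how CPlusVoxType detects speech; the 0.0 to 1.0 sensitivity scale, the threshold mapping, and the function names are all assumptions.

```python
import math
from typing import Sequence


def frame_rms(samples: Sequence[float]) -> float:
    """Root-mean-square level of one audio frame (samples scaled to -1.0..1.0)."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def is_speech(samples: Sequence[float], sensitivity: float) -> bool:
    """Hypothetical energy-threshold VAD, not CPlusVoxType's implementation.

    A higher sensitivity lowers the RMS threshold, so quieter frames are
    treated as speech. The linear threshold mapping below is an assumption.
    """
    if not 0.0 <= sensitivity <= 1.0:
        raise ValueError("sensitivity must be between 0.0 and 1.0")
    threshold = 0.005 + 0.1 * (1.0 - sensitivity)
    return frame_rms(samples) >= threshold


# A quiet voice in a 160-sample frame (10 ms at 16 kHz).
quiet_voice = [0.02, -0.02] * 80
print(is_speech(quiet_voice, sensitivity=0.9))  # True: low threshold
print(is_speech(quiet_voice, sensitivity=0.1))  # False: high threshold
```

Raising sensitivity helps pick up soft speech in a quiet room; lowering it makes background noise less likely to be mistaken for speech in a noisy one.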

Start Dictating

Press and hold Ctrl+Shift+Space, speak, then release. Your speech is transcribed and typed into the active application. Two recording modes are available: push-to-talk (hold the hotkey to record; this is the default) and toggle (press once to start recording, press again to stop). The hotkey is fully customisable in Settings.
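The difference between the two recording modes can be pictured as a small state machine. The Python sketch below is hypothetical: the class and method names are invented for illustration, and it models only when recording starts and stops, not audio capture or transcription. It also ignores repeated key-down events, because Windows auto-repeats a key while it is held.

```python
from enum import Enum


class Mode(Enum):
    PUSH_TO_TALK = "push-to-talk"
    TOGGLE = "toggle"


class HotkeyRecorder:
    """Hypothetical model of the two recording modes, not CPlusVoxType's code."""

    def __init__(self, mode: Mode = Mode.PUSH_TO_TALK) -> None:
        self.mode = mode
        self.recording = False
        self._held = False  # tracks whether the hotkey is currently down

    def key_down(self) -> None:
        if self._held:
            return  # ignore keyboard auto-repeat while the key stays down
        self._held = True
        if self.mode is Mode.PUSH_TO_TALK:
            self.recording = True  # record only while the hotkey is held
        else:
            self.recording = not self.recording  # each press starts or stops

    def key_up(self) -> None:
        self._held = False
        if self.mode is Mode.PUSH_TO_TALK:
            self.recording = False  # releasing the hotkey ends the utterance
        # In toggle mode, releasing the key changes nothing.


ptt = HotkeyRecorder(Mode.PUSH_TO_TALK)
ptt.key_down()
print(ptt.recording)  # True
ptt.key_up()
print(ptt.recording)  # False
```

Push-to-talk suits short bursts of dictation; toggle avoids holding keys down during longer passages.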

Choose Your Model

CPlusVoxType supports six Whisper models: five OpenAI Whisper sizes plus Distil Large v3, a distilled version of Large v3. Models are stored locally and work entirely offline once downloaded.

Model	Size	Best For
Tiny	75 MB	Quick notes, simple dictation
Base	142 MB	General everyday use
Small	466 MB	Mixed accents, technical terms
Medium	1.5 GB	Professional dictation
Large v3	3.1 GB	Maximum quality
Distil Large v3 (recommended)	800 MB	Best balance of speed and accuracy

For CPU Users

Tiny / Base — Best for older or low-end hardware. Typically 1–3 seconds per sentence.
Small — Good middle ground on modern CPUs (Intel 10th gen / Ryzen 3000+). Typically 3–6 seconds per sentence with noticeably better accuracy.
Distil Large v3 — Recommended. Its distilled architecture delivers speed comparable to the Small model with significantly better accuracy on modern CPUs.
Medium / Large v3 — Usable but slower. A 10-second utterance may take 15–30+ seconds. Best where accuracy is the priority and wait times are acceptable.

For CUDA (GPU) Users

Distil Large v3 — Recommended. Near real-time transcription on most NVIDIA GPUs. A 10-second clip typically processes in 1–2 seconds.
Large v3 — Highest accuracy. With GPU acceleration the speed trade-off is manageable (typically 3–5 seconds for a 10-second clip). Ideal for complex vocabulary, heavy accents, or multilingual use.
Smaller models — Functional, but there is little reason to use them with a GPU since larger models run fast enough with CUDA and deliver better results.
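The timings above can be compared as a real-time factor (RTF): processing time divided by audio length, where values below 1.0 mean faster than real time. The Python sketch below derives RTF ranges from the figures quoted on this page for a 10-second clip and uses them to estimate the wait for a longer dictation. It assumes processing time scales roughly linearly with audio length, which is only an approximation; actual speed depends on your hardware.

```python
# (low, high) processing seconds for a 10-second clip, as quoted on this page.
QUOTED_SECONDS_FOR_10S_CLIP = {
    "CPU, Medium / Large v3": (15.0, 30.0),
    "CUDA, Distil Large v3": (1.0, 2.0),
    "CUDA, Large v3": (3.0, 5.0),
}


def real_time_factor(processing_s: float, audio_s: float) -> float:
    """Processing time divided by audio duration; below 1.0 is faster than real time."""
    if audio_s <= 0:
        raise ValueError("audio duration must be positive")
    return processing_s / audio_s


def estimated_wait(audio_s: float, rtf: float) -> float:
    """Rough wait for a clip, assuming processing time scales linearly with length."""
    return audio_s * rtf


for setup, (low, high) in QUOTED_SECONDS_FOR_10S_CLIP.items():
    rtf_low = real_time_factor(low, 10.0)
    rtf_high = real_time_factor(high, 10.0)
    wait_low = estimated_wait(30.0, rtf_low)
    wait_high = estimated_wait(30.0, rtf_high)
    print(f"{setup}: RTF {rtf_low:.1f}-{rtf_high:.1f}, "
          f"30 s clip ~{wait_low:.0f}-{wait_high:.0f} s")
```

With these figures, a 30-second dictation using Distil Large v3 on CUDA works out to roughly 3 to 6 seconds of processing.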

In short: Use Distil Large v3 regardless of your setup. It offers the best trade-off between speed and accuracy for both CPU and GPU users. Consider Tiny or Base only if you are on very old hardware, or Large v3 if you have a GPU and require the absolute highest accuracy.
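The rule of thumb above can be written as a small decision function. This is a hypothetical Python sketch of the page's guidance, not a feature of CPlusVoxType; the function and parameter names are invented for illustration.

```python
def recommend_model(has_nvidia_gpu: bool,
                    very_old_hardware: bool = False,
                    need_highest_accuracy: bool = False) -> str:
    """Encode this page's model guidance (hypothetical helper)."""
    if very_old_hardware:
        return "Tiny or Base"  # lightest models for weak hardware
    if has_nvidia_gpu and need_highest_accuracy:
        return "Large v3"  # a GPU makes the slower, most accurate model practical
    return "Distil Large v3"  # best speed/accuracy trade-off for everyone else


print(recommend_model(has_nvidia_gpu=False))
print(recommend_model(has_nvidia_gpu=True, need_highest_accuracy=True))
```

The function follows the summary only; as noted in the CPU section, CPU users who accept long waits may still choose Medium or Large v3 for accuracy.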

System Requirements

CPU Edition

Component	Minimum	Recommended
OS	Windows 10 64-bit	Windows 11
CPU	x86-64 with AVX2	Intel 10th gen / Ryzen 3000+
RAM	2 GB free	4 GB+ (8 GB+ for large models)
Disk	100 MB + model size	1 GB+
GPU	Not required

CUDA Edition

Component	Minimum	Recommended
OS	Windows 10 64-bit	Windows 11
CPU	x86-64 with AVX2	Intel 10th gen / Ryzen 3000+
RAM	4 GB free	8 GB+
GPU	NVIDIA GTX 900+	RTX 3060+ (6 GB+ VRAM)
VRAM	2 GB	4 GB+ (6 GB+ for Large v3)
Disk	500 MB + model size	1.5 GB+
Drivers	Up-to-date NVIDIA (Game Ready or Studio)

VRAM note: The selected model must fit in your GPU's video memory. Distil Large v3 (800 MB) runs comfortably with 4 GB of VRAM; the full Large v3 (3.1 GB) needs 6 GB or more. If a model does not fit, CPlusVoxType automatically falls back to CPU processing.
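The fallback rule in the note can be sketched as a simple threshold check. This Python sketch is hypothetical: it uses the VRAM figures from the CUDA requirements table, the names are invented for illustration, and treating unlisted models (including Medium) as needing only the 2 GB minimum is an assumption. How CPlusVoxType actually decides is not described on this page; it may, for example, simply attempt a GPU load and retry on the CPU if that fails.

```python
from typing import Optional

# VRAM figures from this page's CUDA requirements (GB). Models not listed
# fall back to the 2 GB minimum, which is an assumption (notably for Medium).
RECOMMENDED_VRAM_GB = {
    "Distil Large v3": 4.0,
    "Large v3": 6.0,
}
MIN_VRAM_GB = 2.0


def choose_backend(model: str, free_vram_gb: Optional[float]) -> str:
    """Return "cuda" if the model is expected to fit in VRAM, else "cpu".

    free_vram_gb is None when no NVIDIA GPU is available.
    """
    if free_vram_gb is None:
        return "cpu"
    needed = RECOMMENDED_VRAM_GB.get(model, MIN_VRAM_GB)
    return "cuda" if free_vram_gb >= needed else "cpu"


print(choose_backend("Distil Large v3", 4.0))  # cuda
print(choose_backend("Large v3", 4.0))         # cpu: needs 6 GB or more
```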

Cloud APIs (Optional)

Cloud APIs are entirely optional. The local Whisper models provide excellent transcription quality at no cost and with full privacy. You do not need an API key to use CPlusVoxType.

For users who prefer to offload processing to a server, integrations with OpenAI (Whisper API and GPT-4o Transcribe) and Mistral AI (Voxtral API) are available. Both providers charge per minute of audio — typical costs for casual dictation are in the region of $2–5 per month.
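Because both providers bill per minute of audio, monthly cost is simple arithmetic: minutes dictated per day, times days per month, times the per-minute rate. The Python sketch below shows the calculation; the $0.006-per-minute rate is a placeholder assumption, not a quoted price, so substitute the current rate from your provider's pricing page.

```python
def monthly_cost_usd(minutes_per_day: float, usd_per_minute: float,
                     days_per_month: int = 30) -> float:
    """Estimated monthly spend when a provider bills per minute of audio."""
    if minutes_per_day < 0 or usd_per_minute < 0 or days_per_month < 0:
        raise ValueError("inputs must be non-negative")
    return minutes_per_day * days_per_month * usd_per_minute


# Hypothetical placeholder rate; check the provider's current pricing.
ASSUMED_RATE_USD_PER_MINUTE = 0.006

cost = monthly_cost_usd(minutes_per_day=15, usd_per_minute=ASSUMED_RATE_USD_PER_MINUTE)
print(f"${cost:.2f} per month")  # prints "$2.70 per month"
```

At that assumed rate, 15 minutes of dictation a day comes to $2.70 over a 30-day month, within the $2–5 range mentioned above.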

Configure cloud APIs in Settings > Cloud APIs. An API key from the respective provider is required. Cloud API usage is subject to OpenAI's and Mistral AI's own terms of service and pricing. CPlusVoxType is not affiliated with or endorsed by either provider.

Settings at a Glance

Setting	Default
Hotkey	Ctrl+Shift+Space
Recording mode	Push-to-talk
Input device	System default
Whisper model	Distil Large v3
Transcription source	Local
Theme	Dark
Sound feedback	Enabled

All settings are saved automatically to %APPDATA%\CPlusVoxType\CPlusVoxType.ini.
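If you want to inspect or back up your settings, the file can be read with any INI-aware tool. The Python sketch below resolves the path from the APPDATA environment variable and lists the contents with the standard-library configparser. It assumes the file uses conventional [section] and key=value syntax in UTF-8; if the file has no [section] headers, configparser will raise an error. The key names are not documented on this page, so the sketch prints whatever it finds rather than looking up specific settings, and the helper names are illustrative.

```python
import configparser
import ntpath
import os


def settings_path(appdata: str) -> str:
    """Build the settings file path under the given %APPDATA% folder."""
    return ntpath.join(appdata, "CPlusVoxType", "CPlusVoxType.ini")


def read_settings(path: str) -> configparser.ConfigParser:
    """Parse the file, assuming standard [section] / key=value INI syntax.

    interpolation=None keeps literal '%' characters (e.g. %APPDATA%) intact.
    A missing file yields an empty parser rather than an error.
    """
    parser = configparser.ConfigParser(interpolation=None)
    parser.read(path, encoding="utf-8")  # UTF-8 encoding is an assumption
    return parser


if __name__ == "__main__" and "APPDATA" in os.environ:
    settings = read_settings(settings_path(os.environ["APPDATA"]))
    for section in settings.sections():
        print(f"[{section}]")
        for key, value in settings[section].items():
            print(f"{key} = {value}")
```

Copying the file somewhere safe is enough to back up your configuration; editing it by hand is best done while the app is closed, since it writes settings automatically.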

Legal Notices

This software is proprietary. The source code is not available for redistribution, modification, or derivative works.

CPlusVoxType is currently provided free of charge. Future updates may introduce a paid subscription or licensing model. Continued use of updated versions may require a paid plan.

NO WARRANTY. This software is provided "as is", without warranty of any kind, express or implied, including but not limited to the warranties of merchantability, fitness for a particular purpose, and noninfringement. In no event shall the authors or copyright holders be liable for any claim, damages, or other liability, whether in an action of contract, tort, or otherwise, arising from, out of, or in connection with the software or the use or other dealings in the software. You use this software entirely at your own risk.

Cloud API usage (OpenAI, Mistral) is subject to those providers' own terms of service and pricing. CPlusVoxType is not affiliated with or endorsed by OpenAI or Mistral AI.

CPlusVoxType uses OpenAI Whisper models via the whisper.cpp inference engine. These are open-weight models licensed under their respective terms.

Ready to dictate?

Download CPlusVoxType free for Windows. Local AI speech recognition — no accounts, no subscriptions, no data collection.
