Speech-to-Text (STT) is a Digital Suite module that converts audio into structured text for notes, reports, support logs, and searchable records. It sits after audio upload and before text review or reuse, so a website can turn voice input into clean text for documentation and workflow support.
- Technical context: This workflow includes audio input, speech recognition, text cleanup, and transcript export.
- Technical benefit: It reduces manual note-taking, makes transcripts easier to review, and makes spoken information reusable.
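The last two stages of that workflow, text cleanup and transcript export, can be sketched in a few lines. This is only an illustration under my own assumptions: the helper names `clean_transcript` and `export_transcript` are hypothetical, and the cleanup rules (collapse whitespace, capitalize the first letter, add a closing period) are a minimal example rather than the rules the module actually applies.

```python
import re
from pathlib import Path


def clean_transcript(raw: str) -> str:
    """Normalize whitespace and apply basic sentence casing to raw model output."""
    text = re.sub(r"\s+", " ", raw).strip()
    if not text:
        return ""
    # Capitalize only the first character; leave the rest untouched.
    text = text[0].upper() + text[1:]
    # Close the transcript with a period if it has no end punctuation.
    if text[-1] not in ".!?":
        text += "."
    return text


def export_transcript(text: str, output_path: str) -> Path:
    """Write the cleaned transcript as UTF-8 and return the file path."""
    path = Path(output_path)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(text + "\n", encoding="utf-8")
    return path
```

Keeping cleanup separate from export means the same cleaned string can go to a file, a database, or an HTTP response without repeating the normalization.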
In 2024, while building the Text-to-Speech module, I realized that a complete media web system also needed the reverse flow. This Speech-to-Text module converts speech audio into structured text, using a Conda environment with Torch and CUDA so that heavier transcription jobs can run on the GPU. It later became a blueprint for scalable AI-driven audio-to-text processing.
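To make the Torch and CUDA part concrete, here is a minimal sketch of GPU-aware transcription. It is not the module's actual code: the function names `pick_device` and `transcribe` are hypothetical, and I am assuming the checkpoint loads with the standard `Wav2Vec2Processor` and `Wav2Vec2ForCTC` classes from `transformers` and expects 16 kHz mono audio.

```python
def pick_device(prefer_gpu: bool = True) -> str:
    """Return "cuda" when a GPU is usable, otherwise "cpu"."""
    if not prefer_gpu:
        return "cpu"
    try:
        import torch  # imported lazily so this helper works without Torch installed
    except ImportError:
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


def transcribe(audio_path: str,
               model_name: str = "khanhld/wav2vec2-base-vietnamese-160h") -> str:
    """Transcribe one audio file with a wav2vec2 CTC model (greedy decoding)."""
    import librosa
    import torch
    from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor

    device = pick_device()
    processor = Wav2Vec2Processor.from_pretrained(model_name)
    model = Wav2Vec2ForCTC.from_pretrained(model_name).to(device).eval()

    # Assumption: the checkpoint expects 16 kHz mono input.
    waveform, _ = librosa.load(audio_path, sr=16000, mono=True)
    inputs = processor(waveform, sampling_rate=16000, return_tensors="pt")

    with torch.no_grad():
        logits = model(inputs.input_values.to(device)).logits

    # Greedy CTC decoding: take the most likely token at every frame.
    predicted_ids = torch.argmax(logits, dim=-1)
    return processor.batch_decode(predicted_ids)[0]
```

In a long-running service the processor and model would normally be loaded once at startup rather than on every call; the sketch loads them inside the function only to stay self-contained.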
conda create -n module_stt python=3.10 -y
conda activate module_stt
pip install flask flask-cors python-dotenv torch transformers librosa soundfile
unzip module_stt.zip -d <Project_Path>
cd <Project_Path>/module_stt
cp .env.example .env
MODULE_STT_HOST=0.0.0.0
MODULE_STT_PORT=<YOUR_PORT>
MODULE_STT_CONDA_ENV=<YOUR_CONDA_ENV_PATH>
MODULE_STT_CACHE=<YOUR_MODEL_CACHE_PATH>
MODULE_STT_BETWEEN_DIR=<Project_Path>\module_stt\between
MODULE_STT_OUTPUT_DIR=<Project_Path>\module_stt\output
MODULE_STT_BETWEEN_AUDIO_PATH=<Project_Path>\module_stt\between\between.wav
MODULE_STT_TRANSCRIPT_OUTPUT_PATH=<Project_Path>\module_stt\output\transcript.txt
MODULE_STT_DEFAULT_AUDIO_PATH=<Project_Path>\module_stt\input\input.wav
MODULE_STT_MODEL_NAME=khanhld/wav2vec2-base-vietnamese-160h
python home.py # starts the Flask module service directly
# or
python run.py # starts the module through the configured runtime settings
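As a rough idea of how the `MODULE_STT_*` settings could be read at startup, the sketch below collects a few of them into one object. The `SttSettings` class, the `load_settings` helper, and the fallback values (port 5000, a relative output path) are my own assumptions for illustration; in the real module the `.env` file would first be loaded into the environment, for example with `load_dotenv()` from python-dotenv.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class SttSettings:
    host: str
    port: int
    model_name: str
    output_path: str


def load_settings(env=None) -> SttSettings:
    """Build settings from a mapping of environment variables.

    Falls back to os.environ when no mapping is given. The default
    values below are illustrative assumptions, not the module's own.
    """
    env = os.environ if env is None else env
    return SttSettings(
        host=env.get("MODULE_STT_HOST", "0.0.0.0"),
        # int() raises ValueError if the placeholder was never replaced.
        port=int(env.get("MODULE_STT_PORT", "5000")),
        model_name=env.get(
            "MODULE_STT_MODEL_NAME", "khanhld/wav2vec2-base-vietnamese-160h"
        ),
        output_path=env.get(
            "MODULE_STT_TRANSCRIPT_OUTPUT_PATH", "output/transcript.txt"
        ),
    )
```

Passing the mapping as a parameter keeps the loader easy to test without touching the real process environment.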
curl "http://127.0.0.1:<YOUR_PORT>/wav2vec2?path=<Project_Path>/python/module_tts/tts/wav2vec2/run.wav"
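The same request can be made from Python with only the standard library. This is a sketch under assumptions: the helper names are hypothetical, and because the response format of the `/wav2vec2` endpoint is not documented here, the body is returned as raw text rather than parsed.

```python
import urllib.parse
import urllib.request


def build_transcribe_url(port: int, audio_path: str, host: str = "127.0.0.1") -> str:
    """Build the /wav2vec2 request URL with the audio path safely URL-encoded."""
    query = urllib.parse.urlencode({"path": audio_path})
    return f"http://{host}:{port}/wav2vec2?{query}"


def request_transcript(port: int, audio_path: str, timeout: float = 60.0) -> str:
    """Call the running service and return the response body as text."""
    url = build_transcribe_url(port, audio_path)
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8")
```

Encoding the `path` parameter matters when the file path contains spaces or other characters that are not valid in a query string.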

After the technical overview above, this guide explains how to use the Speech-to-Text module with short voice recordings and supported audio files.
To test the module with your own content, start with a short recording, then adjust the file length to suit your workflow.
Upload audio to convert it into text
Supported formats: MP3, WAV, M4A, OGG
Duration limit: up to 5 minutes.
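These upload rules are easy to enforce before any transcription work starts. The check below is a minimal sketch: the `validate_upload` name and the error messages are hypothetical, and the duration is assumed to have been measured separately (for example with librosa) before the function is called.

```python
from pathlib import Path

SUPPORTED_EXTENSIONS = {".mp3", ".wav", ".m4a", ".ogg"}
MAX_DURATION_SECONDS = 5 * 60  # the 5-minute limit stated above


def validate_upload(filename: str, duration_seconds: float) -> list[str]:
    """Return a list of problems; an empty list means the upload is acceptable."""
    problems = []
    extension = Path(filename).suffix.lower()
    if extension not in SUPPORTED_EXTENSIONS:
        problems.append(f"unsupported format: {extension or 'no extension'}")
    if duration_seconds <= 0:
        problems.append("audio is empty")
    elif duration_seconds > MAX_DURATION_SECONDS:
        problems.append("audio is longer than 5 minutes")
    return problems
```

Returning every problem at once, instead of stopping at the first, lets the upload form show the user all of the issues in a single response.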
Readers can use this module pattern to turn spoken updates into structured text for reporting, documentation, and support tasks. In real projects, that reduces manual note-taking and keeps transcription handling consistent across the content flows that consume the text.
This Speech-to-Text module combines controlled audio input handling, a reusable transcription path, and practical output processing into one maintainable service layer. It keeps the transcription workflow more consistent within the broader system architecture.