ASR Dysarthria

Automatic speech recognition for people with dysarthria

This repo is under heavy research and development, so this README may be outdated. Sorry!

I deployed a web page so you can use a model in your browser: https://asr-dysarthria-preliminary.pages.dev/

Training

Use the Jupyter notebook wav2vec2-large-xls-r-300m-dysarthria-big-dataset.ipynb to train your own model.

Installation

Prerequisites:

Python >= 3.10
Anaconda

Steps:

conda install --file requirements.txt

Inference

In the cli-app directory:

Run model.safetensors: python -m run

Run ONNX: python -m onnx_run

Adjust these scripts if needed. By default they transcribe the file.wav file in the cli-app folder.
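To call the model from your own Python code instead of the scripts, a minimal sketch follows. It is not the contents of run.py. It follows the inference pattern from the fine-tuning article this repo is based on, and it assumes that torch, torchaudio, and transformers are installed and that the Hugging Face repository ships the processor files (vocabulary and feature-extractor config) next to model.safetensors. The function name transcribe is made up for illustration.

```python
# Assumption: the model repo contains processor files in addition to the weights.
MODEL_ID = "jmaczan/wav2vec2-large-xls-r-300m-dysarthria-big-dataset"
TARGET_SAMPLE_RATE = 16000  # wav2vec2 XLS-R models expect 16 kHz mono audio


def transcribe(wav_path: str, model_id: str = MODEL_ID) -> str:
    """Return a greedy CTC transcription of one audio file (hypothetical helper)."""
    # Heavy dependencies are imported lazily so the module loads without them.
    import torch
    import torchaudio
    from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor

    processor = Wav2Vec2Processor.from_pretrained(model_id)
    model = Wav2Vec2ForCTC.from_pretrained(model_id)
    model.eval()

    # torchaudio returns a tensor shaped (channels, samples) plus the sample rate.
    waveform, sample_rate = torchaudio.load(wav_path)
    waveform = waveform.mean(dim=0)  # downmix to mono
    if sample_rate != TARGET_SAMPLE_RATE:
        waveform = torchaudio.functional.resample(
            waveform, sample_rate, TARGET_SAMPLE_RATE
        )

    inputs = processor(
        waveform.numpy(), sampling_rate=TARGET_SAMPLE_RATE, return_tensors="pt"
    )
    with torch.no_grad():
        logits = model(inputs.input_values).logits

    # Greedy decoding: most likely token per frame, then CTC collapse in the tokenizer.
    predicted_ids = torch.argmax(logits, dim=-1)
    return processor.batch_decode(predicted_ids)[0]
```

Usage would look like print(transcribe("file.wav")); the first call downloads the model weights.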

Deploying

Download and convert the trained model (the model.safetensors file):

mkdir models
python scripts/convert_model.py --url https://huggingface.co/jmaczan/wav2vec2-large-xls-r-300m-dysarthria-big-dataset/resolve/main/model.safetensors --output models

Serve it (python -m http.server listens on port 8000 by default):

cd web-app
python -m http.server

Pretrained models

[Recommended] Loss: 0.0864, WER: 0.182 https://huggingface.co/jmaczan/wav2vec2-large-xls-r-300m-dysarthria-big-dataset
Loss: 0.0615, WER: 0.1764 https://huggingface.co/jmaczan/wav2vec2-large-xls-r-300m-dysarthria
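WER is the word error rate: the number of word substitutions, deletions, and insertions needed to turn the model output into the reference transcript, divided by the number of reference words. A value of 0.182 therefore means roughly 18 errors per 100 reference words. The sketch below shows how the metric is defined. It is not the code that produced the figures above, and the function name is made up for illustration.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] = edit distance between the first i-1 reference words
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


# One substitution (sat -> sit) and one deletion (the) over 6 reference words.
example = word_error_rate("the cat sat on the mat", "the cat sit on mat")
```

Because insertions count as errors, WER can exceed 1.0 when the hypothesis is much longer than the reference.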

Datasets

UASpeech https://huggingface.co/datasets/Vinotha/uaspeechall
TORGO https://huggingface.co/datasets/jmaczan/TORGO

Description

The code here is based on Patrick von Platen's article and notebook https://huggingface.co/blog/fine-tune-xlsr-wav2vec2
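The referenced article fine-tunes XLS-R with a CTC head, so the model emits one token id per audio frame, and text is recovered by collapsing repeated ids and dropping the blank token. A minimal sketch of that greedy decoding step is shown below. The toy vocabulary and the function name are made up for illustration and do not match the real tokenizer.

```python
from itertools import groupby


def ctc_greedy_decode(frame_ids, id_to_char, blank_id=0):
    """Collapse consecutive repeats, then remove blanks (greedy CTC decoding)."""
    # groupby yields one key per run of identical consecutive ids.
    collapsed = [token_id for token_id, _ in groupby(frame_ids)]
    return "".join(id_to_char[i] for i in collapsed if i != blank_id)


# Hypothetical vocabulary: id 0 is the CTC blank.
toy_vocab = {0: "", 1: "h", 2: "e", 3: "l", 4: "o"}

# The blank between the two runs of 3 is what keeps the double "l".
decoded = ctc_greedy_decode([1, 1, 0, 2, 2, 3, 0, 3, 3, 4, 0], toy_vocab)
```

Without the blank between the two runs of id 3, the repeats would collapse into a single "l", which is why CTC needs a blank token at all.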

Resources

Dataset

Big

https://huggingface.co/datasets/jmaczan/TORGO

Small

https://huggingface.co/datasets/jmaczan/TORGO-very-small

Others

https://ai.meta.com/blog/wav2vec-20-learning-the-structure-of-speech-from-raw-audio/

https://pytorch.org/audio/stable/tutorials/speech_recognition_pipeline_tutorial.html

https://huggingface.co/docs/datasets/v2.16.1/audio_dataset

https://distill.pub/2017/ctc/

https://ai.meta.com/blog/self-supervision-and-building-more-robust-speech-recognition-systems/

License

MIT License

Author

Jędrzej Paweł Maczan

https://huggingface.co/jmaczan | jed@maczan.pl | https://github.com/jmaczan
