jmaczan/asr-dysarthria

ASR Dysarthria

Automatic speech recognition for people with dysarthria

This repo is under heavy research and development, so this README.md is outdated. Sorry!

I deployed a web page so you can try the model in your browser: https://asr-dysarthria-preliminary.pages.dev/

Training

Use the Jupyter notebook wav2vec2-large-xls-r-300m-dysarthria-big-dataset.ipynb to train your own model.

Installation

Prerequisites:

  • Python >= 3.10
  • Anaconda

Steps:

  • conda install --file requirements.txt

Inference

In the cli-app directory:

Run model.safetensors: python -m run

Run ONNX: python -m onnx_run

Adjust these scripts if needed (by default they transcribe the file.wav file in the cli-app folder).
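wav2vec2 XLS-R checkpoints are pretrained on 16 kHz audio, so it can help to check the input file before running either script. The sketch below uses only the standard library. The helper names `describe_wav` and `is_model_ready` are hypothetical (not part of this repo), and it assumes file.wav is an uncompressed PCM WAV. The scripts may already resample on their own, in which case this check is only informational.

```python
import wave

# Sample rate the wav2vec2 XLS-R checkpoints were pretrained on.
EXPECTED_RATE = 16_000


def describe_wav(path):
    """Return (sample_rate, channels, duration_seconds) for a PCM WAV file."""
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
        channels = wav.getnchannels()
        duration = wav.getnframes() / rate
    return rate, channels, duration


def is_model_ready(path):
    """True if the file is 16 kHz mono (an assumed input format, see above)."""
    rate, channels, _ = describe_wav(path)
    return rate == EXPECTED_RATE and channels == 1
```

For example, `is_model_ready("file.wav")` run from the cli-app folder reports whether the default input matches that format. If it does not, resampling to 16 kHz mono before inference is a reasonable first thing to try.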

Deploying

Download and convert the trained model (the model.safetensors file)

mkdir models
python scripts/convert_model.py --url https://huggingface.co/jmaczan/wav2vec2-large-xls-r-300m-dysarthria-big-dataset/resolve/main/model.safetensors --output models
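The URL above follows the Hugging Face Hub pattern `https://huggingface.co/<repo>/resolve/<revision>/<filename>`. If you want to convert a different checkpoint, a small helper can build the same command for another repo. This is only a sketch: `resolve_url` and `convert_command` are hypothetical names, and the only flags used are the `--url` and `--output` flags shown above.

```python
def resolve_url(repo_id, filename="model.safetensors", revision="main"):
    """Build a Hugging Face Hub direct-download URL for one file in a repo."""
    return f"https://huggingface.co/{repo_id}/resolve/{revision}/{filename}"


def convert_command(repo_id, output_dir="models"):
    """Return the convert_model.py invocation as an argument list."""
    return [
        "python",
        "scripts/convert_model.py",
        "--url",
        resolve_url(repo_id),
        "--output",
        output_dir,
    ]
```

Passing the result to `subprocess.run(..., check=True)` from the repository root should behave like the shell command above, assuming the models directory already exists.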

Serve it

cd web-app
python -m http.server
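`python -m http.server` serves the current directory on port 8000 by default. If you would rather start the server from the repository root or from another script, the following sketch should be roughly equivalent. `make_server` is a hypothetical helper, not part of this repo, and unlike the command above it binds to 127.0.0.1 only.

```python
import functools
import http.server


def make_server(directory="web-app", port=8000):
    """Build a static file server rooted at `directory` (not started yet)."""
    handler = functools.partial(
        http.server.SimpleHTTPRequestHandler, directory=directory
    )
    # Bind to localhost only; pass port=0 to let the OS pick a free port.
    return http.server.ThreadingHTTPServer(("127.0.0.1", port), handler)
```

Calling `make_server().serve_forever()` from the repository root then serves the web app at http://127.0.0.1:8000/ until interrupted.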

Pretrained models

Datasets

Description

The code here is based on Patrick von Platen's article and notebook https://huggingface.co/blog/fine-tune-xlsr-wav2vec2

Resources

Papers

https://ar5iv.labs.arxiv.org/html/2204.00770 (https://arxiv.org/abs/2204.00770)

https://www.isca-speech.org/archive/pdfs/interspeech_2022/baskar22b_interspeech.pdf

https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10225595

https://www.sciencedirect.com/science/article/pii/S2405959521000874

https://www.isca-speech.org/archive/pdfs/interspeech_2021/green21_interspeech.pdf

https://arxiv.org/pdf/2006.11477.pdf

https://arxiv.org/pdf/2211.00089.pdf

https://www.sciencedirect.com/science/article/abs/pii/S0957417423002981

Code

https://huggingface.co/blog/fine-tune-wav2vec2-english

Data

http://www.cs.toronto.edu/~complingweb/data/TORGO/torgo.html

Dataset

Big

https://huggingface.co/datasets/jmaczan/TORGO

Small

https://huggingface.co/datasets/jmaczan/TORGO-very-small

Others

https://ai.meta.com/blog/wav2vec-20-learning-the-structure-of-speech-from-raw-audio/

https://pytorch.org/audio/stable/tutorials/speech_recognition_pipeline_tutorial.html

https://huggingface.co/docs/datasets/v2.16.1/audio_dataset

https://distill.pub/2017/ctc/

https://ai.meta.com/blog/self-supervision-and-building-more-robust-speech-recognition-systems/

License

MIT License

Author

Jędrzej Paweł Maczan

https://huggingface.co/jmaczan | [email protected] | https://github.com/jmaczan