ASR Dysarthria

Automatic speech recognition for people with dysarthria

This repo is under heavy research and development, so this README.md is outdated. Sorry!

I deployed a web page so you can use a model in your browser: https://asr-dysarthria-preliminary.pages.dev/

Training

Use the Jupyter notebook wav2vec2-large-xls-r-300m-dysarthria-big-dataset.ipynb to train your own model.

Installation

Prerequisites:

  • Python >= 3.10
  • Anaconda

Steps:

  • conda install --file requirements.txt

Inference

In directory cli-app:

Run model.safetensors: python -m run

Run ONNX: python -m onnx_run

Adjust these scripts if needed (by default they transcribe file.wav in the cli-app folder).
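
For orientation, transcribing a single WAV file with a fine-tuned wav2vec2 CTC checkpoint generally looks like the sketch below. This is not the code in cli-app. It assumes the transformers, torch, and librosa packages are installed and that the Hugging Face repo jmaczan/wav2vec2-large-xls-r-300m-dysarthria-big-dataset also contains the processor (tokenizer and feature extractor) files. The names greedy_ctc_collapse and transcribe are hypothetical.

```python
# Assumption: the fine-tuned checkpoint on the Hub ships its processor files too.
MODEL_ID = "jmaczan/wav2vec2-large-xls-r-300m-dysarthria-big-dataset"
SAMPLE_RATE = 16_000  # wav2vec2 models expect 16 kHz mono audio


def greedy_ctc_collapse(token_ids, blank_id):
    """Collapse repeated ids, then drop CTC blanks.

    Spelled out for illustration only; processor.batch_decode below
    already does this before mapping ids to characters.
    """
    collapsed = []
    previous = None
    for token_id in token_ids:
        if token_id != previous and token_id != blank_id:
            collapsed.append(token_id)
        previous = token_id
    return collapsed


def transcribe(wav_path, model_id=MODEL_ID):
    """Return the greedy CTC transcription of one audio file."""
    # Heavy imports stay local so the helper above works without them.
    import librosa
    import torch
    from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor

    processor = Wav2Vec2Processor.from_pretrained(model_id)
    model = Wav2Vec2ForCTC.from_pretrained(model_id)
    model.eval()

    # librosa resamples to 16 kHz and downmixes to mono while loading.
    speech, _ = librosa.load(str(wav_path), sr=SAMPLE_RATE, mono=True)
    inputs = processor(speech, sampling_rate=SAMPLE_RATE, return_tensors="pt")

    with torch.no_grad():
        logits = model(**inputs).logits

    # Pick the most likely token per frame, then let the processor decode.
    predicted_ids = torch.argmax(logits, dim=-1)
    return processor.batch_decode(predicted_ids)[0]
```

Calling transcribe("file.wav") from the cli-app folder would download the checkpoint on first use and return the predicted text.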

Deploying

Download and convert the trained model (the model.safetensors file):

mkdir models
python scripts/convert_model.py --url https://huggingface.co/jmaczan/wav2vec2-large-xls-r-300m-dysarthria-big-dataset/resolve/main/model.safetensors --output models
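
What scripts/convert_model.py does internally is not shown in this README. As a rough sketch of the download half only, assuming the script fetches the file at --url into the --output directory, the standard library is enough. download_model is a hypothetical name, and the conversion step is left out because its details are not described here.

```python
import shutil
import urllib.request
from pathlib import Path
from urllib.parse import urlparse


def download_model(url, output_dir):
    """Download `url` into `output_dir`, keeping the file name from the URL path."""
    output_dir = Path(output_dir)
    output_dir.mkdir(parents=True, exist_ok=True)  # same effect as `mkdir models`
    target = output_dir / Path(urlparse(url).path).name

    # Stream to disk instead of reading everything into memory,
    # since model checkpoints can be large.
    with urllib.request.urlopen(url) as response, open(target, "wb") as out_file:
        shutil.copyfileobj(response, out_file)
    return target
```

With the URL from the command above, download_model(url, "models") would be expected to leave models/model.safetensors on disk.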

Serve it

cd web-app
python -m http.server
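
python -m http.server serves the current directory over HTTP, on port 8000 unless another port is given. If the server needs to be started from a script instead, a minimal programmatic equivalent could look like this. serve_directory is a hypothetical helper, and binding to 127.0.0.1 only is a choice made for this sketch (the command-line default listens on all interfaces).

```python
import functools
import http.server
import threading


def serve_directory(directory, port=8000):
    """Serve `directory` over HTTP in a background thread and return the server."""
    handler = functools.partial(
        http.server.SimpleHTTPRequestHandler, directory=str(directory)
    )
    server = http.server.ThreadingHTTPServer(("127.0.0.1", port), handler)
    # Daemon thread: the server stops when the main program exits.
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

For example, server = serve_directory("web-app") starts serving, and server.shutdown() followed by server.server_close() stops it.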

Pretrained models

Datasets

Description

The code here is based on Patrick von Platen's article and notebook https://huggingface.co/blog/fine-tune-xlsr-wav2vec2

Resources

Papers

https://ar5iv.labs.arxiv.org/html/2204.00770 (https://arxiv.org/abs/2204.00770)

https://www.isca-speech.org/archive/pdfs/interspeech_2022/baskar22b_interspeech.pdf

https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10225595

https://www.sciencedirect.com/science/article/pii/S2405959521000874

https://www.isca-speech.org/archive/pdfs/interspeech_2021/green21_interspeech.pdf

https://arxiv.org/pdf/2006.11477.pdf

https://arxiv.org/pdf/2211.00089.pdf

https://www.sciencedirect.com/science/article/abs/pii/S0957417423002981

Code

https://huggingface.co/blog/fine-tune-wav2vec2-english

Data

http://www.cs.toronto.edu/~complingweb/data/TORGO/torgo.html

Dataset

Big

https://huggingface.co/datasets/jmaczan/TORGO

Small

https://huggingface.co/datasets/jmaczan/TORGO-very-small

Others

https://ai.meta.com/blog/wav2vec-20-learning-the-structure-of-speech-from-raw-audio/

https://pytorch.org/audio/stable/tutorials/speech_recognition_pipeline_tutorial.html

https://huggingface.co/docs/datasets/v2.16.1/audio_dataset

https://distill.pub/2017/ctc/

https://ai.meta.com/blog/self-supervision-and-building-more-robust-speech-recognition-systems/

License

MIT License

Author

Jędrzej Paweł Maczan

https://huggingface.co/jmaczan | https://github.com/jmaczan