The Duolingo for speech therapy: we're making better speech accessible to all, without the crazy costs.
1 in 14 US children and over 3 million US adults have some sort of speech impediment. However, solutions to mitigate these impediments are often extremely expensive, with many services costing over $1000 per month. We want to fix that for those who need it. Talky is an open-source, gamified speech evaluator that helps you improve your speaking and pronunciation. It offers lessons and targeted practice exercises covering all common phoneme mispronunciations, along with a scientifically backed method that gives each spoken sentence a detailed phoneme-level pronunciation score.
Talky is made with many different technologies. Here are our major ones:
- React frontend (Vite, Three.js, Auth0, Tailwind CSS)
- Flask backend (MongoDB, REST API)
- Goodness of Pronunciation algorithm (a Hugging Face phoneme recognition model, the Goodness of Pronunciation algorithm for evaluation, and a forced alignment algorithm)
- SpeechSynthesis by Mozilla
Here are some of the features of Talky! More are currently being implemented!
Talky has lessons which unlock when the previous lesson is completed. The subsequent lesson is then generated with content and exercises prioritizing the user's weakest phonemes.
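As a rough illustration of how a lesson could prioritize weak phonemes, here is a minimal Python sketch. It is not Talky's actual lesson generator: the function names `weakest_phonemes` and `pick_exercises`, the 0 to 1 score scale, and the shape of the sentence bank are all assumptions made for the example.

```python
from collections import defaultdict
from statistics import mean


def weakest_phonemes(history, k=3):
    """Return the k phonemes with the lowest average score.

    history: iterable of (phoneme, score) pairs.
    The 0..1 score scale is an assumption for this sketch.
    """
    scores = defaultdict(list)
    for phoneme, score in history:
        scores[phoneme].append(score)
    averages = {p: mean(s) for p, s in scores.items()}
    # Lowest average first; ties are broken alphabetically so the
    # result is deterministic.
    return sorted(averages, key=lambda p: (averages[p], p))[:k]


def pick_exercises(sentence_bank, targets, n=2):
    """Rank candidate sentences by how many target phonemes they contain.

    sentence_bank: list of {"text": str, "phonemes": [str, ...]} dicts
    (a hypothetical structure, not the project's schema).
    """
    def coverage(item):
        return sum(1 for p in item["phonemes"] if p in targets)

    ranked = sorted(sentence_bank, key=coverage, reverse=True)
    return [item["text"] for item in ranked[:n]]
```

With a history of `[("TH", 0.4), ("TH", 0.6), ("R", 0.3), ("S", 0.9), ("L", 0.8)]`, `weakest_phonemes(history, 2)` returns `["R", "TH"]`, and those targets can then be passed to `pick_exercises` to choose sentences that exercise them most.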
For each lesson, the user will walk our robot mascot, Talky, to the end flag. When a sentence is pronounced well enough, the robot will advance forward in our 3D terrain.
Talky uses the Goodness of Pronunciation algorithm to provide a detailed phoneme-level score analysis for the user. Each phoneme is colored with a different label depending on how well it was pronounced. A more precise score is shown when hovering over each phoneme.
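To give a feel for what a phoneme-level score looks like, here is a minimal Python sketch of a posterior-based Goodness of Pronunciation variant that is commonly used with neural acoustic models. It is an approximation for illustration, not necessarily the scoring code Talky runs: it assumes forced alignment has already assigned frames to the phoneme, and the function names and label thresholds are hypothetical.

```python
import math


def gop_score(frame_posteriors, target_index):
    """Average log-ratio of the target phoneme's posterior to the best posterior.

    frame_posteriors: per-frame probability lists for the frames that forced
    alignment assigned to this phoneme (assumed to be given).
    Returns a value <= 0; 0 means the target phoneme was the top hypothesis
    in every aligned frame.
    """
    if not frame_posteriors:
        raise ValueError("need at least one aligned frame")
    eps = 1e-10  # avoids log(0) when a posterior is exactly zero
    total = 0.0
    for probs in frame_posteriors:
        total += math.log(probs[target_index] + eps) - math.log(max(probs) + eps)
    return total / len(frame_posteriors)


def label_for(score, good=-0.5, fair=-2.0):
    """Map a GOP score to a display label.

    The thresholds are illustrative assumptions, not Talky's tuned values.
    """
    if score >= good:
        return "good"
    if score >= fair:
        return "fair"
    return "poor"
```

A label like this could drive the per-phoneme coloring, while the raw score is the kind of value a hover tooltip would show.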
The user can also choose to practice individual phonemes in the word bank. For each commonly mispronounced phoneme, there is an interactive page containing cards with words for that specific phoneme.
For fun, a card will be randomly selected for the user to practice, and a standard pronunciation will be generated according to the user's selected voice accent.
Thanks for trying out Talky! We previously had a deployed link (it will be reinstated soon), but you can also try out Talky locally. To do this, you will need a few dependencies installed.
Next, clone the repository and create a virtual environment. Then, install the required packages using:
pip install -r requirements.txt
You should run the frontend folder (talky-app) in a terminal, with
npm run dev
And run the server (server folder) in another terminal, with
python main.py
You should also create .env files in /server and /talky-app. The environment variables in /server should be
GROQ_API_KEY=YOUR_GROQ_API_KEY
DB_NAME=YOUR_MONGO_DB_NAME
MONGO_USERNAME=YOUR_MONGO_USERNAME_db_user
MONGO_PASSWORD=YOUR_MONGO_DB_PASSWORD
MONGO_URI=YOUR_MONGO_URI
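If the server fails on startup, a missing variable is a likely cause. The following Python sketch checks that the server variables listed above are present. It assumes the `.env` file has already been loaded into the process environment (for example by a tool such as python-dotenv; whether the project uses one is an assumption), and `load_server_config` is a hypothetical helper, not a function from the repository.

```python
import os

# Variable names taken from the /server list above.
REQUIRED_SERVER_VARS = (
    "GROQ_API_KEY",
    "DB_NAME",
    "MONGO_USERNAME",
    "MONGO_PASSWORD",
    "MONGO_URI",
)


def load_server_config(env=None):
    """Return the required settings as a dict, or raise if any are missing.

    env: mapping to read from; defaults to the process environment.
    """
    env = os.environ if env is None else env
    # Treat unset and empty values the same way.
    missing = [name for name in REQUIRED_SERVER_VARS if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {name: env[name] for name in REQUIRED_SERVER_VARS}
```

Calling `load_server_config()` before starting the server reports every missing name at once instead of failing on the first database call.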
The environment variables in /talky-app should be
VITE_AUTH0_DOMAIN=YOUR_AUTH0_DOMAIN
VITE_AUTH0_CLIENT_ID=YOUR_AUTH0_CLIENT_ID
VITE_AUTH0_AUDIENCE=YOUR_AUTH0_AUDIENCE
Hi! We really appreciate any contributions to this repository. When contributing, please follow these steps:
- Fork the repository
- Make a branch from main
- Write your code. The setup instructions are described above
- Write tests for that code
- Make a pull request to the parent repository with your changes

Feel free to also check out issues for something to work on!
All licenses in this repository are copyrighted by their respective authors.
Everything else is released under CC0.
No Copyright
The person who associated a work with this deed has dedicated the work to the public domain by waiving all of his or her rights to the work worldwide under copyright law, including all related and neighboring rights, to the extent allowed by law.
You can copy, modify, distribute and perform the work, even for commercial purposes, all without asking permission. See Other Information below.
Other Information:
* In no way are the patent or trademark rights of any person affected
by CC0, nor are the rights that other persons may have in the work or in
how the work is used, such as publicity or privacy rights.
* Unless expressly stated otherwise, the person who associated a work with
this deed makes no warranties about the work, and disclaims liability for
all uses of the work, to the fullest extent permitted by applicable law.
* When using or citing the work, you should not imply endorsement
by the author or the affirmer.
http://creativecommons.org/publicdomain/zero/1.0/legalcode
@article{witt2000phone,
title={Phone-level pronunciation scoring and assessment for interactive language learning},
author={Witt, Silke M. and Young, Steve J.},
journal={Speech Communication},
volume={30},
number={2--3},
pages={95--108},
year={2000},
doi={10.1016/S0167-6393(99)00044-8}
}

@misc{phy22-phoneme,
author = {Phy, Vitou},
title = {{Automatic Phoneme Recognition on TIMIT Dataset with Wav2Vec 2.0}},
year = 2022,
note = {{If you use this model, please cite it using these metadata.}},
publisher = {Hugging Face},
version = {1.0},
doi = {10.57967/hf/0125},
url = {https://huggingface.co/vitouphy/wav2vec2-xls-r-300m-timit-phoneme}
}