I've made a couple of key changes, and there is a new option for people who want to help with the sox implementation, which aims to make the speech recognition more continuous rather than chunk-based.
- The bug in the voicecommand -s hardware has been fixed.
- Allows multilingual support with !lang and !language
- Fixed casing bug when matching multiple variables
- Install, Uninstall, and Update scripts are now separated by project, so if you only want to update youtube, just run UpdateAUISuite.sh youtube
- tts and tts-nofill have been combined.
- Moving away from yt.js to browse youtube in the browser. Now adding node.js youtube browsing API. See https://github.com/StevenHickson/RaspberryPiTV
- Building https://github.com/StevenHickson/RaspberryPiTV to work with voicecommand and adding omxcontrols using https://github.com/StevenHickson/omxplayer_fifo
- Together, these allow a control panel that can control videos, play pandora, browse youtube, control music, and run voicecommand. Note that this is in beta and will require a lot of manual installation, as there is no installation script or readme yet (hopefully soon to come).
- Added youtube-dl cron update so that youtube-dl updates automatically every night. Often if someone says the youtube script doesn't work, it is because youtube-dl is out of date and YouTube has updated their security algorithms. Running sudo youtube-dl -U often fixes this problem.
- Added an option in speech-recog.sh to use sox instead of arecord. Simply uncomment the sox portion and comment out the arecord portion in /usr/bin/speech-recog.sh as below:
sox -r 16000 -t alsa $hardware /dev/shm/out.flac silence 1 0.3 1% 1 0.5 1%
wget -q -U "rate=16000" -O - --post-file /dev/shm/out.flac --header="Content-Type: audio/x-flac; rate=16000" "http://www.google.com/speech-api/v1/recognize?lang=en&client=Mozilla/5.0" | sed -e 's/[{}]//g' | awk -v k="text" '{n=split($0,a,","); for (i=1; i<=n; i++) print a[i]; exit }' | awk -F: 'NR==3 { print $3; exit }'
#arecord -D $hardware -f cd -t wav -d $duration -r 16000 | flac - -f --best --sample-rate 16000 -o /dev/shm/out.flac 1>/dev/shm/voice.log 2>/dev/shm/voice.log; wget -O - -o /dev/null --post-file /dev/shm/out.flac --header="Content-Type: audio/x-flac; rate=16000" http://www.google.com/speech-api/v1/recognize?lang="$lang" | sed -e 's/[{}]//g' | awk -v k="text" '{n=split($0,a,","); for (i=1; i<=n; i++) print a[i]; exit }' | awk -F: 'NR==3 { print $3; exit }'
rm /dev/shm/out.flac
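If you want to debug the text-extraction part without a microphone or network, here is a rough sketch that runs the same sed/awk stages on a canned response. The sample JSON below is made up for illustration (its field layout is my assumption about what the recognizer sends back, not captured output), and the function name extract_utterance is hypothetical, not something in speech-recog.sh.

```shell
#!/bin/sh
# Hypothetical sample response: the field layout is an assumption,
# not real output captured from the speech API.
sample_response='{"status":0,"id":"abc","hypotheses":[{"utterance":"play some music","confidence":0.9}]}'

# Same sed/awk stages as the pipeline in speech-recog.sh,
# wrapped in a function (illustrative name) so they can be tested alone.
extract_utterance() {
    # 1. Strip all curly braces from the JSON.
    sed -e 's/[{}]//g' |
    # 2. Split the first line on commas and print one field per line.
    awk -v k="text" '{ n = split($0, a, ","); for (i = 1; i <= n; i++) print a[i]; exit }' |
    # 3. Take the third line and print its third colon-separated piece.
    awk -F: 'NR==3 { print $3; exit }'
}

result=$(printf '%s\n' "$sample_response" | extract_utterance)
echo "$result"
```

With this sample the third comma-separated field is the one holding the utterance, so the script prints the recognized text still wrapped in double quotes. Because the split is purely on commas and colons, a response whose earlier fields are laid out differently, or an utterance that itself contains a comma or colon, would come out truncated or wrong, which is worth keeping in mind when testing the sox option.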
Please let me know how this works for people so I can debug and get this working permanently.
As always, you can find the install, update, and new YouTube videos at my YouTube channel here:
https://www.youtube.com/channel/UCxa9JQjCl8ij_7za1_sRCVQ/videos
If you are wondering why I've been so quiet, it's because I moved, started grad school at Georgia Tech, and have been doing a technical review for a computer vision book.
Since I'm a poor graduate student, please support my tinkering.