The Shtooka Recorder is the easiest and fastest way to record sentences for the Tatoeba Project.
You can use other free software such as Audacity, but it's much, much slower and requires many extra steps to accomplish the same thing.
If you can't use the Shtooka Recorder, and must use Audacity, please read Using Audacity to Record for the Tatoeba Project.
Brief Outline - How to record for the Tatoeba Project
- Download the Shtooka Recorder
- Try recording a few sentences to make sure you know how it works.
- Write to email@example.com telling us that you're interested in recording for us. Tell us what your native language is. Tell us your tatoeba.org username if you have one.
- CK will create a list of sentences formatted for the Shtooka Recorder for you.
- Record a few of these and send them to firstname.lastname@example.org just to make sure everything is OK before you spend a lot of time recording many sentences.
- After that, you can easily record many sentences for us.
Step 1: Download the Shtooka Recorder (for Windows)
- Download Link: http://web.archive.org/web/20110621160436/http://shtooka.net/soft/kit_shtooka/kit_shtooka_Install_0.9.8.exe
The "Shtooka Recorder" is part of this package.
- This is a "Windows" program. (I use it with Windows XP.)
- This wouldn't work on my Macintosh running Windows 7 via Boot Camp. The application would run, but the audio interface wouldn't work.
- If you don't have a computer that runs Windows, maybe you can borrow a friend's computer.
Alternate - Download the Swac Recorder (for Windows or Linux)
As far as I know, nobody contributing to the Tatoeba Project has used this one.
(I think this is just a slight rewrite of the Shtooka program.) So far, I think I still prefer the Shtooka Recorder. However, maybe one of these will work better for you.
See a screenshot comparison between the Shtooka Recorder and the Swac Recorder.
- for Windows (XP, Vista, Win7, Win8), download the 32-bit or 64-bit binary
- for Ubuntu (13.10, 12.04), download the 32-bit or 64-bit binary
- for Fedora (20, 19), download the source package
You can't easily record long sentences with the Swac Recorder.
1-minute Demo of the Shtooka Recorder by CK
This quick demo shows you how quickly and easily you can record sentences.
Notice that on the 4th sentence, the audio was "saturated" and would have been distorted. The recorder handles this kind of error by flashing pink in the level meter and then going back to the beginning of that sentence, so you can record it again.
Steps 2 and 3: Record a few sentences and send them to us.
- Record a few sentences and send these to email@example.com.
- Tell us what your native language is.
- Tell us your tatoeba.org username if you have one.
- If the quality isn't good enough, we can perhaps suggest ways to improve your recordings.
Steps 4 and 5: We will send you a list of sentences to record.
- Someone on the team will send you a list of sentences properly formatted for the Shtooka Recorder.
- Paste these into "Words to Record."
- And then record a few of these and send them to firstname.lastname@example.org, just to be sure everything is working OK before you spend a lot of time recording.
- Make sure your "mask" setting includes %CRC as shown in this image.
Step 6: Record as many sentences for us as you can.
- As you record, you should skip any sentences that don't sound natural to you. There is also an option to "remove" a sentence after you've recorded it.
- You can email the FLAC files directly to email@example.com, or upload them to a Dropbox account and send firstname.lastname@example.org the URL.
Suggestions / Recommendations
- Use the best external microphone that you have or can borrow from someone.
- Built-in microphones often pick up noise from the hard disk or the fan.
- The higher the quality of the microphone, the better your recordings will be.
- Don't record all of the sentences.
- Don't record any sentences that don't sound 100% natural to you.
- Don't record sentences that you don't want your voice saying, such as offensive sentences or vulgar sentences.
- If you are a man, don't record sentences that would sound better with a woman's voice (and vice versa).
- (Even if they are perfectly good "written" sentences, it might be good to limit audio to things we actually say.)
- If you accidentally record a sentence that you don't want, you can use the "Remove" menu item in the Shtooka Recorder.
- Listen to the files before sending them to us.
- Before sending your audio files to the Tatoeba Project, listen to all of them (maybe twice), and throw out the ones that don't sound natural or that have unwanted noises. I suggest using VLC (Free at www.videolan.org), rather than the Shtooka Recorder, since I think it's easier and faster. VLC can play FLAC files.
My Current Recording Setup
Perhaps you don't need to read the rest of this page.
Screenshots from the Original Shtooka Recorder Instructions
The following is from shtooka.net (July 22, 2011) and is used under the Creative Commons "By" license. (http://creativecommons.org/licenses/by/2.0/fr/)
It has been slightly edited, mainly to eliminate non-working links.
List of words that will be recorded:
This is where you paste in the sentence data.
Information About the Speaker:
This can be ignored for the Tatoeba Project. We don't use this data. However, it doesn't hurt to enter it. You'll only need to do this once.
Audio Recording:
You can resize these windows to fit your screen as you like.
The user pronounces the first word, then the Shtooka Recorder automatically switches to the next word while saving the file.
How to configure Recording Settings
You can use the default settings, so you don't need to change any of them.
For sentences, I find that setting the "final silence" to about 0.80 works quite well for me if some of the items contain two sentences. If all items are single sentences with no pauses, then I use 0.40, since I don't have to wait as long between sentences.
How does it work?
This window demonstrates the settings that are relevant to the recording process. Let's review the way the program works.
- The program continuously waits for a word to begin. It decides that a word begins when the input level exceeds a given threshold (the "start level", shown at point #1).
- When a word has started, the program waits for a silence. The program considers that there is a silence when the input level is low enough to be attributed to residual noise (i.e. when the input level goes below the "Max Noise Level" threshold, for example at point #2).
- During the recording of a sentence, there can be silences between words. To record a full sentence, the system has to distinguish between silences in the middle of the sentence and the final silence at the end of the sentence. The criterion is simple: when a silence exceeds a given length (say, 0.5s or 1s), it is considered to be a final silence, and the word/sentence is saved. When a silence is detected, a plain vertical line is drawn to show the point where the program considers the silence to be the final one (#6).
- When the program decides to save a word/sentence, it saves not only the grey "word" itself (#4), but also a short time before the word starts and after the word stops (the two hatched zones, #3).
- If the input level goes higher than the threshold #7 (the horizontal line), the program will consider that the input is saturated. You will then have to record the word again. Speak a bit more softly, or move your microphone further from your mouth.
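The detection logic described in the list above can be sketched roughly as follows. This is only an illustration in Python, not the Shtooka Recorder's actual code. The names `Settings` and `segment`, the default values, and the idea of feeding the function one peak level (0.0 to 1.0) per block are all assumptions made for this example. The sketch only flags a saturated item, whereas the real program asks you to record it again.

```python
from dataclasses import dataclass


@dataclass
class Settings:
    # All names and default values here are hypothetical, chosen for illustration.
    block_length: float = 0.05      # seconds covered by one level reading
    start_level: float = 0.20       # threshold #1: a word begins above this
    max_noise_level: float = 0.05   # threshold #2: below this counts as silence
    saturation_level: float = 0.95  # threshold #7: above this is saturated
    final_silence: float = 0.5      # seconds of silence that end an item


def segment(levels, settings=None):
    """Split a sequence of per-block peak levels into recorded items.

    Returns a list of (start_block, end_block, saturated) tuples, where
    end_block is exclusive. An item still open when the input ends is
    not returned, because no final silence was seen.
    """
    s = settings or Settings()
    blocks_needed = round(s.final_silence / s.block_length)
    items = []
    start = None        # index of the block where the current item began
    last_sound = None   # index of the last non-silent block in the item
    silent_run = 0      # consecutive silent blocks seen so far
    saturated = False

    for i, level in enumerate(levels):
        if start is None:
            # Waiting for a word to begin (point #1).
            if level > s.start_level:
                start, last_sound, silent_run = i, i, 0
                saturated = level > s.saturation_level
            continue

        if level > s.saturation_level:
            saturated = True

        if level < s.max_noise_level:
            # Silence (point #2). It may be a pause or the final silence.
            silent_run += 1
            if silent_run >= blocks_needed:
                items.append((start, last_sound + 1, saturated))
                start = None
        else:
            silent_run = 0
            last_sound = i

    return items
```

A short pause inside a sentence (fewer silent blocks than the final silence requires) keeps the item open, while a long enough silence closes it. This is why a longer "final silence" setting suits multi-sentence items.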
The "Block Length" Parameter
This sets the time shown as a single block in the live "sound graph" diagram, and sets the duration for which "sound" or "silence" is determined. If you want a finer granularity, make it smaller; otherwise 0.05s is a good choice.
The "Margin Before" Parameter
This sets the time to be included in the recording before the first "sound" is determined. It should not be less than "Block Length", and usually should allow a listener to shift attention to listening after clicking "playback". (This is the duration of the left of the two hatched zones, #3.)
The "Margin After" Parameter
This sets the time to be included in the recording after the last "sound" block. It can be used as a "buffer" of silence before another sound recording can be played. (This is the duration of the right of the two hatched zones, #3.)
The "Final Silence" Parameter
This sets the time that the program has to wait after the end of the word (#6) to save it. If you want to record simple words, you can set it to 0.5s; if you are recording whole sentences, set it to 1s or 1.5s.
The "Minimum Length" Parameter
At the end of the word, if the total time is less than the "Minimum Length", the program will not save the buffer. This parameter can help you avoid recording stray noises.
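As a rough illustration of how the two margins and the minimum length might combine, here is a small Python sketch. It is not taken from the Shtooka Recorder. The function name `clip_bounds`, its default values, and the assumption that "Minimum Length" is compared against the sound portion only (without the margins) are all hypothetical.

```python
def clip_bounds(sound_start, sound_end,
                margin_before=0.25, margin_after=0.25,
                minimum_length=0.3):
    """Return (clip_start, clip_end) in seconds, or None if the item is too short.

    sound_start and sound_end mark the detected word or sentence (zone #4).
    The margins correspond to the two hatched zones (#3).
    Assumption: the minimum length is checked before the margins are added.
    """
    if sound_end - sound_start < minimum_length:
        return None  # too short, probably a stray noise, so nothing is saved
    clip_start = max(0.0, sound_start - margin_before)  # never before time zero
    clip_end = sound_end + margin_after
    return (clip_start, clip_end)
```

For example, under these assumed defaults a sentence detected from 1.0s to 2.0s would be saved from 0.75s to 2.25s, while a 0.1s click would be discarded.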
The "Starting Threshold" Parameter
This sets the #1 Level, the minimum loudness triggering the beginning of the word or sentence.
The "Max Noise Level" Parameter
This sets the #2 threshold. Set it as low as you can. If this level is too high, the program will stop before the end of words!
The "Saturation Threshold" Parameter
This sets the #7 threshold. Try speaking very loudly into your microphone to determine the saturation level of your audio system, and set this parameter a little lower.
The documentation is also at web.archive.org
- English: http://web.archive.org/web/20110722082618/http://shtooka.net/soft/shtooka_recorder/en/
- French: http://web.archive.org/web/20110621160010/http://shtooka.net/soft/shtooka_recorder/fr/
YouTube Video
Skip to 0:32 if you've already downloaded the Shtooka Recorder.
Created by tatoeba.org/user/profile/AmberShadow, I think.
Linux Source
Swac-Record
swac-record is a program written in C++ for Qt that allows the systematic recording of words or expressions.
Find Some "Packs" of Words and Sentences