ESPnet: end-to-end speech processing toolkit
Documentation:
https://espnet.github.io/espnet/installation.html
GitHub:
https://github.com/espnet/espnet
Paper:
https://arxiv.org/pdf/1804.00015.pdf
Code overview
Code structure
espnet/              # Python modules
utils/               # Utility scripts of ESPnet
test/                # Unit tests
test_utils/          # Unit tests for executable scripts
egs/                 # Complete recipes for each corpus
    an4/             # AN4 is a tiny corpus that can be obtained freely, so it is suitable for tutorials
        asr1/        # ASR recipe
            - run.sh     # Executable script
            - cmd.sh     # Selects the backend for the job scheduler
            - path.sh    # Sets up environment variables
            - conf/      # Configuration files
            - steps/     # Utility scripts from Kaldi
            - utils/     # Utility scripts from Kaldi
        tts1/        # TTS recipe
    ...
1. espnet
Python code, consisting mainly of the following parts:
Speech enhancement
Speech recognition
Language modeling
Machine translation
Speech translation
Speech synthesis (text to speech)
2. utils
Python utility tools
Located under utils/
Data format processing
addjson.py
: add multiple json values to an input or output value
change_yaml.py
: change specified attributes of a YAML file
concatjson.py
: concatenate json files
get_yaml.py
: get a specified attribute from a YAML file
json2sctm.py
: convert json to sctm
json2text.py
: convert ASR recognized json to text
json2trn_mt.py
: convert json to machine translation transcription
json2trn.py
: convert a json to a transcription file with a token dictionary
json2trn_wo_dict.py
: convert a json to a transcription file without a token dictionary
mergejson.py
: merge json files
merge_scp2json.py
: given file paths in a format such as
mix-mono-wav-scp.py
: mix wav.scp files into a multi-channel wav.scp using sox
result2json.py
: convert sclite's result.txt file to json
splitjson.py
: split a json file for parallel processing
text2token.py
: convert raw text to tokenized text
text2vocabulary.py
: create a vocabulary file from text files
trim_silence.py
: trim silence with simple power thresholding and make a segments file (i.e., cut out silent frames)
trn2ctm.py
: convert trn to ctm
trn2stm.py
: convert trn to stm
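trim_silence.py is described above as trimming silence with simple power thresholding. A minimal plain-Python sketch of that idea follows. The non-overlapping frames, the dB threshold, and returning a single (start, end) span are assumptions chosen for illustration, the function name voiced_segment is hypothetical, and writing the segments file is left out; this is not the script's actual implementation.

```python
import math


def voiced_segment(samples, frame_len, threshold_db):
    """Return (start, end) sample indices spanning frames louder than threshold_db.

    Illustrative sketch only (hypothetical helper, not trim_silence.py itself).
    Each non-overlapping frame's mean power is converted to dB; the segment
    runs from the first to the last frame above the threshold. Samples in a
    trailing partial frame are ignored. Returns None if every frame is silent.
    """
    loud = []
    for i in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[i:i + frame_len]
        power = sum(x * x for x in frame) / frame_len
        db = 10 * math.log10(power + 1e-12)  # small floor avoids log10(0)
        if db > threshold_db:
            loud.append(i)
    if not loud:
        return None
    return loud[0], loud[-1] + frame_len


# Silence, a short burst, then silence again.
signal = [0.0] * 4 + [0.5, -0.5, 0.5, -0.5] + [0.0] * 4
print(voiced_segment(signal, frame_len=4, threshold_db=-30))  # (4, 8)
```

The threshold is compared in dB rather than raw power so that it stays meaningful across recordings with very different amplitudes.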
Model processing
average_checkpoints.py
: average models from snapshots
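Averaging models from snapshots usually means taking the element-wise mean of each parameter over several saved checkpoints (for example, those from the last few epochs). A minimal sketch of that arithmetic follows, with plain lists standing in for framework tensors and a hypothetical function name; it does not reproduce the script's command-line interface or checkpoint format.

```python
def average_params(snapshots):
    """Element-wise average of several snapshots' parameters.

    Illustrative sketch (hypothetical helper): each snapshot is a dict
    mapping a parameter name to a flat list of floats, standing in for a
    real model state dict. All snapshots must share the same names and sizes.
    """
    n = len(snapshots)
    avg = {}
    for name in snapshots[0]:
        # Group the i-th value of this parameter across all snapshots.
        columns = zip(*(snap[name] for snap in snapshots))
        avg[name] = [sum(vals) / n for vals in columns]
    return avg


snaps = [{"w": [1.0, 2.0]}, {"w": [3.0, 4.0]}]
print(average_params(snaps))  # {'w': [2.0, 3.0]}
```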
Feature processing
apply-cmvn.py
: apply mean-variance normalization to files
compute-cmvn-stats.py
: compute cepstral mean and variance normalization statistics. If a wspecifier is provided: per-utterance by default, or per-speaker if the spk2utt option is provided; if a wxfilename is provided: global
compute-fbank-feats.py
: compute FBANK features from WAV
compute-stft-feats.py
: compute STFT features from WAV
convert_fbank_to_wav.py
: convert FBANK to WAV using the Griffin-Lim algorithm (an iterative method that recovers a time-domain speech signal from its spectrogram)
copy-feats.py
: copy features with preprocessing
dump-pcm.py
: dump PCM files from a WAV scp file
eval-source-separation.py
: evaluate enhanced speech, e.g. eval-source-separation.py --ref ref.scp --enh enh.scp --outdir outputdir, or eval-source-separation.py --ref ref.scp ref2.scp --enh enh.scp enh2.scp --outdir outputdir
feats2npy.py
: convert Kaldi-style features to NumPy arrays
feat-to-shape.py
: convert features to their shapes
generate_wav_from_fbank.py
: generate WAV from FBANK using a WaveNet vocoder
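compute-cmvn-stats.py and apply-cmvn.py split mean-variance normalization into two steps: accumulate statistics, then normalize features with them. A minimal sketch of both steps on plain Python lists follows. The function names are hypothetical, and the (count, sums, sums of squares) tuple is an illustrative layout, not necessarily the Kaldi-format statistics the scripts read and write. Accumulating over one utterance, one speaker, or the whole data set gives the per-utterance, per-speaker, or global variants listed above.

```python
import math


def cmvn_stats(frames):
    """Accumulate frame count, per-dimension sums, and sums of squares.

    Illustrative sketch (hypothetical helper); frames is a list of
    equal-length feature vectors.
    """
    dim = len(frames[0])
    total = [0.0] * dim
    total_sq = [0.0] * dim
    for frame in frames:
        for d, x in enumerate(frame):
            total[d] += x
            total_sq[d] += x * x
    return len(frames), total, total_sq


def apply_cmvn(frames, stats, eps=1e-10):
    """Subtract the per-dimension mean and divide by the standard deviation."""
    count, total, total_sq = stats
    mean = [s / count for s in total]
    # Variance = E[x^2] - E[x]^2, floored at eps to avoid dividing by zero.
    std = [math.sqrt(max(sq / count - m * m, eps)) for sq, m in zip(total_sq, mean)]
    return [[(x - m) / s for x, m, s in zip(frame, mean, std)] for frame in frames]


feats = [[1.0, 10.0], [3.0, 30.0]]
print(apply_cmvn(feats, cmvn_stats(feats)))  # [[-1.0, -1.0], [1.0, 1.0]]
```

Keeping raw sums instead of means lets statistics from several utterances be combined by simple addition before normalizing, which is what makes per-speaker and global statistics cheap to build.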
Text processing
filt.py
: filter words in a text file
Bash utility tools
Located under utils/
Data format conversion
data2json.sh
Download
download_from_google_drive.sh
Feature conversion
convert_fbank.sh
dump_pcm.sh
Dump PCM waveforms from a WAV scp file
feat_to_shape.sh
generate_wav.sh
Generate WAV files from FBANK features
make_fbank.sh
make_stft.sh
Model-related
pack_model.sh
recog_wav.sh
Recognition
synth_wav.sh
Synthesis
translate_wav.sh
Translation
spm_decode
Segmentation
spm_encode
Speech enhancement
eval_source_separation.sh