ESPnet: end-to-end speech processing toolkit

Documentation: https://espnet.github.io/espnet/installation.html

GitHub repository: https://github.com/espnet/espnet

Paper: https://arxiv.org/pdf/1804.00015.pdf

Code overview

Code structure of the ESPnet speech recognition toolkit

Code structure

espnet/              # Python modules
utils/               # Utility scripts of ESPnet
test/                # Unit test
test_utils/          # Unit test for executable scripts
egs/                 # The complete recipe for each corpus
    an4/             # AN4 is a tiny corpus and can be obtained freely, so it is suitable for a tutorial
        asr1/        # ASR recipe
            - run.sh     # Executable script
            - cmd.sh     # To select the backend for the job scheduler
            - path.sh    # Setup script for environment variables
            - conf/      # Contains configuration files
            - steps/     # The steps scripts from Kaldi
            - utils/     # The utils scripts from Kaldi
        tts1/        # TTS recipe
    ...

1. espnet

Python code, consisting mainly of the following parts:

Speech enhancement

Speech recognition

Language modeling

Machine translation

Speech translation

Speech synthesis (text to speech)

2. utils

Python utility tools

Located under the utils/ directory

Data format processing

addjson.py: add multiple json values to an input or output value

change_yaml.py: change specified attributes of a YAML file

concatjson.py: concatenate json files

get_yaml.py: get a specified attribute from a YAML file

json2sctm.py: convert json to sctm

json2text.py: convert ASR recognized json to text

json2trn_mt.py: convert json to machine translation transcription

json2trn.py: convert a json to a transcription file with a token dictionary

json2trn_wo_dict.py: convert a json to a transcription file without a token dictionary

mergejson.py: merge json files

merge_scp2json.py: merge scp files into a json; each file path is given in the form <key>:<file>:<type>, where <type> can be omitted and defaults to "str"

mix-mono-wav-scp.py: mix wav.scp files into a multi-channel wav.scp using sox

result2json.py: convert sclite's result.txt file to json

splitjson.py: split a json file for parallel processing

text2token.py: convert raw text to tokenized text

text2vocabulary.py: create a vocabulary file from text files

trim_silence.py: trim silence with simple power thresholding and make a segments file (cuts out the silent frames)

trn2ctm.py: convert trn to ctm

trn2stm.py: convert trn to stm
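The "simple power thresholding" mentioned for trim_silence.py can be illustrated with a minimal sketch. This is not the script's actual implementation: the function names, the frame length, the hop size, and the threshold value are all assumptions chosen for illustration. The idea is to compute the mean power of each frame and keep the span from the first to the last frame whose power exceeds the threshold.

```python
def frame_power(samples, frame_len, hop):
    """Mean power (average of squared samples) for each frame."""
    if not samples:
        return []
    powers = []
    last_start = max(len(samples) - frame_len + 1, 1)
    for start in range(0, last_start, hop):
        frame = samples[start:start + frame_len]
        powers.append(sum(x * x for x in frame) / len(frame))
    return powers


def speech_segment(samples, frame_len=4, hop=2, threshold=0.01):
    """Return (start, end) sample indices spanning all frames above the
    power threshold, or None if every frame is below it.

    Hypothetical helper: parameter names and defaults are illustrative.
    """
    powers = frame_power(samples, frame_len, hop)
    voiced = [i for i, p in enumerate(powers) if p > threshold]
    if not voiced:
        return None
    start = voiced[0] * hop
    end = min(voiced[-1] * hop + frame_len, len(samples))
    return start, end


# Toy signal: silence, a short burst, then silence again.
signal = [0.0] * 10 + [0.5, -0.5] * 5 + [0.0] * 10
print(speech_segment(signal))
```

A real segments file would express the boundaries in seconds (sample index divided by the sampling rate) rather than in samples.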

Model processing

average_checkpoints.py: average models from snapshots
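Averaging models from snapshots amounts to taking the element-wise mean of each parameter across several saved checkpoints. The sketch below shows the arithmetic only, under the assumption that each checkpoint is a plain dictionary mapping a parameter name to a flat list of floats; the function name and the parameter names are hypothetical, and the real script works on framework-specific snapshot files.

```python
def average_checkpoints(checkpoints):
    """Element-wise average of parameters across checkpoints.

    Each checkpoint is assumed to be a dict: name -> flat list of floats,
    with identical keys and lengths in every checkpoint.
    """
    if not checkpoints:
        raise ValueError("need at least one checkpoint")
    n = len(checkpoints)
    averaged = {}
    for name in checkpoints[0]:
        # zip(*...) groups the i-th value of this parameter from every checkpoint.
        columns = zip(*(ckpt[name] for ckpt in checkpoints))
        averaged[name] = [sum(values) / n for values in columns]
    return averaged


# Two toy snapshots with made-up parameter names.
snapshots = [
    {"encoder.weight": [1.0, 2.0], "decoder.bias": [0.0]},
    {"encoder.weight": [3.0, 4.0], "decoder.bias": [1.0]},
]
print(average_checkpoints(snapshots))
```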

Feature processing

apply-cmvn.py: apply mean-variance normalization to files

compute-cmvn-stats.py: compute cepstral mean and variance normalization statistics. If a wspecifier is provided: per-utterance by default, or per-speaker if the spk2utt option is provided; if a wxfilename is provided: global

compute-fbank-feats.py: compute FBANK features from WAV

compute-stft-feats.py: compute STFT features from WAV

convert_fbank_to_wav.py: convert FBANK to WAV using the Griffin-Lim algorithm (an iterative method for recovering the time-domain speech signal from a spectrogram)

copy-feats.py: copy features with preprocessing

dump-pcm.py: dump PCM files from a WAV scp file

eval-source-separation.py: evaluate enhanced speech, e.g. ../doc/argparse2rst.py --ref ref.scp --enh enh.scp --outdir outputdir or ../doc/argparse2rst.py --ref ref.scp ref2.scp --enh enh.scp enh2.scp --outdir outputdir

feats2npy.py: convert Kaldi-style features to numpy arrays

feat-to-shape.py: convert a feature to its shape

generate_wav_from_fbank.py: generate wav from FBANK using a WaveNet vocoder
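Mean-variance normalization, as computed by compute-cmvn-stats.py and applied by apply-cmvn.py, can be sketched as follows. This is a minimal illustration of the global case (one set of statistics over all frames), not the scripts' actual code: the function names and the list-of-lists feature layout are assumptions. Per-utterance or per-speaker normalization would compute the same statistics separately over each group of frames.

```python
import math


def compute_cmvn_stats(frames):
    """Per-dimension mean and standard deviation over all frames.

    frames: non-empty list of equal-length feature vectors (lists of floats).
    """
    n = len(frames)
    dims = list(zip(*frames))  # one tuple of values per feature dimension
    means = [sum(d) / n for d in dims]
    stds = [
        math.sqrt(sum((x - m) ** 2 for x in d) / n)
        for d, m in zip(dims, means)
    ]
    return means, stds


def apply_cmvn(frames, means, stds, eps=1e-20):
    """Subtract the mean and divide by the standard deviation per dimension.

    eps guards against division by zero for constant dimensions.
    """
    return [
        [(x - m) / max(s, eps) for x, m, s in zip(frame, means, stds)]
        for frame in frames
    ]


# Two toy frames with two feature dimensions each.
feats = [[1.0, 10.0], [3.0, 30.0]]
means, stds = compute_cmvn_stats(feats)
normed = apply_cmvn(feats, means, stds)
print(means, stds, normed)
```

In practice the features would be NumPy arrays and the statistics would be accumulated as sums and squared sums, but the arithmetic is the same.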

Text processing

filt.py: filter words in a text file
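As a rough illustration of word filtering, the sketch below keeps (or, optionally, drops) the words of a line that appear in a given word set. The function name, the `exclude` flag, and the whitespace tokenization are assumptions made for this example; the options of the actual filt.py may differ.

```python
def filter_words(line, vocabulary, exclude=False):
    """Keep only the words found in `vocabulary`.

    With exclude=True the behaviour is inverted: words found in
    `vocabulary` are removed instead. Hypothetical helper for illustration.
    """
    words = line.split()
    if exclude:
        kept = [w for w in words if w not in vocabulary]
    else:
        kept = [w for w in words if w in vocabulary]
    return " ".join(kept)


print(filter_words("the cat sat", {"the", "sat"}))
print(filter_words("the cat sat", {"the", "sat"}, exclude=True))
```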

Bash utility tools

Located under the utils/ directory

Data format conversion

data2json.sh

Download

download_from_google_drive.sh

Feature conversion

convert_fbank.sh

dump_pcm.sh: converts a wav scp to PCM waveforms

feat_to_shape.sh

generate_wav.sh: generates wav files from fbank

make_fbank.sh

make_stft.sh

Model related

pack_model.sh

recog_wav.sh: recognition

synth_wav.sh: synthesis

translate_wav.sh: translation

spm_decode, spm_encode: segmentation

Speech enhancement

eval_source_separation.sh