BiLSTM-CRF for NER: A Line-by-Line Code Walkthrough (Beginner Edition)

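This article explains the example code from the official PyTorch documentation. I hope that writing it up helps me understand the model's code more thoroughly.

It is meant for beginners in NER, or in deep learning generally~

How to use this article: read the code and reference 1 in full; references 2 and 3 are supplementary. Nearly every statement in the code below is annotated, except for the Viterbi function, so you can also read reference 1 first and come straight here for whatever is still unclear.

First, how I studied this code: read the references to understand the theory -> download the source and get it running -> go through the code line by line until the syntax made sense -> look up code walkthroughs online and read the code again, working out the inputs and outputs of every function -> third pass: work out what each argument used inside each function means, and sketch every tensor on paper -> final pass: summarize how the functions call each other, drawing as I went (the diagram came out too abstract and misses many details, so treat it as a table of contents of the functions), and annotate the code from my own understanding -> write this article.

It took nearly two months in total (for the first month and more I was home on summer break and spent all of it on syntax; my self-discipline is poor, so it dragged on). The third pass took 4 days and the fourth only one afternoon. This is the first deep learning model I have more or less worked through, and I hope later ones come easier after this. (There are still plenty of details I have not mastered; I'll get to them over time, hahaha.)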

Code source: https://pytorch.org/tutorials/beginner/nlp/advanced_tutorial.html

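References:

Reference 1: this article links to many theoretical explanations. Read it first to build the theory you need.

Reference 2: this article works through many of the formula transformations. Read it once you understand roughly how the model works and are comparing against the code, and keep pen and paper handy to follow the derivations (for example, why the total score over all paths is computed with log_sum_exp() while the score of the actual sentence is not).

Reference 3: its author explains even very fine details of the code, and it is easy to follow. If some detail of the code does not make sense (a beginner's kind of confusion, hahaha, since every one of mine shows that I am a beginner through and through), read this one.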
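The question raised under reference 2 (why the total score needs log_sum_exp() but the score of the real sentence does not) can be sketched in a few lines of algebra. The symbols are my own shorthand, not notation from the references: T is the transition matrix (`self.transitions`, indexed as T[to, from]), E is the emission matrix (`feats`), n is the sentence length, and S(x, y) is the score of tag sequence y for sentence x.

```latex
\[
S(x, y) = \sum_{i=1}^{n} \left( T_{y_i,\, y_{i-1}} + E_{i,\, y_i} \right)
          + T_{\mathrm{STOP},\, y_n},
\qquad y_0 = \mathrm{START}
\]
\[
P(y \mid x) = \frac{\exp S(x, y)}{\sum_{y'} \exp S(x, y')}
\]
\[
-\log P(y \mid x)
  = \underbrace{\log \sum_{y'} \exp S(x, y')}_{\texttt{\_forward\_alg}}
  \;-\; \underbrace{S(x, y)}_{\texttt{\_score\_sentence}}
\]
```

Taking the logarithm cancels the exp in the numerator, so the real path's score is a plain sum. The denominator is a sum over every possible path, so its logarithm stays a log-sum-exp.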

The diagram I drew:

import torch
import torch.autograd as autograd
import torch.nn as nn
import torch.optim as optim  # optimization algorithms

torch.manual_seed(1)  # fix the random seed so the random numbers are the same on every run

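def argmax(vec):
    """Return the index of the largest value in a tensor, as a plain number.

    Input:  vec: a tensor of shape (1, n), i.e. a single row (see the call sites below)
    Output: the column index of the maximum in that row, as a Python int
    """
    # return the argmax as a python int
    _, idx = torch.max(vec, 1)
    return idx.item()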

def prepare_sequence(seq, to_ix):
    """Convert a sentence into the sequence of its word ids (input preprocessing).

    Input:  seq: one sentence, as a list of words
            to_ix: dict mapping each word to its id
    Output: a tensor holding the ids of the words in the sentence
    """
    idxs = [to_ix[w] for w in seq]
    return torch.tensor(idxs, dtype=torch.long)
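To make the word-to-id lookup concrete, here is a small sketch using the two toy training sentences from this article. `to_ids` is a hypothetical stand-in for `prepare_sequence` that returns a plain list instead of a tensor, so it runs without PyTorch.

```python
# The two toy training sentences used later in this article.
sentences = [
    "the wall street journal reported today that apple corporation made money".split(),
    "georgia tech is a university in georgia".split(),
]

# Build the vocabulary: every new word gets the next free id.
word_to_ix = {}
for sentence in sentences:
    for word in sentence:
        if word not in word_to_ix:
            word_to_ix[word] = len(word_to_ix)


def to_ids(seq, to_ix):
    """Same lookup as prepare_sequence, but returns a list (hypothetical helper)."""
    return [to_ix[w] for w in seq]


ids = to_ids(sentences[1], word_to_ix)
print(ids)  # [11, 12, 13, 14, 15, 16, 11]

# A word that never appeared in training has no id, so the lookup fails.
try:
    to_ids(["banana"], word_to_ix)
    unknown_word_failed = False
except KeyError:
    unknown_word_failed = True
print(unknown_word_failed)  # True
```

The repeated word "georgia" maps to the same id both times, and a word outside the vocabulary raises KeyError, which is worth knowing before feeding the model new sentences.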

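# Compute log sum exp in a numerically stable way for the forward algorithm
def log_sum_exp(vec):
    """Compute log(sum(exp(vec))); the forward algorithm uses it to total the scores of all paths.

    Input:  vec: a tensor of scores with shape (1, tagset_size)
    Output: a scalar tensor, log(sum(exp(vec))). This is one piece of the loss, not the loss itself.
    """
    max_score = vec[0, argmax(vec)]
    max_score_broadcast = max_score.view(1, -1).expand(1, vec.size()[1])
    # subtract the maximum before exp() so the exponentials cannot overflow
    return max_score + \
        torch.log(torch.sum(torch.exp(vec - max_score_broadcast)))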
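Why subtract the maximum first? A minimal sketch in plain Python (no PyTorch, and the function names are my own) shows the naive formula overflowing on large scores while the shifted version returns the right answer.

```python
import math


def naive_log_sum_exp(xs):
    # Direct translation of the formula; exp() overflows for large inputs.
    return math.log(sum(math.exp(x) for x in xs))


def stable_log_sum_exp(xs):
    # Shift by the maximum so the largest exponent is exp(0) = 1.
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))


scores = [1000.0, 1000.0]

try:
    naive_log_sum_exp(scores)
    naive_overflowed = False
except OverflowError:
    naive_overflowed = True

stable = stable_log_sum_exp(scores)
print(naive_overflowed)  # True
print(stable)            # about 1000.693, i.e. 1000 + log(2)
```

Both versions agree on small inputs; the shift only matters once the scores are large enough for exp() to leave the floating-point range.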

# The class used to build the model
class BiLSTM_CRF(nn.Module):

    def __init__(self, vocab_size, tag_to_ix, embedding_dim, hidden_dim):
        """Initialization.

        Input: vocabulary size, the tag-to-id dict, the word embedding dimension,
               the hidden layer dimension
        """
        super(BiLSTM_CRF, self).__init__()
        self.embedding_dim = embedding_dim
        self.hidden_dim = hidden_dim
        self.vocab_size = vocab_size
        self.tag_to_ix = tag_to_ix
        self.tagset_size = len(tag_to_ix)  # tagset_size is the number of tags

        # build the embedding layer with the built-in Embedding module
        # (weight shape: vocab_size, embedding_dim)
        self.word_embeds = nn.Embedding(vocab_size, embedding_dim)
        # build the bidirectional LSTM layer with the built-in LSTM module;
        # note the // 2: each direction gets half of hidden_dim, so the
        # concatenated output of both directions has size hidden_dim
        self.lstm = nn.LSTM(embedding_dim, hidden_dim // 2,
                            num_layers=1, bidirectional=True)

        # Maps the output of the LSTM into tag space.
        # a linear layer maps the LSTM output to one score per tag
        self.hidden2tag = nn.Linear(hidden_dim, self.tagset_size)

        # Matrix of transition parameters. Entry i,j is the score of
        # transitioning *to* i *from* j.
        # nn.Parameter registers the tensor as a trainable parameter of the
        # model, so the optimizer updates it during training
        self.transitions = nn.Parameter(
            torch.randn(self.tagset_size, self.tagset_size))
        # the transition matrix is initialized randomly (shape: tagset_size, tagset_size)

        # These two statements enforce the constraint that we never transfer
        # to the start tag and we never transfer from the stop tag
        # the transition matrix expresses how tags follow one another;
        # note that transitions[i][j] is the score of going from j to i
        self.transitions.data[tag_to_ix[START_TAG], :] = -10000  # no tag may transition to START_TAG
        self.transitions.data[:, tag_to_ix[STOP_TAG]] = -10000  # STOP_TAG may not transition to any tag

        self.hidden = self.init_hidden()

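    def init_hidden(self):
        """Initialize h0 (the initial hidden state) and c0 (the initial cell state) for a sentence.

        The hidden layer here is the BiLSTM layer.
        Output: the initial states, each of shape
                (num_layers * num_directions, batch_size, hidden_size) = (2, 1, hidden_dim // 2)
        """
        return (torch.randn(2, 1, self.hidden_dim // 2),
                torch.randn(2, 1, self.hidden_dim // 2))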

    def _forward_alg(self, feats):
        """Compute the total score of all possible tag paths for one sentence.

        This is not a plain sum of matrix entries: the path scores are combined
        with log_sum_exp, which gives the log of the denominator in the formula.

        Input:  feats: the emission matrix from the LSTM layer, shape (seq_len, tagset_size)
        Output: the total score over all paths (following the formula)
        """
        # Do the forward algorithm to compute the partition function
        # init_alphas holds the initial forward variables, shape (1, tagset_size),
        # all initialized to -10000
        init_alphas = torch.full((1, self.tagset_size), -10000.)
        # START_TAG has all of the score.
        # a path always starts at START_TAG before the first word of the sentence
        init_alphas[0][self.tag_to_ix[START_TAG]] = 0.

        # Wrap in a variable so that we will get automatic backprop
        forward_var = init_alphas  # see the formula for the details of the computation

        # Iterate through the sentence
        for feat in feats:  # feat: the score of each tag for one word (time step)
            alphas_t = []  # The forward tensors at this timestep
            for next_tag in range(self.tagset_size):
                # broadcast the emission score: it is the same regardless of
                # the previous tag
                # (this shows that feats is the emission matrix from the LSTM layer)
                emit_score = feat[next_tag].view(
                    1, -1).expand(1, self.tagset_size)
                # the ith entry of trans_score is the score of transitioning to
                # next_tag from i
                trans_score = self.transitions[next_tag].view(1, -1)
                # The ith entry of next_tag_var is the value for the
                # edge (i -> next_tag) before we do log-sum-exp
                # the score for next_tag is the sum of the three scores
                next_tag_var = forward_var + trans_score + emit_score
                # The forward variable for this tag is log-sum-exp of all the
                # scores.
                alphas_t.append(log_sum_exp(next_tag_var).view(1))  # combined score of next_tag
            forward_var = torch.cat(alphas_t).view(1, -1)  # update the forward variables
        # add the score of transitioning from the last word to STOP_TAG
        terminal_var = forward_var + self.transitions[self.tag_to_ix[STOP_TAG]]
        alpha = log_sum_exp(terminal_var)  # the final total score
        return alpha

    def _get_lstm_features(self, sentence):
        """Run the sentence through the LSTM layer and score every tag for every word.

        Input:  sentence: one sentence (a tensor of word ids)
        Output: the projection into tag space, i.e. the emission matrix from
                the LSTM layer, shape (seq_len, tagset_size)
        """
        self.hidden = self.init_hidden()
        # reshape to (seq_len, batch_size, embedding_dim)
        embeds = self.word_embeds(sentence).view(len(sentence), 1, -1)
        # the LSTM returns its outputs and the final (hidden, cell) states
        lstm_out, self.hidden = self.lstm(embeds, self.hidden)
        # drop the batch dimension: lstm_out becomes 2-D, (seq_len, hidden_dim)
        lstm_out = lstm_out.view(len(sentence), self.hidden_dim)
        # linear projection into tag space, used as the emission matrix
        lstm_feats = self.hidden2tag(lstm_out)
        return lstm_feats

    def _score_sentence(self, feats, tags):
        """Score the sentence's true tag sequence.

        log_sum_exp() is not needed here, because the logarithm cancels the
        exp in the numerator of the formula.

        Input:  feats: the emission matrix from the LSTM layer
                tags: the given tag sequence
        Output: the sentence score = transition scores + emission scores from the LSTM layer
        """
        # Gives the score of a provided tag sequence
        score = torch.zeros(1)
        # prepend START_TAG to the tag sequence
        # (torch.cat accepts either a list or a tuple of tensors)
        tags = torch.cat([torch.tensor([self.tag_to_ix[START_TAG]], dtype=torch.long), tags])
        for i, feat in enumerate(feats):
            score = score + \
                self.transitions[tags[i + 1], tags[i]] + feat[tags[i + 1]]
        # add the score of transitioning from the last tag to STOP_TAG
        score = score + self.transitions[self.tag_to_ix[STOP_TAG], tags[-1]]
        return score

    # Viterbi decoder
    def _viterbi_decode(self, feats):
        """Find the best path and its score.

        Input:  feats: the tensor of per-word tag scores (the emission matrix)
        Output: the score of the (predicted) best path; the best path
        """
        backpointers = []

        # Initialize the viterbi variables in log space
        init_vvars = torch.full((1, self.tagset_size), -10000.)
        init_vvars[0][self.tag_to_ix[START_TAG]] = 0

        # forward_var at step i holds the viterbi variables for step i-1
        forward_var = init_vvars
        for feat in feats:
            bptrs_t = []  # holds the backpointers for this step
            viterbivars_t = []  # holds the viterbi variables for this step

            for next_tag in range(self.tagset_size):
                # next_tag_var[i] holds the viterbi variable for tag i at the
                # previous step, plus the score of transitioning
                # from tag i to next_tag.
                # We don't include the emission scores here because the max
                # does not depend on them (we add them in below)
                next_tag_var = forward_var + self.transitions[next_tag]
                best_tag_id = argmax(next_tag_var)
                bptrs_t.append(best_tag_id)
                viterbivars_t.append(next_tag_var[0][best_tag_id].view(1))
            # Now add in the emission scores, and assign forward_var to the set
            # of viterbi variables we just computed
            forward_var = (torch.cat(viterbivars_t) + feat).view(1, -1)
            backpointers.append(bptrs_t)

        # Transition to STOP_TAG
        terminal_var = forward_var + self.transitions[self.tag_to_ix[STOP_TAG]]
        best_tag_id = argmax(terminal_var)
        path_score = terminal_var[0][best_tag_id]

        # Follow the back pointers to decode the best path.
        best_path = [best_tag_id]
        for bptrs_t in reversed(backpointers):
            best_tag_id = bptrs_t[best_tag_id]
            best_path.append(best_tag_id)
        # Pop off the start tag (we dont want to return that to the caller)
        start = best_path.pop()
        assert start == self.tag_to_ix[START_TAG]  # Sanity check
        best_path.reverse()  # the path was collected backwards; reverse it to get the real best path
        return path_score, best_path

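    # Loss function (this is what back-propagation starts from during training)
    def neg_log_likelihood(self, sentence, tags):
        """Combine the two functions above: the score of the true path and the total over all paths.

        The loss is their difference, and training aims to make it smaller.
        The loss is never negative; the closer it gets to 0, the closer the
        probability of the true path is to 1, and the more accurate the result.

        Input:  sentence: the sentence
                tags: the true tag sequence
        Output: loss
        """
        feats = self._get_lstm_features(sentence)  # LSTM emission scores
        forward_score = self._forward_alg(feats)  # forward algorithm: total score over all paths
        gold_score = self._score_sentence(feats, tags)  # score of the true path, from the given tags
        return forward_score - gold_score  # the loss; training back-propagates from this difference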

    def forward(self, sentence):  # dont confuse this with _forward_alg above.
        """Predict the tags for a sentence.

        The name is fixed by nn.Module: calling model(sentence) runs forward(sentence).

        Input:  sentence: the sentence to predict
        Output: the score of the tag sequence, the tag sequence
        """
        # Get the emission scores from the BiLSTM
        lstm_feats = self._get_lstm_features(sentence)

        # Find the best path, given the features.
        score, tag_seq = self._viterbi_decode(lstm_feats)
        return score, tag_seq
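One way to convince yourself that `_forward_alg` and `_viterbi_decode` really cover every possible path is to compare them with brute-force enumeration on a tiny problem. The sketch below is plain Python without PyTorch. The 4-tag set, the random scores and all function names are assumptions for illustration; only the recurrences mirror the class above.

```python
import itertools
import math
import random

NEG = -10000.0
K, START, STOP = 4, 2, 3  # hypothetical tag set: tags 0 and 1 are the real tags


def lse(xs):
    """Numerically stable log-sum-exp over a list of floats."""
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))


def path_score(emit, trans, path):
    """Score of one tag path, as in _score_sentence (trans[to][from])."""
    score, prev = 0.0, START
    for feat, tag in zip(emit, path):
        score += trans[tag][prev] + feat[tag]
        prev = tag
    return score + trans[STOP][prev]


def forward_alg(emit, trans):
    """Log of the summed exp-scores of all paths, as in _forward_alg."""
    alpha = [NEG] * K
    alpha[START] = 0.0
    for feat in emit:
        # the emission does not depend on the previous tag, so it can be
        # added after the log-sum-exp over previous tags
        alpha = [lse([alpha[p] + trans[n][p] for p in range(K)]) + feat[n]
                 for n in range(K)]
    return lse([alpha[p] + trans[STOP][p] for p in range(K)])


def viterbi(emit, trans):
    """Best path and its score, as in _viterbi_decode."""
    v = [NEG] * K
    v[START] = 0.0
    backpointers = []
    for feat in emit:
        ptrs, new_v = [], []
        for n in range(K):
            cand = [v[p] + trans[n][p] for p in range(K)]
            best = max(range(K), key=cand.__getitem__)
            ptrs.append(best)
            new_v.append(cand[best] + feat[n])
        backpointers.append(ptrs)
        v = new_v
    terminal = [v[p] + trans[STOP][p] for p in range(K)]
    best = max(range(K), key=terminal.__getitem__)
    score, path = terminal[best], [best]
    for ptrs in reversed(backpointers):
        best = ptrs[best]
        path.append(best)
    start = path.pop()  # the walk back always ends at START
    assert start == START
    path.reverse()
    return score, path


rng = random.Random(0)
trans = [[rng.gauss(0, 1) for _ in range(K)] for _ in range(K)]
for j in range(K):
    trans[START][j] = NEG  # never transition to START
    trans[j][STOP] = NEG   # never transition from STOP
emit = [[rng.gauss(0, 1) for _ in range(K)] for _ in range(3)]  # 3 words

all_paths = list(itertools.product(range(K), repeat=len(emit)))
brute_total = lse([path_score(emit, trans, p) for p in all_paths])
brute_best = max(all_paths, key=lambda p: path_score(emit, trans, p))

total = forward_alg(emit, trans)
best_score, best_path = viterbi(emit, trans)
print(abs(total - brute_total) < 1e-6, best_path == list(brute_best))
```

Brute force needs K to the power of seq_len path scores, while both dynamic programs need only about seq_len times K squared steps. That is the whole reason the class uses them.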

START_TAG = "<START>"
STOP_TAG = "<STOP>"
EMBEDDING_DIM = 5
HIDDEN_DIM = 4

# Make up some training data
training_data = [(
    "the wall street journal reported today that apple corporation made money".split(),
    "B I I I O O O B I O O".split()
), (
    "georgia tech is a university in georgia".split(),
    "B I O O O O B".split()
)]

test_data = [(
    "the apple".split(),
    "B I".split()
)]

word_to_ix = {}
for sentence, tags in training_data:
    for word in sentence:
        if word not in word_to_ix:
            word_to_ix[word] = len(word_to_ix)

tag_to_ix = {"B": 0, "I": 1, "O": 2, START_TAG: 3, STOP_TAG: 4}

ix_to_tag = {}
for tag, ix in tag_to_ix.items():
    ix_to_tag[ix] = tag

model = BiLSTM_CRF(len(word_to_ix), tag_to_ix, EMBEDDING_DIM, HIDDEN_DIM)
optimizer = optim.SGD(model.parameters(), lr=0.01, weight_decay=1e-4)  # the optimizer

# Check predictions before training
with torch.no_grad():
    precheck_sent = prepare_sequence(training_data[0][0], word_to_ix)
    precheck_tags = torch.tensor([tag_to_ix[t] for t in training_data[0][1]], dtype=torch.long)
    print(model(precheck_sent))
# Output: (tensor(9.2679), [ , ])

# Make sure prepare_sequence from earlier in the LSTM section is loaded
for epoch in range(
        300):  # again, normally you would NOT do 300 epochs, it is toy data
    for sentence, tags in training_data:
        # Step 1. Remember that Pytorch accumulates gradients.
        # We need to clear them out before each instance
        model.zero_grad()  # clear the gradients before every sentence

        # Step 2. Get our inputs ready for the network, that is,
        # turn them into Tensors of word indices.
        # every word is replaced by its id, so the sentence becomes a
        # 1-D tensor of ids with length len(sentence)
        sentence_in = prepare_sequence(sentence, word_to_ix)
        targets = torch.tensor([tag_to_ix[t] for t in tags], dtype=torch.long)

        # Step 3. Run our forward pass.
        loss = model.neg_log_likelihood(sentence_in, targets)  # the training loss

        # Step 4. Compute the loss, gradients, and update the parameters by
        # calling optimizer.step()
        loss.backward()  # compute the gradients
        optimizer.step()  # update the parameters

# Check predictions after training
with torch.no_grad():
    precheck_sent = prepare_sequence(training_data[0][0], word_to_ix)
    print(model(precheck_sent))  # prints the score of the best path and the corresponding tag ids
# We got it!
# Output: (tensor(20.4906), [0, 1, 1, 1, 2, 2, 2, 0, 1, 2, 2])
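The printed path is a list of tag ids, so it helps to map it back to tag names and compare with the labels. A minimal sketch, assuming the tag-to-id mapping B=0, I=1, O=2 and "<START>"/"<STOP>" as the names of the two special tags (the special tag names are an assumption here):

```python
# Assumed mapping; the ids of B, I and O match the printed path above.
tag_to_ix = {"B": 0, "I": 1, "O": 2, "<START>": 3, "<STOP>": 4}
ix_to_tag = {ix: tag for tag, ix in tag_to_ix.items()}

# The path printed after training, and the labels of the first training sentence.
predicted_ids = [0, 1, 1, 1, 2, 2, 2, 0, 1, 2, 2]
gold_tags = "B I I I O O O B I O O".split()

predicted_tags = [ix_to_tag[i] for i in predicted_ids]
print(predicted_tags)
print(predicted_tags == gold_tags)  # True
```

After 300 epochs the model reproduces the labels of this training sentence exactly. That is expected on toy data the model has memorized and says nothing about unseen sentences.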

# The function below only changes the output format; it has nothing to do with the algorithm.

# def get_entity(char_seq, tag_seq):
#     length = len(char_seq)
#     entity = []
#     for i, (char, tag) in enumerate(zip(char_seq, tag_seq)):
#         if tag == 'B':
#             if 'ent' in locals().keys():
#                 entity.append(ent)
#                 del ent
#             ent = char
#             if i+1 == length:
#                 entity.append(ent)
#         if tag == 'I':
#             ent = ent + " " + char
#             if i+1 == length:
#                 entity.append(ent)
#         if tag not in ['B', 'I']:
#             if 'ent' in locals().keys():
#                 entity.append(ent)
#                 del ent
#             continue
#     return entity

# # Check predictions after training
# with torch.no_grad():
#     precheck_sent = prepare_sequence(training_data[0][0], word_to_ix)
#     # print(model(precheck_sent))  # returns the best path score and the path found by Viterbi
#     path_score, state_path = model(precheck_sent)
#     y_pred = [ix_to_tag[x] for x in state_path]
#     entity_list = get_entity(training_data[0][0], y_pred)
#     for x in entity_list:
#         print(x)
# # We got it!
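The commented-out helper tracks the current entity by checking `locals()` and deleting the variable, which is fragile. Here is a sketch of the same idea with an explicit buffer. `get_entities` is my own name, and one behavior differs on purpose: an "I" tag with no preceding "B" is skipped rather than raising an error.

```python
def get_entities(words, tags):
    """Group words tagged B, I, I, ... into entity strings."""
    entities, current = [], []
    for word, tag in zip(words, tags):
        if tag == "B":
            # a new entity starts; close the previous one if there is one
            if current:
                entities.append(" ".join(current))
            current = [word]
        elif tag == "I" and current:
            # continue the entity that is currently open
            current.append(word)
        else:
            # "O" (or a stray "I"): close whatever entity is open
            if current:
                entities.append(" ".join(current))
            current = []
    if current:  # the sentence ended inside an entity
        entities.append(" ".join(current))
    return entities


words = "the wall street journal reported today that apple corporation made money".split()
tags = "B I I I O O O B I O O".split()
print(get_entities(words, tags))
# ['the wall street journal', 'apple corporation']
```

Called with the model's decoded tags instead of the gold tags, this turns a prediction into a readable list of entities.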

If anything in this article is wrong, please point it out! Let's discuss and learn together :)
