TensorFlow 2.0 Tutorial: Text Classification

We will build a simple text classifier and train and test it on the IMDB movie-review dataset.

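This post is part of a continuously updated TensorFlow 2.0 beginner tutorial series. The complete code for the series is at https://github.com/czy36mengfei/tensorflow2_tutorials_chinese (stars welcome).

The series is compiled mainly from personal study notes made while reproducing the official TensorFlow 2.0 tutorials, originally explained in Chinese for readers who prefer Chinese-language tutorials. The official tutorials are at https://www.tensorflow.org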

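1. The IMDB dataset

Download the data

    from tensorflow import keras

    imdb = keras.datasets.imdb
    # Keep only the 10,000 most frequent words.
    # The test labels are named text_y here and are reused under that name below.
    (train_x, train_y), (test_x, text_y) = keras.datasets.imdb.load_data(num_words=10000)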
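To make the effect of `num_words=10000` concrete, here is a minimal pure-Python sketch of vocabulary capping. It assumes the Keras default of replacing out-of-range ids with the out-of-vocabulary id 2; `cap_vocabulary` and `toy_reviews` are hypothetical names used only for illustration, not part of the Keras API.

```python
# Hypothetical toy data: each review is a list of integer word ids.
OOV_ID = 2  # assumed default out-of-vocabulary id (oov_char=2 in load_data)


def cap_vocabulary(reviews, num_words, oov_id=OOV_ID):
    """Replace every id >= num_words with oov_id; sequence lengths are unchanged."""
    return [[w if w < num_words else oov_id for w in review] for review in reviews]


toy_reviews = [[1, 14, 22, 12000], [1, 9999, 10000, 43]]
capped = cap_vocabulary(toy_reviews, num_words=10000)
print(capped)  # [[1, 14, 22, 2], [1, 9999, 2, 43]]
```

Rare words are therefore not dropped from a review; they are all mapped to the same id, which is why the unknown-word marker can appear in decoded text.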

Explore the IMDB data

    print("Training entries: {}, labels: {}".format(len(train_x), len(train_y)))
    # Each review is a list of integer word ids.
    print(train_x[0])
    # Reviews have different lengths.
    print('len: ', len(train_x[0]), len(train_x[1]))

Output (the first review is abridged to its ids of 100 and above):

    Training entries: 25000, labels: 25000
    [..., 530, 973, 1622, 1385, 458, 4468, 3941, 173, 256, 100, 838, 112, 670, 480,
     284, 150, 172, 112, 167, 336, 385, 172, 4536, 1111, 546, 447, 192, 147, 2025,
     1920, 4613, 469, 530, 1247, 515, 626, 386, 316, 106, 2223, 5244, 480, 3785,
     130, 619, 124, 135, 1415, 215, 407, 107, 117, 5952, 256, 3766, 723, 530, 476,
     400, 317, 1029, 104, 381, 297, 2071, 141, 194, 7486, 226, 134, 476, 480, 144,
     5535, 224, 104, 226, 1334, 283, 4472, 113, 103, 5345, 178, ...]
    len:  218 189

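Build the id-to-word lookup dictionaries

    word_index = imdb.get_word_index()

    # Shift every index by 3 to reserve ids 0-3 for special tokens.
    word2id = {k: (v + 3) for k, v in word_index.items()}
    word2id['<PAD>'] = 0
    word2id['<START>'] = 1
    word2id['<UNK>'] = 2
    word2id['<UNUSED>'] = 3

    id2word = {v: k for k, v in word2id.items()}

    def get_words(sent_ids):
        # Decode a list of ids back into text; unknown ids become '?'.
        return ' '.join([id2word.get(i, '?') for i in sent_ids])

    sent = get_words(train_x[0])
    print(sent)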

Output:

    <START> this film was just brilliant casting location scenery story direction everyone's really suited the part they played and you could just imagine being there robert is an amazing actor and now the same being director father came from the same scottish island as myself so i loved the fact there was a real connection with this film the witty remarks throughout the film were great it was just brilliant so much that i bought the film as soon as it was released for and would recommend it to everyone to watch and the fly fishing was amazing really cried at the end it was so sad and you know what they say if you cry at a film it must have been good and this definitely was also to the two little boy's that played the <UNK> norman and paul they were just brilliant children are often left out the <UNK> list think because the stars that play them all grown are such big profile for the whole film but these children are amazing and should praised for what they have done don't you think the whole story was so lovely because it was true and was someone's life after all that was shared with all
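The id-to-word mapping can be tried without downloading anything. The sketch below uses a hypothetical three-word `toy_word_index` in place of the real dictionary returned by `imdb.get_word_index()`, and follows the convention of the official TensorFlow tutorial in reserving ids 0 to 3 for special tokens.

```python
# Hypothetical miniature of the dictionary returned by imdb.get_word_index().
toy_word_index = {"the": 1, "film": 2, "brilliant": 3}

# Shift every index by 3 so that ids 0..3 are free for special tokens.
word2id = {word: idx + 3 for word, idx in toy_word_index.items()}
word2id["<PAD>"] = 0
word2id["<START>"] = 1
word2id["<UNK>"] = 2
word2id["<UNUSED>"] = 3

# Invert the mapping to decode ids back into words.
id2word = {idx: word for word, idx in word2id.items()}


def get_words(sent_ids):
    """Decode a list of ids; ids missing from the mapping become '?'."""
    return " ".join(id2word.get(i, "?") for i in sent_ids)


decoded = get_words([1, 4, 5, 2, 6, 0, 99])
print(decoded)  # <START> the film <UNK> brilliant <PAD> ?
```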

2. Preparing the data

    # Pad at the end of each review so that every sequence has length 256.
    train_x = keras.preprocessing.sequence.pad_sequences(
        train_x, value=word2id['<PAD>'],
        padding='post', maxlen=256
    )
    test_x = keras.preprocessing.sequence.pad_sequences(
        test_x, value=word2id['<PAD>'],
        padding='post', maxlen=256
    )
    print(train_x[0])
    print('len: ', len(train_x[0]), len(train_x[1]))

Output (the padded first review is abridged to its ids of 100 and above):

    [... 530 973 1622 1385 458 4468 3941 173 256 100 838 112 670 480 284 150 172
     112 167 336 385 172 4536 1111 546 447 192 147 2025 1920 4613 469 530 1247
     515 626 386 316 106 2223 5244 480 3785 130 619 124 135 1415 215 407 107 117
     5952 256 3766 723 530 476 400 317 1029 104 381 297 2071 141 194 7486 226 134
     476 480 144 5535 224 104 226 1334 283 4472 113 103 5345 178 ...]
    len:  256 256
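What the padding step does to each review can be sketched in plain Python. This is an illustrative re-implementation, not the Keras source: `pad_post` is a hypothetical helper, and it assumes the `pad_sequences` default of `truncating='pre'`, under which a review longer than `maxlen` keeps its last `maxlen` ids.

```python
def pad_post(sequences, maxlen, value=0):
    """Pad each sequence at the end to maxlen; overlong ones keep their last maxlen ids.

    Assumes maxlen > 0.
    """
    padded = []
    for seq in sequences:
        seq = list(seq)[-maxlen:]  # drop ids from the front if too long
        padded.append(seq + [value] * (maxlen - len(seq)))  # fill the tail
    return padded


toy = [[5, 6, 7], [1, 2, 3, 4, 5, 6]]
result = pad_post(toy, maxlen=4)
print(result)  # [[5, 6, 7, 0], [3, 4, 5, 6]]
```

After this step every review has the same length, which is why both lengths printed above are 256.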

3. Building the model

    from tensorflow import keras
    from tensorflow.keras import layers

    vocab_size = 10000
    model = keras.Sequential()
    # Map each word id to a 16-dimensional vector.
    model.add(layers.Embedding(vocab_size, 16))
    # Average the word vectors over the sequence dimension.
    model.add(layers.GlobalAveragePooling1D())
    model.add(layers.Dense(16, activation='relu'))
    # A single sigmoid unit gives the probability of a positive review.
    model.add(layers.Dense(1, activation='sigmoid'))
    model.summary()

    model.compile(optimizer='adam',
                  loss='binary_crossentropy',
                  metrics=['accuracy'])

Output:

    Model: "sequential"
    _____________________________________________________________________________
    Layer (type)                                       Output Shape       Param #
    =============================================================================
    embedding (Embedding)                              (None, None, 16)   160000
    _____________________________________________________________________________
    global_average_pooling1d (GlobalAveragePooling1D)  (None, 16)         0
    _____________________________________________________________________________
    dense (Dense)                                      (None, 16)         272
    _____________________________________________________________________________
    dense_1 (Dense)                                    (None, 1)          17
    =============================================================================
    Total params: 160,289
    Trainable params: 160,289
    Non-trainable params: 0
    _____________________________________________________________________________
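The parameter counts in the summary can be checked by hand. The short calculation below assumes an embedding size of 16, a hidden layer of 16 units, and a single output unit, which are the sizes consistent with the reported totals of 160000, 272, and 160,289; the variable names are illustrative.

```python
vocab_size = 10000
embedding_dim = 16   # assumed embedding size, consistent with 160000 parameters
hidden_units = 16    # assumed hidden layer width, consistent with 272 parameters
output_units = 1     # single sigmoid output

embedding_params = vocab_size * embedding_dim                 # one vector per word id
pooling_params = 0                                            # averaging has no weights
dense_params = embedding_dim * hidden_units + hidden_units    # weights + biases
dense_1_params = hidden_units * output_units + output_units   # weights + bias

total_params = embedding_params + pooling_params + dense_params + dense_1_params
print(embedding_params, dense_params, dense_1_params, total_params)
# 160000 272 17 160289
```

Almost all of the parameters sit in the embedding table; the classifier on top of it is tiny.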

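4. Model training and validation

    # Hold out the first 10,000 training reviews for validation.
    x_val = train_x[:10000]
    x_train = train_x[10000:]

    y_val = train_y[:10000]
    y_train = train_y[10000:]

    # The epochs and verbose values are missing from the source;
    # 40 and 1 are assumed here, as in the official TensorFlow tutorial.
    history = model.fit(x_train, y_train,
                        epochs=40, batch_size=512,
                        validation_data=(x_val, y_val),
                        verbose=1)

    # text_y holds the test labels (named as in the data-loading step).
    result = model.evaluate(test_x, text_y)
    print(result)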

Output:

    Train on 15000 samples, validate on 10000 samples
    Epoch 1/...
    15000/15000 [==============================] - .../sample - loss: 0.6919 - accuracy: 0.5071 - val_loss: 0.6901 - val_accuracy: 0.5101
    ...
    Epoch .../...
    15000/15000 [==============================] - .../sample - loss: 0.1046 - accuracy: 0.9721 - val_loss: 0.3022 - val_accuracy: 0.8843
    25000/25000 [==============================] - .../sample - loss: 0.3216 - accuracy: 0.8729
    [0.32155542838573453, 0.87292]
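The two numbers returned by `model.evaluate` are the binary cross-entropy loss and the accuracy. As a rough illustration of what they measure, the following pure-Python sketch computes both for a handful of made-up labels and predicted probabilities. The data is hypothetical, and the clipping constant and the 0.5 threshold are assumptions chosen to mirror common Keras defaults.

```python
import math


def binary_crossentropy(y_true, y_prob, eps=1e-7):
    """Mean of -(y*log(p) + (1-y)*log(1-p)), with p clipped away from 0 and 1."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1.0 - eps)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(y_true)


def accuracy(y_true, y_prob, threshold=0.5):
    """Fraction of examples whose thresholded prediction matches the label."""
    correct = sum(1 for y, p in zip(y_true, y_prob) if (p > threshold) == bool(y))
    return correct / len(y_true)


labels = [1, 0, 1, 0]          # hypothetical ground truth
probs = [0.9, 0.2, 0.4, 0.6]   # hypothetical sigmoid outputs

print(round(binary_crossentropy(labels, probs), 4))
print(accuracy(labels, probs))  # 0.5
```

The loss penalizes confident mistakes more heavily than accuracy does, which is why validation loss can start rising while validation accuracy still looks stable.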

5. Plotting accuracy and loss over time

    import matplotlib.pyplot as plt

    history_dict = history.history
    history_dict.keys()

    acc = history_dict['accuracy']
    val_acc = history_dict['val_accuracy']
    loss = history_dict['loss']
    val_loss = history_dict['val_loss']
    epochs = range(1, len(acc) + 1)

    # Training and validation loss
    plt.plot(epochs, loss, 'bo', label='train loss')
    plt.plot(epochs, val_loss, 'b', label='val loss')
    plt.title('Train and val loss')
    plt.xlabel('Epochs')
    plt.ylabel('loss')  # the source calls xlabel twice; the y-axis label is intended
    plt.legend()
    plt.show()

    plt.clf()  # clear figure

    # Training and validation accuracy
    plt.plot(epochs, acc, 'bo', label='Training acc')
    plt.plot(epochs, val_acc, 'b', label='Validation acc')
    plt.title('Training and validation accuracy')
    plt.xlabel('Epochs')
    plt.ylabel('Accuracy')
    plt.legend()
    plt.show()
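Besides plotting, the history dictionary can be inspected directly, for example to find the epoch with the lowest validation loss. The sketch below runs on a hypothetical `toy_history` shaped like `history.history`; the numbers are invented for illustration and are not results from this model.

```python
# Hypothetical values shaped like history.history (one entry per epoch).
toy_history = {
    "loss": [0.69, 0.50, 0.30, 0.20, 0.10],
    "val_loss": [0.69, 0.52, 0.35, 0.31, 0.33],
    "accuracy": [0.51, 0.78, 0.88, 0.93, 0.97],
    "val_accuracy": [0.51, 0.77, 0.86, 0.88, 0.88],
}


def best_epoch(history_dict, key="val_loss"):
    """Return (1-based epoch, value) where the given metric is smallest."""
    values = history_dict[key]
    best_index = min(range(len(values)), key=values.__getitem__)
    return best_index + 1, values[best_index]


epoch, value = best_epoch(toy_history)
print(epoch, value)  # 4 0.31
```

A validation loss that bottoms out while training loss keeps falling is the usual sign of overfitting, and the epoch found here is a reasonable place to stop training.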
