TensorFlow2.0教程-文字分類
我們將構建一個簡單的文字分類器,並使用IMDB進行訓練和測試
最全Tensorflow 2.0 入門教程持續更新:
完整tensorflow2。0教程程式碼請看https://github。com/czy36mengfei/tensorflow2_tutorials_chinese (歡迎star)
本教程主要由tensorflow2.0官方教程的個人學習復現筆記整理而來,中文講解,方便喜歡閱讀中文教程的朋友,官方教程:
https://www。tensorflow。org
1。IMDB資料集
下載
imdb
=
keras
。
datasets
。
imdb
(
train_x
,
train_y
),
(
test_x
,
text_y
)
=
keras
。
datasets
。
imdb
。
load_data
(
num_words
=
10000
)
瞭解IMDB資料
(
“Training entries: {}, labels: {}”
。
format
(
len
(
train_x
),
len
(
train_y
)))
(
train_x
[
0
])
(
‘len: ’
,
len
(
train_x
[
0
]),
len
(
train_x
[
1
]))
Training
entries
:
25000
,
labels
:
25000
[
1
,
14
,
22
,
16
,
43
,
530
,
973
,
1622
,
1385
,
65
,
458
,
4468
,
66
,
3941
,
4
,
173
,
36
,
256
,
5
,
25
,
100
,
43
,
838
,
112
,
50
,
670
,
2
,
9
,
35
,
480
,
284
,
5
,
150
,
4
,
172
,
112
,
167
,
2
,
336
,
385
,
39
,
4
,
172
,
4536
,
1111
,
17
,
546
,
38
,
13
,
447
,
4
,
192
,
50
,
16
,
6
,
147
,
2025
,
19
,
14
,
22
,
4
,
1920
,
4613
,
469
,
4
,
22
,
71
,
87
,
12
,
16
,
43
,
530
,
38
,
76
,
15
,
13
,
1247
,
4
,
22
,
17
,
515
,
17
,
12
,
16
,
626
,
18
,
2
,
5
,
62
,
386
,
12
,
8
,
316
,
8
,
106
,
5
,
4
,
2223
,
5244
,
16
,
480
,
66
,
3785
,
33
,
4
,
130
,
12
,
16
,
38
,
619
,
5
,
25
,
124
,
51
,
36
,
135
,
48
,
25
,
1415
,
33
,
6
,
22
,
12
,
215
,
28
,
77
,
52
,
5
,
14
,
407
,
16
,
82
,
2
,
8
,
4
,
107
,
117
,
5952
,
15
,
256
,
4
,
2
,
7
,
3766
,
5
,
723
,
36
,
71
,
43
,
530
,
476
,
26
,
400
,
317
,
46
,
7
,
4
,
2
,
1029
,
13
,
104
,
88
,
4
,
381
,
15
,
297
,
98
,
32
,
2071
,
56
,
26
,
141
,
6
,
194
,
7486
,
18
,
4
,
226
,
22
,
21
,
134
,
476
,
26
,
480
,
5
,
144
,
30
,
5535
,
18
,
51
,
36
,
28
,
224
,
92
,
25
,
104
,
4
,
226
,
65
,
16
,
38
,
1334
,
88
,
12
,
16
,
283
,
5
,
16
,
4472
,
113
,
103
,
32
,
15
,
16
,
5345
,
19
,
178
,
32
]
len
:
218
189
建立id和詞的匹配字典
word_index
=
imdb
。
get_word_index
()
word2id
=
{
k
:(
v
+
3
)
for
k
,
v
in
word_index
。
items
()}
word2id
[
‘
]
=
0
word2id
[
‘
]
=
1
word2id
[
‘
]
=
2
word2id
[
‘
]
=
3
id2word
=
{
v
:
k
for
k
,
v
in
word2id
。
items
()}
def
get_words
(
sent_ids
):
return
‘ ’
。
join
([
id2word
。
get
(
i
,
‘?’
)
for
i
in
sent_ids
])
sent
=
get_words
(
train_x
[
0
])
(
sent
)
<
START
>
this
film
was
just
brilliant
casting
location
scenery
story
direction
everyone
‘s really suited the part they played and you could just imagine being there robert
s
that
played
the
<
UNK
>
of
norman
and
paul
they
were
just
brilliant
children
are
often
left
out
of
the
<
UNK
>
list
i
think
because
the
stars
that
play
them
all
grown
up
are
such
a
big
profile
for
the
whole
film
but
these
children
are
amazing
and
should
be
praised
for
what
they
have
done
don
‘t you think the whole story was so lovely because it was true and was someone’
s
life
after
all
that
was
shared
with
us
all
2。準備資料
# 句子末尾padding
train_x
=
keras
。
preprocessing
。
sequence
。
pad_sequences
(
train_x
,
value
=
word2id
[
‘
],
padding
=
‘post’
,
maxlen
=
256
)
test_x
=
keras
。
preprocessing
。
sequence
。
pad_sequences
(
test_x
,
value
=
word2id
[
‘
],
padding
=
‘post’
,
maxlen
=
256
)
(
train_x
[
0
])
(
‘len: ’
,
len
(
train_x
[
0
]),
len
(
train_x
[
1
]))
[
1
14
22
16
43
530
973
1622
1385
65
458
4468
66
3941
4
173
36
256
5
25
100
43
838
112
50
670
2
9
35
480
284
5
150
4
172
112
167
2
336
385
39
4
172
4536
1111
17
546
38
13
447
4
192
50
16
6
147
2025
19
14
22
4
1920
4613
469
4
22
71
87
12
16
43
530
38
76
15
13
1247
4
22
17
515
17
12
16
626
18
2
5
62
386
12
8
316
8
106
5
4
2223
5244
16
480
66
3785
33
4
130
12
16
38
619
5
25
124
51
36
135
48
25
1415
33
6
22
12
215
28
77
52
5
14
407
16
82
2
8
4
107
117
5952
15
256
4
2
7
3766
5
723
36
71
43
530
476
26
400
317
46
7
4
2
1029
13
104
88
4
381
15
297
98
32
2071
56
26
141
6
194
7486
18
4
226
22
21
134
476
26
480
5
144
30
5535
18
51
36
28
224
92
25
104
4
226
65
16
38
1334
88
12
16
283
5
16
4472
113
103
32
15
16
5345
19
178
32
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
]
len
:
256
256
3。構建模型
import
tensorflow。keras。layers
as
layers
vocab_size
=
10000
model
=
keras
。
Sequential
()
model
。
add
(
layers
。
Embedding
(
vocab_size
,
16
))
model
。
add
(
layers
。
GlobalAveragePooling1D
())
model
。
add
(
layers
。
Dense
(
16
,
activation
=
‘relu’
))
model
。
add
(
layers
。
Dense
(
1
,
activation
=
‘sigmoid’
))
model
。
summary
()
model
。
compile
(
optimizer
=
‘adam’
,
loss
=
‘binary_crossentropy’
,
metrics
=
[
‘accuracy’
])
Model
:
“sequential”
_________________________________________________________________
Layer
(
type
)
Output
Shape
Param
#
=================================================================
embedding
(
Embedding
)
(
None
,
None
,
16
)
160000
_________________________________________________________________
global_average_pooling1d
(
Gl
(
None
,
16
)
0
_________________________________________________________________
dense
(
Dense
)
(
None
,
16
)
272
_________________________________________________________________
dense_1
(
Dense
)
(
None
,
1
)
17
=================================================================
Total
params
:
160
,
289
Trainable
params
:
160
,
289
Non
-
trainable
params
:
0
_________________________________________________________________
4。模型訓練與驗證
x_val
=
train_x
[:
10000
]
x_train
=
train_x
[
10000
:]
y_val
=
train_y
[:
10000
]
y_train
=
train_y
[
10000
:]
history
=
model
。
fit
(
x_train
,
y_train
,
epochs
=
40
,
batch_size
=
512
,
validation_data
=
(
x_val
,
y_val
),
verbose
=
1
)
result
=
model
。
evaluate
(
test_x
,
text_y
)
(
result
)
Train
on
15000
samples
,
validate
on
10000
samples
Epoch
1
/
40
15000
/
15000
[
==============================
]
-
1
s
73
us
/
sample
-
loss
:
0。6919
-
accuracy
:
0。5071
-
val_loss
:
0。6901
-
val_accuracy
:
0。5101
。。。
Epoch
40
/
40
15000
/
15000
[
==============================
]
-
1
s
45
us
/
sample
-
loss
:
0。1046
-
accuracy
:
0。9721
-
val_loss
:
0。3022
-
val_accuracy
:
0。8843
25000
/
25000
[
==============================
]
-
1
s
22
us
/
sample
-
loss
:
0。3216
-
accuracy
:
0。8729
[
0。32155542838573453
,
0。87292
]
5。查看準確率時序圖
import
matplotlib。pyplot
as
plt
history_dict
=
history
。
history
history_dict
。
keys
()
acc
=
history_dict
[
‘accuracy’
]
val_acc
=
history_dict
[
‘val_accuracy’
]
loss
=
history_dict
[
‘loss’
]
val_loss
=
history_dict
[
‘val_loss’
]
epochs
=
range
(
1
,
len
(
acc
)
+
1
)
plt
。
plot
(
epochs
,
loss
,
‘bo’
,
label
=
‘train loss’
)
plt
。
plot
(
epochs
,
val_loss
,
‘b’
,
label
=
‘val loss’
)
plt
。
title
(
‘Train and val loss’
)
plt
。
xlabel
(
‘Epochs’
)
plt
。
xlabel
(
‘loss’
)
plt
。
legend
()
plt
。
show
()
plt
。
clf
()
# clear figure
plt
。
plot
(
epochs
,
acc
,
‘bo’
,
label
=
‘Training acc’
)
plt
。
plot
(
epochs
,
val_acc
,
‘b’
,
label
=
‘Validation acc’
)
plt
。
title
(
‘Training and validation accuracy’
)
plt
。
xlabel
(
‘Epochs’
)
plt
。
ylabel
(
‘Accuracy’
)
plt
。
legend
()
plt
。
show
()