Face Recognition Datasets and Evaluation Protocols (Unfinished)

Dataset sharing

https://blog.csdn.net/u013659598/article/details/100680431

facenet source code, useful for learning identification, dataset usage, and protocols

davidsandberg/facenet

CASIA-Webface dataset download link

CASIA-Webface dataset download link · Issue #18 · happynear/AMSoftmax

VGGFace2

In the second part, I plan to explain the metrics used with face recognition datasets.

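2. Face Recognition Evaluation Metrics

This section introduces the evaluation metrics used in face recognition: common metrics for evaluating model performance, metrics commonly used in face recognition, and how to implement the core ones.

2.1 Common Metrics for Model Performance Evaluation

(References: 4.4.2 Classification model evaluation metrics (1) - Confusion Matrix \ Face recognition algorithm evaluation metrics: TAR, FAR, FRR, ERR - 程式設計師大本營 \ Common performance evaluation metrics for face recognition - I am what i am - CSDN Blog \ Face recognition algorithm evaluation metrics: TAR, FAR, FRR, ERR - Lavi's column - CSDN Blog)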

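2.1.1 Confusion Matrix (TP, TN, FP, and FN)

The confusion matrix summarizes a model's classification results and is part of model evaluation.

(1) TP

(True Positive) is the number of samples that are actually positive and are predicted as positive (correctly predicted positives).

(2) TN

(True Negative) is the number of samples that are actually negative and are predicted as negative (correctly predicted negatives).

(3) FP

(False Positive) is the number of samples that are actually negative but are predicted as positive (wrongly predicted positives).

(4) FN

(False Negative) is the number of samples that are actually positive but are predicted as negative (wrongly predicted negatives).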
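The four counts can be tallied directly from paired labels. The following is a minimal sketch, assuming binary labels where 1 means positive and 0 means negative; the function name `confusion_counts` is made up for illustration, and the sample labels reuse the ten-element example from Section 2.2.2.

```python
def confusion_counts(y_true, y_pred):
    """Count TP, TN, FP, FN for binary labels (1 = positive, 0 = negative)."""
    tp = tn = fp = fn = 0
    for t, p in zip(y_true, y_pred):
        if t == 1 and p == 1:
            tp += 1      # actually positive, predicted positive
        elif t == 0 and p == 0:
            tn += 1      # actually negative, predicted negative
        elif t == 0 and p == 1:
            fp += 1      # actually negative, predicted positive
        else:
            fn += 1      # actually positive, predicted negative (labels assumed binary)
    return tp, tn, fp, fn


y_true = [1, 0, 0, 1, 0, 0, 1, 1, 0, 0]
y_pred = [0, 0, 0, 0, 0, 0, 0, 1, 0, 1]
tp, tn, fp, fn = confusion_counts(y_true, y_pred)
print(tp, tn, fp, fn)  # 1 5 1 3
```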

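2.1.2 Extended Metrics Based on the Confusion Matrix, Part 1 (ACC and PPV)

(1) ACC

(Accuracy) is the proportion of all observations that the classifier judges correctly,

$ACC=\frac{TP+TN}{TP+TN+FP+FN}.$

(2) PPV

(Precision) is the proportion of correct predictions among all results that the model predicts as Positive,

$PPV=\frac{TP}{TP+FP}.$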

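2.1.2 Extended Metrics Based on the Confusion Matrix, Part 2 (TPR, TNR, FPR, and FNR)

The metrics in this part overlap with those in Section 2.1.4 (TAR, FAR, and FRR), so they are presented separately from the ones above.

(1) TPR

(True Positive Rate, Sensitivity, Recall) is the proportion of correct predictions among all results whose true value is Positive,

$TPR=\frac{TP}{TP+FN}.$

(2) TNR

(True Negative Rate, Specificity) is the proportion of correct predictions among all results whose true value is Negative,

$TNR=\frac{TN}{TN+FP}.$

(3) FPR

(False Positive Rate) is the proportion of wrong predictions among all results whose true value is Negative,

$FPR=\frac{FP}{FP+TN}.$

(4) FNR

(False Negative Rate) is the proportion of wrong predictions among all results whose true value is Positive,

$FNR=\frac{FN}{TP+FN}.$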

From TPR and FPR we can obtain the ROC curve.
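As a quick numeric check of the formulas above, the sketch below computes all six ratios from the four counts. The function name `rates` and the example counts (TP=1, TN=5, FP=1, FN=3) are chosen for illustration only; the code assumes every denominator is non-zero.

```python
def rates(tp, tn, fp, fn):
    """Compute ACC, PPV, TPR, TNR, FPR, FNR from confusion-matrix counts.

    Assumes each denominator is non-zero (at least one actual positive,
    one actual negative, and one predicted positive).
    """
    return {
        "ACC": (tp + tn) / (tp + tn + fp + fn),
        "PPV": tp / (tp + fp),
        "TPR": tp / (tp + fn),
        "TNR": tn / (tn + fp),
        "FPR": fp / (fp + tn),
        "FNR": fn / (tp + fn),
    }


r = rates(tp=1, tn=5, fp=1, fn=3)
for name, value in r.items():
    print(f"{name} = {value:.4f}")
```

Note that TPR + FNR = 1 and TNR + FPR = 1, since each pair shares a denominator; this is why the code in Section 2.2.2 can write FPR = 1 - TNR.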

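2.1.3 ROC, AUC, and EER

(1) ROC

(Receiver operating characteristic curve.) Here we use a figure from

https://blog.csdn.net/liuweiyuxiang/article/details/81259492

in which the horizontal axis is FPR and the vertical axis is TPR. The curve describes the relationship between TPR and FPR, and every point on it corresponds to a threshold T.

Properties of the ROC curve: (1) at (0, 0), T=1: nothing is predicted as positive, so TP and FP are both 0; (2) at (1, 1), T=0: everything is predicted as positive; (3) (0, 1) is the perfect classifier, which separates positive and negative samples completely; (4) the more the curve bulges toward the top-left corner, the better the classifier; (5) random guessing yields a point on the straight line from (0, 0) to (1, 1); (6) the closer a point on the curve is to (0, 1), the better the classification and the more reasonable the corresponding T.

The figure shows that the classifier represented by the red curve is better than the one represented by the blue curve.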
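To make the link between the threshold T and a point on the curve concrete, here is a small sketch that sweeps T over the distinct scores and records one (FPR, TPR) pair per threshold. It is an illustration only, not how any particular library implements it; the helper name `roc_points` is hypothetical, the scores are rounded versions of the example probabilities in Section 2.2.2, and a sample is predicted positive when its score is greater than or equal to T (an assumed tie convention).

```python
def roc_points(labels, scores):
    """Return (FPR, TPR) pairs, one per threshold, ordered from high T to low T."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    # Start above every score (nothing predicted positive), then try each distinct score.
    thresholds = [float("inf")] + sorted(set(scores), reverse=True)
    points = []
    for t in thresholds:
        tp = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= t)
        fp = sum(1 for y, s in zip(labels, scores) if y == 0 and s >= t)
        points.append((fp / n_neg, tp / n_pos))
    return points


labels = [1, 0, 0, 1, 0, 0, 1, 1, 0, 0]
scores = [0.37, 0.28, 0.29, 0.41, 0.16, 0.17, 0.50, 0.51, 0.28, 0.67]
points = roc_points(labels, scores)
for fpr, tpr in points:
    print(f"FPR={fpr:.3f}  TPR={tpr:.3f}")
```

The first point is (0, 0) (threshold above all scores) and the last is (1, 1) (threshold at the lowest score), matching properties (1) and (2) above.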

(2) AUC

(Area Under the Curve.)

The area under the ROC curve. Larger is better; the maximum is 1.
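The area can be approximated with the trapezoidal rule over consecutive ROC points. The sketch below assumes the points are already sorted by increasing FPR; the function name `auc_trapezoid` and the four-point toy curve are made up for illustration.

```python
def auc_trapezoid(fpr, tpr):
    """Trapezoidal-rule area under a curve given as FPR/TPR lists sorted by FPR."""
    area = 0.0
    for i in range(1, len(fpr)):
        width = fpr[i] - fpr[i - 1]
        area += width * (tpr[i] + tpr[i - 1]) / 2.0
    return area


# Toy ROC curve (hypothetical values).
auc = auc_trapezoid([0.0, 0.0, 0.5, 1.0], [0.0, 0.5, 1.0, 1.0])
# The diagonal from (0, 0) to (1, 1) corresponds to random guessing.
auc_random = auc_trapezoid([0.0, 1.0], [0.0, 1.0])
print(auc, auc_random)  # 0.875 0.5
```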

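(3) EER

(Equal Error Rate) is the error rate at the point where FPR = FNR. Since FNR = 1 - TPR, i.e. TPR = 1 - FNR, the condition FPR = FNR corresponds to the black straight line in the figure, TPR = 1 - FPR, which passes through (0, 1) and (1, 0). At the points A and B, where this line crosses the ROC curves, FPR = FNR, and that FPR value is the equal error rate.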

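2.1.4 Extended Metrics Based on the Confusion Matrix, Part 3 (TAR, FAR, FRR, and EER)

(1) TAR

(True Accept Rate) is the proportion of correct accepts (it coincides with TPR above),

$TAR=\frac{TP}{TP+FN}=\frac{\#\{\text{same-person pairs with score}>T\}}{\#\{\text{same-person comparisons}\}},$

TAR is the proportion of image pairs that belong to the same person and are judged to be the same person. The larger the TAR, the better.

(2) FAR

(False Accept Rate) is the proportion of false accepts (it coincides with FPR above),

$FAR=\frac{FP}{FP+TN}=\frac{\#\{\text{different-person pairs with score}>T\}}{\#\{\text{different-person comparisons}\}},$

FAR is the proportion of image pairs that belong to different people but are judged to be the same person. The smaller the FAR, the better.

(3) FRR

(False Reject Rate) is the proportion of false rejects (it coincides with FNR above),

$FRR=\frac{FN}{FN+TP}=\frac{\#\{\text{same-person pairs with score}<T\}}{\#\{\text{same-person comparisons}\}},$

FRR is the proportion of image pairs that belong to the same person but are judged to be different people.

(4) EER

(Equal Error Rate.) EER is the value of FAR (or FRR) at the threshold T where FAR = FRR. In practice, plot the two curves against T and find their intersection. This definition of EER is consistent with the one given above.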
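The intersection can also be located numerically. The following sketch scans candidate thresholds and keeps the one where FAR and FRR are closest; with finitely many scores the two rates may never be exactly equal, so their mean at the closest point is returned as an approximation. The function name, the toy scores, and the tie convention (a pair is accepted only when its score is strictly greater than T) are assumptions made for this example.

```python
def eer_from_scores(genuine, impostor):
    """Approximate the EER from same-person (genuine) and different-person (impostor) scores.

    Returns (threshold, eer), where eer is the mean of FAR and FRR at the
    threshold that minimizes |FAR - FRR|.
    """
    best = None
    for t in sorted(set(genuine) | set(impostor)):
        far = sum(s > t for s in impostor) / len(impostor)   # impostors accepted
        frr = sum(s <= t for s in genuine) / len(genuine)    # genuine pairs rejected
        gap = abs(far - frr)
        if best is None or gap < best[0]:
            best = (gap, t, (far + frr) / 2.0)
    return best[1], best[2]


genuine = [0.9, 0.8, 0.7, 0.4]    # hypothetical same-person similarity scores
impostor = [0.6, 0.3, 0.2, 0.1]   # hypothetical different-person similarity scores
threshold, eer = eer_from_scores(genuine, impostor)
print(threshold, eer)  # 0.4 0.25
```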

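(5) TAR @ FAR=0.001/0.01/0.1

This is the value of TAR when FAR=0.001/0.01/0.1. The metric is reported in this form because the TAR measured at different FARs differs. Raising the threshold T reduces FAR but also reduces TAR: the bar is higher, so fewer image pairs meet the similarity requirement. Conversely, lowering T increases TAR, the proportion of correct accepts, but FAR, the proportion of false accepts, increases as well. Consider the extremes: with a similarity threshold T=1, every match is rejected, so there are neither false accepts nor true accepts, and FAR=0, TAR=0. At the other extreme, T can be set to 0, which accepts every pair and gives FAR=1 and TAR=1. A TAR value is therefore only meaningful when the FAR at which it was measured is stated; otherwise one could simply let FAR be 1 and obtain TAR=1. The larger the TAR, the better. Since TAR=TPR and FAR=FPR, this metric is simply the TPR read off the ROC curve at FPR=0.1/0.01/0.001.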
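One way to compute this directly from raw similarity scores, without building the full ROC curve, is to pick the threshold from the sorted impostor scores so that at most the allowed fraction of impostor pairs is accepted. This is a sketch under stated assumptions, not the method used by any specific benchmark: the function name `tar_at_far` and the toy scores are hypothetical, and a pair is accepted when its score is strictly greater than the threshold.

```python
def tar_at_far(genuine, impostor, target_far):
    """Return (TAR, achieved FAR, threshold) with achieved FAR <= target_far."""
    ranked = sorted(impostor, reverse=True)
    allowed = int(target_far * len(ranked))  # number of false accepts we may tolerate
    # Accepting only scores strictly above the (allowed+1)-th highest impostor
    # score admits at most `allowed` impostor pairs.
    threshold = ranked[allowed] if allowed < len(ranked) else float("-inf")
    far = sum(s > threshold for s in impostor) / len(impostor)
    tar = sum(s > threshold for s in genuine) / len(genuine)
    return tar, far, threshold


genuine = [0.9, 0.8, 0.7, 0.4, 0.35]
impostor = [0.6, 0.5, 0.45, 0.3, 0.25, 0.2, 0.15, 0.1, 0.05, 0.01]
tar_1, far_1, thr_1 = tar_at_far(genuine, impostor, 0.1)
tar_3, far_3, thr_3 = tar_at_far(genuine, impostor, 0.3)
print(tar_1, far_1, thr_1)  # 0.6 0.1 0.5
print(tar_3, far_3, thr_3)  # 1.0 0.3 0.3
```

The resolution of FAR is limited by the number of impostor pairs: with N impostor comparisons the smallest non-zero FAR that can be measured is 1/N, so FAR=0.001 needs at least 1000 different-person pairs.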

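2.2 Common Face Recognition Evaluation Metrics and Their Code Implementation

(References: Metrics for evaluating classifier performance in scikit-learn, such as the confusion matrix, ROC, and AUC - Lavi's column - CSDN Blog \ ROC and AUC in face recognition - 程式設計師大本營 \ TensorFlow: a detailed walkthrough of a face recognition experiment (Face Recognition using Tensorflow))

2.2.1 Common Face Recognition Evaluation Metrics

([2017 CVPR] Neural Aggregation Network for Video Face Recognition)

The main evaluation metrics for face recognition models are Accuracy, AUC, ROC, and TAR @ FAR=0.1/0.01/0.001.

2.2.2 Code Implementation

First, a basic usage example: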

from sklearn import metrics
from scipy import interpolate
from scipy.optimize import brentq
import matplotlib.pyplot as plt

# Given y_test (the actual labels), y_pred_class (predicted labels),
# and y_pred_prob (predicted probabilities).
y_pred_class = [0, 0, 0, 0, 0, 0, 0, 1, 0, 1]  # predicted values
y_pred_prob = [0.36752429, 0.28356344, 0.28895886, 0.4141062, 0.15896027, 0.17065156,
               0.49889026, 0.51341541, 0.27678612, 0.67189438]  # predicted probabilities
y_test = [1.0, 0.0, 0.0, 1.0, 0.0, 0.0, 1.0, 1.0, 0.0, 0.0]  # true values

# Build 100 samples by repeating and concatenating the Python lists.
y_test = y_test*5 + y_pred_class*4 + [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
y_pred_prob = y_pred_prob*10
y_pred_class = y_pred_class*10

# confusion matrix (rows: true class, columns: predicted class)
confusion = metrics.confusion_matrix(y_test, y_pred_class)

# TP TN FP FN
TP = confusion[1, 1]
TN = confusion[0, 0]
FP = confusion[0, 1]
FN = confusion[1, 0]

# accuracy
ACC = (TP+TN) / float(TP+TN+FN+FP)
# ACC = metrics.accuracy_score(y_test, y_pred_class)  # another way

# precision
PPV = TP / float(TP+FP)
# PPV = metrics.precision_score(y_test, y_pred_class)

# TPR, sensitivity, recall
TPR = TP / float(TP+FN)
# TPR = metrics.recall_score(y_test, y_pred_class)

# TNR, specificity
TNR = TN / float(TN+FP)

# FPR
FPR = FP / float(TN+FP)
# FPR = 1 - TNR

# F1 score
F1_score = (2*PPV*TPR) / (PPV+TPR)
# F1_score = metrics.f1_score(y_test, y_pred_class)

# ROC
# IMPORTANT: first argument is true values, second argument is predicted probabilities
fpr, tpr, thresholds = metrics.roc_curve(y_test, y_pred_prob)
plt.plot(fpr, tpr, 'r')
plt.xlim([0.0, 1.0])
plt.ylim([0.0, 1.0])
plt.title('ROC curve')
plt.xlabel('FPR (False Positive Rate)')
plt.ylabel('TPR (True Positive Rate)')
plt.grid(True)
# Save before show(): once the window is closed, the figure may be empty.
plt.savefig('ROC.png')
plt.show()

# AUC
# IMPORTANT: first argument is true values, second argument is predicted probabilities
# AUC = metrics.roc_auc_score(y_test, y_pred_prob)
AUC = metrics.auc(fpr, tpr)

# # calculate cross-validated AUC
# # (sklearn.cross_validation was removed; cross_val_score lives in sklearn.model_selection)
# from sklearn.model_selection import cross_val_score
# mean_score = cross_val_score(logreg, X, y, cv=10, scoring='roc_auc').mean()
# print(mean_score)

# EER: the FPR at which FNR = 1 - TPR equals FPR
EER = brentq(lambda x: 1. - x - interpolate.interp1d(fpr, tpr)(x), 0., 1.)

# TAR @ FAR = 0.1 / 0.01 / 0.001, FAR = FPR, TAR = TPR
# Root finding on the interpolated inverse ROC curve: find the TPR whose FPR equals the target.
TAR_FAR_E1 = brentq(lambda x: 0.1 - interpolate.interp1d(tpr, fpr)(x), 0., 1.)
TAR_FAR_E2 = brentq(lambda x: 0.01 - interpolate.interp1d(tpr, fpr)(x), 0., 1.)
TAR_FAR_E3 = brentq(lambda x: 0.001 - interpolate.interp1d(tpr, fpr)(x), 0., 1.)

Next, the same logic wrapped in a function for automated evaluation:

from sklearn import metrics
from scipy import interpolate
from scipy.optimize import brentq
import matplotlib.pyplot as plt


def calculate_performance_split(y_test, y_pred_class, y_pred_prob, split_id):
    # confusion matrix (rows: true class, columns: predicted class)
    confusion = metrics.confusion_matrix(y_test, y_pred_class)

    # TP TN FP FN
    TP = confusion[1, 1]
    TN = confusion[0, 0]
    FP = confusion[0, 1]
    FN = confusion[1, 0]

    # accuracy
    ACC = (TP+TN) / float(TP+TN+FN+FP)
    # ACC = metrics.accuracy_score(y_test, y_pred_class)  # another way

    # precision
    PPV = TP / float(TP+FP)
    # PPV = metrics.precision_score(y_test, y_pred_class)

    # TPR, sensitivity, recall
    TPR = TP / float(TP+FN)
    # TPR = metrics.recall_score(y_test, y_pred_class)

    # TNR, specificity
    TNR = TN / float(TN+FP)

    # FPR
    FPR = FP / float(TN+FP)
    # FPR = 1 - TNR

    # F1 score
    F1_score = (2*PPV*TPR) / (PPV+TPR)
    # F1_score = metrics.f1_score(y_test, y_pred_class)

    # ROC
    # IMPORTANT: first argument is true values, second argument is predicted probabilities
    fpr, tpr, thresholds = metrics.roc_curve(y_test, y_pred_prob)
    fig = plt.figure(1)
    plt.plot(fpr, tpr, 'r')
    plt.xlim([0.0, 1.0])
    plt.ylim([0.0, 1.0])
    plt.title('ROC curve')
    plt.xlabel('FPR (False Positive Rate)')
    plt.ylabel('TPR (True Positive Rate)')
    plt.grid(True)
    plt.draw()
    plt.pause(4)  # show the figure for 4 seconds without blocking
    plt.savefig('ROC_' + str(split_id) + '.png')
    plt.close(fig)

    # AUC
    # IMPORTANT: first argument is true values, second argument is predicted probabilities
    # AUC = metrics.roc_auc_score(y_test, y_pred_prob)
    AUC = metrics.auc(fpr, tpr)

    # # calculate cross-validated AUC
    # # (sklearn.cross_validation was removed; cross_val_score lives in sklearn.model_selection)
    # from sklearn.model_selection import cross_val_score
    # mean_score = cross_val_score(logreg, X, y, cv=10, scoring='roc_auc').mean()
    # print(mean_score)

    # EER: the FPR at which FNR = 1 - TPR equals FPR (computed but not returned)
    EER = brentq(lambda x: 1. - x - interpolate.interp1d(fpr, tpr)(x), 0., 1.)

    # TAR @ FAR = 0.1 / 0.01 / 0.001, FAR = FPR, TAR = TPR
    # Root finding on the interpolated inverse ROC curve: find the TPR whose FPR equals the target.
    TAR_FAR_E1 = brentq(lambda x: 0.1 - interpolate.interp1d(tpr, fpr)(x), 0., 1.)
    TAR_FAR_E2 = brentq(lambda x: 0.01 - interpolate.interp1d(tpr, fpr)(x), 0., 1.)
    TAR_FAR_E3 = brentq(lambda x: 0.001 - interpolate.interp1d(tpr, fpr)(x), 0., 1.)

    return ACC, AUC, TAR_FAR_E1, TAR_FAR_E2, TAR_FAR_E3, fpr, tpr


if __name__ == '__main__':
    # Given y_test (the actual labels), y_pred_class (predicted labels),
    # and y_pred_prob (predicted probabilities).
    y_pred_class = [0, 0, 0, 0, 0, 0, 0, 1, 0, 1]  # predicted values
    y_pred_prob = [0.36752429, 0.28356344, 0.28895886, 0.4141062, 0.15896027, 0.17065156,
                   0.49889026, 0.51341541, 0.27678612, 0.67189438]  # predicted probabilities
    y_test = [1.0, 0.0, 0.0, 1.0, 0.0, 0.0, 1.0, 1.0, 0.0, 0.0]  # true values

    # Build 100 samples by repeating and concatenating the Python lists.
    y_test = y_test*5 + y_pred_class*4 + [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
    y_pred_prob = y_pred_prob*10
    y_pred_class = y_pred_class*10

    ACC, AUC, TAR_FAR_E1, TAR_FAR_E2, TAR_FAR_E3, fpr, tpr = calculate_performance_split(y_test, y_pred_class, y_pred_prob, 1)
    print(ACC, AUC, TAR_FAR_E1, TAR_FAR_E2, TAR_FAR_E3, fpr, tpr)

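2.3 Introduction to the Datasets and How to Use Them

2.3.1 YTF (YouTube Faces Database)

[2011 CVPR] Face Recognition in Unconstrained Videos with Matched Background Similarity.

Official site: YouTube Faces Database : Main; alternative download link:

https://blog.csdn.net/u013659598/article/details/100680431

The dataset has 3425 videos of 1595 subjects, an average of 2.15 videos per subject. The shortest clip is 48 frames, the longest is 6070 frames, and the average clip length is 181.3 frames.

Dataset encoding.

All video frames are encoded with several well-established face descriptors. For each frame we take the output of the face detector; the face bounding box is expanded to 2.2 times its original size and cropped from the frame, then resized to 200*200. The image is then cropped again, leaving the 100*100 region centered on the face. After conversion to grayscale, the images are aligned by fixing the coordinates of automatically detected facial feature points. The following descriptors are used: LBP, CSLBP, and FPLBP.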
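The geometry of these two crops is easy to get wrong by a factor of two, so a small sketch may help. It only computes coordinates and assumes the box is given by its center and size; whether the original pipeline clips boxes that extend past the frame, and how it resamples to 200*200, is not stated here, so neither is modeled. The function names are made up for illustration.

```python
def expand_box(cx, cy, w, h, scale=2.2):
    """Scale a box about its center; return (left, top, right, bottom)."""
    half_w = w * scale / 2.0
    half_h = h * scale / 2.0
    return (cx - half_w, cy - half_h, cx + half_w, cy + half_h)


def center_crop_range(size=200, crop=100):
    """Return the (start, end) pixel range of a centered crop along one axis."""
    start = (size - crop) // 2
    return start, start + crop


# Hypothetical detection: center (237, 137), 84 x 84 pixels.
left, top, right, bottom = expand_box(237, 137, 84, 84)
print(round(left, 1), round(top, 1), round(right, 1), round(bottom, 1))  # 144.6 44.6 329.4 229.4

# After resizing the expanded crop to 200 x 200, keep the central 100 x 100.
start, end = center_crop_range(200, 100)
print(start, end)  # 50 150
```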

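Benchmark tests.

As with the LFW benchmark, a standard ten-fold cross-validation pair-matching ("same" / "not same") test is provided. Specifically, 5000 video pairs are randomly collected from the dataset; half are pairs of videos of the same person and half are of different people. These pairs are divided into 10 splits, each containing 250 positive and 250 negative video pairs. The splits are subject-mutually exclusive: if videos of a subject appear in one split, no video of that subject is included in any other split. The goal of the benchmark is to decide, for each split, which pairs are "same" and which are "not same", by training on the video pairs from the other nine splits. This encourages classification methods to learn what makes faces similar or different, rather than the appearance of specific individuals.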
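The protocol above can be sketched in a few lines if "training" is reduced to its simplest form: choosing a similarity threshold on the nine training splits and applying it to the held-out split. That reduction is an assumption made for this example, as are the helper names and the three tiny toy splits (the real benchmark has ten splits of 500 pairs each).

```python
def accuracy(scores, labels, threshold):
    """Fraction of pairs classified correctly when 'same' means score >= threshold."""
    return sum((s >= threshold) == bool(y) for s, y in zip(scores, labels)) / len(labels)


def best_threshold(scores, labels):
    """Pick the candidate threshold with the highest accuracy (lowest one wins ties)."""
    best_t, best_acc = None, -1.0
    for t in sorted(set(scores)):
        acc = accuracy(scores, labels, t)
        if acc > best_acc:
            best_t, best_acc = t, acc
    return best_t


def cross_validate(splits):
    """splits: list of (scores, labels). Returns (mean accuracy, per-split accuracies)."""
    accs = []
    for i, (test_scores, test_labels) in enumerate(splits):
        train_scores = [s for j, (sc, _) in enumerate(splits) if j != i for s in sc]
        train_labels = [y for j, (_, lb) in enumerate(splits) if j != i for y in lb]
        t = best_threshold(train_scores, train_labels)  # "train" on the other splits
        accs.append(accuracy(test_scores, test_labels, t))
    return sum(accs) / len(accs), accs


# Toy data: similarity scores with 1 = same person, 0 = different people.
splits = [
    ([0.9, 0.7, 0.3, 0.1], [1, 1, 0, 0]),
    ([0.8, 0.6, 0.4, 0.2], [1, 1, 0, 0]),
    ([0.85, 0.65, 0.35, 0.15], [1, 1, 0, 0]),
]
mean_acc, accs = cross_validate(splits)
print(accs)  # [1.0, 0.75, 1.0]
```

The second split scores lower because the threshold chosen on the other two splits (0.65) rejects its genuine pair at 0.6, which is exactly the kind of generalization gap the held-out evaluation is meant to expose.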

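The official YTF site describes YTF as a database of face videos designed for studying unconstrained face recognition in videos. The dataset contains 3425 videos of 1595 different people. The goal is to measure the performance of video pair-matching techniques on these videos. Descriptor encodings of the faces in the videos are also provided.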

Errata:

We were recently sent a list of errors that occurred during the labeling process. This information is provided as two files: YTFErrors.csv and splits_corrected.txt. We will later on publish recommendations for reporting results obtained on the corrected data set.

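The downloadable files are:

(1) WolfHassnerMaoz_CVPR11.pdf

The paper that introduced the YTF dataset.

(2) frame_images_DB.tar.gz

Contains 1595 folders, each with 1 to 6 subfolders, plus 3083 txt files that appear to correspond to 3083 subjects. This looks odd, since there are only 1595 subjects. Comparing some of the folders with the txt files shows that the txt files include many subjects that have no folder.

Each txt file holds the bbox information for every frame of every video clip of a subject, for example

Aaron_Eckhart\0\0.555.jpg,0,237,137,84,84,0.0,1

The fields are

filename,[ignore],x,y,width,height,[ignore],[ignore]

where x, y is the center of the face, and width and height are the width and height of the rectangle that frames the face.

filename, e.g. Aaron_Eckhart\0\0.555.jpg, is the name of the frame file.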
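Parsing one annotation line might look like the sketch below. It follows the field layout described above (comma-separated, with x, y as the face center) and converts the center-based box into corner coordinates, which most cropping APIs expect. The function name `parse_ytf_line` and the returned dictionary keys are chosen for this example.

```python
def parse_ytf_line(line):
    """Parse 'filename,[ignore],x,y,width,height,[ignore],[ignore]'."""
    fields = line.strip().split(",")
    filename = fields[0]
    x, y, width, height = (float(v) for v in fields[2:6])
    # x, y is the face center, so the top-left corner is half a box away.
    left = x - width / 2.0
    top = y - height / 2.0
    return {
        "filename": filename,
        "center": (x, y),
        "size": (width, height),
        "box": (left, top, left + width, top + height),  # left, top, right, bottom
    }


# Raw string so the backslashes in the Windows-style path are kept literally.
record = parse_ytf_line(r"Aaron_Eckhart\0\0.555.jpg,0,237,137,84,84,0.0,1")
print(record["filename"])  # Aaron_Eckhart\0\0.555.jpg
print(record["box"])       # (195.0, 95.0, 279.0, 179.0)
```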

(3) aligned_images_DB.tar.gz

Similar to (2), except that the images have been processed: a. face detection, with the bounding box enlarged by a factor of 2.2 and cropped from the frame; b. face alignment.

(4) descriptors_DB.tar.gz

Contains mat files with the descriptors of the frames.

The directory structure is: subject_name\mat files

For each video there are two files: aligned_video_1.mat, video_1.mat

The files contain descriptors per frame, with several descriptor types per frame.

One contains the aligned version of the faces in the frame and the other contains the non-aligned version.

Each of the above files has a struct with the following (for example, a video with 80 frames):

VID_DESCS_FPLBP: [560x80 double]

VID_DESCS_LBP: [1770x80 double]

VID_DESCS_CSLBP: [480x80 double]

VID_DESCS_FILENAMES: {1x80 cell}

(5) meta_data.tar.gz

Contains the meta_and_splits.mat file, which provides an easy way of accessing the mat files in the descriptors DB. Splits is a data structure dividing the data set into 10 independent splits.

Each triplet in Splits is in the format (index1, index2, is_same_person), where index1 and index2 are indices into the mat_names structure. Altogether there are 5000 pairs divided equally into 10 independent splits, with 2500 same pairs and 2500 not-same pairs.

video_labels: [1x3425 double]

video_names: {3425x1 cell}

mat_names: {3425x1 cell}

Splits: [500x3x10 double]
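To show how a 500x3x10 array of this shape would be traversed, the sketch below builds a synthetic stand-in with NumPy and pulls out the pairs of one split. Everything about the data is invented: the indices are random, the "same" pairs are simply placed in the first 250 rows, and the subtraction of 1 assumes the stored indices are 1-based (as is usual for MATLAB data, but not confirmed here). Loading the real file, for instance with `scipy.io.loadmat`, is left out on purpose.

```python
import numpy as np

rng = np.random.default_rng(0)
n_pairs, n_splits, n_videos = 500, 10, 3425

# Synthetic stand-in for Splits: axis 1 holds (index1, index2, is_same_person).
splits = np.zeros((n_pairs, 3, n_splits))
splits[:, 0, :] = rng.integers(1, n_videos + 1, size=(n_pairs, n_splits))
splits[:, 1, :] = rng.integers(1, n_videos + 1, size=(n_pairs, n_splits))
splits[:250, 2, :] = 1  # 250 "same" pairs per split; their row positions are made up


def pairs_in_split(splits, k):
    """Return 0-based index arrays and a boolean 'same person' array for split k."""
    block = splits[:, :, k]                # shape (500, 3)
    idx1 = block[:, 0].astype(int) - 1     # assumed 1-based in the file
    idx2 = block[:, 1].astype(int) - 1
    same = block[:, 2].astype(bool)
    return idx1, idx2, same


idx1, idx2, same = pairs_in_split(splits, 0)
print(len(idx1), int(same.sum()), int((~same).sum()))  # 500 250 250
```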

(6) headpose_DB.tar.gz

Contains mat files with the three rotation angles of the head for each frame in the data set.

The directory structure is:

headorient_apirun_subject_name_video_number.mat

Each mat file contains a struct with the following:

headpose: [3x60 double]

(7) sources.tar.gz

Sources for running the benchmark tests and the implementation of all methods.

(8) splits.txt

The assignment of the data to the splits.

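Common evaluation measures:

Following ([2017 CVPR] Neural Aggregation Network for Video Face Recognition),

(1) mean accuracy

(2) mean AUC

(3) average ROC curves

All of these are averages over the results of 10-fold cross-validation; see Section 2.2 and the notes on cross-validation below.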

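Cross-validation

(References: Cross-validation - stardsd - cnblogs / Cross-validation - Baidu Baike)

Cross-validation, also called rotation estimation, was proposed by Seymour Geisser.

(1) Holdout validation

The data is not actually used crosswise: a random part of the original sample is selected as validation data, and the rest is used as training data. Typically less than one third of the original sample is chosen as validation data.

(2) k-fold cross validation

The initial sample is partitioned into k subsets. One subset is held out as validation data and the other k-1 are used for training. This is repeated k times so that each subset is used for validation exactly once; the k results are averaged (or combined in some other way) to give a single estimate. The advantage is that all randomly generated subsamples are used repeatedly for both training and validation. 10-fold cross validation is the most common.

(3) LOOCV (Leave-One-Out Cross Validation)

Leave-one-out cross validation is the extreme case: for a sample of size N, one sample is used for validation and the rest for training, repeated N times.

(4) Bootstrapping

Bootstrapping is rarely used.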
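The k-fold scheme in (2) amounts to simple index bookkeeping, sketched below without any library. The function name `k_fold_indices` is hypothetical, and the sketch does not shuffle; in practice the samples are usually shuffled first, and for YTF the splits are fixed by the benchmark rather than generated this way. Setting k equal to the number of samples gives LOOCV from (3).

```python
def k_fold_indices(n_samples, k):
    """Return a list of (train_indices, val_indices) pairs for k-fold cross validation."""
    indices = list(range(n_samples))
    base, extra = divmod(n_samples, k)  # the first `extra` folds get one more sample
    folds, start = [], 0
    for i in range(k):
        size = base + (1 if i < extra else 0)
        val = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        folds.append((train, val))
        start += size
    return folds


folds = k_fold_indices(10, 5)
for train, val in folds:
    print("train:", train, "val:", val)

# Leave-one-out: as many folds as samples, one validation sample each.
loocv = k_fold_indices(4, 4)
```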

Concrete usage examples for YTF: (to be completed)

2.3.2 MS-Celeb-1M

cmusatyalab/openface

alfredtofu/MS-Celeb-1M_Extractor

StevenWHU/preprocessing_for_ms-celeb-1M
