PifPaf: Composite Fields for Human Pose Estimation

Paper: https://arxiv.org/pdf/1903.06593.pdf

Code: https://github.com/vita-epfl/openpifpaf

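This is a paper on multi-person pose estimation. I started reading it last month, but only on and off, and never carefully enough to really understand it. Here is my current understanding.

PS: if you have never worked on human pose estimation, the terms PIF and PAF below may be confusing.

In that case, read the sister paper first: OpenPose [1].

The paper is straightforward. The PIF (Part Intensity Fields) head produces the keypoints, and the PAF (Part Association Fields) head produces the connections between keypoints. Note that the setting is multi-person, so the connections have to keep different people apart. The backbone is ResNet-50.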

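1. First, what is a PIF? It has 17x5 dimensions.

17 is the number of human keypoint types, and 5 stands for: pij = (pijc, pijx, pijy, pijb, pijσ)

At location (i, j), pijc is the confidence that a keypoint is there. (pijx, pijy) can be read as an offset specific to that location (pointing toward the ground-truth keypoint), or equivalently as a "trend" vector at that point.

pijb is a parameter learned jointly with the offset: the spread b of the Laplace regression loss covered at the end of this post.

pijσ is a scale parameter, used later when "refining" the heat map. People come in different sizes, so their keypoints vary in size accordingly. See encoder/pif.py for the details.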
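As a rough illustration of the five components, the sketch below turns one PIF cell into a keypoint candidate in image pixels. It assumes that i indexes columns and j indexes rows, that (x, y) are offsets from the cell measured in feature-map cells, and that the stride is 8. All of these are assumptions for illustration, and `PifCell` and `candidate_from_cell` are hypothetical names, not part of openpifpaf.

```python
from typing import NamedTuple, Tuple


class PifCell(NamedTuple):
    """One PIF entry p_ij = (c, x, y, b, sigma) for a single keypoint type."""
    c: float      # confidence that a keypoint is near cell (i, j)
    x: float      # offset toward the keypoint, in feature-map cells (assumed)
    y: float
    b: float      # spread of the Laplace regression loss
    sigma: float  # keypoint scale, in feature-map cells (assumed)


def candidate_from_cell(i: int, j: int, cell: PifCell,
                        stride: int) -> Tuple[float, float, float, float]:
    """Return (confidence, x_px, y_px, sigma_px) in input-image pixels."""
    x_px = (i + cell.x) * stride
    y_px = (j + cell.y) * stride
    return cell.c, x_px, y_px, cell.sigma * stride


cell = PifCell(c=0.9, x=0.25, y=-0.5, b=0.1, sigma=1.5)
print(candidate_from_cell(i=10, j=4, cell=cell, stride=8))
# -> (0.9, 82.0, 28.0, 12.0)
```

The spread b is not used when placing the candidate; it only matters for the regression loss during training.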

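How is the heat map refined?

fig 1

In (a) above, the candidate keypoints come in scattered clusters. (b) shows the (x, y) offset vectors mentioned earlier, which are also dense and spread out. (c) applies formula (1) to map them onto a higher-resolution grid, which gives a more precise heat map.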

fig 2
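A minimal pure-Python sketch of the accumulation idea behind formula (1): every low-resolution candidate adds a confidence-weighted, unnormalized Gaussian of width sigma around its regressed position on the high-resolution map, so scattered votes for the same joint pile up into one sharp peak. The function name `accumulate_gauss` and the toy votes are made up for illustration; this is not the library's implementation.

```python
import math


def accumulate_gauss(height, width, candidates, truncate=2.0):
    """Sum confidence-weighted Gaussians on a height x width grid.

    candidates: iterable of (x, y, sigma, v) in high-resolution pixels.
    """
    field = [[0.0] * width for _ in range(height)]
    for x, y, sigma, v in candidates:
        sigma = max(1.0, sigma)
        r = max(1.0, truncate * sigma)  # only touch a window around (x, y)
        x0, x1 = max(0, round(x - r)), min(width, round(x + r) + 1)
        y0, y1 = max(0, round(y - r)), min(height, round(y + r) + 1)
        for yy in range(y0, y1):
            for xx in range(x0, x1):
                d2 = (xx - x) ** 2 + (yy - y) ** 2
                field[yy][xx] += v * math.exp(-0.5 * d2 / sigma ** 2)
    # cap the accumulated confidence at 1.0
    return [[min(1.0, val) for val in row] for row in field]


# three scattered votes for the same keypoint near (x=5, y=4)
votes = [(5.0, 4.0, 1.0, 0.3), (5.2, 4.1, 1.0, 0.3), (4.9, 3.8, 1.0, 0.3)]
heat = accumulate_gauss(9, 11, votes)
peak = max((val, (xx, yy)) for yy, row in enumerate(heat) for xx, val in enumerate(row))
print(peak)  # the strongest cell is (5, 4)
```

Each vote alone has confidence 0.3, but the accumulated peak is close to 0.89, which is why the high-resolution map is more decisive than the raw confidence channel.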

It is called from decoder/pif.py (excerpt; it relies on `import time`, `import numpy as np` and the Cython helpers from functional.pyx):

    def _target_intensities(self, v_th=0.1):
        start = time.perf_counter()

        # high-resolution maps: one per keypoint type, upsampled by the stride
        targets = np.zeros((self.pif.shape[0],
                            int(self.pif.shape[2] * self.stride),
                            int(self.pif.shape[3] * self.stride)))
        scales = np.zeros(targets.shape)
        ns = np.zeros(targets.shape)
        for t, p, scale, n in zip(targets, self.pif, scales, ns):
            v, x, y, s = p[:, p[0] > v_th]  # v is the confidence
            x = x * self.stride
            y = y * self.stride
            s = s * self.stride
            # the "refinement" of the PIF keypoints happens here
            scalar_square_add_gauss(t, x, y, s, v / self.pif_nn, truncate=0.5)
            scalar_square_add_constant(scale, x, y, s, s * v)
            scalar_square_add_constant(n, x, y, s, v)

        targets = np.minimum(1.0, targets)
        # ... (rest of the method omitted)

    def _pifhr_seeds(self):
        start = time.perf_counter()
        seeds = []
        for field_i, (f, s) in enumerate(zip(self._pifhr, self._pifhr_scales)):
            index_fields = index_field(f.shape)
            candidates = np.concatenate((index_fields, np.expand_dims(f, 0)), 0)

            mask = f > self.seed_threshold
            candidates = np.moveaxis(candidates[:, mask], 0, -1)

            occupied = np.zeros(s.shape)
            # visit candidates from the most to the least confident
            for c in sorted(candidates, key=lambda c: c[2], reverse=True):
                i, j = int(c[0]), int(c[1])
                if occupied[j, i]:
                    continue

                # the minimum width is an integer literal lost in extraction; 4 is assumed
                width = max(4, s[j, i])
                scalar_square_add_single(occupied, c[0], c[1], width / 2.0, 1.0)
                seeds.append((c[2], field_i, c[0] / self.stride, c[1] / self.stride))

            if self.debug_visualizer:
                if field_i in self.debug_visualizer.pif_indices:
                    self.log.debug('occupied seed, field %d', field_i)
                    self.debug_visualizer.occupied(occupied)

        seeds = list(sorted(seeds, reverse=True))
        if len(seeds) > 500:  # keep at most 500 candidates?
            if seeds[500][0] > 0.1:  # the confidence must exceed 0.1
                seeds = [s for s in seeds if s[0] > 0.1]
            else:
                seeds = seeds[:500]

        if self.debug_visualizer:
            self.debug_visualizer.seeds(seeds, self.stride)

        self.log.debug('seeds %d, %.3fs', len(seeds), time.perf_counter() - start)
        return seeds
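The seed selection above is a greedy non-maximum suppression: candidates are visited from the most to the least confident, and each accepted seed marks a square around itself as occupied so that weaker candidates inside it are skipped. A stripped-down sketch of that idea in plain Python follows; `pick_seeds`, the tuple layout and the minimum width of 4 are illustrative assumptions.

```python
def pick_seeds(candidates, height, width, min_width=4):
    """Greedy suppression of nearby candidates.

    candidates: list of (x, y, confidence, scale) on the high-resolution map.
    Returns the kept seeds as (confidence, x, y), strongest first.
    """
    occupied = [[False] * width for _ in range(height)]
    seeds = []
    for x, y, conf, scale in sorted(candidates, key=lambda c: c[2], reverse=True):
        if occupied[int(y)][int(x)]:
            continue  # a stronger seed has already claimed this area
        half = max(min_width, scale) / 2.0
        for yy in range(max(0, round(y - half)), min(height, round(y + half) + 1)):
            for xx in range(max(0, round(x - half)), min(width, round(x + half) + 1)):
                occupied[yy][xx] = True
        seeds.append((conf, x, y))
    return seeds


cands = [(10, 10, 0.9, 4), (11, 10, 0.8, 4), (30, 12, 0.7, 4)]
print(pick_seeds(cands, height=20, width=40))
# -> [(0.9, 10, 10), (0.7, 30, 12)]
```

The second candidate sits one pixel from the strongest one and is dropped, while the third is far enough away to start its own pose.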

scalar_square_add_gauss is implemented in functional.pyx (Cython):

    # formula (1)
    def scalar_square_add_gauss(float[:, :] field, x_np, y_np, sigma_np, v_np, float truncate=2.0):
        sigma_np = np.maximum(1.0, sigma_np)
        width_np = np.maximum(1.0, truncate * sigma_np)
        # np.int is the original spelling; it was removed in NumPy 1.24, so newer
        # NumPy needs an explicit integer dtype that matches long[:]
        minx_np = np.round(x_np - width_np).astype(np.int)
        minx_np = np.clip(minx_np, 0, field.shape[1] - 1)
        miny_np = np.round(y_np - width_np).astype(np.int)
        miny_np = np.clip(miny_np, 0, field.shape[0] - 1)
        maxx_np = np.round(x_np + width_np).astype(np.int)
        maxx_np = np.clip(maxx_np + 1, minx_np + 1, field.shape[1])
        maxy_np = np.round(y_np + width_np).astype(np.int)
        maxy_np = np.clip(maxy_np + 1, miny_np + 1, field.shape[0])

        cdef float[:] x = x_np
        cdef float[:] y = y_np
        cdef float[:] sigma = sigma_np
        cdef long[:] minx = minx_np
        cdef long[:] miny = miny_np
        cdef long[:] maxx = maxx_np
        cdef long[:] maxy = maxy_np
        cdef float[:] v = v_np

        cdef Py_ssize_t i, xx, yy
        cdef Py_ssize_t l = minx.shape[0]
        cdef float deltax2, deltay2
        cdef float vv
        cdef float cv, cx, cy, csigma2

        for i in range(l):
            csigma2 = sigma[i]**2
            cx = x[i]  # offset x
            cy = y[i]  # offset y
            cv = v[i]  # confidence
            for xx in range(minx[i], maxx[i]):
                deltax2 = (xx - cx)**2
                for yy in range(miny[i], maxy[i]):
                    deltay2 = (yy - cy)**2
                    vv = cv * approx_exp(-0.5 * (deltax2 + deltay2) / csigma2)
                    field[yy, xx] += vv

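2. Next, what is a PAF? It has 19x7 dimensions: 19 is the number of keypoint-to-keypoint connections, and 7 stands for:

aij = (aijc, aijx1, aijy1, aijb1, aijx2, aijy2, aijb2)

This can be understood much like the PIF above. aijc is the confidence that location (i, j) lies on one particular connection out of the 19 (note that there are 19 channels). (aijx1, aijy1) and (aijx2, aijy2) are two "trend" vectors (offsets): since the location sits somewhere along a connection, it predicts one offset toward each of the two endpoints. aijb1 and aijb2 are the spread parameters that accompany the two vectors.

How are the PAF target values defined? The information for two keypoints has to be fixed at once.

Say we want the connection between the right knee and the right ankle. From the current location (i, j), find the nearest ankle or knee (how? take the candidate joints and pick the one with the shortest vector from (i, j)). Once that joint is fixed, the single-person ground truth tells us which person it belongs to, so the other keypoint of the pair is fixed as well. See encoder/paf.py for the details.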
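The target rule described above can be sketched in a few lines: pick the person whose knee or ankle is nearest to the cell, then regress to both joints of that same person. This only illustrates the rule as described here; `paf_target` and the dictionary layout are hypothetical, and encoder/paf.py may differ in its details.

```python
import math


def paf_target(cell_xy, people, joint_a, joint_b):
    """Return (dx1, dy1, dx2, dy2): offsets from the cell to both joints
    of the person who owns the joint nearest to the cell.

    people: list of dicts mapping joint name -> (x, y).
    """
    cx, cy = cell_xy
    best_dist, best_person = None, None
    for person in people:
        for name in (joint_a, joint_b):
            x, y = person[name]
            dist = math.hypot(x - cx, y - cy)
            if best_dist is None or dist < best_dist:
                best_dist, best_person = dist, person
    (ax, ay), (bx, by) = best_person[joint_a], best_person[joint_b]
    return (ax - cx, ay - cy, bx - cx, by - cy)


people = [
    {"right_knee": (10.0, 20.0), "right_ankle": (10.0, 30.0)},
    {"right_knee": (40.0, 22.0), "right_ankle": (41.0, 32.0)},
]
print(paf_target((12.0, 24.0), people, "right_knee", "right_ankle"))
# -> (-2.0, -4.0, -2.0, 6.0)
```

Because both offsets come from the same person, a cell between two people never mixes one person's knee with another person's ankle.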

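3. Finally, what is greedy decoding?

The PIF has already produced the keypoints, and the PAF can connect pairs of keypoints. With multiple people, however, keypoints of different people could end up connected to each other, so a regrouping step is needed.

fig 3

s(a, x) is the score of a candidate connection a at keypoint x. ac is again the confidence of connection a. The exp(...) factor in the middle can be read as the distance between x and the first endpoint predicted by the connection, "wrapped" in a two-tailed Laplace distribution. The f2 part is a computation similar to formula (1). Put together, this gives the score of one connection passing through keypoint x.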
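To make the score concrete, here is a small sketch that rates every PAF cell as a vote for where a given joint connects to. It uses only the confidence and the Laplace-style distance factor, c * exp(-d / b1), and leaves out the f2 term; the function name and the toy cells are illustrative assumptions.

```python
import math


def association_scores(source_xy, paf_cells):
    """Score each PAF cell as a vote for the joint located at source_xy.

    paf_cells: list of (c, x1, y1, b1, x2, y2, b2).
    Returns a list of (score, (x2, y2)).
    """
    sx, sy = source_xy
    votes = []
    for c, x1, y1, b1, x2, y2, b2 in paf_cells:
        # distance between our joint and the first endpoint this cell predicts
        d = math.hypot(sx - x1, sy - y1)
        votes.append((c * math.exp(-d / b1), (x2, y2)))
    return votes


cells = [
    (0.9, 10.0, 10.0, 1.0, 10.0, 14.0, 1.0),  # first endpoint right on our joint
    (0.9, 13.0, 14.0, 1.0, 20.0, 18.0, 1.0),  # belongs to another person
]
votes = association_scores((10.0, 10.0), cells)
best = max(votes)
print(best)  # -> (0.9, (10.0, 14.0))
```

Both cells are equally confident, but the second one starts 5 units away from the joint, so its score drops to 0.9 * exp(-5), well below 0.01. This is how connections that belong to other people are suppressed.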

For the algorithmic details of this part see [2]; for the code, read decoder/pifpaf.py carefully:

    def _grow_connection(self, xy, paf_field):
        assert len(xy) == 2
        assert paf_field.shape[0] == 7

        # source value
        s_mask = paf_mask_center(paf_field, xy[0], xy[1], sigma=2.0)
        if not np.any(s_mask):
            return 0, 0, 0
        paf_field = paf_field[:, s_mask]

        # source distance
        d = np.linalg.norm(np.expand_dims(xy, 1) - paf_field[1:3], axis=0)
        b_source = paf_field[3]
        b_target = paf_field[6]

        # combined value and source distance
        v = paf_field[0]
        scores = np.exp(-1.0 * d / b_source) * v  # two-tailed cumulative Laplace

        if self.connection_method == 'median':
            return self._target_with_median(paf_field[4:6], scores, sigma=1.0)
        if self.connection_method == 'max':
            return self._target_with_maxscore(paf_field[4:6], scores)
        raise Exception('connection method not known')

    # the default of max_steps is an integer literal lost in extraction; 20 is assumed
    def _target_with_median(self, target_coordinates, scores, sigma, max_steps=20):
        target_coordinates = np.moveaxis(target_coordinates, 0, -1)
        assert target_coordinates.shape[0] == scores.shape[0]

        # a single vote: nothing to aggregate
        if target_coordinates.shape[0] == 1:
            return (target_coordinates[0][0],
                    target_coordinates[0][1],
                    np.tanh(scores[0] * 3.0 / self.paf_nn))

        # score-weighted mean, used directly for two votes and as the
        # starting point of the weighted median otherwise
        y = np.sum(target_coordinates * np.expand_dims(scores, -1), axis=0) / np.sum(scores)
        if target_coordinates.shape[0] == 2:
            return y[0], y[1], np.tanh(np.sum(scores) * 3.0 / self.paf_nn)
        y, prev_d = weiszfeld_nd(target_coordinates, y, weights=scores, max_steps=max_steps)

        # only votes that landed close to the median contribute to the score
        closest = prev_d < sigma
        close_scores = np.sort(scores[closest])[-self.paf_nn:]
        score = np.tanh(np.sum(close_scores) * 3.0 / self.paf_nn)
        return (y[0], y[1], score)
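The 'median' connection method relies on `weiszfeld_nd`, which is not shown in this post. The name suggests Weiszfeld's algorithm for the weighted geometric median, so here is a self-contained 2D sketch of that algorithm. It is an assumption about what the helper does rather than the library's code, and `weighted_geometric_median` is a made-up name.

```python
import math


def weighted_geometric_median(points, weights, max_steps=20, eps=1e-6):
    """Weiszfeld iteration for 2D points, started from the weighted mean."""
    wsum = sum(weights)
    y = [sum(w * p[k] for p, w in zip(points, weights)) / wsum for k in range(2)]
    for _ in range(max_steps):
        # reweight every point by weight / distance; eps avoids division by zero
        inv = [w / max(eps, math.hypot(p[0] - y[0], p[1] - y[1]))
               for p, w in zip(points, weights)]
        total = sum(inv)
        y_new = [sum(a * p[k] for p, a in zip(points, inv)) / total for k in range(2)]
        moved = math.hypot(y_new[0] - y[0], y_new[1] - y[1])
        y = y_new
        if moved < eps:
            break
    return tuple(y)


# three votes agree on the target joint, one vote is an outlier
targets = [(10.0, 14.0), (10.2, 13.9), (9.9, 14.1), (25.0, 30.0)]
scores = [1.0, 1.0, 1.0, 1.0]
median = weighted_geometric_median(targets, scores)
mean = tuple(sum(p[k] for p in targets) / len(targets) for k in range(2))
print(median, mean)
```

The plain mean is dragged to (13.775, 18.0) by the single outlier, while the geometric median stays within the cluster around (10, 14). That robustness is the reason to prefer a median over a mean when votes from a neighbouring person leak in.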

An addendum on the Laplace loss used in the paper:

fig 4

loss.py:

    # regression loss
    def laplace_loss(x1, x2, logb, t1, t2, weight=None):
        """Loss based on Laplace Distribution.

        Loss for a single two-dimensional vector (x1, x2) with radial
        spread b and true (t1, t2) vector.
        """
        norm = torch.sqrt((x1 - t1)**2 + (x2 - t2)**2)  # |x - mu|
        # ln 2 is about 0.693 (the code uses 0.694), so 0.694 + logb is log(2b)
        losses = 0.694 + logb + norm * torch.exp(-logb)  # torch.exp(-logb) == 1 / b
        if weight is not None:
            losses = losses * weight
        return torch.sum(losses)
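The loss is the negative log-likelihood of a Laplace distribution, L = log(2b) + |x - mu| / b. Setting its derivative with respect to log b to zero gives b = |x - mu|, so the network is pushed to predict a spread equal to its own expected error. The torch-free sketch below checks this numerically; `laplace_nll` is a hypothetical helper that mirrors the formula and uses the exact value of ln 2.

```python
import math


def laplace_nll(pred, target, logb):
    """log(2b) + |pred - target| / b for one 2D vector, with b = exp(logb)."""
    norm = math.hypot(pred[0] - target[0], pred[1] - target[1])
    return math.log(2.0) + logb + norm * math.exp(-logb)


pred, target = (3.0, 4.0), (3.0, 2.0)  # the error norm is 2.0
losses = {b: laplace_nll(pred, target, math.log(b)) for b in (0.5, 2.0, 8.0)}
best_b = min(losses, key=losses.get)
print(best_b)  # -> 2.0, the spread that matches the error
```

A spread that is too small (b = 0.5) is penalized by the |x - mu| / b term, and one that is too large (b = 8) by the log(2b) term, so b ends up acting as a learned uncertainty estimate.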

Reading further down loss.py shows that this mysterious mu is the target offset:

    def forward(self, x, t):  # pylint: disable=arguments-differ
        # ...

            # see laplace_loss (line 29 of loss.py)
            reg_losses.append(self.regression_loss(
                torch.masked_select(x_reg[:, :, 0], reg_masks),
                torch.masked_select(x_reg[:, :, 1], reg_masks),  # predicted offset x, y
                torch.masked_select(x_spread, reg_masks),  # the logb parameter
                torch.masked_select(target_reg[:, :, 0], reg_masks),
                torch.masked_select(target_reg[:, :, 1], reg_masks),  # target offset x, y
                weight=weight,
            ) / 1000.0 / batch_size)  # 1000 as a normalizer?

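OK, that is all for now. I will add more as my understanding improves.

Corrections and suggestions are welcome in the comments.

Update 06/21: here is a webcam.py that can replace the one in the original open-source code, so you can run a demo on a video file: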

    """Webcam demo application.

    Example command:
        python -m openpifpaf.webcam \
            --checkpoint outputs/resnet50block5-pif-paf-edge401-190621-093216.pkl.epoch052 \
            --source="mvp_human_fail_3_0615.avi"
    """

    import numpy as np
    import argparse
    import time
    import matplotlib
    import matplotlib.pyplot as plt
    import torch
    import cv2

    from .network import nets
    from . import decoder, show, transforms

    # 1-based COCO keypoint indices of the 19 skeleton connections
    COCO_PERSON_SKELETON = [
        [16, 14], [14, 12], [17, 15], [15, 13], [12, 13], [6, 12], [7, 13],
        [6, 7], [6, 8], [7, 9], [8, 10], [9, 11], [2, 3], [1, 2], [1, 3],
        [2, 4], [3, 5], [4, 6], [5, 7]]


    def plot_points(img, labels, skeleton):
        nums, kth = labels.shape[:2]
        for i in range(nums):
            # r = np.random.randint(0, 255)  # random colors per person are an option
            # g = np.random.randint(0, 255)
            # b = np.random.randint(0, 255)
            # color = (r, g, b)
            point_color = (255, 255, 255)  # white points
            line_color = (0, 0, 255)  # red lines (BGR)
            label = labels[i, :, :]
            pts = []
            for j in range(kth):
                if label[j][2] > 0.7:
                    # j + 1 is the 1-based keypoint index, used below to draw connections
                    pts.append((label[j][0], label[j][1], j + 1))
            # sizes are integer literals lost in extraction; placeholder values
            point_size = 1
            thickness = 4
            for point in pts:
                cv2.circle(img, (int(point[0]), int(point[1])), point_size, point_color, thickness)
            for connecte in skeleton:
                for a in pts:
                    for b in pts:
                        if a[2] == connecte[0] and b[2] == connecte[1]:
                            cv2.line(img, (int(a[0]), int(a[1])), (int(b[0]), int(b[1])), line_color, 1)


    def main():
        parser = argparse.ArgumentParser(
            description=__doc__,
            formatter_class=argparse.ArgumentDefaultsHelpFormatter,
        )
        nets.cli(parser)
        decoder.cli(parser, force_complete_pose=False, instance_threshold=0.1, seed_threshold=0.5)
        parser.add_argument('--no-colored-connections',
                            dest='colored_connections', default=True, action='store_false',
                            help='do not use colored connections to draw poses')
        parser.add_argument('--disable-cuda', action='store_true',
                            help='disable CUDA')
        # a string default keeps the len() check below valid when --source is omitted
        parser.add_argument('--source', default='0',
                            help='OpenCV source url. Integer for webcams. Or ipwebcam streams.')
        parser.add_argument('--scale', default=0.1, type=float,
                            help='input image scale factor')
        args = parser.parse_args()

        # check whether source should be an int
        if len(args.source) == 1:
            args.source = int(args.source)

        # add args.device
        args.device = torch.device('cpu')
        if not args.disable_cuda and torch.cuda.is_available():
            args.device = torch.device('cuda')

        # load model
        model, _ = nets.factory_from_args(args)
        model = model.to(args.device)
        processor = decoder.factory_from_args(args, model)

        # name = np.random.randint(1, 100)
        videoCapture = cv2.VideoCapture(args.source)  # the video to run detection on
        fps = videoCapture.get(cv2.CAP_PROP_FPS)
        size = (int(videoCapture.get(cv2.CAP_PROP_FRAME_WIDTH)),
                int(videoCapture.get(cv2.CAP_PROP_FRAME_HEIGHT)))
        scale = True  # whether to rescale the video frames
        if scale:
            size = (683, 384)
        ret, frame = videoCapture.read()
        videoWriter = cv2.VideoWriter('show.avi', cv2.VideoWriter_fourcc(*'MJPG'), fps, size)
        frame_cnt = 0
        while ret:
            frame1 = cv2.resize(frame, size)  # (384, 683, 3)
            image = cv2.cvtColor(frame1, cv2.COLOR_BGR2RGB)
            processed_image_cpu = transforms.image_transform(image.copy())  # normalize
            processed_image = processed_image_cpu.contiguous().to(args.device, non_blocking=True)  # transpose 2, 0, 1
            fields = processor.fields(torch.unsqueeze(processed_image, 0))[0]
            keypoint_sets, _ = processor.keypoint_sets(fields)

            # draw on and write the BGR frame: VideoWriter expects BGR, so writing
            # the RGB image would swap red and blue in the output video
            if keypoint_sets.shape[0] > 0:
                plot_points(frame1, keypoint_sets, COCO_PERSON_SKELETON)
            videoWriter.write(frame1)
            frame_cnt += 1
            print('frame count: ', frame_cnt)
            ret, frame = videoCapture.read()


    if __name__ == '__main__':
        main()
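The nested loops in plot_points compare every pair of visible points for every skeleton connection. A dictionary keyed by the 1-based keypoint index finds the same segments in one pass. The sketch below is independent of OpenCV and only returns the segments to draw; `skeleton_segments` is a made-up helper and the 0.7 threshold is taken from the script above.

```python
def skeleton_segments(keypoints, skeleton, min_conf=0.7):
    """Return the line segments to draw for one person.

    keypoints: list of (x, y, confidence) in COCO order.
    skeleton: list of [a, b] pairs of 1-based keypoint indices.
    """
    visible = {idx + 1: (x, y)
               for idx, (x, y, conf) in enumerate(keypoints)
               if conf > min_conf}
    return [(visible[a], visible[b])
            for a, b in skeleton
            if a in visible and b in visible]


# the first four COCO keypoints: nose, left eye, right eye, left ear
kps = [(10, 10, 0.9), (12, 8, 0.9), (8, 8, 0.2), (14, 9, 0.8)]
skeleton = [[2, 3], [1, 2], [1, 3], [2, 4]]
segs = skeleton_segments(kps, skeleton)
print(segs)
# -> [((10, 10), (12, 8)), ((12, 8), (14, 9))]
```

Each returned pair can be passed to `cv2.line` as its two end points after converting the coordinates to int.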

PPS: reference link: https://blog.csdn.net/murdock_c/article/details/88851912

[1] OpenPose: Realtime Multi-Person 2D Pose Estimation using Part Affinity Fields

[2] PersonLab: Person Pose Estimation and Instance Segmentation with a Bottom-Up, Part-Based, Geometric Embedding Model
