Paper link:

https://arxiv.org/pdf/1903.06593.pdf

Code link:

https://github.com/vita-epfl/openpifpaf

This is a paper on multi-person pose estimation. I apparently started reading it last month, but on and off I never really understood it properly. Here is an update on my current understanding:

P.S. If you have never touched human pose estimation, the PIF and PAF below may leave you wondering what on earth they are.

In that case, have a look at the companion write-up first: OpenPose [1].

The paper is very direct: the PIF (Part Intensity Fields) part produces the keypoints, and the PAF (Part Association Fields) part produces the connections between them. Note that it is multi-person, so the connections can be separated out per person. The backbone is ResNet-50.

1. First, what is PIF? It is 17×5 dimensional.

17 stands for the 17 human keypoint types, and the 5 components are: $p^{ij} = (p^{ij}_c, p^{ij}_x, p^{ij}_y, p^{ij}_b, p^{ij}_\sigma)$

At location $(i, j)$: $p^{ij}_c$ is the confidence that a keypoint is present there; $(p^{ij}_x, p^{ij}_y)$ can be read as an offset attached to this location (pointing toward the true ground-truth position), or equivalently as a "tendency" vector at this point.

$p^{ij}_b$ is a spread parameter that is learned jointly with the rest (it reappears in the Laplace loss below).

$p^{ij}_\sigma$ is a scale parameter; it is used later for the "refinement" of the heat map. People come in different sizes, so their keypoints vary in size accordingly. Dig into encoder/pif.py in the code for the details.
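To make the shapes concrete, here is a minimal sketch of how such a PIF output could be split into its components. This is not the library's code; the (17, 5, H, W) tensor layout and the variable names are my own assumption based on the 17×5 description above.

import numpy as np

# hypothetical PIF head output for one image: 17 keypoint types x 5 components
# on an H x W feature grid (layout assumed, for illustration only)
H, W = 48, 64
pif = np.random.rand(17, 5, H, W).astype(np.float32)

conf     = pif[:, 0]   # p_c: keypoint confidence at every (i, j)
offset_x = pif[:, 1]   # p_x: offset toward the true keypoint
offset_y = pif[:, 2]   # p_y
spread_b = pif[:, 3]   # p_b: spread, trained jointly via the Laplace loss
scale    = pif[:, 4]   # p_sigma: keypoint scale, used for heat-map refinement

print(conf.shape)      # (17, 48, 64)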

How is the heat map refined?

[fig 1 from the paper]

In panel (a) above you can see that the "candidate" keypoints come in scattered clusters and are very discrete. Panel (b) shows the $(p^{ij}_x, p^{ij}_y)$ vectors mentioned above, also dense and divergent. Panel (c) uses equation (1) to map everything onto a higher resolution and obtain a more accurate heat map.

[fig 2 from the paper]
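For reference, here is my transcription of equation (1): the high-resolution confidence map is accumulated from the individual PIF predictions with an unnormalized Gaussian kernel, weighted by confidence (in the code the predicted scale plays the role of the Gaussian width); check the paper for the exact form:

$$f(x, y) = \sum_{ij} p^{ij}_c \, \mathcal{N}\!\left(x, y \,\middle|\, p^{ij}_x, p^{ij}_y, \sigma\right)$$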

It is called in decoder/pif.py:

def _target_intensities(self, v_th=0.1):
    start = time.perf_counter()
    targets = np.zeros((self.pif.shape[0],
                        int(self.pif.shape[2] * self.stride),
                        int(self.pif.shape[3] * self.stride)))
    scales = np.zeros(targets.shape)
    ns = np.zeros(targets.shape)
    for t, p, scale, n in zip(targets, self.pif, scales, ns):
        v, x, y, s = p[:, p[0] > v_th]  # v is the confidence
        # map feature-map coordinates back to image resolution
        x = x * self.stride
        y = y * self.stride
        s = s * self.stride
        # this is where the "refinement" of the pif keypoints happens
        scalar_square_add_gauss(t, x, y, s, v / self.pif_nn, truncate=0.5)
        scalar_square_add_constant(scale, x, y, s, s * v)
        scalar_square_add_constant(n, x, y, s, v)

    targets = np.minimum(1.0, targets)
    ...

def _pifhr_seeds(self):
    start = time.perf_counter()
    seeds = []
    for field_i, (f, s) in enumerate(zip(self._pifhr, self._pifhr_scales)):
        index_fields = index_field(f.shape)
        candidates = np.concatenate((index_fields, np.expand_dims(f, 0)), 0)

        mask = f > self.seed_threshold
        candidates = np.moveaxis(candidates[:, mask], 0, -1)

        occupied = np.zeros(s.shape)
        for c in sorted(candidates, key=lambda c: c[2], reverse=True):
            i, j = int(c[0]), int(c[1])
            if occupied[j, i]:
                continue

            width = max(4, s[j, i])
            scalar_square_add_single(occupied, c[0], c[1], width / 2.0, 1.0)
            seeds.append((c[2], field_i, c[0] / self.stride, c[1] / self.stride))

            if self.debug_visualizer:
                if field_i in self.debug_visualizer.pif_indices:
                    self.log.debug('occupied seed, field %d', field_i)
                    self.debug_visualizer.occupied(occupied)

    seeds = list(sorted(seeds, reverse=True))
    if len(seeds) > 500:  # keep at most 500 candidate seeds
        if seeds[500][0] > 0.1:  # confidence threshold must exceed 0.1
            seeds = [s for s in seeds if s[0] > 0.1]
        else:
            seeds = seeds[:500]

    if self.debug_visualizer:
        self.debug_visualizer.seeds(seeds, self.stride)

    self.log.debug('seeds %d, %.3fs', len(seeds), time.perf_counter() - start)
    return seeds

scalar_square_add_gauss is implemented in functional.pyx:

# equation (1)
def scalar_square_add_gauss(float[:, :] field, x_np, y_np, sigma_np, v_np, float truncate=2.0):
    sigma_np = np.maximum(1.0, sigma_np)
    width_np = np.maximum(1.0, truncate * sigma_np)
    minx_np = np.round(x_np - width_np).astype(np.int)
    minx_np = np.clip(minx_np, 0, field.shape[1] - 1)
    miny_np = np.round(y_np - width_np).astype(np.int)
    miny_np = np.clip(miny_np, 0, field.shape[0] - 1)
    maxx_np = np.round(x_np + width_np).astype(np.int)
    maxx_np = np.clip(maxx_np + 1, minx_np + 1, field.shape[1])
    maxy_np = np.round(y_np + width_np).astype(np.int)
    maxy_np = np.clip(maxy_np + 1, miny_np + 1, field.shape[0])

    cdef float[:] x = x_np
    cdef float[:] y = y_np
    cdef float[:] sigma = sigma_np
    cdef long[:] minx = minx_np
    cdef long[:] miny = miny_np
    cdef long[:] maxx = maxx_np
    cdef long[:] maxy = maxy_np
    cdef float[:] v = v_np

    cdef Py_ssize_t i, xx, yy
    cdef Py_ssize_t l = minx.shape[0]
    cdef float deltax2, deltay2
    cdef float vv
    cdef float cv, cx, cy, csigma2

    for i in range(l):
        csigma2 = sigma[i]**2
        cx = x[i]  # offset x
        cy = y[i]  # offset y
        cv = v[i]  # confidence
        for xx in range(minx[i], maxx[i]):
            deltax2 = (xx - cx)**2
            for yy in range(miny[i], maxy[i]):
                deltay2 = (yy - cy)**2
                vv = cv * approx_exp(-0.5 * (deltax2 + deltay2) / csigma2)
                field[yy, xx] += vv

2. Next, what is PAF? It is 19×7 dimensional: 19 stands for the 19 keypoint connections, and the 7 components are:

$a^{ij} = (a^{ij}_c, a^{ij}_{x1}, a^{ij}_{y1}, a^{ij}_{b1}, a^{ij}_{x2}, a^{ij}_{y2}, a^{ij}_{b2})$

This can be understood analogously to PIF above. $a^{ij}_c$ is the confidence that location $(i, j)$ lies on one particular connection out of the 19 (note there are 19 channels). $(a^{ij}_{x1}, a^{ij}_{y1})$ and $(a^{ij}_{x2}, a^{ij}_{y2})$ are two "tendency" vectors (offsets): since the location sits somewhere along a connection, there is an offset prediction toward the point at each end. $a^{ij}_{b1}$ and $a^{ij}_{b2}$ are the spread parameters attached to those two vectors.

Now, how are the PAF targets determined? Information about two points has to be fixed at once.

For example, to encode the connection between the right knee and the right ankle, at the current location $(i, j)$ we find the nearest ankle or knee (how? every location has the $(a^{ij}_{x1}, a^{ij}_{y1}, a^{ij}_{x2}, a^{ij}_{y2})$ vectors, so just take the one at minimum distance), and once that point is fixed, the other keypoint of the same person follows from the single-person ground truth. Dig into encoder/paf.py in the code for the details.
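Analogous to the PIF sketch above, here is a minimal illustration of splitting a PAF output; again the (19, 7, H, W) layout and the variable names are my own assumption from the 19×7 description, not the library's API.

import numpy as np

# hypothetical PAF head output: 19 connections x 7 components on an H x W grid
H, W = 48, 64
paf = np.random.rand(19, 7, H, W).astype(np.float32)

conf       = paf[:, 0]    # a_c: confidence of lying on a given connection
src_offset = paf[:, 1:3]  # (a_x1, a_y1): offset toward one endpoint
src_spread = paf[:, 3]    # a_b1
dst_offset = paf[:, 4:6]  # (a_x2, a_y2): offset toward the other endpoint
dst_spread = paf[:, 6]    # a_b2

print(src_offset.shape)   # (19, 2, 48, 64)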

3. Finally, what is Greedy Decoding?

The keypoints have already been produced by PIF, and the pairwise connections can be handled by PAF (but note that in the multi-person case, points belonging to different people could get connected to each other). So a re-assembly step is needed.

[fig 3 from the paper]

$s(a, x)$ is the score of a "candidate" connection $a$ at keypoint $x$. $a_c$ is still the confidence of connection $a$; the exp(...) term in the middle can be read as the connection's first offset vector "wrapped" in a two-tailed Laplace distribution; and the $f_2$ term is a computation similar to equation (1). Put together, this is the score of one particular connection passing through keypoint $x$.
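For reference, my transcription of the scoring function in fig 3 (check the paper for the exact form):

$$s(\mathbf{a}, \mathbf{x}) = a_c \, \exp\!\left(-\frac{\lVert \mathbf{x} - (a_{x1}, a_{y1}) \rVert_2}{b_1}\right) f_2(a_{x2}, a_{y2})$$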

For the algorithmic details of this part, please refer to [2]; for the code, look carefully at decoder/pifpaf.py (excerpted further below).
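Before diving into the library code below, here is a heavily simplified sketch of the greedy re-assembly idea. The seed format and the grow_connection helper are stand-ins I am making up for illustration; the real decoder also tracks an occupancy map and grows connections in both directions along the skeleton.

import numpy as np

def greedy_decode(seeds, grow_connection, skeleton, n_keypoints=17):
    """Illustrative greedy decoding sketch (not openpifpaf's API).

    seeds: list of (score, keypoint_index, x, y) tuples from high-confidence PIF seeds.
    grow_connection: callable((x, y), (kp_src, kp_dst)) -> (x, y, score) or None,
        standing in for the PAF-based connection growth.
    skeleton: list of (kp_src, kp_dst) index pairs, 0-based.
    """
    poses = []
    for score, k, x, y in sorted(seeds, reverse=True):
        # note: the real decoder skips seeds that fall inside an already decoded pose
        pose = np.zeros((n_keypoints, 3))  # per joint: x, y, confidence
        pose[k] = (x, y, score)
        frontier = [k]
        while frontier:
            src = frontier.pop()
            for a, b in skeleton:
                if a != src or pose[b, 2] > 0.0:
                    continue  # only grow from src into joints that are still empty
                grown = grow_connection(tuple(pose[a, :2]), (a, b))
                if grown is not None:
                    pose[b] = grown
                    frontier.append(b)
        poses.append(pose)
    return poses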

def _grow_connection(self, xy, paf_field):
    assert len(xy) == 2
    assert paf_field.shape[0] == 7

    # source value
    s_mask = paf_mask_center(paf_field, xy[0], xy[1], sigma=2.0)
    if not np.any(s_mask):
        return 0, 0, 0
    paf_field = paf_field[:, s_mask]

    # source distance
    d = np.linalg.norm(np.expand_dims(xy, 1) - paf_field[1:3], axis=0)
    b_source = paf_field[3]
    b_target = paf_field[6]

    # combined value and source distance
    v = paf_field[0]
    scores = np.exp(-1.0 * d / b_source) * v  # two-tailed cumulative Laplace

    if self.connection_method == 'median':
        return self._target_with_median(paf_field[4:6], scores, sigma=1.0)
    if self.connection_method == 'max':
        return self._target_with_maxscore(paf_field[4:7], scores)
    raise Exception('connection method not known')

def _target_with_median(self, target_coordinates, scores, sigma, max_steps=20):
    target_coordinates = np.moveaxis(target_coordinates, 0, -1)
    assert target_coordinates.shape[0] == scores.shape[0]

    if target_coordinates.shape[0] == 1:
        return (target_coordinates[0][0],
                target_coordinates[0][1],
                np.tanh(scores[0] * 3.0 / self.paf_nn))

    y = np.sum(target_coordinates * np.expand_dims(scores, -1), axis=0) / np.sum(scores)
    if target_coordinates.shape[0] == 2:
        return y[0], y[1], np.tanh(np.sum(scores) * 3.0 / self.paf_nn)

    y, prev_d = weiszfeld_nd(target_coordinates, y, weights=scores, max_steps=max_steps)

    closest = prev_d < sigma
    close_scores = np.sort(scores[closest])[-self.paf_nn:]
    score = np.tanh(np.sum(close_scores) * 3.0 / self.paf_nn)
    return y[0], y[1], score

As a supplement, here is the Laplace loss used in the paper:

[fig 4 from the paper]
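Reading the formula off the code below, the loss for one regressed vector is essentially

$$L = \log(2b) + \frac{\lVert (x_1, x_2) - (t_1, t_2) \rVert_2}{b},$$

where $b$ is the predicted spread (the network actually outputs $\log b$) and $\log 2 \approx 0.694$ is the constant that shows up in the code.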

loss.py:

# regression loss
def laplace_loss(x1, x2, logb, t1, t2, weight=None):
    """Loss based on Laplace Distribution.

    Loss for a single two-dimensional vector (x1, x2) with radial
    spread b and true (t1, t2) vector.
    """
    norm = torch.sqrt((x1 - t1)**2 + (x2 - t2)**2)  # |x - mu|
    # log(2) = 0.694, so 0.694 + logb == log(2b)
    losses = 0.694 + logb + norm * torch.exp(-logb)  # torch.exp(-logb) == 1 / b
    if weight is not None:
        losses = losses * weight
    return torch.sum(losses)

Reading further down in loss.py, you can see that this magical $\mu$ is the target offset:

def forward(self, x, t):  # pylint: disable=arguments-differ
    ...
    # see laplace_loss above (line 29 of loss.py)
    reg_losses.append(self.regression_loss(
        torch.masked_select(x_reg[:, :, 0], reg_masks),
        torch.masked_select(x_reg[:, :, 1], reg_masks),       # predicted offset x, y
        torch.masked_select(x_spread, reg_masks),             # the logb parameter
        torch.masked_select(target_reg[:, :, 0], reg_masks),
        torch.masked_select(target_reg[:, :, 1], reg_masks),  # target offset x, y
        weight=weight,
    ) / 1000.0 / batch_size)  # 1000 as a normalizer?

OK, that's all for now; I will add more as new insights come up.

Corrections and pointers in the comments are welcome. ~~~

Update 06/21: attached is a webcam.py snippet that can replace the original open-source file, to run a quick video demo:

"""Webcam demo application.

Example command:
    python -m openpifpaf.webcam \
        --checkpoint outputs/resnet50block5-pif-paf-edge401-190621-093216.pkl.epoch052 \
        --source="mvp_human_fail_3_0615.avi"
"""

import argparse
import time

import numpy as np
import matplotlib
import matplotlib.pyplot as plt
import torch
import cv2

from .network import nets
from . import decoder, show, transforms

COCO_PERSON_SKELETON = [
    [16, 14], [14, 12], [17, 15], [15, 13], [12, 13], [6, 12], [7, 13],
    [6, 7], [6, 8], [7, 9], [8, 10], [9, 11], [2, 3], [1, 2], [1, 3],
    [2, 4], [3, 5], [4, 6], [5, 7]]


def plot_points(img, labels, skeleton):
    nums, kth = labels.shape[:2]
    for i in range(nums):
        # r = np.random.randint(0, 255)  # random per-person colors are an option
        # g = np.random.randint(0, 255)
        # b = np.random.randint(0, 255)
        # color = (r, g, b)
        point_color = (255, 255, 255)  # white points
        line_color = (0, 0, 255)       # red lines
        label = labels[i, :, :]
        x = []
        for j in range(kth):
            if label[j][-1] >= 0.7:
                x.append((label[j][0], label[j][1], j + 1))
                # j+1 is this keypoint's index, used below to draw the connections
        point_size = 1
        thickness = 1
        for point in x:
            cv2.circle(img, (int(point[0]), int(point[1])), point_size, point_color, thickness)
        for connecte in skeleton:
            for p1 in x:
                for p2 in x:
                    if p1[-1] == connecte[0] and p2[-1] == connecte[1]:
                        cv2.line(img,
                                 (int(p1[0]), int(p1[1])),
                                 (int(p2[0]), int(p2[1])),
                                 line_color, 1)


def main():
    parser = argparse.ArgumentParser(
        description=__doc__,
        formatter_class=argparse.ArgumentDefaultsHelpFormatter,
    )
    nets.cli(parser)
    decoder.cli(parser, force_complete_pose=False, instance_threshold=0.1, seed_threshold=0.5)
    parser.add_argument('--no-colored-connections',
                        dest='colored_connections', default=True, action='store_false',
                        help='do not use colored connections to draw poses')
    parser.add_argument('--disable-cuda', action='store_true',
                        help='disable CUDA')
    parser.add_argument('--source', default=0,
                        help='OpenCV source url. Integer for webcams. Or ipwebcam streams.')
    parser.add_argument('--scale', default=0.1, type=float,
                        help='input image scale factor')
    args = parser.parse_args()

    # check whether source should be an int
    if len(args.source) == 1:
        args.source = int(args.source)

    # add args.device
    args.device = torch.device('cpu')
    if not args.disable_cuda and torch.cuda.is_available():
        args.device = torch.device('cuda')

    # load model
    model, _ = nets.factory_from_args(args)
    model = model.to(args.device)
    processor = decoder.factory_from_args(args, model)

    # name = np.random.randint(1, 100)
    videoCapture = cv2.VideoCapture(args.source)  # the video to process
    fps = videoCapture.get(cv2.CAP_PROP_FPS)
    size = (int(videoCapture.get(cv2.CAP_PROP_FRAME_WIDTH)),
            int(videoCapture.get(cv2.CAP_PROP_FRAME_HEIGHT)))
    scale = True  # whether to rescale the video frames
    if scale:
        size = (683, 384)

    ret, frame = videoCapture.read()
    videoWriter = cv2.VideoWriter('show.avi', cv2.VideoWriter_fourcc(*'MJPG'), fps, size)
    frame_cnt = 0
    while ret:
        frame1 = cv2.resize(frame, size)  # (384, 683, 3)
        image = cv2.cvtColor(frame1, cv2.COLOR_BGR2RGB)
        processed_image_cpu = transforms.image_transform(image.copy())  # normalize
        processed_image = processed_image_cpu.contiguous().to(args.device, non_blocking=True)  # transpose 2,0,1
        fields = processor.fields(torch.unsqueeze(processed_image, 0))[0]
        keypoint_sets, _ = processor.keypoint_sets(fields)
        if keypoint_sets.shape[0] > 0:
            plot_points(image, keypoint_sets, COCO_PERSON_SKELETON)
        videoWriter.write(image)
        frame_cnt += 1
        print('frame count:', frame_cnt)
        ret, frame = videoCapture.read()


if __name__ == '__main__':
    main()

P.P.S. Reference link:

https://blog.csdn.net/murdock_c/article/details/88851912

[1] OpenPose: Realtime Multi-Person 2D Pose Estimation using Part Affinity Fields

[2] PersonLab: Person Pose Estimation and Instance Segmentation with a Bottom-Up, Part-Based, Geometric Embedding Model