Package poseEstimationPipeline_python_module


Overview

The PoseEstimation pipeline is meant to detect persons in a frame/image and estimate their respective poses. Since it employs a dedicated object detector as the first stage of the pipeline, it can be used to estimate poses in dynamic real-time conditions with high accuracy.

The object-detector architecture uses MobileNetV2 as the backbone and a YOLO-style head for anchor generation and final bounding-box detection.

The pose-estimation architecture uses MobileNetV2 as the backbone and a simple encoder-decoder for final keypoint estimation.

The model handles all preprocessing and post-processing steps internally: it simply expects a frame of BGR [UINT8] data and returns the keypoint coordinates in the frame space of the input, hence providing truly plug-and-play functionality even for existing user-defined pipelines.
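
In other words, a single call covers the whole pipeline. A minimal sketch of that contract (the weight and image paths are placeholders, mirroring the Usage section below):

import cv2
import poseEstimationPipeline_python_module as pipeline

# No manual preprocessing or post-processing is needed around detect_pose().
pipeline.load_model("./poseEstimationPipeline.bin")
frame = cv2.imread("./test.jpg")             # BGR uint8, any resolution.
keypoints = pipeline.detect_pose(frame)      # [persons, 17, 3], frame-space coordinates.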

Features

  • Runs faster than real time at full-HD resolutions.
  • Works with all versions of Python 3 (tested with Python 3 only).
  • Highly portable, being a compiled Python module.
  • Minimal dependencies (all dependencies are standard DLLs).
  • The only expected Python dependencies are numpy and opencv (for reading/displaying images).

Limitations

  • Estimating poses for more than one person slows down the pipeline, since the pose-estimation stage runs separately for each detected person.
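
A rough way to observe this cost is to time detect_pose under different max_count caps on an image containing several persons. A sketch (the image path is a placeholder, and the effect only appears when the detector actually finds multiple persons):

import time
import cv2
import poseEstimationPipeline_python_module as pipeline

pipeline.load_model("./poseEstimationPipeline.bin")
frame = cv2.imread("./group.jpg")      # placeholder: an image with several persons.

for cap in (1, 2, 5, 10):
    start = time.perf_counter()
    pipeline.detect_pose(frame, max_count=cap)
    elapsed_ms = (time.perf_counter() - start) * 1000.0
    print("max_count={}: {:.1f} ms".format(cap, elapsed_ms))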

Benchmarks

Architecture                                              OS        Time* (ms)
Intel i5-8300H CPU @ 2.30GHz                              Windows   26-27
Intel i5-8300H CPU @ 2.30GHz + Nvidia GTX 1050 (hybrid)   Windows   < 19
  • *Time measured in the Python runtime, averaged over 100 loops.
  • *Time measured with inputs up to HD resolution.
  • (All resolutions are resized to the fixed input size specified by the model before further processing.)
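
The measurement can be reproduced with a harness along these lines (a sketch; the image path is a placeholder and absolute numbers depend on the machine):

import time
import cv2
import poseEstimationPipeline_python_module as pipeline

pipeline.load_model("./poseEstimationPipeline.bin")
frame = cv2.imread("./test_hd.jpg")    # placeholder image, up to HD resolution.

pipeline.detect_pose(frame)            # warm-up run, excluded from the timing.
start = time.perf_counter()
for _ in range(100):                   # average over 100 loops, as stated above.
    pipeline.detect_pose(frame)
elapsed_ms = (time.perf_counter() - start) * 1000.0 / 100.0
print("Average latency: {:.1f} ms".format(elapsed_ms))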

Usage

import numpy as np
import cv2

# Import ONE of the two modules below, depending on the build in use:
import poseEstimationPipeline_python_module as pipeline            # CPU-only module.
# import poseEstimationPipeline_hybrid_python_module as pipeline   # hybrid (CPU + GPU(CUDA)) module.

# Keypoint index pairs used to draw the skeleton.
joint_pairs = [[0, 1], [1, 3], [0, 2], [2, 4],
               [5, 6], [5, 7], [7, 9], [6, 8],
               [8, 10], [5, 11], [6, 12], [11, 12],
               [11, 13], [12, 14], [13, 15], [14, 16]]

frame = cv2.imread("./test.jpg")        # read frame/image [BGR uint8 data].

pipeline.load_model("./poseEstimationPipeline.bin")     # initialize the model and load weights.

MAX_NUMBER_POSE = 2     # maximum number of persons/poses to detect; pose estimation runs
                        # separately for each detected person, so higher values increase
                        # latency. Default is 10.
keypoints = pipeline.detect_pose(frame, max_count=MAX_NUMBER_POSE)

pose_count = keypoints.shape[0]
print("Number of persons detected: {}".format(pose_count))

# Plot the keypoints on the frame.
keypoint_threshold = 0.2    # empirical value.
for count in range(pose_count):
    for pair in joint_pairs:
        y1_k = int(keypoints[count, pair[0], 0])
        x1_k = int(keypoints[count, pair[0], 1])
        confidence_1 = keypoints[count, pair[0], 2]

        y2_k = int(keypoints[count, pair[1], 0])
        x2_k = int(keypoints[count, pair[1], 1])
        confidence_2 = keypoints[count, pair[1], 2]

        if confidence_1 > keypoint_threshold and confidence_2 > keypoint_threshold:
            cv2.line(frame, (x1_k, y1_k), (x2_k, y2_k), (255, 255, 0), 2)
            cv2.circle(frame, (x1_k, y1_k), 1, (0, 255, 255), 2)
            cv2.circle(frame, (x2_k, y2_k), 1, (0, 255, 255), 2)

cv2.imshow("pose", frame)
cv2.waitKey(0)
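
The same drawing logic carries over to live video. A minimal sketch assuming a webcam at index 0 (the quit key and max_count value are arbitrary choices):

import cv2
import poseEstimationPipeline_python_module as pipeline

joint_pairs = [[0, 1], [1, 3], [0, 2], [2, 4],
               [5, 6], [5, 7], [7, 9], [6, 8],
               [8, 10], [5, 11], [6, 12], [11, 12],
               [11, 13], [12, 14], [13, 15], [14, 16]]

pipeline.load_model("./poseEstimationPipeline.bin")
capture = cv2.VideoCapture(0)          # webcam index 0 is an assumption.
keypoint_threshold = 0.2

while True:
    ok, frame = capture.read()
    if not ok:
        break
    keypoints = pipeline.detect_pose(frame, max_count=2)
    for count in range(keypoints.shape[0]):
        for pair in joint_pairs:
            y1, x1, c1 = keypoints[count, pair[0]]
            y2, x2, c2 = keypoints[count, pair[1]]
            if c1 > keypoint_threshold and c2 > keypoint_threshold:
                cv2.line(frame, (int(x1), int(y1)), (int(x2), int(y2)), (255, 255, 0), 2)
    cv2.imshow("pose", frame)
    if cv2.waitKey(1) & 0xFF == ord('q'):    # press 'q' to quit.
        break

capture.release()
cv2.destroyAllWindows()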

How to install:

Once the SDK has been downloaded into a local directory, follow these steps:

  • cd into the directory, i.e. to the root of the directory.
  • Make sure all the requirements have been fulfilled as stated in requirements.
  • On both Windows and Linux, run the following command at the root of the directory:

    pip install .
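
A quick way to verify the installation from the Python interpreter (a sketch; use the hybrid module name instead if the hybrid build was installed):

# Post-install sanity check: a successful import confirms the module and
# its DLL dependencies were found; load_model confirms the weights path.
import poseEstimationPipeline_python_module as pipeline
pipeline.load_model("./poseEstimationPipeline.bin")
print("module loaded OK")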

Functions

def detect_pose(frame, conf_threshold=0.85, nms_threshold=0.45, max_count=10)

Detect keypoints for multiple persons in a given frame. Expects BGR-format UINT8 data, generally coming from cv2.imread or cv2.VideoCapture()-based sources.

Inputs:

frame: [Height, Width, 3] numpy array containing UINT8 data in BGR format; RGB data may work, but BGR is expected.

conf_threshold: float confidence threshold; only objects with confidence above this threshold are returned.

nms_threshold: float threshold used to suppress overlapping bounding boxes during Non-Maximum Suppression.

max_count: int maximum number of poses/persons that can be detected in a single image/frame.

Returns:

keypoints: numpy array of float32 data with shape [number_persons_detected, 17, 3], where keypoints[i, j, :] = [y, x, confidence]; the (y, x) coordinates can be plotted directly on the given frame.
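
For instance, the returned array can be split per person and filtered by confidence. A sketch (paths mirror the Usage section; the 0.2 threshold is the empirical value used there):

import numpy as np
import cv2
import poseEstimationPipeline_python_module as pipeline

pipeline.load_model("./poseEstimationPipeline.bin")
keypoints = pipeline.detect_pose(cv2.imread("./test.jpg"))

# Each person's [17, 3] block has rows of [y, x, confidence].
for i, person in enumerate(keypoints):
    confident = person[person[:, 2] > 0.2]     # keep confidently detected joints only.
    yx = confident[:, :2].astype(np.int32)     # integer (y, x) pixel coordinates.
    print("person {}: {} confident joints".format(i, len(yx)))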

def load_model(weightsPath)

Initialize the model and load weights from the path specified by the weightsPath argument.

Inputs:

weightsPath: str, path to the file to load the weights from (e.g. /path/to/weights), generally with a .bin extension.
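
A defensive loading sketch (load_model's behaviour on a missing file is not documented here, so the explicit check is an assumption):

import os.path
import poseEstimationPipeline_python_module as pipeline

weightsPath = "./poseEstimationPipeline.bin"    # typically a .bin file.
if not os.path.isfile(weightsPath):             # fail early with a clear error
    raise FileNotFoundError(weightsPath)        # instead of relying on the module.
pipeline.load_model(weightsPath)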