Package poseEstimationPipeline_python_module


Overview

The PoseEstimation pipeline is meant to detect persons in a frame/image and estimate their respective poses. Since it employs a dedicated object detector as the first stage of the pipeline, it can be used to estimate poses in dynamic real-time conditions with high accuracy.

The object-detector architecture uses MobileNetV2 as the backbone and a YOLO-style head for anchor generation and final bounding-box detection.

The pose-estimation architecture uses MobileNetV2 as the backbone and a simple encoder-decoder for final keypoint estimation.

The model handles all preprocessing and post-processing steps internally: it simply expects a frame of BGR [UINT8] data and returns the keypoint coordinates in the frame space of the input, hence providing truly plug-and-play functionality even for existing user-defined pipelines.
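
In other words, a single call covers the whole pipeline. A minimal sketch of that contract (the weight and image paths are placeholders, mirroring the Usage section below):

import cv2
import poseEstimationPipeline_python_module as pipeline

# No manual preprocessing or post-processing is needed around detect_pose().
pipeline.load_model("./poseEstimationPipeline.bin")
frame = cv2.imread("./test.jpg")             # BGR uint8, any resolution.
keypoints = pipeline.detect_pose(frame)      # [persons, 17, 3], frame-space coordinates.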

Features

  • Runs faster than real time at full-HD resolutions.
  • Works with all versions of Python 3 (tested with Python 3 only).
  • Highly portable, being a compiled Python module.
  • Minimal dependencies (all dependencies are standard DLLs).
  • The only expected Python dependencies are numpy and opencv (for reading/displaying images).

Limitations

  • Estimating poses for more than one person slows down the pipeline, since the pose-estimation stage runs separately for each detected person.
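
A rough way to observe this cost is to time detect_pose under different max_count caps on an image containing several persons. A sketch (the image path is a placeholder, and the effect only appears when the detector actually finds multiple persons):

import time
import cv2
import poseEstimationPipeline_python_module as pipeline

pipeline.load_model("./poseEstimationPipeline.bin")
frame = cv2.imread("./group.jpg")      # placeholder: an image with several persons.

for cap in (1, 2, 5, 10):
    start = time.perf_counter()
    pipeline.detect_pose(frame, max_count=cap)
    elapsed_ms = (time.perf_counter() - start) * 1000.0
    print("max_count={}: {:.1f} ms".format(cap, elapsed_ms))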

Benchmarks

Architecture                                              OS        Time* (ms)
Intel i5-8300H CPU @ 2.30GHz                              Windows   26-27
Intel i5-8300H CPU @ 2.30GHz + Nvidia GTX 1050 (hybrid)   Windows   < 19
  • *Time measured in the Python runtime, averaged over 100 loops.
  • *Time measured with inputs up to HD resolution.
  • (All resolutions are resized to the fixed input size specified by the model before further processing.)
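
The measurement can be reproduced with a harness along these lines (a sketch; the image path is a placeholder and absolute numbers depend on the machine):

import time
import cv2
import poseEstimationPipeline_python_module as pipeline

pipeline.load_model("./poseEstimationPipeline.bin")
frame = cv2.imread("./test_hd.jpg")    # placeholder image, up to HD resolution.

pipeline.detect_pose(frame)            # warm-up run, excluded from the timing.
start = time.perf_counter()
for _ in range(100):                   # average over 100 loops, as stated above.
    pipeline.detect_pose(frame)
elapsed_ms = (time.perf_counter() - start) * 1000.0 / 100.0
print("Average latency: {:.1f} ms".format(elapsed_ms))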

Usage

import numpy as np
import cv2

# Import ONE of the two modules below, depending on the build in use:
import poseEstimationPipeline_python_module as pipeline            # CPU-only module.
# import poseEstimationPipeline_hybrid_python_module as pipeline   # hybrid (CPU + GPU(CUDA)) module.

# Keypoint index pairs used to draw the skeleton.
joint_pairs = [[0, 1], [1, 3], [0, 2], [2, 4],
               [5, 6], [5, 7], [7, 9], [6, 8],
               [8, 10], [5, 11], [6, 12], [11, 12],
               [11, 13], [12, 14], [13, 15], [14, 16]]

frame = cv2.imread("./test.jpg")        # read frame/image [BGR uint8 data].

pipeline.load_model("./poseEstimationPipeline.bin")     # initialize the model and load weights.

MAX_NUMBER_POSE = 2     # maximum number of persons/poses to detect; pose estimation runs
                        # separately for each detected person, so higher values increase
                        # latency. Default is 10.
keypoints = pipeline.detect_pose(frame, max_count=MAX_NUMBER_POSE)

pose_count = keypoints.shape[0]
print("Number of persons detected: {}".format(pose_count))

# Plot the keypoints on the frame.
keypoint_threshold = 0.2    # empirical value.
for count in range(pose_count):
    for pair in joint_pairs:
        y1_k = int(keypoints[count, pair[0], 0])
        x1_k = int(keypoints[count, pair[0], 1])
        confidence_1 = keypoints[count, pair[0], 2]

        y2_k = int(keypoints[count, pair[1], 0])
        x2_k = int(keypoints[count, pair[1], 1])
        confidence_2 = keypoints[count, pair[1], 2]

        if confidence_1 > keypoint_threshold and confidence_2 > keypoint_threshold:
            cv2.line(frame, (x1_k, y1_k), (x2_k, y2_k), (255, 255, 0), 2)
            cv2.circle(frame, (x1_k, y1_k), 1, (0, 255, 255), 2)
            cv2.circle(frame, (x2_k, y2_k), 1, (0, 255, 255), 2)

cv2.imshow("pose", frame)
cv2.waitKey(0)
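
The same drawing logic carries over to live video. A minimal sketch assuming a webcam at index 0 (the quit key and max_count value are arbitrary choices):

import cv2
import poseEstimationPipeline_python_module as pipeline

joint_pairs = [[0, 1], [1, 3], [0, 2], [2, 4],
               [5, 6], [5, 7], [7, 9], [6, 8],
               [8, 10], [5, 11], [6, 12], [11, 12],
               [11, 13], [12, 14], [13, 15], [14, 16]]

pipeline.load_model("./poseEstimationPipeline.bin")
capture = cv2.VideoCapture(0)          # webcam index 0 is an assumption.
keypoint_threshold = 0.2

while True:
    ok, frame = capture.read()
    if not ok:
        break
    keypoints = pipeline.detect_pose(frame, max_count=2)
    for count in range(keypoints.shape[0]):
        for pair in joint_pairs:
            y1, x1, c1 = keypoints[count, pair[0]]
            y2, x2, c2 = keypoints[count, pair[1]]
            if c1 > keypoint_threshold and c2 > keypoint_threshold:
                cv2.line(frame, (int(x1), int(y1)), (int(x2), int(y2)), (255, 255, 0), 2)
    cv2.imshow("pose", frame)
    if cv2.waitKey(1) & 0xFF == ord('q'):    # press 'q' to quit.
        break

capture.release()
cv2.destroyAllWindows()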

How to install:

Once the SDK has been downloaded into a local directory, follow these steps:

  • cd into the directory, i.e. to the root of the directory.
  • Make sure all the requirements have been fulfilled as stated in requirements.
  • On both Windows and Linux, run the following command at the root of the directory:

    pip install .
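
A quick way to verify the installation from the Python interpreter (a sketch; use the hybrid module name instead if the hybrid build was installed):

# Post-install sanity check: a successful import confirms the module and
# its DLL dependencies were found; load_model confirms the weights path.
import poseEstimationPipeline_python_module as pipeline
pipeline.load_model("./poseEstimationPipeline.bin")
print("module loaded OK")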

Functions

def detect_pose(frame, conf_threshold=0.85, nms_threshold=0.45, max_count=10)

Detect keypoints for multiple persons in a given frame. Expects BGR-format UINT8 data, generally coming from cv2.imread or cv2.VideoCapture()-based sources.

Inputs:

frame: [Height, Width, 3] numpy array containing UINT8 data in BGR format; RGB data may work, but BGR is expected.

conf_threshold: float confidence threshold; only objects with confidence above this threshold are returned.

nms_threshold: float threshold used to suppress overlapping bounding boxes during Non-Maximum Suppression.

max_count: int maximum number of poses/persons that can be detected in a single image/frame.

Returns:

keypoints: numpy array of float32 data with shape [number_persons_detected, 17, 3], where keypoints[i, j, :] = [y, x, confidence]; the (y, x) coordinates can be plotted directly on the given frame.
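
For instance, the returned array can be split per person and filtered by confidence. A sketch (paths mirror the Usage section; the 0.2 threshold is the empirical value used there):

import numpy as np
import cv2
import poseEstimationPipeline_python_module as pipeline

pipeline.load_model("./poseEstimationPipeline.bin")
keypoints = pipeline.detect_pose(cv2.imread("./test.jpg"))

# Each person's [17, 3] block has rows of [y, x, confidence].
for i, person in enumerate(keypoints):
    confident = person[person[:, 2] > 0.2]     # keep confidently detected joints only.
    yx = confident[:, :2].astype(np.int32)     # integer (y, x) pixel coordinates.
    print("person {}: {} confident joints".format(i, len(yx)))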

def load_model(weightsPath)

Initialize the model and load weights from the path specified by the weightsPath argument.

Inputs:

weightsPath: str, path to the file to load the weights from (e.g. /path/to/weights), generally with a .bin extension.
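
A defensive loading sketch (load_model's behaviour on a missing file is not documented here, so the explicit check is an assumption):

import os.path
import poseEstimationPipeline_python_module as pipeline

weightsPath = "./poseEstimationPipeline.bin"    # typically a .bin file.
if not os.path.isfile(weightsPath):             # fail early with a clear error
    raise FileNotFoundError(weightsPath)        # instead of relying on the module.
pipeline.load_model(weightsPath)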