Package poseEstimationPipeline_python_module
Overview
The PoseEstimation pipeline detects persons in a frame/image and estimates their respective poses. Since it employs a dedicated object detector to locate persons as the first stage of the pipeline, it can be used to estimate poses in dynamic real-time conditions with high accuracy.
The object-detector architecture uses MobileNetV2 as its backbone and a YOLO-style head for anchor generation and final bounding-box detection. The pose-estimation architecture uses MobileNetV2 as its backbone, followed by a simple encoder-decoder architecture for final keypoint estimation.
The model handles all the preprocessing and post-processing steps itself: it just expects a frame of BGR [UINT8] data and returns the keypoint coordinates in the space of the frame passed to it, hence providing truly plug-and-play functionality even inside an existing user-defined pipeline.
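Because the model consumes raw BGR frames and returns coordinates in the same frame space, it can be dropped into an existing capture loop with no extra resizing or color conversion on the caller's side. A minimal sketch, assuming the module and weight-file names from the Usage section below; the camera index and the 0.2 threshold are illustrative:

import cv2
import poseEstimationPipeline_python_module as pipeline

pipeline.load_model("./poseEstimationPipeline.bin")  # One-time initialization.

cap = cv2.VideoCapture(0)  # Illustrative source; any BGR uint8 frame source works.
while True:
    ok, frame = cap.read()
    if not ok:
        break
    keypoints = pipeline.detect_pose(frame)  # [N, 17, 3] array of [y, x, confidence].
    for y, x, confidence in keypoints.reshape(-1, 3):
        if confidence > 0.2:  # Empirical threshold, as in the Usage section.
            cv2.circle(frame, (int(x), int(y)), 2, (0, 255, 255), -1)
    cv2.imshow("pose", frame)
    if cv2.waitKey(1) & 0xFF == ord("q"):
        break
cap.release()
cv2.destroyAllWindows()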
Features
- Runs faster than real time even at full-HD resolutions.
- Works with all versions of Python 3 (tested with Python 3 only).
- Highly portable, being a compiled Python module.
- Minimal dependencies (all dependencies are standard DLLs).
- The only expected Python dependencies are numpy and opencv (for reading/displaying images).
Limitations
- Estimating poses for more than one person slows down the pipeline, since the pose-estimation stage runs separately for each detected person; latency can be bounded by capping max_count (see the timing sketch under Benchmarks).
Benchmarks
Hardware | OS | Time* (ms)
---|---|---
Intel i5-8300H CPU @ 2.30GHz | Windows | 26-27
Intel i5-8300H CPU @ 2.30GHz + Nvidia GTX 1050 (hybrid) | Windows | < 19
- *Time measured in the Python runtime, averaged over 100 loops.
- *Time measured with inputs up to HD resolution.
- (All inputs are resized to the fixed input size specified by the model before further processing.)
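A sketch of a measurement loop matching the methodology above (100-loop average in the Python runtime). The warm-up call and the max_count value are assumptions, not part of the published setup:

import time
import cv2
import poseEstimationPipeline_python_module as pipeline

pipeline.load_model("./poseEstimationPipeline.bin")
frame = cv2.imread("./test.jpg")  # Any input up to HD resolution.

pipeline.detect_pose(frame)  # Warm-up call, excluded from the timing below.

loops = 100
start = time.perf_counter()
for _ in range(loops):
    pipeline.detect_pose(frame, max_count=1)  # Capping poses keeps latency bounded.
avg_ms = (time.perf_counter() - start) * 1000.0 / loops
print("Average latency over {} loops: {:.2f} ms".format(loops, avg_ms))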
Usage
import numpy as np
import cv2

import poseEstimationPipeline_python_module as pipeline          # CPU-based module.
# import poseEstimationPipeline_hybrid_python_module as pipeline # Hybrid (CPU + GPU(CUDA)) module.

# Keypoint index pairs defining the skeleton limbs to draw.
joint_pairs = [[0, 1], [1, 3], [0, 2], [2, 4],
               [5, 6], [5, 7], [7, 9], [6, 8],
               [8, 10], [5, 11], [6, 12], [11, 12],
               [11, 13], [12, 14], [13, 15], [14, 16]]

frame = cv2.imread("./test.jpg")  # Read frame/image [BGR uint8 data].

pipeline.load_model("./poseEstimationPipeline.bin")  # Initialize model and load weights.

# Maximum number of persons/poses to detect. Pose estimation runs separately for
# each detected person, so higher values lead to higher latency. Default is 10.
MAX_NUMBER_POSE = 2
keypoints = pipeline.detect_pose(frame, max_count=MAX_NUMBER_POSE)

pose_count = keypoints.shape[0]
print("Number of persons detected: {}".format(pose_count))

# Plot keypoints on the frame.
keypoint_threshold = 0.2  # Empirical value.
for count in range(pose_count):
    for pair in joint_pairs:
        y1_k = int(keypoints[count][pair[0], 0])
        x1_k = int(keypoints[count][pair[0], 1])
        confidence_1 = keypoints[count][pair[0], 2]
        y2_k = int(keypoints[count][pair[1], 0])
        x2_k = int(keypoints[count][pair[1], 1])
        confidence_2 = keypoints[count][pair[1], 2]
        # Draw a limb only when both endpoints are confident enough.
        if confidence_1 > keypoint_threshold and confidence_2 > keypoint_threshold:
            cv2.line(frame, (x1_k, y1_k), (x2_k, y2_k), (255, 255, 0), 2)
            cv2.circle(frame, (x1_k, y1_k), 1, (0, 255, 255), 2)
            cv2.circle(frame, (x2_k, y2_k), 1, (0, 255, 255), 2)

cv2.imshow("pose", frame)
cv2.waitKey(0)
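Note that keypoints are returned as (y, x) pairs, while OpenCV's drawing functions expect (x, y); the snippet above therefore swaps the order when calling cv2.line and cv2.circle.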
Resources:
- Pose-Estimation and tracking: https://arxiv.org/abs/1804.06208
- MobileNet: https://arxiv.org/abs/1704.04861
- YOLO: https://arxiv.org/abs/1506.02640
How to install:
Once the SDK has been downloaded into a local directory, follow these steps:
- cd into the directory, i.e. to the root of the directory.
- Make sure all the requirements have been fulfilled, as stated in requirements.
- On Windows, run the following command at the root of the directory: pip install .
- On Linux, run the following command at the root of the directory: pip install .
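To verify the installation, importing the module from outside the SDK directory should succeed; a minimal check (use the hybrid module name instead if that variant was installed):

import poseEstimationPipeline_python_module as pipeline  # Raises ImportError if the install failed.
print("Loaded:", pipeline.__name__)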
Functions
def detect_pose(frame, conf_threshold=0.85, nms_threshold=0.45, max_count=10)
-
Detect keypoints for multiple persons in a given frame. Expects UINT8 data in BGR format, generally resulting from cv2.imread or cv2.VideoCapture() based sources.
Inputs:
- frame: [Height, Width, 3] numpy array containing UINT8 data in BGR format (may work with RGB data, but BGR is expected).
- conf_threshold: float; only detections with confidence above this threshold are returned.
- nms_threshold: float; threshold used to suppress overlapping bounding boxes during Non-Maximum Suppression.
- max_count: int; maximum number of poses/persons that can be detected in a single image/frame.
Returns:
- keypoints: numpy array of float32 data with shape [number_persons_detected, 17, 3], where keypoints[i, j, :] = [y, x, confidence]; the (y, x) coordinates can be plotted directly on the given frame.
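For illustration, a hedged example of tuning these thresholds and consuming the result. The threshold values are arbitrary, and the nose index assumes the standard 17-keypoint COCO ordering implied by the joint_pairs list in the Usage section:

# Looser confidence keeps weaker detections; a lower NMS threshold suppresses overlaps more aggressively.
keypoints = pipeline.detect_pose(frame, conf_threshold=0.6, nms_threshold=0.3, max_count=5)
for person in keypoints:    # person: [17, 3] array of [y, x, confidence] rows.
    y, x, conf = person[0]  # Index 0 is the nose, assuming COCO keypoint ordering.
    print("Nose at ({:.0f}, {:.0f}) with confidence {:.2f}".format(x, y, conf))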
def load_model(weightsPath)
-
Initialize the model and load weights from the path specified by the weightsPath argument.
Inputs:
- weightsPath: str; path to the file to load weights from (/path/to/weights), generally with a .bin extension.
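A typical pattern is to call load_model once at startup and reuse the loaded weights for every subsequent detect_pose call; a minimal sketch with illustrative paths:

import cv2
import poseEstimationPipeline_python_module as pipeline

pipeline.load_model("./poseEstimationPipeline.bin")  # Load weights once at startup.
keypoints = pipeline.detect_pose(cv2.imread("./test.jpg"))  # Reuses the loaded model.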