Tensor RT #285 (Open)
bushibushi wants to merge 55 commits into CMU-Perceptual-Computing-Lab:master from Deepomatic:TensorRT_PR (base: master)
Commits (55 total; the diff below shows changes from 38 commits)
All commits by bushibushi:

- 564aece Files for tensort rt pose detection, for now nothing done.
- dfc1f82 Adding timer in new demo and checking build before replacing inference.
- a4885e0 PoseExtractorTensorRT changed names for build conflicts but still per…
- c05580d Started modifying tutorial pose 3.
- 9a97e93 More precise timing.
- 4778ed6 More precise timings before replacing inference.
- 9c258b7 Clearer timing display.
- e6fbd25 Replaced poseExtractorCaffe with poseExtractorTensorRT
- f290fc5 Added inference sample code at end of poseExtractorTensorRT to work o…
- ddc2396 First code adaptation trial. Will not compile, still loads to replace.
- f09f27b New netTensorRT version, cleaner, ready for debug, loads of questions.
- ba2b435 Fixed everything to compile, runs, reads network and convert but then…
- 97bbc05 Debug logs.
- c666163 First try on tensorRT inference with caffe Blobs.
- 1c77534 Running, but not pose recognition. Find a way to copy memory correctly.
- 1380b14 pose.sh script
- 32f5387 Timing in original pose demo
- d2310db Did not take into account forwardPass input data !
- 576c055 Data copied to cuda memory. Correct sizes hardcoded, no CUDA error an…
- e5d27fe Tutorial pose 3 working !!!! Gaining x2 inference time, now time for …
- 7d37095 TensorRT Net input and output dimensions at runtime.
- f3a898c NetTensorRT cleaning.
- 5c630b5 NetTensorRT cleaning bis.
- a617583 Cleaning compilation fix.
- d3a31e0 caffeToGIE needs fixed input size and cannot be determined at runtime…
- f6df326 Engine serialization and deserialization.
- 404077a Targetting highest possible FPS in demo.
- 1971baa Asynchronous inference.
- 330d4bb Way simpler inference code, a lot was useless.
- c2be9aa Removing log to speedup inference.
- 89e3b44 ResizeAndMergeBase CPU version.
- b54ae11 Inference model for pose net size 256x192
- 7808f89 Detailed poseExtractor Timings.
- 8023fb1 Faster Resize and Merge.
- b3ae8ec Merge branch 'master' into TensorRT_PR
- ec58a48 TENSORRT precompiler guards
- 33aa099 TENSORRT compilation is still partly using caffe
- 359b601 Missing guards for TensorRT
- d4a89d0 PIMPL version of poseExtractorTensorRT, still having template compila…
- 766c44a Spot the differences part 1.
- c12dd28 Spot the differences part 2
- 047d18b Spot the differences 3
- 9e4d903 Fixed compilation without TensorRT
- b3655d0 Fix attempt
- e76dc71 Wrong variable name
- bbe83e9 Merge branch 'master' into TensorRT_PR
- 6456dff Too much changed in poseExtractorCaffe, need to rewrite TensorRT one …
- b3673e6 PIMPL for netTensorRT
- 273a351 Fix source issues, example remains.
- ca682c4 Fix samples
- cb0d440 Compilation fixed, TensorRT net optimisation works, segfault on infer…
- 827510b Code kind of work, not full pipeline lead to no shape displayed, size…
- a1619fa Useless preproc macros
- 344ab67 NetTensorRT modifs
- b9c33c9 Merge branch 'master' into TensorRT_PR
examples/tutorial_pose/3_extract_from_image_TensorRT.cpp (174 additions, 0 deletions)

@@ -0,0 +1,174 @@
// ------------------------- OpenPose Library Tutorial - Pose - Example 3 - Extract from Image with TensorRT -------------------------
// This example shows the user how to:
//     1. Load an image (`filestream` module)
//     2. Extract the pose of that image (`pose` module)
//     3. Render the pose on a resized copy of the input image (`pose` module)
//     4. Display the rendered pose (`gui` module)
// In addition to the previous OpenPose modules, we also need to use:
//     1. `core` module: for the Array<float> class that the `pose` module needs
//     2. `utilities` module: for the error & logging functions, i.e. op::error & op::log respectively

// 3rdparty dependencies
#include <gflags/gflags.h> // DEFINE_bool, DEFINE_int32, DEFINE_int64, DEFINE_uint64, DEFINE_double, DEFINE_string
#include <glog/logging.h> // google::InitGoogleLogging
// OpenPose dependencies
#include <openpose/core/headers.hpp>
#include <openpose/filestream/headers.hpp>
#include <openpose/gui/headers.hpp>
#include <openpose/pose/headers.hpp>
#include <openpose/utilities/headers.hpp>

// See all the available parameter options with the `--help` flag. E.g. `./build/examples/openpose/openpose.bin --help`.
// Note: This command will show you flags for other unnecessary 3rdparty files. Check only the flags for the OpenPose
// executable. E.g. for `openpose.bin`, look for `Flags from examples/openpose/openpose.cpp:`.
// Debugging
DEFINE_int32(logging_level,     3,      "The logging level. Integer in the range [0, 255]. 0 will output any log() message, while"
                                        " 255 will not output any. Current OpenPose library messages are in the range 0-4: 1 for"
                                        " low priority messages and 4 for important ones.");
// Producer
DEFINE_string(image_path,       "examples/media/COCO_val2014_000000000192.jpg", "Process the desired image.");
// OpenPose
DEFINE_string(model_pose,       "COCO", "Model to be used. E.g. `COCO` (18 keypoints), `MPI` (15 keypoints, ~10% faster), "
                                        "`MPI_4_layers` (15 keypoints, even faster but less accurate).");
DEFINE_string(model_folder,     "models/", "Folder path (absolute or relative) where the models (pose, face, ...) are located.");
DEFINE_string(net_resolution,   "128x96", "Multiples of 16. If it is increased, the accuracy potentially increases. If it is decreased,"
                                        " the speed increases. For maximum speed-accuracy balance, it should keep the closest aspect"
                                        " ratio possible to the images or videos to be processed. E.g. the default `128x96` is"
                                        " optimal for 16:9 videos, e.g. full HD (1920x1080) and HD (1280x720) videos.");
DEFINE_string(resolution,       "1280x720", "The image resolution (display and output). Use \"-1x-1\" to force the program to use the"
                                        " default images resolution.");
DEFINE_int32(num_gpu_start,     0,      "GPU device start number.");
DEFINE_double(scale_gap,        0.3,    "Scale gap between scales. No effect unless scale_number > 1. Initial scale is always 1."
                                        " If you want to change the initial scale, you actually want to multiply the"
                                        " `net_resolution` by your desired initial scale.");
DEFINE_int32(scale_number,      1,      "Number of scales to average.");
// OpenPose Rendering
DEFINE_bool(disable_blending,   false,  "If blending is enabled, it will merge the results with the original frame. If disabled, it"
                                        " will only display the results on a black background.");
DEFINE_double(render_threshold, 0.05,   "Only estimated keypoints whose score confidences are higher than this threshold will be"
                                        " rendered. Generally, a high threshold (> 0.5) will only render very clear body parts;"
                                        " while small thresholds (~0.1) will also output guessed and occluded keypoints, but also"
                                        " more false positives (i.e. wrong detections).");
DEFINE_double(alpha_pose,       0.6,    "Blending factor (range 0-1) for the body part rendering. 1 will show it completely, 0 will"
                                        " hide it. Only valid for GPU rendering.");

typedef std::vector<std::pair<std::string, std::chrono::high_resolution_clock::time_point>> OpTimings;

static OpTimings timings;

static void timeNow(const std::string& label)
{
    const auto now = std::chrono::high_resolution_clock::now();
    const auto timing = std::make_pair(label, now);
    timings.push_back(timing);
}

static std::string timeDiffToString(const std::chrono::high_resolution_clock::time_point& t1,
                                    const std::chrono::high_resolution_clock::time_point& t2)
{
    return std::to_string(std::chrono::duration_cast<std::chrono::duration<double>>(t1 - t2).count() * 1e3) + " ms";
}

int openPoseTutorialPose3()
{
#ifdef USE_TENSORRT
    op::log("Starting pose estimation.", op::Priority::High);

    timeNow("Start");

    op::log("OpenPose Library Tutorial - Pose Example 3.", op::Priority::High);
    // ------------------------- INITIALIZATION -------------------------
    // Step 1 - Set logging level
    //     - 0 will output all the logging messages
    //     - 255 will output nothing
    op::check(0 <= FLAGS_logging_level && FLAGS_logging_level <= 255, "Wrong logging_level value.", __LINE__, __FUNCTION__, __FILE__);
    op::ConfigureLog::setPriorityThreshold((op::Priority)FLAGS_logging_level);
    op::log("", op::Priority::Low, __LINE__, __FUNCTION__, __FILE__);
    // Step 2 - Read Google flags (user defined configuration)
    // outputSize
    const auto outputSize = op::flagsToPoint(FLAGS_resolution, "1280x720");
    // netInputSize
    const auto netInputSize = op::flagsToPoint(FLAGS_net_resolution, "128x96");
    // netOutputSize
    const auto netOutputSize = netInputSize;
    // poseModel
    const auto poseModel = op::flagsToPoseModel(FLAGS_model_pose);
    // Check no contradictory flags enabled
    if (FLAGS_alpha_pose < 0. || FLAGS_alpha_pose > 1.)
        op::error("Alpha value for blending must be in the range [0,1].", __LINE__, __FUNCTION__, __FILE__);
    if (FLAGS_scale_gap <= 0. && FLAGS_scale_number > 1)
        op::error("Incompatible flag configuration: scale_gap must be greater than 0 or scale_number = 1.", __LINE__, __FUNCTION__, __FILE__);
    // Logging
    op::log("", op::Priority::Low, __LINE__, __FUNCTION__, __FILE__);
    // Step 3 - Initialize all required classes
    op::CvMatToOpInput cvMatToOpInput{netInputSize, FLAGS_scale_number, (float)FLAGS_scale_gap};
    op::CvMatToOpOutput cvMatToOpOutput{outputSize};
    op::PoseExtractorTensorRT poseExtractorTensorRT{netInputSize, netOutputSize, outputSize, FLAGS_scale_number, poseModel,
                                                    FLAGS_model_folder, FLAGS_num_gpu_start};
    op::PoseRenderer poseRenderer{netOutputSize, outputSize, poseModel, nullptr, (float)FLAGS_render_threshold,
                                  !FLAGS_disable_blending, (float)FLAGS_alpha_pose};
    op::OpOutputToCvMat opOutputToCvMat{outputSize};
    const op::Point<int> windowedSize = outputSize;
    op::FrameDisplayer frameDisplayer{windowedSize, "OpenPose Tutorial - Example 3"};
    // Step 4 - Initialize resources on desired thread (in this case single thread, i.e. we init resources here)
    poseExtractorTensorRT.initializationOnThread();
    poseRenderer.initializationOnThread();

    timeNow("Initialization");

    // ------------------------- POSE ESTIMATION AND RENDERING -------------------------
    // Step 1 - Read and load image, error if empty (possibly wrong path)
    cv::Mat inputImage = op::loadImage(FLAGS_image_path, CV_LOAD_IMAGE_COLOR); // Alternative: cv::imread(FLAGS_image_path, CV_LOAD_IMAGE_COLOR);
    if (inputImage.empty())
        op::error("Could not open or find the image: " + FLAGS_image_path, __LINE__, __FUNCTION__, __FILE__);
    timeNow("Step 1");
    // Step 2 - Format input image to OpenPose input and output formats
    op::Array<float> netInputArray;
    std::vector<float> scaleRatios;
    std::tie(netInputArray, scaleRatios) = cvMatToOpInput.format(inputImage);
    double scaleInputToOutput;
    op::Array<float> outputArray;
    std::tie(scaleInputToOutput, outputArray) = cvMatToOpOutput.format(inputImage);
    timeNow("Step 2");
    // Step 3 - Estimate poseKeypoints
    poseExtractorTensorRT.forwardPass(netInputArray, {inputImage.cols, inputImage.rows}, scaleRatios);
    const auto poseKeypoints = poseExtractorTensorRT.getPoseKeypoints();
    timeNow("Step 3");
    // Step 4 - Render poseKeypoints
    poseRenderer.renderPose(outputArray, poseKeypoints);
    timeNow("Step 4");
    // Step 5 - OpenPose output format to cv::Mat
    auto outputImage = opOutputToCvMat.formatToCvMat(outputArray);
    timeNow("Step 5");

    // ------------------------- SHOWING RESULT AND CLOSING -------------------------
    // Step 1 - Show results
    frameDisplayer.displayFrame(outputImage, 0); // Alternative: cv::imshow(outputImage) + cv::waitKey(0)
    // Step 2 - Logging information message
    op::log("Example 3 successfully finished.", op::Priority::High);

    // timeDiffToString already appends the "ms" unit
    const auto totalTime = timeDiffToString(timings.back().second, timings.front().second);
    const auto message = "Pose estimation successfully finished. Total time: " + totalTime + ".";
    op::log(message, op::Priority::High);

    for (OpTimings::iterator timing = timings.begin() + 1; timing != timings.end(); ++timing)
    {
        const auto log_time = (*timing).first + " - " + timeDiffToString((*timing).second, (*(timing - 1)).second);
        op::log(log_time, op::Priority::High);
    }

#endif // USE_TENSORRT

    // Return successful message
    return 0;
}

int main(int argc, char *argv[])
{
    // Initializing google logging (Caffe uses it for logging)
    google::InitGoogleLogging("openPoseTutorialPose3");

    // Parsing command line flags
    gflags::ParseCommandLineFlags(&argc, &argv, true);

    // Running openPoseTutorialPose3
    return openPoseTutorialPose3();
}
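The example above threads timeNow() calls through the pipeline and prints per-step deltas at the end. That pattern can be sketched in isolation; this is a self-contained version with no OpenPose dependency (names mirror the tutorial helpers, but the totalElapsedMs helper is an addition for illustration):

```cpp
#include <cassert>
#include <chrono>
#include <string>
#include <thread>
#include <utility>
#include <vector>

using Clock = std::chrono::high_resolution_clock;
using OpTimings = std::vector<std::pair<std::string, Clock::time_point>>;

static OpTimings timings;

// Record a labelled timestamp, as the tutorial does around each pipeline step.
static void timeNow(const std::string& label)
{
    timings.emplace_back(label, Clock::now());
}

// Difference t1 - t2 in milliseconds, formatted like the tutorial's helper.
static std::string timeDiffToString(const Clock::time_point& t1, const Clock::time_point& t2)
{
    const auto ms = std::chrono::duration_cast<std::chrono::duration<double>>(t1 - t2).count() * 1e3;
    return std::to_string(ms) + " ms";
}

// Total elapsed time between the first and last recorded label, in milliseconds.
static double totalElapsedMs()
{
    return std::chrono::duration_cast<std::chrono::duration<double>>(
        timings.back().second - timings.front().second).count() * 1e3;
}
```

Printing the label of each entry together with the delta to the previous entry, as the tutorial's final loop does, gives a cheap per-step profile without any external profiler.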
New file (60 additions):

@@ -0,0 +1,60 @@
#ifdef USE_TENSORRT
#ifndef OPENPOSE_CORE_NET_TENSORRT_HPP
#define OPENPOSE_CORE_NET_TENSORRT_HPP

#include <caffe/net.hpp>
#include <openpose/core/common.hpp>
#include <openpose/core/net.hpp>

#include "NvInfer.h"

namespace op
{
    class OP_API NetTensorRT : public Net
    {
    public:
        NetTensorRT(const std::array<int, 4>& netInputSize4D, const std::string& caffeProto,
                    const std::string& caffeTrainedModel, const int gpuId = 0,
                    const std::string& lastBlobName = "net_output");

        virtual ~NetTensorRT();

        void initializationOnThread();

        // Alternative a) getInputDataCpuPtr or getInputDataGpuPtr + forwardPass
        float* getInputDataCpuPtr() const;

        float* getInputDataGpuPtr() const;

        // Alternative b)
        void forwardPass(const float* const inputNetData = nullptr) const;

        boost::shared_ptr<caffe::Blob<float>> getOutputBlob() const;

    private:
        // Init with constructor
        const int mGpuId;
        const std::array<int, 4> mNetInputSize4D;
        std::array<int, 4> mNetOutputSize4D;
        const unsigned long mNetInputMemory;
        const std::string mCaffeProto;
        const std::string mCaffeTrainedModel;
        const std::string mLastBlobName;
        // Init with thread
        boost::shared_ptr<caffe::Blob<float>> spInputBlob;
        boost::shared_ptr<caffe::Blob<float>> spOutputBlob;

        // TensorRT stuff
        nvinfer1::ICudaEngine* cudaEngine;
        nvinfer1::IExecutionContext* cudaContext;
        nvinfer1::ICudaEngine* caffeToGIEModel();
        nvinfer1::ICudaEngine* createEngine();
        cudaStream_t stream;
        cudaEvent_t start, end;

        DELETE_COPY(NetTensorRT);
    };
}

#endif // OPENPOSE_CORE_NET_TENSORRT_HPP
#endif // USE_TENSORRT
New file (52 additions):

@@ -0,0 +1,52 @@
#ifdef USE_TENSORRT
#ifndef OPENPOSE_POSE_POSE_EXTRACTOR_TENSORRT_HPP
#define OPENPOSE_POSE_POSE_EXTRACTOR_TENSORRT_HPP

#include <caffe/blob.hpp>
#include <openpose/core/common.hpp>
#include <openpose/core/net.hpp>
#include <openpose/core/nmsCaffe.hpp>
#include <openpose/core/resizeAndMergeCaffe.hpp>
#include <openpose/pose/bodyPartConnectorCaffe.hpp>
#include <openpose/pose/enumClasses.hpp>
#include <openpose/pose/poseExtractor.hpp>

namespace op
{
    class OP_API PoseExtractorTensorRT : public PoseExtractor
    {
    public:
        PoseExtractorTensorRT(const Point<int>& netInputSize, const Point<int>& netOutputSize,
                              const Point<int>& outputSize, const int scaleNumber,
                              const PoseModel poseModel, const std::string& modelFolder, const int gpuId,
                              const std::vector<HeatMapType>& heatMapTypes = {},
                              const ScaleMode heatMapScale = ScaleMode::ZeroToOne);

        virtual ~PoseExtractorTensorRT();

        void netInitializationOnThread();

        void forwardPass(const Array<float>& inputNetData, const Point<int>& inputDataSize,
                         const std::vector<float>& scaleRatios = {1.f});

        const float* getHeatMapCpuConstPtr() const;

        const float* getHeatMapGpuConstPtr() const;

        const float* getPoseGpuConstPtr() const;

    private:
        const float mResizeScale;
        std::shared_ptr<Net> spNet;
        std::shared_ptr<ResizeAndMergeCaffe<float>> spResizeAndMergeTensorRT;
        std::shared_ptr<NmsCaffe<float>> spNmsTensorRT;
        std::shared_ptr<BodyPartConnectorCaffe<float>> spBodyPartConnectorTensorRT;
        // Init with thread
        boost::shared_ptr<caffe::Blob<float>> spTensorRTNetOutputBlob;
        std::shared_ptr<caffe::Blob<float>> spHeatMapsBlob;
        std::shared_ptr<caffe::Blob<float>> spPeaksBlob;
        std::shared_ptr<caffe::Blob<float>> spPoseBlob;

        DELETE_COPY(PoseExtractorTensorRT);
    };
}

#endif // OPENPOSE_POSE_POSE_EXTRACTOR_TENSORRT_HPP
#endif // USE_TENSORRT

Review comment on the constructor (truncated in this capture): "Is the definition and implementation for …"
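The net_resolution flag documented earlier must be "Multiples of 16", and the commit "caffeToGIE needs fixed input size and cannot be determined at runtime…" notes that the TensorRT engine is additionally built for one fixed input size. A hedged sketch of parsing and validating such a "WIDTHxHEIGHT" flag; the helper names are illustrative, not functions from the PR:

```cpp
#include <cassert>
#include <cstdio>
#include <stdexcept>
#include <string>
#include <utility>

// Parse a "WIDTHxHEIGHT" string such as "128x96" into a (width, height) pair.
static std::pair<int, int> parseResolution(const std::string& flag)
{
    int width = -1;
    int height = -1;
    if (std::sscanf(flag.c_str(), "%dx%d", &width, &height) != 2)
        throw std::invalid_argument("Expected WIDTHxHEIGHT, got: " + flag);
    return {width, height};
}

// OpenPose requires net_resolution dimensions to be multiples of 16
// (see the net_resolution flag help in the tutorial example above).
static bool isValidNetResolution(int width, int height)
{
    return width > 0 && height > 0 && width % 16 == 0 && height % 16 == 0;
}
```

With a fixed-size engine, a check like this belongs before engine construction: an invalid or mismatched size cannot be corrected at runtime once the engine has been serialized.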
Review discussion (on the build configuration changes):

Reviewer: If DEEP_NET is `tensorrt` then the `else` clause is never reached, hence lines 73-76 are not needed. It looks like the libraries and dirs that are part of the else clause will still be needed for `tensorrt`, however, so those should be moved up.

Reply: For now TensorRT is included for the main inference in the middle of a pipeline using CAFFE; for example, I use caffe blobs for input and output. I think it's lines 64-65 that should be removed.