Assess parallelism opportunities in Nav2 #2042

Closed
SteveMacenski opened this issue Oct 14, 2020 · 11 comments

@SteveMacenski (Member) commented Oct 14, 2020

Using GPU, OpenMP, TBB, etc.:

  • DWB has an N-critics-over-M-trajectories structure that could be parallelized at 2 levels
  • AMCL / nav2 localization framework is built on a particle filter, whose particles are independently updated
  • Costmap layers can be updated separately and then combined (and sensor processing layers can process multiple measurement readings at once with raycasting/marking)
  • Anything in planning? Unclear, since the planners are search-based; a sampling-based planner might make better use of it
  • BT navigator recoveries: check planning/control validity during recovery execution so the recovery can be preempted
  • Nav2 dynamic tracking / detection / layer processing
  • Any new algorithms
  • TF transformations of things like pointclouds into a new frame (e.g. laser projections too)
  • Voxel grid: To support the above, we have a voxel grid library which could probably gain some internal optimizations from parallel computing to march through the grid
  • SmacPlanner* (soon): We collision check in increments along a motion primitive for the state lattice planner to make sure expansions are valid. E.g. we have long motion primitives that cannot be summarized by collision checking only the start and end; a few poses in between need checking, and collision checking isn't cheap.
  • RPP/Recoveries: In the RPP controller and the backup/spin recoveries, we forward-project in dt increments into the future given a velocity command and collision check each pose. This could be parallelized to collision check them all at once (see the sketch after this list).
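As a concrete illustration of the collision-checking items above, here is a minimal OpenMP sketch (hypothetical, not a Nav2 API): it batch-checks a set of forward-projected or motion-primitive poses, where Pose2D and the in_collision callable are placeholders for whatever footprint/costmap check the controller or recovery actually uses.

```cpp
// Hypothetical sketch only, not Nav2 code: batch collision checking of
// forward-projected poses or motion-primitive increments with OpenMP.
#include <cstddef>
#include <functional>
#include <vector>

struct Pose2D { double x, y, theta; };  // placeholder pose type

// Returns true if any pose in the batch is in collision. The in_collision
// callable stands in for whatever footprint/costmap check is used; it must
// be safe to call from multiple threads.
bool anyPoseInCollision(
  const std::vector<Pose2D> & poses,
  const std::function<bool(const Pose2D &)> & in_collision)
{
  bool collision = false;

  // Each check is independent, so the whole batch can be evaluated at once
  // and combined with a logical-OR reduction.
  #pragma omp parallel for reduction(|| : collision)
  for (std::ptrdiff_t i = 0; i < static_cast<std::ptrdiff_t>(poses.size()); ++i) {
    collision = collision || in_collision(poses[i]);
  }
  return collision;
}
```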
@simutisernestas (Contributor)

I've taken a look at the costmap layer updates out of curiosity. I'm not quite sure if this is a reasonable assessment, so here is a short description of what I did exactly. As I understand it, the heaviest work in the layers is done in ObstacleLayer::updateBounds, and my idea was to exploit OpenMP's "parallel for" directive.

So I've modified the first loop in the LayeredCostmap::updateMap function to support parallel updates. I've also stacked more layers onto the costmap to make the difference clearer. Changes are available here: simutisernestas@f0630c6.
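For context, a rough sketch of this kind of change (not the actual commit linked above) could look like the following. The Layer type here is a minimal stand-in for the nav2_costmap_2d layer interface, and the OpenMP min/max reductions keep each thread on private copies of the bounds:

```cpp
// Rough sketch of parallelizing the plugin loop, not the actual commit.
#include <memory>
#include <vector>

struct Bounds { double min_x, min_y, max_x, max_y; };

struct Layer  // minimal stand-in for nav2_costmap_2d::Layer
{
  virtual ~Layer() = default;
  virtual void updateBounds(
    double robot_x, double robot_y, double robot_yaw,
    double * min_x, double * min_y, double * max_x, double * max_y) = 0;
};

// Run every layer's updateBounds() in parallel. OpenMP min/max reductions
// give each thread private copies of the bounds and merge them afterwards,
// so the shared min/max values can't be corrupted by concurrent writes.
// A layer whose bounds depend on earlier layers' bounds (e.g. inflation)
// would still need to run after this loop.
Bounds parallelUpdateBounds(
  const std::vector<std::shared_ptr<Layer>> & plugins,
  double robot_x, double robot_y, double robot_yaw)
{
  double minx = 1e30, miny = 1e30, maxx = -1e30, maxy = -1e30;

  #pragma omp parallel for reduction(min : minx, miny) reduction(max : maxx, maxy)
  for (int i = 0; i < static_cast<int>(plugins.size()); ++i) {
    // Most layers only min/max-touch the values passed in, so private
    // per-thread copies accumulate each layer's region correctly.
    plugins[i]->updateBounds(robot_x, robot_y, robot_yaw, &minx, &miny, &maxx, &maxy);
  }
  return Bounds{minx, miny, maxx, maxy};
}
```

This assumes each layer's updateBounds() is safe to run concurrently with the others, which would need to be verified per layer.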

I've observed the following average map update times (TB3 simulation, Intel® Core™ i7-3770 CPU @ 3.40GHz × 8):

  • without "parallel for" ranges from 19ms - 24ms
  • with "parallel for" ranges from 12ms - 18ms

At the 5 Hz update rate, a 6 ms difference adds up to a 30 ms gain per second. I suspect that processing real-world data (for example a longer lidar range or a bigger pointcloud) would increase layer update time and make the effect seen here much more visible.

Would be cool to hear a second opinion. :)

@SteveMacenski (Member Author) commented Oct 26, 2020

I'm not sure that updateBounds is the best place for this, because all of the layers updating will be writing to those max/min i/j pointers. You'd need to make sure you handle those shared resources carefully so they don't get corrupted. I'd think updateCosts would be a good target too (but similarly, you'd need to be careful with the master_grid). I think OpenMP has some options like SHARED or something similar to deal with these cases. I'd try both updates.

I was also thinking of parallelizing the marking/clearing operations within the obstacle/voxel layers, since those are independent measurements and there are many of them (a QVGA sensor means thousands of iterations).
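A hedged sketch of that marking idea (placeholder types, not the actual ObstacleLayer code):

```cpp
// Hypothetical sketch, not ObstacleLayer code: marking each measurement
// point is independent, so the loop over points can be split across
// threads. Two threads may write the same lethal value to the same cell;
// if that overlap matters, per-thread buffers or atomics would be safer.
#include <cstdint>
#include <vector>

struct Point2D { double x, y; };

struct Grid  // placeholder for the layer's costmap members
{
  double origin_x, origin_y, resolution;
  unsigned int size_x, size_y;
  std::vector<uint8_t> cells;
};

constexpr uint8_t kLethal = 254;  // nav2_costmap_2d::LETHAL_OBSTACLE

void markPoints(const std::vector<Point2D> & points, Grid & grid)
{
  #pragma omp parallel for
  for (int i = 0; i < static_cast<int>(points.size()); ++i) {
    // Convert the world coordinate to a cell index and mark it lethal.
    const int mx = static_cast<int>((points[i].x - grid.origin_x) / grid.resolution);
    const int my = static_cast<int>((points[i].y - grid.origin_y) / grid.resolution);
    if (mx >= 0 && my >= 0 &&
      mx < static_cast<int>(grid.size_x) && my < static_cast<int>(grid.size_y))
    {
      grid.cells[my * grid.size_x + mx] = kLethal;
    }
  }
}
```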

30 ms isn't anything to sniff at. A lot of significant performance gains can be had by nickel-and-diming the system: 30 ms here, 30 ms there, and all of a sudden you're 2 or 3x faster. 6 ms on a 24 ms process is still a 25% improvement; that's a lot for such a little amount of work!

@abylikhsanov

Regarding our previous chat on this issue:
#2190

You mentioned that you would start from the "outer loop" first; can you please elaborate on what you meant?

@SteveMacenski (Member Author)

You mentioned an outer and an inner loop to try to parallelize in DWB; just start with the outer one only, then benchmark and add a PR. I think you'll find that one level will do most of the heavy lifting you require (e.g. DWB has the N-critics-over-M-trajectories structure that could be parallelized at 2 levels). I forget which is the outermost for loop in DWB, but I think it's the trajectory generator (e.g. M) generating the M trajectories of vx * vy samples; parallelize that first.
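To make "outer loop first" concrete, here is a hedged sketch, not actual DWB code; generate() and score() are placeholders for the trajectory generator and the critic scoring, and are assumed to be thread-safe:

```cpp
// Hedged sketch: parallelize only the outer loop over the M velocity
// samples; the inner loop over the N critics stays serial inside score().
#include <cstddef>
#include <functional>
#include <limits>
#include <vector>

struct Twist2D { double vx, vy, vtheta; };
struct Trajectory { std::vector<Twist2D> velocities; };  // placeholder

Twist2D pickBestCommand(
  const std::vector<Twist2D> & samples,                         // the M vx * vy samples
  const std::function<Trajectory(const Twist2D &)> & generate,  // forward simulation
  const std::function<double(const Trajectory &)> & score)      // sum of the N critics
{
  if (samples.empty()) {return Twist2D{0.0, 0.0, 0.0};}

  std::vector<double> scores(samples.size(),
    std::numeric_limits<double>::infinity());

  // Each candidate trajectory is generated and scored independently,
  // so the outer loop is the natural first level to parallelize.
  #pragma omp parallel for
  for (int i = 0; i < static_cast<int>(samples.size()); ++i) {
    scores[i] = score(generate(samples[i]));
  }

  // Picking the lowest-cost sample is cheap, so it stays serial.
  std::size_t best = 0;
  for (std::size_t i = 1; i < scores.size(); ++i) {
    if (scores[i] < scores[best]) {best = i;}
  }
  return samples[best];
}
```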

@SteveMacenski (Member Author)

@simutisernestas is there a reason we couldn't merge that OpenMP solution into costmap_2d?

@simutisernestas (Contributor)

If you're up for it, I would be happy to make a PR.

@SteveMacenski (Member Author)

Sure, it's a starting point! It only does updateBounds (not updateCosts), but you've shown some compelling speed-ups on just that!

@SteveMacenski (Member Author)

@abylikhsanov any progress to share?

@Parv-Maheshwari

  • Anything in planning? Unclear, since the planners are search-based; a sampling-based planner might make better use of it

Hi @SteveMacenski. I have worked on a sampling-based local planner in the Frenet frame for ROS 1, in which I used OpenMP and saw a five-fold increase in frequency while using just 8 threads.

So I wanted to know whether it would be possible to include our local planner as a controller plugin for Nav2. We would obviously add or change functionality according to Nav2's requirements.

I would also love to hear your thoughts on this and what we should or can do.

P.S. My team and I are open to porting our planner to ROS 2.

@SteveMacenski (Member Author)

Hi @Parv-Maheshwari, thanks for reaching out! I think that might be a better discussion to have in ticket #1710 instead. Can you continue the discussion there, explaining specifically what technique you've implemented that you'd be interested in contributing (and potentially a link, if it's already open sourced)?

@SteveMacenski (Member Author)

Closing for now. I've recently done some experiments on an Nvidia Jetson and was surprised how little CPU Nav2 was using with the full system running while processing 2 depth sensors. It looks like Nav2 is good enough as-is for embedded use; we don't need to speed things up a whole lot more to be perfectly suitable. DWB is the big area that can use the most help and is the thing causing problems, and we have another ticket open to handle that: #2045.
