[Router] Upstream Fine-Grained Parallel Router (FPT'24) #2920

Draft: wants to merge 12 commits into base: master
18 changes: 18 additions & 0 deletions doc/src/api/vprinternals/router_connection_router.rst
@@ -0,0 +1,18 @@
=================
Connection Router
=================

ConnectionRouter
----------------
.. doxygenfile:: connection_router.h
   :project: vpr

SerialConnectionRouter
----------------------
.. doxygenclass:: SerialConnectionRouter
   :project: vpr

ParallelConnectionRouter
------------------------
.. doxygenclass:: ParallelConnectionRouter
   :project: vpr
1 change: 1 addition & 0 deletions doc/src/api/vprinternals/vpr_router.rst
Expand Up @@ -9,3 +9,4 @@ VPR Router

router_heap
router_lookahead
router_connection_router
131 changes: 101 additions & 30 deletions doc/src/vpr/command_line_usage.rst
Expand Up @@ -47,12 +47,12 @@ By default VPR will perform a binary search routing to find the minimum channel

Detailed Command-line Options
-----------------------------
VPR has a lot of options. Running :option:`vpr --help` will display all the available options and their usage information.
VPR has a lot of options. Running :option:`vpr --help` will display all the available options and their usage information.

.. option:: -h, --help

Display help message then exit.

The options most people will be interested in are:

* :option:`--route_chan_width` (route at a fixed channel width), and
Expand Down Expand Up @@ -208,7 +208,7 @@ General Options
* Any string matching ``name`` attribute of a device layout defined with a ``<fixed_layout>`` tag in the :ref:`arch_grid_layout` section of the architecture file.

If the value specified is neither ``auto`` nor matches the ``name`` attribute value of a ``<fixed_layout>`` tag, VPR issues an error.

.. note:: If the only layout in the architecture file is a single device specified using ``<fixed_layout>``, it is recommended to always specify the ``--device`` option; this prevents the value ``--device auto`` from interfering with operations supported only for ``<fixed_layout>`` grids.

**Default:** ``auto``
Expand Down Expand Up @@ -892,7 +892,7 @@ If any of init_t, exit_t or alpha_t is specified, the user schedule, with a fixe

.. option:: --place_agent_algorithm {e_greedy | softmax}

Controls which placement RL agent is used.
Controls which placement RL agent is used.

**Default:** ``softmax``

Expand All @@ -914,10 +914,10 @@ If any of init_t, exit_t or alpha_t is specified, the user schedule, with a fixe

.. option:: --place_reward_fun {basic | nonPenalizing_basic | runtime_aware | WLbiased_runtime_aware}

The reward function used by the placement RL agent to learn the best action at each anneal stage.
The reward function used by the placement RL agent to learn the best action at each anneal stage.

.. note:: The latter two are only available for timing-driven placement.

.. note:: The latter two are only available for timing-driven placement.

**Default:** ``WLbiased_runtime_aware``

.. option:: --place_agent_space {move_type | move_block_type}
Expand All @@ -927,20 +927,20 @@ If any of init_t, exit_t or alpha_t is specified, the user schedule, with a fixe
**Default:** ``move_block_type``

.. option:: --place_quench_only {on | off}

If this option is set to ``on``, the placement will skip the annealing phase and only perform the placement quench.
This option is useful when the quality of initial placement is good enough and there is no need to perform the
This option is useful when the quality of initial placement is good enough and there is no need to perform the
annealing phase.

**Default:** ``off``


.. option:: --placer_debug_block <int>

.. note:: This option is likely only of interest to developers debugging the placement algorithm

Controls which block the placer produces detailed debug information for.
Controls which block the placer produces detailed debug information for.

If the block being moved has the same ID as the number assigned to this parameter, the placer will print debugging information about it.

* For values >= 0, the value is the block ID for which detailed placer debug information should be produced.
Expand All @@ -952,7 +952,7 @@ If any of init_t, exit_t or alpha_t is specified, the user schedule, with a fixe
**Default:** ``-2``

.. option:: --placer_debug_net <int>

.. note:: This option is likely only of interest to developers debugging the placement algorithm

Controls which net the placer produces detailed debug information for.
Expand Down Expand Up @@ -996,7 +996,7 @@ The following options are only valid when the placement engine is in timing-driv

.. option:: --quench_recompute_divider <int>

Controls how many times the placer performs a timing analysis to update its criticality estimates during a quench.
Controls how many times the placer performs a timing analysis to update its criticality estimates during a quench.
If unspecified, uses the value from --inner_loop_recompute_divider.

**Default:** ``0``
Expand Down Expand Up @@ -1080,7 +1080,7 @@ The following options are only valid when the placement engine is in timing-driv

NoC Options
^^^^^^^^^^^^^^
The following options are only used when FPGA device and netlist contain a NoC router.
The following options are only used when FPGA device and netlist contain a NoC router.

.. option:: --noc {on | off}

Expand All @@ -1090,15 +1090,15 @@ The following options are only used when FPGA device and netlist contain a NoC r
**Default:** ``off``

.. option:: --noc_flows_file <file>

XML file containing the list of traffic flows within the NoC (communication between routers).

.. note:: A ``--noc_flows_file`` must be specified if NoC optimization is turned on (``--noc on``).

.. option:: --noc_routing_algorithm {xy_routing | bfs_routing | west_first_routing | north_last_routing | negative_first_routing | odd_even_routing}

Controls the algorithm used by the NoC to route packets.

* ``xy_routing`` Uses the direction oriented routing algorithm. This is recommended to be used with mesh NoC topologies.
* ``bfs_routing`` Uses the breadth first search algorithm. The objective is to find a route that uses a minimum number of links. This algorithm is not guaranteed to generate deadlock-free traffic flow routes, but can be used with any NoC topology.
* ``west_first_routing`` Uses the west-first routing algorithm. This is recommended to be used with mesh NoC topologies.
Expand All @@ -1111,11 +1111,11 @@ The following options are only used when FPGA device and netlist contain a NoC r
.. option:: --noc_placement_weighting <float>

Controls the importance of the NoC placement parameters relative to timing and wirelength of the design.

* ``noc_placement_weighting = 0`` means the placement is based solely on timing and wirelength.
* ``noc_placement_weighting = 1`` means noc placement is considered equal to timing and wirelength.
* ``noc_placement_weighting > 1`` means the placement is increasingly dominated by NoC parameters.

**Default:** ``5.0``

.. option:: --noc_aggregate_bandwidth_weighting <float>
Expand All @@ -1133,7 +1133,7 @@ The following options are only used when FPGA device and netlist contain a NoC r
Other positive numbers specify the importance of meeting latency constraints compared to other NoC-related cost terms.
Weighting factors for NoC-related cost terms are normalized internally. Therefore, their absolute values are not important, and
only their relative ratios determine the importance of each cost term.

**Default:** ``0.6``
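
The statement that only the relative ratios of the NoC weighting factors matter can be illustrated with a minimal sketch. It assumes the normalization divides each weight by the sum of all weights; the function name ``normalize_noc_weights`` is hypothetical and is not taken from the VPR sources.

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical helper (not a VPR function): rescales the NoC cost-term
// weights so that they sum to one. Assumption: "normalized internally"
// means division by the sum of the weights.
inline std::vector<double> normalize_noc_weights(const std::vector<double>& weights) {
    const double sum = std::accumulate(weights.begin(), weights.end(), 0.0);
    assert(sum > 0.0); // at least one weight must be positive

    std::vector<double> normalized(weights.size());
    for (std::size_t i = 0; i < weights.size(); ++i) {
        normalized[i] = weights[i] / sum;
    }
    return normalized;
}
```

Under this assumption, the weight sets ``{1, 2, 1}`` and ``{2, 4, 2}`` normalize to the same values, so doubling every weighting option would leave the relative importance of each cost term unchanged.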

.. option:: --noc_latency_weighting <float>
Expand All @@ -1143,7 +1143,7 @@ The following options are only used when FPGA device and netlist contain a NoC r
Other positive numbers specify the importance of minimizing aggregate latency compared to other NoC-related cost terms.
Weighting factors for NoC-related cost terms are normalized internally. Therefore, their absolute values are not important, and
only their relative ratios determine the importance of each cost term.

**Default:** ``0.02``

.. option:: --noc_congestion_weighting <float>
Expand All @@ -1159,11 +1159,11 @@ The following options are only used when FPGA device and netlist contain a NoC r
.. option:: --noc_swap_percentage <float>

Sets the minimum fraction of swaps attempted by the placer that are NoC blocks.
This value is an integer ranging from [0-100].
* ``0`` means NoC blocks will be moved at the same rate as other blocks.
This value is an integer ranging from [0-100].

* ``0`` means NoC blocks will be moved at the same rate as other blocks.
* ``100`` means all swaps attempted by the placer are NoC router blocks.

**Default:** ``0``

.. option:: --noc_placement_file_name <file>
Expand Down Expand Up @@ -1249,7 +1249,7 @@ Analytical Placement is generally split into three stages:

* ``none`` Do not use any Detailed Placer.

* ``annealer`` Use the Annealer from the Placement stage as a Detailed Placer. This will use the same Placer Options from the Place stage to configure the annealer.
* ``annealer`` Use the Annealer from the Placement stage as a Detailed Placer. This will use the same Placer Options from the Place stage to configure the annealer.

**Default:** ``annealer``

Expand Down Expand Up @@ -1326,8 +1326,8 @@ VPR uses a negotiated congestion algorithm (based on Pathfinder) to perform rout

.. option:: --max_pres_fac <float>

Sets the maximum present overuse penalty factor that can ever result during routing. Should always be less than 1e25 or so to prevent overflow.
Smaller values may help prevent circuitous routing in difficult routing problems, but may increase
Sets the maximum present overuse penalty factor that can ever result during routing. Should always be less than 1e25 or so to prevent overflow.
Smaller values may help prevent circuitous routing in difficult routing problems, but may increase
the number of routing iterations needed and hence runtime.

**Default:** ``1000.0``
Expand Down Expand Up @@ -1406,7 +1406,7 @@ VPR uses a negotiated congestion algorithm (based on Pathfinder) to perform rout

.. option:: --router_algorithm {timing_driven | parallel | parallel_decomp}

Selects which router algorithm to use.
Selects which router algorithm to use.

* ``timing_driven`` is the default single-threaded PathFinder algorithm.

Expand Down Expand Up @@ -1488,13 +1488,84 @@ The following options are only valid when the router is in timing-driven mode (t
**Default:** ``0.0``

.. option:: --router_profiler_astar_fac <float>

Controls the directedness of the timing-driven router's exploration when doing router delay profiling of an architecture.
The router delay profiling step is currently used to calculate the place delay matrix lookup.
Values between 1 and 2 are reasonable; higher values trade some quality for reduced run-time.

**Default:** ``1.2``

.. option:: --enable_parallel_connection_router {on | off}

Controls whether the MultiQueue-based parallel connection router is used during a single connection routing.

When enabled, the parallel connection router accelerates the path search for individual source-sink connections using
multi-threading without altering the net routing order.

**Default:** ``off``

.. option:: --post_target_prune_fac <float>

Controls the post-target pruning heuristic calculation in the parallel connection router.

This parameter is used as a multiplicative factor applied to the VPR heuristic (not guaranteed to be admissible, i.e.,
might over-predict the cost to the sink) to calculate the 'stopping heuristic' when pruning nodes after the target has
been reached. The 'stopping heuristic' must be admissible for the path search algorithm to guarantee optimal paths and
be deterministic.

Values of this parameter are architecture-specific and have to be empirically found.

This parameter has no effect if :option:`--enable_parallel_connection_router` is not set.

**Default:** ``1.2``

.. option:: --post_target_prune_offset <float>

Controls the post-target pruning heuristic calculation in the parallel connection router.

This parameter is used as a subtractive offset together with :option:`--post_target_prune_fac` to apply an affine
transformation on the VPR heuristic to calculate the 'stopping heuristic'. The 'stopping heuristic' must be admissible
for the path search algorithm to guarantee optimal paths and be deterministic.

Values of this parameter are architecture-specific and have to be empirically found.

This parameter has no effect if :option:`--enable_parallel_connection_router` is not set.

**Default:** ``0.0``
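
The affine transformation described by :option:`--post_target_prune_fac` and :option:`--post_target_prune_offset` can be sketched as follows. This is an illustration only: the function name ``stopping_heuristic`` is hypothetical, and both the order of operations (offset subtracted first, then scaled) and the clamp at zero are assumptions rather than something stated in this documentation.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical helper (not a VPR function): derives the 'stopping heuristic'
// from the VPR heuristic estimate of the remaining cost to the sink.
// Assumptions: the offset is subtracted before scaling, and the result is
// clamped so that it never becomes negative.
inline float stopping_heuristic(float vpr_heuristic,
                                float post_target_prune_fac,
                                float post_target_prune_offset) {
    assert(post_target_prune_fac >= 0.0f);
    const float shifted = std::max(0.0f, vpr_heuristic - post_target_prune_offset);
    return post_target_prune_fac * shifted;
}
```

With the default values (factor ``1.2``, offset ``0.0``) this sketch simply scales the VPR heuristic by 1.2. Whether the result is admissible depends on the architecture, which is why both values have to be found empirically.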

.. option:: --multi_queue_num_threads <int>

Controls the number of threads used by the MultiQueue-based parallel connection router.

If not explicitly specified, defaults to 1, implying the parallel connection router works in 'serial' mode using only
one main thread to route.

This parameter has no effect if :option:`--enable_parallel_connection_router` is not set.

**Default:** ``1``

.. option:: --multi_queue_num_queues <int>

Controls the number of queues used by MultiQueue in the parallel connection router.

Must be set >= 2. A common configuration for this parameter is the number of threads used by MultiQueue * 4 (the number
of queues per thread).

This parameter has no effect if :option:`--enable_parallel_connection_router` is not set.

**Default:** ``2``
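
The common configuration mentioned above (four queues per thread, with a lower bound of two queues) can be expressed as a small helper. The name ``suggested_num_queues`` is hypothetical and is not part of VPR; it only shows how a value for :option:`--multi_queue_num_queues` might be derived from :option:`--multi_queue_num_threads`.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical helper (not a VPR function): suggests a queue count from the
// thread count using the "threads * 4" rule of thumb, while respecting the
// documented minimum of 2 queues.
inline int suggested_num_queues(int num_threads, int queues_per_thread = 4) {
    assert(num_threads >= 1 && queues_per_thread >= 1);
    return std::max(2, num_threads * queues_per_thread);
}
```

For example, ``suggested_num_queues(4)`` evaluates to 16, which would correspond to passing ``--multi_queue_num_threads 4 --multi_queue_num_queues 16``.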

.. option:: --multi_queue_direct_draining {on | off}

Controls whether to enable the queue draining optimization for the MultiQueue-based parallel connection router.

When enabled, queues can be emptied quickly by draining all elements if no further solutions need to be explored in the
path search to guarantee optimality or determinism after reaching the target.
Review comment (Contributor):

I think this may be confusing and needs more explanation. The queue draining optimization requires the ordering heuristic (astar fac/offset) to be admissible in order to be deterministic.

The wording above also makes it sound like you can only use this if optimality and determinism are required; however, this optimization can be turned on even if you do not care about either of these.

This parameter has no effect if :option:`--enable_parallel_connection_router` is not set.

**Default:** ``off``

.. option:: --max_criticality <float>

Sets the maximum fraction of routing cost that can come from delay (vs. coming from routability) for any net.
Expand Down
3 changes: 2 additions & 1 deletion utils/route_diag/src/main.cpp
Expand Up @@ -97,7 +97,8 @@ static void do_one_route(const Netlist<>& net_list,
segment_inf,
is_flat);

ConnectionRouter<FourAryHeap> router(
// TODO: adding tests for parallel connection router
Review comment (Contributor): Is this TODO still valid?

SerialConnectionRouter<FourAryHeap> router(
device_ctx.grid,
*router_lookahead,
device_ctx.rr_graph.rr_nodes(),
Expand Down
6 changes: 6 additions & 0 deletions vpr/src/base/SetupVPR.cpp
Expand Up @@ -433,6 +433,12 @@ static void SetupRouterOpts(const t_options& Options, t_router_opts* RouterOpts)
RouterOpts->astar_fac = Options.astar_fac;
RouterOpts->astar_offset = Options.astar_offset;
RouterOpts->router_profiler_astar_fac = Options.router_profiler_astar_fac;
RouterOpts->enable_parallel_connection_router = Options.enable_parallel_connection_router;
RouterOpts->post_target_prune_fac = Options.post_target_prune_fac;
RouterOpts->post_target_prune_offset = Options.post_target_prune_offset;
RouterOpts->multi_queue_num_threads = Options.multi_queue_num_threads;
RouterOpts->multi_queue_num_queues = Options.multi_queue_num_queues;
RouterOpts->multi_queue_direct_draining = Options.multi_queue_direct_draining;
RouterOpts->bb_factor = Options.bb_factor;
RouterOpts->criticality_exp = Options.criticality_exp;
RouterOpts->max_criticality = Options.max_criticality;
Expand Down
6 changes: 6 additions & 0 deletions vpr/src/base/ShowSetup.cpp
Expand Up @@ -379,6 +379,12 @@ static void ShowRouterOpts(const t_router_opts& RouterOpts) {
VTR_LOG("RouterOpts.astar_fac: %f\n", RouterOpts.astar_fac);
VTR_LOG("RouterOpts.astar_offset: %f\n", RouterOpts.astar_offset);
VTR_LOG("RouterOpts.router_profiler_astar_fac: %f\n", RouterOpts.router_profiler_astar_fac);
VTR_LOG("RouterOpts.enable_parallel_connection_router: %s\n", RouterOpts.enable_parallel_connection_router ? "true" : "false");
VTR_LOG("RouterOpts.post_target_prune_fac: %f\n", RouterOpts.post_target_prune_fac);
VTR_LOG("RouterOpts.post_target_prune_offset: %f\n", RouterOpts.post_target_prune_offset);
VTR_LOG("RouterOpts.multi_queue_num_threads: %d\n", RouterOpts.multi_queue_num_threads);
VTR_LOG("RouterOpts.multi_queue_num_queues: %d\n", RouterOpts.multi_queue_num_queues);
VTR_LOG("RouterOpts.multi_queue_direct_draining: %s\n", RouterOpts.multi_queue_direct_draining ? "true" : "false");
VTR_LOG("RouterOpts.criticality_exp: %f\n", RouterOpts.criticality_exp);
VTR_LOG("RouterOpts.max_criticality: %f\n", RouterOpts.max_criticality);
VTR_LOG("RouterOpts.init_wirelength_abort_threshold: %f\n", RouterOpts.init_wirelength_abort_threshold);
Expand Down