Hwloc.jl is a high-level wrapper of the hwloc library. It examines the current machine's hardware topology (memories, caches, cores, etc.) and provides Julia functions to visualize and access this information conveniently.
Taken from the hwloc website:
The Portable Hardware Locality (hwloc) software package provides a portable abstraction (across OS, versions, architectures, ...) of the hierarchical topology of modern architectures, including NUMA memory nodes, sockets, shared caches, cores and simultaneous multithreading. It also gathers various system attributes such as cache and memory information as well as the locality of I/O devices such as network interfaces, InfiniBand HCAs or GPUs.
hwloc primarily aims at helping applications with gathering information about increasingly complex parallel computing platforms so as to exploit them accordingly and efficiently.
Perhaps the most important function is Hwloc.topology(), which displays a tree structure describing the system topology. This roughly corresponds to the output of the lstopo program (non-GUI version).
On my laptop this gives the following output:
julia> using Hwloc
julia> topology()
Machine (31.05 GB)
    Package L#0 P#0 (31.05 GB)
        NUMANode (31.05 GB)
        L3 (12.0 MB)
            L2 (1.25 MB) + L1 (48.0 kB) + Core L#0 P#0
                PU L#0 P#0
                PU L#1 P#4
            L2 (1.25 MB) + L1 (48.0 kB) + Core L#1 P#1
                PU L#2 P#1
                PU L#3 P#5
            L2 (1.25 MB) + L1 (48.0 kB) + Core L#2 P#2
                PU L#4 P#2
                PU L#5 P#6
            L2 (1.25 MB) + L1 (48.0 kB) + Core L#3 P#3
                PU L#6 P#3
                PU L#7 P#7
    HostBridge
        PCI 00:02.0 (VGA)
            GPU "renderD128"
            GPU "card0"
        PCIBridge
            PCI 01:00.0 (NVMExp)
                Block(Disk) "nvme0n1"
        PCIBridge
            PCI 72:00.0 (Network)
                Net "wlp114s0"
        PCIBridge
            PCI 73:00.0 (Other)
                Block "mmcblk0"
Often, one is only interested in a summary of this topology. The function topology_info() provides such a compact description, which is loosely similar to the output of the hwloc-info command-line application.
julia> topology_info()
Machine: 1 (31.05 GB)
Package: 1 (31.05 GB)
NUMANode: 1 (31.05 GB)
L3Cache: 1 (12.0 MB)
L2Cache: 4 (1.25 MB)
L1Cache: 4 (48.0 kB)
Core: 4
PU: 8
Bridge: 6
PCI_Device: 22
OS_Device: 13
If you prefer a more verbose graphical visualization, you may consider using topology_graphical():
julia> topology_graphical()
┌───────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
│ Machine (31GB total) │
│ │
│ ┌────────────────────────────────────────────────────────────────────┐ ├┤╶─┬─────┬─────────────┐ │
│ │ Package L#0 │ │ │ PCI 00:02.0 │ │
│ │ │ │ └─────────────┘ │
│ │ ┌────────────────────────────────────────────────────────────────┐ │ │ │
│ │ │ NUMANode L#0 P#0 (31GB) │ │ ├─────┼┤╶───────┬───────────────────┐ │
│ │ └────────────────────────────────────────────────────────────────┘ │ │3.9 3.9 │ PCI 01:00.0 │ │
│ │ │ │ │ │ │
│ │ ┌────────────────────────────────────────────────────────────────┐ │ │ │ ┌───────────────┐ │ │
│ │ │ L3 (12MB) │ │ │ │ │ Block nvme0n1 │ │ │
│ │ └────────────────────────────────────────────────────────────────┘ │ │ │ │ │ │ │
│ │ │ │ │ │ 953 GB │ │ │
│ │ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ │ │ │ └───────────────┘ │ │
│ │ │ L2 (1280KB) │ │ L2 (1280KB) │ │ L2 (1280KB) │ │ L2 (1280KB) │ │ │ └───────────────────┘ │
│ │ └─────────────┘ └─────────────┘ └─────────────┘ └─────────────┘ │ │ │
│ │ │ ├─────┼┤╶───────┬──────────────────┐ │
│ │ ┌────────────┐ ┌────────────┐ ┌────────────┐ ┌────────────┐ │ │0.6 0.6 │ PCI 72:00.0 │ │
│ │ │ L1d (48KB) │ │ L1d (48KB) │ │ L1d (48KB) │ │ L1d (48KB) │ │ │ │ │ │
│ │ └────────────┘ └────────────┘ └────────────┘ └────────────┘ │ │ │ ┌──────────────┐ │ │
│ │ │ │ │ │ Net wlp114s0 │ │ │
│ │ ┌────────────┐ ┌────────────┐ ┌────────────┐ ┌────────────┐ │ │ │ └──────────────┘ │ │
│ │ │ L1i (32KB) │ │ L1i (32KB) │ │ L1i (32KB) │ │ L1i (32KB) │ │ │ └──────────────────┘ │
│ │ └────────────┘ └────────────┘ └────────────┘ └────────────┘ │ │ │
│ │ │ └─────┼┤╶───────┬───────────────┐ │
│ │ ┌────────────┐ ┌────────────┐ ┌────────────┐ ┌────────────┐ │ 1.0 │ Block mmcblk0 │ │
│ │ │ Core L#0 │ │ Core L#1 │ │ Core L#2 │ │ Core L#3 │ │ │ │ │
│ │ │ │ │ │ │ │ │ │ │ │ 238 GB │ │
│ │ │ ┌────────┐ │ │ ┌────────┐ │ │ ┌────────┐ │ │ ┌────────┐ │ │ └───────────────┘ │
│ │ │ │ PU L#0 │ │ │ │ PU L#2 │ │ │ │ PU L#4 │ │ │ │ PU L#6 │ │ │ │
│ │ │ │ │ │ │ │ │ │ │ │ │ │ │ │ │ │ │ │
│ │ │ │ P#0 │ │ │ │ P#1 │ │ │ │ P#2 │ │ │ │ P#3 │ │ │ │
│ │ │ └────────┘ │ │ └────────┘ │ │ └────────┘ │ │ └────────┘ │ │ │
│ │ │ ┌────────┐ │ │ ┌────────┐ │ │ ┌────────┐ │ │ ┌────────┐ │ │ │
│ │ │ │ PU L#1 │ │ │ │ PU L#3 │ │ │ │ PU L#5 │ │ │ │ PU L#7 │ │ │ │
│ │ │ │ │ │ │ │ │ │ │ │ │ │ │ │ │ │ │ │
│ │ │ │ P#4 │ │ │ │ P#5 │ │ │ │ P#6 │ │ │ │ P#7 │ │ │ │
│ │ │ └────────┘ │ │ └────────┘ │ │ └────────┘ │ │ └────────┘ │ │ │
│ │ └────────────┘ └────────────┘ └────────────┘ └────────────┘ │ │
│ └────────────────────────────────────────────────────────────────────┘ │
└───────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘
(Note that as of now this may produce colorful output on some systems.)
Hwloc exports a few convenience functions for obtaining particularly important information, such as the number of physical and virtual cores (i.e. processing units), NUMA nodes, and sockets / packages:
julia> num_physical_cores()
6
julia> num_virtual_cores()
12
julia> num_numa_nodes()
1
julia> num_packages()
1
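As a small illustration of how these counts can be combined, the following sketch estimates the number of processing units per physical core. It uses only the exported functions shown above; the variable names and the interpretation of a ratio above 1 as "SMT enabled" are assumptions made for this example.

```julia
using Hwloc

# Counts reported by Hwloc.jl for the current machine.
ncores = num_physical_cores()
npus   = num_virtual_cores()

# Average number of processing units (hardware threads) per physical core.
# On machines with mixed core kinds this need not be an integer, hence `/`.
pus_per_core = npus / ncores

println("physical cores: ", ncores)
println("virtual cores:  ", npus)
println("PUs per core:   ", pus_per_core)
println("SMT appears ", pus_per_core > 1 ? "enabled" : "disabled or unavailable")
```

With the counts printed above (6 physical and 12 virtual cores), the ratio is 2.0.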
One may also use getinfo() to programmatically access some of the output of topology_info():
julia> getinfo()
Dict{Symbol, Int64} with 11 entries:
:Package => 1
:PU => 8
:OS_Device => 13
:Core => 4
:L3Cache => 1
:Machine => 1
:PCI_Device => 22
:L2Cache => 4
:NUMANode => 1
:Bridge => 6
:L1Cache => 4
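Since the result is an ordinary Dict keyed by Symbol, individual counts can be read with standard dictionary operations. The sketch below assumes that some object types may be missing on a given machine and therefore uses get with a default of 0; the variable names are illustrative.

```julia
using Hwloc

info = getinfo()   # Dict{Symbol, Int64}, keyed by hwloc object type

# `get` with a default avoids a KeyError for object types that are not
# reported on this machine (assumption: keys may be absent).
ncores = get(info, :Core, 0)
npus   = get(info, :PU, 0)
nl3    = get(info, :L3Cache, 0)

println("Cores: ", ncores, ", PUs: ", npus, ", L3 caches: ", nl3)
```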
Assuming that multiple caches of the same level (e.g. L1) have identical properties, one can use the convenience functions cachesize() and cachelinesize() to obtain the relevant sizes in bytes:
julia> cachesize()
(L1 = 32768, L2 = 262144, L3 = 12582912)
julia> cachelinesize()
(L1 = 64, L2 = 64, L3 = 64)
Otherwise, there are the following more specific functions available:
julia> @show Hwloc.l1cache_sizes();
@show Hwloc.l2cache_sizes();
@show Hwloc.l3cache_sizes();
Hwloc.l1cache_sizes() = [32768, 32768, 32768, 32768, 32768, 32768]
Hwloc.l2cache_sizes() = [262144, 262144, 262144, 262144, 262144, 262144]
Hwloc.l3cache_sizes() = [12582912]
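A typical use of these numbers is sizing working sets. The following sketch computes how many Float64 values fit into one L1 cache and into one cache line, and first checks whether all L1 caches indeed report the same size. It relies only on the functions shown above; the variable names are made up for this example.

```julia
using Hwloc

sizes = cachesize()       # NamedTuple with fields L1, L2, L3 (bytes)
lines = cachelinesize()   # NamedTuple with fields L1, L2, L3 (bytes)

# Check the "identical properties" assumption before relying on `sizes.L1`.
l1_uniform = length(unique(Hwloc.l1cache_sizes())) <= 1

# Number of Float64 values (8 bytes each) per L1 cache and per cache line.
elems_per_l1   = sizes.L1 ÷ sizeof(Float64)
elems_per_line = lines.L1 ÷ sizeof(Float64)

println("L1 caches uniform: ", l1_uniform)
println("Float64 per L1:    ", elems_per_l1)
println("Float64 per line:  ", elems_per_line)
```

For the values shown above (L1 = 32768 bytes, line size 64 bytes), this yields 4096 values per L1 cache and 8 values per cache line.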
Some systems have CPU cores of different kinds, e.g. efficiency and performance cores. With Hwloc.jl, you can query the number of different kinds and the count of CPU cores for each kind. For example, on a Mac mini M1 (4 efficiency and 4 performance cores):
julia> using Hwloc
julia> num_cpukinds()
2
julia> num_virtual_cores_cpukinds()
2-element Vector{Int64}:
4
4
You can also see which PU belongs to which CPU kind by passing cpukind=true to topology:
julia> topology(; cpukind=true)
Machine (3.49 GB)
    Package L#0 P#0 (3.49 GB)
        NUMANode (3.49 GB)
        L2 (4.0 MB)
            L1 (64.0 kB) + Core L#0 P#0
                PU L#0 P#0 (1, DarwinCompatible=apple,icestorm;ARM,v8)
            L1 (64.0 kB) + Core L#1 P#1
                PU L#1 P#1 (1, DarwinCompatible=apple,icestorm;ARM,v8)
            L1 (64.0 kB) + Core L#2 P#2
                PU L#2 P#2 (1, DarwinCompatible=apple,icestorm;ARM,v8)
            L1 (64.0 kB) + Core L#3 P#3
                PU L#3 P#3 (1, DarwinCompatible=apple,icestorm;ARM,v8)
        L2 (12.0 MB)
            L1 (128.0 kB) + Core L#4 P#4
                PU L#4 P#4 (2, DarwinCompatible=apple,firestorm;ARM,v8)
            L1 (128.0 kB) + Core L#5 P#5
                PU L#5 P#5 (2, DarwinCompatible=apple,firestorm;ARM,v8)
            L1 (128.0 kB) + Core L#6 P#6
                PU L#6 P#6 (2, DarwinCompatible=apple,firestorm;ARM,v8)
            L1 (128.0 kB) + Core L#7 P#7
                PU L#7 P#7 (2, DarwinCompatible=apple,firestorm;ARM,v8)
    CoProc(OpenCL) "opencl0d0"
To manually traverse and investigate the system topology tree, one may use gettopology() to obtain the top-level Hwloc.Object.
julia> topo = gettopology()
Hwloc.Object: Machine
julia> fieldnames(typeof(topo))
(:type_, :os_index, :name, :attr, :mem, :depth, :logical_index, :children, :memory_children)
julia> Hwloc.children(topo)
1-element Array{Hwloc.Object,1}:
Hwloc.Object: Package
julia> Hwloc.children(topo.children[1])
1-element Array{Hwloc.Object,1}:
Hwloc.Object: L3Cache
julia> l2cache = Hwloc.children(topo.children[1].children[1])[1]
Hwloc.Object: L2Cache
julia> Hwloc.attributes(l2cache)
Cache{size=262144,depth=2,linesize=64,associativity=4,type=Unified}
julia> l2cache |> print_topology
L2 (256.0 kB) + L1 (32.0 kB) + Core L#0 P#0
    PU L#0 P#0
    PU L#1 P#1
Topology elements of type Hwloc.Object are also Julia iterators. One can thus readily traverse the corresponding part of the topology tree:
julia> for obj in l2cache
           @show hwloc_typeof(obj)
       end
hwloc_typeof(obj) = :L2Cache
hwloc_typeof(obj) = :L1Cache
hwloc_typeof(obj) = :Core
hwloc_typeof(obj) = :PU
hwloc_typeof(obj) = :PU
julia> collect(obj for obj in l2cache)
5-element Array{Hwloc.Object,1}:
Hwloc.Object: L2Cache
Hwloc.Object: L1Cache
Hwloc.Object: Core
Hwloc.Object: PU
Hwloc.Object: PU
julia> count(hwloc_isa(:PU), l2cache)
2
julia> collectobjects(:PU, l2cache)
2-element Array{Hwloc.Object,1}:
Hwloc.Object: PU
Hwloc.Object: PU
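Because iteration visits every object in a subtree, simple aggregate queries can be written as plain loops. The sketch below tallies how often each object type occurs in the whole machine topology and cross-checks the PU entry against count with hwloc_isa, as used above. The counts dictionary is an assumption of this example, not part of Hwloc.jl.

```julia
using Hwloc

topo = gettopology()

# Tally how many objects of each hwloc type occur in the tree.
counts = Dict{Symbol, Int}()
for obj in topo
    t = hwloc_typeof(obj)
    counts[t] = get(counts, t, 0) + 1
end

# Cross-check one entry against `count` with the `hwloc_isa` predicate.
npus = count(hwloc_isa(:PU), topo)
println("PUs via tally: ", get(counts, :PU, 0), ", via count: ", npus)
```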
On the first call of gettopology(), Hwloc.jl examines the current machine's hardware topology and caches the result in Hwloc.machine_topology. To query the system topology again, i.e. without using the cached Hwloc.Object representing the entire machine, simply pass the reload=true kwarg (false by default):
julia> topo = gettopology(;reload=true)
Hwloc.Object: Machine
You may prefer not to include I/O devices in your Hwloc tree. In that case, we recommend passing the io=false kwarg (true by default) in addition to reload (cf. above):
julia> topo = gettopology(;reload=true, io=false)
Hwloc.Object: Machine
julia> topology(topo)
Machine (31.05 GB)
    Package L#0 P#0 (31.05 GB)
        NUMANode (31.05 GB)
        L3 (12.0 MB)
            L2 (1.25 MB) + L1 (48.0 kB) + Core L#0 P#0
                PU L#0 P#0
                PU L#1 P#4
            L2 (1.25 MB) + L1 (48.0 kB) + Core L#1 P#1
                PU L#2 P#1
                PU L#3 P#5
            L2 (1.25 MB) + L1 (48.0 kB) + Core L#2 P#2
                PU L#4 P#2
                PU L#5 P#6
            L2 (1.25 MB) + L1 (48.0 kB) + Core L#3 P#3
                PU L#6 P#3
                PU L#7 P#7
(Note: to avoid caching by accident, we recommend passing reload=true to gettopology.)
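To see the effect of the io kwarg in numbers rather than in the printed tree, one could count PCI devices with and without I/O discovery. This is a minimal sketch under two assumptions: that the :PCI_Device symbol reported by getinfo() is also accepted by hwloc_isa, and that the final reload restores the default cached topology. The variable names are illustrative.

```julia
using Hwloc

# Reload with I/O devices (the default) and count PCI devices.
topo_io = gettopology(; reload=true, io=true)
n_pci_io = count(hwloc_isa(:PCI_Device), topo_io)

# Reload without I/O devices and count again.
topo_noio = gettopology(; reload=true, io=false)
n_pci_noio = count(hwloc_isa(:PCI_Device), topo_noio)

println("PCI devices with io=true:  ", n_pci_io)
println("PCI devices with io=false: ", n_pci_noio)

# Reload once more so that later calls do not pick up the I/O-free tree by accident.
gettopology(; reload=true)
```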
Warning: As discussed earlier, Hwloc.jl makes heavy use of caching in the high-level API. Using the low-level and high-level APIs together can result in cached values being used by accident! We therefore recommend using the high-level gettopology function, where caching is controlled via the reload kwarg.
Under the hood, gettopology uses Hwloc.topology_init and Hwloc.topology_load to directly ccall into libhwloc. Hwloc.topology_init is responsible for creating a low-level LibHwloc.hwloc_topology object. Hwloc.topology_load wraps this in an Hwloc.Object Julia object.
Note: Hwloc.topology_load is destructive to the LibHwloc.hwloc_topology object:
julia> htopo = Hwloc.topology_init()
Ptr{Hwloc.LibHwloc.hwloc_topology} @0x000000000883cf60
julia> topo = Hwloc.topology_load(htopo)
Hwloc.Object: Machine
julia> topo = Hwloc.topology_load(htopo)
ERROR: AssertionError: ierr == 0
Stacktrace:
[1] topology_load(htopo::Ptr{Hwloc.LibHwloc.hwloc_topology})
@ Hwloc ~/.julia/dev/Hwloc/src/lowlevel_api.jl:347
[2] top-level scope
@ REPL[78]:1
This is because LibHwloc.hwloc_topology objects are not garbage-collected (a call to Hwloc.topology_init without a later call to Hwloc.hwloc_topology_destroy will leak memory). This is why Hwloc.topology_load calls Hwloc.hwloc_topology_destroy after creating the Hwloc.Object Julia object (which is garbage-collected!).
If the AbstractTrees module is loaded, then passing an Hwloc.Object to AbstractTrees.children will construct an HwlocTreeNode. Calling children(gettopology()) will return the Hwloc tree root:
julia> using AbstractTrees, Hwloc
julia> t = children(gettopology());
julia> print_tree(t; maxdepth=2)
Hwloc.Object: Machine
├─ Hwloc.Object: Package [L#0 P#0]
│ ├─ Hwloc.Object: L3Cache
│ │ ⋮
│ │
│ └─ Hwloc.Object: NUMANode
└─ Hwloc.Object: Bridge [HostBridge]
├─ Hwloc.Object: PCI_Device [00:00.0 (HostBridge)]
├─ Hwloc.Object: PCI_Device [00:02.0 (VGA)]
│ ⋮
│
├─ Hwloc.Object: PCI_Device [00:04.0 (SignalProcessing)]
├─ Hwloc.Object: Bridge [PCIBridge]
│ ⋮
│
├─ Hwloc.Object: Bridge [PCIBridge]
├─ Hwloc.Object: Bridge [PCIBridge]
├─ Hwloc.Object: PCI_Device [00:0a.0 (SignalProcessing)]
├─ Hwloc.Object: PCI_Device [00:0d.0 (USB)]
├─ Hwloc.Object: PCI_Device [00:0d.2 (USB)]
├─ Hwloc.Object: PCI_Device [00:0d.3 (USB)]
├─ Hwloc.Object: PCI_Device [00:12.0 (Serial)]
├─ Hwloc.Object: PCI_Device [00:14.0 (USB)]
├─ Hwloc.Object: PCI_Device [00:14.2 (RAM)]
├─ Hwloc.Object: PCI_Device [00:15.0 (SerialBus)]
│ ⋮
│
├─ Hwloc.Object: PCI_Device [00:15.1 (SerialBus)]
│ ⋮
│
├─ Hwloc.Object: PCI_Device [00:16.0 (Communication)]
├─ Hwloc.Object: PCI_Device [00:19.0 (SerialBus)]
│ ⋮
│
├─ Hwloc.Object: PCI_Device [00:19.1 (SerialBus)]
│ ⋮
│
├─ Hwloc.Object: Bridge [PCIBridge]
│ ⋮
│
├─ Hwloc.Object: Bridge [PCIBridge]
│ ⋮
│
├─ Hwloc.Object: PCI_Device [00:1f.0 (ISABridge)]
├─ Hwloc.Object: PCI_Device [00:1f.3 (MultimediaAudio)]
├─ Hwloc.Object: PCI_Device [00:1f.4 (SMBus)]
└─ Hwloc.Object: PCI_Device [00:1f.5 (SerialBus)]
For examples of using the AbstractTrees interface to search the Hwloc tree, see NetworkInterfaceControllers.jl.