netdev CI testing #6666

kuba-moo · 2024-03-27T20:02:33Z

Reusable PR for hooking netdev CI to BPF testing.

We will get rid of RTNL from RTM_NEWROUTE and SIOCADDRT. Then, we must perform two lookups for nexthop and dev under RCU to guarantee their lifetime. ip6_route_info_create() calls nexthop_find_by_id() first if RTA_NH_ID is specified, and then allocates struct fib6_info. nexthop_find_by_id() must be called under RCU, but we do not want to use GFP_ATOMIC for memory allocation here, which will be likely to fail in ip6_route_multipath_add(). Let's move nexthop_find_by_id() after the memory allocation. Signed-off-by: Kuniyuki Iwashima <[email protected]> Signed-off-by: NipaLocal <nipa@local>

We will get rid of RTNL from RTM_NEWROUTE and SIOCADDRT. Then, we want to allocate everything as possible before entering the RCU section. The RCU section will start in the middle of ip6_route_info_create(), and this is problematic for ip6_route_multipath_add() that calls ip6_route_info_create() multiple times. Let's split ip6_route_info_create() into two parts; one for memory allocation and another for nexthop setup. Signed-off-by: Kuniyuki Iwashima <[email protected]> Signed-off-by: NipaLocal <nipa@local>

ip6_route_info_create_nh() will be called under RCU. Then, fib6_nh_init() is also under RCU, but per-cpu memory allocation is very likely to fail with GFP_ATOMIC while bluk-adding IPv6 routes and we will see a bunch of this message in dmesg. percpu: allocation failed, size=8 align=8 atomic=1, atomic alloc failed, no space left percpu: allocation failed, size=8 align=8 atomic=1, atomic alloc failed, no space left Let's preallocate rt->fib6_nh->rt6i_pcpu in ip6_route_info_create(). If something fails before the original memory allocation in fib6_nh_init(), ip6_route_info_create_nh() calls fib6_info_release(), which releases the preallocated per-cpu memory. Signed-off-by: Kuniyuki Iwashima <[email protected]> Signed-off-by: NipaLocal <nipa@local>

ip6_route_info_create_nh() will be called under RCU. It calls fib_nh_common_init() and allocates nhc->nhc_pcpu_rth_output. As with the reason for rt->fib6_nh->rt6i_pcpu, we want to avoid GFP_ATOMIC allocation for nhc->nhc_pcpu_rth_output under RCU. Let's preallocate it in ip6_route_info_create(). Signed-off-by: Kuniyuki Iwashima <[email protected]> Signed-off-by: NipaLocal <nipa@local>

net is not used in ip6_route_info_append() after commit 36f19d5 ("net/ipv6: Remove extra call to ip6_convert_metrics for multipath case"). Let's remove the argument. Signed-off-by: Kuniyuki Iwashima <[email protected]> Signed-off-by: NipaLocal <nipa@local>

We will get rid of RTNL from RTM_NEWROUTE and SIOCADDRT. Then, the RCU section will start before ip6_route_info_create_nh() in ip6_route_multipath_add(), but ip6_route_info_create() is called in the same loop and will sleep. Let's split the loop into ip6_route_mpath_info_create() and ip6_route_mpath_info_create_nh(). Note that ip6_route_info_append() is now integrated into ip6_route_mpath_info_create_nh() because we need to call different free functions for nexthops that passed ip6_route_info_create_nh(). In case of failure, the remaining nexthops that ip6_route_info_create_nh() has not been called for will be freed by ip6_route_mpath_info_cleanup(). OTOH, if a nexthop passes ip6_route_info_create_nh(), it will be linked to a local temporary list, which will be spliced back to rt6_nh_list. In case of failure, these nexthops will be released by fib6_info_release() in ip6_route_multipath_add(). Signed-off-by: Kuniyuki Iwashima <[email protected]> Signed-off-by: NipaLocal <nipa@local>

We will get rid of RTNL from RTM_NEWROUTE and SIOCADDRT. If the request specifies a new table ID, fib6_new_table() is called to create a new routing table. Two concurrent requests could specify the same table ID, so we need a lock to protect net->ipv6.fib_table_hash[h]. Let's add a spinlock to protect the hash bucket linkage. Signed-off-by: Kuniyuki Iwashima <[email protected]> Signed-off-by: NipaLocal <nipa@local>

We will get rid of RTNL from RTM_NEWROUTE and SIOCADDRT. Then, we may be going to add a route tied to a dying nexthop. The nexthop itself is not freed during the RCU graceful period, but if we link a route after __remove_nexthop_fib() is called for the nexthop, the route will be leaked. To avoid the race between IPv6 route addition under RCU vs nexthop deletion under RTNL, let's add a dead flag and protect it and nh->f6i_list with a spinlock. __remove_nexthop_fib() acquires the nexthop's spinlock and sets false to nh->dead, then calls ip6_del_rt() for the linked route one by one without the spinlock because fib6_purge_rt() acquires it later. While adding an IPv6 route, fib6_add() acquires the nexthop lock and checks the dead flag just before inserting the route. Signed-off-by: Kuniyuki Iwashima <[email protected]> Signed-off-by: NipaLocal <nipa@local>

Now we are ready to remove RTNL from SIOCADDRT and RTM_NEWROUTE. The remaining things to do are 1. pass false to lwtunnel_valid_encap_type_attr() 2. use rcu_dereference_rtnl() in fib6_check_nexthop() 3. place rcu_read_lock() before ip6_route_info_create_nh(). Let's complete RTNL-free conversion. When each CPU-X adds 100000 routes on table-X in a batch on c7a.metal-48xl EC2 instance with 192 CPUs, without this series: $ sudo ./route_test.sh ... added 19200000 routes (100000 routes * 192 tables). Time elapsed: 189154 milliseconds. with this series: $ sudo ./route_test.sh ... added 19200000 routes (100000 routes * 192 tables). Time elapsed: 62531 milliseconds. Signed-off-by: Kuniyuki Iwashima <[email protected]> Signed-off-by: NipaLocal <nipa@local>

This builds on commit 19249c0 ("net: make net.core.{r,w}mem_{default,max} namespaced") by adding support for writing the sysctls from within net namespaces, rather than only reading the values that were set in init_net. These are relatively commonly-used sysctls, so programs may try to set them without knowing that they're in a container. It can be surprising for such attempts to fail with EACCES. Unlike other net sysctls that were converted to namespaced ones, many systems have a sysctl.conf (or other configs) that globally write to net.core.rmem_default on boot and expect the value to propagate to containers, and programs running in containers may depend on the increased buffer sizes in order to work properly. This means that namespacing the sysctls and using the kernel default values in each new netns would break existing workloads. As a compromise, inherit the initial net.core.*mem_* values from the current process' netns when creating a new netns. This is not standard behavior for most netns sysctls, but it avoids breaking existing workloads. Signed-off-by: Danny Lin <[email protected]> Signed-off-by: NipaLocal <nipa@local>

Following operations can trigger a warning[1]: ip netns add ns1 ip netns exec ns1 ip link add bond0 type bond mode balance-rr ip netns exec ns1 ip link set dev bond0 xdp obj af_xdp_kern.o sec xdp ip netns exec ns1 ip link set bond0 type bond mode broadcast ip netns del ns1 When delete the namespace, dev_xdp_uninstall() is called to remove xdp program on bond dev, and bond_xdp_set() will check the bond mode. If bond mode is changed after attaching xdp program, the warning may occur. Some bond modes (broadcast, etc.) do not support native xdp. Set bond mode with xdp program attached is not good. Add check for xdp program when set bond mode. [1] ------------[ cut here ]------------ WARNING: CPU: 0 PID: 11 at net/core/dev.c:9912 unregister_netdevice_many_notify+0x8d9/0x930 Modules linked in: CPU: 0 UID: 0 PID: 11 Comm: kworker/u4:0 Not tainted 6.14.0-rc4 kernel-patches#107 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.15.0-0-g2dd4b9b3f840-prebuilt.qemu.org 04/01/2014 Workqueue: netns cleanup_net RIP: 0010:unregister_netdevice_many_notify+0x8d9/0x930 Code: 00 00 48 c7 c6 6f e3 a2 82 48 c7 c7 d0 b3 96 82 e8 9c 10 3e ... RSP: 0018:ffffc90000063d80 EFLAGS: 00000282 RAX: 00000000ffffffa1 RBX: ffff888004959000 RCX: 00000000ffffdfff RDX: 0000000000000000 RSI: 00000000ffffffea RDI: ffffc90000063b48 RBP: ffffc90000063e28 R08: ffffffff82d39b28 R09: 0000000000009ffb R10: 0000000000000175 R11: ffffffff82d09b40 R12: ffff8880049598e8 R13: 0000000000000001 R14: dead000000000100 R15: ffffc90000045000 FS: 0000000000000000(0000) GS:ffff888007a00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 000000000d406b60 CR3: 000000000483e000 CR4: 00000000000006f0 Call Trace: <TASK> ? __warn+0x83/0x130 ? unregister_netdevice_many_notify+0x8d9/0x930 ? report_bug+0x18e/0x1a0 ? handle_bug+0x54/0x90 ? exc_invalid_op+0x18/0x70 ? asm_exc_invalid_op+0x1a/0x20 ? unregister_netdevice_many_notify+0x8d9/0x930 ? bond_net_exit_batch_rtnl+0x5c/0x90 cleanup_net+0x237/0x3d0 process_one_work+0x163/0x390 worker_thread+0x293/0x3b0 ? __pfx_worker_thread+0x10/0x10 kthread+0xec/0x1e0 ? __pfx_kthread+0x10/0x10 ? __pfx_kthread+0x10/0x10 ret_from_fork+0x2f/0x50 ? __pfx_kthread+0x10/0x10 ret_from_fork_asm+0x1a/0x30 </TASK> ---[ end trace 0000000000000000 ]--- Fixes: 9e2ee5c ("net, bonding: Add XDP support to the bonding driver") Signed-off-by: Wang Liang <[email protected]> Acked-by: Jussi Maki <[email protected]> Reviewed-by: Nikolay Aleksandrov <[email protected]> Reviewed-by: Toke Høiland-Jørgensen <[email protected]> Signed-off-by: NipaLocal <nipa@local>

struct virtio_net_rss_config was less useful in actual code because of a flexible array placed in the middle. Add new structures that split it into two to avoid having a flexible array in the middle. Suggested-by: Jason Wang <[email protected]> Signed-off-by: Akihiko Odaki <[email protected]> Acked-by: Jason Wang <[email protected]> Tested-by: Lei Yang <[email protected]> Acked-by: Michael S. Tsirkin <[email protected]> Signed-off-by: NipaLocal <nipa@local>

Mark the fields of struct virtio_net_ctrl_rss as little endian as they are in struct virtio_net_rss_config, which it follows. Fixes: c7114b1 ("drivers/net/virtio_net: Added basic RSS support.") Signed-off-by: Akihiko Odaki <[email protected]> Acked-by: Jason Wang <[email protected]> Tested-by: Lei Yang <[email protected]> Acked-by: Michael S. Tsirkin <[email protected]> Signed-off-by: NipaLocal <nipa@local>

The new RSS configuration structures allow easily constructing data for VIRTIO_NET_CTRL_MQ_RSS_CONFIG as they strictly follow the order of data for the command. Signed-off-by: Akihiko Odaki <[email protected]> Tested-by: Lei Yang <[email protected]> Acked-by: Michael S. Tsirkin <[email protected]> Signed-off-by: NipaLocal <nipa@local>

virtnet_probe() lacks the code to free rss_hdr in its error path. Allocate rss_hdr with devres so that it will be automatically freed. Fixes: 86a48a0 ("virtio_net: Support dynamic rss indirection table size") Signed-off-by: Akihiko Odaki <[email protected]> Acked-by: Jason Wang <[email protected]> Tested-by: Lei Yang <[email protected]> Acked-by: Michael S. Tsirkin <[email protected]> Signed-off-by: NipaLocal <nipa@local>

…face During network interface initialization, the NIC driver needs to register its Rx queue with the XDP, to ensure the incoming XDP buffer carries a pointer reference to this info and is stored inside xdp_rxq_info. While this struct isn't tied to XDP prog, if there are any changes in Rx queue, the NIC driver needs to stop the Rx queue by unregistering with XDP before purging and reallocating memory. Drop page_pool destroy during Rx channel reset as this is already handled by XDP during xdp_rxq_info_unreg (Rx queue unregister), failing to do will cause the following warning: warning logs: https://gist.github.com/MeghanaMalladiTI/eb627e5dc8de24e42d7d46572c13e576 Fixes: 46eeb90 ("net: ti: icssg-prueth: Use page_pool API for RX buffer allocation") Signed-off-by: Meghana Malladi <[email protected]> Reviewed-by: Simon Horman <[email protected]> Signed-off-by: NipaLocal <nipa@local>

…it_xdp_frame() There is an error check inside emac_xmit_xdp_frame() function which is called when the driver wants to transmit XDP frame, to check if the allocated tx descriptor is NULL, if true to exit and return ICSSG_XDP_CONSUMED implying failure in transmission. In this case trying to free a descriptor which is NULL will result in kernel crash due to NULL pointer dereference. Fix this error handling and increase netdev tx_dropped stats in the caller of this function if the function returns ICSSG_XDP_CONSUMED. Fixes: 62aa324 ("net: ti: icssg-prueth: Add XDP support") Signed-off-by: Meghana Malladi <[email protected]> Reviewed-by: Simon Horman <[email protected]> Signed-off-by: NipaLocal <nipa@local>

…equest Whenever there is a perout request from the user application, kernel receives req structure containing the configuration info for that req. Add NULL pointer handling for perout request if that req struct points to NULL. Fixes: e5b456a ("net: ti: icss-iep: Add pwidth configuration for perout signal") Signed-off-by: Meghana Malladi <[email protected]> Reviewed-by: Simon Horman <[email protected]> Signed-off-by: NipaLocal <nipa@local>

When delete l3s ipvlan: ip link del link eth0 ipvlan1 type ipvlan mode l3s This may cause a null pointer dereference: Call trace: ip_rcv_finish+0x48/0xd0 ip_rcv+0x5c/0x100 __netif_receive_skb_one_core+0x64/0xb0 __netif_receive_skb+0x20/0x80 process_backlog+0xb4/0x204 napi_poll+0xe8/0x294 net_rx_action+0xd8/0x22c __do_softirq+0x12c/0x354 This is because l3mdev_l3_rcv() visit dev->l3mdev_ops after ipvlan_l3s_unregister() assign the dev->l3mdev_ops to NULL. The process like this: (CPU1) | (CPU2) l3mdev_l3_rcv() | check dev->priv_flags: | master = skb->dev; | | | ipvlan_l3s_unregister() | set dev->priv_flags | dev->l3mdev_ops = NULL; | visit master->l3mdev_ops | To avoid this by do not set dev->l3mdev_ops when unregister l3s ipvlan. Suggested-by: David Ahern <[email protected]> Fixes: c675e06 ("ipvlan: decouple l3s mode dependencies from other modes") Signed-off-by: Wang Liang <[email protected]> Signed-off-by: NipaLocal <nipa@local>

Protect the parser TCAM/SRAM memory, and the cached (shadow) SRAM information, from concurrent modifications. Both the TCAM and SRAM tables are indirectly accessed by configuring an index register that selects the row to read or write to. This means that operations must be atomic in order to, e.g., avoid spreading writes across multiple rows. Since the shadow SRAM array is used to find free rows in the hardware table, it must also be protected in order to avoid TOCTOU errors where multiple cores allocate the same row. This issue was detected in a situation where `mvpp2_set_rx_mode()` ran concurrently on two CPUs. In this particular case the MVPP2_PE_MAC_UC_PROMISCUOUS entry was corrupted, causing the classifier unit to drop all incoming unicast - indicated by the `rx_classifier_drops` counter. Fixes: 3f51850 ("ethernet: Add new driver for Marvell Armada 375 network unit") Signed-off-by: Tobias Waldekranz <[email protected]> Reviewed-by: Maxime Chevallier <[email protected]> Tested-by: Maxime Chevallier <[email protected]> Signed-off-by: NipaLocal <nipa@local>

As mentioned in the commit baeb705 ("ice: always check VF VSI pointer values"), we need to perform a null pointer check on the return value of ice_get_vf_vsi() before using it. Fixes: 6ebbe97 ("ice: Add a per-VF limit on number of FDIR filters") Signed-off-by: luoxuanqiang <[email protected]> Signed-off-by: NipaLocal <nipa@local>

Some dwmac variants such as dwmac_socfpga don't use xpcs but lynx_pcs. Don't call xpcs_config_eee_mult_fact() in this case, as this causes a crash at init : Unable to handle kernel NULL pointer dereference at virtual address 00000039 when write [...] Call trace: xpcs_config_eee_mult_fact from stmmac_pcs_setup+0x40/0x10c stmmac_pcs_setup from stmmac_dvr_probe+0xc0c/0x1244 stmmac_dvr_probe from socfpga_dwmac_probe+0x130/0x1bc socfpga_dwmac_probe from platform_probe+0x5c/0xb0 Fixes: 060fb27 ("net: stmmac: call xpcs_config_eee_mult_fact()") Signed-off-by: Maxime Chevallier <[email protected]> Reviewed-by: Russell King (Oracle) <[email protected]> Signed-off-by: NipaLocal <nipa@local>

tc_actions.sh keeps hanging the forwarding tests. sdf@: tdc & tdc-dbg started intermittenly failing around Sep 25th Signed-off-by: NipaLocal <nipa@local>

Signed-off-by: Jakub Kicinski <[email protected]> Signed-off-by: NipaLocal <nipa@local>

Disable tests we don't care about, we use alltests in kunit. Signed-off-by: Jakub Kicinski <[email protected]> Signed-off-by: NipaLocal <nipa@local>

Signed-off-by: NipaLocal <nipa@local>

kuba-moo force-pushed the to-test branch from 6bd5e75 to bdd05e2 Compare March 27, 2024 21:49

kernel-patches-daemon-bpf bot force-pushed the bpf-next_base branch 3 times, most recently from 4f22ee0 to 8a9a8e0 Compare March 28, 2024 04:46

kuba-moo force-pushed the to-test branch 11 times, most recently from 64c403f to 8da1f58 Compare March 29, 2024 00:01

kernel-patches-daemon-bpf bot force-pushed the bpf-next_base branch 3 times, most recently from 78ebb17 to 9325308 Compare March 29, 2024 02:14

kuba-moo force-pushed the to-test branch 6 times, most recently from c8c7b2f to a71aae6 Compare March 29, 2024 18:01

kernel-patches-daemon-bpf bot force-pushed the bpf-next_base branch from 9325308 to 7940ae1 Compare March 29, 2024 18:12

kuba-moo force-pushed the to-test branch 2 times, most recently from d8feb00 to b16a6b9 Compare March 30, 2024 00:01

kernel-patches-daemon-bpf bot force-pushed the bpf-next_base branch from 7940ae1 to 8f1ff3c Compare March 30, 2024 00:21

kuba-moo force-pushed the to-test branch 2 times, most recently from 4164329 to c5cecb3 Compare March 30, 2024 06:00

q2ven and others added 29 commits March 21, 2025 05:02

forwarding: set timeout to 3 hours

44244ca

tc_actions.sh keeps hanging the forwarding tests. sdf@: tdc & tdc-dbg started intermittenly failing around Sep 25th Signed-off-by: NipaLocal <nipa@local>

profile patch

b2ec39d

Signed-off-by: Jakub Kicinski <[email protected]> Signed-off-by: NipaLocal <nipa@local>

tc_action dbg

9c8673b

Signed-off-by: Jakub Kicinski <[email protected]> Signed-off-by: NipaLocal <nipa@local>

selftests: net: enable profiling

64fff64

Signed-off-by: Jakub Kicinski <[email protected]> Signed-off-by: NipaLocal <nipa@local>

kunit: try to disable broken and unneccessary tests

ba4be20

Disable tests we don't care about, we use alltests in kunit. Signed-off-by: Jakub Kicinski <[email protected]> Signed-off-by: NipaLocal <nipa@local>

drv: net: add timeout

c749cb5

Signed-off-by: NipaLocal <nipa@local>

Merge branch 'net-next-2025-03-21--12-00' into HEAD

d3cc054

kuba-moo force-pushed the to-test branch from 01ecc6f to d3cc054 Compare March 21, 2025 12:04

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

netdev CI testing #6666

netdev CI testing #6666

kuba-moo commented Mar 27, 2024

netdev CI testing #6666

Are you sure you want to change the base?

netdev CI testing #6666

Conversation

kuba-moo commented Mar 27, 2024