layout	title
post	第162期

C++ 中文周刊 2024-06-30 第162期

周刊项目地址

公众号

点击「查看原文」跳转到 GitHub 上对应文件，链接就可以点击了

qq群点击进入

RSS

欢迎投稿，推荐或自荐文章/软件/资源等，评论区留言

本期文章由 HMY lhmouse赞助

资讯

标准委员会动态/ide/编译器信息放在这里

最近c++26还在开会，前方记者Mick有报道

后续有其他报道咱们也会转发一下

文章

Member ordering and binary sizes

就是字段排序可能存在空洞导致padding填充

比如

class Example1 {
public:
    double a;// = 4.2;   // 8 bytes
    int b;// = 1;      // 4 bytes
    float c;    // 4 bytes
    char d;     // 1 byte
    bool e;     // 1 byte
    bool f;     // 1 byte
    // Assuming typical alignment, 'a' (8 bytes) should be first,
    // followed by 'b' and 'c' (both 4 bytes), and then 'd' and 'e' (1 byte each).
};

static_assert(sizeof(Example1) == 24);
struct Example2 {
    int b;//=1;      // 4 bytes
    char d;     // 1 byte
    float c;    // 4 bytes
    bool e;     // 1 byte
    double a;// = 4.2;   // 8 bytes
    bool f;
};

static_assert(sizeof(Example2) == 32);

popcnt也能向量化？

看个乐

UB or not UB: How gcc and clang handle statically known undefined behaviour

省流 gcc遇到UB倾向于生成ud2 崩溃，clang遇到UB倾向于不崩溃，把影响抹除

How the STL uses explicit

Efficiently allocating lots of std::shared_ptr

单线程分配shared_ptr对象，比较new make_shared 对象池，压测结果

	new	make_shared	fast_pool_allocator
GCC	69.0	38.1	34.0
Clang + libstdc++	69.2	38.6	35.7
Clang + libc++	76.5	40.8	40.4
GCC + tcmalloc	57.2	30.3	33.8
GCC + jemalloc	86.7	42.7	33.9

看一乐当然结果是满足直觉的，池化快一些，或者别用一堆shared ptr

How much memory does a call to ‘malloc’ allocates?

讲的是个常识，你分配的内存总是向上取整的，不一定是你要多少分配多少

size, speed and order tradeoffs

其实就是访问l1 l2 cache会有不同延迟，通过不同大小文件来测试，有空可以跑一下代码

为什么C++的std::forward会有两种重载

超详细！spdlog源码解析

1

2

3

Latency-Sensitive Applications and the Memory Subsystem: Keeping the Data in the Cache

while循环，没干活，干活逻辑是数据访问，那没干活分支应该可以热数据

比如原来的逻辑

td::unordered_map<int32_t, order> my_orders;
...
packet_t* p;
while(!exit) {
    p = get_packet();
    // If packet arrived
    if (p) {
        // Check if the identifier is known to us
        auto it = my_orders.find(p->id);
        if (it != my_orders.end()) {
            send_answer(p->origin, it->second);
        }
    }
}

while里是个干活逻辑，但是有个大的if，我们可以把这个if拆出来分成干活不干活两个逻辑

std::unordered_map<int32_t, order> my_orders;
...
packet_t* p;
int64_t total_random_found = 0;
while(!exit) {
    // 增加个检查header 然后再判断packet，不满足就去warm
    // 如果header没满足，packet必不满足
    if (packet_header_arrived()) {
        p = get_packet();
        // If packet arrived
        if (p) {
            // Check if the identifier is known to us
            auto it = my_orders.find(p->id);
            if (it != my_orders.end()) {
                send_answer(p->origin, it->second);
            }
        }
    } else {
        // 不干活就Cache warming 
        auto random_id = get_random_id();
        auto it = my_orders.find(random_id);
        // 随便干点啥避免被编译器优化掉
        total_random_found += (it != my_orders.end());
    }
}
std::cout << "Total random found " << total_random_found << "\n";

当然这种cache warm不一定非得随机，有可能副作用

可以从历史值来用，有个词怎么说来着，启发式

硬件层也有cache warm 比如 intel

amd也有 L3 Cache Range Reservation 不过没例子

作者测试了软件模拟cache warm，随机访问

数据，迭代多次的延迟，越小越好

hashmap数据量	正常访问hashmap	没有访问的时候只warm 0	没有访问的时候随机warm
1 K	226.1 (219.0)	213.3 (205.1)	132.5 (67.3)
4 K	324.7 (296.3)	350.7 (331.3)	140.1 (95.4)
16 K	396.8 (341.1)	389.1 (354.5)	208.7 (134.5)
64 K	425.5 (376.1)	416.0 (360.6)	232.1 (152.6)
256 K	514.2 (451.5)	473.3 (480.6)	338.8 (317.6)
1 M	599.8 (550.2)	615.1 (573.6)	466.3 (429.8)
4 M	702.1 (647.0)	619.7 (649.2)	531.3 (508.3)
16 M	756.7 (677.6)	668.8 (707.4)	543.2 (499.9)
64 M	769.1 (702.3)	735.9 (734.2)	641.0 (774.4)

能看到随机访问随机warm效果显著

Latency-Sensitive Application and the Memory Subsystem Part 2: Memory Management Mechanisms

这篇文章的视角比较奇怪，可能和已知的信息不同，目标是低延迟避免内存机制影响

page fault会引入延迟，所以要破坏page fault的生成条件怎么做？

尽可能分配好，而不是用到在分配，有概率触发page fault

mmap使用MAP_POPULATE
使用calloc不用malloc，用malloc/new 强制0填充
零初始化数组，立马使用上
vector 创造时直接构造好大小，不用reserve reserve不一定内存预分配，可能还会造成page fault()
- 或者重载allocator，预先分配内存
- 其他容器也是有类似的问题
使用内存大页
禁用 5-Level Page Walk
TLB shotdown规避这个一时半会讲不完可以看这个
关闭swap

视频

C++ Weekly - Ep 434 - GCC's Amazing NEW (2024) -Wnrvo

-Wnrvo 帮助分析，效果显著

Mirko Arsenijević — Lifting the Pipes - Beyond Sender/Receiver and Expected Outcome — 26.6.2024.

介绍他的dag库，没开源

开源项目介绍

asteria 一个脚本语言，可嵌入，长期找人，希望胖友们帮帮忙，也可以加群753302367和作者对线

互动环节

最近睡眠很差，如果觉得内容有误大家多多指出，困了，先睡

练了一下午街霸6，好难啊我靠，我年纪真是大了，反应跟不上连招连不上，也有可能是设备不行

上一期

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

162.md

162.md

C++ 中文周刊 2024-06-30 第162期

资讯

文章

Member ordering and binary sizes

popcnt也能向量化？

UB or not UB: How gcc and clang handle statically known undefined behaviour

How the STL uses explicit

Efficiently allocating lots of std::shared_ptr

How much memory does a call to ‘malloc’ allocates?

size, speed and order tradeoffs

为什么C++的std::forward会有两种重载

超详细！spdlog源码解析

Latency-Sensitive Applications and the Memory Subsystem: Keeping the Data in the Cache

Latency-Sensitive Application and the Memory Subsystem Part 2: Memory Management Mechanisms

视频

C++ Weekly - Ep 434 - GCC's Amazing NEW (2024) -Wnrvo

Mirko Arsenijević — Lifting the Pipes - Beyond Sender/Receiver and Expected Outcome — 26.6.2024.

开源项目介绍

互动环节

Files

162.md

Latest commit

History

162.md

File metadata and controls

C++ 中文周刊 2024-06-30 第162期

资讯

文章

超详细！spdlog源码解析

视频

开源项目介绍

互动环节