Recording the memory access footprint

Updated: 2023-02-26 18:04:45

The perf mem tool is implemented for some modern x86/EM64T CPUs (probably Intel only; Ivy Bridge and newer desktop/server CPUs). The man page for perf mem is at http://man7.org/linux/man-pages/man1/perf-mem.1.html, and the same text is in the kernel documentation directory: http://lxr.free-electrons.com/source/tools/perf/Documentation/perf-mem.txt. The text is incomplete; the best documentation is the source: tools/perf/builtin-mem.c and, in part, tools/perf/builtin-report.c. There are no details in https://perf.wiki.kernel.org/index.php/Tutorial.

Unlike qemu-mtrace, it does not log every memory access, only every Nth access, where N is on the order of 10000 or 100000. In exchange, it runs at native speed with low overhead. Use perf mem record ./program to record the access pattern; add -a for system-wide sampling or -C cpulist to sample only the listed CPU cores. There is no way to log (trace) every memory access from inside the system: the tool has to write its log to memory, and that write is itself an access that would have to be logged, which is infinite recursion with finite memory. There are, however, very costly, proprietary, system-specific external tracing solutions such as JTAG or an SDRAM sniffer ($5k or more).
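
To make the sampling-versus-tracing trade-off concrete, here is a toy model in Python. It is only a sketch: the function name, the synthetic access list, and the period of 100 are all made up for illustration, and real hardware sampling does not work on a Python list.

```python
from collections import Counter


def sample_every_nth(accesses, period):
    """Keep only every `period`-th access, dropping the rest.

    This mimics period-based sampling: the full trace is never stored,
    only one record per `period` events.
    """
    return accesses[period - 1::period]


# Synthetic "trace": 9000 accesses to one address, then 1000 to another.
trace = [0x1000] * 9000 + [0x2000] * 1000

samples = sample_every_nth(trace, period=100)
histogram = Counter(samples)

# The sample set is 100x smaller, but the relative weight of the two
# addresses is still visible in the histogram.
print(len(trace), len(samples), dict(histogram))
```

The point of the sketch is that a sampled profile can still show which addresses are hot, but it cannot reconstruct the exact order or the full set of accesses.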

The perf mem tools were added around 2013 (Linux kernel version 3.10). Searching LWN for perf mem gives several results, for example https://lwn.net/Articles/531766/:

With this patch, it is possible to sample (not trace) memory accesses (load, store). For loads, the instruction and data addresses are captured along with the latency and data source. For stores, the instruction and data addresses are captured along with limited cache and TLB information.

The current patches implement the feature on Intel processors starting with Nehalem. The patches leverage the PEBS Load Latency and Precise Store mechanisms. Precise Store is present only on Sandy Bridge and Ivy Bridge based processors.

Physical address sampling support was added: https://lwn.net/Articles/555890/ (perf mem --phys-addr -t load rec). There is also the somewhat related perf c2c tool from 2016, "to track down cacheline contention": https://lwn.net/Articles/704125/, with examples at https://joemario.github.io/blog/2016/09/01/c2c-blog/.

Some random slides on perf mem:

  • http://indico.cern.ch/event/280897/contributions/1628882/attachments/515361/711133/SE-CERN_PMU_workshop_2013.pdf#page=4
  • http://www.linuxtag.org/2013/fileadmin/www.linuxtag.org/slides/Arnaldo_Melo_-_Linux__perf__tools__Overview_and_Current_Developments.e323.pdf#page=10
  • https://people.netfilter.org/pablo/netdev0.1/slides/sowa-perf-analytics.pdf#page=19

Some information on decoding the output of perf mem -D report, from the question "perf mem -D report":

 # PID, TID, IP, ADDR, LOCAL WEIGHT, DSRC, SYMBOL
 2054  2054 0xffffffff811186bf 0x016ffffe8fbffc804b0    49 0x68100842 /lib/modules/3.12.23/build/vmlinux:perf_event_aux_ctx
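
A dump line like the one above can be split into its columns with a few lines of Python. This is a minimal sketch that assumes the seven whitespace-separated columns named in the header and a "dso:symbol" form for the last one; the MemSample type and parse_dump_line function are hypothetical names, not part of perf. ADDR is kept as a string so that the value is passed through exactly as printed.

```python
from typing import NamedTuple


class MemSample(NamedTuple):
    pid: int
    tid: int
    ip: int       # address of the load/store instruction
    addr: str     # data address, kept verbatim as printed
    weight: int   # the LOCAL WEIGHT column
    dsrc: int     # raw data-source value
    dso: str      # binary or kernel image containing the instruction
    symbol: str   # function containing the instruction


def parse_dump_line(line: str) -> MemSample:
    """Parse one data line of the dump (assumed 7 columns)."""
    # maxsplit=6 keeps the last column intact even if it contains spaces.
    pid, tid, ip, addr, weight, dsrc, location = line.split(None, 6)
    # Assumption: the last column has the form "<dso>:<symbol>".
    dso, _, symbol = location.strip().partition(":")
    return MemSample(int(pid), int(tid), int(ip, 16), addr,
                     int(weight), int(dsrc, 16), dso, symbol)


line = (" 2054  2054 0xffffffff811186bf 0x016ffffe8fbffc804b0"
        "    49 0x68100842 /lib/modules/3.12.23/build/vmlinux:perf_event_aux_ctx")
sample = parse_dump_line(line)
print(sample.symbol, sample.weight, hex(sample.dsrc))
```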

What do "ADDR", "DSRC", and "SYMBOL" mean?

(answered by the same user who wrote this answer)

  • IP - PC of the load/store instruction;
  • SYMBOL - name of the function containing this instruction (IP);
  • ADDR - virtual memory address of the data requested by the load/store (if the --phys-data option was not given);
  • DSRC - "Decoded Source".
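
The DSRC value is a packed bit field. The sketch below decodes it in Python under an assumption that should be checked against your kernel: that the value follows the little-endian layout of union perf_mem_data_src in include/uapi/linux/perf_event.h, with the five original fields (mem_op, mem_lvl, mem_snoop, mem_lock, mem_dtlb). Newer kernels define more fields in the upper bits, which this sketch ignores. The FIELDS table and decode_dsrc are hypothetical names.

```python
# name: (bit shift, bit width, flag names from least significant bit up).
# Assumed from union perf_mem_data_src in include/uapi/linux/perf_event.h.
FIELDS = {
    "op":    (0, 5, ["NA", "LOAD", "STORE", "PFETCH", "EXEC"]),
    "lvl":   (5, 14, ["NA", "HIT", "MISS", "L1", "LFB", "L2", "L3",
                      "LOC_RAM", "REM_RAM1", "REM_RAM2",
                      "REM_CCE1", "REM_CCE2", "IO", "UNC"]),
    "snoop": (19, 5, ["NA", "NONE", "HIT", "MISS", "HITM"]),
    "lock":  (24, 2, ["NA", "LOCKED"]),
    "dtlb":  (26, 7, ["NA", "HIT", "MISS", "L1", "L2", "WK", "OS"]),
}


def decode_dsrc(value: int) -> dict:
    """Return the names of the flags set in each field of a DSRC value."""
    decoded = {}
    for name, (shift, width, flags) in FIELDS.items():
        bits = (value >> shift) & ((1 << width) - 1)
        decoded[name] = [flag for i, flag in enumerate(flags)
                         if bits & (1 << i)]
    return decoded


# The DSRC value from the sample dump line.
print(decode_dsrc(0x68100842))
```

Under that assumed layout, 0x68100842 reads as a load that hit in L3, with no snoop and a hit in the L1 or L2 TLB.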

There is also sorting to get some basic stats: perf mem rep --sort=mem - http://thread.gmane.org/gmane.linux.kernel.perf.user/1438
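
If you have the dumped samples in your own script, the same kind of basic statistics can be computed by hand. The following Python sketch groups samples by a key and totals the weight column; it is not how perf implements sorting, and the summarize function, the keys, and the weights are invented for illustration.

```python
from collections import defaultdict


def summarize(samples):
    """Group (key, weight) pairs; return (key, count, total_weight) rows,
    sorted by total weight, largest first."""
    totals = defaultdict(lambda: [0, 0])
    for key, weight in samples:
        totals[key][0] += 1        # number of samples for this key
        totals[key][1] += weight   # summed weight for this key
    rows = [(key, count, weight) for key, (count, weight) in totals.items()]
    return sorted(rows, key=lambda row: row[2], reverse=True)


# Made-up samples: (memory level, weight).
rows = summarize([("L3 hit", 49), ("L1 hit", 7), ("L3 hit", 51), ("L1 hit", 8)])
for key, count, weight in rows:
    print(f"{key:8} samples={count} total_weight={weight}")
```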

Other tools: there is the (slow) cachegrind emulator, based on valgrind, which simulates cache memory for userspace programs; see "7.2 Simulating CPU Caches" in https://lwn.net/Articles/257209/. There should also be something for low-level (slowest) models related to DRAMsim/DRAMsim2: http://eng.umd.edu/~blj/dramsim/