Performance analysis tools based on Linux perf_events (aka perf) and ftrace

perf-tools

A miscellaneous collection of in-development and unsupported performance analysis tools for Linux ftrace and perf_events (aka the "perf" command). Both ftrace and perf are core Linux tracing tools, included in the kernel source. Your system probably has ftrace already, and perf is often just a package add (see Prerequisites).

These tools are designed to be easy to install (fewest dependencies), provide advanced performance observability, and be simple to use: do one thing and do it well. This collection was created by Brendan Gregg (author of the DTraceToolkit).

Many of these tools employ workarounds so that functionality is possible on existing Linux kernels. Because of this, many tools have caveats (see man pages), and their implementation should be considered a placeholder until future kernel features, or new tracing subsystems, are added.

These are intended for Linux 3.2 and newer kernels. For Linux 2.6.x, see Warnings.

Presentation

These tools were introduced in the USENIX LISA 2014 presentation: Linux Performance Analysis: New Tools and Old Secrets

Contents

Using ftrace:

Using perf_events:

Using eBPF:

  • As a preview of things to come, see the bcc tracing Tools section. These use bcc, a front end for using eBPF. bcc+eBPF will allow some of these tools to be rewritten and improved, and additional tools to be created.

Screenshots

Showing new processes and arguments:

# ./execsnoop 
Tracing exec()s. Ctrl-C to end.
   PID   PPID ARGS
 22898  22004 man ls
 22905  22898 preconv -e UTF-8
 22908  22898 pager -s
 22907  22898 nroff -mandoc -rLL=164n -rLT=164n -Tutf8
 22906  22898 tbl
 22911  22910 locale charmap
 22912  22907 groff -mtty-char -Tutf8 -mandoc -rLL=164n -rLT=164n
 22913  22912 troff -mtty-char -mandoc -rLL=164n -rLT=164n -Tutf8
 22914  22912 grotty

Measuring block device I/O latency from queue insert to completion:

# ./iolatency -Q
Tracing block I/O. Output every 1 seconds. Ctrl-C to end.

  >=(ms) .. <(ms)   : I/O      |Distribution                          |
       0 -> 1       : 1913     |######################################|
       1 -> 2       : 438      |#########                             |
       2 -> 4       : 100      |##                                    |
       4 -> 8       : 145      |###                                   |
       8 -> 16      : 43       |#                                     |
      16 -> 32      : 43       |#                                     |
      32 -> 64      : 1        |#                                     |

[...]

Tracing the block:block_rq_insert tracepoint, with kernel stack traces, and only for reads:

# ./tpoint -s block:block_rq_insert 'rwbs ~ "*R*"'
   cksum-11908 [000] d... 7269839.919098: block_rq_insert: 202,1 R 0 () 736560 + 136 [cksum]
   cksum-11908 [000] d... 7269839.919107: 
 => __elv_add_request
 => blk_flush_plug_list
 => blk_finish_plug
 => __do_page_cache_readahead
 => ondemand_readahead
 => page_cache_async_readahead
 => generic_file_read_iter
 => new_sync_read
 => vfs_read
 => SyS_read
 => system_call_fastpath

[...]

Count kernel function calls beginning with "bio_", summarize every second:

# ./funccount -i 1 'bio_*'
Tracing "bio_*"... Ctrl-C to end.

FUNC                              COUNT
bio_attempt_back_merge               26
bio_get_nr_vecs                     361
bio_alloc                           536
bio_alloc_bioset                    536
bio_endio                           536
bio_free                            536
bio_fs_destructor                   536
bio_init                            536
bio_integrity_enabled               536
bio_put                             729
bio_add_page                       1004

[...]

There are many more examples in the examples directory. Also see the man pages.

Prerequisites

The intent is to need as few prerequisites as possible: eg, a Linux 3.2 server without debuginfo. See the tool man page for specifics.

ftrace

FTRACE configured in the kernel. You may already have this configured and available in your kernel version, as FTRACE was first added in 2.6.27. This requires CONFIG_FTRACE and other FTRACE options depending on the tool. Some tools (eg, funccount) require CONFIG_FUNCTION_PROFILER.
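One way to check for these options is to look them up in the kernel build config. The following is a minimal sketch, not part of perf-tools: the function name `check_kernel_options` is hypothetical, and the config path `/boot/config-$(uname -r)` is an assumption that holds on many distributions but not all.

```shell
# Report whether each named option is built in ("=y") in a kernel config file.
# Hypothetical helper for illustration; not a perf-tools script.
check_kernel_options() (
    config=$1
    shift
    missing=0
    for opt in "$@"; do
        # Enabled boolean options appear as an exact "CONFIG_X=y" line.
        if grep -qx "${opt}=y" "$config"; then
            echo "$opt: enabled"
        else
            echo "$opt: missing"
            missing=1
        fi
    done
    # Function status: success only if every option was found.
    [ "$missing" -eq 0 ]
)

# Assumed config location; skipped quietly if the file is not readable.
config="/boot/config-$(uname -r)"
if [ -r "$config" ]; then
    check_kernel_options "$config" CONFIG_FTRACE CONFIG_FUNCTION_PROFILER || true
fi
```

Other tools may need further options, so treat the two names above as examples and consult the tool's man page.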

perf_events

Requires the "perf" command to be installed. This is in the linux-tools-common package. After installing that, perf may tell you to install an additional linux-tools package (linux-tools-kernel_version). perf can also be built under tools/perf in the kernel source. See perf_events Prerequisites for more details about getting perf_events to work fully.

debugfs

Requires a kernel with the CONFIG_DEBUG_FS option enabled. As with FTRACE, this may already be enabled (debugfs was added in 2.6.10-rc3). debugfs also needs to be mounted:

# mount -t debugfs none /sys/kernel/debug
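Before mounting, it can help to check whether debugfs is already there. This is a small sketch under the assumption that the mount table is readable at `/proc/mounts`; the function name `debugfs_mounted` is hypothetical, and its second argument exists only so the check can be run against a sample file.

```shell
# Succeed if a debugfs filesystem is mounted at the given mount point.
# Hypothetical helper for illustration; not a perf-tools script.
debugfs_mounted() (
    mountpoint=${1:-/sys/kernel/debug}
    mounts=${2:-/proc/mounts}
    # Mount table fields: device, mount point, filesystem type, options, ...
    awk -v mp="$mountpoint" '
        $2 == mp && $3 == "debugfs" { found = 1 }
        END { exit !found }
    ' "$mounts"
)

if debugfs_mounted; then
    echo "debugfs is mounted at /sys/kernel/debug"
else
    echo "debugfs is not mounted; as root: mount -t debugfs none /sys/kernel/debug"
fi
```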

awk

Many of these scripts use awk, and will try to use either mawk or gawk depending on the desired behavior: mawk for buffered output (because of its speed), and gawk for synchronous output (as fflush() works, allowing more efficient grouping of writes).
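The selection described above can be sketched as follows. This is an illustration rather than the code the scripts actually use: the function name `pick_awk`, the mode names, and the fallback order (including the final fallback to plain `awk`) are all assumptions.

```shell
# Print the name of the preferred awk for the requested output mode.
# "buffered" prefers mawk and "sync" prefers gawk; the fallback order
# is an assumption of this sketch.
pick_awk() (
    case $1 in
        buffered) candidates="mawk gawk awk" ;;
        sync)     candidates="gawk mawk awk" ;;
        *)        echo "usage: pick_awk buffered|sync" >&2; exit 2 ;;
    esac
    for name in $candidates; do
        # command -v succeeds only if the name resolves to something runnable.
        if command -v "$name" >/dev/null 2>&1; then
            echo "$name"
            exit 0
        fi
    done
    exit 1
)

awk_cmd=$(pick_awk sync) && echo "using $awk_cmd for synchronous output" ||
    echo "no awk implementation found"
```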

Install

These are just scripts. Either grab everything:

git clone --depth 1 https://github.com/brendangregg/perf-tools

Or use the raw links on github to download individual scripts. Eg:

wget https://raw.githubusercontent.com/brendangregg/perf-tools/master/iosnoop

This preserves tabs (which copy-n-paste can mess up).

Warnings

Ftrace was first added to Linux 2.6.27, and perf_events to Linux 2.6.31. These early versions had kernel bugs, and lockups and panics have been reported on 2.6.32 series kernels. This includes CentOS 6.x. If you must analyze older kernels, these tools may only be useful in a fault-tolerant environment, such as a lab with simulated issues. These tools have been primarily developed on Linux 3.2 and later kernels.
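A script can guard against older kernels before tracing. The sketch below compares a kernel release string against the 3.2 baseline mentioned above; the function name `kernel_at_least` is hypothetical, and it assumes the release string starts with `MAJOR.MINOR`, as `uname -r` output normally does.

```shell
# Succeed if a kernel release string is at least MAJOR.MINOR.
# Defaults to the running kernel. Hypothetical helper for illustration.
kernel_at_least() (
    want_major=$1
    want_minor=$2
    release=${3:-$(uname -r)}
    major=${release%%.*}        # text before the first dot
    rest=${release#*.}          # text after the first dot
    minor=${rest%%[!0-9]*}      # leading digits of the remainder
    [ "$major" -gt "$want_major" ] ||
        { [ "$major" -eq "$want_major" ] && [ "$minor" -ge "$want_minor" ]; }
)

if kernel_at_least 3 2; then
    echo "kernel $(uname -r) is 3.2 or newer"
else
    echo "kernel $(uname -r) is older than 3.2; use with caution"
fi
```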

Depending on the tool, there may also be overhead incurred. See the next section.

Internals and Overhead

perf_events is evolving. This collection began development circa Linux 3.16, with Linux 3.2 servers as the main target, at a time when perf_events lacked certain programmatic capabilities (eg, custom in-kernel aggregations). It's possible these will be added in a forthcoming kernel release. Until then, many of these tools employ workarounds, tricks, and hacks in order to work. Some of these tools pass event data to user space for post-processing, which costs much higher overhead than in-kernel aggregations. The overhead of each tool is described in its man page.

WARNING: In extreme cases, your target application may run 5x slower when using these tools. Depending on the tool and kernel version, there may also be the risk of kernel panics. Read the program header for warnings, and test before use.

If the overhead is a problem, these tools can be improved. If a tool doesn't already, it could be rewritten in C to use perf_event_open() and mmap() for the trace buffer. It could also implement frequency counts in C, and operate on mmap() directly, rather than using awk/Perl/Python. Additional improvements are possible for ftrace-based tools, such as use of snapshots and per-instance buffers.
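To make the user-space cost concrete, here is a minimal sketch of a frequency count done with awk over a stream of trace lines. It is not taken from any tool in this collection: the function name `count_functions` is hypothetical, and the input format (task-pid, CPU, flags, timestamp, function, caller) is a simplified assumption about function-trace output. Every event has to be formatted by the kernel, copied out, and parsed as text before it is counted, which is where the overhead comes from.

```shell
# Count how often each function name appears in a stream of trace lines.
# Assumed, simplified input format (one event per line):
#   <task>-<pid> [<cpu>] <flags> <timestamp>: <function> <-<caller>
count_functions() {
    awk '
        # Field 5 holds the traced function name in the assumed format.
        NF >= 5 { count[$5]++ }
        END {
            for (f in count)
                printf "%-30s %8d\n", f, count[f]
        }
    ' | LC_ALL=C sort -k2,2n -k1,1
}

# Sample input for illustration only.
count_functions <<'EOF'
   cksum-11908 [000] d... 7269839.919098: bio_alloc <-submit_bio
   cksum-11908 [000] d... 7269839.919107: bio_put <-submit_bio
   cksum-11908 [000] d... 7269839.919112: bio_alloc <-submit_bio
EOF
```

An in-kernel aggregation would keep only the per-function counters and avoid the per-event text processing shown here.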

Some of these tools are intended as short-term workarounds until more kernel capabilities exist, at which point they can be substantially rewritten. Older versions of these tools will be kept in this repository, for older kernel versions.

As my main target is a fleet of Linux 3.2 servers that do not have debuginfo, these tools try not to require it. At times, this makes the tool more brittle than it needs to be, as I'm employing workarounds (that may be kernel version and platform specific) instead of using debuginfo information (which can be generic). See the man page for detailed prerequisites for each tool.

I've tried to use perf_events ("perf") where possible, since that interface has been developed for multi-user use. For various reasons I've often needed to use ftrace instead. ftrace is surprisingly powerful (thanks Steven Rostedt!), and not all of its features are exposed via perf, or in common usage. This tool collection is in some ways a demonstration of hidden Linux features using ftrace.

Since things are changing, it's very possible you may find some tools don't work on your Linux kernel version. Some expertise and assembly will be required to fix them.

Links

A case study and summary:

Related articles:

Comments
  • fsyncsnoop: a new tool to trace fsync() system call

    While investigating some fsync performance issues, I wrote this tool. The implementation is modeled on opensnoop.

    This script needs lsof support to map fd to a file path.

    Signed-off-by: Oliver Yang [email protected]

    opened by yangoliver 18
  • funcgraph, funcslower, functrace: Make -p switch respect all the process' threads

    Fixes #53 for the tools that use set_ftrace_pid as the filtering technique. Now, traverse /proc/$pid/task to find the "kernel" pids (thread ids) and use them instead. Also, add -L switch if anyone really wants thread-based filters.

    opened by goldshtn 5
  • Invalid printf format in iolatency/cachestat

    [root@thinkpad ~]# iolatency 1 2
    Tracing block I/O. Output every 1 seconds.
    
      >=(ms) .. <(ms)   : I/O      |Distribution                          |
           0 -> 1       : 0        |                                      |
    
      >=(ms) .. <(ms)   : I/O      |Distribution                          |
           0 -> 1       : 0        |                                      |
    
    Ending tracing...
    [root@thinkpad ~]# iolatency -T 1 2
    Tracing block I/O. Output every 1 seconds.
    /usr/bin/iolatency: line 204: printf: `(': invalid format character
    /usr/bin/iolatency: line 204: printf: `(': invalid format character
    
    Ending tracing...
    [root@thinkpad ~]# which printf
    /usr/bin/printf
    
    opened by evgkrsk 5
  • added killsnoop, another snooping tool for tracing kill()s via ftrace

    Hi Brendan,

    first of all, your tools are awesome and excellent for learning kernel internals and for debugging problems. I've added killsnoop for snooping kills, since I recently had a problem where something was killing a worker application and we were not able to locate the origin. In the end, it was a cleanup cron job that kills long-running (and hanging) processes :-)

    Maybe someday, it will be useful for someone else.

    Best regards, Martin

    opened by MegaMaddin 5
  • Add support for TCP tail loss probes

    Adds support for TCP tail loss probes to tcpretrans.

    This adds another column to the output - a boolean column that indicates if the retransmit was a tail loss probe:

    TIME     PID    LADDR:LPORT          -- RADDR:RPORT          STATE        TLP
    18:42:32 0      172.16.1.149:49509   R> 172.16.4.53:9092     ESTABLISHED  Y
    
    opened by csfrancis 5
  • Use /usr/bin/env in shebang.

    On some distributions (e.g., NixOS) the bash binary is not in /bin/bash and the perl binary is not in /usr/bin/perl. Using /usr/bin/env in the shebang instead should make it work.

    opened by ruediger 5
  • Not able to generate Flame graphs for Java Apps

    Hi Brendan,

    We encountered the errors below while trying to generate flame graphs from the host. After a successful perf record, we ran perf script on perf.data (perf script | ./stackcollapse-perf.pl > out.perf-folded) and got these errors. How can we overcome them? I believe a few are related to the perf data files not being available on the host, since they reside in the container. This is a multi-container environment.

    Failed to open /tmp/perf-12964.map, continuing without symbols
    Failed to open /tmp/perf-20444.map, continuing without symbols
    Failed to open /lib/x86_64-linux-gnu/libpthread-2.23.so, continuing without symbols
    Failed to open /x/web/LIVE/keymakeragent/keymakeragent/infra/lib/linux_x86_py27/_faststat.so, continuing without symbols
    Failed to open /x/opt/pp/bin/python2.7, continuing without symbols
    Failed to open /x/web/LIVE/keymakeragent/keymakeragent/infra/lib/linux_x86_py27/greenlet.so, continuing without symbols
    Failed to open /applicationpackages/manifests/active/JDK/cronus/scripts/jdk1.8.0_60/jre/lib/amd64/server/libjvm.so, continuing without symbols
    Failed to open /tmp/perf-29726.map, continuing without symbols
    Failed to open /applicationpackages/manifests/active/JDK/cronus/scripts/jdk1.8.0_60/jre/lib/amd64/libnet.so, continuing without symbols
    Failed to open /lib/x86_64-linux-gnu/libc-2.23.so, continuing without symbols
    Failed to open /tmp/perf-5808.map, continuing without symbols
    Failed to open /tmp/perf-28806.map, continuing without symbols
    Failed to open /lib/ld-musl-x86_64.so.1, continuing without symbols
    Failed to open /tmp/perf-25990.map, continuing without symbols
    Failed to open /lib/libpthread-2.5.so, continuing without symbols
    Failed to open /lib/libc-2.5.so, continuing without symbols
    Failed to open /tmp/perf-14795.map, continuing without symbols
    Failed to open /applicationpackages/manifests/active/JSW/cronus/scripts/JSW/bin/wrapper, continuing without symbols
    Failed to open /tmp/perf-22375.map, continuing without symbols
    Failed to open /tmp/perf-14695.map, continuing without symbols
    Failed to open /usr/bin/ppregistrator, continuing without symbols
    Failed to open /x/web/LIVE/keymakeragent/keymakeragent/infra/lib/linux_x86_py27/gevent/core.so, continuing without symbols
    Failed to open /tmp/perf-14639.map, continuing without symbols
    Failed to open /tmp/perf-23526.map, continuing without symbols
    Failed to open /tmp/perf-8628.map, continuing without symbols
    Failed to open /applicationpackages/manifests/active/JDK/cronus/scripts/jdk1.8.0_60/jre/lib/amd64/libjava.so, continuing without symbols
    Failed to open /usr/lib/libstdc++.so.6.0.8, continuing without symbols
    Failed to open /x/web/LIVE/caldaemon/caldaemon, continuing without symbols
    Failed to open /tmp/perf-10390.map, continuing without symbols
    Failed to open /lib/librt-2.5.so, continuing without symbols
    Failed to open /tmp/perf-3844.map, continuing without symbols
    no symbols found in /bin/dash, maybe install a debug package?
    Failed to open /applicationpackages/manifests/active/JDK/cronus/scripts/jdk1.8.0_60/jre/lib/amd64/libnio.so, continuing without symbols
    Failed to open /x/opt/pp/lib/python2.7/lib-dynload/select.so, continuing without symbols
    Failed to open /x/opt/pp/lib/python2.7/lib-dynload/time.so, continuing without symbols
    Failed to open /applicationpackages/manifests/active/JDK/cronus/scripts/jdk1.8.0_60/jre/lib/amd64/libmanagement.so, continuing without symbols
    Failed to open /usr/bin/socat, continuing without symbols
    Failed to open /x/opt/pp/lib/python2.7/lib-dynload/_socket.so, continuing without symbols

    opened by sattishv 4
  • killsnoop: can't work with mawk

    On Ubuntu releases, mawk is the default awk, so killsnoop does not work on Ubuntu by default.

    There are two issues:
    a. strtonum is not supported by mawk. Use int to convert the string to a number instead. For gawk, using int this way needs the --non-decimal-data option. Even on a very old RHEL release (2.6.18 kernel), gawk supports this option.
    b. killsnoop still produces no results because of mawk buffering problems. Using -W interactive solves this issue. The option is available in mawk 1.2, so RHEL 4+ should support it.

    Signed-off-by: Oliver Yang [email protected]

    opened by yangoliver 4
  • execsnoop: Instrument sys_execve first

    On my kernel (3.16 amd64), stub_execve can't be instrumented, and do_execve gets instrumented but fails silently (no events are caught, except very occasionally through call_usermodehelper).

    Print which implementation is used. Make sure do_execve is tried last due to silent failures.

    opened by g2p 4
  • little output from './execsnoop' with do_execve()

    I found that running execsnoop gives quite different results with do_execve() and stub_execve().

    Here is an example on Fedora with kernel 3.11 and stub_execve(): 'bash -x' results: http://paste.ubuntu.com/8431228/ 'cat /sys/kernel/debug/tracing/trace_pipe': http://paste.ubuntu.com/8431209/

    And on Fedora with kernel 3.16 and do_execve(): 'bash -x': http://paste.ubuntu.com/8431251/ 'cat /sys/kernel/debug/tracing/trace_pipe': http://paste.ubuntu.com/8431253/

    A process whose name starts with 'neutron-openvsw...' has execsnoop_stub_execve output in 'trace_pipe' in one test but not in the other. In both tests, that process has 'sched_process_fork' output. I don't know the development history of stub_execve and do_execve, so I can only guess: are there function call format changes between them, different limitations, or something else?

    opened by pyKun 4
  • iosnoop on Linux and heredoc getting confused

    I grabbed the iosnoop script but unfortunately when I run it I end up with:

    /usr/local/bin/iosnoop: line 231: warning: here-document at line 62 delimited by end-of-file (wanted `END')
    /usr/local/bin/iosnoop: line 232: syntax error: unexpected end of file
    

    I'm slightly confused by why this is popping up, as the HEREDOC opened on line 62 is closed on line 74 with the END, or at least it should be.

    opened by daenney 4
  • funcgraph does not output anything

    I am using CentOS 7.9 with a 3.10 Linux kernel. I am trying to use funcgraph to record the duration of a kernel function, but I can't see any output from funcgraph. How can I fix this?

    opened by BilyZ98 0
  • memory events not supported

    perf mem record ./stream

    failed: memory events not supported

    perf mem is not supported on my system. I am using VirtualBox. My processor: Intel® Core™ i9-9880H CPU @ 2.30GHz × 8. Ubuntu version: Ubuntu 21.10.

    opened by tusharpandey1993 0
  • Negative number of hits when using cachestat

    Thanks for your wonderful performance tools - I am happily using them for my studies.

    However, I sometimes see a negative number of hits printed in my terminal when using cachestat. How is this possible, and how should I interpret this number? Could it be caused by an overflow? Below is the output that I've encountered.

    Counting cache functions... Output every 5 seconds.
    TIME         HITS   MISSES  DIRTIES    RATIO   BUFFERS_MB   CACHE_MB  DEBUG
    06:29:25    15198    59372      500    20.4%         1094       5754  (75070 500 59871 499)
    06:29:30    36746    58014      827    38.8%         1126       5953  (95587 827 58841 827)
    06:29:35     8498    21626      307    28.2%         1147       6029  (30431 307 21932 306)
    06:29:40   -13710    29487      172   -86.9%         1158       6134  (15949 172 29659 172)
    06:29:45   -16377    28720      125  -132.7%         1168       6237  (12468 125 28845 125)
    06:29:50   -22167    34690      276  -177.0%         1182       6360  (12799 276 34966 276)
    06:29:55    21356    21934     3172    49.3%         1213       6428  (46462 3172 25106 3172)
    
    opened by sunhongmin225 1
  • kprobe: fails when debugfs isn't mounted, use alternate

    It's possible, especially on a production kernel, that CONFIG_DEBUG_FS_DISALLOW_MOUNT=y is set, which implies that although the debugfs filesystem's APIs are available, the filesystem itself isn't mounted: it's invisible. Take this into account, else the script won't work on such systems. (Will set up a PR for this)

    opened by kaiwan 0
Releases(v1.0)
  • v1.0(Aug 17, 2017)

    I'm adding a tagged release to perf-tools so that package maintainers can grab a static version. There's nothing really special about this v1.0 release, other than putting a version on a toolkit that has grown and stabilized for now, meeting its original purpose: providing advanced tracing tools using ftrace and perf_events. Future releases might add hist triggers and/or eBPF tools (my eBPF tools can currently be found in the bcc repo).

Owner
Brendan Gregg
Cloud computing performance architect and engineer.