A Java agent to generate method mappings to use with the Linux `perf` tool

Overview

perf-map-agent

Join the chat at https://gitter.im/jrudolph/perf-map-agent

A Java agent to generate /tmp/perf-<pid>.map files for just-in-time (JIT) compiled methods for use with the Linux perf tools.

Build

Make sure JAVA_HOME is configured to point to a JDK. You need cmake >= 2.8.6 (see #30). Then run the following on the command line:

cmake .
make

# will create links to run scripts in <somedir>
bin/create-links-in <somedir>

Architecture

Linux perf tools will expect symbols for code executed from unknown memory regions at /tmp/perf-<pid>.map. This allows runtimes that generate code on the fly to supply dynamic symbol mappings to be used with the perf suite of tools.

perf-map-agent is an agent that will generate such a mapping file for Java applications. It consists of a Java agent written in C and a small Java bootstrap application which attaches the agent to a running Java process.

When the agent is attached it instructs the JVM to report code blobs generated by the JVM at runtime for various purposes. Most importantly, this includes JIT-compiled methods but also various dynamically-generated infrastructure parts like the dynamically created interpreter, adaptors, and jump tables for virtual dispatch (see vtable and itable entries). The agent creates a /tmp/perf-<pid>.map file which it fills with one line per code blob that maps a memory location to a code blob name.
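As a rough illustration of what such a file looks like to a consumer, the sketch below parses map lines in Python. It assumes the conventional perf map layout of one entry per line, `<start> <size> <name>`, with start and size written in hexadecimal without a `0x` prefix. The `parse_perf_map` helper and the sample entries are made up for illustration and are not part of perf-map-agent.

```python
from typing import List, NamedTuple


class MapEntry(NamedTuple):
    start: int  # first address of the code blob
    size: int   # length of the code blob in bytes
    name: str   # symbol name; may contain spaces


def parse_perf_map(text: str) -> List[MapEntry]:
    """Parse perf map text (assumed format: hex start, hex size, name)."""
    entries = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        # Split only twice so that names containing spaces stay intact.
        start_hex, size_hex, name = line.split(" ", 2)
        entries.append(MapEntry(int(start_hex, 16), int(size_hex, 16), name))
    return entries


# Hypothetical sample content, not taken from a real process.
SAMPLE = """\
7f3c8d000000 1a0 Interpreter
7f3c8d0101c0 60 Ljava/lang/String;::hashCode
7f3c8d010400 20 vtable chunks
"""

entries = parse_perf_map(SAMPLE)
for entry in entries:
    print(hex(entry.start), entry.size, entry.name)
```

Splitting on the first two spaces only matters because code blob names such as "vtable chunks" contain spaces themselves.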

The Java application takes the PID of a Java process as an argument and an arbitrary number of additional arguments which it passes to the agent. It then attaches to the target process and instructs it to load the agent library.

Command line scripts

The bin directory contains a set of shell scripts to combine common perf / dtrace operations with creating the map file.

  • create-java-perf-map.sh <pid> <options*> takes a PID and options. It knows where to find libraries relative to the bin directory.
  • perf-java-top <pid> <perf-top-options> takes a PID and additional options to pass to perf top. Uses the agent to create a new /tmp/perf-<pid>.map and then calls perf top with the given options.
  • perf-java-record-stack <pid> <perf-record-options> takes a PID and additional options to pass to perf record. Runs perf record -g -p <pid> <perf-record-options> to collect performance data including stack traces. Afterwards it uses the agent to create a new /tmp/perf-<pid>.map file.
  • perf-java-report-stack <pid> <perf-record-options> first calls perf-java-record-stack <pid> <perf-record-options> and then runs perf report to directly analyze the captured data. You can call perf report -i /tmp/perf-<pid>.data again with any options after the script has exited to further analyze the data from the previous run.
  • perf-java-flames <pid> <perf-record-options> collects data with perf-java-record-stack and then creates a visualization using @brendangregg's FlameGraph tools. To get meaningful stack traces spanning several JIT-compiled methods, you need to run your JVM with -XX:+PreserveFramePointer (which is available starting from JDK8 update 60 build 19) as detailed in a Netflix blog entry.
  • create-links-in <targetdir> will install symbolic links to the above scripts into <targetdir>.
  • dtrace-java-record-stack <pid> takes a PID. Runs dtrace to collect performance data including stack traces. Afterwards it uses the agent to create a new /tmp/perf-<pid>.map file.
  • dtrace-java-flames <pid> collects data with dtrace-java-record-stack and then creates a visualization using @brendangregg's FlameGraph tools. To get meaningful stack traces spanning several JIT-compiled methods, you need to run your JVM with -XX:+PreserveFramePointer (which is available starting from JDK8 update 60 build 19) as detailed in a Netflix blog entry.

Environment variables:

  • PERF_MAP_OPTIONS: a string of additional options to pass to the agent as described below.
  • PERF_RECORD_SECONDS: the number of seconds for which perf-java-report-stack and similar tools will record performance data
  • PERF_RECORD_FREQ: the sampling frequency as passed to perf record -F
  • FLAMEGRAPH_DIR: the directory into which @brendangregg's FlameGraph has been checked out
  • PERF_JAVA_TMP: the directory to put temporary files in, the default is /tmp
  • PERF_DATA_FILE: the file name where perf-java-record-stack will output performance data into, the default is $PERF_JAVA_TMP/perf-<pid>.data
  • PERF_COLLAPSE_OPTS: a string of additional flags to pass to stackcollapse-perf.pl (found in FLAMEGRAPH_DIR); add --inline when the perf map was created with the unfoldall option
  • PERF_FLAME_OUTPUT: the file name to which the flamegraph SVG will be written, the default is flamegraph-<pid>.svg
  • PERF_FLAME_OPTS: options to pass to flamegraph.pl (found in FLAMEGRAPH_DIR), the default is --color java
  • DTRACE_SECONDS: the number of seconds for which dtrace and similar tools will record performance data
  • DTRACE_FREQ: the sampling frequency as passed to dtrace
  • DTRACE_JAVA_TMP: the directory to put temporary files in, the default is /tmp
  • DTRACE_DATA_FILE: the file name where dtrace-java-record-stack will output performance data into, the default is $DTRACE_JAVA_TMP/dtrace-<pid>.data
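To make the documented defaults concrete, here is a small Python sketch of how the perf-related variables could be resolved for a given PID. It only models the defaults listed above; `resolve_settings` is a hypothetical helper, not code from the scripts, and how the real scripts treat empty (as opposed to unset) variables is not covered.

```python
from typing import Dict, Mapping


def resolve_settings(env: Mapping[str, str], pid: int) -> Dict[str, str]:
    """Apply the documented defaults to whichever variables are unset."""
    tmp_dir = env.get("PERF_JAVA_TMP", "/tmp")
    return {
        "PERF_JAVA_TMP": tmp_dir,
        # The data file default depends on the (possibly overridden) tmp dir.
        "PERF_DATA_FILE": env.get("PERF_DATA_FILE", f"{tmp_dir}/perf-{pid}.data"),
        "PERF_FLAME_OUTPUT": env.get("PERF_FLAME_OUTPUT", f"flamegraph-{pid}.svg"),
        "PERF_FLAME_OPTS": env.get("PERF_FLAME_OPTS", "--color java"),
    }


# Example: only the temporary directory is overridden (PID is made up).
settings = resolve_settings({"PERF_JAVA_TMP": "/var/tmp"}, pid=12345)
for key, value in settings.items():
    print(f"{key}={value}")
```

Note that overriding PERF_JAVA_TMP also moves the default location of PERF_DATA_FILE, since the latter is defined relative to the former.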

Options

You can add a comma-separated list of options to perf-java (or the AttachOnce runner). These options are currently supported:

  • unfold: Create extra entries for every code block inside a method that was inlined from elsewhere (named <inlined_method> in <root_method>). Be aware of the effects of 'skid' when unfolding, and of inaccurate inlining information; see the sections on both under Known Issues.
  • unfoldall: Similar to unfold but will include the complete inlined stack at a code location in the form root_method->inlined method 1->inlined method 2->...->inlined method on top.
  • unfoldsimple: Similar to unfold, but the extra entries do not include the " in <root_method>" part.
  • msig: include full method signature in the name string
  • dottedclass: convert class signature (Ljava/lang/Class;) to the usual class names with segments separated by dots (java.lang.Class). NOTE: this currently breaks coloring when used in combination with flamegraphs.
  • sourcepos: Adds the name of the source file and the line number on which it is declared for each method. Useful when profiling Scala applications that create a lot of synthetic classes and methods. Does not work with native methods.
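The naming schemes described in the list above can be sketched in a few lines of Python. This is only an illustration of the documented formats, not the agent's C implementation; the helper functions and the method names (`Point::getX`, `Main::run`, `Main::main`) are hypothetical, and the agent's exact output may differ in detail.

```python
from typing import Sequence


def dotted_class(signature: str) -> str:
    """Turn a class signature like 'Ljava/lang/Class;' into 'java.lang.Class'."""
    if signature.startswith("L") and signature.endswith(";"):
        signature = signature[1:-1]
    return signature.replace("/", ".")


def unfold_name(inlined: str, root: str) -> str:
    """Entry name in the style described for 'unfold'."""
    return f"{inlined} in {root}"


def unfoldall_name(stack: Sequence[str]) -> str:
    """Entry name in the style described for 'unfoldall' (root method first)."""
    return "->".join(stack)


def options_string(*options: str) -> str:
    """Join option names into the comma-separated form the agent expects."""
    return ",".join(options)


print(dotted_class("Ljava/lang/Class;"))
print(unfold_name("Point::getX", "Main::run"))
print(unfoldall_name(["Main::main", "Main::run", "Point::getX"]))
print(options_string("unfoldall", "dottedclass"))
```

The string built by `options_string` is what would be passed as the agent options, for example through the PERF_MAP_OPTIONS environment variable.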

Known Issues

Skid

You should be aware that instruction level profiling is not absolutely accurate but suffers from 'skid'. 'skid' means that the actual instruction pointer may already have moved a bit further when a sample is recorded. In that case, (possibly hot) code is reported at an address shortly after the actual hot instruction. See this sample from one of Brendan's presentations demonstrating this issue.

If using unfold, perf-map-agent will report sections that contain code inlined from other methods as separate entries. Unfolded entries can be quite short, e.g. an inlined getter may only consist of a few instructions that now live inside another method's JITed code. The next few instructions may then already belong to another entry. In such a case, it is more likely that skid will not only affect the instruction pointer inside of a method entry but may affect which entry is chosen in the first place.

Skid that occurs inside a method is only visible when analyzing the actual assembler code (as with perf annotate). Skid that affects the actual symbol resolution to choose a wrong entry will be much more visible as wrong entries will be reported with tools that operate on the symbol level like the standard views of perf report, perf top, or in flame graphs.

So, while it is tempting to enable unfolded entries for the perceived extra resolution, this extra information is sometimes just noise which will not only clutter the overall view but may also be misleading or wrong.
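The effect can be illustrated with a toy symbol table in Python. The addresses, sizes, method names, and the amount of skid are all invented for the example; the lookup is a plain binary search over sorted entries and is not how perf itself is implemented.

```python
from bisect import bisect_right

# Hypothetical unfolded entries: (start address, size, name), sorted by start.
ENTRIES = [
    (0x1000, 0x40, "Main::run"),
    (0x1040, 0x08, "Point::getX in Main::run"),  # short inlined getter
    (0x1048, 0x38, "Main::run"),
]
STARTS = [start for start, _, _ in ENTRIES]


def resolve(address):
    """Return the name of the entry covering address, or None if none does."""
    index = bisect_right(STARTS, address) - 1
    if index < 0:
        return None
    start, size, name = ENTRIES[index]
    return name if address < start + size else None


hot_instruction = 0x1044  # the truly hot instruction, inside the getter
skid = 0x06               # assumed skid in bytes, for illustration only

print("without skid:", resolve(hot_instruction))
print("with skid:   ", resolve(hot_instruction + skid))
```

Because the getter's entry is only eight bytes long, a skid of a few bytes is enough to move the sample out of it and attribute the cost to the surrounding method. Without unfolding, both addresses would fall into the same single entry and the skid would be invisible at the symbol level.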

Inaccurate mappings using the unfold* options

Hotspot does not retain line number and other debug information for inlined code at other places than safepoints. This makes sense because you don't usually observe code running between safepoints from the JVM's perspective. This is different when observing a process from the outside like with perf. For observed code locations outside of safepoints, the JVM will not report any inlining information and perf-map-agent will assign those areas to the host method of the inlining.

For more fidelity, Hotspot can be instructed to include debug information for non-safepoints as well. Use -XX:+UnlockDiagnosticVMOptions -XX:+DebugNonSafepoints when running the target process. Note, however, that this will produce a lot more information, with the generated perf-<pid>.map file potentially growing to several megabytes in size.

Agent Library Unloading

Unloading or reloading of a changed agent library is not supported by the JVM (but re-attaching is). Therefore, if you make changes to the agent and recompile it you need to restart a target process that has an older version loaded to use the newer version.

Missing symbols for libjvm.so

libjvm.so is the runtime component of the JVM. It is not covered by perf-map-agent but perf will use debug symbols as provided by the distribution. If symbols for libjvm.so are missing see instructions for your Linux distribution to install debug symbols for the JVM. See also issue #39 which contains a few pointers about how to install these.

Disclaimer

I'm not a professional C code writer. The code is very "experimental", and it is e.g. missing checks for error conditions. Use it at your own risk. You have been warned!

License

This library is licensed under GPLv2. See the LICENSE file.

Comments
  • Could not find or load main class net.virtualvoid.perf.AttachOnce

    # ./bin/perf-java-top 19835
    Error: Could not find or load main class net.virtualvoid.perf.AttachOnce
    

    The scripts expect attach-main.jar to be found in the out directory, while it is in the project root directory.

    opened by stepancheg 22
  • Improve reporting of inlined frames

    I think PMA is in a unique position to break new ground in Java profiling by reporting the Java stack while highlighting the separation between real frames and virtual frames (inlined methods). To do this we will need:

    1. Report all frames when reporting inlined data: This can easily lead to VERY long strings. I'm not sure what the maximum 'method' name length supported by perf is.
    2. Support from flame graphs in colouring and displaying the frames.
    opened by nitsanw 12
  • flush buffers, and options for livemap and msig

    Just a few small improvements, if you want them:

    • fflush(): to avoid a 4k buffering issue where the map file was always missing the most recent entries.
    • livemap: an option to write to an alternate file. This allows other software to be used to tidy up and write the real map file before running perf. (This is a step I've found to be necessary with complex production code, where -- despite a correct map file and perf profile -- perf's translation step gets messed up, unless the map file is tidied first. I have a program to do this I'll publish.)
    • msig: I changed the behavior to not include the method signatures, unless you set this option.

    I also have a fix (or rather, a hack) in-hand for the frame pointer issue I need to publish as well...

    opened by brendangregg 10
  • Some C stack frames with Java ancestor not correctly shown

    Hi guys!

    I have produced a flame graph with:

    PERF_COLLAPSE_OPTS="--kernel --tid" PERF_RECORD_FREQ=99 PERF_RECORD_SECONDS=10 PERF_MAP_OPTIONS=unfoldall ~/perf-map-agent/bin/perf-java-flames <pid>
    

    But it seems that (some) C stack frames are not correctly shown as children of the related Java calls. Am I missing anything in the configuration?

    Thanks, Franz

    opened by franz1981 9
  • Not able to resolve symbols

    Hi

    I'm trying to use your project on Fedora 24 with OpenJDK 1.8.0.101. The problem can be seen using perf-java-top and with flame graphs as well. create-java-perf-map seems to work well, but I did not see how to get perf to use its output when plotting frames. jstack works as well (but, as you mention, with terribly poor performance and quality).

    I thought it would be something with preserving the framepointer but the jdk version is compatible and I'm using the following JVM options:

    -XX:+PreserveFramePointer -XX:+StartAttachListener -XX:+UnlockDiagnosticVMOptions -XX:+DebugNonSafepoints -XX:-OmitStackTraceInFastThrow -XX:+ShowHiddenFrames

    Do you have any suggestions about how to solve this problem?

    Thanks in advance Luan

    opened by luan-cestari 9
  • Bug fixes and output formatting

    Hi, This is a bit of a mixed bag, so feel free to cherry pick the bits you find most relevant. The most important aspects are:

    1. Fix a bug in the inlined address range computation where the range was assigned to the wrong method.
    2. Error handling for JVMTI calls (only in method signature, the same pattern needs applying to all JVMTI interactions).

    In addition I added some comments, renamed variables for clarity (I think), and added a more compact format for inlined methods printing. Happy to have a discussion to make the PR more palatable. I would like to improve formatting further and settle on good way to configure it, but not sure how important backwards compatibility is. Thanks, Nitsan

    opened by nitsanw 8
  • Interpreter frames

    I often see "Interpreter" frames. I think I was expecting more symbols.

       100.00%     java  perf-20844.map  [.] Interpreter
                   |
                   --- Interpreter
                       Interpreter
                       Interpreter
                       Interpreter
                       call_stub
                       JavaCalls::call_helper(JavaValue*, methodHandle*, JavaCallArguments*, Thread*)
                       jni_invoke_static(JNIEnv_*, JavaValue*, _jobject*, JNICallType, _jmethodID*, JNI_ArgumentPusher*, Thread*)
                       jni_CallStaticVoidMethod
                       JavaMain
                       start_thread
    
    opened by brendangregg 8
  • Can make in CentOS

    Hi, I ran cmake . but it shows the output below. Do you know what happened? Did it finish or not?

       -- The C compiler identification is GNU
    -- The CXX compiler identification is GNU
    -- Check for working C compiler: /usr/bin/gcc
    -- Check for working C compiler: /usr/bin/gcc -- works
    -- Detecting C compiler ABI info
    -- Detecting C compiler ABI info - done
    -- Check for working CXX compiler: /usr/bin/c++
    -- Check for working CXX compiler: /usr/bin/c++ -- works
    -- Detecting CXX compiler ABI info
    -- Detecting CXX compiler ABI info - done
    CMake Error at CMakeLists.txt:24 (include):
      include could not find load file:
    
        UseJava
    
    
    CMake Error at CMakeLists.txt:30 (add_jar):
      Unknown CMake command "add_jar".
    
    
    -- Configuring incomplete, errors occurred!
    
    opened by zouyx 7
  • incorrect method resolution?

    Hi,

    I'm seeing some methods being unexpectedly listed high on perf top, which are called very infrequently. I have cross-checked with a JMC flight recording (which I assume to be quite accurate), and these paths were not showing up there at all.

    Is there a chance that the map is somehow inaccurate, perhaps over time with hotspot recompiling / deoptimizing / moving stuff around?

    Btw, this agent is quite useful, keep up the good work :)

    Regards, Viktor

    opened by phraktle 7
  • Missing method name from flamegraph

    I am trying to build a flame graph. The flame graph is created successfully, but it contains a lot of methods without a name or with a generic symbol.

    The output I get is below.

    [aemauthor@local-aem62-test bin]$ ./perf-java-flames 17715
    Recording events for 15 seconds (adapt by setting PERF_RECORD_SECONDS)
    [ perf record: Woken up 1 times to write data ]
    [ perf record: Captured and wrote 0.696 MB /tmp/perf-17715.data (618 samples) ]
    Failed to open 3E, continuing without symbols
    Flame graph SVG written to PERF_FLAME_OUTPUT='/home/aemauthor/perf-map-agent/bin/flamegraph-17715.svg'.

    Perf data, method map and Flamegraph are attached.

    opened by tosheer 6
  • High CPU time in vtable chunks

    During our code profiling tests on 16-core systems running 8-16 worker threads, we are observing a significant CPU time (25%) in "vtable chunks" based on the generated symbol map. I think this symbol is listed multiple times as well in the symbol map. This overhead also increases as the number of running threads is gradually raised from 8 to 16.

    Is this an overhead of perf-map-agent or something else? I would also like to confirm whether it is related to virtual methods in our code.

    Another unrelated issue is that perf top always displays "unable to open vsyscalls". Is there a way to resolve it? Does this mean that perf is unable to display system calls and their CPU time? All processes are being executed by the "root" user.

    opened by kotaravi 6
  • Trouble to attach with high PID value

    I am facing this issue when trying to load the agent into a JVM whose system process PID is out of integer range:

    /usr/lib/jvm/java-8-oracle/bin/java -cp attach-main.jar:/usr/lib/jvm/java-8-oracle/lib/tools.jar net.virtualvoid.perf.AttachOnce 115323
    Exception in thread "main" java.io.IOException: Non-numeric value found - int expected
    	at sun.tools.attach.HotSpotVirtualMachine.readInt(HotSpotVirtualMachine.java:299)
    	at sun.tools.attach.HotSpotVirtualMachine.loadAgentLibrary(HotSpotVirtualMachine.java:63)
    	at sun.tools.attach.HotSpotVirtualMachine.loadAgentPath(HotSpotVirtualMachine.java:88)
    	at net.virtualvoid.perf.AttachOnce.loadAgent(AttachOnce.java:51)
    	at net.virtualvoid.perf.AttachOnce.main(AttachOnce.java:34)
    

    Is there any workaround other than rebooting Linux?

    Generally speaking, has this "integer" PID restriction for Hotspot VM attachment been fixed in any recent Java 8 or OpenJDK version?

    opened by ymartin59 1
  • Can't compile on Raspberry 4 with jdk 8u181

    pi@raspberrypi:~/perf-map-agent $ make
    [ 50%] Built target perfmap
    [ 66%] Building Java objects for attach-main.jar
    error: error reading /home/pi/jdk1.8.0_181/lib/ct.sym; cannot read zip file
    Fatal Error: Unable to find package java.lang in classpath or bootclasspath
    make[2]: *** [CMakeFiles/attach-main.dir/build.make:70: CMakeFiles/attach-main.dir/java_compiled_attach-main] Error 3
    make[1]: *** [CMakeFiles/Makefile2:110: CMakeFiles/attach-main.dir/all] Error 2
    make: *** [Makefile:84: all] Error 2
    

    The ct.sym file is there. Edit: The libperfmap.so is correctly compiled, it's the generation of the http://fem.rupy.se/attach-main.jar that fails, luckily I had it from a previous compile all it does is to attach libperfmap.so to the JVM. So I manage to get most things working but when I run sudo perf script > out.perf I get: Failed to open /home/pi/jdk1.8.0_181/jre/lib/rt.jar, continuing without symbols So the Java SE methods do not get replaced properly but you can guess what they are, the dumped ones however do: http://move.rupy.se/file/graph.svg

    opened by tinspin 0
  • ./bin/dtrace-java-record-stack is not working

    Ran ./bin/dtrace-java-record-stack but the command does not work. I think the dtrace command inside the ./bin/dtrace-java-record-stack is not used properly.

    cat /tmp/dtrace-25840.data
    Usage /usr/bin/dtrace [--help] [-h | -G] [-C [-I]] -s File.d [-o ]

    opened by anianna96 1
  • [dtrace-perf-map] Use binary rather than linear search

    On a dtrace data file of 2342965 lines and a perf map file of 19072 lines, processing of dtrace-perf-map.pl took well over 5 hours.

    Investigation demonstrated that this was because of the inner loop over the @map_entries array; reducing the length of that array to 10 would give a 5s runtime, 20 a 10s, 30 a 15s, etc. Perl is good for a lot of things but not for 4 billion array iterations.

    It seemed to me that we could improve this by sorting the array and applying binary search (An extent map was also considered but thought too complex to build, though fun). Processing the same file now takes roughly 20s on the same machine (11s on mine).

    Or, binary search go brrr.

    opened by bdw 0
  • Allow perf-map-agent to be loaded at startup

    Implemented Agent_OnLoad so that the agent may be loaded at startup. Previously the agent could only be attached at runtime.

    The agent may be loaded using the following cmd line syntax: perf record java -XX:+PreserveFramePointer -agentpath:/path/to/perf-map-agent/out/libperfmap.so app args

    opened by prasun3 1
  • Support for JDK 9+.

    This solves #42 by adding minimal support for JDK 9 and later versions. Issues addressed are:

    • lack of tools.jar since JDK 9 (replaced by adding jdk.attach module)
    • lack of javah since JDK 10 (now handled by javac, so no longer required)
    opened by pawel-dabrowski-codewise 0