Sends stacktrace-level performance data from a JVM process to Riemann.

Overview

Riemann JVM Profiler

riemann-jvm-profiler is a JVM agent that you can inject into any JVM process--one written in Clojure, Java, Scala, Groovy, etc.--which sends function-level profiler telemetry to a Riemann server for analysis, visualization, and storage. It is designed to answer questions like "Across this thousand-node Hadoop job, what functions consume the most CPU time, and why?"

Profiling your processes

Riemann-jvm-profiler requires no changes to your codebase. You'll need a Riemann server with the websocket/HTTP server (ws-server) accessible from each JVM process. The default Riemann websocket port is 5556.

As a library

You can add the riemann-jvm-profiler artifact to your Maven or Leiningen project via Clojars and invoke the profiler programmatically somewhere in your application startup code.

(ns my-app.bin
  (:gen-class :name my-app.bin)
  (:require [riemann.jvm-profiler :as profiler]))

(defn -main [& args]
  ; Start Riemann profiler
  (profiler/start-global!
    {:host   "my.riemann.server"
     :prefix "my app"
     :load   0.05})
  ; ...
  )

As a Java agent

To profile any JVM process, build a fat jar using Leiningen.

cd riemann-jvm-profiler
lein uberjar

That'll spit out a file called target/riemann-jvm-profiler-0.1.0.jar, which you should copy to somewhere on the classpath or filesystem on each node where you'd like to profile a JVM process. Then add the agent to that process's Java startup options:

java -javaagent:'/var/lib/riemann-jvm-profiler.jar=prefix=my app,host=my.riemann.server' ...
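The JVM hands everything after the `=` that follows the jar path to the agent as a single string. Here's a minimal sketch of how such a string could be split into an options map, assuming plain comma-separated key=value pairs with no escaping. `parse-agent-args` is a hypothetical helper for illustration, not the profiler's own parser, and it leaves every value as a string.

```clojure
(require '[clojure.string :as str])

(defn parse-agent-args
  "Splits an agent argument string such as
  \"prefix=my app,host=my.riemann.server\" into a map of keyword -> string.
  Hypothetical helper: no escaping, no type conversion."
  [s]
  (if (str/blank? s)
    {}
    (into {}
          (for [pair (str/split s #",")
                ;; Limit 2 keeps any later `=` inside the value.
                :let [[k v] (str/split pair #"=" 2)]
                :when (not (str/blank? k))]
            [(keyword (str/trim k)) (or v "")]))))

(parse-agent-args "prefix=my app,host=my.riemann.server")
```

Note that values containing spaces, like "my app", are why the whole -javaagent argument is quoted on the command line.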

In a Hadoop job

Assuming you've distributed the jar to each node:

hadoop jar myjob.jar foo bar \
  -Dmapred.map.child.java.opts="'-javaagent:/var/lib/riemann-jvm-profiler.jar=host=my.riemann.server,localhost-pid?=true'" \
  -Dmapred.reduce.child.java.opts="'-javaagent:/var/lib/riemann-jvm-profiler.jar=host=my.riemann.server,localhost-pid?=true'"

We're passing localhost-pid?=true to tell the profiler that its host field should include the process ID, since Hadoop spins up more than one JVM per box.
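To see why a pid-qualified host keeps events from colliding, here's a small sketch. `jvm-name` and `pid-at-host` are hypothetical names, not the profiler's code. `jvm-name` relies on the HotSpot convention that the runtime MXBean name looks like pid@hostname, which the JVM specification does not guarantee.

```clojure
(import '(java.lang.management ManagementFactory RuntimeMXBean))

(defn jvm-name
  "Name of this JVM as reported by the runtime MXBean. On HotSpot this is
  conventionally \"pid@hostname\", but the format is implementation-specific."
  []
  (let [^RuntimeMXBean bean (ManagementFactory/getRuntimeMXBean)]
    (.getName bean)))

(defn pid-at-host
  "Builds a pid-qualified host string from its parts."
  [pid host]
  (str pid "@" host))

;; Two task JVMs on the same (made-up) box get distinct host fields,
;; so their events are not merged into one.
(map #(pid-at-host % "box1") [101 102])
```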

You could theoretically use dcache to distribute the jar to each Hadoop node and have it symlinked into some relative path prior to JVM start, but documentation is conflicting and scarce at best, and I've never gotten that to work with Cascading/Cascalog. SCPing the profiler uberjar to each node seems most reliable.

Observables

The profiler samples the stacks of all running threads as often as possible, while consuming only :load fraction of a single core. Expect 10 to 100 samples per second, depending on hardware and thread count. It builds up a statistical estimate of how much time is spent in each function, normalized to seconds/second. Every :dt seconds, the profiler finds the functions which consumed the most RUNNING thread time, finds an exemplar stacktrace for that function which contributed the most to its runtime, and sends each function as a single event to Riemann.
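The sampling idea can be sketched in a few lines. This is an illustration under stated assumptions, not the profiler's implementation: it attributes each RUNNABLE thread to its top stack frame only, and it estimates seconds/second as the average number of threads seen in a function per sample. `runnable-top-frames` and `estimate-rates` are hypothetical names.

```clojure
(defn runnable-top-frames
  "Takes one sample: the top stack frame, as \"Class.method\", of every
  thread that is RUNNABLE at roughly this instant."
  []
  (vec
    (for [[^Thread t stack] (Thread/getAllStackTraces)
          :let [^StackTraceElement top (first stack)]
          :when (and top
                     (= java.lang.Thread$State/RUNNABLE (.getState t)))]
      (str (.getClassName top) "." (.getMethodName top)))))

(defn estimate-rates
  "Given a seq of samples (each a seq of function names seen running),
  returns function name -> average number of threads running it per sample,
  i.e. seconds of thread time per second of wall-clock time."
  [samples]
  (let [n (count samples)]
    (if (zero? n)
      {}
      (into {}
            (for [[f c] (frequencies (mapcat identity samples))]
              [f (/ (double c) n)])))))

;; Four samples: "a" was running in three of them, "b" in one.
(estimate-rates [["a" "b"] ["a"] [] ["a"]])
;; => {"a" 0.75, "b" 0.25}
```

Against a live JVM you could try `(estimate-rates (repeatedly 50 runnable-top-frames))`, though without any pause between samples that burns far more than a small fraction of a core.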

Choosing the dimensionless unit seconds/second allows the profiler to produce meaningful results when combined across machines with similar cores and systemic loads, but possibly disparate numbers of cores. The profiler is not yet smart enough to normalize load to compensate for the other processes on a given box; I haven't figured out a reliable way to guess the CPU use of the JVM yet. Functions on boxes with higher background load will be proportionately over-represented; if application workload is uncorrelated with background load, this should not be a huge problem.

Combining events in Riemann

Combining profiler events across hosts yields a picture of the distributed system's hotspots as a whole. You can interpret the individual (and summed) metrics as being an approximate least upper bound on the number of cores engaged in running that function. Here's a small profiler-aggregation stream:
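Outside of Riemann, the same combination is just a sum per service followed by a sort. Here's a plain-Clojure sketch of that arithmetic, with events modelled as maps carrying :host, :service, and :metric. `combine-hosts`, the host names, and the function names in the sample data are made up for illustration.

```clojure
(defn combine-hosts
  "Sums :metric across hosts for each :service and returns the n largest
  totals as [service total] pairs, biggest first."
  [events n]
  (->> events
       (reduce (fn [totals {:keys [service metric]}]
                 (update totals service (fnil + 0) metric))
               {})
       (sort-by val >)
       (take n)
       (mapv (fn [[service total]] [service total]))))

(def sample-events
  [{:host "a" :service "my app profiler fn foo" :metric 0.5}
   {:host "b" :service "my app profiler fn foo" :metric 0.25}
   {:host "a" :service "my app profiler fn bar" :metric 0.5}])

(combine-hosts sample-events 10)
;; => [["my app profiler fn foo" 0.75] ["my app profiler fn bar" 0.5]]
```

Read the 0.75 as roughly three quarters of one core, cluster-wide, spent in foo.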

(require '[clojure.string :as str])

(defn profiler [index]
  (where (not (expired? event))
         (splitp re-matches service
                 ; Aggregate rate of samples taken
                 #".*profiler rate" (coalesce
                                      ; Total sample rate
                                      (smap folds/sum
                                            (with :host nil
                                                  index))

                                      ; Distinct number of hosts
                                      (smap folds/count
                                            (adjust [:service str/replace
                                                     "rate" "hosts"]
                                                    (with :host nil
                                                          index))))

                 ; Flatten function times across hosts, updating every 60s.
                 #".*profiler fn .+"
                 (pipe - (by :service
                             (coalesce 60
                                       (smap folds/sum
                                             (with {:host nil :ttl 120} -))))
                       ; And index the top 10.
                       (top 10 :metric
                            index
                            (with :state "expired" index))))))

; I usually have a top-level splitp to route events to various subsystems.
(let [index (index)]
 (streams
  (splitp re-find service
    ; Route profiler events to the profiler
    #"^my-app profiler " (profiler index)

    ...

    ; Index anything else
    index)))

Fire up a grid in Riemann-dash sorted by metric, and choose a query to view your particular application. Here, I'm looking at the "whitewash" prefix, and excluding the epollWait function, since it's not actually doing real work.

service =~ "whitewash profiler fn %" and not service =~ "% epollWait"

It may be helpful to have graphs or cells to show the number of hosts reporting, and the aggregate sample rate:

service = "whitewash profiler hosts"
service = "whitewash profiler rate"

That ought to get you started! Go forth, find hotspots, and make your code faster. :D

Options

Options to the agent are comma-separated key=value pairs. The common options are:

  :host       Riemann server hostname
  :port       Riemann HTTP port (default 5556)
  :prefix     Service prefix for distinguishing this telemetry from other apps
              (default "")
  :localhost  Override the event hostname (default: nil; calls (localhost))
  :localhost-pid?  If truthy, use pid@host as the event hostname.
  :dt         How often to send telemetry events to Riemann, in seconds
              (default 5)
  :load       Target fraction of one core's CPU time to use for profiling
              (default 0.02)
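To get a feel for how :load and :dt interact, here's a back-of-the-envelope sketch. It assumes a simple duty cycle: after a sample that cost sample-ms of CPU, the sampler sleeps long enough that sampling uses only :load of one core. That rule and the function names are assumptions for illustration; the profiler's actual pacing may differ.

```clojure
(defn pause-ms
  "Sleep time after a sample costing `sample-ms`, so that sampling consumes
  only `load` of one core (hypothetical duty-cycle rule)."
  [sample-ms load]
  (* sample-ms (/ (- 1.0 load) load)))

(defn samples-per-second
  "Sample rate implied by the duty cycle above."
  [sample-ms load]
  (/ 1000.0 (+ sample-ms (pause-ms sample-ms load))))

(defn samples-per-report
  "Approximate number of samples behind each batch of events sent every
  `dt` seconds."
  [sample-ms load dt]
  (* dt (samples-per-second sample-ms load)))

;; With the defaults (load 0.02, dt 5) and an assumed 1 ms per sample,
;; this works out to about 20 samples per second and about 100 per report.
(samples-per-second 1.0 0.02)
(samples-per-report 1.0 0.02 5)
```

Raising :load buys more samples per report, and so less noisy estimates, at the cost of more CPU.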

License

Copyright © 2014 Kyle Kingsbury

Distributed under the Eclipse Public License, either version 1.0 or (at your option) any later version.

Comments
  • java.lang.NoClassDefFoundError: clojure/lang/ITransientMap

    When I tried to start Riemann, I got the following error in the main class. I confirmed that the jars riemann-jvm-profiler-0.1.0-standalone.jar and riemann-jvm-profiler-0.1.0.jar are on the classpath.

    Exception in thread "main" java.lang.NoClassDefFoundError: clojure/lang/ITransientMap
        at java.lang.Class.getDeclaredMethods0(Native Method)
        at java.lang.Class.privateGetDeclaredMethods(Class.java:2484)
        at java.lang.Class.getMethod0(Class.java:2727)
        at java.lang.Class.getMethod(Class.java:1639)
        at sun.instrument.InstrumentationImpl.loadClassAndStartAgent(InstrumentationImpl.java:294)
        at sun.instrument.InstrumentationImpl.loadClassAndCallPremain(InstrumentationImpl.java:338)
    Caused by: java.lang.ClassNotFoundException: clojure.lang.ITransientMap
        at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
        at java.security.AccessController.doPrivileged(Native Method)
        at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
        at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
        ... 6 more
    FATAL ERROR in native method: processing of -javaagent failed
    Abort trap: 6

    opened by kamaldeep-ebay 9
  • Fix an ambiguous Thread/sleep call on JDK 19

    Java 19 has introduced Thread.sleep(Duration) that makes one of the calls to Thread/sleep ambiguous. This has resulted in a compile time warning:

    Reflection warning, riemann/jvm_profiler/stack.clj:177:27 - call
    to static method sleep on java.lang.Thread can't be resolved
    (argument types: java.lang.Object).
    

    as well as a runtime error "No matching method sleep found."

    This patch fixes it by explicitly coercing the double value to a long, thus making the Thread/sleep call unambiguous.
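For readers who hit the same warning elsewhere, the coercion looks roughly like this. It's a sketch of the technique the description names, not the actual patch, and `sleep-ms` is a hypothetical helper.

```clojure
(defn sleep-ms
  "Sleeps for roughly `ms` milliseconds. `ms` may be a double; coercing it
  to a long lets the compiler pick Thread.sleep(long) instead of leaving
  the overload to be resolved by reflection at runtime."
  [ms]
  (Thread/sleep (long ms)))

;; Truncates 1.5 to 1 and sleeps for about a millisecond.
(sleep-ms 1.5)
```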

    Hat tip to @mvitz for helping me diagnose this problem.

    opened by jstepien 4
  • Some fixes to the example

    As written, the example basically doesn't work for me. Here is a copy-paste of my thought process from IRC:

    [2:56pm] gregorstocks: ok, it seems to be working if I a) replace smap with combine and b) replace (top ...) with index. i'm not sure what the right way to handle top here is but it's not that, it doesn't interact with (by :service ...) properly
    [2:57pm] gregorstocks: that does cause logspew about combine being deprecated, but neither smap nor sreduce seems appropriate here and I can live with logspew
    [2:58pm] gregorstocks: that's only the "flatten function times across hosts" bit, I don't care about the profiler rate so I just cut that bit out but I'm guessing that wants combine instead of smap too
    [3:00pm] gregorstocks: though I might be missing something, since the docs seem convinced that smap is a general-purpose replacement for combine but that's definitely not true as I understand those functions
    [3:01pm] gregorstocks: for example the docs for fixed-event-window do this: (fixed-event-window 5 (smap folds/mean index)) which seems to me like it shouldn't work
    [3:14pm] gregorstocks: ok i think ive got it fully working and this is FANTASTIC

    I'm still seeing some issues on my install (I get about 8 seconds of everything great, and then 2 seconds in which the most expensive parts disappear, then back to everything great again, etc), but it's much better this way than it was before.

    opened by GregorStocks 4
  • ClassNotFoundException thrown when using basic Java agent setup

    Exception in thread "main" java.lang.ClassNotFoundException: riemann.jvm_profiler.Agent
        at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
        at java.security.AccessController.doPrivileged(Native Method)
        at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
        at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
        at sun.instrument.InstrumentationImpl.loadClassAndStartAgent(InstrumentationImpl.java:280)
        at sun.instrument.InstrumentationImpl.loadClassAndCallPremain(InstrumentationImpl.java:338)
    FATAL ERROR in native method: processing of -javaagent failed
    Aborted (core dumped)
    

    I encounter this error when following the 'As a Java agent' section of the README, profiling a trivial hello-world main function.

    Exact command: java -javaagent:'/home/rmaus/scratch/riemann/riemann-jvm-profiler/riemann-jvm-profiler-0.1.0-standalone.jar=prefix=TEST,host=localhost,localhost-pid?=true' -cp test.jar com.Test

    Software versions: java version "1.6.0_30" Java(TM) SE Runtime Environment (build 1.6.0_30-b12) Java HotSpot(TM) 64-Bit Server VM (build 20.5-b03, mixed mode)

    Leiningen 1.7.1 on Java 1.6.0_30 Java HotSpot(TM) 64-Bit Server VM

    opened by ryan-maus 4
  •  java.lang.ClassNotFoundException: riemann.jvm_profiler.Agent

    I built the profiling jar using lein as mentioned in https://github.com/riemann/riemann-jvm-profiler

    Here is the option I use to run it as a Java agent: -javaagent:'/var/opt/lib/riemann-jvm-profiler-0.1.0.jar=prefix=MyApp,host=MyRiemannServer'

    At runtime I see this error:

    Exception in thread "main" java.lang.ClassNotFoundException: riemann.jvm_profiler.Agent
        at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:424)

    My Leiningen version (lein --version): Leiningen 1.7.1 on Java 1.8.0_66 Java HotSpot(TM) 64-Bit Server VM

    When I unzip riemann-jvm-profiler-0.1.0.jar, I see only stack.clj under /riemann/jvm_profiler, and Agent.Java under /riemann. Is there a problem with the build?

    opened by hkandasa 1
  • Prefix adjustment

    caveats per IRC:

    • inserts a spurious space when the string is empty
    • presupposes all users want to use space as their separator (which [Aphyr] likes to encourage but tries not to mandate)

    Former is addressed with commits now present. Shadowing reports function fixed in the test class. Happy to make further alterations based on feedback.

    opened by ralph-tice 1
  • profiling as java agent

    I'm using riemann-jvm-profiler as a Java agent.

    I've added this to my server startup script:

    RIEMANN_OPTS="-javaagent:'/var/opt/lib/riemann-jvm-profiler-0.1.0-standalone.jar=host=riemannhost,port=5556,localhost-pid?=true,dt=10,load=0.03'"

    JAVA_OPTS="${MONITORING_OPTS} ${RIEMANN_OPTS} "

    After starting the server, I see this in the log file:

    JAVA_OPTS: -server -XX:+UseCompressedOops - -XX:MaxPermSize=512m -Xms8192m -Xmx8192m -XX:+PrintGCDateStamps -XX:+PrintGCDetails -javaagent:'/var/opt/lib/riemann-jvm-profiler-0.1.0-standalone.jar=host=riemannhost,port=5556,localhost-pid?=true,dt=10,load=0.03'

    However, after running my test cases I don't see any profiling information sent to my riemannhost. riemann.log also looks empty. What is the best way to debug this?

    I'm using JBoss as my application server.

    opened by hkandasa 0
  • Failed to load class "org.slf4j.impl.StaticLoggerBinder"

    I'm trying to add riemann-jvm-profiler to hive mapred jobs

    hive> set mapreduce.map.java.opts=-javaagent:'/var/lib/riemann-jvm-profiler.jar=prefix=profiler,host=10.50.xx.xx'
    hive> set mapreduce.reduce.java.opts=-javaagent:'/var/lib/riemann-jvm-profiler.jar=prefix=profiler,host=10.50.xx.xx'
    hive> select count(*) from test_table;
    Total MapReduce jobs = 1
    Launching Job 1 out of 1
    Number of reduce tasks determined at compile time: 1
    In order to change the average load for a reducer (in bytes):
      set hive.exec.reducers.bytes.per.reducer=<number>
    In order to limit the maximum number of reducers:
      set hive.exec.reducers.max=<number>
    In order to set a constant number of reducers:
      set mapred.reduce.tasks=<number>
    SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
    SLF4J: Defaulting to no-operation (NOP) logger implementation
    SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further details.
    

    This riemann-jvm-profiler.jar works well with Tomcat on another server. Hive version: 0.12.0-cdh5.1.3

    opened by qnick 0
  • differences with async-profiler

    The async-profiler README says it avoids the safepoint problem and offers low-level profiling for JVM applications.

    I'm curious whether riemann-jvm-profiler also avoids the safepoint problem, and what it does more or less of compared with async-profiler. Thanks in advance.

    opened by jiacai2050 3