Performance monitoring and benchmarking suite

Overview

Introduction

LIKWID is an easy-to-install and easy-to-use suite of command-line applications and a library for performance-oriented programmers. It works for Intel, AMD, ARMv8 and POWER9 processors on the Linux operating system, with additional support for Nvidia GPUs. There is support for ARMv7 and POWER8, but we currently have no test machine at hand to test them properly.

LIKWID Playlist (YouTube)


It consists of:

  • likwid-topology: print thread, cache and NUMA topology
  • likwid-perfctr: configure and read out hardware performance counters on Intel, AMD, ARM and POWER processors and Nvidia GPUs
  • likwid-powermeter: read out RAPL Energy information and get info about Turbo mode steps
  • likwid-pin: pin your threaded application (pthread, Intel and gcc OpenMP) to dedicated processors
  • likwid-bench: Micro benchmarking platform for CPU architectures
  • likwid-features: Print and manipulate CPU features like hardware prefetchers (x86 only)
  • likwid-genTopoCfg: Dumps topology information to a file
  • likwid-mpirun: Wrapper to start MPI and Hybrid MPI/OpenMP applications (Supports Intel MPI, OpenMPI, MPICH and SLURM)
  • likwid-perfscope: Frontend to the timeline mode of likwid-perfctr, plots live graphs of performance metrics using gnuplot
  • likwid-memsweeper: Sweep memory of NUMA domains and evict cachelines from the last level cache
  • likwid-setFrequencies: Tool to control the CPU and Uncore frequencies (x86 only)
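
A quick taste of the tools (a sketch only; ./a.out is a placeholder for your own application, and the available performance groups depend on the architecture):

likwid-topology -g                          # print thread, cache and NUMA topology as ASCII art
likwid-pin -c 0-3 ./a.out                   # run a.out with its threads pinned to hardware threads 0-3
likwid-perfctr -C 0-3 -g FLOPS_DP ./a.out   # pin to hardware threads 0-3 and measure the FLOPS_DP group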

For further information please take a look at the Wiki or contact us via Matrix chat LIKWID General.


Supported architectures

Intel

  • Intel Atom
  • Intel Pentium M
  • Intel Core2
  • Intel Nehalem
  • Intel NehalemEX
  • Intel Westmere
  • Intel WestmereEX
  • Intel Xeon Phi (KNC)
  • Intel Silvermont & Airmont
  • Intel Goldmont
  • Intel SandyBridge
  • Intel SandyBridge EP/EN
  • Intel IvyBridge
  • Intel IvyBridge EP/EN/EX
  • Intel Xeon Phi (KNL, KNM)
  • Intel Haswell
  • Intel Haswell EP/EN/EX
  • Intel Broadwell
  • Intel Broadwell D
  • Intel Broadwell EP
  • Intel Skylake
  • Intel Kabylake
  • Intel Coffeelake
  • Intel Skylake SP
  • Intel Cascadelake SP
  • Intel Icelake
  • Intel Icelake SP
  • Intel Tigerlake (experimental)

AMD

  • AMD K8
  • AMD K10
  • AMD Interlagos
  • AMD Kabini
  • AMD Zen
  • AMD Zen2
  • AMD Zen3 (limited)

ARM (experimental)

  • ARMv7
  • ARMv8
  • Special support for Marvell Thunder X2
  • Fujitsu A64FX
  • ARM Neoverse N1 (AWS Graviton 2)

POWER (experimental)

  • IBM POWER8
  • IBM POWER9

Nvidia GPUs (experimental)


Download, Build and Install

You can get the releases of LIKWID at: http://ftp.fau.de/pub/likwid/

For build and installation hints, see the INSTALL file or check the build instructions page in the wiki: https://github.com/RRZE-HPC/likwid/wiki/Build

For quick install:

VERSION=stable
wget http://ftp.fau.de/pub/likwid/likwid-$VERSION.tar.gz
tar -xaf likwid-$VERSION.tar.gz
cd likwid-$VERSION
vi config.mk # configure build, e.g. change installation prefix and architecture flags
make
sudo make install # sudo required to install the access daemon with proper permissions

For ARM builds, the COMPILER flag in config.mk needs to be changed to GCCARMv8 or ARMCLANG (experimental). For POWER builds, the COMPILER flag in config.mk needs to be changed to GCCPOWER or XLC (experimental).
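
The settings most people touch in config.mk look roughly like this (a sketch with example values; see the comments in config.mk and the wiki for the full set of options):

COMPILER = GCC              # GCCARMv8/ARMCLANG for ARM builds, GCCPOWER/XLC for POWER builds
PREFIX = /usr/local         # installation prefix
ACCESSMODE = accessdaemon   # how counters are accessed: accessdaemon, direct or perf_event
FORTRAN_INTERFACE = false   # set to true to also build the Fortran90 interface module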


Documentation

For detailed documentation on the usage of the tools, have a look at the HTML documentation built with Doxygen. Call

make docs

or after installation, look at the man pages.

There is also a wiki on the GitHub page: https://github.com/rrze-likwid/likwid/wiki

If you have problems or suggestions, please let us know on the likwid-users mailing list: http://groups.google.com/group/likwid-users

or, if it is a bug, open an issue at: https://github.com/rrze-likwid/likwid/issues

You can also chat with us through Matrix.


Extras


Survey

We opened a survey on the user mailing list to get a feeling for who uses LIKWID and how. Moreover, we would be interested to hear whether you are missing a feature or what annoys you when using LIKWID. Link to the survey: https://groups.google.com/forum/#!topic/likwid-users/F7TDho3k7ps


Funding

LIKWID development was funded by BMBF Germany under the FEPA project, grant 01IH13009. Since 2017, development has been further funded by BMBF Germany under the SeASiTe project, grant 01IH16012A.

Comments
  • No perfctr data for MEM_DP (Fortran)

    Hello,

    I'm trying to get an overview of arithmetic and main memory performance of my program using the MEM_DP group in likwid-perfctr. Unfortunately all the memory counters just show 0:

    --------------------------------------------------------------------------------
    CPU name:	Intel(R) Core(TM) i7-10510U CPU @ 1.80GHz
    CPU type:	Intel Kabylake processor
    CPU clock:	2.30 GHz
    --------------------------------------------------------------------------------
     FLOPS/s =    243.49446939835948      MFLOPS/s
     FLOPS =    771147000.00000000      nflops
     Time =    3.1670000553131104      s
     Datasize =    158.18400000000000       MB
     Loads =    1542294000.0000000     
    --------------------------------------------------------------------------------
    Region likwid_compute, Group 1: MEM_DP
    +-------------------+----------+
    |    Region Info    |  Core 0  |
    +-------------------+----------+
    | RDTSC Runtime [s] | 3.346748 |
    |     call count    |        1 |
    +-------------------+----------+
    
    +------------------------------------------+---------+-------------+
    |                   Event                  | Counter |    Core 0   |
    +------------------------------------------+---------+-------------+
    |             INSTR_RETIRED_ANY            |  FIXC0  | 48479620000 |
    |           CPU_CLK_UNHALTED_CORE          |  FIXC1  | 13018140000 |
    |           CPU_CLK_UNHALTED_REF           |  FIXC2  |  7591400000 |
    |              PWR_PKG_ENERGY              |   PWR0  |     52.1849 |
    |              PWR_DRAM_ENERGY             |   PWR3  |      3.2621 |
    | FP_ARITH_INST_RETIRED_128B_PACKED_DOUBLE |   PMC0  |           0 |
    |    FP_ARITH_INST_RETIRED_SCALAR_DOUBLE   |   PMC1  |   800809100 |
    | FP_ARITH_INST_RETIRED_256B_PACKED_DOUBLE |   PMC2  |           0 |
    |                DRAM_READS                | MBOX0C1 |      -      |
    |                DRAM_WRITES               | MBOX0C2 |      -      |
    +------------------------------------------+---------+-------------+
    
    +-----------------------------------+-----------+
    |               Metric              |   Core 0  |
    +-----------------------------------+-----------+
    |        Runtime (RDTSC) [s]        |    3.3467 |
    |        Runtime unhalted [s]       |    5.6503 |
    |            Clock [MHz]            | 3950.9693 |
    |                CPI                |    0.2685 |
    |             Energy [J]            |   52.1849 |
    |             Power [W]             |   15.5927 |
    |          Energy DRAM [J]          |    3.2621 |
    |           Power DRAM [W]          |    0.9747 |
    |            DP [MFLOP/s]           |  239.2798 |
    |          AVX DP [MFLOP/s]         |         0 |
    |          Packed [MUOPS/s]         |         0 |
    |          Scalar [MUOPS/s]         |  239.2798 |
    |  Memory load bandwidth [MBytes/s] |         0 |
    |  Memory load data volume [GBytes] |         0 |
    | Memory evict bandwidth [MBytes/s] |         0 |
    | Memory evict data volume [GBytes] |         0 |
    |    Memory bandwidth [MBytes/s]    |         0 |
    |    Memory data volume [GBytes]    |         0 |
    |       Operational intensity       |    inf    |
    +-----------------------------------+-----------+
    

    I get the same results with and without the Marker API. My test program is written in Fortran; I used gcc (Ubuntu 9.3.0-10ubuntu2) 9.3.0 to compile likwid.

    The Fortran Module is compiled with gfortran (same version as gcc naturally). The changes in the config.mk are the following:

    COMPILER = GCC#NO SPACE
    
    # Path were to install likwid
    PREFIX ?= /usr/local/likwid#NO SPACE
    
    ACCESSMODE = perf_event#NO SPACE
    
    FORTRAN_INTERFACE = true#NO SPACE
    

    The changes in make/include_GCC.mk are the following:

    FC  = gfortran
    
    FCFLAGS  = -fmodules ./  # gfortran
    

    The command I run is:

    sudo likwid-perfctr -C 0 -g MEM_DP ./likwid_testprogram
    

    I hope I haven't overlooked anything and I'm sorry if I did! Hope you can help me out, thanks in advance!

    Erik

    opened by ErikP94 27
  • [BUG] PIN_MASK missing

    I ran into some problems trying to do performance analysis of our hybrid MPI/OpenMP code with likwid. After I found out that, no matter what I do, all software threads are getting pinned to a single hardware thread per MPI process, I checked the output of likwid-pin. There, in the line "[pthread wrapper] PIN_MASK:", nothing comes afterwards.

    likwid-pin -V 3 -c 0,1,2,3 -s 0x7 ./run.py 
    DEBUG - [hwloc_init_cpuInfo:355] HWLOC CpuInfo Family 6 Model 158 Stepping 13 Vendor 0x0 Part 0x0 isIntel 1 numHWThreads 16 activeHWThreads 16
    DEBUG - [hwloc_init_nodeTopology:564] HWLOC Thread Pool PU 0 Thread 0 Core 0 Die 0 Socket 0 inCpuSet 1
    DEBUG - [hwloc_init_nodeTopology:564] HWLOC Thread Pool PU 8 Thread 1 Core 0 Die 0 Socket 0 inCpuSet 1
    DEBUG - [hwloc_init_nodeTopology:564] HWLOC Thread Pool PU 1 Thread 0 Core 1 Die 0 Socket 0 inCpuSet 1
    DEBUG - [hwloc_init_nodeTopology:564] HWLOC Thread Pool PU 9 Thread 1 Core 1 Die 0 Socket 0 inCpuSet 1
    DEBUG - [hwloc_init_nodeTopology:564] HWLOC Thread Pool PU 2 Thread 0 Core 2 Die 0 Socket 0 inCpuSet 1
    DEBUG - [hwloc_init_nodeTopology:564] HWLOC Thread Pool PU 10 Thread 1 Core 2 Die 0 Socket 0 inCpuSet 1
    DEBUG - [hwloc_init_nodeTopology:564] HWLOC Thread Pool PU 3 Thread 0 Core 3 Die 0 Socket 0 inCpuSet 1
    DEBUG - [hwloc_init_nodeTopology:564] HWLOC Thread Pool PU 11 Thread 1 Core 3 Die 0 Socket 0 inCpuSet 1
    DEBUG - [hwloc_init_nodeTopology:564] HWLOC Thread Pool PU 4 Thread 0 Core 4 Die 0 Socket 0 inCpuSet 1
    DEBUG - [hwloc_init_nodeTopology:564] HWLOC Thread Pool PU 12 Thread 1 Core 4 Die 0 Socket 0 inCpuSet 1
    DEBUG - [hwloc_init_nodeTopology:564] HWLOC Thread Pool PU 5 Thread 0 Core 5 Die 0 Socket 0 inCpuSet 1
    DEBUG - [hwloc_init_nodeTopology:564] HWLOC Thread Pool PU 13 Thread 1 Core 5 Die 0 Socket 0 inCpuSet 1
    DEBUG - [hwloc_init_nodeTopology:564] HWLOC Thread Pool PU 6 Thread 0 Core 6 Die 0 Socket 0 inCpuSet 1
    DEBUG - [hwloc_init_nodeTopology:564] HWLOC Thread Pool PU 14 Thread 1 Core 6 Die 0 Socket 0 inCpuSet 1
    DEBUG - [hwloc_init_nodeTopology:564] HWLOC Thread Pool PU 7 Thread 0 Core 7 Die 0 Socket 0 inCpuSet 1
    DEBUG - [hwloc_init_nodeTopology:564] HWLOC Thread Pool PU 15 Thread 1 Core 7 Die 0 Socket 0 inCpuSet 1
    DEBUG - [hwloc_init_cacheTopology:785] HWLOC Cache Pool ID 0 Level 1 Size 32768 Threads 2
    DEBUG - [hwloc_init_cacheTopology:785] HWLOC Cache Pool ID 1 Level 2 Size 262144 Threads 2
    DEBUG - [hwloc_init_cacheTopology:785] HWLOC Cache Pool ID 2 Level 3 Size 16777216 Threads 16
    DEBUG - [affinity_init:539] Affinity: Socket domains 1
    DEBUG - [affinity_init:541] Affinity: CPU die domains 1
    DEBUG - [affinity_init:546] Affinity: CPU cores per LLC 8
    DEBUG - [affinity_init:549] Affinity: Cache domains 1
    DEBUG - [affinity_init:553] Affinity: NUMA domains 1
    DEBUG - [affinity_init:554] Affinity: All domains 5
    DEBUG - [affinity_addNodeDomain:370] Affinity domain N: 16 HW threads on 8 cores
    DEBUG - [affinity_addSocketDomain:401] Affinity domain S0: 16 HW threads on 8 cores
    DEBUG - [affinity_addDieDomain:438] Affinity domain D0: 16 HW threads on 8 cores
    DEBUG - [affinity_addCacheDomain:474] Affinity domain C0: 16 HW threads on 8 cores
    DEBUG - [affinity_addMemoryDomain:504] Affinity domain M0: 16 HW threads on 8 cores
    DEBUG - [create_lookups:290] T 0 T2C 0 T2S 0 T2D 0 T2LLC 0 T2M 0
    DEBUG - [create_lookups:290] T 1 T2C 1 T2S 0 T2D 0 T2LLC 0 T2M 0
    DEBUG - [create_lookups:290] T 2 T2C 2 T2S 0 T2D 0 T2LLC 0 T2M 0
    DEBUG - [create_lookups:290] T 3 T2C 3 T2S 0 T2D 0 T2LLC 0 T2M 0
    DEBUG - [create_lookups:290] T 4 T2C 4 T2S 0 T2D 0 T2LLC 0 T2M 0
    DEBUG - [create_lookups:290] T 5 T2C 5 T2S 0 T2D 0 T2LLC 0 T2M 0
    DEBUG - [create_lookups:290] T 6 T2C 6 T2S 0 T2D 0 T2LLC 0 T2M 0
    DEBUG - [create_lookups:290] T 7 T2C 7 T2S 0 T2D 0 T2LLC 0 T2M 0
    DEBUG - [create_lookups:290] T 8 T2C 0 T2S 0 T2D 0 T2LLC 0 T2M 0
    DEBUG - [create_lookups:290] T 9 T2C 1 T2S 0 T2D 0 T2LLC 0 T2M 0
    DEBUG - [create_lookups:290] T 10 T2C 2 T2S 0 T2D 0 T2LLC 0 T2M 0
    DEBUG - [create_lookups:290] T 11 T2C 3 T2S 0 T2D 0 T2LLC 0 T2M 0
    DEBUG - [create_lookups:290] T 12 T2C 4 T2S 0 T2D 0 T2LLC 0 T2M 0
    DEBUG - [create_lookups:290] T 13 T2C 5 T2S 0 T2D 0 T2LLC 0 T2M 0
    DEBUG - [create_lookups:290] T 14 T2C 6 T2S 0 T2D 0 T2LLC 0 T2M 0
    DEBUG - [create_lookups:290] T 15 T2C 7 T2S 0 T2D 0 T2LLC 0 T2M 0
    Evaluated CPU string to CPUs: 0,1,2,3
    Running: ./run.py
    Using 4 thread(s) (cpuset: 0xf)
    [pthread wrapper] 
    [pthread wrapper] MAIN -> 1
    [pthread wrapper] PIN_MASK: 
    [pthread wrapper] SKIP MASK: 0x7
            threadid 140478346245888 -> SKIP 
            threadid 140478337853184 -> SKIP 
            threadid 140478329460480 -> SKIP 
    [pthread wrapper] 
    [pthread wrapper] MAIN -> 1
    [pthread wrapper] PIN_MASK: 
    [pthread wrapper] SKIP MASK: 0x7
            threadid 139758183470848 -> SKIP 
            threadid 139758101591808 -> SKIP 
            threadid 139758043674368 -> SKIP 
    Roundrobin placement triggered
            threadid 139757858096896 -> hwthread 1 - OK
            threadid 139757849704192 -> hwthread 1 - OK
            threadid 139757766047488 -> hwthread 1 - OK
    1 hardware thread(s) seem to be shared by 4 software thread(s). Check OMP_NUM_THREADS, OMP_PLACES, or OMP_PROC_BIND setting
    ...
    

    I installed likwid as described in the quick install guide:

    VERSION=stable
    wget http://ftp.fau.de/pub/likwid/likwid-$VERSION.tar.gz
    tar -xaf likwid-$VERSION.tar.gz
    cd likwid-*
    vi config.mk # configure build, e.g. change installation prefix and architecture flags
    make
    sudo make install # sudo required to install the access daemon with proper permissions
    

    To reproduce the situation on our cluster (where the same problem occurs), I changed the access to "perf_event" and set the "perf_event_paranoid" value to 2. My local machine runs on Ubuntu 20.04 (Linux 5.13.0-25-generic x86_64) and the cluster on CentOS 7 (Linux 3.10.0-1160.42.2.el7.x86_64 x86_64).

    bug 
    opened by reynozeros 25
  • Rocket Lake support request.

    Why do you need support for this specific architecture? I hope to use likwid to profile Rocket Lake.

    **Which architecture model, family and further information? CPU or accelerator?**

    CPU family:                    6
    Model:                           167
    Model name:                  11th Gen Intel(R) Core(TM) i9-11900K @ 3.50GHz
    

    Is the documentation of the hardware counters publicly available? https://software.intel.com/content/www/us/en/develop/articles/intel-sdm.html The RKL is a backport of IceLake; I think most performance counters should be the same.

    Are there already any usable tools (commercial or open-source)? Perf does support RKL architecture counters, but Power/Energy not yet.

    likwid-perfctr -g Energy sleep 10
    Cannot access directory /usr/local/share/likwid/perfgroups/unknown
    --------------------------------------------------------------------------------
    CPU name:       11th Gen Intel(R) Core(TM) i9-11900K @ 3.50GHz
    CPU type:       Unknown Intel Processor
    CPU clock:      3.50 GHz
    ERROR - [./src/perfmon.c:perfmon_init_maps:1157] Unsupported Processor
    ERROR - [./src/perfmon.c:perfmon_init_funcs:1690] Unsupported Processor
    Segmentation fault (core dumped)
    

    I can use rdmsr to read the PKG energy status:

    sudo rdmsr -d 0x611
    806652205
    

    but I don't know how to use the msr method to profile a program that has high power consumption and a runtime larger than 60 seconds.

    According to the Intel SDM: "MSR_PKG_ENERGY_STATUS is a read-only MSR. It reports the actual energy use for the package domain. This MSR is updated every ~1msec. It has a wraparound time of around 60 secs when power consumption is high, and may be longer otherwise."

    new architecture 
    opened by edisonchan 18
  • Issue of likwid-perfctr Command with '-t' Option

    Hi,

    I would like to reopen this issue; with the current correction in the source code, the command still shows weird outputs.

    Here is the command I've always tried: likwid-perfctr -f -c 0-3 -g BRANCH -t 2s

    And the outputs:

    --------------------------------------------------------------------------------
    # CORES: 0|1|2|3
    1 8 4 2.0013075179619 2.0013102503959 2.0013102503959 2.0013102503959 2.0013102503959 6.1923223535304e-06 1.158459678048e-06 1.1490437761027e-06 9.5708087192693e-07 3389.9079014806 3397.4141617506 3369.8001504725 3417.0112020124 2.7854898210138 1.5682565789474 1.5612876599257 1.5499262174127 0.19770460445416 0.19366776315789 0.19397441188609 0.19281849483522 0.003825659243066 0.0024671052631579 0.0028889806025588 0.0029513034923758 0.019350380096752 0.012738853503185 0.014893617021277 0.01530612244898 5.0580511402903 5.1634819532909 5.1553191489362 5.1862244897959
    1 8 4 4.004496399955 2.0009526654214 2.0009526654214 2.0009526654214 2.0009526654214 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
    1 8 4 6.00749528636 2.0007557606874 2.0007557606874 2.0007557606874 2.0007557606874 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
    1 8 4 8.0101798471776 2.0007722017634 2.0007722017634 2.0007722017634 2.0007722017634 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
    --------------------------------------------------------------------------------

    Only the first line shows some values; however, the following lines show many 0's.

    Best, S.

    opened by Sanghyun-Hong 18
  • likwid-topology confused by non-standard core assignments?

    This is about likwid 4.1.1 (release), built to use hwloc that comes with it (config.mk included for completeness).

    For some obscure reason the assignment of processors to physical address/core-id is not what one would expect on Intel hardware. Normally, one expects on a dual-socket, 12-core machine (Haswell E5-2680 v3) with hyperthreading disabled:

    0 -> 0:0
    1 -> 0:1
    ..
    11 -> 0:11
    12 -> 1:0
    13 -> 1:1
    ...
    23 -> 1:11

    The left-hand number is the processor, the first right-hand number the physical address, the second the core-id according to /proc/cpuinfo. On some machines however, we get:

    0 -> 0:0
    1 -> 0:2
    2 -> 0:4
    ..
    5 -> 0:10
    6 -> 1:0
    7 -> 1:2
    ...
    11 -> 1:10
    12 -> 0:1
    13 -> 0:3
    ..
    17 -> 0:11
    18 -> 1:1
    19 -> 1:3
    ...
    23 -> 1:11

    Obviously, this is not what we want, but that is our problem.

    However, when likwid-topology is run on such a node, it seems to get confused. It reports:

    Sockets: 2
    Cores per socket: 6
    Threads per core: 2

    Apparently, the weird round-robin assignment tricks likwid-topology into assuming that hyperthreading is enabled. The complete output of likwid-topology is in attachment.

    The lscpu and lstopo (version 1.10.1) reports are consistent with /proc/cpuinfo, though (output of both in attachment as well). So it would seem that the information coming from hwloc is somehow misinterpreted.

    Thanks, best regards, Geert Jan Bex

    lscpu_out.txt cpuinfo_out.txt likwid_topology_out.txt lstopo_out.txt config.txt

    opened by gjbex 17
  • [DOCS] GENERIC_EVENT

    I have some problems with some counters on a Zen2 architecture. In the PPR for this architecture I found table 21 on page 182, where a combination of events is recommended that is not documented in the event list afterwards. In particular I wanted to try the described L2 events (e.g. Event[0x43F960]).

    I now would like to try these events out with GENERIC_EVENT. In the wiki for perfctr it says that the event 0x437805 can be specified either with the full hex code or with config=0x05,umask=0x78. I didn't get the full hex code specification to work and didn't understand where the 0x43 went in the example. Also, putting the 0x43 in either the umask or the config option did not give me the results that I get when I specify the complete hex code directly to perf in wrapper mode, for example.

    So did I misunderstand the documentation of GENERIC_EVENT, or should it be made clearer how this works?

    documentation 
    opened by reynozeros 16
  • [FeatureRequest] AccessD: Split out write vs read access for shared systems

    I'm a sysadmin on a system with a fairly diverse range of users. The system is configured to allow single-core jobs, i.e. many users might end up sharing a single node. As such, we would not want e.g. users to change the CPU frequency of a node they are sharing. This is easily done; we could e.g. chmod -x the SUID likwid-setFreq binary unless a job prologue determines a job has requested node-exclusive access, and have something in the epilogue to ensure everything is set back to default. The issue is with the likwid-accessD binary - it encompasses both very useful read access to msr device files and less-used write access.

    It would be useful if there was an easy way to disable write access. This would cover almost all use-cases our users have for likwid.

    Optionally, the write functionality could be split out into a separate daemon. That way, we could still allow users who have requested node-exclusive the ability to set these, while letting all users have read-only access. (Ideally it would also be useful if there was a simple way to reset the effects of both the -setFreq and -accessD binaries back to defaults, as I am not confident I'd be able to work out all the implications on my own)

    opened by pcass-epcc 14
  • [BUG] Stack overflow: *** stack smashing detected ***

    Compiler warns about overflow when compiling with make using default settings and GCC 10.2

    ===>  COMPILE  GCC/thermal.o
    ===>  COMPILE  GCC/timer.o
    ===>  COMPILE  GCC/topology.o
    ./src/topology.c: In function ‘readTopologyFile’:
    ./src/topology.c:444:50: warning: writing 1 byte into a region of size 0 [-Wstringop-overflow=]
      444 |                     cpuid_info.osname[maxStrLen] = '\0';
          |                     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~
    ./src/topology.c:436:45: note: at offset 257 to an object with size 0 allocated by ‘malloc’ here
      436 |                 cpuid_info.osname = (char*) malloc(maxStrLen * sizeof(char));
          |                                             ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    

    And also

    ===>  COMPILE  GCC/loadData.o
    ===>  ENTER  /tmp/likwid-5.0.2/ext/hwloc
    ===>  ENTER  /tmp/likwid-5.0.2/ext/lua
    In function ‘createstrobj’,
        inlined from ‘luaS_createlngstrobj’ at ./src/lstring.c:148:17,
        inlined from ‘luaS_newlstr’ at ./src/lstring.c:206:10:
    ./src/lstring.c:142:17: warning: writing 1 byte into a region of size 0 [-Wstringop-overflow=]
      142 |   getstr(ts)[l] = '\0';  /* ending 0 */
    In file included from ./includes/ldebug.h:11,
                     from ./src/lstring.c:17:
    ./src/lstring.c: In function ‘luaS_newlstr’:
    ./includes/lstate.h:185:18: note: at offset 0 to object ‘ts’ with size 24 declared here
      185 |   struct TString ts;
          |                  ^~
    ===>  CREATE SHARED LIB  liblikwid.so
    ===>  CREATE LIB  liblikwidpin.so
    
    

    patching the file src/topology.c to

                        cpuid_info.osname[maxStrLen -1] = '\0';
    

    gives

     likwid-topology
    /usr/local/bin/likwid-lua: error loading module 'liblikwid' from file '/usr/local/share/lua/5.3/liblikwid.lua':
            /usr/local/share/lua/5.3/liblikwid.lua:330: too many C levels (limit is 200) in function at line 313 near 'i'
    stack traceback:
            [C]: in ?
            [C]: in function 'require'
            /usr/local/share/lua/5.3/liblikwid.lua:33: in main chunk
            [C]: in function 'require'
            /usr/local/share/lua/5.3/liblikwid.lua:33: in main chunk
            [C]: in function 'require'
            /usr/local/share/lua/5.3/liblikwid.lua:33: in main chunk
            [C]: in function 'require'
            /usr/local/share/lua/5.3/liblikwid.lua:33: in main chunk
            [C]: in function 'require'
            ...
            [C]: in function 'require'
            /usr/local/share/lua/5.3/liblikwid.lua:33: in main chunk
            [C]: in function 'require'
            /usr/local/share/lua/5.3/liblikwid.lua:33: in main chunk
            [C]: in function 'require'
            /usr/local/share/lua/5.3/liblikwid.lua:33: in main chunk
            [C]: in function 'require'
            /usr/local/share/lua/likwid.lua:33: in main chunk
            [C]: in function 'require'
            /usr/local/bin/likwid-topology:35: in main chunk
            [C]: in ?
    
    

    This is with version 5.0.2, archlinux X86-64 on AMD Zen1

    bug 
    opened by sab24 14
  • #297 improve examples

    C-markerAPI is a simple example and C-internalMarkerAPI is more complete. Examples now work with current version of likwid (v5.0.1) and comments have been dramatically improved.

    Mentioned in RRZE-HPC/likwid#292 Part of RRZE-HPC/likwid#297

    opened by paigeweber13 14
  • likwid_mpirun works for 1 OpenMP thread per MPI process but fails for 2 threads per process

    I am trying to run a hybrid MPI/OpenMP code across two haswell nodes. Both of the following work, i.e. with 1 OpenMP thread per MPI process: likwid-mpirun -mpi intelmpi -omp intel -np 48 -nperdomain N:24

    likwid-mpirun -f -mpi intelmpi -np 48 -pin S0:0_S0:1_S0:2_S0:3_S0:4_S0:5_S0:6_S0:7_S0:8_S0:9_S0:10_S0:11_S1:0_S1:1_S1:2_S1:3_S1:4_S1:5_S1:6_S1:7_S1:8_S1:9_S1:10_S1:11

    An attempt to use 2 threads per process fails as follows: likwid-mpirun -f -mpi intelmpi -np 24 -pin S1:0-1_S1:2-3_S1:4-5_S0:0-1_S0:2-3_S0:4-5_S1:6-7_S1:8-9_S1:10-11_S0:6-7_S0:8-9_S0:10-11 : /usr/lib64/libstdc++.so.6: version `GLIBCXX_3.4.21' not found (required by )

    Any thought on what this means? The executable works fine for multiple threads per process using the normal "mpirun." Thanks.

    opened by aoloso 14
  • likwid-setFrequencies, segment fault

    On my CentOS 7.3, I can't use likwid-setFrequencies.

    
    [root@localhost runtmp]# likwid-setFrequencies -p -V 3
    DEBUG: Given CPU expression expands to 24 CPU cores:
    DEBUG: 0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23
    DEBUG: Given CPU expression expands to 2 CPU sockets:
    DEBUG: 0,1
    Unable to open path /sys/devices/system/cpu/cpu0/cpufreq/scaling_cur_freq for reading
    	EXIT WITH ERROR:  Max Freq. could not be read
    [root@localhost runtmp]# find / -name scaling_cur_freq
    [root@localhost runtmp]# likwid-setFrequencies -l -V 3
    DEBUG: Given CPU expression expands to 24 CPU cores:
    DEBUG: 0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23
    DEBUG: Given CPU expression expands to 2 CPU sockets:
    DEBUG: 0,1
    段错误(吐核)    [Segmentation fault (core dumped)]
    

    'likwid-setFrequencies -l' gets a segmentation fault

    The file /sys/devices/system/cpu/cpu0/cpufreq/scaling_cur_freq does not exist.

    opened by allenbunny 13
  • Add support for AMD Zen4

    Currently supported models:

    • Ryzen: Family 0x19, Model 0x61
    • Epyc: Family 0x19, Model 0x11

    Supported perfmon units:

    • [x] Fixed counters (instructions, mperf, aperf)
    • [x] General purpose counters (PMC)
    • [x] L3 cache segments (CPMC)
    • [x] RAPL energy measurements (per core and per L3 segment)
    • [x] DataFabric counters (DFC)

    Supported perfmon backends:

    • [x] direct
    • [x] accessdaemon
    • [x] perf_event

    Missing:

    • [x] Prefetch control
    • [x] Documentation
    opened by TomTheBear 0
  • [help wanted] likwid-powermeter: limits singularity containers to single thread only?

    Hi,

    I want to use likwid-powermeter for measuring the power consumption of a custom-built singularity container. Unfortunately, it seems that under these circumstances the singularity container gets limited to a single thread only.

    I am not sure why this happens or if I am using likwid-powermeter incorrectly, but I would appreciate any help on this topic.

    Also, I am seeing the following error output by likwid-powermeter:

    ERROR: ld.so: object '/usr/local/lib/liblikwidpin.so.5.2' from LD_PRELOAD cannot be preloaded (cannot open shared object file): ignored.
    

    To Reproduce

    In the following I have prepared a simple example, so that the issue I am facing can be reproduced easily. Software-wise I am using the latest version of likwid (5.2.2) and the latest version of singularity (3.10.4) on an ubuntu 22.04 lts system. (This is only my local test platform, I have noticed this issue initially on the Test-Cluster of RRZE and also on earlier versions of singularity and likwid.)

    The following script builds a Docker image, which then is converted to a .sif singularity image. This image uses ubuntu:22.04 as base and then just installs p7zip (https://www.7-zip.org/download.html). The internal benchmark of 7z can be used as an example for a multi-threaded application in this case.

    #!/bin/bash -l
    
    mkdir tmp123
    
    # create Dockerfile
    cat > "./tmp123/Dockerfile" << EOF
    FROM ubuntu:22.04
    RUN apt update && apt install -y p7zip-full
    
    CMD 7z b
    EOF
    
    # build Dockerimage locally
    docker build -t 7z:amd64 ./tmp123/
    
    rm -rf ./tmp123/
    
    # convert docker image to .sif (singularity image format)
    sudo singularity build 7z-test.sif docker-daemon://7z:amd64
    

    With the following command I can run the 7z benchmark inside this singularity container:

    singularity exec 7z-test.sif 7z b
    

    Here everything works as expected, and the benchmark uses all available threads on the system.

    Measuring power consumption using likwid-powermeter

    For measuring power consumption for the execution of the 7z benchmark inside the singularity container I would use the following command:

    likwid-powermeter singularity exec 7z-test.sif 7z b
    
    Output
    fabian@ubuntu:~$ likwid-powermeter singularity exec 7z-test.sif 7z b
    --------------------------------------------------------------------------------
    CPU name:	11th Gen Intel(R) Core(TM) i7-11700 @ 2.50GHz
    CPU type:	Intel Rocketlake processor
    CPU clock:	2.50 GHz
    --------------------------------------------------------------------------------
    ERROR: ld.so: object '/usr/local/lib/liblikwidpin.so.5.2' from LD_PRELOAD cannot be preloaded (cannot open shared object file): ignored.
    ERROR: ld.so: object '/usr/local/lib/liblikwidpin.so.5.2' from LD_PRELOAD cannot be preloaded (cannot open shared object file): ignored.
    
    7-Zip [64] 16.02 : Copyright (c) 1999-2016 Igor Pavlov : 2016-05-21
    p7zip Version 16.02 (locale=C,Utf16=off,HugeFiles=on,64 bits,16 CPUs 11th Gen Intel(R) Core(TM) i7-11700 @ 2.50GHz (A0671),ASM,AES-NI)
    
    11th Gen Intel(R) Core(TM) i7-11700 @ 2.50GHz (A0671)
    CPU Freq: - - - - - - - - -
    
    RAM size:   64089 MB,  # CPU hardware threads:  16
    RAM usage:   3530 MB,  # Benchmark threads:     16
    
                           Compressing  |                  Decompressing
    Dict     Speed Usage    R/U Rating  |      Speed Usage    R/U Rating
             KiB/s     %   MIPS   MIPS  |      KiB/s     %   MIPS   MIPS
    
    22:       8001   100   7785   7784  |      61776   100   5269   5269
    23:       7603   100   7747   7747  |      59896   100   5183   5182
    24:       7082   100   7616   7615  |      58853   100   5166   5166
    25:       6939   100   7924   7923  |      58880   100   5241   5240
    ----------------------------------  | ------------------------------
    Avr:             100   7768   7767  |              100   5215   5214
    Tot:             100   6491   6491
    --------------------------------------------------------------------------------
    Runtime: 278.967 s
    Measure for socket 0 on CPU 0
    Domain PKG:
    Energy consumed: 11413 Joules
    Power consumed: 40.9117 Watt
    Domain PP0:
    Energy consumed: 8563.26 Joules
    Power consumed: 30.6963 Watt
    Domain PP1:
    Energy consumed: 0.0535278 Joules
    Power consumed: 0.000191879 Watt
    Domain DRAM:
    Energy consumed: 0 Joules
    Power consumed: 0 Watt
    Domain PLATFORM:
    Energy consumed: 0 Joules
    Power consumed: 0 Watt
    --------------------------------------------------------------------------------
    fabian@ubuntu:~$ 
    

    --> In this case the 7z benchmark inside the singularity container only used one thread of the system, instead of all available threads (the score is equivalent to single-threaded performance). Any idea on how to fix this behaviour? Am I missing something with my usage of likwid-powermeter?

    Using Docker:

    For comparison likwid-powermeter works just fine with docker containers:

    likwid-powermeter docker run -it --rm 7z:amd64
    
    Output
    fabian@ubuntu:~$ likwid-powermeter docker run -it --rm 7z:amd64
    --------------------------------------------------------------------------------
    CPU name:	11th Gen Intel(R) Core(TM) i7-11700 @ 2.50GHz
    CPU type:	Intel Rocketlake processor
    CPU clock:	2.50 GHz
    --------------------------------------------------------------------------------
    
    7-Zip [64] 16.02 : Copyright (c) 1999-2016 Igor Pavlov : 2016-05-21
    p7zip Version 16.02 (locale=C,Utf16=off,HugeFiles=on,64 bits,16 CPUs 11th Gen Intel(R) Core(TM) i7-11700 @ 2.50GHz (A0671),ASM,AES-NI)
    
    11th Gen Intel(R) Core(TM) i7-11700 @ 2.50GHz (A0671)
    CPU Freq: - - - - - - - - -
    
    RAM size:   64089 MB,  # CPU hardware threads:  16
    RAM usage:   3530 MB,  # Benchmark threads:     16
    
                           Compressing  |                  Decompressing
    Dict     Speed Usage    R/U Rating  |      Speed Usage    R/U Rating
             KiB/s     %   MIPS   MIPS  |      KiB/s     %   MIPS   MIPS
    
    22:      63468  1342   4600  61743  |     615910  1523   3449  52531
    23:      63841  1425   4566  65047  |     593610  1487   3455  51360
    24:      58923  1391   4556  63354  |     613151  1557   3456  53819
    25:      57578  1428   4605  65741  |     595314  1534   3453  52981
    ----------------------------------  | ------------------------------
    Avr:            1396   4582  63971  |             1525   3453  52673
    Tot:            1461   4018  58322
    --------------------------------------------------------------------------------
    Runtime: 32.5547 s
    Measure for socket 0 on CPU 0
    Domain PKG:
    Energy consumed: 3790.61 Joules
    Power consumed: 116.438 Watt
    Domain PP0:
    Energy consumed: 3440.85 Joules
    Power consumed: 105.694 Watt
    Domain PP1:
    Energy consumed: 0.00476074 Joules
    Power consumed: 0.000146238 Watt
    Domain DRAM:
    Energy consumed: 0 Joules
    Power consumed: 0 Watt
    Domain PLATFORM:
    Energy consumed: 0 Joules
    Power consumed: 0 Watt
    --------------------------------------------------------------------------------
    
    opened by fabianbees 3
  • [BUG] Problem with likwid-mpirun

    Describe the bug

    likwid fails when running likwid-mpirun on the M1.

    To Reproduce

    On an M1 (I tried with the M1 from Apple Studio), do:

    $ likwid-mpirun -mpi openmpi -np 16 -pin S1:0-3@S2:0-3@S4:0-3@S5:0-3 -d hostname
    DEBUG: Executable given on commandline: /usr/sbin/hostname
    WARN: Cannot extract OpenMP vendor from executable or commandline, assuming no OpenMP
    sh: line 1: scontrol: command not found
    DEBUG: Reading hostfile from batch system
    Available hosts for scheduling:
    Host                    Slots   MaxSlots        Interface
    DEBUG: Evaluated CPU expressions: [[2,3,4,5,6,7,8,9,12,13,14,15,16,17,18,19]]
    DEBUG: Assign 16 processes with 1 per node and 16 threads per process to 0 hosts
    WARN: Only 0 processes out of 16 can be assigned, running with 0 processes
    DEBUG: Scheduling on hosts:
    /apps/modules/likwid-m1/bin/likwid-lua: /apps/modules/likwid-m1/bin/likwid-mpirun:1465: attempt to perform 'n%0'
    stack traceback:
            /apps/modules/likwid-m1/bin/likwid-mpirun:1465: in local 'writeWrapperScript'
            /apps/modules/likwid-m1/bin/likwid-mpirun:2514: in main chunk
            [C]: in ?
    

    I do explicit pinning for only accessing the Firestorm nodes, but there is no difference without the -pin parameter.

    $ likwid-mpirun --version
    likwid-mpirun -- Version 5.2.0 (commit: 233ab943543480cd46058b34616c174198ba0459)
    $ module li
    Currently Loaded Modulefiles:
     1) likwid/5.2.2
    
    bug 
    opened by JanLJL 1
  • [BUG] likwid-powermeter fails when a socket domain does not contain any hardware thread

    Describe the bug

    When running likwid-powermeter in a cpuset limited environment (e.g. SLURM jobs) and the cpuset does not contain hardware threads in all socket domains, it fails.

    To Reproduce

    • LIKWID command and/or API usage

      $ likwid-pin -c N -p
      0,40,1,41,2,42,5,45,6,46,10,50,11,51,12,52,15,55,16,56,3,43,4,44,7,47,8,48,9,49,13,53,14,54,17,57,18,58,19,59,20,60,21,61,22,62,25,65,26,66,30,70,31,71,32,72,35,75,36,76,23,63,24,64,27,67,28,68,29,69,33,73,34,74,37,77,38,78,39,79
      
      $ taskset -c 0-9 likwid-powermeter -i   
      WARN: Selected affinity domain S0 has only 10 hardware threads, but selection string evaluates to 20 threads.
          This results in multiple threads on the same hardware thread.
      /tmp/bin/likwid-lua:  /tmp/bin/likwid-powermeter:172: attempt to index a nil value (field '?')
      stack traceback:
      	 /tmp/bin/likwid-powermeter:172: in main chunk
      	[C]: in ?
      
    • LIKWID version and download source (Github, FTP, package manger, ...) likwid-powermeter -- Version 5.2.2 (commit: 233ab943543480cd46058b34616c174198ba0459)

    • Operating system Ubuntu 20.04.5 LTS

    bug 
    opened by TomTheBear 0
  • Add ability for pointer chasing benchmarks

    We wanted to measure the performance of pointer chasing on different architectures and noted that this is currently not directly possible with likwid-bench. Thus, an additional directive was added to the pseudo assembly parser(s) that allows initializing a memory stream with two new methods (the current default is init with "all 1", which is not changed by this PR). Both methods have in common that the resulting values in the stream can be used for register-indirect, register-index addressing. The two methods are (a small stand-alone sketch of the first one follows the list):

    1. INDEX_STRIDE follows the method used in [1, Sec. 3.B.1] to init a stream with (index + stride) % stream_size
    2. LINKED_LIST creates a circularly linked list where each "virtual" list entry occupies a configurable amount of bytes which can be used to ensure that one jumps over cache lines between list elements. Note that "pointers" to the next list element are randomly arranged but the initialization ensures that a traversal of the created list will cover the whole stream, such that there are no "shortcuts".
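
    For illustration only, the INDEX_STRIDE scheme from item 1 boils down to something like the following stand-alone sketch (hypothetical helper code, not the actual likwid-bench implementation):

    /* Fill an int stream with (index + stride) % stream_size and chase it:
     * every load "idx = stream[idx]" is a register-indirect, register-index access. */
    #include <stdio.h>
    #include <stdlib.h>

    int main(void)
    {
        const size_t size = 1024, stride = 17;          /* example parameters */
        int *stream = malloc(size * sizeof(*stream));
        if (!stream) return 1;

        for (size_t i = 0; i < size; i++)               /* INDEX_STRIDE initialization */
            stream[i] = (int)((i + stride) % size);

        size_t idx = 0, sum = 0;
        for (size_t n = 0; n < size; n++) {             /* pointer-chasing traversal */
            idx = (size_t)stream[idx];
            sum += idx;
        }
        printf("checksum: %zu\n", sum);
        free(stream);
        return 0;
    }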

    The PR also refactors the stream initialization code to reduce code duplication for memory initialization. It also syncs the memory initialization to use off_t instead of int as the datatype for the offset, which is what the declaration of the allocation struct uses.

    There are some drawbacks currently:

    1. Both stride and the linked list item/block size are currently read from the pseudo assembly file. It would be nice to control them from the "outside", e.g. by a command line argument to likwid-bench. This was attempted during development, but the approach caused crashes when the tests/benchmarks are "burned" into likwid-bench, as they appear to be read-only in that case (didn't dig too deep here). The strided benchmark files actually lack the optional stride in the INIT statement, currently causing them to always load the same address. So one either chooses a different default for this case or reasons about a general solution to control these values/benchmark parameters.
    2. The benchmarks use 32-bit offsets only, mainly due to the fact that the TYPE INT uses... (surprise) int under the hood. This might become a limitation when one wants to use larger arrays. Also the initialization may produce bad values when the stream size grows beyond 2 GB.

    So there is still stuff to elaborate on...

    opened by christgau 2
Releases(v5.2.2)
  • v5.2.2(Aug 10, 2022)

    • Fix pin string parsing in pinning library
    • Make SBIN path configurable in build system
    • Add PKGBUILD for ArchLinux package builds
    • Remove accessDaemon double-fork in systemd environments
    • Group updates for L2/L3 (mainly AMD Zen)
    • Fix multi-initialization in MarkerAPI
    • Add energy event scaling for Fujitsu A64FX
    • Nvmon: Use Cupti error string to get better warning/error messages
    • Nvmon: Store events internally to re-use event strings in stopCounters
    • AccessLayer: Catch SIGCHLD to stop sending requests to accessDaemon if it was killed
    • likwid-genTopoCfg: Update writing and reading of topology file
    • Add INST_RETIRED_NOP event for Intel Icelake (desktop & server)
    • Removed some memory leaks
    • Improved checks for RDPMC availability
    • Add TOPDOWN_SLOTS for perf_event
    • Fix for systems with CPU sockets without hwthreads (A64FX FX1000)
    • Fix if HOME environment variable is not set (systemd)
    • Reader function for perf_event_paranoid in Lua to get state early
    • likwid-mpirun: Sanitize np and ppn values to avoid crashes

    Note: The groups MEM_DP and MEM_SP use only 6 of 8 memory controllers for Intel Icelake SP. The attached patch fixes both groups.

    Source code(tar.gz)
    Source code(zip)
    likwid-icx-mem-group-fix.patch(4.82 KB)
  • v5.2.1(Dec 3, 2021)

    We are happy to release a new bugfix version of the LIKWID tool suite.

    • Add support for Intel Rocketlake and AMD Zen3 variant (Family 19, Model 0x50)
    • Fix for perf_event multiplexing (important!)
    • Fix for potential deadlock in MarkerAPI (thx @jenny-cheung)
    • Build and runtime fixes for Nvidia GPU backend, updates for CUDA test codes
    • peakflops kernel for ARMv8
    • Updates for AMD Zen1/2/3 event lists and groups
    • Support spaces in MarkerAPI region tags (thx @jrmadsen)
    • Use 'online' cpulist instead of 'present'
    • Switch CI from Travis-CI to NHR@FAU Cx services
    • Document -reset and -ureset for likwid-setFrequencies
    • Reset cpuset in unpinned runs
    • Remove destructor in frequency module
    • Check PID if given through --perfpid
    • Intel Icelake: OFFCORE_RESPONSE events
    • AccessDaemon: Check PCI init state before using it
    • likwid-mpirun: Set mpi type for SLURM automatically
    • likwid-mpirun: Fix for skip mask for OpenMPI
    • Fix for triad_sve* benchmarks

    Note: The groups MEM_DP and MEM_SP use only 6 of 8 memory controllers for Intel Icelake SP. The attached patch fixes both groups.

    Source code(tar.gz)
    Source code(zip)
    likwid-icx-mem-group-fix.patch(4.82 KB)
  • v5.2.0(Jun 18, 2021)

    We are happy to release a new major update of the LIKWID tool suite.

    • Support for AMD Zen3 (Core + Uncore)
    • Support for Intel IcelakeSP (Core + Uncore)
    • New affinity code
    • Fix for Ivybridge uncore code
    • Bypass accessdaemon by using rdpmc instruction on x86_64
    • Introduce notion of CPU die in topology module
    • Use CPU dies for socket-lock for Intel CascadelakeAP
    • Add environment variable LIKWID_IGNORE_CPUSET to break out of current CPUset
    • Fixes for affinity module CPUlist sorting
    • Build against system-installed hwloc
    • Update for Intel SkylakeX/CascadelakeX L3 group
    • Rename DataFabric events for all generations of AMD Zen
    • Add static cache configuration for Fujitsu A64FX
    • Add multiplexing checks for perf_event backend
    • Fix for table width of likwid-topology after adding CPU die column
    • Adding RasPi 4 with 32 bit OS as ARMv7
    • Add default groups for Intel Icelake desktop
    • Fix for likwid-setFrequencies to not apply minFreq when setting governor
    • likwid-powermeter: Fix hwthread selection when run with -p
    • likwid-setFrequencies: Get measured base frequency if register is not readable
    • CLOCK group for all AMD Zen
    • Fixes in Nvidia GPU support in NvMarkerAPI and topology module

    WARNING: This version has bugs in the perf_event backend. The multiplexing checks cause problems. WARNING: The benchmarks triad_sve* for ARM8 chips use only 3 instead of 4 streams. Note: The groups MEM_DP and MEM_SP use only 6 of 8 memory controllers for Intel Icelake SP. The attached patch fixes both groups.

    Source code(tar.gz)
    Source code(zip)
    likwid-icx-mem-group-fix.patch(4.82 KB)
  • v5.1.1(Mar 31, 2021)

    Changelog for version 5.1.1:

    • Support for Intel Cometlake desktop (Core + Uncore)
    • Fix for topology module of Fujitsu A64FX
    • Fix for Intel Skylake SP in SNC mode
    • Fix for likwid-perfscope
    • Fix for CLI argument parsing
    • Updated group and data file checkers
    • Vector sum benchmark in SVE
    • FP_PIPE group for Fujitsu A64FX
    • Maximal number of CLI arguments configurable in config.mk (currently 16384)
    • Fix for cpulist_sort function
    • Fix for Intel SkylakeSP/CascadelakeSP CBOX devices in perf_event mode
    • Multiplexing-Fix for perf_event (with warning)
    • Adjust CUDA function pointer names in topology_gpu to avoid name clashes
    • Fix for Lua 5.1
    • Fix for likwid-setFrequency when reading CPU base frequency

    Note: This version does not contain any updates for AMD Zen3 and Intel IcelakeSP. Note: Uncore measurements on Intel Cascadelake AP systems require an update of the topology module which will come in 5.2.0 WARNING: The benchmarks triad_sve* for ARM8 chips use only 3 instead of 4 streams.

    Source code(tar.gz)
    Source code(zip)
  • v5.1.0(Nov 20, 2020)

    Changelog for version 5.1.0:

    • Support for Intel Icelake desktop (Core + Uncore)
    • Support for Intel Icelake server (Core only)
    • Support for Intel Tigerlake desktop (Core only)
    • Support for Intel Cannonlake (Core only)
    • Support for Nvidia GPUs with compute capability >= 7.0 (CUpti Profiling API)
    • Initial support for Fujitsu A64FX (Core) including SVE assembly benchmarks
    • Support for ARM Neoverse N1 (AWS Graviton 2)
    • Support for AMD Zen3 (Core + Uncore but without any events)
    • Check for Intel HWP
    • Fix for TID filter of Skylake SP LLC filter0 register
    • Fix for Lua 5.1
    • Fix for likwid-mpirun skip masks
    • Fortran90 interface for NvMarkerAPI (update)
    • CPU_is_online check to filter non-usable CPU cores
    • Fix for freeMemory in NUMA module (with hwloc backend)
    • Fix for likwid-setFrequencies

    We want to thank Intel, AMD, AWS and the University of Regensburg for their support.

    If you want to use this release in a publication, please cite: https://doi.org/10.5281/zenodo.4282696

    Source code(tar.gz)
    Source code(zip)
    likwid-mpirun-5.1.0.patch(656 bytes)
  • v5.0.2(Oct 6, 2020)

    Changelog for 5.0.2:

    • Fix memory leak in calc_metric()
    • New peakflops benchmarks in likwid-bench
    • Fix for NUMA domain handling properly
    • Improvements for perf_event backend
    • Fix for perfctr and powermeter with perf_event backend
    • Fix for likwid-mpirun for SLURM with cpusets
    • Fix for likwid-setFrequencies in cpusets
    • Update for POWER9 event list
    • Updates for AMD Zen, Zen+ and Zen2 (events, groups)
    • Fix for Intel Uncore events with same name for different devices
    • Fix for file descriptor handling
    • Fix for compilation with GCC10
    • Remove sleep timer warning
    • Update examples C-markerAPI and C-internalMarkerAPI

    Note: If you want to use LIKWID 5.0.2 with Lua 5.1, please apply this patch

    Source code(tar.gz)
    Source code(zip)
  • v5.0.1(Dec 23, 2019)

    I'm happy to announce a new bugfix release of LIKWID 5.

    • Some fixes for likwid-mpirun
      • Fix for hybrid pinning with multiple hosts
      • Fix for perf.groups without core-local events (switch to likwid-pin)
      • Fix for command line parser
      • Fix for mpiopts parameter
      • Add UPMC as Uncore counter to splitUncoreEvents()
      • Expand user-given input to abspath if possible
      • Check for at least one executable in user-given command
      • Add skip mask for SLURM + Intel OpenMP
      • Check if user-given MPI type is available
    • Fix for perf_event backend when used as root
    • Include likwid-marker.h in likwid.h to not break old MarkerAPI code
    • Enable build with ARM HPC compiler (ARMCLANG compiler setting)
    • Fix creation of likwid-bench benchmarks on POWER platforms
    • Fix for build system in NVIDIA_INTERFACE=BUILD_APPDAEMON=true
    • Update for executable tester
    • Update for MPI+X test (X: OpenMP or Pthreads)

    Merry Christmas

    Source code(tar.gz)
    Source code(zip)
  • v5.0.0(Nov 14, 2019)

    New version LIKWID 5.0.0

    Changelog:

    • Support for ARM architectures. Special support for Marvell Thunder X2
    • Support for IBM POWER architectures. Support for POWER8 and POWER9.
    • Support for AMD Zen2 microarchitecture.
    • Support for data fabric counters of AMD Zen microarchitecture
    • Support for Nvidia GPU monitoring (with NvMarkerAPI)
    • New clock frequency backend (with less overhead)
    • Generation of benchmarks for likwid-bench on-the-fly from ptt files
    • Switch back to C-based metric calculator (less overhead)
    • Interface function to performance groups, create your own.
    • Integration of GOTCHA for hooking into client application at runtime
    • Thread-local initialization of streams for likwid-bench
    • Enhanced support for SLURM with likwid-mpirun
    • New MPI and Hybrid pinning features for likwid-mpirun
    • Interface to enable the membind kernel memory policy
    • JSON output filter file (use -o output.json)
    • Update of internal HWLOC to 2.1.0

    Note: The MarkerAPI Macros have been moved to a separate header "likwid-marker.h"

    Source code(tar.gz)
    Source code(zip)
  • 4.3.4(Apr 23, 2019)

    New bugfix release:

    • Fix for detecting PCI devices if system can split up LLC and memory channels (Intel CoD or SNC)
    • Don't pin accessDaemon to threads to avoid long access latencies due to busy hardware thread
    • Fix for calculations in likwid-bench if streams are used for input and output
    • Fix for LIKWID_MARKER_REGISTER with perf_event backend
    • Support for Intel Atom (Tremont) (nothing new, same as Intel Atom (Goldmont Plus))
    • Workaround for topology detection if LLC and memory channels are split up. Kernel does not detect it properly sometimes. (Intel CoD or SNC)
    • Minor updates for build system
    • Minor updates for documentation

    Notice: If you want to compile likwid-4.3.4 with ACCESSMODE=perf_event, please apply the attached patch before compiling.

    Source code(tar.gz)
    Source code(zip)
    likwid-4.3.4-perf.patch(865 bytes)
  • 4.3.3(Nov 26, 2018)

    • Fixes for likwid-mpirun
    • Fixes for events of Intel Skylake SP and Intel Broadwell
    • Support for Intel CascadeLake X (only new eventlist, uses code from Intel Skylake SP)
    • Fix for bitmask creation in Lua
    • Event options for perf_event backend
    • New assembly benchmarks in likwid-bench
    • MarkerAPI: Function to reset regions
    • Some new performance groups (DIVIDE and TMA)
    • Fixes for AMD Zen performance groups
    • Fix when using topology input file
    • Minor bugfixes
    Source code(tar.gz)
    Source code(zip)
  • 4.3.2(Apr 17, 2018)

    • Fix in internal metric calculator
    • Support for Intel Knights Mill (core, rapl, uncore)
    • Intel Skylake X: Some fixes for events and perf. groups
    • Set KMP_INIT_AT_FORK to bypass bug in Intel OpenMP memory allocator
    • AMD Zen: Use RETIRED_INSTRUCTION instead of fixed-purpose counter for metric calculation
    • All FLOPS_* groups now have vectorization ratio
    • Fix for MarkerAPI with perf_event backend
    • Fix for maximal/minimal uncore frequency
    • Skip counters that are already in use, don't exit
    • likwid-mpirun: minor fix when overloading a host
    • Improved detection of PCI devices
    Source code(tar.gz)
    Source code(zip)
  • 4.3.1(Jan 4, 2018)

  • 4.3.0(Nov 7, 2017)

    • Support for Intel Skylake SP architecture (core, uncore, energy)
    • Support for AMD Zen architecture (core, l2, energy)
    • Support for Intel Goldmont Plus architecture
    • Pinning strategy 'balanced'
    • New Lua based calculator
    • Support for Intel PState CPU frequency daemon

    Minor:

    • Fixed MCDRAM measurements on Intel Xeon Phi (KNL) with perf_event back end

    Merry Christmas

    Source code(tar.gz)
    Source code(zip)
  • 4.2.1(Aug 3, 2017)

    • Fix for logical selection strings
    • likwid-agent: general update
    • likwid-mpirun: Improved SLURM support
    • likwid-mpirun: Print metrics sorted as they are listed in perf. group
    • likwid-perfctr: Print metrics/events as header in timeline mode; redirect to file when -o switch is used
    • likwid-setFrequency: Commandline options to set min, max and current frequency
    • Pinning-Library: Automatically detect and skip shepard threads
    • Intel Broadwell: Added support for E3 (like Desktop), Fix for L3 group
    • Intel IvyBridge: Fix for PCU fixed-purpose counters
    • Intel Skylake: Fix for events CYCLE_ACTIVITY, new event L2_LINES_OUT
    • Intel Xeon Phi (KNL): Fix for overflow register, update for ENERGY group, OFFCORE_RESPONSE events are now tile-specific
    • Intel SandyBridge: Fix for L3CACHE group
    • Event/Counter list contains only usable counters and events
    • Fix and warning message for static library builds
    Source code(tar.gz)
    Source code(zip)
  • 4.2.0(Dec 22, 2016)

    • Support for Intel Xeon Phi (Knights Landing): Core, Uncore, RAPL
    • Support for Uncore counters of some desktop chips (SandyBridge, IvyBridge, Haswell, Broadwell and Skylake)
    • Basic support for Linux perf_event interface instead of native access. Currently only core-local counters working, Uncore is experimental
    • Support to build against an existing Lua installation (5.1 - 5.3 tested)
    • Support for CPU frequency manipulation, Lua interface updated
    • Access module checks for LLNL's msr_safe kernel module
    • Support for counter registers that are only available when HyperThreading is off
    • Socket measurements can be used for all cores on the socket in metric formulas.

    The LIKWID team wishes Merry Christmas to everyone.

    Source code(tar.gz)
    Source code(zip)
  • 4.1.2(Aug 8, 2016)

    • Fix for likwid-powermeter: Use proper energy unit
    • Fix for performance groups for Intel Broadwell (D/EP): DATA and FALSE_SHARE
    • Reduce number of started access daemons
    • Clean Uncore unit local control registers (needed for simultaneous use of LIKWID 3 and 4)
    • Clean config, filter and counter registers at *_finalize function
    • Fix for likwid-features and likwid-perfctr
  • 4.1.1(Jun 16, 2016)

    • Fix for Uncore handling for EP/EN/EX systems
    • Minor fix for Uncore handling on Intel desktop systems
    • Fix in generic readCounters function
    • Support for Intel Goldmont (untested)
    • Fixes for likwid-mpirun
  • 4.1.0(May 19, 2016)

    • Support for Intel Skylake (Core + Uncore)
    • Support for Intel Broadwell (Core + Uncore)
    • Support for Intel Broadwell D (Core + Uncore)
    • Support for Intel Broadwell EP/EN/EX (Core + Uncore)
    • Support for Intel Airmont (Core)
    • Uncore support for Intel SandyBridge, IvyBridge and Haswell
    • Performance group and event set handling in library (see the C sketch after this release's notes)
    • Internal calculator for derived metrics
    • Improvement of Marker API
    • Get results/metrics of last measurement cycle
    • Fixed most memory leaks
    • Respect 'Intel PMU sharing guide'
    • Update of internal Lua to 5.3
    • More examples (C++11 threads, Cilk+, TBB)
    • Test suite for executables and library
    • Accuracy checker supports multiple CPUs
    • Security checked access daemon
    • Likwid-bench supports Integer benchmarks
    • Likwid-bench selects iteration count automatically
    • Likwid-bench has new FMA related benchmarks
    • Likwid-mpirun supports SLURM job scheduler
    • Reintroduced tool likwid-features
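    A minimal C sketch of the library-side event set handling mentioned above, assuming the LIKWID header and library are installed (likwid.h, -llikwid) and that the example event set is valid for the target CPU; error handling is kept to a bare minimum.

    #include <stdio.h>
    #include <likwid.h>

    int main(void)
    {
        int cpus[1] = {0};              /* measure on CPU 0 only */

        topology_init();
        perfmon_init(1, cpus);

        /* An event set is a comma-separated event:counter list or a performance group name */
        int gid = perfmon_addEventSet("INSTR_RETIRED_ANY:FIXC0,CPU_CLK_UNHALTED_CORE:FIXC1");
        if (gid < 0) { fprintf(stderr, "adding event set failed\n"); return 1; }

        perfmon_setupCounters(gid);
        perfmon_startCounters();
        /* ... code to be measured ... */
        perfmon_stopCounters();

        /* eventId 0 = first event in the set, threadId 0 = CPU 0 configured above */
        printf("INSTR_RETIRED_ANY: %f\n", perfmon_getResult(gid, 0, 0));

        perfmon_finalize();
        topology_finalize();
        return 0;
    }

    Build with something like gcc demo.c -o demo -llikwid (include and library paths depend on the installation prefix); the "results of the last measurement cycle" mentioned above are read with the companion getter perfmon_getLastResult.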
  • likwid-4.0.1(Jul 23, 2015)

    • likwid-bench: Iteration determination is done serially
    • likwid-bench: Manual selection of iterations possible
    • likwid-perfctr: Set cpuset to all CPUs, not only the first
    • likwid-pin: Set cpuset to all CPUs, not only the first
    • likwid-accuracy.py: Enhanced plotting functions, use only instrumented likwid-bench
    • likwid-accessD: Check for allowed register for PCI accesses
    • Add models HASWELL_M1 (0x45) and HASWELL_M2 (0x46) to likwid-powermeter and likwid-accessD
    • New test application using Cilk and Marker API
    • New test application using C++11 threads and Marker API
    • likwid-agent: gmetric version check for --group option and s/\s*/_/ in metric names
    • likwid-powermeter: Print RAPL domain name
    • Marker API: Initialize access already at likwid_markerInit()
    • Marker API: likwid_markerThreadInit() only pins if not already pinned
  • likwid-4.0.0(Jun 17, 2015)

    After about one year of development, we are happy to announce the newest version of LIKWID.

    The features of LIKWID 4.0.0:

    • Support for Intel Broadwell
    • Uncore support for all Uncore-aware architectures
      • Nehalem (EX)
      • Westmere (EX)
      • SandyBridge EP
      • IvyBridge EP
      • Haswell EP
    • Measure multiple event sets in a round-robin fashion (no multiplexing!)
    • Event options to filter the counter increments
    • Whole LIKWID functionality is exposed as API for C/C++ and Lua
    • New functions in the Marker API to switch event sets and get intermediate results (see the Marker API sketch after this list)
    • Topology code relies on hwloc. CPUID is still included but only as fallback
    • Most LIKWID applications are written in Lua (the only exception is likwid-bench)
    • Monitoring daemon likwid-agent with multiple output backends
    • More performance groups
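    A hedged sketch of the Marker API features listed above (switching event sets and reading intermediate results); the region name and the groups in the example command line are placeholders. The macros only become active when the code is compiled with -DLIKWID_PERFMON and the binary is run under likwid-perfctr in marker mode (-m).

    #include <stdio.h>
    #include <likwid.h>   /* newer releases ship the macros in likwid-marker.h */

    int main(void)
    {
        int nevents = 16, count = 0;
        double events[16], time = 0.0;

        LIKWID_MARKER_INIT;
        LIKWID_MARKER_THREADINIT;

        LIKWID_MARKER_START("compute");
        /* ... first phase of the computation ... */
        LIKWID_MARKER_STOP("compute");

        /* Intermediate results of the region, without waiting for the final report */
        LIKWID_MARKER_GET("compute", &nevents, events, &time, &count);
        printf("compute: %d events, %.3f s, %d calls\n", nevents, time, count);

        /* Switch to the next event set passed on the command line (-g ... -g ...) */
        LIKWID_MARKER_SWITCH;

        LIKWID_MARKER_START("compute");
        /* ... second phase, measured with the next event set ... */
        LIKWID_MARKER_STOP("compute");

        LIKWID_MARKER_CLOSE;
        return 0;
    }

    A matching invocation could look like: likwid-perfctr -C 0 -g FLOPS_DP -g L2 -m ./a.out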