The New Official Aparapi: a framework for executing native Java and Scala code on the GPU.

Overview

A framework for executing native Java code on the GPU.

Licensed under the Apache Software License v2

Aparapi allows developers to write native Java code capable of being executed directly on a graphics card (GPU) by converting Java bytecode to an OpenCL kernel dynamically at runtime. Because it is backed by OpenCL, Aparapi is compatible with all OpenCL-capable graphics cards.

A GPU has a unique architecture that causes it to behave differently from a CPU. One of the most noticeable differences is that while a typical CPU has fewer than a dozen cores, a high-end GPU may have hundreds. This makes GPUs uniquely suited to data-parallel computation, which can yield speedups of hundreds of times over what is possible with an average CPU. That can mean the difference between needing a whole data center to house your application and needing just one or two computers, potentially saving millions in server costs.
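
The data-parallel shape Aparapi exploits can be sketched in plain Java, with no Aparapi involved: when every index of a loop can be computed independently, the work can be spread across any number of workers, whether CPU threads or GPU cores. A minimal sketch using a parallel stream (the class and variable names here are ours, purely illustrative):

```java
import java.util.stream.IntStream;

public class DataParallelSketch {
    public static void main(String[] args) {
        final int n = 1_000_000;
        final float[] a = new float[n];
        final float[] b = new float[n];
        for (int i = 0; i < n; i++) {
            a[i] = i;
            b[i] = 2 * i;
        }

        final float[] sum = new float[n];
        // Each index is computed independently of every other index,
        // so the runtime is free to split the range across all cores.
        IntStream.range(0, n).parallel().forEach(i -> sum[i] = a[i] + b[i]);

        System.out.println(sum[10]); // 30.0
    }
}
```

A GPU takes the same idea much further: instead of a handful of CPU threads, each index becomes one of thousands of hardware work-items.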

Aparapi was originally conceived and developed by AMD. It was later abandoned by AMD and sat mostly idle for several years. There were some efforts by the community to keep the project alive, but without a clear community leader no new releases ever came. Eventually we came along and rescued the project. After such a long wait, the first Aparapi release in five years was published, and the community continues to push forward with renewed excitement.

Below you will find a side-by-side comparison of the nbody problem on a CPU vs. a GPU. The simulation is run on an inexpensive graphics card; you can even run it yourself from the examples project. The drastic performance gains that can be achieved with Aparapi are obvious.

[NBody demo: GPU Accelerated vs. CPU Multi-threaded (8 cores)]

Donating

Beerpay

As an open-source project we run entirely off donations. Buy one of our hardworking developers a beer by donating here. All donations go to our bounty fund and allow us to place bounties on important bugs and enhancements. You may also use Beerpay, linked via the badge above.

Support and Documentation

This project is officially hosted on QOTO GitLab here; however, an up-to-date mirror is also maintained on GitHub here.

Aparapi Javadocs: latest - 2.0.0 - 1.10.0 - 1.9.0 - 1.8.0 - 1.7.0 - 1.6.0 - 1.5.0 - 1.4.1 - 1.4.0 - 1.3.4 - 1.3.3 - 1.3.2 - 1.3.1 - 1.3.0 - 1.2.0 - 1.1.2 - 1.1.1 - 1.1.0 - 1.0.0

For detailed documentation see Aparapi.com or check out the latest Javadocs.

For support please use Gitter or the official Aparapi mailing list and Discourse forum.

Please file bugs and feature requests on QOTO GitLab; our old archived issues can still be viewed on GitHub as well.

Aparapi conforms to the Semantic Versioning 2.0.0 standard. That means the version of a release isn't arbitrary but rather describes how the library's interfaces have changed. Read more about it at the Semantic Versioning page.

Related Projects

This particular repository only represents the core Java library. There are several other related repositories worth taking a look at.

  • Aparapi Examples - A collection of Java examples to showcase the Aparapi library and help developers get started.
  • Aparapi JNI - A Java JAR which embeds and loads the native components at runtime. This removes the need to separately install the Aparapi Native library.
  • Aparapi Native - The native library component. Without it the Java library can't talk to the graphics card. This is not a Java project but rather a C/C++ project.
  • Aparapi Vagrant - A Vagrant environment for compiling the Aparapi native libraries for Linux, both x86 and x64.
  • Aparapi Website - Source for the Aparapi website as hosted at http://aparapi.com. The site also contains our detailed documentation.

Prerequisites

Aparapi will run as-is on the CPU; however, in order to access the GPU it requires OpenCL to be installed on the local system. If OpenCL isn't found, the library will simply fall back to CPU mode. Aparapi supports, and has been tested on, OpenCL 1.2, OpenCL 2.0, and OpenCL 2.1.

Aparapi runs on all operating systems and platforms, however GPU acceleration support is currently provided for the following platforms: Windows 64bit, Windows 32bit, Mac OSX 64bit, Linux 64bit, and Linux 32bit.

Note: It is no longer required to manually install the Aparapi JNI native interface; this is now done automatically through Maven as a dependency of Aparapi.

Java Dependency

To include Aparapi in your project, add the following Maven dependency to your build.

<dependency>
    <groupId>com.aparapi</groupId>
    <artifactId>aparapi</artifactId>
    <version>2.0.0</version>
</dependency>
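
If your build uses Gradle instead of Maven, the equivalent dependency declaration (same coordinates as the Maven snippet above) would be:

```groovy
dependencies {
    implementation 'com.aparapi:aparapi:2.0.0'
}
```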

Obtaining the Source

The official source repository for Aparapi is located on QOTO GitLab and can be cloned using the following command.

git clone https://git.qoto.org/aparapi/aparapi.git

Getting Started

With Aparapi we can take a sequential loop such as this (which adds each element of the inA and inB arrays and puts the result in result):

final float[] inA = .... // get a float array of data from somewhere
final float[] inB = .... // get a float array of data from somewhere
assert (inA.length == inB.length);
final float[] result = new float[inA.length];

for (int i = 0; i < result.length; i++) {
    result[i] = inA[i] + inB[i];
}

And refactor the sequential loop to the following form:

Kernel kernel = new Kernel() {
    @Override
    public void run() {
        int i = getGlobalId();
        result[i] = inA[i] + inB[i];
    }
};

Range range = Range.create(result.length);
kernel.execute(range);
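
When porting a loop like this, it is worth validating the kernel's output against the original sequential version before trusting the speedup. A minimal reference check in plain Java (the helper names here are ours, not part of the Aparapi API):

```java
public class ReferenceCheck {
    // Sequential reference implementation of the loop being ported.
    static float[] addSequential(float[] inA, float[] inB) {
        final float[] out = new float[inA.length];
        for (int i = 0; i < inA.length; i++) {
            out[i] = inA[i] + inB[i];
        }
        return out;
    }

    // Compare within a tolerance; device floating-point results are not
    // always bit-identical to the JVM's.
    static boolean matches(float[] expected, float[] actual, float eps) {
        if (expected.length != actual.length) return false;
        for (int i = 0; i < expected.length; i++) {
            if (Math.abs(expected[i] - actual[i]) > eps) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        float[] inA = {1f, 2f, 3f};
        float[] inB = {4f, 5f, 6f};
        // In real use this would be the `result` array after kernel.execute(range).
        float[] gpuResult = {5f, 7f, 9f};
        System.out.println(matches(addSequential(inA, inB), gpuResult, 1e-6f)); // true
    }
}
```
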

Comments
  • Fix and Update: Fix issue #62 and provide new API for kernel profiling under multithreading (refs #62)

    Fixes issue #62 - SEVERE log-level messages from the Aparapi kernel-profiling functionality, caused by missing thread safety when multiple threads execute the same kernel class on the same device. Updates the Aparapi kernel-profiling API to provide thread-safe variants while maintaining backwards compatibility.

    bug enhancement bounty $$$ 
    opened by CoreRasurae 39
  • Can I take over development of this project?

    From @freemo on October 17, 2016 15:20

    Hi, my project has a pressing need to rely on Aparapi, and as such I have been contributing to the project in my own repositories. Since I will be contributing a significant amount of work I'd like to contribute that effort back. You are of course welcome to adopt the project back into yours, but you will find a lot has changed and that may be difficult at this point; please feel free. If not, perhaps the team would like to consider moving over to my new repositories for future effort? I am willing to discuss alternatives as well.

    Here is a recap of what I did so far and where the new repositories can be found.

    Everything has been mavenized!

    All new code is licensed under the apache license. Also since AMD is no longer the maintainer I changed the root package across all the projects.

    I pulled out the core Java library. This is where all the platform-independent code lives; it ultimately produces the jar that will be used as the dependency. This is now its own repository and can be found here:

    https://github.com/Syncleus/aparapi

    I pulled out all the platform-specific code, the JNI layer, into its own repository. This no longer uses Ant, as the project has been mavenized. But it isn't a Java project either, so it doesn't use Maven. It has been refactored to use Autotools, which is a platform-independent way to compile shared libraries. It currently only compiles for Linux platforms, though. The code for the JNI libraries can now be found here; it uses submodules, so please clone recursively:

    https://github.com/Syncleus/aparapi-jni

    I also created an Arch Linux AUR package and an unofficial binary repository for the aparapi system-specific shared library. This allows installation and uninstallation of the aparapi shared library using the Arch Linux package-management system. The AUR can be found here:

    https://github.com/Syncleus/aparapi-archlinux

    To add the unofficial repository for use with pacman, add the following lines to your /etc/pacman.conf file, before all the other repositories:

    [aparapi]
    SigLevel = Optional TrustAll
    Server = http://syncleus.com/aparapi-archlinux-repo/
    

    The examples are now also a separate repository, as they aren't really needed for the library itself. The Life sample is mavenized and working, but I still need to mavenize the rest of the examples. This repo can be found here:

    https://github.com/Syncleus/aparapi-examples

    Finally I purchased the aparapi.com domain name so I can start hosting some useful information about it and host some files there. I also have an account on maven central so I will soon be able to upload aparapi there as it is now in a state where it can be consumed as a dependency (since it is completely mavenized).

    So let me know what you guys think about joining efforts and perhaps moving the development over to the new repositories?

    Copied from original issue: aparapi/aparapi#39

    opened by freemo 26
  • Nvidia RTX 3080 GPU not detected

    Hi,

    Aparapi is not detecting my GPU even though I have OpenCL installed. When I run these two lines:

    Device device = Device.firstGPU();
    Range range = device.createRange(size);
    

    It throws a NullPointerException, which indicates that the GPU is not being detected.

    GPU: Nvidia RTX 3080
    OpenCL version: 3.0
    Operating System: Pop OS (Ubuntu)

    CudaSquare.java:

    /*
     * Copyright (C) 2021 - present by EY LLP. and EYQP team. 
     * Any change of "copy" of code without author permission is not allowed.
     * This is NOT a open source project, DONOT copy any Code.
     * For QuantLib please see LICENSCE.TXT provided. 
    */
    package cudaProgramming.parallel;
    
    import com.aparapi.Kernel;
    import com.aparapi.Range;
    import com.aparapi.device.Device;
    
    /**
     * An example Aparapi application which computes and displays squares of a set
     * of 512 input values. While executing on GPU using Aparpi framework, each
     * square value is computed in a separate kernel invocation and can thus
     * maximize performance by optimally utilizing all GPU computing units
     *
     * @author gfrost
     * @version $Id: $Id
     */
    public class CudaSquare {
    
    	/**
    	 * <p>
    	 * main.
    	 * </p>
    	 *
    	 * @param _args an array of {@link java.lang.String} objects.
    	 */
    	public static void main(String[] _args) {
    		// loop must be multiple of 8 or 8 bit
    		final int size = 160;
    
    		/** Input float array for which square values need to be computed. */
    		final float[] values = new float[size];
    
    		/** Initialize input array. */
    		for (int i = 0; i < size; i++) {
    			values[i] = i;
    		}
    
    		/**
    		 * Output array which will be populated with square values of corresponding
    		 * input array elements.
    		 */
    		final float[] squares = new float[size];
    
    		/**
    		 * Aparapi Kernel which computes squares of input array elements and populates
    		 * them in corresponding elements of output array.
    		 **/
    		Kernel kernel = new Kernel() {
    			@Override
    			public void run() {
    				int gid = getGlobalId();
    				squares[gid] = values[gid] * values[gid];
    			}
    		};
    
    		Device device = Device.firstGPU();
    		Range range = device.createRange(size);
    
    		// Execute Kernel.
    
    		kernel.execute(range);
    
    		// Report target execution mode: GPU or JTP (Java Thread Pool).
    		System.out.println("Device = " + kernel.getTargetDevice().getShortDescription());
    
    		// Display computed square values.
    		for (int i = 0; i < size; i++) {
    			System.out.printf("%6.0f %8.0f\n", values[i], squares[i]);
    		}
    
    		// Dispose Kernel resources.
    		kernel.dispose();
    	}
    
    }
    

    Output of /usr/bin/clinfo:

    Number of platforms                               1
      Platform Name                                   NVIDIA CUDA
      Platform Vendor                                 NVIDIA Corporation
      Platform Version                                OpenCL 3.0 CUDA 11.7.89
      Platform Profile                                FULL_PROFILE
      Platform Extensions                             cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_fp64 cl_khr_3d_image_writes cl_khr_byte_addressable_store cl_khr_icd cl_khr_gl_sharing cl_nv_compiler_options cl_nv_device_attribute_query cl_nv_pragma_unroll cl_nv_copy_opts cl_nv_create_buffer cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_device_uuid cl_khr_pci_bus_info cl_khr_external_semaphore cl_khr_external_memory cl_khr_external_semaphore_opaque_fd cl_khr_external_memory_opaque_fd
      Platform Extensions with Version                cl_khr_global_int32_base_atomics                                 0x400000 (1.0.0)
                                                      cl_khr_global_int32_extended_atomics                             0x400000 (1.0.0)
                                                      cl_khr_local_int32_base_atomics                                  0x400000 (1.0.0)
                                                      cl_khr_local_int32_extended_atomics                              0x400000 (1.0.0)
                                                      cl_khr_fp64                                                      0x400000 (1.0.0)
                                                      cl_khr_3d_image_writes                                           0x400000 (1.0.0)
                                                      cl_khr_byte_addressable_store                                    0x400000 (1.0.0)
                                                      cl_khr_icd                                                       0x400000 (1.0.0)
                                                      cl_khr_gl_sharing                                                0x400000 (1.0.0)
                                                      cl_nv_compiler_options                                           0x400000 (1.0.0)
                                                      cl_nv_device_attribute_query                                     0x400000 (1.0.0)
                                                      cl_nv_pragma_unroll                                              0x400000 (1.0.0)
                                                      cl_nv_copy_opts                                                  0x400000 (1.0.0)
                                                      cl_nv_create_buffer                                              0x400000 (1.0.0)
                                                      cl_khr_int64_base_atomics                                        0x400000 (1.0.0)
                                                      cl_khr_int64_extended_atomics                                    0x400000 (1.0.0)
                                                      cl_khr_device_uuid                                               0x400000 (1.0.0)
                                                      cl_khr_pci_bus_info                                              0x400000 (1.0.0)
                                                      cl_khr_external_semaphore                                          0x9000 (0.9.0)
                                                      cl_khr_external_memory                                             0x9000 (0.9.0)
                                                      cl_khr_external_semaphore_opaque_fd                                0x9000 (0.9.0)
                                                      cl_khr_external_memory_opaque_fd                                   0x9000 (0.9.0)
      Platform Numeric Version                        0xc00000 (3.0.0)
      Platform Extensions function suffix             NV
      Platform Host timer resolution                  0ns
    
      Platform Name                                   NVIDIA CUDA
    Number of devices                                 1
      Device Name                                     NVIDIA GeForce RTX 3080 Laptop GPU
      Device Vendor                                   NVIDIA Corporation
      Device Vendor ID                                0x10de
      Device Version                                  OpenCL 3.0 CUDA
      Device UUID                                     808d8c70-1aef-49f3-cc51-44aaea760720
      Driver UUID                                     808d8c70-1aef-49f3-cc51-44aaea760720
      Valid Device LUID                               No
      Device LUID                                     6d69-637300000000
      Device Node Mask                                0
      Device Numeric Version                          0xc00000 (3.0.0)
      Driver Version                                  515.48.07
      Device OpenCL C Version                         OpenCL C 1.2 
      Device OpenCL C all versions                    OpenCL C                                                         0x400000 (1.0.0)
                                                      OpenCL C                                                         0x401000 (1.1.0)
                                                      OpenCL C                                                         0x402000 (1.2.0)
                                                      OpenCL C                                                         0xc00000 (3.0.0)
      Device OpenCL C features                        __opencl_c_fp64                                                  0xc00000 (3.0.0)
                                                      __opencl_c_images                                                0xc00000 (3.0.0)
                                                      __opencl_c_int64                                                 0xc00000 (3.0.0)
                                                      __opencl_c_3d_image_writes                                       0xc00000 (3.0.0)
      Latest comfornace test passed                   v2021-02-01-00
      Device Type                                     GPU
      Device Topology (NV)                            PCI-E, 0000:01:00.0
      Device Profile                                  FULL_PROFILE
      Device Available                                Yes
      Compiler Available                              Yes
      Linker Available                                Yes
      Max compute units                               48
      Max clock frequency                             1710MHz
      Compute Capability (NV)                         8.6
      Device Partition                                (core)
        Max number of sub-devices                     1
        Supported partition types                     None
        Supported affinity domains                    (n/a)
      Max work item dimensions                        3
      Max work item sizes                             1024x1024x64
      Max work group size                             1024
      Preferred work group size multiple (device)     32
      Preferred work group size multiple (kernel)     32
      Warp size (NV)                                  32
      Max sub-groups per work group                   0
      Preferred / native vector sizes                 
        char                                                 1 / 1       
        short                                                1 / 1       
        int                                                  1 / 1       
        long                                                 1 / 1       
        half                                                 0 / 0        (n/a)
        float                                                1 / 1       
        double                                               1 / 1        (cl_khr_fp64)
      Half-precision Floating-point support           (n/a)
      Single-precision Floating-point support         (core)
        Denormals                                     Yes
        Infinity and NANs                             Yes
        Round to nearest                              Yes
        Round to zero                                 Yes
        Round to infinity                             Yes
        IEEE754-2008 fused multiply-add               Yes
        Support is emulated in software               No
        Correctly-rounded divide and sqrt operations  Yes
      Double-precision Floating-point support         (cl_khr_fp64)
        Denormals                                     Yes
        Infinity and NANs                             Yes
        Round to nearest                              Yes
        Round to zero                                 Yes
        Round to infinity                             Yes
        IEEE754-2008 fused multiply-add               Yes
        Support is emulated in software               No
      Address bits                                    64, Little-Endian
      Global memory size                              16908353536 (15.75GiB)
      Error Correction support                        No
      Max memory allocation                           4227088384 (3.937GiB)
      Unified memory for Host and Device              No
      Integrated memory (NV)                          No
      Shared Virtual Memory (SVM) capabilities        (core)
        Coarse-grained buffer sharing                 Yes
        Fine-grained buffer sharing                   No
        Fine-grained system sharing                   No
        Atomics                                       No
      Minimum alignment for any data type             128 bytes
      Alignment of base address                       4096 bits (512 bytes)
      Preferred alignment for atomics                 
        SVM                                           0 bytes
        Global                                        0 bytes
        Local                                         0 bytes
      Atomic memory capabilities                      relaxed, work-group scope
      Atomic fence capabilities                       relaxed, acquire/release, work-group scope
      Max size for global variable                    0
      Preferred total size of global vars             0
      Global Memory cache type                        Read/Write
      Global Memory cache size                        1376256 (1.312MiB)
      Global Memory cache line size                   128 bytes
      Image support                                   Yes
        Max number of samplers per kernel             32
        Max size for 1D images from buffer            268435456 pixels
        Max 1D or 2D image array size                 2048 images
        Max 2D image size                             32768x32768 pixels
        Max 3D image size                             16384x16384x16384 pixels
        Max number of read image args                 256
        Max number of write image args                32
        Max number of read/write image args           0
      Pipe support                                    No
      Max number of pipe args                         0
      Max active pipe reservations                    0
      Max pipe packet size                            0
      Local memory type                               Local
      Local memory size                               49152 (48KiB)
      Registers per block (NV)                        65536
      Max number of constant args                     9
      Max constant buffer size                        65536 (64KiB)
      Generic address space support                   No
      Max size of kernel argument                     4352 (4.25KiB)
      Queue properties (on host)                      
        Out-of-order execution                        Yes
        Profiling                                     Yes
      Device enqueue capabilities                     (n/a)
      Queue properties (on device)                    
        Out-of-order execution                        No
        Profiling                                     No
        Preferred size                                0
        Max size                                      0
      Max queues on device                            0
      Max events on device                            0
      Prefer user sync for interop                    No
      Profiling timer resolution                      1000ns
      Execution capabilities                          
        Run OpenCL kernels                            Yes
        Run native kernels                            No
        Non-uniform work-groups                       No
        Work-group collective functions               No
        Sub-group independent forward progress        No
        Kernel execution timeout (NV)                 Yes
      Concurrent copy and kernel execution (NV)       Yes
        Number of async copy engines                  2
        IL version                                    (n/a)
        ILs with version                              <printDeviceInfo:186: get CL_DEVICE_ILS_WITH_VERSION : error -30>
      printf() buffer size                            1048576 (1024KiB)
      Built-in kernels                                (n/a)
      Built-in kernels with version                   <printDeviceInfo:190: get CL_DEVICE_BUILT_IN_KERNELS_WITH_VERSION : error -30>
      Device Extensions                               cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_fp64 cl_khr_3d_image_writes cl_khr_byte_addressable_store cl_khr_icd cl_khr_gl_sharing cl_nv_compiler_options cl_nv_device_attribute_query cl_nv_pragma_unroll cl_nv_copy_opts cl_nv_create_buffer cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_device_uuid cl_khr_pci_bus_info cl_khr_external_semaphore cl_khr_external_memory cl_khr_external_semaphore_opaque_fd cl_khr_external_memory_opaque_fd
      Device Extensions with Version                  cl_khr_global_int32_base_atomics                                 0x400000 (1.0.0)
                                                      cl_khr_global_int32_extended_atomics                             0x400000 (1.0.0)
                                                      cl_khr_local_int32_base_atomics                                  0x400000 (1.0.0)
                                                      cl_khr_local_int32_extended_atomics                              0x400000 (1.0.0)
                                                      cl_khr_fp64                                                      0x400000 (1.0.0)
                                                      cl_khr_3d_image_writes                                           0x400000 (1.0.0)
                                                      cl_khr_byte_addressable_store                                    0x400000 (1.0.0)
                                                      cl_khr_icd                                                       0x400000 (1.0.0)
                                                      cl_khr_gl_sharing                                                0x400000 (1.0.0)
                                                      cl_nv_compiler_options                                           0x400000 (1.0.0)
                                                      cl_nv_device_attribute_query                                     0x400000 (1.0.0)
                                                      cl_nv_pragma_unroll                                              0x400000 (1.0.0)
                                                      cl_nv_copy_opts                                                  0x400000 (1.0.0)
                                                      cl_nv_create_buffer                                              0x400000 (1.0.0)
                                                      cl_khr_int64_base_atomics                                        0x400000 (1.0.0)
                                                      cl_khr_int64_extended_atomics                                    0x400000 (1.0.0)
                                                      cl_khr_device_uuid                                               0x400000 (1.0.0)
                                                      cl_khr_pci_bus_info                                              0x400000 (1.0.0)
                                                      cl_khr_external_semaphore                                          0x9000 (0.9.0)
                                                      cl_khr_external_memory                                             0x9000 (0.9.0)
                                                      cl_khr_external_semaphore_opaque_fd                                0x9000 (0.9.0)
                                                      cl_khr_external_memory_opaque_fd                                   0x9000 (0.9.0)
    
    NULL platform behavior
      clGetPlatformInfo(NULL, CL_PLATFORM_NAME, ...)  No platform
      clGetDeviceIDs(NULL, CL_DEVICE_TYPE_ALL, ...)   No platform
      clCreateContext(NULL, ...) [default]            No platform
      clCreateContext(NULL, ...) [other]              Success [NV]
      clCreateContextFromType(NULL, CL_DEVICE_TYPE_DEFAULT)  No platform
      clCreateContextFromType(NULL, CL_DEVICE_TYPE_CPU)  No devices found in platform
      clCreateContextFromType(NULL, CL_DEVICE_TYPE_GPU)  No platform
      clCreateContextFromType(NULL, CL_DEVICE_TYPE_ACCELERATOR)  No devices found in platform
      clCreateContextFromType(NULL, CL_DEVICE_TYPE_CUSTOM)  Invalid device type for platform
      clCreateContextFromType(NULL, CL_DEVICE_TYPE_ALL)  No platform
    
    
    opened by Raunak-Singh-Inventor 17
  • Update: Add support for Local arguments in kernel functions (refs #79)

    Allows kernel functions to be called with arrays in local memory as arguments. Such arguments are designated by the @Local annotation or by the _$local$ name suffix, as described in Issue 79 and the provided sample code.

    A test case was added to validate this implementation.

    enhancement 
    opened by CoreRasurae 17
  • [BOUNTY $25] JVM Crash: SEVERE: ProfilingEvent.START encountered without ProfilingEvent.EXECUTED

    Hmm, when I create more than one Kernel from the same class, I suddenly run into messages like those below. I could not find anything in the documentation saying this should not be supported. Basically I have an object pool with a limited number of Kernels, and different threads can acquire those limited resources. But this results in a JVM crash.

    Thanks KIC

    PS: you can find a runnable version at https://github.com/KIC/LPPL; the problem arises in kic.lppl.SornetteTest

    Connected to the target VM, address: '127.0.0.1:64263', transport: 'socket'
    Oct 15, 2017 9:46:04 AM com.aparapi.internal.kernel.KernelDeviceProfile onEvent
    SEVERE: ProfilingEvent.START encountered without ProfilingEvent.EXECUTED
    (the two lines above are repeated another thirteen times)
    #
    # A fatal error has been detected by the Java Runtime Environment:
    #
    #  EXCEPTION_ACCESS_VIOLATION (0xc0000005) at pc=0x00007ffe652ebbdf, pid=18076, tid=0x0000000000004310
    #
    # JRE version: Java(TM) SE Runtime Environment (8.0_102-b14) (build 1.8.0_102-b14)
    # Java VM: Java HotSpot(TM) 64-Bit Server VM (25.102-b14 mixed mode windows-amd64 compressed oops)
    # Problematic frame:
    # C  [ntdll.dll+0x3bbdf]
    #
    # Failed to write core dump. Minidumps are not enabled by default on client versions of Windows
    #
    # An error report file with more information is saved as:
    # C:\Users\xxxx\sources\kic\dataframe\LPPL\hs_err_pid18076.log
    #
    # If you would like to submit a bug report, please visit:
    #   http://bugreport.java.com/bugreport/crash.jsp
    # The crash happened outside the Java Virtual Machine in native code.
    # See problematic frame for where to report the bug.
    #
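    A generic sketch of the pooling pattern described above, using a plain BlockingQueue and an arbitrary element type standing in for com.aparapi.Kernel (the class and method names here are hypothetical; this illustrates the acquire/release flow only, not the crash itself):

```java
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Hypothetical pool: hands each resource to at most one thread at a time.
public class SimplePool<T> {
    private final BlockingQueue<T> pool;

    public SimplePool(List<T> resources) {
        // Pre-populate the queue with the fixed set of pooled resources.
        pool = new ArrayBlockingQueue<>(resources.size(), false, resources);
    }

    // Blocks until a resource is free, so two threads never share one.
    public T acquire() throws InterruptedException {
        return pool.take();
    }

    public void release(T resource) {
        pool.offer(resource);
    }

    public static void main(String[] args) throws InterruptedException {
        SimplePool<String> kernels = new SimplePool<>(List.of("kernelA", "kernelB"));
        String k = kernels.acquire();
        System.out.println(k); // kernelA
        kernels.release(k);
    }
}
```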
    
    bug bounty $$$ 
    opened by KIC 17
  • mavenize?


    From @m0wfo on May 31, 2015 20:52

    Any chance you guys could publish releases on Maven Central? It'd help a whole lot. I can send a PR w/ any necessary changes if it's of interest.

    Copied from original issue: aparapi/aparapi#8

    opened by freemo 15
  • Exception UnsatisfiedLinkError thrown when running on Windows.


    The following exception occurs when trying to run Aparapi on Windows. The exception never seems to occur on Linux or macOS.

    Exception in thread "main" java.lang.UnsatisfiedLinkError: C:\Users\Vincenzo\AppData\Local\Temp\libaparapi_x86_643218484219112159561.dll: Can't find dependent libraries
    at java.lang.ClassLoader$NativeLibrary.load(Native Method)
    at java.lang.ClassLoader.loadLibrary0(ClassLoader.java:1941)
    at java.lang.ClassLoader.loadLibrary(ClassLoader.java:1824)
    at java.lang.Runtime.load0(Runtime.java:809)
    at java.lang.System.load(System.java:1086)
    at com.aparapi.natives.util.NativeUtils.loadLibraryFromJar(NativeUtils.java:100)
    at com.aparapi.natives.NativeLoader.load(NativeLoader.java:42)
    at com.aparapi.internal.opencl.OpenCLLoader.<clinit>(OpenCLLoader.java:43)
    at com.aparapi.internal.opencl.OpenCLPlatform.getOpenCLPlatforms(OpenCLPlatform.java:73)
    at com.aparapi.device.OpenCLDevice.listDevices(OpenCLDevice.java:458)
    at com.aparapi.internal.kernel.KernelManager.createDefaultPreferredDevices(KernelManager.java:203)
    at com.aparapi.internal.kernel.KernelManager.createDefaultPreferences(KernelManager.java:178)
    at com.aparapi.internal.kernel.KernelManager.<init>(KernelManager.java:46)
    at com.aparapi.internal.kernel.KernelManager.<clinit>(KernelManager.java:38)
    at com.aparapi.internal.kernel.KernelRunner.<init>(KernelRunner.java:170)
    at com.aparapi.Kernel.prepareKernelRunner(Kernel.java:2270)
    at com.aparapi.Kernel.execute(Kernel.java:2439)
    at com.aparapi.Kernel.execute(Kernel.java:2396)
    at com.aparapi.Kernel.execute(Kernel.java:2371)
    
    opened by freemo 14
  • [BOUNTY $25] JVM crash when using multi-dimensional local arrays


    This may be a user error, since I'm relatively new to both Aparapi and OpenCL, but it seems to me that there is a problem related to multidimensional local arrays.

    When I'm executing the kernel shown at https://github.com/raner/top.java.matrix/blob/syncleus-aparapi-issue-51/src/main/java/top/java/matrix/fast/TiledFastMatrix.java#L72 on my MacBook Air (Mac OS X 10.11, Intel HD Graphics 6000 with 48 execution units), I'm consistently getting a JVM crash:

    # A fatal error has been detected by the Java Runtime Environment:
    #
    #  SIGSEGV (0xb) at pc=0x000000011b42650c, pid=88211, tid=5891
    #
    # JRE version: Java(TM) SE Runtime Environment (8.0_60-b27) (build 1.8.0_60-b27)
    # Java VM: Java HotSpot(TM) 64-Bit Server VM (25.60-b23 mixed mode bsd-amd64 compressed oops)
    # Problematic frame:
    # C  [libaparapi_x86_643808666101765060573.dylib+0xd50c]  KernelArg::setLocalBufferArg(JNIEnv_*, int, int, bool)+0x3c
    

    The relevant snippet in the generated OpenCL code seems to be

                int tiledRow = (this->val$TILE_SIZE * tile) + localRow;
                int tiledColumn = (this->val$TILE_SIZE * tile) + localColumn;
                (&this->tileA[localColumn * this->tileA__javaArrayDimension0])[localRow]  = this->val$A[((tiledColumn * this->val$numberOfRows) + row)];
                (&this->tileB[localColumn * this->tileB__javaArrayDimension0])[localRow]  = this->val$B[((column * this->val$numberOfColumns) + tiledRow)];
                barrier(CLK_LOCAL_MEM_FENCE);
                for (int repeat = 0; repeat<this->val$TILE_SIZE; repeat++){
                   value = value + ((&this->tileA[repeat * this->tileA__javaArrayDimension0])[localRow] * (&this->tileB[localColumn * this->tileB__javaArrayDimension0])[repeat]);
                }
                barrier(CLK_LOCAL_MEM_FENCE);
    

    I noticed that it refers to the arrays' dimensions as tile…__javaArrayDimension0 to calculate the proper index into the two-dimensional array, which, I believe, is what necessitates the earlier invocation of KernelArg::setLocalBufferArg. I didn't have time to do a deep dive on this issue, but when I change the arrays to one-dimensional arrays and perform the index calculation myself (as shown in https://github.com/raner/top.java.matrix/commit/cb4988d), the code works correctly.
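    The one-dimensional workaround can be sketched in plain Java (hypothetical names, no Aparapi dependency): flatten the (row, column) pair into a single index using the same column-major layout as the generated code above.

```java
// Hypothetical sketch of the 1D workaround: compute the 2D index manually.
public class FlattenedTile {

    // Element (row, column) of a tileSize x tileSize tile lives at
    // column * tileSize + row, matching the column-major tile[column][row].
    static int index(int row, int column, int tileSize) {
        return column * tileSize + row;
    }

    public static void main(String[] args) {
        int tileSize = 16;
        float[] tile = new float[tileSize * tileSize];
        tile[index(3, 5, tileSize)] = 42f;
        System.out.println(tile[5 * tileSize + 3]); // 42.0
    }
}
```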

    bug bounty $$$ 
    opened by raner 13
  • Kernel.dispose always fails in OpenJDK 12

    Every call to Kernel.dispose fails on OpenJDK 12 and OpenJDK 11 with the following errors:

    # A fatal error has been detected by the Java Runtime Environment:
    #
    #  SIGSEGV (0xb) at pc=0x00007f9c2c4481ac, pid=21273, tid=21274
    #
    # JRE version: OpenJDK Runtime Environment (12.0.2+9) (build 12.0.2+9)
    # Java VM: OpenJDK 64-Bit Server VM (12.0.2+9, mixed mode, sharing, tiered, compressed oops, g1 gc, linux-amd64)
    # Problematic frame:
    # V  [libjvm.so+0xc201ac]  OopStorage::Block::release_entries(unsigned long, OopStorage*)+0x3c
    
     #
    # A fatal error has been detected by the Java Runtime Environment:
    #
    #  SIGSEGV (0xb) at pc=0x00007f8aebbacf4c, pid=23028, tid=23032
    #
    # JRE version: OpenJDK Runtime Environment (11.0.4+11) (build 11.0.4+11)
    # Java VM: OpenJDK 64-Bit Server VM (11.0.4+11, mixed mode, sharing, tiered, compressed oops, g1 gc, linux-amd64)
    # Problematic frame:
    # V  [libjvm.so+0xc25f4c]  OopStorage::Block::release_entries(unsigned long, OopStorage::Block* volatile*)+0x3c
    
    
    

    On OpenJDK 8 it works well.

    opened by AlexanderFedyukov 12
  • Fix: Issue #101 possible deadlock when running in JTP - threads below the FJSafeCyclicBarrier number of parties (refs #101)

    There has been a bug since at least Aparapi 1.4.1 that may cause a deadlock when running in Java Thread Pool mode. At the end of KernelRunner a FJSafeCyclicBarrier is used and set to the group size; however, if the number of threads able to run is less than the group size, a deadlock could occur. A mechanism was also introduced to monitor the ForkJoinPool: if its pool threads die, that is now logged.
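    The sizing constraint behind this fix can be illustrated with a plain java.util.concurrent.CyclicBarrier (a sketch only, not Aparapi's FJSafeCyclicBarrier): the barrier trips only once await() has been called by exactly its party count, so it must never be sized above the number of threads that will actually reach it.

```java
import java.util.concurrent.CyclicBarrier;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class BarrierSizing {

    // Cap the party count at the number of threads that will reach await();
    // sizing the barrier above that is what causes the hang.
    static int safeParties(int groupSize, int runnableThreads) {
        return Math.min(groupSize, runnableThreads);
    }

    public static void main(String[] args) throws Exception {
        int parties = safeParties(256, 4); // 4
        CyclicBarrier barrier = new CyclicBarrier(parties);
        ExecutorService threads = Executors.newFixedThreadPool(4);
        for (int i = 0; i < parties; i++) {
            threads.submit(() -> {
                try {
                    barrier.await(); // all 4 parties arrive, so the barrier trips
                } catch (Exception e) {
                    throw new RuntimeException(e);
                }
            });
        }
        threads.shutdown();
        System.out.println(threads.awaitTermination(5, TimeUnit.SECONDS)); // true
    }
}
```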

    opened by CoreRasurae 12
  • Fix: Memory leak by not disposing kernel in unit tests of feature #79 plus code quality #81


    Feature #79: Renames the unit test LocalArrayArgsIssue79Test to Issue79LocalArrayArgsTest so that it is no longer skipped, and removes the memory leak by ensuring the kernel is disposed at the end.

    Feature #81: Improves code quality by logically splitting a long unit test class into two unit test classes, one with the base validations and another with the advanced validations. The new KernelRunner methods are also refactored to reduce code complexity by extracting methods and reducing code duplication.

    chore 
    opened by CoreRasurae 12
  • Kernel overall local size


    Good afternoon.

    When I use an NVIDIA GeForce RTX 3060 Ti graphics card from Java code, I get the error: Kernel overall local size: 1000 exceeds maximum kernel allowed local size of: 256 failed. Running the same code on an Intel HD Graphics 630 or AMD Radeon R7 450 graphics card, everything works fine. If in this part of the code I put a number less than 256, then the code also works fine on the NVIDIA GeForce RTX 3060 Ti:

    Range range = needDevice.createRange(255);
    kernel.execute(range);
    

    The NVIDIA GeForce RTX 3060 Ti is a more modern video card than the Intel HD Graphics 630 or AMD Radeon R7 450, yet for some reason the parameter for createRange must be smaller than on the older cards. What could be the problem?
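    This error usually means the requested local (work-group) size exceeds the device's reported maximum (CL_DEVICE_MAX_WORK_GROUP_SIZE, 256 here); vendors report different limits regardless of how modern the card is. A hedged sketch of choosing a local size that both divides the global size and respects the limit (pickLocalSize is a hypothetical helper; the real limit should be queried from the device):

```java
// Hypothetical helper: largest local size that divides the global size
// and does not exceed the device's maximum work-group size.
public class LocalSizePicker {

    static int pickLocalSize(int globalSize, int deviceMaxLocalSize) {
        for (int local = Math.min(globalSize, deviceMaxLocalSize); local >= 1; local--) {
            if (globalSize % local == 0) {
                return local;
            }
        }
        return 1;
    }

    public static void main(String[] args) {
        // A global size of 1000 on a device limited to 256 fits at 250.
        System.out.println(pickLocalSize(1000, 256)); // 250
        System.out.println(pickLocalSize(256, 1024)); // 256
    }
}
```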

    opened by forreg16 0
  • Build(deps): Bump scala-library from 2.13.6 to 2.13.9


    Bumps scala-library from 2.13.6 to 2.13.9.

    Release notes

    Sourced from scala-library's releases.

    Scala 2.13.9

    The following changes are highlights of this release:

    Compatibility with Scala 3

    • Tasty Reader: Add support for Scala 3.2 (#10068)
    • Tasty Reader: Restrict access to experimental definitions (#10020)
    • To aid cross-building, accept and ignore using in method calls (#10064 by @​som-snytt)
    • To aid cross-building, allow ? as a wildcard even without -Xsource:3 (#9990)
    • Make Scala-3-style implicit resolution explicitly opt-in rather than bundled in -Xsource:3 (#10012 by @​povder)
    • Prefer type of overridden member when inferring (under -Xsource:3) (#9891 by @​som-snytt)

    JDK version support

    Warnings and lints

    • Add -Wnonunit-statement to warn about discarded values in statement position (#9893 by @​som-snytt)
    • Make unused-import warnings easier to silence (support filtering by origin=) (#9939 by @​som-snytt)
    • Add -Wperformance lints for *Ref boxing and nonlocal return (#9889 by @​som-snytt)

    Language improvements

    • Improve support for Unicode supplementary characters in identifiers and string interpolation (#9805 by @​som-snytt)

    Compiler options

    Security

    • Error on source files with Unicode directional formatting characters (#10017)
    • Prevent Function0 execution during LazyList deserialization (#10118)

    Bugfixes

    • Emit all bridge methods non-final (perhaps affecting serialization compat) (#9976)
    • Fix null-pointer regression in Vector#prependedAll and Vector#appendedAll (#9983)
    • Improve concurrent behavior of Java ConcurrentMap wrapper (#10027 by @​igabaydulin)
    • Preserve null policy in wrapped Java Maps (#10129 by @​som-snytt)

    Changes that shipped in Scala 2.12.16 and 2.12.17 are also included in this release.

    For the complete 2.13.9 change lists, see all merged PRs and all closed bugs.

    Compatibility

    ... (truncated)

    Commits
    • 986dcc1 Merge pull request #10129 from som-snytt/followup/12586-preserve-NPE
    • b824b84 Preserve null policy in wrapped Java Map
    • d578a02 Merge pull request #10128 from SethTisue/revert-10114-10123
    • e5fe919 Revert "Args files are 1 arg per line, fix -Vprint-args -"
    • 362c5d1 Revert "Trim and filter empties in arg files"
    • 864148d Revert "process.Parser strips escaping backslash"
    • f69fe8b Merge pull request #10127 from scalacenter/tasty/support-3.2.0-final
    • 0aa6bd4 remove tasty escape hatch for 3.2.0-RC4
    • af56abc Merge pull request #10123 from som-snytt/dev/814-window-cmd-escapes
    • 7e844a5 Merge pull request #10121 from scala-steward/update/slf4j-nop-2.0.0
    • Additional commits viewable in compare view

    Dependabot compatibility score

    Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.


    Dependabot commands and options

    You can trigger Dependabot actions by commenting on this PR:

    • @dependabot rebase will rebase this PR
    • @dependabot recreate will recreate this PR, overwriting any edits that have been made to it
    • @dependabot merge will merge this PR after your CI passes on it
    • @dependabot squash and merge will squash and merge this PR after your CI passes on it
    • @dependabot cancel merge will cancel a previously requested merge and block automerging
    • @dependabot reopen will reopen this PR if it is closed
    • @dependabot close will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually
    • @dependabot ignore this major version will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
    • @dependabot ignore this minor version will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
    • @dependabot ignore this dependency will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)
    • @dependabot use these labels will set the current labels as the default for future PRs for this repo and language
    • @dependabot use these reviewers will set the current reviewers as the default for future PRs for this repo and language
    • @dependabot use these assignees will set the current assignees as the default for future PRs for this repo and language
    • @dependabot use this milestone will set the current milestone as the default for future PRs for this repo and language

    You can disable automated security fix PRs for this repo from the Security Alerts page.

    dependencies 
    opened by dependabot[bot] 0
  • Question: Support for Xeon Phi?


    I know it has been a long time since the Xeon Phi was discontinued, and there is no value in attempting to revive it; my questions below are purely for study purposes.

    • Are Xeon Phi(s) still supported by the Aparapi project? If so, how long will it continue to receive support from this project?
    • What setups/configurations will be needed for the current Aparapi project to work on Xeon Phi(s)? What attempts should be taken?
    • If Aparapi still works on Xeon Phi(s), what performance would it get, or be expected to have?

    I'm really sorry to bother you all. The main reason I created this issue is that I've recently gotten a Xeon Phi, and in my attempts to make it work I've come across multiple Stack Overflow posts and forums mentioning that "JamVM" and "Aparapi" would work with the Xeon Phi.

    https://stackoverflow.com/questions/17309471/using-xeon-phi-with-jvm-based-language

    I've conducted hours of research but was only able to find a single pull request in the original Aparapi repo mentioning that support for the Xeon Phi was added.

    https://github.com/aparapi/aparapi/pull/1

    Any attempts to help and all responses will be highly appreciated. EDIT: Any other information or documentation regarding the Xeon Phi would also be highly appreciated.

    opened by czhangdev 0
  • Can Kernel.put(short) be added?

    If Kernel.put(char) is used, casting (short) char to get negative numbers costs performance. Could Kernel.put(short) be added? It would also be nice to be able to pass 8-bit unsigned integer values to a kernel.
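    The sign issue behind this request can be seen in plain Java (no Aparapi dependency): char is an unsigned 16-bit type, so recovering a negative value stored through a char requires an explicit cast back to short on every read, which is the per-element cost the author mentions.

```java
public class CharShortRoundTrip {
    public static void main(String[] args) {
        short original = -5;
        char stored = (char) original;    // char is unsigned: reads as 65531
        short recovered = (short) stored; // explicit cast restores the sign
        System.out.println((int) stored);  // 65531
        System.out.println(recovered);     // -5
    }
}
```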

    opened by yiulo 0
  • Aparapi previously working now only runs on CPU instead of GPU after driver update


    I recently did a fresh install of Windows 10 and was using Aparapi with IntelliJ and the driver Windows 10 automatically installed for my NVIDIA card. It was working flawlessly and showing all the warnings correctly, but after installing CUDA 11.3 (and the Quadro drivers along with it), it doesn't even show the warnings that it fell back to the CPU. I only figured out it was doing this by watching all my CPU cores go to 100%. I checked the card's driver details, and it shows OpenCL under C:\WINDOWS\system32\OpenCL.dll and C:\WINDOWS\SysWow64\OpenCL.dll, but I don't know what they were previously, whether they changed, or whether this is an issue.

    Is there anything else I need to check or do for this to work again? I tried uninstalling the driver and using the one from Windows, but it no longer works. I'm not sure if uninstalling everything CUDA-related would help.

    opened by maflores16 1
Releases(v3.0.0)