The New Official Aparapi: a framework for executing native Java and Scala code on the GPU.

Overview

A framework for executing native Java code on the GPU.

Licensed under the Apache Software License v2

Aparapi allows developers to write native Java code capable of being executed directly on a graphics card (GPU) by converting Java bytecode to an OpenCL kernel dynamically at runtime. Because it is backed by OpenCL, Aparapi is compatible with all OpenCL-capable graphics cards.

A GPU has a unique architecture that causes it to behave differently from a CPU. One of the most noticeable differences is that while a typical CPU has fewer than a dozen cores, a high-end GPU may have hundreds. This makes GPUs uniquely suited to data-parallel computation, which can yield speedups of hundreds of times over what is possible with an average CPU. That can mean the difference between needing a whole data center to house your application and needing just one or two computers, potentially saving millions in server costs.
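
The data-parallel shape Aparapi exploits can be sketched in plain Java, with no Aparapi involved: when every index of a loop can be computed independently, the work can be spread across any number of workers, whether CPU threads or GPU cores. A minimal sketch using a parallel stream (the class and variable names here are ours, purely illustrative):

```java
import java.util.stream.IntStream;

public class DataParallelSketch {
    public static void main(String[] args) {
        final int n = 1_000_000;
        final float[] a = new float[n];
        final float[] b = new float[n];
        for (int i = 0; i < n; i++) {
            a[i] = i;
            b[i] = 2 * i;
        }

        final float[] sum = new float[n];
        // Each index is computed independently of every other index,
        // so the runtime is free to split the range across all cores.
        IntStream.range(0, n).parallel().forEach(i -> sum[i] = a[i] + b[i]);

        System.out.println(sum[10]); // 30.0
    }
}
```

A GPU takes the same idea much further: instead of a handful of CPU threads, each index becomes one of thousands of hardware work-items.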

Aparapi was originally conceived and developed by AMD. It was later abandoned by AMD and sat mostly idle for several years. There were some efforts by the community to keep the project alive, but without a clear community leader no new releases ever came. Eventually we came along and rescued the project. After such a long wait, the first Aparapi release in five years was published, and the community continues to push forward with renewed excitement.

Below you will find a side-by-side comparison of the nbody problem on a CPU vs. a GPU. The simulation is run on an inexpensive graphics card; you can even run it yourself from the examples project. The drastic performance gains that can be achieved with Aparapi are obvious.

[NBody demo: GPU Accelerated vs. CPU Multi-threaded (8 cores)]

Donating

Beerpay

As an open-source project we run entirely off donations. Buy one of our hardworking developers a beer by donating here. All donations go to our bounty fund and allow us to place bounties on important bugs and enhancements. You may also use Beerpay, linked via the badge above.

Support and Documentation

This project is officially hosted on QOTO GitLab here; however, an up-to-date mirror is also maintained on GitHub here.

Aparapi Javadocs: latest - 2.0.0 - 1.10.0 - 1.9.0 - 1.8.0 - 1.7.0 - 1.6.0 - 1.5.0 - 1.4.1 - 1.4.0 - 1.3.4 - 1.3.3 - 1.3.2 - 1.3.1 - 1.3.0 - 1.2.0 - 1.1.2 - 1.1.1 - 1.1.0 - 1.0.0

For detailed documentation see Aparapi.com or check out the latest Javadocs.

For support please use Gitter or the official Aparapi mailing list and Discourse forum.

Please file bugs and feature requests on QOTO GitLab; our old archived issues can still be viewed on GitHub as well.

Aparapi conforms to the Semantic Versioning 2.0.0 standard. That means the version of a release isn't arbitrary but rather describes how the library's interfaces have changed. Read more about it at the Semantic Versioning page.

Related Projects

This particular repository only represents the core Java library. There are several other related repositories worth taking a look at.

  • Aparapi Examples - A collection of Java examples to showcase the Aparapi library and help developers get started.
  • Aparapi JNI - A Java JAR which embeds and loads the native components at runtime. This removes the need to separately install the Aparapi Native library.
  • Aparapi Native - The native library component. Without it the Java library can't talk to the graphics card. This is not a Java project but rather a C/C++ project.
  • Aparapi Vagrant - A Vagrant environment for compiling the Aparapi native libraries for Linux, both x86 and x64.
  • Aparapi Website - Source for the Aparapi website as hosted at http://aparapi.com. The site also contains our detailed documentation.

Prerequisites

Aparapi will run as-is on the CPU; however, in order to access the GPU it requires OpenCL to be installed on the local system. If OpenCL isn't found, the library will simply fall back to CPU mode. Aparapi supports, and has been tested on, OpenCL 1.2, OpenCL 2.0, and OpenCL 2.1.

Aparapi runs on all operating systems and platforms, however GPU acceleration support is currently provided for the following platforms: Windows 64bit, Windows 32bit, Mac OSX 64bit, Linux 64bit, and Linux 32bit.

Note: It is no longer required to manually install the Aparapi JNI native interface; this is now done automatically through Maven as a dependency of Aparapi.

Java Dependency

To include Aparapi in your project, add the following Maven dependency to your build.

<dependency>
    <groupId>com.aparapi</groupId>
    <artifactId>aparapi</artifactId>
    <version>2.0.0</version>
</dependency>
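
If your build uses Gradle instead of Maven, the equivalent dependency declaration (same coordinates as the Maven snippet above) would be:

```groovy
dependencies {
    implementation 'com.aparapi:aparapi:2.0.0'
}
```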

Obtaining the Source

The official source repository for Aparapi is located on QOTO GitLab and can be cloned using the following command.

git clone https://git.qoto.org/aparapi/aparapi.git

Getting Started

With Aparapi we can take a sequential loop such as this (which adds each element of the inA and inB arrays and puts the result in result):

final float[] inA = .... // get a float array of data from somewhere
final float[] inB = .... // get a float array of data from somewhere
assert (inA.length == inB.length);
final float[] result = new float[inA.length];

for (int i = 0; i < result.length; i++) {
    result[i] = inA[i] + inB[i];
}

And refactor the sequential loop to the following form:

Kernel kernel = new Kernel() {
    @Override
    public void run() {
        int i = getGlobalId();
        result[i] = inA[i] + inB[i];
    }
};

Range range = Range.create(result.length);
kernel.execute(range);
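
When porting a loop like this, it is worth validating the kernel's output against the original sequential version before trusting the speedup. A minimal reference check in plain Java (the helper names here are ours, not part of the Aparapi API):

```java
public class ReferenceCheck {
    // Sequential reference implementation of the loop being ported.
    static float[] addSequential(float[] inA, float[] inB) {
        final float[] out = new float[inA.length];
        for (int i = 0; i < inA.length; i++) {
            out[i] = inA[i] + inB[i];
        }
        return out;
    }

    // Compare within a tolerance; device floating-point results are not
    // always bit-identical to the JVM's.
    static boolean matches(float[] expected, float[] actual, float eps) {
        if (expected.length != actual.length) return false;
        for (int i = 0; i < expected.length; i++) {
            if (Math.abs(expected[i] - actual[i]) > eps) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        float[] inA = {1f, 2f, 3f};
        float[] inB = {4f, 5f, 6f};
        // In real use this would be the `result` array after kernel.execute(range).
        float[] gpuResult = {5f, 7f, 9f};
        System.out.println(matches(addSequential(inA, inB), gpuResult, 1e-6f)); // true
    }
}
```
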

Comments
  • Fix and Update: Fix issue #62 and provide new API for kernel profiling under multithreading (refs #62)

    Fixes issue #62 - SEVERE log-level messages from the Aparapi kernel-profiling functionality, caused by missing thread safety when multiple threads execute the same kernel class on the same device. Updates the Aparapi kernel-profiling API to provide thread-safe variants while maintaining backwards compatibility.

    bug enhancement bounty $$$ 
    opened by CoreRasurae 39
  • Can I take over development of this project?

    From @freemo on October 17, 2016 15:20

    Hi, my project has a pressing need to rely on Aparapi, and as such I have been contributing to the project in my own repositories. Since I will be contributing a significant amount of work I'd like to contribute that effort back. You are of course welcome to adopt the project back into yours, but you will find a lot has changed and that may be difficult at this point; please feel free. If not, perhaps the team would like to consider moving over to my new repositories for future effort? I am willing to discuss alternatives as well.

    Here is a recap of what I did so far and where the new repositories can be found.

    Everything has been mavenized!

    All new code is licensed under the apache license. Also since AMD is no longer the maintainer I changed the root package across all the projects.

    I pulled out the core Java library. This is where all the platform-independent code lives; it ultimately produces the jar that will be used as the dependency. This is now its own repository and can be found here:

    https://github.com/Syncleus/aparapi

    I pulled out all the platform-specific code, the JNI layer, into its own repository. This no longer uses Ant, as the project has been mavenized. But it isn't a Java project either, so it doesn't use Maven. It has been refactored to use Autotools, which is a platform-independent way to compile shared libraries. It currently only compiles for Linux platforms, though. The code for the JNI libraries can now be found here; it uses submodules, so please clone recursively:

    https://github.com/Syncleus/aparapi-jni

    I also created an Arch Linux AUR package and an unofficial binary repository for the aparapi system-specific shared library. This allows installation and uninstallation of the aparapi shared library using the Arch Linux package-management system. The AUR can be found here:

    https://github.com/Syncleus/aparapi-archlinux

    To add the unofficial repository for use with pacman, add the following lines to your /etc/pacman.conf file, before all the other repositories:

    [aparapi]
    SigLevel = Optional TrustAll
    Server = http://syncleus.com/aparapi-archlinux-repo/
    

    The examples are now also a separate repository, as they aren't really needed for the library itself. The Life sample is mavenized and working, but I still need to mavenize the rest of the examples. This repo can be found here:

    https://github.com/Syncleus/aparapi-examples

    Finally I purchased the aparapi.com domain name so I can start hosting some useful information about it and host some files there. I also have an account on maven central so I will soon be able to upload aparapi there as it is now in a state where it can be consumed as a dependency (since it is completely mavenized).

    So let me know what you guys think about joining efforts and perhaps moving the development over to the new repositories?

    Copied from original issue: aparapi/aparapi#39

    opened by freemo 26
  • Nvidia RTX 3080 GPU not detected

    Hi,

    Aparapi is not detecting my GPU even though I have OpenCL installed. When I run these two lines:

    Device device = Device.firstGPU();
    Range range = device.createRange(size);
    

    It throws a NullPointerException, which indicates that the GPU is not being detected.

    GPU: Nvidia RTX 3080
    OpenCL version: 3.0
    Operating System: Pop OS (Ubuntu)

    CudaSquare.java:

    /*
     * Copyright (C) 2021 - present by EY LLP. and EYQP team. 
     * Any change of "copy" of code without author permission is not allowed.
     * This is NOT a open source project, DONOT copy any Code.
     * For QuantLib please see LICENSCE.TXT provided. 
    */
    package cudaProgramming.parallel;
    
    import com.aparapi.Kernel;
    import com.aparapi.Range;
    import com.aparapi.device.Device;
    
    /**
     * An example Aparapi application which computes and displays squares of a set
     * of 512 input values. While executing on GPU using Aparpi framework, each
     * square value is computed in a separate kernel invocation and can thus
     * maximize performance by optimally utilizing all GPU computing units
     *
     * @author gfrost
     * @version $Id: $Id
     */
    public class CudaSquare {
    
    	/**
    	 * <p>
    	 * main.
    	 * </p>
    	 *
    	 * @param _args an array of {@link java.lang.String} objects.
    	 */
    	public static void main(String[] _args) {
    		// loop must be multiple of 8 or 8 bit
    		final int size = 160;
    
    		/** Input float array for which square values need to be computed. */
    		final float[] values = new float[size];
    
    		/** Initialize input array. */
    		for (int i = 0; i < size; i++) {
    			values[i] = i;
    		}
    
    		/**
    		 * Output array which will be populated with square values of corresponding
    		 * input array elements.
    		 */
    		final float[] squares = new float[size];
    
    		/**
    		 * Aparapi Kernel which computes squares of input array elements and populates
    		 * them in corresponding elements of output array.
    		 **/
    		Kernel kernel = new Kernel() {
    			@Override
    			public void run() {
    				int gid = getGlobalId();
    				squares[gid] = values[gid] * values[gid];
    			}
    		};
    
    		Device device = Device.firstGPU();
    		Range range = device.createRange(size);
    
    		// Execute Kernel.
    
    		kernel.execute(range);
    
    		// Report target execution mode: GPU or JTP (Java Thread Pool).
    		System.out.println("Device = " + kernel.getTargetDevice().getShortDescription());
    
    		// Display computed square values.
    		for (int i = 0; i < size; i++) {
    			System.out.printf("%6.0f %8.0f\n", values[i], squares[i]);
    		}
    
    		// Dispose Kernel resources.
    		kernel.dispose();
    	}
    
    }
    

    Output of /usr/bin/clinfo:

    Number of platforms                               1
      Platform Name                                   NVIDIA CUDA
      Platform Vendor                                 NVIDIA Corporation
      Platform Version                                OpenCL 3.0 CUDA 11.7.89
      Platform Profile                                FULL_PROFILE
      Platform Extensions                             cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_fp64 cl_khr_3d_image_writes cl_khr_byte_addressable_store cl_khr_icd cl_khr_gl_sharing cl_nv_compiler_options cl_nv_device_attribute_query cl_nv_pragma_unroll cl_nv_copy_opts cl_nv_create_buffer cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_device_uuid cl_khr_pci_bus_info cl_khr_external_semaphore cl_khr_external_memory cl_khr_external_semaphore_opaque_fd cl_khr_external_memory_opaque_fd
      Platform Extensions with Version                cl_khr_global_int32_base_atomics                                 0x400000 (1.0.0)
                                                      cl_khr_global_int32_extended_atomics                             0x400000 (1.0.0)
                                                      cl_khr_local_int32_base_atomics                                  0x400000 (1.0.0)
                                                      cl_khr_local_int32_extended_atomics                              0x400000 (1.0.0)
                                                      cl_khr_fp64                                                      0x400000 (1.0.0)
                                                      cl_khr_3d_image_writes                                           0x400000 (1.0.0)
                                                      cl_khr_byte_addressable_store                                    0x400000 (1.0.0)
                                                      cl_khr_icd                                                       0x400000 (1.0.0)
                                                      cl_khr_gl_sharing                                                0x400000 (1.0.0)
                                                      cl_nv_compiler_options                                           0x400000 (1.0.0)
                                                      cl_nv_device_attribute_query                                     0x400000 (1.0.0)
                                                      cl_nv_pragma_unroll                                              0x400000 (1.0.0)
                                                      cl_nv_copy_opts                                                  0x400000 (1.0.0)
                                                      cl_nv_create_buffer                                              0x400000 (1.0.0)
                                                      cl_khr_int64_base_atomics                                        0x400000 (1.0.0)
                                                      cl_khr_int64_extended_atomics                                    0x400000 (1.0.0)
                                                      cl_khr_device_uuid                                               0x400000 (1.0.0)
                                                      cl_khr_pci_bus_info                                              0x400000 (1.0.0)
                                                      cl_khr_external_semaphore                                          0x9000 (0.9.0)
                                                      cl_khr_external_memory                                             0x9000 (0.9.0)
                                                      cl_khr_external_semaphore_opaque_fd                                0x9000 (0.9.0)
                                                      cl_khr_external_memory_opaque_fd                                   0x9000 (0.9.0)
      Platform Numeric Version                        0xc00000 (3.0.0)
      Platform Extensions function suffix             NV
      Platform Host timer resolution                  0ns
    
      Platform Name                                   NVIDIA CUDA
    Number of devices                                 1
      Device Name                                     NVIDIA GeForce RTX 3080 Laptop GPU
      Device Vendor                                   NVIDIA Corporation
      Device Vendor ID                                0x10de
      Device Version                                  OpenCL 3.0 CUDA
      Device UUID                                     808d8c70-1aef-49f3-cc51-44aaea760720
      Driver UUID                                     808d8c70-1aef-49f3-cc51-44aaea760720
      Valid Device LUID                               No
      Device LUID                                     6d69-637300000000
      Device Node Mask                                0
      Device Numeric Version                          0xc00000 (3.0.0)
      Driver Version                                  515.48.07
      Device OpenCL C Version                         OpenCL C 1.2 
      Device OpenCL C all versions                    OpenCL C                                                         0x400000 (1.0.0)
                                                      OpenCL C                                                         0x401000 (1.1.0)
                                                      OpenCL C                                                         0x402000 (1.2.0)
                                                      OpenCL C                                                         0xc00000 (3.0.0)
      Device OpenCL C features                        __opencl_c_fp64                                                  0xc00000 (3.0.0)
                                                      __opencl_c_images                                                0xc00000 (3.0.0)
                                                      __opencl_c_int64                                                 0xc00000 (3.0.0)
                                                      __opencl_c_3d_image_writes                                       0xc00000 (3.0.0)
      Latest comfornace test passed                   v2021-02-01-00
      Device Type                                     GPU
      Device Topology (NV)                            PCI-E, 0000:01:00.0
      Device Profile                                  FULL_PROFILE
      Device Available                                Yes
      Compiler Available                              Yes
      Linker Available                                Yes
      Max compute units                               48
      Max clock frequency                             1710MHz
      Compute Capability (NV)                         8.6
      Device Partition                                (core)
        Max number of sub-devices                     1
        Supported partition types                     None
        Supported affinity domains                    (n/a)
      Max work item dimensions                        3
      Max work item sizes                             1024x1024x64
      Max work group size                             1024
      Preferred work group size multiple (device)     32
      Preferred work group size multiple (kernel)     32
      Warp size (NV)                                  32
      Max sub-groups per work group                   0
      Preferred / native vector sizes                 
        char                                                 1 / 1       
        short                                                1 / 1       
        int                                                  1 / 1       
        long                                                 1 / 1       
        half                                                 0 / 0        (n/a)
        float                                                1 / 1       
        double                                               1 / 1        (cl_khr_fp64)
      Half-precision Floating-point support           (n/a)
      Single-precision Floating-point support         (core)
        Denormals                                     Yes
        Infinity and NANs                             Yes
        Round to nearest                              Yes
        Round to zero                                 Yes
        Round to infinity                             Yes
        IEEE754-2008 fused multiply-add               Yes
        Support is emulated in software               No
        Correctly-rounded divide and sqrt operations  Yes
      Double-precision Floating-point support         (cl_khr_fp64)
        Denormals                                     Yes
        Infinity and NANs                             Yes
        Round to nearest                              Yes
        Round to zero                                 Yes
        Round to infinity                             Yes
        IEEE754-2008 fused multiply-add               Yes
        Support is emulated in software               No
      Address bits                                    64, Little-Endian
      Global memory size                              16908353536 (15.75GiB)
      Error Correction support                        No
      Max memory allocation                           4227088384 (3.937GiB)
      Unified memory for Host and Device              No
      Integrated memory (NV)                          No
      Shared Virtual Memory (SVM) capabilities        (core)
        Coarse-grained buffer sharing                 Yes
        Fine-grained buffer sharing                   No
        Fine-grained system sharing                   No
        Atomics                                       No
      Minimum alignment for any data type             128 bytes
      Alignment of base address                       4096 bits (512 bytes)
      Preferred alignment for atomics                 
        SVM                                           0 bytes
        Global                                        0 bytes
        Local                                         0 bytes
      Atomic memory capabilities                      relaxed, work-group scope
      Atomic fence capabilities                       relaxed, acquire/release, work-group scope
      Max size for global variable                    0
      Preferred total size of global vars             0
      Global Memory cache type                        Read/Write
      Global Memory cache size                        1376256 (1.312MiB)
      Global Memory cache line size                   128 bytes
      Image support                                   Yes
        Max number of samplers per kernel             32
        Max size for 1D images from buffer            268435456 pixels
        Max 1D or 2D image array size                 2048 images
        Max 2D image size                             32768x32768 pixels
        Max 3D image size                             16384x16384x16384 pixels
        Max number of read image args                 256
        Max number of write image args                32
        Max number of read/write image args           0
      Pipe support                                    No
      Max number of pipe args                         0
      Max active pipe reservations                    0
      Max pipe packet size                            0
      Local memory type                               Local
      Local memory size                               49152 (48KiB)
      Registers per block (NV)                        65536
      Max number of constant args                     9
      Max constant buffer size                        65536 (64KiB)
      Generic address space support                   No
      Max size of kernel argument                     4352 (4.25KiB)
      Queue properties (on host)                      
        Out-of-order execution                        Yes
        Profiling                                     Yes
      Device enqueue capabilities                     (n/a)
      Queue properties (on device)                    
        Out-of-order execution                        No
        Profiling                                     No
        Preferred size                                0
        Max size                                      0
      Max queues on device                            0
      Max events on device                            0
      Prefer user sync for interop                    No
      Profiling timer resolution                      1000ns
      Execution capabilities                          
        Run OpenCL kernels                            Yes
        Run native kernels                            No
        Non-uniform work-groups                       No
        Work-group collective functions               No
        Sub-group independent forward progress        No
        Kernel execution timeout (NV)                 Yes
      Concurrent copy and kernel execution (NV)       Yes
        Number of async copy engines                  2
        IL version                                    (n/a)
        ILs with version                              <printDeviceInfo:186: get CL_DEVICE_ILS_WITH_VERSION : error -30>
      printf() buffer size                            1048576 (1024KiB)
      Built-in kernels                                (n/a)
      Built-in kernels with version                   <printDeviceInfo:190: get CL_DEVICE_BUILT_IN_KERNELS_WITH_VERSION : error -30>
      Device Extensions                               cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_fp64 cl_khr_3d_image_writes cl_khr_byte_addressable_store cl_khr_icd cl_khr_gl_sharing cl_nv_compiler_options cl_nv_device_attribute_query cl_nv_pragma_unroll cl_nv_copy_opts cl_nv_create_buffer cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_device_uuid cl_khr_pci_bus_info cl_khr_external_semaphore cl_khr_external_memory cl_khr_external_semaphore_opaque_fd cl_khr_external_memory_opaque_fd
      Device Extensions with Version                  cl_khr_global_int32_base_atomics                                 0x400000 (1.0.0)
                                                      cl_khr_global_int32_extended_atomics                             0x400000 (1.0.0)
                                                      cl_khr_local_int32_base_atomics                                  0x400000 (1.0.0)
                                                      cl_khr_local_int32_extended_atomics                              0x400000 (1.0.0)
                                                      cl_khr_fp64                                                      0x400000 (1.0.0)
                                                      cl_khr_3d_image_writes                                           0x400000 (1.0.0)
                                                      cl_khr_byte_addressable_store                                    0x400000 (1.0.0)
                                                      cl_khr_icd                                                       0x400000 (1.0.0)
                                                      cl_khr_gl_sharing                                                0x400000 (1.0.0)
                                                      cl_nv_compiler_options                                           0x400000 (1.0.0)
                                                      cl_nv_device_attribute_query                                     0x400000 (1.0.0)
                                                      cl_nv_pragma_unroll                                              0x400000 (1.0.0)
                                                      cl_nv_copy_opts                                                  0x400000 (1.0.0)
                                                      cl_nv_create_buffer                                              0x400000 (1.0.0)
                                                      cl_khr_int64_base_atomics                                        0x400000 (1.0.0)
                                                      cl_khr_int64_extended_atomics                                    0x400000 (1.0.0)
                                                      cl_khr_device_uuid                                               0x400000 (1.0.0)
                                                      cl_khr_pci_bus_info                                              0x400000 (1.0.0)
                                                      cl_khr_external_semaphore                                          0x9000 (0.9.0)
                                                      cl_khr_external_memory                                             0x9000 (0.9.0)
                                                      cl_khr_external_semaphore_opaque_fd                                0x9000 (0.9.0)
                                                      cl_khr_external_memory_opaque_fd                                   0x9000 (0.9.0)
    
    NULL platform behavior
      clGetPlatformInfo(NULL, CL_PLATFORM_NAME, ...)  No platform
      clGetDeviceIDs(NULL, CL_DEVICE_TYPE_ALL, ...)   No platform
      clCreateContext(NULL, ...) [default]            No platform
      clCreateContext(NULL, ...) [other]              Success [NV]
      clCreateContextFromType(NULL, CL_DEVICE_TYPE_DEFAULT)  No platform
      clCreateContextFromType(NULL, CL_DEVICE_TYPE_CPU)  No devices found in platform
      clCreateContextFromType(NULL, CL_DEVICE_TYPE_GPU)  No platform
      clCreateContextFromType(NULL, CL_DEVICE_TYPE_ACCELERATOR)  No devices found in platform
      clCreateContextFromType(NULL, CL_DEVICE_TYPE_CUSTOM)  Invalid device type for platform
      clCreateContextFromType(NULL, CL_DEVICE_TYPE_ALL)  No platform
    
    
    opened by Raunak-Singh-Inventor 17
  • Update: Add support for Local arguments in kernel functions (refs #79)

    Allows kernel functions to be called with arrays in local memory as arguments. Such arguments are designated by the @Local annotation or by the _$local$ name suffix, as described in Issue 79 and the provided sample code.

    A test case was added to validate this implementation.

    enhancement 
    opened by CoreRasurae 17
  • [BOUNTY $25] JVM Crash: SEVERE: ProfilingEvent.START encountered without ProfilingEvent.EXECUTED

    Hmm, when I create more than one Kernel from the same class, I suddenly run into messages like those below. I could not find anything in the documentation saying this should not be supported. Basically I have an object pool with a limited number of Kernels, and different threads can acquire those limited resources. But this results in a JVM crash.

    Thanks KIC

    PS: you can find a runnable version at https://github.com/KIC/LPPL; the problem arises in kic.lppl.SornetteTest

    Connected to the target VM, address: '127.0.0.1:64263', transport: 'socket'
    Oct 15, 2017 9:46:04 AM com.aparapi.internal.kernel.KernelDeviceProfile onEvent
    SEVERE: ProfilingEvent.START encountered without ProfilingEvent.EXECUTED
    (the two lines above are repeated another thirteen times)
    #
    # A fatal error has been detected by the Java Runtime Environment:
    #
    #  EXCEPTION_ACCESS_VIOLATION (0xc0000005) at pc=0x00007ffe652ebbdf, pid=18076, tid=0x0000000000004310
    #
    # JRE version: Java(TM) SE Runtime Environment (8.0_102-b14) (build 1.8.0_102-b14)
    # Java VM: Java HotSpot(TM) 64-Bit Server VM (25.102-b14 mixed mode windows-amd64 compressed oops)
    # Problematic frame:
    # C  [ntdll.dll+0x3bbdf]
    #
    # Failed to write core dump. Minidumps are not enabled by default on client versions of Windows
    #
    # An error report file with more information is saved as:
    # C:\Users\xxxx\sources\kic\dataframe\LPPL\hs_err_pid18076.log
    #
    # If you would like to submit a bug report, please visit:
    #   http://bugreport.java.com/bugreport/crash.jsp
    # The crash happened outside the Java Virtual Machine in native code.
    # See problematic frame for where to report the bug.
    #
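    A generic sketch of the pooling pattern described above, using a plain BlockingQueue and an arbitrary element type standing in for com.aparapi.Kernel (the class and method names here are hypothetical; this illustrates the acquire/release flow only, not the crash itself):

```java
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Hypothetical pool: hands each resource to at most one thread at a time.
public class SimplePool<T> {
    private final BlockingQueue<T> pool;

    public SimplePool(List<T> resources) {
        // Pre-populate the queue with the fixed set of pooled resources.
        pool = new ArrayBlockingQueue<>(resources.size(), false, resources);
    }

    // Blocks until a resource is free, so two threads never share one.
    public T acquire() throws InterruptedException {
        return pool.take();
    }

    public void release(T resource) {
        pool.offer(resource);
    }

    public static void main(String[] args) throws InterruptedException {
        SimplePool<String> kernels = new SimplePool<>(List.of("kernelA", "kernelB"));
        String k = kernels.acquire();
        System.out.println(k); // kernelA
        kernels.release(k);
    }
}
```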
    
    bug bounty $$$ 
    opened by KIC 17
  • mavenize?


    From @m0wfo on May 31, 2015 20:52

    Any chance you guys could publish releases on Maven Central? It'd help a whole lot. I can send a PR w/ any necessary changes if it's of interest.

    Copied from original issue: aparapi/aparapi#8

    opened by freemo 15
  • Exception UnsatisfiedLinkError thrown when running on Windows.


    The following exception occurs when trying to run Aparapi on Windows. The exception never seems to occur on Linux or macOS.

    Exception in thread "main" java.lang.UnsatisfiedLinkError: C:\Users\Vincenzo\AppData\Local\Temp\libaparapi_x86_643218484219112159561.dll: Can't find dependent libraries
    at java.lang.ClassLoader$NativeLibrary.load(Native Method)
    at java.lang.ClassLoader.loadLibrary0(ClassLoader.java:1941)
    at java.lang.ClassLoader.loadLibrary(ClassLoader.java:1824)
    at java.lang.Runtime.load0(Runtime.java:809)
    at java.lang.System.load(System.java:1086)
    at com.aparapi.natives.util.NativeUtils.loadLibraryFromJar(NativeUtils.java:100)
    at com.aparapi.natives.NativeLoader.load(NativeLoader.java:42)
    at com.aparapi.internal.opencl.OpenCLLoader.<clinit>(OpenCLLoader.java:43)
    at com.aparapi.internal.opencl.OpenCLPlatform.getOpenCLPlatforms(OpenCLPlatform.java:73)
    at com.aparapi.device.OpenCLDevice.listDevices(OpenCLDevice.java:458)
    at com.aparapi.internal.kernel.KernelManager.createDefaultPreferredDevices(KernelManager.java:203)
    at com.aparapi.internal.kernel.KernelManager.createDefaultPreferences(KernelManager.java:178)
    at com.aparapi.internal.kernel.KernelManager.<init>(KernelManager.java:46)
    at com.aparapi.internal.kernel.KernelManager.<clinit>(KernelManager.java:38)
    at com.aparapi.internal.kernel.KernelRunner.<init>(KernelRunner.java:170)
    at com.aparapi.Kernel.prepareKernelRunner(Kernel.java:2270)
    at com.aparapi.Kernel.execute(Kernel.java:2439)
    at com.aparapi.Kernel.execute(Kernel.java:2396)
    at com.aparapi.Kernel.execute(Kernel.java:2371)
    
    opened by freemo 14
  • [BOUNTY $25] JVM crash when using multi-dimensional local arrays


    This may be a user error, since I'm relatively new to both Aparapi and OpenCL, but it seems to me that there is a problem related to multidimensional local arrays.

    When I'm executing the kernel shown at https://github.com/raner/top.java.matrix/blob/syncleus-aparapi-issue-51/src/main/java/top/java/matrix/fast/TiledFastMatrix.java#L72 on my MacBook Air (Mac OS X 10.11, Intel HD Graphics 6000 with 48 execution units), I'm consistently getting a JVM crash:

    # A fatal error has been detected by the Java Runtime Environment:
    #
    #  SIGSEGV (0xb) at pc=0x000000011b42650c, pid=88211, tid=5891
    #
    # JRE version: Java(TM) SE Runtime Environment (8.0_60-b27) (build 1.8.0_60-b27)
    # Java VM: Java HotSpot(TM) 64-Bit Server VM (25.60-b23 mixed mode bsd-amd64 compressed oops)
    # Problematic frame:
    # C  [libaparapi_x86_643808666101765060573.dylib+0xd50c]  KernelArg::setLocalBufferArg(JNIEnv_*, int, int, bool)+0x3c
    

    The relevant snippet in the generated OpenCL code seems to be

                int tiledRow = (this->val$TILE_SIZE * tile) + localRow;
                int tiledColumn = (this->val$TILE_SIZE * tile) + localColumn;
                (&this->tileA[localColumn * this->tileA__javaArrayDimension0])[localRow]  = this->val$A[((tiledColumn * this->val$numberOfRows) + row)];
                (&this->tileB[localColumn * this->tileB__javaArrayDimension0])[localRow]  = this->val$B[((column * this->val$numberOfColumns) + tiledRow)];
                barrier(CLK_LOCAL_MEM_FENCE);
                for (int repeat = 0; repeat<this->val$TILE_SIZE; repeat++){
                   value = value + ((&this->tileA[repeat * this->tileA__javaArrayDimension0])[localRow] * (&this->tileB[localColumn * this->tileB__javaArrayDimension0])[repeat]);
                }
                barrier(CLK_LOCAL_MEM_FENCE);
    

    I noticed that it refers to the arrays' dimensions as tile…__javaArrayDimension0 to calculate the proper index into the two-dimensional array, which, I believe, is what necessitates the earlier invocation of KernelArg::setLocalBufferArg. I didn't have time to do a deep dive on this issue, but when I change the arrays to one-dimensional arrays and perform the index calculation myself (as shown in https://github.com/raner/top.java.matrix/commit/cb4988d), the code works correctly.
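    The one-dimensional workaround can be sketched in plain Java (hypothetical names, no Aparapi dependency): flatten the (row, column) pair into a single index using the same column-major layout as the generated code above.

```java
// Hypothetical sketch of the 1D workaround: compute the 2D index manually.
public class FlattenedTile {

    // Element (row, column) of a tileSize x tileSize tile lives at
    // column * tileSize + row, matching the column-major tile[column][row].
    static int index(int row, int column, int tileSize) {
        return column * tileSize + row;
    }

    public static void main(String[] args) {
        int tileSize = 16;
        float[] tile = new float[tileSize * tileSize];
        tile[index(3, 5, tileSize)] = 42f;
        System.out.println(tile[5 * tileSize + 3]); // 42.0
    }
}
```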

    bug bounty $$$ 
    opened by raner 13
  • Kernel.dispose always fails in OpenJDK 12

    Every call to Kernel.dispose fails on OpenJDK 12 and OpenJDK 11 with the following errors:

    # A fatal error has been detected by the Java Runtime Environment:
    #
    #  SIGSEGV (0xb) at pc=0x00007f9c2c4481ac, pid=21273, tid=21274
    #
    # JRE version: OpenJDK Runtime Environment (12.0.2+9) (build 12.0.2+9)
    # Java VM: OpenJDK 64-Bit Server VM (12.0.2+9, mixed mode, sharing, tiered, compressed oops, g1 gc, linux-amd64)
    # Problematic frame:
    # V  [libjvm.so+0xc201ac]  OopStorage::Block::release_entries(unsigned long, OopStorage*)+0x3c
    
     #
    # A fatal error has been detected by the Java Runtime Environment:
    #
    #  SIGSEGV (0xb) at pc=0x00007f8aebbacf4c, pid=23028, tid=23032
    #
    # JRE version: OpenJDK Runtime Environment (11.0.4+11) (build 11.0.4+11)
    # Java VM: OpenJDK 64-Bit Server VM (11.0.4+11, mixed mode, sharing, tiered, compressed oops, g1 gc, linux-amd64)
    # Problematic frame:
    # V  [libjvm.so+0xc25f4c]  OopStorage::Block::release_entries(unsigned long, OopStorage::Block* volatile*)+0x3c
    
    
    

    On OpenJDK 8 it works well.

    opened by AlexanderFedyukov 12
  • Fix: Issue #101 possible deadlock when running in JTP - threads below the FJSafeCyclicBarrier number of parties (refs #101)

    There has been a bug since at least Aparapi 1.4.1 that may cause a deadlock when running in Java Thread Pool mode. At the end of KernelRunner a FJSafeCyclicBarrier is used and set to the group size; however, if the number of threads able to run is less than the group size, a deadlock could occur. A mechanism was also introduced to monitor the ForkJoinPool: if its pool threads die, that is now logged.
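    The sizing constraint behind this fix can be illustrated with a plain java.util.concurrent.CyclicBarrier (a sketch only, not Aparapi's FJSafeCyclicBarrier): the barrier trips only once await() has been called by exactly its party count, so it must never be sized above the number of threads that will actually reach it.

```java
import java.util.concurrent.CyclicBarrier;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class BarrierSizing {

    // Cap the party count at the number of threads that will reach await();
    // sizing the barrier above that is what causes the hang.
    static int safeParties(int groupSize, int runnableThreads) {
        return Math.min(groupSize, runnableThreads);
    }

    public static void main(String[] args) throws Exception {
        int parties = safeParties(256, 4); // 4
        CyclicBarrier barrier = new CyclicBarrier(parties);
        ExecutorService threads = Executors.newFixedThreadPool(4);
        for (int i = 0; i < parties; i++) {
            threads.submit(() -> {
                try {
                    barrier.await(); // all 4 parties arrive, so the barrier trips
                } catch (Exception e) {
                    throw new RuntimeException(e);
                }
            });
        }
        threads.shutdown();
        System.out.println(threads.awaitTermination(5, TimeUnit.SECONDS)); // true
    }
}
```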

    opened by CoreRasurae 12
  • Fix: Memory leak by not disposing kernel in unit tests of feature #79 plus code quality #81


    Feature #79: Renames the unit test LocalArrayArgsIssue79Test to Issue79LocalArrayArgsTest so that it is no longer skipped, and removes the memory leak by ensuring the kernel is disposed at the end.

    Feature #81: Improves code quality by logically splitting a long unit test class into two unit test classes, one with the base validations and another with the advanced validations. The new KernelRunner methods are also refactored to reduce code complexity by extracting methods and reducing code duplication.

    chore 
    opened by CoreRasurae 12
  • Kernel overall local size


    Good afternoon.

    When I use an NVIDIA GeForce RTX 3060 Ti graphics card from Java code, I get the error: Kernel overall local size: 1000 exceeds maximum kernel allowed local size of: 256 failed. Running the same code on an Intel HD Graphics 630 or AMD Radeon R7 450 graphics card, everything works fine. If in this part of the code I put a number less than 256, then the code also works fine on the NVIDIA GeForce RTX 3060 Ti:

    Range range = needDevice.createRange(255);
    kernel.execute(range);
    

    The NVIDIA GeForce RTX 3060 Ti is a more modern video card than the Intel HD Graphics 630 or AMD Radeon R7 450, yet for some reason the parameter for createRange must be smaller than on the older cards. What could be the problem?
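    This error usually means the requested local (work-group) size exceeds the device's reported maximum (CL_DEVICE_MAX_WORK_GROUP_SIZE, 256 here); vendors report different limits regardless of how modern the card is. A hedged sketch of choosing a local size that both divides the global size and respects the limit (pickLocalSize is a hypothetical helper; the real limit should be queried from the device):

```java
// Hypothetical helper: largest local size that divides the global size
// and does not exceed the device's maximum work-group size.
public class LocalSizePicker {

    static int pickLocalSize(int globalSize, int deviceMaxLocalSize) {
        for (int local = Math.min(globalSize, deviceMaxLocalSize); local >= 1; local--) {
            if (globalSize % local == 0) {
                return local;
            }
        }
        return 1;
    }

    public static void main(String[] args) {
        // A global size of 1000 on a device limited to 256 fits at 250.
        System.out.println(pickLocalSize(1000, 256)); // 250
        System.out.println(pickLocalSize(256, 1024)); // 256
    }
}
```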

    opened by forreg16 0
  • Build(deps): Bump scala-library from 2.13.6 to 2.13.9


    Bumps scala-library from 2.13.6 to 2.13.9.

    Release notes

    Sourced from scala-library's releases.

    Scala 2.13.9

    The following changes are highlights of this release:

    Compatibility with Scala 3

    • Tasty Reader: Add support for Scala 3.2 (#10068)
    • Tasty Reader: Restrict access to experimental definitions (#10020)
    • To aid cross-building, accept and ignore using in method calls (#10064 by @​som-snytt)
    • To aid cross-building, allow ? as a wildcard even without -Xsource:3 (#9990)
    • Make Scala-3-style implicit resolution explicitly opt-in rather than bundled in -Xsource:3 (#10012 by @​povder)
    • Prefer type of overridden member when inferring (under -Xsource:3) (#9891 by @​som-snytt)

    JDK version support

    Warnings and lints

    • Add -Wnonunit-statement to warn about discarded values in statement position (#9893 by @​som-snytt)
    • Make unused-import warnings easier to silence (support filtering by origin=) (#9939 by @​som-snytt)
    • Add -Wperformance lints for *Ref boxing and nonlocal return (#9889 by @​som-snytt)

    Language improvements

    • Improve support for Unicode supplementary characters in identifiers and string interpolation (#9805 by @​som-snytt)

    Compiler options

    Security

    • Error on source files with Unicode directional formatting characters (#10017)
    • Prevent Function0 execution during LazyList deserialization (#10118)

    Bugfixes

    • Emit all bridge methods non-final (perhaps affecting serialization compat) (#9976)
    • Fix null-pointer regression in Vector#prependedAll and Vector#appendedAll (#9983)
    • Improve concurrent behavior of Java ConcurrentMap wrapper (#10027 by @​igabaydulin)
    • Preserve null policy in wrapped Java Maps (#10129 by @​som-snytt)

    Changes that shipped in Scala 2.12.16 and 2.12.17 are also included in this release.

    For the complete 2.13.9 change lists, see all merged PRs and all closed bugs.

    Compatibility

    ... (truncated)

    Commits
    • 986dcc1 Merge pull request #10129 from som-snytt/followup/12586-preserve-NPE
    • b824b84 Preserve null policy in wrapped Java Map
    • d578a02 Merge pull request #10128 from SethTisue/revert-10114-10123
    • e5fe919 Revert "Args files are 1 arg per line, fix -Vprint-args -"
    • 362c5d1 Revert "Trim and filter empties in arg files"
    • 864148d Revert "process.Parser strips escaping backslash"
    • f69fe8b Merge pull request #10127 from scalacenter/tasty/support-3.2.0-final
    • 0aa6bd4 remove tasty escape hatch for 3.2.0-RC4
    • af56abc Merge pull request #10123 from som-snytt/dev/814-window-cmd-escapes
    • 7e844a5 Merge pull request #10121 from scala-steward/update/slf4j-nop-2.0.0
    • Additional commits viewable in compare view

    Dependabot compatibility score

    Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.


    Dependabot commands and options

    You can trigger Dependabot actions by commenting on this PR:

    • @dependabot rebase will rebase this PR
    • @dependabot recreate will recreate this PR, overwriting any edits that have been made to it
    • @dependabot merge will merge this PR after your CI passes on it
    • @dependabot squash and merge will squash and merge this PR after your CI passes on it
    • @dependabot cancel merge will cancel a previously requested merge and block automerging
    • @dependabot reopen will reopen this PR if it is closed
    • @dependabot close will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually
    • @dependabot ignore this major version will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
    • @dependabot ignore this minor version will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
    • @dependabot ignore this dependency will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)
    • @dependabot use these labels will set the current labels as the default for future PRs for this repo and language
    • @dependabot use these reviewers will set the current reviewers as the default for future PRs for this repo and language
    • @dependabot use these assignees will set the current assignees as the default for future PRs for this repo and language
    • @dependabot use this milestone will set the current milestone as the default for future PRs for this repo and language

    You can disable automated security fix PRs for this repo from the Security Alerts page.

    dependencies 
    opened by dependabot[bot] 0
  • Question: Support for Xeon Phi?


    I know it has been a long time since the Xeon Phi was discontinued, and there is no value in attempting to revive it; my questions below are purely for study purposes.

    • Are Xeon Phi(s) still supported by the Aparapi project? If so, how long will it continue to receive support from this project?
    • What setups/configurations will be needed for the current Aparapi project to work on Xeon Phi(s)? What attempts should be taken?
    • If Aparapi still works on Xeon Phi(s), what performance would it get, or be expected to have?

    I'm really sorry to bother you all. The main reason I created this issue is that I've recently gotten a Xeon Phi, and in my attempts to make it work I've come across multiple Stack Overflow posts and forums mentioning that "JamVM" and "Aparapi" would work with the Xeon Phi.

    https://stackoverflow.com/questions/17309471/using-xeon-phi-with-jvm-based-language

    I've conducted hours of research but was only able to find a single pull request in the original Aparapi repo mentioning that support for the Xeon Phi was added.

    https://github.com/aparapi/aparapi/pull/1

    Any attempts to help and all responses will be highly appreciated. EDIT: Any other information or documentation regarding the Xeon Phi would also be highly appreciated.

    opened by czhangdev 0
  • Can Kernel.put(short) be added?

    If Kernel.put(char) is used, casting (short) char to get negative numbers costs performance. Could Kernel.put(short) be added? It would also be nice to be able to pass 8-bit unsigned integer values to a kernel.
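    The sign issue behind this request can be seen in plain Java (no Aparapi dependency): char is an unsigned 16-bit type, so recovering a negative value stored through a char requires an explicit cast back to short on every read, which is the per-element cost the author mentions.

```java
public class CharShortRoundTrip {
    public static void main(String[] args) {
        short original = -5;
        char stored = (char) original;    // char is unsigned: reads as 65531
        short recovered = (short) stored; // explicit cast restores the sign
        System.out.println((int) stored);  // 65531
        System.out.println(recovered);     // -5
    }
}
```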

    opened by yiulo 0
  • Aparapi previously working now only runs on CPU instead of GPU after driver update


    I recently did a fresh install of Windows 10 and was using Aparapi with IntelliJ and the driver Windows 10 automatically installed for my NVIDIA card. It was working flawlessly and showing all the warnings correctly, but after installing CUDA 11.3 (and the Quadro drivers along with it), it doesn't even show the warnings that it fell back to the CPU. I only figured out it was doing this by watching all my CPU cores go to 100%. I checked the card's driver details, and it shows OpenCL under C:\WINDOWS\system32\OpenCL.dll and C:\WINDOWS\SysWow64\OpenCL.dll, but I don't know what they were previously, whether they changed, or whether this is an issue.

    Is there anything else I need to check or do for this to work again? I tried uninstalling the driver and using the one from Windows, but it no longer works. I'm not sure if uninstalling everything CUDA-related would help.

    opened by maflores16 1
Releases(v3.0.0)