Model import and deployment framework for retraining models (PyTorch, TensorFlow, Keras) and deploying them in JVM microservice environments, mobile devices, IoT, and Apache Spark

Overview


The Eclipse Deeplearning4J (DL4J) ecosystem is a set of projects intended to support all the needs of a JVM-based deep learning application. This means covering everything from loading and preprocessing raw data, wherever it lives and whatever format it is in, to building and tuning a wide variety of simple and complex deep learning networks.

Because Deeplearning4J runs on the JVM, you can use it with a wide variety of JVM-based languages other than Java, such as Scala, Kotlin, Clojure and many more.

The DL4J stack comprises:

  • DL4J: High-level API to build MultiLayerNetworks and ComputationGraphs with a variety of layers, including custom ones. Supports importing Keras models from HDF5 (.h5), including tf.keras models (as of 1.0.0-beta7), and also supports distributed training on Apache Spark
  • ND4J: General-purpose linear algebra library with over 500 mathematical, linear algebra and deep learning operations. ND4J is based on the highly optimized C++ codebase LibND4J, which provides CPU (AVX2/AVX-512) and GPU (CUDA) support, with acceleration by libraries such as OpenBLAS, OneDNN (MKL-DNN), cuDNN, cuBLAS, etc.
  • SameDiff: Part of the ND4J library, SameDiff is our automatic differentiation / deep learning framework. SameDiff uses a graph-based (define-then-run) approach, similar to TensorFlow graph mode; eager execution (as in TensorFlow 2.x and PyTorch) is planned. SameDiff supports importing TensorFlow frozen models in the .pb (protobuf) format; import for ONNX, TensorFlow SavedModel and Keras models is planned. Deeplearning4j also has full SameDiff support for easily writing custom layers and loss functions (see the sketch after this list).
  • DataVec: ETL for machine learning data in a wide variety of formats and sources (HDFS, Spark, images, video, audio, CSV, Excel, etc.)
  • Arbiter: Library for hyperparameter search
  • LibND4J: C++ library that underpins everything. For more information on how the JVM accesses native arrays and operations, refer to JavaCPP
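
To give a flavor of the SameDiff API described above, here is a minimal define-then-run sketch of a single linear layer with a mean-squared-error loss. This is an illustrative sketch only: the names and shapes are arbitrary, and imports are omitted as in the other snippets in this README.

SameDiff sd = SameDiff.create();

// Placeholders for a minibatch of features and targets (batch size left dynamic via -1)
SDVariable input = sd.placeHolder("input", DataType.FLOAT, -1, 4);
SDVariable labels = sd.placeHolder("labels", DataType.FLOAT, -1, 1);

// Trainable parameters of a single linear layer
SDVariable weights = sd.var("weights", new XavierInitScheme('c', 4, 1), DataType.FLOAT, 4, 1);
SDVariable bias = sd.zero("bias", 1);

// Define the forward pass; SameDiff derives the backward pass automatically
SDVariable predictions = input.mmul(weights).add("predictions", bias);
sd.loss.meanSquaredError("loss", labels, predictions, null);

From here a TrainingConfig with an updater and dataset mappings can be attached, and the graph trained with sd.fit(...), much like the higher-level DL4J networks.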

All projects in the DL4J ecosystem support Windows, Linux and macOS. Hardware support includes CUDA GPUs (10.0, 10.1, 10.2 except on macOS), x86 CPUs (x86_64, avx2, avx512), ARM CPUs (arm, arm64, armhf) and PowerPC (ppc64le).

Using Eclipse Deeplearning4J in your project

Deeplearning4J has quite a few dependencies. For this reason we only support usage with a build tool. To use Deeplearning4J with the CPU backend, add the following dependencies to your pom.xml:

<dependencies>
  <dependency>
      <groupId>org.deeplearning4j</groupId>
      <artifactId>deeplearning4j-core</artifactId>
      <version>1.0.0-beta7</version>
  </dependency>
  <dependency>
      <groupId>org.nd4j</groupId>
      <artifactId>nd4j-native-platform</artifactId>
      <version>1.0.0-beta7</version>
  </dependency>
</dependencies>

A full standalone project example is available in the example repository, if you want to start a new Maven project from scratch.
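
If you want the CUDA backend instead, replace nd4j-native-platform with the matching CUDA artifact. A minimal sketch, assuming CUDA 10.2 and the artifact naming used by the 1.0.0-beta7 release; check the release notes for the artifact matching your CUDA version:

<dependency>
    <groupId>org.nd4j</groupId>
    <artifactId>nd4j-cuda-10.2-platform</artifactId>
    <version>1.0.0-beta7</version>
</dependency>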

A taste of code

Deeplearning4J offers a very high-level API for defining even complex neural networks. The following example shows how LeNet, a convolutional neural network, is defined in DL4J.

MultiLayerConfiguration conf = new NeuralNetConfiguration.Builder()
                .seed(seed)
                .l2(0.0005)
                .weightInit(WeightInit.XAVIER)
                .updater(new Adam(1e-3))
                .list()
                .layer(new ConvolutionLayer.Builder(5, 5)
                        .stride(1,1)
                        .nOut(20)
                        .activation(Activation.IDENTITY)
                        .build())
                .layer(new SubsamplingLayer.Builder(PoolingType.MAX)
                        .kernelSize(2,2)
                        .stride(2,2)
                        .build())
                .layer(new ConvolutionLayer.Builder(5, 5)
                        .stride(1,1)
                        .nOut(50)
                        .activation(Activation.IDENTITY)
                        .build())
                .layer(new SubsamplingLayer.Builder(PoolingType.MAX)
                        .kernelSize(2,2)
                        .stride(2,2)
                        .build())
                .layer(new DenseLayer.Builder().activation(Activation.RELU)
                        .nOut(500).build())
                .layer(new OutputLayer.Builder(LossFunctions.LossFunction.NEGATIVELOGLIKELIHOOD)
                        .nOut(outputNum)
                        .activation(Activation.SOFTMAX)
                        .build())
                .setInputType(InputType.convolutionalFlat(28,28,1))
                .build();
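
To actually train with this configuration, construct a MultiLayerNetwork from it, initialize it, and fit it on a DataSetIterator. A minimal sketch, where mnistTrain stands in for any DataSetIterator (for example, the MnistDataSetIterator used in the examples) and numEpochs is assumed to be defined:

MultiLayerNetwork model = new MultiLayerNetwork(conf);
model.init();
model.setListeners(new ScoreIterationListener(100)); // log the score every 100 iterations

for (int i = 0; i < numEpochs; i++) {
    model.fit(mnistTrain); // one pass over the training data
}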

Documentation, Guides and Tutorials

You can find the official documentation for Deeplearning4J and the other libraries of its ecosystem at http://deeplearning4j.konduit.ai/.

Want some examples?

We have a separate repository with various examples available: https://github.com/eclipse/deeplearning4j-examples

Building from source

Using the official pre-compiled releases (see above) is preferred. But if you want to build from source, first take a look at the prerequisites for building from source here: https://deeplearning4j.konduit.ai/getting-started/build-from-source.

To build everything, we can use commands like

./change-cuda-versions.sh x.x
./change-scala-versions.sh 2.xx
./change-spark-versions.sh x
mvn clean install -Dmaven.test.skip -Dlibnd4j.cuda=x.x -Dlibnd4j.compute=xx

or

mvn -B -V -U clean install -pl '!jumpy,!pydatavec,!pydl4j' -Dlibnd4j.platform=linux-x86_64 -Dlibnd4j.chip=cuda -Dlibnd4j.cuda=9.2 -Dlibnd4j.compute=<your GPU CC> -Djavacpp.platform=linux-x86_64 -Dmaven.test.skip=true

An example of a GPU "CC", or compute capability, is 61 for the Titan X Pascal.
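
Putting the pieces together, a hypothetical invocation for CUDA 10.2 on a compute capability 6.1 card (the placeholders above filled in; adjust for your own hardware) would be:

./change-cuda-versions.sh 10.2
mvn clean install -Dmaven.test.skip -Dlibnd4j.cuda=10.2 -Dlibnd4j.compute=61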

License

Apache License 2.0

Commercial Support

Deeplearning4J is actively developed by the team at Konduit K.K.

If you need any commercial support, feel free to reach out to us.

Comments
  • [WIP] Keras upgrades

    Work in progress...

    Upgrades deeplearning4j-keras to be a little better structured and expands the API from Keras to DL4J. Encourages hijacking model methods rather than implementing an actual Keras backend, for efficiency and performance.

    Main goals of this PR include:

    • expanding Keras to better support DL4J
    • model saving methods via save_model
    • supporting Keras functional API
    opened by crockpotveggies 103
  • Implement new UI functionality using Play framework

    _WIP DO NOT MERGE_

    Play framework UI: builds upon earlier StatsListener and StatsStorage work implemented here: https://github.com/deeplearning4j/deeplearning4j/pull/2143

    opened by AlexDBlack 90
  • Fix RBMs and AE

    • Set up vb params to persist and be updated when in pretraining mode; the update step was previously being skipped
    • Added a flag for pretraining to the configuration at the layer level, and set a trigger to turn it off after the layer pretrains. LayerUpdater will skip vb params when running outside pretrain. In the previous setup, backprop was hard-coded to true in many cases when setting params or gradients, and it would skip vb (visual bias) during the pretrain phase. With this change, getting the count for params or gradients, or updating them, takes vb into account; the updater simply applies no changes to it when not in pretrain mode.
    • HiddenUnit is the activation in RBM - added backpropGradient and derivative for hidden unit in RBM to account for this fact
    • RBM needed a reverse sign on application of gradients for the step function
    • Deprecated unused code in RBM and cleaned up functions in AE that appeared out of date
    • Expanded RBM tests and fixed gradient checks
    opened by nyghtowl 86
  • "A fatal error has been detected by the Java Runtime Environment" when running ParagraphVectors.inferVector(), 1.0.0-alpha

    Issue Description

    I submitted this issue before for DL4J v0.8.0, and thought it was resolved after upgrading to 1.0.0-alpha. However, when I built a new ParagraphVectors model and called the method inferVector() to infer a batch of new texts, the error came back again. The information about the issue is as follows:

    I'm running DL4J on my personal laptop, within the Eclipse IDE. If I saved the ParagraphVectors model to a file and then loaded the model from the same file to call ParagraphVectors.inferVector, I received the error message "A fatal error has been detected by the Java Runtime Environment". One error report is attached.

    I noticed that this issue appears more likely to happen when the new text is a (slightly) longer sentence. The data for training the model and the new texts are in Simplified Chinese, all properly processed before being passed to DL4J.

    The code snippet causing this issue is as follows, within a next() function of a DataSetIterator:

            for(int j=0; j<report.size(); j++){
                String stc = report.get(j);
                // this is where the problem is
                // m_SWV is loaded from a saved model, and proper TokenizerFactory has been set
                INDArray vector = ((ParagraphVectors)m_SWV).inferVector(stc);  
    
                features.put(new INDArrayIndex[]{NDArrayIndex.point(i), NDArrayIndex.all(), NDArrayIndex.point(j)}, vector);
                temp[1] = j;
                featuresMask.putScalar(temp, 1.0); 
            }
    

    Version Information

    Please indicate relevant versions, including, if relevant:

    • Deeplearning4j 1.0.0-alpha
    • platform information (OS, etc): DELL Inspiron 15 laptop with Windows 8 as OS
    • Java version: jdk1.8.0_60

    hs_err_pid4712_jdk1.8_60.log

    Bug Release Burndown 
    opened by xinxu75 85
  • Word2Vec/ParagraphVectors/DeepWalk Spark

    WIP; DO NOT MERGE;

    Word2Vec/ParagraphVectors/DeepWalk implementation for Spark, using the VoidParameterServer available in ND4J

    Do not merge before this: https://github.com/deeplearning4j/nd4j/pull/1551

    opened by raver119 82
  • DL4J Hanging after "Loaded [JCublasBackend] backend"

    Hi,

    We are running some DL4J code as part of a wider system. This code runs fine on an Alienware development PC with CUDA 9.1 on Ubuntu, run from Eclipse.

    However, when we package this application and run it on a RHEL ppc64le server with CUDA 9.1, we see that ND4J is not doing anything after the following output:

    2309 [pool-8-thread-1] INFO org.nd4j.linalg.factory.Nd4jBackend - Loaded [JCublasBackend] backend

    I have verified we are running the latest NVIDIA drivers and CUDA 9.1 is installed successfully. Below is the output from running the CUDA 9.1 sample deviceQuery, which lists the GPU devices:

     CUDA Device Query (Runtime API) version (CUDART static linking)
    
    Detected 4 CUDA Capable device(s)
    
    Device 0: "Tesla P100-SXM2-16GB"
      CUDA Driver Version / Runtime Version          9.1 / 9.1
      CUDA Capability Major/Minor version number:    6.0
      Total amount of global memory:                 16276 MBytes (17066885120 bytes)
      (56) Multiprocessors, ( 64) CUDA Cores/MP:     3584 CUDA Cores
      GPU Max Clock rate:                            1481 MHz (1.48 GHz)
      Memory Clock rate:                             715 Mhz
      Memory Bus Width:                              4096-bit
      L2 Cache Size:                                 4194304 bytes
      Maximum Texture Dimension Size (x,y,z)         1D=(131072), 2D=(131072, 65536), 3D=(16384, 16384, 16384)
      Maximum Layered 1D Texture Size, (num) layers  1D=(32768), 2048 layers
      Maximum Layered 2D Texture Size, (num) layers  2D=(32768, 32768), 2048 layers
      Total amount of constant memory:               65536 bytes
      Total amount of shared memory per block:       49152 bytes
      Total number of registers available per block: 65536
      Warp size:                                     32
      Maximum number of threads per multiprocessor:  2048
      Maximum number of threads per block:           1024
      Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
      Max dimension size of a grid size    (x,y,z): (2147483647, 65535, 65535)
      Maximum memory pitch:                          2147483647 bytes
      Texture alignment:                             512 bytes
      Concurrent copy and kernel execution:          Yes with 3 copy engine(s)
      Run time limit on kernels:                     No
      Integrated GPU sharing Host Memory:            No
      Support host page-locked memory mapping:       Yes
      Alignment requirement for Surfaces:            Yes
      Device has ECC support:                        Enabled
      Device supports Unified Addressing (UVA):      Yes
      Supports Cooperative Kernel Launch:            Yes
      Supports MultiDevice Co-op Kernel Launch:      Yes
      Device PCI Domain ID / Bus ID / location ID:   2 / 1 / 0
      Compute Mode:
         < Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >
    
    Device 1: "Tesla P100-SXM2-16GB"
      CUDA Driver Version / Runtime Version          9.1 / 9.1
      CUDA Capability Major/Minor version number:    6.0
      Total amount of global memory:                 16276 MBytes (17066885120 bytes)
      (56) Multiprocessors, ( 64) CUDA Cores/MP:     3584 CUDA Cores
      GPU Max Clock rate:                            1481 MHz (1.48 GHz)
      Memory Clock rate:                             715 Mhz
      Memory Bus Width:                              4096-bit
      L2 Cache Size:                                 4194304 bytes
      Maximum Texture Dimension Size (x,y,z)         1D=(131072), 2D=(131072, 65536), 3D=(16384, 16384, 16384)
      Maximum Layered 1D Texture Size, (num) layers  1D=(32768), 2048 layers
      Maximum Layered 2D Texture Size, (num) layers  2D=(32768, 32768), 2048 layers
      Total amount of constant memory:               65536 bytes
      Total amount of shared memory per block:       49152 bytes
      Total number of registers available per block: 65536
      Warp size:                                     32
      Maximum number of threads per multiprocessor:  2048
      Maximum number of threads per block:           1024
      Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
      Max dimension size of a grid size    (x,y,z): (2147483647, 65535, 65535)
      Maximum memory pitch:                          2147483647 bytes
      Texture alignment:                             512 bytes
      Concurrent copy and kernel execution:          Yes with 3 copy engine(s)
      Run time limit on kernels:                     No
      Integrated GPU sharing Host Memory:            No
      Support host page-locked memory mapping:       Yes
      Alignment requirement for Surfaces:            Yes
      Device has ECC support:                        Enabled
      Device supports Unified Addressing (UVA):      Yes
      Supports Cooperative Kernel Launch:            Yes
      Supports MultiDevice Co-op Kernel Launch:      Yes
      Device PCI Domain ID / Bus ID / location ID:   3 / 1 / 0
      Compute Mode:
         < Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >
    
    Device 2: "Tesla P100-SXM2-16GB"
      CUDA Driver Version / Runtime Version          9.1 / 9.1
      CUDA Capability Major/Minor version number:    6.0
      Total amount of global memory:                 16276 MBytes (17066885120 bytes)
      (56) Multiprocessors, ( 64) CUDA Cores/MP:     3584 CUDA Cores
      GPU Max Clock rate:                            1481 MHz (1.48 GHz)
      Memory Clock rate:                             715 Mhz
      Memory Bus Width:                              4096-bit
      L2 Cache Size:                                 4194304 bytes
      Maximum Texture Dimension Size (x,y,z)         1D=(131072), 2D=(131072, 65536), 3D=(16384, 16384, 16384)
      Maximum Layered 1D Texture Size, (num) layers  1D=(32768), 2048 layers
      Maximum Layered 2D Texture Size, (num) layers  2D=(32768, 32768), 2048 layers
      Total amount of constant memory:               65536 bytes
      Total amount of shared memory per block:       49152 bytes
      Total number of registers available per block: 65536
      Warp size:                                     32
      Maximum number of threads per multiprocessor:  2048
      Maximum number of threads per block:           1024
      Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
      Max dimension size of a grid size    (x,y,z): (2147483647, 65535, 65535)
      Maximum memory pitch:                          2147483647 bytes
      Texture alignment:                             512 bytes
      Concurrent copy and kernel execution:          Yes with 3 copy engine(s)
      Run time limit on kernels:                     No
      Integrated GPU sharing Host Memory:            No
      Support host page-locked memory mapping:       Yes
      Alignment requirement for Surfaces:            Yes
      Device has ECC support:                        Enabled
      Device supports Unified Addressing (UVA):      Yes
      Supports Cooperative Kernel Launch:            Yes
      Supports MultiDevice Co-op Kernel Launch:      Yes
      Device PCI Domain ID / Bus ID / location ID:   6 / 1 / 0
      Compute Mode:
         < Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >
    
    Device 3: "Tesla P100-SXM2-16GB"
      CUDA Driver Version / Runtime Version          9.1 / 9.1
      CUDA Capability Major/Minor version number:    6.0
      Total amount of global memory:                 16276 MBytes (17066885120 bytes)
      (56) Multiprocessors, ( 64) CUDA Cores/MP:     3584 CUDA Cores
      GPU Max Clock rate:                            1481 MHz (1.48 GHz)
      Memory Clock rate:                             715 Mhz
      Memory Bus Width:                              4096-bit
      L2 Cache Size:                                 4194304 bytes
      Maximum Texture Dimension Size (x,y,z)         1D=(131072), 2D=(131072, 65536), 3D=(16384, 16384, 16384)
      Maximum Layered 1D Texture Size, (num) layers  1D=(32768), 2048 layers
      Maximum Layered 2D Texture Size, (num) layers  2D=(32768, 32768), 2048 layers
      Total amount of constant memory:               65536 bytes
      Total amount of shared memory per block:       49152 bytes
      Total number of registers available per block: 65536
      Warp size:                                     32
      Maximum number of threads per multiprocessor:  2048
      Maximum number of threads per block:           1024
      Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
      Max dimension size of a grid size    (x,y,z): (2147483647, 65535, 65535)
      Maximum memory pitch:                          2147483647 bytes
      Texture alignment:                             512 bytes
      Concurrent copy and kernel execution:          Yes with 3 copy engine(s)
      Run time limit on kernels:                     No
      Integrated GPU sharing Host Memory:            No
      Support host page-locked memory mapping:       Yes
      Alignment requirement for Surfaces:            Yes
      Device has ECC support:                        Enabled
      Device supports Unified Addressing (UVA):      Yes
      Supports Cooperative Kernel Launch:            Yes
      Supports MultiDevice Co-op Kernel Launch:      Yes
      Device PCI Domain ID / Bus ID / location ID:   7 / 1 / 0
      Compute Mode:
         < Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >
    > Peer access from Tesla P100-SXM2-16GB (GPU0) -> Tesla P100-SXM2-16GB (GPU1) : Yes
    > Peer access from Tesla P100-SXM2-16GB (GPU0) -> Tesla P100-SXM2-16GB (GPU2) : No
    > Peer access from Tesla P100-SXM2-16GB (GPU0) -> Tesla P100-SXM2-16GB (GPU3) : No
    > Peer access from Tesla P100-SXM2-16GB (GPU1) -> Tesla P100-SXM2-16GB (GPU0) : Yes
    > Peer access from Tesla P100-SXM2-16GB (GPU1) -> Tesla P100-SXM2-16GB (GPU2) : No
    > Peer access from Tesla P100-SXM2-16GB (GPU1) -> Tesla P100-SXM2-16GB (GPU3) : No
    > Peer access from Tesla P100-SXM2-16GB (GPU2) -> Tesla P100-SXM2-16GB (GPU0) : No
    > Peer access from Tesla P100-SXM2-16GB (GPU2) -> Tesla P100-SXM2-16GB (GPU1) : No
    > Peer access from Tesla P100-SXM2-16GB (GPU2) -> Tesla P100-SXM2-16GB (GPU3) : Yes
    > Peer access from Tesla P100-SXM2-16GB (GPU3) -> Tesla P100-SXM2-16GB (GPU0) : No
    > Peer access from Tesla P100-SXM2-16GB (GPU3) -> Tesla P100-SXM2-16GB (GPU1) : No
    > Peer access from Tesla P100-SXM2-16GB (GPU3) -> Tesla P100-SXM2-16GB (GPU2) : Yes
    
    deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 9.1, CUDA Runtime Version = 9.1, NumDevs = 4
    Result = PASS
    
    

    Can someone please help us diagnose this issue? It seems CUDA is installed correctly, but DL4J is not producing any output and the following Java code just hangs when calling Nd4j.create() for the first time:

    ...
    Nd4j.create()
    ...
    

    Note that this same code works fine on the Alienware on Ubuntu 64-bit.

    Aha! Link: https://skymindai.aha.io/features/ND4J-143

    DevOps ND4J 
    opened by madhs53 79
  • Feature Request: Add Support for Apple Silicon M1

    Issue Description

    The new Apple Silicon M1 processor yields a javacpp.platform of macosx-arm64. These artifacts aren't available in the Maven Central Repository, which causes builds and IDEs on this new hardware to error or complain.

    See these two forum topics for more information: https://community.konduit.ai/t/support-for-apple-silicon-m1/1168 https://community.konduit.ai/t/compiling-on-arm/283

    Expected behavior: prebuilt jars for macosx-arm64 should exist in the Maven Central repository

    Enhancement ARM 
    opened by bpossolo 77
  • Convert Mat image to INDArray, When trying to convert Mat image to INDArray it is returning me INDArray null

    I have this code and I do not understand why my INDArray image is returning null when I try to convert the Mat to an INDArray. I am using Android Studio 3.0.1.

    //************************* Digit classification *******************************************************************
            for (int i = 0; i < rects.size() ; i++) {
                Rect rect = rects.get(i);
                digit = inverted.submat(rect.y, rect.y + rect.height, rect.x, rect.x + rect.width);
                Imgproc.resize(digit, digit, new Size(28, 28));
    
                    NativeImageLoader nativeImageLoader = new NativeImageLoader(digit.height(), digit.width(), digit.channels());//Use the nativeImageLoader to convert to numerical matrix
                    INDArray image = nativeImageLoader.asMatrix(digit);//put image into INDArray
    
                System.out.println("carregar modelo matrixes  " + image);
     }
    

    output: carregar modelo matrixes NULL

    Bug Enhancement DataVec / ETL 
    opened by AILTON091 76
  • Add CenterLossOutputLayer for efficient training

    Work in progress...

    Center loss has proven to be more efficient than triplet loss, and it enables classifier training that is also faster than with triplets.

    @AlexDBlack can you take a look at CenterLossParamInitializer and confirm it's on the right track? Also, should we just specify numClasses in layer conf? Let's keep discussion in Gitter :)

    opened by crockpotveggies 65
  • Can not run CUDA example on Jetson TX1

    Issue Description

    deeplearning4jtest-1.0/bin/deeplearning4jtest 10000 10
    09:07:35.540 [main] INFO deeplearning4jtest.CSVExample - Build model....
    09:07:35.652 [main] INFO org.nd4j.linalg.factory.Nd4jBackend - Loaded [JCublasBackend] backend
    Exception in thread "main" java.lang.ExceptionInInitializerError
        at org.nd4j.jita.concurrency.CudaAffinityManager.getNumberOfDevices(CudaAffinityManager.java:173)
        at org.nd4j.jita.constant.ConstantProtector.purgeProtector(ConstantProtector.java:36)
        at org.nd4j.jita.constant.ConstantProtector.(ConstantProtector.java:29)
        at org.nd4j.jita.constant.ConstantProtector.(ConstantProtector.java:19)
        at org.nd4j.jita.constant.ProtectedCudaConstantHandler.(ProtectedCudaConstantHandler.java:45)
        at org.nd4j.jita.constant.CudaConstantHandler.(CudaConstantHandler.java:17)
        at java.lang.Class.forName0(Native Method)
        at java.lang.Class.forName(Class.java:264)
        at org.nd4j.linalg.factory.Nd4j.initWithBackend(Nd4j.java:5753)
        at org.nd4j.linalg.factory.Nd4j.initContext(Nd4j.java:5694)
        at org.nd4j.linalg.factory.Nd4j.(Nd4j.java:184)
        at org.deeplearning4j.nn.conf.NeuralNetConfiguration$Builder.seed(NeuralNetConfiguration.java:677)
        at deeplearning4jtest.CSVExample.main(CSVExample.java:54)
    Caused by: java.lang.RuntimeException: ND4J is probably missing dependencies. For more information, please refer to: http://nd4j.org/getstarted.html
        at org.nd4j.nativeblas.NativeOpsHolder.(NativeOpsHolder.java:51)
        at org.nd4j.nativeblas.NativeOpsHolder.(NativeOpsHolder.java:19)
        ... 13 more
    Caused by: java.lang.UnsatisfiedLinkError: no jnind4jcuda in java.library.path
        at java.lang.ClassLoader.loadLibrary(ClassLoader.java:1867)
        at java.lang.Runtime.loadLibrary0(Runtime.java:870)
        at java.lang.System.loadLibrary(System.java:1122)
        at org.bytedeco.javacpp.Loader.loadLibrary(Loader.java:963)
        at org.bytedeco.javacpp.Loader.load(Loader.java:764)
        at org.bytedeco.javacpp.Loader.load(Loader.java:671)
        at org.nd4j.nativeblas.Nd4jCuda.(Nd4jCuda.java:10)
        at java.lang.Class.forName0(Native Method)
        at java.lang.Class.forName(Class.java:348)
        at org.bytedeco.javacpp.Loader.load(Loader.java:726)
        at org.bytedeco.javacpp.Loader.load(Loader.java:671)
        at org.nd4j.nativeblas.Nd4jCuda$NativeOps.(Nd4jCuda.java:62)
        at java.lang.Class.forName0(Native Method)
        at java.lang.Class.forName(Class.java:264)
        at org.nd4j.nativeblas.NativeOpsHolder.(NativeOpsHolder.java:29)
        ... 14 more
    Caused by: java.lang.UnsatisfiedLinkError: no nd4jcuda in java.library.path
        at java.lang.ClassLoader.loadLibrary(ClassLoader.java:1867)
        at java.lang.Runtime.loadLibrary0(Runtime.java:870)
        at java.lang.System.loadLibrary(System.java:1122)
        at org.bytedeco.javacpp.Loader.loadLibrary(Loader.java:963)
        at org.bytedeco.javacpp.Loader.load(Loader.java:752)
        ... 24 more

    Version Information

    Please indicate relevant versions, including, if relevant:

    • Deeplearning4j version - 0.8.0
    • platform information (OS, etc) - Ubuntu 16.04, arm64, Jetson TX1
    • CUDA version, if used - 8.0
    • NVIDIA driver version, if in use -

    Contributing

    If you'd like to help us fix the issue by contributing some code, but would like guidance or help in doing so, please mention it! - I could help, if I can.

    DevOps 
    opened by gospodinbodurov 60
  • libopenblas_nolapack.so.0: cannot open shared object file: No such file or directory

    Hello,

    I've just tried to run my application on beta2 and I got the following exception: Caused by: java.lang.UnsatisfiedLinkError: /app/.javacpp/cache/openblas-0.3.0-1.4.2-linux-x86_64.jar/org/bytedeco/javacpp/linux-x86_64/libjniopenblas_nolapack.so: libopenblas_nolapack.so.0: cannot open shared object file: No such file or directory

    You can find full stacktrace here - https://gist.github.com/sergmain/0685cda1456721595637def8ca347662

    A few days ago, I opened issue https://github.com/deeplearning4j/deeplearning4j/issues/6083. Since then, the issue has been fixed and beta2 released.

    I rolled back to beta and my application started to work.

    There is a stub project for reproducing this problem on Heroku: https://github.com/sergmain/dl4j-uber-jar. It doesn't contain an actual Keras model, but you can use any.

    Summary: beta works; beta2 does not. The target OS is Heroku's PaaS, and the target platform for DL4J is specified in /.mvn/jvm.config.

    Question ND4J 
    opened by sergmain 59
  • Problem importing keras model

    Issue Description

    I am trying to import a deep learning model created in Python using the latest TensorFlow version and a pre-trained model (EfficientNet). My goal is to import my H5 model into Java, but I always get an error saying "Unsupported keras layer type Functional."

    Version Information

    Please indicate relevant versions, including, if relevant:

    • Deeplearning4j version : 1.0.0-M2.1
    • Platform information (OS, etc) : Windows
    • CUDA version, if used
    • NVIDIA driver version, if in use

    Additional Information

    Where applicable, please also provide:

    Exception in thread "main" java.lang.RuntimeException: org.deeplearning4j.nn.modelimport.keras.exceptions.UnsupportedKerasConfigurationException: Unsupported keras layer type Functional. Please file an issue at https://github.com/eclipse/deeplearning4j/issues.
        at net.PneumoniaDetection.makePrediction(PneumoniaDetection.java:85)
        at net.Main.main(Main.java:25)
    Caused by: org.deeplearning4j.nn.modelimport.keras.exceptions.UnsupportedKerasConfigurationException: Unsupported keras layer type Functional. Please file an issue at https://github.com/eclipse/deeplearning4j/issues.
        at org.deeplearning4j.nn.modelimport.keras.utils.KerasLayerUtils.getKerasLayerFromConfig(KerasLayerUtils.java:337)
        at org.deeplearning4j.nn.modelimport.keras.KerasModel.prepareLayers(KerasModel.java:223)
        at org.deeplearning4j.nn.modelimport.keras.KerasModel.(KerasModel.java:165)
        at org.deeplearning4j.nn.modelimport.keras.KerasModel.(KerasModel.java:97)
        at org.deeplearning4j.nn.modelimport.keras.utils.KerasModelBuilder.buildModel(KerasModelBuilder.java:311)
        at org.deeplearning4j.nn.modelimport.keras.KerasModelImport.importKerasModelAndWeights(KerasModelImport.java:167)
        at net.PneumoniaDetection.makePrediction(PneumoniaDetection.java:78)
        ... 1 more

    Process finished with exit code 1

    Contributing

        try {
            String fullModel = new ClassPathResource("/model_multi_B4_3_1.h5").getFile().getPath();
            ComputationGraph model = KerasModelImport.importKerasModelAndWeights(fullModel);
            System.out.println("model loaded");
        } catch (IOException e) {
            System.out.println(e);
        } catch (UnsupportedKerasConfigurationException | InvalidKerasConfigurationException e) {
            throw new RuntimeException(e);
        }
    
    opened by stevo32800 2
  • CreateView Op issues

    Issue Description

    There are 2 identified issues for org.nd4j.linalg.api.ops.impl.shape.CreateView:

    1. If the Op is called using SDVariable.getView(), then "No op known for hash: -6201726597031682680 and name create_view" is thrown.
    2. If the Op is called directly as SameDiff.createView(), then "Please extend DynamicCustomOp.doDiff to support SameDiff backprop operations. Op: org.nd4j.linalg.api.ops.impl.shape.CreateView" is thrown.

    Either way, this Op is currently non-functional.

    • expected behavior : successful operation execution
    • encountered behavior: Exception

    Version Information

    • Deeplearning4j version - M 2.1
    • Platform information: Windows 10
    • CUDA version - N/A
    • NVIDIA driver version - N/A

    Additional Information

    Stack Trace for the first issue:

    java.lang.IllegalStateException: No op known for hash: -6201726597031682680 and name create_view
    	at org.nd4j.imports.converters.DifferentialFunctionClassHolder.customOpClassForHashAndName(DifferentialFunctionClassHolder.java:389)
    	at org.nd4j.autodiff.samediff.serde.FlatBuffersMapper.fromFlatNode(FlatBuffersMapper.java:448)
    	at org.nd4j.autodiff.samediff.serde.FlatBuffersMapper.cloneViaSerialize(FlatBuffersMapper.java:1065)
    	at org.nd4j.autodiff.samediff.SameDiff.invokeGraphOn(SameDiff.java:656)
    	at org.nd4j.autodiff.samediff.SameDiff.lambda$createGradFunction$1(SameDiff.java:4770)
    	at org.nd4j.autodiff.samediff.SameDiff.defineFunction(SameDiff.java:4557)
    	at org.nd4j.autodiff.samediff.SameDiff.defineFunction(SameDiff.java:4542)
    	at org.nd4j.autodiff.samediff.SameDiff.createGradFunction(SameDiff.java:4762)
    	at org.nd4j.autodiff.samediff.SameDiff.createGradFunction(SameDiff.java:4669)
    	at org.nd4j.autodiff.samediff.SameDiff.fitHelper(SameDiff.java:1870)
    	at org.nd4j.autodiff.samediff.SameDiff.fit(SameDiff.java:1792)
    	at org.nd4j.autodiff.samediff.SameDiff.fit(SameDiff.java:1732)
    

    Stack Trace for the second issue:

    java.lang.UnsupportedOperationException: Please extend DynamicCustomOp.doDiff to support SameDiff backprop operations. Op: org.nd4j.linalg.api.ops.impl.shape.CreateView
    	at org.nd4j.linalg.api.ops.DynamicCustomOp.doDiff(DynamicCustomOp.java:744)
    	at org.nd4j.autodiff.functions.DifferentialFunction.diff(DifferentialFunction.java:677)
    	at org.nd4j.autodiff.samediff.SameDiff.lambda$createGradFunction$1(SameDiff.java:5042)
    	at org.nd4j.autodiff.samediff.SameDiff.defineFunction(SameDiff.java:4557)
    	at org.nd4j.autodiff.samediff.SameDiff.defineFunction(SameDiff.java:4542)
    	at org.nd4j.autodiff.samediff.SameDiff.createGradFunction(SameDiff.java:4762)
    	at org.nd4j.autodiff.samediff.SameDiff.createGradFunction(SameDiff.java:4669)
    	at org.nd4j.autodiff.samediff.SameDiff.fitHelper(SameDiff.java:1870)
    	at org.nd4j.autodiff.samediff.SameDiff.fit(SameDiff.java:1792)
    	at org.nd4j.autodiff.samediff.SameDiff.fit(SameDiff.java:1732)
    
    opened by partarstu 0
  • Split Op fails if numSplit parameter is an SDVariable

    Issue Description

    The error message: org.nd4j.linalg.exception.ND4UnresolvedOutputVariables: Could not determine number of output variables for op split - Split. Ops can override getNumOutputs() to specify number of outputs if required

    • expected behavior : successful operation execution
    • encountered behavior: Exception

    Version Information

    • Deeplearning4j version - M 2.1
    • Platform information: Windows 10
    • CUDA version - N/A
    • NVIDIA driver version - N/A

    Additional Information

    Split Op uses the integer numSplit parameter to define the number of output variables. That doesn't work in the case where numSplit is an SDVariable. For that case, additional handling via an override of the method getNumOutputs() is needed.

    Stack Trace:

    org.nd4j.linalg.exception.ND4UnresolvedOutputVariables: Could not determine number of output variables for op split - Split. Ops can override getNumOutputs() to specify number of outputs if required
    	at org.nd4j.autodiff.samediff.SameDiff.generateOutputVariableForOp(SameDiff.java:4383)
    	at org.nd4j.linalg.api.ops.DynamicCustomOp.outputVariables(DynamicCustomOp.java:254)
    	at org.nd4j.linalg.api.ops.DynamicCustomOp.outputVariables(DynamicCustomOp.java:237)
    	at org.nd4j.autodiff.samediff.ops.SDBaseOps.split(SDBaseOps.java:4375)
    
    opened by partarstu 0
  • Fatal error, core dumped, when running GenerateTxtModel example with commit 03e11c727222f21d74aa34cc9a069eca7ab54b1a in CPU mode

    Issue Description

    • Expected behavior: GenerateTxtModel should run without error
    • Encountered behavior:
    mvn -nsu compile exec:java -Dexec.mainClass="org.deeplearning4j.examples.advanced.modelling.charmodelling.generatetext.GenerateTxtModel"
    [INFO] Scanning for projects...
    [INFO] 
    [INFO] ------------------< org.deeplearning4j:dl4j-examples >------------------
    [INFO] Building Introduction to DL4J 1.0.0-SNAPSHOT
    [INFO] --------------------------------[ jar ]---------------------------------
    [INFO] 
    [INFO] --- maven-enforcer-plugin:1.0.1:enforce (enforce-default) @ dl4j-examples ---
    [INFO] 
    [INFO] --- maven-resources-plugin:2.6:resources (default-resources) @ dl4j-examples ---
    [INFO] Using 'UTF-8' encoding to copy filtered resources.
    [INFO] Copying 2 resources
    [INFO] 
    [INFO] --- maven-compiler-plugin:3.8.1:compile (default-compile) @ dl4j-examples ---
    [INFO] Nothing to compile - all classes are up to date
    [INFO] 
    [INFO] --- exec-maven-plugin:1.4.0:java (default-cli) @ dl4j-examples ---
    Using existing text file at /tmp/Shakespeare.txt
    Loaded and converted file: 963172 valid characters of 969521 total characters (6349 removed)
    o.n.l.f.Nd4jBackend - Loaded [CpuBackend] backend
    o.n.n.NativeOpsHolder - Number of threads used for linear algebra: 12
    o.n.l.c.n.CpuNDArrayFactory - Binary level Generic x86 optimization level AVX/AVX2
    o.n.n.Nd4jBlas - Number of threads used for OpenMP BLAS: 12
    o.n.l.a.o.e.DefaultOpExecutioner - Backend used: [CPU]; OS: [Linux]
    o.n.l.a.o.e.DefaultOpExecutioner - Cores: [24]; Memory: [15.7GB];
    o.n.l.a.o.e.DefaultOpExecutioner - Blas vendor: [OPENBLAS]
    o.n.l.c.n.CpuBackend - Backend build information:
     GCC: "11.3.0"
    STD version: 201103L
    DEFAULT_ENGINE: samediff::ENGINE_CPU
    HAVE_FLATBUFFERS
    HAVE_OPENBLAS
    o.d.n.m.MultiLayerNetwork - Starting MultiLayerNetwork with WorkspaceModes set to [training: ENABLED; inference: ENABLED], cacheMode set to [NONE]
    
    =====================================================================================
    LayerName (LayerType)     nIn,nOut   TotalParams   ParamsShape                       
    =====================================================================================
    layer0 (LSTM)             77,200     222,400       W:{77,800}, RW:{200,800}, b:{800} 
    layer1 (LSTM)             200,200    320,800       W:{200,800}, RW:{200,800}, b:{800}
    layer2 (RnnOutputLayer)   200,77     15,477        W:{200,77}, b:{77}                
    -------------------------------------------------------------------------------------
                Total Parameters:  558,677
            Trainable Parameters:  558,677
               Frozen Parameters:  0
    =====================================================================================
    
    o.d.o.l.ScoreIterationListener - Score at iteration 0 is 217.36468444651516
    #
    # A fatal error has been detected by the Java Runtime Environment:
    #
    #  SIGSEGV (0xb) at pc=0x00007fd5d7e3e077, pid=3805136, tid=3805204
    #
    # JRE version: OpenJDK Runtime Environment Zulu18.30+11-CA (18.0.1+10) (build 18.0.1+10)
    # Java VM: OpenJDK 64-Bit Server VM Zulu18.30+11-CA (18.0.1+10, mixed mode, sharing, tiered, compressed oops, compressed class ptrs, g1 gc, linux-amd64)
    # Problematic frame:
    # C  [libnd4jcpu.so+0x130f077]  sd::ShapeDescriptor::ShapeDescriptor(long long const*, bool)+0x47
    #
    # Core dump will be written. Default location: Core dumps may be processed with "/usr/share/apport/apport -p%p -s%s -c%c -d%d -P%P -u%u -g%g -- %E" (or dumping to /home/greg_barton/dl4j/deeplearning4j-examples/dl4j-examples/core.3805136)
    #
    # An error report file with more information is saved as:
    # /home/greg_barton/dl4j/deeplearning4j-examples/dl4j-examples/hs_err_pid3805136.log
    #
    # If you would like to submit a bug report, please visit:
    #   http://www.azul.com/support/
    # The crash happened outside the Java Virtual Machine in native code.
    # See problematic frame for where to report the bug.
    #
    Aborted (core dumped)
    

    Version Information

    Please indicate relevant versions, including, if relevant:

    • Deeplearning4j version: master branch, commit 03e11c727222f21d74aa34cc9a069eca7ab54b1a
    • Platform information (OS, etc): Ubuntu 22.04 and OSX
    • CUDA version, if used: N/A
    • NVIDIA driver version, if in use: N/A

    hs_err_pid3805136.log.gz

    opened by gregbarton 0
  • Fixes #9869 linear layer equivalencies

    What changes were proposed in this pull request?

    Fixes linear layer equivalencies and adds a test.

    How was this patch tested?

    (Please explain how this patch was tested. E.g. unit tests, integration tests, manual tests)

    Quick checklist

    The following checklist helps ensure your PR is complete:

    • [x] Eclipse Contributor Agreement signed, and signed commits - see IP Requirements page for details
    • [x] Reviewed the Contributing Guidelines and followed the steps within.
    • [x] Created tests for any significant new code additions.
    • [x] Relevant tests for your changes are passing.
    opened by agibsonccc 0
  • Training using sd.nn.linear and sd.nn.reluLayer doesn't succeed

    Issue Description

    • expected behavior: The expressions input.mmul(weights).add(bias) and sd.nn.linear(input, weights, bias) should be equivalent.
    • encountered behavior: In the code below, training will only succeed when the first variant is used: variant 1 achieves 100% accuracy, while variant 2 doesn't get better than random guessing (often even 0% accuracy).

    I also added a third variant below, which uses sd.nn.reluLayer(input, weights, bias). Although this is not equivalent to the other two variants (it additionally has a ReLU activation function), it should nonetheless allow learning the task with high accuracy, but it doesn't (note that the weights are initialized all positive, so the ReLU should not make a difference).

    Caveat: I could only test this with M2.1 due to issue #9862, which is fixed but not yet in SNAPSHOT.

    int batchSize = 32;
    int modelDim = 10;
    
    SameDiff sd = SameDiff.create();
    
    SDVariable features = sd.placeHolder("features", FLOAT, batchSize, modelDim);
    SDVariable labels = sd.placeHolder("labels", FLOAT, batchSize, modelDim);
    SDVariable weights = sd.var("weights", new OneInitScheme('c'), FLOAT, modelDim, modelDim);
    SDVariable bias = sd.zero("bias", modelDim);
    // SDVariable predictions = features.mmul(weights).add("predictions", bias);         // <<< variant 1 (works)
    SDVariable predictions = sd.nn.linear("predictions", features, weights, bias);       // <<< variant 2 (doesn't work)
    // SDVariable predictions = sd.nn.reluLayer("predictions", features, weights, bias); // <<< variant 3 (doesn't work)
    sd.loss.meanSquaredError("loss", labels, predictions, null);
    
    TrainingConfig config = new TrainingConfig.Builder()
            .updater(new Adam(0.1))
            .dataSetFeatureMapping("features")
            .dataSetLabelMapping("labels")
            .build();
    sd.setTrainingConfig(config);
    
    // the task is to reconstruct the one-hot encoded input
    DataSetIterator iterator = new ReconstructionDataSetIterator(new RandomDataSetIterator(100, new long[]{batchSize, modelDim}, new long[]{}, ONE_HOT, ZEROS));
    
    sd.fit(iterator, 10);
    
    Evaluation evaluation = new Evaluation();
    sd.evaluate(iterator, "predictions", evaluation);
    System.out.println(evaluation.stats());
    

    Version Information

    • Deeplearning4j version: 1.0.0-M2.1
    • Platform information (OS, etc): Linux Mint 21
    • CUDA version, if used: N/A
    • NVIDIA driver version, if in use: N/A
    opened by CompilerCrash 2
Firestorm is a Remote Shuffle Service, and provides the capability for Apache Spark applications to store shuffle data on remote servers

What is Firestorm? Firestorm is a Remote Shuffle Service that provides the capability for Apache Spark applications to store shuffle data on remote servers.

Tencent 246 Nov 29, 2022
TensorFlow Lite Object Detection Android Demo

GSoC Project 2021 - TensorFlow Description This repository contains the project where I contributed to the TensorFlow Team during GSoC in the year 202

Sayan Nath 6 Dec 31, 2022
Oryx 2: Lambda architecture on Apache Spark, Apache Kafka for real-time large scale machine learning

Oryx 2 is a realization of the lambda architecture built on Apache Spark and Apache Kafka, but with specialization for real-time large scale machine l

Oryx Project 1.8k Dec 28, 2022
Data Structures and Algorithms (DSA) - Java Language Using Integrated Development Environments NetBeans

Data Structures and Algorithms (DSA) Course Code : CSC211 Credit Hours : 4 Language : JAVA Integrated development environments : NETBEANS Topic Covere

Ossama Mehmood 샘 2 Oct 1, 2022
Care aims to create an IoT solution to hospitals interconnecting smart monitors to decrease the time a doctor takes to respond to an emergency.

Care Description This project called Care, developed for the INFO1127 course - Software Engineering - aims to create an IoT solution to hospitals inte

null 5 Oct 4, 2022
Apache Spark - A unified analytics engine for large-scale data processing

Apache Spark Spark is a unified analytics engine for large-scale data processing. It provides high-level APIs in Scala, Java, Python, and R, and an op

The Apache Software Foundation 34.7k Jan 2, 2023
Flink/Spark Connectors for Apache Doris(Incubating)

Apache Doris (incubating) Connectors The repository contains connectors for Apache Doris (incubating) Flink Doris Connector More information about com

The Apache Software Foundation 30 Dec 7, 2022
Word Count in Apache Spark using Java

Word Count in Apache Spark using Java

Arjun Gautam 2 Feb 24, 2022
Transform ML models into a native code (Java, C, Python, Go, JavaScript, Visual Basic, C#, R, PowerShell, PHP, Dart, Haskell, Ruby, F#, Rust) with zero dependencies

m2cgen m2cgen (Model 2 Code Generator) - is a lightweight library which provides an easy way to transpile trained statistical models into a native cod

Bayes' Witnesses 2.3k Jan 4, 2023
SparkFE is the LLVM-based and high-performance Spark native execution engine which is designed for feature engineering.

Spark has rapidly emerged as the de facto standard for big data processing. However, it is not designed for machine learning which has more and more limitation in AI scenarios. SparkFE rewrite the execution engine in C++ and achieve more than 6x performance improvement for feature extraction. It guarantees the online-offline consistency which makes AI landing much easier. For further details, please refer to SparkFE Documentation.

4Paradigm 67 Jun 10, 2021
Sparkling Water provides H2O functionality inside Spark cluster

Sparkling Water Sparkling Water integrates H2O's fast scalable machine learning engine with Spark. It provides: Utilities to publish Spark data struct

H2O.ai 939 Jan 2, 2023
Serverless proxy for Spark cluster

Hydrosphere Mist Hydrosphere Mist is a serverless proxy for Spark cluster. Mist provides a new functional programming framework and deployment model f

hydrosphere.io 317 Dec 1, 2022
Spark interface for Drsti

Drsti for Spark (ai.jgp.drsti-spark) Spark interface for Drsti Resources Bringing vision to Apache Spark (2021-09-21) introduces Drsti and explains ho

Jean-Georges 3 Sep 22, 2021
calculator when you be using a model that employs RPN (Reverse Polish Notation)

calculator when you be using a model that employs RPN (Reverse Polish Notation) in its calculations and be a custom build all at the same time? The kids may have colour TFTs and graphing functions, but your keyboard has no equals sign, and that means something.

Eslam ElBeak 8 Oct 28, 2021
mBERT is a mutation testing tool that uses a pre-trained language model (CodeBERT) to generate mutants.

mBERT is a mutation testing tool that uses a pre-trained language model (CodeBERT) to generate mutants.

null 7 Oct 22, 2022
Archinsight project tends to implement architecture-as-code definition of a standard c4 architecture model

Archinsight project tends to implement architecture-as-code definition of a standard c4 architecture model. This project offers a new Insight language designed in such way that an Architect can focus on architecture definition, not visualization. Compared to UML, the Insight language is more specific and is unable to describe an arbitrary entity, but shorter and probably easier to use.

null 25 Nov 24, 2022
👄 The most accurate natural language detection library for Java and the JVM, suitable for long and short text alike

Quick Info this library tries to solve language detection of very short words and phrases, even shorter than tweets makes use of both statistical and

Peter M. Stahl 532 Dec 28, 2022