Model import and deployment framework for retraining models (PyTorch, TensorFlow, Keras) and deploying them in JVM microservice environments, mobile devices, IoT, and Apache Spark

Overview


The Eclipse Deeplearning4J (DL4J) ecosystem is a set of projects intended to support all the needs of a JVM-based deep learning application. This means covering everything from loading and preprocessing raw data, wherever it lives and whatever format it is in, to building and tuning a wide variety of simple and complex deep learning networks.

Because Deeplearning4J runs on the JVM, you can use it with a wide variety of JVM-based languages other than Java, such as Scala, Kotlin, Clojure and many more.

The DL4J stack comprises:

  • DL4J: High-level API to build MultiLayerNetworks and ComputationGraphs with a variety of layers, including custom ones. Supports importing Keras models from HDF5 (.h5), including tf.keras models (as of 1.0.0-beta7), and also supports distributed training on Apache Spark
  • ND4J: General-purpose linear algebra library with over 500 mathematical, linear algebra and deep learning operations. ND4J is based on the highly optimized C++ codebase LibND4J, which provides CPU (AVX2/AVX-512) and GPU (CUDA) support, with acceleration by libraries such as OpenBLAS, OneDNN (MKL-DNN), cuDNN, cuBLAS, etc.
  • SameDiff: Part of the ND4J library, SameDiff is our automatic differentiation / deep learning framework. SameDiff uses a graph-based (define-then-run) approach, similar to TensorFlow graph mode; eager execution (as in TensorFlow 2.x and PyTorch) is planned. SameDiff supports importing TensorFlow frozen models in the .pb (protobuf) format; import for ONNX, TensorFlow SavedModel and Keras models is planned. Deeplearning4j also has full SameDiff support for easily writing custom layers and loss functions (see the sketch after this list).
  • DataVec: ETL for machine learning data in a wide variety of formats and sources (HDFS, Spark, images, video, audio, CSV, Excel, etc.)
  • Arbiter: Library for hyperparameter search
  • LibND4J: C++ library that underpins everything. For more information on how the JVM accesses native arrays and operations, refer to JavaCPP
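
To give a flavor of the SameDiff API described above, here is a minimal define-then-run sketch of a single linear layer with a mean-squared-error loss. This is an illustrative sketch only: the names and shapes are arbitrary, and imports are omitted as in the other snippets in this README.

SameDiff sd = SameDiff.create();

// Placeholders for a minibatch of features and targets (batch size left dynamic via -1)
SDVariable input = sd.placeHolder("input", DataType.FLOAT, -1, 4);
SDVariable labels = sd.placeHolder("labels", DataType.FLOAT, -1, 1);

// Trainable parameters of a single linear layer
SDVariable weights = sd.var("weights", new XavierInitScheme('c', 4, 1), DataType.FLOAT, 4, 1);
SDVariable bias = sd.zero("bias", 1);

// Define the forward pass; SameDiff derives the backward pass automatically
SDVariable predictions = input.mmul(weights).add("predictions", bias);
sd.loss.meanSquaredError("loss", labels, predictions, null);

From here a TrainingConfig with an updater and dataset mappings can be attached, and the graph trained with sd.fit(...), much like the higher-level DL4J networks.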

All projects in the DL4J ecosystem support Windows, Linux and macOS. Hardware support includes CUDA GPUs (10.0, 10.1, 10.2 except on macOS), x86 CPUs (x86_64, avx2, avx512), ARM CPUs (arm, arm64, armhf) and PowerPC (ppc64le).

Using Eclipse Deeplearning4J in your project

Deeplearning4J has quite a few dependencies. For this reason we only support usage with a build tool. To use Deeplearning4J with the CPU backend, add the following dependencies to your pom.xml:

<dependencies>
  <dependency>
      <groupId>org.deeplearning4j</groupId>
      <artifactId>deeplearning4j-core</artifactId>
      <version>1.0.0-beta7</version>
  </dependency>
  <dependency>
      <groupId>org.nd4j</groupId>
      <artifactId>nd4j-native-platform</artifactId>
      <version>1.0.0-beta7</version>
  </dependency>
</dependencies>

A full standalone project example is available in the example repository, if you want to start a new Maven project from scratch.
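
If you want the CUDA backend instead, replace nd4j-native-platform with the matching CUDA artifact. A minimal sketch, assuming CUDA 10.2 and the artifact naming used by the 1.0.0-beta7 release; check the release notes for the artifact matching your CUDA version:

<dependency>
    <groupId>org.nd4j</groupId>
    <artifactId>nd4j-cuda-10.2-platform</artifactId>
    <version>1.0.0-beta7</version>
</dependency>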

A taste of code

Deeplearning4J offers a very high-level API for defining even complex neural networks. The following example shows how LeNet, a convolutional neural network, is defined in DL4J.

MultiLayerConfiguration conf = new NeuralNetConfiguration.Builder()
                .seed(seed)
                .l2(0.0005)
                .weightInit(WeightInit.XAVIER)
                .updater(new Adam(1e-3))
                .list()
                .layer(new ConvolutionLayer.Builder(5, 5)
                        .stride(1,1)
                        .nOut(20)
                        .activation(Activation.IDENTITY)
                        .build())
                .layer(new SubsamplingLayer.Builder(PoolingType.MAX)
                        .kernelSize(2,2)
                        .stride(2,2)
                        .build())
                .layer(new ConvolutionLayer.Builder(5, 5)
                        .stride(1,1)
                        .nOut(50)
                        .activation(Activation.IDENTITY)
                        .build())
                .layer(new SubsamplingLayer.Builder(PoolingType.MAX)
                        .kernelSize(2,2)
                        .stride(2,2)
                        .build())
                .layer(new DenseLayer.Builder().activation(Activation.RELU)
                        .nOut(500).build())
                .layer(new OutputLayer.Builder(LossFunctions.LossFunction.NEGATIVELOGLIKELIHOOD)
                        .nOut(outputNum)
                        .activation(Activation.SOFTMAX)
                        .build())
                .setInputType(InputType.convolutionalFlat(28,28,1))
                .build();
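
To actually train with this configuration, construct a MultiLayerNetwork from it, initialize it, and fit it on a DataSetIterator. A minimal sketch, where mnistTrain stands in for any DataSetIterator (for example, the MnistDataSetIterator used in the examples) and numEpochs is assumed to be defined:

MultiLayerNetwork model = new MultiLayerNetwork(conf);
model.init();
model.setListeners(new ScoreIterationListener(100)); // log the score every 100 iterations

for (int i = 0; i < numEpochs; i++) {
    model.fit(mnistTrain); // one pass over the training data
}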

Documentation, Guides and Tutorials

You can find the official documentation for Deeplearning4J and the other libraries of its ecosystem at http://deeplearning4j.konduit.ai/.

Want some examples?

We have a separate repository with various examples available: https://github.com/eclipse/deeplearning4j-examples

Building from source

Using the official pre-compiled releases (see above) is preferred. But if you want to build from source, first take a look at the prerequisites for building from source here: https://deeplearning4j.konduit.ai/getting-started/build-from-source.

To build everything, we can use commands like

./change-cuda-versions.sh x.x
./change-scala-versions.sh 2.xx
./change-spark-versions.sh x
mvn clean install -Dmaven.test.skip -Dlibnd4j.cuda=x.x -Dlibnd4j.compute=xx

or

mvn -B -V -U clean install -pl '!jumpy,!pydatavec,!pydl4j' -Dlibnd4j.platform=linux-x86_64 -Dlibnd4j.chip=cuda -Dlibnd4j.cuda=9.2 -Dlibnd4j.compute=<your GPU CC> -Djavacpp.platform=linux-x86_64 -Dmaven.test.skip=true

An example of a GPU "CC", or compute capability, is 61 for the Titan X Pascal.
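
Putting the pieces together, a hypothetical invocation for CUDA 10.2 on a compute capability 6.1 card (the placeholders above filled in; adjust for your own hardware) would be:

./change-cuda-versions.sh 10.2
mvn clean install -Dmaven.test.skip -Dlibnd4j.cuda=10.2 -Dlibnd4j.compute=61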

License

Apache License 2.0

Commercial Support

Deeplearning4J is actively developed by the team at Konduit K.K.

If you need any commercial support, feel free to reach out to us.

Comments
  • [WIP] Keras upgrades

    Work in progress...

    Upgrades deeplearning4j-keras to be a little better structured and expands the API from Keras to DL4J. Encourages hijacking model methods rather than implementing an actual Keras backend, for efficiency and performance.

    Main goals of this PR include:

    • expanding Keras to better support DL4J
    • model saving methods via save_model
    • supporting Keras functional API
    opened by crockpotveggies 103
  • Implement new UI functionality using Play framework

    _WIP DO NOT MERGE_

    Play framework UI: builds upon earlier StatsListener and StatsStorage work implemented here: https://github.com/deeplearning4j/deeplearning4j/pull/2143

    opened by AlexDBlack 90
  • Fix RBMs and AE

    • Set up vb params to persist and be updated when in pretraining mode; the update step was previously being skipped
    • Added a flag for pretraining to the configuration at the layer level, and set a trigger to turn it off after the layer pretrains. LayerUpdater will skip vb params when running outside pretrain. In the previous setup, backprop was hard-coded to true in many cases when setting params or gradients, and it would skip vb (visual bias) during the pretrain phase. With this change, getting the count for params or gradients, or updating them, takes vb into account; the updater simply applies no changes to it when not in pretrain mode.
    • HiddenUnit is the activation in RBM - added backpropGradient and derivative for hidden unit in RBM to account for this fact
    • RBM needed a reverse sign on application of gradients for the step function
    • Deprecated unused code in RBM and cleaned up functions in AE that appeared out of date
    • Expanded RBM tests and fixed gradient checks
    opened by nyghtowl 86
  • "A fatal error has been detected by the Java Runtime Environment" when running ParagraphVectors.inferVector(), 1.0.0-alpha

    Issue Description

    I submitted this issue before for DL4J v0.8.0, and thought it was resolved after upgrading to 1.0.0-alpha. However, when I built a new ParagraphVectors model and called the method inferVector() to infer a batch of new texts, the error came back again. The information about the issue is as follows:

    I'm running DL4J on my personal laptop, within the Eclipse IDE. If I saved the ParagraphVectors model to a file and then loaded the model from the same file to call ParagraphVectors.inferVector, I received the error message "A fatal error has been detected by the Java Runtime Environment". One error report is attached.

    I noticed that this issue appears more likely to happen when the new text is a (slightly) longer sentence. The data for training the model and the new texts are in Simplified Chinese, all properly processed before being passed to DL4J.

    The code snippet causing this issue is as follows, within a next() function of a DataSetIterator:

            for(int j=0; j<report.size(); j++){
                String stc = report.get(j);
                // this is where the problem is
                // m_SWV is loaded from a saved model, and proper TokenizerFactory has been set
                INDArray vector = ((ParagraphVectors)m_SWV).inferVector(stc);  
    
                features.put(new INDArrayIndex[]{NDArrayIndex.point(i), NDArrayIndex.all(), NDArrayIndex.point(j)}, vector);
                temp[1] = j;
                featuresMask.putScalar(temp, 1.0); 
            }
    

    Version Information

    Please indicate relevant versions, including, if relevant:

    • Deeplearning4j 1.0.0-alpha
    • platform information (OS, etc): DELL Inspiron 15 laptop with Windows 8 as OS
    • Java version: jdk1.8.0_60

    hs_err_pid4712_jdk1.8_60.log

    Bug Release Burndown 
    opened by xinxu75 85
  • Word2Vec/ParagraphVectors/DeepWalk Spark

    WIP; DO NOT MERGE;

    Word2Vec/ParagraphVectors/DeepWalk implementation for Spark, using the VoidParameterServer available in ND4J

    Do not merge before this: https://github.com/deeplearning4j/nd4j/pull/1551

    opened by raver119 82
  • DL4J Hanging after "Loaded [JCublasBackend] backend"

    Hi,

    We are running some DL4J code as part of a wider system. This code runs fine on an Alienware development PC with CUDA 9.1 on Ubuntu, run from Eclipse.

    However, when we package this application and run it on a RHEL ppc64le server with CUDA 9.1, we see that ND4J is not doing anything after the following output:

    2309 [pool-8-thread-1] INFO org.nd4j.linalg.factory.Nd4jBackend - Loaded [JCublasBackend] backend

    I have verified we are running the latest NVIDIA drivers and CUDA 9.1 is installed successfully. Below is the output from running the CUDA 9.1 sample deviceQuery, which lists the GPU devices:

     CUDA Device Query (Runtime API) version (CUDART static linking)
    
    Detected 4 CUDA Capable device(s)
    
    Device 0: "Tesla P100-SXM2-16GB"
      CUDA Driver Version / Runtime Version          9.1 / 9.1
      CUDA Capability Major/Minor version number:    6.0
      Total amount of global memory:                 16276 MBytes (17066885120 bytes)
      (56) Multiprocessors, ( 64) CUDA Cores/MP:     3584 CUDA Cores
      GPU Max Clock rate:                            1481 MHz (1.48 GHz)
      Memory Clock rate:                             715 Mhz
      Memory Bus Width:                              4096-bit
      L2 Cache Size:                                 4194304 bytes
      Maximum Texture Dimension Size (x,y,z)         1D=(131072), 2D=(131072, 65536), 3D=(16384, 16384, 16384)
      Maximum Layered 1D Texture Size, (num) layers  1D=(32768), 2048 layers
      Maximum Layered 2D Texture Size, (num) layers  2D=(32768, 32768), 2048 layers
      Total amount of constant memory:               65536 bytes
      Total amount of shared memory per block:       49152 bytes
      Total number of registers available per block: 65536
      Warp size:                                     32
      Maximum number of threads per multiprocessor:  2048
      Maximum number of threads per block:           1024
      Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
      Max dimension size of a grid size    (x,y,z): (2147483647, 65535, 65535)
      Maximum memory pitch:                          2147483647 bytes
      Texture alignment:                             512 bytes
      Concurrent copy and kernel execution:          Yes with 3 copy engine(s)
      Run time limit on kernels:                     No
      Integrated GPU sharing Host Memory:            No
      Support host page-locked memory mapping:       Yes
      Alignment requirement for Surfaces:            Yes
      Device has ECC support:                        Enabled
      Device supports Unified Addressing (UVA):      Yes
      Supports Cooperative Kernel Launch:            Yes
      Supports MultiDevice Co-op Kernel Launch:      Yes
      Device PCI Domain ID / Bus ID / location ID:   2 / 1 / 0
      Compute Mode:
         < Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >
    
    Device 1: "Tesla P100-SXM2-16GB"
      CUDA Driver Version / Runtime Version          9.1 / 9.1
      CUDA Capability Major/Minor version number:    6.0
      Total amount of global memory:                 16276 MBytes (17066885120 bytes)
      (56) Multiprocessors, ( 64) CUDA Cores/MP:     3584 CUDA Cores
      GPU Max Clock rate:                            1481 MHz (1.48 GHz)
      Memory Clock rate:                             715 Mhz
      Memory Bus Width:                              4096-bit
      L2 Cache Size:                                 4194304 bytes
      Maximum Texture Dimension Size (x,y,z)         1D=(131072), 2D=(131072, 65536), 3D=(16384, 16384, 16384)
      Maximum Layered 1D Texture Size, (num) layers  1D=(32768), 2048 layers
      Maximum Layered 2D Texture Size, (num) layers  2D=(32768, 32768), 2048 layers
      Total amount of constant memory:               65536 bytes
      Total amount of shared memory per block:       49152 bytes
      Total number of registers available per block: 65536
      Warp size:                                     32
      Maximum number of threads per multiprocessor:  2048
      Maximum number of threads per block:           1024
      Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
      Max dimension size of a grid size    (x,y,z): (2147483647, 65535, 65535)
      Maximum memory pitch:                          2147483647 bytes
      Texture alignment:                             512 bytes
      Concurrent copy and kernel execution:          Yes with 3 copy engine(s)
      Run time limit on kernels:                     No
      Integrated GPU sharing Host Memory:            No
      Support host page-locked memory mapping:       Yes
      Alignment requirement for Surfaces:            Yes
      Device has ECC support:                        Enabled
      Device supports Unified Addressing (UVA):      Yes
      Supports Cooperative Kernel Launch:            Yes
      Supports MultiDevice Co-op Kernel Launch:      Yes
      Device PCI Domain ID / Bus ID / location ID:   3 / 1 / 0
      Compute Mode:
         < Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >
    
    Device 2: "Tesla P100-SXM2-16GB"
      CUDA Driver Version / Runtime Version          9.1 / 9.1
      CUDA Capability Major/Minor version number:    6.0
      Total amount of global memory:                 16276 MBytes (17066885120 bytes)
      (56) Multiprocessors, ( 64) CUDA Cores/MP:     3584 CUDA Cores
      GPU Max Clock rate:                            1481 MHz (1.48 GHz)
      Memory Clock rate:                             715 Mhz
      Memory Bus Width:                              4096-bit
      L2 Cache Size:                                 4194304 bytes
      Maximum Texture Dimension Size (x,y,z)         1D=(131072), 2D=(131072, 65536), 3D=(16384, 16384, 16384)
      Maximum Layered 1D Texture Size, (num) layers  1D=(32768), 2048 layers
      Maximum Layered 2D Texture Size, (num) layers  2D=(32768, 32768), 2048 layers
      Total amount of constant memory:               65536 bytes
      Total amount of shared memory per block:       49152 bytes
      Total number of registers available per block: 65536
      Warp size:                                     32
      Maximum number of threads per multiprocessor:  2048
      Maximum number of threads per block:           1024
      Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
      Max dimension size of a grid size    (x,y,z): (2147483647, 65535, 65535)
      Maximum memory pitch:                          2147483647 bytes
      Texture alignment:                             512 bytes
      Concurrent copy and kernel execution:          Yes with 3 copy engine(s)
      Run time limit on kernels:                     No
      Integrated GPU sharing Host Memory:            No
      Support host page-locked memory mapping:       Yes
      Alignment requirement for Surfaces:            Yes
      Device has ECC support:                        Enabled
      Device supports Unified Addressing (UVA):      Yes
      Supports Cooperative Kernel Launch:            Yes
      Supports MultiDevice Co-op Kernel Launch:      Yes
      Device PCI Domain ID / Bus ID / location ID:   6 / 1 / 0
      Compute Mode:
         < Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >
    
    Device 3: "Tesla P100-SXM2-16GB"
      CUDA Driver Version / Runtime Version          9.1 / 9.1
      CUDA Capability Major/Minor version number:    6.0
      Total amount of global memory:                 16276 MBytes (17066885120 bytes)
      (56) Multiprocessors, ( 64) CUDA Cores/MP:     3584 CUDA Cores
      GPU Max Clock rate:                            1481 MHz (1.48 GHz)
      Memory Clock rate:                             715 Mhz
      Memory Bus Width:                              4096-bit
      L2 Cache Size:                                 4194304 bytes
      Maximum Texture Dimension Size (x,y,z)         1D=(131072), 2D=(131072, 65536), 3D=(16384, 16384, 16384)
      Maximum Layered 1D Texture Size, (num) layers  1D=(32768), 2048 layers
      Maximum Layered 2D Texture Size, (num) layers  2D=(32768, 32768), 2048 layers
      Total amount of constant memory:               65536 bytes
      Total amount of shared memory per block:       49152 bytes
      Total number of registers available per block: 65536
      Warp size:                                     32
      Maximum number of threads per multiprocessor:  2048
      Maximum number of threads per block:           1024
      Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
      Max dimension size of a grid size    (x,y,z): (2147483647, 65535, 65535)
      Maximum memory pitch:                          2147483647 bytes
      Texture alignment:                             512 bytes
      Concurrent copy and kernel execution:          Yes with 3 copy engine(s)
      Run time limit on kernels:                     No
      Integrated GPU sharing Host Memory:            No
      Support host page-locked memory mapping:       Yes
      Alignment requirement for Surfaces:            Yes
      Device has ECC support:                        Enabled
      Device supports Unified Addressing (UVA):      Yes
      Supports Cooperative Kernel Launch:            Yes
      Supports MultiDevice Co-op Kernel Launch:      Yes
      Device PCI Domain ID / Bus ID / location ID:   7 / 1 / 0
      Compute Mode:
         < Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >
    > Peer access from Tesla P100-SXM2-16GB (GPU0) -> Tesla P100-SXM2-16GB (GPU1) : Yes
    > Peer access from Tesla P100-SXM2-16GB (GPU0) -> Tesla P100-SXM2-16GB (GPU2) : No
    > Peer access from Tesla P100-SXM2-16GB (GPU0) -> Tesla P100-SXM2-16GB (GPU3) : No
    > Peer access from Tesla P100-SXM2-16GB (GPU1) -> Tesla P100-SXM2-16GB (GPU0) : Yes
    > Peer access from Tesla P100-SXM2-16GB (GPU1) -> Tesla P100-SXM2-16GB (GPU2) : No
    > Peer access from Tesla P100-SXM2-16GB (GPU1) -> Tesla P100-SXM2-16GB (GPU3) : No
    > Peer access from Tesla P100-SXM2-16GB (GPU2) -> Tesla P100-SXM2-16GB (GPU0) : No
    > Peer access from Tesla P100-SXM2-16GB (GPU2) -> Tesla P100-SXM2-16GB (GPU1) : No
    > Peer access from Tesla P100-SXM2-16GB (GPU2) -> Tesla P100-SXM2-16GB (GPU3) : Yes
    > Peer access from Tesla P100-SXM2-16GB (GPU3) -> Tesla P100-SXM2-16GB (GPU0) : No
    > Peer access from Tesla P100-SXM2-16GB (GPU3) -> Tesla P100-SXM2-16GB (GPU1) : No
    > Peer access from Tesla P100-SXM2-16GB (GPU3) -> Tesla P100-SXM2-16GB (GPU2) : Yes
    
    deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 9.1, CUDA Runtime Version = 9.1, NumDevs = 4
    Result = PASS
    
    

    Can someone please help us diagnose this issue? It seems CUDA is installed correctly, but DL4J is not producing any output and the following Java code just hangs when calling Nd4j.create() for the first time:

    ...
    Nd4j.create()
    ...
    

    Note that this same code works fine on the Alienware on Ubuntu 64-bit.

    Aha! Link: https://skymindai.aha.io/features/ND4J-143

    DevOps ND4J 
    opened by madhs53 79
  • Feature Request: Add Support for Apple Silicon M1

    Issue Description

    The new Apple Silicon M1 processor yields a javacpp.platform of macosx-arm64. These artifacts aren't available in the Maven Central Repository, which causes builds and IDEs on this new hardware to error or complain.

    See these two forum topics for more information: https://community.konduit.ai/t/support-for-apple-silicon-m1/1168 https://community.konduit.ai/t/compiling-on-arm/283

    Expected behavior: prebuilt jars for macosx-arm64 should exist in the Maven Central repository

    Enhancement ARM 
    opened by bpossolo 77
  • Convert Mat image to INDArray, When trying to convert Mat image to INDArray it is returning me INDArray null

    I have this code and I do not understand why my INDArray image is returning null when I try to convert the Mat to an INDArray. I am using Android Studio 3.0.1.

    //************************* Digit classification *******************************************************************
            for (int i = 0; i < rects.size() ; i++) {
                Rect rect = rects.get(i);
                digit = inverted.submat(rect.y, rect.y + rect.height, rect.x, rect.x + rect.width);
                Imgproc.resize(digit, digit, new Size(28, 28));
    
                    NativeImageLoader nativeImageLoader = new NativeImageLoader(digit.height(), digit.width(), digit.channels());//Use the nativeImageLoader to convert to numerical matrix
                    INDArray image = nativeImageLoader.asMatrix(digit);//put image into INDArray
    
                System.out.println("carregar modelo matrixes  " + image);
     }
    

    output: carregar modelo matrixes NULL

    Bug Enhancement DataVec / ETL 
    opened by AILTON091 76
  • Add CenterLossOutputLayer for efficient training

    Work in progress...

    Center loss has proven to be more efficient than triplet loss, and it enables classifier training that is also faster than with triplets.

    @AlexDBlack can you take a look at CenterLossParamInitializer and confirm it's on the right track? Also, should we just specify numClasses in layer conf? Let's keep discussion in Gitter :)

    opened by crockpotveggies 65
  • Can not run CUDA example on Jetson TX1

    Issue Description

    deeplearning4jtest-1.0/bin/deeplearning4jtest 10000 10
    09:07:35.540 [main] INFO deeplearning4jtest.CSVExample - Build model....
    09:07:35.652 [main] INFO org.nd4j.linalg.factory.Nd4jBackend - Loaded [JCublasBackend] backend
    Exception in thread "main" java.lang.ExceptionInInitializerError
        at org.nd4j.jita.concurrency.CudaAffinityManager.getNumberOfDevices(CudaAffinityManager.java:173)
        at org.nd4j.jita.constant.ConstantProtector.purgeProtector(ConstantProtector.java:36)
        at org.nd4j.jita.constant.ConstantProtector.(ConstantProtector.java:29)
        at org.nd4j.jita.constant.ConstantProtector.(ConstantProtector.java:19)
        at org.nd4j.jita.constant.ProtectedCudaConstantHandler.(ProtectedCudaConstantHandler.java:45)
        at org.nd4j.jita.constant.CudaConstantHandler.(CudaConstantHandler.java:17)
        at java.lang.Class.forName0(Native Method)
        at java.lang.Class.forName(Class.java:264)
        at org.nd4j.linalg.factory.Nd4j.initWithBackend(Nd4j.java:5753)
        at org.nd4j.linalg.factory.Nd4j.initContext(Nd4j.java:5694)
        at org.nd4j.linalg.factory.Nd4j.(Nd4j.java:184)
        at org.deeplearning4j.nn.conf.NeuralNetConfiguration$Builder.seed(NeuralNetConfiguration.java:677)
        at deeplearning4jtest.CSVExample.main(CSVExample.java:54)
    Caused by: java.lang.RuntimeException: ND4J is probably missing dependencies. For more information, please refer to: http://nd4j.org/getstarted.html
        at org.nd4j.nativeblas.NativeOpsHolder.(NativeOpsHolder.java:51)
        at org.nd4j.nativeblas.NativeOpsHolder.(NativeOpsHolder.java:19)
        ... 13 more
    Caused by: java.lang.UnsatisfiedLinkError: no jnind4jcuda in java.library.path
        at java.lang.ClassLoader.loadLibrary(ClassLoader.java:1867)
        at java.lang.Runtime.loadLibrary0(Runtime.java:870)
        at java.lang.System.loadLibrary(System.java:1122)
        at org.bytedeco.javacpp.Loader.loadLibrary(Loader.java:963)
        at org.bytedeco.javacpp.Loader.load(Loader.java:764)
        at org.bytedeco.javacpp.Loader.load(Loader.java:671)
        at org.nd4j.nativeblas.Nd4jCuda.(Nd4jCuda.java:10)
        at java.lang.Class.forName0(Native Method)
        at java.lang.Class.forName(Class.java:348)
        at org.bytedeco.javacpp.Loader.load(Loader.java:726)
        at org.bytedeco.javacpp.Loader.load(Loader.java:671)
        at org.nd4j.nativeblas.Nd4jCuda$NativeOps.(Nd4jCuda.java:62)
        at java.lang.Class.forName0(Native Method)
        at java.lang.Class.forName(Class.java:264)
        at org.nd4j.nativeblas.NativeOpsHolder.(NativeOpsHolder.java:29)
        ... 14 more
    Caused by: java.lang.UnsatisfiedLinkError: no nd4jcuda in java.library.path
        at java.lang.ClassLoader.loadLibrary(ClassLoader.java:1867)
        at java.lang.Runtime.loadLibrary0(Runtime.java:870)
        at java.lang.System.loadLibrary(System.java:1122)
        at org.bytedeco.javacpp.Loader.loadLibrary(Loader.java:963)
        at org.bytedeco.javacpp.Loader.load(Loader.java:752)
        ... 24 more

    Version Information

    Please indicate relevant versions, including, if relevant:

    • Deeplearning4j version - 0.8.0
    • platform information (OS, etc) - Ubuntu 16.04, arm64, Jetson TX1
    • CUDA version, if used - 8.0
    • NVIDIA driver version, if in use -

    Contributing

    If you'd like to help us fix the issue by contributing some code, but would like guidance or help in doing so, please mention it! - I could help, if I can.

    DevOps 
    opened by gospodinbodurov 60
  • libopenblas_nolapack.so.0: cannot open shared object file: No such file or directory

    Hello,

    I've just tried to run my application on beta2 and I got the following exception: Caused by: java.lang.UnsatisfiedLinkError: /app/.javacpp/cache/openblas-0.3.0-1.4.2-linux-x86_64.jar/org/bytedeco/javacpp/linux-x86_64/libjniopenblas_nolapack.so: libopenblas_nolapack.so.0: cannot open shared object file: No such file or directory

    You can find full stacktrace here - https://gist.github.com/sergmain/0685cda1456721595637def8ca347662

    A few days ago, I opened issue https://github.com/deeplearning4j/deeplearning4j/issues/6083. Since then, the issue has been fixed and beta2 released.

    I rolled back to beta and my application started to work.

    There is a stub project for reproducing this problem on Heroku: https://github.com/sergmain/dl4j-uber-jar. It doesn't contain an actual Keras model, but you can use any.

    Summary: beta works; beta2 does not. The target OS is Heroku's PaaS, and the target platform for DL4J is specified in /.mvn/jvm.config.

    Question ND4J 
    opened by sergmain 59
  • Problem importing keras model

    Issue Description

    I am trying to import a deep learning model created in Python using the latest TensorFlow version and a pre-trained model (EfficientNet). My goal is to import my H5 model into Java, but I always get an error saying "Unsupported keras layer type Functional."

    Version Information

    Please indicate relevant versions, including, if relevant:

    • Deeplearning4j version : 1.0.0-M2.1
    • Platform information (OS, etc) : Windows
    • CUDA version, if used
    • NVIDIA driver version, if in use

    Additional Information

    Where applicable, please also provide:

    Exception in thread "main" java.lang.RuntimeException: org.deeplearning4j.nn.modelimport.keras.exceptions.UnsupportedKerasConfigurationException: Unsupported keras layer type Functional. Please file an issue at https://github.com/eclipse/deeplearning4j/issues.
        at net.PneumoniaDetection.makePrediction(PneumoniaDetection.java:85)
        at net.Main.main(Main.java:25)
    Caused by: org.deeplearning4j.nn.modelimport.keras.exceptions.UnsupportedKerasConfigurationException: Unsupported keras layer type Functional. Please file an issue at https://github.com/eclipse/deeplearning4j/issues.
        at org.deeplearning4j.nn.modelimport.keras.utils.KerasLayerUtils.getKerasLayerFromConfig(KerasLayerUtils.java:337)
        at org.deeplearning4j.nn.modelimport.keras.KerasModel.prepareLayers(KerasModel.java:223)
        at org.deeplearning4j.nn.modelimport.keras.KerasModel.(KerasModel.java:165)
        at org.deeplearning4j.nn.modelimport.keras.KerasModel.(KerasModel.java:97)
        at org.deeplearning4j.nn.modelimport.keras.utils.KerasModelBuilder.buildModel(KerasModelBuilder.java:311)
        at org.deeplearning4j.nn.modelimport.keras.KerasModelImport.importKerasModelAndWeights(KerasModelImport.java:167)
        at net.PneumoniaDetection.makePrediction(PneumoniaDetection.java:78)
        ... 1 more

    Process finished with exit code 1

    Contributing

        try {
            String fullModel = new ClassPathResource("/model_multi_B4_3_1.h5").getFile().getPath();
            ComputationGraph model = KerasModelImport.importKerasModelAndWeights(fullModel);
            System.out.println("model loaded");
        } catch (IOException e) {
            System.out.println(e);
        } catch (UnsupportedKerasConfigurationException | InvalidKerasConfigurationException e) {
            throw new RuntimeException(e);
        }
    
    opened by stevo32800 2
  • CreateView Op issues

    Issue Description

    There are 2 identified issues for org.nd4j.linalg.api.ops.impl.shape.CreateView:

    1. If the Op is called using SDVariable.getView(), then "No op known for hash: -6201726597031682680 and name create_view" is thrown.
    2. If the Op is called directly as SameDiff.createView(), then "Please extend DynamicCustomOp.doDiff to support SameDiff backprop operations. Op: org.nd4j.linalg.api.ops.impl.shape.CreateView" is thrown.

    Either way, this Op is currently non-functional.

    • expected behavior : successful operation execution
    • encountered behavior: Exception

    Version Information

    • Deeplearning4j version - M 2.1
    • Platform information: Windows 10
    • CUDA version - N/A
    • NVIDIA driver version - N/A

    Additional Information

    Stack Trace for the first issue:

    java.lang.IllegalStateException: No op known for hash: -6201726597031682680 and name create_view
    	at org.nd4j.imports.converters.DifferentialFunctionClassHolder.customOpClassForHashAndName(DifferentialFunctionClassHolder.java:389)
    	at org.nd4j.autodiff.samediff.serde.FlatBuffersMapper.fromFlatNode(FlatBuffersMapper.java:448)
    	at org.nd4j.autodiff.samediff.serde.FlatBuffersMapper.cloneViaSerialize(FlatBuffersMapper.java:1065)
    	at org.nd4j.autodiff.samediff.SameDiff.invokeGraphOn(SameDiff.java:656)
    	at org.nd4j.autodiff.samediff.SameDiff.lambda$createGradFunction$1(SameDiff.java:4770)
    	at org.nd4j.autodiff.samediff.SameDiff.defineFunction(SameDiff.java:4557)
    	at org.nd4j.autodiff.samediff.SameDiff.defineFunction(SameDiff.java:4542)
    	at org.nd4j.autodiff.samediff.SameDiff.createGradFunction(SameDiff.java:4762)
    	at org.nd4j.autodiff.samediff.SameDiff.createGradFunction(SameDiff.java:4669)
    	at org.nd4j.autodiff.samediff.SameDiff.fitHelper(SameDiff.java:1870)
    	at org.nd4j.autodiff.samediff.SameDiff.fit(SameDiff.java:1792)
    	at org.nd4j.autodiff.samediff.SameDiff.fit(SameDiff.java:1732)
    

    Stack Trace for the second issue:

    java.lang.UnsupportedOperationException: Please extend DynamicCustomOp.doDiff to support SameDiff backprop operations. Op: org.nd4j.linalg.api.ops.impl.shape.CreateView
    	at org.nd4j.linalg.api.ops.DynamicCustomOp.doDiff(DynamicCustomOp.java:744)
    	at org.nd4j.autodiff.functions.DifferentialFunction.diff(DifferentialFunction.java:677)
    	at org.nd4j.autodiff.samediff.SameDiff.lambda$createGradFunction$1(SameDiff.java:5042)
    	at org.nd4j.autodiff.samediff.SameDiff.defineFunction(SameDiff.java:4557)
    	at org.nd4j.autodiff.samediff.SameDiff.defineFunction(SameDiff.java:4542)
    	at org.nd4j.autodiff.samediff.SameDiff.createGradFunction(SameDiff.java:4762)
    	at org.nd4j.autodiff.samediff.SameDiff.createGradFunction(SameDiff.java:4669)
    	at org.nd4j.autodiff.samediff.SameDiff.fitHelper(SameDiff.java:1870)
    	at org.nd4j.autodiff.samediff.SameDiff.fit(SameDiff.java:1792)
    	at org.nd4j.autodiff.samediff.SameDiff.fit(SameDiff.java:1732)
    
    opened by partarstu 0
  • Split Op fails if numSplit parameter is an SDVariable

    Issue Description

    The error message: org.nd4j.linalg.exception.ND4UnresolvedOutputVariables: Could not determine number of output variables for op split - Split. Ops can override getNumOutputs() to specify number of outputs if required

    • expected behavior : successful operation execution
    • encountered behavior: Exception

    Version Information

    • Deeplearning4j version - M 2.1
    • Platform information: Windows 10
    • CUDA version - N/A
    • NVIDIA driver version - N/A

    Additional Information

    Split Op uses the integer numSplit parameter to define the number of output variables. That doesn't work in the case where numSplit is an SDVariable. For that case, additional handling via an override of the method getNumOutputs() is needed.

    Stack Trace:

    org.nd4j.linalg.exception.ND4UnresolvedOutputVariables: Could not determine number of output variables for op split - Split. Ops can override getNumOutputs() to specify number of outputs if required
    	at org.nd4j.autodiff.samediff.SameDiff.generateOutputVariableForOp(SameDiff.java:4383)
    	at org.nd4j.linalg.api.ops.DynamicCustomOp.outputVariables(DynamicCustomOp.java:254)
    	at org.nd4j.linalg.api.ops.DynamicCustomOp.outputVariables(DynamicCustomOp.java:237)
    	at org.nd4j.autodiff.samediff.ops.SDBaseOps.split(SDBaseOps.java:4375)
    
    opened by partarstu 0
  • Fatal error, core dumped, when running GenerateTxtModel example with commit 03e11c727222f21d74aa34cc9a069eca7ab54b1a in CPU mode

    Issue Description

    • Expected behavior: GenerateTxtModel should run without error
    • Encountered behavior:
    mvn -nsu compile exec:java -Dexec.mainClass="org.deeplearning4j.examples.advanced.modelling.charmodelling.generatetext.GenerateTxtModel"
    [INFO] Scanning for projects...
    [INFO] 
    [INFO] ------------------< org.deeplearning4j:dl4j-examples >------------------
    [INFO] Building Introduction to DL4J 1.0.0-SNAPSHOT
    [INFO] --------------------------------[ jar ]---------------------------------
    [INFO] 
    [INFO] --- maven-enforcer-plugin:1.0.1:enforce (enforce-default) @ dl4j-examples ---
    [INFO] 
    [INFO] --- maven-resources-plugin:2.6:resources (default-resources) @ dl4j-examples ---
    [INFO] Using 'UTF-8' encoding to copy filtered resources.
    [INFO] Copying 2 resources
    [INFO] 
    [INFO] --- maven-compiler-plugin:3.8.1:compile (default-compile) @ dl4j-examples ---
    [INFO] Nothing to compile - all classes are up to date
    [INFO] 
    [INFO] --- exec-maven-plugin:1.4.0:java (default-cli) @ dl4j-examples ---
    Using existing text file at /tmp/Shakespeare.txt
    Loaded and converted file: 963172 valid characters of 969521 total characters (6349 removed)
    o.n.l.f.Nd4jBackend - Loaded [CpuBackend] backend
    o.n.n.NativeOpsHolder - Number of threads used for linear algebra: 12
    o.n.l.c.n.CpuNDArrayFactory - Binary level Generic x86 optimization level AVX/AVX2
    o.n.n.Nd4jBlas - Number of threads used for OpenMP BLAS: 12
    o.n.l.a.o.e.DefaultOpExecutioner - Backend used: [CPU]; OS: [Linux]
    o.n.l.a.o.e.DefaultOpExecutioner - Cores: [24]; Memory: [15.7GB];
    o.n.l.a.o.e.DefaultOpExecutioner - Blas vendor: [OPENBLAS]
    o.n.l.c.n.CpuBackend - Backend build information:
     GCC: "11.3.0"
    STD version: 201103L
    DEFAULT_ENGINE: samediff::ENGINE_CPU
    HAVE_FLATBUFFERS
    HAVE_OPENBLAS
    o.d.n.m.MultiLayerNetwork - Starting MultiLayerNetwork with WorkspaceModes set to [training: ENABLED; inference: ENABLED], cacheMode set to [NONE]
    
    =====================================================================================
    LayerName (LayerType)     nIn,nOut   TotalParams   ParamsShape                       
    =====================================================================================
    layer0 (LSTM)             77,200     222,400       W:{77,800}, RW:{200,800}, b:{800} 
    layer1 (LSTM)             200,200    320,800       W:{200,800}, RW:{200,800}, b:{800}
    layer2 (RnnOutputLayer)   200,77     15,477        W:{200,77}, b:{77}                
    -------------------------------------------------------------------------------------
                Total Parameters:  558,677
            Trainable Parameters:  558,677
               Frozen Parameters:  0
    =====================================================================================
    
    o.d.o.l.ScoreIterationListener - Score at iteration 0 is 217.36468444651516
    #
    # A fatal error has been detected by the Java Runtime Environment:
    #
    #  SIGSEGV (0xb) at pc=0x00007fd5d7e3e077, pid=3805136, tid=3805204
    #
    # JRE version: OpenJDK Runtime Environment Zulu18.30+11-CA (18.0.1+10) (build 18.0.1+10)
    # Java VM: OpenJDK 64-Bit Server VM Zulu18.30+11-CA (18.0.1+10, mixed mode, sharing, tiered, compressed oops, compressed class ptrs, g1 gc, linux-amd64)
    # Problematic frame:
    # C  [libnd4jcpu.so+0x130f077]  sd::ShapeDescriptor::ShapeDescriptor(long long const*, bool)+0x47
    #
    # Core dump will be written. Default location: Core dumps may be processed with "/usr/share/apport/apport -p%p -s%s -c%c -d%d -P%P -u%u -g%g -- %E" (or dumping to /home/greg_barton/dl4j/deeplearning4j-examples/dl4j-examples/core.3805136)
    #
    # An error report file with more information is saved as:
    # /home/greg_barton/dl4j/deeplearning4j-examples/dl4j-examples/hs_err_pid3805136.log
    #
    # If you would like to submit a bug report, please visit:
    #   http://www.azul.com/support/
    # The crash happened outside the Java Virtual Machine in native code.
    # See problematic frame for where to report the bug.
    #
    Aborted (core dumped)
    

    Version Information

    Please indicate relevant versions, including, if relevant:

    • Deeplearning4j version: master branch, commit 03e11c727222f21d74aa34cc9a069eca7ab54b1a
    • Platform information (OS, etc): Ubuntu 22.04 and OSX
    • CUDA version, if used: N/A
    • NVIDIA driver version, if in use: N/A

    hs_err_pid3805136.log.gz

    opened by gregbarton 0
  • Fixes #9869 linear layer equivalencies

    What changes were proposed in this pull request?

    Fixes linear layer equivalencies and adds a test.

    How was this patch tested?

    (Please explain how this patch was tested. E.g. unit tests, integration tests, manual tests)

    Quick checklist

    The following checklist helps ensure your PR is complete:

    • [x] Eclipse Contributor Agreement signed, and signed commits - see IP Requirements page for details
    • [x] Reviewed the Contributing Guidelines and followed the steps within.
    • [x] Created tests for any significant new code additions.
    • [x] Relevant tests for your changes are passing.
    opened by agibsonccc 0
  • Training using sd.nn.linear and sd.nn.reluLayer doesn't succeed

    Issue Description

    • expected behavior: The expressions input.mmul(weights).add(bias) and sd.nn.linear(input, weights, bias) should be equivalent.
    • encountered behavior: In the code below, training will only succeed when the first variant is used: variant 1 achieves 100% accuracy, while variant 2 doesn't get better than random guessing (often even 0% accuracy).

    I also added a third variant below, which uses sd.nn.reluLayer(input, weights, bias). Although this is not equivalent to the other two variants (it additionally has a ReLU activation function), it should nonetheless allow learning the task with high accuracy, but it doesn't (note that the weights are initialized all positive, so the ReLU should not make a difference).

    Caveat: I could only test this with M2.1 due to issue #9862, which is fixed but not yet in SNAPSHOT.

    int batchSize = 32;
    int modelDim = 10;
    
    SameDiff sd = SameDiff.create();
    
    SDVariable features = sd.placeHolder("features", FLOAT, batchSize, modelDim);
    SDVariable labels = sd.placeHolder("labels", FLOAT, batchSize, modelDim);
    SDVariable weights = sd.var("weights", new OneInitScheme('c'), FLOAT, modelDim, modelDim);
    SDVariable bias = sd.zero("bias", modelDim);
    // SDVariable predictions = features.mmul(weights).add("predictions", bias);         // <<< variant 1 (works)
    SDVariable predictions = sd.nn.linear("predictions", features, weights, bias);       // <<< variant 2 (doesn't work)
    // SDVariable predictions = sd.nn.reluLayer("predictions", features, weights, bias); // <<< variant 3 (doesn't work)
    sd.loss.meanSquaredError("loss", labels, predictions, null);
    
    TrainingConfig config = new TrainingConfig.Builder()
            .updater(new Adam(0.1))
            .dataSetFeatureMapping("features")
            .dataSetLabelMapping("labels")
            .build();
    sd.setTrainingConfig(config);
    
    // the task is to reconstruct the one-hot encoded input
    DataSetIterator iterator = new ReconstructionDataSetIterator(new RandomDataSetIterator(100, new long[]{batchSize, modelDim}, new long[]{}, ONE_HOT, ZEROS));
    
    sd.fit(iterator, 10);
    
    Evaluation evaluation = new Evaluation();
    sd.evaluate(iterator, "predictions", evaluation);
    System.out.println(evaluation.stats());
    

    Version Information

    • Deeplearning4j version: 1.0.0-M2.1
    • Platform information (OS, etc): Linux Mint 21
    • CUDA version, if used: N/A
    • NVIDIA driver version, if in use: N/A
    opened by CompilerCrash 2
Firestorm is a Remote Shuffle Service, and provides the capability for Apache Spark applications to store shuffle data on remote servers

What is Firestorm? Firestorm is a Remote Shuffle Service that provides the capability for Apache Spark applications to store shuffle data on remote servers.

Tencent 246 Nov 29, 2022
TensorFlow Lite Object Detection Android Demo

GSoC Project 2021 - TensorFlow Description This repository contains the project where I contributed to the TensorFlow Team during GSoC in the year 202

Sayan Nath 6 Dec 31, 2022
Oryx 2: Lambda architecture on Apache Spark, Apache Kafka for real-time large scale machine learning

Oryx 2 is a realization of the lambda architecture built on Apache Spark and Apache Kafka, but with specialization for real-time large scale machine l

Oryx Project 1.8k Dec 28, 2022
Data Structures and Algorithms (DSA) - Java Language Using Integrated Development Environments NetBeans

Data Structures and Algorithms (DSA) Course Code : CSC211 Credit Hours : 4 Language : JAVA Integrated development environments : NETBEANS Topic Covere

Ossama Mehmood 샘 2 Oct 1, 2022
Care aims to create an IoT solution to hospitals interconnecting smart monitors to decrease the time a doctor takes to respond to an emergency.

Care Description This project called Care, developed for the INFO1127 course - Software Engineering - aims to create an IoT solution to hospitals inte

null 5 Oct 4, 2022
Apache Spark - A unified analytics engine for large-scale data processing

Apache Spark Spark is a unified analytics engine for large-scale data processing. It provides high-level APIs in Scala, Java, Python, and R, and an op

The Apache Software Foundation 34.7k Jan 2, 2023
Flink/Spark Connectors for Apache Doris(Incubating)

Apache Doris (incubating) Connectors The repository contains connectors for Apache Doris (incubating) Flink Doris Connector More information about com

The Apache Software Foundation 30 Dec 7, 2022
Word Count in Apache Spark using Java

Word Count in Apache Spark using Java

Arjun Gautam 2 Feb 24, 2022
Transform ML models into a native code (Java, C, Python, Go, JavaScript, Visual Basic, C#, R, PowerShell, PHP, Dart, Haskell, Ruby, F#, Rust) with zero dependencies

m2cgen m2cgen (Model 2 Code Generator) - is a lightweight library which provides an easy way to transpile trained statistical models into a native cod

Bayes' Witnesses 2.3k Jan 4, 2023
SparkFE is the LLVM-based and high-performance Spark native execution engine which is designed for feature engineering.

Spark has rapidly emerged as the de facto standard for big data processing. However, it is not designed for machine learning which has more and more limitation in AI scenarios. SparkFE rewrite the execution engine in C++ and achieve more than 6x performance improvement for feature extraction. It guarantees the online-offline consistency which makes AI landing much easier. For further details, please refer to SparkFE Documentation.

4Paradigm 67 Jun 10, 2021
Sparkling Water provides H2O functionality inside Spark cluster

Sparkling Water Sparkling Water integrates H2O's fast scalable machine learning engine with Spark. It provides: Utilities to publish Spark data struct

H2O.ai 939 Jan 2, 2023
Serverless proxy for Spark cluster

Hydrosphere Mist Hydrosphere Mist is a serverless proxy for Spark cluster. Mist provides a new functional programming framework and deployment model f

hydrosphere.io 317 Dec 1, 2022
Spark interface for Drsti

Drsti for Spark (ai.jgp.drsti-spark) Spark interface for Drsti Resources Bringing vision to Apache Spark (2021-09-21) introduces Drsti and explains ho

Jean-Georges 3 Sep 22, 2021
calculator when you be using a model that employs RPN (Reverse Polish Notation)

calculator when you be using a model that employs RPN (Reverse Polish Notation) in its calculations and be a custom build all at the same time? The kids may have colour TFTs and graphing functions, but your keyboard has no equals sign, and that means something.

Eslam ElBeak 8 Oct 28, 2021
mBERT is a mutation testing tool that uses a pre-trained language model (CodeBERT) to generate mutants.

mBERT is a mutation testing tool that uses a pre-trained language model (CodeBERT) to generate mutants.

null 7 Oct 22, 2022
Archinsight project tends to implement architecture-as-code definition of a standard c4 architecture model

Archinsight project tends to implement architecture-as-code definition of a standard c4 architecture model. This project offers a new Insight language designed in such way that an Architect can focus on architecture definition, not visualization. Compared to UML, the Insight language is more specific and is unable to describe an arbitrary entity, but shorter and probably easier to use.

null 25 Nov 24, 2022
👄 The most accurate natural language detection library for Java and the JVM, suitable for long and short text alike

Quick Info this library tries to solve language detection of very short words and phrases, even shorter than tweets makes use of both statistical and

Peter M. Stahl 532 Dec 28, 2022