Running compute-intense parts of BigStitcher distributed

Overview

BigStitcher-Spark

BigStitcher-Spark runs the compute-intensive parts of BigStitcher in a distributed fashion. For now we support fusion with affine transformation models (which of course includes translations). It should scale well to large datasets, since for each block that is written it tests which input images overlap that block. You simply need to specify the XML of a BigStitcher project and decide which channels, timepoints, etc. to fuse. Warning: not yet tested on 2D data.

Sharing this early as it might be useful ...

Here is my example configuration for this example dataset, using the main class net.preibisch.bigstitcher.spark.AffineFusion:

-x '~/test/dataset.xml'
-o '~/test/test-spark.n5'
-d '/ch488/s0'
--UINT8
--minIntensity 1
--maxIntensity 254
--channelId 0

Note: here I save the result as UINT8 [0..255] and scale all intensities between 1 and 254 to that range (so it is more obvious what happens). If you omit UINT8, the result is saved as FLOAT32, and minIntensity/maxIntensity are not required. UINT16 [0..65535] is also supported.
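To make the intensity mapping concrete, here is a minimal sketch of the rescaling described above (clamp to [minIntensity, maxIntensity], then map linearly onto [0..255]). This is an illustration of the mapping, not BigStitcher-Spark's actual code:

```python
def rescale_to_uint8(value, min_intensity, max_intensity):
    """Clamp value to [min_intensity, max_intensity] and map it linearly
    onto the UINT8 range [0, 255]. Illustrative sketch only."""
    v = min(max(float(value), min_intensity), max_intensity)
    return round((v - min_intensity) / (max_intensity - min_intensity) * 255)

# with --minIntensity 1 --maxIntensity 254:
print([rescale_to_uint8(v, 1, 254) for v in (0, 1, 254, 300)])
# intensities at or below 1 map to 0; at or above 254 they map to 255
```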

Importantly, since the dataset has more than one channel, I specified channel 0; otherwise all channels would be fused together, which is most likely not desired. The same applies if multiple timepoints are present.

The block size is currently hardcoded to 128x128x128, but could easily be exposed as another parameter (pull requests welcome :).
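The per-block overlap test mentioned in the overview can be sketched as follows: cut the output volume into a grid of fixed-size blocks and, for each block, keep only the input images whose bounding boxes intersect it. The block size and tile boxes below are made-up examples, not taken from the project:

```python
from itertools import product

BLOCK = (128, 128, 128)  # hardcoded block size, as in the text above

def blocks(shape, block=BLOCK):
    """Yield (min corner, max-exclusive corner) for each grid block covering shape."""
    steps = [range(0, s, b) for s, b in zip(shape, block)]
    for corner in product(*steps):
        yield corner, tuple(min(c + b, s) for c, b, s in zip(corner, block, shape))

def overlaps(block_min, block_max, img_min, img_max):
    """Axis-aligned bounding-box intersection test (half-open intervals)."""
    return all(bmin < imax and imin < bmax
               for bmin, bmax, imin, imax in zip(block_min, block_max, img_min, img_max))

# two hypothetical overlapping tiles inside a 256^3 output volume
tiles = {"tile0": ((0, 0, 0), (150, 256, 256)),
         "tile1": ((100, 0, 0), (256, 256, 256))}
for bmin, bmax in blocks((256, 256, 256)):
    hit = [name for name, (imin, imax) in tiles.items() if overlaps(bmin, bmax, imin, imax)]
    print(bmin, "->", hit)  # only these tiles need to be read to fuse this block
```

Only the images returned by the overlap test have to be loaded and transformed for a given block, which is why the approach scales to large datasets.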

For running Spark locally you need these JVM parameters (e.g. 8 cores, 50 GB RAM):

-Dspark.master=local[8] -Xmx50G

Ask your sysadmin for help with running it on your cluster. Running mvn clean package builds target/BigStitcher-Spark-jar-with-dependencies.jar for distribution.

You can open the N5 in Fiji (File > Import > N5) or by using n5-view from the n5-utils package (https://github.com/saalfeldlab/n5-utils).

You can create a multiresolution pyramid of this data using https://github.com/saalfeldlab/n5-spark

Update: there is now support for distributed non-rigid fusion via net.preibisch.bigstitcher.spark.NonRigidFusionSpark. To run it, one additionally needs to specify the corresponding interest points (e.g. -ip beads) that will be used to compute the non-rigid transformation.

Comments
  • could not find XmlIoBasicImgLoader implementation for format bdv.n5

    Hi @StephanPreibisch,

    Tried running this on the Janelia cluster and got this error.

    mpicbg.spim.data.SpimDataInstantiationException: could not find XmlIoBasicImgLoader implementation for format bdv.n5
    

    I am using the latest spark-janelia from here

    This is how I built the repo and ran it:
      [login1 - moharb@e05u15]~>git clone https://github.com/PreibischLab/BigStitcher-Spark.git
    Cloning into 'BigStitcher-Spark'...
    remote: Enumerating objects: 181, done.
    remote: Counting objects: 100% (181/181), done.
    remote: Compressing objects: 100% (104/104), done.
    remote: Total 181 (delta 69), reused 108 (delta 22), pack-reused 0
    Receiving objects: 100% (181/181), 35.31 KiB | 2.35 MiB/s, done.
    Resolving deltas: 100% (69/69), done.
    [login1 - moharb@e05u15]~>cd BigStitcher-Spark/
    [login1 - moharb@e05u15]~/BigStitcher-Spark>~/apache-maven-3.8.4/bin/mvn clean package
    [INFO] Scanning for projects...
    [INFO]
    [INFO] ------------------< net.preibisch:BigStitcher-Spark >-------------------
    [INFO] Building BigStitcher Spark 0.0.1-SNAPSHOT
    [INFO] --------------------------------[ jar ]---------------------------------
    [INFO]
    [INFO] --- maven-clean-plugin:3.1.0:clean (default-clean) @ BigStitcher-Spark ---
    [INFO]
    [INFO] --- maven-enforcer-plugin:3.0.0-M3:enforce (enforce-rules) @ BigStitcher-Spark ---
    [INFO] Adding ignore: module-info
    [INFO] Adding ignore: META-INF/versions/*/module-info
    [INFO] Adding ignore: com.esotericsoftware.kryo.*
    [INFO] Adding ignore: com.esotericsoftware.minlog.*
    [INFO] Adding ignore: com.esotericsoftware.reflectasm.*
    [INFO] Adding ignore: com.google.inject.*
    [INFO] Adding ignore: jnr.ffi.*
    [INFO] Adding ignore: org.apache.hadoop.yarn.*.package-info
    [INFO] Adding ignore: org.apache.spark.unused.UnusedStubClass
    [INFO] Adding ignore: org.hibernate.stat.ConcurrentStatisticsImpl
    [INFO] Adding ignore: org.jetbrains.kotlin.daemon.common.*
    [INFO] Adding ignore: org.junit.runner.Runner
    [INFO] Adding ignore: module-info
    [INFO] Adding ignore: module-info
    [INFO]
    [INFO] --- build-helper-maven-plugin:3.0.0:regex-property (sanitize-version) @ BigStitcher-Spark ---
    [INFO]
    [INFO] --- build-helper-maven-plugin:3.0.0:regex-property (guess-package) @ BigStitcher-Spark ---
    [INFO]
    [INFO] --- buildnumber-maven-plugin:1.4:create (default) @ BigStitcher-Spark ---
    [INFO] Executing: /bin/sh -c cd '/groups/spruston/home/moharb/BigStitcher-Spark' && 'git' 'rev-parse' '--verify' 'HEAD'
    [INFO] Working directory: /groups/spruston/home/moharb/BigStitcher-Spark
    [INFO] Storing buildNumber: e2b676364526588195f16931e998a7a756ca778b at timestamp: 1640470297294
    [INFO] Storing buildScmBranch: main
    [INFO]
    [INFO] --- scijava-maven-plugin:2.0.0:set-rootdir (set-rootdir) @ BigStitcher-Spark ---
    [INFO] Setting rootdir: /groups/spruston/home/moharb/BigStitcher-Spark
    [INFO]
    [INFO] --- jacoco-maven-plugin:0.8.6:prepare-agent (jacoco-initialize) @ BigStitcher-Spark ---
    [WARNING] The artifact xml-apis:xml-apis:jar:2.0.2 has been relocated to xml-apis:xml-apis:jar:1.0.b2
    [INFO] argLine set to -javaagent:/groups/spruston/home/moharb/.m2/repository/org/jacoco/org.jacoco.agent/0.8.6/org.jacoco.agent-0.8.6-runtime.jar=destfile=/groups/spruston/home/moharb/BigStitcher-Spark/target/jacoco.exec
    [INFO]
    [INFO] --- maven-resources-plugin:3.1.0:resources (default-resources) @ BigStitcher-Spark ---
    [INFO] Using 'UTF-8' encoding to copy filtered resources.
    [INFO] skip non existing resourceDirectory /groups/spruston/home/moharb/BigStitcher-Spark/src/main/resources
    [INFO]
    [INFO] --- maven-compiler-plugin:3.8.1:compile (default-compile) @ BigStitcher-Spark ---
    [INFO] Compiling 6 source files to /groups/spruston/home/moharb/BigStitcher-Spark/target/classes
    [INFO]
    [INFO] --- maven-resources-plugin:3.1.0:testResources (default-testResources) @ BigStitcher-Spark ---
    [INFO] Using 'UTF-8' encoding to copy filtered resources.
    [INFO] skip non existing resourceDirectory /groups/spruston/home/moharb/BigStitcher-Spark/src/test/resources
    [INFO]
    [INFO] --- maven-compiler-plugin:3.8.1:testCompile (default-testCompile) @ BigStitcher-Spark ---
    [INFO] No sources to compile
    [INFO]
    [INFO] --- maven-surefire-plugin:2.22.2:test (default-test) @ BigStitcher-Spark ---
    [INFO] No tests to run.
    [INFO]
    [INFO] --- maven-jar-plugin:3.2.0:jar (default-jar) @ BigStitcher-Spark ---
    [INFO] Building jar: /groups/spruston/home/moharb/BigStitcher-Spark/target/BigStitcher-Spark.jar
    [INFO]
    [INFO] >>> maven-source-plugin:3.2.1:jar (attach-sources-jar) > generate-sources @ BigStitcher-Spark >>>
    [INFO]
    [INFO] --- maven-enforcer-plugin:3.0.0-M3:enforce (enforce-rules) @ BigStitcher-Spark ---
    [INFO] Adding ignore: module-info
    [INFO] Adding ignore: META-INF/versions/*/module-info
    [INFO] Adding ignore: com.esotericsoftware.kryo.*
    [INFO] Adding ignore: com.esotericsoftware.minlog.*
    [INFO] Adding ignore: com.esotericsoftware.reflectasm.*
    [INFO] Adding ignore: com.google.inject.*
    [INFO] Adding ignore: jnr.ffi.*
    [INFO] Adding ignore: org.apache.hadoop.yarn.*.package-info
    [INFO] Adding ignore: org.apache.spark.unused.UnusedStubClass
    [INFO] Adding ignore: org.hibernate.stat.ConcurrentStatisticsImpl
    [INFO] Adding ignore: org.jetbrains.kotlin.daemon.common.*
    [INFO] Adding ignore: org.junit.runner.Runner
    [INFO] Adding ignore: module-info
    [INFO] Adding ignore: module-info
    [INFO]
    [INFO] --- build-helper-maven-plugin:3.0.0:regex-property (sanitize-version) @ BigStitcher-Spark ---
    [INFO]
    [INFO] --- build-helper-maven-plugin:3.0.0:regex-property (guess-package) @ BigStitcher-Spark ---
    [INFO]
    [INFO] --- buildnumber-maven-plugin:1.4:create (default) @ BigStitcher-Spark ---
    [INFO]
    [INFO] --- scijava-maven-plugin:2.0.0:set-rootdir (set-rootdir) @ BigStitcher-Spark ---
    [INFO]
    [INFO] --- jacoco-maven-plugin:0.8.6:prepare-agent (jacoco-initialize) @ BigStitcher-Spark ---
    [INFO] argLine set to -javaagent:/groups/spruston/home/moharb/.m2/repository/org/jacoco/org.jacoco.agent/0.8.6/org.jacoco.agent-0.8.6-runtime.jar=destfile=/groups/spruston/home/moharb/BigStitcher-Spark/target/jacoco.exec
    [INFO]
    [INFO] <<< maven-source-plugin:3.2.1:jar (attach-sources-jar) < generate-sources @ BigStitcher-Spark <<<
    [INFO]
    [INFO]
    [INFO] --- maven-source-plugin:3.2.1:jar (attach-sources-jar) @ BigStitcher-Spark ---
    [INFO] Building jar: /groups/spruston/home/moharb/BigStitcher-Spark/target/BigStitcher-Spark-sources.jar
    [INFO]
    [INFO] --- jacoco-maven-plugin:0.8.6:report (jacoco-site) @ BigStitcher-Spark ---
    [INFO] Skipping JaCoCo execution due to missing execution data file.
    [INFO]
    [INFO] --- maven-assembly-plugin:3.1.1:single (make-assembly) @ BigStitcher-Spark ---
    [INFO] Building jar: /groups/spruston/home/moharb/BigStitcher-Spark/target/BigStitcher-Spark-jar-with-dependencies.jar
    [INFO]
    [INFO] --- maven-jar-plugin:3.2.0:test-jar (default) @ BigStitcher-Spark ---
    [INFO] Skipping packaging of the test-jar
    [INFO] ------------------------------------------------------------------------
    [INFO] BUILD SUCCESS
    [INFO] ------------------------------------------------------------------------
    [INFO] Total time:  32.978 s
    [INFO] Finished at: 2021-12-25T17:12:02-05:00
    [INFO] ------------------------------------------------------------------------
    [login1 - moharb@e05u15]~/BigStitcher-Spark>TERMINATE=1 RUNTIME=8:00 TMPDIR=~/tmp ~/spark-janelia/flintstone.sh 3 \
    > ~/BigStitcher-Spark/target/BigStitcher-Spark-jar-with-dependencies.jar \
    > net.preibisch.bigstitcher.spark.AffineFusion \
    > -x '/nrs/svoboda/moharb/dataset.xml' \
    > -o  '/nrs/svoboda/moharb/output3.n5' \
    > -d '/GFP/s0' \
    > --channelId 0 \
    > --UINT8 \
    > --minIntensity 1 \
    > --maxIntensity 254
    
    On e05u15.int.janelia.org with Python 3.6.10 :: Anaconda, Inc., running:
    
      /groups/spruston/home/moharb/spark-janelia/spark-janelia  --consolidate_logs --nnodes=3 --gb_per_slot=15 --driverslots=32 --worker_slots=32 --minworkers=1 --hard_runtime=8:00 --submitargs=" --verbose --conf spark.default.parallelism=270 --conf spark.executor.instances=6 --conf spark.executor.cores=5 --conf spark.executor.memory=75g --class net.preibisch.bigstitcher.spark.AffineFusion /groups/spruston/home/moharb/BigStitcher-Spark/target/BigStitcher-Spark-jar-with-dependencies.jar -x /nrs/svoboda/moharb/dataset.xml -o /nrs/svoboda/moharb/output3.n5 -d /GFP/s0 --channelId 0 --UINT8 --minIntensity 1 --maxIntensity 254" generate-and-launch-run
    
    
    Created:
      /groups/spruston/home/moharb/.spark/20211225_171249/conf
      /groups/spruston/home/moharb/.spark/20211225_171249/logs
      /groups/spruston/home/moharb/.spark/20211225_171249/scripts
    
    Running:
      /groups/spruston/home/moharb/.spark/20211225_171249/scripts/00-queue-lsf-jobs.sh
    
    Sat Dec 25 17:12:49 EST 2021 [e05u15.int.janelia.org] submitting jobs to scheduler
    This job will be billed to svoboda
    Job <114095184> is submitted to default queue <local>.
    This job will be billed to svoboda
    Job <114095185> is submitted to default queue <short>.
    This job will be billed to svoboda
    Job <114095186> is submitted to default queue <local>.
    This job will be billed to svoboda
    Job <114095189> is submitted to default queue <local>.
    This job will be billed to svoboda
    Job <114095190> is submitted to default queue <short>.
    
    Queued jobs are:
    JOBID      USER    STAT  QUEUE      FROM_HOST   EXEC_HOST   JOB_NAME   SUBMIT_TIME
    114095184  moharb  PEND  local      e05u15                  *171249_ma Dec 25 17:12
    114095185  moharb  PEND  short      e05u15                  *171249_ur Dec 25 17:12
    114095186  moharb  PEND  local      e05u15                  *249_wo[1] Dec 25 17:12
    114095186  moharb  PEND  local      e05u15                  *249_wo[2] Dec 25 17:12
    114095186  moharb  PEND  local      e05u15                  *249_wo[3] Dec 25 17:12
    114095189  moharb  PEND  local      e05u15                  *171249_dr Dec 25 17:12
    114095190  moharb  PEND  short      e05u15                  *171249_sd Dec 25 17:12
    
    
    To get web user interface URL after master has started, run:
      grep "Bound MasterWebUI to" /groups/spruston/home/moharb/.spark/20211225_171249/logs/01-master.log
    
    
    These are the driver logs with the error:
    Sat Dec 25 17:13:34 EST 2021 [h07u13] running /misc/local/spark-3.0.1/bin/spark-submit --deploy-mode client --master spark://10.36.107.39:7077  --verbose --conf spark.default.parallelism=270 --conf spark.executor.instances=6 --conf spark.executor.cores=5 --conf spark.executor.memory=75g --class net.preibisch.bigstitcher.spark.AffineFusion /groups/spruston/home/moharb/BigStitcher-Spark/target/BigStitcher-Spark-jar-with-dependencies.jar -x /nrs/svoboda/moharb/dataset.xml -o /nrs/svoboda/moharb/output3.n5 -d /GFP/s0 --channelId 0 --UINT8 --minIntensity 1 --maxIntensity 254
    Using properties file: /groups/spruston/home/moharb/.spark/20211225_171249/conf/spark-defaults.conf
    Adding default property: spark.storage.blockManagerHeartBeatMs=30000
    Adding default property: spark.driver.maxResultSize=0
    Adding default property: spark.kryoserializer.buffer.max=1024m
    Adding default property: spark.rpc.askTimeout=300s
    Adding default property: spark.driver.memory=479g
    Adding default property: spark.submit.deployMode=cluster
    Adding default property: spark.rpc.retry.wait=30s
    Adding default property: spark.core.connection.ack.wait.timeout=600s
    Parsed arguments:
      master                  spark://10.36.107.39:7077
      deployMode              client
      executorMemory          75g
      executorCores           5
      totalExecutorCores      null
      propertiesFile          /groups/spruston/home/moharb/.spark/20211225_171249/conf/spark-defaults.conf
      driverMemory            479g
      driverCores             null
      driverExtraClassPath    null
      driverExtraLibraryPath  null
      driverExtraJavaOptions  null
      supervise               false
      queue                   null
      numExecutors            6
      files                   null
      pyFiles                 null
      archives                null
      mainClass               net.preibisch.bigstitcher.spark.AffineFusion
      primaryResource         file:/groups/spruston/home/moharb/BigStitcher-Spark/target/BigStitcher-Spark-jar-with-dependencies.jar
      name                    net.preibisch.bigstitcher.spark.AffineFusion
      childArgs               [-x /nrs/svoboda/moharb/dataset.xml -o /nrs/svoboda/moharb/output3.n5 -d /GFP/s0 --channelId 0 --UINT8 --minIntensity 1 --maxIntensity 254]
      jars                    null
      packages                null
      packagesExclusions      null
      repositories            null
      verbose                 true
    
    Spark properties used, including those specified through
     --conf and those from the properties file /groups/spruston/home/moharb/.spark/20211225_171249/conf/spark-defaults.conf:
      (spark.default.parallelism,270)
      (spark.driver.memory,479g)
      (spark.executor.instances,6)
      (spark.executor.memory,75g)
      (spark.rpc.askTimeout,300s)
      (spark.storage.blockManagerHeartBeatMs,30000)
      (spark.kryoserializer.buffer.max,1024m)
      (spark.submit.deployMode,cluster)
      (spark.core.connection.ack.wait.timeout,600s)
      (spark.driver.maxResultSize,0)
      (spark.rpc.retry.wait,30s)
      (spark.executor.cores,5)
    
        
    2021-12-25 17:13:36,473 [main] WARN [NativeCodeLoader]: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
    Main class:
    net.preibisch.bigstitcher.spark.AffineFusion
    Arguments:
    -x
    /nrs/svoboda/moharb/dataset.xml
    -o
    /nrs/svoboda/moharb/output3.n5
    -d
    /GFP/s0
    --channelId
    0
    --UINT8
    --minIntensity
    1
    --maxIntensity
    254
    Spark config:
    (spark.storage.blockManagerHeartBeatMs,30000)
    (spark.driver.maxResultSize,0)
    (spark.jars,file:/groups/spruston/home/moharb/BigStitcher-Spark/target/BigStitcher-Spark-jar-with-dependencies.jar)
    (spark.kryoserializer.buffer.max,1024m)
    (spark.app.name,net.preibisch.bigstitcher.spark.AffineFusion)
    (spark.rpc.askTimeout,300s)
    (spark.driver.memory,479g)
    (spark.executor.instances,6)
    (spark.submit.pyFiles,)
    (spark.default.parallelism,270)
    (spark.submit.deployMode,client)
    (spark.master,spark://10.36.107.39:7077)
    (spark.rpc.retry.wait,30s)
    (spark.executor.memory,75g)
    (spark.executor.cores,5)
    (spark.core.connection.ack.wait.timeout,600s)
    Classpath elements:
    file:/groups/spruston/home/moharb/BigStitcher-Spark/target/BigStitcher-Spark-jar-with-dependencies.jar
    
    
    [-x, /nrs/svoboda/moharb/dataset.xml, -o, /nrs/svoboda/moharb/output3.n5, -d, /GFP/s0, --channelId, 0, --UINT8, --minIntensity, 1, --maxIntensity, 254]
    mpicbg.spim.data.SpimDataInstantiationException: could not find XmlIoBasicImgLoader implementation for format bdv.n5
    	at mpicbg.spim.data.generic.sequence.ImgLoaders.createXmlIoForFormat(ImgLoaders.java:72)
    	at mpicbg.spim.data.generic.sequence.XmlIoAbstractSequenceDescription.fromXml(XmlIoAbstractSequenceDescription.java:110)
    	at mpicbg.spim.data.generic.XmlIoAbstractSpimData.fromXml(XmlIoAbstractSpimData.java:153)
    	at net.preibisch.mvrecon.fiji.spimdata.XmlIoSpimData2.fromXml(XmlIoSpimData2.java:164)
    	at net.preibisch.mvrecon.fiji.spimdata.XmlIoSpimData2.fromXml(XmlIoSpimData2.java:52)
    	at mpicbg.spim.data.generic.XmlIoAbstractSpimData.load(XmlIoAbstractSpimData.java:95)
    	at net.preibisch.bigstitcher.spark.AffineFusion.call(AffineFusion.java:94)
    	at net.preibisch.bigstitcher.spark.AffineFusion.call(AffineFusion.java:39)
    	at picocli.CommandLine.executeUserObject(CommandLine.java:1853)
    	at picocli.CommandLine.access$1100(CommandLine.java:145)
    	at picocli.CommandLine$RunLast.executeUserObjectOfLastSubcommandWithSameParent(CommandLine.java:2255)
    	at picocli.CommandLine$RunLast.handle(CommandLine.java:2249)
    	at picocli.CommandLine$RunLast.handle(CommandLine.java:2213)
    	at picocli.CommandLine$AbstractParseResultHandler.execute(CommandLine.java:2080)
    	at picocli.CommandLine.execute(CommandLine.java:1978)
    	at net.preibisch.bigstitcher.spark.AffineFusion.main(AffineFusion.java:283)
    	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    	at java.lang.reflect.Method.invoke(Method.java:498)
    	at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
    	at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:928)
    	at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180)
    	at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203)
    	at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90)
    	at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1007)
    	at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1016)
    	at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
    2021-12-25 17:13:37,007 [Thread-1] INFO [ShutdownHookManager]: Shutdown hook called
    2021-12-25 17:13:37,007 [Thread-1] INFO [ShutdownHookManager]: Deleting directory /tmp/spark-fcf876a8-e1e9-4de3-bffe-d05ee3fb0a4a
    
    ------------------------------------------------------------
    Sender: LSF System <lsfadmin@h07u13>
    Subject: Job 114095189: <spark_moharb_20211225_171249_dr> in cluster <Janelia> Exited
    
    Job <spark_moharb_20211225_171249_dr> was submitted from host <e05u15> by user <moharb> in cluster <Janelia> at Sat Dec 25 17:12:49 2021
    Job was executed on host(s) <32*h07u13>, in queue <local>, as user <moharb> in cluster <Janelia> at Sat Dec 25 17:13:31 2021
    </groups/spruston/home/moharb> was used as the home directory.
    </groups/spruston/home/moharb/BigStitcher-Spark> was used as the working directory.
    Started at Sat Dec 25 17:13:31 2021
    Terminated at Sat Dec 25 17:13:37 2021
    Results reported at Sat Dec 25 17:13:37 2021
    
    Your job looked like:
    
    ------------------------------------------------------------
    # LSBATCH: User input
    /groups/spruston/home/moharb/.spark/20211225_171249/scripts/04-launch-driver.sh
    ------------------------------------------------------------
    
    Exited with exit code 1.
    
    Resource usage summary:
    
        CPU time :                                   3.96 sec.
        Max Memory :                                 11 MB
        Average Memory :                             11.00 MB
        Total Requested Memory :                     491520.00 MB
        Delta Memory :                               491509.00 MB
        Max Swap :                                   -
        Max Processes :                              4
        Max Threads :                                5
        Run time :                                   6 sec.
        Turnaround time :                            48 sec.
    
    The output (if any) is above this job summary.
    

    The same happens for nonRigid; am I doing something wrong?

    Thanks! Boaz

    opened by boazmohar 29
  • Fusion fails on local Spark instance

    Hi @trautmane,

    As requested, here are the details on what we are running into when trying to fuse a BDV file using BigStitcher-Spark. The plugin was built with the code changes on main, but not fix_bdv_n5, as I got a conflict when I tried to merge the two branches. I've attached the XML as well.

    It wasn't totally clear whether the extraJavaOptions should be passed to the driver or the executors in local mode, so we tried both; the same error message as pasted here pops up either way. We also tried allocating more RAM to the executors, with the same result.

    The error usually occurs once >6,000 files within the N5 have been written. In this particular case, ~7,200 files were written.

    Please let me know what other information I can provide.

    Thanks! Doug

    Linux version
    Linux qi2labserver 5.4.0-74-generic #83~18.04.1-Ubuntu SMP Tue May 11 16:01:00 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux
    
    Maven and java details
    Apache Maven 3.6.0
    Maven home: /usr/share/maven
    Java version: 1.8.0_312, vendor: Private Build, runtime: /usr/lib/jvm/java-8-openjdk-amd64/jre
    Default locale: en_US, platform encoding: UTF-8
    OS name: "linux", version: "5.4.0-74-generic", arch: "amd64", family: "unix"
    
    Spark details
    22/01/04 11:54:33 WARN Utils: Your hostname, qi2labserver resolves to a loopback address: 127.0.1.1; using 10.206.25.77 instead (on interface enp5s0f0)
    22/01/04 11:54:33 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to another address
    Welcome to
          ____              __
         / __/__  ___ _____/ /__
        _\ \/ _ \/ _ `/ __/  '_/
       /___/ .__/\_,_/_/ /_/\_\   version 3.2.0
          /_/
    
    Using Scala version 2.12.15, OpenJDK 64-Bit Server VM, 1.8.0_312
    Branch HEAD
    Compiled by user ubuntu on 2021-10-06T12:46:30Z
    Revision 5d45a415f3a29898d92380380cfd82bfc7f579ea
    Url https://github.com/apache/spark
    Type --help for more information.
    
    Call to Spark
    spark-submit --master local[32,8] 
    --conf spark.driver.memory=100G 
    --conf "spark.executor.extraJavaOptions=-XX:ActiveProcessorCount=1" 
    --class net.preibisch.bigstitcher.spark.AffineFusion ~/Documents/github/BigStitcher-Spark/target/BigStitcher-Spark-0.0.1-SNAPSHOT.jar 
    -x /mnt/opm2/20210924b/deskew_flatfield_output/bdv/AMC_cy7_test_bdv.xml 
    -o /mnt/opm2/20210924b/n5/output.n5 
    -d /DAPI/s0 
    --channelId 0 
    --UINT16 
    --minIntensity 0
    --maxIntensity 65535
    
    Error message
    (Tue Jan 04 11:21:37 MST 2022): Requesting Img from ImgLoader (tp=0, setup=7), using level=0, [1.0 x 1.0 x 1.0]
    22/01/04 11:21:37 ERROR Executor: Exception in task 28.0 in stage 0.0 (TID 28)
    java.lang.OutOfMemoryError: unable to create new native thread
            at java.lang.Thread.start0(Native Method)
            at java.lang.Thread.start(Thread.java:717)
            at net.imglib2.cache.queue.FetcherThreads.<init>(FetcherThreads.java:92)
            at net.imglib2.cache.queue.FetcherThreads.<init>(FetcherThreads.java:70)
            at bdv.img.hdf5.Hdf5ImageLoader.open(Hdf5ImageLoader.java:209)
            at bdv.img.hdf5.Hdf5ImageLoader.<init>(Hdf5ImageLoader.java:158)
            at bdv.img.hdf5.Hdf5ImageLoader.<init>(Hdf5ImageLoader.java:144)
            at bdv.img.hdf5.Hdf5ImageLoader.<init>(Hdf5ImageLoader.java:139)
            at bdv.img.hdf5.XmlIoHdf5ImageLoader.fromXml(XmlIoHdf5ImageLoader.java:70)
            at bdv.img.hdf5.XmlIoHdf5ImageLoader.fromXml(XmlIoHdf5ImageLoader.java:49)
            at mpicbg.spim.data.generic.sequence.XmlIoAbstractSequenceDescription.fromXml(XmlIoAbstractSequenceDescription.java:111)
            at mpicbg.spim.data.generic.XmlIoAbstractSpimData.fromXml(XmlIoAbstractSpimData.java:153)
            at net.preibisch.mvrecon.fiji.spimdata.XmlIoSpimData2.fromXml(XmlIoSpimData2.java:164)
            at net.preibisch.mvrecon.fiji.spimdata.XmlIoSpimData2.fromXml(XmlIoSpimData2.java:52)
            at mpicbg.spim.data.generic.XmlIoAbstractSpimData.load(XmlIoAbstractSpimData.java:95)
            at net.preibisch.bigstitcher.spark.AffineFusion.lambda$call$c48314ca$1(AffineFusion.java:208)
            at org.apache.spark.api.java.JavaRDDLike.$anonfun$foreach$1(JavaRDDLike.scala:352)
            at org.apache.spark.api.java.JavaRDDLike.$anonfun$foreach$1$adapted(JavaRDDLike.scala:352)
            at scala.collection.Iterator.foreach(Iterator.scala:943)
            at scala.collection.Iterator.foreach$(Iterator.scala:943)
            at org.apache.spark.InterruptibleIterator.foreach(InterruptibleIterator.scala:28)
            at org.apache.spark.rdd.RDD.$anonfun$foreach$2(RDD.scala:1012)
            at org.apache.spark.rdd.RDD.$anonfun$foreach$2$adapted(RDD.scala:1012)
            at org.apache.spark.SparkContext.$anonfun$runJob$5(SparkContext.scala:2254)
            at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
            at org.apache.spark.scheduler.Task.run(Task.scala:131)
            at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:506)
            at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1462)
            at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:509)
            at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
            at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
            at java.lang.Thread.run(Thread.java:748)
    22/01/04 11:21:37 ERROR SparkUncaughtExceptionHandler: Uncaught exception in thread Thread[Executor task launch worker for task 28.0 in stage 0.0 (TID 28),5,main]
    java.lang.OutOfMemoryError: unable to create new native thread
            at java.lang.Thread.start0(Native Method)
            at java.lang.Thread.start(Thread.java:717)
            at net.imglib2.cache.queue.FetcherThreads.<init>(FetcherThreads.java:92)
            at net.imglib2.cache.queue.FetcherThreads.<init>(FetcherThreads.java:70)
            at bdv.img.hdf5.Hdf5ImageLoader.open(Hdf5ImageLoader.java:209)
            at bdv.img.hdf5.Hdf5ImageLoader.<init>(Hdf5ImageLoader.java:158)
            at bdv.img.hdf5.Hdf5ImageLoader.<init>(Hdf5ImageLoader.java:144)
            at bdv.img.hdf5.Hdf5ImageLoader.<init>(Hdf5ImageLoader.java:139)
            at bdv.img.hdf5.XmlIoHdf5ImageLoader.fromXml(XmlIoHdf5ImageLoader.java:70)
            at bdv.img.hdf5.XmlIoHdf5ImageLoader.fromXml(XmlIoHdf5ImageLoader.java:49)
            at mpicbg.spim.data.generic.sequence.XmlIoAbstractSequenceDescription.fromXml(XmlIoAbstractSequenceDescription.java:111)
            at mpicbg.spim.data.generic.XmlIoAbstractSpimData.fromXml(XmlIoAbstractSpimData.java:153)
            at net.preibisch.mvrecon.fiji.spimdata.XmlIoSpimData2.fromXml(XmlIoSpimData2.java:164)
            at net.preibisch.mvrecon.fiji.spimdata.XmlIoSpimData2.fromXml(XmlIoSpimData2.java:52)
            at mpicbg.spim.data.generic.XmlIoAbstractSpimData.load(XmlIoAbstractSpimData.java:95)
            at net.preibisch.bigstitcher.spark.AffineFusion.lambda$call$c48314ca$1(AffineFusion.java:208)
            at org.apache.spark.api.java.JavaRDDLike.$anonfun$foreach$1(JavaRDDLike.scala:352)
            at org.apache.spark.api.java.JavaRDDLike.$anonfun$foreach$1$adapted(JavaRDDLike.scala:352)
            at scala.collection.Iterator.foreach(Iterator.scala:943)
            at scala.collection.Iterator.foreach$(Iterator.scala:943)
            at org.apache.spark.InterruptibleIterator.foreach(InterruptibleIterator.scala:28)
            at org.apache.spark.rdd.RDD.$anonfun$foreach$2(RDD.scala:1012)
            at org.apache.spark.rdd.RDD.$anonfun$foreach$2$adapted(RDD.scala:1012)
            at org.apache.spark.SparkContext.$anonfun$runJob$5(SparkContext.scala:2254)
            at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
            at org.apache.spark.scheduler.Task.run(Task.scala:131)
            at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:506)
            at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1462)
            at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:509)
            at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
            at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
            at java.lang.Thread.run(Thread.java:748)
    22/01/04 11:21:37 ERROR Inbox: An error happened while processing message in the inbox for LocalSchedulerBackendEndpoint
    java.lang.OutOfMemoryError: unable to create new native thread
            at java.lang.Thread.start0(Native Method)
            at java.lang.Thread.start(Thread.java:717)
            at java.util.concurrent.ThreadPoolExecutor.addWorker(ThreadPoolExecutor.java:957)
            at java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1367)
            at org.apache.spark.scheduler.TaskResultGetter.enqueueFailedTask(TaskResultGetter.scala:137)
            at org.apache.spark.scheduler.TaskSchedulerImpl.liftedTree2$1(TaskSchedulerImpl.scala:817)
            at org.apache.spark.scheduler.TaskSchedulerImpl.statusUpdate(TaskSchedulerImpl.scala:791)
            at org.apache.spark.scheduler.local.LocalEndpoint$$anonfun$receive$1.applyOrElse(LocalSchedulerBackend.scala:71)
            at org.apache.spark.rpc.netty.Inbox.$anonfun$process$1(Inbox.scala:115)
            at org.apache.spark.rpc.netty.Inbox.safelyCall(Inbox.scala:213)
            at org.apache.spark.rpc.netty.Inbox.process(Inbox.scala:100)
            at org.apache.spark.rpc.netty.MessageLoop.org$apache$spark$rpc$netty$MessageLoop$$receiveLoop(MessageLoop.scala:75)
            at org.apache.spark.rpc.netty.MessageLoop$$anon$1.run(MessageLoop.scala:41)
            at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
            at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
            at java.lang.Thread.run(Thread.java:748)
    Exception in thread "dispatcher-event-loop-30" java.lang.OutOfMemoryError: unable to create new native thread
            at java.lang.Thread.start0(Native Method)
            at java.lang.Thread.start(Thread.java:717)
            at java.util.concurrent.ThreadPoolExecutor.addWorker(ThreadPoolExecutor.java:957)
            at java.util.concurrent.ThreadPoolExecutor.processWorkerExit(ThreadPoolExecutor.java:1025)
            at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1167)
            at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
            at java.lang.Thread.run(Thread.java:748)
    

    XML file: AMC_cy7_test_bdv.zip

    opened by dpshepherd 18
  • Error Could not initialize class ch.systemsx.cisd.hdf5.CharacterEncoding on AffineExport on h5 file

    Error Could not initialize class ch.systemsx.cisd.hdf5.CharacterEncoding on AffineExport on h5 file

    Hi @StephanPreibisch,

    I am trying to do an AffineExport with spark:

    ~/spark-janelia/flintstone.sh 4 \
    /groups/spruston/home/moharb/BigStitcher-Spark/target/BigStitcher-Spark-0.0.2-SNAPSHOT.jar \
    net.preibisch.bigstitcher.spark.AffineFusion \
    -x '/groups/mousebrainmicro/mousebrainmicro/data/Lightsheet/20210812_AG/ML_Rendering-test/aligned_data.xml' \
    -o '/nrs/svoboda/moharb/test_ML.n5' -d '/s0'
    

    And get this error:

    2022-04-21 15:45:37,731 [task-result-getter-0] ERROR [TaskSetManager]: Task 1 in stage 0.0 failed 4 times; aborting job
    org.apache.spark.SparkException: Job aborted due to stage failure: Task 1 in stage 0.0 failed 4 times, most recent failure: Lost task 1.3 in stage 0.0 (TID 78, 10.36.107.42, executor 0): java.lang.NoClassDefFoundError: Could not initialize class ch.systemsx.cisd.hdf5.CharacterEncoding
    	at ch.systemsx.cisd.hdf5.HDF5BaseReader.<init>(HDF5BaseReader.java:143)
    	at ch.systemsx.cisd.hdf5.HDF5BaseReader.<init>(HDF5BaseReader.java:126)
    	at ch.systemsx.cisd.hdf5.HDF5ReaderConfigurator.reader(HDF5ReaderConfigurator.java:86)
    	at ch.systemsx.cisd.hdf5.HDF5FactoryProvider$HDF5Factory.openForReading(HDF5FactoryProvider.java:54)
    	at ch.systemsx.cisd.hdf5.HDF5Factory.openForReading(HDF5Factory.java:55)
    	at bdv.img.hdf5.Hdf5ImageLoader.open(Hdf5ImageLoader.java:183)
    	at bdv.img.hdf5.Hdf5ImageLoader.getSetupImgLoader(Hdf5ImageLoader.java:381)
    	at bdv.img.hdf5.Hdf5ImageLoader.getSetupImgLoader(Hdf5ImageLoader.java:79)
    	at net.preibisch.bigstitcher.spark.util.ViewUtil.getTransformedBoundingBox(ViewUtil.java:32)
    	at net.preibisch.bigstitcher.spark.AffineFusion.lambda$call$7b7a6284$1(AffineFusion.java:268)
    	at org.apache.spark.api.java.JavaRDDLike.$anonfun$foreach$1(JavaRDDLike.scala:351)
    	at org.apache.spark.api.java.JavaRDDLike.$anonfun$foreach$1$adapted(JavaRDDLike.scala:351)
    	at scala.collection.Iterator.foreach(Iterator.scala:941)
    	at scala.collection.Iterator.foreach$(Iterator.scala:941)
    	at org.apache.spark.InterruptibleIterator.foreach(InterruptibleIterator.scala:28)
    	at org.apache.spark.rdd.RDD.$anonfun$foreach$2(RDD.scala:986)
    	at org.apache.spark.rdd.RDD.$anonfun$foreach$2$adapted(RDD.scala:986)
    	at org.apache.spark.SparkContext.$anonfun$runJob$5(SparkContext.scala:2139)
    	at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
    	at org.apache.spark.scheduler.Task.run(Task.scala:127)
    	at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:446)
    	at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1377)
    	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:449)
    	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    	at java.lang.Thread.run(Thread.java:748)
    

    I can open it in Fiji and look at the data with BigStitcher without an issue. The xml is in: /groups/mousebrainmicro/mousebrainmicro/data/Lightsheet/20210812_AG/ML_Rendering-test/aligned_data.xml Any idea what to do? Found this, might be related.

    Thanks, Boaz

    opened by boazmohar 14
  • OutOfMemoryError caused by creation of too many N5ImageLoader fetcher threads

    OutOfMemoryError caused by creation of too many N5ImageLoader fetcher threads

    While working through issue 2 with a larger data set, @boazmohar discovered many OutOfMemoryError: unable to create new native thread exceptions in the worker logs. These exceptions are raised because parallelized RDDs create many N5ImageLoader instances like this one and each N5ImageLoader instance in turn creates Runtime.getRuntime().availableProcessors() fetcher threads.

    I reduced some of the fetcher thread creation by reusing loaders in this commit. However, reusing loaders did not completely solve the problem.

    I think the best solution is to parameterize the number of fetcher threads in the N5ImageLoader and then explicitly set fetcher thread counts in spark clients. This issue can remain open until that happens or until another long-term solution is developed.
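    The proposed parameterization could look something like this. This is only a sketch with hypothetical names, not the actual N5ImageLoader API: the idea is simply to fall back to `availableProcessors()` only when the Spark client does not request an explicit fetcher thread count.

    ```java
    // Hypothetical helper illustrating the proposed fix: a client-supplied
    // fetcher thread count takes precedence; the availableProcessors()
    // default is used only when nothing was requested.
    public class FetcherConfig {
        public static int fetcherThreadCount(Integer requested) {
            return (requested != null && requested > 0)
                    ? requested
                    : Runtime.getRuntime().availableProcessors();
        }
    }
    ```

    A Spark client could then pass `1` (or some other small number) per executor instead of letting every loader spawn one thread per visible core.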

    In the meantime, as a workaround, overriding the default availableProcessors value with a -XX:ActiveProcessorCount=1 JVM directive seems to fix the problem.
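    A quick way to confirm the flag is actually being picked up by executor JVMs is to print what the runtime reports (standalone sketch, not part of BigStitcher):

    ```java
    // Prints the processor count the JVM sees; -XX:ActiveProcessorCount
    // overrides this value, so with -XX:ActiveProcessorCount=1 it prints 1.
    public class ProcessorCountCheck {
        public static void main(String[] args) {
            System.out.println(Runtime.getRuntime().availableProcessors());
        }
    }
    ```

    Run with `java -XX:ActiveProcessorCount=1 ProcessorCountCheck` to verify the override before baking it into the Spark submit args.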

    More specifically, here are the spark-janelia flintstone.sh environment parameters I used to successfully process @boazmohar 's larger data set:

    # --------------------------------------------------------------------
    # Default Spark Setup (11 cores per worker)
    # --------------------------------------------------------------------
    export N_EXECUTORS_PER_NODE=2
    export N_CORES_PER_EXECUTOR=5
    export N_OVERHEAD_CORES_PER_WORKER=1
    # Note: N_CORES_PER_WORKER=$(( (N_EXECUTORS_PER_NODE * N_CORES_PER_EXECUTOR) + N_OVERHEAD_CORES_PER_WORKER ))
    
    # To distribute work evenly, recommended number of tasks/partitions is 3 times the number of cores.
    export N_TASKS_PER_EXECUTOR_CORE=3
    
    export N_CORES_DRIVER=1
    
    # setting ActiveProcessorCount to 1 ensures Runtime.availableProcessors() returns 1
    export SUBMIT_ARGS="--conf spark.executor.extraJavaOptions=-XX:ActiveProcessorCount=1"
    

    With the limited active processor count and reusing loaders, no OutOfMemory exceptions occur and processing completes much faster. @boazmohar noted that with his original setup, it took 3.5 hours using a Spark cluster with 2011 cores. My run with the parameters above took 7 minutes using 2200 cores (on 200 11-core worker nodes). Boaz's original run might have had other configuration issues, so this isn't necessarily apples-to-apples. Nevertheless, my guess is that his performance was adversely affected by the fetcher thread problem.

    Finally, @StephanPreibisch may want to revisit the getTransformedBoundingBox code and any other loading/reading to see if there are other options for reducing/reusing loaded data within the parallelized RDD loops. Broadcast variables might be suitable/helpful for this use case - but I'm not sure.

    opened by trautmane 10
  • Build failure issue on local server

    Build failure issue on local server

    Hi all,

    I just pulled the most recent release (about 10 minutes ago) and tried to build this project on our Linux Mint 19 server.

    Running (added flags to get debugging): mvn -e -X clean package -P fatjar

    I get a build error. I have attached the build log at the end of this message.

    It looks like it might be a Java version mismatch? Any suggestions on how to correctly build the project? I can change the Java JDK if I know which one to install.

    Relevant versions:

    mvn -version
    Apache Maven 3.6.0
    Maven home: /usr/share/maven
    Java version: 11.0.13, vendor: Ubuntu, runtime: /usr/lib/jvm/java-11-openjdk-amd64
    Default locale: en_US, platform encoding: UTF-8
    OS name: "linux", version: "5.4.0-74-generic", arch: "amd64", family: "unix"
    

    log.txt

    Thanks!

    opened by dpshepherd 3
  • NoSuchMethodError for Gson library

    NoSuchMethodError for Gson library

    Hi @kgabor,

    You mentioned that after adding zarr export support to AffineFusion, you ran into the following error when running a distributed spark instance:

    Exception in thread "main" java.lang.NoSuchMethodError: com.google.gson.reflect.TypeToken.getParameterized(Ljava/lang/reflect/Type;[Ljava/lang/reflect/Type;)Lcom/google/gson/reflect/TypeToken;
            at org.janelia.saalfeldlab.n5.zarr.N5ZarrReader.getZArraryAttributes(N5ZarrReader.java:259)
            ...
    

    I think this problem occurs because the Hadoop libraries used by Spark pull in an ancient version of Gson (likely 2.2.4 or similar) and the n5 zarr library (currently) relies upon Gson 2.8.6. Even though Gson 2.8.6 is bundled in the big-stitcher fat jar, the Hadoop stuff is higher in the classpath when running a Spark cluster - so you end up running with ancient Gson. This post describes the issue very nicely.

    As the post mentions, the best way to fix this issue is to force Spark to use a newer Gson library by specifying additional --conf arguments when launching spark-submit like this:

    /misc/local/spark-3.0.1/bin/spark-submit \
      --deploy-mode client \
      --master spark://... \
      --conf spark.driver.extraClassPath=/groups/scicompsoft/home/trautmane/bigstitcher/gabor/gson-2.8.6.jar \
      --conf spark.executor.extraClassPath=/groups/scicompsoft/home/trautmane/bigstitcher/gabor/gson-2.8.6.jar \
      ...
    

    You'll need to:

    • find a gson-2.8.6.jar file - I pulled it from my local maven repo: ${HOME}/.m2/repository/com/google/code/gson/gson/2.8.6/gson-2.8.6.jar,
    • copy it to a network filesystem location that your spark driver and workers can access, and
    • then add the path to the spark.driver.extraClassPath and spark.executor.extraClassPath configuration as I did above.
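    If it is unclear which Gson actually won on the classpath, a reflective probe can tell you without risking a NoClassDefFoundError at an inconvenient point. This is a hedged diagnostic sketch (the helper name and usage are illustrative, not part of BigStitcher-Spark):

    ```java
    import java.lang.reflect.Type;

    // Checks via reflection whether a class on the current classpath
    // exposes a given method. Compiles and runs whether or not Gson is
    // present, since Gson is only referenced by name.
    public class ClasspathCheck {
        public static boolean hasMethod(String className, String methodName, Class<?>... paramTypes) {
            try {
                Class.forName(className).getMethod(methodName, paramTypes);
                return true;
            } catch (ClassNotFoundException | NoSuchMethodException e) {
                return false;
            }
        }

        public static void main(String[] args) {
            // TypeToken.getParameterized exists in the newer Gson the n5 zarr
            // library needs, but not in the ancient Hadoop-provided Gson.
            System.out.println(hasMethod(
                    "com.google.gson.reflect.TypeToken",
                    "getParameterized", Type.class, Type[].class));
        }
    }
    ```

    Running this inside a Spark job (e.g. from the driver before submitting work) shows whether the extraClassPath override took effect.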

    Give this a shot and let me know if it solves the errors you were getting. I ran a small test case at Janelia and was able to successfully produce a zarr result - so, I'm hopeful this will work for you.

    Finally, while debugging this problem I made a few minor tweaks to your commits here, which means you will need to specify -s ZARR (capitalized) instead of your original lower-case version if you pull and run with the latest code.

    Let me know how it goes, Eric

    opened by trautmane 2
  • Add section about building the executable to README

    Add section about building the executable to README

    This adds a short section pointing out the install script and its options.

    Background: When trying to install this on my local machine, I had various issues trying to build/install this as a non-Java-person. I managed to build with maven eventually but had problems with dependencies. Only then did I notice the install script that handled all of this gracefully. I think it deserves a prominent placement in the README :)

    opened by VolkerH 0
  • Output in a way that BigStitcher can open the fused data

    Output in a way that BigStitcher can open the fused data

    This should support downsampling (@trautmane) and an XML and maybe integration of several channels in one XML (@boazmohar).

    Maybe we should adjust how we load N5's in BDV @tpietzsch?

    opened by StephanPreibisch 2
  • Preserve original data anisotropy?

    Preserve original data anisotropy?

    Hi all,

    We've got this working fairly well locally. Still struggling with our school Hadoop cluster, which I think is a config issue.

    A lot of our data has anisotropic xy pixel size vs z steps. What is the best way to get the BigStitcher-Spark affine fusion to act the same way as the "preserve original data anisotropy" setting in BigStitcher?

    One thought I had was to edit the XML to change the calibration before fusion.
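    For reference, the calibration in question sits in the dataset XML's per-ViewSetup voxelSize element; a sketch of the relevant fragment (sizes and values here are made up for illustration) looks like:

    ```xml
    <ViewSetup>
      <id>0</id>
      <size>2048 2048 500</size>
      <voxelSize>
        <unit>µm</unit>
        <size>0.406 0.406 2.5</size>
      </voxelSize>
    </ViewSetup>
    ```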

    Thanks!

    opened by dpshepherd 28
  • computational complexity of BigStitcher fusion shows undesirable scaling behaviour with number of tiles

    computational complexity of BigStitcher fusion shows undesirable scaling behaviour with number of tiles

    Hi @StephanPreibisch ,

    thanks for this new project. Need to set up Spark first, but keen to give this a try. Saw the announcement on twitter but as I don't have a twitter account I'll ask a related question here:

    We ran into issues running fusion with a large number of 2D tiles (not using the Spark version). The fusion step would take many hours when fusing around 700 individual 2D tiles (mosaic scan of a whole slide). We observed that the scaling behaviour with the number of tiles was very unfortunate (polynomial), whereas I would expect it to grow only approximately linearly with the number of output pixels.

    As I had the impression that BigStitcher was primarily developed for light-sheet data (fewer but much larger volume tiles, not many 2D tiles), might this scaling behaviour with the number of tiles have gone unnoticed?

    EDIT to add:

    The above behaviour was noticed on the non-Spark version of affine fusion, any chance this has already been fixed with this code?
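    For context on what the Spark version does: per the README, each written block is tested against all images for overlap, so the cost grows roughly as O(blocks × tiles) rather than with the tiles that actually contribute to a block. A minimal illustration of that per-block scan (hypothetical types, not BigStitcher code):

    ```java
    import java.util.ArrayList;
    import java.util.List;

    public class OverlapScan {
        // Axis-aligned 2D box: [minX, minY, maxX, maxY] (half-open intervals).
        static boolean overlaps(int[] a, int[] b) {
            return a[0] < b[2] && b[0] < a[2] && a[1] < b[3] && b[1] < a[3];
        }

        // For one output block, linearly scan all tile bounding boxes and
        // return the indices of tiles that overlap it -- O(tiles) per block.
        static List<Integer> tilesForBlock(int[] block, List<int[]> tiles) {
            List<Integer> hits = new ArrayList<>();
            for (int i = 0; i < tiles.size(); i++)
                if (overlaps(block, tiles.get(i)))
                    hits.add(i);
            return hits;
        }
    }
    ```

    With ~700 tiles this linear scan per block is cheap, so a polynomial blow-up in the classic (non-Spark) fusion likely comes from a different code path than this overlap test.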

    opened by VolkerH 2
Owner: PreibischLab (Preibisch Lab @ Berlin Institute of Medical Systems Biology)