Sparkling Water provides H2O functionality inside a Spark cluster

Overview

Sparkling Water

Join the chat at https://gitter.im/h2oai/sparkling-water | Powered by H2O.ai

Sparkling Water integrates H2O's fast scalable machine learning engine with Spark. It provides:

  • Utilities to publish Spark data structures (RDDs, DataFrames, Datasets) as H2O's frames and vice versa.
  • DSL to use Spark data structures as input for H2O's algorithms.
  • Basic building blocks to create ML applications utilizing Spark and H2O APIs.
  • Python interface enabling use of Sparkling Water directly from PySpark.

Getting Started

User Documentation

The documentation also covers our clients, PySparkling and RSparkling.

Download Binaries

Maven

Each Sparkling Water release is published to Maven Central with the following coordinates:

  • ai.h2o:sparkling-water-core_{{scala_version}}:{{version}} - Includes core of Sparkling Water

  • ai.h2o:sparkling-water-examples_{{scala_version}}:{{version}} - Includes example applications

  • ai.h2o:sparkling-water-repl_{{scala_version}}:{{version}} - Spark REPL integration into H2O Flow UI

  • ai.h2o:sparkling-water-ml_{{scala_version}}:{{version}} - Extends the Spark ML package with H2O-based transformations

  • ai.h2o:sparkling-water-scoring_{{scala_version}}:{{version}} - A library containing scoring logic and definition of Sparkling Water MOJO models.

  • ai.h2o:sparkling-water-scoring-package_{{scala_version}}:{{version}} - Lightweight Sparkling Water package including all dependencies required just for scoring with H2O-3 and DAI MOJO models.

  • ai.h2o:sparkling-water-package_{{scala_version}}:{{version}} - Sparkling Water package containing all dependencies required for model training and scoring. This is designed to be used as a Spark package via the --packages option.

    Note: {{version}} refers to a release version of Sparkling Water, and {{scala_version}} refers to the Scala base version.

The full list of published packages is available here.
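
For example, an sbt-based Scala project could pull in the core module as in the sketch below; the Scala suffix (2.12 here) and the {{version}} placeholder must be replaced to match your Scala base version and the Sparkling Water release you use.

    // build.sbt sketch: pick the artifact matching your Scala base version
    // and replace {{version}} with a concrete Sparkling Water release.
    libraryDependencies += "ai.h2o" % "sparkling-water-core_2.12" % "{{version}}"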

Sparkling Water Requirements for Spark 3.0

  • Linux/OS X/Windows
  • Java 8+
  • Python 2.7+ for the Python version of Sparkling Water (PySparkling)
  • Spark 3.0, with the SPARK_HOME shell variable pointing to your local Spark installation

To see requirements for older Spark versions, please visit the relevant documentation.


Use Sparkling Water

Sparkling Water is distributed as a Spark application library which can be used by any Spark application. We also provide a zip distribution that bundles the library and shell scripts.

There are several ways of using Sparkling Water:

  • Sparkling Shell
  • Sparkling Water driver
  • Spark Shell and include Sparkling Water library via --jars or --packages option
  • Spark Submit and include Sparkling Water library via --jars or --packages option
  • PySpark with PySparkling

Run Sparkling shell

The Sparkling Shell encapsulates a regular Spark shell and appends the Sparkling Water library to the classpath via the --jars option. The Sparkling Shell supports creation of an H2O cloud and execution of H2O algorithms.

  1. Either download or build Sparkling Water

  2. Configure the location of the Spark cluster:

    export SPARK_HOME="/path/to/spark/installation"
    export MASTER="local[*]"

    In this case, local[*] points to an embedded single node cluster.

  3. Run Sparkling Shell:

    bin/sparkling-shell

    Sparkling Shell accepts common Spark Shell arguments. For example, to increase memory allocated by each executor, use the spark.executor.memory parameter: bin/sparkling-shell --conf "spark.executor.memory=4g"

  4. Initialize H2OContext

    import ai.h2o.sparkling._
    val hc = H2OContext.getOrCreate()

    H2OContext starts H2O services on top of the Spark cluster and provides primitives for transformations between H2O and Spark data structures.
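
    As a quick illustration of these primitives, the sketch below (assuming the hc value created above and the asH2OFrame/asSparkFrame conversion methods) publishes a small Spark DataFrame as an H2OFrame and converts it back:

    // Toy DataFrame with hypothetical columns, just to demonstrate the conversions.
    import spark.implicits._
    val df = Seq((1, "a"), (2, "b")).toDF("id", "label")

    // Publish the Spark DataFrame as an H2OFrame ...
    val h2oFrame = hc.asH2OFrame(df)
    // ... and materialize it back as a Spark DataFrame.
    val backToSpark = hc.asSparkFrame(h2oFrame)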

Use Sparkling Water with PySpark

Sparkling Water can also be used directly from PySpark; the integration is called PySparkling.

See PySparkling README to learn about PySparkling.

Use Sparkling Water via Spark Packages

To see how Sparkling Water can be used as a Spark package, please see Use as Spark Package.

Use Sparkling Water in Windows environments

See Windows Tutorial to learn how to use Sparkling Water in Windows environments.

Sparkling Water examples

To see how to run examples for Sparkling Water, please see Running Examples.


Sparkling Water Backends

Sparkling Water supports two backend/deployment modes: internal and external. Sparkling Water applications are independent of the selected backend. The backend can be specified before creation of the H2OContext, as sketched below.
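
For example, the backend mode can be passed as a Spark configuration property. The sketch below assumes the spark.ext.h2o.backend.cluster.mode property (which also appears in the configuration dumps quoted in the comments further down); note that the external mode requires additional settings described in Backends.

    import org.apache.spark.sql.SparkSession
    import ai.h2o.sparkling._

    // Sketch: select the backend before H2OContext is created.
    // "spark.ext.h2o.backend.cluster.mode" accepts "internal" (the default) or "external".
    val spark = SparkSession.builder()
      .appName("SparklingWaterBackendExample")   // hypothetical application name
      .config("spark.ext.h2o.backend.cluster.mode", "internal")
      .getOrCreate()

    val hc = H2OContext.getOrCreate()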

For more details regarding the internal or external backend, please see Backends.


FAQ

A list of all frequently asked questions is available in the FAQ.


Development

Complete development documentation is available at Development Documentation.

Build Sparkling Water

To see how to build Sparkling Water, please see Build Sparkling Water.

Develop applications with Sparkling Water

An application using Sparkling Water is a regular Spark application that bundles the Sparkling Water library. See the Sparkling Water Droplet for an example application; a minimal skeleton is sketched below.
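
The sketch below shows what such a minimal skeleton might look like (the object and application names are illustrative; the H2OContext API matches the Scala example above):

    import org.apache.spark.sql.SparkSession
    import ai.h2o.sparkling._

    // Hypothetical minimal skeleton of a Spark application bundling Sparkling Water.
    object MySparklingWaterApp {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder().appName("MySparklingWaterApp").getOrCreate()
        val hc = H2OContext.getOrCreate()

        // ... convert data with hc.asH2OFrame / hc.asSparkFrame and run H2O algorithms ...

        hc.stop()    // stops the H2O cluster managed by this context
        spark.stop()
      }
    }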

Contributing

Look at our list of JIRA tasks or send your idea to [email protected].

Filing Bug Reports and Feature Requests

You can file a bug report or feature request directly on the Sparkling Water JIRA page at http://jira.h2o.ai/.

  1. Log in to the Sparkling Water JIRA tracking system. (Create an account if necessary.)

  2. Once inside the home page, click the Create button.

  3. A form will display allowing you to enter information about the bug or feature request.


    Enter the following on the form:

    • Select the Project that you want to file the issue under. For example, if this is an open source public bug, you should file it under SW (SW).
    • Specify the Issue Type. For example, if you believe you've found a bug, then select Bug, or if you want to request a new feature, then select New Feature.
    • Provide a short, concise summary of the issue. The summary will be shown when engineers organize, filter, and search for Jira tickets.
    • Specify the urgency of the issue using the Priority dropdown menu.
    • If there is a due date, specify it in the Due Date field.
    • The Components drop down refers to the API or language that the issue relates to. (See the drop down menu for available options.)
    • You can leave Affects Version/s, Fix Versions, and Assignee fields blank. Our engineering team will fill this in.
    • Add a detailed description of your bug in the Description section. Best practices for descriptions include:
    • A summary of what the issue is
    • What you think is causing the issue
    • Reproducible code that can be run end to end without requiring an engineer to edit your code. Use {code} {code} around your code to make it appear in code format.
    • Any scripts or necessary documents. Add by dragging and dropping your files into the create issue dialogue box.

    You can leave the rest of the ticket blank.

  4. When you are done with your ticket, simply click on the Create button at the bottom of the page.


After you click Create, a pop up will appear on the right side of your screen with a link to your Jira ticket. It will have the form https://0xdata.atlassian.net/browse/SW-####. You can use this link to later edit your ticket.

Please note that your Jira ticket number, along with its summary, will appear in one of the Jira ticket Slack channels. Anytime you update the ticket, anyone associated with it, whether as the assignee or a watcher, will receive an email with your changes.

Have Questions?

We also respond to questions tagged with the sparkling-water and h2o tags on Stack Overflow.

Change Logs

Change logs are available at Change Logs.


Comments
  • Unable to create h2o_context on Databricks using R and Scala

    Unable to create h2o_context on Databricks using R and Scala

    I'm trying to use sparkling water on Azure Databricks and I'm not able to create h2o_context. I tried this in both R and Scala on the same cluster.

    R Sample code:

    install.packages("sparklyr")
    install.packages("rsparkling")
    install.packages("h2o", type="source", repos="https://h2o-release.s3.amazonaws.com/h2o/rel-yates/2/R")
    
    library(rsparkling)
    library(sparklyr)
    
    sc <- spark_connect(method="databricks")
    h2o_context(sc)
    

    Scala sample code:

    import org.apache.spark.h2o._
    val hc = H2OContext.getOrCreate(spark)
    

    Configuration details

    • Azure Databricks version: 5.3 ML (includes Apache Spark 2.4.0, Scala 2.11)
    • Driver type: Standard_DS3_V2: 14.0 GB Memory, 4 Cores, 0.75 DBU
    • Min workers: 2
    • Max workers: 8
    • Enable autoscaling: Yes
    • Sparkling Water library: sparkling_water_assembly_2_11_2_4_10_all.jar

    Error log: Error : org.apache.spark.SparkException: Job aborted due to stage failure: Task 1 in stage 14.0 failed 4 times, most recent failure: Lost task 1.3 in stage 14.0 (TID 144, 10.139.64.6, executor 18): ExecutorLostFailure (executor 18 exited caused by one of the running tasks) Reason: Remote RPC client disassociated. Likely due to containers exceeding thresholds, or network issues. Check driver logs for WARN messages. Driver stacktrace: at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:2355) at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:2343) at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:2342) at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59) at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48) at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:2342) at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:1096) at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:1096) at scala.Option.foreach(Option.scala:257) at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:1096) at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:2574) at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2522) at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2510) at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:49) at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:893) at org.apache.spark.SparkContext.runJob(SparkContext.scala:2233) at org.apache.spark.SparkContext.runJob(SparkContext.scala:2255) at org.apache.spark.SparkContext.runJob(SparkContext.scala:2274) at org.apache.spark.SparkContext.runJob(SparkContext.scala:2299) at org.apache.spark.rdd.RDD$$anonfun$collect$1.apply(RDD.scala:961) at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151) at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112) at org.apache.spark.rdd.RDD.withScope(RDD.scala:379) at org.apache.spark.rdd.RDD.collect(RDD.scala:960) at org.apache.spark.h2o.backends.internal.InternalBackendUtils$class.startH2O(InternalBackendUtils.scala:196) at org.apache.spark.h2o.backends.internal.InternalBackendUtils$.startH2O(InternalBackendUtils.scala:306) at org.apache.spark.h2o.backends.internal.InternalH2OBackend.init(InternalH2OBackend.scala:104) at org.apache.spark.h2o.H2OContext.init(H2OContext.scala:129) at org.apache.spark.h2o.H2OContext$.getOrCreate(H2OContext.scala:403) at org.apache.spark.h2o.H2OContext$.getOrCreate(H2OContext.scala:438) at org.apache.spark.h2o.H2OContext.getOrCreate(H2OContext.scala) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at sparklyr.Invoke.invoke(invoke.scala:139) at sparklyr.StreamHandler.handleMethodCall(stream.scala:123) at sparklyr.StreamHandler.read(stream.scala:66) at sparklyr.BackendHandler.channelRead0(handler.scala:51) at sparklyr.BackendHandler.channelRead0(handler.scala:4) at 
io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:105) at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362) at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348) at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340) at io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:102) at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362) at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348) at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340) at io.netty.handler.codec.ByteToMessageDecoder.fireChannelRead(ByteToMessageDecoder.java:310) at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:284) at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362) at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348) at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340) at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1359) at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362) at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348) at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:935) at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:138) at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:645) at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:580) at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:497) at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:459) at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:858) at io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:138) at java.lang.Thread.run(Thread.java:748)

    opened by sasikiran 36
  • Memory usage blows up for GLM with many variables.

    Memory usage blows up for GLM with many variables.

    When I run GLM with 15k variables, H2O needs less than 100GB to finish the job; however, when I run GLM with 21k variables (20k of which are interactions), H2O could not finish even with 640GB of memory. Cluster status in Flow shows that the Free Memory for each executor drops to 2-3GB, and then the executors exit with codes 143 & 52.

    • Spark version 2.2.0
    • H2O version 3.16.0.2
    • Sparkling Water version 2.2.4
    • R 3.3.3
    • sparklyr_0.7.0-9105

    The Spark was initiated with the following config

    config <- spark_config()                   
    config$sparklyr.cores.local <- 2
    config$spark.executor.extraJavaOptions <- "-XX:-UseGCOverheadLimit -XX:+UseG1GC -XX:+UnlockExperimentalVMOptions -XX:-ResizePLAB -XX:+ParallelRefProcEnabled -XX:+AlwaysPreTouch -XX:MaxGCPauseMillis=100 -XX:ParallelGCThreads=20 -XX:ConcGCThreads=15 -XX:G1NewSizePercent=1 -XX:G1MaxNewSizePercent=5 -XX:G1MixedGCLiveThresholdPercent=85 -XX:G1HeapWastePercent=2 -XX:InitiatingHeapOccupancyPercent=35"                                                                             
    config$`sparklyr.shell.driver-memory` <- "8g"
    config$`sparklyr.shell.executor-memory` <- "8g"
    config$spark.yarn.executor.memoryOverhead <- "2g"
    config$spark.yarn.driver.memoryOverhead <- "2g"
    config$spark.executor.instances <- 36
    config$spark.executor.cores <- 4                                                                                  
    config$spark.executor.memory <- "20g"
    

    stdout log from executor

    12-06 16:12:34.661 10.114.134.134:54321  16017  #e Thread WARN: Unblock allocations; cache below desired, but also OOM: GC CALLBACK, (K/V:105.2 MB + POJO:14.79 GB + FREE:2.89 GB == MEM_MAX:17.78 GB), desiredKV=2.22 GB OOM!
    #
    # java.lang.OutOfMemoryError: Java heap space
    # -XX:OnOutOfMemoryError="kill %p"
    #   Executing /bin/sh -c "kill 16017"...
    12-06 16:13:53.834 10.114.134.134:54321  16017  #39:54321 ERRR: java.lang.OutOfMemoryError: Java heap space
    

    stderr log from executor

    17/12/06 16:00:36 WARN retry.RetryInvocationHandler: A failover has occurred since the start of this method invocation attempt.
    17/12/06 16:00:36 WARN retry.RetryInvocationHandler: A failover has occurred since the start of this method invocation attempt.
    17/12/06 16:00:36 WARN retry.RetryInvocationHandler: A failover has occurred since the start of this method invocation attempt.
    java.lang.OutOfMemoryError: Java heap space
    17/12/06 16:13:53 ERROR executor.CoarseGrainedExecutorBackend: RECEIVED SIGNAL TERM
    17/12/06 16:13:53 ERROR util.SparkUncaughtExceptionHandler: Uncaught exception in thread Thread[UDP-TCP-READ-rkalsdatanode032.kau.roche.com/10.114.134.138:54321,10,main]
    java.lang.OutOfMemoryError: Java heap space
    17/12/06 16:13:53 ERROR util.SparkUncaughtExceptionHandler: Uncaught exception in thread Thread[Heartbeat,10,main]
    java.lang.OutOfMemoryError: Java heap space
    17/12/06 16:13:53 ERROR util.SparkUncaughtExceptionHandler: Uncaught exception in thread Thread[FailedNodeWatchdogThread,5,main]
    java.lang.OutOfMemoryError: GC overhead limit exceeded
    17/12/06 16:13:53 INFO storage.DiskBlockManager: Shutdown hook called
    17/12/06 16:13:53 INFO util.ShutdownHookManager: Shutdown hook called
    17/12/06 16:13:53 INFO util.ShutdownHookManager: Deleting directory /srv/hdfs/disk4/yarn/nm/usercache/laic16/appcache/application_1511791866449_5610/spark-32b887f4-b7ee-41df-96fc-635d9c3c1a2d
    
    opened by axiomoixa 34
  • R + sparkling-water (H2O/Spark)

    R + sparkling-water (H2O/Spark)

    All of the demos/examples in the README.md seem to do all the prediction code in Scala and only have R as an afterthought, using the residualPlotRCode function from here to visualise in R. Scala is on my "to learn" list, but in the meantime... Given H2O's close connections with R, is it possible to see/have a Sparkling Water example/demo with R as the interface, and ideally with an EC2 example too, to illustrate the benefits of distributed parallel computing? Maybe it might need to use SparkR, or something else? I'm not sure...

    help wanted 
    opened by hmaeda 26
  • H2OContext.getOrCreate() error on CDH

    H2OContext.getOrCreate() error on CDH

    • When trying to start H2OContext on CDSW (Cloudera Data Science Workbench). Command is : spark2-submit pysparkling_test.py I got the following error: It repeatedly print the logs.. 19/08/16 06:57:32 INFO spark.SparkContext: Added JAR /home/cdsw/.local/lib/python2.7/site-packages/sparkling_water/sparkling_water_assembly.jar at spark://10.156.4.64:24583/jars/sparkling_water_assembl y.jar with timestamp 1565938652817 19/08/16 06:57:32 WARN internal.InternalH2OBackend: Increasing 'spark.locality.wait' to value 0 (Infinitive) as we need to ensure we run on the nodes with H2O 19/08/16 06:57:32 WARN internal.InternalBackendUtils: Unsupported options spark.dynamicAllocation.enabled detected! 19/08/16 06:57:32 INFO internal.InternalH2OBackend: Starting H2O services: Sparkling Water configuration: backend cluster mode : internal workers : None cloudName : sparkling-water-cdsw_application_1563277926760_191096 clientBasePort : 54321 nodeBasePort : 54321 cloudTimeout : 60000 h2oNodeLog : INFO h2oClientLog : INFO nthreads : -1 drddMulFactor : 10 19/08/16 06:57:33 INFO spark.SparkContext: Starting job: collect at SpreadRDDBuilder.scala:62 19/08/16 06:57:33 INFO scheduler.DAGScheduler: Registering RDD 2 (distinct at SpreadRDDBuilder.scala:62) 19/08/16 06:57:33 INFO scheduler.DAGScheduler: Got job 0 (collect at SpreadRDDBuilder.scala:62) with 201 output partitions 19/08/16 06:57:33 INFO scheduler.DAGScheduler: Final stage: ResultStage 1 (collect at SpreadRDDBuilder.scala:62) 19/08/16 06:57:33 INFO scheduler.DAGScheduler: Parents of final stage: List(ShuffleMapStage 0) 19/08/16 06:57:33 INFO scheduler.DAGScheduler: Missing parents: List(ShuffleMapStage 0) 19/08/16 06:57:33 INFO scheduler.DAGScheduler: Submitting ShuffleMapStage 0 (MapPartitionsRDD[2] at distinct at SpreadRDDBuilder.scala:62), which has no missing parents 19/08/16 06:57:33 INFO memory.MemoryStore: Block broadcast_0 stored as values in memory (estimated size 9.3 KB, free 7.6 GB) 19/08/16 06:57:33 INFO memory.MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 4.9 KB, free 7.6 GB) 19/08/16 06:57:33 INFO storage.BlockManagerInfo: Added broadcast_0_piece0 in memory on 10.156.4.64:21432 (size: 4.9 KB, free: 7.6 GB) 19/08/16 06:57:33 INFO spark.SparkContext: Created broadcast 0 from broadcast at DAGScheduler.scala:1039 19/08/16 06:57:33 INFO scheduler.DAGScheduler: Submitting 201 missing tasks from ShuffleMapStage 0 (MapPartitionsRDD[2] at distinct at SpreadRDDBuilder.scala:62) (first 15 tasks are for partitions Vect or(0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14)) 19/08/16 06:57:33 INFO cluster.YarnScheduler: Adding task set 0.0 with 201 tasks 19/08/16 06:57:34 INFO spark.ExecutorAllocationManager: Requesting 1 new executor because tasks are backlogged (new desired total will be 1) 19/08/16 06:57:35 INFO spark.ExecutorAllocationManager: Requesting 2 new executors because tasks are backlogged (new desired total will be 3) 19/08/16 06:57:36 INFO spark.ExecutorAllocationManager: Requesting 4 new executors because tasks are backlogged (new desired total will be 7) 19/08/16 06:57:37 INFO spark.ExecutorAllocationManager: Requesting 8 new executors because tasks are backlogged (new desired total will be 15) 19/08/16 06:57:38 INFO spark.ExecutorAllocationManager: Requesting 16 new executors because tasks are backlogged (new desired total will be 31) 19/08/16 06:57:39 INFO spark.ExecutorAllocationManager: Requesting 32 new executors because tasks are backlogged (new desired total will be 63) 19/08/16 06:57:40 INFO 
spark.ExecutorAllocationManager: Requesting 64 new executors because tasks are backlogged (new desired total will be 127) 19/08/16 06:57:40 INFO cluster.YarnSchedulerBackend$YarnDriverEndpoint: Registered executor NettyRpcEndpointRef(spark-client://Executor) (100.66.128.0:58270) with ID 2 19/08/16 06:57:40 INFO spark.ExecutorAllocationManager: New executor 2 has registered (new total is 1) 19/08/16 06:57:40 INFO scheduler.TaskSetManager: Starting task 0.0 in stage 0.0 (TID 0, aup7964s.unix.anz, executor 2, partition 0, PROCESS_LOCAL, 7743 bytes) 19/08/16 06:57:40 INFO storage.BlockManagerMasterEndpoint: Registering block manager aup7964s.unix.anz:33662 with 366.3 MB RAM, BlockManagerId(2, aup7964s.unix.anz, 33662, None) 19/08/16 06:57:40 INFO cluster.YarnSchedulerBackend$YarnDriverEndpoint: Registered executor NettyRpcEndpointRef(spark-client://Executor) (100.66.128.0:58280) with ID 4 19/08/16 06:57:40 INFO spark.ExecutorAllocationManager: New executor 4 has registered (new total is 2) 19/08/16 06:57:40 INFO scheduler.TaskSetManager: Starting task 1.0 in stage 0.0 (TID 1, aup7964s.unix.anz, executor 4, partition 1, PROCESS_LOCAL, 7743 bytes) 19/08/16 06:57:40 INFO cluster.YarnSchedulerBackend$YarnDriverEndpoint: Registered executor NettyRpcEndpointRef(spark-client://Executor) (100.66.128.0:58282) with ID 14 19/08/16 06:57:40 INFO spark.ExecutorAllocationManager: New executor 14 has registered (new total is 3)

    • Once switch to command: spark2-submit --master local[2] pysparkling_test.py the errors are below: 19/08/16 06:59:53 INFO spark.SparkContext: Added JAR /home/cdsw/.local/lib/python2.7/site-packages/sparkling_water/sparkling_water_assembly.jar at spark://10.156.4.64:24583/jars/sparkling_water_assembl y.jar with timestamp 1565938793481 19/08/16 06:59:53 WARN internal.InternalH2OBackend: Increasing 'spark.locality.wait' to value 0 (Infinitive) as we need to ensure we run on the nodes with H2O 19/08/16 06:59:53 WARN internal.InternalBackendUtils: Unsupported options spark.dynamicAllocation.enabled detected! 19/08/16 06:59:53 INFO internal.InternalH2OBackend: Starting H2O services: Sparkling Water configuration: backend cluster mode : internal workers : None cloudName : sparkling-water-cdsw_local-1565938791492 clientBasePort : 54321 nodeBasePort : 54321 cloudTimeout : 60000 h2oNodeLog : INFO h2oClientLog : INFO nthreads : -1 drddMulFactor : 10 19/08/16 06:59:53 INFO java.NativeLibrary: Loaded library from lib/linux_64/libxgboost4j_gpu.so (/tmp/libxgboost4j_gpu392673508027643109.so) Sparkling Water version: 3.26.2-2.3 Spark version: 2.3.0.cloudera3 Integrated H2O version: 3.26.0.2 The following Spark configuration is used: (spark.eventLog.enabled,true) (spark.app.name,SparklingWaterApp) (spark.scheduler.minRegisteredResourcesRatio,1) (spark.ext.h2o.cloud.name,sparkling-water-cdsw_local-1565938791492) (spark.driver.memory,14976m) (spark.yarn.jars,local:/app/hadoop/parcels/SPARK2-2.3.0.cloudera3-1.cdh5.13.3.p0.458809/lib/spark2/jars/) (spark.eventLog.dir,hdfs://nameservice1/user/spark/spark2ApplicationHistory) (spark.ui.killEnabled,true) (spark.yarn.appMasterEnv.PYSPARK_PYTHON,/app/hadoop/parcels/Anaconda-4.3.1/bin/python) (spark.ui.port,20049) (spark.driver.bindAddress,100.66.128.10) (spark.dynamicAllocation.executorIdleTimeout,60) (spark.serializer,org.apache.spark.serializer.KryoSerializer) (spark.ext.h2o.client.log.dir,/home/cdsw/h2ologs/local-1565938791492) (spark.io.encryption.enabled,false) (spark.yarn.am.extraLibraryPath,/app/hadoop/parcels/CDH-5.13.3-1.cdh5.13.3.p3486.3704/lib/hadoop/lib/native:/app/hadoop/parcels/GPLEXTRAS-5.13.3-1.cdh5.13.3.p0.2/lib/hadoop/lib/native) (spark.authenticate,false) (spark.sql.hive.metastore.jars,${env:HADOOP_COMMON_HOME}/../hive/lib/:${env:HADOOP_COMMON_HOME}/client/*) (spark.lineage.log.dir,/var/log/spark2/lineage) (spark.app.id,local-1565938791492) (spark.serializer.objectStreamReset,100) (spark.locality.wait,0) (spark.submit.deployMode,client) (spark.sql.autoBroadcastJoinThreshold,-1) (spark.yarn.historyServer.address,http://aup7727s.unix.anz:18089) (spark.network.crypto.enabled,false) (spark.dynamicAllocation,false) (spark.lineage.enabled,false) (spark.shuffle.service.enabled,true) (spark.hadoop.hadoop.treat.subject.external,true) (spark.executor.id,driver) (spark.dynamicAllocation.schedulerBacklogTimeout,1) (spark.yarn.appMasterEnv.PYSPARK_DRIVER_PYTHON,/app/hadoop/parcels/Anaconda-4.3.1/bin/python) (spark.shuffle.service.port,7337) (spark.sql.hive.metastore.version,1.1.0) (spark.ext.h2o.fail.on.unsupported.spark.param,false) (spark.yarn.rmProxy.enabled,false) (spark.sql.warehouse.dir,/user/hive/warehouse) (spark.ext.h2o.client.ip,10.156.4.64) (spark.sql.catalogImplementation,hive) (spark.rdd.compress,True) (spark.executor.extraLibraryPath,/app/hadoop/parcels/CDH-5.13.3-1.cdh5.13.3.p3486.3704/lib/hadoop/lib/native:/app/hadoop/parcels/GPLEXTRAS-5.13.3-1.cdh5.13.3.p0.2/lib/hadoop/lib/native) (spark.yarn.config.gatewayPath,/app/hadoop/parcels) 
(spark.ui.enabled,false) (spark.dynamicAllocation.minExecutors,0) (spark.yarn.config.replacementPath,{{HADOOP_COMMON_HOME}}/../../..) (spark.dynamicAllocation.enabled,true) (spark.driver.extraLibraryPath,/app/hadoop/parcels/CDH-5.13.3-1.cdh5.13.3.p3486.3704/lib/hadoop/lib/native:/app/hadoop/parcels/GPLEXTRAS-5.13.3-1.cdh5.13.3.p0.2/lib/hadoop/lib/native) (spark.files,file:/home/cdsw/pysparkling_test.py) (spark.driver.blockManager.port,21432) (spark.master,local[2]) (spark.driver.port,24583) (spark.driver.host,10.156.4.64)

    ----- H2O started ----- Build git branch: rel-yau Build git hash: 4854053b2e1773e6df02e04895709f692ebf7088 Build git describe: jenkins-3.26.0.1-71-g4854053 Build project version: 3.26.0.2 Build age: 20 days Built by: 'jenkins' Built on: '2019-07-26 23:05:58' Found H2O Core extensions: [HiveTableImporter, StackTraceCollector, Watchdog, XGBoost] Processed H2O arguments: [-name, sparkling-water-cdsw_local-1565938791492, -port_offset, 1, -quiet, -log_level, INFO, -log_dir, /home/cdsw/h2ologs/local-1565938791492, -baseport, 54321, -ip, 10.156.4.64, -flatfile, /tmp/1565938793606-0/flatfile.txt] Java availableProcessors: 64 Java heap totalMemory: 2.46 GB Java heap maxMemory: 13.00 GB Java version: Java 1.8.0_111 (from Oracle Corporation) JVM launch parameters: [-Xmx14976m] OS version: Linux 3.10.0-862.25.3.el7.x86_64 (amd64) Machine physical memory: 251.62 GB Machine locale: en_US X-h2o-cluster-id: 1565938793549 User name: 'cdsw' IPv6 stack selected: false Possible IP Address: eth0 (eth0), 100.66.128.10 Possible IP Address: lo (lo), 127.0.0.1 IP address not found on this machine 19/08/16 06:59:54 INFO spark.SparkContext: Invoking stop() from shutdown hook 19/08/16 06:59:54 INFO server.AbstractConnector: Stopped Spark@2b2add4a{HTTP/1.1,[http/1.1]}{0.0.0.0:20049} 19/08/16 06:59:54 INFO ui.SparkUI: Stopped Spark web UI at http://10.156.4.64:20049 19/08/16 06:59:54 INFO spark.MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped! 19/08/16 06:59:54 INFO memory.MemoryStore: MemoryStore cleared 19/08/16 06:59:54 INFO storage.BlockManager: BlockManager stopped 19/08/16 06:59:54 INFO storage.BlockManagerMaster: BlockManagerMaster stopped 19/08/16 06:59:54 INFO scheduler.OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped! 19/08/16 06:59:54 INFO spark.SparkContext: Successfully stopped SparkContext 19/08/16 06:59:54 INFO util.ShutdownHookManager: Shutdown hook called 19/08/16 06:59:54 INFO util.ShutdownHookManager: Deleting directory /tmp/spark-ff331aa3-bb6b-474c-80a9-7b887e278c1d 19/08/16 06:59:54 INFO util.ShutdownHookManager: Deleting directory /tmp/spark-ff331aa3-bb6b-474c-80a9-7b887e278c1d/pyspark-4ad91a7c-a673-419e-afa6-d292234f630d 19/08/16 06:59:54 INFO util.ShutdownHookManager: Deleting directory /tmp/spark-83099783-3b75-4c01-a6e9-71db2cd82014

    Providing us with the observed and expected behavior definitely helps. Giving us the following information also helps:

    • Sparkling Water/PySparkling/RSparkling version h2o_pysparkling_2.3

    • Hadoop Version & Distribution CDH

    • Execution mode YARN-client, YARN-cluster, standalone, local .. YARN-client

    Please also provide us with the full and minimal reproducible code.

    from pyspark.sql import SparkSession  # needed for the SparkSession builder below
    from pysparkling import *
    import h2o
    from h2o.estimators.xgboost import *

    spark = SparkSession \
        .builder \
        .appName('SparklingWaterApp') \
        .getOrCreate()

    h2oConf = H2OConf(spark) \
        .set('spark.ui.enabled', 'false') \
        .set('spark.ext.h2o.fail.on.unsupported.spark.param', 'false') \
        .set('spark.dynamicAllocation', 'false') \
        .set('spark.scheduler.minRegisteredResourcesRatio', '1') \
        .set('spark.sql.autoBroadcastJoinThreshold', '-1') \
        .set('spark.locality.wait', '0')

    hc = H2OContext.getOrCreate(spark, conf=h2oConf)

    h2o.cluster().shutdown()
    spark.stop()

    opened by GlockGao 25
  • IPs are not equal

    "IPs are not equal" error when starting H2OContext with Spark Context in Zeppelin

    Hi, similar to https://github.com/h2oai/sparkling-water/issues/37, I am trying to start an H2O context via Zeppelin using the latest sparkling water assembly (sparkling-water-assembly_2.10-1.6.11-all.jar) but got the error below.

    • spark version -> 1.6.2
    • sparkling water version -> 1.6.11
    • deployment type (spark MASTER variable - local, yarn) -> Spark yarn client mode (zeppelin)
    • data on which this exception happened -> today
    • reproducible code ->

    import org.apache.spark.h2o._
    val h2oContext = H2OContext.getOrCreate(sc)
    import h2oContext._
    import h2oContext.implicits._

    Appreciate any help

    org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 1.0 failed 4 times, most recent failure: Lost task 0.3 in stage 1.0 (TID 58, datanode-045.domain.com): java.lang.AssertionError: assertion failed: SpreadRDD failure - IPs are not equal: (1,datanode-060.domain.com,-1) != (2, datanode-045.domain.com)
    	at scala.Predef$.assert(Predef.scala:179)
    	at org.apache.spark.h2o.backends.internal.InternalBackendUtils$$anonfun$7.apply(InternalBackendUtils.scala:103)
    	at org.apache.spark.h2o.backends.internal.InternalBackendUtils$$anonfun$7.apply(InternalBackendUtils.scala:102)
    	at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
    	at scala.collection.Iterator$class.foreach(Iterator.scala:727)
    	at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
    	at scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:48)
    	at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:103)
    	at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:47)
    	at scala.collection.TraversableOnce$class.to(TraversableOnce.scala:273)
    	at scala.collection.AbstractIterator.to(Iterator.scala:1157)
    	at scala.collection.TraversableOnce$class.toBuffer(TraversableOnce.scala:265)
    	at scala.collection.AbstractIterator.toBuffer(Iterator.scala:1157)
    	at scala.collection.TraversableOnce$class.toArray(TraversableOnce.scala:252)
    	at scala.collection.AbstractIterator.toArray(Iterator.scala:1157)
    	at org.apache.spark.rdd.RDD$$anonfun$collect$1$$anonfun$13.apply(RDD.scala:934)
    	at org.apache.spark.rdd.RDD$$anonfun$collect$1$$anonfun$13.apply(RDD.scala:934)
    	at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1882)
    	at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1882)
    	at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
    	at org.apache.spark.scheduler.Task.run(Task.scala:89)
    	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:227)
    	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    	at java.lang.Thread.run(Thread.java:745)
    Driver stacktrace:
    	at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1433)
    	at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1421)
    	at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1420)
    	at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
    	at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
    	at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1420)
    	at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:801)
    	at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:801)
    	at scala.Option.foreach(Option.scala:236)
    	at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:801)
    	at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1642)
    	at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1601)
    	at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1590)
    	at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
    	at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:622)
    	at org.apache.spark.SparkContext.runJob(SparkContext.scala:1856)
    	at org.apache.spark.SparkContext.runJob(SparkContext.scala:1869)
    	at org.apache.spark.SparkContext.runJob(SparkContext.scala:1882)
    	at org.apache.spark.SparkContext.runJob(SparkContext.scala:1953)
    	at org.apache.spark.rdd.RDD$$anonfun$collect$1.apply(RDD.scala:934)
    	at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:150)
    	at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:111)
    	at org.apache.spark.rdd.RDD.withScope(RDD.scala:323)
    	at org.apache.spark.rdd.RDD.collect(RDD.scala:933)
    	at org.apache.spark.h2o.backends.internal.InternalBackendUtils$class.startH2O(InternalBackendUtils.scala:165)
    	at org.apache.spark.h2o.backends.internal.InternalBackendUtils$.startH2O(InternalBackendUtils.scala:263)
    	at org.apache.spark.h2o.backends.internal.InternalH2OBackend.init(InternalH2OBackend.scala:103)
    	at org.apache.spark.h2o.H2OContext.init(H2OContext.scala:112)
    	at org.apache.spark.h2o.H2OContext$.getOrCreate(H2OContext.scala:294)
    	at org.apache.spark.h2o.H2OContext$.getOrCreate(H2OContext.scala:316)
    	at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:32)
    	at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:37)
    	at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:39)
    	at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:41)
    	at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:43)
    	at $iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:45)
    	at $iwC$$iwC$$iwC$$iwC.<init>(<console>:47)
    	at $iwC$$iwC$$iwC.<init>(<console>:49)
    	at $iwC$$iwC.<init>(<console>:51)
    	at $iwC.<init>(<console>:53)
    	at <init>(<console>:55)
    	at .<init>(<console>:59)
    	at .<clinit>(<console>)
    	at .<init>(<console>:7)
    	at .<clinit>(<console>)
    	at $print(<console>)
    	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    	at java.lang.reflect.Method.invoke(Method.java:498)
    	at org.apache.spark.repl.SparkIMain$ReadEvalPrint.call(SparkIMain.scala:1065)
    	at org.apache.spark.repl.SparkIMain$Request.loadAndRun(SparkIMain.scala:1346)
    	at org.apache.spark.repl.SparkIMain.loadAndRunReq$1(SparkIMain.scala:840)
    	at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:871)
    	at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:819)
    	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    	at java.lang.reflect.Method.invoke(Method.java:498)
    	at org.apache.zeppelin.spark.Utils.invokeMethod(Utils.java:38)
    	at org.apache.zeppelin.spark.SparkInterpreter.interpret(SparkInterpreter.java:972)
    	at org.apache.zeppelin.spark.SparkInterpreter.interpretInput(SparkInterpreter.java:1198)
    	at org.apache.zeppelin.spark.SparkInterpreter.interpret(SparkInterpreter.java:1144)
    	at org.apache.zeppelin.spark.SparkInterpreter.interpret(SparkInterpreter.java:1137)
    	at org.apache.zeppelin.interpreter.LazyOpenInterpreter.interpret(LazyOpenInterpreter.java:95)
    	at org.apache.zeppelin.interpreter.remote.RemoteInterpreterServer$InterpretJob.jobRun(RemoteInterpreterServer.java:490)
    	at org.apache.zeppelin.scheduler.Job.run(Job.java:175)
    	at org.apache.zeppelin.scheduler.FIFOScheduler$1.run(FIFOScheduler.java:139)
    	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
    	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
    	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    	at java.lang.Thread.run(Thread.java:745)
    Caused by: java.lang.AssertionError: assertion failed: SpreadRDD failure - IPs are not equal: (1,datanode-060.domain.com,-1) != (2, datanode-045.domain.com)
    	at scala.Predef$.assert(Predef.scala:179)
    	at org.apache.spark.h2o.backends.internal.InternalBackendUtils$$anonfun$7.apply(InternalBackendUtils.scala:103)
    	at org.apache.spark.h2o.backends.internal.InternalBackendUtils$$anonfun$7.apply(InternalBackendUtils.scala:102)
    	at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
    	at scala
    .collection.Iterator$class.foreach(Iterator.scala:727)
    	at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
    	at scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:48)
    	at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:103)
    	at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:47)
    	at scala.collection.TraversableOnce$class.to(TraversableOnce.scala:273)
    	at scala.collection.AbstractIterator.to(Iterator.scala:1157)
    	at scala.collection.TraversableOnce$class.toBuffer(TraversableOnce.scala:265)
    	at scala.collection.AbstractIterator.toBuffer(Iterator.scala:1157)
    	at scala.collection.TraversableOnce$class.toArray(TraversableOnce.scala:252)
    	at scala.collection.AbstractIterator.toArray(Iterator.scala:1157)
    	at org.apache.spark.rdd.RDD$$anonfun$collect$1$$anonfun$13.apply(RDD.scala:934)
    	at org.apache.spark.rdd.RDD$$anonfun$collect$1$$anonfun$13.apply(RDD.scala:934)
    	at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1882)
    	at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1882)
    	at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
    	at org.apache.spark.scheduler.Task.run(Task.scala:89)
    	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:227)
    	... 3 more
    
    opened by lordlinus 24
  • [ERROR] Executor without H2O instance discovered, killing the cloud!

    [ERROR] Executor without H2O instance discovered, killing the cloud!

    I'm getting the error mentioned in the title. No clue why.

    The command I use to run Sparkling Water is:

    spark-submit --class water.SparklingWaterDriver --master yarn-client --num-executors 3 --driver-memory 4g --executor-memory 2g --executor-cores 1 /opt/sparkling-water/sparkling-water-1.5.14/assembly/build/libs/*.jar

    Full error stacktrace looks like this:

    16/05/16 09:24:15 ERROR LiveListenerBus: Listener anon1 threw an exception java.lang.IllegalArgumentException: Executor without H2O instance discovered, killing the cloud! at org.apache.spark.h2o.H2OContext$$anon$1.onExecutorAdded(H2OContext.scala:180) at org.apache.spark.scheduler.SparkListenerBus$class.onPostEvent(SparkListenerBus.scala:58) at org.apache.spark.scheduler.LiveListenerBus.onPostEvent(LiveListenerBus.scala:31) at org.apache.spark.scheduler.LiveListenerBus.onPostEvent(LiveListenerBus.scala:31) at org.apache.spark.util.ListenerBus$class.postToAll(ListenerBus.scala:56) at org.apache.spark.util.AsynchronousListenerBus.postToAll(AsynchronousListenerBus.scala:37) at org.apache.spark.util.AsynchronousListenerBus$$anon$1$$anonfun$run$1.apply$mcV$sp(AsynchronousListenerBus.scala:79) at org.apache.spark.util.Utils$.tryOrStopSparkContext(Utils.scala:1136) at org.apache.spark.util.AsynchronousListenerBus$$anon$1.run(AsynchronousListenerBus.scala:63) 16/05/16 09:24:16 INFO BlockManagerMasterEndpoint: Registering block manager bda1node05.na.pg.com:17644 with 1060.0 MB RAM, BlockManagerId(4, bda1node05.na.pg.com, 17644) Exception in thread "main" java.lang.RuntimeException: Cloud size under 3 at water.H2O.waitForCloudSize(H2O.java:1547) at org.apache.spark.h2o.H2OContext.start(H2OContext.scala:223) at org.apache.spark.h2o.H2OContext$.getOrCreate(H2OContext.scala:337) at org.apache.spark.h2o.H2OContext$.getOrCreate(H2OContext.scala:363) at water.SparklingWaterDriver$.main(SparklingWaterDriver.scala:38) at water.SparklingWaterDriver.main(SparklingWaterDriver.scala) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:497) at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:672) at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:180) at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:205) at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:120) at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)

    opened by Dom-nik 21
  • org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection has grown past JVM limit of 0xFFFF

    org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection has grown past JVM limit of 0xFFFF

    We started getting this error on wide datasets after upgrading to the latest SW 2.2.3. It was not happening with the previous SW release 2.2.2.

    executor 16): java.lang.RuntimeException: Error while encoding: org.codehaus.janino.JaninoRuntimeException: failed to compile: org.codehaus.janino.JaninoRuntimeException: Constant pool for class org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection has grown past JVM limit of 0xFFFF
    /* 001 */ public java.lang.Object generate(Object[] references) {
    /* 002 */   return new SpecificUnsafeProjection(references);
    /* 003 */ }
    /* 004 */
    /* 005 */ class SpecificUnsafeProjection extends org.apache.spark.sql.catalyst.expressions.UnsafeProjection {
    /* 006 */
    /* 007 */   private Object[] references;
    /* 008 */   private int argValue;
    /* 009 */   private java.lang.String argValue1;
    /* 010 */   private boolean isNull11;
    /* 011 */   private boolean value11;
    /* 012 */   private boolean isNull12;

    Code:

    new_df = df.drop('FILE_CODE', 'ZIP_CODE', 'ZIP_PLUS_4', 'ADDRESS_KEY', 'HOUSEHOLD_KEY', 'AGILITY_ADDRESS', 'AGILITY_HOUSEHOLD')
    print "Drop vars"
    skippy_binary = hc.as_h2o_frame(new_df,framename='skippy_binary')
    skippy_binary["SEGMENT"] = skippy_binary["SEGMENT"].asfactor()
    print "H2O Frame Created"
    
    

    This error happens on a dataframe with ~3k variables, but doesn't happen on a dataframe with ~800 columns for example. But again, SW 2.2.2 didn't have this problem on the same data/same code.

    opened by Tagar 20
  • [SWPRIVATE-16] NA handling for Spark algorithms

    [SWPRIVATE-16] NA handling for Spark algorithms

    • added NA value handling for SVM: mean imputation, skip, not allowed (old variant with an error message)
    • means are calculated via RDD as H2OFrame.means() doesn't support enum columns
    • changed the SVM model scoring/pojo generation to include column means when running in MeanImputation mode
    • refactored tests a bit (frame creation now supports multi column frames)
    approved 
    opened by mdymczyk 19
  • Not able to start external backend on YARN : java.io.IOException: Cannot run program

    Not able to start external backend on YARN : java.io.IOException: Cannot run program "hadoop": error=2, No such file or directory

    Here are the logs

    20/01/31 14:06:10 WARN external.ExternalH2OBackend: Increasing 'spark.locality.wait' to value 30000
    20/01/31 14:06:10 INFO h2o.H2OContext$H2OContextClientBased: Sparkling Water version: 3.28.0.1-1-2.4
    20/01/31 14:06:10 INFO h2o.H2OContext$H2OContextClientBased: Spark version: 2.4.4
    20/01/31 14:06:10 INFO h2o.H2OContext$H2OContextClientBased: Integrated H2O version: 3.28.0.1
    20/01/31 14:06:10 INFO h2o.H2OContext$H2OContextClientBased: The following Spark configuration is used: 
        (spark.ext.h2o.external.cluster.size,2)
        (spark.driver.host,project-master)
        (spark.sql.shuffle.partitions,4)
        (spark.submit.deployMode,cluster)
        (spark.org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter.param.PROXY_HOSTS,project-master)
        (spark.ext.h2o.external.h2o.driver,/home/project/sparkling-water-3.28.0.1-1-2.4/h2odriver-sw3.28.0-hdp2.6-extended.jar)
        (spark.ext.h2o.cluster.info.name,notify_H2O_via_SparklingWater_application_1580479162776_0001)
        (spark.app.name,clone3)
        (spark.org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter.param.PROXY_URI_BASES,http://project-master:8088/proxy/application_1580479162776_0001)
        (spark.executor.id,driver)
        (spark.yarn.dist.files,file:///home/project/dist-0.0.1/project-distribution/application.yml)
        (spark.ext.h2o.hadoop.memory,2G)
        (spark.ext.h2o.cloud.name,H2O_via_SparklingWater_application_1580479162776_0001)
        (spark.yarn.app.container.log.dir,/home/project/usr/local/hadoop/logs/userlogs/application_1580479162776_0001/container_1580479162776_0001_01_000001)
        (spark.master,yarn)
        (spark.ui.port,0)
        (spark.app.id,application_1580479162776_0001)
        (spark.ext.h2o.client.log.dir,logs/H2Ologs)
        (spark.driver.port,38363)
        (spark.locality.wait,30000)
        (spark.executorEnv.JAVA_HOME,/usr/lib/jvm/java-8-openjdk-amd64)
        (spark.ext.h2o.external.start.mode,auto)
        (spark.jars,)
        (spark.ui.filters,org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter)
        (spark.ext.h2o.external.yarn.queue,default)
        (spark.ext.h2o.backend.cluster.mode,external)
        (spark.yarn.app.id,application_1580479162776_0001)
    20/01/31 14:06:10 INFO external.ExternalH2OBackend: Starting the external H2O cluster on YARN.
    20/01/31 14:06:10 INFO external.ExternalH2OBackend: Command used to start H2O on yarn: hadoop jar /home/project/sparkling-water-3.28.0.1-1-2.4/h2odriver-sw3.28.0-hdp2.6-extended.jar -Dmapreduce.job.queuename=default -Dmapreduce.job.tags=H2O/Sparkling-Water,Sparkling-Water/Spark/application_1580479162776_0001 -Dai.h2o.args.config=sparkling-water-external -nodes 2 -notify notify_H2O_via_SparklingWater_application_1580479162776_0001 -jobname H2O_via_SparklingWater_application_1580479162776_0001 -mapperXmx 2G -nthreads -1 -J -log_level -J INFO -port_offset 1 -baseport 54321 -timeout 120 -disown -sw_ext_backend -J -rest_api_ping_timeout -J 60000 -J -client_disconnect_timeout -J 60000 -extramempercent 10
    20/01/31 14:06:10 ERROR job.projectJobDriver$: Job failed in cluster mode with clone3
    java.io.IOException: Cannot run program "hadoop": error=2, No such file or directory
    	at java.lang.ProcessBuilder.start(ProcessBuilder.java:1048)
    	at scala.sys.process.ProcessBuilderImpl$Simple.run(ProcessBuilderImpl.scala:69)
    	at scala.sys.process.ProcessBuilderImpl$AbstractBuilder.run(ProcessBuilderImpl.scala:100)
    	at scala.sys.process.ProcessBuilderImpl$AbstractBuilder$$anonfun$runBuffered$1.apply(ProcessBuilderImpl.scala:148)
    	at scala.sys.process.ProcessBuilderImpl$AbstractBuilder$$anonfun$runBuffered$1.apply(ProcessBuilderImpl.scala:148)
    	at scala.sys.process.ProcessLogger$$anon$1.buffer(ProcessLogger.scala:99)
    	at scala.sys.process.ProcessBuilderImpl$AbstractBuilder.runBuffered(ProcessBuilderImpl.scala:148)
    	at scala.sys.process.ProcessBuilderImpl$AbstractBuilder.$bang(ProcessBuilderImpl.scala:114)
    	at org.apache.spark.h2o.backends.external.ExternalBackendUtils$class.launchShellCommand(ExternalBackendUtils.scala:111)
    	at org.apache.spark.h2o.backends.external.ExternalH2OBackend$.launchShellCommand(ExternalH2OBackend.scala:233)
    	at org.apache.spark.h2o.backends.external.ExternalH2OBackend.launchExternalH2OOnYarn(ExternalH2OBackend.scala:104)
    	at org.apache.spark.h2o.backends.external.ExternalH2OBackend.init(ExternalH2OBackend.scala:46)
    	at org.apache.spark.h2o.H2OContext$H2OContextClientBased.initBackend(H2OContext.scala:448)
    	at org.apache.spark.h2o.H2OContext.init(H2OContext.scala:150)
    	at org.apache.spark.h2o.H2OContext$.getOrCreate(H2OContext.scala:606)
    
    opened by BhushG 18
  • Cannot test sparkling water app with Gradle

    Cannot test sparkling water app with Gradle

    I'm using sparkling water with Scala and it works great in spark2-shell mode. Now I'm creating a Gradle-based app to build a fat jar and execute it with spark2-submit.

    I added the needed dependencies to the build.gradle file (the same as I used in spark2-shell mode); I'm using Spark 2.3.0:

    compile "ai.h2o:sparkling-water-core_2.11:2.3.28"
    

    If I try to test my application it throws an error:

    Exception in thread "H2O Launcher thread" java.lang.NoClassDefFoundError: org/eclipse/jetty/util/component/AggregateLifeCycle
    	at java.lang.ClassLoader.defineClass1(Native Method)
    	at java.lang.ClassLoader.defineClass(ClassLoader.java:763)
    	at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
    	at java.net.URLClassLoader.defineClass(URLClassLoader.java:468)
    	at java.net.URLClassLoader.access$100(URLClassLoader.java:74)
    	at java.net.URLClassLoader$1.run(URLClassLoader.java:369)
    	at java.net.URLClassLoader$1.run(URLClassLoader.java:363)
    	at java.security.AccessController.doPrivileged(Native Method)
    	at java.net.URLClassLoader.findClass(URLClassLoader.java:362)
    	at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
    	at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:349)
    	at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
    	at java.lang.ClassLoader.defineClass1(Native Method)
    	at java.lang.ClassLoader.defineClass(ClassLoader.java:763)
    	at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
    	at java.net.URLClassLoader.defineClass(URLClassLoader.java:468)
    	at java.net.URLClassLoader.access$100(URLClassLoader.java:74)
    	at java.net.URLClassLoader$1.run(URLClassLoader.java:369)
    	at java.net.URLClassLoader$1.run(URLClassLoader.java:363)
    	at java.security.AccessController.doPrivileged(Native Method)
    	at java.net.URLClassLoader.findClass(URLClassLoader.java:362)
    	at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
    	at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:349)
    	at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
    	at java.lang.ClassLoader.defineClass1(Native Method)
    	at java.lang.ClassLoader.defineClass(ClassLoader.java:763)
    	at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
    	at java.net.URLClassLoader.defineClass(URLClassLoader.java:468)
    	at java.net.URLClassLoader.access$100(URLClassLoader.java:74)
    	at java.net.URLClassLoader$1.run(URLClassLoader.java:369)
    	at java.net.URLClassLoader$1.run(URLClassLoader.java:363)
    	at java.security.AccessController.doPrivileged(Native Method)
    	at java.net.URLClassLoader.findClass(URLClassLoader.java:362)
    	at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
    	at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:349)
    	at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
    	at water.webserver.jetty8.Jetty8ServerAdapter.create(Jetty8ServerAdapter.java:35)
    	at water.webserver.jetty8.Jetty8Facade.createWebServer(Jetty8Facade.java:12)
    	at water.init.NetworkInit.initializeNetworkSockets(NetworkInit.java:77)
    	at water.H2O.startLocalNode(H2O.java:1620)
    	at water.H2O.main(H2O.java:2080)
    	at water.H2OStarter.start(H2OStarter.java:22)
    	at water.H2OStarter.start(H2OStarter.java:47)
    	at org.apache.spark.h2o.backends.internal.InternalBackendUtils$$anonfun$7$$anon$1.run(InternalBackendUtils.scala:169)
    Caused by: java.lang.ClassNotFoundException: org.eclipse.jetty.util.component.AggregateLifeCycle
    	at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
    	at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
    	at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:349)
    	at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
    	... 44 more
    

    So I tried to add jetty util to my app dependencies:

    testCompile "org.eclipse.jetty:jetty-util:9.4.18.v20190429"
    

    But it is throwing the same error.

    Any idea?

    opened by sergiocalde94 17
  • Sparkling Water fails to create h2oContext in simple spark project

    Sparkling Water fails to create h2oContext in simple spark project

    I am setting up Sparkling Water for the first time on a standalone cluster running Spark 2.2. I have run Sparkling Water on such a cluster before via R (using rsparkling + sparklyr + h2o), but am having issues setting this up as a Spark application (in Scala).

    The app is built with Maven, so I have added the latest Sparkling Water dependency:

    <dependency>
        <groupId>ai.h2o</groupId>
        <artifactId>sparkling-water-core_2.11</artifactId>
        <version>2.2.2</version>
    </dependency>
    

    Then the app code is as follows:

    package com.me.app
    
    import org.apache.spark.sql.{DataFrame, SparkSession}
    import org.apache.spark.h2o._
    import water.Key
    import water.fvec.Frame
    
    object sparklingWaterH2o {
    
      def sparklingWaterH2o(): Unit = {
    
        val sparkSession = SparkSession
          .builder()
          .master("spark://cluster.address:7077")
          .appName("sparklingWaterH2o")
          .config("spark.executor.memory", "32G")
          .config("spark.executor.cores", "5")
          .config("spark.cores.max", "40")
          .config("spark.ext.h2o.nthreads", "40")
          .config("spark.jars", "/path/to/fat/jar/app-1.0-SNAPSHOT-jar-with-dependencies.jar")
          .getOrCreate()
    
        // Needed for the Seq(...).toDF(...) conversion below
        import sparkSession.implicits._
    
        val h2oContext = H2OContext.getOrCreate(sparkSession)
    
        import h2oContext._
    
        val df = Seq(
          (1, "2014/07/31 23:00:01"),
          (1, "2016/12/09 10:12:43")).toDF("id", "date")
    
        val h2oTrainFrame = h2oContext.asH2OFrame(df)
    
        println(s"h2oContext = ${h2oContext.toString()}")
      }
    }

    I then compile the fat JAR and submit it to the cluster; however, the H2OContext never gets created and the SparkContext gets shut down with exit code 255. The app exits without any error before an H2OContext is created - the only potentially useful message is IP address not found on this machine.

    I've tried this with Sparkling Water version 2.2.0 and get the same issue. I also tried adding dependencies for sparkling-water-ml and sparkling-water-repl, as well as all the h2o core dependencies (though I assume these are not needed, since they are bundled with Sparkling Water?).

    The strange thing is that I get almost the exact same issue when trying to connect via R (using rsparkling and sparklyr, see here) - and that method worked correctly up until a few weeks ago.
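    A possible direction (a sketch built on the assumption that the driver host has more than one network interface - not a confirmed fix): the "IP address not found on this machine" message typically means H2O could not find a local interface matching the address it was asked to use, so constraining the H2O node/client network explicitly via the spark.ext.h2o.* properties is worth trying. The property names should be verified against the Sparkling Water configuration documentation for your exact version, and the CIDR values below are placeholders:

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.h2o._

    val sparkSession = SparkSession
      .builder()
      .master("spark://cluster.address:7077")
      .appName("sparklingWaterH2o")
      // Keep H2O worker nodes on the cluster subnet (placeholder CIDR)
      .config("spark.ext.h2o.node.network.mask", "XXX.XXX.211.0/24")
      // Keep the H2O client (driver side) on the driver's subnet (placeholder CIDR)
      .config("spark.ext.h2o.client.network.mask", "192.168.103.0/24")
      .getOrCreate()

    val h2oContext = H2OContext.getOrCreate(sparkSession)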

    See log below.

    
    objc[39611]: Class JavaLaunchHelper is implemented in both /Library/Java/JavaVirtualMachines/jdk1.8.0_131.jdk/Contents/Home/bin/java (0x10ab4b4c0) and /Library/Java/JavaVirtualMachines/jdk1.8.0_131.jdk/Contents/Home/jre/lib/libinstrument.dylib (0x10bb724e0). One of the two will be used. Which one is undefined.
    Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
    SLF4J: Class path contains multiple SLF4J bindings.
    SLF4J: Found binding in [jar:file:/Users/username/.m2/repository/org/slf4j/slf4j-log4j12/1.7.16/slf4j-log4j12-1.7.16.jar!/org/slf4j/impl/StaticLoggerBinder.class]
    SLF4J: Found binding in [jar:file:/Users/username/.m2/repository/org/apache/logging/log4j/log4j-slf4j-impl/2.6.2/log4j-slf4j-impl-2.6.2.jar!/org/slf4j/impl/StaticLoggerBinder.class]
    SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
    SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
    17/11/17 10:16:01 INFO SparkContext: Running Spark version 2.2.0
    17/11/17 10:16:02 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
    17/11/17 10:16:02 INFO SparkContext: Submitted application: sparklingWaterH2o
    17/11/17 10:16:02 INFO SecurityManager: Changing view acls to: username
    17/11/17 10:16:02 INFO SecurityManager: Changing modify acls to: username
    17/11/17 10:16:02 INFO SecurityManager: Changing view acls groups to: 
    17/11/17 10:16:02 INFO SecurityManager: Changing modify acls groups to: 
    17/11/17 10:16:02 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users  with view permissions: Set(username); groups with view permissions: Set(); users  with modify permissions: Set(username); groups with modify permissions: Set()
    17/11/17 10:16:03 INFO Utils: Successfully started service 'sparkDriver' on port 53775.
    17/11/17 10:16:03 INFO SparkEnv: Registering MapOutputTracker
    17/11/17 10:16:03 INFO SparkEnv: Registering BlockManagerMaster
    17/11/17 10:16:03 INFO BlockManagerMasterEndpoint: Using org.apache.spark.storage.DefaultTopologyMapper for getting topology information
    17/11/17 10:16:03 INFO BlockManagerMasterEndpoint: BlockManagerMasterEndpoint up
    17/11/17 10:16:03 INFO DiskBlockManager: Created local directory at /private/var/folders/gl/vgw262w9227cwqvzk595rbvjygdzh8/T/blockmgr-d29de5c5-9116-4abf-812c-04ca680781fe
    17/11/17 10:16:03 INFO MemoryStore: MemoryStore started with capacity 1002.3 MB
    17/11/17 10:16:03 INFO SparkEnv: Registering OutputCommitCoordinator
    17/11/17 10:16:03 INFO Utils: Successfully started service 'SparkUI' on port 4040.
    17/11/17 10:16:03 INFO SparkUI: Bound SparkUI to 0.0.0.0, and started at http://192.168.103.46:4040
    17/11/17 10:16:03 INFO SparkContext: Added JAR /path/to/app/target/app-1.0-SNAPSHOT-jar-with-dependencies.jar at spark://192.168.103.46:53775/jars/app-1.0-SNAPSHOT-jar-with-dependencies.jar with timestamp 1510913763424
    17/11/17 10:16:03 INFO StandaloneAppClient$ClientEndpoint: Connecting to master spark://rnd-centos7-ben-31.nominet.org.uk:7077...
    17/11/17 10:16:03 INFO TransportClientFactory: Successfully created connection to rnd-centos7-ben-31.nominet.org.uk/XXX.XXX.211.31:7077 after 26 ms (0 ms spent in bootstraps)
    17/11/17 10:16:03 INFO StandaloneSchedulerBackend: Connected to Spark cluster with app ID app-20171117101603-0031
    17/11/17 10:16:03 INFO StandaloneAppClient$ClientEndpoint: Executor added: app-20171117101603-0031/0 on worker-20171013100055-XXX.XXX.211.30-33565 (XXX.XXX.211.30:33565) with 5 cores
    17/11/17 10:16:03 INFO StandaloneSchedulerBackend: Granted executor ID app-20171117101603-0031/0 on hostPort XXX.XXX.211.30:33565 with 5 cores, 32.0 GB RAM
    17/11/17 10:16:03 INFO StandaloneAppClient$ClientEndpoint: Executor added: app-20171117101603-0031/1 on worker-20171013100055-XXX.XXX.211.33-34424 (XXX.XXX.211.33:34424) with 5 cores
    17/11/17 10:16:03 INFO StandaloneSchedulerBackend: Granted executor ID app-20171117101603-0031/1 on hostPort XXX.XXX.211.33:34424 with 5 cores, 32.0 GB RAM
    17/11/17 10:16:03 INFO StandaloneAppClient$ClientEndpoint: Executor added: app-20171117101603-0031/2 on worker-20171013100055-XXX.XXX.211.31-37513 (XXX.XXX.211.31:37513) with 5 cores
    17/11/17 10:16:03 INFO StandaloneSchedulerBackend: Granted executor ID app-20171117101603-0031/2 on hostPort XXX.XXX.211.31:37513 with 5 cores, 32.0 GB RAM
    17/11/17 10:16:03 INFO StandaloneAppClient$ClientEndpoint: Executor added: app-20171117101603-0031/3 on worker-20171013100054-XXX.XXX.211.32-36797 (XXX.XXX.211.32:36797) with 5 cores
    17/11/17 10:16:03 INFO StandaloneSchedulerBackend: Granted executor ID app-20171117101603-0031/3 on hostPort XXX.XXX.211.32:36797 with 5 cores, 32.0 GB RAM
    17/11/17 10:16:03 INFO StandaloneAppClient$ClientEndpoint: Executor updated: app-20171117101603-0031/2 is now RUNNING
    17/11/17 10:16:03 INFO StandaloneAppClient$ClientEndpoint: Executor updated: app-20171117101603-0031/1 is now RUNNING
    17/11/17 10:16:03 INFO StandaloneAppClient$ClientEndpoint: Executor updated: app-20171117101603-0031/3 is now RUNNING
    17/11/17 10:16:03 INFO StandaloneAppClient$ClientEndpoint: Executor updated: app-20171117101603-0031/0 is now RUNNING
    17/11/17 10:16:03 INFO Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 53777.
    17/11/17 10:16:03 INFO NettyBlockTransferService: Server created on 192.168.103.46:53777
    17/11/17 10:16:03 INFO BlockManager: Using org.apache.spark.storage.RandomBlockReplicationPolicy for block replication policy
    17/11/17 10:16:03 INFO BlockManagerMaster: Registering BlockManager BlockManagerId(driver, 192.168.103.46, 53777, None)
    17/11/17 10:16:03 INFO BlockManagerMasterEndpoint: Registering block manager 192.168.103.46:53777 with 1002.3 MB RAM, BlockManagerId(driver, 192.168.103.46, 53777, None)
    17/11/17 10:16:03 INFO BlockManagerMaster: Registered BlockManager BlockManagerId(driver, 192.168.103.46, 53777, None)
    17/11/17 10:16:03 INFO BlockManager: Initialized BlockManager: BlockManagerId(driver, 192.168.103.46, 53777, None)
    17/11/17 10:16:05 INFO CoarseGrainedSchedulerBackend$DriverEndpoint: Registered executor NettyRpcEndpointRef(spark-client://Executor) (XXX.XXX.211.31:46906) with ID 2
    17/11/17 10:16:05 INFO CoarseGrainedSchedulerBackend$DriverEndpoint: Registered executor NettyRpcEndpointRef(spark-client://Executor) (XXX.XXX.211.30:54738) with ID 0
    17/11/17 10:16:05 INFO BlockManagerMasterEndpoint: Registering block manager XXX.XXX.211.31:45376 with 8.4 GB RAM, BlockManagerId(2, XXX.XXX.211.31, 45376, None)
    17/11/17 10:16:05 INFO BlockManagerMasterEndpoint: Registering block manager XXX.XXX.211.30:34172 with 8.4 GB RAM, BlockManagerId(0, XXX.XXX.211.30, 34172, None)
    17/11/17 10:16:05 INFO CoarseGrainedSchedulerBackend$DriverEndpoint: Registered executor NettyRpcEndpointRef(spark-client://Executor) (XXX.XXX.211.32:53076) with ID 3
    17/11/17 10:16:05 INFO CoarseGrainedSchedulerBackend$DriverEndpoint: Registered executor NettyRpcEndpointRef(spark-client://Executor) (XXX.XXX.211.33:47478) with ID 1
    17/11/17 10:16:05 INFO BlockManagerMasterEndpoint: Registering block manager XXX.XXX.211.32:34360 with 8.4 GB RAM, BlockManagerId(3, XXX.XXX.211.32, 34360, None)
    17/11/17 10:16:05 INFO BlockManagerMasterEndpoint: Registering block manager XXX.XXX.211.33:34342 with 8.4 GB RAM, BlockManagerId(1, XXX.XXX.211.33, 34342, None)
    17/11/17 10:16:33 INFO StandaloneSchedulerBackend: SchedulerBackend is ready for scheduling beginning after waiting maxRegisteredResourcesWaitingTime: 30000(ms)
    17/11/17 10:16:33 INFO SharedState: Setting hive.metastore.warehouse.dir ('null') to the value of spark.sql.warehouse.dir ('file:/path/to/app/spark-warehouse/').
    17/11/17 10:16:33 INFO SharedState: Warehouse path is 'file:/path/to/app/spark-warehouse/'.
    17/11/17 10:16:34 INFO StateStoreCoordinatorRef: Registered StateStoreCoordinator endpoint
    17/11/17 10:16:34 WARN InternalH2OBackend: Increasing 'spark.locality.wait' to value 30000
    17/11/17 10:16:34 WARN InternalH2OBackend: Due to non-deterministic behavior of Spark broadcast-based joins
    We recommend to disable them by
    configuring `spark.sql.autoBroadcastJoinThreshold` variable to value `-1`:
    sqlContext.sql("SET spark.sql.autoBroadcastJoinThreshold=-1")
    17/11/17 10:16:34 INFO InternalH2OBackend: Starting H2O services: Sparkling Water configuration:
      backend cluster mode : internal
      workers              : None
      cloudName            : sparkling-water-username_app-20171117101603-0031
      flatfile             : true
      clientBasePort       : 54321
      nodeBasePort         : 54321
      cloudTimeout         : 60000
      h2oNodeLog           : INFO
      h2oClientLog         : WARN
      nthreads             : 40
      drddMulFactor        : 10
    17/11/17 10:16:34 INFO SparkContext: Starting job: collect at SpreadRDDBuilder.scala:105
    17/11/17 10:16:34 INFO DAGScheduler: Got job 0 (collect at SpreadRDDBuilder.scala:105) with 41 output partitions
    17/11/17 10:16:34 INFO DAGScheduler: Final stage: ResultStage 0 (collect at SpreadRDDBuilder.scala:105)
    17/11/17 10:16:34 INFO DAGScheduler: Parents of final stage: List()
    17/11/17 10:16:34 INFO DAGScheduler: Missing parents: List()
    17/11/17 10:16:34 INFO DAGScheduler: Submitting ResultStage 0 (MapPartitionsRDD[1] at mapPartitionsWithIndex at SpreadRDDBuilder.scala:102), which has no missing parents
    17/11/17 10:16:34 INFO MemoryStore: Block broadcast_0 stored as values in memory (estimated size 2.1 KB, free 1002.3 MB)
    17/11/17 10:16:34 INFO MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 1379.0 B, free 1002.3 MB)
    17/11/17 10:16:34 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on 192.168.103.46:53777 (size: 1379.0 B, free: 1002.3 MB)
    17/11/17 10:16:34 INFO SparkContext: Created broadcast 0 from broadcast at DAGScheduler.scala:1006
    17/11/17 10:16:34 INFO DAGScheduler: Submitting 41 missing tasks from ResultStage 0 (MapPartitionsRDD[1] at mapPartitionsWithIndex at SpreadRDDBuilder.scala:102) (first 15 tasks are for partitions Vector(0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14))
    17/11/17 10:16:34 INFO TaskSchedulerImpl: Adding task set 0.0 with 41 tasks
    17/11/17 10:16:34 INFO TaskSetManager: Starting task 0.0 in stage 0.0 (TID 0, XXX.XXX.211.31, executor 2, partition 0, PROCESS_LOCAL, 4829 bytes)
    17/11/17 10:16:34 INFO TaskSetManager: Starting task 1.0 in stage 0.0 (TID 1, XXX.XXX.211.30, executor 0, partition 1, PROCESS_LOCAL, 4829 bytes)
    ...
    17/11/17 10:16:34 INFO TaskSetManager: Starting task 19.0 in stage 0.0 (TID 19, XXX.XXX.211.33, executor 1, partition 19, PROCESS_LOCAL, 4829 bytes)
    17/11/17 10:16:43 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on XXX.XXX.211.30:34172 (size: 1379.0 B, free: 8.4 GB)
    17/11/17 10:16:43 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on XXX.XXX.211.32:34360 (size: 1379.0 B, free: 8.4 GB)
    17/11/17 10:16:43 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on XXX.XXX.211.33:34342 (size: 1379.0 B, free: 8.4 GB)
    17/11/17 10:16:43 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on XXX.XXX.211.31:45376 (size: 1379.0 B, free: 8.4 GB)
    17/11/17 10:16:43 INFO BlockManagerInfo: Added rdd_0_13 in memory on XXX.XXX.211.30:34172 (size: 32.0 B, free: 8.4 GB)
    ...
    17/11/17 10:16:43 INFO TaskSetManager: Finished task 40.0 in stage 0.0 (TID 40) in 29 ms on XXX.XXX.211.33 (executor 1) (41/41)
    17/11/17 10:16:43 INFO TaskSchedulerImpl: Removed TaskSet 0.0, whose tasks have all completed, from pool 
    17/11/17 10:16:43 INFO DAGScheduler: ResultStage 0 (collect at SpreadRDDBuilder.scala:105) finished in 8.913 s
    17/11/17 10:16:43 INFO DAGScheduler: Job 0 finished: collect at SpreadRDDBuilder.scala:105, took 9.072610 s
    17/11/17 10:16:43 INFO ParallelCollectionRDD: Removing RDD 0 from persistence list
    17/11/17 10:16:43 INFO BlockManager: Removing RDD 0
    17/11/17 10:16:43 INFO SpreadRDDBuilder: Detected 4 spark executors for 4 H2O workers!
    17/11/17 10:16:43 INFO InternalH2OBackend: Launching H2O on following 4 nodes: (0,XXX.XXX.211.30,-1),(1,XXX.XXX.211.33,-1),(2,XXX.XXX.211.31,-1),(3,XXX.XXX.211.32,-1)
    17/11/17 10:16:43 INFO SparkContext: Starting job: collect at InternalBackendUtils.scala:163
    17/11/17 10:16:43 INFO DAGScheduler: Got job 1 (collect at InternalBackendUtils.scala:163) with 4 output partitions
    17/11/17 10:16:43 INFO DAGScheduler: Final stage: ResultStage 1 (collect at InternalBackendUtils.scala:163)
    17/11/17 10:16:43 INFO DAGScheduler: Parents of final stage: List()
    17/11/17 10:16:43 INFO DAGScheduler: Missing parents: List()
    17/11/17 10:16:43 INFO DAGScheduler: Submitting ResultStage 1 (MapPartitionsRDD[3] at map at InternalBackendUtils.scala:100), which has no missing parents
    17/11/17 10:16:43 INFO MemoryStore: Block broadcast_1 stored as values in memory (estimated size 3.1 KB, free 1002.3 MB)
    17/11/17 10:16:43 INFO MemoryStore: Block broadcast_1_piece0 stored as bytes in memory (estimated size 2029.0 B, free 1002.3 MB)
    17/11/17 10:16:43 INFO BlockManagerInfo: Added broadcast_1_piece0 in memory on 192.168.103.46:53777 (size: 2029.0 B, free: 1002.3 MB)
    17/11/17 10:16:43 INFO SparkContext: Created broadcast 1 from broadcast at DAGScheduler.scala:1006
    17/11/17 10:16:43 INFO DAGScheduler: Submitting 4 missing tasks from ResultStage 1 (MapPartitionsRDD[3] at map at InternalBackendUtils.scala:100) (first 15 tasks are for partitions Vector(0, 1, 2, 3))
    17/11/17 10:16:43 INFO TaskSchedulerImpl: Adding task set 1.0 with 4 tasks
    17/11/17 10:16:43 INFO TaskSetManager: Starting task 2.0 in stage 1.0 (TID 41, XXX.XXX.211.31, executor 2, partition 2, NODE_LOCAL, 4821 bytes)
    17/11/17 10:16:43 INFO TaskSetManager: Starting task 0.0 in stage 1.0 (TID 42, XXX.XXX.211.30, executor 0, partition 0, NODE_LOCAL, 4821 bytes)
    17/11/17 10:16:43 INFO TaskSetManager: Starting task 3.0 in stage 1.0 (TID 43, XXX.XXX.211.32, executor 3, partition 3, NODE_LOCAL, 4821 bytes)
    17/11/17 10:16:43 INFO TaskSetManager: Starting task 1.0 in stage 1.0 (TID 44, XXX.XXX.211.33, executor 1, partition 1, NODE_LOCAL, 4821 bytes)
    17/11/17 10:16:43 INFO BlockManagerInfo: Added broadcast_1_piece0 in memory on XXX.XXX.211.30:34172 (size: 2029.0 B, free: 8.4 GB)
    17/11/17 10:16:43 INFO BlockManagerInfo: Added broadcast_1_piece0 in memory on XXX.XXX.211.31:45376 (size: 2029.0 B, free: 8.4 GB)
    17/11/17 10:16:43 INFO BlockManagerInfo: Added broadcast_1_piece0 in memory on XXX.XXX.211.33:34342 (size: 2029.0 B, free: 8.4 GB)
    17/11/17 10:16:43 INFO BlockManagerInfo: Added broadcast_1_piece0 in memory on XXX.XXX.211.32:34360 (size: 2029.0 B, free: 8.4 GB)
    17/11/17 10:16:43 INFO TaskSetManager: Finished task 0.0 in stage 1.0 (TID 42) in 349 ms on XXX.XXX.211.30 (executor 0) (1/4)
    17/11/17 10:16:43 INFO TaskSetManager: Finished task 2.0 in stage 1.0 (TID 41) in 358 ms on XXX.XXX.211.31 (executor 2) (2/4)
    17/11/17 10:16:43 INFO TaskSetManager: Finished task 3.0 in stage 1.0 (TID 43) in 394 ms on XXX.XXX.211.32 (executor 3) (3/4)
    17/11/17 10:16:43 INFO TaskSetManager: Finished task 1.0 in stage 1.0 (TID 44) in 408 ms on XXX.XXX.211.33 (executor 1) (4/4)
    17/11/17 10:16:43 INFO TaskSchedulerImpl: Removed TaskSet 1.0, whose tasks have all completed, from pool 
    17/11/17 10:16:43 INFO DAGScheduler: ResultStage 1 (collect at InternalBackendUtils.scala:163) finished in 0.411 s
    17/11/17 10:16:43 INFO DAGScheduler: Job 1 finished: collect at InternalBackendUtils.scala:163, took 0.428038 s
    17/11/17 10:16:43 INFO SparkContext: Starting job: foreach at InternalBackendUtils.scala:175
    17/11/17 10:16:43 INFO DAGScheduler: Got job 2 (foreach at InternalBackendUtils.scala:175) with 4 output partitions
    17/11/17 10:16:43 INFO DAGScheduler: Final stage: ResultStage 2 (foreach at InternalBackendUtils.scala:175)
    17/11/17 10:16:43 INFO DAGScheduler: Parents of final stage: List()
    17/11/17 10:16:43 INFO DAGScheduler: Missing parents: List()
    17/11/17 10:16:43 INFO DAGScheduler: Submitting ResultStage 2 (InvokeOnNodesRDD[2] at RDD at InvokeOnNodesRDD.scala:27), which has no missing parents
    17/11/17 10:16:43 INFO MemoryStore: Block broadcast_2 stored as values in memory (estimated size 1832.0 B, free 1002.3 MB)
    17/11/17 10:16:43 INFO MemoryStore: Block broadcast_2_piece0 stored as bytes in memory (estimated size 1209.0 B, free 1002.3 MB)
    17/11/17 10:16:43 INFO BlockManagerInfo: Added broadcast_2_piece0 in memory on 192.168.103.46:53777 (size: 1209.0 B, free: 1002.3 MB)
    17/11/17 10:16:43 INFO SparkContext: Created broadcast 2 from broadcast at DAGScheduler.scala:1006
    17/11/17 10:16:43 INFO DAGScheduler: Submitting 4 missing tasks from ResultStage 2 (InvokeOnNodesRDD[2] at RDD at InvokeOnNodesRDD.scala:27) (first 15 tasks are for partitions Vector(0, 1, 2, 3))
    17/11/17 10:16:43 INFO TaskSchedulerImpl: Adding task set 2.0 with 4 tasks
    17/11/17 10:16:43 INFO TaskSetManager: Starting task 2.0 in stage 2.0 (TID 45, XXX.XXX.211.31, executor 2, partition 2, NODE_LOCAL, 4821 bytes)
    17/11/17 10:16:43 INFO TaskSetManager: Starting task 0.0 in stage 2.0 (TID 46, XXX.XXX.211.30, executor 0, partition 0, NODE_LOCAL, 4821 bytes)
    17/11/17 10:16:43 INFO TaskSetManager: Starting task 3.0 in stage 2.0 (TID 47, XXX.XXX.211.32, executor 3, partition 3, NODE_LOCAL, 4821 bytes)
    17/11/17 10:16:43 INFO TaskSetManager: Starting task 1.0 in stage 2.0 (TID 48, XXX.XXX.211.33, executor 1, partition 1, NODE_LOCAL, 4821 bytes)
    17/11/17 10:16:43 INFO BlockManagerInfo: Added broadcast_2_piece0 in memory on XXX.XXX.211.31:45376 (size: 1209.0 B, free: 8.4 GB)
    17/11/17 10:16:43 INFO BlockManagerInfo: Added broadcast_2_piece0 in memory on XXX.XXX.211.33:34342 (size: 1209.0 B, free: 8.4 GB)
    17/11/17 10:16:43 INFO BlockManagerInfo: Added broadcast_2_piece0 in memory on XXX.XXX.211.32:34360 (size: 1209.0 B, free: 8.4 GB)
    17/11/17 10:16:43 INFO BlockManagerInfo: Added broadcast_2_piece0 in memory on XXX.XXX.211.30:34172 (size: 1209.0 B, free: 8.4 GB)
    17/11/17 10:16:43 INFO TaskSetManager: Finished task 0.0 in stage 2.0 (TID 46) in 28 ms on XXX.XXX.211.30 (executor 0) (1/4)
    17/11/17 10:16:43 INFO TaskSetManager: Finished task 1.0 in stage 2.0 (TID 48) in 28 ms on XXX.XXX.211.33 (executor 1) (2/4)
    17/11/17 10:16:43 INFO TaskSetManager: Finished task 2.0 in stage 2.0 (TID 45) in 30 ms on XXX.XXX.211.31 (executor 2) (3/4)
    17/11/17 10:16:43 INFO TaskSetManager: Finished task 3.0 in stage 2.0 (TID 47) in 32 ms on XXX.XXX.211.32 (executor 3) (4/4)
    17/11/17 10:16:43 INFO TaskSchedulerImpl: Removed TaskSet 2.0, whose tasks have all completed, from pool 
    17/11/17 10:16:43 INFO DAGScheduler: ResultStage 2 (foreach at InternalBackendUtils.scala:175) finished in 0.034 s
    17/11/17 10:16:43 INFO DAGScheduler: Job 2 finished: foreach at InternalBackendUtils.scala:175, took 0.043737 s
    17/11/17 10:16:43 INFO InternalH2OBackend: Starting H2O client on the Spark Driver (192.168.103.46): -name sparkling-water-username_app-20171117101603-0031 -nthreads 40 -ga_opt_out -quiet -log_level WARN -log_dir /path/to/app/h2ologs/app-20171117101603-0031 -baseport 54321 -client -ip 192.168.103.46 -flatfile /var/folders/gl/vgw262w9227cwqvzk595rbvjygdzh8/T/1510913803950-0/flatfile.txt
    17/11/17 10:16:44 INFO NativeLibrary: Loaded XGBoost library from lib/osx_64/libxgboost4j.dylib (/var/folders/gl/vgw262w9227cwqvzk595rbvjygdzh8/T/libxgboost4j2584224510491657515.dylib)
    Found XGBoost backend with library: xgboost4j
    Your system supports only minimal version of XGBoost (no GPUs, no multithreading)!
    IP address not found on this machine
    17/11/17 10:16:45 INFO SparkContext: Invoking stop() from shutdown hook
    17/11/17 10:16:45 INFO SparkUI: Stopped Spark web UI at http://192.168.103.46:4040
    17/11/17 10:16:45 INFO StandaloneSchedulerBackend: Shutting down all executors
    17/11/17 10:16:45 INFO CoarseGrainedSchedulerBackend$DriverEndpoint: Asking each executor to shut down
    17/11/17 10:16:45 INFO MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
    17/11/17 10:16:45 INFO MemoryStore: MemoryStore cleared
    17/11/17 10:16:45 INFO BlockManager: BlockManager stopped
    17/11/17 10:16:45 INFO BlockManagerMaster: BlockManagerMaster stopped
    17/11/17 10:16:45 INFO OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
    17/11/17 10:16:45 INFO SparkContext: Successfully stopped SparkContext
    17/11/17 10:16:45 INFO ShutdownHookManager: Shutdown hook called
    17/11/17 10:16:45 INFO ShutdownHookManager: Deleting directory /private/var/folders/gl/vgw262w9227cwqvzk595rbvjygdzh8/T/spark-51594e29-1ea0-4a4d-9aa0-dd65ef5146dd
    
    opened by renegademonkey 17
  • Failed to Create H2OContext

    Failed to Create H2OContext

    I followed everything here, but am unable to create an H2OContext.

    https://docs.h2o.ai/sparkling-water/3.3/latest-stable/doc/rsparkling.html#install-sparklyr

    Clear libraries

    # The following two commands remove any previously installed H2O packages for R.
    if ("package:rsparkling" %in% search()) { detach("package:rsparkling", unload=TRUE) }
    if ("rsparkling" %in% rownames(installed.packages())) { remove.packages("rsparkling") }
    
    if ("package:h2o" %in% search()) { detach("package:h2o", unload=TRUE) }
    if ("h2o" %in% rownames(installed.packages())) { remove.packages("h2o") }
    
    # Install packages H2O depends on
    pkgs <- c("methods", "statmod", "stats", "graphics", "RCurl", "jsonlite", "tools", "utils")
    for (pkg in pkgs) {
        if (! (pkg %in% rownames(installed.packages()))) { install.packages(pkg) }
    }
    

    Install libraries

    if (!require("h2o", quietly = TRUE)) install.packages("h2o", type = "source", repos = "http://h2o-release.s3.amazonaws.com/h2o/rel-zygmund/2/R")
    if (!require("rsparkling")) install.packages("rsparkling", type = "source", repos = "http://h2o-release.s3.amazonaws.com/sparkling-water/spark-3.3/3.38.0.2-1-3.3/R") 
    
    library("rsparkling")
    library("h2o")
    

    Installed successfully

    * installing *source* package ‘h2o’ ...
    ** using staged installation
    ** R
    ** demo
    ** inst
    ** byte-compile and prepare package for lazy loading
    ** help
    *** installing help indices
    ** building package indices
    ** testing if installed package can be loaded from temporary location
    ** testing if installed package can be loaded from final location
    ** testing if installed package keeps a record of temporary installation path
    * DONE (h2o)
    * installing *source* package ‘rsparkling’ ...
    ** using staged installation
    ** R
    ** inst
    ** byte-compile and prepare package for lazy loading
    ** help
    No man pages found in package  ‘rsparkling’ 
    *** installing help indices
    ** building package indices
    ** testing if installed package can be loaded from temporary location
    ** testing if installed package can be loaded from final location
    ** testing if installed package keeps a record of temporary installation path
    * DONE (rsparkling)
    Installing package into ‘/local_disk0/.ephemeral_nfs/envs/rEnv-fc8e4f75-68fe-4030-b830-1b5166d74880’
    (as ‘lib’ is unspecified)
    trying URL 'http://h2o-release.s3.amazonaws.com/h2o/rel-zygmund/2/R/src/contrib/h2o_3.38.0.2.tar.gz'
    Content type 'application/x-tar' length 177414951 bytes (169.2 MB)
    ==================================================
    downloaded 169.2 MB
    
    
    The downloaded source packages are in
    	‘/tmp/RtmpeQOg9g/downloaded_packages’
    Loading required package: rsparkling
    Warning in library(package, lib.loc = lib.loc, character.only = TRUE, logical.return = TRUE,  :
      there is no package called ‘rsparkling’
    Installing package into ‘/local_disk0/.ephemeral_nfs/envs/rEnv-fc8e4f75-68fe-4030-b830-1b5166d74880’
    (as ‘lib’ is unspecified)
    trying URL 'http://h2o-release.s3.amazonaws.com/sparkling-water/spark-3.3/3.38.0.2-1-3.3/R/src/contrib/rsparkling_3.38.0.2-1-3.3.tar.gz'
    Content type 'application/x-tar' length 161159310 bytes (153.7 MB)
    ==================================================
    downloaded 153.7 MB
    
    
    The downloaded source packages are in
    	‘/tmp/RtmpeQOg9g/downloaded_packages’
    
    ----------------------------------------------------------------------
    
    Your next step is to start H2O:
        > h2o.init()
    
    For H2O package documentation, ask for help:
        > ??h2o
    
    After starting H2O, you can use the Web UI at http://localhost:54321
    For more information visit https://docs.h2o.ai
    
    ----------------------------------------------------------------------
    
    
    Attaching package: ‘h2o’
    
    The following objects are masked from ‘package:stats’:
    
        cor, sd, var
    
    The following objects are masked from ‘package:base’:
    
        &&, %*%, %in%, ||, apply, as.factor, as.numeric, colnames,
        colnames<-, ifelse, is.character, is.factor, is.numeric, log,
        log10, log1p, log2, round, signif, trunc
    

    Load Libraries and Establish Connection

    #load libraries
    library(sparklyr)
    library(tidyverse)
    library(lubridate)
    
    # spark_home_set()
    config <- spark_config()
    # Memory
    config["sparklyr.shell.driver-memory"] <- "64g"
    # Cores
    config["sparklyr.connect.cores.local"] <- 8
    
    sc <- spark_connect(method = "databricks", config = config) #remotely spark_home = "c:/programdata/anaconda3/lib/site-packages/pyspark"
    options(warn=-1) #suppress warning messages
    h2o.init()
    

    Create H2OContext (fails here)

    h2oConf <- H2OConf()
    

    Error Log

    Error : java.lang.ClassNotFoundException: ai.h2o.sparkling.H2OConf
    	at java.net.URLClassLoader.findClass(URLClassLoader.java:387)
    	at java.lang.ClassLoader.loadClass(ClassLoader.java:419)
    	at com.databricks.backend.daemon.driver.ClassLoaders$LibraryClassLoader.loadClass(ClassLoaders.scala:151)
    	at java.lang.ClassLoader.loadClass(ClassLoader.java:352)
    	at java.lang.Class.forName0(Native Method)
    	at java.lang.Class.forName(Class.java:264)
    	at sparklyr.StreamHandler.handleMethodCall(stream.scala:111)
    	at sparklyr.StreamHandler.read(stream.scala:62)
    	at sparklyr.BackendHandler.$anonfun$channelRead0$1(handler.scala:60)
    	at scala.util.control.Breaks.breakable(Breaks.scala:42)
    	at sparklyr.BackendHandler.channelRead0(handler.scala:41)
    	at sparklyr.BackendHandler.channelRead0(handler.scala:14)
    	at io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:99)
    	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379)
    	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365)
    	at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357)
    	at io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:103)
    	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379)
    	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365)
    	at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357)
    	at io.netty.handler.codec.ByteToMessageDecoder.fireChannelRead(ByteToMessageDecoder.java:327)
    	at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:299)
    	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379)
    	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365)
    	at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357)
    	at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1410)
    	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379)
    	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365)
    	at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:919)
    	at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:166)
    	at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:722)
    	at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:658)
    	at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:584)
    	at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:496)
    	at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:986)
    	at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
    	at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
    	at java.lang.Thread.run(Thread.java:750)
    
    Error: java.lang.ClassNotFoundException: ai.h2o.sparkling.H2OConf
    Error: java.lang.ClassNotFoundException: ai.h2o.sparkling.H2OConf
    	at java.net.URLClassLoader.findClass(URLClassLoader.java:387)
    	at java.lang.ClassLoader.loadClass(ClassLoader.java:419)
    	at com.databricks.backend.daemon.driver.ClassLoaders$LibraryClassLoader.loadClass(ClassLoaders.scala:151)
    	at java.lang.ClassLoader.loadClass(ClassLoader.java:352)
    	at java.lang.Class.forName0(Native Method)
    	at java.lang.Class.forName(Class.java:264)
    	at sparklyr.StreamHandler.handleMethodCall(stream.scala:111)
    	at sparklyr.StreamHandler.read(stream.scala:62)
    	at sparklyr.BackendHandler.$anonfun$channelRead0$1(handler.scala:60)
    	at scala.util.control.Breaks.breakable(Breaks.scala:42)
    	at sparklyr.BackendHandler.channelRead0(handler.scala:41)
    	at sparklyr.BackendHandler.channelRead0(handler.scala:14)
    	at io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:99)
    	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379)
    	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365)
    	at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357)
    	at io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:103)
    	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379)
    	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365)
    	at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357)
    	at io.netty.handler.codec.ByteToMessageDecoder.fireChannelRead(ByteToMessageDecoder.java:327)
    	at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:299)
    	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379)
    	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365)
    	at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357)
    	at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1410)
    	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379)
    	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365)
    	at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:919)
    	at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:166)
    	at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:722)
    	at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:658)
    	at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:584)
    	at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:496)
    	at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:986)
    	at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
    	at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
    	at java.lang.Thread.run(Thread.java:750)
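
    Note (an assumption based on the error, not from the original report): java.lang.ClassNotFoundException: ai.h2o.sparkling.H2OConf on the Databricks driver usually means the Sparkling Water assembly (for example the ai.h2o:sparkling-water-package_2.12 Maven coordinate matching the Spark version) is not installed on the cluster; rsparkling is only the R binding and needs that JVM library attached. A quick diagnostic sketch that can be run from a Scala notebook cell on the same cluster:

    // If this throws ClassNotFoundException, the Sparkling Water library is not
    // attached to the cluster, which would match the rsparkling error above.
    try {
      val clazz = Class.forName("ai.h2o.sparkling.H2OConf")
      println(s"Found ${clazz.getName} in ${clazz.getProtectionDomain.getCodeSource}")
    } catch {
      case e: ClassNotFoundException =>
        println("ai.h2o.sparkling.H2OConf is not on the driver classpath: " + e.getMessage)
    }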
    

    Session Info

    R version 4.1.3 (2022-03-10)
    Platform: x86_64-pc-linux-gnu (64-bit)
    Running under: Ubuntu 20.04.5 LTS
    
    Matrix products: default
    BLAS:   /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.9.0
    LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.9.0
    
    locale:
     [1] LC_CTYPE=C.UTF-8       LC_NUMERIC=C           LC_TIME=C.UTF-8       
     [4] LC_COLLATE=C.UTF-8     LC_MONETARY=C.UTF-8    LC_MESSAGES=C.UTF-8   
     [7] LC_PAPER=C.UTF-8       LC_NAME=C              LC_ADDRESS=C          
    [10] LC_TELEPHONE=C         LC_MEASUREMENT=C.UTF-8 LC_IDENTIFICATION=C   
    
    attached base packages:
    [1] stats     graphics  grDevices utils     datasets  methods   base     
    
    other attached packages:
     [1] lubridate_1.8.0           forcats_0.5.1            
     [3] stringr_1.4.0             dplyr_1.0.9              
     [5] purrr_0.3.4               readr_2.1.2              
     [7] tidyr_1.2.0               tibble_3.1.7             
     [9] ggplot2_3.3.6             tidyverse_1.3.1          
    [11] sparklyr_1.7.5            h2o_3.38.0.2             
    [13] rsparkling_3.38.0.2-1-3.3
    
    loaded via a namespace (and not attached):
     [1] tidyselect_1.1.2  forge_0.2.0       haven_2.5.0       colorspace_2.0-3 
     [5] vctrs_0.4.1       generics_0.1.2    htmltools_0.5.2   yaml_2.3.5       
     [9] base64enc_0.1-3   utf8_1.2.2        SparkR_3.3.0      rlang_1.0.2      
    [13] pillar_1.7.0      withr_2.5.0       glue_1.6.2        DBI_1.1.2        
    [17] Rserve_1.8-10     dbplyr_2.1.1      modelr_0.1.8      readxl_1.4.0     
    [21] lifecycle_1.0.1   munsell_0.5.0     gtable_0.3.0      cellranger_1.1.0 
    [25] rvest_1.0.2       htmlwidgets_1.5.4 tzdb_0.3.0        fastmap_1.1.0    
    [29] curl_4.3.2        parallel_4.1.3    fansi_1.0.3       broom_0.8.0      
    [33] r2d3_0.2.6        backports_1.4.1   scales_1.2.0      jsonlite_1.8.0   
    [37] config_0.3.1      fs_1.5.2          hms_1.1.1         digest_0.6.29    
    [41] stringi_1.7.6     rprojroot_2.0.3   grid_4.1.3        cli_3.3.0        
    [45] tools_4.1.3       bitops_1.0-7      magrittr_2.0.3    RCurl_1.98-1.9   
    [49] crayon_1.5.1      pkgconfig_2.0.3   ellipsis_0.3.2    xml2_1.3.3       
    [53] reprex_2.0.1      assertthat_0.2.1  httr_1.4.3        rstudioapi_0.13  
    [57] R6_2.5.1          compiler_4.1.3   
    
    opened by tsengj 1
  • Error when training XGBoost or conduct target encoding on data with high cardinality features on sparkling water

    Error when training XGBoost or conduct target encoding on data with high cardinality features on sparkling water

    I initialized the H2O Sparkling Water cluster on Google Cloud Dataproc and did some XGBoost training on data with both categorical and numerical columns. It was fine at first, but when I train the model on a large dataset with some categorical features having more than 3,000,000 unique values, it triggers the error java.lang.OutOfMemoryError: Requested array size exceeds VM limit and the entire H2O cluster crashes. The algorithm runs without any problem when those high-cardinality features are deleted. Since it's company-internal data, I'm not able to share the exact data. But the code is actually quite simple:

    from pyspark.ml import Pipeline
    from pysparkling.ml import H2OXGBoost

    estimator = H2OXGBoost(
        labelCol="label",
        ntrees=500,
        maxDepth=10,
        learnRate=0.1,
        categoricalEncoding='SortByResponse',
        convertUnknownCategoricalLevelsToNa=True,
        seed=100
    )
    pipeline = Pipeline(stages=[estimator])
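
    A possible workaround (a sketch, not from the original report): before handing the DataFrame to H2O, the offending columns can be located on the Spark side by counting distinct values per string column and then dropped (or bucketed/hashed). The helper name and the 100,000 threshold below are arbitrary choices for illustration; the same approach can be done from PySpark.

    import org.apache.spark.sql.DataFrame
    import org.apache.spark.sql.functions._

    // Drop string columns whose approximate distinct count exceeds the threshold,
    // so multi-million-level categoricals never reach the H2O conversion.
    def dropHighCardinalityColumns(df: DataFrame, maxCardinality: Long = 100000L): DataFrame = {
      val stringCols = df.schema.fields.collect { case f if f.dataType.typeName == "string" => f.name }
      val tooWide = stringCols.filter { c =>
        df.agg(approx_count_distinct(col(c)).alias("n")).first().getLong(0) > maxCardinality
      }
      df.drop(tooWide: _*)
    }

    // Usage (hypothetical DataFrame name): val slimmed = dropHighCardinalityColumns(trainDf)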
    

    Any idea on how I should get around this error?

    • Sparkling Water/PySparkling/RSparkling version: H2O cluster version 3.36.1.3
    • Hadoop Version & Distribution: 3.2.3
    • Execution mode: YARN cluster (not quite sure about this, but I guess it should be YARN cluster)
    • H2O & Spark logs:
    Py4JJavaError: An error occurred while calling o116.fit.
    : ai.h2o.sparkling.backend.exceptions.RestApiCommunicationException: H2O node http://10.192.80.233:54321/ responded with
    Status code: 500 : java.lang.OutOfMemoryError: Requested array size exceeds VM limit? at java.lang.StringCoding.encode(StringCoding.java:350)? at java.lang.String.getBytes(String.java:941)? at water.util.StringUtils.bytesOf(StringUtils.java:197)? at water.api.NanoResponse.<init>(NanoResponse.java:40)? at water.api.RequestServer.serveSchema(RequestServer.java:781)? at water.api.RequestServer.serve(RequestServer.java:474)? at water.api.RequestServer.doGeneric(RequestServer.java:303)? at water.api.RequestServer.doGet(RequestServer.java:225)? at javax.servlet.http.HttpServlet.service(HttpServlet.java:503)? at javax.servlet.http.HttpServlet.service(HttpServlet.java:590)? at ai.h2o.org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:865)? at ai.h2o.org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:535)? at ai.h2o.org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:255)? at ai.h2o.org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1317)? at ai.h2o.org
    Server error: <html>
    <head>
    <meta http-equiv="Content-Type" content="text/html;charset=utf-8"/>
    <title>Error 500 java.lang.OutOfMemoryError: Requested array size exceeds VM limit
    	at java.lang.StringCoding.encode(StringCoding.java:350)
    	at java.lang.String.getBytes(String.java:941)
    	at water.util.StringUtils.bytesOf(StringUtils.java:197)
    	at water.api.NanoResponse.&lt;init&gt;(NanoResponse.java:40)
    	at water.api.RequestServer.serveSchema(RequestServer.java:781)
    	at water.api.RequestServer.serve(RequestServer.java:474)
    	at water.api.RequestServer.doGeneric(RequestServer.java:303)
    	at water.api.RequestServer.doGet(RequestServer.java:225)
    	at javax.servlet.http.HttpServlet.service(HttpServlet.java:503)
    	at javax.servlet.http.HttpServlet.service(HttpServlet.java:590)
    	at ai.h2o.org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:865)
    	at ai.h2o.org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:535)
    	at ai.h2o.org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:255)
    	at ai.h2o.org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1317)
    	at ai.h2o.org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:203)
    	at ai.h2o.org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:473)
    	at ai.h2o.org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:201)
    	at ai.h2o.org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1219)
    	at ai.h2o.org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:144)
    	at ai.h2o.org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:126)
    	at ai.h2o.org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132)
    	at water.webserver.jetty9.Jetty9ServerAdapter$LoginHandler.handle(Jetty9ServerAdapter.java:130)
    	at ai.h2o.org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:126)
    	at ai.h2o.org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132)
    	at ai.h2o.org.eclipse.jetty.server.Server.handle(Server.java:531)
    	at ai.h2o.org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:352)
    	at ai.h2o.org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:260)
    	at ai.h2o.org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:281)
    	at ai.h2o.org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:102)
    	at ai.h2o.org.eclipse.jetty.io.ChannelEndPoint$2.run(ChannelEndPoint.java:118)
    	at ai.h2o.org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.runTask(EatWhatYouKill.java:333)
    	at ai.h2o.org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.doProduce(EatWhatYouKill.java:310)
    </title>
    </head>
    <body><h2>HTTP ERROR 500</h2>
    <p>Problem accessing /3/Frames/frame_rdd_18787349126/summary. Reason:
    <pre>    java.lang.OutOfMemoryError: Requested array size exceeds VM limit
    	at java.lang.StringCoding.encode(StringCoding.java:350)
    	at java.lang.String.getBytes(String.java:941)
    	at water.util.StringUtils.bytesOf(StringUtils.java:197)
    	at water.api.NanoResponse.&lt;init&gt;(NanoResponse.java:40)
    	at water.api.RequestServer.serveSchema(RequestServer.java:781)
    	at water.api.RequestServer.serve(RequestServer.java:474)
    	at water.api.RequestServer.doGeneric(RequestServer.java:303)
    	at water.api.RequestServer.doGet(RequestServer.java:225)
    	at javax.servlet.http.HttpServlet.service(HttpServlet.java:503)
    	at javax.servlet.http.HttpServlet.service(HttpServlet.java:590)
    	at ai.h2o.org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:865)
    	at ai.h2o.org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:535)
    	at ai.h2o.org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:255)
    	at ai.h2o.org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1317)
    	at ai.h2o.org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:203)
    	at ai.h2o.org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:473)
    	at ai.h2o.org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:201)
    	at ai.h2o.org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1219)
    	at ai.h2o.org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:144)
    	at ai.h2o.org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:126)
    	at ai.h2o.org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132)
    	at water.webserver.jetty9.Jetty9ServerAdapter$LoginHandler.handle(Jetty9ServerAdapter.java:130)
    	at ai.h2o.org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:126)
    	at ai.h2o.org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132)
    	at ai.h2o.org.eclipse.jetty.server.Server.handle(Server.java:531)
    	at ai.h2o.org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:352)
    	at ai.h2o.org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:260)
    	at ai.h2o.org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:281)
    	at ai.h2o.org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:102)
    	at ai.h2o.org.eclipse.jetty.io.ChannelEndPoint$2.run(ChannelEndPoint.java:118)
    	at ai.h2o.org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.runTask(EatWhatYouKill.java:333)
    	at ai.h2o.org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.doProduce(EatWhatYouKill.java:310)
    </pre></p>
    </body>
    </html>
    
    	at ai.h2o.sparkling.backend.utils.RestCommunication.checkResponseCode(RestCommunication.scala:414)
    	at ai.h2o.sparkling.backend.utils.RestCommunication.checkResponseCode$(RestCommunication.scala:394)
    	at ai.h2o.sparkling.H2OFrame$.checkResponseCode(H2OFrame.scala:287)
    	at ai.h2o.sparkling.backend.utils.RestCommunication.readURLContent(RestCommunication.scala:386)
    	at ai.h2o.sparkling.backend.utils.RestCommunication.readURLContent$(RestCommunication.scala:370)
    	at ai.h2o.sparkling.H2OFrame$.readURLContent(H2OFrame.scala:287)
    	at ai.h2o.sparkling.backend.utils.RestCommunication.request(RestCommunication.scala:182)
    	at ai.h2o.sparkling.backend.utils.RestCommunication.request$(RestCommunication.scala:172)
    	at ai.h2o.sparkling.H2OFrame$.request(H2OFrame.scala:287)
    	at ai.h2o.sparkling.backend.utils.RestCommunication.query(RestCommunication.scala:67)
    	at ai.h2o.sparkling.backend.utils.RestCommunication.query$(RestCommunication.scala:59)
    	at ai.h2o.sparkling.H2OFrame$.query(H2OFrame.scala:287)
    	at ai.h2o.sparkling.H2OFrame$.getFrame(H2OFrame.scala:342)
    	at ai.h2o.sparkling.H2OFrame$.apply(H2OFrame.scala:291)
    	at ai.h2o.sparkling.backend.Writer$.convert(Writer.scala:109)
    	at ai.h2o.sparkling.backend.converters.SparkDataFrameConverter$.toH2OFrame(SparkDataFrameConverter.scala:77)
    	at ai.h2o.sparkling.H2OContext.$anonfun$asH2OFrame$2(H2OContext.scala:176)
    	at ai.h2o.sparkling.backend.utils.H2OContextExtensions.withConversionDebugPrints(H2OContextExtensions.scala:86)
    	at ai.h2o.sparkling.backend.utils.H2OContextExtensions.withConversionDebugPrints$(H2OContextExtensions.scala:74)
    	at ai.h2o.sparkling.H2OContext.withConversionDebugPrints(H2OContext.scala:65)
    	at ai.h2o.sparkling.H2OContext.asH2OFrame(H2OContext.scala:176)
    	at ai.h2o.sparkling.H2OContext.asH2OFrame(H2OContext.scala:162)
    	at ai.h2o.sparkling.ml.features.H2OTargetEncoder.fit(H2OTargetEncoder.scala:55)
    	at ai.h2o.sparkling.ml.features.H2OTargetEncoder.fit(H2OTargetEncoder.scala:34)
    	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    	at java.lang.reflect.Method.invoke(Method.java:498)
    	at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
    	at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
    	at py4j.Gateway.invoke(Gateway.java:282)
    	at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
    	at py4j.commands.CallCommand.execute(CallCommand.java:79)
    	at py4j.GatewayConnection.run(GatewayConnection.java:238)
    	at java.lang.Thread.run(Thread.java:750)
    
    
    22/08/15 21:07:36 WARN org.apache.spark.h2o.backends.internal.InternalH2OBackend: New spark executor joined the cloud, however it won't be used for the H2O computations.
    Exception in thread "Thread-57" ai.h2o.sparkling.backend.exceptions.H2OClusterNotReachableException: H2O cluster 10.192.80.233:54321 - sparkling-water-root_application_1660588132711_0002 is not reachable,
    H2OContext has been closed! Please create a new H2OContext to a healthy and reachable (web enabled)
    H2O cluster.
    	at ai.h2o.sparkling.H2OContext$$anon$2.run(H2OContext.scala:382)
    Caused by: ai.h2o.sparkling.backend.exceptions.RestApiNotReachableException: H2O node http://10.192.80.233:54321/ is not reachable.
    Please verify that you are passing ip and port of existing cluster node and the cluster
    is running with web enabled.
    	at ai.h2o.sparkling.backend.utils.RestCommunication.throwRestApiNotReachableException(RestCommunication.scala:433)
    	at ai.h2o.sparkling.backend.utils.RestCommunication.readURLContent(RestCommunication.scala:390)
    	at ai.h2o.sparkling.backend.utils.RestCommunication.readURLContent$(RestCommunication.scala:370)
    	at ai.h2o.sparkling.backend.utils.RestApiUtils$.readURLContent(RestApiUtils.scala:96)
    	at ai.h2o.sparkling.backend.utils.RestCommunication.request(RestCommunication.scala:182)
    	at ai.h2o.sparkling.backend.utils.RestCommunication.request$(RestCommunication.scala:172)
    	at ai.h2o.sparkling.backend.utils.RestApiUtils$.request(RestApiUtils.scala:96)
    	at ai.h2o.sparkling.backend.utils.RestCommunication.query(RestCommunication.scala:67)
    	at ai.h2o.sparkling.backend.utils.RestCommunication.query$(RestCommunication.scala:59)
    	at ai.h2o.sparkling.backend.utils.RestApiUtils$.query(RestApiUtils.scala:96)
    	at ai.h2o.sparkling.backend.utils.RestApiUtils.getPingInfo(RestApiUtils.scala:32)
    	at ai.h2o.sparkling.backend.utils.RestApiUtils.getPingInfo$(RestApiUtils.scala:30)
    	at ai.h2o.sparkling.backend.utils.RestApiUtils$.getPingInfo(RestApiUtils.scala:96)
    	at ai.h2o.sparkling.H2OContext.ai$h2o$sparkling$H2OContext$$getSparklingWaterHeartbeatEvent(H2OContext.scala:344)
    	at ai.h2o.sparkling.H2OContext$$anon$2.run(H2OContext.scala:356)
    Caused by: java.net.ConnectException: Connection refused (Connection refused)
    	at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    	at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
    	at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    	at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
    	at sun.net.www.protocol.http.HttpURLConnection$10.run(HttpURLConnection.java:1952)
    	at sun.net.www.protocol.http.HttpURLConnection$10.run(HttpURLConnection.java:1947)
    	at java.security.AccessController.doPrivileged(Native Method)
    	at sun.net.www.protocol.http.HttpURLConnection.getChainedException(HttpURLConnection.java:1946)
    	at sun.net.www.protocol.http.HttpURLConnection.getInputStream0(HttpURLConnection.java:1516)
    	at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1500)
    	at java.net.HttpURLConnection.getResponseCode(HttpURLConnection.java:480)
    	at ai.h2o.sparkling.backend.utils.RestCommunication.$anonfun$checkResponseCode$1(RestCommunication.scala:398)
    	at scala.runtime.java8.JFunction0$mcI$sp.apply(JFunction0$mcI$sp.java:23)
    	at scala.util.Try$.apply(Try.scala:213)
    	at ai.h2o.sparkling.backend.utils.RestCommunication.retry(RestCommunication.scala:439)
    	at ai.h2o.sparkling.backend.utils.RestCommunication.checkResponseCode(RestCommunication.scala:398)
    	at ai.h2o.sparkling.backend.utils.RestCommunication.checkResponseCode$(RestCommunication.scala:394)
    	at ai.h2o.sparkling.backend.utils.RestApiUtils$.checkResponseCode(RestApiUtils.scala:96)
    	at ai.h2o.sparkling.backend.utils.RestCommunication.readURLContent(RestCommunication.scala:386)
    	... 13 more
    Caused by: java.net.ConnectException: Connection refused (Connection refused)
    	at java.net.PlainSocketImpl.socketConnect(Native Method)
    	at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:350)
    	at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:206)
    	at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188)
    	at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
    	at java.net.Socket.connect(Socket.java:607)
    	at java.net.Socket.connect(Socket.java:556)
    	at sun.net.NetworkClient.doConnect(NetworkClient.java:180)
    	at sun.net.www.http.HttpClient.openServer(HttpClient.java:463)
    	at sun.net.www.http.HttpClient.openServer(HttpClient.java:558)
    	at sun.net.www.http.HttpClient.<init>(HttpClient.java:242)
    	at sun.net.www.http.HttpClient.New(HttpClient.java:339)
    	at sun.net.www.http.HttpClient.New(HttpClient.java:357)
    	at sun.net.www.protocol.http.HttpURLConnection.getNewHttpClient(HttpURLConnection.java:1228)
    	at sun.net.www.protocol.http.HttpURLConnection.plainConnect0(HttpURLConnection.java:1162)
    	at sun.net.www.protocol.http.HttpURLConnection.plainConnect(HttpURLConnection.java:1056)
    	at sun.net.www.protocol.http.HttpURLConnection.connect(HttpURLConnection.java:990)
    	at sun.net.www.protocol.http.HttpURLConnection.getInputStream0(HttpURLConnection.java:1572)
    	... 23 more
    

    • Operating system: I think the cluster should be running on Linux.
    • Spark & Sparkling Water configuration (including the memory configuration): see the image attached to the original issue.

    opened by cliu-sift 13
  • Ref count mismatch for vec ERROR while training GLM models

    Ref count mismatch for vec ERROR while training GLM models

    We are using a Databricks notebook where we train around 6k GLM models in one step and then another 6k GLM models in a second step. The training data contains around 250 variables. We use the models to make predictions in the same notebook, and the models are saved to DBFS, which is mapped to Azure Data Lake.

    We are facing the following error in the training step:

    (See the error screenshot attached to the original issue.) This error appears at random, and the training usually succeeds after a restart of the cluster.

    • Sparkling Water/PySparkling/RSparkling version: ai.h2o:sparkling-water-package_2.12:3.32.0.4-1-3.0
    • Spark version: 3.0.1
    • Scala version: 2.12
    • Execution mode: Databricks Spark Cluster

    We don't have reproducible code because the error is appearing randomly.

    Most of the time it appears in the second step of training the models. We tried calling h2o.removeAll() between the trainings, but the error still appears sometimes, or a new type of error appears (see the second screenshot attached to the original issue).

    The error appears more frequently when running from ADF pipelines, but it has also been reproduced when running directly from Databricks. It appears with different frequencies on the 4 environments we are testing on, anywhere from every 10th run to every 3rd run, with no clear pattern. We tried changing the cluster configuration to use 4 or 8 nodes, with no impact.

    Output Log.txt

    Any kind of support is welcome; if more information is needed, please ask and we'll try to provide it.
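
    To make the workflow in this report concrete, the pattern being described is roughly the following. This is only a sketch: the segment column, label column, and DBFS paths are hypothetical, h2o.remove_all() is the Python spelling of the removeAll() call mentioned above, and it assumes the fitted MOJO models can be persisted through the standard Spark ML writer.

        # A rough sketch of the reported workflow, not the reporter's actual code.
        # Column names and DBFS paths below are hypothetical placeholders.
        import h2o
        from pyspark.sql import SparkSession, functions as F
        from pysparkling import H2OContext
        from pysparkling.ml import H2OGLM

        spark = SparkSession.builder.getOrCreate()
        hc = H2OContext.getOrCreate()

        # Hypothetical input: one training table with a segment id column; each
        # segment gets its own GLM, mirroring the "~6k models per step" setup.
        training_df = spark.read.parquet("dbfs:/data/training")  # placeholder path

        def train_batch(df, segments, label_col, output_dir):
            """Train one GLM per segment and save each fitted MOJO model."""
            for segment in segments:
                segment_df = df.where(F.col("segment_id") == segment)
                model = H2OGLM(labelCol=label_col).fit(segment_df)
                model.write().overwrite().save(f"{output_dir}/glm_{segment}")

        segments = [row["segment_id"]
                    for row in training_df.select("segment_id").distinct().collect()]

        # First training step ...
        train_batch(training_df, segments, "target", "dbfs:/models/step1")

        # ... then drop all intermediate H2O keys before the second step, which is
        # the mitigation described above.
        h2o.remove_all()

        # Second training step.
        train_batch(training_df, segments, "target", "dbfs:/models/step2")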

    opened by denisabarar 22