:elephant: Elasticsearch real-time search and analytics natively integrated with Hadoop

Overview

Elasticsearch Hadoop

Elasticsearch real-time search and analytics natively integrated with Hadoop. Supports Map/Reduce, Apache Hive, Apache Pig, Apache Spark and Apache Storm.

See project page and documentation for detailed information.

Requirements

Elasticsearch (1.x or higher; 2.x highly recommended) cluster accessible through REST. That's it! Significant effort has been invested to create a small, dependency-free, self-contained jar that can be downloaded and put to use without any extra setup. Simply make it available to your job classpath and you're set. For library-specific details, see the dedicated chapter.

ES-Hadoop 6.x and higher are compatible with Elasticsearch 1.X, 2.X, 5.X, and 6.X

ES-Hadoop 5.x and higher are compatible with Elasticsearch 1.X, 2.X and 5.X

ES-Hadoop 2.2.x and higher are compatible with Elasticsearch 1.X and 2.X

ES-Hadoop 2.0.x and 2.1.x are compatible with Elasticsearch 1.X only

Installation

Stable Release (currently 7.12.0)

Available through any Maven-compatible tool:

<dependency>
  <groupId>org.elasticsearch</groupId>
  <artifactId>elasticsearch-hadoop</artifactId>
  <version>7.12.0</version>
</dependency>

or as a stand-alone ZIP.

Development Snapshot

Grab the latest nightly build from the snapshot repository, again through Maven:

<dependency>
  <groupId>org.elasticsearch</groupId>
  <artifactId>elasticsearch-hadoop</artifactId>
  <version>8.0.0-SNAPSHOT</version>
</dependency>
<repositories>
  <repository>
    <id>sonatype-oss</id>
    <url>http://oss.sonatype.org/content/repositories/snapshots</url>
    <snapshots><enabled>true</enabled></snapshots>
  </repository>
</repositories>

or build the project yourself.

We do build and test the code on each commit.

Supported Hadoop Versions

Running against Hadoop 1.x is deprecated in 5.5 and will no longer be tested against in 6.0. ES-Hadoop is developed for and tested against Hadoop 2.x and YARN. More information in this section.

Feedback / Q&A

We're interested in your feedback! You can find us on the User mailing list - please append [Hadoop] to the post subject to filter it out. For more details, see the community page.

Online Documentation

The latest reference documentation is available online on the project home page. The README below provides basic usage instructions at a glance.

Usage

Configuration Properties

All configuration properties start with the es prefix. Note that the es.internal namespace is reserved for the library's internal use and should not be used by the user at any point. The properties are read mainly from the Hadoop configuration, but the user can specify (some of) them directly, depending on the library used.

Required

es.resource=<ES resource location, relative to the host/port specified above>

Essential

es.query=<uri or query dsl query>              # defaults to {"query":{"match_all":{}}}
es.nodes=<ES host address>                     # defaults to localhost
es.port=<ES REST port>                         # defaults to 9200

The full list is available here
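
As a quick illustration (a minimal sketch; the host value is illustrative), the essential properties can be set on the Hadoop Configuration, as the Map/Reduce examples below also do; libraries such as Hive and Pig accept the same es.* keys directly (through TBLPROPERTIES or the EsStorage constructor, as shown in their sections):

import org.apache.hadoop.conf.Configuration;

Configuration conf = new Configuration();
conf.set("es.nodes", "es-host");           // defaults to localhost
conf.set("es.port", "9200");               // defaults to 9200
conf.set("es.resource", "radio/artists");  // index (and type) to read from / write to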

Map/Reduce

For basic, low-level or performance-sensitive environments, ES-Hadoop provides dedicated InputFormat and OutputFormat implementations that read and write data to Elasticsearch. To use them, add the es-hadoop jar to your job classpath (either by bundling the library along - it's ~300kB and has no dependencies - by using the DistributedCache, or by provisioning the cluster manually). See the documentation for more information.
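
For instance, a minimal sketch of the DistributedCache route using the new Map/Reduce API (the HDFS path is hypothetical and assumes the jar has already been uploaded there):

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;

Job job = Job.getInstance();
// ship the es-hadoop jar with the job instead of bundling it into the job jar
job.addFileToClassPath(new Path("/libs/elasticsearch-hadoop-<version>.jar"));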

Note that es-hadoop supports both the so-called 'old' and the 'new' API through its EsInputFormat and EsOutputFormat classes.

'Old' (org.apache.hadoop.mapred) API

Reading

To read data from ES, configure the EsInputFormat on your job configuration along with the relevant properties:

JobConf conf = new JobConf();
conf.setInputFormat(EsInputFormat.class);
conf.set("es.resource", "radio/artists");
conf.set("es.query", "?q=me*");             // replace this with the relevant query
...
JobClient.runJob(conf);

Writing

The same configuration template can be used for writing, but using EsOutputFormat:

JobConf conf = new JobConf();
conf.setOutputFormat(EsOutputFormat.class);
conf.set("es.resource", "radio/artists"); // index or indices used for storing data
...
JobClient.runJob(conf);

'New' (org.apache.hadoop.mapreduce) API

Reading

Configuration conf = new Configuration();
conf.set("es.resource", "radio/artists");
conf.set("es.query", "?q=me*");             // replace this with the relevant query
Job job = new Job(conf);
job.setInputFormatClass(EsInputFormat.class);
...
job.waitForCompletion(true);

Writing

Configuration conf = new Configuration();
conf.set("es.resource", "radio/artists"); // index or indices used for storing data
Job job = new Job(conf);
job.setOutputFormatClass(EsOutputFormat.class);
...
job.waitForCompletion(true);

Apache Hive

ES-Hadoop provides a Hive storage handler for Elasticsearch, meaning one can define an external table on top of ES.

Add es-hadoop-<version>.jar to hive.aux.jars.path or register it manually in your Hive script (recommended):

ADD JAR /path_to_jar/es-hadoop-<version>.jar;

Reading

To read data from ES, define a table backed by the desired index:

CREATE EXTERNAL TABLE artists (
    id      BIGINT,
    name    STRING,
    links   STRUCT<url:STRING, picture:STRING>)
STORED BY 'org.elasticsearch.hadoop.hive.EsStorageHandler'
TBLPROPERTIES('es.resource' = 'radio/artists', 'es.query' = '?q=me*');

The fields defined in the table are mapped to the JSON document when communicating with Elasticsearch. Notice the use of TBLPROPERTIES to define the location (the target index) and the query used for reading from this table.

Once defined, the table can be used just like any other:

SELECT * FROM artists;

Writing

To write data, a similar definition is used but with a different es.resource:

CREATE EXTERNAL TABLE artists (
    id      BIGINT,
    name    STRING,
    links   STRUCT<url:STRING, picture:STRING>)
STORED BY 'org.elasticsearch.hadoop.hive.EsStorageHandler'
TBLPROPERTIES('es.resource' = 'radio/artists');

Any data passed to the table is then passed down to Elasticsearch; for example, considering a table source (aliased as s below), mapped to a TSV/CSV file, one can index it to Elasticsearch like this:

INSERT OVERWRITE TABLE artists
    SELECT NULL, s.name, named_struct('url', s.url, 'picture', s.picture) FROM source s;

As one can note, currently the reading and writing are treated separately but we're working on unifying the two and automatically translating HiveQL to Elasticsearch queries.

Apache Pig

ES-Hadoop provides both read and write functions for Pig so you can access Elasticsearch from Pig scripts.

Register ES-Hadoop jar into your script or add it to your Pig classpath:

REGISTER /path_to_jar/es-hadoop-<version>.jar;

Additionally one can define an alias to save some chars:

%define ESSTORAGE org.elasticsearch.hadoop.pig.EsStorage()

and use $ESSTORAGE for storage definition.

Reading

To read data from ES, use EsStorage and specify the query through the LOAD function:

A = LOAD 'radio/artists' USING org.elasticsearch.hadoop.pig.EsStorage('es.query=?q=me*');
DUMP A;

Writing

Use the same Storage to write data to Elasticsearch:

A = LOAD 'src/artists.dat' USING PigStorage() AS (id:long, name, url:chararray, picture: chararray);
B = FOREACH A GENERATE name, TOTUPLE(url, picture) AS links;
STORE B INTO 'radio/artists' USING org.elasticsearch.hadoop.pig.EsStorage();

Apache Spark

ES-Hadoop provides native (Java and Scala) integration with Spark: a dedicated RDD for reading and, for writing, methods that work on any RDD. Spark SQL is also supported.

Scala

Reading

To read data from ES, create a dedicated RDD and specify the query as an argument:

import org.elasticsearch.spark._

..
val conf = ...
val sc = new SparkContext(conf)
sc.esRDD("radio/artists", "?q=me*")

Spark SQL

import org.elasticsearch.spark.sql._

// DataFrame schema automatically inferred
val df = sqlContext.read.format("es").load("buckethead/albums")

// operations get pushed down and translated at runtime to Elasticsearch QueryDSL
val playlist = df.filter(df("category").equalTo("pikes").and(df("year").geq(2016)))

Writing

Import the org.elasticsearch.spark._ package to gain saveToEs methods on your RDDs:

import org.elasticsearch.spark._

val conf = ...
val sc = new SparkContext(conf)

val numbers = Map("one" -> 1, "two" -> 2, "three" -> 3)
val airports = Map("OTP" -> "Otopeni", "SFO" -> "San Fran")

sc.makeRDD(Seq(numbers, airports)).saveToEs("spark/docs")

Spark SQL

import org.elasticsearch.spark.sql._

val df = sqlContext.read.json("examples/people.json")
df.saveToEs("spark/people")

Java

In a Java environment, use the org.elasticsearch.spark.rdd.api.java package, in particular the JavaEsSpark class.

Reading

To read data from ES, create a dedicated RDD and specify the query as an argument.

import org.apache.spark.api.java.JavaSparkContext;
import org.elasticsearch.spark.rdd.api.java.JavaEsSpark;

SparkConf conf = ...
JavaSparkContext jsc = new JavaSparkContext(conf);

JavaPairRDD<String, Map<String, Object>> esRDD = JavaEsSpark.esRDD(jsc, "radio/artists");

Spark SQL

SQLContext sql = new SQLContext(sc);
DataFrame df = sql.read().format("es").load("buckethead/albums");
DataFrame playlist = df.filter(df.col("category").equalTo("pikes").and(df.col("year").geq(2016)));

Writing

Use JavaEsSpark to index any RDD to Elasticsearch:

import org.elasticsearch.spark.rdd.api.java.JavaEsSpark;

SparkConf conf = ...
JavaSparkContext jsc = new JavaSparkContext(conf);

Map<String, ?> numbers = ImmutableMap.of("one", 1, "two", 2);
Map<String, ?> airports = ImmutableMap.of("OTP", "Otopeni", "SFO", "San Fran");

JavaRDD<Map<String, ?>> javaRDD = jsc.parallelize(ImmutableList.of(numbers, airports));
JavaEsSpark.saveToEs(javaRDD, "spark/docs");

Spark SQL

import org.elasticsearch.spark.sql.api.java.JavaEsSparkSQL;

DataFrame df = sqlContext.read().json("examples/people.json");
JavaEsSparkSQL.saveToEs(df, "spark/docs");

Apache Storm

ES-Hadoop provides native integration with Storm: a dedicated Spout for reading and a specialized Bolt for writing.

Reading

To read data from ES, use EsSpout:

import org.elasticsearch.storm.EsSpout;

TopologyBuilder builder = new TopologyBuilder();
builder.setSpout("es-spout", new EsSpout("storm/docs", "?q=me*"), 5);
builder.setBolt("bolt", new PrinterBolt()).shuffleGrouping("es-spout");

Writing

To index data to ES, use EsBolt:

import org.elasticsearch.storm.EsBolt;

TopologyBuilder builder = new TopologyBuilder();
builder.setSpout("spout", new RandomSentenceSpout(), 10);
builder.setBolt("es-bolt", new EsBolt("storm/docs"), 5).shuffleGrouping("spout");
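
To actually run either topology, submit it as usual. A minimal sketch (assuming Storm's org.apache.storm package layout and an illustrative topology name), with es.* settings passed through the topology configuration:

import org.apache.storm.Config;
import org.apache.storm.StormSubmitter;

Config conf = new Config();
conf.put("es.nodes", "localhost");   // illustrative; see the configuration properties above
StormSubmitter.submitTopology("es-topology", conf, builder.createTopology());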

Building the source

Elasticsearch Hadoop uses Gradle for its build system and does not require it to be installed on your machine: the bundled wrapper (gradlew) automatically builds the package and runs the unit tests. For integration testing, use the integrationTests task. See gradlew tasks for more information.

To create a distributable zip, run gradlew distZip from the command line; once completed you will find the jar in build/libs.

To build the project, JVM 8 (Oracle one is recommended) or higher is required.

License

This project is released under version 2.0 of the Apache License.

Licensed to Elasticsearch under one or more contributor
license agreements. See the NOTICE file distributed with
this work for additional information regarding copyright
ownership. Elasticsearch licenses this file to you under
the Apache License, Version 2.0 (the "License"); you may
not use this file except in compliance with the License.
You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing,
software distributed under the License is distributed on an
"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
KIND, either express or implied.  See the License for the
specific language governing permissions and limitations
under the License.
Comments
  • [Feature] Spark3.0 support

    Feature description

    Spark 3 is currently in RC. Will there be support for Spark 3 in the next release version (v8) or will we have to wait for v9? More precisely, do you plan to start supporting Spark 3 only when the official Spark 3 release is made?

    • Spark 3 now supports only Scala 2.12
    enhancement :Spark v8.0.0-alpha1 v7.12.0 
    opened by jonathanyuechun 62
  • EsHadoopIllegalStateException reading Geo-Shape into DataFrame - SparkSQL

    1. Create an index type with a mapping consisting of a field of type geo_shape.
    2. Create an RDD[String] containing a polygon as GeoJSON, as the value of a field whose name matches the mapping: """{"rect":{"type":"Polygon","coordinates":[[[50,32],[69,32],[69,50],[50,50],[50,32]]],"crs":null}}"""
    3. Write to an index type in Elasticsearch: rdd1.saveJsonToEs(indexName+"/"+indexType, connectorConfig)
    4. Read into SparkSQL DataFrame with either esDF or read-format-load:
      • sqlContext.esDF(indexName+"/"+indexType, connectorConfig)
      • sqlContext.read.format("org.elasticsearch.spark.sql").options(connectorConfig).load(indexName+"/"+indexType)

    Result is: org.elasticsearch.hadoop.EsHadoopIllegalStateException: Field 'rect' not found; typically this occurs with arrays which are not mapped as single value. Full stack trace in gist. Elasticsearch Hadoop v2.1.2.

    bug :Spark v2.2.0 
    opened by randallwhitman 46
  • Document Count in ES Different from Number of Entries Pushed

    We are seeing an issue where the number of documents successfully created/indexed in ES is different (more--very odd, or less) from the number of documents which we are giving ES to index.

    Specifically, we are using spark to grab specific correctly formatted json from a repository and simply write it to ES into a new index. The difference in documents only seems to occur if there is a large number of documents being inserted (specifically we tested with 929,660) and too few shards (specifically 1). We also did a test at a larger scale (roughly 9 million documents and over 100 shards) which also gave incorrect document counts. We have ruled out this being a spark issue, but there are no apparent issues or exceptions in the logs (ES or Spark/ES-Hadoop), including when they are upped to debug. We do notice that ES does throttle the input. The fact that this only happens when large numbers of documents are created/indexed and that it seems to alleviate when more shards are added to balance the load points it to a load issue. I believe this to be an issue with ES core, but I was interested in seeing if anyone else here had seen this due to ES-hadoop being able to create a load issue more easily than the normal api. We have tried ES-hadoop 2.1.0-beta and 2.0.1 release with the same results.

    I can provide more details as needed, but as there is nothing in the logs, there are not many other observables that I can detail. It is very similar to: http://elasticsearch-users.115913.n3.nabble.com/missing-documents-after-bulk-indexing-td4056100.html, however I am not seeing any exceptions. Please let me know if anyone has seen this issue or has any resolution.

    bug :Rest v2.0.3 v2.1.0.Beta4 
    opened by mparker4 31
  • "Cannot find node with id" exception even when the node is alive and cluster is green.

    I am getting the following exception when pushing data from hadoop M/R job. When this happens, the node in question is responding and cluster is also healthy (green). Also, plenty of resources on the box. CPU usage is less than 30%, free memory is over 50G. With this exception, the hadoop map task is failing and getting restarted and eventually succeeding (may be by connecting to a different ES node). These errors are not consistent. They are very intermittent.

    org.elasticsearch.hadoop.EsHadoopIllegalArgumentException: Cannot find node with id [Q4pQkOIJSSi2oXRXGUVs8w]
        at org.elasticsearch.hadoop.util.Assert.notNull(Assert.java:40)
        at org.elasticsearch.hadoop.rest.RestRepository.getWriteTargetPrimaryShards(RestRepository.java:251)
        at org.elasticsearch.hadoop.mr.EsOutputFormat$EsRecordWriter.initSingleIndex(EsOutputFormat.java:218)
        at org.elasticsearch.hadoop.mr.EsOutputFormat$EsRecordWriter.init(EsOutputFormat.java:201)
        at org.elasticsearch.hadoop.mr.EsOutputFormat$EsRecordWriter.write(EsOutputFormat.java:159)
        at org.apache.hadoop.mapred.MapTask$NewDirectOutputCollector.write(MapTask.java:638)
        at org.apache.hadoop.mapreduce.TaskInputOutputContext.write(TaskInputOutputContext.java:80)
        at afi.search.hadoop.es.ESMapper1.map(ESMapper1.java:227)
        at afi.search.hadoop.es.ESMapper1.map(ESMapper1.java:1)
        at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145)
        at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:364)
        at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:415)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1190)
        at org.apache.hadoop.mapred.Child.main(Child.java:249)
    
    bug :Rest 
    opened by skkolli 29
  • Upgrade to Spark 2.4 and support Scala 2.12

    Feature description

    Spark 2.4.0 has recently been released, along with Scala 2.12 support. For those of us wanting to use Spark 2.4 features or even just enjoy Scala 2.12 binary compatibility, it would be great to update elasticsearch-spark-20 to use Spark 2.4 and support Scala 2.12 builds.

    I can offer a PR (currently based off the 6.5 branch) with changes that I've been using in production without issue (and which build against Scala 2.10, 2.11 and 2.12 successfully). If there is appetite for this, please confirm which branch to target.

    enhancement :Spark v8.0.0-alpha1 v7.12.0 
    opened by bpiper 28
  • EsSpark.esJsonRDD error

    Hi dude... I try to put a lot of documents and get them back. Result:

    15/03/02 23:49:41 ERROR Executor: Exception in task 0.0 in stage 0.0 (TID 0)
    java.lang.IllegalArgumentException: Invalid position given=154612 -1
        at org.elasticsearch.hadoop.serialization.ScrollReader.read(ScrollReader.java:234)
        at org.elasticsearch.hadoop.serialization.ScrollReader.read(ScrollReader.java:165)
        at org.elasticsearch.hadoop.rest.RestRepository.scroll(RestRepository.java:403)
        at org.elasticsearch.hadoop.rest.ScrollQuery.hasNext(ScrollQuery.java:76)
        at org.elasticsearch.spark.rdd.AbstractEsRDDIterator.hasNext(AbstractEsRDDIterator.scala:46)
        at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
        at scala.collection.Iterator$class.foreach(Iterator.scala:727)
        at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
        at org.apache.spark.rdd.RDD$$anonfun$foreach$1.apply(RDD.scala:782)
        at org.apache.spark.rdd.RDD$$anonfun$foreach$1.apply(RDD.scala:782)
        at org.apache.spark.SparkContext$$anonfun$runJob$4.apply(SparkContext.scala:1353)
        at org.apache.spark.SparkContext$$anonfun$runJob$4.apply(SparkContext.scala:1353)
        at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:61)
        at org.apache.spark.scheduler.Task.run(Task.scala:56)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:200)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:745)
    15/03/02 23:49:41 WARN TaskSetManager: Lost task 0.0 in stage 0.0 (TID 0, localhost): java.lang.IllegalArgumentException: Invalid position given=154612 -1
        at org.elasticsearch.hadoop.serialization.ScrollReader.read(ScrollReader.java:234)
        at org.elasticsearch.hadoop.serialization.ScrollReader.read(ScrollReader.java:165)
        at org.elasticsearch.hadoop.rest.RestRepository.scroll(RestRepository.java:403)
        at org.elasticsearch.hadoop.rest.ScrollQuery.hasNext(ScrollQuery.java:76)
        at org.elasticsearch.spark.rdd.AbstractEsRDDIterator.hasNext(AbstractEsRDDIterator.scala:46)
        at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
        at scala.collection.Iterator$class.foreach(Iterator.scala:727)
        at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
        at org.apache.spark.rdd.RDD$$anonfun$foreach$1.apply(RDD.scala:782)
        at org.apache.spark.rdd.RDD$$anonfun$foreach$1.apply(RDD.scala:782)
        at org.apache.spark.SparkContext$$anonfun$runJob$4.apply(SparkContext.scala:1353)
        at org.apache.spark.SparkContext$$anonfun$runJob$4.apply(SparkContext.scala:1353)
        at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:61)
        at org.apache.spark.scheduler.Task.run(Task.scala:56)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:200)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:745)
    

    jsonPointers: doc = [154447,159056], fragments = 0. If you want I can send you the input; the input contains the following data: https://gist.githubusercontent.com/b0c1/486a7bea069e26d54b43/raw/a9a23bdbbbaf1a9b86918d302c3a838839b3a566/gistfile1.txt

    bug :Rest v2.1.0.Beta4 
    opened by b0c1 27
  • Insert slowdowns on Master branch

    Hello

    I'm playing with the master branch to test the unique ID or parent-child features. Testing it out, I was surprised to see huge slowdowns in Hive insertion queries. In my test I have a table with 30 columns and I want to insert 4M entries. I run the same insert overwrite query but change the deployed jar: CREATE EXTERNAL TABLE es_test( values String..STORED BY 'org.elasticsearch.hadoop.hive.ESStorageHandler' TBLPROPERTIES('es.resource' = 'test/test','es.host' = 'myhost')

    When I insert with the current release I get a throughput of 12k/s, so 5 minutes for insertion, with one or two map tasks failing on:

    java.lang.IllegalStateException: Cannot get response body for [POST]test/test/_bulk] at org.elasticsearch.hadoop.rest.RestClient.execute(RestClient.java:189)

    When I insert with the master branch, the same data takes about 57 minutes for insertion and I get around 50-60% failed map tasks with a lot of:

    java.lang.IllegalStateException: Cannot get response body for [PUT][test] at org.elasticsearch.hadoop.rest.RestClient.execute(RestClient.java:221)

    I/O exception (java.net.ConnectException) caught when processing request: Connection timed out and some EsRejectedExecutionExceptions bulk thread pool 50.

    I'll keep investigating to find a more useful error or a more definite reason. Hope this post helps.

    bug :Hive :Rest 
    opened by nmaillard 27
  • Unable to index JSON from HDFS using SchemaRDD.saveToEs()

    Elasticsearch-hadoop: elasticsearch-hadoop-2.1.0.BUILD-20150224.023403-309.zip Spark: 1.2.0

    I'm attempting to take a very basic JSON document on HDFS and index it in ES using SchemaRDD.saveToEs(). According to the documentation under "writing existing JSON to elasticsearch" it should be as easy as creating a SchemaRDD via SQLContext.jsonFile() and then indexing it using .saveToEs(), but I'm getting an error.

    Replicating the problem:

    1. Create JSON file on hdfs with the content:
    {"key":"value"}
    
    2. Execute code in spark-shell
    import org.apache.spark.SparkContext._
    import org.elasticsearch.spark._
    
    val sqlContext = new org.apache.spark.sql.SQLContext(sc)
    import sqlContext._
    
    val input = sqlContext.jsonFile("hdfs://nameservice1/user/mshirley/test.json")
    input.saveToEs("mshirley_spark_test/test")
    

    Error:

     org.elasticsearch.hadoop.rest.EsHadoopInvalidRequest: Found unrecoverable error [Bad Request(400) - Invalid JSON fragment received[["value"]][MapperParsingException[failed to parse]; nested: ElasticsearchParseException[Failed to derive xcontent from (offset=13, length=9): [123, 34, 105, 110, 100, 101, 120, 34, 58, 123, 125, 125, 10, 91, 34, 118, 97, 108, 117, 101, 34, 93, 10]]; ]];
    

    input object:

    res1: org.apache.spark.sql.SchemaRDD = 
    SchemaRDD[6] at RDD at SchemaRDD.scala:108
    == Query Plan ==
    == Physical Plan ==
    PhysicalRDD [key#0], MappedRDD[5] at map at JsonRDD.scala:47
    

    input.printSchema():

    root
     |-- key: string (nullable = true)
    

    Expected result: I expect to be able to read a file from HDFS that contains a JSON document per line and submit that data to ES for indexing.

    bug invalid :Spark v2.1.0.Beta4 
    opened by mshirley 26
  • ES-Hadoop Pig

    • [ ] Bug report. If you’ve found a bug, please provide a code snippet or test to reproduce it below.
      The easier it is to track down the bug, the faster it is solved.

    Issue description

    I'm trying to load a JSON document into Elasticsearch and run it, but it pops out an error.

    Description

    Used the below command to load data:

    grunt> REGISTER hdfs://localhost:9000/lib/elasticsearch-hadoop-2.1.1.jar;
    
    grunt> JSON_DATA = LOAD '/ch07/crimes.json' USING PigStorage() AS (json:chararray);
    
    grunt> STORE JSON_DATA INTO 'esh_pig/crimes_json' USING org.elasticsearch.hadoop.pig.EsStorage('es.input.json=true');
    

    Steps to reproduce

    Below is the error:

    2017-07-25 12:21:17,674 [Thread-301] WARN  org.apache.hadoop.mapred.LocalJobRunner - job_local1500679296_0008
    java.lang.Exception: org.elasticsearch.hadoop.rest.EsHadoopInvalidRequest: Found unrecoverable error [Bad Request(400) - [MapperParsingException[failed to parse]; nested: ElasticsearchParseException[Failed to derive xcontent]; ]]; Bailing out..
    	at org.apache.hadoop.mapred.LocalJobRunner$Job.runTasks(LocalJobRunner.java:462)
    	at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:522)
    Caused by: org.elasticsearch.hadoop.rest.EsHadoopInvalidRequest: Found unrecoverable error [Bad Request(400) - [MapperParsingException[failed to parse]; nested: ElasticsearchParseException[Failed to derive xcontent]; ]]; Bailing out..
    	at org.elasticsearch.hadoop.rest.RestClient.retryFailedEntries(RestClient.java:203)
    	at org.elasticsearch.hadoop.rest.RestClient.bulk(RestClient.java:167)
    	at org.elasticsearch.hadoop.rest.RestRepository.tryFlush(RestRepository.java:209)
    	at org.elasticsearch.hadoop.rest.RestRepository.flush(RestRepository.java:232)
    	at org.elasticsearch.hadoop.rest.RestRepository.close(RestRepository.java:250)
    	at org.elasticsearch.hadoop.mr.EsOutputFormat$EsRecordWriter.doClose(EsOutputFormat.java:214)
    	at org.elasticsearch.hadoop.mr.EsOutputFormat$EsRecordWriter.close(EsOutputFormat.java:196)
    	at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat$PigRecordWriter.close(PigOutputFormat.java:146)
    	at org.apache.hadoop.mapred.MapTask$NewDirectOutputCollector.close(MapTask.java:670)
    	at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:793)
    	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
    	at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:243)
    	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    	at java.lang.Thread.run(Thread.java:748)
    2017-07-25 12:21:22,664 [main] WARN  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Ooops! Some job has failed! Specify -stop_on_failure if you want Pig to stop immediately on failure.
    

    Version Info

    OS: CentOS 7, JVM: 1.8.0_131, Hadoop: 2.7.3, ES: 1.7.1

    invalid :Pig :Serialization 
    opened by splikhita 25
  • Unable to insert data into ES from Hive - Cannot determine write shards

    Hello,

    I'm having a problem inserting data from Hive into ES using ES-Hadoop and I can't for the life of me figure out what is going wrong.

    I've created a simple Hive table in a database called dev:

    CREATE external TABLE IF NOT EXISTS dev.carltest (
      TimeTaken                      string
    )
    row format serde 'com.bizo.hive.serde.csv.CSVSerde'
    with serdeproperties (
    "separatorChar" = ","
    )   
    LOCATION '/user/cloudera/carltest'
    

    When I run a select query on this table it pulls back the single value absolutely fine. I've also created another simple Hive table in a database called elasticsearchtables:

    CREATE EXTERNAL TABLE IF NOT EXISTS elasticsearchtables.carltest (
    TimeTaken string
    )
    STORED BY 'org.elasticsearch.hadoop.hive.EsStorageHandler'
    TBLPROPERTIES ('es.resource' = 'proxylog-carltest/event',
                   'es.index.auto.create' = 'true',
                   'es.nodes' = <servername here>,
                   'es.port' = '9200',
                   'es.field.read.empty.as.null' ='true',
                   'es.mapping.names' = 'TimeTaken:TimeTaken'
    )
    

    When I run the following Hive query:

    INSERT OVERWRITE TABLE elasticsearchtables.carltest
    SELECT TimeTaken
    FROM dev.carltest
    

    I get a MapReduce error, and looking in the Hive log it states: Task with the most failures(4):

    Diagnostic Messages for this Task:

    Error: java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row {"timetaken":"1"}
        at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:175)
        at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54)
        at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:430)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:342)
        at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:167)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:415)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1554)
        at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:162)
    Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row {"timetaken":"1"}
        at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:529)
        at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:157)
        ... 8 more
    Caused by: org.elasticsearch.hadoop.EsHadoopIllegalArgumentException: Cannot determine write shards for [proxylog-carltest/event]; likely its format is incorrect (maybe it contains illegal characters?)
        at org.elasticsearch.hadoop.util.Assert.isTrue(Assert.java:50)
        at org.elasticsearch.hadoop.rest.RestService.initSingleIndex(RestService.java:427)
        at org.elasticsearch.hadoop.rest.RestService.createWriter(RestService.java:392)
        at org.elasticsearch.hadoop.mr.EsOutputFormat$EsRecordWriter.init(EsOutputFormat.java:173)
        at org.elasticsearch.hadoop.hive.EsHiveOutputFormat$EsHiveRecordWriter.write(EsHiveOutputFormat.java:58)
        at org.apache.hadoop.hive.ql.exec.FileSinkOperator.processOp(FileSinkOperator.java:638)
        at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:504)
        at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:847)
        at org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:87)
        at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:504)
        at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:847)
        at org.apache.hadoop.hive.ql.exec.TableScanOperator.processOp(TableScanOperator.java:91)
        at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:504)
        at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:847)
        at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:519)
        ... 9 more
    

    My ES cluster is running 1.5.2, I'm using the latest beta release of ES-Hadoop, and I'm running CDH 5.1.3.

    Any help with this would be greatly appreciated.

    question :Hive v2.2.0-m1 
    opened by pricecarl 25
  • Out of nodes

    Me again with another, probably silly, question. I'm trying to index many files which are already in HDFS. Indexing single files is no problem; now I'm passing all the files which should be indexed at once through the command line using the script below. For a few minutes it works fine, until it gives this error:

    ERROR 2997: Encountered IOException. Out of nodes and retries; caught exception
    
    
    register /home/kolbe/elasticsearch-hadoop-1.3.0.M2/dist/elasticsearch-hadoop-1.3.0.M2-yarn.jar
    a = load '$files' using PigStorage() as (json:chararray);
    store a into '$index' using org.elasticsearch.hadoop.pig.EsStorage('es.input.json=true','es.nodes=<adress>:9200');
    
    bug :Pig v2.0.0.RC1 
    opened by Foolius 25
  • Spark connector's implementation of "explode" does not work on nested fields

    Hi everyone,

    What kind of issue is this?

    • [X] Bug report.
    • [ ] Feature Request

    Issue description

    We use Spark to manipulate an array of distinct objects in an Elasticsearch index. The Elasticsearch index's field is mapped as:

    "array_field": {
            "type": "nested",
            "properties": {
              "property1": {
                "type": "text",
                "fields": {
                  "keyword": {
                    "type": "keyword"
                  }
                }
              },
              "property2": {
                "type": "text",
                "fields": {
                  "keyword": {
                    "type": "keyword"
                  }
                }
              },
              "property3": {
                "type": "text",
                "fields": {
                  "keyword": {
                    "type": "keyword"
                  }
                }
              },
              "property4": {
                "type": "text",
                "fields": {
                  "keyword": {
                    "type": "keyword"
                  }
                }
              },
              "property5": {
                "type": "date"
              }
            }
          }
    

    When we use the explode Spark function on a dataset created from reading from ElasticSearch the connector generates the following query :

    "query": {
        "bool": {
          "must": [
            {
              "match_all": {}
            }
          ],
          "filter": [
            {
              "exists": {
                "field": "array_field"
              }
            }
          ]
        }
      } 
    

    The "exists" part of the query is generated to differentiate calls of explode and explode_outer, because explode drops null elements whereas explode_outer keeps them. But since the field is nested, the query never gets any match because it is not a nested query; therefore the dataset is always empty.

    Steps to reproduce

    1. Create an index with a nested mapped field
    2. Put a document with a valued nested field
    3. Read the index from Spark into a dataset
    4. Call Spark explode(field) on the nested field on the dataset
    5. The dataset is empty because the generated query does not match any document

    Version Info

    OS: Linux, JVM: 1.8, Hadoop/Spark: Spark 3.3.0, ES-Hadoop: elasticsearch-spark-30_2.12:8.2.2

    opened by ThibSCH 0
  • Casting SparkSQL - scala.collection.convert.Wrappers$JListWrapper is not a valid external type for schema

    What kind of issue is this?

    • [x] Bug report. If you’ve found a bug, please provide a code snippet or test to reproduce it below.
      The easier it is to track down the bug, the faster it is solved.
    • [ ] Feature Request. Start by telling us what problem you’re trying to solve.
      Often a solution already exists! Don’t send pull requests to implement new features without first getting our support. Sometimes we leave features out on purpose to keep the project small.

    Issue description

    Hello, I tried reading data from an instance of Elasticsearch via SparkSQL and it throws a casting exception.

    The provided schema is the following:

    root
     |-- categories: struct (nullable = true)
     |    |-- id: string (nullable = true)
     |    |-- name: string (nullable = true)
    

    Steps to reproduce

    Code:

      val indice: String = "movies"
    
      lazy val customSparkOptions: Map[String, String] = Map(
        "es.nodes" -> "localhost",
        "es.net.http.auth.user" -> "elastic",
        "es.net.http.auth.pass" -> "admin",
        "es.resource" -> indice,
        "es.nodes.wan.only" -> "true",
    
      )
      lazy val sparkConf = new SparkConf().setAll(customSparkOptions)
    
      lazy val spark: SparkSession =
        SparkSession
          .builder()
          .master("local[2]")
          .config(sparkConf)
          .getOrCreate()
    
      val elasticDF: DataFrame = spark.read.format("org.elasticsearch.spark.sql").load(indice).select("categories")
    
      elasticDF.printSchema()
      elasticDF.show(false)
    

    Stack trace:

    java.lang.RuntimeException: Error while encoding: java.lang.RuntimeException: scala.collection.convert.Wrappers$JListWrapper is not a valid external type for schema of struct<id:string,name:string>
    if (assertnotnull(input[0, org.apache.spark.sql.Row, true]).isNullAt) null else named_struct(id, if (validateexternaltype(getexternalrowfield(assertnotnull(input[0, org.apache.spark.sql.Row, true]), 0, categories), StructField(id,StringType,true), StructField(name,StringType,true)).isNullAt) null else staticinvoke(class org.apache.spark.unsafe.types.UTF8String, StringType, fromString, validateexternaltype(getexternalrowfield(validateexternaltype(getexternalrowfield(assertnotnull(input[0, org.apache.spark.sql.Row, true]), 0, categories), StructField(id,StringType,true), StructField(name,StringType,true)), 0, id), StringType), true, false), name, if (validateexternaltype(getexternalrowfield(assertnotnull(input[0, org.apache.spark.sql.Row, true]), 0, categories), StructField(id,StringType,true), StructField(name,StringType,true)).isNullAt) null else staticinvoke(class org.apache.spark.unsafe.types.UTF8String, StringType, fromString, validateexternaltype(getexternalrowfield(validateexternaltype(getexternalrowfield(assertnotnull(input[0, org.apache.spark.sql.Row, true]), 0, categories), StructField(id,StringType,true), StructField(name,StringType,true)), 1, name), StringType), true, false)) AS categories#28
    	at org.apache.spark.sql.catalyst.encoders.ExpressionEncoder$Serializer.apply(ExpressionEncoder.scala:215)
    	at org.apache.spark.sql.catalyst.encoders.ExpressionEncoder$Serializer.apply(ExpressionEncoder.scala:197)
    	at scala.collection.Iterator$$anon$10.next(Iterator.scala:459)
    	at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown Source)
    	at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
    	at org.apache.spark.sql.execution.WholeStageCodegenExec$$anon$1.hasNext(WholeStageCodegenExec.scala:729)
    	at org.apache.spark.sql.execution.SparkPlan.$anonfun$getByteArrayRdd$1(SparkPlan.scala:340)
    	at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2(RDD.scala:872)
    	at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2$adapted(RDD.scala:872)
    	at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
    	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:349)
    	at org.apache.spark.rdd.RDD.iterator(RDD.scala:313)
    	at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
    	at org.apache.spark.scheduler.Task.run(Task.scala:127)
    	at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:446)
    	at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1377)
    	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:449)
    	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    	at java.lang.Thread.run(Thread.java:750)
    Caused by: java.lang.RuntimeException: scala.collection.convert.Wrappers$JListWrapper is not a valid external type for schema of struct<id:string,name:string>
    	at org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection.Invoke_0$(Unknown Source)
    	at org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection.CreateNamedStruct_0$(Unknown Source)
    	at org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection.apply(Unknown Source)
    	at org.apache.spark.sql.catalyst.encoders.ExpressionEncoder$Serializer.apply(ExpressionEncoder.scala:211)
    	... 19 more
    

    Version Info

    OS: Ubuntu 20.04, JVM: 1.8.0_342, Hadoop/Spark: 3.0.1, ES-Hadoop: 7.12.0, ES: 7.12.0, Scala: 2.12.8

    opened by rafafrdz 3
  • Update configuration.adoc

    Thank you for submitting a pull request!

    Please make sure you have signed our Contributor License Agreement (CLA).
    We are not asking you to assign copyright to us, but to give us the right to distribute your code without restriction. We ask this of all contributors in order to assure our users of the origin and continuing existence of the code.
    You only need to sign the CLA once.

    >docs 
    opened by cbc666 2
  • Support for ArrayType in es.update.script.params

    What kind of issue is this?

    • [x] Bug report. If you’ve found a bug, please provide a code snippet or test to reproduce it below.
      The easier it is to track down the bug, the faster it is solved.
    • [ ] Feature Request. Start by telling us what problem you’re trying to solve.
      Often a solution already exists! Don’t send pull requests to implement new features without first getting our support. Sometimes we leave features out on purpose to keep the project small.

    Issue description

    Array fields are not working with the ES update script. tags is an array in the shared example and it fails in the transformation stage. Is there a reference for using ArrayType with es.update.script.params?

    Steps to reproduce

    Code:

            List<String> data = Collections.singletonList("{\"Context\":\"129\",\"MessageType\":{\"id\":\"1013\",\"content\":\"Hello World\"},\"Module\":\"1203\",\"time\":3249,\"critical\":0,\"id\":1, \"tags\":[\"user\",\"device\"]}");
    
            SparkConf sparkConf = new SparkConf();
    
            sparkConf.set("es.nodes", "localhost");
            sparkConf.set("es.port", "9200");
            sparkConf.set("es.net.ssl", "false");
            sparkConf.set("es.nodes.wan.only", "true");
    
            SparkSession session = SparkSession.builder().appName("SparkElasticSearchTest").master("local[*]").config(sparkConf).getOrCreate();
    
            Dataset<Row> df = session.createDataset(data, Encoders.STRING()).toDF();
            Dataset<String> df1 = df.as(Encoders.STRING());
            Dataset<Row> df2 = session.read().json(df1.javaRDD());
    
            df2.printSchema();
            df2.show(false);
    
            String script = "ctx._source.Context = params.Context; ctx._source.Module = params.Module; ctx._source.critical = params.critical; ctx._source.id = params.id; if (ctx._source.time == null) {ctx._source.time = params.time} ctx._source.MessageType = new HashMap(); ctx._source.MessageType.put('id', params.MessageTypeId); ctx._source.MessageType.put('content', params.MessageTypeContent); ctx._source.tags = params.tags";
    
            String ja = "MessageTypeId:MessageType.id, MessageTypeContent:MessageType.content, Context:Context, Module:Module, time:time, critical:critical, id:id, tags:tags";
    
            DataFrameWriter<Row> dsWriter = df2.write()
                    .format("org.elasticsearch.spark.sql")
                    .option(ConfigurationOptions.ES_NODES, "localhost")
                    .option(ConfigurationOptions.ES_PORT, "9200")
                    .option(ConfigurationOptions.ES_NET_USE_SSL, false)
                    .option(ConfigurationOptions.ES_NODES_WAN_ONLY, "true")
                    .option(ConfigurationOptions.ES_MAPPING_ID, "id")
                    .option(ConfigurationOptions.ES_WRITE_OPERATION, ConfigurationOptions.ES_OPERATION_UPSERT)
                    .option(ConfigurationOptions.ES_UPDATE_SCRIPT_UPSERT, "true")
                    .option(ConfigurationOptions.ES_UPDATE_SCRIPT_INLINE, script)
                    .option(ConfigurationOptions.ES_UPDATE_SCRIPT_LANG, "painless")
                    .option(ConfigurationOptions.ES_UPDATE_SCRIPT_PARAMS, ja);
    
    
            dsWriter.mode("append");
            dsWriter.save("user-details");
    
    

    Stack trace:

    org.elasticsearch.hadoop.serialization.EsHadoopSerializationException: java.lang.ClassCastException: java.lang.String cannot be cast to org.apache.spark.sql.Row
    	at org.elasticsearch.hadoop.serialization.bulk.BulkEntryWriter.writeBulkEntry(BulkEntryWriter.java:136)
    	at org.elasticsearch.hadoop.rest.RestRepository.writeToIndex(RestRepository.java:170)
    	at org.elasticsearch.spark.rdd.EsRDDWriter.write(EsRDDWriter.scala:83)
    	at org.elasticsearch.spark.sql.EsSparkSQL$.$anonfun$saveToEs$1(EsSparkSQL.scala:103)
    	at org.elasticsearch.spark.sql.EsSparkSQL$.$anonfun$saveToEs$1$adapted(EsSparkSQL.scala:103)
    	at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
    	at org.apache.spark.scheduler.Task.run(Task.scala:131)
    	at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:506)
    	at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1462)
    	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:509)
    	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    	at java.lang.Thread.run(Thread.java:748)
    Caused by: java.lang.ClassCastException: java.lang.String cannot be cast to org.apache.spark.sql.Row
    	at org.elasticsearch.spark.sql.DataFrameValueWriter.writeArray(DataFrameValueWriter.scala:75)
    	at org.elasticsearch.spark.sql.DataFrameValueWriter.write(DataFrameValueWriter.scala:69)
    	at org.elasticsearch.hadoop.serialization.bulk.AbstractBulkFactory$FieldWriter.doWrite(AbstractBulkFactory.java:153)
    	at org.elasticsearch.hadoop.serialization.bulk.AbstractBulkFactory$FieldWriter.write(AbstractBulkFactory.java:123)
    	at org.elasticsearch.hadoop.serialization.bulk.TemplatedBulk.writeTemplate(TemplatedBulk.java:80)
    	at org.elasticsearch.hadoop.serialization.bulk.TemplatedBulk.write(TemplatedBulk.java:56)
    	at org.elasticsearch.hadoop.serialization.bulk.BulkEntryWriter.writeBulkEntry(BulkEntryWriter.java:68)
    	... 12 more
    
    

    Version Info

    OS: Any, JVM: 1.8, Hadoop/Spark: 3.2.1, ES-Hadoop: 8.2.2, ES: 7.10.2

    opened by vararo27 2
  • Add an option to pass a truststore in a format of a base64 string

    What kind of issue is this?

    • [ ] Bug report. If you’ve found a bug, please provide a code snippet or test to reproduce it below.
      The easier it is to track down the bug, the faster it is solved.
    • [x] Feature Request. Start by telling us what problem you’re trying to solve.
      Often a solution already exists! Don’t send pull requests to implement new features without first getting our support. Sometimes we leave features out on purpose to keep the project small.

    Feature description

    The current implementation of passing a truststore is limited to a path to some storage - either HDFS or some other location elasticsearch-hadoop can access. It burdens the user with uploading and maintaining the file somewhere, which can be quite a challenge in the case of many connections.

    To simplify the process, the truststore could be loaded from a base64 string, decoded, and loaded inside the elasticsearch-hadoop package. Of course, such functionality could be allowed on top of the existing solution as an additional parameter.

    opened by M0arcin 1
Releases (v8.5.3)

Downloads for all releases: https://elastic.co/downloads/hadoop

  • v8.5.3 (Dec 8, 2022) - Release notes: https://www.elastic.co/guide/en/elasticsearch/hadoop/8.5/eshadoop-8.5.3.html
  • v7.17.8 (Dec 8, 2022) - Release notes: https://www.elastic.co/guide/en/elasticsearch/hadoop/7.17/eshadoop-7.17.8.html
  • v8.5.2 (Nov 22, 2022) - Release notes: https://www.elastic.co/guide/en/elasticsearch/hadoop/8.5/eshadoop-8.5.2.html
  • v8.5.1 (Nov 15, 2022) - Release notes: https://www.elastic.co/guide/en/elasticsearch/hadoop/8.5/eshadoop-8.5.1.html
  • v8.5.0 (Nov 1, 2022) - Release notes: https://www.elastic.co/guide/en/elasticsearch/hadoop/8.5/eshadoop-8.5.0.html
  • v7.17.7 (Oct 25, 2022) - Release notes: https://www.elastic.co/guide/en/elasticsearch/hadoop/7.17/eshadoop-7.17.7.html
  • v8.4.3 (Oct 5, 2022) - Release notes: https://www.elastic.co/guide/en/elasticsearch/hadoop/8.4/eshadoop-8.4.3.html
  • v8.4.2 (Sep 20, 2022) - Release notes: https://www.elastic.co/guide/en/elasticsearch/hadoop/8.4/eshadoop-8.4.2.html
  • v8.4.1 (Aug 30, 2022) - Release notes: https://www.elastic.co/guide/en/elasticsearch/hadoop/8.4/eshadoop-8.4.1.html
  • v8.4.0 (Aug 24, 2022) - Release notes: https://www.elastic.co/guide/en/elasticsearch/hadoop/8.4/eshadoop-8.4.0.html
  • v7.17.6 (Aug 24, 2022) - Release notes: https://www.elastic.co/guide/en/elasticsearch/hadoop/7.17/eshadoop-7.17.6.html
  • v8.3.3 (Jul 28, 2022) - Release notes: https://www.elastic.co/guide/en/elasticsearch/hadoop/8.3/eshadoop-8.3.3.html
  • v8.3.2 (Jul 7, 2022) - Release notes: https://www.elastic.co/guide/en/elasticsearch/hadoop/8.3/eshadoop-8.3.2.html
  • v8.3.1 (Jun 30, 2022) - Release notes: https://www.elastic.co/guide/en/elasticsearch/hadoop/8.3/eshadoop-8.3.1.html
  • v8.3.0 (Jun 28, 2022) - Release notes: https://www.elastic.co/guide/en/elasticsearch/hadoop/8.3/eshadoop-8.3.0.html
  • v7.17.5 (Jun 28, 2022) - Release notes: https://www.elastic.co/guide/en/elasticsearch/hadoop/7.17/eshadoop-7.17.5.html
  • v8.2.3 (Jun 14, 2022) - Release notes: https://www.elastic.co/guide/en/elasticsearch/hadoop/8.2/eshadoop-8.2.3.html
  • v8.2.2 (May 26, 2022) - Release notes: https://www.elastic.co/guide/en/elasticsearch/hadoop/8.2/eshadoop-8.2.2.html
  • v8.2.1 (May 24, 2022) - Release notes: https://www.elastic.co/guide/en/elasticsearch/hadoop/8.2/eshadoop-8.2.1.html
  • v7.17.4 (May 24, 2022) - Release notes: https://www.elastic.co/guide/en/elasticsearch/hadoop/7.17/eshadoop-7.17.4.html
  • v8.2.0 (May 3, 2022) - Release notes: https://www.elastic.co/guide/en/elasticsearch/hadoop/8.2/eshadoop-8.2.0.html
  • v8.1.3 (Apr 20, 2022) - Release notes: https://www.elastic.co/guide/en/elasticsearch/hadoop/8.1/eshadoop-8.1.3.html
  • v7.17.3 (Apr 20, 2022) - Release notes: https://www.elastic.co/guide/en/elasticsearch/hadoop/7.17/eshadoop-7.17.3.html
  • v8.1.2 (Mar 31, 2022) - Release notes: https://www.elastic.co/guide/en/elasticsearch/hadoop/8.1/eshadoop-8.1.2.html
  • v7.17.2 (Mar 31, 2022) - Release notes: https://www.elastic.co/guide/en/elasticsearch/hadoop/7.17/eshadoop-7.17.2.html
  • v8.1.1 (Mar 22, 2022) - Release notes: https://www.elastic.co/guide/en/elasticsearch/hadoop/8.1/eshadoop-8.1.1.html
  • v8.1.0 (Mar 8, 2022) - Release notes: https://www.elastic.co/guide/en/elasticsearch/hadoop/8.1/eshadoop-8.1.0.html
  • v8.0.1 (Mar 1, 2022) - Release notes: https://www.elastic.co/guide/en/elasticsearch/hadoop/8.0/eshadoop-8.0.1.html
  • v7.17.1 (Feb 28, 2022) - Release notes: https://www.elastic.co/guide/en/elasticsearch/hadoop/7.17/eshadoop-7.17.1.html
  • v8.0.0 (Feb 10, 2022) - Release notes: https://www.elastic.co/guide/en/elasticsearch/hadoop/8.0/eshadoop-8.0.0.html