Apache Drill is a distributed MPP query layer for self-describing data

Overview

Apache Drill

Apache Drill is a distributed MPP query layer that supports SQL and alternative query languages against NoSQL and Hadoop data storage systems. It was inspired in part by Google's Dremel.
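
For example, Drill can query self-describing files in place, without declaring a schema first. A minimal sketch (the file path and field names below are illustrative, not part of this repository):

    -- Query a raw JSON file through the dfs storage plugin and drill into a nested field
    SELECT t.trans_id, t.user_info.cust_id
    FROM dfs.`/data/sample.json` AS t
    WHERE t.trans_id > 0
    LIMIT 10;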

Developers

Please read Environment.md for setting up and running Apache Drill. For complete developer documentation see DevDocs.md.

More Information

Please see the Apache Drill Website or the Apache Drill Documentation for more information including:

  • Remote Execution Installation Instructions
  • Running Drill on Docker instructions
  • Information about how to submit logical and distributed physical plans
  • More example queries and sample data
  • Find out ways to be involved or discuss Drill

Join the community!

Apache Drill is an Apache Software Foundation project and is seeking all types of users and contributions. Please say hello on the Apache Drill mailing list. You can also join our Google Hangouts or join our Slack channel if you need help with using or developing Apache Drill (more information can be found on the Apache Drill website).

Export Control

This distribution includes cryptographic software. The country in which you currently reside may have restrictions on the import, possession, use, and/or re-export to another country, of encryption software. BEFORE using any encryption software, please check your country's laws, regulations and policies concerning the import, possession, or use, and re-export of encryption software, to see if this is permitted. See http://www.wassenaar.org/ for more information.
The U.S. Government Department of Commerce, Bureau of Industry and Security (BIS), has classified this software as Export Commodity Control Number (ECCN) 5D002.C.1, which includes information security software using or performing cryptographic functions with asymmetric algorithms. The form and manner of this Apache Software Foundation distribution makes it eligible for export under the License Exception ENC Technology Software Unrestricted (TSU) exception (see the BIS Export Administration Regulations, Section 740.13) for both object code and source code. The following provides more details on the included cryptographic software: Java SE Security packages are used to provide support for authentication, authorization and secure sockets communication. The Jetty Web Server is used to provide communication via HTTPS. The Cyrus SASL libraries, Kerberos Libraries and OpenSSL Libraries are used to provide SASL based authentication and SSL communication.

Comments
  • DRILL-6349: Drill JDBC driver fails on Java 1.9+ with NoClassDefFoundError: sun/misc/VM

    DRILL-6349: Drill JDBC driver fails on Java 1.9+ with NoClassDefFoundError: sun/misc/VM

    This PR allows both building and running with JDK 8 and JDK 10 (and, likely, JDK 9). All tests, except the HBase, Hive and Kafka storage plugin tests, work on JDK 10:

    • HBase cannot start master: HMaster ctor fails with message "Unexpected version format: 10.0.2"
    • Hive cannot create HiveMetaStoreClient: ctor fails with "java.base/[Ljava.lang.Object; cannot be cast to java.base/[Ljava.net.URI;"
    • Kafka: KafkaFilterPushdownTest fails with errors "java.lang.NoSuchMethodError: sun.nio.ch.DirectBuffer.cleaner()Lsun/misc/Cleaner;"

    Changes:

    • Added a DrillPlatformDependent class, which tries to read maxDirectMemory from jdk.internal.misc.VM and otherwise falls back to Netty's PlatformDependent
    • asm dependency updated to 6.2.1, and ReplacingInterpreter fixed
    • fixed List.toArray() call in FileSystemPartitionDescriptor (in JDK 10 this method returns Object[] and the cast fails)
    • surefire plugin updated to 2.21.0
    • surefire configuration changed:
      1. added -XX:+IgnoreUnrecognizedVMOptions
      2. added java.se module (mostly, for java.sql module)
      3. added -Djdk.attach.allowAttachSelf=true, required by jmockit
      4. added locale and country settings, because the format tests fail with my system locale
    • compiler plugin updated to 3.8.0
    • JarBuilder fixed for JDK10 (target 1.5 and source 1.5 not supported by javac 10)
    • Drill2489CallsAfterCloseThrowExceptionsTest.ThrowsClosedBulkChecker skips new methods in the JDK 10 (JDK 9?) JDBC API
    • added jaxb-api and javax.activation dependencies, because javax.xml.bind and javax.activation modules will be removed in JDK11 (javax.activation used by jersey)
    • drill-config.sh and sqlline.bat changed:
      1. added -XX:+IgnoreUnrecognizedVMOptions
      2. added --add-modules java.se (mostly, for java.sql module)
      3. added --add-opens java.base/jdk.internal.misc=ALL-UNNAMED (allows access to jdk.internal.misc.VM)

    P.S. I am sorry for possible mistakes because of my bad English

    opened by oleg-zinovev 47
  • DRILL-6373: Refactor Result Set Loader for Union, List support

    DRILL-6373: Refactor Result Set Loader for Union, List support

    This PR builds on the previous refactoring of the column accessors to prepare for Union, (non-repeated) List and Repeated List support. The PR includes four closely related changes divided across four commits:

    Correct the Type of the Data Vector in a Nullable Vector

    The nullable vectors contain a "bits" vector and a "data" vector. The data vector has historically been created using the same MaterializedField as the nullable vector, meaning that the data vector is labeled as "nullable" even though it has no bits vector.

    This PR creates a clone MaterializedField with the same name as the outer nullable vector, but with a Required type.

    This change ensures that the overflow logic works correctly as it uses the vector metadata (in the MaterializedField) to know what kind of vector to create for the "lookahead" vector.

    Result Set Loader Refactor

    The second commit pretty much just rearranges the deck chairs in a way that lets us slot in the new types in the next PR. The need for the changes can be seen in the full code set (the union and list support was pulled out of this PR).

    A union is a container, like a map, so the tuple state was refactored to create a common parent container state.

    List and unions are very complex to build, so the code to build the internal workings of each vector was pulled out into a separate builder class.

    Projection Handling and the Vector Cache

    Previous versions of the result set loader handled projection and a cache for vectors reused across readers in the same Scan operator. Once we introduce nested maps, projection within maps, unions and lists, projection gets much more complex, as does vector caching.

    This PR adds logic to support projection and vector caching to any arbitrary level of maps. It turns out that handling projection of an entire map, and projection of fields within maps, is far more complex than you'd think, requiring quite a bit of internal state to keep everything straight. The result is that we can now handle a map m with three fields {a, b, c} and project just one of them, m.a, say.

    Further, Drill allows projection of non-existent columns. So, we might ask for field m.d which does not exist in the above map. The projection mechanism handles this case as well, creating the right kind of null column.
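
    As a rough SQL-level illustration of the projection cases this mechanism has to handle (the file path and column names are hypothetical):

    -- Project a single field of a map column; only m.a needs to be materialized
    SELECT t.m.a FROM dfs.`/data/example.json` AS t;

    -- Project a field that does not exist in the map; the reader supplies a null column
    SELECT t.m.d FROM dfs.`/data/example.json` AS t;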

    Unit Tests

    New tests are added to exercise the projection and cache mechanisms. Existing tests were updated for the changes made in the refactoring.

    Reference Design

    All of this work is done in support of the overall "batch sizing" project explained here.

    opened by paul-rogers 42
  • DRILL-5735: UI options grouping and filtering & Metrics hints

    DRILL-5735: UI options grouping and filtering & Metrics hints

    (Note: DRILL-4699 is also resolved in this.) Additional details, such as the option description, are provided in a JavaScript lookup map. This helps reduce the need for the server to constantly recreate the entire page with the description details, as the client browser can fill in these details. Developers will be expected to update the descriptions as old/new options are introduced or deprecated.

    opened by kkhatua 39
  • DRILL-4653.json - Malformed JSON should not stop the entire query from progressing

    DRILL-4653.json - Malformed JSON should not stop the entire query from progressing

    https://issues.apache.org/jira/browse/DRILL-4653

    • The default behavior is unchanged: processing stops when the JSON parser encounters an exception
    • Setting store.json.reader.skip_malformed_records ensures that the query progresses after skipping the bad records (see the sketch after this list)
    • Added two unit tests
    • Also did testing after deploying the new build: Both positive and negative tests were done.
    • Negative test result: org.apache.drill.common.exceptions.UserRemoteException: DATA_READ ERROR: Error parsing JSON - Unexpected character ('{' (code 123)): was expecting comma to separate OBJECT entries
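
    A minimal sketch of enabling the option for a session before querying a file that contains bad records (the file path is illustrative):

    ALTER SESSION SET `store.json.reader.skip_malformed_records` = true;
    SELECT * FROM dfs.`/data/partially_malformed.json`;
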
    opened by ssriniva123 36
  • DRILL-6960: AutoLimit the size of ResultSet for a WebUI (or REST) client

    DRILL-6960: AutoLimit the size of ResultSet for a WebUI (or REST) client

    Fixes the bug introduced with DRILL-6050 (#1593 )

    1. Checks whether the query can be wrapped with a limit, and provides a warning if the option was selected
    2. Switches the help from an onclick handler to a hover tooltip

    Screenshot: image

    opened by kkhatua 33
  • DRILL-6763: Codegen optimization of SQL functions with constant values

    DRILL-6763: Codegen optimization of SQL functions with constant values

    Details in DRILL-6763:

    Here is a description of the changes:

    1. Add a system option exec.optimize_function_compilation to toggle the state of this functionality (a sketch of toggling it follows below).
    2. Codegen is changed by declaring setter methods in EvaluationVisitor#visitXXXconstants.
    3. The members declared in step 2 are initialized when the instance of the class is created in XXBatch and others.
    4. The attachment is the code of the same query mentioned in DRILL-6763, generated by setting exec.optimize_function_compilation to true.

    @arina-ielchiieva @vdiravka Would you please take a look? What kind of unit tests should be added? query.txt
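
    A minimal sketch of toggling the option for a session (the query is only an illustration of codegen with a constant function argument):

    ALTER SESSION SET `exec.optimize_function_compilation` = true;
    SELECT sqrt(2) FROM (VALUES(1));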

    opened by lushuifeng 33
  • DRILL-5956: Add Storage Plugin for Apache Druid

    DRILL-5956: Add Storage Plugin for Apache Druid

    Starting work to add a connector for Apache Druid.

    Currently supports SELECT queries only.

    Files Reviewed:

    • [ ] DruidAndFilter.java
    • [ ] DruidBoundFilter.java
    • [ ] DruidCompareFunctionProcessor.java
    • [ ] DruidFilterBuilder.java
    • [ ] DruidGroupScan.java
    • [x] DruidScanBatchCreator.java
    • [x] DruidScanSpecBuilder.java
    • [x] DruidStoragePlugin.java
    • [x] DruidStoragePluginConfig.java
    • [x] DruidSubScan.java
    • [ ] README.md
    enhancement documentation new-storage 
    opened by akkapur 29
  • DRILL-5796: Filter pruning for multi rowgroup parquet file

    DRILL-5796: Filter pruning for multi rowgroup parquet file

    In ParquetFilterPredicate, replaced canDrop with a ROWS_MATCH enum to keep the filter result information inside the row group. This information allows the filter to be pruned when all rows match.

    opened by jbimbert 27
  • Drill 7882 + Drill 7883 - Fix LGTM Alerts in /common and /contrib

    Drill 7882 + Drill 7883 - Fix LGTM Alerts in /common and /contrib

    DRILL-7882: Fix LGTM Alerts in common folder

    DRILL-7883: Fix LGTM Alerts in contrib folder

    Done

    https://lgtm.com/projects/g/apache/drill/snapshot/5ba93040059efc36da020f3cfd1ad31489e2e55e/files/common/src/main/java/org/apache/drill/common/config/DrillProperties.java?sort=name&dir=ASC&mode=heatmap DrillProperties.java line 115 Added synchronized keyword to method

    https://lgtm.com/projects/g/apache/drill/snapshot/5ba93040059efc36da020f3cfd1ad31489e2e55e/files/common/src/main/java/org/apache/drill/common/HistoricalLog.java?sort=name&dir=ASC&mode=heatmap HistoricalLog.java Line 122 & 129 Suppressed b/c they were comments

    in format-excel

    https://lgtm.com/projects/g/apache/drill/snapshot/5ba93040059efc36da020f3cfd1ad31489e2e55e/files/contrib/format-excel/src/main/java/org/apache/drill/exec/store/excel/ExcelBatchReader.java#x4f0d3a45123fb50b:1 excelbatchreader.java line 280 checked if datacell is null before switch case statement

    in format-hdf5

    https://lgtm.com/projects/g/apache/drill/snapshot/5ba93040059efc36da020f3cfd1ad31489e2e55e/files/contrib/format-hdf5/src/main/java/org/apache/drill/exec/store/hdf5/HDF5BatchReader.java?sort=name&dir=ASC&mode=heatmap HDF5BatchReader.java line 593 Changed {} to %s so that format call works

    https://lgtm.com/projects/g/apache/drill/snapshot/5ba93040059efc36da020f3cfd1ad31489e2e55e/files/contrib/format-hdf5/src/main/java/org/apache/drill/exec/store/hdf5/writers/HDF5ByteDataWriter.java?sort=name&dir=ASC&mode=heatmap HDF5ByteDataWriter.java line 71 Although the counter only increments afterwards, and if write() ran again it would return false because of the preceding if statement, LGTM still flags it. Thus, I have used a try/catch statement to avoid the alert (although I have no way of testing unless it's on the main repo).

    https://lgtm.com/projects/g/apache/drill/snapshot/5ba93040059efc36da020f3cfd1ad31489e2e55e/files/contrib/format-hdf5/src/main/java/org/apache/drill/exec/store/hdf5/writers/HDF5DoubleDataWriter.java?sort=name&dir=ASC&mode=heatmap HDF5DoubleDataWriter.java line 69 Same as above

    https://lgtm.com/projects/g/apache/drill/snapshot/5ba93040059efc36da020f3cfd1ad31489e2e55e/files/contrib/format-hdf5/src/main/java/org/apache/drill/exec/store/hdf5/writers/HDF5FloatDataWriter.java?sort=name&dir=ASC&mode=heatmap HDF5FloatDataWriter.java line 69 same as above

    https://lgtm.com/projects/g/apache/drill/snapshot/5ba93040059efc36da020f3cfd1ad31489e2e55e/files/contrib/format-hdf5/src/main/java/org/apache/drill/exec/store/hdf5/writers/HDF5IntDataWriter.java?sort=name&dir=ASC&mode=heatmap HDF5IntDataWriter.java line 70 same as above

    https://lgtm.com/projects/g/apache/drill/snapshot/5ba93040059efc36da020f3cfd1ad31489e2e55e/files/contrib/format-hdf5/src/main/java/org/apache/drill/exec/store/hdf5/writers/HDF5LongDataWriter.java?sort=name&dir=ASC&mode=heatmap HDF5LongDataWriter.java line 69 same as above

    https://lgtm.com/projects/g/apache/drill/snapshot/5ba93040059efc36da020f3cfd1ad31489e2e55e/files/contrib/format-hdf5/src/main/java/org/apache/drill/exec/store/hdf5/writers/HDF5SmallIntDataWriter.java?sort=name&dir=ASC&mode=heatmap HDF5SmallIntDataWriter.java line 71 same as above

    https://lgtm.com/projects/g/apache/drill/snapshot/5ba93040059efc36da020f3cfd1ad31489e2e55e/files/contrib/format-hdf5/src/main/java/org/apache/drill/exec/store/hdf5/writers/HDF5TimestampDataWriter.java?sort=name&dir=ASC&mode=heatmap HDF5TimestampDataWriter.java line 48 same as above

    in format-img

    https://lgtm.com/projects/g/apache/drill/snapshot/5ba93040059efc36da020f3cfd1ad31489e2e55e/files/contrib/format-image/src/main/java/org/apache/drill/exec/store/image/GenericMetadataDescriptor.java?sort=name&dir=ASC&mode=heatmap GenericMetadataDescriptor.java line 82,83,84 Converted type from Integer to int (didn't seem like there was a need for the Integer class)

    https://lgtm.com/projects/g/apache/drill/snapshot/e2a0925dd18aacf3a5657acd738f89a63a3b8576/files/contrib/format-image/src/main/java/org/apache/drill/exec/store/image/ImageDirectoryProcessor.java?sort=name&dir=ASC&mode=heatmap ImageDirectoryProcessor.java line 124 Suppressed, needs to be initialized with an arbitrary value (so keep it with null)

    in format-maprdb

    https://lgtm.com/projects/g/apache/drill/snapshot/e2a0925dd18aacf3a5657acd738f89a63a3b8576/files/contrib/format-maprdb/src/main/java/org/apache/drill/exec/planner/index/MapRDBStatistics.java?sort=name&dir=ASC&mode=heatmap line 777 Added if (pattern != null) statement to avoid potential NPE error

    line 801 same as above

    line 807 if statement but for escape

    https://lgtm.com/projects/g/apache/drill/snapshot/e2a0925dd18aacf3a5657acd738f89a63a3b8576/files/contrib/format-maprdb/src/main/java/org/apache/drill/exec/store/mapr/db/binary/BinaryTableGroupScan.java?sort=name&dir=ASC&mode=heatmap BinaryTableGroupScan.java line 190 To avoid int overflow, made numColumns a long variable

    https://lgtm.com/projects/g/apache/drill/snapshot/e2a0925dd18aacf3a5657acd738f89a63a3b8576/files/contrib/format-maprdb/src/main/java/org/apache/drill/exec/store/mapr/db/json/JsonTableGroupScan.java?sort=name&dir=ASC&mode=heatmap JsonTableGroupScan.java Line 380 boolean includes scanSpec != null (if scanSpec is null then scanSpec.getSerializedFilter() would also be null)

    line 493 suppressed because its comments

    line 520 The 5th format call is for the estimated size, but there is no fn that gets/determines the estimated size... For now I put in "Can't determine estimated size" left it empty

    line 527, 528, 541, 542, 632 All are comments, suppressed

    https://lgtm.com/projects/g/apache/drill/snapshot/e2a0925dd18aacf3a5657acd738f89a63a3b8576/files/contrib/format-maprdb/src/main/java/org/apache/drill/exec/store/mapr/db/json/MaprDBJsonRecordReader.java?sort=name&dir=ASC&mode=heatmap MaprDBJsonRecordReader.java Line 431 Changed suppression to suggestion that CodeQL ppl suggested document == null || document.asReader() == null ? ...

    https://lgtm.com/projects/g/apache/drill/snapshot/e2a0925dd18aacf3a5657acd738f89a63a3b8576/files/contrib/format-maprdb/src/main/java/org/apache/drill/exec/store/mapr/db/MapRDBFormatPluginConfig.java?sort=name&dir=ASC&mode=heatmap MapRDBFormatPluginConfig.java Line 28 There is no equals function (there is impEquals, but is that the same thing?) but there is an overridden hashcode function. Again, I don't know what it's referring to.

    https://lgtm.com/projects/g/apache/drill/snapshot/e2a0925dd18aacf3a5657acd738f89a63a3b8576/files/contrib/format-maprdb/src/main/java/org/apache/drill/exec/store/mapr/db/MapRDBGroupScan.java?sort=name&dir=ASC&mode=heatmap MapRDBGroupScan.java line 255 Format call had wrong syntax (for format(), use % not {} to take in args)

    line 324 It already logs an error if null so there is no point in catching NPE, suppressed

    In format-xml

    https://lgtm.com/projects/g/apache/drill/snapshot/e2a0925dd18aacf3a5657acd738f89a63a3b8576/files/contrib/format-xml/src/main/java/org/apache/drill/exec/store/xml/XMLBatchReader.java?sort=name&dir=ASC&mode=heatmap XMLBatchReader.java line 94 changed {} to %s

    In storage-druid

    https://lgtm.com/projects/g/apache/drill/snapshot/e2a0925dd18aacf3a5657acd738f89a63a3b8576/files/contrib/storage-druid/src/main/java/org/apache/drill/exec/store/druid/DruidGroupScan.java?sort=name&dir=ASC&mode=heatmap DruidGroupScan.java line 201 Added L for long type specification

    In storage-hbase

    https://lgtm.com/projects/g/apache/drill/snapshot/e2a0925dd18aacf3a5657acd738f89a63a3b8576/files/contrib/storage-hbase/src/main/java/org/apache/drill/exec/store/hbase/HBaseUtils.java?sort=name&dir=ASC&mode=heatmap HBaseUtils.java line 79 Imported java.util.Arrays to directly convert filterBytes to a string so that it is not implicitly converted in the error msg

    https://lgtm.com/projects/g/apache/drill/snapshot/e2a0925dd18aacf3a5657acd738f89a63a3b8576/files/contrib/storage-hbase/src/main/java/org/apache/drill/exec/store/hbase/config/HBasePersistentStore.java?sort=name&dir=ASC&mode=heatmap HBasePersistentStore.java line 201 suppressed b/c its in try catch statement

    In storage-kudu

    https://lgtm.com/projects/g/apache/drill/snapshot/e2a0925dd18aacf3a5657acd738f89a63a3b8576/files/contrib/storage-kudu/src/main/java/org/apache/drill/exec/store/kudu/KuduGroupScan.java?sort=name&dir=ASC&mode=heatmap KuduGroupScan.java line 210 Added L for long type specification

    https://lgtm.com/projects/g/apache/drill/snapshot/e2a0925dd18aacf3a5657acd738f89a63a3b8576/files/contrib/udfs/src/main/java/org/apache/drill/exec/udfs/NetworkFunctions.java?sort=name&dir=ASC&mode=heatmap NetworkFunctions.java Line 434 Multiplied long with assignment so that there is no implicit conversion

    TO-DO (revise) / ASK FOR HELP

    https://lgtm.com/projects/g/apache/drill/snapshot/e2a0925dd18aacf3a5657acd738f89a63a3b8576/files/contrib/format-maprdb/src/main/java/org/apache/drill/exec/store/mapr/db/json/JsonTableRangePartitionFunction.java?sort=name&dir=ASC&mode=heatmap JsonTableRangePartitionFunction.java Line 46 There is an overridden equals function but no hashCode function, so I don't know what it's referring to. Suppressed

    In storage-kafka

    https://lgtm.com/projects/g/apache/drill/snapshot/e2a0925dd18aacf3a5657acd738f89a63a3b8576/files/contrib/storage-kafka/src/main/java/org/apache/drill/exec/store/kafka/KafkaPartitionScanSpec.java?sort=name&dir=ASC&mode=heatmap KafkaPartitionScanSpec.java line 25 There is no hashCode function; if there is no such function, then there's no need for it. Suppressed

    In udfs

    https://lgtm.com/projects/g/apache/drill/snapshot/e2a0925dd18aacf3a5657acd738f89a63a3b8576/files/contrib/udfs/src/main/java/org/apache/drill/exec/udfs/CryptoFunctions.java?sort=name&dir=ASC&mode=heatmap CryptoFunctions.java line 288 Alert recommends using AES, but code already uses AES encryption. Maybe the alternatives are weak? Suppressed for now.

    line 339 Same reason as above

    In storage-splunk

    https://lgtm.com/projects/g/apache/drill/snapshot/e2a0925dd18aacf3a5657acd738f89a63a3b8576/files/contrib/storage-splunk/src/main/java/org/apache/drill/exec/store/splunk/SplunkBatchReader.java?sort=name&dir=ASC&mode=heatmap SplunkBatchReader.java line 232 Very unsure. I am not aware of what the contents are so I can't really analyze this. Suppressed for now.

    https://lgtm.com/projects/g/apache/drill/snapshot/e2a0925dd18aacf3a5657acd738f89a63a3b8576/files/contrib/format-maprdb/src/main/java/org/apache/drill/exec/store/mapr/db/MapRDBTableCache.java?sort=name&dir=ASC&mode=heatmap MapRDBTableCache.java Line 73 If table is null, I believe that it should just throw an NPE regardless (especially if it's required in MapR-DB, although I'm not entirely sure), so it's suppressed

    Perhaps I could use a logger.debug() call or something like that.

    https://lgtm.com/projects/g/apache/drill/snapshot/e2a0925dd18aacf3a5657acd738f89a63a3b8576/files/contrib/format-maprdb/src/main/java/org/apache/drill/exec/store/mapr/streams/StreamsFormatPluginConfig.java?sort=name&dir=ASC&mode=heatmap StreamsFormatPluginConfig.java line 27 Suppressed; hashCode and impEquals are both overridden. From what I understand, the alert is trying to find hashCode and equals functions, but because it can't find equals (which is just impEquals) it assumes it's not overridden

    https://lgtm.com/projects/g/apache/drill/snapshot/e2a0925dd18aacf3a5657acd738f89a63a3b8576/files/contrib/format-maprdb/src/main/java/org/apache/drill/exec/store/mapr/TableFormatPluginConfig.java?sort=name&dir=ASC&mode=heatmap TableFormatPluginConfig.java Line 22 Same reason as above

    https://lgtm.com/projects/g/apache/drill/snapshot/5ba93040059efc36da020f3cfd1ad31489e2e55e/files/common/src/main/java/org/apache/drill/common/exceptions/UserException.java?sort=name&dir=ASC&mode=heatmap UserException.java Line 481 Suppressed first b/c suggested solution was not valid, format fn needs those 2 parameters

    Line 643 It's already in a try statement; alert suppressed

    Documentation

    N/A

    Testing

    Due to how LGTM alerts work, if it isn't on the actual project, we can't see if it works.

    code-cleanup 
    opened by eevanwong 25
  • DRILL-7534: Convert HTTPD Format Plugin to EVF

    DRILL-7534: Convert HTTPD Format Plugin to EVF

    Description

    This PR updates the HTTPD format plugin to use the Enhanced Vector Framework (EVF). There are a few changes a user might notice:

    1. A new configuration option maxErrors has been added which will allow a user to tune how fault tolerant they want Drill to be when reading log files.
    2. Two new implicit fields have been added, _raw and _matched. They are described in the docs below.
    3. The plugin now includes a limit pushdown which significantly improves query times for queries with limits (see the sketch after this list).
    4. The plugin code is now in the contrib folder.
    5. Added a flattenWildcards option which allows the user to flatten nested fields.
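
    As a rough illustration of the limit pushdown, for a query such as the one below only the first matching rows need to be parsed (the path is hypothetical and assumes the file is mapped to the httpd format):

    SELECT * FROM dfs.`/var/log/httpd/access_log` LIMIT 10;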

    This PR also refactors the code and includes some optimizations which should, in theory, result in faster queries.

    In addition, this PR updates the associated User Agent parsing functions with the latest version of the underlying libraries.

    Documentation

    Web Server Log Format Plugin (HTTPD)

    This plugin enables Drill to read and query httpd (Apache Web Server) and nginx logs natively. This plugin uses the work by Niels Basjes which is available here: https://github.com/nielsbasjes/logparser.

    Configuration

    The following fields need to be configured in order for Drill to read web server logs:

    • logFormat: The log format string is the format string found in your web server configuration.
    • timestampFormat: The format of time stamps in your log files.
    • extensions: The file extension of your web server logs.
    • maxErrors: Sets the plugin error tolerance. When set to any value less than 0, Drill will ignore all errors.
    • flattenWildcards: Flattens nested fields
    "httpd" : {
      "type" : "httpd",
      "logFormat" : "%h %l %u %t \"%r\" %s %b \"%{Referer}i\" \"%{User-agent}i\"",
      "timestampFormat" : "dd/MMM/yyyy:HH:mm:ss ZZ",
      "maxErrors": 0,
      "flattenWildcards": false
    }
    

    Implicit Columns

    Data queried by this plugin will return two implicit columns:

    • _raw: This returns the raw, unparsed log line
    • _matched: Returns true or false depending on whether the line matched the config string.

    Thus, if you wanted to see which lines in your log file were not matching the config, you could use the following query:

    SELECT _raw
    FROM <data>
    WHERE _matched = false
    

    Testing

    Added additional unit tests for this plugin. Ran all unit tests for the parse_user_agent() UDF as well.

    refactoring 
    opened by cgivre 24
  • DRILL-8037: Add V2 JSON Format Plugin based on EVF

    DRILL-8037: Add V2 JSON Format Plugin based on EVF

    Description

    This adds a new V2 beta JSON format plugin based on the Enhanced Vector Framework (EVF). It is a follow-up to DRILL-6953 (which was closed with the decision to merge it in small pieces), so it is based on #1913 and the rev2 work.

    Documentation

    "V1" is the legacy reader; "V2" is the new beta JSON format plugin based on the result set loader. The V2 version is a bit more robust and supports the row set framework. However, V1 supports unions and reading corrupted JSON files.

    The new "V2" JSON scan is controlled by a new option: store.json.enable_v2_reader, which is true by default in this PR.

    Adds a "projection type" to the column writer so that the JSON parser can receive a "hint" as to the expected type. The hint comes from the form of the projected column: a[0], a.b, or just a. It therefore supports schema provisioning. Example:

    ALTER SESSION SET `store.json.enable_v2_reader` = true;
    apache drill (dfs.tmp)> select * from test;
    +---------------+-------+---+
    |       a       |   e   | f |
    +---------------+-------+---+
    | {"b":1,"c":1} | false | 1 |
    | {"b":1,"c":1} | null  | 2 |
    | {"b":1,"c":1} | true  | 3 |
    +---------------+-------+---+
    apache drill (dfs.tmp)> create or replace schema (`e` BOOLEAN default 'false', `new` VARCHAR not null default 'schema evolution') for table test;
    apache drill (dfs.tmp)> select * from test;
    +-------+------------------+---------------+---+
    |   e   |       new        |       a       | f |
    +-------+------------------+---------------+---+
    | false | schema evolution | {"b":1,"c":1} | 1 |
    | null  | schema evolution | {"b":1,"c":1} | 2 |
    | true  | schema evolution | {"b":1,"c":1} | 3 |
    +-------+------------------+---------------+---+
    

    Testing

    A lot of existing test cases run for both readers. This is needed as long as V1 is still present in the Drill code.

    json 
    opened by vdiravka 23
  • DRILL-5033: Query on JSON That Has Null as Value For Each Key

    DRILL-5033: Query on JSON That Has Null as Value For Each Key

    Description

    Drill returns the same result with or without store.json.all_text_mode=true. Note that each key in the JSON has null as its value.

    [root@cent01 null_eq_joins]# cat right_all_nulls.json

    {
    "intKey" : null,
    "bgintKey": null,
    "strKey": null,
    "boolKey": null,
    "fltKey": null,
    "dblKey": null,
    "timKey": null,
    "dtKey": null,
    "tmstmpKey": null,
    "intrvldyKey": null,
    "intrvlyrKey": null
    }
    

    Querying the above JSON file returns null as the query result. We should see each of the keys in the JSON as a column in the query result, and in each column the value should be null. The current behavior does not look right.

    0: jdbc:drill:schema=dfs.tmp> select * from `right_all_nulls.json`;
    +-------+
    |   *   |
    +-------+
    | null  |
    +-------+
    1 row selected (0.313 seconds)
    

    Documentation

    (Please describe user-visible changes similar to what should appear in the Drill documentation.)

    Testing

    (Please describe how this PR has been tested.)

    opened by unical1988 5
  • DRILL-8376: Add Distribution UDFs

    DRILL-8376: Add Distribution UDFs

    Description

    This PR adds several new UDFs to help with statistical analysis. The first is width_bucket, which mirrors the functionality of the PostgreSQL function of the same name (https://www.oreilly.com/library/view/sql-in-a/9780596155322/re91.html). This function is useful for building histograms of data.

    This PR also adds the kendall_correlation, regr_slope and regr_intercept functions for computing the correlation coefficient and least-squares regression parameters of two columns.

    Documentation

    Drill has several functions for correlations and understanding the distribution of your data.

    The functions are as follows; a usage sketch appears after the list:

    • width_bucket(value, min, max, buckets): Useful for crafting histograms and understanding distributions of continuous variables.
    • kendall_correlation(col1, col2): Calculates the kendall correlation coefficient of two columns within a dataset.
    • regr_slope(x,y): Determines the slope of the least-squares-fit linear equation
    • regr_intercept(x,y): Computes the y-intercept of the least-squares-fit linear equation
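
    A minimal usage sketch against a hypothetical table with numeric columns x and y (the table name and columns are illustrative; argument order follows the signatures above):

    -- Bucket x into 10 equal-width bins between 0 and 100 and count rows per bin
    SELECT width_bucket(x, 0, 100, 10) AS bucket, COUNT(*) AS cnt
    FROM dfs.tmp.`measurements`
    GROUP BY width_bucket(x, 0, 100, 10);

    -- Correlation and least-squares fit of the two columns
    SELECT kendall_correlation(x, y) AS tau,
           regr_slope(x, y) AS slope,
           regr_intercept(x, y) AS intercept
    FROM dfs.tmp.`measurements`;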

    Testing

    Added unit tests.

    enhancement doc-impacting minor-update udf 
    opened by cgivre 0
  • DRILL-8372: Unfreed buffers when running a LIMIT 0 query over delimited text

    DRILL-8372: Unfreed buffers when running a LIMIT 0 query over delimited text

    Description

    With the following data layout

    /tmp/foo/bar:
    large_csv.csvh
    /tmp/foo/boo:
    large_csv.csvh
    

    a LIMIT 0 query over it results in unfreed buffer errors as shown below.

    apache drill (dfs.tmp)> select * from `foo` limit 0;
    Error: SYSTEM ERROR: IllegalStateException: Allocator[op:0:0:4:EasySubScan] closed with outstanding buffers allocated (3).
    Allocator(op:0:0:4:EasySubScan) 1000000/299008/3182592/10000000000 (res/actual/peak/limit)
      child allocators: 0
      ledgers: 3
        ledger[113] allocator: op:0:0:4:EasySubScan), isOwning: true, size: 262144, references: 1, life: 277785186322881..0, allocatorManager: [109, life: 277785186258906..0] holds 1 buffers.
            DrillBuf[142], udle: [110 0..262144]
        ledger[114] allocator: op:0:0:4:EasySubScan), isOwning: true, size: 32768, references: 1, life: 277785186463824..0, allocatorManager: [110, life: 277785186414654..0] holds 1 buffers.
            DrillBuf[143], udle: [111 0..32768]
        ledger[112] allocator: op:0:0:4:EasySubScan), isOwning: true, size: 4096, references: 1, life: 277785186046095..0, allocatorManager: [108, life: 277785185921147..0] holds 1 buffers.
            DrillBuf[141], udle: [109 0..4096]
      reservations: 0 
    

    Documentation

    N/A

    Testing

    TODO

    bug 
    opened by jnturton 0
  • Failed to execute an insert statement across the database

    Failed to execute an insert statement across the database.

    Steps to reproduce the behavior:

    1. Prepare the MySQL and PostgreSQL table structures and data.
       MySQL: create table t1(c1 int, c2 int); insert into t1 values (1,1), (2,2);
       PostgreSQL: create table t1(c1 int, c2 int);

    2. Create the MySQL and PostgreSQL plugins under the Storage tab via the http://localhost:8047/storage page.

    3. Execute the following SQL statement using sqlline.
       Jupiter> insert into pg.public.t1 select c1, c2 from mysql.test.t1;
       Error: DATA_READ ERROR: The JDBC storage plugin failed while trying setup the SQL query.
       Sql: INSERT INTO public.t1 (c1, c2) (SELECT * FROM test.t1)
       Fragment: 0:0
       [Error Id: a5b3ee38-38f5-4945-afda-8a7d4746df4c on DESKTOP-PHHB7LC:31010] (state=,code=0)

    4. The execution plan in the log is as follows: 2022-12-19 14:36:32,447 [1c5ffc63-764a-c5ab-4d06-90e658b8e132:foreman] DEBUG o.a.d.e.p.s.h.DefaultSqlHandler - Drill Physical: 00-00    Screen : rowType = RecordType(BIGINT ROWCOUNT): rowcount = 1.0E9, cumulative cost = {1.1E9 rows, 1.1E9 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 614 00-01      Jdbc(sql=[INSERT INTO public.t1 (c1, c2)  (SELECT *  FROM test.t1) ]) : rowType = RecordType(BIGINT ROWCOUNT): rowcount = 1.0E9, cumulative cost = {1.0E9 rows, 1.0E9 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 604

    opened by weijunlu 8
  • select * from hive reports a refcnt = 0 error

    select * from hive reports a refcnt = 0 error

    Create a table in the hive schema with Drill as follows:

    create table hive.t (a string, b string, c string). The inserted value lengths should be longer than 256 characters, e.g.:

    insert into hive.t values("aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa","bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb","ccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccc");

    Then run select * from hive.t.

    The error is reported as follows:

    INTERNAL_ERROR ERROR: refcnt: 0


    opened by yaozhu 4
  • Multi-data-source join query fails with missing conversions when the join condition is in the WHERE clause

    Multi-data-source join query fails with missing conversions when the join condition is in the WHERE clause

    1. The version information is as follows:
        apache drill> select commit_message, commit_time from sys.version;
        +----------------------------------------------------------------------------------+---------------------------+
        |                                  commit_message                                  |        commit_time        |
        +----------------------------------------------------------------------------------+---------------------------+
        | DRILL-8357: Add new config options to the Splunk storage plugin (extra docs) (#2706) | 15.11.2022 @ 20:34:55 CST |
        +----------------------------------------------------------------------------------+---------------------------+
        1 row selected (0.133 seconds)
    
    2. Create tables and insert data in the PostgreSQL and MySQL databases
        create table t1(col1 int);
        insert into t1 values(1), (2);
    
    3. Create the MySQL and PostgreSQL plugins under the Storage tab via the http://localhost:8047/storage page

    4. The SQL statement is executed as follows

      apache drill> select * from pgsql.t1 as pg ,my.test.t1 as my where pg.col1 = my.col1 and  pg.col1 = 1 and my.col1 =1 ;
      Error: UNSUPPORTED_OPERATION ERROR: This query cannot be planned possibly due to either a cartesian join or an inequality join. 
      If a cartesian or inequality join is used intentionally, set the option 'planner.enable_nljoin_for_scalar_only' to false and try again.
      
      
      [Error Id: 91583f69-5aa6-44d5-a29c-63e2358932ea ] (state=,code=0)
      apache drill> set planner.enable_nljoin_for_scalar_only=false;
      +------+------------------------------------------------+
      |  ok  |                    summary                     |
      +------+------------------------------------------------+
      | true | planner.enable_nljoin_for_scalar_only updated. |
      +------+------------------------------------------------+
      1 row selected (0.212 seconds)
      apache drill> select * from pgsql.t1 as pg ,my.test.t1 as my where pg.col1 = my.col1 and  pg.col1 = 1 and my.col1 =1 ;
      Error: UNSUPPORTED_OPERATION ERROR: This query cannot be planned possibly due to either a cartesian join or an inequality join. 
      If a cartesian or inequality join is used intentionally, set the option 'planner.enable_nljoin_for_scalar_only' to false and try again.
      
      
      [Error Id: 37d0dbca-40d1-4de5-9443-b48ce3a172c0 ] (state=,code=0)
      apache drill> 
    
    5. Below is the drillbit.log info
      2022-11-18 08:08:01,169 [1c88c29e-3040-efc1-46a6-33152cfbab32:foreman] INFO  o.a.drill.exec.work.foreman.Foreman - Query text for query with id 1c88c29e-3040-efc1-46a6-33152cfbab32 issued by test: select * from pgsql.t1 as pg ,my.test.t1 as my where pg.col1 = my.col1 and  pg.col1 = 1 and my.col1 =1 
      2022-11-18 08:08:01,258 [1c88c29e-3040-efc1-46a6-33152cfbab32:foreman] ERROR o.a.d.e.p.s.h.DefaultSqlHandler - There are not enough rules to produce a node with desired properties: convention=PHYSICAL, DrillDistributionTraitDef=SINGLETON([]), sort=[].
      Missing conversions are JdbcFilter[convention: JDBC.pgsql -> JDBC.my], JdbcFilter[convention: JDBC.my -> JDBC.pgsql]
      There are 2 empty subsets:
      Empty subset 0: rel#15241:RelSubset#12.JDBC.my.ANY([]).[], the relevant part of the original plan is as follows
      15211:JdbcFilter(condition=[=($0, 1)])
        14914:JdbcTableScan(subset=[rel#15210:RelSubset#11.JDBC.pgsql.ANY([]).[]], table=[[pgsql, t1]])
      
      Empty subset 1: rel#15247:RelSubset#15.JDBC.pgsql.ANY([]).[], the relevant part of the original plan is as follows
      15216:JdbcFilter(condition=[=($0, 1)])
        14915:JdbcTableScan(subset=[rel#15215:RelSubset#14.JDBC.my.ANY([]).[]], table=[[my, test, t1]])
      
      Root: rel#15224:RelSubset#18.PHYSICAL.SINGLETON([]).[]
      Original rel:
      LogicalProject(subset=[rel#14956:RelSubset#4.LOGICAL.ANY([]).[]], col1=[$0], col10=[$1]): rowcount = 3.375E15, cumulative cost = {3.375E15 rows, 6.75E15 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 14954
        LogicalFilter(subset=[rel#14953:RelSubset#3.NONE.ANY([]).[]], condition=[AND(=($0, $1), =($0, 1), =($1, 1))]): rowcount = 3.375E15, cumulative cost = {3.375E15 rows, 1.0E18 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 14952
          LogicalJoin(subset=[rel#14951:RelSubset#2.NONE.ANY([]).[]], condition=[true], joinType=[inner]): rowcount = 1.0E18, cumulative cost = {1.0E18 rows, 0.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 14950
            JdbcTableScan(subset=[rel#14948:RelSubset#0.JDBC.pgsql.ANY([]).[]], table=[[pgsql, t1]]): rowcount = 1.0E9, cumulative cost = {100.0 rows, 101.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 14914
            JdbcTableScan(subset=[rel#14949:RelSubset#1.JDBC.my.ANY([]).[]], table=[[my, test, t1]]): rowcount = 1.0E9, cumulative cost = {100.0 rows, 101.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 14915
      
      Sets:
      Set#11, type: RecordType(INTEGER col1)
    	  rel#15210:RelSubset#11.JDBC.pgsql.ANY([]).[], best=rel#14914
    		  rel#14914:JdbcTableScan.JDBC.pgsql.ANY([]).[](table=[pgsql, t1]), rowcount=1.0E9, cumulative cost={100.0 rows, 101.0 cpu, 0.0 io, 0.0 network, 0.0 memory}
    	  rel#15227:RelSubset#11.LOGICAL.ANY([]).[], best=rel#15226
    		  rel#15226:VertexDrel.LOGICAL.ANY([]).[](input=RelSubset#15210), rowcount=1.0E9, cumulative cost={1.000001E8 rows, 1.00000101E8 cpu, 0.0 io, 0.0 network, 0.0 memory}
    	  rel#15254:RelSubset#11.PHYSICAL.SINGLETON([]).[], best=rel#15253
    		  rel#15253:JdbcIntermediatePrel.PHYSICAL.SINGLETON([]).[](input=RelSubset#15210), rowcount=1.0E9, cumulative cost={1.0000001E9 rows, 1.000000101E9 cpu, 0.0 io, 0.0 network, 0.0 memory}
      Set#12, type: RecordType(INTEGER col1)
    	  rel#15212:RelSubset#12.JDBC.pgsql.ANY([]).[], best=rel#15211
    		  rel#15211:JdbcFilter.JDBC.pgsql.ANY([]).[](input=RelSubset#15210,condition==($0, 1)), rowcount=1.5E8, cumulative cost={1.500001E8 rows, 1.000000101E9 cpu, 0.0 io, 0.0 network, 0.0 memory}
    		  rel#15246:AbstractConverter.JDBC.pgsql.ANY([]).[](input=RelSubset#15232,convention=JDBC.pgsql,DrillDistributionTraitDef=ANY([]),sort=[]), rowcount=1.5E8, cumulative cost={inf}
    	  rel#15230:RelSubset#12.LOGICAL.ANY([]).[], best=rel#15213
    		  rel#15213:VertexDrel.LOGICAL.ANY([]).[](input=RelSubset#15212), rowcount=1.5E8, cumulative cost={1.650001E8 rows, 1.015000101E9 cpu, 0.0 io, 0.0 network, 0.0 memory}
    		  rel#15258:VertexDrel.LOGICAL.ANY([]).[](input=RelSubset#15241), rowcount=1.5E8, cumulative cost={inf}
    	  rel#15232:RelSubset#12.PHYSICAL.SINGLETON([]).[], best=rel#15231
    		  rel#15231:JdbcIntermediatePrel.PHYSICAL.SINGLETON([]).[](input=RelSubset#15212), rowcount=1.5E8, cumulative cost={3.000001E8 rows, 1.150000101E9 cpu, 0.0 io, 0.0 network, 0.0 memory}
    		  rel#15271:JdbcIntermediatePrel.PHYSICAL.SINGLETON([]).[](input=RelSubset#15241), rowcount=1.5E8, cumulative cost={inf}
    	  rel#15241:RelSubset#12.JDBC.my.ANY([]).[], best=null
    		  rel#15242:AbstractConverter.JDBC.my.ANY([]).[](input=RelSubset#15232,convention=JDBC.my,DrillDistributionTraitDef=ANY([]),sort=[]), rowcount=1.5E8, cumulative cost={inf}
      Set#14, type: RecordType(INTEGER col1)
    	  rel#15215:RelSubset#14.JDBC.my.ANY([]).[], best=rel#14915
    		  rel#14915:JdbcTableScan.JDBC.my.ANY([]).[](table=[my, test, t1]), rowcount=1.0E9, cumulative cost={100.0 rows, 101.0 cpu, 0.0 io, 0.0 network, 0.0 memory}
    	  rel#15235:RelSubset#14.LOGICAL.ANY([]).[], best=rel#15234
    		  rel#15234:VertexDrel.LOGICAL.ANY([]).[](input=RelSubset#15215), rowcount=1.0E9, cumulative cost={1.000001E8 rows, 1.00000101E8 cpu, 0.0 io, 0.0 network, 0.0 memory}
    	  rel#15256:RelSubset#14.PHYSICAL.SINGLETON([]).[], best=rel#15255
    		  rel#15255:JdbcIntermediatePrel.PHYSICAL.SINGLETON([]).[](input=RelSubset#15215), rowcount=1.0E9, cumulative cost={1.0000001E9 rows, 1.000000101E9 cpu, 0.0 io, 0.0 network, 0.0 memory}
      Set#15, type: RecordType(INTEGER col1)
    	  rel#15217:RelSubset#15.JDBC.my.ANY([]).[], best=rel#15216
    		  rel#15216:JdbcFilter.JDBC.my.ANY([]).[](input=RelSubset#15215,condition==($0, 1)), rowcount=1.5E8, cumulative cost={1.500001E8 rows, 1.000000101E9 cpu, 0.0 io, 0.0 network, 0.0 memory}
    		  rel#15243:AbstractConverter.JDBC.my.ANY([]).[](input=RelSubset#15240,convention=JDBC.my,DrillDistributionTraitDef=ANY([]),sort=[]), rowcount=1.5E8, cumulative cost={inf}
    	  rel#15238:RelSubset#15.LOGICAL.ANY([]).[], best=rel#15218
    		  rel#15218:VertexDrel.LOGICAL.ANY([]).[](input=RelSubset#15217), rowcount=1.5E8, cumulative cost={1.650001E8 rows, 1.015000101E9 cpu, 0.0 io, 0.0 network, 0.0 memory}
    		  rel#15268:VertexDrel.LOGICAL.ANY([]).[](input=RelSubset#15247), rowcount=1.5E8, cumulative cost={inf}
    	  rel#15240:RelSubset#15.PHYSICAL.SINGLETON([]).[], best=rel#15239
    		  rel#15239:JdbcIntermediatePrel.PHYSICAL.SINGLETON([]).[](input=RelSubset#15217), rowcount=1.5E8, cumulative cost={3.000001E8 rows, 1.150000101E9 cpu, 0.0 io, 0.0 network, 0.0 memory}
    		  rel#15274:JdbcIntermediatePrel.PHYSICAL.SINGLETON([]).[](input=RelSubset#15247), rowcount=1.5E8, cumulative cost={inf}
    	  rel#15247:RelSubset#15.JDBC.pgsql.ANY([]).[], best=null
    		  rel#15248:AbstractConverter.JDBC.pgsql.ANY([]).[](input=RelSubset#15240,convention=JDBC.pgsql,DrillDistributionTraitDef=ANY([]),sort=[]), rowcount=1.5E8, cumulative cost={inf}
      Set#17, type: RecordType(INTEGER col1, INTEGER col10)
    	  rel#15221:RelSubset#17.LOGICAL.ANY([]).[], best=rel#15220
    		  rel#15220:DrillJoinRel.LOGICAL.ANY([]).[](left=RelSubset#15230,right=RelSubset#15238,condition=true,joinType=inner), rowcount=2.25E16, cumulative cost={6.300002E8 rows, 2.030000202E9 cpu, 0.0 io, 0.0 network, 1.32E9 memory}
    		  rel#15263:VertexDrel.LOGICAL.ANY([]).[](input=RelSubset#15245), rowcount=2.25E16, cumulative cost={inf}
    		  rel#15270:VertexDrel.LOGICAL.ANY([]).[](input=RelSubset#15250), rowcount=2.25E16, cumulative cost={inf}
    	  rel#15245:RelSubset#17.JDBC.my.ANY([]).[], best=null
    		  rel#15244:JdbcJoin.JDBC.my.ANY([]).[](left=RelSubset#15241,right=RelSubset#15217,condition=true,joinType=inner), rowcount=1.5E8, cumulative cost={inf}
    	  rel#15250:RelSubset#17.JDBC.pgsql.ANY([]).[], best=null
    		  rel#15249:JdbcJoin.JDBC.pgsql.ANY([]).[](left=RelSubset#15212,right=RelSubset#15247,condition=true,joinType=inner), rowcount=1.5E8, cumulative cost={inf}
    	  rel#15251:RelSubset#17.PHYSICAL.SINGLETON([]).[], best=null
    		  rel#15273:JdbcIntermediatePrel.PHYSICAL.SINGLETON([]).[](input=RelSubset#15245), rowcount=2.25E16, cumulative cost={inf}
    		  rel#15276:JdbcIntermediatePrel.PHYSICAL.SINGLETON([]).[](input=RelSubset#15250), rowcount=2.25E16, cumulative cost={inf}
      Set#18, type: RecordType(INTEGER col1, INTEGER col10)
    	  rel#15223:RelSubset#18.LOGICAL.ANY([]).[], best=rel#15222
    		  rel#15222:DrillScreenRel.LOGICAL.ANY([]).[](input=RelSubset#15221), rowcount=2.25E16, cumulative cost={2.2500006300002E15 rows, 2.250002030000202E15 cpu, 0.0 io, 0.0 network, 1.32E9 memory}
    	  rel#15224:RelSubset#18.PHYSICAL.SINGLETON([]).[], best=null
    		  rel#15252:ScreenPrel.PHYSICAL.SINGLETON([]).[](input=RelSubset#15251), rowcount=2.25E16, cumulative cost={inf}
    
    6. When I modify the query to use an inner/right/full join, the query runs normally
      apache drill> select * from pgsql.t1 as pg inner join my.wubq.t1 as my on pg.col1 = my.col1 where pg.col1 = 1 and my.col1 =1 ;
      +------+-------+
      | col1 | col10 |
      +------+-------+
      | 1    | 1     |
      +------+-------+
      1 row selected (0.644 seconds)
      apache drill> 
    
    opened by Javelin2007 2
Releases(drill-1.20.2)
  • drill-1.20.2(Aug 3, 2022)

  • drill-1.20.1(Aug 3, 2022)

  • drill-1.20.0(Feb 25, 2022)

    Apache Drill 1.20's highlights are: the Apache Phoenix storage plugin with impersonation support, the Apache Iceberg format plugin, expanded push down support for MongoDB, persistent table and storage aliases, a release for Hadoop 2 environments, write support in the JDBC storage plugin, SAS and PDF format plugins, pagination and OAuth support in the HTTP storage plugin, HashiCorp Vault authentication and credential storage providers, and read/write support for all compression codecs and both format versions of Parquet.

    Source code(tar.gz)
    Source code(zip)
  • drill-1.19.0(Jun 15, 2021)

    Apache Drill 1.19's highlights are:

    Source code(tar.gz)
    Source code(zip)
  • drill-1.18.0(Sep 5, 2020)

    Apache Drill 1.18's highlights are: Drill Metadata management "Drill Metastore", Format Plugin for HDF5, Support for DICT type in RowSet Framework, Storage Plugin for Generic HTTP REST API, Dynamic credit based flow control, Support for injecting BufferManager into UDF, Drill RDBMS Metastore

    Source code(tar.gz)
    Source code(zip)
  • drill-1.17.0(Dec 26, 2019)

    Apache Drill 1.17's highlights are: Hive complex types support, Upgrade to HADOOP-3.2, Schema Provision using File / Table Function, Drill Metastore support, Excel and ESRI Shapefile (shp) format plugins support, Parquet runtime row group pruning, empty Parquet files support, User-Agent UDFs, canonical Map<K,V> support, and more.

    Source code(tar.gz)
    Source code(zip)
  • drill-1.15.0(Dec 31, 2018)

    Apache Drill 1.15's highlights are: SQLLine upgrade, index support, ability to secure znodes with custom ACLs, INFORMATION_SCHEMA files table, system functions table, and more.

    Source code(tar.gz)
    Source code(zip)
  • drill-1.14.0(Dec 31, 2018)

    Apache Drill 1.14's highlights are: Ability to run Drill in a Docker container, ability to export and save storage plugin configurations, a storage plugin configuration file to manage storage plugin configurations, an image metadata format plugin, option to set Hive properties at the session level, and more.

    Source code(tar.gz)
    Source code(zip)
  • drill-1.13.0(Dec 31, 2018)

    Apache Drill 1.13's highlights are: YARN support, support for HTTP Kerberos authentication using SPNEGO, SQL syntax highlighting of queries, and user and distribution specific configuration checks during startup.

    Source code(tar.gz)
    Source code(zip)
  • drill-1.12.0(Dec 31, 2018)

    Apache Drill 1.12's highlights are: Kafka and OpenTSDB storage plugins, SSL and network encryption support, queue-based memory assignment for buffering operators, networking functions, and the ability to prevent users from accessing paths outside the root of a workspace.

    Source code(tar.gz)
    Source code(zip)
  • drill-1.11.0(Dec 31, 2018)

    Apache Drill 1.11's highlights are: Cryptography-related functions, spill to disk for the hash aggregate operator, Format plugin support for PCAP files, ability to change the HDFS block Size for Parquet files, ability to store query profiles in memory, configurable CTAS directory and file permissions option, support for network encryption, relative paths stored in the metadata file, and support for ANSI_QUOTES.

    Source code(tar.gz)
    Source code(zip)
  • drill-1.10.0(Dec 31, 2018)

    Apache Drill 1.10's highlights are: CTTAS, improved fault tolerance, Drill version and statistics in Web Console, implicit interpretation of INT96, and Kerberos authentication.

    Source code(tar.gz)
    Source code(zip)
  • drill-1.9.0(Dec 31, 2018)

  • drill-1.8.0(Dec 31, 2018)

    Apache Drill 1.8's highlights are: metadata cache pruning, IF EXISTS support, DESCRIBE SCHEMA command, multi-byte delimiter support, and new parameters for filter selectivity estimates.

    Source code(tar.gz)
    Source code(zip)
  • drill-1.7.0(Dec 31, 2018)

  • drill-1.6.0(Dec 31, 2018)

  • drill-1.5.0(Dec 31, 2018)

    Apache Drill 1.5's highlights are: Authentication and security for the Web interface and REST API, experimental query support for Apache Kudu (incubating), an improved memory allocator, and configurable caching for Hive metadata.

    Source code(tar.gz)
    Source code(zip)
  • drill-1.4.0(Dec 31, 2018)

    Apache Drill 1.4's highlights are: "select with options" queries that can change storage plugin settings, improved behavior when parsing CSV file header names, a variable to set non-pretty (e.g. compact) printing of JSON, and better drillbit.log files that include query text.

    Source code(tar.gz)
    Source code(zip)
  • drill-1.3.0(Dec 31, 2018)

    Drill 1.3 has been released. Users can now query Hadoop sequence files and text delimited files with headers. In addition, this release provides significant performance and usability improvements for working with Amazon S3. Drill 1.3 also adds support for heterogeneous types, enabling queries on datasets with columns that have more than one data type (commonly seen in JSON files, MongoDB collections, etc.).

    Source code(tar.gz)
    Source code(zip)
  • drill-1.2.0(Dec 31, 2018)

    Drill 1.2 has been released. This release includes a new JDBC storage plugin for querying relational databases, as well as new window functions such as NTILE, FIRST_VALUE, LAST_VALUE, LEAD and LAG. This release addresses 210 JIRAs, including performance, stability and documentation enhancements.

    Source code(tar.gz)
    Source code(zip)
  • drill-1.1.0(Dec 31, 2018)

    Drill 1.1 has been released, providing window functions, automatic partitioning, improved MongoDB support and more. This release addresses 162 JIRAs.

    Source code(tar.gz)
    Source code(zip)
  • drill-1.0.0(Dec 31, 2018)

    Drill 1.0 has been released, representing a major milestone for the Drill community. Drill is now production-ready, making it easier than ever to explore and analyze data in non-relational datastores.

    Source code(tar.gz)
    Source code(zip)
  • drill-0.9.0(Dec 31, 2018)

  • drill-0.8.0(Dec 31, 2018)

  • drill-0.7.0(Dec 31, 2018)

  • drill-0.4.0(Dec 31, 2018)

  • drill-0.1.0(Dec 31, 2018)

Owner
The Apache Software Foundation