Apache Drill is a distributed MPP query layer for self-describing data

Overview

Apache Drill

Apache Drill is a distributed MPP query layer that supports SQL and alternative query languages against NoSQL and Hadoop data storage systems. It was inspired in part by Google's Dremel.
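
For example, Drill can query self-describing files in place, without declaring a schema first. A minimal sketch (the file path and field names below are illustrative, not part of this repository):

    -- Query a raw JSON file through the dfs storage plugin and drill into a nested field
    SELECT t.trans_id, t.user_info.cust_id
    FROM dfs.`/data/sample.json` AS t
    WHERE t.trans_id > 0
    LIMIT 10;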

Developers

Please read Environment.md for setting up and running Apache Drill. For complete developer documentation see DevDocs.md.

More Information

Please see the Apache Drill Website or the Apache Drill Documentation for more information including:

  • Remote Execution Installation Instructions
  • Running Drill on Docker instructions
  • Information about how to submit logical and distributed physical plans
  • More example queries and sample data
  • Find out ways to be involved or discuss Drill

Join the community!

Apache Drill is an Apache Software Foundation project and is seeking all types of users and contributions. Please say hello on the Apache Drill mailing list. You can also join our Google Hangouts or join our Slack channel if you need help with using or developing Apache Drill (more information can be found on the Apache Drill website).

Export Control

This distribution includes cryptographic software. The country in which you currently reside may have restrictions on the import, possession, use, and/or re-export to another country, of encryption software. BEFORE using any encryption software, please check your country's laws, regulations and policies concerning the import, possession, or use, and re-export of encryption software, to see if this is permitted. See http://www.wassenaar.org/ for more information.
The U.S. Government Department of Commerce, Bureau of Industry and Security (BIS), has classified this software as Export Commodity Control Number (ECCN) 5D002.C.1, which includes information security software using or performing cryptographic functions with asymmetric algorithms. The form and manner of this Apache Software Foundation distribution makes it eligible for export under the License Exception ENC Technology Software Unrestricted (TSU) exception (see the BIS Export Administration Regulations, Section 740.13) for both object code and source code. The following provides more details on the included cryptographic software: Java SE Security packages are used to provide support for authentication, authorization and secure sockets communication. The Jetty Web Server is used to provide communication via HTTPS. The Cyrus SASL libraries, Kerberos Libraries and OpenSSL Libraries are used to provide SASL based authentication and SSL communication.

Comments
  • DRILL-6349: Drill JDBC driver fails on Java 1.9+ with NoClassDefFoundError: sun/misc/VM

    DRILL-6349: Drill JDBC driver fails on Java 1.9+ with NoClassDefFoundError: sun/misc/VM

    This PR allows both building and running with JDK 8 and JDK 10 (and, likely, JDK 9). All tests, except the HBase, Hive and Kafka storage plugin tests, work on JDK 10:

    • HBase cannot start master: HMaster ctor fails with message "Unexpected version format: 10.0.2"
    • Hive cannot create HiveMetaStoreClient: ctor fails with "java.base/[Ljava.lang.Object; cannot be cast to java.base/[Ljava.net.URI;"
    • Kafka: KafkaFilterPushdownTest fails with errors "java.lang.NoSuchMethodError: sun.nio.ch.DirectBuffer.cleaner()Lsun/misc/Cleaner;"

    Changes:

    • Added a DrillPlatformDependent class, which tries to read maxDirectMemory from jdk.internal.misc.VM and otherwise falls back to Netty's PlatformDependent
    • asm dependency updated to 6.2.1, and ReplacingInterpreter fixed
    • fixed List.toArray() call in FileSystemPartitionDescriptor (in JDK 10 this method returns Object[] and the cast fails)
    • surefire plugin updated to 2.21.0
    • surefire configuration changed:
      1. added -XX:+IgnoreUnrecognizedVMOptions
      2. added java.se module (mostly, for java.sql module)
      3. added -Djdk.attach.allowAttachSelf=true, required by jmockit
      4. added locale and country settings, because the format tests fail with my system locale
    • compiler plugin updated to 3.8.0
    • JarBuilder fixed for JDK10 (target 1.5 and source 1.5 not supported by javac 10)
    • Drill2489CallsAfterCloseThrowExceptionsTest.ThrowsClosedBulkChecker skips new methods in the JDK 10 (JDK 9?) JDBC API
    • added jaxb-api and javax.activation dependencies, because javax.xml.bind and javax.activation modules will be removed in JDK11 (javax.activation used by jersey)
    • drill-config.sh and sqlline.bat changed:
      1. added -XX:+IgnoreUnrecognizedVMOptions
      2. added --add-modules java.se (mostly, for java.sql module)
      3. added --add-opens java.base/jdk.internal.misc=ALL-UNNAMED (allows access to jdk.internal.misc.VM)

    P.S. I am sorry for possible mistakes because of my bad English

    opened by oleg-zinovev 47
  • DRILL-6373: Refactor Result Set Loader for Union, List support

    DRILL-6373: Refactor Result Set Loader for Union, List support

    This PR builds on the previous refactoring of the column accessors to prepare for Union, (non-repeated) List and Repeated List support. The PR includes four closely related changes divided across four commits:

    Correct the Type of the Data Vector in a Nullable Vector

    The nullable vectors contain a "bits" vector and a "data" vector. The data vector has historically been created using the same MaterializedField as the nullable vector, meaning that the data vector is labeled as "nullable" even though it has no bits vector.

    This PR creates a clone MaterializedField with the same name as the outer nullable vector, but with a Required type.

    This change ensures that the overflow logic works correctly as it uses the vector metadata (in the MaterializedField) to know what kind of vector to create for the "lookahead" vector.

    Result Set Loader Refactor

    The second commit pretty much just rearranges the deck chairs in a way that lets us slot in the new types in the next PR. The need for the changes can be seen in the full code set (the union and list support was pulled out of this PR).

    A union is a container, like a map, so the tuple state was refactored to create a common parent container state.

    List and unions are very complex to build, so the code to build the internal workings of each vector was pulled out into a separate builder class.

    Projection Handling and the Vector Cache

    Previous versions of the result set loader handled projection and a cache for vectors reused across readers in the same Scan operator. Once we introduce nested maps, projection within maps, unions and lists, projection gets much more complex, as does vector caching.

    This PR adds logic to support projection and vector caching to any arbitrary level of maps. It turns out that handling projection of an entire map, and projection of fields within maps, is far more complex than you'd think, requiring quite a bit of internal state to keep everything straight. The result is that we can now handle a map m with three fields {a, b, c} and project just one of them, m.a, say.

    Further, Drill allows projection of non-existent columns. So, we might ask for field m.d which does not exist in the above map. The projection mechanism handles this case as well, creating the right kind of null column.
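
    As a rough SQL-level illustration of the projection cases this mechanism has to handle (the file path and column names are hypothetical):

    -- Project a single field of a map column; only m.a needs to be materialized
    SELECT t.m.a FROM dfs.`/data/example.json` AS t;

    -- Project a field that does not exist in the map; the reader supplies a null column
    SELECT t.m.d FROM dfs.`/data/example.json` AS t;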

    Unit Tests

    New tests are added to exercise the projection and cache mechanisms. Existing tests were updated for the changes made in the refactoring.

    Reference Design

    All of this work is done in support of the overall "batch sizing" project explained here.

    opened by paul-rogers 42
  • DRILL-5735: UI options grouping and filtering & Metrics hints

    DRILL-5735: UI options grouping and filtering & Metrics hints

    (Note: DRILL-4699 is also resolved in this.) Additional details, such as the option description, are provided in a JavaScript lookup map. This helps reduce the need for the server to constantly recreate the entire page with the description details, as the client browser can fill in these details. Developers will be expected to update the descriptions as old/new options are introduced or deprecated.

    opened by kkhatua 39
  • DRILL-4653.json - Malformed JSON should not stop the entire query from progressing

    DRILL-4653.json - Malformed JSON should not stop the entire query from progressing

    https://issues.apache.org/jira/browse/DRILL-4653

    • The default behavior is unchanged: processing stops when the JSON parser encounters an exception
    • Setting store.json.reader.skip_malformed_records ensures that the query progresses after skipping the bad records (see the sketch after this list)
    • Added two unit tests
    • Also did testing after deploying the new build: Both positive and negative tests were done.
    • Negative test result: org.apache.drill.common.exceptions.UserRemoteException: DATA_READ ERROR: Error parsing JSON - Unexpected character ('{' (code 123)): was expecting comma to separate OBJECT entries
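
    A minimal sketch of enabling the option for a session before querying a file that contains bad records (the file path is illustrative):

    ALTER SESSION SET `store.json.reader.skip_malformed_records` = true;
    SELECT * FROM dfs.`/data/partially_malformed.json`;
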
    opened by ssriniva123 36
  • DRILL-6960: AutoLimit the size of ResultSet for a WebUI (or REST) client

    DRILL-6960: AutoLimit the size of ResultSet for a WebUI (or REST) client

    Fixes the bug introduced with DRILL-6050 (#1593 )

    1. Checks whether the query can be wrapped with a limit, and provides a warning if the option was selected
    2. Switches the help from an onclick handler to a hover tooltip

    Screenshot: image

    opened by kkhatua 33
  • DRILL-6763: Codegen optimization of SQL functions with constant values

    DRILL-6763: Codegen optimization of SQL functions with constant values

    Details in DRILL-6763:

    Here is a description of the changes:

    1. Add a system option exec.optimize_function_compilation to toggle the state of this functionality (a sketch of toggling it follows below).
    2. Codegen is changed by declaring setter methods in EvaluationVisitor#visitXXXconstants.
    3. The members declared in step 2 are initialized when the instance of the class is created in XXBatch and others.
    4. The attachment is the code of the same query mentioned in DRILL-6763, generated by setting exec.optimize_function_compilation to true.

    @arina-ielchiieva @vdiravka Would you please take a look? What kind of unit tests should be added? query.txt
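
    A minimal sketch of toggling the option for a session (the query is only an illustration of codegen with a constant function argument):

    ALTER SESSION SET `exec.optimize_function_compilation` = true;
    SELECT sqrt(2) FROM (VALUES(1));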

    opened by lushuifeng 33
  • DRILL-5956: Add Storage Plugin for Apache Druid

    DRILL-5956: Add Storage Plugin for Apache Druid

    Starting work to add a connector for Apache Druid.

    Currently supports SELECT queries only.

    Files Reviewed:

    • [ ] DruidAndFilter.java
    • [ ] DruidBoundFilter.java
    • [ ] DruidCompareFunctionProcessor.java
    • [ ] DruidFilterBuilder.java
    • [ ] DruidGroupScan.java
    • [x] DruidScanBatchCreator.java
    • [x] DruidScanSpecBuilder.java
    • [x] DruidStoragePlugin.java
    • [x] DruidStoragePluginConfig.java
    • [x] DruidSubScan.java
    • [ ] README.md
    enhancement documentation new-storage 
    opened by akkapur 29
  • DRILL-5796: Filter pruning for multi rowgroup parquet file

    DRILL-5796: Filter pruning for multi rowgroup parquet file

    In ParquetFilterPredicate, replaced canDrop with a ROWS_MATCH enum to keep the filter result information inside the row group. This information allows the filter to be pruned when all rows match.

    opened by jbimbert 27
  • Drill 7882 + Drill 7883 - Fix LGTM Alerts in /common and /contrib

    Drill 7882 + Drill 7883 - Fix LGTM Alerts in /common and /contrib

    DRILL-7882: Fix LGTM Alerts in common folder

    DRILL-7883: Fix LGTM Alerts in contrib folder

    Done

    https://lgtm.com/projects/g/apache/drill/snapshot/5ba93040059efc36da020f3cfd1ad31489e2e55e/files/common/src/main/java/org/apache/drill/common/config/DrillProperties.java?sort=name&dir=ASC&mode=heatmap DrillProperties.java line 115 Added synchronized keyword to method

    https://lgtm.com/projects/g/apache/drill/snapshot/5ba93040059efc36da020f3cfd1ad31489e2e55e/files/common/src/main/java/org/apache/drill/common/HistoricalLog.java?sort=name&dir=ASC&mode=heatmap HistoricalLog.java Line 122 & 129 Suppressed b/c they were comments

    in format-excel

    https://lgtm.com/projects/g/apache/drill/snapshot/5ba93040059efc36da020f3cfd1ad31489e2e55e/files/contrib/format-excel/src/main/java/org/apache/drill/exec/store/excel/ExcelBatchReader.java#x4f0d3a45123fb50b:1 excelbatchreader.java line 280 checked if datacell is null before switch case statement

    in format-hdf5

    https://lgtm.com/projects/g/apache/drill/snapshot/5ba93040059efc36da020f3cfd1ad31489e2e55e/files/contrib/format-hdf5/src/main/java/org/apache/drill/exec/store/hdf5/HDF5BatchReader.java?sort=name&dir=ASC&mode=heatmap HDF5BatchReader.java line 593 Changed {} to %s so that format call works

    https://lgtm.com/projects/g/apache/drill/snapshot/5ba93040059efc36da020f3cfd1ad31489e2e55e/files/contrib/format-hdf5/src/main/java/org/apache/drill/exec/store/hdf5/writers/HDF5ByteDataWriter.java?sort=name&dir=ASC&mode=heatmap HDF5ByteDataWriter.java line 71 Although the counter only increments afterwards, and if write() ran again it would return false because of the preceding if statement, LGTM still flags it. Thus, I have used a try/catch statement to avoid the alert (although I have no way of testing unless it's on the main repo).

    https://lgtm.com/projects/g/apache/drill/snapshot/5ba93040059efc36da020f3cfd1ad31489e2e55e/files/contrib/format-hdf5/src/main/java/org/apache/drill/exec/store/hdf5/writers/HDF5DoubleDataWriter.java?sort=name&dir=ASC&mode=heatmap HDF5DoubleDataWriter.java line 69 Same as above

    https://lgtm.com/projects/g/apache/drill/snapshot/5ba93040059efc36da020f3cfd1ad31489e2e55e/files/contrib/format-hdf5/src/main/java/org/apache/drill/exec/store/hdf5/writers/HDF5FloatDataWriter.java?sort=name&dir=ASC&mode=heatmap HDF5FloatDataWriter.java line 69 same as above

    https://lgtm.com/projects/g/apache/drill/snapshot/5ba93040059efc36da020f3cfd1ad31489e2e55e/files/contrib/format-hdf5/src/main/java/org/apache/drill/exec/store/hdf5/writers/HDF5IntDataWriter.java?sort=name&dir=ASC&mode=heatmap HDF5IntDataWriter.java line 70 same as above

    https://lgtm.com/projects/g/apache/drill/snapshot/5ba93040059efc36da020f3cfd1ad31489e2e55e/files/contrib/format-hdf5/src/main/java/org/apache/drill/exec/store/hdf5/writers/HDF5LongDataWriter.java?sort=name&dir=ASC&mode=heatmap HDF5LongDataWriter.java line 69 same as above

    https://lgtm.com/projects/g/apache/drill/snapshot/5ba93040059efc36da020f3cfd1ad31489e2e55e/files/contrib/format-hdf5/src/main/java/org/apache/drill/exec/store/hdf5/writers/HDF5SmallIntDataWriter.java?sort=name&dir=ASC&mode=heatmap HDF5SmallIntDataWriter.java line 71 same as above

    https://lgtm.com/projects/g/apache/drill/snapshot/5ba93040059efc36da020f3cfd1ad31489e2e55e/files/contrib/format-hdf5/src/main/java/org/apache/drill/exec/store/hdf5/writers/HDF5TimestampDataWriter.java?sort=name&dir=ASC&mode=heatmap HDF5TimestampDataWriter.java line 48 same as above

    in format-img

    https://lgtm.com/projects/g/apache/drill/snapshot/5ba93040059efc36da020f3cfd1ad31489e2e55e/files/contrib/format-image/src/main/java/org/apache/drill/exec/store/image/GenericMetadataDescriptor.java?sort=name&dir=ASC&mode=heatmap GenericMetadataDescriptor.java line 82,83,84 Converted type from Integer to int (didn't seem like there was a need for the Integer class)

    https://lgtm.com/projects/g/apache/drill/snapshot/e2a0925dd18aacf3a5657acd738f89a63a3b8576/files/contrib/format-image/src/main/java/org/apache/drill/exec/store/image/ImageDirectoryProcessor.java?sort=name&dir=ASC&mode=heatmap ImageDirectoryProcessor.java line 124 Suppressed, needs to be initialized with an arbitrary value (so keep it with null)

    in format-maprdb

    https://lgtm.com/projects/g/apache/drill/snapshot/e2a0925dd18aacf3a5657acd738f89a63a3b8576/files/contrib/format-maprdb/src/main/java/org/apache/drill/exec/planner/index/MapRDBStatistics.java?sort=name&dir=ASC&mode=heatmap line 777 Added if (pattern != null) statement to avoid potential NPE error

    line 801 same as above

    line 807 if statement but for escape

    https://lgtm.com/projects/g/apache/drill/snapshot/e2a0925dd18aacf3a5657acd738f89a63a3b8576/files/contrib/format-maprdb/src/main/java/org/apache/drill/exec/store/mapr/db/binary/BinaryTableGroupScan.java?sort=name&dir=ASC&mode=heatmap BinaryTableGroupScan.java line 190 To avoid int overflow, made numColumns a long variable

    https://lgtm.com/projects/g/apache/drill/snapshot/e2a0925dd18aacf3a5657acd738f89a63a3b8576/files/contrib/format-maprdb/src/main/java/org/apache/drill/exec/store/mapr/db/json/JsonTableGroupScan.java?sort=name&dir=ASC&mode=heatmap JsonTableGroupScan.java Line 380 boolean includes scanSpec != null (if scanSpec is null then scanSpec.getSerializedFilter() would also be null)

    line 493 suppressed because its comments

    line 520 The 5th format call is for the estimated size, but there is no fn that gets/determines the estimated size... For now I put in "Can't determine estimated size" left it empty

    line 527, 528, 541, 542, 632 All are comments, suppressed

    https://lgtm.com/projects/g/apache/drill/snapshot/e2a0925dd18aacf3a5657acd738f89a63a3b8576/files/contrib/format-maprdb/src/main/java/org/apache/drill/exec/store/mapr/db/json/MaprDBJsonRecordReader.java?sort=name&dir=ASC&mode=heatmap MaprDBJsonRecordReader.java Line 431 Changed suppression to suggestion that CodeQL ppl suggested document == null || document.asReader() == null ? ...

    https://lgtm.com/projects/g/apache/drill/snapshot/e2a0925dd18aacf3a5657acd738f89a63a3b8576/files/contrib/format-maprdb/src/main/java/org/apache/drill/exec/store/mapr/db/MapRDBFormatPluginConfig.java?sort=name&dir=ASC&mode=heatmap MapRDBFormatPluginConfig.java Line 28 There is no equals function (there is impEquals, but is that the same thing?) but there is an overridden hashcode function. Again, I don't know what it's referring to.

    https://lgtm.com/projects/g/apache/drill/snapshot/e2a0925dd18aacf3a5657acd738f89a63a3b8576/files/contrib/format-maprdb/src/main/java/org/apache/drill/exec/store/mapr/db/MapRDBGroupScan.java?sort=name&dir=ASC&mode=heatmap MapRDBGroupScan.java line 255 Format call had wrong syntax (for format(), use % not {} to take in args)

    line 324 It already logs an error if null so there is no point in catching NPE, suppressed

    In format-xml

    https://lgtm.com/projects/g/apache/drill/snapshot/e2a0925dd18aacf3a5657acd738f89a63a3b8576/files/contrib/format-xml/src/main/java/org/apache/drill/exec/store/xml/XMLBatchReader.java?sort=name&dir=ASC&mode=heatmap XMLBatchReader.java line 94 changed {} to %s

    In storage-druid

    https://lgtm.com/projects/g/apache/drill/snapshot/e2a0925dd18aacf3a5657acd738f89a63a3b8576/files/contrib/storage-druid/src/main/java/org/apache/drill/exec/store/druid/DruidGroupScan.java?sort=name&dir=ASC&mode=heatmap DruidGroupScan.java line 201 Added L for long type specification

    In storage-hbase

    https://lgtm.com/projects/g/apache/drill/snapshot/e2a0925dd18aacf3a5657acd738f89a63a3b8576/files/contrib/storage-hbase/src/main/java/org/apache/drill/exec/store/hbase/HBaseUtils.java?sort=name&dir=ASC&mode=heatmap HBaseUtils.java line 79 Imported java.util.Arrays to directly convert filterBytes to a string so that it is not implicitly converted in the error msg

    https://lgtm.com/projects/g/apache/drill/snapshot/e2a0925dd18aacf3a5657acd738f89a63a3b8576/files/contrib/storage-hbase/src/main/java/org/apache/drill/exec/store/hbase/config/HBasePersistentStore.java?sort=name&dir=ASC&mode=heatmap HBasePersistentStore.java line 201 suppressed b/c its in try catch statement

    In storage-kudu

    https://lgtm.com/projects/g/apache/drill/snapshot/e2a0925dd18aacf3a5657acd738f89a63a3b8576/files/contrib/storage-kudu/src/main/java/org/apache/drill/exec/store/kudu/KuduGroupScan.java?sort=name&dir=ASC&mode=heatmap KuduGroupScan.java line 210 Added L for long type specification

    https://lgtm.com/projects/g/apache/drill/snapshot/e2a0925dd18aacf3a5657acd738f89a63a3b8576/files/contrib/udfs/src/main/java/org/apache/drill/exec/udfs/NetworkFunctions.java?sort=name&dir=ASC&mode=heatmap NetworkFunctions.java Line 434 Multiplied long with assignment so that there is no implicit conversion

    TO-DO (revise) / ASK FOR HELP

    https://lgtm.com/projects/g/apache/drill/snapshot/e2a0925dd18aacf3a5657acd738f89a63a3b8576/files/contrib/format-maprdb/src/main/java/org/apache/drill/exec/store/mapr/db/json/JsonTableRangePartitionFunction.java?sort=name&dir=ASC&mode=heatmap JsonTableRangePartitionFunction.java Line 46 There is an overridden equals function but no hashCode function, so I don't know what it's referring to. Suppressed

    In storage-kafka

    https://lgtm.com/projects/g/apache/drill/snapshot/e2a0925dd18aacf3a5657acd738f89a63a3b8576/files/contrib/storage-kafka/src/main/java/org/apache/drill/exec/store/kafka/KafkaPartitionScanSpec.java?sort=name&dir=ASC&mode=heatmap KafkaPartitionScanSpec.java line 25 There is no hashCode function; if there is no such function, then there's no need for it. Suppressed

    In udfs

    https://lgtm.com/projects/g/apache/drill/snapshot/e2a0925dd18aacf3a5657acd738f89a63a3b8576/files/contrib/udfs/src/main/java/org/apache/drill/exec/udfs/CryptoFunctions.java?sort=name&dir=ASC&mode=heatmap CryptoFunctions.java line 288 Alert recommends using AES, but code already uses AES encryption. Maybe the alternatives are weak? Suppressed for now.

    line 339 Same reason as above

    In storage-splunk

    https://lgtm.com/projects/g/apache/drill/snapshot/e2a0925dd18aacf3a5657acd738f89a63a3b8576/files/contrib/storage-splunk/src/main/java/org/apache/drill/exec/store/splunk/SplunkBatchReader.java?sort=name&dir=ASC&mode=heatmap SplunkBatchReader.java line 232 Very unsure. I am not aware of what the contents are so I can't really analyze this. Suppressed for now.

    https://lgtm.com/projects/g/apache/drill/snapshot/e2a0925dd18aacf3a5657acd738f89a63a3b8576/files/contrib/format-maprdb/src/main/java/org/apache/drill/exec/store/mapr/db/MapRDBTableCache.java?sort=name&dir=ASC&mode=heatmap MapRDBTableCache.java Line 73 If table is null, I believe that it should just throw an NPE regardless (especially if it's required in MapR-DB, although I'm not entirely sure), so it's suppressed

    Perhaps I could use a logger.debug() call or something like that.

    https://lgtm.com/projects/g/apache/drill/snapshot/e2a0925dd18aacf3a5657acd738f89a63a3b8576/files/contrib/format-maprdb/src/main/java/org/apache/drill/exec/store/mapr/streams/StreamsFormatPluginConfig.java?sort=name&dir=ASC&mode=heatmap StreamsFormatPluginConfig.java line 27 Suppressed; hashCode and impEquals are both overridden. From what I understand, the alert is trying to find hashCode and equals functions, but because it can't find equals (which is just impEquals) it assumes it's not overridden

    https://lgtm.com/projects/g/apache/drill/snapshot/e2a0925dd18aacf3a5657acd738f89a63a3b8576/files/contrib/format-maprdb/src/main/java/org/apache/drill/exec/store/mapr/TableFormatPluginConfig.java?sort=name&dir=ASC&mode=heatmap TableFormatPluginConfig.java Line 22 Same reason as above

    https://lgtm.com/projects/g/apache/drill/snapshot/5ba93040059efc36da020f3cfd1ad31489e2e55e/files/common/src/main/java/org/apache/drill/common/exceptions/UserException.java?sort=name&dir=ASC&mode=heatmap UserException.java Line 481 Suppressed first b/c suggested solution was not valid, format fn needs those 2 parameters

    Line 643 It's already in a try statement; alert suppressed

    Documentation

    N/A

    Testing

    Due to how LGTM alerts work, if it isn't on the actual project, we can't see if it works.

    code-cleanup 
    opened by eevanwong 25
  • DRILL-7534: Convert HTTPD Format Plugin to EVF

    DRILL-7534: Convert HTTPD Format Plugin to EVF

    Description

    This PR updates the HTTPD format plugin to use the Enhanced Vector Framework (EVF). There are a few changes a user might notice:

    1. A new configuration option maxErrors has been added which will allow a user to tune how fault tolerant they want Drill to be when reading log files.
    2. Two new implicit fields have been added, _raw and _matched. They are described in the docs below.
    3. The plugin now includes a limit pushdown which significantly improves query times for queries with limits (see the sketch after this list).
    4. The plugin code is now in the contrib folder.
    5. Added a flattenWildcards option which allows the user to flatten nested fields.
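
    As a rough illustration of the limit pushdown, for a query such as the one below only the first matching rows need to be parsed (the path is hypothetical and assumes the file is mapped to the httpd format):

    SELECT * FROM dfs.`/var/log/httpd/access_log` LIMIT 10;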

    This PR also refactors the code and includes some optimizations which should, in theory, result in faster queries.

    In addition, this PR updates the associated User Agent parsing functions with the latest version of the underlying libraries.

    Documentation

    Web Server Log Format Plugin (HTTPD)

    This plugin enables Drill to read and query httpd (Apache Web Server) and nginx logs natively. This plugin uses the work by Niels Basjes which is available here: https://github.com/nielsbasjes/logparser.

    Configuration

    The following fields need to be configured in order for Drill to read web server logs:

    • logFormat: The log format string is the format string found in your web server configuration.
    • timestampFormat: The format of time stamps in your log files.
    • extensions: The file extension of your web server logs.
    • maxErrors: Sets the plugin error tolerance. When set to any value less than 0, Drill will ignore all errors.
    • flattenWildcards: Flattens nested fields
    "httpd" : {
      "type" : "httpd",
      "logFormat" : "%h %l %u %t \"%r\" %s %b \"%{Referer}i\" \"%{User-agent}i\"",
      "timestampFormat" : "dd/MMM/yyyy:HH:mm:ss ZZ",
      "maxErrors": 0,
      "flattenWildcards": false
    }
    

    Implicit Columns

    Data queried by this plugin will return two implicit columns:

    • _raw: This returns the raw, unparsed log line
    • _matched: Returns true or false depending on whether the line matched the config string.

    Thus, if you wanted to see which lines in your log file were not matching the config, you could use the following query:

    SELECT _raw
    FROM <data>
    WHERE _matched = false
    

    Testing

    Added additional unit tests for this plugin. Ran all unit tests for the parse_user_agent() UDF as well.

    refactoring 
    opened by cgivre 24
  • DRILL-8037: Add V2 JSON Format Plugin based on EVF

    DRILL-8037: Add V2 JSON Format Plugin based on EVF

    Description

    This adds a new V2 beta JSON format plugin based on the Enhanced Vector Framework (EVF). It is a follow-up to DRILL-6953 (which was closed with the decision to merge it in small pieces), so it is based on #1913 and the rev2 work.

    Documentation

    "V1" is the legacy reader; "V2" is the new beta JSON format plugin based on the result set loader. The V2 version is a bit more robust and supports the row set framework. However, V1 supports unions and reading corrupted JSON files.

    The new "V2" JSON scan is controlled by a new option: store.json.enable_v2_reader, which is true by default in this PR.

    Adds a "projection type" to the column writer so that the JSON parser can receive a "hint" as to the expected type. The hint comes from the form of the projected column: a[0], a.b, or just a. It therefore supports schema provisioning. Example:

    ALTER SESSION SET `store.json.enable_v2_reader` = true;
    apache drill (dfs.tmp)> select * from test;
    +---------------+-------+---+
    |       a       |   e   | f |
    +---------------+-------+---+
    | {"b":1,"c":1} | false | 1 |
    | {"b":1,"c":1} | null  | 2 |
    | {"b":1,"c":1} | true  | 3 |
    +---------------+-------+---+
    apache drill (dfs.tmp)> create or replace schema (`e` BOOLEAN default 'false', `new` VARCHAR not null default 'schema evolution') for table test;
    apache drill (dfs.tmp)> select * from test;
    +-------+------------------+---------------+---+
    |   e   |       new        |       a       | f |
    +-------+------------------+---------------+---+
    | false | schema evolution | {"b":1,"c":1} | 1 |
    | null  | schema evolution | {"b":1,"c":1} | 2 |
    | true  | schema evolution | {"b":1,"c":1} | 3 |
    +-------+------------------+---------------+---+
    

    Testing

    A lot of existing test cases run for both readers. This is needed as long as V1 is still present in the Drill code.

    json 
    opened by vdiravka 23
  • DRILL-5033: Query on JSON That Has Null as Value For Each Key

    DRILL-5033: Query on JSON That Has Null as Value For Each Key

    Description

    Drill returns the same result with or without store.json.all_text_mode=true. Note that each key in the JSON has null as its value.

    [root@cent01 null_eq_joins]# cat right_all_nulls.json

    {
    "intKey" : null,
    "bgintKey": null,
    "strKey": null,
    "boolKey": null,
    "fltKey": null,
    "dblKey": null,
    "timKey": null,
    "dtKey": null,
    "tmstmpKey": null,
    "intrvldyKey": null,
    "intrvlyrKey": null
    }
    

    Querying the above JSON file returns null as the query result. We should see each of the keys in the JSON as a column in the query result, and in each column the value should be null. The current behavior does not look right.

    0: jdbc:drill:schema=dfs.tmp> select * from `right_all_nulls.json`;
    +-------+
    |   *   |
    +-------+
    | null  |
    +-------+
    1 row selected (0.313 seconds)
    

    Documentation

    (Please describe user-visible changes similar to what should appear in the Drill documentation.)

    Testing

    (Please describe how this PR has been tested.)

    opened by unical1988 5
  • DRILL-8376: Add Distribution UDFs

    DRILL-8376: Add Distribution UDFs

    Description

    This PR adds several new UDFs to help with statistical analysis. The first is width_bucket, which mirrors the functionality of the PostgreSQL function of the same name (https://www.oreilly.com/library/view/sql-in-a/9780596155322/re91.html). This function is useful for building histograms of data.

    This PR also adds the kendall_correlation, regr_slope and regr_intercept functions for computing the correlation coefficient and least-squares regression parameters of two columns.

    Documentation

    Drill has several functions for correlations and understanding the distribution of your data.

    The functions are as follows; a usage sketch appears after the list:

    • width_bucket(value, min, max, buckets): Useful for crafting histograms and understanding distributions of continuous variables.
    • kendall_correlation(col1, col2): Calculates the kendall correlation coefficient of two columns within a dataset.
    • regr_slope(x,y): Determines the slope of the least-squares-fit linear equation
    • regr_intercept(x,y): Computes the y-intercept of the least-squares-fit linear equation
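
    A minimal usage sketch against a hypothetical table with numeric columns x and y (the table name and columns are illustrative; argument order follows the signatures above):

    -- Bucket x into 10 equal-width bins between 0 and 100 and count rows per bin
    SELECT width_bucket(x, 0, 100, 10) AS bucket, COUNT(*) AS cnt
    FROM dfs.tmp.`measurements`
    GROUP BY width_bucket(x, 0, 100, 10);

    -- Correlation and least-squares fit of the two columns
    SELECT kendall_correlation(x, y) AS tau,
           regr_slope(x, y) AS slope,
           regr_intercept(x, y) AS intercept
    FROM dfs.tmp.`measurements`;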

    Testing

    Added unit tests.

    enhancement doc-impacting minor-update udf 
    opened by cgivre 0
  • DRILL-8372: Unfreed buffers when running a LIMIT 0 query over delimited text

    DRILL-8372: Unfreed buffers when running a LIMIT 0 query over delimited text

    Description

    With the following data layout

    /tmp/foo/bar:
    large_csv.csvh
    /tmp/foo/boo:
    large_csv.csvh
    

    a LIMIT 0 query over it results in unfreed buffer errors as shown below.

    apache drill (dfs.tmp)> select * from `foo` limit 0;
    Error: SYSTEM ERROR: IllegalStateException: Allocator[op:0:0:4:EasySubScan] closed with outstanding buffers allocated (3).
    Allocator(op:0:0:4:EasySubScan) 1000000/299008/3182592/10000000000 (res/actual/peak/limit)
      child allocators: 0
      ledgers: 3
        ledger[113] allocator: op:0:0:4:EasySubScan), isOwning: true, size: 262144, references: 1, life: 277785186322881..0, allocatorManager: [109, life: 277785186258906..0] holds 1 buffers.
            DrillBuf[142], udle: [110 0..262144]
        ledger[114] allocator: op:0:0:4:EasySubScan), isOwning: true, size: 32768, references: 1, life: 277785186463824..0, allocatorManager: [110, life: 277785186414654..0] holds 1 buffers.
            DrillBuf[143], udle: [111 0..32768]
        ledger[112] allocator: op:0:0:4:EasySubScan), isOwning: true, size: 4096, references: 1, life: 277785186046095..0, allocatorManager: [108, life: 277785185921147..0] holds 1 buffers.
            DrillBuf[141], udle: [109 0..4096]
      reservations: 0 
    

    Documentation

    N/A

    Testing

    TODO

    bug 
    opened by jnturton 0
  • Failed to execute an insert statement across the database

    Failed to execute an insert statement across the database.

    Steps to reproduce the behavior:

    1. Prepare the MySQL and PostgreSQL table structures and data.
       MySQL: create table t1(c1 int, c2 int); insert into t1 values (1,1), (2,2);
       PostgreSQL: create table t1(c1 int, c2 int);

    2. Create the MySQL and PostgreSQL plugins under the Storage tab via the http://localhost:8047/storage page.

    3. Execute the following SQL statement using sqlline.
       Jupiter> insert into pg.public.t1 select c1, c2 from mysql.test.t1;
       Error: DATA_READ ERROR: The JDBC storage plugin failed while trying setup the SQL query.
       Sql: INSERT INTO public.t1 (c1, c2) (SELECT * FROM test.t1)
       Fragment: 0:0
       [Error Id: a5b3ee38-38f5-4945-afda-8a7d4746df4c on DESKTOP-PHHB7LC:31010] (state=,code=0)

    4. The execution plan in the log is as follows: 2022-12-19 14:36:32,447 [1c5ffc63-764a-c5ab-4d06-90e658b8e132:foreman] DEBUG o.a.d.e.p.s.h.DefaultSqlHandler - Drill Physical: 00-00    Screen : rowType = RecordType(BIGINT ROWCOUNT): rowcount = 1.0E9, cumulative cost = {1.1E9 rows, 1.1E9 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 614 00-01      Jdbc(sql=[INSERT INTO public.t1 (c1, c2)  (SELECT *  FROM test.t1) ]) : rowType = RecordType(BIGINT ROWCOUNT): rowcount = 1.0E9, cumulative cost = {1.0E9 rows, 1.0E9 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 604

    opened by weijunlu 8
  • select * from hive reports a refcnt = 0 error

    select * from hive reports a refcnt = 0 error

    Create a table in the hive schema with Drill as follows:

    create table hive.t (a string, b string, c string). The inserted value lengths should be longer than 256 characters, e.g.:

    insert into hive.t values("aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa","bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb","ccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccc");

    Then run select * from hive.t.

    The error is reported as follows:

    INTERNAL_ERROR ERROR: refcnt: 0


    opened by yaozhu 4
  • Multi-data-source join query fails with missing conversions when the join condition is in the WHERE clause

    Multi-data-source join query fails with missing conversions when the join condition is in the WHERE clause

    1. The version information is as follows:
        apache drill> select commit_message, commit_time from sys.version;
        +----------------------------------------------------------------------------------+---------------------------+
        |                                  commit_message                                  |        commit_time        |
        +----------------------------------------------------------------------------------+---------------------------+
        | DRILL-8357: Add new config options to the Splunk storage plugin (extra docs) (#2706) | 15.11.2022 @ 20:34:55 CST |
        +----------------------------------------------------------------------------------+---------------------------+
        1 row selected (0.133 seconds)
    
    2. Create tables and insert data in the PostgreSQL and MySQL databases
        create table t1(col1 int);
        insert into t1 values(1), (2);
    
    3. Create the MySQL and PostgreSQL plugins under the Storage tab via the http://localhost:8047/storage page

    4. The SQL statement is executed as follows

      apache drill> select * from pgsql.t1 as pg ,my.test.t1 as my where pg.col1 = my.col1 and  pg.col1 = 1 and my.col1 =1 ;
      Error: UNSUPPORTED_OPERATION ERROR: This query cannot be planned possibly due to either a cartesian join or an inequality join. 
      If a cartesian or inequality join is used intentionally, set the option 'planner.enable_nljoin_for_scalar_only' to false and try again.
      
      
      [Error Id: 91583f69-5aa6-44d5-a29c-63e2358932ea ] (state=,code=0)
      apache drill> set planner.enable_nljoin_for_scalar_only=false;
      +------+------------------------------------------------+
      |  ok  |                    summary                     |
      +------+------------------------------------------------+
      | true | planner.enable_nljoin_for_scalar_only updated. |
      +------+------------------------------------------------+
      1 row selected (0.212 seconds)
      apache drill> select * from pgsql.t1 as pg ,my.test.t1 as my where pg.col1 = my.col1 and  pg.col1 = 1 and my.col1 =1 ;
      Error: UNSUPPORTED_OPERATION ERROR: This query cannot be planned possibly due to either a cartesian join or an inequality join. 
      If a cartesian or inequality join is used intentionally, set the option 'planner.enable_nljoin_for_scalar_only' to false and try again.
      
      
      [Error Id: 37d0dbca-40d1-4de5-9443-b48ce3a172c0 ] (state=,code=0)
      apache drill> 
    
    5. Below is the drillbit.log info
      2022-11-18 08:08:01,169 [1c88c29e-3040-efc1-46a6-33152cfbab32:foreman] INFO  o.a.drill.exec.work.foreman.Foreman - Query text for query with id 1c88c29e-3040-efc1-46a6-33152cfbab32 issued by test: select * from pgsql.t1 as pg ,my.test.t1 as my where pg.col1 = my.col1 and  pg.col1 = 1 and my.col1 =1 
      2022-11-18 08:08:01,258 [1c88c29e-3040-efc1-46a6-33152cfbab32:foreman] ERROR o.a.d.e.p.s.h.DefaultSqlHandler - There are not enough rules to produce a node with desired properties: convention=PHYSICAL, DrillDistributionTraitDef=SINGLETON([]), sort=[].
      Missing conversions are JdbcFilter[convention: JDBC.pgsql -> JDBC.my], JdbcFilter[convention: JDBC.my -> JDBC.pgsql]
      There are 2 empty subsets:
      Empty subset 0: rel#15241:RelSubset#12.JDBC.my.ANY([]).[], the relevant part of the original plan is as follows
      15211:JdbcFilter(condition=[=($0, 1)])
        14914:JdbcTableScan(subset=[rel#15210:RelSubset#11.JDBC.pgsql.ANY([]).[]], table=[[pgsql, t1]])
      
      Empty subset 1: rel#15247:RelSubset#15.JDBC.pgsql.ANY([]).[], the relevant part of the original plan is as follows
      15216:JdbcFilter(condition=[=($0, 1)])
        14915:JdbcTableScan(subset=[rel#15215:RelSubset#14.JDBC.my.ANY([]).[]], table=[[my, test, t1]])
      
      Root: rel#15224:RelSubset#18.PHYSICAL.SINGLETON([]).[]
      Original rel:
      LogicalProject(subset=[rel#14956:RelSubset#4.LOGICAL.ANY([]).[]], col1=[$0], col10=[$1]): rowcount = 3.375E15, cumulative cost = {3.375E15 rows, 6.75E15 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 14954
        LogicalFilter(subset=[rel#14953:RelSubset#3.NONE.ANY([]).[]], condition=[AND(=($0, $1), =($0, 1), =($1, 1))]): rowcount = 3.375E15, cumulative cost = {3.375E15 rows, 1.0E18 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 14952
          LogicalJoin(subset=[rel#14951:RelSubset#2.NONE.ANY([]).[]], condition=[true], joinType=[inner]): rowcount = 1.0E18, cumulative cost = {1.0E18 rows, 0.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 14950
            JdbcTableScan(subset=[rel#14948:RelSubset#0.JDBC.pgsql.ANY([]).[]], table=[[pgsql, t1]]): rowcount = 1.0E9, cumulative cost = {100.0 rows, 101.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 14914
            JdbcTableScan(subset=[rel#14949:RelSubset#1.JDBC.my.ANY([]).[]], table=[[my, test, t1]]): rowcount = 1.0E9, cumulative cost = {100.0 rows, 101.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 14915
      
      Sets:
      Set#11, type: RecordType(INTEGER col1)
    	  rel#15210:RelSubset#11.JDBC.pgsql.ANY([]).[], best=rel#14914
    		  rel#14914:JdbcTableScan.JDBC.pgsql.ANY([]).[](table=[pgsql, t1]), rowcount=1.0E9, cumulative cost={100.0 rows, 101.0 cpu, 0.0 io, 0.0 network, 0.0 memory}
    	  rel#15227:RelSubset#11.LOGICAL.ANY([]).[], best=rel#15226
    		  rel#15226:VertexDrel.LOGICAL.ANY([]).[](input=RelSubset#15210), rowcount=1.0E9, cumulative cost={1.000001E8 rows, 1.00000101E8 cpu, 0.0 io, 0.0 network, 0.0 memory}
    	  rel#15254:RelSubset#11.PHYSICAL.SINGLETON([]).[], best=rel#15253
    		  rel#15253:JdbcIntermediatePrel.PHYSICAL.SINGLETON([]).[](input=RelSubset#15210), rowcount=1.0E9, cumulative cost={1.0000001E9 rows, 1.000000101E9 cpu, 0.0 io, 0.0 network, 0.0 memory}
      Set#12, type: RecordType(INTEGER col1)
    	  rel#15212:RelSubset#12.JDBC.pgsql.ANY([]).[], best=rel#15211
    		  rel#15211:JdbcFilter.JDBC.pgsql.ANY([]).[](input=RelSubset#15210,condition==($0, 1)), rowcount=1.5E8, cumulative cost={1.500001E8 rows, 1.000000101E9 cpu, 0.0 io, 0.0 network, 0.0 memory}
    		  rel#15246:AbstractConverter.JDBC.pgsql.ANY([]).[](input=RelSubset#15232,convention=JDBC.pgsql,DrillDistributionTraitDef=ANY([]),sort=[]), rowcount=1.5E8, cumulative cost={inf}
    	  rel#15230:RelSubset#12.LOGICAL.ANY([]).[], best=rel#15213
    		  rel#15213:VertexDrel.LOGICAL.ANY([]).[](input=RelSubset#15212), rowcount=1.5E8, cumulative cost={1.650001E8 rows, 1.015000101E9 cpu, 0.0 io, 0.0 network, 0.0 memory}
    		  rel#15258:VertexDrel.LOGICAL.ANY([]).[](input=RelSubset#15241), rowcount=1.5E8, cumulative cost={inf}
    	  rel#15232:RelSubset#12.PHYSICAL.SINGLETON([]).[], best=rel#15231
    		  rel#15231:JdbcIntermediatePrel.PHYSICAL.SINGLETON([]).[](input=RelSubset#15212), rowcount=1.5E8, cumulative cost={3.000001E8 rows, 1.150000101E9 cpu, 0.0 io, 0.0 network, 0.0 memory}
    		  rel#15271:JdbcIntermediatePrel.PHYSICAL.SINGLETON([]).[](input=RelSubset#15241), rowcount=1.5E8, cumulative cost={inf}
    	  rel#15241:RelSubset#12.JDBC.my.ANY([]).[], best=null
    		  rel#15242:AbstractConverter.JDBC.my.ANY([]).[](input=RelSubset#15232,convention=JDBC.my,DrillDistributionTraitDef=ANY([]),sort=[]), rowcount=1.5E8, cumulative cost={inf}
      Set#14, type: RecordType(INTEGER col1)
    	  rel#15215:RelSubset#14.JDBC.my.ANY([]).[], best=rel#14915
    		  rel#14915:JdbcTableScan.JDBC.my.ANY([]).[](table=[my, test, t1]), rowcount=1.0E9, cumulative cost={100.0 rows, 101.0 cpu, 0.0 io, 0.0 network, 0.0 memory}
    	  rel#15235:RelSubset#14.LOGICAL.ANY([]).[], best=rel#15234
    		  rel#15234:VertexDrel.LOGICAL.ANY([]).[](input=RelSubset#15215), rowcount=1.0E9, cumulative cost={1.000001E8 rows, 1.00000101E8 cpu, 0.0 io, 0.0 network, 0.0 memory}
    	  rel#15256:RelSubset#14.PHYSICAL.SINGLETON([]).[], best=rel#15255
    		  rel#15255:JdbcIntermediatePrel.PHYSICAL.SINGLETON([]).[](input=RelSubset#15215), rowcount=1.0E9, cumulative cost={1.0000001E9 rows, 1.000000101E9 cpu, 0.0 io, 0.0 network, 0.0 memory}
      Set#15, type: RecordType(INTEGER col1)
    	  rel#15217:RelSubset#15.JDBC.my.ANY([]).[], best=rel#15216
    		  rel#15216:JdbcFilter.JDBC.my.ANY([]).[](input=RelSubset#15215,condition==($0, 1)), rowcount=1.5E8, cumulative cost={1.500001E8 rows, 1.000000101E9 cpu, 0.0 io, 0.0 network, 0.0 memory}
    		  rel#15243:AbstractConverter.JDBC.my.ANY([]).[](input=RelSubset#15240,convention=JDBC.my,DrillDistributionTraitDef=ANY([]),sort=[]), rowcount=1.5E8, cumulative cost={inf}
    	  rel#15238:RelSubset#15.LOGICAL.ANY([]).[], best=rel#15218
    		  rel#15218:VertexDrel.LOGICAL.ANY([]).[](input=RelSubset#15217), rowcount=1.5E8, cumulative cost={1.650001E8 rows, 1.015000101E9 cpu, 0.0 io, 0.0 network, 0.0 memory}
    		  rel#15268:VertexDrel.LOGICAL.ANY([]).[](input=RelSubset#15247), rowcount=1.5E8, cumulative cost={inf}
    	  rel#15240:RelSubset#15.PHYSICAL.SINGLETON([]).[], best=rel#15239
    		  rel#15239:JdbcIntermediatePrel.PHYSICAL.SINGLETON([]).[](input=RelSubset#15217), rowcount=1.5E8, cumulative cost={3.000001E8 rows, 1.150000101E9 cpu, 0.0 io, 0.0 network, 0.0 memory}
    		  rel#15274:JdbcIntermediatePrel.PHYSICAL.SINGLETON([]).[](input=RelSubset#15247), rowcount=1.5E8, cumulative cost={inf}
    	  rel#15247:RelSubset#15.JDBC.pgsql.ANY([]).[], best=null
    		  rel#15248:AbstractConverter.JDBC.pgsql.ANY([]).[](input=RelSubset#15240,convention=JDBC.pgsql,DrillDistributionTraitDef=ANY([]),sort=[]), rowcount=1.5E8, cumulative cost={inf}
      Set#17, type: RecordType(INTEGER col1, INTEGER col10)
    	  rel#15221:RelSubset#17.LOGICAL.ANY([]).[], best=rel#15220
    		  rel#15220:DrillJoinRel.LOGICAL.ANY([]).[](left=RelSubset#15230,right=RelSubset#15238,condition=true,joinType=inner), rowcount=2.25E16, cumulative cost={6.300002E8 rows, 2.030000202E9 cpu, 0.0 io, 0.0 network, 1.32E9 memory}
    		  rel#15263:VertexDrel.LOGICAL.ANY([]).[](input=RelSubset#15245), rowcount=2.25E16, cumulative cost={inf}
    		  rel#15270:VertexDrel.LOGICAL.ANY([]).[](input=RelSubset#15250), rowcount=2.25E16, cumulative cost={inf}
    	  rel#15245:RelSubset#17.JDBC.my.ANY([]).[], best=null
    		  rel#15244:JdbcJoin.JDBC.my.ANY([]).[](left=RelSubset#15241,right=RelSubset#15217,condition=true,joinType=inner), rowcount=1.5E8, cumulative cost={inf}
    	  rel#15250:RelSubset#17.JDBC.pgsql.ANY([]).[], best=null
    		  rel#15249:JdbcJoin.JDBC.pgsql.ANY([]).[](left=RelSubset#15212,right=RelSubset#15247,condition=true,joinType=inner), rowcount=1.5E8, cumulative cost={inf}
    	  rel#15251:RelSubset#17.PHYSICAL.SINGLETON([]).[], best=null
    		  rel#15273:JdbcIntermediatePrel.PHYSICAL.SINGLETON([]).[](input=RelSubset#15245), rowcount=2.25E16, cumulative cost={inf}
    		  rel#15276:JdbcIntermediatePrel.PHYSICAL.SINGLETON([]).[](input=RelSubset#15250), rowcount=2.25E16, cumulative cost={inf}
      Set#18, type: RecordType(INTEGER col1, INTEGER col10)
    	  rel#15223:RelSubset#18.LOGICAL.ANY([]).[], best=rel#15222
    		  rel#15222:DrillScreenRel.LOGICAL.ANY([]).[](input=RelSubset#15221), rowcount=2.25E16, cumulative cost={2.2500006300002E15 rows, 2.250002030000202E15 cpu, 0.0 io, 0.0 network, 1.32E9 memory}
    	  rel#15224:RelSubset#18.PHYSICAL.SINGLETON([]).[], best=null
    		  rel#15252:ScreenPrel.PHYSICAL.SINGLETON([]).[](input=RelSubset#15251), rowcount=2.25E16, cumulative cost={inf}
    
    6. When I modify the query to use an inner/right/full join, the query runs normally
      apache drill> select * from pgsql.t1 as pg inner join my.wubq.t1 as my on pg.col1 = my.col1 where pg.col1 = 1 and my.col1 =1 ;
      +------+-------+
      | col1 | col10 |
      +------+-------+
      | 1    | 1     |
      +------+-------+
      1 row selected (0.644 seconds)
      apache drill> 
    
    opened by Javelin2007 2
Releases(drill-1.20.2)
  • drill-1.20.2(Aug 3, 2022)

  • drill-1.20.1(Aug 3, 2022)

  • drill-1.20.0(Feb 25, 2022)

    Apache Drill 1.20's highlights are: the Apache Phoenix storage plugin with impersonation support, the Apache Iceberg format plugin, expanded push down support for MongoDB, persistent table and storage aliases, a release for Hadoop 2 environments, write support in the JDBC storage plugin, SAS and PDF format plugins, pagination and OAuth support in the HTTP storage plugin, HashiCorp Vault authentication and credential storage providers, and read/write support for all compression codecs and both format versions of Parquet.

    Source code(tar.gz)
    Source code(zip)
  • drill-1.19.0(Jun 15, 2021)

    Apache Drill 1.19's highlights are:

    Source code(tar.gz)
    Source code(zip)
  • drill-1.18.0(Sep 5, 2020)

    Apache Drill 1.18's highlights are: Drill Metadata management "Drill Metastore", Format Plugin for HDF5, Support for DICT type in RowSet Framework, Storage Plugin for Generic HTTP REST API, Dynamic credit based flow control, Support for injecting BufferManager into UDF, Drill RDBMS Metastore

    Source code(tar.gz)
    Source code(zip)
  • drill-1.17.0(Dec 26, 2019)

    Apache Drill 1.17's highlights are: Hive complex types support, Upgrade to HADOOP-3.2, Schema Provision using File / Table Function, Drill Metastore support, Excel and ESRI Shapefile (shp) format plugins support, Parquet runtime row group pruning, empty Parquet files support, User-Agent UDFs, canonical Map<K,V> support, and more.

    Source code(tar.gz)
    Source code(zip)
  • drill-1.15.0(Dec 31, 2018)

    Apache Drill 1.15's highlights are: SQLLine upgrade, index support, ability to secure znodes with custom ACLs, INFORMATION_SCHEMA files table, system functions table, and more.

    Source code(tar.gz)
    Source code(zip)
  • drill-1.14.0(Dec 31, 2018)

    Apache Drill 1.14's highlights are: Ability to run Drill in a Docker container, ability to export and save storage plugin configurations, a storage plugin configuration file to manage storage plugin configurations, an image metadata format plugin, option to set Hive properties at the session level, and more.

    Source code(tar.gz)
    Source code(zip)
  • drill-1.13.0(Dec 31, 2018)

    Apache Drill 1.13's highlights are: YARN support, support for HTTP Kerberos authentication using SPNEGO, SQL syntax highlighting of queries, and user and distribution specific configuration checks during startup.

    Source code(tar.gz)
    Source code(zip)
  • drill-1.12.0(Dec 31, 2018)

    Apache Drill 1.12's highlights are: Kafka and OpenTSDB storage plugins, SSL and network encryption support, queue-based memory assignment for buffering operators, networking functions, and the ability to prevent users from accessing paths outside the root of a workspace.

    Source code(tar.gz)
    Source code(zip)
  • drill-1.11.0(Dec 31, 2018)

    Apache Drill 1.11's highlights are: Cryptography-related functions, spill to disk for the hash aggregate operator, Format plugin support for PCAP files, ability to change the HDFS block Size for Parquet files, ability to store query profiles in memory, configurable CTAS directory and file permissions option, support for network encryption, relative paths stored in the metadata file, and support for ANSI_QUOTES.

    Source code(tar.gz)
    Source code(zip)
  • drill-1.10.0(Dec 31, 2018)

    Apache Drill 1.10's highlights are: CTTAS, improved fault tolerance, Drill version and statistics in Web Console, implicit interpretation of INT96, and Kerberos authentication.

    Source code(tar.gz)
    Source code(zip)
  • drill-1.9.0(Dec 31, 2018)

  • drill-1.8.0(Dec 31, 2018)

    Apache Drill 1.8's highlights are: metadata cache pruning, IF EXISTS support, DESCRIBE SCHEMA command, multi-byte delimiter support, and new parameters for filter selectivity estimates.

    Source code(tar.gz)
    Source code(zip)
  • drill-1.7.0(Dec 31, 2018)

  • drill-1.6.0(Dec 31, 2018)

  • drill-1.5.0(Dec 31, 2018)

    Apache Drill 1.5's highlights are: Authentication and security for the Web interface and REST API, experimental query support for Apache Kudu (incubating), an improved memory allocator, and configurable caching for Hive metadata.

    Source code(tar.gz)
    Source code(zip)
  • drill-1.4.0(Dec 31, 2018)

    Apache Drill 1.4's highlights are: "select with options" queries that can change storage plugin settings, improved behavior when parsing CSV file header names, a variable to set non-pretty (e.g. compact) printing of JSON, and better drillbit.log files that include query text.

    Source code(tar.gz)
    Source code(zip)
  • drill-1.3.0(Dec 31, 2018)

    Drill 1.3 has been released. Users can now query Hadoop sequence files and text delimited files with headers. In addition, this release provides significant performance and usability improvements for working with Amazon S3. Drill 1.3 also adds support for heterogeneous types, enabling queries on datasets with columns that have more than one data type (commonly seen in JSON files, MongoDB collections, etc.).

    Source code(tar.gz)
    Source code(zip)
  • drill-1.2.0(Dec 31, 2018)

    Drill 1.2 has been released. This release includes a new JDBC storage plugin for querying relational databases, as well as new window functions such as NTILE, FIRST_VALUE, LAST_VALUE, LEAD and LAG. This release addresses 210 JIRAs, including performance, stability and documentation enhancements.

    Source code(tar.gz)
    Source code(zip)
  • drill-1.1.0(Dec 31, 2018)

    Drill 1.1 has been released, providing window functions, automatic partitioning, improved MongoDB support and more. This release addresses 162 JIRAs.

    Source code(tar.gz)
    Source code(zip)
  • drill-1.0.0(Dec 31, 2018)

    Drill 1.0 has been released, representing a major milestone for the Drill community. Drill is now production-ready, making it easier than ever to explore and analyze data in non-relational datastores.

    Source code(tar.gz)
    Source code(zip)
  • drill-0.9.0(Dec 31, 2018)

  • drill-0.8.0(Dec 31, 2018)

  • drill-0.7.0(Dec 31, 2018)

  • drill-0.4.0(Dec 31, 2018)

  • drill-0.1.0(Dec 31, 2018)

Owner
The Apache Software Foundation