Apache Calcite

Last update: Dec 31, 2022

Overview

Apache Calcite

Apache Calcite is a dynamic data management framework.

It contains many of the pieces that comprise a typical database management system but omits the storage primitives. It provides an industry standard SQL parser and validator, a customisable optimizer with pluggable rules and cost functions, logical and physical algebraic operators, various transformation algorithms from SQL to algebra (and the opposite), and many adapters for executing SQL queries over Cassandra, Druid, Elasticsearch, MongoDB, Kafka, and others, with minimal configuration.

For more details, see the home page.

Comments

[CALCITE-3873] Use global caching for ReflectiveVisitDispatcher implementation

While examining a simple query with a flame graph (see the issue), I noticed a large number of calls that use reflection, which is not performant. Although the total overhead is less than 1%, I spent some time trying to improve it. Most of the invocations are rooted in ReflectiveVisitDispatcher: the current implementation creates a new instance whenever one is needed and looks up methods by reflection per instance. Because the number of methods is small (68 possible places), caching the methods globally lets ReflectiveVisitDispatcher instances in different threads reuse the lookups. This fundamental change will benefit other similar invocations as well.
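The caching idea described above can be sketched roughly as follows. This is a minimal Python illustration of a process-wide method cache shared by all dispatcher instances, not Calcite's Java implementation; the names `Dispatcher`, `STATS`, `Node`, `Leaf`, and `Printer` are hypothetical.

```python
import threading

# Hypothetical global cache shared by every dispatcher instance:
# (visitor class, visitee class, method name) -> resolved function or None.
_METHOD_CACHE = {}
_CACHE_LOCK = threading.Lock()
STATS = {"lookups": 0}  # counts slow-path lookups, for illustration only


def _lookup(visitor_cls, visitee_cls, name):
    """Slow path: walk the visitee's class hierarchy for a matching method."""
    STATS["lookups"] += 1
    for cls in visitee_cls.__mro__:
        method = getattr(visitor_cls, f"{name}_{cls.__name__}", None)
        if method is not None:
            return method
    return None


class Dispatcher:
    """Calls visitor.<name>_<VisiteeClass>(visitee) using the shared cache."""

    def invoke(self, visitor, visitee, name="visit"):
        key = (type(visitor), type(visitee), name)
        with _CACHE_LOCK:
            if key not in _METHOD_CACHE:
                _METHOD_CACHE[key] = _lookup(type(visitor), type(visitee), name)
            method = _METHOD_CACHE[key]
        if method is None:
            raise TypeError(f"no {name} method for {type(visitee).__name__}")
        return method(visitor, visitee)


class Node:
    pass


class Leaf(Node):
    pass


class Printer:
    def visit_Node(self, node):
        return "node:" + type(node).__name__
```

Two separate `Dispatcher` instances that dispatch on the same visitor and visitee classes trigger only one slow-path lookup, which is the effect the PR is after.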

opened by neoremind 27
[CALCITE-3737][CALCITE-3780] Implement HOP and SESSION table functions
See:
https://issues.apache.org/jira/browse/CALCITE-3737
https://issues.apache.org/jira/browse/CALCITE-3780

Some highlights on this PR:

support HOP as a table function.

support SESSION as a table function.

rename "table-valued function" to "table function" to improve naming.
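As a rough illustration of the semantics these two table functions are usually given, the sketch below assigns integer timestamps to hopping and session windows. It is plain Python under stated assumptions, not the PR's code: the function names are hypothetical, windows are half-open `[start, end)`, hop windows are aligned to multiples of the slide, and a session closes once no event arrives within `gap` of the previous one.

```python
def hop_windows(ts, slide, size):
    """Return every [start, end) window of length `size`, aligned to
    multiples of `slide`, that contains the timestamp `ts`."""
    start = ts - ts % slide  # latest aligned start that is <= ts
    windows = []
    while start > ts - size:
        windows.append((start, start + size))
        start -= slide
    return sorted(windows)


def session_windows(timestamps, gap):
    """Merge timestamps into sessions; a timestamp joins the current
    session if it falls before that session's end (last event + gap)."""
    sessions = []
    for ts in sorted(timestamps):
        if sessions and ts < sessions[-1][1]:
            sessions[-1] = (sessions[-1][0], ts + gap)
        else:
            sessions.append((ts, ts + gap))
    return sessions
```

For example, with a slide of 5 and a size of 10, timestamp 7 falls into two overlapping hop windows, while a gap of 5 splits the events 1, 3, 10, 12, 30 into three sessions.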

LGTM-will-merge-soon needs-a-final-review
opened by amaliujia 27
[CALCITE-3272] Support TUMBLE as Table Valued Function including an enumerable implementation, stream.iq and DESCRIPTOR
At a high level, this PR adds support for the following:

SELECT * FROM TABLE(Tumble( TABLE ORDERS , DESCRIPTOR(ROWTIME) , INTERVAL '1' MINUTES))

This PR adds TUMBLE as a table-valued function, and also adds stream.iq along with an Enumerable implementation. This is a big PR that is also related to the following JIRAs:

https://jira.apache.org/jira/browse/CALCITE-3340
https://jira.apache.org/jira/browse/CALCITE-3501
https://jira.apache.org/jira/browse/CALCITE-3499
https://jira.apache.org/jira/browse/CALCITE-3418
https://jira.apache.org/jira/browse/CALCITE-3339

Note that DESCRIPTOR support is also included in this PR.
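To make the TUMBLE query above concrete, here is a minimal Python sketch of tumbling-window assignment. It is an illustration only, not the Enumerable implementation from the PR: the `tumble` helper, the epoch alignment, and the output column names `window_start` and `window_end` are assumptions.

```python
from datetime import datetime, timedelta

EPOCH = datetime(1970, 1, 1)  # assumed alignment origin for windows


def tumble(rows, time_key, size):
    """Yield each row extended with the fixed-size, non-overlapping
    window [window_start, window_end) that contains row[time_key]."""
    for row in rows:
        offset = (row[time_key] - EPOCH) % size  # time elapsed inside the window
        start = row[time_key] - offset
        yield {**row, "window_start": start, "window_end": start + size}


orders = [
    {"id": 1, "rowtime": datetime(2020, 1, 1, 10, 15, 30)},
    {"id": 2, "rowtime": datetime(2020, 1, 1, 10, 15, 59)},
    {"id": 3, "rowtime": datetime(2020, 1, 1, 10, 16, 0)},
]
windowed = list(tumble(orders, "rowtime", timedelta(minutes=1)))
```

With a one-minute size, the first two orders share the window starting at 10:15:00 and the third starts a new window at 10:16:00.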
needs-a-final-review
opened by amaliujia 27
[CALCITE-2913] Adapter for Apache Kafka

Add an adapter to expose Kafka topics as STREAM tables.

KafkaTableFactory is used here so end users need to specify table-topic mapping one-by-one.

JIRA: https://issues.apache.org/jira/browse/CALCITE-2913

CC: @danny0405
LGTM-will-merge-soon

opened by mingmxu 26
[CALCITE-2808] Add the JSON_LENGTH function
JSON_LENGTH(json_doc[, path])

Returns the length of a JSON document, or, if a path argument is given, the length of the value within the document identified by the path. Returns NULL if any argument is NULL or the path argument does not identify a value in the document. An error occurs if the json_doc argument is not a valid JSON document or the path argument is not a valid path expression or contains a * or ** wildcard.

The length of a document is determined as follows:

The length of a scalar is 1.

The length of an array is the number of array elements.

The length of an object is the number of object members.

The length does not count the length of nested arrays or objects.

Example SQL:

SELECT JSON_LENGTH(v) AS c1,
  JSON_LENGTH(v, 'lax $.a') AS c2,
  JSON_LENGTH(v, 'strict $.a[0]') AS c3,
  JSON_LENGTH(v, 'strict $.a[1]') AS c4
FROM (VALUES ('{"a": [10, true]}')) AS t(v)
LIMIT 10;

Result:

| c1 | c2 | c3 | c4 |
| ---- | ---- | ---- | ---- |
| 1 | 2 | 1 | 1 |
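The length rules and the example above can be checked with a small Python sketch. It is not the Calcite implementation: `json_length` and `_resolve` are hypothetical helpers, only `.member` and `[index]` path steps are handled, and the `lax`/`strict` mode is ignored, so a missing path always yields `None`.

```python
import json
import re

_STEP = re.compile(r"\.(\w+)|\[(\d+)\]")  # .member or [index]


def _resolve(value, path):
    """Follow a path such as 'lax $.a[0]'; return (found, value)."""
    parts = path.split()
    expr = parts[-1] if parts else ""  # drop the optional mode keyword
    if not expr.startswith("$"):
        raise ValueError(f"invalid path: {path!r}")
    pos = 1
    while pos < len(expr):
        step = _STEP.match(expr, pos)
        if step is None:  # wildcards and other syntax are rejected
            raise ValueError(f"unsupported path: {path!r}")
        key, index = step.group(1), step.group(2)
        if key is not None:
            if not isinstance(value, dict) or key not in value:
                return False, None
            value = value[key]
        else:
            if not isinstance(value, list) or int(index) >= len(value):
                return False, None
            value = value[int(index)]
        pos = step.end()
    return True, value


def json_length(doc, path="$"):
    """Length of a JSON document, or of the value selected by `path`."""
    if doc is None or path is None:
        return None
    found, value = _resolve(json.loads(doc), path)
    if not found:
        return None
    if isinstance(value, (dict, list)):
        return len(value)  # members or elements; nested values are not counted
    return 1  # scalars have length 1
```

Calling `json_length('{"a": [10, true]}', 'lax $.a')` returns 2, matching column c2 in the result table.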
LGTM-will-merge-soon
opened by XuQianJin-Stars 24

[CALCITE-2601] Add REVERSE function

Fix ISSUE #2601

MySQL

mysql> SELECT REVERSE('hello');
+------------------+
| REVERSE('hello') |
+------------------+
| olleh            |
+------------------+
1 row in set (0.00 sec)

SQL Server

DECLARE @str NVARCHAR(100)
SET @str='ABCD'
SELECT REVERSE(@str)

PostgreSQL

testdb=# SELECT REVERSE('abcd');
 reverse
---------
 dcba
(1 row)

Oracle

SQL> select reverse('12345') from dual;
REVER

54321

doc:

https://dev.mysql.com/doc/refman/5.7/en/string-functions.html#function_reverse

returned-with-feedback

opened by ambition119 23

[CALCITE-4368] TopDownOptTest fails if applying non-substitution rule first
Usually, O_INPUTS is only applied to groups with a physical convention. But when AbstractConverter is enabled, the input of an AbstractConverter might be a group with NONE convention. In that case, there is no need to apply O_INPUTS; otherwise, it might throw an exception due to an impossible transformation (physical convention -> NONE convention).

Other change:

some cosmetic fix-ups

print the upper bound of the RelSubset

returned-with-feedback tests-missing
opened by chunweilei 22
[CALCITE-4787] Replace ImmutableBeans with Immutables

Replace the use of reflection/dynamic proxies with the AnnotationProcessor provided by Immutables

NOTE: This is an initial patch that only changes one ImmutableBean to Immutables to show what the changes look like

opened by jacques-n 19
[CALCITE-1128] Implement JDBC batch update methods in remote driver
This commit provides an implementation for:

Statement.addBatch(String)

PreparedStatement.addBatch()

PreparedStatement.executeBatch()

The implementation is fairly straightforward except for the addition of a new server interface: ProtobufMeta. This is a new interface which the Meta implementation can choose to also implement to provide a "native" implementation on top of Protobuf objects instead of the Avatica POJOs.

During investigations into Avatica performance before 1.7.0, it was found that converting protobufs to POJOs was a very hot code path. This short-circuit helps us avoid creating extra objects on the heap, and the computation needed to create them, in what should be a very hot code path for write workloads.
opened by joshelser 19
CALCITE-1386 ITEM operator seems to ignore the value type of collection and assign the value to Object
Observed behavior is described here: https://issues.apache.org/jira/browse/CALCITE-1386

Below is the description of this patch:

Modify MethodImplementor to cast return value to desired return type when necessary

Change ItemImplementor to use NullPolicy.ANY since ITEM can still return null even though both operands are not null

Fix Types.castIfNecessary to handle RecordType as an exceptional case (toClass() doesn't handle RecordType and throws Exception)

Change Csv tests to test its behavior

Please let me know if any additional work is needed. Thanks in advance!
opened by HeartSaVioR 18
[CALCITE-4898] Upgrading Elasticsearch version from 7.0.1 to 7.15.2
This PR upgrades embedded Elasticsearch version from 7.0.1 to 7.15.2.

Description:

New dependencies: org.codelibs.elasticsearch.module:scripting-painless-spi, because the module "org.elasticsearch.painless.spi" was removed from lang-painless after ES 7.15.0.

Third-party Maven repo: org.codelibs.elasticsearch.module:lang-painless is no longer maintained after ES 7.10.2 and has been migrated to https://maven.codelibs.org/

RestClient upgrade: the low-level REST client in ES is highly compatible across 7.x (it only issues HTTP requests), and it is also upgraded to 7.15.2.

Self-verification: I ran some tests locally to make sure the new features can be applied (these are not added as unit tests).

Supported: new features such as RareTerms and minimum_interval in auto_date_histogram, which are not supported in ES 7.0.1, can be applied successfully.

Not supported: top_metrics, multi_terms, rate, and other x-pack features are not currently supported. These features can be registered when AnalyticsPlugin is loaded to build the ES node (test environment); however, the dependency org.elasticsearch.plugin:x-pack-analytics is currently not available from Maven Central.
opened by ILuffZhe 17
[CALCITE-5452] Add BigQuery LENGTH() as synonym for CHAR_LENGTH()

Add LENGTH() as a library function that is an alias for the standard CHAR_LENGTH(). Some dependencies for soon-to-be-deprecated standard functions were refactored to avoid null pointer exceptions caused by circular dependencies between the standard and library operators. This decision was made with the help of @mkou.

opened by tanclary 1
[CALCITE-5436] Implement DATE_SUB, TIME_SUB, TIMESTAMP_SUB (compatible w/ BigQuery)

Add support for BigQuery's DATE_SUB, TIME_SUB, and TIMESTAMP_SUB functions. Create a MINUS_DATE2 operator to handle subtracting an interval from a timestamp, time, or date expression. This differs from the standard MINUS_DATE operator, which takes three arguments and is designed to subtract one time expression from another. Add week and quarter as valid time units for intervals in the parser.
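The interval subtraction described above, including the week and quarter units, can be illustrated with a short Python sketch. This is not the MINUS_DATE2 operator itself: `date_sub` is a hypothetical helper, and clamping to the last day of the month for MONTH, QUARTER, and YEAR units is an assumption about the intended month-end behavior.

```python
import calendar
from datetime import date, timedelta

_DAY_UNITS = {"DAY": 1, "WEEK": 7}                    # unit -> days
_MONTH_UNITS = {"MONTH": 1, "QUARTER": 3, "YEAR": 12}  # unit -> months


def date_sub(d, amount, unit):
    """Subtract `amount` units from date `d` (two-argument style:
    a date expression and an interval)."""
    unit = unit.upper()
    if unit in _DAY_UNITS:
        return d - timedelta(days=amount * _DAY_UNITS[unit])
    if unit in _MONTH_UNITS:
        # Work in "months since year 0" so year boundaries need no special case.
        total = d.year * 12 + (d.month - 1) - amount * _MONTH_UNITS[unit]
        year, month0 = divmod(total, 12)
        last_day = calendar.monthrange(year, month0 + 1)[1]
        # Assumed behavior: clamp the day if the target month is shorter.
        return date(year, month0 + 1, min(d.day, last_day))
    raise ValueError(f"unsupported unit: {unit}")
```

For instance, `date_sub(date(2023, 1, 15), 1, "QUARTER")` moves back three months across the year boundary to October 15, 2022.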

opened by tanclary 0

Owner

The Apache Software Foundation

GitHub https://calcite.apache.org/

Apache Druid: a high performance real-time analytics database.

12.3k Jan 1, 2023

Apache Hive

Apache Hive (TM) The Apache Hive (TM) data warehouse software facilitates reading, writing, and managing large datasets residing in distributed storag

4.6k Dec 28, 2022

OrientDB is the most versatile DBMS supporting Graph, Document, Reactive, Full-Text, Geospatial and Key-Value models in one Multi-Model product. OrientDB can run distributed (Multi-Master), supports SQL, ACID Transactions, Full-Text indexing and Reactive Queries. OrientDB Community Edition is Open Source using a liberal Apache 2 license.

What is OrientDB? OrientDB is an Open Source Multi-Model NoSQL DBMS with the sup

4.5k Dec 30, 2022

The Chronix Server implementation that is based on Apache Solr.

Chronix Server The Chronix Server is an implementation of the Chronix API that stores time series in Apache Solr. Chronix uses several techniques to o

262 Jul 3, 2022

Apache Pinot - A realtime distributed OLAP datastore

What is Apache Pinot? Features When should I use Pinot? Building Pinot Deploying Pinot to Kubernetes Join the Community Documentation License What is

4.4k Dec 30, 2022

Apache Ant is a Java-based build tool.

Apache Ant What is it? Ant is a Java based build tool. In theory it is kind of like "make" without make's wrinkles and with

355 Dec 22, 2022

Apache Aurora - A Mesos framework for long-running services, cron jobs, and ad-hoc jobs

NOTE: The Apache Aurora project has been moved into the Apache Attic. A fork led by members of the former Project Management Committee (PMC) can be fo

627 Nov 28, 2022

Apache Drill is a distributed MPP query layer for self describing data

Apache Drill Apache Drill is a distributed MPP query layer that supports SQL and alternative query languages against NoSQL and Hadoop data storage sys

1.8k Jan 7, 2023

Flink Connector for Apache Doris(incubating)

Flink Connector for Apache Doris (incubating) Flink Doris Connector More information about compilation and usage, please visit Flink Doris Connector L

115 Dec 20, 2022

HurricaneDB a real-time distributed OLAP engine, powered by Apache Pinot

HurricaneDB is a real-time distributed OLAP datastore, built to deliver scalable real-time analytics with low latency. It can ingest from batch data sources (such as Hadoop HDFS, Amazon S3, Azure ADLS, Google Cloud Storage) as well as stream data sources (such as Apache Kafka).

4 Dec 28, 2022

Calcite Clojure wrapper / integration

calcite-clj - Use Apache Calcite from Clojure Small library to facilitate the implementation of calcite adapters in clojure. It implements org.apache.

24 Nov 5, 2022

Oryx 2: Lambda architecture on Apache Spark, Apache Kafka for real-time large scale machine learning

Oryx 2 is a realization of the lambda architecture built on Apache Spark and Apache Kafka, but with specialization for real-time large scale machine l

1.8k Dec 28, 2022

Encog Java core. Encog is an advanced machine learning framework that supports a variety of advanced algorithms, as well as support classes to normalize and process data. Machine learning algorithms such as Support Vector Machines, Artificial Neural Networks, Genetic Programming, Bayesian Networks, Hidden Markov Models, and Genetic Algorithms are supported. License: Apache 2.

Encog Machine Learning Framework Encog is a pure-Java/C# machine learning framework that I created back in 2008 to support genetic programming, NEAT/H

739 Dec 17, 2022

Oryx 2: Lambda architecture on Apache Spark, Apache Kafka for real-time large scale machine learning

Oryx 2 is a realization of the lambda architecture built on Apache Spark and Apache Kafka, but with specialization for real-time large scale machine l

1.7k Mar 12, 2021

Google Firing Range. Firing Range is a test bed for web application security scanners, providing synthetic, wide coverage for an array of vulnerabilities. It can be deployed as a Google App Engine application. License: Apache 2.

What is Firing Range? Firing Range is a test bed for web application security scanners, providing synthetic, wide coverage for an array of vulnerabili

1.3k Jan 7, 2023

Equivalent Exchange 3 (pahimar, Equivalent-Exchange-3). Mods for Minecraft. License: Apache 2.

Welcome to Equivalent Exchange 3! All versions are available here Minecraft Forums page Compiling EE3 - For those that want the latest unreleased feat

709 Dec 15, 2022

Apache Solr is an enterprise search platform written in Java and using Apache Lucene.

FLiP: StreamNative: Cloud-Native: Streaming Analytics Using Apache Flink SQL on Apache Pulsar

Apache Cayenne is an open source persistence framework licensed under the Apache License