The Chronix Server implementation based on Apache Solr.

Overview

Chronix Server

The Chronix Server is an implementation of the Chronix API that stores time series in Apache Solr. Chronix uses several techniques to optimize query times and storage demand. On a benchmark querying several ranges (0.5 day up to 180 days), Chronix achieves an average runtime of 23 milliseconds per range query. The dataset contains about 3.7 billion pairs and takes 108 GB serialized as CSV; Chronix needs only 8.7 GB to store it. Everything runs on a standard laptop computer, with no need for clustering, parallel processing, or other complex setups. Check it out and give it a try.

The repository chronix.examples contains some examples.

How Chronix Server stores time series

Chronix Architecture

The key data type of Chronix is called a record. It stores a chunk of time series data in a compressed binary large object. The record also stores technical fields, two time stamps (start and end) that describe the time range of the chunk, and a set of arbitrary user-defined attributes. Storing records instead of individual pairs of time stamp and value has two major advantages:

  1. A reduced storage demand due to compression
  2. Almost constant query times for accessing a chunk due to indexable attributes and a constant overhead for decompression.

The architecture of Chronix has the four building blocks shown in the figure above. It is well suited to the parallelism of multi-core systems: all blocks can work in parallel to each other to increase the throughput.

Semantic Compression

Semantic Compression is optional and reduces the amount of time series data with the goal of storing fewer records. It uses techniques that exploit knowledge about the shape and significance of a time series to remove irrelevant details, even if some accuracy is lost, e.g. dimensionality reduction through aggregation.

Attributes and Chunks

Attributes and Chunks breaks time series down into chunks of n data points that are serialized into c bytes. It also calculates the attributes and the pre-calculated values of the records. Part of this serialization is a Date-Delta Compaction that compares the deltas between time stamps: it serializes only the value if the deviation between two deltas is within a defined range; otherwise it writes both the time stamp and the value to the record's data field.
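
The following sketch illustrates the idea of the Date-Delta Compaction. It is a simplified illustration, not the actual Chronix implementation; the tolerance parameter and the Point holder are made up for this example:

import java.util.ArrayList;
import java.util.List;

public class DateDeltaCompactionSketch {

    //Hypothetical holder: a value with an optional time stamp.
    //A null time stamp means: reconstruct it from the previous delta on deserialization.
    static class Point {
        final Long timestamp;
        final double value;
        Point(Long timestamp, double value) { this.timestamp = timestamp; this.value = value; }
    }

    static List<Point> compact(long[] timestamps, double[] values, long toleranceMs) {
        List<Point> out = new ArrayList<>();
        long previousDelta = 0;
        for (int i = 0; i < timestamps.length; i++) {
            if (i == 0) {
                //The first point always keeps its time stamp
                out.add(new Point(timestamps[0], values[0]));
                continue;
            }
            long delta = timestamps[i] - timestamps[i - 1];
            if (Math.abs(delta - previousDelta) <= toleranceMs) {
                //The deltas are almost equal: serialize only the value
                out.add(new Point(null, values[i]));
            } else {
                //The delta drifts too far: serialize both time stamp and value
                out.add(new Point(timestamps[i], values[i]));
            }
            previousDelta = delta;
        }
        return out;
    }
}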

Basic Compression

Basic Compression then uses gzip, a lossless compression technique that operates on c consecutive bytes. Only the record's data field is compressed to reduce the storage demand, while the attributes remain uncompressed for access. Compression of operational time series data yields a high compression rate due to its value characteristics. In spite of the decompression costs when accessing data, compression actually improves query times, as the data is processed faster.
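
A minimal sketch of this step using the JDK's java.util.zip (only the record's data field passes through it):

import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.zip.GZIPOutputStream;

//Gzip-compresses the serialized points of a record's data field.
//The attributes are left untouched so that Solr can index and access them directly.
static byte[] compressDataField(byte[] serializedPoints) throws IOException {
    ByteArrayOutputStream compressed = new ByteArrayOutputStream();
    try (GZIPOutputStream gzip = new GZIPOutputStream(compressed)) {
        gzip.write(serializedPoints);
    }
    return compressed.toByteArray();
}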

Multi-Dimensional Storage

The Multi-Dimensional Storage holds the records in a compressed binary format. Only the fields that are necessary to locate the records are visible as so-called dimensions to the data storage system. Queries can then use any combination of those dimensions to locate records. Chronix uses Apache Solr, as it ideally matches these requirements. Furthermore, Chronix has built-in analysis functions, e.g., a trend and outlier detector, to optimize operational time series analyses.

Data model

Chronix allows one to store any kind of time series, and hence the data model is open to your needs. By default, the Chronix Server uses the Chronix Time Series package. The data model of the Chronix Time Series package is described below.

A time series has at least the following required fields:

Field Name   Value Type
start        Long
end          Long
name         String
type         String
data         Byte[]

The data field contains the JSON-serialized and gzip-compressed points of time stamp (long) and numeric value (double). Furthermore, a time series can have arbitrary user-defined attributes. The type of an attribute is restricted by the available field types of Apache Solr.
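
For illustration, a record matching this data model could be built with the MetricTimeSeries builder from the Chronix Time Series package. This is a minimal sketch: the host attribute and the point values are made-up examples, and the singular point/attribute builder methods are assumed to exist alongside the points/attributes methods used in the client example below.

//Build a time series with the required name and type, two points, and one user-defined attribute
MetricTimeSeries ts = new MetricTimeSeries.Builder("\\Load\\avg", "metric")
      .attribute("host", "prodI4")  //arbitrary user-defined attribute, indexed by Solr
      .point(1377468017361L, 2.23)  //pair of time stamp (long) and numeric value (double)
      .point(1377468022361L, 2.42)
      .build();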

Chronix Server Client (Source)

A Java client that is used to store and stream time series from Chronix. The following code snippet shows how to set up a connection to Chronix and stream time series. The example uses the Chronix API, the Chronix Server Client, Chronix Time Series, and SolrJ.

//A connection to Solr
SolrClient solr = new HttpSolrClient("http://localhost:8983/solr/chronix/");

//Define a group by function for the time series records
Function<MetricTimeSeries, String> groupBy = ts -> ts.getName() + "-" + ts.attribute("host");

//Define a reduce function for the grouped time series records
BinaryOperator<MetricTimeSeries> reduce = (ts1, ts2) -> {
    MetricTimeSeries.Builder reduced = new MetricTimeSeries.Builder(ts1.getName(), ts1.getType())
            .points(concat(ts1.getTimestamps(), ts2.getTimestamps()),
                    concat(ts1.getValues(), ts2.getValues()))
            .attributes(ts1.attributes());
    return reduced.build();
};

//Create a Chronix Client with a metric time series and the Chronix Solr Storage
ChronixClient<MetricTimeSeries, SolrClient, SolrQuery> chronix =
        new ChronixClient<>(new MetricTimeSeriesConverter(),
                new ChronixSolrStorage<>(nrOfDocsPerBatch, groupBy, reduce));

//Let's stream time series from Chronix. We want the maximum of all time series whose name matches *load*.
SolrQuery query = new SolrQuery("name:*load*");
query.setParam("cf","metric{max}");

//The result is a Java Stream. We simply collect the result into a list.
List<MetricTimeSeries> maxTS = chronix.stream(solr, query).collect(Collectors.toList());
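
Storing time series works through the same client. A minimal sketch, assuming the add method of ChronixClient as used in the chronix.examples repository (ts is a MetricTimeSeries built as shown in the data model section above):

//Add the time series to Chronix via the Solr connection
chronix.add(Collections.singletonList(ts), solr);
//Commit so that the stored records become visible to queries
solr.commit();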

Chronix Server Parts

The Chronix server parts are Solr extensions (e.g. a custom query handler). Hence there is no need to build a custom-modified Solr; we just plug the Chronix server parts into a standard Solr.

The following sub-projects are Solr extensions and ship with the binary release of Chronix. The latest release of the Chronix server is based on Apache Solr version 6.4.2.

Chronix Server Query Handler (Source)

The Chronix Server Query Handler is the entry point for requests asking for time series. Based on the filter queries, it splits a request up into range or function queries:

  • cf={function;function};{function;function};... (for aggregations, analyses, or transformations)
  • cf='' (empty, for range queries)

But before the Chronix Query Handler delegates a request, it modifies the user's query string. This is necessary because Chronix stores chunks of data as records, so a query asking for a specific time range has to match all records whose chunks overlap that range. For example, it converts the query:

host:prodI4 AND name:\\HeapMemory\\Usage\\Used AND start:NOW-1MONTH AND end:NOW-10DAYS

into the following query (a record overlaps the requested range unless it starts after the range ends or ends before the range starts):

host:prodI4 AND name:\\HeapMemory\\Usage\\Used AND -start:[NOW-10DAYS-1ms TO *] AND -end:[* TO NOW-1MONTH-1ms]

Range Query

A range query is answered using the default Solr query handler, which supports all the great features (fields, facets, ...) of Apache Solr.

Example Result:

{
  "responseHeader":{
    "query_start_long":0,
    "query_end_long":9223372036854775807,
    "status":0,
    "QTime":3},
  "response":{"numFound":21,"start":0,"docs":[
      {
        "start":1377468017361,
        "name":"\\Load\\max",
        "end":1377554376850,
        "data":"byte[]" // serialized and compressed points
       },...
   ]
  }
}

Function Query

A custom query handler answers function queries. Chronix determines if a query is a function query by using the filter query mechanism of Apache Solr. There are three types of functions: Aggregations, Transformations, and High-level Analyses.

Currently the following functions are available:

(See the GPL2 branch, which has more functions.)

  • Maximum (metric{max})
  • Minimum (metric{min})
  • Average (metric{avg})
  • Standard Deviation (metric{dev})
  • Percentiles (metric{p:[0.1,...,1.0]})
  • Count (metric{count}) (Release 0.2)
  • Sum (metric{sum}) (Release 0.2)
  • Range (metric{range}) (Release 0.2)
  • First/Last (metric{first/last}) (Release 0.2)
  • Bottom/Top (metric{bottom/top:10}) (Release 0.2)
  • Derivative (metric{derivative}) (Release 0.2)
  • Non Negative Derivative (metric{nnderivative}) (Release 0.2)
  • Difference (metric{diff}) (Release 0.2)
  • Signed Difference (metric{sdiff}) (Release 0.2)
  • Scale (metric{scale:0.5}) (Release 0.2)
  • Divide (metric{divide:4}) (Release 0.2)
  • Time window based Moving Average (metric{movavg:10,MINUTES}) (Release 0.2)
  • Samples based Moving Average (metric{smovavg:10}) (Release 0.4)
  • Add (metric{add:4}) (Release 0.2)
  • Subtract (metric{sub:4}) (Release 0.2)
  • A linear trend detection (metric{trend})
  • Outlier detection (metric{outlier})
  • Frequency detection (metric{frequency:10,6})
  • Time series similarity search (metric{fastdtw:compare(metric=Load),1,0.8})
  • Timeshift (metric{timeshift:[+/-]10,DAYS}) (Release 0.3)
  • Distinct (metric{distinct}) (Release 0.4)
  • Integral (metric{integral}) (Release 0.4)
  • SAX (metric{sax:*af*,10,60,0.01})

Multiple analyses, aggregations, and transformations are allowed per query. In that case, Chronix first executes the transformations in the order they occur, and then executes the analyses and aggregations on the result of the chained transformations. For example, the query:

cf=metric{max;min;trend;movavg:10,minutes;scale:4}

is executed as follows:

  1. Calculate the moving average
  2. Scale the result of the moving average by 4
  3. Calculate the max, min, and the trend based on the prior result.
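
Sent as a full request against the Solr core from the client example above, such a chained query could look as follows (the braces would need URL encoding in a real request):

http://localhost:8983/solr/chronix/select?q=name:*load*&cf=metric{max;min;trend;movavg:10,MINUTES;scale:4}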

A function query does not return the raw time series data by default. It returns all requested time series attributes, the analysis, and its result. With the option fl=+data enabled, Chronix also returns the data for the analyses. The attributes are merged using a set to avoid duplicates. For example, a query for a metric that is collected on several hosts might return the following result:

{
  "responseHeader":{
    "query_start_long":0,
    "query_end_long":9223372036854775807,
    "status":0,
    "QTime":3},
  "response":{"numFound":21,"start":0,"docs":[
      {
        "start":1377468017361,
        "name":"\\Load\\max",
        "end":1377554376850,
        "host:"["host-1","host-2", ...]
       }...
   ]
  }
}

A few example analyses:

q=name:*load* // Get all time series whose name matches *load*

+ cf=metric{max} //Get the maximum of the time series data
+ cf=metric{p:0.25} //Get the 25% percentile of the time series data
+ cf=metric{trend} //Return all time series that have a positive trend
+ cf=metric{frequency:10,6} //Check in time frames of 10 minutes whether there are more than 6 points. If true, return the time series.
+ cf=metric{fastdtw:compare(metric:*load*),1,0.8} //Use fast dynamic time warping to search for similar time series

Join Time Series Records

A query can match multiple records of a time series, and therefore Chronix has to know how to group records that belong together. Chronix uses a so-called join function that can use any arbitrary set of time series attributes to group records. For example, to join all records that have the same attribute values for host, process, and name:

cj=host,process,name

If no join function is defined, Chronix applies a default join function that uses the name.
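
For example, a query that streams all load time series and groups the matching records by name before aggregating them could combine the parameters like this:

q=name:*load*&cj=name&cf=metric{avg}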

Modify Chronix' response

By default, Chronix returns (as Solr does) all fields defined in the schema.xml. There are three ways to modify the response using the fl parameter:

One specific user defined field

If only a specific user-defined field is needed, e.g. the host field, one can set:

fl=host

Then Chronix returns the host field plus the required fields (start, end, data, id).

Exclude a specific field

If one does not need a specific field, such as the data field, one can pass -data in the fl parameter:

fl=-data

In that case all fields except the data field are returned, even when the excluded field is a required field.

Explicit return of a field

This is useful in combination with an analysis. By default, analyses do not return the raw data for performance reasons. But if the raw data is needed, one can pass:

fl=+data

Chronix Response Writer

This allows one to query raw (uncompressed) data from Chronix in JSON format. To execute the transformer, you have to add it to the fl parameter:

q=name:*load*&fl=+dataAsJson //to get all fields and the dataAsJson field
q=name:*load*&fl=dataAsJson //to get only the required fields (except the data field) and dataAsJson

The records in the result contain a field called dataAsJson that holds the raw time series data as JSON. Note: the data field that normally ships the compressed data is not included in the result.

Example Result:

{
  "responseHeader":{
    "query_start_long":0,
    "query_end_long":9223372036854775807,
    "status":0,
    "QTime":3},
  "response":{"numFound":21,"start":0,"docs":[
      {
        "start":1377468017361,
        "name":"\\Load\\max",
        "end":1377554376850,
        "dataAsJson":"[[timestamps],[values]]" //as json string
       }...
   ]
  }
}
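
Read from Java via plain SolrJ (without the Chronix client), fetching the dataAsJson field could look like this sketch:

//Query the raw data as JSON using plain SolrJ
SolrQuery query = new SolrQuery("name:*load*");
query.setFields("+dataAsJson");
QueryResponse response = solr.query(query);
for (SolrDocument doc : response.getResults()) {
    //dataAsJson holds "[[timestamps],[values]]" as a JSON string
    String dataAsJson = (String) doc.getFieldValue("dataAsJson");
}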

Chronix Plug-ins

Chronix provides a plug-in mechanism to add user-defined types as well as functions for those types.

Types

See the Metric type for an example.

Functions

See the NoOp function for metric types for an example.

We will provide more information in the new documentation of Chronix.

Chronix Server Retention (Source)

The Chronix Server Retention plugin deletes time series data that is older than a given threshold. The plugin is configured in the solrconfig.xml of the Solr core. The following snippet shows the configuration:

<requestHandler name="/retention" class="de.qaware.chronix.solr.retention.ChronixRetentionHandler">
  <lst name="invariants">
    <!-- Use the end field of a record to determine its age -->
    <str name="queryField">end</str>
    <!-- Delete time series that are older than 40 days -->
    <str name="timeSeriesAge">40DAYS</str>
    <!-- Run daily at 12 o'clock -->
    <str name="removeDailyAt">12</str>
    <!-- Define the source -->
    <str name="retentionUrl">http://localhost:8983/solr/chronix/retention</str>
    <!-- Define how the index is updated after deletion -->
    <str name="optimizeAfterDeletion">false</str>
    <str name="softCommit">false</str>
  </lst>
</requestHandler>
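
Conceptually, the handler then removes all records whose end time stamp lies before the configured threshold; the effect corresponds to a Solr delete-by-query derived from the queryField and timeSeriesAge settings above (a sketch, not the literal request the plugin sends):

<delete><query>end:[* TO NOW-40DAYS]</query></delete>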

Usage

All libraries are available in the Chronix Bintray Maven repository. A build script snippet for use in all Gradle versions:

repositories {
    mavenCentral()
    maven {
        url "http://dl.bintray.com/chronix/maven"
    }
}
dependencies {
   compile 'de.qaware.chronix:chronix-server-client:<currentVersion>'
   compile 'de.qaware.chronix:chronix-server-query-handler:<currentVersion>'
   compile 'de.qaware.chronix:chronix-server-retention:<currentVersion>'
}

Contributing

Is there anything missing? Do you have ideas for new features or improvements? You are highly welcome to contribute your improvements to the Chronix project. All you have to do is fork this repository, improve the code, and issue a pull request.

Building Chronix from Scratch

Everything should run out of the box. Only two things must be available:

  • Git
  • JDK 1.8

Just do the following steps:

cd <checkout-dir>
git clone https://github.com/ChronixDB/chronix.server.git
cd chronix.server
./gradlew clean build

Maintainer

Florian Lautenschlager @flolaut

License

This software is provided under the Apache License, Version 2.0 license.

See the LICENSE file for details.

Comments
  • Opentsdb Oddness --

So I am attempting to set up Chronix to ingest mostly opentsdb feeds while waiting to update our legacy applications to use the chronix libs natively for imports. One thing I noticed is that opentsdb.ingest doesn't support incoming gzipped data (the actual opentsdb API does), but that's not a deal breaker or anything - I can probably put together a pull request to remedy that.

My issue is actually having it store/retrieve data --> I've tried several schemas for the metric tags

    <?xml version="1.0" encoding="UTF-8" ?>
    
    <schema name="Chronix" version="1.5">
    
        <types>
            <fieldType name="int" class="solr.TrieIntField" precisionStep="0" positionIncrementGap="0"/>
            <fieldType name="long" class="solr.TrieLongField" precisionStep="0" positionIncrementGap="0"/>
            <fieldType name="double" class="solr.TrieDoubleField" precisionStep="0" positionIncrementGap="0"/>
            <fieldType name="string" class="solr.StrField" sortMissingLast="true" omitNorms="true"/>
            <fieldType name="binary" class="solr.BinaryField"/>
        </types>
    
        <fields>
            <field name="id" type="string" indexed="true" stored="true" required="true"/>
            <field name="_version_" type="long" indexed="true" stored="true"/>
            <field name="start" type="long" indexed="true" stored="true" required="true"/>
            <field name="end" type="long" indexed="true" stored="true" required="true"/>
            <field name="data" type="binary" indexed="true" stored="true" required="false"/>
            <field name="metric" type="string" indexed="true" stored="true" required="true"/>
            <!-- Added these after it complained about unknown fields -unknown field 'hashKey'  -->
            <field name="hashKey" type="string" indexed="true" stored="true" required="false"/>
            <field name="nodeKey" type="string" indexed="true" stored="true" required="false"/>
            <field name="cmts" type="string" indexed="true" stored="true" required="false"/>
            <field name="upstream" type="string" indexed="true" stored="true" required="false"/>
            <field name="downstream" type="string" indexed="true" stored="true" required="false"/>
            <!-- Dynamic field for tags-->
            <dynamicField name="*_s" type="string" indexed="true" stored="true"/>
        </fields>
        <uniqueKey>id</uniqueKey>
        <solrQueryParser defaultOperator="OR"/>
    </schema>
    

Previously, I had tried to set it up like your Prometheus example - have opentsdb 'tags' stored in a dynamic field - but that did not work. Also, the example schema.xml that comes with 0.5.zip did not have the metric/string field.

The Grafana plugin just hangs forever, and the Java app for exploring doesn't return any info.

    The opentsdb PUT data looks like this:

    [
      {
        "metric": "cablemodem.receive.modem",
        "timestamp": 1492613453,
        "value": "1.6",
        "tags": {
          "cmts": "192.168.0.254",
          "downstream": "cable-downstream-12/1/14",
          "hashKey": "57d076563a856e2ad4342a94d59d340ba31c7b8b",
          "nodeKey": "9a40b1104cca6375627af9b222898328993de5dd"
        }
      },
      {
        "metric": "cablemodem.snr.modem",
        "timestamp": 1492613453,
        "value": "36.8",
        "tags": {
          "cmts": "192.168.0.254",
          "downstream": "cable-downstream-12/1/14",
          "hashKey": "57d076563a856e2ad4342a94d59d340ba31c7b8b",
          "nodeKey": "9a40b1104cca6375627af9b222898328993de5dd"
        }
      }
    ]
    
    
    
I guess what I am looking for is more verbose documentation on the Grafana plugin, and more documentation/help on setting up opentsdb ingest and a proper schema to have a seamless transition from opentsdb --> chronix
    
    bug enhancement help wanted waiting for feedback 
    opened by devaudio 28
  • NullPointerException in CF CQL Parsing for parallel requests

    Hi Florian,

We see an NPE issue in our deployment on a few http requests. So I set up a test (JMH) with 10 threads to simulate parallel queries. From the initial look, it seems the CQL parsing has some problem with concurrency.

    2 NPE Issues:

I built the chronix.server project with additional logs for the AnalysisHandler (with "Handling analysis params"). I think the problem is in the CQL cql = new CQL(TYPES, FUNCTIONS); maybe some context is not thread-safe.

    1. NPE
    2019-08-18 05:19:15.023 INFO  (qtp1888442711-51) [   x:chronix] d.q.c.s.q.a.AnalysisHandler Handling analysis request {q=-end:[*+TO+1565684399999]+AND+-start:[1565699099999+TO+*]++AND+mdefId:(10020)&cf=metric{avg}&fl=mdefId,data,type,action,end,id,start,_version_,name&start=0&rows=2000&wt=javabin&version=2&query_start_long=1565684400000&query_end_long=1565699100000}
    
    2019-08-18 05:19:15.023 INFO  (qtp1888442711-51) [   x:chronix] d.q.c.s.q.a.AnalysisHandler **Handling analysis params thread(51)** - q=-end:[*+TO+1565684399999]+AND+-start:[1565699099999+TO+*]++AND+mdefId:(10020)&cf=metric{avg}&fl=mdefId,data,type,action,end,id,start,_version_,name&start=0&rows=2000&wt=javabin&version=2&query_start_long=1565684400000&query_end_long=1565699100000
    
    
    2019-08-18 05:19:15.025 ERROR (qtp1888442711-51) [   x:chronix] o.a.s.h.RequestHandlerBase java.lang.NullPointerException
    	at org.antlr.v4.runtime.Parser.exitRule(Parser.java:643)
    	at de.qaware.chronix.cql.antlr.CQLCFParser.cqlcf(CQLCFParser.java:151)
    	at de.qaware.chronix.cql.CQL.parseCF(CQL.java:80)
    	at de.qaware.chronix.solr.query.analysis.AnalysisHandler.handleRequestBody(AnalysisHandler.java:232)
    	at de.qaware.chronix.solr.query.ChronixQueryHandler.handleRequestBody(ChronixQueryHandler.java:112)
    	at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:199)
    	at org.apache.solr.core.SolrCore.execute(SolrCore.java:2566)
    	at org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:756)
    	at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:542)
    	at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:397)
    	at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:343)
    	at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1610)
    	at org.eclipse.jetty.servlets.CrossOriginFilter.handle(CrossOriginFilter.java:311)
    	at org.eclipse.jetty.servlets.CrossOriginFilter.doFilter(CrossOriginFilter.java:265)
    	at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1602)
    	at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:540)
    	at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:146)
    	at org.eclipse.jetty.server.handler.gzip.GzipHandler.handle(GzipHandler.java:753)
    	at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:548)
    
2. NPE
    2019-08-18 05:19:15.029 INFO  (qtp1888442711-48) [   x:chronix] d.q.c.s.q.a.AnalysisHandler Handling analysis request {q=-end:[*+TO+1565684399999]+AND+-start:[1565699099999+TO+*]++AND+mdefId:(10020)&cf=metric{avg}&fl=mdefId,data,type,action,end,id,start,_version_,name&start=0&rows=2000&wt=javabin&version=2&query_start_long=1565684400000&query_end_long=1565699100000}
    
    2019-08-18 05:19:15.029 INFO  (qtp1888442711-48) [   x:chronix] d.q.c.s.q.a.AnalysisHandler **Handling analysis params thread(48)** - q=-end:[*+TO+1565684399999]+AND+-start:[1565699099999+TO+*]++AND+mdefId:(10020)&cf=metric{avg}&fl=mdefId,data,type,action,end,id,start,_version_,name&start=0&rows=2000&wt=javabin&version=2&query_start_long=1565684400000&query_end_long=1565699100000
    
    2019-08-18 05:19:15.030 ERROR (qtp1888442711-48) [   x:chronix] o.a.s.h.RequestHandlerBase java.lang.NullPointerException
    	at org.antlr.v4.runtime.Lexer.nextToken(Lexer.java:172)
    	at org.antlr.v4.runtime.UnbufferedTokenStream.fill(UnbufferedTokenStream.java:203)
    	at org.antlr.v4.runtime.UnbufferedTokenStream.<init>(UnbufferedTokenStream.java:99)
    	at org.antlr.v4.runtime.UnbufferedTokenStream.<init>(UnbufferedTokenStream.java:92)
    	at de.qaware.chronix.cql.CQL.init(CQL.java:95)
    	at de.qaware.chronix.cql.CQL.parseCF(CQL.java:78)
    	at de.qaware.chronix.solr.query.analysis.AnalysisHandler.handleRequestBody(AnalysisHandler.java:232)
    	at de.qaware.chronix.solr.query.ChronixQueryHandler.handleRequestBody(ChronixQueryHandler.java:112)
    	at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:199)
    	at org.apache.solr.core.SolrCore.execute(SolrCore.java:2566)
    	at org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:756)
    	at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:542)
    	at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:397)
    	at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:343)
    	at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1610)
    	at org.eclipse.jetty.servlets.CrossOriginFilter.handle(CrossOriginFilter.java:311)
    	at org.eclipse.jetty.servlets.CrossOriginFilter.doFilter(CrossOriginFilter.java:265)
    	at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1602)
    	at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:540)
    
    

Data: no data in the Chronix db. Code (run with 10 threads, JMH or any setup):

String q = "start:2019-08-13T08:20:00.000Z AND end:2019-08-13T12:25:00.000Z AND mdefId:(10020)";
ChronixClient<MetricTimeSeries, SolrClient, SolrQuery> chronixClient = getChronixClient(200);
HashMap<String, String> params = new HashMap<String, String>() {{
    put("cf", "metric{avg}");
    put("fl", "+data");
}};
SolrQuery solrQuery = new SolrQuery(q);
for (Map.Entry<String, String> e : params.entrySet()) {
    solrQuery.setParam(e.getKey(), e.getValue());
}
solrQuery.setRows(rows);
List records = chronixClient.stream(solr, solrQuery).collect(Collectors.toList());

    Please help in this regard.

    Thanks, Alex

    bug 
    opened by alexnavis 11
  • Chronix v0.5-beta API and plugin-in documentation are not in sync

    Hi,

I was trying to implement a custom function (bucketed avg to reduce data points) with the plug-in framework (following the NoOp example).

    I created a metric-functions.jar to deploy with the ChronixDb. I found the following issues:

1. The Chronix v0.5-beta release does not have the FunctionCtx. I see the master is 66 commits ahead of the release and contains the FunctionCtx changes. The current release (0.5-beta) has the following contract: ChronixFunction.execute(T timeSeries, FunctionValueMap functionValueMap).
2. Also, the API (ChronixFunction) doesn't have a way to set arguments, setArguments(String[] args) (like the one in the latest documentation). I couldn't inject the arguments with Guice @Inject either. I'm stuck here.

    Questions:

1. Is there a way to achieve the above setArguments() for custom functions (e.g., metric{movavg:1,minutes}) and make a workable plugin with the current Chronix v0.5-beta release?
2. If it is not possible to use the v0.5-beta release, is it possible for you to release a new version of the plugin with the current master code changes? This would help people stay in sync with the readme document. I spent some time figuring out why it didn't work.
3. In the interim, is the master stable enough to use as the actual release? (I can do a local build to get the required jar.)

    Thanks, Alex

    enhancement help wanted 
    opened by alexnavis 11
  • Chronix Simple Ingestion Interface

    We should provide a simple ingestion interface for time series data, e.g. pairs of timestamp, value. We should adapt the protocols of InfluxDB, Graphite, ...

    enhancement 
    opened by FlorianLautenschlager 11
  • Key-value attributes

    Hi,

I just learned about Chronix, so bear with me if I have overlooked this, but is it possible to add key-value metadata to the measurements, like host:myhost, application:myapp? Like the InfluxDB format or the format described here: https://www.elastic.co/blog/elasticsearch-as-a-time-series-data-store.

Also, it would be nice to have documentation about the http ingestion protocol and format, if available, as well as the query API and aggregation functions.

    question 
    opened by felixbarny 11
  • Using SAX with Chronix

    Hi,

    Here at Georgia Tech, we're trying to use Chronix in NIH's MD2K research project (https://md2k.org). One feature in Chronix that's particularly interesting to us is a DB-native SAX implementation. However, I did not find any documentation on using SAX with Chronix.

    So here are my questions:

1. How should one use Chronix to query the SAX representation of time series?
2. How should one get the calculated SAX representation (maybe pre-calculated with Chronix?)? With function queries?
3. I saw in this issue (https://github.com/ChronixDB/chronix.examples/issues/14) that it's suggested we use the filtering method to directly find relevant representations. Is this approach preferred to storing the SAX representation back in Solr for searching/matching?
4. I think SAX is only available in the GPL branch. Are there any features missing from that branch that are in the master branch?

    Thanks so much for the help! Chronix truly seems like an amazing TSDB so far. /cc @FlorianLautenschlager

    question 
    opened by andyfangdz 10
  • missing some aggregations returned by cf

Hi, I am facing the following issue: sometimes I do not get all the aggregation results that I have sent in cf.

    query {q=start:2019-10-14T21:00:00.000Z AND end:2019-10-15T21:00:00.000Z AND mdefId:(00024 10020) AND sVertexId:(26624 1632000 137984 1633792 272384 1633536 1628416 1630208 1684736 1700608), cf=metric{avg;max}, fl=-data,-gUpdatedTs, cj=name}

A total of 900 records were found, but some have missing aggregations, e.g.:

    [
        {
            "join_key":"...",
            "sVertexId":["26624"],
            "mdefId":["10020"],
            "start":1571124699940,
            "end":1571156694940,
            "0_function_max":0.18070587318441741,
            "1_function_avg":0.12480652753507317
        },
        {
            "join_key":"...",
            "sVertexId":["137984"],
            "mdefId":["00024"],
            "start":1571124684940,
            "end":1571156694940,
            "0_function_max":0.0
            **(missing avg)**
        },
        {
            "join_key":"...",
            "sVertexId":["26624"],
            "mdefId":["00024"],
            "start":1571124684940,
            "end":1571156694940,
            "1_function_avg":0.0
            **(missing max)**
        }
    ]
    

This is not happening all the time, but what could be the reason that avg or max is sometimes missing?

    Chronix version: chronix-solr-8.1.1

    Thanks, Rishi

    bug 
    opened by Rishi0405 8
  • Missing Time series

When I query more than 10 time series in a single REST call (in my case 12), sometimes one or more time series are missing, but I always get all the documents in the solr query.

    My sample query is

    {q=(id:metric1 OR id:metric2 OR id:metric3 OR id:metric4 OR id:metric5 OR id:metric6 OR id:metric7 OR id:metric8 OR id:metric9 OR id:metric10 OR id:metric11 OR id:metric12), tf=start:2019-11-08T09:00:00.000Z AND end:2019-11-08T10:00:00.000Z, fl=mdefId, name}

    Can you please help me on this?

    bug 
    opened by Rk85 6
  • adding gzip to Abstract Handler, _s dynamic string to opentsdb handler

Added a gzip pushback stream, so it can auto-detect whether incoming data is gzipped or not.

Added concatenation of "_s" to tags, similar to the goclient method.

    opened by devaudio 4
  • Bad benchmarks on initial test

I've set up an initial test to see how an OK server would do with a decent load, and the results don't look good, so I thought I would share them here and get some feedback on what I'm (probably) doing wrong.

    First thing to address would be storage footprint since that is relatively easy to compare apples to apples. The docs say Chronix will take 5-171 times less space (I assume this is compared to CSV or some relatively raw/simple data format). My data rows look like this:

    18,25,8547.736954352933,1523318400.973
    2,43,1980.6639051377774,1523318401.176
    17,69,9500.832828241511,1523318402.991
    13,12,1442.8229313976187,1523318403.377
    8,66,4088.2959033363563,1523318404.812
    5,84,5772.630417804327,1523318405.002
    1,54,7276.800267948981,1523318406.934
    

After importing 16mm of these, I saw a storage footprint of 2.3GB. This is way over what I expected. I have this data stored in CSV format at a rate of about 50mm data points in 2.2 GB without compression. Does this seem fishy?

The second thing to address is insert rate. I've tried writing in batches of 100, 1000, 2500, 5000 using the golang lib (s_err := c.Store(series, true, time.Second)). The best I have seen was using a single thread to write 5000 data point batches, at which point I was able to ingest ~800 datapoints/second. After this I tried writing with 10 and 20 threads in batches of 1000 and 2500 and maxed out around ~500 data points / second.

Now I'm on a Digital Ocean server with 2 VCPUs, 4GB RAM, and Ubuntu, but this still seems fishy based on some comparisons. Actually I'm not too worried about the write speed for my specific use case, but it would be worth touching on, as I'll probably want to be inserting at ~100-1000 data points / second. Is chronix/solr slow at inserts but fast at queries (is that the tradeoff?).

    Also I was definitely expecting a lighter storage footprint based on the claim. Am I doing something wrong?

    help wanted 
    opened by tamoyal 3
  • Chronix Server returns unknown field error

    Hi,

I'm using the OpenTSDB Java client to generate write requests to the chronix backend. Unfortunately, this fails with the error "unknown field 'gridCellId'" at the server side. Here is my client code:

    ...
MetricBuilder builder = MetricBuilder.getInstance();
Metric metric = builder.addMetric(MEASUREMENT)
        .setDataPoint(vehicleLocationReportStatistic.getLastUpdated().getTime(), vehicleLocationReportStatistic.getReportCount())
        .addTag("gridCellId", s(vehicleLocationReportStatistic.getVehicleReportInfo().getReportInfo().getGridCellId()));
Response response = httpClient.pushMetrics(builder, ExpectResponse.STATUS_CODE);
    ...
    

Do I have to register the tags I'm using in my time series somewhere in advance? Thanks for your help!

    opened by twiechert 3
  • Failed to collect dependencies

    Hi

After the maven update, we couldn't download the dependencies from the maven repo; please check this issue. Below is the error seen when building the app:

    [ERROR] Failed to execute goal on project oc-chronix-server-functions: Could not resolve dependencies for project com.opscruise:oc-chronix-server-functions:jar:0.1: Failed to collect dependencies at de.qaware.chronix:chronix-timeseries:jar:0.3.2-beta: Failed to read artifact descriptor for de.qaware.chronix:chronix-timeseries:jar:0.3.2-beta: Could not transfer artifact de.qaware.chronix:chronix-timeseries:pom:0.3.2-beta from/to chronix-mvn-repo (https://dl.bintray.com/chronix/maven/): Authorization failed for https://dl.bintray.com/chronix/maven/de/qaware/chronix/chronix-timeseries/0.3.2-beta/chronix-timeseries-0.3.2-beta.pom 403

    opened by Rishi0405 1
  • Query with multiple cores fails

If we load data into 2 cores, the inserts go through fine.

    If we query any one core at a time, the data comes thru fine.

But if we specify more than one core in the query, the query fails.

    There is a time format error.

    opened by shridharV 1
  • querying chronix with cj=unknown_field makes chronix throw out of memory exception

    Hi,

I queried chronix with cj=some_field; a few documents don't have the field I used with cj. After a few seconds, CPU processing went to 100% and after some time it threw an out of memory exception.

    Sample data

{
  "type":"metric",
  "name":"m12",
  "start":1577701965293,
  "end":1577701965293,
  "data":"H4sIAAAAAAAAAOPi1Dx7BgRq7Ll4FR4sF9JkgAIuJgNGLk7NWTNBQNOei00AKGvAKMAAAFjYSa0zAAAA"
}
    

    Query http://localhost:8983/solr/chronix/select?_=1577708759371&cj=someField&q=*:*&rows=10&start=0

    Thanks, Rishi

    opened by Rishi0405 3
  • Please consider vagrant machine for demo

It would be much easier to evaluate whether this project is an interesting alternative if you made taking a look much easier.

If you provided a Vagrant machine, it would be easier to understand the project without having to spend hours installing and configuring unknown software.

It would help a lot to gain some attention.

    opened by supersexy 1
  • Aggregation by time

I use chronix to store counts of available car sharing vehicles. I use tags/attributes to distinguish different vehicle classes, so that I don't need to manage a dedicated time series per vehicle type/class. Thus, at every t, I persist multiple records.

    How do I aggregate the time series by time and apply a sum operator, so that I'm able to retrieve a series of available vehicles regardless of vehicle type/class?

    Thanks in advance.

    question feature 
    opened by twiechert 3
Releases
  • 0.5.2(Nov 18, 2019)

  • 0.5.1(Aug 24, 2019)

  • 0.5(Jul 15, 2019)

  • v0.5-beta(Mar 13, 2017)

  • v0.4(Nov 15, 2016)

    Features and Fixes

    The release includes the following fixes and features:

    • #43 Documentation
    • #49 Small chunk compaction
    • #57 Chronix simple ingestion interface
    • #76 Bug when using field in join key that is not part of the requested fields
    • #82 Fixed Moving Average implementation (based on time-window)
    • #92 Moving Average based on a fixed size of samples
    • #102 Bug when joining fields
    • #109 Bug when using functions and join within a query
    • #116 Upgraded to Solr 6.3.0
    Source code(tar.gz)
    Source code(zip)
    chronix-0.4.zip(122.56 MB)
  • 0.3(Jul 12, 2016)

    Features and Fixes

    The release includes the following fixes and features:

    • #61 Timeshift transformation
    • #62 CORS is now enabled by default (for use with grafana)
    • #63 Upgrade to Solr 6.1
    • #67 Server-side compression (gzip) is now available (accept-encoding: gzip)
    • #68 Points are now sorted when requested as json (Bug Fix)

    And all features of the prior versions. ;-)

    Source code(tar.gz)
    Source code(zip)
    chronix-0.3.zip(114.04 MB)
  • 0.2(Jun 3, 2016)

    Features and Fixes

    The release includes the following fixes and features:

• #53 The chronix-response-writer plugin is removed (see #35)
    • #39 Add / Subtract transformation
    • #35 Data as json even for transformations and range queries (fl=dataAsJson)
    • #34 Functions are only executed once (max,max => max)
    • #32 Empty arguments for functions and aggregations are not returned anymore
    • #26 Vectorization Transformation
    • #24 Bug in percentile aggregation
    • Upgraded to Solr 6.0.1

    And all features of the prior versions.

    Source code(tar.gz)
    Source code(zip)
    chronix-0.2.zip(113.51 MB)
  • v0.2-beta-1(May 3, 2016)

    Features and Fixes

    The release includes the following fixes and features:

    • #24 Bug in percentile aggregation
    • #22 Option to request data field
    • #21 Range aggregation
    • #20 Last aggregation
    • #19 First aggregation
    • #18 Count aggregation
    • #17 Sum aggregation
• #16 Signed difference aggregation
    • #15 Difference aggregation
    • #10 Upgraded to Solr 6.0
    • #9 Analyze multiple analyses in one request
    Source code(tar.gz)
    Source code(zip)
    chronix-0.2-beta-1.zip(113.47 MB)
  • v0.1.3(Mar 7, 2016)

    Features and Fixes

The release includes the following fixes:
      • #4 Fixed a bug in the chronix client
      • #6 Date (fields for start / end) parsing in sub queries
      • #7 Merging attributes of an analysis results
      • #8 Fixed a problem with FastDTW when two or more timestamps have the same value

    Features

    • Kassiopeia 1.7 (performance improvements)
    • Some refactoring and performance optimizations
    Source code(tar.gz)
    Source code(zip)
    chronix-0.1.3.zip(112.64 MB)
  • v0.1.2(Feb 26, 2016)

    Solr 5.5.0 and FastDTW

    This release contains the following changes / improvements:

    Main Features

    • Upgraded Solr to Version 5.5.0
    • Changed syntax of Chronix functions
      • ag = still used for aggregations
      • analysis = still used for analyses
  • arguments are now split on "," (formerly ":")
      • aggregation / functions and arguments are divided with ":"
      • [ag|analysis]=([min,max,avg,p,dev]|[trend,outlier,frequency,fastdtw]):(arg,)*
    • Kassiopeia Simple 0.1.1 to 0.1.4
      • Date-Delta-Compaction now detects a drift
      • Internal API changes to avoid unnecessary object transformations
    • New Analysis: FastDTW to find similar time series
      • Uses Dynamic time warping to find time series that are similar with other time series
      • Example Query: q=metric:*Load*min&fq=analysis=fastdtw(metric:*Load*),5,0.8
    • Performance Improvements
      • Fewer object creations
      • Faster query and analysis times
    Source code(tar.gz)
    Source code(zip)
    chronix-0.1.2.zip(112.64 MB)
  • v0.1.1(Jan 9, 2016)

    Solr 5.4.0 and Protocol Buffers Serialization

    This release contains the following changes:

    • Solr 5.3.1 upgraded to Solr 5.4.0
    • Kassiopeia Simple 0.1 to 0.1.1
      • Serialization format change (Old: Json, New: Protocol Buffers)
    • dataAsJson still works ;-)
  • Requires less hard disk space.
      • Fewer object creations due to lists for primitive data types (long, double)
    Source code(tar.gz)
    Source code(zip)
    chronix-0.1.1.zip(111.97 MB)
  • v0.1(Dec 12, 2015)

    Performance Improvements

    The third release of Chronix with new features and performance improvements.

    Main Features:

    • A user can now query raw (uncompressed) time series from Chronix as Json.

    Usage:

    q=*:*&fl=dataAsJson:[dataAsJson]
    
    • Updated Chronix to Kassiopeia 0.1
      • Performance improvements on analysis queries.
    • Version upgrade. No longer "mini version numbers"
    Source code(tar.gz)
    Source code(zip)
    chronix-0.1.zip(110.71 MB)
  • v0.0.2(Dec 7, 2015)

    API Changes

    The second Chronix release that is based on Apache Solr 5.3.1.

    Main Features:

    • Implemented API changes of Chronix API version 0.0.2
    //----------------------------------------------------------------------------------------------------
    //before: Streaming and grouping a time series
    //----------------------------------------------------------------------------------------------------
    chronix = new ChronixClient<>(new KassiopeiaSimpleConverter(), new ChronixSolrStorage<>());
    
    List<MetricTimeSeries> result = chronix.stream(solrConnection, query, start, end, 200)
             .collect(groupingBy(MainController.this::join)).entrySet()
              .stream().map(stringListEntry -> stringListEntry.getValue()
                        .stream()
                        .reduce((ts1, ts2) -> {
                             ts1.addAll(ts2.getPoints());
                             return ts1;
                          }).get()).collect(Collectors.toList());
    
    //-----------------------------------------------------------------------------------------------------
    //now: Streaming and grouping a time series
    //----------------------------------------------------------------------------------------------------
     Function<MetricTimeSeries, String> groupBy = MainController.this::join;
    
     BinaryOperator<MetricTimeSeries> reduce = (ts1, ts2) -> {
            ts1.addAll(ts2.getPoints());
            return ts1;
    };
    chronix = new ChronixClient<>(new KassiopeiaSimpleConverter(),
                         new ChronixSolrStorage<>(200, groupBy, reduce));
    
    //The streaming call is much simpler
    List<MetricTimeSeries> result = chronix.stream(solrConnection, query).collect(Collectors.toList());
    
    
    • Integration Test uses real time series.
      • Data is shipped within the binary release.

    see README.md for more information

    Source code(tar.gz)
    Source code(zip)
    chronix-0.0.2.zip(110.93 MB)
  • v0.0.1(Dec 4, 2015)

    Chronix v0.0.1

    The very first Chronix release that is based on Apache Solr 5.3.1.

    Main Features:

    • Range queries on time series data
    • Aggregation queries: Min, Max, Avg, Standard Deviation, Percentiles
    • Analysis queries for Trend, Outlier, and Frequency
    • Compressed storage with a reduced storage demand
    • Multi-dimensional data model
    • Comes with example data for a quick start.
    • Unzip and run.
    • ...

    see README.md for more information

    Source code(tar.gz)
    Source code(zip)
    chronix-0.0.1.zip(109.01 MB)