Fast scalable time series database

Overview

KairosDB

KairosDB is a fast, distributed, scalable time series database written on top of Cassandra.

Documentation

Documentation is found here.

Frequently Asked Questions

Installing

Download the latest KairosDB release.

Installation instructions are found here.

If you want to test KairosDB in Kubernetes please follow the instructions from KairosDB Helm chart.

Getting Involved

Join the KairosDB discussion group.

Contributing to KairosDB

Contributions to KairosDB are very welcome. KairosDB is mainly developed in Java, but there are plenty of tasks for non-Java programmers too, so don't be shy and join us!

What you can do for KairosDB:

  • KairosDB Core: join the development of core features of KairosDB.
  • Website: improve the KairosDB website.
  • Documentation: improve our documentation; it's a very important task.

If you have any questions about how to contribute to KairosDB, join our discussion group and tell us your issue.

License

The license is the Apache License 2.0.

Comments
  • Replace Hector by Datastax Java Driver for Apache Cassandra

    The last release of Hector was on 2014-06-16, and its homepage states:

    THIS PROJECT IS NO LONGER ACTIVE

    Please use the official java-driver at https://github.com/datastax/java-driver/ for all Java-based Apache Cassandra projects. [...] The currently active branch is 1.0. The master tracks Apache Cassandra active development which is 1.1.x presently.

    The current release of Cassandra is 2.1.10. I don't know whether that is too far ahead, or whether nothing important (from the Java client's point of view) has changed from 1.1.x to 2.1.10.
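
    For orientation, a minimal sketch of what talking to Cassandra through the DataStax java-driver (3.x API) could look like; the contact point is a placeholder and this is not how KairosDB is wired up today:

    import com.datastax.driver.core.Cluster;
    import com.datastax.driver.core.Row;
    import com.datastax.driver.core.Session;

    public class DatastaxDriverSketch {
        public static void main(String[] args) {
            // Placeholder contact point; replace with a real Cassandra node.
            try (Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
                 Session session = cluster.connect()) {
                // Read the server version as a trivial round-trip check.
                Row row = session.execute("SELECT release_version FROM system.local").one();
                System.out.println("Cassandra " + row.getString("release_version"));
            }
        }
    }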

    enhancement 
    opened by koppor 36
  • Incredibly high (inaccurate) values reported

    After updating to 0.9.5-beta2 on one of our ten ingest nodes, we saw this node reporting ludicrously high values, in the sextillion range, for metrics that should be reporting in the hundreds. In addition, all of the tags that we apply to our metrics were missing on these high values. The config was the same as we used for 0.9.4, though the kairosdb.datapoint.factory values were later added to see if the lack of that config was the cause (it was not).

    bug 
    opened by arussellsaw 23
  • Potential failure when storing String datatype

    Dear Author,

    I tried to store data as the string type, and one value looks like this: ver_format=3,fmt_opt=1,app=AirBox,ver_app=0.35.0,device_id=74DA3895C5C0,tick=1517536368,date=2018-02-02,time=09:52:48,device=tses,s_0=178,s_1=100,s_2=1,s_3=0,s_d0=7,s_d1=8,s_d2=5,s_t0=17,s_h0=83,gps_lat=24.06,gps_lon=120.696,gps_fix=1,gps_num=9,gps_alt=2

    However, when I query the data back, I get the following: "values": [ [1517536201000, "ver_format=3,fm"], [1517536203000, "ver_for"], [1517536205000, "ver_fo"], [1517536208000, "ver_format"], [1517536209000, "ver_fo"], [1517536209000, "ver_for"], [1517536210000, "ver_format=3"], [1517536213000, "ver_forma"], [1517536214000, "ver_f"], [1517536216000, "ver"], [1517536217000, "ver_fo"], [1517536219000, "ver"], [1517536220000, "ver_fo"], [1517536221000, "ver"], [1517536223000, "v"], [1517536223000, "ver_fo"],

    It seems like the string is not fully stored in the database.

    I confirmed that my insert query is correct, which looks like: [{'tags': {'SiteName': 'taichung99', 'device': 'Taichung99', 'app': 'AirBox', 'device_id': '74DA3895C55A'}, 'type': 'string', 'name': 'AirBox.AllData', 'datapoints': [[1494470190000.0, 'ver_format=3,fmt_opt=1,app=AirBox,ver_app=0.35.0,device_id=74DA3895C55A,tick=1494470190,date=2017-05-11,time=10:36:30,device=Taichung99,s_0=1621,s_1=100,s_2=1,s_3=0,s_d0=47,s_d1=62,s_d2=33,s_t0=32.25,s_h0=73,gps_lat=24.308,gps_lon=120.704,gps_fix=1,gps_num=9,gps_alt=2']]}]
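
    For comparison, a minimal sketch of what posting one string-typed datapoint to the REST endpoint /api/v1/datapoints could look like in Java; the host, port, metric name, tag and value below are placeholders based on the report above:

    import java.io.OutputStream;
    import java.net.HttpURLConnection;
    import java.net.URL;
    import java.nio.charset.StandardCharsets;

    public class StringDataPointPost {
        public static void main(String[] args) throws Exception {
            // Placeholder host/port; adjust to your KairosDB instance.
            URL url = new URL("http://localhost:8080/api/v1/datapoints");
            String payload = "[{\"name\": \"AirBox.AllData\", \"type\": \"string\","
                    + " \"tags\": {\"device\": \"Taichung99\"},"
                    + " \"datapoints\": [[1494470190000, \"ver_format=3,fmt_opt=1,app=AirBox\"]]}]";

            HttpURLConnection conn = (HttpURLConnection) url.openConnection();
            conn.setRequestMethod("POST");
            conn.setRequestProperty("Content-Type", "application/json");
            conn.setDoOutput(true);
            try (OutputStream out = conn.getOutputStream()) {
                out.write(payload.getBytes(StandardCharsets.UTF_8));
            }
            // KairosDB replies with 204 No Content when the datapoints are accepted.
            System.out.println("HTTP " + conn.getResponseCode());
        }
    }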

    Do you have any suggestion what might be going wrong? Thank you!

    opened by hippoandy 20
  • Create option for choosing second granularity in Cassandra

    Taken from this conversation: https://groups.google.com/forum/?hl=en#!topic/kairosdb-group/k0rgI8w8DuE

    Create a way to switch Cassandra to store data with second granularity, basically making the rows 1000 times larger in the span of time they cover. The use case is for users who want to store data with one-day granularity.
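
    As a rough sketch of the arithmetic, assuming the three-week row width that the Cassandra schema documentation (quoted in a later comment below) describes:

    public class RowWidthArithmetic {
        public static void main(String[] args) {
            // With millisecond offsets as column names, one KairosDB row spans three weeks.
            long offsetsPerRow = 3L * 7 * 24 * 60 * 60 * 1000; // 1,814,400,000 offsets
            // Re-reading the same offsets as seconds makes each row cover 1000x more wall-clock time.
            double years = offsetsPerRow / (365.25 * 24 * 60 * 60);
            System.out.printf("one row would cover about %.1f years at second granularity%n", years);
        }
    }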

    enhancement 
    opened by brianhks 20
  • DST and leap year fix

    Hi,

    We had two problems with the computation of the ranges in the RangeAggregator:

    • it does not take DST into account. Example: when computing day aggregations over the year, the day range is from 0am to 0am in winter and from 1am to 1am in summer.
    • it does not take leap years into account.

    Getting the time zone

    In order to fix the DST handling, I needed the user's timezone, so:

    • I added a timezone field in webroot/index.html
    • I modified the functions in webroot/js/kairosdb.js to deal with the timezone field
    • I added a DateTimeZoneDeserializer in GsonParser
    • I added a field timezone in core/datastore/Sampling

    If no timezone is specified, the time zone of the machine will be used (just like in previous versions).

    Fixing range computation

    All range computations are done with Joda-Time. To avoid dealing with TimeUnit throughout the code of RangeDataPointAggregator, I extract the DateTimeProperty corresponding to the TimeUnit in the constructor. When I compute the start and end of the ranges, I only use DateTimeProperties, forgetting about the TimeUnit and its special cases.

    Joda-Time deals with DST, leap years and month arithmetic for us.
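
    A small sketch of the Joda-Time behaviour this relies on; the zone and dates are arbitrary examples:

    import org.joda.time.DateTime;
    import org.joda.time.DateTimeZone;

    public class DstRangeSketch {
        public static void main(String[] args) {
            // Europe/Brussels switches to DST on 2013-03-31, so that "day" is only 23 hours long.
            DateTimeZone zone = DateTimeZone.forID("Europe/Brussels");
            DateTime rangeStart = new DateTime(2013, 3, 31, 0, 0, zone);

            // plusDays() works in local time, so the next range boundary stays at midnight.
            DateTime rangeEnd = rangeStart.plusDays(1);

            System.out.println(rangeStart);                                    // 2013-03-31T00:00:00.000+01:00
            System.out.println(rangeEnd);                                      // 2013-04-01T00:00:00.000+02:00
            System.out.println(rangeEnd.getMillis() - rangeStart.getMillis()); // 82800000 (23 hours)
        }
    }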

    Additionally, the sampling is no longer aligned to the beginning of the week, month or year when using a unit of week, month or year respectively (this allows computing aggregations from Wednesday to Wednesday, etc.).

    Unit tests

    Added unit tests in the range aggregator:

    • test_dstMarch
    • test_dstOctober
    • test_februaryRange
    • test_leapYears

    This is my first pull request, any advice is welcome :)

    opened by robinkeunen 18
  • Deb package init script issue

    Hi, after successfully installing KairosDB from the latest DEB package on Ubuntu 14.04, issuing the command to start the service, sudo service kairosdb start, fails with /etc/init.d/kairosdb: line 52: syntax error: unexpected end of file. Has anyone experienced the same behaviour? Thanks in advance!

    opened by slavomirdittrich 17
  • Undesired deletion of values depending on start_absolute parameter and datapoint timestamp

    Hello,

    Scenario

    We are using KairosDB 1.2 with an underlying ScyllaDB. We insert a double value of '0.01' twice into the same metric 'doubledeletiontest', with the same timestamp but different tags (first: 'tag' -> 'child', second: 'tag' -> 'parent'). After waiting 500 ms, we try to delete the 'parent' datapoint with the following payload:

    {
      "cache_time": 0,
      "metrics": [
        {
          "name": "doubledeletiontest",
          "tags": {
            "tag": [
              "1"
            ]
          },
          "group_by": [],
          "aggregators": []
        }
      ],
      "start_absolute": 0
    }
    

    Problem

    But after this, both doubles are deleted. What are we missing here? Isn't the Delete-Query supposed to work the same way as the Get-Query, i.e. filtering the values by tag? Also, the undesired deletion does not seem to occur if we just send start_absolute of 1000. On further research, we found that the bug seems to happen when deleting datapoints from year 2009, but not so much from 1970.

    Proof

    I have written a JUnit Test case for better understanding.

    import org.apache.commons.collections.CollectionUtils;
    import org.apache.commons.lang3.StringUtils;
    import org.junit.After;
    import org.junit.Before;
    import org.junit.Test;
    import org.kairosdb.client.HttpClient;
    import org.kairosdb.client.builder.*;
    import org.kairosdb.client.response.QueryResponse;
    import org.kairosdb.client.response.Result;
    
    import javax.annotation.Nonnull;
    import java.io.IOException;
    import java.net.URISyntaxException;
    import java.time.ZoneId;
    import java.util.*;
    import java.util.concurrent.TimeUnit;
    import java.util.function.BiFunction;
    
    import static org.hamcrest.MatcherAssert.assertThat;
    import static org.hamcrest.Matchers.is;
    
    public class DoubleDeletionFailingTest {
    
        private static final String TAG_NAME = "tag";
        private static final String KAIROS_URL = "http://localhost:8083";
        private static final String PARENT_ID_TAG = "1";
        private static final String CHILD_ID_TAG = "2";
    
        private HttpClient kairosClient;
        private HashMap<String, String> parentTags;
        private HashMap<String, String> childTags;
        private long dec2009;
        private long jan1970;
    
        @Before
        public void setup() throws IOException {
            kairosClient = new HttpClient(KAIROS_URL);
    
            parentTags = new HashMap<>();
            parentTags.put(TAG_NAME, PARENT_ID_TAG);
    
            childTags = new HashMap<>();
            childTags.put(TAG_NAME, CHILD_ID_TAG);
    
            jan1970 = TimeUnit.HOURS.toMillis(24 * 10);
            dec2009 = TimeUnit.HOURS.toMillis(24 * 365 * 40);
        }
    
        private void insertData(long singleDate) throws URISyntaxException, IOException {
            // create 'child' data and 'parent' data
            MetricBuilder instance = MetricBuilder.getInstance();
            instance.addMetric(getMetricName()).addTag(TAG_NAME, CHILD_ID_TAG).addDataPoint(singleDate, 100.1D);
            kairosClient.pushMetrics(instance);
    
            instance = MetricBuilder.getInstance();
            instance.addMetric(getMetricName()).addTag(TAG_NAME, PARENT_ID_TAG).addDataPoint(singleDate, 100.1D);
            kairosClient.pushMetrics(instance);
        }
    
        @Test
        public void deleteSome1970Zero() throws InterruptedException, IOException, URISyntaxException {
            insertData(jan1970);
            // doesn't work
            deleteAndCheck(0L);
        }
    
        @Test
        public void deleteSome1970One() throws InterruptedException, IOException, URISyntaxException {
            insertData(jan1970);
            // works
            deleteAndCheck(1L);
        }
    
        @Test
        public void deleteSome1970Two() throws InterruptedException, IOException, URISyntaxException {
            insertData(jan1970);
            // works
            deleteAndCheck(2L);
        }
    
        @Test
        public void deleteSome1970SomeSeconds() throws InterruptedException, IOException, URISyntaxException {
            insertData(jan1970);
            // works
            deleteAndCheck(86400L);
        }
    
        @Test
        public void deleteSome1970OneDay() throws InterruptedException, IOException, URISyntaxException {
            insertData(jan1970);
            // doesn't work
            deleteAndCheck(TimeUnit.HOURS.toMillis(24L));
        }
    
        @Test
        public void deleteSome1970TwoDays() throws InterruptedException, IOException, URISyntaxException {
            insertData(jan1970);
            // doesn't work
            deleteAndCheck(TimeUnit.HOURS.toMillis(48L));
        }
        
        @Test
        public void deleteSome2009Zero() throws InterruptedException, IOException, URISyntaxException {
            insertData(dec2009);
            // doesn't work
            deleteAndCheck(0L);
        }
    
        @Test
        public void deleteSome2009One() throws InterruptedException, IOException, URISyntaxException {
            insertData(dec2009);
            // works
            deleteAndCheck(1L);
        }
    
        @Test
        public void deleteSome2009Two() throws InterruptedException, IOException, URISyntaxException {
            insertData(dec2009);
            // works
            deleteAndCheck(2L);
        }
    
        @Test
        public void deleteSome2009SomeSeconds() throws InterruptedException, IOException, URISyntaxException {
            insertData(dec2009);
            // works
            deleteAndCheck(86400L);
        }
    
        @Test
        public void deleteSome2009OneDay() throws InterruptedException, IOException, URISyntaxException {
            insertData(dec2009);
            // doesn't work
            deleteAndCheck(TimeUnit.HOURS.toMillis(24L));
        }
    
        @Test
        public void deleteSome2009TwoDays() throws InterruptedException, IOException, URISyntaxException {
            insertData(dec2009);
            // doesn't work
            deleteAndCheck(TimeUnit.HOURS.toMillis(48L));
        }
    
        private void deleteAndCheck(long date) throws InterruptedException, URISyntaxException, IOException {
            TimeUnit.MILLISECONDS.sleep(500L);
    
            long count = 1L;
            assertThat(count(getMetricName(), childTags), is(count));
            assertThat(count(getMetricName(), parentTags), is(count));
    
            // delete only 'parent' values
            QueryBuilder deleteQueryBuilder = QueryBuilder.getInstance();
            deleteQueryBuilder.setStart(new Date(date));
            deleteQueryBuilder.addMetric(getMetricName()).addTags(parentTags);
    
            kairosClient.delete(deleteQueryBuilder);
    
            TimeUnit.MILLISECONDS.sleep(500L);
    
            assertThat(count(getMetricName(), parentTags), is(0L));
            // 'child' values must not have been deleted, but are in my environment
            assertThat(count(getMetricName(), childTags), is(count));
        }
    
        @After
        public void tearDown() throws IOException {
            kairosClient.deleteMetric(getMetricName());
    
            kairosClient.shutdown();
    
        }
    
        public String getMetricName() {
            return "doubledeletiontest";
        }
    
        private long count(@Nonnull String metricName, @Nonnull Map<String, String> tags) {
    
            QueryBuilder queryBuilder = QueryBuilder.getInstance();
            queryBuilder.setStart(new Date(1000L));
            queryBuilder.addMetric(metricName).addTags(tags);
    
            queryBuilder.getMetrics().get(0).addAggregator(AggregatorFactory.createCountAggregator(10, org.kairosdb.client.builder.TimeUnit.YEARS));
    
            return queryAndTransform(queryBuilder, (results, zoneId) -> {
                if (CollectionUtils.isEmpty(results) || CollectionUtils.isEmpty(results.get(0).getDataPoints())) {
                    return 0L;
                }
                DataPoint dataPoint = results.get(0).getDataPoints().get(0);
                try {
                    return dataPoint.longValue();
                } catch (DataFormatException e) {
                    throw new RuntimeException("Error deserializing values!", e);
                }
            }, ZoneId.of("UTC"));
        }
    
        private <T> T queryAndTransform(@Nonnull QueryBuilder queryBuilder,
                                        @Nonnull BiFunction<List<Result>, ZoneId, T> transformFunction,
                                        @Nonnull ZoneId timeZone) {
            try {
                queryBuilder.setTimeZone(TimeZone.getTimeZone(timeZone));
    
                QueryResponse queryResponse = kairosClient.query(queryBuilder);
                if (CollectionUtils.isNotEmpty(queryResponse.getErrors())) {
                    // throw IllegalState here to indicate it is most likely the programmer's error
                    throw new IllegalStateException("There have been errors upon querying the Kairos database! " + StringUtils.join(queryResponse.getErrors().iterator(), " | "));
                }
                List<Result> results = queryResponse.getQueries().get(0).getResults();
                return transformFunction.apply(results, timeZone);
            } catch (IOException ioex) {
                // we cannot recover here, so we must rethrow this as internal error
                throw new RuntimeException("IO-Exception while querying the Kairosdb database!", ioex);
            } catch (URISyntaxException uri) {
                // we cannot recover here, so we must rethrow this as internal error
                throw new IllegalStateException("Illegal URI configuration led to URISyntaxException!", uri);
            }
        }
    }
    
    opened by gmuehlenberg 15
  • Get different results for the same point when querying different time ranges

    my two queries:

    1. Query between 1489075200000 and 1489593600000: {'cache_time': 0, 'end_absolute': 1489593600000, 'metrics': [{'aggregators': [{'align_sampling': False, 'name': 'sum', 'sampling': {'unit': 'milliseconds', 'value': '900000'}}], 'name': 'ubmq_click_sum_1'}], 'start_absolute': 1489075200000} Result: {"queries":[{"sample_size":2505568,"results":[{"name":"ubmq_click_sum_1","group_by":[{"name":"type","type":"number"}],"tags":{"cmatch":["204","222","223","225","227","228","229"],"cookie":["no","yes"],"flow":["Hao123","Lu","Organic","Union","other"],"home_page":["no","yes"],"namespace":["fc_model_mean"],"page_turn":["no","yes"],"platform":["207L-0","207L-7","207L-dz","247L-1","247L-3","247L-4","247L-6","247L-dz","2615-1","2615-dz","2669-1","2669-dz","other"],"rank":["1","2","3","4","5"],"wmatch":["15","31","63"]},"values":[[1489075200000,456727]

    2. Query between 1489075200000 and 1489161600000: {'cache_time': 0, 'end_absolute': 1489161600000, 'metrics': [{'aggregators': [{'align_sampling': False, 'name': 'sum', 'sampling': {'unit': 'milliseconds', 'value': '900000'}}], 'name': 'ubmq_click_sum_1'}], 'start_absolute': 1489075200000} Result: {"queries":[{"sample_size":422939,"results":[{"name":"ubmq_click_sum_1","group_by":[{"name":"type","type":"number"}],"tags":{"cmatch":["204","222","223","225","227","228"],"cookie":["no","yes"],"flow":["Hao123","Lu","Organic","Union","other"],"home_page":["no","yes"],"namespace":["fc_model_mean"],"page_turn":["no","yes"],"platform":["207L-0","207L-7","207L-dz","247L-1","247L-3","247L-4","247L-dz","2615-1","2615-dz","2669-1","2669-dz","other"],"rank":["1","2","3","4","5"],"wmatch":["15","31","63"]},"values":[[1489075200000,432949]. In the end we get two different results for the same point, 1489075200000.

    opened by james1986 15
  • Cassandra Compaction Strategy

    We've been experiencing the following issue: large amounts of data are logged over the course of the day or night. The first query that touches any of this new data triggers compactions across the entire Cassandra cluster, and this first query is rejected by Cassandra with a timeout exception (same as this: http://stackoverflow.com/questions/6514343/cassandra-hector-timeouts-what-to-do). Once all the new data has been "touched", it becomes reliably queryable.

    It would seem this is related to the fact that Kairos assumes the default compaction strategy (size-tiered). This is apparently not the best strategy for update-heavy workloads (i.e. Kairos' wide rows): http://www.datastax.com/dev/blog/when-to-use-leveled-compaction

    We have manually altered our keyspace to use leveled compaction and have already seen our data size fall by three quarters (down to around 100 GB from 400). We have yet to observe any timeout exceptions with this strategy, but we haven't had any long periods of query inactivity, so we cannot make any judgements just yet.
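
    For reference, a hedged sketch of what such an alteration could look like when issued as CQL through the DataStax Java driver; the keyspace and table names assume a default kairosdb schema, so verify them against your own cluster before running anything:

    import com.datastax.driver.core.Cluster;
    import com.datastax.driver.core.Session;

    public class LeveledCompactionSketch {
        public static void main(String[] args) {
            // Placeholder contact point and assumed default keyspace name; adjust to your cluster.
            try (Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
                 Session session = cluster.connect("kairosdb")) {
                // data_points is the widest KairosDB table; the sstable size here is only illustrative.
                session.execute("ALTER TABLE data_points WITH compaction = "
                        + "{'class': 'LeveledCompactionStrategy', 'sstable_size_in_mb': 160}");
            }
        }
    }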

    Anyway, I believe the Cassandra defaults may need some investigation, or input from others who have manually altered their configuration, to reach a more optimal setup for large workloads.

    opened by warmans 15
  • Official docker support

    Docker can make it easier for people to get started with KairosDB. There are already some unofficial images, but most of them do not have multiple versions, i.e. they only have 1.1.0. Also, we could use Docker Compose to run KairosDB together with Cassandra easily.

    Possible scenarios:

    • run Kairosdb locally without installing JDK and Cassandra on local machine.
    • use Kairosdb in CI. (GitLab CI supports using docker image)
    • use Docker image for production deployment

    Existing docker images on dockerhub:

    • https://hub.docker.com/r/wangdrew/kairosdb/ 100k+ pulls
    • https://hub.docker.com/r/mbessler/archlinux-kairosdb/ 5.5k pulls
    • https://hub.docker.com/r/kowens/kairosdb-statsd/ 1.4k pulls (but does not have description)

    I have already made one following this Dockerfile; it is not published yet, the repo is here. I also have a few questions about the KairosDB environment setup:

    • Which version of the JDK is preferred? JDK 7 or JDK 8?
    • Which version of Cassandra is preferred?

    Thanks~

    opened by at15 14
  • Trying to resolve straight answers on performance...

    I think in any situation the user must be willing to do the science and confirm their ideas about performance, that is, write little scripts to test it. Someone has already asked for something like that, or whether anyone else has done it.

    While the user has some responsibility, the tool must also have some performance intentions, otherwise it would be a bit pointless. I'd imagine there must be some tests somewhere.

    That aside, the first performance document that comes up in a search isn't helpful even when it comes to covering basic usage. It's quite cryptic and potentially counterproductive. It doesn't really explain things well, and in some areas it makes no sense at all.

    There are cases where it jumps into low-level aspects, whereas it might be better to just say what kind of schema it establishes in, for example, Cassandra, allowing users to look at the Cassandra specs for the specifics.

    It helps to narrow down some things:

    Data:

    • number of unique metric names
    • number of unique tag names
    • number of unique tag values
    • number of unique times
    • overlap and other complexities

    Access patterns:

    Are tags:

    • and
    • or

    I assume OR, but the documentation is a little tight-lipped about it:

    It is possible to filter the data returned by specifying a tag. The data returned will only contain data points associated with the specified tag. Filtering is done using the “tags” property.

    It talks a little too much in the singular about a plural; however, we find out they're ORs here:

    Tags narrow down the search. Only metrics that include the tag and matches one of the values are returned. Tags is optional.

    This should probably read "only the metrics that match at least one of the tag key/value combinations will be returned". Even then I'm still left unsure. An array of values would obviously be an OR, but what about two key names? Though it would be a bit dysfunctional for those to be AND. Most people would want (a = 1 AND b = 2) OR (a = 2 AND b = 1), not (a = 1 OR a = 2) AND (b = 2 OR b = 1)? Just a IN (1, 2) OR b IN (1, 2) would make more sense. Though the example with customer as well as hosts would imply they're ANDed between multiple key names. Will people flip key and value to get AND to work? Are people who are not asking the questions I am getting incorrect metrics without knowing it (often that's worse, as excess tends to appear more valid than deficit, or vice versa in most cases)?

    For when people want AND, the only solution is to DIY, that is:

    tags = {hotel: 123, room: 321, person: 666};
    keys = Object.keys(tags).sort();
    values = keys.map(key => tags[key]);
    tags[JSON.stringify(keys)] = JSON.stringify(values);
    query = JSON.stringify({metrics: [{tags}]});
    

    This would allow searching for "for how long during this period did person ? stay in room ? of the hotel ?".

    Concerns like this depend on actual usage and access. In this case it's quite common to want:

    • How many stays (a day) were there for the hotel ? during the period ?.
    • During the period ?, how many stays were there in room ? of hotel ?.

    Many people might do something simpler than the above and just have room: [hotel, room].join(delim). It's quite common to have a usage pattern where your lookup consists of a list of possible ANDs (like a AND b AND c) but, out of those, you may only want a, a AND b, or a AND b AND c, but not just b, or a AND c.

    Metric versus tags: Fight

    Starting out, you must define both a metric and a tag for a datapoint. A problem here is that all basic use cases involving AND/OR can be managed with either metrics or tags. When starting out with a single first use case, it's very much neither here nor there which one works best to use more of. It's not until you start piling on the use cases that things become apparent.

    You might say it's surely obvious when your queries are ten times bigger from using metrics (unless it turns out a metric secretly takes an array for multiple items like tag values do), or from it spamming rows, but it's not immediately obvious whether that will be the case, and the line between when to use metrics or tags is blurred, especially where performance is concerned.

    Yes, I have seen people using metrics in place of tags. It happens, probably quite often. Then the moment you need to add another access pattern it quickly becomes insufficient. You can easily add tags or stop populating them with little impact, but metrics have a lot of impact. This should be considered in any design: what will happen with tags versus metrics when different access patterns are needed?

    My view on the matter is to keep metrics quite shallow and use tags by default. By shallow, I mean whichever first set of ANDs you'll always want. As in (application = hotels AND type = stays) AND (a = 1 OR a = 2 AND b = 1): the initial static part in brackets might make a good metric, but the part that's very dynamic based on the use case should be tags.

    Another (obvious) rule of thumb: any data that's always isolated, as in never included together in a query, should use a separate metric. If it's separate data, use a separate metric.

    As I see it, by default more metrics is bad, but tags can build up until that's bad as well, and there's then probably a kind of to and fro between splitting things up a bit with metrics rather than tags; but generally the preference should be on tags, not metrics. I think you'll always have cases where it turns out that a portion of a metric should have been a tag, or a tag should have been a metric.

    It's technically possible to make an abstraction that can switch between the two approaches for performance testing. It's also technically possible to make a profiling mode which, given the appropriate usage patterns, indicates whether a tag appears to belong in a metric or vice versa (usually by identifying that, when reduced to metrics, there's no overlap, or that two metrics had to be used for otherwise the same query).

    I have never seen a database system where you can insert your use cases, i.e. rather than just inserting a = 1, b = 2, you insert a = 1 AND b = 2; a = 1; a IN (1), though that's probably out of the scope of this.

    The battle becomes even more epic when you pit group by against the other alternatives.

    Schema

    What is a row key? What do the indexes actually look like? Are there composite indexes? Are they ordered? Are they hashes? Is there a plan to expose left to right indexes?

    Rather than trying to explain it, it might be easier to just give the definitions used at the most concise level, i.e. in Cassandra's own syntax, for example.

    Buckets you say?

    This immediately stands out as it gives the impression that's all there is, as in you always end up fetching at least three weeks of data for a given metric and time range. I assume that's not actually the case, but if I were looking at this database at a glance and saw that, I'd quickly walk away if, for example, my use case consisted of a lot of small range lookups across a busy (populated with a lot of data) metric within a retention period of a fortnight.

    I would assume that in reality Cassandra provides a sparse array implementation and allows you to say you want from this column to that column? That raises a question, because sparse columns are just an abstraction. Usually they're backed by either a hash map or a tree structure (though in some cases simpler or more complex solutions). If it's a map, that tends to preclude the possibility of a range lookup.

    WHAT?

    Similar to a query but only returns the tags (no data points returned). This can potentially return more tags than a query because it is optimized for speed and does not query all rows to narrow down the time range. This queries only the Row Key Index and thus the time range is the starting time range. Since the Cassandra row is set to 3 weeks, this can return tags for up to a 3 week period. See Cassandra Schema.

    {"start_absolute": 1357023600000, "end_relative": {"value": "5", "unit": "days"},

    Kick the bucket, bad bucket.

    opened by joeyhub 12
  • Remove Genormous from unrelated classes (other than H2)

    Hello, Genormous is an old artifact and we have warnings reported about vulnerabilities in this library. It should only be required for H2... so we could remove this dependency in production.

    But some other classes, like AdminResource and QueryQueuingManager, use the Pair class from this library, which prevents removing it in production.

    opened by lcoulet 1
  • fix(sec): upgrade com.beust:jcommander to 1.75

    What happened?

    There is 1 security vulnerability found in com.beust:jcommander 1.35.

    What did I do?

    Upgraded com.beust:jcommander from 1.35 to 1.75 to fix the vulnerability.

    What did you expect to happen?

    Ideally, no insecure libs should be used.

    The specification of the pull request

    PR Specification from OSCS

    opened by TopScrew 0
  • Fix typo for cassandra auth

    There is a typo in the authSecret resource usage, leading to an error message like Error: YAML parse error on kairosdb/templates/deployment.yaml: error converting YAML to JSON: yaml: invalid map key: map[interface {}]interface {}{".Values.storage.cassandra.authSecret":interface {}(nil)}

    opened by rverma-nsl 0
  • Query Metric Tags does not respect relative time ranges

    The API for querying metric tags describes the possibility of using relative time ranges ("start_relative" and "end_relative").

    In our tests, these query properties are not being respected; the response to those queries always contains every value available. We're using KairosDB v1.3.0.

    quick demo

    Posting 2 datapoints, both have tags "m" and "h" with different values:

    • first at Thu Aug 25 2022 05:00:00 GMT+0000
    • second at Thu Aug 25 2022 06:00:00 GMT+0000
    [
      {
          "name": "queryMetricTags.bug",
          "datapoints": [
              [1661403600000, 1]
          ],
          "tags": {
              "m": "true",
              "h": "h1"
          }
      },
      {
          "name": "queryMetricTags.bug",
          "datapoints": [
              [1661407200000, 2]
          ],
          "tags": {
              "m": "false",
              "h": "h2"
          }
      }
    ]
    

    Querying the tag values, filtering for tag "m"=false, and setting a relative time range of 1 minute (both datapoints are a couple of hours old at the time of writing/querying):

    {
       "start_relative": {
           "value": "1",
           "unit": "minutes"
       },
       "metrics": [
           {
               "tags": {
                   "m": ["true"]
               },
               "name": "queryMetricTags.bug"
           }
       ]
    }
    

    The result contains the tag combination of the second datapoint, which is wrong -> it should be an empty result.

    {
      "queries": [
        {
          "results": [
            {
              "name": "queryMetricTags.bug",
              "tags": {
                "h": [
                  "h2"
                ],
                "m": [
                  "false"
                ]
              },
              "values": []
            }
          ]
        }
      ]
    }
    
    opened by sspieker 0
  • Data alignment in aggregation is not proper.

    If the timestamp of my data point is exactly at 1:45:00, and I do a 1-minute average aggregation aligned by end time,

    the same data point is moved to 1:46:00.

    Ideally this should not happen; it should stay at 1:45:00.

    opened by biswaKL 0