Mirror of Apache Storm

Overview

Storm is a distributed realtime computation system. Similar to how Hadoop provides a set of general primitives for doing batch processing, Storm provides a set of general primitives for doing realtime computation. Storm is simple, can be used with any programming language, is used by many companies, and is a lot of fun to use!

The Rationale page explains what Storm is and why it was built. This presentation is also a good introduction to the project.

Storm has a website at storm.apache.org. Follow @stormprocessor on Twitter for updates on the project.

Documentation

Documentation and tutorials can be found on the Storm website.

Developers and contributors should also take a look at our Developer documentation.

Getting help

NOTE: The Google Groups account [email protected] is now officially deprecated in favor of the Apache-hosted user/dev mailing lists.

Storm Users

Storm users should send messages and subscribe to [email protected].

You can subscribe to this list by sending an email to [email protected]. Likewise, you can cancel a subscription by sending an email to [email protected].

You can also browse the archives of the storm-user mailing list.

Storm Developers

Storm developers should send messages and subscribe to [email protected].

You can subscribe to this list by sending an email to [email protected]. Likewise, you can cancel a subscription by sending an email to [email protected].

You can also browse the archives of the storm-dev mailing list.

Storm developers who want to track the JIRA issues should subscribe to [email protected].

You can subscribe to this list by sending an email to [email protected]. Likewise, you can cancel a subscription by sending an email to [email protected].

You can view the archives of the mailing list here.

Issue tracker

To report a bug, request a feature, or propose an idea, please use Apache Jira.

Which list should I send/subscribe to?

If you are using a pre-built binary distribution of Storm, then chances are you should send questions, comments, Storm-related announcements, etc. to [email protected].

If you are building Storm from source, developing new features, or otherwise hacking on the Storm source code, then [email protected] is more appropriate.

If you are a committer or PMC member, or a contributor who wants to follow and take part in Storm development, then you should also subscribe to [email protected] in addition to [email protected].

What will happen with [email protected]?

All existing messages will remain archived there, and can be accessed/searched here.

New messages sent to [email protected] will either be rejected/bounced or replied to with a message to direct the email to the appropriate Apache-hosted group.

IRC

You can also come to the #storm-user room on freenode. You can usually find a Storm developer there to help you out.

License

Licensed to the Apache Software Foundation (ASF) under one or more contributor license agreements. See the NOTICE file distributed with this work for additional information regarding copyright ownership. The ASF licenses this file to you under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.

The LICENSE and NOTICE files cover the source distributions. The LICENSE-binary and NOTICE-binary files cover the binary distributions. The DEPENDENCY-LICENSES file lists the licenses of all dependencies of Storm, including those not packaged in the source or binary distributions, such as dependencies of optional connector modules.

Project lead

Committers

Acknowledgements

YourKit is kindly supporting open source projects with its full-featured Java Profiler. YourKit, LLC is the creator of innovative and intelligent tools for profiling Java and .NET applications. Take a look at YourKit's leading software products: YourKit Java Profiler and YourKit .NET Profiler.

Comments
  • STORM-822: Kafka Spout New Consumer API

    This patch is still under development and was uploaded at this moment for early testing. Please read README.

    There may be a bug in the offsets management, because of a diff of 1. I am looking into it. Currently polling from an arbitrary offset is not possible, but it will come in the next patch, today or tomorrow. I refactored the code a bit and left, maybe, some unnecessary locking. I am also looking into it.

    @connieyang @jianbzhou @tgravescs please let me know of any other requirements you may have and I will address them soon.

    Thanks

    opened by hmcl 82
  • STORM-329 : buffer message in client and reconnect remote server async

    Description can be found at https://issues.apache.org/jira/browse/STORM-329

    The main changes are:

    1. Set a client buffer limit. If the buffered messages exceed the limit, the sender waits until there is enough space for the newly arriving message.
    2. If we lose the connection to a remote worker, reconnection happens asynchronously and does not block other workers.
    3. Cached messages are tracked by a timer. If a message has been cached for longer than "topology.message.timeout.secs", it is dropped.
    opened by tedxia 76
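
The buffering rules in items 1 and 3 of the STORM-329 description above can be sketched in Java roughly as follows. This is a minimal illustration under stated assumptions, not the patch's actual Netty client code: the class name `BoundedMessageBuffer`, its methods, and the explicit `nowMillis` parameter are all hypothetical, and the asynchronous reconnect from item 2 is not modelled.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Hypothetical sketch (not Storm's Netty client): a bounded buffer whose entries expire. */
final class BoundedMessageBuffer {
    /** A buffered payload together with the time it entered the buffer. */
    private static final class Entry {
        final byte[] payload;
        final long enqueuedAtMillis;

        Entry(byte[] payload, long enqueuedAtMillis) {
            this.payload = payload;
            this.enqueuedAtMillis = enqueuedAtMillis;
        }
    }

    private final Deque<Entry> entries = new ArrayDeque<>();
    private final int capacity;
    private final long timeoutMillis;

    BoundedMessageBuffer(int capacity, long timeoutMillis) {
        if (capacity <= 0) {
            throw new IllegalArgumentException("capacity must be positive");
        }
        this.capacity = capacity;
        this.timeoutMillis = timeoutMillis;
    }

    /** Item 1: blocks while the buffer is full, then appends the message. */
    synchronized void put(byte[] payload, long nowMillis) throws InterruptedException {
        while (entries.size() >= capacity) {
            wait(); // woken up when poll() or dropExpired() frees space
        }
        entries.addLast(new Entry(payload, nowMillis));
    }

    /** Item 3: drops messages buffered longer than the timeout; returns how many were dropped. */
    synchronized int dropExpired(long nowMillis) {
        int dropped = 0;
        while (!entries.isEmpty()
                && nowMillis - entries.peekFirst().enqueuedAtMillis > timeoutMillis) {
            entries.removeFirst();
            dropped++;
        }
        if (dropped > 0) {
            notifyAll(); // space was freed, so blocked producers may continue
        }
        return dropped;
    }

    /** Removes and returns the oldest message, or null if the buffer is empty. */
    synchronized byte[] poll() {
        Entry oldest = entries.pollFirst();
        if (oldest == null) {
            return null;
        }
        notifyAll();
        return oldest.payload;
    }

    synchronized int size() {
        return entries.size();
    }
}
```

In such a design, a caller would pass the value of "topology.message.timeout.secs" converted to milliseconds as `timeoutMillis` and invoke `dropExpired` from a periodic timer.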
  • [STORM-2693] Heartbeats and assignments promotion for storm2.0

    This change targets large-cluster support: good performance for scheduling and assignment distribution.

    Nimbus heartbeat pressure test, heartbeat response time: (chart not preserved). The horizontal axis is the number of nodes/supervisors that report heartbeats every 1 second. In our production cluster the heartbeat reporting interval is 5 seconds and the timeout is 30 seconds, so the new heartbeat mode supports at least 5 * 2000 nodes.

    Topology submission scheduling time:

    cluster used slots | newly submitted workers | used slots after submission | time cost (ms)
    -- | -- | -- | --
    3700 | 100 | 3800 | 1181
    3600 | 100 | 3700 | 886
    3500 | 100 | 3600 | 925
    3400 | 100 | 3500 | 1520
    3300 | 100 | 3400 | 930
    2000 | 100 | 2100 | 722
    1500 | 100 | 1600 | 747
    1000 | 100 | 1100 | 754
    500 | 100 | 600 | 695
    100 | 100 | 200 | 749

    This is PR for 1.1.x-branch of heartbeats and assignments promotion #2389

    This is PR for 1.1.x-branch of assignments promotion [closed now] #2319

    opened by danny0405 75
  • STORM-2306 : Messaging subsystem redesign.

    Having spent a lot of time on this, I am happy to share some good news and some even better news with you.

    Before venturing further, I must add, to limit the scope of this PR, no attempt was made to improve ACK-ing mode perf. Although there are some big latency gains seen in ACK mode, these are a side effect of the new messaging design and work remains to be done to improve ACK mode.

    Please see the design docs posted on the STORM-2306 jira for details on what is being done

    So, first the good news .. a quick competitive evaluation:

    1) Competitive Perf evaluation :

    Here is a quick comparison of Storm numbers taken on my laptop against numbers for similar/identical topologies published by Heron, Flink and Apex. I shall provide just a rolled-up summary here and leave the detailed analysis for later.

    Storm numbers here were run on my MacBook Pro (2015) with 16GB ram and a single 4 core Intel i7 chip.

    A) Compared To Heron and Flink:


    Heron recently published this blog about the big perf improvements (~4-6x) they achieved. https://blog.twitter.com/engineering/en_us/topics/open-source/2017/optimizing-twitter-heron.html They ran it on dual 12-core Intel Xeon chips (didn't say how many machines).

    They use a simplified word count topology that I have emulated for comparison purposes and included it as part of this PR (SimplifiedWordCountTopo).

    Flink also publishes numbers for a similar setup here https://flink.apache.org/features.html#streaming

    Below are per core throughput numbers.

    [:HERON:] Acking Disabled: per core = ~475 k/sec. Acking Enabled: per core = ~150 k/sec. Latency = 30ms

    [:FLINK:] Per core: ~1 mill/sec

    [:STORM:] Acking Disabled: per core = 2 mill/sec. (1 spout & 1 counter bolt) Acking Enabled: per core = 0.6 mill/sec, Latency = 0.73 ms (+1 acker)

    Takeaways:

    • Storm's with-ACK throughput is better than Heron's no-ACK throughput.
    • Without ACKs, Storm is far ahead of Heron and also better than Flink.
    • Storm's latency (in microseconds) is also significantly better than both (although this metric is better compared with multiple machines in the run). AFAICT, Flink is generally not known for having good latency (as per Flink's own comparison with Storm - https://data-artisans.com/blog/high-throughput-low-latency-and-exactly-once-stream-processing-with-apache-flink).

    B) Compared to Apex:


    Apex appears to be the best performer among the opensource lot.. by a reasonably good margin. Some numbers they published in their early days (when it was called DataTorrent) were misleading/dubious IMO, but the newer numbers appear credible.

    Here we look at how fast inter spout/bolt communication can be achieved using an ultra minimalist topology. A ConstSpout emits a short string to a DevNull bolt that discards the tuples it receives. This topo has been in storm-perf for some time now.

    Apex provides numbers for an identical setup ... what they call "container local" performance here: https://www.datatorrent.com/blog/blog-apex-performance-benchmark/

    Other than the fact that Storm numbers were on my laptop, these numbers are a good apples to apples comparison.

    [:APEX:] Container local Throughput : ~4.2 mill/sec

    [:STORM:] Worker local throughput : 8.1 mill/sec

    2) Core messaging Performance

    Now for the better news. The redesigned messaging system is actually much faster and able to move messages between threads at an astounding rate .... :

    • 120 mill/sec (batchSz=1000, 2 producers writing to 1 consumer).
    • 67 mill/sec (batchSz=1000, 1 producer writing to 1 consumer).

    I have included JCQueuePerfTest.java in this PR to help get quick measurements from within the IDE.

    That naturally begs the question .. why is Storm pushing only 8.1 mill/sec between a ConstSpout and DevNullBolt ? The short answer is ... there are big bottlenecks in other parts of the code. In this PR I have tackled some such bottlenecks but many still remain. We are faster than the competition, but still have room to be much much faster. If anyone is interested in pursuing these to push Storm's perf to the next level, I am happy to point them in the right direction.

    Again, please refer to the design docs in the JIRA for details on the new design and the rationale behind them.

    opened by roshannaik 71
  • STORM-297 Storm Performance cannot be scaled up by adding more CPU cores

    STORM-297

    Description and test report can be found at https://issues.apache.org/jira/browse/STORM-297

    The changes consist of:

    1. use netty async
    2. use batch send and batch receiver messaging api
    3. allow to configure multiple worker receiver threads.
    4. name the executor and netty threads
    opened by clockfly 62
  • STORM-205. Add REST API to Storm UI.

    1. removed html rendering from core.clj
    2. added RESTful apis to core.clj
    3. moved html rendering to jquery & mustache on top of newly added restful apis
    4. added api
      1. /api/cluster/configuration
      2. /api/cluster/summary
      3. /api/supervisor/summary
      4. /api/topology/summary
      5. /api/topology/:id
      6. /api/topology/:id/component/:component
      7. /api/topology/:id/activate (POST)
      8. /api/topology/:id/deactivate (POST)
      9. /api/topology/:id/rebalance/:wait-time (POST)
      10. /api/topology/:id/kill/:wait-time (POST)

    All of the above methods return a JSON response.
    opened by harshach 44
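
As a rough illustration of how a client might address the endpoints listed in STORM-205, the sketch below builds request URIs in Java. Only the paths come from the list above. The class `StormUiApi`, its method names, and the host and port used in the usage note are assumptions made for illustration, and no request is actually sent.

```java
import java.net.URI;

/** Hypothetical helper that builds URIs for the Storm UI REST endpoints listed above. */
final class StormUiApi {
    private final URI base;

    /** host and port identify the Storm UI server; both are deployment-specific. */
    StormUiApi(String host, int port) {
        this.base = URI.create("http://" + host + ":" + port);
    }

    // Read-only endpoints.
    URI clusterConfiguration() { return base.resolve("/api/cluster/configuration"); }
    URI clusterSummary()       { return base.resolve("/api/cluster/summary"); }
    URI supervisorSummary()    { return base.resolve("/api/supervisor/summary"); }
    URI topologySummary()      { return base.resolve("/api/topology/summary"); }

    URI topology(String id) {
        return base.resolve("/api/topology/" + id);
    }

    URI component(String id, String component) {
        return base.resolve("/api/topology/" + id + "/component/" + component);
    }

    // The endpoints below are marked (POST) in the list above.
    URI activate(String id)   { return base.resolve("/api/topology/" + id + "/activate"); }
    URI deactivate(String id) { return base.resolve("/api/topology/" + id + "/deactivate"); }

    URI rebalance(String id, int waitTimeSecs) {
        return base.resolve("/api/topology/" + id + "/rebalance/" + waitTimeSecs);
    }

    URI kill(String id, int waitTimeSecs) {
        return base.resolve("/api/topology/" + id + "/kill/" + waitTimeSecs);
    }
}
```

For example, `new StormUiApi("localhost", 8080).clusterSummary()` yields `http://localhost:8080/api/cluster/summary`, assuming a UI server reachable at that (hypothetical) address. The resulting URI can be handed to any HTTP client, and the response parsed as JSON.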
  • STORM-2018: Supervisor V2.

    Still needs run as user and CGroup work, but the rest is working

    Any feedback on this would be welcome. I am particularly interested in the readability of the code and how easy it is to understand compared with the original supervisor.

    All the unit tests pass and I have done a lot of manual testing switching back and forth between the original supervisor and supervisor V2 to ensure that this can be a rolling upgrade.

    There are still a number of TODOs in the code. Most of them are for removing the original supervisor code and cleaning up the result of that.

    opened by revans2 42
  • STORM-329: fix cascading Storm failure by improving reconnection strategy and buffering messages

    This PR contains the same code as https://github.com/apache/storm/pull/428 but as a single commit for a cleaner commit history of our Storm repo.

    This is an improved version of the original pull request discussed at https://github.com/apache/storm/pull/268. Please refer to the discussion in the link above.

    The changes of this pull request include:

    • Most importantly, we fix a bug in Storm that may cause a cascading failure in a Storm cluster, to the point where the whole cluster becomes unusable. This is achieved by the work described in the next bullet points.
    • We refactor and improve the Netty messaging backend, notably the client.
    • During the initial startup of a topology, Storm will now wait until worker (Netty) connections are ready for operation. See the original discussion thread for the detailed explanation and justification of this change.

    @clockfly, @tedxia: Please add any further comments to STORM-329 to this pull request, if possible.

    opened by miguno 39
  • Storm-166: Nimbus HA design doc and implementation.

    I have deleted the BitTorrent implementation from this pull request because the only available BitTorrent library does not support trackerless torrents. Without trackerless torrents, a single tracker becomes a single point of failure, and a multi-tracker implementation requires that, if a tracker host fails, the replacement host has the same DNS/network configuration.

    Some manual tests I executed:

    • start 3 nimbuses, test simple word count topology works. try storm list/activate/deactivate/rebalance/kill from ui and CLI.
    • set the replication factor to 2 run the first test again.
    • bring up a new nimbus, ensure it catches up and competes for leader lock.
    • with 3 nimbuses and 2 topologies, delete one topology code from each non leader nimbus. After killing master nimbus, ensure one of them eventually becomes leader.
    opened by Parth-Brahmbhatt 37
  • STORM-2016 Topology submission improvement: support adding local jars and maven artifacts on submission (1.x)

    • JIRA issue: http://issues.apache.org/jira/browse/STORM-2016
    • design doc: https://cwiki.apache.org/confluence/display/STORM/A.+Design+doc%3A+adding+jars+and+maven+artifacts+at+submission
    • discussion thread: http://mail-archives.apache.org/mod_mbox/storm-dev/201608.mbox/%3CCAF5108i9+tJaNZ0LgRkTMkVQEL7F+53k9uyzxcT6zhSU6OHx9Q@mail.gmail.com%3E

    • bin/storm now supports "--jars" and "--packages" options
      • it's only effective with "storm jar" and "storm sql"
    • introduce new module: storm-submit to help resolve dependencies, including transitive dependencies
    • StormSubmitter will upload dependencies to BlobStore when submitting a topology
    • Supervisor will download dependencies from BlobStore when such a topology is assigned
    • Supervisor will launch workers with the downloaded dependencies added to the worker classpath

    TODO

    • [x] documentation
    • [x] craft pull request against master branch

    Btw, it might be better to move some modules out of 'external', since 'external' holds most of the non-storm-core modules, and flux, sql, storm-submit, and storm-kafka-monitor are not really connectors.

    opened by HeartSaVioR 36
  • [STORM-1057] Add throughput metrics to spouts/bolts and display them on web ui

    Throughput figures for the spouts and bolts can help users identify performance bottlenecks and detect load-balancing issues. In this PR, I measure the throughput of the executors and display it on the web UI.

    Summary of Changes

    1. Take throughput measurements on the spouts and bolts;
    2. Add throughput to ExecutorStats;
    3. Display the throughputs on web UI.

    Note: If you cannot see the throughputs on your web UI, please clear your browser cache and try again.

    opened by wangli1426 36
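
The kind of measurement described in STORM-1057 can be illustrated with a small windowed counter. The sketch below is an assumption about one simple way to compute throughput, not the code from the PR: the class `ThroughputMeter` and its methods are hypothetical, and timestamps are passed in explicitly to keep the example deterministic.

```java
/** Hypothetical sketch: counts tuples and reports tuples per second for each window. */
final class ThroughputMeter {
    private long count;
    private long windowStartNanos;

    ThroughputMeter(long startNanos) {
        this.windowStartNanos = startNanos;
    }

    /** Records that n tuples were processed in the current window. */
    synchronized void mark(long n) {
        count += n;
    }

    /** Returns tuples per second since the previous snapshot and starts a new window. */
    synchronized double snapshot(long nowNanos) {
        long elapsedNanos = nowNanos - windowStartNanos;
        double tuplesPerSecond = elapsedNanos <= 0 ? 0.0 : count * 1e9 / elapsedNanos;
        count = 0;
        windowStartNanos = nowNanos;
        return tuplesPerSecond;
    }
}
```

An executor could call `mark` for each tuple it emits or processes and publish the value returned by `snapshot(System.nanoTime())` together with its other stats at each reporting interval.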
  • STORM-3888 HdfsBlobStoreFile set wrong permission for file

        public OutputStream getOutputStream() throws IOException {
            FsPermission fileperms = new FsPermission(BLOBSTORE_FILE_PERMISSION);
            try {
                out = fileSystem.create(path, (short) this.getMetadata().get_replication_factor());
                fileSystem.setPermission(path, fileperms);
                fileSystem.setReplication(path, (short) this.getMetadata().get_replication_factor());
            } catch (IOException e) {
                ......
                out = fileSystem.create(path, (short) this.getMetadata().get_replication_factor());
                // here the file receives dirperms instead of fileperms
                fileSystem.setPermission(path, dirperms);
                fileSystem.setReplication(path, (short) this.getMetadata().get_replication_factor());
            }
            ......
        }
    

    Both the try and the catch branches set permissions on path, but they differ: the catch branch applies dirperms to the file. I think this is a problem. The permissions set in catch should be consistent with those set in try, as in the following code:

    
        public OutputStream getOutputStream() throws IOException {
            FsPermission fileperms = new FsPermission(BLOBSTORE_FILE_PERMISSION);
            try {
                out = fileSystem.create(path, (short) this.getMetadata().get_replication_factor());
                fileSystem.setPermission(path, fileperms);
                fileSystem.setReplication(path, (short) this.getMetadata().get_replication_factor());
            } catch (IOException e) {
                ......
                out = fileSystem.create(path, (short) this.getMetadata().get_replication_factor());
                // same file permissions as in the try branch
                fileSystem.setPermission(path, fileperms);
                fileSystem.setReplication(path, (short) this.getMetadata().get_replication_factor());
            }
            ......
        }
    
    opened by skysiders 0
  • [STORM-3885] fix(sec): upgrade com.fasterxml.jackson.core:jackson-databind to 2.12.6.1

    What happened?

    There are 2 security vulnerabilities found in com.fasterxml.jackson.core:jackson-databind 2.10.5.1

    What did I do?

    Upgrade com.fasterxml.jackson.core:jackson-databind from 2.10.5.1 to 2.12.6.1 for vulnerability fix

    What did you expect to happen?

    Ideally, no insecure libs should be used.

    The specification of the pull request

    PR Specification from OSCS

    opened by denglunfuren 2
  • fix(sec): upgrade com.google.guava:guava to 30.0-jre

    What happened?

    There are 2 security vulnerabilities found in com.google.guava:guava 17.0

    What did I do?

    Upgrade com.google.guava:guava from 17.0 to 30.0-jre for vulnerability fix

    What did you expect to happen?

    Ideally, no insecure libs should be used.

    The specification of the pull request

    PR Specification from OSCS

    opened by claire9910 1
  • fix(sec): upgrade org.elasticsearch:elasticsearch to 6.8.17

    opened by zhoumengyks 1
  • fix(sec): upgrade org.fusesource.mqtt-client:mqtt-client to 1.15

    What happened?

    There is 1 security vulnerability found in org.fusesource.mqtt-client:mqtt-client 1.10

    What did I do?

    Upgrade org.fusesource.mqtt-client:mqtt-client from 1.10 to 1.15 for vulnerability fix

    What did you expect to happen?

    Ideally, no insecure libs should be used.

    The specification of the pull request

    PR Specification from OSCS

    opened by zhoumengyks 0
Owner
The Apache Software Foundation
Apache ZooKeeper

Apache ZooKeeper For the latest information about Apache ZooKeeper, please visit our website at: https://zookeeper.apache.org and our wiki, at: https:

The Apache Software Foundation 11k Jan 6, 2023
Apache Mesos is a cluster manager that provides efficient resource isolation and sharing across distributed applications, or frameworks

Apache Mesos is a cluster manager that provides efficient resource isolation and sharing across distributed applications, or frameworks. It can run Hadoop, Jenkins, Spark, Aurora, and other frameworks on a dynamically shared pool of nodes.

The Apache Software Foundation 5k Dec 31, 2022
Mirror of Apache Storm

Master Branch: Storm is a distributed realtime computation system. Similar to how Hadoop provides a set of general primitives for doing batch processi

The Apache Software Foundation 6.4k Jan 3, 2023
A scalable, mature and versatile web crawler based on Apache Storm

StormCrawler is an open source collection of resources for building low-latency, scalable web crawlers on Apache Storm. It is provided under Apache Li

DigitalPebble Ltd 776 Jan 2, 2023
Storm - a fast, easy to use, no-bullshit opinionated Java ORM inspired by Doctrine

A stupidly simple Java/MySQL ORM with native Hikaricp and Redis cache support supporting MariaDB and Sqlite

Mats 18 Dec 1, 2022
Mirror of Apache Deltaspike

Apache DeltaSpike Documentation Mailing Lists Contribution Guide JIRA Apache License v2.0 Apache DeltaSpike is a suite of portable CDI Extensions inte

The Apache Software Foundation 141 Jan 1, 2023
Mirror of Apache Mahout

Welcome to Apache Mahout! The goal of the Apache Mahout™ project is to build an environment for quickly creating scalable, performant machine learning

The Apache Software Foundation 2k Jan 4, 2023
Mirror of Apache Kafka

Apache Kafka See our web site for details on the project. You need to have Java installed. We build and test Apache Kafka with Java 8, 11 and 15. We s

The Apache Software Foundation 23.9k Jan 5, 2023
Mirror of Apache RocketMQ

Apache RocketMQ Apache RocketMQ is a distributed messaging and streaming platform with low latency, high performance and reliability, trillion-level c

The Apache Software Foundation 18.5k Dec 28, 2022
Mirror of Apache ActiveMQ

Welcome to Apache ActiveMQ Apache ActiveMQ is a high performance Apache 2.0 licensed Message Broker and JMS 1.1 implementation. Getting Started To hel

The Apache Software Foundation 2.1k Jan 2, 2023
Mirror of Apache ActiveMQ Artemis

ActiveMQ Artemis This file describes some minimum 'stuff one needs to know' to get started coding in this project. Source For details about the modify

The Apache Software Foundation 824 Dec 26, 2022
Mirror of Apache SIS

============================================= Welcome to Apache SIS <http://sis.apache.org> ============================================= SIS is a Ja

The Apache Software Foundation 81 Dec 26, 2022
Mirror of Apache Cassandra

Apache Cassandra Apache Cassandra is a highly-scalable partitioned row store. Rows are organized into tables with a required primary key. Partitioning

The Apache Software Foundation 7.7k Jan 1, 2023
Mirror of Apache SystemML

Apache SystemDS Overview: SystemDS is a versatile system for the end-to-end data science lifecycle from data integration, cleaning, and feature engine

The Apache Software Foundation 940 Dec 25, 2022
Real-time Query for Hadoop; mirror of Apache Impala

Welcome to Impala Lightning-fast, distributed SQL queries for petabytes of data stored in Apache Hadoop clusters. Impala is a modern, massively-distri

Cloudera 27 Dec 28, 2022
Mirror of Apache Qpid

We have moved to using individual Git repositories for the Apache Qpid components and you should look to those for new development. This Subversion re

The Apache Software Foundation 125 Dec 29, 2022
Mirror of Apache Velocity Engine

Title: Apache Velocity Engine Apache Velocity Welcome to Apache Velocity Engine! Apache Velocity is a general purpose template engine written in Java.

The Apache Software Foundation 298 Dec 22, 2022
Now redundant weka mirror. Visit https://github.com/Waikato/weka-trunk for the real deal

weka (mirror) Computing and Mathematical Sciences at the University of Waikato now has an official github organization including a read-only git mirro

Benjamin Petersen 313 Dec 16, 2022
Bouncy Castle Java Distribution (Mirror)

The Bouncy Castle Crypto Package For Java The Bouncy Castle Crypto package is a Java implementation of cryptographic algorithms, it was developed by t

Legion of the Bouncy Castle Inc 1.8k Dec 30, 2022