A reactive Java framework for building fault-tolerant distributed systems

Overview


Atomix

Website | Javadoc | Slack | Google Group



Please see the website for full documentation.

Atomix 3.0 is a fully featured framework for building fault-tolerant distributed systems. It provides a set of high-level primitives commonly needed to build scalable, fault-tolerant systems. These primitives include the following; a brief usage sketch follows the list:

  • Cluster management and failure detection
  • Direct and publish-subscribe messaging
  • Distributed coordination primitives built on a novel implementation of the Raft consensus protocol
  • Scalable data primitives built on a multi-primary protocol
  • Synchronous and asynchronous Java APIs
  • Standalone agent
  • REST API
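
A minimal usage sketch, assuming the Atomix 3.x builder API; the member ids, addresses, and data-grid profile below are placeholders, and a real deployment would configure discovery and partition groups to match its environment:

    import io.atomix.cluster.Node;
    import io.atomix.cluster.discovery.BootstrapDiscoveryProvider;
    import io.atomix.core.Atomix;
    import io.atomix.core.map.AtomicMap;
    import io.atomix.core.profile.Profile;

    public class Example {
      public static void main(String[] args) {
        // Placeholder member id and addresses for a three-node cluster.
        Atomix atomix = Atomix.builder()
            .withMemberId("member-1")
            .withAddress("10.192.19.181:5679")
            .withMembershipProvider(BootstrapDiscoveryProvider.builder()
                .withNodes(
                    Node.builder().withId("member-1").withAddress("10.192.19.181:5679").build(),
                    Node.builder().withId("member-2").withAddress("10.192.19.182:5679").build(),
                    Node.builder().withId("member-3").withAddress("10.192.19.183:5679").build())
                .build())
            // The data-grid profile is the simplest way to get started; production
            // deployments typically configure management and partition groups explicitly.
            .withProfiles(Profile.dataGrid())
            .build();

        // Blocks until the cluster has formed.
        atomix.start().join();

        // Use a distributed primitive.
        AtomicMap<String, String> map = atomix.<String, String>atomicMapBuilder("my-map").build();
        map.put("foo", "Hello world!");
      }
    }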

Acknowledgements

Atomix is developed as part of the ONOS project at the Open Networking Foundation. The Atomix project thanks ONF for its ongoing support!



YourKit supports open source projects with its full-featured Java Profiler. YourKit, LLC is the creator of YourKit Java Profiler and YourKit .NET Profiler, innovative and intelligent tools for profiling Java and .NET applications.


Comments
  • failed to join the cluster on example

    failed to join the cluster on example

    When running the GitHub example we're getting this error.

    java -jar examples/leader-election/target/atomix-leader-election.jar logs/server3 localhost:5002 localhost:5000 localhost:5001
    23:41:37.336 [copycat-server-localhost/127.0.0.1:5002] INFO i.a.c.server.state.ServerContext - Server started successfully!
    Exception in thread "main" java.util.concurrent.CompletionException: java.lang.IllegalStateException: failed to join the cluster
            at java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:292)
            at java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:308)
            at java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:769)
            at java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:736)
            at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474)
            at java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:1969)
            at io.atomix.copycat.server.state.ServerState.join(ServerState.java:597)
            at io.atomix.copycat.server.state.ServerState.lambda$join$57(ServerState.java:591)
            at io.atomix.copycat.server.state.ServerState$$Lambda$40/812535648.accept(Unknown Source)
            at java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:760)
            at java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:736)
            at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474)
            at java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:1969)
            at io.atomix.catalyst.transport.NettyConnection.lambda$handleResponseFailure$5(NettyConnection.java:172)
            at io.atomix.catalyst.transport.NettyConnection$$Lambda$49/129084479.run(Unknown Source)
            at io.atomix.catalyst.util.concurrent.Runnables.lambda$logFailure$12(Runnables.java:20)
            at io.atomix.catalyst.util.concurrent.Runnables$$Lambda$11/399534175.run(Unknown Source)
            at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
            at java.util.concurrent.FutureTask.run(FutureTask.java:266)
            at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
            at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
            at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
            at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
            at java.lang.Thread.run(Thread.java:745)
    Caused by: java.lang.IllegalStateException: failed to join the cluster

    opened by hvandenb 28
  • Adding members after cluster initialization?

    Adding members after cluster initialization?

    Let's say I start up a single node that only defines itself as a member. If I start up a second node that lists the first as a member, will the first node accept it? This wasn't clear after reading through the docs.

    Or do all members need to be initialized with exactly the same member list in order to start up?

    Also, two more quick things.

    1. The snapshot isn't deployed to Maven Central; please run a mvn deploy to publish it.
    2. I'm thinking about using this to replicate a SQLite database. Would the file event log be my best bet, i.e. writing the INSERTs and such to the file log? Or would I have to use the state machine resource instead?

    Thanks.

    edit: I was finally able to install this locally by doing a git clone and running mvn clean install -DskipTests, because there were a lot of test failures.

    opened by dessalines 24
  • Improve log compaction/snapshot timing and implement a hard limit on compaction

    Improve log compaction/snapshot timing and implement a hard limit on compaction

    This PR is a significant refactoring of how/when log compaction occurs in the Raft cluster. It implements three conditions for taking snapshots and compacting logs:

    • When the load on a Raft node is low it takes snapshots and compacts logs whenever possible and at a somewhat leisurely pace
    • When the load on a Raft node is high but the node is running out of disk space, it takes snapshots and compacts logs ASAP, hopefully before hitting the hard limit
    • When a Raft node doesn't have enough disk space to allocate any more full segments, it stops writes altogether and blocks until there's enough disk space

    To determine whether a node is running out of disk space, disk usage is sampled periodically to estimate the rate at which the disk is being consumed. Monitoring the total available disk space, rather than the rate at which the log is growing, ensures that parallel nodes and other disk-using processes running on the same machine are taken into account.

    To block writes to the cluster when the disk fills up, the JournalWriter will throw an exception when the journal needs to roll over to a new segment but the segment cannot be allocated according to the configured maximum segment size. When this condition occurs, writes to the leader synchronously await log compaction, and AppendRequests to replicate entries to followers are rejected.
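
    As a purely illustrative sketch (all names here are hypothetical, not the actual Atomix classes), the disk-pressure check described above amounts to sampling free space on a timer and compacting early when the observed consumption rate would hit the hard limit:

        import java.io.File;
        import java.util.concurrent.ScheduledExecutorService;
        import java.util.concurrent.TimeUnit;

        // Hypothetical illustration: sample free disk space periodically, estimate the
        // rate at which the disk is being consumed, and trigger compaction before the
        // configured hard limit is reached. Not the actual Atomix implementation.
        public class DiskUsageMonitor {
          private final File storageDirectory;
          private final long hardLimitBytes;       // below this free space, writes must block
          private final long sampleIntervalMillis;
          private final Runnable compact;          // callback that snapshots and compacts the log

          private long lastFreeBytes = -1;

          public DiskUsageMonitor(File storageDirectory, long hardLimitBytes,
                                  long sampleIntervalMillis, Runnable compact) {
            this.storageDirectory = storageDirectory;
            this.hardLimitBytes = hardLimitBytes;
            this.sampleIntervalMillis = sampleIntervalMillis;
            this.compact = compact;
          }

          public void start(ScheduledExecutorService scheduler) {
            scheduler.scheduleAtFixedRate(this::sample, 0, sampleIntervalMillis, TimeUnit.MILLISECONDS);
          }

          private void sample() {
            // Total free space is monitored, so other processes on the machine count too.
            long freeBytes = storageDirectory.getUsableSpace();
            if (lastFreeBytes >= 0) {
              long consumedPerInterval = Math.max(0, lastFreeBytes - freeBytes);
              // If the current consumption rate would hit the hard limit within a few
              // intervals, compact aggressively rather than waiting for an idle period.
              if (freeBytes - 3 * consumedPerInterval < hardLimitBytes) {
                compact.run();
              }
            }
            lastFreeBytes = freeBytes;
          }
        }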

    opened by kuujo 19
  • Accept raft role listeners at raft partition server

    Accept raft role listeners at raft partition server

    This feature allows the user to add raft role change listeners to the RaftPartitionGroup.

    It would be nice if we could have that. We have a use case where we need to know who the leader is and to listen for role changes (whether we become the leader or step down). We want to start processing on leader nodes and stop processing on followers.
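
    A rough sketch of the use case; the registration hook is the feature being requested here, so the enum and the listener wiring below are purely hypothetical:

        import java.util.function.Consumer;

        // Hypothetical sketch: start work when this node becomes the Raft leader and
        // stop it when the node steps down. The Role enum and the way the listener is
        // registered (e.g. something like partitionGroup.addRoleChangeListener(...))
        // are illustrative only, not an existing Atomix API.
        public class LeaderAwareProcessor {
          public enum Role { FOLLOWER, CANDIDATE, LEADER }

          private volatile boolean processing;

          public Consumer<Role> roleListener() {
            return role -> {
              if (role == Role.LEADER && !processing) {
                processing = true;
                startProcessing();   // begin work that only the leader should do
              } else if (role != Role.LEADER && processing) {
                processing = false;
                stopProcessing();    // halt work after stepping down
              }
            };
          }

          private void startProcessing() { /* ... */ }

          private void stopProcessing() { /* ... */ }
        }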

    opened by Zelldon 15
  • ReferencePool serialization exception

    ReferencePool serialization exception

    Hi, my Copycat version is 0.6.SNAPSHOT. I'm trying to start two servers on the same machine, from two different directories and on two different ports, using these commands:

        server1: c:\JDK64\1.8.0.45\bin\java -jar copycat-server.jar 1:my_local_host:8080 2:my_local_host:8090
        server2: c:\JDK64\1.8.0.45\bin\java -jar copycat-server.jar 1:my_local_host:8090 2:my_local_host:8080

    Got this exception

    15/08/31 09:23:24 ERROR concurrent.SingleThreadContext: An uncaught exception occurred
    net.kuujo.copycat.io.serializer.SerializationException: failed to serialize Java object
            at net.kuujo.copycat.io.serializer.Serializer.writeSerializable(Serializer.java:625)
            at net.kuujo.copycat.io.serializer.Serializer.writeObject(Serializer.java:549)
            at net.kuujo.copycat.io.transport.NettyConnection.writeRequest(NettyConnection.java:214)
            at net.kuujo.copycat.io.transport.NettyConnection.lambda$send$6(NettyConnection.java:288)
            at net.kuujo.copycat.io.transport.NettyConnection$$Lambda$42/1838865044.run(Unknown Source)
            at net.kuujo.copycat.util.concurrent.SingleThreadContext$1.lambda$execute$9(SingleThreadContext.java:28)
            at net.kuujo.copycat.util.concurrent.SingleThreadContext$1$$Lambda$6/1018547642.run(Unknown Source)
            at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
            at java.util.concurrent.FutureTask.run(FutureTask.java:266)
            at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
            at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
            at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
            at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
            at java.lang.Thread.run(Thread.java:745)
    Caused by: java.io.NotSerializableException: net.kuujo.copycat.util.ReferencePool
            at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1184)
            at java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1548)
            at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1509)
            at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1432)
            at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1178)
            at java.io.ObjectOutputStream.writeObject(ObjectOutputStream.java:348)
            at net.kuujo.copycat.io.serializer.Serializer.writeSerializable(Serializer.java:620)
            ... 13 more

    opened by mikkazan 15
  • ResourceFactories and Segmented ClassLoaders

    ResourceFactories and Segmented ClassLoaders

    With the recent changes that introduce ResourceFactories, I'm having trouble creating a custom resource. The custom resource and its classes are defined in bundle A and the Atomix classes are in bundle B. They don't share class loaders.

    As a result, the deserialization code in https://github.com/atomix/atomix/blob/master/resource/src/main/java/io/atomix/resource/ResourceType.java#L83 fails with a ClassNotFoundException.

    @kuujo: Is it possible to make this compatible with an OSGi environment where there are multiple class loaders?
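
    For illustration, a common workaround in multi-class-loader environments is to resolve classes against an explicitly supplied loader (for example the loader of the bundle that defines the resource), falling back to the thread context class loader. This is plain Java, not the Atomix ResourceType API:

        // Generic illustration of deserialization-time class resolution across bundles.
        public final class ClassResolver {
          private ClassResolver() {
          }

          public static Class<?> resolve(String className, ClassLoader resourceLoader)
              throws ClassNotFoundException {
            try {
              // Try the caller-supplied loader first (the bundle that defines the resource).
              return Class.forName(className, false, resourceLoader);
            } catch (ClassNotFoundException e) {
              // Fall back to the thread context class loader, which containers often set.
              return Class.forName(className, false, Thread.currentThread().getContextClassLoader());
            }
          }
        }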

    enhancement 
    opened by madjam 13
  • Fix node catch up after log compaction

    Fix node catch up after log compaction

    Hey,

    I hope this PR conforms to your contribution guidelines; unfortunately, I couldn't find them.

    This PR fixes the problem described in #978 (it might be more of a quick fix; you may want to take a look at why this segment is closed twice). I was able to reproduce the problem with a unit test in RaftTest. After fixing the issue I would like to assert in the test that all nodes have the same state and log, but I don't know how to test that. Maybe it is also fine if the join completes without any problems.

    closes #978

    opened by Zelldon 12
  • Add member location provider abstraction for pluggable discovery of cluster members

    Add member location provider abstraction for pluggable discovery of cluster members

    This PR is an attempt at a solution for #659. The implementation adds a MemberLocationProvider abstraction which can be configured on the Atomix instance. The provider is a simple ListenerService which triggers JOIN/LEAVE events containing the Address of a new member. The ClusterMembershipService then connects to that Address to exchange higher-level Member information.

    The default implementation, of course, is a Netty multicast-based location provider, which is used when withMulticastEnabled is set.
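
    The approximate shape of the abstraction described above (the real interface in this PR may differ) is roughly:

        import io.atomix.utils.net.Address;

        // Approximate sketch: a provider watches some external source (multicast, DNS,
        // a static config, ...) and emits join/leave events carrying only an Address;
        // higher-level Member details are exchanged after connecting to that address.
        public interface MemberLocationProvider {
          enum Type { JOIN, LEAVE }

          final class LocationEvent {
            private final Type type;
            private final Address address;

            public LocationEvent(Type type, Address address) {
              this.type = type;
              this.address = address;
            }

            public Type type() { return type; }

            public Address address() { return address; }
          }

          void addListener(java.util.function.Consumer<LocationEvent> listener);

          void removeListener(java.util.function.Consumer<LocationEvent> listener);
        }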

    opened by kuujo 12
  • BlockingDistributedLock throws 100% of the time in some scenarios

    BlockingDistributedLock throws 100% of the time in some scenarios

    On my laptop, I can reproduce the error below by doing the following. I'm using the latest 2.1.0-SNAPSHOT version.

    1. Start atomix with 4 PERSISTENT nodes as follows...
            final Atomix atomix = Atomix.builder()
                    .withManagementGroup((RaftPartitionGroup.builder("system")
                            .withNumPartitions(1)
                            .withMembers(members.stream().toArray(Member[]::new))
                            .withDataDirectory(dataDir)
                            .build()))
                    .withPartitionGroups(RaftPartitionGroup.builder("raft")
                            .withNumPartitions(1)
                            .withMembers(members.stream().toArray(Member[]::new))
                            .withDataDirectory(dataDir)
                            .build())
                    .withMembers(members.stream().toArray(Member[]::new))
                    .withLocalMember(local)
                    .withClusterName("Cluster name")
                    .build();
    
    2. Take one node down
    3. Take one more node down (now we have 2 nodes up out of 4)
    4. Start one of the previously downed nodes
    5. Any node attempting to take a lock will get the following exception:
    io.atomix.primitive.PrimitiveException$Timeout: null
            at io.atomix.core.lock.impl.BlockingDistributedLock.complete(BlockingDistributedLock.java:77)
            at io.atomix.core.lock.impl.BlockingDistributedLock.tryLock(BlockingDistributedLock.java:57)
            at com.example.runOneIteration(xScheduler.java:55)
            at com.google.common.util.concurrent.AbstractScheduledService$ServiceDelegate$Task.run(AbstractScheduledService.java:193)
            at com.google.common.util.concurrent.Callables$4.run(Callables.java:119)
            at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
            at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
            at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
            at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
            at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
            at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
            at java.lang.Thread.run(Thread.java:748)
    
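
    For reference, a minimal retry-friendly usage sketch around the reported failure, assuming the Atomix lock API (the getLock accessor and exact signatures may differ between 2.x and 3.x):

        import io.atomix.core.Atomix;
        import io.atomix.core.lock.DistributedLock;
        import io.atomix.primitive.PrimitiveException;

        public class LockExample {
          public static void doWithLock(Atomix atomix) {
            // Assumed accessor; older versions may require a lock builder instead.
            DistributedLock lock = atomix.getLock("my-lock");
            try {
              lock.lock();
            } catch (PrimitiveException.Timeout e) {
              // The partition has likely lost its quorum (as in the scenario above);
              // back off and retry later instead of failing hard.
              return;
            }
            try {
              // critical section: only one node at a time runs this
            } finally {
              lock.unlock();
            }
          }
        }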
    opened by rroller 12
  • Use phi accrual failure detectors for Raft elections and session timeouts

    Use phi accrual failure detectors for Raft elections and session timeouts

    This PR refactors how leadership elections and session expirations are handled in Raft.

    It adds a phi accrual failure detector used to determine when to start a new election. In order to avoid multiple servers starting an election at the same time, randomized timers are used to check the current phi value. The Raft election timeout is used as a fallback to ensure the timeout doesn't surpass that point.

    Sessions are also expired using phi accrual failure detectors. This is done by the leader sending heartbeats to clients. New sessions are opened with a minimum timeout, and the leader sends heartbeats to the clients at the rate of the minimum session timeout. Sending heartbeats from the leader to clients also ensures clients resolve new leaders as soon as possible. In order to account for the time period during which an old leader crash was being detected and a new leader was being elected, nodes track the last heartbeat time and subtract that time from session timeouts. This means sessions can be expired via the failure detector immediately after a leader change if the client can't be reached by the leader.
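
    For context, a minimal sketch of a phi accrual failure detector (not the Atomix implementation; it uses the common exponential approximation of the heartbeat inter-arrival distribution):

        // phi = -log10(P(no heartbeat yet, given the observed inter-arrival times)).
        // A low phi means the peer is probably alive; a high phi means it has probably failed.
        public class PhiAccrualFailureDetector {
          private static final int WINDOW = 100;

          private final double[] intervals = new double[WINDOW];
          private int count;
          private int index;
          private long lastHeartbeat = -1;

          /** Records a heartbeat arrival (timestamps in milliseconds). */
          public synchronized void report(long now) {
            if (lastHeartbeat >= 0) {
              intervals[index] = now - lastHeartbeat;
              index = (index + 1) % WINDOW;
              count = Math.min(count + 1, WINDOW);
            }
            lastHeartbeat = now;
          }

          /** Returns the current suspicion level; higher means more likely failed. */
          public synchronized double phi(long now) {
            if (lastHeartbeat < 0 || count == 0) {
              return 0.0;
            }
            double sum = 0;
            for (int i = 0; i < count; i++) {
              sum += intervals[i];
            }
            double mean = sum / count;
            long elapsed = now - lastHeartbeat;
            // Exponential approximation: P(interval > elapsed) = exp(-elapsed / mean),
            // so phi = -log10(P) = (elapsed / mean) * log10(e).
            return (elapsed / mean) * Math.log10(Math.E);
          }
        }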

    opened by kuujo 11
  • WIP: Distributed group recovery

    WIP: Distributed group recovery

    Let's address some of the Atomix client consistency guarantees. Take DistributedGroup, for example. I create a new group and join it. Now the client's session expires for some reason. The default recoveryStrategy for the Atomix client is set to RECOVERY (it cannot be overridden right now).

    Now when the client successfully recovers, it gets a brand new session id. The problem is that it is no longer a member of the distributed group.

    Another problem is that the members of the distributed group are cached. It does a sync() when the resource opens, but then updates its state based on onJoin and onLeave events. So with a new session ID it doesn't receive those events on reconnect, and it doesn't do any "resync" either.

    I think the Atomix client should be able to handle this. Unfortunately I'm not sure how to properly address it, so at the least I'm adding a test case to replicate the issue.

    What do you guys think?

    Thanks, D.

    opened by dmvk 11
  • raft.getCluster().getMember(*).memberId() could result in an NPE

    raft.getCluster().getMember(*).memberId() could result in an NPE

    Expected behavior

    At line 178 of io.atomix.protocols.raft.roles.ActiveRole, the return value of raft.getCluster().getMember(raft.getLastVotedFor()) should be guarded by a null check to avoid a NullPointerException.

    protected VoteResponse handleVote(VoteRequest request) {
        //...
        else {
          log.debug("Rejected {}: already voted for {}", request, raft.getCluster().getMember(raft.getLastVotedFor()).memberId()); // NPE risk
          //...
        }
    }
    

    Actual behavior & Steps to reproduce

    raft.getCluster().getMember(raft.getLastVotedFor()) can return null when raft's last vote has no corresponding DefaultRaftMember, and it is dereferenced unconditionally.

    Minimal yet complete reproducer code (or URL to code)

    protected VoteResponse handleVote(VoteRequest request) {
        //...
        else {
      DefaultRaftMember member = raft.getCluster().getMember(raft.getLastVotedFor());
      log.debug("Rejected {}: already voted for {}", request, member == null ? null : member.memberId()); // null-safe
          //...
        }
    }
    

    Environment

    • Atomix: default master
    • OS: [e.g. uname -a]
    • JVM [e.g. java -version]


    opened by zhaoyangyingmu 0
  • Leak on server shutdown while still awaiting other nodes to join

    Leak on server shutdown while still awaiting other nodes to join

    Expected behavior

    Restarting Atomix configured to use N Raft nodes, but still not connected to any nodes, should not leak any Atomix instance.

    Actual behavior

    DefaultRaftServer::shutdown() is not closing its RaftContext or cleaning up the RaftContext::threadContext tasks.

    Steps to reproduce

    Create a single Raft node (expecting N nodes in total), start it, and stop it right after while waiting for it to stop. After running a full GC, the Atomix instance is leaking (some screenshots below).

    Minimal yet complete reproducer code (or URL to code)

       public static Atomix createAtomix(String localMemberId,
                                         File dataDirectory,
                                         Map<String, Address> nodes) {
          final Address localAddress = nodes.get(localMemberId);
          if (localAddress == null) {
             throw new IllegalArgumentException("the local member id should be included in the node map");
          }
          final AtomixBuilder atomixBuilder = Atomix.builder().withMemberId(localMemberId).withAddress(localAddress);
          atomixBuilder
             .withMembershipProvider(BootstrapDiscoveryProvider.builder()
                                        .withNodes(
                                           nodes.entrySet().stream()
                                              .map(entry-> Node.builder()
                                                 .withId(entry.getKey())
                                                 .withAddress(entry.getValue())
                                                 .build())
                                              .collect(Collectors.toList())).build());
          // using Profile.consensus(members) is a shortcut for this, but it doesn't leave any config choice
          atomixBuilder
             .withManagementGroup(
                RaftPartitionGroup.builder("system")
                   .withNumPartitions(1)
                   .withMembers(nodes.keySet())
                   .withStorageLevel(StorageLevel.DISK)
                   .withDataDirectory(new File(dataDirectory, "management"))
                   .build())
             .withPartitionGroups(
                RaftPartitionGroup.builder("data")
                   .withNumPartitions(1)
                   .withMembers(nodes.keySet())
                   .withStorageLevel(StorageLevel.DISK)
                   .withDataDirectory(new File(dataDirectory, "data"))
                   .build());
          return atomixBuilder.build();
       }
    
       @Test
       public void atomixLeak() {
          File f = new File("./atomix");
          f.deleteOnExit();
          final String localId = "a";
          final Address localAddress = Address.from("localhost:7070");
          final Map<String, Address> nodes = new HashMap<>(3);
          nodes.put(localId, localAddress);
          nodes.put("b", Address.from("localhost:7071"));
          nodes.put("c", Address.from("localhost:7072"));
          Atomix atomix = createAtomix("a", f, nodes);
          try {
             // wait a bit in order to get the RaftServer::start called
             atomix.start().get(2, TimeUnit.SECONDS);
             Assert.fail();
          } catch (TimeoutException te) {
             try {
                atomix.stop().join();
             } catch (Throwable t) {
                Assert.fail();
             }
          } catch (Throwable t) {
             Assert.fail();
          }
       }
    

    It's important to take a heap snapshot after atomix.stop().join() has completed. I'm looking into how to implement a minimal reproducer using just RaftServer for the PR I've sent to fix this.

    • Atomix: 3.2.0-SNAPSHOT
    opened by franz1981 1
  • Nodes are constantly joining and leaving with DnsDiscoveryProvider

    Nodes are constantly joining and leaving with DnsDiscoveryProvider

    I'm new to Atomix. I tried to run embedded Atomix inside my services on Kubernetes. I configured DnsDiscoveryProvider and I see in the logs that nodes are constantly joining the cluster and leaving again after a while. I have a question about this line:

    https://github.com/atomix/atomix/blob/842276c02541267a61bca47f82c2e9d740245fcb/cluster/src/main/java/io/atomix/cluster/discovery/DnsDiscoveryProvider.java#L142

    and

    https://github.com/atomix/atomix/blob/842276c02541267a61bca47f82c2e9d740245fcb/cluster/src/main/java/io/atomix/cluster/discovery/DnsDiscoveryProvider.java#L150

    newNodeIds contains all of the newly discovered nodes. But in the second line (in the wrapping loop), all previously discovered nodes are iterated and checked against newNodeIds; if a node is not in newNodeIds, an event is created saying the node has left the cluster. You can't compare against newNodeIds, though, because it contains only the nodes discovered in this round (not the nodes that were already discovered in the past). So the loop will remove all previously discovered nodes.
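
    A hypothetical sketch of the corrected comparison (plain strings stand in for the real Node/NodeId types; this is not the actual DnsDiscoveryProvider code): leave events are derived from the complete set of ids returned by the current lookup rather than from the newly discovered subset.

        import java.util.HashMap;
        import java.util.HashSet;
        import java.util.Map;
        import java.util.Set;

        class DiscoveryReconciler {
          private final Map<String, String> knownNodes = new HashMap<>(); // id -> address

          synchronized void reconcile(Map<String, String> resolvedNodes) {
            // JOIN: nodes in this lookup that we have not seen before.
            for (Map.Entry<String, String> entry : resolvedNodes.entrySet()) {
              if (knownNodes.putIfAbsent(entry.getKey(), entry.getValue()) == null) {
                onJoin(entry.getKey(), entry.getValue());
              }
            }
            // LEAVE: previously known nodes missing from the *complete* resolved set,
            // not merely absent from the newly discovered subset.
            Set<String> missing = new HashSet<>(knownNodes.keySet());
            missing.removeAll(resolvedNodes.keySet());
            for (String id : missing) {
              onLeave(id, knownNodes.remove(id));
            }
          }

          void onJoin(String id, String address) { /* post JOIN event */ }

          void onLeave(String id, String address) { /* post LEAVE event */ }
        }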

    opened by kmrozek-shareablee 0
  • Error is not printed when a Channel cannot connect to an address

    Error is not printed when a Channel cannot connect to an address

    Expected behavior

    In the ChannelPool.java class, in the getChannel() method, if a channel's future returns an error, that error is not printed in the log.

    Line 134: LOGGER.debug("Failed to connect to {}", channel.remoteAddress(), e);
    Line 104: LOGGER.debug("Failed to connect to {}", address, error);

    Actual behavior

    These lines should have an additional {} to print the error:

    Line 134: LOGGER.debug("Failed to connect to {} due to error {}", channel.remoteAddress(), e);
    Line 104: LOGGER.debug("Failed to connect to {} due to error {}", address, error);


    Environment

    • Atomix: [e.g. 3.0.0]
    • OS: [e.g. uname -a]
    • JVM [e.g. java -version]


    opened by ashishmittal19 0
  • Is the repository open to take PRs for open issues?

    Is the repository open to take PRs for open issues?

    Hi, I am fascinated by the Atomix project. I see a number of open issues (53 at the time of this writing). I just wanted to know whether the repository is open to taking PRs for any of these issues. If yes, is there a contributing guide?

    opened by the123saurav 0
  • ONOS to Atomix requests time out

    ONOS to Atomix requests time out

    Expected behavior

    My Atomix and ONOS run on the same device. I compiled onos-2.2.4 with atomix-3.1.8 (OpenJDK 11). When I tested with 2000 devices, ONOS fetches some stats messages from Atomix and I get many timeouts; this situation makes ONOS go OOM.

    Actual behavior

    When testing, I get this exception:

    2020-10-27T19:47:46,355 | DEBUG | raft-partition-group-raft-6 | RaftSessionConnection            | 129 - io.atomix.utils - 3.1.8 | SessionClient{29}{type=AtomicCounterType{name=atomic-counter}, name=sys-clock-counter} - CommandRequest{session=29, sequence=1106859, operation=PrimitiveOperation{id=DefaultOperationId{id=incrementAndGet, type=COMMAND}, value=null}} failed! Reason: {}
    java.util.concurrent.TimeoutException: Request type raft-partition-1-command timed out in 5000 milliseconds
            at io.atomix.cluster.messaging.impl.AbstractClientConnection$Callback.timeout(AbstractClientConnection.java:159) ~[?:?]
            at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515) [?:?]
            at java.util.concurrent.FutureTask.run(FutureTask.java:264) [?:?]
            at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:304) [?:?]
            at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) [?:?]
            at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) [?:?]
            at java.lang.Thread.run(Thread.java:834) [?:?]
    

    I added a log line to try to find the problem, like this:

    final class RemoteServerConnection extends AbstractServerConnection {
      private static final byte[] EMPTY_PAYLOAD = new byte[0];
      private final Logger log = LoggerFactory.getLogger(getClass());
    
      private final Channel channel;
    
      RemoteServerConnection(HandlerRegistry handlers, Channel channel) {
        super(handlers);
        this.channel = channel;
      }
    
      @Override
      public void reply(ProtocolRequest message, ProtocolReply.Status status, Optional<byte[]> payload) {
        ProtocolReply response = new ProtocolReply(
            message.id(),
            payload.orElse(EMPTY_PAYLOAD),
            status);
        log.info("RemoteServerConnection reply, message subject {} message type {} message id {} message status {}", message.subject(), message.type(), message.id(), status.name());
        channel.writeAndFlush(response, channel.voidPromise());
      }
    }
    

    Then I found a strange problem: if this log is at info level, there are fewer timeouts, but if the log is not at info level or is not added at all, there are more timeouts. So I have two questions:

    1. Why the timeouts?
    2. What is the impact of adding logs?

    Environment

    • Atomix: [e.g. 3.1.8]
    • OS: [e.g. ubuntu-18.04]
    • JVM [e.g. openjdk-11]
    opened by xinchengwuxian 0