Fault tolerance and resilience patterns for the JVM

Overview

Failsafe

Failsafe is a lightweight, zero-dependency library for handling failures in Java 8+, with a concise API for handling everyday use cases and the flexibility to handle everything else. It works by wrapping executable logic with one or more resilience policies, which can be combined and composed as needed. Current policies include Retry, Timeout, Fallback, and CircuitBreaker.
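
As a rough illustration of what "wrapping executable logic with policies" means, the following JDK-only sketch composes a retry and a fallback around a supplier. It is not the Failsafe API: `PolicySketch`, `Policy`, `retry`, `fallback`, and `compose` are hypothetical names chosen for this example, and the library's real policies are considerably richer.

```java
import java.util.function.Supplier;

/** Hypothetical names throughout; a JDK-only illustration, not the Failsafe API. */
final class PolicySketch {
    /** A policy decorates a supplier with extra failure-handling behavior. */
    interface Policy<T> {
        Supplier<T> wrap(Supplier<T> inner);
    }

    /** Retries the inner supplier up to maxRetries times after a RuntimeException. */
    static <T> Policy<T> retry(int maxRetries) {
        if (maxRetries < 0) throw new IllegalArgumentException("maxRetries must be >= 0");
        return inner -> () -> {
            RuntimeException last = null;
            for (int attempt = 0; attempt <= maxRetries; attempt++) {
                try {
                    return inner.get();
                } catch (RuntimeException e) {
                    last = e; // remember the failure and try again
                }
            }
            throw last;
        };
    }

    /** Returns a fixed value if the inner supplier throws a RuntimeException. */
    static <T> Policy<T> fallback(T value) {
        return inner -> () -> {
            try {
                return inner.get();
            } catch (RuntimeException e) {
                return value;
            }
        };
    }

    /** Composes policies so that the first one listed ends up outermost. */
    @SafeVarargs
    static <T> Supplier<T> compose(Supplier<T> task, Policy<T>... outerToInner) {
        Supplier<T> wrapped = task;
        for (int i = outerToInner.length - 1; i >= 0; i--) {
            wrapped = outerToInner[i].wrap(wrapped);
        }
        return wrapped;
    }
}
```

With `compose(task, fallback("default"), retry(2))`, the retry sits closest to the task, and the fallback only sees a failure once the retries are exhausted.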

Usage

Visit the Failsafe website.

Contributing

Check out the contributing guidelines.

License

Copyright Jonathan Halterman and friends. Released under the Apache 2.0 license.

Comments
  • RetryPolicy thread safety

    Is it safe to use RetryPolicy from multiple threads?

    I can see it doesn't set its own members, but it does add elements to (Array)Lists of predicates. This is potentially not thread-safe and can break when the same RetryPolicy object is used from multiple threads.

    Is there a standard way to do this? For now I can copy the object when I modify it on specific threads, but perhaps making it thread-safe is possible on your end.

    enhancement 3.0 
    opened by reutsharabani 49
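
One common way to sidestep the sharing concern raised in the RetryPolicy thread-safety issue above is to make the configuration immutable, so that every "modification" returns a new object. The sketch below assumes a hypothetical `ImmutableRetryRules` class; it is not how Failsafe's RetryPolicy is implemented.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.function.Predicate;

/** Hypothetical class for illustration; not Failsafe's RetryPolicy. */
final class ImmutableRetryRules {
    private final List<Predicate<Throwable>> retryPredicates;

    ImmutableRetryRules() {
        this(Collections.emptyList());
    }

    private ImmutableRetryRules(List<Predicate<Throwable>> predicates) {
        this.retryPredicates = Collections.unmodifiableList(predicates);
    }

    /** Returns a new instance with one more predicate; this instance is left unchanged. */
    ImmutableRetryRules retryIf(Predicate<Throwable> predicate) {
        List<Predicate<Throwable>> copy = new ArrayList<>(retryPredicates);
        copy.add(predicate);
        return new ImmutableRetryRules(copy);
    }

    /** Safe to call from any thread because the list never changes after construction. */
    boolean shouldRetry(Throwable failure) {
        for (Predicate<Throwable> predicate : retryPredicates) {
            if (predicate.test(failure)) {
                return true;
            }
        }
        return false;
    }
}
```

A thread that needs an extra rule calls `retryIf` and keeps the returned copy, while other threads continue to use the original instance untouched.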
  • How to reset failsafe?

    Hi, could you help me with my requirement? I need to carry out a JDBC operation with retries, and I am using the following approach:

    // Parameterizing the policy removes the need for raw types and event casts.
    RetryPolicy<Object> retryPolicy = new RetryPolicy<>()
            .withDelay(Duration.ofMillis(retryIntervalMillis))
            .withMaxRetries(retryLimit)
            .onFailedAttempt(event ->
                    LOG.warn("Error encountered while establishing connection or doing the " +
                            "read/write operation {}", event.getLastFailure().getMessage()))
            .onRetry(event -> {
                Throwable failure = event.getLastFailure();
                // log the retry as it seems to be a network/connection failure
                LOG.warn("Connection error encountered {}", failure.getMessage(), failure);
                LOG.warn("Retrying {}th time to proceed again with a new connection",
                        event.getAttemptCount());
            })
            .handleIf(failure -> isRetryRequired(failure));
    

    isRetryRequired() checks the exception's message and decides whether to retry or not. I am not showing its body here.

    Next, this is how failsafe is used:

    try {
        Failsafe
                .with(retryPolicy)
                .run(() -> {
                    getJdbcConnection();
                    createStatement();
                    executeQueries();
                });
    } catch (FailsafeException e1) {
        // Failsafe wraps checked exceptions such as SQLException in FailsafeException
        // todo
    } finally {
        closeConnection();
    }
    
    void executeQueries() {
      for (String query : queryList) {
       // execute the query
      }
    }
    

    My question is this: if any of the methods fails (getJdbcConnection, createStatement, or executeQueries), I want to retry, and that's how I have written the code. However, suppose I had configured three retries and successfully obtained the connection on the second retry. Once the connection succeeds, I want to reset Failsafe so that it can do three retries again if the connection fails next time. How is this possible? How can I reset Failsafe? Or what approach do you suggest?

    opened by Syed-SnapLogic 34
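
For the reset question above, one approach worth considering is to scope retries to a single call, so that the attempt counter starts from zero every time. The helper below is a JDK-only sketch with hypothetical names (`PerCallRetry`, `callWithRetries`); it stands in for a retrying executor and does not use the Failsafe API.

```java
import java.util.function.Supplier;

/** Hypothetical helper; a JDK-only stand-in for a retrying executor. */
final class PerCallRetry {
    /**
     * Runs the task, retrying up to maxRetries times after a RuntimeException.
     * The attempt counter is a local variable, so every call starts fresh.
     */
    static <T> T callWithRetries(int maxRetries, Supplier<T> task) {
        if (maxRetries < 0) throw new IllegalArgumentException("maxRetries must be >= 0");
        RuntimeException last = null;
        for (int attempt = 0; attempt <= maxRetries; attempt++) {
            try {
                return task.get();
            } catch (RuntimeException e) {
                last = e; // remember the failure and try again
            }
        }
        throw last; // all attempts failed
    }
}
```

Calling the helper once to obtain the connection and again for each later operation gives every call its own budget of retries, so nothing needs to be reset.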
  • Dynamic delay

    This PR is in response to #110. It adds a delay function property to RetryPolicy that is used, if set, to compute the next delay from the previous result or exception.

    There are some awkward bits here:

    • I used net.jodah.failsafe.util.Duration in the signature of the delay function, though I really wanted to use java.time.Duration.
    • Combining a delay factor other than 1 with a delay function would be meaningless, but I've done nothing to make them mutually exclusive.
    • The one included test is pretty crude, passing if the actual delay is within a window of the requested delay.
    opened by Tembrel 25
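
To make the idea in the dynamic delay PR above concrete, here is a JDK-only sketch of a retry loop whose next delay is computed from the previous result or exception. The class and parameter names (`DynamicDelayRetry`, `shouldRetry`, `delayFn`) are hypothetical, and the sketch uses java.time.Duration rather than the library's own Duration type; it is not the code from the PR.

```java
import java.time.Duration;
import java.util.function.BiFunction;
import java.util.function.BiPredicate;
import java.util.function.Supplier;

/** Hypothetical sketch; not the implementation from the PR. */
final class DynamicDelayRetry {
    /**
     * Calls the task, and while shouldRetry accepts the outcome (and retries remain),
     * sleeps for the duration that delayFn derives from that same outcome.
     */
    static <T> T call(int maxRetries,
                      BiPredicate<T, Throwable> shouldRetry,
                      BiFunction<T, Throwable, Duration> delayFn,
                      Supplier<T> task) {
        if (maxRetries < 0) throw new IllegalArgumentException("maxRetries must be >= 0");
        for (int attempt = 0; ; attempt++) {
            T result = null;
            RuntimeException failure = null;
            try {
                result = task.get();
            } catch (RuntimeException e) {
                failure = e;
            }
            boolean retry = attempt < maxRetries && shouldRetry.test(result, failure);
            if (!retry) {
                if (failure != null) throw failure;
                return result;
            }
            sleep(delayFn.apply(result, failure));
        }
    }

    private static void sleep(Duration delay) {
        try {
            Thread.sleep(delay.toMillis());
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt for callers
            throw new IllegalStateException("interrupted while waiting to retry", e);
        }
    }
}
```

A delay function of this shape could, for example, honor a "retry after" hint carried in the previous result instead of using a fixed backoff.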
  • Support basic Java Executor interface

    Currently you can provide either an ExecutorService, a ScheduledExecutorService, or a custom Failsafe Scheduler implementation. It would be good if Failsafe also accepted the java.util.concurrent.Executor interface, since it is a common interface returned by other libraries. My current use case is in gRPC: when splitting context for multi-threading, the gRPC #fixedContextExecutor methods return a base Executor.

    enhancement 3.0 
    opened by paulius-p 23
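
A plain Executor cannot delay work, so supporting it (as requested in the issue above) generally means pairing it with a timer. The following JDK-only sketch shows one possible shape: a ScheduledExecutorService waits out the delay and then hands the task to the Executor. `ExecutorDelayAdapter` is a hypothetical name, and this is not Failsafe's Scheduler interface or implementation.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executor;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

/** Hypothetical adapter; not Failsafe's Scheduler. */
final class ExecutorDelayAdapter {
    private final ScheduledExecutorService timer; // only waits out delays
    private final Executor worker;                // runs the actual task

    ExecutorDelayAdapter(ScheduledExecutorService timer, Executor worker) {
        this.timer = timer;
        this.worker = worker;
    }

    /** Runs the task on the worker after the delay and exposes the outcome as a future. */
    <T> CompletableFuture<T> schedule(Supplier<T> task, long delay, TimeUnit unit) {
        CompletableFuture<T> promise = new CompletableFuture<>();
        timer.schedule(() -> {
            try {
                worker.execute(() -> {
                    try {
                        promise.complete(task.get());
                    } catch (Throwable t) {
                        promise.completeExceptionally(t);
                    }
                });
            } catch (RuntimeException rejected) {
                // e.g. the worker refused the task because it was shut down
                promise.completeExceptionally(rejected);
            }
        }, delay, unit);
        return promise;
    }
}
```

Because the timer thread only performs the hand-off, the task keeps whatever context the supplied Executor attaches to it.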
  • Async: How to gracefully shut down?

    Hello,

    I'm trying Failsafe with:

    final ScheduledExecutorService executor = Executors.newScheduledThreadPool(2);
    final RetryPolicy retryPolicy = new RetryPolicy()
            .withDelay(100, TimeUnit.MILLISECONDS)
            .retryOn(RuntimeException.class)
            .retryWhen(false);

    // this task will fail 99% of time.
    Failsafe
            .with(retryPolicy)
            .with(executor)
            .get((ctx) -> {
                double i = Math.random();
                if (i > 0.99) {
                    return true;
                } else if (i < 0.1) {
                    throw new RuntimeException(" i = " + i);
                }
                return false;
            });
    

    Then I add a shutdown hook:

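    Runtime.getRuntime().addShutdownHook(new Thread(() -> {
        executor.shutdown();
        try {
            executor.awaitTermination(1, TimeUnit.HOURS);
        } catch (InterruptedException e) { // checked; a Runnable cannot rethrow it
            Thread.currentThread().interrupt();
        }
    }));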
    

    Is there a way to do something like:

    FailSafe.awaitAllTaskComplete();
    executor.shutdown();
    executor.awaitTermination(1, TimeUnit.HOURS);
    

    Or do I have to store every future that is created and check that they have all completed?

    opened by pgoergler 23
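
The "store all futures" option from the graceful shutdown question above can be kept fairly small. The sketch below is JDK-only and uses hypothetical names (`TrackedTasks`, `shutdownGracefully`); it tracks in-flight CompletableFutures and waits for them before shutting the executor down. It does not use Failsafe, and whether the futures returned by a given Failsafe version can be tracked the same way is an assumption to verify.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.function.Supplier;

/** Hypothetical helper that remembers unfinished futures. */
final class TrackedTasks {
    private final ExecutorService executor;
    private final List<CompletableFuture<?>> inFlight = new CopyOnWriteArrayList<>();

    TrackedTasks(ExecutorService executor) {
        this.executor = executor;
    }

    <T> CompletableFuture<T> submit(Supplier<T> task) {
        CompletableFuture<T> future = CompletableFuture.supplyAsync(task, executor);
        inFlight.add(future);
        // Registered after add(), so the removal always happens after the insertion.
        future.whenComplete((result, failure) -> inFlight.remove(future));
        return future;
    }

    /** Waits for tracked tasks, then shuts down. Returns true if the executor terminated in time. */
    boolean shutdownGracefully(long timeout, TimeUnit unit) {
        CompletableFuture<Void> all =
                CompletableFuture.allOf(inFlight.toArray(new CompletableFuture<?>[0]));
        try {
            all.get(timeout, unit);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } catch (ExecutionException | TimeoutException e) {
            // a task failed or the wait timed out; shut down anyway
        }
        executor.shutdown();
        try {
            return executor.awaitTermination(timeout, unit);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }
}
```

Waiting on the futures before calling `shutdown()` matters when a task may schedule follow-up work (such as a retry) on the same executor, since an executor that is already shut down rejects new submissions.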
  • Implement Timeout policy

    For async, we use ForkJoinPool by default, currently with a CompletableFuture being used internally. Unfortunately, neither CompletableFuture nor ForkJoinPool supports cancelling tasks with interrupts, so cancellation will only be effective for tasks waiting to be run.

    • The Timeout policy should be configurable to support interrupts or not.
    • The Timeout policy should (probably) fail with TimeoutException so that it can be bubbled up to outer policies and easily recognized by them as a failure (similar to CircuitBreakerOpenException).
    2.2 
    opened by jhalterman 22
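
The two requirements listed in the Timeout policy issue above (configurable interruption, failing with a timeout exception) can be illustrated with plain JDK futures. This is a sketch under stated assumptions, not Failsafe's Timeout policy: `TimeoutSketch` and `callWithTimeout` are hypothetical names, and interruption only takes effect here because a FutureTask-based ExecutorService is assumed.

```java
import java.time.Duration;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/** Hypothetical helper; not Failsafe's Timeout policy. */
final class TimeoutSketch {
    /**
     * Runs the task on the executor and waits at most the given timeout.
     * On timeout the task is cancelled, optionally interrupting its thread,
     * and the TimeoutException is rethrown so callers can treat it as a failure.
     */
    static <T> T callWithTimeout(ExecutorService executor, Duration timeout,
                                 boolean interruptOnTimeout, Callable<T> task)
            throws TimeoutException, ExecutionException, InterruptedException {
        Future<T> future = executor.submit(task);
        try {
            return future.get(timeout.toMillis(), TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            // cancel(true) interrupts a running task on a ThreadPoolExecutor;
            // cancel(false) only prevents a task that has not started yet from running.
            future.cancel(interruptOnTimeout);
            throw e;
        }
    }
}
```

An outer retry or fallback wrapper could then catch the TimeoutException like any other failure, which is the bubbling-up behavior the issue describes.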
  • Added support to easily create Proxy instances

    It can be beneficial to easily wrap a whole interface with Failsafe and apply a RetryPolicy and CircuitBreaker to it as a whole.

    By leveraging the JRE's built-in proxy construction, Failsafe can add support for creating proxy instances for interfaces (ByteBuddy and other libraries can provide the ability to create proxy classes of concrete types).

    Fixes #107

    opened by fzakaria 22
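
The JRE proxy mechanism mentioned in the PR above is java.lang.reflect.Proxy. The following JDK-only sketch wraps every method of an interface in a simple retry loop; `RetryingProxy` and `wrap` are hypothetical names, the retry logic is a stand-in for real policies, and none of this is the code from the PR.

```java
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Proxy;

/** Hypothetical helper; the retry loop stands in for real resilience policies. */
final class RetryingProxy {
    /** Returns a proxy that retries each interface method up to maxRetries times. */
    @SuppressWarnings("unchecked")
    static <T> T wrap(Class<T> iface, T target, int maxRetries) {
        if (maxRetries < 0) throw new IllegalArgumentException("maxRetries must be >= 0");
        return (T) Proxy.newProxyInstance(
                iface.getClassLoader(),
                new Class<?>[] {iface},
                (proxy, method, args) -> {
                    Throwable last = null;
                    for (int attempt = 0; attempt <= maxRetries; attempt++) {
                        try {
                            return method.invoke(target, args);
                        } catch (InvocationTargetException e) {
                            last = e.getCause(); // the exception thrown by the target method
                        }
                    }
                    throw last;
                });
    }
}
```

As the PR notes, this approach only works for interfaces; proxying concrete classes needs a bytecode library.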
  • Restrictive time window for circuit breaker to record failures

    Right now it's possible to configure failure thresholds in terms of consecutive failures:

    • withFailureThreshold(4, 5), four failures out of five consecutive executions
    • withFailureThreshold(5, 10), five failures out of ten consecutive executions
    • etc.

    There is no notion of time right now. What I'm hoping for would be the ability to specify something like:

    • 10 failures within 1 second
    • 50 failures out of 200 executions within a minute

    My idea behind this is to effectively disable the circuit breaker in low-traffic scenarios, while letting it kick in if traffic suddenly increases. For low-traffic requests (fewer than ~10 per second) we found that circuit breakers actually make matters worse, since they tend to stay open longer once they open. And ultimately, circuit breakers are a means of protecting an application from overload, which doesn't really happen with few requests.

    opened by whiskeysierra 19
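
The "10 failures within 1 second" rule requested in the issue above amounts to counting failures in a sliding time window. Here is a minimal JDK-only sketch of that counting step, with hypothetical names (`TimeWindowFailureCounter`, `recordFailure`). It covers only the count-based variant, not the "failures out of N executions" ratio, and it is not a circuit breaker by itself.

```java
import java.time.Duration;
import java.util.ArrayDeque;
import java.util.Deque;

/** Hypothetical sliding-window counter; only the threshold check of a breaker. */
final class TimeWindowFailureCounter {
    private final int threshold;
    private final long windowNanos;
    private final Deque<Long> failureTimes = new ArrayDeque<>();

    TimeWindowFailureCounter(int threshold, Duration window) {
        this.threshold = threshold;
        this.windowNanos = window.toNanos();
    }

    /**
     * Records a failure at the given timestamp (for example System.nanoTime())
     * and reports whether the threshold has been reached within the window.
     */
    synchronized boolean recordFailure(long nowNanos) {
        failureTimes.addLast(nowNanos);
        // Drop failures that are older than the window.
        while (!failureTimes.isEmpty() && nowNanos - failureTimes.peekFirst() >= windowNanos) {
            failureTimes.removeFirst();
        }
        return failureTimes.size() >= threshold;
    }
}
```

With sparse traffic, old failures expire before the threshold is reached, which matches the goal of keeping the breaker effectively disabled at low request rates.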
  • Improve handling of shutdown ExecutorService

    Hello all,

    first of all: thanks for this neat library, I appreciate it!

    But in my current use case, I ran into a somewhat weird issue. Several experiments later, it seems I have been able to isolate the problem a bit. But from the beginning:

    [Java 16 / Windows / Failsafe 2.4.0]

    I have a task that spawns some other tasks inside a local ExecutorService. Therefore, the ExecutorService needs to be shut down at the end. To harden this task, I want to use Failsafe with, among other things, a Timeout. But in some cases this Timeout does not work as expected: it gets swallowed, so the task is not interrupted. (The tasks are responsive to interrupts.)

    I have put together a somewhat verbose example to illustrate the issue:

    import java.time.Duration;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;
    import java.util.concurrent.ForkJoinPool;
    import java.util.concurrent.TimeUnit;
    import net.jodah.failsafe.Failsafe;
    import net.jodah.failsafe.Timeout;
    
    public class FailsafeShutdownDemo {
    
        public static void main(String[] args) throws InterruptedException {
            
    /* A */ ExecutorService executorService = Executors.newScheduledThreadPool(4);
    /* B */ // ExecutorService executorService = ForkJoinPool.commonPool();
    
            System.out.println("Start");
    
            Failsafe.with(Timeout.of(Duration.ofSeconds(3))
                                 .withCancel(true))
        /* 1 */     .with(executorService)  // TIMEOUT FAILS (ExecutorService)  -  AS EXPECTED (ForkJoinPool)
        /* 2 */  // .with((ScheduledExecutorService) executorService)  // AS EXPECTED (ExecutorService cast to ScheduledExecutorService)
                    .onComplete(complete -> System.out.println("onComplete  ->  " + (complete.getFailure() == null
                                                                                     ? "No Failure / Timeout didn't work. :-("
                                                                                     : "Failure is " + complete.getFailure() + " as expected. :-)")))
                    .getAsync(() -> {
                        System.out.println("runAsync()");
                        TimeUnit.SECONDS.sleep(5);
                        System.out.println("Hello World!  <- (after Timeout!)");
                        return "Success!";
                    });
    
            // prevent race-condition being a cause of this problem
            TimeUnit.SECONDS.sleep(1);
    
            System.out.println("Shutdown executorService  <- prevents Failsafe from executing the Timeout, "
                               + "IF a DelegatingScheduler is used (depending on method-overloading, see [1]). \n\t"
                               + "IF the ScheduledExecutorService is used directly (choose via cast in [2]), the Timeout works as expected.");
    /* X */ executorService.shutdown();
    
            // prevent daemons from exiting early
            TimeUnit.SECONDS.sleep(5);
    
            System.out.println("EXIT");
        }
    
    }
    
    

    The core is getAsync(CheckedSupplier), where the supplier represents my potentially long-running, heavy-lifting task, which should be interrupted.

    I have set up Failsafe to interrupt the thread after the timeout, which works as long as I do not shut down the executorService (marked by comment X).

    But when I enable the shutdown (as needed; since it's not shutdownNow(), I want to call it early to prevent further tasks from being added), things get interesting:

    • If I supply the ScheduledExecutorService as its superinterface ExecutorService to Failsafe (see A + 1), the task won't be cancelled.
    • If I supply the same object as a ScheduledExecutorService (A + 2), the task will be cancelled! This led me to the overloaded FailsafeExecutor.with(.) methods, which decide whether a DelegatingScheduler is used.
    • If I use a ForkJoinPool (B + 1), it also works as expected and the task gets cancelled.

    From these observations, it seems to be a problem related to the scheduling of the timeout trigger in the DelegatingScheduler. I don't think it can be a race condition between shutdown and adding the trigger, since I added a sleep to rule that out. And shutdown() does not interrupt any existing tasks, so this should be fine as well (and it is for the non-DelegatingScheduler case).

    Did I miss something? I hope the example makes it clear. In any case, it seems dangerous to get such different results just from different executors or different method overloads for the same (!) scheduler.

    opened by brainbytes42 18
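
Two documented JDK behaviors may be relevant to the shutdown report above: by default, a ScheduledThreadPoolExecutor still runs delayed tasks that were scheduled before `shutdown()`, whereas any new submission after `shutdown()` is rejected. The JDK-only sketch below demonstrates both with a hypothetical `ShutdownBehaviorDemo` class. It does not use Failsafe and is not a confirmed diagnosis of the issue; it only shows why it can matter whether a timeout trigger is scheduled on the executor itself or handed to it later.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

/** Hypothetical demo class; JDK behavior only, no Failsafe involved. */
final class ShutdownBehaviorDemo {
    /** Returns {delayedTaskRan, newTaskRejected}. */
    static boolean[] observe() {
        ScheduledExecutorService scheduler = Executors.newScheduledThreadPool(1);
        AtomicBoolean delayedTaskRan = new AtomicBoolean();

        // Scheduled before shutdown: the default policy still executes it afterwards.
        scheduler.schedule(() -> delayedTaskRan.set(true), 100, TimeUnit.MILLISECONDS);
        scheduler.shutdown();

        // Submitted after shutdown: rejected by the default handler.
        boolean rejected = false;
        try {
            scheduler.execute(() -> { });
        } catch (RejectedExecutionException e) {
            rejected = true;
        }

        try {
            scheduler.awaitTermination(5, TimeUnit.SECONDS); // lets the delayed task finish
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return new boolean[] {delayedTaskRan.get(), rejected};
    }
}
```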
  • Null policy

    Is there a way to get null/noop version of FailsafeExecutor and perhaps even of the individual policies? I would like to add a requirement to pass FailsafeExecutor to some APIs, but I have to make sure this feature can be easily disabled by submitting some kind of null/noop FailsafeExecutor that just executes everything as if the code was executed directly.

    I see that Failsafe.with() will throw IllegalArgumentException if I give it an empty list of policies. I could probably find a workaround, for example by calling .handleIf(e -> false) on RetryPolicy. But is there a clean, concise solution with minimal overhead?

    opened by robertvazan 18
  • Fallback success and failure policy listeners

    Hi,

    I have a question about using policy listeners with the Fallback policy. I understand that onSuccess() is executed when the fallback is executed successfully.

    However, I'm observing something I didn't quite expect. For example, with the Fallback policy below, configured to execute on a null result, I would not expect "Got from fallback" to be printed, because the main call returns non-null and so the fallback logic itself should not be executed.

            Fallback<String> fallback = Fallback.of("hello")
                    .handleResult(null)
                    .onSuccess(e -> System.out.println("Got from fallback"))
                    .onFailure(e -> System.out.println("Failed to get from fallback"));
    
            String result = Failsafe.with(fallback)
                    .get(() -> "world");
    
            System.out.println("Result is " + result);
    

    But I get the below output -

    Got from fallback
    Result is world
    

    Why did the onSuccess() listener get executed?

    And if I change the main call to return null, then the onFailure() listener is getting executed even though the fallback executes successfully and returns the fallback value.

            Fallback<String> fallback = Fallback.of("hello")
                    .handleResult(null)
                    .onSuccess(e -> System.out.println("Got from fallback"))
                    .onFailure(e -> System.out.println("Failed to get from fallback"));
    
            String result = Failsafe.with(fallback)
                    .get(() -> null);
    
            System.out.println("Result is " + result);
    

    Output -

    Failed to get from fallback
    Result is hello
    

    Perhaps my understanding is incorrect or I'm being daft. 😅

    bug 
    opened by sanoopps 17
  • FYI: Very compact "lean" version of DelegatingScheduler

    This is the continuation of https://github.com/failsafe-lib/failsafe/issues/349

    Here: https://github.com/magicprinc/failsafe/commits/leap_of_faith

    Final memory balance:

    • -1 fat object CompletableFuture
    • -1 lambda Callable in DelegatingScheduler.schedule
    • -1 Callable-Runnable wrapper in delayer().schedule (Runnables are wrapped as Callables in FutureTask ctor)
    • +1 very lean object ScheduledCompletableFuture implements ScheduledFuture, Callable (not a CompletableFuture anymore)

    I am sure this is the final step and one can't optimize this class further. Not a single unused byte in memory!

    opened by magicprinc 1
  • Feature: micrometer.io metrics integration

    If you are looking for new ideas: https://micrometer.io/ Metrics would be great!

    It is the new SLF4J for metrics, and everyone I know uses it as the de facto standard.

    If you need something for an inspiration: https://github.com/brettwooldridge/HikariCP/tree/dev/src/main/java/com/zaxxer/hikari/metrics/micrometer

    https://github.com/micrometer-metrics/micrometer/blob/main/micrometer-core/src/main/java/io/micrometer/core/instrument/binder/cache/CaffeineCacheMetrics.java

    https://github.com/micrometer-metrics/micrometer/blob/main/micrometer-core/src/main/java/io/micrometer/core/instrument/binder/okhttp3/OkHttpMetricsEventListener.java

    opened by magicprinc 3
  • FailsafeCall micro refactoring, plus World-Wide Nr.1 duplicated utility method for OkHttp

    As you can see here: https://github.com/magicprinc/failsafe/commit/c517e3ef01aec35cd6b6aaa23779873f8e89ffab

    FailsafeCall micro refactoring:

    1. AtomicBoolean fields are final

    2. lambda expression instead of code block

    3. World-Wide Nr.1 duplicated utility method for OkHttp:

    /**
     * [OkHttp Callback to JDK CompletableFuture]
     * Helps eliminate dozens of utility classes World-wide with exactly this same method.
     * Can be the first small step towards FailSafe.
     * Returns normal JDK {@link CompletableFuture} without FailSafe policies.
     */
    public static CompletableFuture asPromise (okhttp3.Call call)

    All around the World, people write this method again and again. I have done it too. We really need "The Chosen One". I recommend you to be this one :-)

    If you like it, I will send it as PR.

    opened by magicprinc 4
  • DelegatingScheduler singletons in modern style

    DelegatingScheduler uses an old singleton idiom: double-checked locking on a volatile field with synchronized. The Bill Pugh singleton implementation (the initialization-on-demand holder idiom) is better, shorter, and in some cases uses less memory. In addition, the fields become "static final", so the JVM can apply further optimizations.

    opened by magicprinc 9
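
For readers unfamiliar with the Bill Pugh idiom named in the issue above, the sketch below shows its general shape: a nested holder class whose static final field is initialized lazily and thread-safely by the JVM's class-loading rules. `LazyScheduler` is a hypothetical example class, not the actual DelegatingScheduler code.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;

/** Hypothetical example of the initialization-on-demand holder idiom. */
final class LazyScheduler {
    private LazyScheduler() { }

    /** Not loaded until get() is first called; class initialization is thread-safe. */
    private static final class Holder {
        static final ScheduledExecutorService INSTANCE =
                Executors.newSingleThreadScheduledExecutor(runnable -> {
                    Thread thread = new Thread(runnable, "lazy-scheduler");
                    thread.setDaemon(true); // do not keep the JVM alive
                    return thread;
                });
    }

    static ScheduledExecutorService get() {
        return Holder.INSTANCE;
    }
}
```

No volatile field or synchronized block is needed, because the JVM guarantees that `Holder` is initialized exactly once, on first access.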
  • Support accrual failure detection

    As Failsafe already supports policies that are useful for networked operations, it would make sense to support phi accrual (or other accrual algorithms) failure detection for situations where fixed timeouts don't adequately account for changing load conditions.

    This could be implemented as a new policy which measures execution times over a number of executions, to determine if some threshold is crossed which represents a failure. Phi accrual could be one strategy supported by the policy, but there could be others. When the threshold is crossed, a fallback-like function could be called, for example, to fail over a system from one node that has failed to another. In that sense, the policy would be like a time-based fallback (rather than result based), except unlike a fallback it would be stateful.

    Alternatively, this could be implemented as a Timeout option, where the timeout is stateful and adapts to execution time distributions.

    One open question for this policy is, similar to a circuit breaker or rate limiter, at what point should it "reset" after triggering a failure, or should it even reset?

    Any ideas for how this should work or what the policy should be named are welcome!

    enhancement new-policy 
    opened by jhalterman 4
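
To give a feel for the kind of threshold discussed in the accrual failure detection issue above, here is a deliberately simplified sketch. It assumes execution times follow an exponential distribution with the observed mean, which is a simpler model than the normal distribution used in the original phi accrual formulation, and it keeps only a running mean rather than a bounded sample window. `ExponentialPhiEstimator` is a hypothetical name; nothing here is a proposed Failsafe API.

```java
/** Hypothetical estimator using an exponential model; a simplification of phi accrual. */
final class ExponentialPhiEstimator {
    private double meanMillis;
    private long samples;

    /** Adds one observed execution time to the running mean. */
    synchronized void record(double elapsedMillis) {
        samples++;
        meanMillis += (elapsedMillis - meanMillis) / samples;
    }

    /**
     * Suspicion level for an execution that has been running for elapsedMillis:
     * phi = -log10(P(duration > elapsed)), where P = exp(-elapsed / mean)
     * under the exponential assumption, so phi = elapsed / (mean * ln 10).
     */
    synchronized double phi(double elapsedMillis) {
        if (samples == 0 || meanMillis <= 0) {
            return 0.0; // no usable history yet
        }
        return elapsedMillis / (meanMillis * Math.log(10));
    }
}
```

A policy built on this could treat an execution as failed once phi crosses a configured threshold; under this model phi = 1 corresponds to a roughly 10% chance that a healthy execution takes this long, and phi = 2 to roughly 1%.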
Owner
Jonathan Halterman
Resilience4j is a fault tolerance library designed for Java8 and functional programming

Fault tolerance library designed for functional programming Table of Contents 1. Introduction 2. Documentation 3. Overview 4. Resilience patterns 5. S

Resilience4j 8.5k Jan 2, 2023
A reactive Java framework for building fault-tolerant distributed systems

Atomix Website | Javadoc | Slack | Google Group A reactive Java framework for building fault-tolerant distributed systems Please see the website for f

Atomix 2.3k Dec 29, 2022
Fibers, Channels and Actors for the JVM

Quasar Fibers, Channels and Actors for the JVM Getting started Add the following Maven/Gradle dependencies: Feature Artifact Core (required) co.parall

Parallel Universe 4.5k Dec 25, 2022
Build highly concurrent, distributed, and resilient message-driven applications on the JVM

Akka We believe that writing correct concurrent & distributed, resilient and elastic applications is too hard. Most of the time it's because we are us

Akka Project 12.6k Jan 3, 2023
Vert.x is a tool-kit for building reactive applications on the JVM

Vert.x Core This is the repository for Vert.x core. Vert.x core contains fairly low-level functionality, including support for HTTP, TCP, file system

Eclipse Vert.x 13.3k Jan 8, 2023
Reactive Microservices for the JVM

Lagom - The Reactive Microservices Framework Lagom is a Swedish word meaning just right, sufficient. Microservices are about creating services that ar

Lagom Framework 2.6k Dec 30, 2022
BitTorrent library and client with DHT, magnet links, encryption and more

Bt A full-featured BitTorrent implementation in Java 8 peer exchange | magnet links | DHT | encryption | LSD | private trackers | extended protocol |

Andrei Tomashpolskiy 2.1k Jan 2, 2023
Zuul is a gateway service that provides dynamic routing, monitoring, resiliency, security, and more.

Zuul Zuul is an L7 application gateway that provides capabilities for dynamic routing, monitoring, resiliency, security, and more. Please view the wik

Netflix, Inc. 12.4k Jan 3, 2023
Distributed Stream and Batch Processing

What is Jet Jet is an open-source, in-memory, distributed batch and stream processing engine. You can use it to process large volumes of real-time eve

hazelcast 1k Dec 31, 2022
a blockchain network simulator aimed at researching consensus algorithms for performance and security

Just Another Blockchain Simulator JABS - Just Another Blockchain Simulator. JABS is a blockchain network simulator aimed at researching consensus algo

null 49 Jan 1, 2023
Simple and lightweight sip server to create voice robots, based on vert.x

Overview Lightweight SIP application built on vert.x. It's intended to be used as addon for full-featured PBX to implement programmable voice scenario

Ivoice Technology 7 May 15, 2022
Apache Mesos is a cluster manager that provides efficient resource isolation and sharing across distributed applications, or frameworks

Apache Mesos is a cluster manager that provides efficient resource isolation and sharing across distributed applications, or frameworks. It can run Hadoop, Jenkins, Spark, Aurora, and other frameworks on a dynamically shared pool of nodes.

The Apache Software Foundation 5k Dec 31, 2022
Fault tolerance and resilience patterns for the JVM

Failsafe Failsafe is a lightweight, zero-dependency library for handling failures in Java 8+, with a concise API for handling everyday use cases and t

Jonathan Halterman 3.9k Jan 2, 2023
Netflix, Inc. 23.1k Jan 5, 2023
How To Implement Fault Tolerance In Microservices Using Resilience4j

springboot-resilience4j-demo How To Implement Fault Tolerance In Microservices Using Resilience4j? Things todo list: Clone this repository: git clone

Hendi Santika 4 Mar 30, 2022
A powerful flow control component enabling reliability, resilience and monitoring for microservices. (面向云原生微服务的高可用流控防护组件)

Sentinel: The Sentinel of Your Microservices Introduction As distributed systems become increasingly popular, the reliability between services is beco

Alibaba 20.4k Dec 31, 2022