Fault tolerance and resilience patterns for the JVM

Last update: Dec 29, 2022

Overview

Failsafe

Failsafe is a lightweight, zero-dependency library for handling failures in Java 8+, with a concise API for handling everyday use cases and the flexibility to handle everything else. It works by wrapping executable logic with one or more resilience policies, which can be combined and composed as needed. Current policies include Retry, CircuitBreaker, RateLimiter, Timeout, Bulkhead, and Fallback.

Usage

Visit failsafe.dev for usage info, docs, and additional resources.

Contributing

Check out the contributing guidelines.

License

Comments

RetryPolicy thread safety

Is it safe to use RetryPolicy from multiple threads?

I can see that it doesn't set its own fields, but it does add elements to (array) lists of predicates. This is potentially not thread-safe and can break when the same RetryPolicy object is used from multiple threads.

Is there a standard way to do this? For now I can copy the object when I modify it on specific threads, but perhaps it could be made thread-safe on your end.
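One common way to make a policy safe to share between threads is to split it into a mutable builder and an immutable policy, so predicates can only be added before the policy is published. The sketch below uses only the JDK; `SafeRetryPolicy` and its `Builder` are hypothetical names for illustration, not Failsafe's API.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical immutable policy: all fields are final and the predicate list is an
// unmodifiable copy, so one instance can be shared across threads after construction.
final class SafeRetryPolicy {
  private final int maxRetries;
  private final List<Predicate<Throwable>> failurePredicates;

  private SafeRetryPolicy(Builder builder) {
    this.maxRetries = builder.maxRetries;
    // Defensive copy: later changes to the builder do not affect this policy
    this.failurePredicates =
        Collections.unmodifiableList(new ArrayList<>(builder.failurePredicates));
  }

  static Builder builder() {
    return new Builder();
  }

  int maxRetries() {
    return maxRetries;
  }

  // Returns true if any configured predicate says this failure should be handled
  boolean handles(Throwable failure) {
    for (Predicate<Throwable> predicate : failurePredicates) {
      if (predicate.test(failure)) return true;
    }
    return false;
  }

  // Mutable builder, meant to be used by a single thread; only build() results are shared
  static final class Builder {
    private int maxRetries = 3;
    private final List<Predicate<Throwable>> failurePredicates = new ArrayList<>();

    Builder withMaxRetries(int maxRetries) {
      this.maxRetries = maxRetries;
      return this;
    }

    Builder handleIf(Predicate<Throwable> predicate) {
      failurePredicates.add(predicate);
      return this;
    }

    SafeRetryPolicy build() {
      return new SafeRetryPolicy(this);
    }
  }
}
```

Because the built policy only holds final fields and an unmodifiable copy of the predicate list, it can be shared freely once constructed; per-thread customization happens on a builder rather than on the shared object.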
enhancement 3.0

opened by reutsharabani 49

How to reset failsafe?

Hi sir, could you help me with my requirement? I need to carry out a JDBC operation with retries, and I am using the following approach:

RetryPolicy<Object> retryPolicy = new RetryPolicy<>()
        .withDelay(Duration.ofMillis(retryIntervalMillis))
        .withMaxRetries(retryLimit)
        // with a typed RetryPolicy<Object>, the event is already an ExecutionAttemptedEvent, so no casts are needed
        .onFailedAttempt(event ->
                LOG.warn("Error encountered while establishing connection or doing the " +
                        "read/write operation {}", event.getLastFailure().getMessage()))
        .onRetry(event -> {
            Throwable failure = event.getLastFailure();
            // log the retry as it seems to be a network/connection failure
            LOG.warn("Connection error encountered {}", failure.getMessage(), failure);
            LOG.warn("Retrying {}th time to proceed again with a new connection",
                    event.getAttemptCount());
        })
        .handleIf(failure -> isRetryRequired(failure));

isRetryRequired() checks the exception's message and decides whether to retry or not; its body is omitted here.

Next, this is how Failsafe is used:

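try {
    Failsafe
            .with(retryPolicy)
            .run(() -> {
                getJdbcConnection();
                createStatement();
                executeQueries();
            });
} catch (FailsafeException e1) {
    // run() wraps checked exceptions such as SQLException in FailsafeException;
    // the original SQLException is available via e1.getCause()
    // todo
} finally {
    closeConnection();
}

void executeQueries() {
    for (String query : queryList) {
        // execute the query
    }
}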

My question is this: if any of the methods fails (getJdbcConnection, createStatement or executeQueries), I want to retry, and that's how I have written the code. However, suppose I had configured three retries and successfully obtained the connection on the 2nd retry. Once the connection succeeds, I want to reset Failsafe so that it can do three retries again if the connection fails next time. How is this possible? How can I reset Failsafe? Or what approach do you suggest?
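As far as I understand Failsafe's design, a RetryPolicy only holds configuration, and each call such as `Failsafe.with(retryPolicy).run(...)` starts a new execution with its own attempt count, so there is nothing to reset between calls. The JDK-only sketch below illustrates that idea with a hypothetical `PerCallRetry.callWithRetries` helper (not a Failsafe method): the counter is local to each call, so every call gets the full retry budget.

```java
import java.util.concurrent.Callable;

final class PerCallRetry {
  // Hypothetical helper: the attempt counter is a local variable, so every call
  // starts with a fresh budget of maxRetries retries (nothing to "reset").
  static <T> T callWithRetries(int maxRetries, Callable<T> task) throws Exception {
    int attempt = 0;
    while (true) {
      try {
        return task.call();
      } catch (Exception e) {
        // Budget exhausted for this call only; the next call starts again at zero
        if (attempt >= maxRetries) throw e;
        attempt++;
      }
    }
  }
}
```

Under this model, one option is to wrap connection setup and query execution in separate calls, so a failure while running queries gets its own fresh set of retries.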

opened by Syed-SnapLogic 34

Dynamic delay
This PR is in response to #110. It adds a delay function property to RetryPolicy that is used, if set, to compute the next delay from the previous result or exception.

There are some awkward bits here:

I used net.jodah.failsafe.util.Duration in the signature of the delay function, though I really wanted to use java.time.Duration.

Combining a delay factor other than 1 with a delay function would be meaningless, but I've done nothing to make them mutually exclusive.

The one included test is pretty crude: it passes if the actual delay falls within a window around the requested delay.
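A delay function of this kind maps the previous result or failure to the next delay. The sketch below shows the idea with `java.time.Duration`, as the PR author preferred; `DynamicDelay`, `DELAY_FN` and `RetryAfterException` are hypothetical names, and returning `null` to fall back to the fixed delay is an assumption of this sketch.

```java
import java.time.Duration;
import java.util.function.BiFunction;

// Hypothetical exception carrying a server-suggested delay (e.g. from a Retry-After header)
final class RetryAfterException extends RuntimeException {
  final Duration retryAfter;

  RetryAfterException(Duration retryAfter) {
    super("retry after " + retryAfter);
    this.retryAfter = retryAfter;
  }
}

final class DynamicDelay {
  // Hypothetical delay function: (lastResult, lastFailure) -> next delay.
  // Returns null to mean "fall back to the policy's fixed delay".
  static final BiFunction<Object, Throwable, Duration> DELAY_FN = (result, failure) ->
      failure instanceof RetryAfterException ? ((RetryAfterException) failure).retryAfter : null;

  // Computes the delay before the next attempt, preferring the function's answer
  static Duration nextDelay(Duration fixedDelay, Object lastResult, Throwable lastFailure) {
    Duration computed = DELAY_FN.apply(lastResult, lastFailure);
    return computed != null ? computed : fixedDelay;
  }
}
```

It also shows why a delay factor is awkward to combine with such a function: both try to decide the next delay, so one of them has to take precedence.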
opened by Tembrel 25
Support basic Java Executor interface

Currently you can provide an ExecutorService, a ScheduledExecutorService, or a custom Failsafe Scheduler implementation. It would be good if Failsafe also accepted the java.util.concurrent.Executor interface, since it is a common interface returned by other libraries. My current use case is in gRPC: when splitting context for multi-threading, gRPC's #fixedContextExecutor returns a plain Executor.
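A plain `Executor` has no notion of delays, so a library that accepts one presumably needs its own timer to wait out retry delays and then hand the task over. A minimal JDK-only sketch, assuming a hypothetical `ExecutorScheduler` class with a single shared daemon timer thread:

```java
import java.util.concurrent.Executor;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

final class ExecutorScheduler {
  // One shared daemon thread that only waits out delays; it never runs user work itself
  private static final ScheduledExecutorService TIMER =
      Executors.newSingleThreadScheduledExecutor(r -> {
        Thread t = new Thread(r, "delay-timer");
        t.setDaemon(true);
        return t;
      });

  // Waits for the delay on the timer thread, then hands the task to the plain Executor.
  // Cancelling the returned future only prevents the handoff if it has not happened yet.
  static ScheduledFuture<?> schedule(Executor executor, Runnable task, long delay, TimeUnit unit) {
    return TIMER.schedule(() -> executor.execute(task), delay, unit);
  }
}
```

The timer thread only performs the handoff, so user work still runs on whatever `Executor` was supplied (for example, one returned by gRPC).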
enhancement 3.0

opened by paulius-p 23

Async: How to gracefully shut down?

Hello,

I'm trying Failsafe with:

final ScheduledExecutorService executor = Executors.newScheduledThreadPool(2);
final RetryPolicy retryPolicy = new RetryPolicy()
        .withDelay(100, TimeUnit.MILLISECONDS)
        .retryOn(RuntimeException.class)
        .retryWhen(false);

// this task will fail 99% of the time.
Failsafe
        .with(retryPolicy)
        .with(executor)
        .get((ctx) -> {
            double i = Math.random();
            if (i > 0.99) {
                return true;
            } else if (i < 0.1) {
                throw new RuntimeException(" i = " + i);
            }
            return false;
        });

Then I add a shutdown hook:

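Runtime.getRuntime().addShutdownHook(new Thread(() -> {
    executor.shutdown();
    try {
        // awaitTermination throws a checked InterruptedException, which a Runnable lambda must handle
        executor.awaitTermination(1, TimeUnit.HOURS);
    } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
    }
}));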

Is there a way to do something like:

FailSafe.awaitAllTaskComplete();
executor.shutdown();
executor.awaitTermination(1, TimeUnit.HOURS);

Or do I have to store all the futures created and check whether they have all completed?
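Storing the futures is a workable approach. A minimal JDK-only sketch, assuming the async calls hand back `CompletableFuture`s (Failsafe 2.x's `getAsync` does; older versions may return a different future type) and a hypothetical `TaskTracker` class: each future is registered when created, removed when it completes, and the shutdown hook waits for all of them before shutting the executor down.

```java
import java.util.Set;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical tracker for in-flight async executions
final class TaskTracker {
  private final Set<CompletableFuture<?>> inFlight = ConcurrentHashMap.newKeySet();

  // Registers a future and removes it once it completes (normally or exceptionally)
  <T> CompletableFuture<T> track(CompletableFuture<T> future) {
    inFlight.add(future);
    future.whenComplete((result, failure) -> inFlight.remove(future));
    return future;
  }

  // Waits until every currently tracked future is done, or the timeout expires.
  // Futures tracked after the snapshot below is taken are not awaited.
  boolean awaitAll(long timeout, TimeUnit unit) throws InterruptedException {
    CompletableFuture<?>[] snapshot = inFlight.toArray(new CompletableFuture<?>[0]);
    try {
      CompletableFuture.allOf(snapshot).get(timeout, unit);
      return true;
    } catch (TimeoutException e) {
      return false;
    } catch (ExecutionException e) {
      return true; // at least one task failed, but allOf only completes once all are done
    }
  }
}
```

In the shutdown hook, `tracker.awaitAll(1, TimeUnit.HOURS)` would go before `executor.shutdown()`, since pending retries presumably still need the executor to be running. New submissions should be stopped first, because tasks tracked after the snapshot are not awaited.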

opened by pgoergler 23

Implement Timeout policy
For async executions, we use ForkJoinPool by default, currently with a CompletableFuture used internally. Unfortunately, neither CompletableFuture nor ForkJoinPool supports cancelling its tasks with interrupts, so cancellation is only effective for tasks that are still waiting to run.

The Timeout policy should be configurable to support interrupts or not.

The Timeout policy should (probably) fail with TimeoutException so that it can be bubbled up to outer policies and easily recognized by them as a failure (similar to CircuitBreakerOpenException).
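The difference described above can be seen with plain JDK classes: a `Future` returned by `ExecutorService.submit` can be cancelled with an interrupt, whereas `CompletableFuture.cancel(true)` does not interrupt the running task. The hypothetical `InterruptingTimeout.runWithTimeout` helper below sketches a timeout that optionally interrupts the task and rethrows `java.util.concurrent.TimeoutException` so outer logic can treat it as a failure; it is an illustration, not Failsafe's Timeout implementation.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

final class InterruptingTimeout {
  // Hypothetical helper: waits up to `timeout` for the task; on timeout it cancels the
  // task and rethrows. With interrupt = true the worker thread is interrupted, which works
  // because the Future returned by submit() honors cancel(true), unlike CompletableFuture.
  static <T> T runWithTimeout(ExecutorService executor, Callable<T> task,
      long timeout, TimeUnit unit, boolean interrupt) throws Exception {
    Future<T> future = executor.submit(task);
    try {
      return future.get(timeout, unit);
    } catch (TimeoutException e) {
      future.cancel(interrupt);
      // Surface a TimeoutException so outer logic can recognize it as a failure
      throw e;
    }
  }
}
```

Making `interrupt` a parameter mirrors the point above that interrupting should be configurable.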

2.2
opened by jhalterman 22
Added support to easily create Proxy instances

It can be beneficial to easily wrap a whole interface with Failsafe and apply a RetryPolicy and CircuitBreaker to it as a whole.

By leveraging the JRE's built-in proxy construction, Failsafe can support creating proxy instances for interfaces (ByteBuddy and other libraries can create proxy classes for concrete types).
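The JDK's `java.lang.reflect.Proxy` is enough to wrap every method of an interface. In the sketch below, a simple retry loop stands in for Failsafe's policies; `RetryingProxy.wrap` is a hypothetical name, not the API added by this PR.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Proxy;

final class RetryingProxy {
  // Hypothetical helper: returns a JDK dynamic proxy that retries every interface
  // method up to maxRetries times. The retry loop stands in for Failsafe policies.
  @SuppressWarnings("unchecked")
  static <T> T wrap(Class<T> iface, T target, int maxRetries) {
    InvocationHandler handler = (proxy, method, args) -> {
      for (int attempt = 0; ; attempt++) {
        try {
          return method.invoke(target, args);
        } catch (InvocationTargetException e) {
          // Reflection wraps the target's exception; rethrow the real one when out of retries
          if (attempt >= maxRetries) throw e.getCause();
        }
      }
    };
    return (T) Proxy.newProxyInstance(iface.getClassLoader(), new Class<?>[] {iface}, handler);
  }
}
```

As the PR notes, this only works for interfaces; proxying concrete classes would need a bytecode library such as ByteBuddy.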

Fixes #107

opened by fzakaria 22
Restrictive time window for circuit breaker to record failures
Right now it's possible to configure failure thresholds in terms of consecutive failures:

withFailureThreshold(4, 5), four failures out of five consecutive executions

withFailureThreshold(5, 10), five failures out of ten consecutive executions

etc.

There is no notion of time right now. What I'm hoping for would be the ability to specify something like:

10 failures within 1 second

50 failures out of 200 executions within a minute

My idea behind this is to effectively disable the circuit breaker in low-traffic scenarios, but have it kick in if traffic suddenly increases. For endpoints with low traffic (less than ~10 requests per second) we found that circuit breakers actually make matters worse, since once they open, they tend to stay open for longer. Ultimately, circuit breakers are a means of protecting an application from overload, which doesn't really happen with few requests.
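One way to express "N failures within a time window" is to keep timestamped outcomes, evict those older than the window, and open only when both an execution minimum and a failure threshold are met inside the window. A JDK-only sketch with a hypothetical `TimeWindowThreshold` class; timestamps are passed in explicitly to keep it deterministic, and the half-open state is omitted:

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical sketch of time-based failure thresholding for a circuit breaker.
// It only decides whether the breaker should open; half-open handling is omitted.
final class TimeWindowThreshold {
  private final int failureThreshold; // failures in the window needed to open
  private final int minExecutions;    // executions in the window needed before deciding
  private final long windowNanos;     // length of the sliding time window
  private final Deque<long[]> records = new ArrayDeque<>(); // {timestampNanos, failed ? 1 : 0}
  private int failuresInWindow;

  TimeWindowThreshold(int failureThreshold, int minExecutions, long windowNanos) {
    this.failureThreshold = failureThreshold;
    this.minExecutions = minExecutions;
    this.windowNanos = windowNanos;
  }

  // Records one execution outcome and returns true if the breaker should open
  synchronized boolean record(long nowNanos, boolean failed) {
    records.addLast(new long[] {nowNanos, failed ? 1 : 0});
    if (failed) failuresInWindow++;
    // Evict outcomes that are older than the window
    while (!records.isEmpty() && nowNanos - records.peekFirst()[0] > windowNanos) {
      if (records.pollFirst()[1] == 1) failuresInWindow--;
    }
    return records.size() >= minExecutions && failuresInWindow >= failureThreshold;
  }
}
```

With `new TimeWindowThreshold(10, 10, 1_000_000_000L)`, ten failures spread 200 ms apart never open the breaker, while ten failures within a few milliseconds do. The second example from the request would be `new TimeWindowThreshold(50, 200, 60_000_000_000L)`.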
opened by whiskeysierra 19
Improve handling of shutdown ExecutorService
Hello all,

first of all: thanks for this neat library, I appreciate it!

But for my current use case, I ran into a somewhat weird issue. After several experiments, I think I have been able to isolate the problem a bit. But from the beginning:

[Java 16 / Windows / Failsafe 2.4.0]

I have a task which spawns some other tasks inside a local ExecutorService, so the ExecutorService needs to be shut down at the end. To harden this task, I want to use Failsafe with, amongst other things, a Timeout. But in some cases this Timeout does not work as expected: it gets swallowed, so the task is not interrupted. (The tasks are responsive to interrupts.)

I have put together a somewhat verbose example to illustrate the issue:

import java.time.Duration;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.TimeUnit;
import net.jodah.failsafe.Failsafe;
import net.jodah.failsafe.Timeout;

public class FailsafeShutdownDemo {
    public static void main(String[] args) throws InterruptedException {
        /* A */ ExecutorService executorService = Executors.newScheduledThreadPool(4);
        /* B */ // ExecutorService executorService = ForkJoinPool.commonPool();
        System.out.println("Start");
        Failsafe.with(Timeout.of(Duration.ofSeconds(3)).withCancel(true))
                /* 1 */ .with(executorService) // TIMEOUT FAILS (ExecutorService) - AS EXPECTED (ForkJoinPool)
                /* 2 */ // .with((ScheduledExecutorService) executorService) // AS EXPECTED (ExecutorService cast to ScheduledExecutorService)
                .onComplete(complete -> System.out.println("onComplete -> " + (complete.getFailure() == null
                        ? "No Failure / Timeout didn't work. :-("
                        : "Failure is " + complete.getFailure() + " as expected. :-)")))
                .getAsync(() -> {
                    System.out.println("runAsync()");
                    TimeUnit.SECONDS.sleep(5);
                    System.out.println("Hello World! <- (after Timeout!)");
                    return "Success!";
                });
        // prevent race-condition being a cause of this problem
        TimeUnit.SECONDS.sleep(1);
        System.out.println("Shutdown executorService <- prevents Failsafe from executing the Timeout, "
                + "IF a DelegatingScheduler is used (depending on method-overloading, see [1]). \n\t"
                + "IF the ScheduledExecutorService is used directly (choose via cast in [2]), the Timeout works as expected.");
        /* X */ executorService.shutdown();
        // prevent daemons from exiting early
        TimeUnit.SECONDS.sleep(5);
        System.out.println("EXIT");
    }
}

The core is getAsync(CheckedSupplier), where the supplier represents my potentially long-running, heavy-lifting task, which should be interrupted.

I have set up Failsafe to interrupt the thread after the Timeout, which works as long as I do not shut down the executorService (marked by comment X).

But when I enable the shutdown (which is needed; since it's shutdown() rather than shutdownNow(), I want to call it early to prevent further tasks from being added), things get interesting:

If I supply the ScheduledExecutorService to Failsafe as its superinterface ExecutorService (see A + 1), the task won't be cancelled.

If I supply the same object as a ScheduledExecutorService (A + 2), the task will be cancelled! This led me to the overloaded FailsafeExecutor.with(...) methods, which decide whether a DelegatingScheduler is used.

If I use a ForkJoinPool (B + 1), it also works as expected and the task gets cancelled.

From these observations, it seems to be a problem related to the scheduling of the timeout trigger in the DelegatingScheduler...? I don't think it can be a race condition between shutdown and adding the trigger, as I used a sleep for that as well. And shutdown() does not interrupt any existing tasks, so that should be fine too (and it is when no DelegatingScheduler is involved)...

Did I miss something? I hope the example makes it clear. In any case, it seems dangerous to get different results just from using different executors, or different method overloads for the same (!) scheduler...
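Two JDK behaviors may be relevant here, though this is not a diagnosis of Failsafe's internals: a `ScheduledThreadPoolExecutor` still runs delayed tasks that were scheduled before `shutdown()` (its default policy), but rejects any task submitted after `shutdown()`. So a timeout trigger scheduled directly on the `ScheduledExecutorService` would still fire, while a trigger that waits elsewhere and is only handed to the executor afterwards would presumably be rejected. The hypothetical `ShutdownSemantics.demo()` below checks both behaviors:

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executors;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

final class ShutdownSemantics {
  // Returns {delayedTaskRan, lateSubmissionRejected}
  static boolean[] demo() throws InterruptedException {
    ScheduledExecutorService ses = Executors.newScheduledThreadPool(1);
    CountDownLatch fired = new CountDownLatch(1);
    // Scheduled BEFORE shutdown(): still runs, because ScheduledThreadPoolExecutor
    // executes existing delayed tasks after shutdown by default
    ses.schedule(fired::countDown, 50, TimeUnit.MILLISECONDS);
    ses.shutdown();
    boolean rejected = false;
    try {
      // Submitted AFTER shutdown(): rejected by the default AbortPolicy
      ses.execute(() -> { });
    } catch (RejectedExecutionException e) {
      rejected = true;
    }
    boolean ran = fired.await(5, TimeUnit.SECONDS);
    ses.awaitTermination(5, TimeUnit.SECONDS);
    return new boolean[] {ran, rejected};
  }
}
```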
opened by brainbytes42 18
Null policy

Is there a way to get a null/no-op version of FailsafeExecutor, and perhaps even of the individual policies? I would like to require a FailsafeExecutor to be passed to some APIs, but I have to make sure this feature can be easily disabled by passing some kind of null/no-op FailsafeExecutor that executes everything as if the code were called directly.

I see that Failsafe.with() will throw IllegalArgumentException if I give it an empty list of policies. I could probably find a workaround, for example by calling .handleIf(e -> false) on RetryPolicy. But is there a clean, concise solution with minimal overhead?
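A common design here is the null object pattern: APIs accept an abstraction, and a no-op implementation simply calls the code directly. The sketch below is JDK-only; `Guard`, `Guard.NOOP` and `Guard.retrying` are hypothetical names, and a real implementation could delegate to a `FailsafeExecutor` instead of the simple retry loop shown.

```java
import java.util.function.Supplier;

// Hypothetical abstraction an API could accept instead of a concrete FailsafeExecutor
interface Guard {
  <T> T get(Supplier<T> supplier);

  // Null object: executes the supplier directly, with no policies.
  // (A generic method cannot be implemented by a lambda, hence the anonymous class.)
  Guard NOOP = new Guard() {
    @Override
    public <T> T get(Supplier<T> supplier) {
      return supplier.get();
    }
  };

  // Illustrative non-no-op implementation: retries on RuntimeException
  static Guard retrying(int maxRetries) {
    return new Guard() {
      @Override
      public <T> T get(Supplier<T> supplier) {
        for (int attempt = 0; ; attempt++) {
          try {
            return supplier.get();
          } catch (RuntimeException e) {
            if (attempt >= maxRetries) throw e;
          }
        }
      }
    };
  }
}
```

Callers then pass `Guard.NOOP` to disable the feature, at the cost of one extra method call per execution.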

opened by robertvazan 18

Fallback success and failure policy listeners

Hi,

I have a question about using policy listeners with the Fallback policy. I understand that onSuccess() is executed when the fallback is executed successfully.

However, I'm observing something I didn't quite expect. For example, with the Fallback policy below, configured to apply on a null result, I would not expect "Got from fallback" to be printed, because the main call returns a non-null value and so the fallback logic itself should not be executed.

        Fallback<String> fallback = Fallback.of("hello")
                .handleResult(null)
                .onSuccess(e -> System.out.println("Got from fallback"))
                .onFailure(e -> System.out.println("Failed to get from fallback"));

        String result = Failsafe.with(fallback)
                .get(() -> "world");

        System.out.println("Result is " + result);

But I get the following output:

Got from fallback
Result is world

Why did the onSuccess() listener get executed?

And if I change the main call to return null, the onFailure() listener is executed even though the fallback executes successfully and returns the fallback value.

        Fallback<String> fallback = Fallback.of("hello")
                .handleResult(null)
                .onSuccess(e -> System.out.println("Got from fallback"))
                .onFailure(e -> System.out.println("Failed to get from fallback"));

        String result = Failsafe.with(fallback)
                .get(() -> null);

        System.out.println("Result is " + result);

Output:

Failed to get from fallback
Result is hello

Perhaps my understanding is incorrect or I'm being daft. 😅

bug

opened by sanoopps 17

FYI: Very compact "lean" version of DelegatingScheduler

This is the continuation of https://github.com/failsafe-lib/failsafe/issues/349

Here: https://github.com/magicprinc/failsafe/commits/leap_of_faith

Final memory balance:
-1 fat object: CompletableFuture
-1 lambda Callable in DelegatingScheduler.schedule
-1 Callable-Runnable wrapper in delayer().schedule (Runnables are wrapped as Callables in the FutureTask ctor)

+1 very lean object ScheduledCompletableFuture implements ScheduledFuture, Callable (not a CompletableFuture anymore)

I am sure this is the final step and one can't optimize this class further. Not a single unused byte in memory!

opened by magicprinc 1
Feature: micrometer.io metrics integration

If you are looking for new ideas: https://micrometer.io/ Metrics would be great!

It is the new SLF4J for metrics, and everyone I know uses it as the de facto standard.

If you need some inspiration: https://github.com/brettwooldridge/HikariCP/tree/dev/src/main/java/com/zaxxer/hikari/metrics/micrometer

https://github.com/micrometer-metrics/micrometer/blob/main/micrometer-core/src/main/java/io/micrometer/core/instrument/binder/cache/CaffeineCacheMetrics.java

https://github.com/micrometer-metrics/micrometer/blob/main/micrometer-core/src/main/java/io/micrometer/core/instrument/binder/okhttp3/OkHttpMetricsEventListener.java

opened by magicprinc 3
FailsafeCall micro refactoring, plus World-Wide Nr.1 duplicated utility method for OkHttp
As you can see here: https://github.com/magicprinc/failsafe/commit/c517e3ef01aec35cd6b6aaa23779873f8e89ffab

FailsafeCall micro refactoring:

AtomicBoolean fields are final

lambda expression instead of code block

World-Wide Nr.1 duplicated utility method for OkHttp:

/**
 * [OkHttp Callback to JDK CompletableFuture]
 * Helps eliminate dozens of utility classes World-wide with exactly this same method.
 * Can be the first small step towards FailSafe.
 * Returns normal JDK {@link CompletableFuture} without FailSafe policies.
 */
public static CompletableFuture asPromise(okhttp3.Call call)

All around the world, people write this method again and again. I have done it too. We really need "The Chosen One", and I suggest Failsafe be that one :-)
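The technique behind such a method is adapting a callback-based API to `CompletableFuture`: complete it on success, complete it exceptionally on failure, and cancel the underlying call if the future is cancelled. To keep the sketch runnable without OkHttp, `AsyncCall` and `Callback` below are hypothetical stand-ins loosely shaped like OkHttp's `Call.enqueue` and `Callback` (the real callback methods also receive the `Call`); `Promises.asPromise` is likewise a hypothetical name.

```java
import java.io.IOException;
import java.util.concurrent.CompletableFuture;

// Hypothetical stand-ins shaped like OkHttp's Callback/Call, so the sketch runs without OkHttp
interface Callback<R> {
  void onFailure(IOException e);

  void onResponse(R response);
}

interface AsyncCall<R> {
  void enqueue(Callback<R> callback);

  void cancel();
}

final class Promises {
  // Adapts a callback-style asynchronous call to a CompletableFuture
  static <R> CompletableFuture<R> asPromise(AsyncCall<R> call) {
    CompletableFuture<R> future = new CompletableFuture<>();
    // If the caller cancels the future, cancel the underlying call as well
    future.whenComplete((response, failure) -> {
      if (future.isCancelled()) call.cancel();
    });
    call.enqueue(new Callback<R>() {
      @Override
      public void onFailure(IOException e) {
        future.completeExceptionally(e);
      }

      @Override
      public void onResponse(R response) {
        future.complete(response);
      }
    });
    return future;
  }
}
```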

If you like it, I will send it as a PR.
opened by magicprinc 4
DelegatingScheduler singletons in modern style

DelegatingScheduler uses an old singleton idiom: double-checked locking on a volatile field with synchronized. The Bill Pugh (initialization-on-demand holder) singleton implementation is better, shorter and (in some cases) uses less memory. Plus, the fields become static final, so the JVM can apply further optimizations.
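For comparison, here are both idioms for a lazily created scheduler, using a hypothetical `Schedulers` class (not Failsafe's DelegatingScheduler code). The holder idiom relies on the JVM initializing the nested class only on first use, and class initialization is thread-safe, so no `volatile` or `synchronized` is needed.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;

final class Schedulers {
  // Old style: double-checked locking on a volatile field
  private static volatile ScheduledExecutorService dclInstance;

  static ScheduledExecutorService dcl() {
    ScheduledExecutorService result = dclInstance; // single volatile read on the fast path
    if (result == null) {
      synchronized (Schedulers.class) {
        result = dclInstance;
        if (result == null) {
          dclInstance = result = newDaemonScheduler();
        }
      }
    }
    return result;
  }

  // Holder idiom: Holder (and INSTANCE) is initialized lazily on the first call to
  // holder(), and the JVM guarantees class initialization happens exactly once
  private static final class Holder {
    static final ScheduledExecutorService INSTANCE = newDaemonScheduler();
  }

  static ScheduledExecutorService holder() {
    return Holder.INSTANCE;
  }

  private static ScheduledExecutorService newDaemonScheduler() {
    return Executors.newSingleThreadScheduledExecutor(r -> {
      Thread t = new Thread(r, "scheduler");
      t.setDaemon(true);
      return t;
    });
  }
}
```

One trade-off: the holder idiom only works for static singletons and the instance can never be replaced, whereas the double-checked version could be reset if needed.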

opened by magicprinc 9
Support accrual failure detection

As Failsafe already supports policies that are useful for networked operations, it would make sense to support phi accrual (or other accrual algorithms) failure detection for situations where fixed timeouts don't adequately account for changing load conditions.

This could be implemented as a new policy which measures execution times over a number of executions to determine whether some threshold representing a failure has been crossed. Phi accrual could be one strategy supported by the policy, but there could be others. When the threshold is crossed, a fallback-like function could be called, for example, to fail over from a node that has failed to another. In that sense, the policy would be like a time-based fallback (rather than a result-based one), except that unlike a fallback it would be stateful.
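For reference, the phi accrual detector (Hayashibara et al.) expresses suspicion as φ = -log10(P_later(t)), where P_later(t) is the probability of observing a value at least as large as t given recent history. The sketch below applies that to execution durations, as proposed here, and assumes durations are exponentially distributed (a common simplification; the original formulation uses a normal distribution), which gives φ = t / (mean · ln 10). `PhiAccrualDetector` is a hypothetical name.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical sketch of a phi accrual detector applied to execution durations
final class PhiAccrualDetector {
  private final int maxSamples;
  private final Deque<Double> samples = new ArrayDeque<>();
  private double sum;

  PhiAccrualDetector(int maxSamples) {
    this.maxSamples = maxSamples;
  }

  // Records a completed execution's duration (any unit, used consistently)
  synchronized void record(double duration) {
    samples.addLast(duration);
    sum += duration;
    // Keep a sliding window of the most recent maxSamples durations
    if (samples.size() > maxSamples) sum -= samples.removeFirst();
  }

  // phi for an execution that has been running for `elapsed`. Assuming durations are
  // exponentially distributed with the observed mean, P(X > t) = exp(-t / mean),
  // so phi = -log10(P) = t / (mean * ln 10).
  synchronized double phi(double elapsed) {
    if (samples.isEmpty()) return 0.0; // no history yet: no suspicion
    double mean = sum / samples.size();
    return elapsed / (mean * Math.log(10));
  }
}
```

A threshold of φ = 3, for instance, flags an execution whose duration would occur with probability 10^-3 or less under this model.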

Alternatively, this could be implemented as a Timeout option, where the timeout is stateful and adapts to execution time distributions.

One open question for this policy is, similar to a circuit breaker or rate limiter, at what point should it "reset" after triggering a failure, or should it even reset?

Any ideas for how this should work or what the policy should be named are welcome!
enhancement new-policy

opened by jhalterman 4
