Affects: \v5.3.3
The symptom of this issue is similar to previous issue https://github.com/spring-projects/spring-framework/issues/23096 (response handling never completes) though the underlying cause is different.
I've reproduced this issue with the small demo spring-boot project which uses WebFlux on Tomcat, seen here: https://github.com/danielra/webflux_race_investigation
The project can be built with ./gradlew clean build
and then run via ./start.sh
.
To reproduce the problem I've been successful using wrk (https://github.com/wg/wrk) to throw some load at the service, stop and then throw some more etc.. For example:
for i in {1..100}; do ./wrk -t 2 -c 1000 -d 10s --latency --timeout 9s --script ./myPost.lua http://localhost:8080/post; done
Where ./myPost.lua looks like:
wrk.method = "POST"
wrk.body = "[0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20]"
wrk.headers["Content-Type"] = "application/json"
(I don't think this being a POST request really ended up being relevant, but I started here because I was trying to start the more isolated repro somewhere somewhat similar to the real system that was experiencing the problem.)
This will result in a lot of trace level logging under a logs
directory in the demo project. This can be searched through to find any occurrences like:
find . -name \*.gz -print0 | xargs -0 zgrep "Response timeout"
And then another search for the logPrefix found in resulting log entries can be performed to gain more context on the processing for the relevant request.
This is an example set of log entries for a repro case (with irrelevant line prefixes ommitted):
12:31:47,007 INFO c.e.d.f.DemoWebFilter [http-nio-8080-exec-368] [33d6780] Incoming request observed.
12:31:47,007 TRACE _.s.h.s.r.AbstractListenerReadPublisher [http-nio-8080-exec-368] [33d6780] UNSUBSCRIBED -> SUBSCRIBING
12:31:47,007 TRACE _.s.h.s.r.AbstractListenerReadPublisher [http-nio-8080-exec-368] [33d6780] 9223372036854775807 requested
12:31:47,007 TRACE _.s.h.s.r.AbstractListenerReadPublisher [http-nio-8080-exec-368] [33d6780] SUBSCRIBING -> DEMAND
12:31:47,007 TRACE _.s.h.s.r.AbstractListenerReadPublisher [http-nio-8080-exec-368] [33d6780] onDataAvailable
12:31:47,007 TRACE _.s.h.s.r.AbstractListenerReadPublisher [http-nio-8080-exec-368] [33d6780] DEMAND -> READING
12:31:47,007 TRACE _.s.h.s.r.AbstractListenerReadPublisher [http-nio-8080-exec-368] [33d6780] Read 8092 bytes
12:31:47,007 TRACE _.s.h.s.r.AbstractListenerReadPublisher [http-nio-8080-exec-368] [33d6780] Publishing data read
12:31:47,007 TRACE _.s.h.s.r.AbstractListenerReadPublisher [http-nio-8080-exec-368] [33d6780] Read 8192 bytes
12:31:47,007 TRACE _.s.h.s.r.AbstractListenerReadPublisher [http-nio-8080-exec-368] [33d6780] Publishing data read
12:31:47,007 TRACE _.s.h.s.r.AbstractListenerReadPublisher [http-nio-8080-exec-368] [33d6780] Read 4067 bytes
12:31:47,007 TRACE _.s.h.s.r.AbstractListenerReadPublisher [http-nio-8080-exec-368] [33d6780] Publishing data read
12:31:47,007 TRACE _.s.h.s.r.AbstractListenerReadPublisher [http-nio-8080-exec-368] [33d6780] No more data to read
12:31:47,007 TRACE _.s.h.s.r.AbstractListenerReadPublisher [http-nio-8080-exec-368] [33d6780] READING -> DEMAND
12:31:47,007 TRACE _.s.h.s.r.AbstractListenerReadPublisher [http-nio-8080-exec-368] [33d6780] onAllDataRead
12:31:47,007 TRACE _.s.h.s.r.AbstractListenerReadPublisher [http-nio-8080-exec-368] [33d6780] DEMAND -> COMPLETED
12:31:47,780 TRACE _.s.h.s.r.AbstractListenerWriteFlushProcessor [parallel-2] [33d6780] UNSUBSCRIBED -> REQUESTED
12:31:47,780 TRACE _.s.h.s.r.AbstractListenerWriteFlushProcessor [parallel-2] [33d6780] Received onNext publisher
12:31:47,780 TRACE _.s.h.s.r.AbstractListenerWriteFlushProcessor [parallel-2] [33d6780] REQUESTED -> RECEIVED
12:31:47,780 TRACE _.s.h.s.r.AbstractListenerWriteProcessor [parallel-2] [33d6780] UNSUBSCRIBED -> REQUESTED
12:31:47,780 TRACE _.s.h.s.r.AbstractListenerWriteProcessor [parallel-2] [33d6780] Item to write
12:31:47,780 TRACE _.s.h.s.r.AbstractListenerWriteProcessor [parallel-2] [33d6780] REQUESTED -> RECEIVED
12:31:47,780 TRACE _.s.h.s.r.AbstractListenerWriteProcessor [parallel-2] [33d6780] onWritePossible
12:31:47,780 TRACE _.s.h.s.r.AbstractListenerWriteProcessor [parallel-2] [33d6780] RECEIVED -> WRITING
12:31:47,780 TRACE _.s.h.s.r.AbstractListenerWriteProcessor [parallel-2] [33d6780] Wrote 12931 of 12931 bytes
12:31:47,780 TRACE _.s.h.s.r.AbstractListenerWriteProcessor [parallel-2] [33d6780] WRITING -> REQUESTED
12:31:47,780 TRACE _.s.h.s.r.AbstractListenerWriteProcessor [parallel-2] [33d6780] No more items to write
12:31:47,780 TRACE _.s.h.s.r.AbstractListenerWriteProcessor [parallel-2] [33d6780] REQUESTED -> RECEIVED
12:31:47,780 TRACE _.s.h.s.r.AbstractListenerWriteProcessor [parallel-2] [33d6780] isWritePossible: false
12:31:47,780 TRACE _.s.h.s.r.WriteResultPublisher [parallel-2] [33d6780] UNSUBSCRIBED subscribe: org.springframework.http.server.reactive.AbstractListenerWriteFlushProcessor$State$WriteResultSubscriber@7a996bbf
12:31:47,780 TRACE _.s.h.s.r.WriteResultPublisher [parallel-2] [33d6780] SUBSCRIBING request: 9223372036854775807
12:31:47,780 TRACE _.s.h.s.r.AbstractListenerWriteFlushProcessor [parallel-2] [33d6780] Received onComplete
12:31:47,780 TRACE _.s.h.s.r.WriteResultPublisher [parallel-2] [33d6780] UNSUBSCRIBED subscribe: reactor.core.publisher.MonoNext$NextSubscriber@53a4d00d
12:31:47,780 TRACE _.s.h.s.r.WriteResultPublisher [parallel-2] [33d6780] SUBSCRIBING request: 9223372036854775807
12:31:47,781 TRACE _.s.h.s.r.AbstractListenerWriteProcessor [http-nio-8080-exec-348] [33d6780] onWritePossible
12:31:47,781 TRACE _.s.h.s.r.AbstractListenerWriteProcessor [http-nio-8080-exec-348] [33d6780] RECEIVED -> COMPLETED
12:31:47,781 TRACE _.s.h.s.r.WriteResultPublisher [http-nio-8080-exec-348] [33d6780] SUBSCRIBED publishComplete
12:31:47,781 TRACE _.s.h.s.r.AbstractListenerWriteFlushProcessor [http-nio-8080-exec-348] [33d6780] RECEIVED writeComplete
12:31:47,781 TRACE _.s.h.s.r.AbstractListenerWriteFlushProcessor [http-nio-8080-exec-348] [33d6780] Flush attempt
12:31:47,781 TRACE _.s.h.s.r.AbstractListenerWriteFlushProcessor [http-nio-8080-exec-348] [33d6780] RECEIVED flushingFailed
12:31:59,588 TRACE _.s.h.s.r.AbstractListenerReadPublisher [parallel-4] [33d6780] Cancellation
12:31:59,588 TRACE _.s.h.s.r.WriteResultPublisher [parallel-4] [33d6780] SUBSCRIBED cancel
12:31:59,588 ERROR c.e.d.f.DemoWebFilter [parallel-4] [33d6780] Response timeout after 12581 milliseconds for null request with uri 'http://localhost:8080/post'. Response status code was already committed: '200 OK'. 12:31:59,588 TRACE o.s.h.s.r.ServletHttpHandlerAdapter [parallel-4] [33d6780] Handling completed
Here the line 12:31:47,781 TRACE _.s.h.s.r.AbstractListenerWriteFlushProcessor [http-nio-8080-exec-348] [33d6780] RECEIVED flushingFailed
comes from a modification I made locally to the spring-framework to add an additional trace log line here: https://github.com/spring-projects/spring-framework/blob/v5.3.3/spring-web/src/main/java/org/springframework/http/server/reactive/AbstractListenerWriteFlushProcessor.java#L194-L195
like so:
if (rsWriteFlushLogger.isTraceEnabled()) {
rsWriteFlushLogger.trace(this.logPrefix + this.state + " flushingFailed", t);
}
For reference, the relevant Throwable logged at these lines in these reproductions with this demo project looks like:
org.apache.catalina.connector.ClientAbortException: java.io.IOException: Broken pipe
at org.apache.catalina.connector.OutputBuffer.doFlush(OutputBuffer.java:309) ~[tomcat-embed-core-9.0.41.jar!/:9.0.41]
at org.apache.catalina.connector.OutputBuffer.flush(OutputBuffer.java:272) ~[tomcat-embed-core-9.0.41.jar!/:9.0.41]
at org.apache.catalina.connector.CoyoteOutputStream.flush(CoyoteOutputStream.java:118) ~[tomcat-embed-core-9.0.41.jar!/:9.0.41]
at org.springframework.http.server.reactive.ServletServerHttpResponse.flush(ServletServerHttpResponse.java:198) ~[spring-web-5.3.3.jar!/:5.3.3]
at org.springframework.http.server.reactive.ServletServerHttpResponse.access$500(ServletServerHttpResponse.java:50) ~[spring-web-5.3.3.jar!/:5.3.3]
at org.springframework.http.server.reactive.ServletServerHttpResponse$ResponseBodyFlushProcessor.flush(ServletServerHttpResponse.java:316) ~[spring-web-5.3.3.jar!/:5.3.3]
at org.springframework.http.server.reactive.AbstractListenerWriteFlushProcessor$State$3.writeComplete(AbstractListenerWriteFlushProcessor.java:291) ~[spring-web-5.3.3.jar!/:5.3.3]
at org.springframework.http.server.reactive.AbstractListenerWriteFlushProcessor$State$WriteResultSubscriber.onComplete(AbstractListenerWriteFlushProcessor.java:437) ~[spring-web-5.3.3.jar!/:5.3.3]
at org.springframework.http.server.reactive.WriteResultPublisher$State.publishComplete(WriteResultPublisher.java:256) ~[spring-web-5.3.3.jar!/:5.3.3]
at org.springframework.http.server.reactive.WriteResultPublisher.publishComplete(WriteResultPublisher.java:84) ~[spring-web-5.3.3.jar!/:5.3.3]
at org.springframework.http.server.reactive.AbstractListenerWriteProcessor.changeStateToComplete(AbstractListenerWriteProcessor.java:280) ~[spring-web-5.3.3.jar!/:5.3.3]
at org.springframework.http.server.reactive.AbstractListenerWriteProcessor.access$300(AbstractListenerWriteProcessor.java:46) ~[spring-web-5.3.3.jar!/:5.3.3]
at org.springframework.http.server.reactive.AbstractListenerWriteProcessor$State$3.onWritePossible(AbstractListenerWriteProcessor.java:368) ~[spring-web-5.3.3.jar!/:5.3.3]
at org.springframework.http.server.reactive.AbstractListenerWriteProcessor.onWritePossible(AbstractListenerWriteProcessor.java:153) ~[spring-web-5.3.3.jar!/:5.3.3]
at org.springframework.http.server.reactive.ServletServerHttpResponse$ResponseBodyWriteListener.onWritePossible(ServletServerHttpResponse.java:270) ~[spring-web-5.3.3.jar!/:5.3.3]
at org.apache.coyote.Response.onWritePossible(Response.java:762) ~[tomcat-embed-core-9.0.41.jar!/:9.0.41]
at org.apache.catalina.connector.CoyoteAdapter.asyncDispatch(CoyoteAdapter.java:188) ~[tomcat-embed-core-9.0.41.jar!/:9.0.41]
at org.apache.coyote.AbstractProcessor.dispatch(AbstractProcessor.java:241) ~[tomcat-embed-core-9.0.41.jar!/:9.0.41]
at org.apache.coyote.AbstractProcessorLight.process(AbstractProcessorLight.java:59) ~[tomcat-embed-core-9.0.41.jar!/:9.0.41]
at org.apache.coyote.AbstractProtocol$ConnectionHandler.process(AbstractProtocol.java:888) ~[tomcat-embed-core-9.0.41.jar!/:9.0.41]
at org.apache.tomcat.util.net.NioEndpoint$SocketProcessor.doRun(NioEndpoint.java:1597) ~[tomcat-embed-core-9.0.41.jar!/:9.0.41]
at org.apache.tomcat.util.net.SocketProcessorBase.run(SocketProcessorBase.java:49) ~[tomcat-embed-core-9.0.41.jar!/:9.0.41]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) ~[?:1.8.0_282]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) ~[?:1.8.0_282]
at org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run(TaskThread.java:61) ~[tomcat-embed-core-9.0.41.jar!/:9.0.41]
at java.lang.Thread.run(Thread.java:748) [?:1.8.0_282]
Caused by: java.io.IOException: Broken pipe
at sun.nio.ch.FileDispatcherImpl.write0(Native Method) ~[?:1.8.0_282]
at sun.nio.ch.SocketDispatcher.write(SocketDispatcher.java:47) ~[?:1.8.0_282]
at sun.nio.ch.IOUtil.writeFromNativeBuffer(IOUtil.java:93) ~[?:1.8.0_282]
at sun.nio.ch.IOUtil.write(IOUtil.java:65) ~[?:1.8.0_282]
at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:470) ~[?:1.8.0_282]
at org.apache.tomcat.util.net.NioChannel.write(NioChannel.java:138) ~[tomcat-embed-core-9.0.41.jar!/:9.0.41]
at org.apache.tomcat.util.net.NioEndpoint$NioSocketWrapper.doWrite(NioEndpoint.java:1269) ~[tomcat-embed-core-9.0.41.jar!/:9.0.41]
at org.apache.tomcat.util.net.SocketWrapperBase.doWrite(SocketWrapperBase.java:764) ~[tomcat-embed-core-9.0.41.jar!/:9.0.41]
at org.apache.tomcat.util.net.SocketWrapperBase.flushNonBlocking(SocketWrapperBase.java:735) ~[tomcat-embed-core-9.0.41.jar!/:9.0.41]
at org.apache.tomcat.util.net.SocketWrapperBase.flush(SocketWrapperBase.java:709) ~[tomcat-embed-core-9.0.41.jar!/:9.0.41]
at org.apache.coyote.http11.Http11OutputBuffer$SocketOutputBuffer.flush(Http11OutputBuffer.java:572) ~[tomcat-embed-core-9.0.41.jar!/:9.0.41]
at org.apache.coyote.http11.filters.IdentityOutputFilter.flush(IdentityOutputFilter.java:117) ~[tomcat-embed-core-9.0.41.jar!/:9.0.41]
at org.apache.coyote.http11.Http11OutputBuffer.flush(Http11OutputBuffer.java:220) ~[tomcat-embed-core-9.0.41.jar!/:9.0.41]
at org.apache.coyote.http11.Http11Processor.flush(Http11Processor.java:1195) ~[tomcat-embed-core-9.0.41.jar!/:9.0.41]
at org.apache.coyote.AbstractProcessor.action(AbstractProcessor.java:402) ~[tomcat-embed-core-9.0.41.jar!/:9.0.41]
at org.apache.coyote.Response.action(Response.java:209) ~[tomcat-embed-core-9.0.41.jar!/:9.0.41]
at org.apache.catalina.connector.OutputBuffer.doFlush(OutputBuffer.java:305) ~[tomcat-embed-core-9.0.41.jar!/:9.0.41]
... 25 more
And appears to me to be a result of the client hanging up the connection before response handling is complete on the server.
After this flushingFailed log line, we can see a relatively large (~12 second) gap in time before the next log line which was triggered by the response timeout being triggered which is setup in the DemoWebFilter in the demo project here: https://github.com/danielra/webflux_race_investigation/blob/main/src/main/java/com/example/demo/filter/DemoWebFilter.java#L38 And only after this timeout does ServletHttpHandlerAdapter
log a line indicating that handling is complete.
I note that the flushingFailed
method where I added the extra log line above has a comment which reads:
Invoked when an error happens while flushing. Sub-classes may choose
to ignore this if they know the underlying API will provide an error
notification in a container thread.
Defaults to no-op.
Based on this description and the observed behavior, it seems to me that perhaps a notification of the error was expected elsewhere, but it was never actually received so handling of the response is never completed (without a timeout present). I don't know enough yet to say whether that was due to a bug in the code that should have emitted the error notification or on the listening side - or if there should actually be a real non-no-op operation implemented of flushingFailed in the relevant subclass in the Tomcat case. However, my expectation is that regardless of when a client hangs up the connection the server should proceed with completing response handling.
Please let me know if any additional information would be helpful. Thank you for your time!
in: web type: bug