#241 may be related
A consistent "xhr poll error" seems to be caused by hanging connections. The condition typically arises when the manager calls onclose() and interrupts a pending xhr request.
The error typically does not occur as long as xhr requests are serviced by the server, e.g. a packet is returned while the request is pending. The client then receives the packet and posts a new request.
Timeouts play a significant role. The client can only recover from the condition if the maximum delay imposed by reconnectionDelayMax is large enough to ensure that the previous request has been cancelled. If it is still pending, the new request immediately fails as well, gets stuck, and blocks the next poll request.
This is why it sometimes takes extremely long for a client to recover from a disconnect. When the connection is retried, the backoff timeout is reset to reconnectionDelay and then increased. Reconnect attempts keep failing for a long time, until the gap between attempts is large enough.
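To make the timing argument concrete, here is a minimal sketch of a capped, doubling backoff. It is not the library's code: the class `BackoffSketch`, both method names, and the `staleRequestTimeoutMs` parameter are hypothetical, random jitter is ignored, and the recovery condition (delay at least as long as the time the stale request needs to be cancelled) is only my reading of the behaviour described above.

```java
public class BackoffSketch {

    // Delay before reconnect attempt `attempt` (0-based): starts at
    // reconnectionDelay, doubles per attempt, capped at reconnectionDelayMax.
    // Simplifying assumption: no random jitter is applied.
    static long delayForAttempt(int attempt, long reconnectionDelay, long reconnectionDelayMax) {
        if (reconnectionDelay <= 0) {
            throw new IllegalArgumentException("reconnectionDelay must be positive");
        }
        long delay = reconnectionDelay;
        for (int i = 0; i < attempt && delay < reconnectionDelayMax; i++) {
            delay *= 2;
        }
        return Math.min(delay, reconnectionDelayMax);
    }

    // First attempt whose delay is at least staleRequestTimeoutMs (hypothetical:
    // the time a hanging xhr request needs until it is really cancelled).
    // Returns -1 if the cap is too small for the gap to ever get that large.
    static int firstRecoverableAttempt(long reconnectionDelay, long reconnectionDelayMax,
                                       long staleRequestTimeoutMs) {
        if (reconnectionDelayMax < staleRequestTimeoutMs) {
            return -1;
        }
        int attempt = 0;
        while (delayForAttempt(attempt, reconnectionDelay, reconnectionDelayMax) < staleRequestTimeoutMs) {
            attempt++;
        }
        return attempt;
    }

    public static void main(String[] args) {
        // Example values only: 1 s initial delay, 10 s until a stale request is gone.
        System.out.println(firstRecoverableAttempt(1000, 5000, 10000));   // cap too small
        System.out.println(firstRecoverableAttempt(1000, 60000, 10000));  // cap large enough
    }
}
```

With these example values, a 5 s cap never produces a gap of 10 s, while a 60 s cap reaches it on the fifth attempt (index 4, after delays of 1, 2, 4 and 8 s). If the backoff is reset on every retry, as described above, that count starts over each time.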
This issue may happen exclusively, or at least more severely, on Android 4.2 (SDK 17) [probably 4.1-4.3]. There seems to be an underlying network problem when handling long-running requests (see #248). I could not see any relationship with the size of the connection pool or similar, but I could see that sometimes multiple requests were pending in parallel. It looks like it works fine as long as there is a single pending request, but it fails (always or often) if two or more long-running requests are pending.
I tried using okhttp-urlconnection instead of Android's HttpURLConnection, but this did not make it any better. The issue is just easier to debug, but it remains. This is another indicator that we probably have a networking issue here.
How to reproduce:
- set upgrade = false, so we only have xhr polling
- open a socket
- provoke a network outage
- hope the client goes into the "xhr poll error" loop
- watch how it does not recover
- in Manager, set a breakpoint AFTER this line: `this.reconnecting = true; // ln 501`
- wait long enough to ensure all pending xhr connections have been timed out
- keep running the code
- watch how the client reconnects and recovers