Expected behavior
Atomix and ONOS run on the same machine. I compiled onos-2.2.4 with atomix-3.1.8 (openjdk-11).
When I test with 2000 devices, ONOS requests some stats messages from Atomix, and many of these requests time out.
These timeouts eventually cause ONOS to run out of memory (OOM).
Actual behavior
During the test I get this exception:
2020-10-27T19:47:46,355 | DEBUG | raft-partition-group-raft-6 | RaftSessionConnection | 129 - io.atomix.utils - 3.1.8 | SessionClient{29}{type=AtomicCounterType{name=atomic-counter}, name=sys-clock-counter} - CommandRequest{session=29, sequence=1106859, operation=PrimitiveOperation{id=DefaultOperationId{id=incrementAndGet, type=COMMAND}, value=null}} failed! Reason: {}
java.util.concurrent.TimeoutException: Request type raft-partition-1-command timed out in 5000 milliseconds
    at io.atomix.cluster.messaging.impl.AbstractClientConnection$Callback.timeout(AbstractClientConnection.java:159) ~[?:?]
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515) [?:?]
    at java.util.concurrent.FutureTask.run(FutureTask.java:264) [?:?]
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:304) [?:?]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) [?:?]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) [?:?]
    at java.lang.Thread.run(Thread.java:834) [?:?]
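As far as I can tell from the stack trace, the exception comes from a timer task (`Callback.timeout`, run by a `ScheduledThreadPoolExecutor`) that fails the pending request when no reply arrives within 5000 milliseconds. Below is a minimal sketch of that general pattern using only JDK classes. It is not the Atomix implementation: the class `RequestTimeoutSketch` and its methods `send`, `reply`, and `pendingCount` are hypothetical names chosen for illustration.

```java
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/** Hypothetical stand-in for a messaging client connection; NOT Atomix code. */
final class RequestTimeoutSketch {
    private final ScheduledExecutorService timer = Executors.newSingleThreadScheduledExecutor();
    // Requests that were sent but have not yet been answered or timed out.
    private final Map<Long, CompletableFuture<byte[]>> pending = new ConcurrentHashMap<>();
    private final long timeoutMillis;

    RequestTimeoutSketch(long timeoutMillis) {
        this.timeoutMillis = timeoutMillis;
    }

    /** Registers a pending request and arms a timer that fails it after the timeout. */
    CompletableFuture<byte[]> send(long id, String type) {
        CompletableFuture<byte[]> future = new CompletableFuture<>();
        pending.put(id, future);
        ScheduledFuture<?> timeout = timer.schedule(() -> {
            CompletableFuture<byte[]> f = pending.remove(id);
            if (f != null) {
                f.completeExceptionally(new TimeoutException(
                    "Request type " + type + " timed out in " + timeoutMillis + " milliseconds"));
            }
        }, timeoutMillis, TimeUnit.MILLISECONDS);
        // Once the request completes (reply or timeout), the timer is no longer needed.
        future.whenComplete((result, error) -> timeout.cancel(false));
        return future;
    }

    /** Called when a reply arrives; returns false if the request already timed out. */
    boolean reply(long id, byte[] payload) {
        CompletableFuture<byte[]> f = pending.remove(id);
        return f != null && f.complete(payload);
    }

    int pendingCount() {
        return pending.size();
    }

    void shutdown() {
        timer.shutdownNow();
    }

    public static void main(String[] args) {
        RequestTimeoutSketch client = new RequestTimeoutSketch(100);

        // A reply that arrives in time completes the request normally.
        CompletableFuture<byte[]> answered = client.send(1L, "raft-partition-1-command");
        client.reply(1L, new byte[] {1});
        System.out.println("answered bytes: " + answered.join().length);

        // A request that is never answered is failed by the timer task.
        CompletableFuture<byte[]> unanswered = client.send(2L, "raft-partition-1-command");
        try {
            unanswered.join();
        } catch (CompletionException e) {
            System.out.println("failed with: " + e.getCause());
        }
        client.shutdown();
    }
}
```

In this pattern a reply that arrives after the deadline finds no pending entry and is dropped, so from the caller's side a slow reply and a lost reply look the same.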
I added a log statement to try to find the problem, like this:
final class RemoteServerConnection extends AbstractServerConnection {
    private static final byte[] EMPTY_PAYLOAD = new byte[0];
    private final Logger log = LoggerFactory.getLogger(getClass());
    private final Channel channel;

    RemoteServerConnection(HandlerRegistry handlers, Channel channel) {
        super(handlers);
        this.channel = channel;
    }

    @Override
    public void reply(ProtocolRequest message, ProtocolReply.Status status, Optional<byte[]> payload) {
        ProtocolReply response = new ProtocolReply(
            message.id(),
            payload.orElse(EMPTY_PAYLOAD),
            status);
        // Added for debugging: log every reply before it is written to the channel.
        log.info("RemoteServerConnection reply, message subject {} message type {} message id {} message status {}",
            message.subject(), message.type(), message.id(), status.name());
        channel.writeAndFlush(response, channel.voidPromise());
    }
}
Then I found something strange: when this statement logs at INFO level, there are fewer timeouts, but when it logs at a different level or is not added at all, there are more timeouts. So I have two questions:
- Why do the requests time out?
- Why does adding the log statement affect the number of timeouts?
Environment
- Atomix: 3.1.8
- OS: ubuntu-18.04
- JVM: openjdk-11