A recent work of significantly increasing scalability of J2V8 by removing a lock between V8 instances and letting JAVA layer's lock V8 instance and isolate them caused number of issues (crashes). Some issues have been already patched (https://github.com/eclipsesource/J2V8/commit/8f8625015bc1314ee71cfab725d5d72f44979109) and others by passing explicitly Isolate in #308 and #310 but one crash is hard to solve due to inability to easily pass explicitly Isolate by using V8 API. There is also possibility that there maybe other places that could cause similar crash.
An investigation effort was launched to find out why it crashes and the following lines of code have been identified:
In class isolate.cpp
String::Value::Value(v8::Local<v8::Value> obj) : str_(NULL), length_(0) {
if (obj.IsEmpty()) return;
i::Isolate* isolate = i::Isolate::Current();
Isolate* v8_isolate = reinterpret_cast<Isolate*>(isolate);
ENTER_V8(isolate);
i::HandleScope scope(isolate);
Local<Context> context = v8_isolate->GetCurrentContext();
TryCatch try_catch(v8_isolate);
Local<String> str;
if (!obj->ToString(context).ToLocal(&str)) return;
length_ = str->Length();
str_ = i::NewArray<uint16_t>(length_ + 1);
str->Write(str_);
}
Clearly as with issues #308 and #310 the isolate retrieved via a static method is from time tpo time not the correct one. This suggests a race condition or generally some threading issue. Further investigation shows that Isolates in V8 are in fact stored on thread locals. Even though many places V8's code have been updated to allow passing Isolate explicitly to methods and constructors, as it is visible the constructor that creates String::Value has not been updated or it was not supposed to be updated (unclear). Reading V8 code further reveals places where those ThreadLocals are managed:
isolate.h
// Find the PerThread for this particular (isolate, thread) combination
// If one does not yet exist, return null.
PerIsolateThreadData* FindPerThreadDataForThisThread();
// Find the PerThread for given (isolate, thread) combination
// If one does not yet exist, return null.
PerIsolateThreadData* FindPerThreadDataForThread(ThreadId thread_id);
// Discard the PerThread for this particular (isolate, thread) combination
// If one does not yet exist, no-op.
void DiscardPerThreadDataForThisThread();
There is also a comment from the authors of V8 that the notion of accessing Isolates via ThreadLocals seems to be deprecated. Sadly, even as of 22 Jul 2017, master version of V8 still contains this TODO and the error prone and brittle accessing of Isolates via ThreadLocals in still the code.
This code seems to prove that the concept of using Isolate::Current() and probably Isolate::GetCurrent() is deprecated:
// TODO(jochen): Once we got rid of Isolate::Current(), we can remove this.
Isolate::Scope isolate_scope(v8_isolate);
if (params.entry_hook || !i::Snapshot::Initialize(isolate)) {
isolate->Init(NULL);
}
Due to Isolate::Current() it also looks that there is hacky code which does not bring confidence that V8 could flawlessly operate in the multi-threaded environment.
Having said that it is not 100% clear if the actual bug is in J2V8 code by misusing V8 API or if the crash happens and a defect is in fact in V8 code that thread locals do not maintain a proper state and hold the correct pointer of Isolate at all times of the lifecycle of the V8.
What does not help in the situation is that V8's locks are binary and V8's are counting semaphores. This is a different paradigm between the two and in theory could cause desynchronization of those locks.
IMHO, there are a few schools of thought how to continue further:
-
Find out why V8 and J2V8 Isolates pointers get desychronized and fix the problem. This is assuming that defect is in the J2V8 code and that J2V8 is incorrectly using V8 API. One possibility here could be to turn J2V8 locks also in counting semaphores so that models of locks stay in sync (despite the fact that counting semaphores are way complicated than binary ones.
-
Patch V8 in such a way that actually refactoring is performed to remove i::Isolate::Current(); and Isolate::GetCurrent() as they want this anyway, see their TODO entry above. Once refactoring is done, submit PR to V8 codebase. Replace all places where Isolate is retrieved via static method and usage of ThreadLocals with explicitly passing Isolate (tedious code and really verbose but maybe less brittle). Here the bare minimum would be find all places where J2V8 relies on methods that in fact used Isolate::Current, etc and remove them by passing Isolate explicitly. Stretch goal for PR to V8 could be in fact to fix all places where such code is used in the code-base - removing altogether completely Isolate::GetCurrent();
-
Consider some sort of silly patch and workaround that sets the Isolate pointer from J2V8's v8Runtime on ThreadLocal upon every high level entry to V8 method, this would mean that calls to Isolate::Current would be correct. Such hacky patch ideally should also destroy thread local before setting it and clear and or possibly clear that again just before leaving the top level API method call. This is a bit mine field though.
-
Refactor J2V8 code in such a way that it avoids using classes, constructors and methods that take Isolate from ThreadLocal like fire. This may require that one needs to fine alternatives to write code, e.g. turn Local into some other String than String::String::Value (which has a busted constructor that retrieves implicitly Isolate)
Crash report:
#
# A fatal error has been detected by the Java Runtime Environment:
#
# SIGSEGV (0xb) at pc=0x00007fc4c9a6b872, pid=38, tid=0x00007fc4cb117700
#
# JRE version: Java(TM) SE Runtime Environment (8.0_131-b11) (build 1.8.0_131-b11)
# Java VM: Java HotSpot(TM) 64-Bit Server VM (25.131-b11 mixed mode linux-amd64 compressed oops)
# Problematic frame:
# C [libj2v8_linux_x86_64.so+0x56f872] v8::String::Value::Value(v8::Local<v8::Value>)+0x52
#
# Core dump written. Default location: //core or core.38
#
# If you would like to submit a bug report, please visit:
# http://bugreport.java.com/bugreport/crash.jsp
# The crash happened outside the Java Virtual Machine in native code.
# See problematic frame for where to report the bug.
#
--------------- T H R E A D ---------------
Current thread (0x00007fc36000b800): JavaThread "http-nio-8080-exec-20" daemon [_thread_in_native, id=156, stack(0x00007fc4cb017000,0x00007fc4cb118000)]
siginfo: si_signo: 11 (SIGSEGV), si_code: 1 (SEGV_MAPERR), si_addr: 0x0000000000008ce8
Registers:
RAX=0x0000000000000000, RBX=0x0000000000000000, RCX=0x0000000000000000, RDX=0x00007fc4cb117a30
RSP=0x00007fc4cb1154a0, RBP=0x00007fc4cb115540, RSI=0x00007fc3ec05d0a8, RDI=0x0000000000000002
R8 =0x0000000000000000, R9 =0x00000359975ba600, R10=0x00007fc508f55a08, R11=0x00001e1900000000
R12=0x00007fc4cb1154d0, R13=0x00000000f858bb40, R14=0x00007fc4cb115990, R15=0x00007fc3ec05d0a8
RIP=0x00007fc4c9a6b872, EFLAGS=0x0000000000010246, CSGSFS=0x0000000000000033, ERR=0x0000000000000004
TRAPNO=0x000000000000000e
Top of Stack: (sp=0x00007fc4cb1154a0)
0x00007fc4cb1154a0: 0000000000000005 00007fc4cb115510
0x00007fc4cb1154b0: 00007fc4cb1154e0 00003efbe57060c7
0x00007fc4cb1154c0: 0000000000000002 00003efbe5706001
0x00007fc4cb1154d0: 00007fc4cb1154c0 0000000000000030
0x00007fc4cb1154e0: 000000000000bba1 10f049c554aa7700
0x00007fc4cb1154f0: 00002a8ba937af31 00007fc3ec05d0a8
0x00007fc4cb115500: 0000000000000000 10f049c554aa7700
0x00007fc4cb115510: 00007fc3ec054b40 00000000c53e7db8
0x00007fc4cb115520: 0000000000000000 00000000f858bb40
0x00007fc4cb115530: 00007fc3ec054b40 00007fc36000b800
0x00007fc4cb115540: 00007fc4cb1159c0 00007fc4c9a268e4
0x00007fc4cb115550: 00007fc4cb115598 00000000e570ed5a
0x00007fc4cb115560: 00007fc4cb115a20 00007fc508f40b10
0x00007fc4cb115570: 00007fc4cb1159f0 00007fc36000b9f8
0x00007fc4cb115580: 00002a79af68beb1 0000000c00000000
0x00007fc4cb115590: 0000000c00000000 0000000c00000000
0x00007fc4cb1155a0: 00003efbe5736961 0000000c00000000
0x00007fc4cb1155b0: 00007fc3ec05d0a8 00007fc3ec05d0a8
0x00007fc4cb1155c0: 0000000000000000 00002a8ba937ace9
0x00007fc4cb1155d0: 00007fc508f4cd50 0000000000000003
0x00007fc4cb1155e0: 00007fc4cab4ec6c 00007fc4cb115718
0x00007fc4cb1155f0: 00002a252da3d6f1 0000000100000000
0x00007fc4cb115600: 00007fc4cb1156e0 00007fc4c9c37223
0x00007fc4cb115610: 00007fc4cb115620 00007fc4c9d47379
0x00007fc4cb115620: 00007fc4cb115680 00002a252da04241
0x00007fc4cb115630: 00007fc3ec064720 00000003c9d4703c
0x00007fc4cb115640: 00007fc508f4cd50 00007fc3ec05d0b8
0x00007fc4cb115650: 0000000000000000 0000000000000000
0x00007fc4cb115660: 00007fc3ec05d0a8 00007fc3ec05d0a8
0x00007fc4cb115670: 00000359975ba6c9 00000359975ba6c9
0x00007fc4cb115680: 00000359975ba6c9 00000359975ba6c9
0x00007fc4cb115690: 00000359975ba6c8 000026df57c04ca9
Instructions: (pc=0x00007fc4c9a6b872)
0x00007fc4c9a6b852: 84 f5 00 00 00 48 8b 05 e2 23 0a 01 49 89 fe 49
0x00007fc4c9a6b862: 89 f7 4c 8d 65 90 8b 38 e8 21 d0 f6 ff 48 89 c3
0x00007fc4c9a6b872: 44 8b a8 e8 8c 00 00 48 89 85 70 ff ff ff c7 80
0x00007fc4c9a6b882: e8 8c 00 00 03 00 00 00 48 8b 80 28 8d 00 00 48
Register to memory mapping:
RAX=0x0000000000000000 is an unknown value
RBX=0x0000000000000000 is an unknown value
RCX=0x0000000000000000 is an unknown value
RDX=0x00007fc4cb117a30 is pointing into the stack for thread: 0x00007fc36000b800
RSP=0x00007fc4cb1154a0 is pointing into the stack for thread: 0x00007fc36000b800
RBP=0x00007fc4cb115540 is pointing into the stack for thread: 0x00007fc36000b800
RSI=0x00007fc3ec05d0a8 is an unknown value
RDI=0x0000000000000002 is an unknown value
R8 =0x0000000000000000 is an unknown value
R9 =0x00000359975ba600 is an unknown value
R10=0x00007fc508f55a08 is an unknown value
R11=0x00001e1900000000 is an unknown value
R12=0x00007fc4cb1154d0 is pointing into the stack for thread: 0x00007fc36000b800
R13=0x00000000f858bb40 is an oop
[Ljava.lang.Object;
- klass: 'java/lang/Object'[]
- length: 3
R14=0x00007fc4cb115990 is pointing into the stack for thread: 0x00007fc36000b800
R15=0x00007fc3ec05d0a8 is an unknown value
Stack: [0x00007fc4cb017000,0x00007fc4cb118000], sp=0x00007fc4cb1154a0, free space=1017k
Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)
C [libj2v8_linux_x86_64.so+0x56f872] v8::String::Value::Value(v8::Local<v8::Value>)+0x52
C [libj2v8_linux_x86_64.so+0x52a8e4] getResult(JNIEnv_*, _jobject*&, long, v8::Local<v8::Value>&, int)+0x732
C [libj2v8_linux_x86_64.so+0x518be4] Java_com_eclipsesource_v8_V8__1executeFunction__JIJLjava_lang_String_2J+0x1fb
J 10189 com.eclipsesource.v8.V8._executeFunction(JIJLjava/lang/String;J)Ljava/lang/Object; (0 bytes) @ 0x00007fc4f8b9b14d [0x00007fc4f8b9b080+0xcd]
J 12892 C2 com.eclipsesource.v8.V8Object.executeJSFunction(Ljava/lang/String;[Ljava/lang/Object;)Ljava/lang/Object; (279 bytes) @ 0x00007fc4f9fff508 [0x00007fc4f9fff1c0+0x348]
J 13578 C2 pubse.ecs.system.renderer.servlet.j2v8.PoolingJ2V8JavaScriptRenderer.apply(Lde/mobile/ecs/World;)V (340 bytes) @ 0x00007fc4fab1cfa4 [0x00007fc4fab1c780+0x824]
J 13092 C2 pubse.ecs.system.renderer.ResponseRenderer.apply(Lde/mobile/ecs/World;)V (79 bytes) @ 0x00007fc4fa92d3b4 [0x00007fc4fa92d020+0x394]
J 11516 C2 com.google.inject.internal.InterceptorStackCallback$InterceptedMethodInvocation.proceed()Ljava/lang/Object; (91 bytes) @ 0x00007fc4fa680350 [0x00007fc4fa67a3e0+0x5f70]
J 13634 C2 pubse.config.InboundModule$Monitor.invoke(Lorg/aopalliance/intercept/MethodInvocation;)Ljava/lang/Object; (70 bytes) @ 0x00007fc4f9cea5bc [0x00007fc4f9cea4a0+0x11c]
J 11516 C2 com.google.inject.internal.InterceptorStackCallback$InterceptedMethodInvocation.proceed()Ljava/lang/Object; (91 bytes) @ 0x00007fc4fa67a560 [0x00007fc4fa67a3e0+0x180]
J 12832 C2 pubse.tomcat.handler.ServletRouter$$Lambda$188.handle(Ljavax/servlet/http/HttpServletRequest;Ljavax/servlet/http/HttpServletResponse;)V (10 bytes) @ 0x00007fc4faa54520 [0x00007fc4faa53be0+0x940]
J 11093 C2 org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(Ljavax/servlet/ServletRequest;Ljavax/servlet/ServletResponse;)V (574 bytes) @ 0x00007fc4f969e020 [0x00007fc4f969dee0+0x140]
J 12551 C2 org.ocpsoft.rewrite.servlet.RewriteFilter.doFilter(Ljavax/servlet/ServletRequest;Ljavax/servlet/ServletResponse;Ljavax/servlet/FilterChain;)V (402 bytes) @ 0x00007fc4fa992a80 [0x00007fc4fa9917c0+0x12c0]
J 11093 C2 org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(Ljavax/servlet/ServletRequest;Ljavax/servlet/ServletResponse;)V (574 bytes) @ 0x00007fc4f969e328 [0x00007fc4f969dee0+0x448]
J 12771 C2 pubse.util.ServletFilters$Utf8EncodingFilter.doFilter(Ljavax/servlet/ServletRequest;Ljavax/servlet/ServletResponse;Ljavax/servlet/FilterChain;)V (21 bytes) @ 0x00007fc4faa25180 [0x00007fc4faa24da0+0x3e0]
J 11093 C2 org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(Ljavax/servlet/ServletRequest;Ljavax/servlet/ServletResponse;)V (574 bytes) @ 0x00007fc4f969e328 [0x00007fc4f969dee0+0x448]
J 12707 C2 pubse.util.ServletFilters$LoggingFilter.doFilter(Ljavax/servlet/ServletRequest;Ljavax/servlet/ServletResponse;Ljavax/servlet/FilterChain;)V (76 bytes) @ 0x00007fc4fa3bcfe8 [0x00007fc4fa3bcbe0+0x408]
J 11093 C2 org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(Ljavax/servlet/ServletRequest;Ljavax/servlet/ServletResponse;)V (574 bytes) @ 0x00007fc4f969e328 [0x00007fc4f969dee0+0x448]
J 13553 C2 org.apache.catalina.core.StandardWrapperValve.invoke(Lorg/apache/catalina/connector/Request;Lorg/apache/catalina/connector/Response;)V (1389 bytes) @ 0x00007fc4fa464fec [0x00007fc4fa464e00+0x1ec]
J 12726 C2 org.apache.catalina.authenticator.AuthenticatorBase.invoke(Lorg/apache/catalina/connector/Request;Lorg/apache/catalina/connector/Response;)V (903 bytes) @ 0x00007fc4fa5b2900 [0x00007fc4fa5b1980+0xf80]
J 12739 C2 org.apache.catalina.core.StandardHostValve.invoke(Lorg/apache/catalina/connector/Request;Lorg/apache/catalina/connector/Response;)V (402 bytes) @ 0x00007fc4faa03420 [0x00007fc4faa030c0+0x360]
J 13494 C2 org.apache.catalina.connector.CoyoteAdapter.service(Lorg/apache/coyote/Request;Lorg/apache/coyote/Response;)V (805 bytes) @ 0x00007fc4fa105994 [0x00007fc4fa1054a0+0x4f4]
J 13979 C2 org.apache.coyote.http11.AbstractHttp11Processor.process(Lorg/apache/tomcat/util/net/SocketWrapper;)Lorg/apache/tomcat/util/net/AbstractEndpoint$Handler$SocketState; (1128 bytes) @ 0x00007fc4f9b0f65c [0x00007fc4f9b0f020+0x63c]
J 13493 C2 org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(Lorg/apache/tomcat/util/net/SocketWrapper;Lorg/apache/tomcat/util/net/SocketStatus;)Lorg/apache/tomcat/util/net/AbstractEndpoint$Handler$SocketState; (1073 bytes) @ 0x00007fc4fa475960 [0x00007fc4fa475480+0x4e0]
J 13115 C2 org.apache.tomcat.util.net.NioEndpoint$SocketProcessor.doRun()V (603 bytes) @ 0x00007fc4fa0607fc [0x00007fc4fa060680+0x17c]
J 12608 C2 org.apache.tomcat.util.net.NioEndpoint$SocketProcessor.run()V (79 bytes) @ 0x00007fc4fa49626c [0x00007fc4fa4961c0+0xac]
J 13557% C2 java.util.concurrent.ThreadPoolExecutor.runWorker(Ljava/util/concurrent/ThreadPoolExecutor$Worker;)V (225 bytes) @ 0x00007fc4fa57c134 [0x00007fc4fa57bf20+0x214]
j java.util.concurrent.ThreadPoolExecutor$Worker.run()V+5
j org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run()V+4
j java.lang.Thread.run()V+11
v ~StubRoutines::call_stub
V [libjvm.so+0x691d16] JavaCalls::call_helper(JavaValue*, methodHandle*, JavaCallArguments*, Thread*)+0x1056
V [libjvm.so+0x692221] JavaCalls::call_virtual(JavaValue*, KlassHandle, Symbol*, Symbol*, JavaCallArguments*, Thread*)+0x321
V [libjvm.so+0x6926c7] JavaCalls::call_virtual(JavaValue*, Handle, KlassHandle, Symbol*, Symbol*, Thread*)+0x47
V [libjvm.so+0x72da50] thread_entry(JavaThread*, Thread*)+0xa0
V [libjvm.so+0xa76833] JavaThread::thread_main_inner()+0x103
V [libjvm.so+0xa7697c] JavaThread::run()+0x11c
V [libjvm.so+0x927568] java_start(Thread*)+0x108