When its future is discarded, the cgroup task killer stops immediately. This can leave the cgroup frozen, with all of its processes stuck in uninterruptible sleep, which is quite bad: the tasks are unkillable and the cgroup can't be destroyed.
Instead, on discard, wait a bit to give the task killer a chance to finish cleanly, killing all the tasks and thawing the cgroup.
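The invariant behind the fix (every freeze must eventually be matched by a thaw, even when the caller gives up) can be sketched with a hypothetical, self-contained example; `abruptStop` and `gracefulStop` are illustrative names, not Mesos code:

```cpp
#include <cassert>

// States of a simulated freezer cgroup.
enum class State { Idle, Frozen, Thawed };

// Current behavior: the discard terminates the killer as soon as it
// arrives, which can be right after the freeze step has run.
State abruptStop() {
  State state = State::Frozen;   // freeze() completed...
  // ...discard fires here and terminates the process: the kill and thaw
  // steps never run, so the cgroup stays frozen.
  return state;
}

// Proposed behavior: on discard, let the sequence run to completion
// (within a grace period) so the tasks are killed and the cgroup thawed.
State gracefulStop() {
  State state = State::Frozen;   // freeze()
  // The kill step would signal the now-frozen tasks here.
  state = State::Thawed;         // thaw still runs before terminating.
  return state;
}
```

The first function models the bug (the sequence ends frozen); the second models the fix (the sequence always ends thawed).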
I started seeing this after updating my kernel - dozens of tests would fail randomly when trying to destroy cgroups, e.g.:
[ RUN ] SlaveRecoveryTest/0.ExecutorDanglingLatestSymlink
I0517 22:39:10.577574 573547 exec.cpp:164] Version: 1.12.0
I0517 22:39:10.591667 573550 exec.cpp:237] Executor registered on agent a7c3a499-2968-4eb6-91dd-1020fedc1522-S0
I0517 22:39:10.594743 573552 executor.cpp:190] Received SUBSCRIBED event
I0517 22:39:10.595999 573552 executor.cpp:194] Subscribed executor on thinkpad
I0517 22:39:10.596259 573552 executor.cpp:190] Received LAUNCH event
I0517 22:39:10.597656 573552 executor.cpp:722] Starting task 39b1539d-9c30-41b0-b643-3ece3e63bea5
I0517 22:39:10.617466 573552 executor.cpp:740] Forked command at 573554
../../src/tests/mesos.cpp:782: Failure
Failed to wait 15secs for cgroups::destroy(hierarchy, cgroup)
*** Aborted at 1621287565 (unix time) try "date -d @1621287565" if you are using GNU date ***
PC: @ 0x56129f5437e7 testing::UnitTest::AddTestPartResult()
*** SIGSEGV (@0x0) received by PID 573473 (TID 0x7fb79b027b00) from PID 0; stack trace: ***
@ 0x7fb79c287140 (unknown)
@ 0x56129f5437e7 testing::UnitTest::AddTestPartResult()
@ 0x56129f5366f3 testing::internal::AssertHelper::operator=()
@ 0x56129e60421b mesos::internal::tests::ContainerizerTest<>::TearDown()
@ 0x56129f5633cb testing::internal::HandleSehExceptionsInMethodIfSupported<>()
@ 0x56129f55d634 testing::internal::HandleExceptionsInMethodIfSupported<>()
@ 0x56129f53d38d testing::Test::Run()
@ 0x56129f53dbb2 testing::TestInfo::Run()
@ 0x56129f53e1f9 testing::TestCase::Run()
@ 0x56129f544ce0 testing::internal::UnitTestImpl::RunAllTests()
@ 0x56129f564349 testing::internal::HandleSehExceptionsInMethodIfSupported<>()
@ 0x56129f55e24a testing::internal::HandleExceptionsInMethodIfSupported<>()
@ 0x56129f543a0a testing::UnitTest::Run()
@ 0x56129e0f6aba RUN_ALL_TESTS()
@ 0x56129e0f6505 main
@ 0x7fb79c0d4d0a __libc_start_main
@ 0x56129d23545a _start
Erreur de segmentation (core dumped) [i.e. "Segmentation fault"]
Comparing a successful run:
I0516 23:58:25.918263 466660 cgroups.cpp:2934] Freezing cgroup /sys/fs/cgroup/freezer/mesos_test_8851f242-b14a-4580-88b8-5d3dd552bfde/e2bf97c3-14a1-4b2b-b399-12dbd0f1c99d
I0516 23:58:25.918447 466657 cgroups.cpp:1323] Successfully froze cgroup /sys/fs/cgroup/freezer/mesos_test_8851f242-b14a-4580-88b8-5d3dd552bfde/e2bf97c3-14a1-4b2b-b399-12dbd0f1c99d after 157952ns
I0516 23:58:25.918900 466660 cgroups.cpp:2952] Thawing cgroup /sys/fs/cgroup/freezer/mesos_test_8851f242-b14a-4580-88b8-5d3dd552bfde/e2bf97c3-14a1-4b2b-b399-12dbd0f1c99d
I0516 23:58:25.919064 466656 cgroups.cpp:1352] Successfully thawed cgroup /sys/fs/cgroup/freezer/mesos_test_8851f242-b14a-4580-88b8-5d3dd552bfde/e2bf97c3-14a1-4b2b-b399-12dbd0f1c99d after 134912ns
To an unsuccessful one:
I0516 23:58:42.638830 466897 linux_launcher.cpp:606] Destroying cgroup '/sys/fs/cgroup/freezer/mesos_test_98d2814b-03ad-4c14-a3f9-832ab688e386/3139257f-4e7a-4828-9af4-75c867846558'
I0516 23:58:42.639083 466898 composing.cpp:343] Finished recovering all containerizers
I0516 23:58:42.639549 466898 cgroups.cpp:2934] Freezing cgroup /sys/fs/cgroup/freezer/mesos_test_98d2814b-03ad-4c14-a3f9-832ab688e386/3139257f-4e7a-4828-9af4-75c867846558
I0516 23:58:42.714794 466899 cgroups.cpp:2952] Thawing cgroup /sys/fs/cgroup/freezer/mesos_test_98d2814b-03ad-4c14-a3f9-832ab688e386/3139257f-4e7a-4828-9af4-75c867846558
W0516 23:58:42.745649 466903 cgroups.cpp:294] Removal of cgroup /sys/fs/cgroup/memory/mesos_test_98d2814b-03ad-4c14-a3f9-832ab688e386/3139257f-4e7a-4828-9af4-75c867846558 failed with EBUSY, will try again
W0516 23:58:42.746997 466902 cgroups.cpp:294] Removal of cgroup /sys/fs/cgroup/memory/mesos_test_98d2814b-03ad-4c14-a3f9-832ab688e386/3139257f-4e7a-4828-9af4-75c867846558 failed with EBUSY, will try again
W0516 23:58:42.749405 466898 cgroups.cpp:294] Removal of cgroup /sys/fs/cgroup/memory/mesos_test_98d2814b-03ad-4c14-a3f9-832ab688e386/3139257f-4e7a-4828-9af4-75c867846558 failed with EBUSY, will try again
W0516 23:58:42.754009 466900 cgroups.cpp:294] Removal of cgroup /sys/fs/cgroup/memory/mesos_test_98d2814b-03ad-4c14-a3f9-832ab688e386/3139257f-4e7a-4828-9af4-75c867846558 failed with EBUSY, will try again
W0516 23:58:42.762802 466901 cgroups.cpp:294] Removal of cgroup /sys/fs/cgroup/memory/mesos_test_98d2814b-03ad-4c14-a3f9-832ab688e386/3139257f-4e7a-4828-9af4-75c867846558 failed with EBUSY, will try again
W0516 23:58:42.779531 466904 cgroups.cpp:294] Removal of cgroup /sys/fs/cgroup/memory/mesos_test_98d2814b-03ad-4c14-a3f9-832ab688e386/3139257f-4e7a-4828-9af4-75c867846558 failed with EBUSY, will try again
W0516 23:58:42.812022 466897 cgroups.cpp:294] Removal of cgroup /sys/fs/cgroup/memory/mesos_test_98d2814b-03ad-4c14-a3f9-832ab688e386/3139257f-4e7a-4828-9af4-75c867846558 failed with EBUSY, will try again
W0516 23:58:42.877116 466903 cgroups.cpp:294] Removal of cgroup /sys/fs/cgroup/memory/mesos_test_98d2814b-03ad-4c14-a3f9-832ab688e386/3139257f-4e7a-4828-9af4-75c867846558 failed with EBUSY, will try again
W0516 23:58:43.006693 466898 cgroups.cpp:294] Removal of cgroup /sys/fs/cgroup/memory/mesos_test_98d2814b-03ad-4c14-a3f9-832ab688e386/3139257f-4e7a-4828-9af4-75c867846558 failed with EBUSY, will try again
W0516 23:58:43.264366 466899 cgroups.cpp:294] Removal of cgroup /sys/fs/cgroup/memory/mesos_test_98d2814b-03ad-4c14-a3f9-832ab688e386/3139257f-4e7a-4828-9af4-75c867846558 failed with EBUSY, will try again
W0516 23:58:43.777858 466901 cgroups.cpp:294] Removal of cgroup /sys/fs/cgroup/memory/mesos_test_98d2814b-03ad-4c14-a3f9-832ab688e386/3139257f-4e7a-4828-9af4-75c867846558 failed with EBUSY, will try again
W0516 23:58:44.803258 466897 cgroups.cpp:294] Removal of cgroup /sys/fs/cgroup/memory/mesos_test_98d2814b-03ad-4c14-a3f9-832ab688e386/3139257f-4e7a-4828-9af4-75c867846558 failed with EBUSY, will try again
W0516 23:58:46.851841 466901 cgroups.cpp:294] Removal of cgroup /sys/fs/cgroup/memory/mesos_test_98d2814b-03ad-4c14-a3f9-832ab688e386/3139257f-4e7a-4828-9af4-75c867846558 failed with EBUSY, will try again
We can see that when the problem occurs, it's because the cgroup doesn't freeze quickly. Since the tests destroy the slave/containerizer right after the task finishes, the cgroup task killer gets interrupted after the cgroup has started freezing but before it has been thawed (thawing happens after killing the tasks) - https://github.com/apache/mesos/blob/master/src/linux/cgroups.cpp#L1445
void killTasks() {
  // Chain together the steps needed to kill all tasks in the cgroup.
  chain = freeze()                        // Freeze the cgroup.
    .then(defer(self(), &Self::kill))     // Send kill signal.
    .then(defer(self(), &Self::thaw))     // Thaw cgroup to deliver signal.
    .then(defer(self(), &Self::reap));    // Wait until all pids are reaped.

  chain.onAny(defer(self(), &Self::finished, lambda::_1));
}
The problem is that the process sets up a discard callback which immediately terminates the process - https://github.com/apache/mesos/blob/master/src/linux/cgroups.cpp#L1407
void initialize() override
{
  // Stop when no one cares.
  promise.future().onDiscard(lambda::bind(
      static_cast<void (*)(const UPID&, bool)>(terminate), self(), true));

  killTasks();
}
This means the task killer can be interrupted after it has frozen the cgroup, but before it has killed the tasks and thawed it.
The cgroup then stays frozen, its tasks are stuck in uninterruptible sleep and can't be killed, and the cgroup can't be destroyed, as seen above.
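The proposed "wait a bit on discard" behavior can be sketched in plain standard C++ rather than libprocess; `discardWithGrace`, `chainDuration`, and `grace` are illustrative names, not Mesos code:

```cpp
#include <chrono>
#include <future>
#include <thread>

// On discard, instead of terminating the kill chain immediately, wait up
// to `grace` for it to finish (freeze -> kill -> thaw). Returns true if
// the chain completed within the grace period.
bool discardWithGrace(std::chrono::milliseconds chainDuration,
                      std::chrono::milliseconds grace) {
  std::future<void> chain = std::async(std::launch::async, [chainDuration] {
    // Stand-in for the freeze -> kill -> thaw sequence.
    std::this_thread::sleep_for(chainDuration);
  });

  // The discard has arrived; give the chain a chance to finish cleanly.
  return chain.wait_for(grace) == std::future_status::ready;
}
```

If the grace period elapses first, the caller still has to decide what to do; in the actual actor-based code this would be a delayed terminate rather than a blocking wait.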
I'm not adding a specific test because the bug already makes dozens of existing tests fail.
I'm not quite sure why this only started happening on recent kernels; my guess is that freezing sometimes takes longer than it used to, which makes it much more likely that the killer is interrupted in the middle of its work.
In any case, it's a long-standing bug with potentially bad consequences, so it's worth fixing.
@asekretenko
@qianzhangxa