Fix flaky ConsumerQueue::Dequeue after import
We recently noticed some flakiness around buffer_hub_queue-test's
BufferHubQueueTest.TestAllocateBuffer test case. After digging deep here
is the actual sympton:
There are two consecutive ConsumerQueue::Dequeue() calls in the test,
right after the consumer queue get created. The second call gets a
certain chance to fail and it appears that the queue logic is dequeuing
the same (i.e. the first) buffer again. Since the buffer already
acquired during the first Dequeue() call, the second Dequeue() will
eventually fail due to invalid buffer state transition.
The problem here is that we Enqueue() the first buffer twice:
(1) During buffer import, we always check whether the buffer has been
signaled. If so, we Enqueue() it, then Acquire it asynchronously during
Dequeue().
(2) During Dequeue(), the buffer's event_fd is already in the
ConsumerQeue's epoll set and is will **possibly** trigger the same
buffer being Enqueue()'d the second time. Note that this may not
happened if the AcquireAsync in step (1) race ahead of the epoll_wait
and clears the buffer available signal.
We had the logic described in step (1) as a safe guard to explicitly
catch the cases where buffers are already available when added into an
epoll set. Under this assuption (1) and (2) should happen exclusively:
either a buffer is available on import, thus edge triggered epoll won't
fire and we should only trigger (1); or the buffer is not available on
import, but becomes available later, which should only
trigger (2). However, this is not how epoll/eventfd works. We found
though unit test on epoll that adding a already signaled event_fd into a
epoll set will always trigger the epoll fd no matter whether the
event_fd is already signaled when being added into the epoll set. We
verified this on Android and mainline Linux kernel.
Thus, the check in step (1) is redundant and should be removed. Once (1)
is removed, we will always catch new buffer events through the same
edge-triggered epoll_wait in (2). Since it is edge triggered, it will
be trigged only once when the buffer becomes available, so it's
resilient to the race condition related to AcquireAsync().
Bug: 70306222
Test: buffer_hub_queue-test, buffer_hub-test
Change-Id: Ic24188ad84664bc102b85ca93ed24c354cb5c5a3
2 files changed