drivers: eth_nxp_enet_qos: Fix deadlock in system workqueue
A deadlock, exposed by the HTTP server sample, could occur when the TX
done work item got stuck in the system workqueue. The sequence was:
1) A thread starts a TX and takes the driver TX semaphore.
2) The HTTP server socket work is scheduled on the system workqueue to
send something. It claims the network interface TX mutex and then
blocks taking the driver TX semaphore.
3) The RX traffic class handler blocks trying to claim the network
interface TX mutex while sending an ACK from the TCP callback. This
means RX packets are no longer processed.
4) Many errors about RX being unable to allocate packets occur, and all
RX is dropped. This was the main symptom of the deadlock. It looked
like a memory leak, but it had nothing to do with the RX code or with
any memory leak.
5) The TX DMA finishes and schedules the TX DMA done work onto the
system workqueue, behind the HTTP server socket work, which is blocked
waiting for the driver TX semaphore.
6) The TX DMA done work is what gives the driver TX semaphore, but it
can never run. This ordering problem in the system workqueue is what
deadlocks all of these threads and work items.
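The queue ordering problem in steps 2, 5 and 6 can be sketched in plain C.
This is only a toy model, not driver code: the names (wq_model,
simulate_deadlock, and so on) are hypothetical, the semaphore is a plain
integer, and "blocking" is modeled as the single workqueue thread being
unable to advance past the item at the head of the queue.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the two work items in the description. */
enum work_kind { WORK_HTTP_SEND, WORK_TX_DONE };

struct wq_model {
    int tx_sem;               /* driver TX semaphore: 1 = free, 0 = taken */
    enum work_kind queue[4];  /* FIFO system workqueue, one worker thread */
    size_t head, tail;
};

static void submit(struct wq_model *m, enum work_kind w)
{
    m->queue[m->tail++] = w;
}

/* Process items in FIFO order. Returns true if the queue drains, false
 * if the head item blocks on the semaphore. In this model only a later
 * queued item could give the semaphore, so a blocked head item means
 * the queue is stuck for good. */
static bool run_workqueue(struct wq_model *m)
{
    while (m->head < m->tail) {
        switch (m->queue[m->head]) {
        case WORK_HTTP_SEND:
            if (m->tx_sem == 0) {
                return false;   /* blocked taking the TX semaphore */
            }
            m->tx_sem = 0;      /* take the semaphore and send */
            break;
        case WORK_TX_DONE:
            m->tx_sem = 1;      /* give the semaphore back */
            break;
        }
        m->head++;
    }
    return true;
}

/* Returns true if the modeled scenario deadlocks. */
bool simulate_deadlock(bool tx_done_queued_first)
{
    /* Step 1: some thread has started a TX and holds the semaphore. */
    struct wq_model m = { .tx_sem = 0 };

    if (tx_done_queued_first) {
        submit(&m, WORK_TX_DONE);
        submit(&m, WORK_HTTP_SEND);
    } else {
        submit(&m, WORK_HTTP_SEND);  /* step 2 */
        submit(&m, WORK_TX_DONE);    /* step 5: queued behind the sender */
    }
    return !run_workqueue(&m);
}
```

In this model simulate_deadlock(false) reproduces the stuck queue, while
simulate_deadlock(true) drains, which shows that the problem is purely
the order of the items on a single-threaded queue.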
Fix this by calling the TX DMA done code directly from the ISR. It
should be ISR safe and is not much code to execute: it only frees some
net buffers and the packet and updates the stats.
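A minimal sketch of the shape of this fix, again as a plain C model rather
than the actual driver change. All names here (tx_state, tx_dma_done,
enet_qos_isr_model, try_take_tx_sem) are hypothetical, and the buffer,
packet and semaphore handling is reduced to simple fields. It assumes the
completion handler never blocks.

```c
#include <stdbool.h>

/* Hypothetical driver TX state; not the real driver's data structure. */
struct tx_state {
    int tx_sem;             /* driver TX semaphore: 1 = free, 0 = taken */
    int bufs_in_flight;     /* net buffers held by the TX in progress */
    bool pkt_held;          /* packet reference held by the TX in progress */
    unsigned tx_pkts_done;  /* statistics counter */
};

/* Short, non-blocking completion handler: free the buffers and the
 * packet, update the stats, and give the TX semaphore. */
static void tx_dma_done(struct tx_state *s)
{
    s->bufs_in_flight = 0;
    s->pkt_held = false;
    s->tx_pkts_done++;
    s->tx_sem = 1;
}

/* Modeled interrupt handler: the completion code runs right here
 * instead of being submitted to a workqueue, so it cannot end up
 * queued behind a blocked sender. */
void enet_qos_isr_model(struct tx_state *s, bool tx_complete_irq)
{
    if (tx_complete_irq) {
        tx_dma_done(s);
    }
}

/* Non-blocking take, standing in for a sender waiting on the semaphore. */
bool try_take_tx_sem(struct tx_state *s)
{
    if (s->tx_sem == 1) {
        s->tx_sem = 0;
        return true;
    }
    return false;
}
```

With the completion handled in the interrupt path, a sender that failed to
take the semaphore before the interrupt succeeds right after it, whatever
is pending on the workqueue.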
An optimization can be made later if needed, but for now solving the
deadlock is the more urgent priority.
Signed-off-by: Declan Snyder <declan.snyder@nxp.com>