IPoIB: Fix deadlock on RTNL between bcast join comp and ipoib_stop()

Taking rtnl_lock in ipoib_mcast_join_complete() causes a deadlock with
ipoib_stop(). We avoid it by scheduling the piece of code that takes
the lock on ipoib_workqueue instead of executing it directly. This
works because we only flush the ipoib_workqueue with the RTNL not held.

The deadlock happens because ipoib_stop() calls ipoib_ib_dev_down()
which calls ipoib_mcast_dev_flush(), which calls ipoib_mcast_free(),
which calls ipoib_mcast_leave(). The latter calls
ib_sa_free_multicast(), and this waits until the multicast completion
handler finishes. This handler is ipoib_mcast_join_complete(), which
waits for the rtnl_lock(), which was already taken by ipoib_stop().

This bug was introduced in commit a77a57a1 ("IPoIB: Fix deadlock on
RTNL in ipoib_stop()").

Signed-off-by: Yossi Etigin <yosefe@voltaire.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
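The deferral pattern described in the message can be sketched in userspace C. This is an analogy built on POSIX threads, not the driver code: `fake_rtnl`, `workqueue_thread`, `carrier_on_task`, `join_complete` and `run_scenario` are hypothetical stand-ins, and the body of the deferred work (setting a flag) is an assumption made for illustration. The point it shows is that the "stop" path can hold the lock while waiting for the completion handler, because the handler only queues work and never takes the lock itself.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins: a mutex for the RTNL, one thread for the workqueue. */
static pthread_mutex_t fake_rtnl = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t wq_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t wq_cond = PTHREAD_COND_INITIALIZER;
static bool work_pending, wq_shutdown, carrier_on;

/* Deferred work: the only place that takes the fake RTNL. */
static void carrier_on_task(void)
{
	pthread_mutex_lock(&fake_rtnl);
	carrier_on = true;	/* assumed payload, for illustration only */
	pthread_mutex_unlock(&fake_rtnl);
}

/* Runs queued work; exits once shutdown is requested and nothing is pending. */
static void *workqueue_thread(void *arg)
{
	(void)arg;
	pthread_mutex_lock(&wq_lock);
	for (;;) {
		while (!work_pending && !wq_shutdown)
			pthread_cond_wait(&wq_cond, &wq_lock);
		if (!work_pending)
			break;
		work_pending = false;
		pthread_mutex_unlock(&wq_lock);
		carrier_on_task();
		pthread_mutex_lock(&wq_lock);
	}
	pthread_mutex_unlock(&wq_lock);
	return NULL;
}

/* Completion handler: queues the work instead of locking the fake RTNL. */
static void *join_complete(void *arg)
{
	(void)arg;
	pthread_mutex_lock(&wq_lock);
	work_pending = true;
	pthread_cond_signal(&wq_cond);
	pthread_mutex_unlock(&wq_lock);
	return NULL;
}

/* Returns 1 if the deferred work ran and nothing deadlocked. */
int run_scenario(void)
{
	pthread_t wq, handler;
	int err;

	work_pending = wq_shutdown = carrier_on = false;
	err = pthread_create(&wq, NULL, workqueue_thread, NULL);
	assert(err == 0);

	/* "stop" path: hold the lock while waiting for the handler to finish. */
	pthread_mutex_lock(&fake_rtnl);
	err = pthread_create(&handler, NULL, join_complete, NULL);
	assert(err == 0);
	pthread_join(handler, NULL);	/* returns: the handler never needs the lock */
	pthread_mutex_unlock(&fake_rtnl);

	/* Flush the workqueue only after the lock has been released. */
	pthread_mutex_lock(&wq_lock);
	wq_shutdown = true;
	pthread_cond_signal(&wq_cond);
	pthread_mutex_unlock(&wq_lock);
	pthread_join(wq, NULL);

	return carrier_on ? 1 : 0;
}
```

Calling `run_scenario()` from a `main()` and building with `-pthread` exercises the sketch. If `join_complete()` locked `fake_rtnl` directly, the `pthread_join()` inside the locked region would never return, which mirrors the deadlock the commit message describes.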

commit: e8224e4b804b4fd26723191c1891101a5959bb8a
author: Yossi Etigin <yossi.openib@gmail.com> Tue Sep 16 11:57:45 2008 -0700
committer: Roland Dreier <rolandd@cisco.com> Tue Sep 16 11:57:45 2008 -0700
tree: 94aa1274989fca8154bd3912d5f73239e705e7a3
parent: 1941246dd98089dd637f44d3bd4f6cc1c61aa9e4
diff --git a/drivers/infiniband/ulp/ipoib/ipoib.h b/drivers/infiniband/ulp/ipoib/ipoib.h
index b0ffc9a..05eb41b 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib.h
+++ b/drivers/infiniband/ulp/ipoib/ipoib.h

@@ -293,6 +293,7 @@
 
 	struct delayed_work pkey_poll_task;
 	struct delayed_work mcast_task;
+	struct work_struct carrier_on_task;
 	struct work_struct flush_light;
 	struct work_struct flush_normal;
 	struct work_struct flush_heavy;
@@ -464,6 +465,7 @@
 void ipoib_dev_cleanup(struct net_device *dev);
 
 void ipoib_mcast_join_task(struct work_struct *work);
+void ipoib_mcast_carrier_on_task(struct work_struct *work);
 void ipoib_mcast_send(struct net_device *dev, void *mgid, struct sk_buff *skb);
 
 void ipoib_mcast_restart_task(struct work_struct *work);