target: Fix percpu_ref_put race in transport_lun_remove_cmd

This patch fixes a percpu_ref_put race for se_lun->lun_ref in
transport_lun_remove_cmd() where ->lun_ref could end up being
put more than once per command via different target completion
and fabric release contexts.

It adds a cmpxchg() for se_cmd->lun_ref_active to ensure that
percpu_ref_put() is only ever called once per se_cmd.

This bug was manifesting itself as a LUN shutdown regression
bug in >= v3.13 code, where percpu_ref_kill() would end up
hanging indefinately due to the incorrect percpu_ref count.

(Change se_cmd->lun_ref_active from bool -> int to force at
 least a 4-byte cmpxchg with MIPS ll/sc ins. - Fengguang)

Reported-by: Tommy Apel <tommyapeldk@gmail.com>
Cc: Tommy Apel <tommyapeldk@gmail.com>
Cc: <stable@vger.kernel.org> #3.13+
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
diff --git a/drivers/target/target_core_transport.c b/drivers/target/target_core_transport.c
index 51a9736..c50fd9f 100644
--- a/drivers/target/target_core_transport.c
+++ b/drivers/target/target_core_transport.c
@@ -594,10 +594,11 @@
 {
 	struct se_lun *lun = cmd->se_lun;
 
-	if (!lun || !cmd->lun_ref_active)
+	if (!lun)
 		return;
 
-	percpu_ref_put(&lun->lun_ref);
+	if (cmpxchg(&cmd->lun_ref_active, true, false))
+		percpu_ref_put(&lun->lun_ref);
 }
 
 void transport_cmd_finish_abort(struct se_cmd *cmd, int remove)
diff --git a/include/target/target_core_base.h b/include/target/target_core_base.h
index d284186..909dacb 100644
--- a/include/target/target_core_base.h
+++ b/include/target/target_core_base.h
@@ -552,7 +552,7 @@
 	void			*priv;
 
 	/* Used for lun->lun_ref counting */
-	bool			lun_ref_active;
+	int			lun_ref_active;
 
 	/* DIF related members */
 	enum target_prot_op	prot_op;