Documentation: Fix circular-buffer example

The code sample in Documentation/circular-buffers.txt has a few ordering
bugs: the head and tail indexes are updated with plain stores even though
the other side reads them concurrently, the consumer uses
smp_read_barrier_depends() where smp_rmb() is needed (the load of the item
does not depend on the load of the head index), and the discussion of
wake_up() does not make clear that it provides no ordering guarantee
unless something is actually awakened.  This patch therefore applies the
needed fixes.
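
For reference (an illustration in this commit message only, not part of
the diff below), the producer and consumer end up looking roughly as
follows after the fixes.  The sketch simply reuses the document's own
example, so it assumes the same buffer layout and item indexing, the
CIRC_SPACE() and CIRC_CNT() macros from <linux/circ_buf.h>, the
produce_item() and consume_item() helpers, and the producer_lock and
consumer_lock spinlocks; it is not intended to compile on its own.

The producer:

	spin_lock(&producer_lock);

	unsigned long head = buffer->head;
	unsigned long tail = ACCESS_ONCE(buffer->tail); /* opposition index */

	if (CIRC_SPACE(head, tail, buffer->size) >= 1) {
		/* insert one item into the buffer */
		struct item *item = buffer[head];

		produce_item(item);

		smp_wmb(); /* commit the item before incrementing the head */

		/* untorn store of an index the consumer reads concurrently */
		ACCESS_ONCE(buffer->head) = (head + 1) & (buffer->size - 1);

		wake_up(consumer); /* not relied upon for ordering */
	}

	spin_unlock(&producer_lock);

The consumer, with smp_rmb() in place of smp_read_barrier_depends():

	spin_lock(&consumer_lock);

	unsigned long head = ACCESS_ONCE(buffer->head); /* opposition index */
	unsigned long tail = buffer->tail;

	if (CIRC_CNT(head, tail, buffer->size) >= 1) {
		/* read index before reading contents at that index */
		smp_rmb();

		/* extract one item from the buffer */
		struct item *item = buffer[tail];

		consume_item(item);

		smp_mb(); /* finish reading descriptor before incrementing tail */

		/* untorn store of an index the producer reads concurrently */
		ACCESS_ONCE(buffer->tail) = (tail + 1) & (buffer->size - 1);
	}

	spin_unlock(&consumer_lock);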

Reported-by: Lech Fomicki <lfomicki@poczta.fm>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
diff --git a/Documentation/circular-buffers.txt b/Documentation/circular-buffers.txt
index 8117e5b..a36bed3 100644
--- a/Documentation/circular-buffers.txt
+++ b/Documentation/circular-buffers.txt
@@ -170,7 +170,7 @@
 
 		smp_wmb(); /* commit the item before incrementing the head */
 
-		buffer->head = (head + 1) & (buffer->size - 1);
+		ACCESS_ONCE(buffer->head) = (head + 1) & (buffer->size - 1);
 
 		/* wake_up() will make sure that the head is committed before
 		 * waking anyone up */
@@ -183,9 +183,14 @@
 before the head index makes it available to the consumer and then instructs the
 CPU that the revised head index must be written before the consumer is woken.
 
-Note that wake_up() doesn't have to be the exact mechanism used, but whatever
-is used must guarantee a (write) memory barrier between the update of the head
-index and the change of state of the consumer, if a change of state occurs.
+Note that wake_up() does not guarantee any sort of barrier unless something
+is actually awakened.  We therefore cannot rely on it for ordering.  However,
+there is always one element of the array left empty, so the producer must
+produce two elements before it could possibly corrupt the element that the
+consumer is currently reading.  The unlock-lock pair between consecutive
+invocations of the consumer therefore provides the necessary ordering between
+the read of the index indicating that the consumer has vacated a given
+element and the write by the producer to that same element.
 
 
 THE CONSUMER
@@ -200,7 +205,7 @@
 
 	if (CIRC_CNT(head, tail, buffer->size) >= 1) {
 		/* read index before reading contents at that index */
-		smp_read_barrier_depends();
+		smp_rmb();
 
 		/* extract one item from the buffer */
 		struct item *item = buffer[tail];
@@ -209,7 +214,7 @@
 
 		smp_mb(); /* finish reading descriptor before incrementing tail */
 
-		buffer->tail = (tail + 1) & (buffer->size - 1);
+		ACCESS_ONCE(buffer->tail) = (tail + 1) & (buffer->size - 1);
 	}
 
 	spin_unlock(&consumer_lock);
@@ -223,7 +228,10 @@
 This prevents the compiler from discarding and reloading its cached value -
 which some compilers will do across smp_read_barrier_depends().  This isn't
 strictly needed if you can be sure that the opposition index will _only_ be
-used the once.
+used the once.  Similarly, ACCESS_ONCE() is used in both algorithms to
+write the thread's index.  This documents the fact that we are writing
+to something that can be read concurrently and also prevents the compiler
+from tearing the store.
 
 
 ===============