Thursday, February 28, 2013

Atomic function implementation in ARM architecture


LDREX and STREX are the instructions on which the atomic functions are built. Let's look briefly at these two instructions.



LDREX
 

LDREX loads data from memory.
  • If the physical address has the Shared TLB attribute, LDREX tags the physical address as exclusive access for the current processor, and clears any exclusive access tag for this processor for any other physical address.
  • Otherwise, it tags the fact that the executing processor has an outstanding tagged physical address.


STREX

STREX performs a conditional store to memory. The conditions are as follows:
  • If the physical address does not have the Shared TLB attribute, and the executing processor has an outstanding tagged physical address, the store takes place and the tag is cleared.
  • If the physical address does not have the Shared TLB attribute, and the executing processor does not have an outstanding tagged physical address, the store does not take place.
  • If the physical address has the Shared TLB attribute, and the physical address is tagged as exclusive access for the executing processor, the store takes place and the tag is cleared.
  • If the physical address has the Shared TLB attribute, and the physical address is not tagged as exclusive access for the executing processor, the store does not take place.


LDREX{size}{cond} Rd, {Rd2,} [Rn {, #offset}]
STREX{size}{cond} Rd, Rm, {Rm2,} [Rn {, #offset}]

Rd is the destination register. After the instruction, this contains:
  • for LDREX, the data loaded from memory
  • for STREX, either:
    • 0: if the instruction succeeds
    • 1: if the instruction is locked out.
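
The tagging rules above can be sketched as a small software model. This is only an illustration in plain C, not how the hardware is implemented: the names excl_monitor, model_ldrex and model_strex are made up for this sketch, and the model covers a single processor with one tag. It also assumes that every STREX clears the tag, whether or not the store takes place.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of one processor's exclusive-access tag (not a real API). */
struct excl_monitor {
    const int *tagged;   /* address tagged by the last model_ldrex(), or NULL */
};

/* Models LDREX: load the value and tag the address, dropping any older tag. */
static int model_ldrex(struct excl_monitor *m, const int *addr)
{
    assert(m != NULL && addr != NULL);
    m->tagged = addr;
    return *addr;
}

/*
 * Models STREX: the store takes place only if addr is still tagged.
 * Returns 0 if the store succeeded, 1 if it was locked out.
 * Assumption of this sketch: the tag is cleared in both cases.
 */
static int model_strex(struct excl_monitor *m, int *addr, int value)
{
    int ok;

    assert(m != NULL && addr != NULL);
    ok = (m->tagged == addr);
    m->tagged = NULL;
    if (ok)
        *addr = value;
    return ok ? 0 : 1;
}
```

In this model a model_strex() that directly follows a model_ldrex() on the same address returns 0. A second model_strex() without a new model_ldrex(), or one issued after a model_ldrex() on a different address, returns 1 and leaves memory untouched.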


For more information about these ARM instructions, refer to the following link:
http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.dui0204g/Cihbghef.html


The following discussion assumes that you are familiar with inline assembly in gcc. If you are not, check out the following links.
http://www.ethernut.de/en/documents/arm-inline-asm.html
http://www.ibiblio.org/gferg/ldp/GCC-Inline-Assembly-HOWTO.html


There are various atomic functions, such as:
  • atomic_add()
  • atomic_sub()
  • atomic_cmpxchg()
  • atomic_clear_mask()
  • atomic_inc()
  • atomic_dec()
  • atomic_inc_and_test()
  • atomic_dec_and_test(), etc.
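
Several of these are thin wrappers around an "add and return the new value" primitive. The sketch below shows that layering in user-space C. It is not the kernel's code: my_atomic_t and the my_ prefixed helpers are hypothetical names, and the primitive uses the GCC builtin __atomic_add_fetch instead of hand-written LDREX/STREX. It assumes the usual convention that the _and_test variants return true when the new value is zero.

```c
#include <assert.h>

/* Hypothetical user-space stand-in for the kernel's atomic_t. */
typedef struct {
    int counter;
} my_atomic_t;

/* Add i and return the new value. Uses a GCC builtin, not the kernel's asm. */
static inline int my_atomic_add_return(int i, my_atomic_t *v)
{
    assert(v);
    return __atomic_add_fetch(&v->counter, i, __ATOMIC_SEQ_CST);
}

/* Increment and decrement are just an add of +1 or -1. */
static inline void my_atomic_inc(my_atomic_t *v)
{
    (void)my_atomic_add_return(1, v);
}

static inline void my_atomic_dec(my_atomic_t *v)
{
    (void)my_atomic_add_return(-1, v);
}

/* The _and_test variants report whether the new value is zero. */
static inline int my_atomic_inc_and_test(my_atomic_t *v)
{
    return my_atomic_add_return(1, v) == 0;
}

static inline int my_atomic_dec_and_test(my_atomic_t *v)
{
    return my_atomic_add_return(-1, v) == 0;
}
```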


All of them are based on the same principle of using LDREX and STREX. Let's look at the implementation of atomic_add(); the rest of the functions are in arch/arm/include/asm/atomic.h and are implemented in much the same way.

/*
 * ARMv6 UP and SMP safe atomic ops.  We use load exclusive and
 * store exclusive to ensure that these are atomic.  We may loop
 * to ensure that the update happens.
 */
static inline void atomic_add(int i, atomic_t *v)
{
        unsigned long tmp;
        int result;

        __asm__ __volatile__("@ atomic_add\n"
"1:     ldrex   %0, [%3]\n"
"       add     %0, %0, %4\n"
"       strex   %1, %0, [%3]\n"
"       teq     %1, #0\n"
"       bne     1b"
        : "=&r" (result), "=&r" (tmp), "+Qo" (v->counter)
        : "r" (&v->counter), "Ir" (i)
        : "cc");
}



Here is the typedef for atomic variable:

typedef struct {
        int counter;
} atomic_t;
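
For readers who want to try the same load / modify / conditional store / retry pattern without an ARM target, here is a minimal portable sketch. It is an analogue, not the kernel's implementation: my_atomic_t and my_atomic_add are hypothetical names, and the loop uses the GCC builtin __atomic_compare_exchange_n in place of LDREX/STREX. The weak compare-exchange may fail even when nothing interfered, much as STREX can, so the loop simply retries.

```c
#include <assert.h>

/* Hypothetical user-space stand-in for the kernel's atomic_t. */
typedef struct {
    int counter;
} my_atomic_t;

static inline void my_atomic_add(int i, my_atomic_t *v)
{
    int old;

    assert(v);

    /* Plays the role of the load: read the current value. */
    old = __atomic_load_n(&v->counter, __ATOMIC_RELAXED);

    /*
     * Plays the role of the conditional store: write old + i only if the
     * counter still holds 'old'. On failure the builtin refreshes 'old'
     * with the current value, and the loop tries again.
     */
    while (!__atomic_compare_exchange_n(&v->counter, &old, old + i,
                                        1 /* weak */,
                                        __ATOMIC_RELAXED, __ATOMIC_RELAXED))
        ;
}
```

Relaxed ordering is used here only because the atomic_add() shown above contains no memory barrier; code that needs ordering guarantees would have to choose a stronger memory order.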


Let's go through instruction by instruction.


Scenario 1: atomic_add() succeeds on the first pass because no one else is accessing the variable that is being updated atomically.

ldrex   %0, [%3]
The address of the variable that has to be updated is in %3. The value at this address is copied into %0, i.e. 'result'.
LDREX tags the physical address as exclusive access for the current processor, and clears any exclusive access tag for this processor for any other physical address.

add     %0, %0, %4
Adds i (%4) to the loaded value in 'result'.

strex   %1, %0, [%3]
Stores the updated value to memory and writes 0 into %1, i.e. 'tmp'.

teq     %1, #0
Sets the Z bit in CPSR as 'tmp' is 0.

bne     1b
As the Z bit in CPSR is set, this doesn't branch to the label '1:' and proceeds further.



Scenario 2: atomic_add() does not succeed on the first pass, but may succeed on a later pass. This happens when someone else is atomically accessing the same variable.

First pass:

ldrex   %0, [%3]
The address of the variable that has to be updated is in %3. The value at this address is copied into %0, i.e. 'result'.
As in Scenario 1, LDREX tags the physical address as exclusive access for the current processor. Before the STREX below executes, however, another processor stores to the same address, which clears the tag.

add     %0, %0, %4
Adds i (%4) to the loaded value in 'result'.

strex   %1, %0, [%3]
Fails to store the updated value to memory, because the physical address is no longer tagged as exclusive access for the executing processor, and writes 1 into %1, i.e. 'tmp'.

teq     %1, #0
Clears the Z bit in CPSR as 'tmp' is 1.

bne     1b
As the Z bit in CPSR is cleared, this branches to the label '1:' and repeats the above instructions.


2nd or later pass:

If no one else is accessing the variable that we want to update atomically, the code flow is the same as in Scenario 1; otherwise, the code flow is the same as in Scenario 2: First pass.
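
Both scenarios can be replayed with a small simulation. This is only a sketch in plain C: excl_monitor, model_ldrex, model_strex and model_atomic_add are hypothetical names, the "other processor" is simulated by a direct write that also clears the tag, and the value 100 it adds is arbitrary. The function returns the number of passes through the loop, so Scenario 1 takes one pass and Scenario 2 takes more.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of one processor's exclusive-access tag (not a real API). */
struct excl_monitor {
    const int *tagged;
};

/* Models LDREX: load the value and tag the address. */
static int model_ldrex(struct excl_monitor *m, const int *addr)
{
    m->tagged = addr;
    return *addr;
}

/* Models STREX: store only if still tagged; 0 = stored, 1 = locked out. */
static int model_strex(struct excl_monitor *m, int *addr, int value)
{
    int ok = (m->tagged == addr);

    m->tagged = NULL;
    if (ok)
        *addr = value;
    return ok ? 0 : 1;
}

/*
 * Mirrors the ldrex/add/strex/teq/bne loop. 'interfering_writes' is the
 * number of passes that are disturbed by a simulated store from another
 * processor between the load and the store. Returns the passes taken.
 */
static int model_atomic_add(int i, int *counter, int interfering_writes)
{
    struct excl_monitor m = { NULL };
    int result, tmp, passes = 0;

    assert(counter != NULL);
    do {
        passes++;
        result = model_ldrex(&m, counter);       /* ldrex %0, [%3]     */
        result += i;                             /* add   %0, %0, %4   */
        if (interfering_writes > 0) {
            interfering_writes--;
            *counter += 100;                     /* other CPU's store  */
            m.tagged = NULL;                     /* ...clears our tag  */
        }
        tmp = model_strex(&m, counter, result);  /* strex %1, %0, [%3] */
    } while (tmp != 0);                          /* teq %1, #0; bne 1b */

    return passes;
}
```

With the counter at 5, model_atomic_add(3, &c, 0) takes one pass and leaves 8. With one interfering write it takes two passes and leaves 108: the first pass computes 8 but fails to store it, and the second pass reloads 105 and stores 108, so the other processor's update is not lost.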





1 comment:

  1. Do the __sync_fetch_and_xxx functions work the same way? What header should I include if working on user space app for Linux on ARM (omap3)? Thanks
