MIPS32: Additional bitCount optimizations.

The original algorithm computed the 64-bit bitCount by counting the
bits in the two 32-bit words more or less in parallel. The subtotals
for the two words can instead be added together partway through the
computation, which reduces the total number of operations needed to
count the set bits in the original 64-bit input value. Besides
reducing the instruction count, this eliminates one multiply
instruction, and multiply instructions typically take multiple
cycles.
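
For illustration only, the idea can be sketched in portable C++ rather
than in the MIPS32 code this change generates. The function names and
the exact point at which the subtotals are merged are assumptions made
for this sketch, not taken from the change itself. Here the two halves
are merged once each byte holds a count of 0..8, so the summed bytes
hold at most 16 and a single multiply can total them.

```cpp
#include <cassert>
#include <cstdint>

// Per-byte set-bit counts of a 32-bit word: each byte of the result
// holds the number of set bits (0..8) in the matching input byte.
static inline uint32_t ByteCounts32(uint32_t v) {
  v = v - ((v >> 1) & 0x55555555u);                  // 2-bit subtotals
  v = (v & 0x33333333u) + ((v >> 2) & 0x33333333u);  // 4-bit subtotals
  return (v + (v >> 4)) & 0x0F0F0F0Fu;               // 8-bit subtotals
}

// Baseline: each 32-bit half is finished separately, so two
// multiplies are needed before the final add.
static inline int BitCount64TwoMultiplies(uint64_t x) {
  uint32_t lo = ByteCounts32(static_cast<uint32_t>(x));
  uint32_t hi = ByteCounts32(static_cast<uint32_t>(x >> 32));
  return static_cast<int>(((lo * 0x01010101u) >> 24) +
                          ((hi * 0x01010101u) >> 24));
}

// Merged: the per-byte subtotals are added first (each byte is then
// 0..16, so nothing carries into a neighboring byte), and a single
// multiply sums the four bytes into the top byte.
static inline int BitCount64OneMultiply(uint64_t x) {
  uint32_t lo = ByteCounts32(static_cast<uint32_t>(x));
  uint32_t hi = ByteCounts32(static_cast<uint32_t>(x >> 32));
  uint32_t sum = lo + hi;
  return static_cast<int>((sum * 0x01010101u) >> 24);
}

// Hypothetical helper for this sketch: both variants must agree.
static inline void CheckBitCountsAgree(uint64_t x) {
  assert(BitCount64OneMultiply(x) == BitCount64TwoMultiplies(x));
}
```

The merged variant drops one multiply and one shift relative to the
baseline; merging even earlier is possible but requires care that the
wider subtotals do not overflow their fields.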

Test: Boot MIPS32 QEMU and run 564-checker-bitcount tests.

Change-Id: Ifcbb56812a02a91ac1777543448b207ec0e1e5a6
1 file changed