Use 28 bits for type check bit string.

Also reverse the order of the fields in Class::status_. This
avoids increasing the generated code size:
  - ClassStatus in high bits allows class initialization
    check using "status_high_byte < (kInitialized << 4)"
    which is unaffected by the low 4 bits of the LHS, instead
    of needing to extract the status bits,
  - having the type check bit string in the bottom bits
    instead of somewhere in the middle allows the comparison
    on ARM to be done with the same code size as with the old
    layout in most cases (except when the compared value is
    9-16 bits and not a modified immediate: 2 bytes less for
    9-12 bits and sometimes 2 bytes more for 13-16 bits; the
    latter could be worked around using LDRH if the second
    character's boundary is at 16 bits).
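A minimal C++ sketch of why the high-byte check works, assuming a
32-bit status_ word with the 4-bit ClassStatus in bits 28-31 and the
28-bit type check bit string in bits 0-27. The helper names (Pack,
StatusHighByte, NeedsInitCheck, NeedsInitCheckSlow) and the value of
kInitialized are hypothetical, not ART's actual definitions.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical layout: status in bits 28..31, type check bits in 0..27.
constexpr uint32_t kTypeCheckBits = 28u;
constexpr uint32_t kTypeCheckMask = (1u << kTypeCheckBits) - 1u;

// Hypothetical ClassStatus value, for illustration only.
constexpr uint32_t kInitialized = 14u;

// Builds the word; `status` must fit in 4 bits.
constexpr uint32_t Pack(uint32_t status, uint32_t type_check_bits) {
  return (status << kTypeCheckBits) | (type_check_bits & kTypeCheckMask);
}

// Most significant byte: 4 status bits on top, 4 type check bits below.
// On a little-endian target this is the byte at offset 3 of the field.
constexpr uint32_t StatusHighByte(uint32_t word) {
  return word >> 24;
}

// The "not yet initialized" test, using only the high byte. Its low
// 4 bits are always < 16, so they cannot change the comparison result.
constexpr bool NeedsInitCheck(uint32_t word, uint32_t initialized) {
  return StatusHighByte(word) < (initialized << 4);
}

// Equivalent test that first extracts the status bits.
constexpr bool NeedsInitCheckSlow(uint32_t word, uint32_t initialized) {
  return (word >> kTypeCheckBits) < initialized;
}

// Low type check bits set to all ones or all zeros do not matter.
static_assert(!NeedsInitCheck(Pack(kInitialized, kTypeCheckMask), kInitialized), "");
static_assert(NeedsInitCheck(Pack(kInitialized - 1u, kTypeCheckMask), kInitialized), "");
static_assert(!NeedsInitCheck(Pack(kInitialized, 0u), kInitialized), "");
```

Since the high byte is (status << 4) | low with low <= 0xF, the
comparison against (kInitialized << 4) holds exactly when
status < kInitialized, so no shift or mask is needed before it.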

Add one of the extra bits to the 2nd character to push its
boundary to 16 bits, so that we can test an implementation
using 16-bit loads in a subsequent CL; arbitrarily add the
other three bits to the 3rd character. This CL is only
about making those bits available and allowing testing; how
best to use the additional bits (whether to add a 4th
character or to distribute them differently among the three
characters) shall be determined later.
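A C++ sketch of the boundary arithmetic, packing the three characters
from bit 0 upward. The old widths {12, 3, 8} are an assumption for
illustration (the message only states the +1/+3 change and the
16-bit target), the helpers (Boundary, PrefixByMask, PrefixByHalfword)
are hypothetical, and only the three characters are modelled. It
shows that once the 2nd character ends at bit 16, the first two
characters are exactly the low halfword, i.e. what a 16-bit load
would fetch on a little-endian target.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

using Widths = std::array<uint32_t, 3>;

// Assumed old character widths (hypothetical).
constexpr Widths kOldWidths = {12u, 3u, 8u};
// This change: +1 bit for the 2nd character, +3 bits for the 3rd.
constexpr Widths kNewWidths = {kOldWidths[0], kOldWidths[1] + 1u,
                               kOldWidths[2] + 3u};

// Exclusive end bit of character `index` (characters start at bit 0).
constexpr uint32_t Boundary(const Widths& widths, std::size_t index) {
  return index == 0 ? widths[0] : widths[index] + Boundary(widths, index - 1);
}

// Prefix up to `boundary` extracted by masking the 32-bit word
// (`boundary` must be < 32).
constexpr uint32_t PrefixByMask(uint32_t word, uint32_t boundary) {
  return word & ((1u << boundary) - 1u);
}

// Prefix obtained as the low halfword, like a 16-bit load would read.
constexpr uint16_t PrefixByHalfword(uint32_t word) {
  return static_cast<uint16_t>(word & 0xFFFFu);
}

static_assert(Boundary(kOldWidths, 1) == 15u, "old 2nd boundary");
static_assert(Boundary(kNewWidths, 1) == 16u, "new 2nd boundary");
static_assert(Boundary(kNewWidths, 2) - Boundary(kOldWidths, 2) == 4u,
              "four extra bits in total");
static_assert(PrefixByMask(0xABCD1234u, Boundary(kNewWidths, 1)) ==
                  PrefixByHalfword(0xABCD1234u),
              "mask and halfword agree at a 16-bit boundary");
```

With a 15-bit boundary the mask would leave bit 15 out, so a plain
halfword load would not match; that is why the 2nd character gets
the extra bit first.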

Test: m test-art-host-gtest
Test: testrunner.py --host --optimizing
Test: Pixel 2 XL boots.
Test: testrunner.py --target --optimizing
Bug: 64692057
Change-Id: I38c59837e3df3accb813fb1e04dc42e9afcd2d73
16 files changed