Use the small thread-local cache for mterp field accesses.

This reduces the overhead of non-quickened code from 10% to 7.5%
(measured on golem benchmarks for arm64).
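
For illustration only, a minimal sketch of the general technique (a small
direct-mapped, per-thread cache consulted before a slow resolution path).
This is not the code in this change: SmallCache, ThreadCache,
ResolveFieldOffsetSlow, LookupFieldOffset, the 256-entry size and the
choice of the instruction address as the key are all assumptions made up
for the example.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <utility>

// Hypothetical direct-mapped cache: one slot per hash bucket, no chaining.
// A colliding key simply overwrites the previous entry.
class SmallCache {
 public:
  static constexpr std::size_t kSize = 256;  // Power of two, so masking works.

  // Returns true and stores the cached value in *value on a hit.
  // nullptr is reserved as the "empty slot" key.
  bool Get(const void* key, std::size_t* value) const {
    const Entry& entry = entries_[IndexOf(key)];
    if (entry.first == key) {
      *value = entry.second;
      return true;
    }
    return false;
  }

  void Set(const void* key, std::size_t value) {
    entries_[IndexOf(key)] = Entry{key, value};
  }

 private:
  using Entry = std::pair<const void*, std::size_t>;

  static std::size_t IndexOf(const void* key) {
    // Drop the low bits (assumed alignment) and mask into the table.
    return (reinterpret_cast<std::uintptr_t>(key) >> 2) & (kSize - 1);
  }

  std::array<Entry, kSize> entries_{};  // Value-initialized: all keys nullptr.
};

// One cache per thread, so lookups need no locking.
inline SmallCache& ThreadCache() {
  thread_local SmallCache cache;
  return cache;
}

// Counts slow-path calls so the effect of the cache is observable.
static int g_slow_path_calls = 0;

// Hypothetical stand-in for full field resolution; returns a dummy offset.
inline std::size_t ResolveFieldOffsetSlow(const void* instruction) {
  ++g_slow_path_calls;
  return reinterpret_cast<std::uintptr_t>(instruction) & 0xff;
}

// Fast path: check the thread-local cache first, fall back on a miss.
inline std::size_t LookupFieldOffset(const void* instruction) {
  SmallCache& cache = ThreadCache();
  std::size_t offset = 0;
  if (cache.Get(instruction, &offset)) {
    return offset;
  }
  offset = ResolveFieldOffsetSlow(instruction);
  cache.Set(instruction, offset);
  return offset;
}
```

In this sketch a repeated lookup for the same instruction address skips
the slow path on the same thread; a different thread starts with its own
empty cache.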

Test: ./art/test.py -b -r --interpreter
Change-Id: Icce9183eb60c62ac30a0c6ff57e32c796c807f03
2 files changed