ART: Improve VisitStringGetCharsNoCheck intrinsic for compressed strings, using SIMD
The previous implementation of VisitStringGetCharsNoCheck
copied one character at a time for compressed strings (which
use 8 bits per char).
Instead, use SIMD instructions to copy 8 chars at once
where possible.
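The idea can be sketched in portable C++ as below. This is only an
illustration, not the code the intrinsic emits: CopyCompressedChars is a
hypothetical helper, and the use of NEON intrinsics (with a scalar
fallback on other targets) is an assumption made for the example. It
widens 8-bit chars to 16-bit chars 8 at a time and handles the remaining
chars one by one.

```cpp
#include <cstddef>
#include <cstdint>
#if defined(__ARM_NEON)
#include <arm_neon.h>
#endif

// Hypothetical helper for illustration only (not part of ART).
// Widens `count` compressed (8-bit) chars from `src` into 16-bit chars at `dst`.
inline void CopyCompressedChars(const uint8_t* src, uint16_t* dst, size_t count) {
  size_t i = 0;
  // Main loop: copy 8 chars per iteration while at least 8 remain.
  for (; i + 8 <= count; i += 8) {
#if defined(__ARM_NEON)
    uint8x8_t narrow = vld1_u8(src + i);   // load 8 x 8-bit chars
    uint16x8_t wide = vmovl_u8(narrow);    // zero-extend to 8 x 16-bit chars
    vst1q_u16(dst + i, wide);              // store 8 x 16-bit chars
#else
    // Scalar fallback with the same 8-char granularity.
    for (size_t j = 0; j < 8; ++j) {
      dst[i + j] = src[i + j];
    }
#endif
  }
  // Tail: fewer than 8 chars left, copy one at a time.
  for (; i < count; ++i) {
    dst[i] = src[i];
  }
}
```

Strings shorter than 8 chars never enter the main loop and only pay for
the extra length check, which matches the small overhead reported for
short strings.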
On a Pixel 3 phone:
Microbenchmarks for getCharsNoCheck on varying string
lengths show a speedup of up to 80% (big cores) and
70% (little cores) on long strings, and around 30% (big)
and 20% (little) on strings of only 8 characters.
The overhead for strings of fewer than 8 characters is ~3%,
and is immediately amortized for strings of 8 or more
characters.
Dhrystone shows a consistent speedup of around 6% (big)
and 4% (little).
The getCharsNoCheck intrinsic is used by the StringBuilder
append() method, which is used by the String concatenation
operator ('+').
Image size change (bytes):
  Before:
    boot-core-libart.oat:   549040
    boot.oat:              3789080
    boot-framework.oat:   13356576
  After:
    boot-core-libart.oat:   549024 (-16B)
    boot.oat:              3789144 (+64B)
    boot-framework.oat:   13356576 (+0B)
Test: test_art_target.sh, test_art_host.sh
Test: 536-checker-intrinsic-optimization
Change-Id: I865e3df6d4725e151ae195a86e02e090dae8dd29
2 files changed