Improved LSE: Replacing loads with Phis.
Create "Phi placeholders" for tracking heap values that can
merge from different values and try to match existing Phis
or create new Phis to replace loads. For Phi placeholders
from loop headers we do not know whether they are fed by
unknown values through back-edges when processing the loop
header, so we delay processing loads that depend on them
until we have walked the entire graph. We then try to match them
with existing instructions (when the location is unchanged
in the loop) or Phis or create new Phis if needed. If we
find a loop Phi placeholder fed with an unknown value from a
back-edge, we mark the Phi placeholder as unreplaceable and
reprocess loads and stores to propagate the unknown value.
This can sometimes allow other loads to be replaced. At the
end we re-calculate the heap values to find stores that can
be eliminated because they write over the same value.
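
As an illustration of the load patterns described above, here is a
minimal Java sketch. The class, field and method names are hypothetical
and are not taken from 530-checker-lse, and the comments describe what
LSE could do in principle under the scheme above; whether a given load
is actually replaced depends on the compiler and is an assumption here.

```java
public class LsePhiSketch {
    // Hypothetical holder class with a single heap location of interest.
    public static class Box {
        int value;
    }

    // Both branches store to the same location, so the heap value at the
    // merge point is "1 or 2". The final load could be replaced by Phi(1, 2).
    public static int mergeOfStores(Box b, boolean flag) {
        if (flag) {
            b.value = 1;
        } else {
            b.value = 2;
        }
        return b.value;
    }

    // The location is not written inside the loop, so the loop header's
    // Phi placeholder could be matched with the value stored before the loop.
    public static int unchangedInLoop(Box b, int n) {
        b.value = 5;
        int sum = 0;
        for (int i = 0; i < n; i++) {
            sum += b.value;
        }
        return sum;
    }

    // The location is rewritten with a known value on every iteration, so
    // the load in the loop could be replaced by a newly created loop Phi.
    public static int updatedInLoop(Box b, int n) {
        b.value = 0;
        for (int i = 0; i < n; i++) {
            b.value = b.value + i;
        }
        return b.value;
    }

    public static void main(String[] args) {
        System.out.println(mergeOfStores(new Box(), true));  // prints 1
        System.out.println(unchangedInLoop(new Box(), 3));   // prints 15
        System.out.println(updatedInLoop(new Box(), 4));     // prints 6
    }
}
```

If the loop body also contained a call that might write to the field,
the back-edge would feed an unknown value and the loads in the loop
would have to stay.
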
Golem results:
  art-opt-cc         arm     arm64   x86     x86-64
    CaffeineFloat    +6.7%   +3.0%   +5.9%   +3.8%
    KotlinMicroWhen  +33.7%  +4.8%   +1.8%   +0.6%
  art-opt (more noisy than art-opt-cc)
    CaffeineFloat    +4.1%   +4.4%   +7.8%   +10.5%
    KotlinMicroWhen  +33.6%  +2.0%   +1.8%   +1.8%
The MoveLiteralColumn benchmark seems to gain significantly
(up to 22% on art-opt-cc but under 10% on art-opt) but it is
very noisy and the results are therefore unreliable.
Insignificant code size changes for aosp_blueline-userdebug:
  - before:
    arm boot*.oat: 15303468
    arm64 boot*.oat: 18184736
    services.odex: 25195944
    grep -c pAllocObject boot.arm64.oatdump.txt: 27213
    grep -c pAllocArray boot.arm64.oatdump.txt: 3620
  - after:
    arm boot*.oat: 15299524 (-4KiB, -0.03%)
    arm64 boot*.oat: 18176528 (-8KiB, -0.05%)
    services.odex: 25191832 (-4KiB, -0.02%)
    grep -c pAllocObject boot.arm64.oatdump.txt: 27206 (-7)
    grep -c pAllocArray boot.arm64.oatdump.txt: 3615 (-5)
Test: New tests in 530-checker-lse.
Test: m test-art-host-gtest
Test: testrunner.py --host --optimizing
Test: blueline-userdebug boots.
Bug: 77906240
Change-Id: Ia9fe0cd3530f9d3941650dfefc00a7f7fd821994
16 files changed