ARM64: Combine LSR+ASR into ASR for Int32 HDiv/HRem

HDiv/HRem with a constant divisor are optimized by multiplying the
dividend by a fixed-point approximation of the divisor's reciprocal.
The multiplication takes 32-bit operands and produces a 64-bit result,
of which the high 32 bits are used. For Int32, LSR is used to extract
those bits. It may be followed by correction operations and an ASR.
When there are no correction operations between the LSR and the ASR,
the two shifts can be combined into a single ASR.

This CL implements this optimization.
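
The equivalence of the two shift sequences can be checked with a small
model. The sketch below is an illustration only, not the ART code
generator: it assumes divisor 5 with the usual signed magic constant
0x66666667 and shift 1 (values from the standard magic-number
construction, not from this CL), for which no correction operation is
needed. The function names are hypothetical, and the final sign-bit add
is the usual round-toward-zero step that follows the shifts.

```python
MAGIC_5 = 0x66666667   # assumed magic constant for signed Int32 division by 5
SHIFT_5 = 1            # assumed post-shift for divisor 5
MASK64 = 0xFFFFFFFFFFFFFFFF


def to_int32(x):
    """Interpret the low 32 bits of x as a signed 32-bit value (a W register)."""
    x &= 0xFFFFFFFF
    return x - (1 << 32) if x & 0x80000000 else x


def trunc_div(n, d):
    """Reference: division rounding toward zero, as Java int division does."""
    q = abs(n) // abs(d)
    return q if (n < 0) == (d < 0) else -q


def div5_lsr_then_asr(n):
    prod = n * MAGIC_5                 # models SMULL: signed 64-bit product
    hi = (prod & MASK64) >> 32         # models LSR Xt, Xt, #32
    q = to_int32(hi) >> SHIFT_5        # models ASR Wt, Wt, #1 (32-bit shift)
    return q + (1 if q < 0 else 0)     # add the sign bit to round toward zero


def div5_single_asr(n):
    prod = n * MAGIC_5                 # models SMULL: signed 64-bit product
    q = to_int32(prod >> (32 + SHIFT_5))  # models ASR Xt, Xt, #33, read as Wt
    return q + (1 if q < 0 else 0)     # add the sign bit to round toward zero


if __name__ == "__main__":
    for n in (-2**31, -6, -5, -1, 0, 1, 4, 5, 2**31 - 1):
        assert div5_lsr_then_asr(n) == div5_single_asr(n) == trunc_div(n, 5)
```

The single ASR works because the 64-bit product is signed: shifting it
arithmetically by 32 + shift gives the same low 32 bits as taking the
high word first and then shifting that word arithmetically by shift.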

Improvements (Pixel 3):
                                                  little core  big core
  jit_aot/LoadCheck.RandomSumInvokeStaticMethod   7.1%         8.3%
  jit_aot/LoadCheck.RandomSumInvokeUserClass      4.6%         12.0%
  benchmarksgame/fasta                            3.3%         1.0%
  benchmarksgame/fasta_4                          2.4%         2.6%
  benchmarksgame/fastaredux                       2.2%         2.2%
  SPECjvm2k8 MPEGAudio                            1.7%         1.0%

Test: test.py --host --optimizing --jit
Test: test.py --target --optimizing --jit
Change-Id: I5267b38d3a58319e24152917fabe836d5b346bce
6 files changed