Initially allocate smaller local IRT

Introduce a separate small object allocator to allocate tables for
the initial local indirect reference tables, so they don't each
occupy a full page. This seemed to be easier than using linear_alloc,
since these allocations are done from multiple threads. This also
has the advantage that GC roots are concentrated on separate
pages, that could be protected as before, but there are many fewer of
then than before.

As discussed with Lokesh, it might eventually be better to allocate
these in the Java heap. But doing that cleanly seems to require a
major refactoring to split IrtEntrys into two separate components,
which complicates iteration, etc. And that has locality disadvantages
during lookup. Or we need to either drop the serial number of merge
it into the GcRoot, neither of which is ideal either.

Drive-by-fix: When trimming, don't call madvise on empty address ranges.

Test: Build and boot AOSP.
Test: art/test/testrunner/testrunner.py --host -b --64
Bug: 184847225
Change-Id: I297646acbdd9dbeab4af47461849fffa2fee23b1
diff --git a/runtime/runtime.cc b/runtime/runtime.cc
index ccd5443..0f6b521 100644
--- a/runtime/runtime.cc
+++ b/runtime/runtime.cc
@@ -89,6 +89,7 @@
 #include "handle_scope-inl.h"
 #include "hidden_api.h"
 #include "image-inl.h"
+#include "indirect_reference_table.h"
 #include "instrumentation.h"
 #include "intern_table-inl.h"
 #include "interpreter/interpreter.h"
@@ -497,6 +498,8 @@
   monitor_pool_ = nullptr;
   delete class_linker_;
   class_linker_ = nullptr;
+  delete small_irt_allocator_;
+  small_irt_allocator_ = nullptr;
   delete heap_;
   heap_ = nullptr;
   delete intern_table_;
@@ -1658,6 +1661,8 @@
   }
   linear_alloc_.reset(CreateLinearAlloc());
 
+  small_irt_allocator_ = new SmallIrtAllocator();
+
   BlockSignals();
   InitPlatformSignalHandlers();