crypto: run initcalls for generic implementations earlier

Use subsys_initcall for registration of all templates and generic
algorithm implementations, rather than module_init.  Then change
cryptomgr to use arch_initcall, to place it before the subsys_initcalls.

This is needed so that when both a generic and optimized implementation
of an algorithm are built into the kernel (not loadable modules), the
generic implementation is registered before the optimized one.
Otherwise, the self-tests for the optimized implementation are unable to
allocate the generic implementation for the new comparison fuzz tests.

Note that on arm, a side effect of this change is that self-tests for
generic implementations may run before the unaligned access handler has
been installed.  So, unaligned accesses will crash the kernel.  This is
arguably a good thing as it makes it easier to detect that type of bug.

Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
diff --git a/crypto/nhpoly1305.c b/crypto/nhpoly1305.c
index ec831a5..9ab4e07 100644
--- a/crypto/nhpoly1305.c
+++ b/crypto/nhpoly1305.c
@@ -244,7 +244,7 @@ static void __exit nhpoly1305_mod_exit(void)
 	crypto_unregister_shash(&nhpoly1305_alg);
 }
 
-module_init(nhpoly1305_mod_init);
+subsys_initcall(nhpoly1305_mod_init);
 module_exit(nhpoly1305_mod_exit);
 
 MODULE_DESCRIPTION("NHPoly1305 ε-almost-∆-universal hash function");