crypto: serpent - add 4-way parallel i586/SSE2 assembler implementation
Patch adds i586/SSE2 assembler implementation of serpent cipher. Assembler
functions crypt data in four block chunks.
Patch has been tested with tcrypt and automated filesystem tests.
Tcrypt benchmarks results (serpent-sse2/serpent_generic speed ratios):
Intel Atom N270:
size ecb-enc ecb-dec cbc-enc cbc-dec ctr-enc ctr-dec
16 0.95x 1.12x 1.02x 1.07x 0.97x 0.98x
64 1.73x 1.82x 1.08x 1.82x 1.72x 1.73x
256 2.08x 2.00x 1.04x 2.07x 1.99x 2.01x
1024 2.28x 2.18x 1.05x 2.23x 2.17x 2.20x
8192 2.28x 2.13x 1.05x 2.23x 2.18x 2.20x
Full output:
http://koti.mbnet.fi/axh/kernel/crypto/atom-n270/serpent-generic.txt
http://koti.mbnet.fi/axh/kernel/crypto/atom-n270/serpent-sse2.txt
Userspace test results:
Encryption/decryption of sse2-i586 vs generic on Intel Atom N270:
encrypt: 2.35x
decrypt: 2.54x
Encryption/decryption of sse2-i586 vs generic on AMD Phenom II:
encrypt: 1.82x
decrypt: 2.51x
Encryption/decryption of sse2-i586 vs generic on Intel Xeon E7330:
encrypt: 2.99x
decrypt: 3.48x
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@mbnet.fi>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
4 files changed