223e23e8aa26b0bb62c597637e77295e14f6a62c - SHIFTPHONES/kernel/common

commit	223e23e8aa26b0bb62c597637e77295e14f6a62c
author	Will Deacon <will.deacon@arm.com>	Tue Feb 02 12:46:25 2016 +0000
committer	Catalin Marinas <catalin.marinas@arm.com>	Tue Feb 16 15:12:33 2016 +0000
tree	264cb0aa4882664aba7561bcde45537b42935aa5
parent	d5370f754875460662abe8561388e019d90dd0c4

arm64: lib: improve copy_page to deal with 128 bytes at a time

We want to avoid lots of different copy_page implementations, settling
for something that is "good enough" everywhere and hopefully easy to
understand and maintain whilst we're at it.

This patch reworks our copy_page implementation based on discussions
with Cavium on the list and benchmarking on Cortex-A processors so that:

  - The loop is unrolled to copy 128 bytes per iteration

  - The reads are offset so that we read from the next 128-byte block
    in the same iteration that we store the previous block

  - Explicit prefetch instructions are removed for now, since they hurt
    performance on CPUs with hardware prefetching

  - The loop exit condition is calculated at the start of the loop
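
The loop structure described by the points above can be sketched in C. This is only an illustration, not the assembly in arch/arm64/lib/copy_page.S: the name copy_page_sketch, the 4 KiB page size, and the blk[] array standing in for registers are all assumptions made for the example, and a compiler is free to schedule the loads and stores differently from what is written here.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumptions for illustration only: 4 KiB pages, 8-byte aligned buffers. */
#define SKETCH_PAGE_SIZE   4096
#define SKETCH_BLOCK_SIZE  128
#define SKETCH_BLOCK_WORDS (SKETCH_BLOCK_SIZE / sizeof(uint64_t))

/* Hypothetical helper: copies one page from src to dst, 128 bytes per step. */
void copy_page_sketch(void *dst, const void *src)
{
    uint64_t *d = dst;
    const uint64_t *s = src;
    uint64_t blk[SKETCH_BLOCK_WORDS];   /* stands in for 16 registers */
    long remaining = SKETCH_PAGE_SIZE - SKETCH_BLOCK_SIZE;
    int more;
    size_t i;

    /* Prologue: read the first 128-byte block before entering the loop. */
    for (i = 0; i < SKETCH_BLOCK_WORDS; i++)
        blk[i] = s[i];
    s += SKETCH_BLOCK_WORDS;

    do {
        /* Exit condition is worked out at the top of the iteration. */
        remaining -= SKETCH_BLOCK_SIZE;
        more = remaining > 0;

        /* Store the previous block while reading the next one. */
        for (i = 0; i < SKETCH_BLOCK_WORDS; i++) {
            d[i] = blk[i];
            blk[i] = s[i];
        }
        d += SKETCH_BLOCK_WORDS;
        s += SKETCH_BLOCK_WORDS;
    } while (more);

    /* Epilogue: store the final block that the last iteration read. */
    for (i = 0; i < SKETCH_BLOCK_WORDS; i++)
        d[i] = blk[i];
}
```

With these sizes the loop body runs 31 times, with the prologue and epilogue handling the first read and the last store, so the reads never run past the end of the source page. No prefetch hints appear anywhere in the sketch, in keeping with the third point.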

Signed-off-by: Will Deacon <will.deacon@arm.com>
Tested-by: Andrew Pinski <apinski@cavium.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>

arch/arm64/lib/copy_page.S

1 file changed