the approach of this implementation was heavily investigated prior to
adopting it. attempts to obtain similar performance with pure C code
were capping out at about 75% of the performance of the asm, with
considerably larger code size, and were fragile in that the compiler
would sometimes compile part of memcpy into a call to itself.
therefore, just using the asm seems to be the best option.
this commit is the first to make use of the new subarch-specific asm
framework. the new armel directory is the location for arm asm that
should not be used for all arm subarchs, only the default one. armhf
is the name of the little-endian hardfloat-ABI subarch, which can use
the exact same asm. in both cases, the build system finds the asm by
following a memcpy.sub file.
the other two subarchs, armeb and armebhf, would need a big-endian
variant of this code. it would not be hard to adapt the code to big
endian, but I will hold off on doing so until there is demand for it.