musl/src/math, branch v1.2.3

add SPE FPU support to powerpc-sf

2021-09-23T23:11:46+00:00

When the soft-float ABI for PowerPC was added in commit
5a92dd95c77cee81755f1a441ae0b71e3ae2bcdb, with Freescale cpus using
the alternative SPE FPU as the main use case, it was noted that we
could probably support hard float on them, but that it would involve
determining some difficult ABI constraints. This commit is the
completion of that work.

The Power-Arch-32 ABI supplement defines the ABI profiles, and indeed
ATR-SPE is built on ATR-SOFT-FLOAT. But setjmp/longjmp compatibility
are problematic for the same reason they're problematic on ARM, where
optional float-related parts of the register file are "call-saved if
present". This requires testing __hwcap, which is now done.

In keeping with the existing powerpc-sf subarch definition, which did
not have fenv, the fenv macros are not defined for SPE and the SPEFSCR
control register is left (and assumed to start in) the default mode.

math: fix fmaf not to depend on FE_TOWARDZERO

2021-07-06T04:29:57+00:00

math: fix expm1f overflow threshold

2021-02-10T19:06:50+00:00

the threshold was wrong so expm1f overflowed to inf a bit too early
and on most targets uint32_t compare is faster than float compare so
use that.

this also fixes sinhf incorrectly returning nan for some values where
the internal expm1f overflowed.

math: fix acoshf for negative inputs

2021-02-10T19:06:36+00:00

on some negative inputs (e.g. -0x1.1e6ae8p+5) acoshf failed to return
nan. ensure that negative inputs result nan without introducing new
branches. this was tried before in

  commit 101e6012856918440b5d7474739c3fc22a8d3b85
  math: fix acoshf on negative values

but that fix was wrong. there are 3 formulas used:

  log1p(x-1 + sqrt((x-1)*(x-1)+2*(x-1)))
  log(2*x - 1/(x+sqrt(x*x-1)))
  log(x) + 0.693147180559945309417232121458176568

the first fails on large negative inputs (may compute log1p(0) or
log1p(inf)), the second one fails on some mid range or large negative
inputs (may compute log(large) or log(inf)) and the last one fails on
-0 (returns -inf).

arm fabs and sqrt: support single-precision-only fpu variants

2020-11-29T05:49:24+00:00

math: new software sqrtl

2020-08-06T03:06:01+00:00

same approach as in sqrt.

sqrtl was broken on aarch64, riscv64 and s390x targets because
of missing quad precision support and on m68k-sf because of
missing ld80 sqrtl.

this implementation is written for quad precision and then
edited to make it work for both m68k and x86 style ld80 formats
too, but it is not expected to be optimal for them.

note: using fp instructions for the initial estimate when such
instructions are available (e.g. double prec sqrt or rsqrt) is
avoided because of fenv correctness.

math: add __math_invalidl

2020-08-06T03:05:57+00:00

for targets where long double is different from double.

math: new software sqrtf

2020-08-06T03:05:36+00:00

same method as in sqrt, this was tested on all inputs against
an sqrtf instruction. (the only difference found was that x86
sqrtf does not signal the x86 specific input-denormal exception
on negative subnormal inputs while the software sqrtf does,
this is fine as it was designed for ieee754 exceptions only.)

there is known faster method:
"Computing Floating-Point Square Roots via Bivariate Polynomial Evaluation"
that computes sqrtf directly via pipelined polynomial evaluation
which allows more parallelism, but the design does not generalize
easily to higher precisions.

math: new software sqrt

2020-08-06T03:05:33+00:00

approximate 1/sqrt(x) and sqrt(x) with goldschmidt iterations.
this is known to be a fast method for computing sqrt, but it is
tricky to get right, so added detailed comments.

use a lookup table for the initial estimate, this adds 256bytes
rodata but it can be shared between sqrt, sqrtf and sqrtl.
this saves one iteration compared to a linear estimate.

this is for soft float targets, but it supports fenv by using a
floating-point operation to get the final result.  the result
is correctly rounded in all rounding modes.  if fenv support is
turned off then the nearest rounded result is computed and
inexact exception is not signaled.

assumes fast 32bit integer arithmetics and 32 to 64bit mul.

add m68k sqrtl using native instruction

2020-08-03T03:31:51+00:00

this is actually a functional fix at present, since the C sqrtl does
not support ld80 and just wraps double sqrt. once that's fixed it will
just be an optimization.