despite documentation that makes it sound a lot different, the only
ABI-constraint difference between TLS variants II and I seems to be
that variant II stores the initial TLS segment immediately below the
thread pointer (i.e. the thread pointer points to the end of it) and
variant I stores the initial TLS segment above the thread pointer,
requiring the thread descriptor to be stored below. the actual value
stored in the thread pointer register also tends to have per-arch
random offsets applied to it for silly micro-optimization purposes.
with these changes applied, TLS should be basically working on all
supported archs except microblaze. I'm still working on getting the
necessary information and a working toolchain that can build TLS
binaries for microblaze, but in theory, static-linked programs with
TLS and dynamic-linked programs where only the main executable uses
TLS should already work on microblaze.
alignment constraints have not yet been heavily tested, so it's
possible that this code does not always align TLS segments correctly
on archs that need TLS variant I.