this implementation is rather heavy-weight, but it's the first
solution i've found that's actually correct. all waiters actually wait
twice at the barrier so that they can synchronize exit, and they hold
a "vm lock" that prevents changes to virtual memory mappings (and
blocks pthread_barrier_destroy) until all waiters are finished
inspecting the barrier.
thus, it is safe for any thread to destroy and/or unmap the barrier's
memory as soon as pthread_barrier_wait returns, without further