it turns out to be better to repeat the jacobi steps on a given (p,q) pair until it
is diagonal to machine precision, before going to the next (p,q) pair. it's also
an optimization as experiments show that in a majority of cases this allows to find out
that the (p,q) pair is already diagonal to machine precision.