Oh, I had missed part of the question. I don't recall why the documentation limits to 0x7FFF; either it was just to be conservative and not "leak" a constraint from either of the field implementations into the interface, or it was so an int on platforms with 16-bit int could be used.
Regarding the ranges of permitted magnitudes: indeed, the point is to avoid carries in additions. By having even just a few slack bits in every limb, it's possible to have field elements with a temporarily "denormalized" representation (where the individual limb values exceed 2^26 or 2^52). The restrictions on how much they permit exceeding that 2^26 or 2^52 depends mostly on the multiplication code, with is optimized to take advantage of these limits.