My list of subscribed blogs failed to notify me of the release of an interesting tool by ATI called Tootle, and of the paper on which it is based. (Well, your “blogroll” is better than mine, because now you’ve learned about it from me.) Tootle rearranges mesh indices to optimize for both the vertex cache hit rate AND the early-Z hit rate of the resulting pixels. It does this by splitting the mesh into several “roughly planar” patches, doing traditional vertex cache order optimization within each patch, and then ordering the patches to minimize overdraw.
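To make "vertex cache hit rate" concrete, here is a minimal sketch of how one might score an index order. It is not Tootle's code: it assumes a simple FIFO post-transform cache (real hardware may behave differently), and the function name `acmr` and the default cache size of 16 are my own placeholders.

```python
from collections import deque


def acmr(indices, cache_size=16):
    """Average cache miss ratio: vertex cache misses per triangle.

    Assumes a FIFO cache of `cache_size` entries (an illustrative model,
    not a description of any particular GPU).
    """
    cache = deque(maxlen=cache_size)  # appending to a full deque drops the oldest entry
    misses = 0
    for idx in indices:
        if idx not in cache:
            misses += 1
            cache.append(idx)
    return misses / (len(indices) // 3)


# Four triangles of a strip, in strip order and in a shuffled order.
strip_order = [0, 1, 2,  2, 1, 3,  2, 3, 4,  4, 3, 5]
shuffled_order = [0, 1, 2,  4, 3, 5,  2, 1, 3,  2, 3, 4]

print(acmr(strip_order, cache_size=4))     # 1.5
print(acmr(shuffled_order, cache_size=4))  # 2.25
```

Same triangles, same pixels, but the shuffled order transforms half again as many vertices. That is the quantity the per-patch optimization step is trying to push down.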
The catch is what seems to be an atrocious preprocessing cost. For example, for a mesh split into 32 patches (the low end of the range they cite), they do on the order of 32*32*162 renders of portions of the mesh to determine patch vs. patch occlusion via hardware occlusion culling. It’s not surprising they don’t mention preprocessing times in their results: rendering roughly 166K frames should take on the order of minutes, even if you’re being clever.
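For the skeptical, the back-of-the-envelope arithmetic looks like this. The 162 is just the third factor from the product above, which I am treating as a per-patch-pair sample count. The throughput of 1000 small renders per second (including the occlusion query readback) is purely my guess, not a measured number.

```python
def occlusion_render_count(num_patches, samples_per_pair=162):
    """Renders needed to test every patch against every other patch."""
    return num_patches * num_patches * samples_per_pair


def estimated_minutes(renders, renders_per_second):
    """Wall-clock estimate for a given (assumed) render throughput."""
    return renders / renders_per_second / 60.0


renders = occlusion_render_count(32)
print(renders)                                      # 165888
print(round(estimated_minutes(renders, 1000), 2))   # 2.76
```

Note that the count grows with the square of the patch count, so doubling the number of patches quadruples the cost under the same assumptions.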