Archive for October, 2006

50 Books For Everyone In the Game Industry

Wednesday, October 11th, 2006

For those with a bunch of Amazon gift certificates and a month to spare, here’s Next Generation’s list of 50 Books For Everyone In the Game Industry. I’ve read, what, four? Too bad I don’t have the month, or the gift certificates.

Single-channel DXT1 compression

Tuesday, October 3rd, 2006

I did some experiments recently with “fast” (runtime) compression of single-channel images into the green channel of DXT1 blocks (6 bits of endpoint precision). We use the so-called splat mapping technique for terrain texturing. The splat maps occupy 8 bits per pixel and are pretty large. They also get regenerated every time something changes the terrain textures (e.g. whenever your mouse moves in the editor while the terrain texture brush is active).
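A DXT1 block stores a 4×4 tile in 8 bytes, which is 4 bits per pixel, or half the size of an 8-bit splat map. Those 8 bytes hold two RGB565 endpoint colors followed by sixteen 2-bit palette indices. Putting a single channel into green means both endpoints carry only a 6-bit green value, with red and blue left at zero. Four-color mode requires color0 > color1, which for green-only endpoints simply means g0 > g1.

The sketch below packs and decodes such a block. It is written in Python for readability, not speed. The function names are hypothetical. The integer rounding of the two interpolated palette entries is an assumption, since decoders differ slightly there.

```python
import struct


def expand6(g):
    """Expand a 6-bit green value to 8 bits by bit replication."""
    return (g << 2) | (g >> 4)


def palette(g0, g1):
    """Four-color palette in 8-bit units (interpolation rounding is an assumption)."""
    a, b = expand6(g0), expand6(g1)
    return [a, b, (2 * a + b) // 3, (a + 2 * b) // 3]


def pack_green_block(g0, g1, indices):
    """Pack two 6-bit green endpoints and 16 two-bit indices into an 8-byte DXT1 block."""
    if not (0 <= g1 < g0 <= 63):
        raise ValueError("four-color mode needs color0 > color1, i.e. g0 > g1")
    if len(indices) != 16:
        raise ValueError("a block holds exactly 16 pixels")
    color0 = g0 << 5  # RGB565: green sits in bits 5..10, red and blue stay zero
    color1 = g1 << 5
    bits = 0
    for i, idx in enumerate(indices):
        bits |= (idx & 3) << (2 * i)  # pixel i (row-major) uses bits 2i and 2i+1
    return struct.pack("<HHI", color0, color1, bits)  # little-endian: 2 + 2 + 4 bytes


def unpack_green_block(block):
    """Decode the green channel of a four-color-mode DXT1 block to 16 8-bit values."""
    color0, color1, bits = struct.unpack("<HHI", block)
    pal = palette((color0 >> 5) & 63, (color1 >> 5) & 63)
    return [pal[(bits >> (2 * i)) & 3] for i in range(16)]
```

For example, `unpack_green_block(pack_green_block(63, 0, [0, 1, 2, 3] * 4))[:4]` gives `[255, 0, 170, 85]`.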

I tried the so-called “range fit”, where the extremal points of the input luminance data are used as endpoints. It is very fast, but it produced 20 units of mean square error on my (admittedly unrealistic) purely random test image. (Here and below, the error unit is 1/255 of the maximum possible value, so literally units in an 8-bit channel.) For comparison, the mean square error from simply encoding into 6 bits is around 2 units.

What does “very fast” mean? Without any effort spent on making the code run faster, the straightforward range fit takes around 600 ns per DXT1 block. A 1024×1024 image has 65,536 4×4 blocks, so that comes to roughly 40 ms per image, or 25 MB/sec, if that sounds more familiar to you. With some tinkering this could probably be made at least twice as fast. (All numbers were taken on a 2 GHz Athlon64. Its official moniker is “X2 3800+”, but since my tests were single-threaded, the X2 doesn’t matter, and it probably would have been called “3000+” if it were a single-core CPU.)
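Here is a minimal range fit sketch. It assumes the block arrives as 16 8-bit values and works in three steps:

1. Quantize the block's maximum and minimum to 6 bits.
2. Build the four-color palette from those two endpoints.
3. Give each pixel the index of its nearest palette entry.

The sketch is Python for clarity, so it says nothing about the 600 ns figure. These parts are my assumptions rather than the post's code:

* the helper names;
* the rounding in `quantize6` and `palette`;
* the nudge for flat blocks, where min and max land on the same 6-bit code.

Dividing the returned sum of squared errors by 16 gives the per-block mean square error.

```python
def expand6(g):
    """Expand a 6-bit endpoint to 8 bits by bit replication."""
    return (g << 2) | (g >> 4)


def quantize6(v):
    """Map an 8-bit value to a 6-bit code by rounding v * 63 / 255."""
    return (v * 63 + 127) // 255


def palette(g0, g1):
    """Four-color palette in 8-bit units (interpolation rounding is an assumption)."""
    a, b = expand6(g0), expand6(g1)
    return [a, b, (2 * a + b) // 3, (a + 2 * b) // 3]


def range_fit(pixels):
    """Range fit: block max/min become the endpoints, then each pixel takes its nearest index."""
    g0, g1 = quantize6(max(pixels)), quantize6(min(pixels))
    if g0 == g1:
        # Flat block (assumed policy): nudge the endpoints apart so g0 > g1 (four-color mode).
        if g0 < 63:
            g0 += 1
        else:
            g1 -= 1
    pal = palette(g0, g1)
    indices = [min(range(4), key=lambda k: abs(pal[k] - p)) for p in pixels]
    sse = sum((pal[i] - p) ** 2 for i, p in zip(indices, pixels))
    return g0, g1, indices, sse
```

The whole cost is one min/max pass plus one nearest-entry lookup per pixel, which is why this approach is the fast baseline.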

I tried brute force search, where all 4096/2 combinations of endpoints (4096 = 64 × 64 pairs of 6-bit values) are tried. It produced around 15 units of MSE, but was very, very slow.
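The exhaustive search is easy to write down: evaluate every endpoint pair and keep the one with the smallest error. The sketch below only considers pairs with g0 > g1, the four-color-mode half of the space. That gives (4096 − 64) / 2 = 2016 pairs, i.e. the 4096/2 above with the equal-endpoint pairs excluded. Each pixel takes its nearest palette entry.

The names are hypothetical and the palette rounding is assumed. The sketch also shows why this is slow: every block costs 2016 × 16 × 4 distance evaluations.

```python
def expand6(g):
    """Expand a 6-bit endpoint to 8 bits by bit replication."""
    return (g << 2) | (g >> 4)


def block_sse(pixels, g0, g1):
    """Sum of squared errors when each pixel takes its nearest palette entry."""
    a, b = expand6(g0), expand6(g1)
    pal = (a, b, (2 * a + b) // 3, (a + 2 * b) // 3)  # rounding is an assumption
    return sum(min((p - c) ** 2 for c in pal) for p in pixels)


def brute_force(pixels):
    """Try every endpoint pair with g0 > g1 (2016 pairs); return (sse, g0, g1) of the best."""
    best = None
    for g0 in range(1, 64):
        for g1 in range(g0):
            sse = block_sse(pixels, g0, g1)
            if best is None or sse < best[0]:
                best = (sse, g0, g1)
    return best
```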

Next I tried brute forcing not the entire endpoint space, but just “wobbling” the endpoints around the extremal points of the input range. Varying the amplitude of this wobbling exhibited the typical “90% of the benefit for 10% of the effort” curve. The sweet spot was around 8 units of the 6-bit endpoint range, producing 15-ish MSEs for less than 10% of the brute force time.
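Here is one reading of the “wobbling” search, as a hedged sketch. It takes the range fit endpoints, then tries every pair within ±radius 6-bit steps of each. It still requires g0 > g1 so the block stays in four-color mode.

It is a guess whether “8 units” in the post means a radius of 8 or a window 8 wide; the sketch assumes radius 8. With that reading, there are at most 17 × 17 = 289 candidates per block instead of 2016. Because the range fit pair itself is inside the window, the result is never worse than plain range fit. Function names and palette rounding are assumptions.

```python
def expand6(g):
    """Expand a 6-bit endpoint to 8 bits by bit replication."""
    return (g << 2) | (g >> 4)


def quantize6(v):
    """Map an 8-bit value to a 6-bit code by rounding v * 63 / 255."""
    return (v * 63 + 127) // 255


def block_sse(pixels, g0, g1):
    """Sum of squared errors when each pixel takes its nearest palette entry."""
    a, b = expand6(g0), expand6(g1)
    pal = (a, b, (2 * a + b) // 3, (a + 2 * b) // 3)  # rounding is an assumption
    return sum(min((p - c) ** 2 for c in pal) for p in pixels)


def wobble_fit(pixels, radius=8):
    """Search pairs within +-radius steps of the range fit endpoints; return (sse, g0, g1)."""
    if radius < 1:
        raise ValueError("radius must be at least 1")
    c0, c1 = quantize6(max(pixels)), quantize6(min(pixels))
    best = None
    for g0 in range(max(c0 - radius, 1), min(c0 + radius, 63) + 1):
        # g1 stays below g0 so every candidate is a four-color-mode block
        for g1 in range(max(c1 - radius, 0), min(c1 + radius, g0 - 1) + 1):
            sse = block_sse(pixels, g0, g1)
            if best is None or sse < best[0]:
                best = (sse, g0, g1)
    return best
```

Setting `radius=63` turns this back into the full brute force search. Small radii trade a little error for a large cut in candidates.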

Of course, you wouldn’t want to compress purely random data. On reasonable images, like a splat map or the luminance channel of a diffuse color texture, the numbers are much, much more reasonable:

* 2.25 units of MSE for 6-bit luminance;
* 4.6 units of MSE for range fit into DXT1, which is virtually instant;
* 3.4 units of MSE for brute force DXT1 search, which takes forever.

Again, “wobbling” the endpoints by 8 units gets you within 1% of the brute force MSE for 10% of its running time.
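A whole-image comparison like the one above needs a small harness. It cuts the image into 4×4 blocks, runs the per-block fit, and averages the squared error over all pixels. The 6-bit baseline is just quantize-and-expand per pixel.

This is a minimal sketch, assuming a row-major list of 8-bit values whose width and height are multiples of 4. The helper names, the rounding, and the flat-block nudge are my assumptions. Nothing here claims to reproduce the figures in the post.

```python
def expand6(g):
    """Expand a 6-bit code to 8 bits by bit replication."""
    return (g << 2) | (g >> 4)


def quantize6(v):
    """Map an 8-bit value to a 6-bit code by rounding v * 63 / 255."""
    return (v * 63 + 127) // 255


def image_blocks(image, width, height):
    """Yield the 16 pixels of each 4x4 block of a row-major 8-bit image."""
    if width % 4 or height % 4:
        raise ValueError("width and height must be multiples of 4")
    for by in range(0, height, 4):
        for bx in range(0, width, 4):
            yield [image[(by + y) * width + bx + x] for y in range(4) for x in range(4)]


def mse_6bit(image):
    """Baseline: mean square error of simply storing each value in 6 bits."""
    return sum((p - expand6(quantize6(p))) ** 2 for p in image) / len(image)


def mse_range_fit(image, width, height):
    """Mean square error over the whole image when every block is range-fitted."""
    total = 0
    for px in image_blocks(image, width, height):
        g0, g1 = quantize6(max(px)), quantize6(min(px))
        if g0 == g1:  # flat block (assumed policy): nudge apart to stay in four-color mode
            g0, g1 = (g0 + 1, g1) if g0 < 63 else (g0, g1 - 1)
        a, b = expand6(g0), expand6(g1)
        pal = (a, b, (2 * a + b) // 3, (a + 2 * b) // 3)  # rounding is an assumption
        total += sum(min((p - c) ** 2 for c in pal) for p in px)
    return total / len(image)
```

As a usage example, a smooth 16×16 gradient such as `[(x + y) * 8 for y in range(16) for x in range(16)]` can be fed to both `mse_6bit` and `mse_range_fit(image, 16, 16)` to compare the baseline against DXT1 on non-random data.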

Moral of the story: it’s perfectly reasonable to compress single-channel images into DXT1 at runtime using the range fit algorithm. If you need better results, you can look several steps around the range fit endpoints. And, by the way, the x64 versions ran about 5% faster.

Update: Range fit on the X2 3800+ reaches 75 MB/s, not 25.