The Four Screens
August 29th, 2007I’ve been thinking lately howВ a game programmer should have not the (hopefully) standard two, but four monitors on his PC. Besides the usual two for the game itself and his favorite IDE, he must have one for the generated machine code, because you can’t really trust a compiler even with the most basic of tasks. And one for his profiler of choice, to always keep an eye of what the game really is doing.
Real-world example:В a misunderstood feature for dynamic remapping of terrain layer textures leads to a per-frame resolving of texture filenames to device texture objects. Textures are looked up by filename, which must be handledВ without regard for case; and stored in a hash map. The unfortunate combination of three seemingly simple, convenient tools - dynamic reloading of the terrain table, a case-insensitive std::string wrapper, and a hash-map - lead to 40 000 calls to tolower() per frame. Not a big deal - nothing you’d even think about without a profiler. But here goes 1% of the total CPU time of a completely CPU-bound game - a percent which you’d have a hard time to shave off somewhere else.
Another one: an innocent, smallish 12-byte structure is augmented with a “dirty” flag, to keep track of yet another type of dynamically modifiable data. The structure is put in a three-dimensional array - which have to be banned outright, by the way. Then the array is “sliced” - iterated over (the wrong) two of the dimensions, and the dirty flag is checked.В This particular access pattern and the particular arrayВ and structureВ sizesВ lead to about of 4000 successive cache misses per-frame. Which take up to 2.5% of the frame time on the target platform, if you’re going for 30 fps. See it in a profiler, and spend 5 minutes to move the dirty flags to a separate array (or keep a “modified data” queue aside if you have 30 minutes), and shave off these 2.5%. Or don’t, andВ shipВ a seemingly bottleneck-free, but severely underpeforming game, full of such pearls of wisdom.