Archive for the ‘Uncategorized’ Category

Tipsy Trounces Tootle

Friday, April 11th, 2008

A bit more than a year ago I found ATI Tootle, an interesting mesh preprocessing tool for simultaneous vertex cache AND overdraw optimization. Somebody wisely commented then that “there’s probably something that is 1000x faster and 99% as good”. Well, that thing is here today, and it’s bearing the equally ridiculous name Tipsy. I won’t get around to actually implementing it, just as I didn’t with Tootle, but I sincerely hope the fine purveyors of the relevant middleware we use will find the time.

PC Gaming Must Die

Saturday, October 6th, 2007

My seemingly innocent question about querying the video card memory on Vista turned into a 42-post bloodbath, giving little in the way of useful answers, but illustrating perfectly how there is no such thing as a “PC gaming platform”, and why I want to get out of the PC mess ASAP.

The problem with the video card memory size isn’t new. It’s a question Microsoft actively lie about when answering; the simple query function in DirectX 9, IDirect3DDevice9::GetAvailableTextureMem, has always lied - it returns the memory on the video card plus the available physical system memory plus the current distance between the Mars moons Phobos and Deimos in Russian versts, divided by the temperature of the water in Sammamish Lake expressed in Réaumur degrees. An infiltrated enemy agent managed to sneak in the IDxDiag interface, which works more or less reliably on XP, but in the run-up to Vista he was discovered, shot down, and the oversight was corrected: on Vista the same IDxDiag code returns rubbish too - to the extent that even the DxDiag tool shipped with DirectX, which countless QA staff and even users have been trained to run and send dumps from, has become useless in that regard. So now you have to resort to quality software engineering techniques such as using DirectX 7 or DirectX 10 in an otherwise top-to-bottom DirectX 9 application. Or running GetAvailableTextureMem and subtracting the physical memory. Or dividing it by three. Or assuming that everybody with Vista has 256 MB of RAM on the video card - hey, it’s the current mode, why not?
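The subtraction trick above can be sketched like this. To be clear, this is my own back-of-the-envelope reconstruction, not anything blessed by Microsoft - the function name, the clamping, and the rounding step are all my assumptions; only the GetAvailableTextureMem behavior it works around is from the text:

```cpp
#include <cstdint>

// Rough estimate of dedicated video memory on Vista, given the lying
// GetAvailableTextureMem() result (roughly VRAM + available physical
// memory + astronomical noise). Subtracting the available physical
// memory leaves something in the ballpark of the VRAM size.
// Everything here is a heuristic; the 64 MB rounding is my own guess,
// based on real boards shipping in power-of-two-ish sizes.
std::uint64_t EstimateVideoMemory(std::uint64_t reportedTextureMem,
                                  std::uint64_t availablePhysicalMem)
{
    if (reportedTextureMem <= availablePhysicalMem)
        return 0; // the lie was bigger than expected; give up
    std::uint64_t estimate = reportedTextureMem - availablePhysicalMem;
    const std::uint64_t step = 64ull << 20; // round to the nearest 64 MB
    return ((estimate + step / 2) / step) * step;
}
```

You would feed it GetAvailableTextureMem() and the available physical memory (e.g. from GlobalMemoryStatusEx on Windows) - and then, per the above, probably still distrust the answer.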

Apparently the Microsoft position is that it’s no business of the developer to know how much fast memory he can use. Please pretend that you can use as many textures as you like, we’ll take care of it. If we gave you this number, you’d only do stupid things with it… we’re from the OS vendor, and we’re here to help you! Relax and watch the blinkenlights. People even went so far as to suggest ridiculous things like starting the game with the options screen so the user can pick the best settings for himself (the “all my friends are geeks with $600 video cards” solution), or starting the game with the ugliest settings by default (the “who cares about review scores” solution). What’s interesting is the clear demarcation line between people who are actually shipping games to be sold to real-world humans for a living, and who find real value in knowing the video memory size, and the rest - technical evangelists, high-end demo creators and academics, whose idea of development is pushing high-end hardware around and occasionally presenting to enthusiast users.

The PC as a platform is hopelessly fragmented. The rift between the high end and the low end is bigger than ever, from the Crysis crowd, who consider 6800-class hardware “low-end”, to the Zuma clone crowd, who don’t even have GPUs to speak of. The vendors - ATI^H^H^HAMD, NVIDIA, Microsoft - are each pulling the rug in their own direction, with little to no support for developers trying to stick to the rapidly disappearing “middle ground”, which was the mainstream of PC gaming a few years ago. (The rumored Intel intrusion into the field, trying to push raytracing on multi-multicore CPUs, will make things much worse in this regard.) The publishers demand support for hardware (in our case, DX81-class GPUs) which long ago fell off Microsoft’s radar and isn’t even targeted by the shader compilers released with the DirectX SDKs. The reviewers demand graphics quality rivaling the multimillion-dollar 6-hour cinematic fests subsidized by console vendors and passed off as “AAA games”. The users demand not to think. And a pony.

If only there were platforms where the hardware was cheap and powerful, the drivers appeared three or four times a year, the vendor was eager to help your development, and there were tens of millions of users eager to buy games. I would gladly accept the lack of a GetAvailableTextureMemory function - I’d replace it with a compile-time constant in a heartbeat.

Some days are better than others

Wednesday, September 12th, 2007

This wouldn’t be a real blog if I didn’t bitch about life in general and post pictures of my cat. Unfortunately, I don’t have a cat. Fortunately, although there has been much to bitch about around me lately, I had two bright spots in my day today, both vaguely music-related. First, one of my favorite games, Rez, is coming to XBLA. I have spent many hours in Rez, trying to beat this or that boss; usually I quickly lose interest in games which are mostly about being hard and presenting a challenge (that’s why I’m not so excited about the other oldskool legend announced today for XBLA, Ikaruga). And in the best of times, I’m indifferent to the trance/electronic/whatchamacallit type of music in Rez. But there’s something about it that hypnotizes me for hours, chasing the lines on the screen. If there ever was a game that could benefit from HDTV, it would be Rez; in a perfect world it would even come with a vector screen. Eh, maybe I’m just a sucker for rail shooters, and the next announcement that would make me just as happy would be a Panzer Dragoon Orta spiritual sequel. By the way, Rez can’t work on the PS3 in its present form - it really needs the throbbing of the controller in your hands as part of the experience. (Please, no trance vibrator jokes… if at all possible.)

The second bright spot was this video of somebody called Richard Lewis (sorry, I tried to find out who this guy is, without luck) singing a gentle love song accompanied by his Nintendo DS, strumming virtual chords with the stylus in a game called Jam Sessions. If this is not a killer app, I don’t know what is.

Local storage vs. caches

Monday, September 3rd, 2007

Somebody over at Beyond3D linked an interesting paper evaluating the tradeoffs between a software-managed local store (or scratchpad memory, as it was known on earlier Sony platforms) and the hardware-managed caches known from the desktop CPUs:

There are two basic models for the on-chip memory in CMP systems: hardware-managed coherent caches and software-managed streaming memory. This paper performs a direct comparison of the two models under the same set of assumptions about technology, area, and computational capabilities. The goal is to quantify how and when they differ in terms of performance, energy consumption, bandwidth requirements, and latency tolerance for general-purpose CMPs. We demonstrate that for data-parallel applications, the cache-based and streaming models perform and scale equally well. For certain applications with little data reuse, streaming scales better due to better bandwidth use and macroscopic software prefetching. However, the introduction of techniques such as hardware prefetching and non-allocating stores to the cache-based model eliminates the streaming advantage. Overall, our results indicate that there is not sufficient advantage in building streaming memory systems where all on-chip memory structures are explicitly managed. On the other hand, we show that streaming at the programming model level is particularly beneficial, even with the cache-based model, as it enhances locality and creates opportunities for bandwidth optimizations. Moreover, we observe that stream programming is actually easier with the cache-based model because the hardware guarantees correct, best-effort execution even when the programmer cannot fully regularize an application’s code.

T-Rex Playstation Demo from 1993

Monday, February 5th, 2007

I heard legends long ago of the original Playstation tech demo featuring a T-Rex, but couldn’t find it. Finally somebody uploaded it to YouTube for all the world to admire:

Comments from the thread on B3D:

This demo was first shown off on Playstation hardware back in 1993… 3dfx didn’t exist (as a company), Matrox hadn’t released *any* of its consumer cards with 3D acceleration yet (they were the first), and the fastest piece of x86 you could lay your hands on was a 66MHz P5 (a machine with one typically ran between $5K and $8K).

Roberto Ierusalimschy on threads in Lua

Monday, February 5th, 2007

…We did not (and still do not) believe in the standard multithreading model, which is preemptive concurrency with shared memory: we still think that no one can write correct programs in a language where ‘a=a+1’ is not deterministic.

Making Of Lost Planet / Dead Rising

Wednesday, January 31st, 2007

Since “making of sotc” remains the most important search phrase leading people here, I will attempt to whore some more hits by linking another interesting article, this time about Capcom’s new engine, used for Dead Rising and Lost Planet now, and for Devil May Cry 4 and Resident Evil 5 in the future. The Japanese original is here, and a semi-translated version is here. (For some reason Google Translate doesn’t translate all of the text - possibly on purpose.)

Several highlights, translated by kind people on various forums:

  1. This MT engine work started in September 2004, and they had something up and running by January 2005. It’s based on the Onimusha 3 engine. The project started with just one engineer, then ramped up to 3, and now they have 5 people maintaining and upgrading the code. They added 4 more people just for the PS3 port, which started in October 2005. The MT engine is currently used for Dead Rising and Lost Planet, but future cross-platform next-gen games will also use it.
  2. They evaluated UE3, and they appreciate the strengths of that engine. But they were worried about some of its performance limitations at the time, and about the lack of support personnel in Japan. They have high hopes for UE3 in the future, but decided this time to build their own tech.
  3. They started with the Xbox 360 since it is so close to the PC platform and mostly compatible.
  4. There have been requests from developers to license their engine due to the success of Lost Planet and Dead Rising. But Capcom feels that it would take too much effort to hire the appropriate support staff. They would rather put more effort into developing even better games for their users.
  5. They talk a bit about the multithreading techniques they are using to get the power out of the multicore CPUs in the 360 and PS3.
  6. They give a detailed description of the technique they are using to do motion blur on the Xbox 360, with screenshots. The algorithms are based on a talk given by Simon Green of NVIDIA at GDC back in 2003. (This is the one aspect of Lost Planet that looks truly next-gen, and makes the game really stand out and look unbelievably beautiful.)
  7. More info on Xenon performance: one 3.2 GHz core has about 2/3 the power of a 3.2 GHz P4, but if you use all 6 hardware threads you can get about 4x that - the Xenon (3 cores, 6 SMT threads at 3.2 GHz) roughly matches a 3.2 GHz P4EE (2 cores, 4 SMT threads). As for memory access latency: yes, the latency is bad, but as with current-gen GPUs, more threads mean you can hide the memory access problem better, so it is not a big problem.
  8. Each character has 10k to 20k polys, a VS has 30k to 40k, and the background around 500k. Normally there are around 160 MB of textures in memory, of which 60-80 MB is background.
  9. Dynamic MSAA adjustment, based on the scene framerate: e.g. below 30 fps, drop from 4xAA to 2xAA.
  10. MT stands for Multi-Target, Multi-Threaded, Meta Tools engine.
  11. Dead Rising scenes render about 4M polys, and Lost Planet scenes max out at about 3M. This is because of the demanding number of particles needed for Lost Planet. They said that drawing bleeding zombies is technologically much easier than creating a rich organic world filled with smoke, fire, and snow.
  12. They are very proud of the techniques they’ve been able to employ to get a tremendous amount of good-looking particle effects on screen without causing slowdown. They said that utilizing the Xbox 360 EDRAM for certain screen effects gives them great speed, and that this EDRAM, along with learning to properly use the multithreaded processors, are the two “tricks to making Xbox 360 games run well”.
  13. They also mention that although their MT engine is being launched with Dead Rising and Lost Planet on Xbox 360, it will next be used for the PS3 title Devil May Cry 4, and after that for an Xbox 360/PS3 multiplatform game, Resident Evil 5.

Edit: Here’s a more complete translation.

Switching out of Gear

Tuesday, January 30th, 2007

You’ve played too many modern shooters when you launch Unreal Tournament (the original, not some newfangled drive-fest) and:

* you constantly try to reload

* when you get hit, you instinctively try to hide to regenerate

* … and you keep wondering whether it’s _really_ worth it to walk over the weird boxes with crosses on them

* somebody gets close, and you try to melee him… where’s the melee button?

And, seriously, dude, where’s my textures? You call these high-resolution textures? Is this the Wii version or something?

50 Books For Everyone In the Game Industry

Wednesday, October 11th, 2006

For those with a bunch of Amazon gift certificates and a month to spare, here’s Next Generation’s list of 50 Books For Everyone In the Game Industry. I’ve read, what, four? Too bad I don’t have the month, or the gift certificates.

Single-channel DXT1 compression

Tuesday, October 3rd, 2006

I did some experiments recently with “fast” (runtime) compression of single-channel images into the green channel of DXT1 blocks (6 bits of endpoint precision). We use the so-called splat mapping technique of terrain texturing, and the splat maps, which get regenerated every time something changes the terrain textures (e.g. your mouse moves in the editor when the terrain texture brush is active), occupy 8 bits per pixel and are pretty large.

I tried the so-called “range fit”, where the extremal points of the input luminance data are used for endpoints. It is very fast, but produced 20 units of mean square error over my (admittedly unrealistic) purely random test image. (Here and below, the error unit = 1/255 of the max possible value, so literally units in an 8-bit channel.) For comparison, the mean square error from simply encoding into 6 bits is around 2 units. What does “very fast” mean? Without any effort spent on making the code run faster, the straightforward range fit takes around 600 ns per DXT1 block, which would mean on the order of 40 ms for a 1024×1024 image, or 25 MB/sec, if that sounds more familiar to you. This can probably be improved at least twofold if you tinker with it. (All numbers taken on a 2 GHz Athlon64. Its official moniker is “X2 3800+”, but since my tests were single-threaded, the X2 doesn’t matter, and it would probably have been called “3000+” if it were a single-core CPU.)
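A minimal sketch of the range fit, reconstructed from the description above (this is my own illustration, not the actual production code): the block’s min and max luminance become the two endpoints, quantized to the 6 bits the DXT1 green channel gives you, and each pixel then picks the nearest of the four interpolated palette entries.

```cpp
#include <algorithm>
#include <array>
#include <cstdint>

// Expand a 6-bit DXT1 green-channel endpoint back to 8 bits.
static int Expand6(int v) { return (v << 2) | (v >> 4); }

// Range-fit a 4x4 block of 8-bit luminance values; returns the sum of
// squared errors for the block (divide by 16 for the per-pixel MSE).
// Endpoints are simply the min/max of the input, quantized to 6 bits.
// (A real encoder must also order the endpoints to select DXT1's
// 4-color mode when emitting the block; omitted in this sketch.)
int RangeFitBlock(const std::array<std::uint8_t, 16>& pixels)
{
    auto [lo, hi] = std::minmax_element(pixels.begin(), pixels.end());
    int e0 = (*lo * 63 + 127) / 255; // quantize endpoints to 6 bits
    int e1 = (*hi * 63 + 127) / 255;

    // The four palette entries DXT1 interpolates from the two endpoints.
    int c0 = Expand6(e0), c1 = Expand6(e1);
    int palette[4] = { c0, c1, (2 * c0 + c1) / 3, (c0 + 2 * c1) / 3 };

    int totalError = 0;
    for (std::uint8_t p : pixels) {
        int best = 255 * 255;
        for (int c : palette)
            best = std::min(best, (p - c) * (p - c));
        totalError += best;
    }
    return totalError;
}
```

The whole thing is a min/max pass plus 16 nearest-of-four lookups per block, which is why it lands in the hundreds of nanoseconds.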

I tried brute force search, where all 4096/2 combinations of endpoints are used, which produced around 15 units of MSE, but was very, very slow.

Next I tried brute forcing not the entire endpoint space, but just “wobbling” the endpoints around the extremal points of the input range; varying the amplitude of this wobbling exhibited the typical “90% of the benefit for 10% of the effort” curve, with the sweet spot around 8 units of the 6-bit endpoint range producing 15-ish MSEs for less than 10% of the brute force time.
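The wobbling search might look something like this (again my own sketch of the idea, not the original code): start from the range-fit endpoints, exhaustively try every endpoint pair within ±8 six-bit units of them, and keep the pair with the lowest block error.

```cpp
#include <algorithm>
#include <array>
#include <cstdint>

static int Expand6(int v) { return (v << 2) | (v >> 4); }

// Sum of squared errors for one 4x4 block against a DXT1 palette built
// from the 6-bit endpoints e0 and e1.
static int BlockError(const std::array<std::uint8_t, 16>& pixels, int e0, int e1)
{
    int c0 = Expand6(e0), c1 = Expand6(e1);
    int palette[4] = { c0, c1, (2 * c0 + c1) / 3, (c0 + 2 * c1) / 3 };
    int total = 0;
    for (std::uint8_t p : pixels) {
        int best = 255 * 255;
        for (int c : palette)
            best = std::min(best, (p - c) * (p - c));
        total += best;
    }
    return total;
}

// Search every endpoint pair within +/-wobble 6-bit units of the
// range-fit endpoints, returning the smallest block error found.
// The range-fit pair itself is inside the search window, so this can
// never do worse than plain range fit.
int WobbleFitBlock(const std::array<std::uint8_t, 16>& pixels, int wobble = 8)
{
    auto [lo, hi] = std::minmax_element(pixels.begin(), pixels.end());
    int base0 = (*lo * 63 + 127) / 255;
    int base1 = (*hi * 63 + 127) / 255;

    int bestError = BlockError(pixels, base0, base1);
    for (int e0 = std::max(0, base0 - wobble); e0 <= std::min(63, base0 + wobble); ++e0)
        for (int e1 = std::max(0, base1 - wobble); e1 <= std::min(63, base1 + wobble); ++e1)
            bestError = std::min(bestError, BlockError(pixels, e0, e1));
    return bestError;
}
```

With a wobble of 8 this tests at most 17×17 = 289 endpoint pairs per block instead of the full 2048, which roughly matches the “less than 10% of the brute force time” figure.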

Of course, you wouldn’t want to compress purely random data; on reasonable images, like a splatmap, or the luminance channel of a diffuse color texture, the numbers are much, much more reasonable: 2.25 units of MSE for 6-bit luminance; 4.6 units of MSE for range fit into DXT1, which is virtually instant; and 3.4 units of MSE for brute force DXT1 search, which takes forever. Again, 8 units of “wobbling” the endpoints gets you within 1% of the MSE for 10% of the running time of the brute force search.

Moral of the story: it’s perfectly reasonable to compress single-channel images into DXT1 at runtime using the range fit algorithm. If you need better results, you can look several steps around the range fit endpoints. And, by the way, the x64 versions ran about 5% faster.

Update: Range fit on the X2 3800+ reaches 75 MB/s, not 25.