The best type of float-to-int conversion

The topic of fast float-to-int conversion is one of the favorites among game developers, optimization freaks and other scum like that, so I was somewhat puzzled that I had never encountered this problem in practice. Until today, that is.

We have some code which takes a bunch of mesh instances, transforms them and sticks them into a large vertex buffer in order to save draw calls. Among other things, the source vertex data includes a float4 whose xyz components hold a normal (range -1..1), which is transformed and compressed into three of the four bytes of a uint32. We have a function called CompressNormalToD3DCOLOR, which takes a D3DXVECTOR4 and outputs a uint32, processing all four components. It was written long ago by the happy elves of the Land of Perfect Compilers. This is how it looked:

    inline void CompressNormalToD3DCOLOR(const D3DXVECTOR4 & normal, uint32 & n)
    {
        D3DXVECTOR4 norm(normal);
        norm += D3DXVECTOR4(1, 1, 1, 1);   // [-1, 1] -> [0, 2]
        norm *= 127.5f;                     // [0, 2]  -> [0, 255]

        uint32 r = uint32(norm.x);
        uint32 g = uint32(norm.y);
        uint32 b = uint32(norm.z);
        uint32 a = uint32(norm.w);

        n = (a << 24) + (r << 16) + (g << 8) + b;   // A8R8G8B8
    }

Innocent enough?

In the land of Real Compilers (Visual C++ from Visual Studio 2005), this straightforward piece of code compiles to 73 instructions, most of which are memory accesses. The function is called a zillion times, and in a certain worst-case scenario, which occurs during the 10th second after you start the game, it takes up to 25% of the frame time.

Some CPUs have a fscking instruction that does this, FFS.
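As one x86 example of such hardware support (not something the original code used), SSE2 has cvttps2dq, exposed as the _mm_cvttps_epi32 intrinsic, which truncates four floats to four 32-bit integers in a single instruction; two saturating packs then squeeze the results into bytes. This is a minimal sketch assuming an SSE2-capable target; CompressNormalSSE2 and HAVE_SSE2_SKETCH are hypothetical names, and a scalar fallback keeps it compiling elsewhere.

```cpp
#include <cassert>
#include <cstdint>
#if defined(__SSE2__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 2)
#include <emmintrin.h>
#define HAVE_SSE2_SKETCH 1  // hypothetical feature macro for this sketch
#endif

// Hypothetical helper: packs x, y, z (in [-1, 1]) and w as A8R8G8B8 using the
// same (v + 1) * 127.5 scale and truncation as CompressNormalToD3DCOLOR.
inline uint32_t CompressNormalSSE2(float x, float y, float z, float w)
{
    assert(x >= -1.0f && x <= 1.0f && y >= -1.0f && y <= 1.0f && z >= -1.0f && z <= 1.0f);
#ifdef HAVE_SSE2_SKETCH
    // _mm_set_ps takes lanes high-to-low: lane 0 = z (blue) ... lane 3 = w (alpha).
    __m128 v = _mm_set_ps(w, x, y, z);
    v = _mm_mul_ps(_mm_add_ps(v, _mm_set1_ps(1.0f)), _mm_set1_ps(127.5f));
    __m128i i32 = _mm_cvttps_epi32(v);         // four truncating conversions at once
    __m128i i16 = _mm_packs_epi32(i32, i32);   // int32 -> int16, saturating
    __m128i u8  = _mm_packus_epi16(i16, i16);  // int16 -> uint8, saturating
    return static_cast<uint32_t>(_mm_cvtsi128_si32(u8));  // low 4 bytes: b, g, r, a
#else
    // Scalar fallback for non-SSE2 targets: same arithmetic, one component at a time.
    auto toByte = [](float v) { return static_cast<uint32_t>((v + 1.0f) * 127.5f); };
    return (toByte(w) << 24) | (toByte(x) << 16) | (toByte(y) << 8) | toByte(z);
#endif
}
```

The lane order is the only subtle part: choosing (z, y, x, w) from low to high lane makes the little-endian byte layout come out as blue, green, red, alpha without any shuffling.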

My first reaction (after panic, which was the zeroth) was to make a special-case function which only processes three of the components, since in this case we are sure we don't need the fourth. At the cost of an additional branch, half of the calls to the function were eliminated, which together led to about a 3x reduction in the time spent just compressing normals.
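The special case might look roughly like the following. It is a sketch that uses plain floats instead of D3DXVECTOR4 so it compiles without the DirectX SDK; all function names are hypothetical, and leaving the unused alpha byte at zero is an assumption (the caller is presumed not to care what ends up there).

```cpp
#include <cassert>
#include <cstdint>

// Maps one component from [-1, 1] to [0, 255] with the original's scale and truncation.
inline uint32_t NormalComponentToByte(float v)
{
    assert(v >= -1.0f && v <= 1.0f);
    return static_cast<uint32_t>((v + 1.0f) * 127.5f);
}

// Four-component version: same packing as CompressNormalToD3DCOLOR.
inline uint32_t CompressNormalXYZW(float x, float y, float z, float w)
{
    return (NormalComponentToByte(w) << 24) | (NormalComponentToByte(x) << 16) |
           (NormalComponentToByte(y) << 8) | NormalComponentToByte(z);
}

// Three-component version: skips w entirely and leaves the top byte at zero.
inline uint32_t CompressNormalXYZ(float x, float y, float z)
{
    return (NormalComponentToByte(x) << 16) | (NormalComponentToByte(y) << 8) |
           NormalComponentToByte(z);
}

// Hypothetical call site: the one extra branch that picks the cheaper variant.
inline uint32_t CompressVertexNormal(float x, float y, float z, float w, bool needsW)
{
    return needsW ? CompressNormalXYZW(x, y, z, w) : CompressNormalXYZ(x, y, z);
}
```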

I remembered that I had recently read something like an overview of the various float-to-int techniques on Mike Herf's site. (Go there and read it. He's the person behind the wonderfully smooth UI of Kai's Power Tools and, 10 years later, Picasa.) I whipped up a simple synthetic benchmark with the original code, the three-component version, inline assembly FISTP, the double-magic-numbers trick, and the direct conversion trick (especially useful because in my case the normals are guaranteed to be in a known range); you can read descriptions of the techniques on his site. On my aging 2 GHz Athlon at home, the original monstrosity takes about 200 clocks for one float4->uint32 conversion. The three-component version takes 136 clocks. Using FISTP directly via inline assembly, which requires manually setting the FPU control word, takes only 28 clocks. The real superstars are the last two techniques: magic numbers at 22 clocks, and direct conversion at only 20 clocks, a tenfold improvement over the original code!
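For reference, here is a portable sketch of the double magic-number trick (my own reconstruction, not Herf's code verbatim): adding 1.5 * 2^52 to a double whose magnitude is below 2^31 leaves the rounded integer in the low 32 bits of the mantissa. Note that it rounds to nearest (ties to even under the default rounding mode) instead of truncating like the cast in the original, so a component of 0 maps to 128 rather than 127. It also only works if the addition really happens at 53-bit precision. MagicRound and CompressNormalMagic are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Rounds v to the nearest int32 using the 1.5 * 2^52 magic number.
// Valid for |v| < 2^31, and only if the addition is done at 53-bit precision.
inline int32_t MagicRound(double v)
{
    const double kMagic = 6755399441055744.0;   // 1.5 * 2^52
    const double shifted = v + kMagic;          // integer part now sits in the low mantissa bits
    uint64_t bits;
    std::memcpy(&bits, &shifted, sizeof bits);  // read the bit pattern without undefined behaviour
    const uint32_t low = static_cast<uint32_t>(bits);
    int32_t result;
    std::memcpy(&result, &low, sizeof result);  // reinterpret the low 32 bits as signed
    return result;
}

// Same A8R8G8B8 packing as the original, but rounding instead of truncating.
inline uint32_t CompressNormalMagic(float x, float y, float z, float w)
{
    auto toByte = [](float v) {
        assert(v >= -1.0f && v <= 1.0f);
        return static_cast<uint32_t>(MagicRound((v + 1.0f) * 127.5));
    };
    return (toByte(w) << 24) | (toByte(x) << 16) | (toByte(y) << 8) | toByte(z);
}
```

On a typical x86-64 build, double arithmetic goes through SSE2 at full double precision, so the x87 precision-control issue does not arise there.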

Of course, all is not rosy. Wise men say FLD/FISTP doesn't pipeline well. Magic numbers require that you keep your FPU in 53-bit precision mode, which a) isn't a good idea and b) DirectX won't honor. Direct conversion works if you can easily get your data into a range such as [2, 4). Notice that 4 is excluded: normals are in [-1, 1], and it's not trivial to get them into [-1, 1) without spending more clocks.
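To see why the upper end matters, here is a sketch of direct conversion under my own assumptions about the mapping (not necessarily the exact variant benchmarked above). Adding 3 shifts a component from [-1, 1] into [2, 4]; all floats in [2, 4) share the same exponent, so the top 8 of the 23 mantissa bits equal floor((v - 2) * 128), which gives a byte with no conversion instruction at all. At exactly 4.0 the exponent ticks over and the mantissa is zero, so a component of +1 wraps around to 0. Clamping to the largest float below 4 fixes that at the cost of extra work. DirectByteUnclamped, DirectByteClamped and FloatBits are hypothetical names.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstring>

// Reads the raw IEEE-754 bits of a float without undefined behaviour.
inline uint32_t FloatBits(float f)
{
    uint32_t u;
    std::memcpy(&u, &f, sizeof u);
    return u;
}

// For v in [2, 4) the mantissa is (v - 2) * 2^22, so bits 15..22 are floor((v - 2) * 128).
inline uint32_t DirectByteUnclamped(float n)
{
    assert(n >= -1.0f && n <= 1.0f);
    const float v = n + 3.0f;                // [-1, 1] -> [2, 4]
    return (FloatBits(v) >> 15) & 0xFFu;     // wrong for n == 1: 4.0f wraps to 0
}

// Same, but clamps to 4 - 2^-22 first: correct for n == 1, at the price of a min.
inline uint32_t DirectByteClamped(float n)
{
    assert(n >= -1.0f && n <= 1.0f);
    static const float kBelowFour = std::nextafter(4.0f, 0.0f);
    const float v = std::fmin(n + 3.0f, kBelowFour);
    return (FloatBits(v) >> 15) & 0xFFu;
}
```

Note that the effective scale here is 128 rather than the original's 127.5, so 0 maps to 128 and 0.5 maps to 192.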

So, what turned out to be the best type of float-to-int conversion?

The one that takes zero clocks, of course. It turned out that most of the meshes which undergo this packing have fixed (0, 0, 1) normals in model space for all their vertices, which means the transformation and packing of their normals can happen per instance, not per vertex. Of course, I realized this only after spending an hour or so reading Herf's papers and benchmarking his suggested variants.
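The zero-clock version can be sketched as follows, under assumptions that are mine rather than the post's: a hypothetical vertex layout, a 3x3 rotation standing in for the instance transform (no non-uniform scale, so the normal needs no renormalization), and three-component packing with the alpha byte left at zero. The normal (0, 0, 1) is transformed and packed once per instance, and the per-vertex loop becomes a plain store. All type and function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

struct Vec3 { float x, y, z; };
struct Mat3 { float m[3][3]; };                       // hypothetical rotation part of the instance transform
struct PackedVertex { float pos[3]; uint32_t normal; };  // hypothetical vertex layout

// Column-vector convention: result = r * v.
inline Vec3 Rotate(const Mat3& r, const Vec3& v)
{
    return Vec3{r.m[0][0] * v.x + r.m[0][1] * v.y + r.m[0][2] * v.z,
                r.m[1][0] * v.x + r.m[1][1] * v.y + r.m[1][2] * v.z,
                r.m[2][0] * v.x + r.m[2][1] * v.y + r.m[2][2] * v.z};
}

// Original scale and truncation; alpha byte left at zero.
inline uint32_t PackNormalXYZ(const Vec3& n)
{
    auto toByte = [](float v) { return static_cast<uint32_t>((v + 1.0f) * 127.5f); };
    return (toByte(n.x) << 16) | (toByte(n.y) << 8) | toByte(n.z);
}

// Every vertex of such a mesh has the model-space normal (0, 0, 1), so it is
// transformed and packed once per instance (rotating (0, 0, 1) just picks the
// third column, but the general form is kept for clarity).
inline void WriteInstanceNormals(const Mat3& rotation, PackedVertex* out, std::size_t count)
{
    assert(out != nullptr || count == 0);
    const uint32_t packed = PackNormalXYZ(Rotate(rotation, Vec3{0.0f, 0.0f, 1.0f}));
    for (std::size_t i = 0; i < count; ++i)
        out[i].normal = packed;             // no per-vertex float-to-int conversion left
}
```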

Well, at least I’ll be damn well prepared next time it comes up.
