Last we left our mad hacker, he was staring at a screen like this one:
The last optimization knocked T2DTileLayer out of the running for top framerate sink, and T2DAnimatedSprite rose up to claim the crown. A bit of investigation pinned vertex buffer updates as the slowest operation in rendering T2DAnimatedSprites. As with the index buffer updates that slowed down T2DTileLayer rendering, locking the actual graphics resource data for updates can be very slow on the Xbox 360.
Unlike T2DTileLayer, however, simply skipping superfluous updates wouldn’t work, since for the most part T2DAnimatedSprite was already, via T2DQuad, skipping updates that shouldn’t change the vertex buffer (it could do with being a little pickier about when to actually perform the vertex buffer refill, but that’s a story for another day). The big problem for T2DAnimatedSprite is that the main feature that makes it so useful, the ability to change the part of a texture displayed based on animation data, causes it to lock and update the texture coordinates in the vertex buffer every time a frame of animation changes.
The key to optimizing T2DAnimatedSprite is to make it able to update texture sampling coordinates without locking the vertex buffer, thereby making the updates a whole lot cheaper.
Using the UV coordinates of a vertex buffer to define the portion of a texture used when drawing a quad is natural. When a quad needs to display a different part of a texture, it’s straightforward and easy to change the UV coordinates in the vertex buffer to make this happen. It’s also really slow and if, like T2DAnimatedSprite, you need to do it often, it can tank your framerate very quickly.
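To make the per-frame cost concrete, here is a minimal sketch of how a frame index might map to a UV rectangle. It assumes the animation frames sit in a uniform grid on a single texture, which this article does not state. The function name frame_uv_rect and its parameters are hypothetical and not part of Torque X.

```python
from typing import Tuple


def frame_uv_rect(frame_index: int, columns: int, rows: int) -> Tuple[float, float, float, float]:
    """Return (u0, v0, u1, v1) for one frame of a grid-packed sprite sheet.

    Hypothetical helper: assumes frames are laid out left to right,
    top to bottom, in equally sized cells.
    """
    if not 0 <= frame_index < columns * rows:
        raise IndexError("frame_index is outside the sprite sheet")
    col = frame_index % columns
    row = frame_index // columns
    cell_w = 1.0 / columns  # width of one frame in UV space
    cell_h = 1.0 / rows     # height of one frame in UV space
    return (col * cell_w, row * cell_h, (col + 1) * cell_w, (row + 1) * cell_h)


if __name__ == "__main__":
    # Each new frame yields a new rectangle; in the naive approach every
    # one of these would be written into the vertex buffer under a lock.
    for frame in range(8):
        print(frame, frame_uv_rect(frame, columns=4, rows=2))
```

Under that grid assumption, every frame's rectangle has the same size and differs from the others only by an offset.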
Luckily, there are other ways to access texture coordinates. With vertex and pixel shaders, there’s a nearly infinite number of ways to address a texture. Since animating across a series of images on a single texture essentially boils down to tweaking the UV coordinates of a vertex, an implementation that takes care of the texture animation in a vertex shader instead of using UV coordinates of the vertex directly seems like a promising way to avoid expensive vertex buffer locks.
To do so, I created a vertex shader, basically a copy of SimpleVS with some extra logic to calculate the output texture coordinate by adding a float2 called uvDisplacement to the base coordinate passed in from the vertex buffer. I then modified T2DQuad to calculate the UV displacement value whenever its TextureCoordinates were set (usually by T2DAnimatedSprite). The first time T2DQuad gets texture coordinates, it sets the base texture coordinates in the vertex buffer (incurring a vertex buffer lock). On every later call, it calculates the UV displacement as the difference between the texture coordinates it receives and the base values already set in the vertex buffer. Finally, I just needed to ferry the UV displacement value from T2DQuad to the vertex shader via new data and logic in RenderInstance and SimpleMaterial, respectively. A link to the source code for this change is available at the end of this article.
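The bookkeeping described above can be modelled in a few lines. This is only an illustrative sketch in Python, not the actual Torque X C# or HLSL code. QuadSketch, lock_count and vertex_shader_uv are hypothetical names standing in for T2DQuad, a simulated vertex buffer lock, and the shader's texture coordinate calculation. It also assumes every frame has the same size in UV space, so that one offset moves all four corners correctly.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

Vec2 = Tuple[float, float]


@dataclass
class QuadSketch:
    """Hypothetical stand-in for T2DQuad's texture coordinate handling."""
    base_uv: Optional[Vec2] = None      # top-left UV baked into the vertex buffer
    uv_displacement: Vec2 = (0.0, 0.0)  # value passed along to the shader
    lock_count: int = 0                 # number of simulated vertex buffer locks

    def set_texture_coordinates(self, top_left: Vec2) -> None:
        if self.base_uv is None:
            # First call: write the base coordinates into the vertex
            # buffer, which costs one lock.
            self.base_uv = top_left
            self.uv_displacement = (0.0, 0.0)
            self.lock_count += 1
        else:
            # Later calls: leave the buffer alone and only record how far
            # the requested coordinates are from the base values.
            self.uv_displacement = (top_left[0] - self.base_uv[0],
                                    top_left[1] - self.base_uv[1])


def vertex_shader_uv(vertex_uv: Vec2, uv_displacement: Vec2) -> Vec2:
    """Models the shader step: output UV = base UV + uvDisplacement."""
    return (vertex_uv[0] + uv_displacement[0], vertex_uv[1] + uv_displacement[1])


if __name__ == "__main__":
    quad = QuadSketch()
    for frame_top_left in [(0.0, 0.0), (0.25, 0.0), (0.5, 0.5)]:
        quad.set_texture_coordinates(frame_top_left)
        print(vertex_shader_uv(quad.base_uv, quad.uv_displacement), quad.lock_count)
```

However many frame changes follow the first call, the simulated lock count stays at one. Only the small displacement value changes from frame to frame.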
With the optimization implemented, T2DAnimatedSprite loses its spot on the chart. T2DStaticSprite is still eating up a bit more time than it probably deserves doing redundant buffer locks, but my frame rate is now solid at 60fps.
No rest for the wicked, it seems…
Click here to download the source code (as an SVN patch) for this change (implemented on top of the 3.1.0 version of Torque X 2D). The code changes for the optimization are bracketed with #if USE_UV_DISPLACEMENT preprocessor directives to allow for easy toggling. Just add USE_UV_DISPLACEMENT to the conditional compilation symbols for TorqueCore and Torque2D to put the optimization into action.