Quick and Dirty Torque X 2D Optimization Tips: T2DTileLayer

With the easy content-side performance problems cleared away, it was time to settle into some serious code profiling. So I rolled up my sleeves, ran Legend of the Rune Lords through Torque X 2D’s profiler, and got a little something like this:

LotRL T2DTileLayer Performance Problem

What immediately jumped out was seeing that I was spending over half of my sampling time in the Render() function for T2DTileLayer. Something was up and it was time to figure out what it was and find a way to fix it.

With my target area narrowed down, I dove into T2DTileLayer’s Render() function and found out where exactly I was spending most of my time.
T2DTileLayer.Render() looks a little something like this, with omitted code summarized in bracketed comments:

        public override void Render(SceneRenderState srs)
        {
            Assert.Fatal(_renderedTileTypes.Count == 0, "previous render tiles not cleared");

            Profiler.Instance.StartBlock("T2DTileLayer.Render");

            //[Create and Fill Vertex Buffer if it doesn't already exist]

            //[Calculate values global to the Render() function like the tilemap-to-world matrix]

            //[Create a list of tiles from this layer that we actually need to render]

            if (tileCount != 0)
            {
                //[Check index buffer validity, create it if necessary]

                int[] indexScratch = TorqueUtil.GetScratchArray<int>(tileCount * 6);

                //[Calculate values to be used when setting up RenderInstances. The world transformation matrix, for example]
                for (int i = 0; i < _renderedTileTypes.Count; i++)
                {
                    T2DTileType tileType = _renderedTileTypes[i];

                    RenderInstance ri = SceneRenderer.RenderManager.AllocateInstance();

                    //[Fill RenderInstance ri with values appropriate for the set of tiles to render]

                    // fill up index buffer
                    for (int j = 0; j < tileType._renderTiles.Count; j++)
                    {
                        int tileIdx = tileType._renderTiles[j];
                        indexScratch[idx++] = 4 * tileIdx + 0;
                        indexScratch[idx++] = 4 * tileIdx + 1;
                        indexScratch[idx++] = 4 * tileIdx + 2;
                        indexScratch[idx++] = 4 * tileIdx + 0;
                        indexScratch[idx++] = 4 * tileIdx + 2;
                        indexScratch[idx++] = 4 * tileIdx + 3;
                    }
                    tileType._renderTiles.Clear();

                    SceneRenderer.RenderManager.AddInstance(ri);
                }
                //[Clear data structures about tiles to render for use again next frame]

                // set index data
                GFXDevice.Instance.Device.Indices = null;
                _indexBuffer.Instance.SetData<int>(indexScratch, 0, idx);
            }

            //[Render collision bounds if they are enabled for display (debug feature)]

            Profiler.Instance.EndBlock("T2DTileLayer.Render");
        }

As you may guess from the code I haven’t omitted, most of my time was going to setting up the index buffer for tile rendering. More specifically, I was spending most of it pushing data to the actual graphical resource used for rendering, with the following code:

// set index data
GFXDevice.Instance.Device.Indices = null;
_indexBuffer.Instance.SetData<int>(indexScratch, 0, idx);

Now, here’s the rub: during Legend of the Rune Lords’ battles, the background is static, so the tile layer’s index buffer should never need to change once it has been set. Spending half of my rendering time setting data that never changes is a terrible waste of resources, but it suggests a simple solution. I added a bit of code to skip the SetData call unless the contents of the tile layer actually changed.

My quick hack went a little something like so:

        public override void Render(SceneRenderState srs)
        {
            Assert.Fatal(_renderedTileTypes.Count == 0, "previous render tiles not cleared");

            Profiler.Instance.StartBlock("T2DTileLayer.Render");

            //[Create and Fill Vertex Buffer if it doesn't already exist]

            //[Calculate values global to the Render() function like the tilemap-to-world matrix]

            //[Create a list of tiles from this layer that we actually need to render]

            if (tileCount != 0)
            {
                bool bIndexBufferChanged = false;         //ADDED: Boolean to track whether the index buffer has changed

                //[Check index buffer validity, create it if necessary]

                int[] indexScratch = TorqueUtil.GetScratchArray<int>(tileCount * 6);

                //[Calculate values to be used when setting up RenderInstances. The world transformation matrix, for example]
                for (int i = 0; i < _renderedTileTypes.Count; i++)
                {
                    T2DTileType tileType = _renderedTileTypes[i];

                    RenderInstance ri = SceneRenderer.RenderManager.AllocateInstance();

                    //[Fill RenderInstance ri with values appropriate for the set of tiles to render]

                    // fill up index buffer
                    for (int j = 0; j < tileType._renderTiles.Count; j++)
                    {
                        //MODIFIED: Index buffer setting also checks for changes in value versus previous frame
                        int tileIdx = 4 * tileType._renderTiles[j];
                        bIndexBufferChanged |= _HasIndexBufferChangedAtIndex(idx, tileIdx + 0);
                        indexScratch[idx++] = tileIdx + 0;

                        bIndexBufferChanged |= _HasIndexBufferChangedAtIndex(idx, tileIdx + 1);
                        indexScratch[idx++] = tileIdx + 1;

                        bIndexBufferChanged |= _HasIndexBufferChangedAtIndex(idx, tileIdx + 2);
                        indexScratch[idx++] = tileIdx + 2;

                        bIndexBufferChanged |= _HasIndexBufferChangedAtIndex(idx, tileIdx + 0);
                        indexScratch[idx++] = tileIdx + 0;

                        bIndexBufferChanged |= _HasIndexBufferChangedAtIndex(idx, tileIdx + 2);
                        indexScratch[idx++] = tileIdx + 2;

                        bIndexBufferChanged |= _HasIndexBufferChangedAtIndex(idx, tileIdx + 3);
                        indexScratch[idx++] = tileIdx + 3;
                    }
                    tileType._renderTiles.Clear();

                    SceneRenderer.RenderManager.AddInstance(ri);
                }
                //[Clear data structures about tiles to render for use again next frame]

                //ADDED: Only upload when the index data changed, and remember what was uploaded for comparison next frame
                if (bIndexBufferChanged)
                {
                    // set index data
                    GFXDevice.Instance.Device.Indices = null;
                    _indexBuffer.Instance.SetData<int>(indexScratch, 0, idx);
                    // NOTE: this keeps a reference, not a copy, so the comparison is only valid
                    // while nothing else overwrites the scratch array between frames
                    _indexScratchCache = indexScratch;
                }
            }

            //[Render collision bounds if they are enabled for display (debug feature)]

            Profiler.Instance.EndBlock("T2DTileLayer.Render");
        }

        //ADDED: Function for performing actual comparison of a value against a value in the previous frame's index buffer
        private bool _HasIndexBufferChangedAtIndex(int idx, int value)
        {
            return _indexScratchCache == null || idx >= _indexScratchCache.Length || _indexScratchCache[idx] != value;
        }

        //ADDED: Reference to the data used last time index buffer was set
        private int[] _indexScratchCache;

The biggest additions to the code are a reference to the index data that was last uploaded for this particular tile layer and some logic that compares each new index value against it while the new buffer is being built. If the new data isn’t any different from the old data, we skip the SetData call and save ourselves the potentially long wait associated with updating graphical resources.
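One caveat: the hack above stores a reference to the scratch array rather than a copy. If that array is shared with other callers (an assumption; how TorqueUtil.GetScratchArray hands out its arrays isn’t shown here), the cached values could be overwritten between frames. As a hedge, here is a minimal sketch of the same skip-if-unchanged idea using a privately owned copy. IndexUploadCache and its members are hypothetical names and not part of Torque X, and the upload callback merely stands in for the SetData call. Unlike the code above, this sketch also treats a change in index count as a change.

```csharp
using System;

// Hypothetical helper, not part of Torque X: remembers the index data that was
// last uploaded and skips the upload when the new data is identical.
public sealed class IndexUploadCache
{
    private int[] _cache = new int[0];   // privately owned copy of the last uploaded indices
    private int _cachedCount = -1;       // -1 means nothing has been uploaded yet

    // Number of uploads actually performed so far.
    public int UploadCount { get; private set; }

    // Returns true when the first 'count' entries differ from the last upload.
    public bool HasChanged(int[] indices, int count)
    {
        if (count != _cachedCount)
            return true;

        for (int i = 0; i < count; i++)
        {
            if (_cache[i] != indices[i])
                return true;
        }
        return false;
    }

    // Invokes 'upload' only when the data changed, then stores a copy of it.
    // Returns true if an upload happened.
    public bool UploadIfChanged(int[] indices, int count, Action<int[], int> upload)
    {
        if (!HasChanged(indices, count))
            return false;

        upload(indices, count);

        // Copy instead of aliasing, so later writes to 'indices' cannot
        // silently change what we believe is on the graphics device.
        if (_cache.Length < count)
            _cache = new int[count];
        Array.Copy(indices, _cache, count);
        _cachedCount = count;
        UploadCount++;
        return true;
    }
}
```

In Render(), the call would replace the if (bIndexBufferChanged) block, with the callback wrapping the existing Indices = null and SetData lines. The trade-off is an extra copy on the frames where the data does change, in exchange for a cache that nothing else can clobber.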

It’s not a particularly clean or elegant solution to the problem. In fact, it’s almost as brute force as the original implementation. I could probably work out a more efficient solution by further studying how the tile rendering is implemented in Torque X 2D. This trick, however, is enough to take out the biggest bottleneck in T2DTileLayer’s Render() function and net me a big performance boost.

LotRL T2DAnimatedSprite Performance Problem

With the tile layer rendering out of the way, we see the next suspect in the Case of the Missing Frame Rate: T2DAnimatedSprite and its accomplice Render() function. More on that guy next time.

Check here for more articles in this series: Game Development