Last week I tried to make rendering code optimisation in order to speed up the frametime.
Until now, I didn't bother too much about optimising stuff; some optimisation techniques were implemented but were very somewhat buggy.
The main reason for such bugged code is that we switched from pure 2D/orthographic to 3D/perspective way of thinking the rendering and the game.
The main thing to do when optimizing rendering is trying to render only what will be visible on screen. It seems logical but it’s not automatically done by a graphic card… you have to do it by yourself on CPU side and only send minimum stuff to the drawing. The process is called culling.
To achieve such an optimisation, you can rely on the frustum. Frustum is a truncated pyramid shape which defines what the camera will actually project on screen. Objects partially or entirely contained in the frustum will be potentially visible on screen (potentially because some other objects may occlude other ones).
As you have guessed, the idea is to discard invisible objects from rendering thanks to frustum. Here is some pseudo code to illustrate:
foreach(object in Level List Of Objects)
if(object is overlapping camera Frustum)
render object (or update object,
or make things with the object)
Here is visual example of a scene containing 25 objects, view from the top.
Only 3 of the 25 tested objects will be sent to the rendering in this example.
With this technique, we gain a lot of time by avoiding some useless drawing/updating/whatever-ing… but we still go through ALL objects in the scene to be able to know what to do with them… If there are several objects, it will take a LONG time… and in video game we do not want to loose time doing stuff that can be avoided. The goal is to take something like 16ms to build a full frame in order to have a decent framerate.
To avoid going through all objects, we can make groups of objects spatially placed in the same area.
>> Grid culling
There is a bunch of ways for elaborating structures to organize objects between themselves. We can mention BSP, or Portals, or Quadtree and Octree… but we don’t use any of them. We use a pretty simple and funky 3D grid.
Because Seasons is not a real 3D game, we don’t need such “complicated” systems. We just need to cut off some objects from the previous pseudo code loop, as fast as we can. And maintaining a more complex system than a simple regular 3D grid will unbalance the gain of the objects culled during the process.
We can see that 2 cells are visible from the camera point of view.
Here is the new pseudo code; it’s a little bit more complicated because it’s in 2 parts.
The first part is when we add a new object in the scene:
unregister object from the 3D Grid
(if it was registered)
find grid cells that overlap with the object
(easy thing to do since we got a regular grid)
register the object in those cells
(in the best case there is just one cell found)
And here is the equivalent of the previous pseudo code (i.e the moment where determine what stuff is visible through camera’ eyes).
determine what are the cells around the camera
(easy thing to do since we have a regular grid)
foreach(cell in cells around camera)
foreach(object registered in the cell)
if(object has not been treated yet)
if(object overlaps camera frustum)
(or update object,
or make things with the object)
mark the object as treated
(to be not treated again in case
it was overlapping several cells)
Here is a visual illustration of the grid combined with objects.
2 cells are visible (inside the frustum), and 7 objects are tested against the frustum (instead of 25 previously).
The process needs to be tweaked cause it’s important to take care about the size of the cells composing the grid.
If the cells are too small there will be few objects registered in them and in extreme case there will be only one object per cell, and grid will not help us at all. At the opposite, if cells are too big, a lot of objects will be registered and tested again the camera frustum, that situation is better than the previous case, but we will have to cull object against frustum a lot.
The best way to tweak this kind of things is to build level without thinking about optimisation and find a good compromise when we want to optimise things (or maybe I just search excuses to not optimize from the very beginning).
When culling things (object against frustum or cell grid against frustum), we need to know what is the size of a given item to cull. Always in order to be as fast as possible, we do not use item meshes or the real objects bounding box (OOBB), but a simpler kind of bounding box: the Axis Aligned Bounding Box (AABB).
Here is an example of the evolution of an AABB vs. OOBB with some cat sprite.
Now you see rotating cat without taking drugs (top AABB, bottom OOBB)
Why using AABB and not OOBB? Because it is super easy to know if 2 AABB are overlapping. Here is a scheme of different situations between 2 AABB.
As you can see, there is a lot of configurations and only a few are represented here… but there is an algorithm that takes always the same computation speed to determine if 2 AABB are overlapping. Here are the same configurations shown above but with 2 vectors drawn on each configuration:
The vectors always link the bottom left corner of an AABB with the top right corner of the other AABB.
Now it’s easy to say that 2 AABB are overlapping of the 2 vectors are pointing a North East direction! We can also accept vectors pointing only East (horizontal vectors) and North (vertical vectors).
You can test on a sheet of paper with all the other configurations possible, it will still work!
>> Results in game
Here are some realtime rendering done in the engine on a given level.
You will see red lines which are the objects AABBs, the unique blue AABB is the AABB of the frustum at Z = 0, and the green AABBs are the grid cells AABBs.
From the framerate point of view, here are numbers for this level (little level):
- 48 fps without culling optimization
- 55 fps when only cull objects without use of a grid
- 60 fps when using a grid (the engine limits framerate to 60 fps max)
The game must run on my poor computer (pentium4 C 2.8Ghz and GeForce 7600gt AGP) and maximize the compatibility with various configurations.
>> To conclude
This was the very first pass of optimization for the game to allow it running on modest configurations. During a next optimization article, we will surely talk about batching, use dynamic buffer vs. static ones, and maybe instancing too.
But for now, let's go back to gameplay (and bugfix)!