Update 69

The level 4 is finished, I am now working on the last level. This level is the ending of the game so it’s very short gameplay wise and they are a lot of dialogs.

Last week I have done an explosion with the particles system, this explosion has 100000 particles. At first, 100000 was too much to handle for the particle system.

After a lot of profiling, the engine I have found some interesting things:

rand() is too slow, so replace it with a xorshift generator, in the game engine I use xorshit128:

static uint32_t xor128x = 123456789;
static uint32_t xor128y = 362436069;
static uint32_t xor128z = 521288629;
static uint32_t xor128w = 88675123;
 
inline uint32_t xor128(void) {
	uint32_t t;
	t = xor128x ^ (xor128x << 11);
	xor128x = xor128y; 
	xor128y = xor128z; xor128z = xor128w;
	return xor128w = xor128w ^ (xor128w >> 19) ^ (t ^ (t >> 8));
}

Do not use std::vector to store the particles, push_back() and erase(iterator) are too slow. If you do you will see a lot CPU spent on memmove. A dynamically allocated array is way faster, only the initialization take time.

Use batching to reduce the number of draw call, with SDL_gpu the best way is to draw a lot of primitives is to use GPU_TriangleBatch. It adds a lot of code but you can’t have something faster than that.

unsigned short indices[3] = {0, 1, 2};  // These are references to the vertex array's vertices, 3 for a triangle
float values[8 * 3];  // Each vertex will need x, y, s, t, r, g, b, a (8 floats per vertex)
values[0] = x1;  // Position
values[1] = y1;
values[2] = s1;  // Texture coordinates (normalized)
values[3] = t1;
values[4] = r1;  // Color (normalized)
values[5] = g1;
values[6] = b1;
values[7] = a1;
values[8] = x2;  // Position
values[9] = y2;
values[10] = s2;  // Texture coordinates (normalized)
values[11] = t2;
values[12] = r2;  // Color (normalized)
values[13] = g2;
values[14] = b2;
values[15] = a2;
values[16] = x3;  // Position
values[17] = y3;
values[18] = s3;  // Texture coordinates (normalized)
values[19] = t3;
values[20] = r3;  // Color (normalized)
values[21] = g3;
values[22] = b3;
values[23] = a3;

GPU_TriangleBatch(image, target, 3, values, num_indices, indices, GPU_BATCH_XY_ST_RGBA);

With SDL you will need to use OpenGL VBO to achieve efficient batching.

Leave a Reply

Your email address will not be published. Required fields are marked *