Level 4 is finished, and I am now working on the last level. This level is the ending of the game, so it is very short gameplay-wise and there is a lot of dialogue.
Last week I made an explosion with the particle system; this explosion has 100,000 particles. At first, 100,000 particles were too many for the particle system to handle.
After a lot of profiling of the engine, I found some interesting things:
rand() is too slow, so replace it with a xorshift generator. In the game engine I use xorshift128:
#include <stdint.h>

static uint32_t xor128x = 123456789;
static uint32_t xor128y = 362436069;
static uint32_t xor128z = 521288629;
static uint32_t xor128w = 88675123;

/* static inline: an inline function that uses static variables must itself be static in C */
static inline uint32_t xor128(void)
{
    uint32_t t = xor128x ^ (xor128x << 11);
    xor128x = xor128y;
    xor128y = xor128z;
    xor128z = xor128w;
    xor128w = xor128w ^ (xor128w >> 19) ^ (t ^ (t >> 8));
    return xor128w;
}
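A particle system usually needs floats (speed, angle, lifetime) rather than raw 32-bit integers. Here is a minimal sketch of one common conversion, assuming 24 bits of randomness per float are enough: keep the top 24 bits of xor128() and multiply by 2^-24, which gives a value in [0, 1). The helpers xor128_float01 and xor128_range are hypothetical names; the generator is repeated so the example compiles on its own.

```c
#include <assert.h>
#include <stdint.h>

/* xorshift128 state and generator, same seeds as above. */
static uint32_t xor128x = 123456789;
static uint32_t xor128y = 362436069;
static uint32_t xor128z = 521288629;
static uint32_t xor128w = 88675123;

static inline uint32_t xor128(void)
{
    uint32_t t = xor128x ^ (xor128x << 11);
    xor128x = xor128y;
    xor128y = xor128z;
    xor128z = xor128w;
    xor128w = xor128w ^ (xor128w >> 19) ^ (t ^ (t >> 8));
    return xor128w;
}

/* Hypothetical helper: uniform float in [0, 1).
   Keeps the top 24 bits, which a float represents exactly,
   then scales by 2^-24 so the result never reaches 1.0f. */
static inline float xor128_float01(void)
{
    return (float)(xor128() >> 8) * (1.0f / 16777216.0f);
}

/* Hypothetical helper: float between min and max,
   e.g. for a particle's initial speed or emission angle. */
static inline float xor128_range(float min, float max)
{
    assert(min <= max);
    return min + (max - min) * xor128_float01();
}
```

For example, xor128_range(0.0f, 6.2831853f) would give a random emission angle in radians. The cost per call is a few shifts and XORs, one integer-to-float conversion and one multiply.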
Do not use std::vector to store the particles: push_back() and erase(iterator) are too slow. If you do, you will see a lot of CPU time spent in memmove, because erase() shifts every element after the erased one. A dynamically allocated array is much faster; only the initialization takes time.
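Here is a minimal sketch of one way to keep particles in a single dynamically allocated array, assuming the order of particles does not matter. The names (Particle, ParticlePool, pool_init, pool_spawn, pool_update, pool_free) are hypothetical, and the "swap and pop" removal is an assumption rather than necessarily what the engine does: a dead particle is overwritten by the last live one, so removal is O(1) and nothing is shifted with memmove.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct {
    float x, y;    /* position */
    float vx, vy;  /* velocity */
    float life;    /* seconds left before the particle dies */
} Particle;

typedef struct {
    Particle *items; /* allocated once, never resized */
    int count;       /* live particles are items[0 .. count-1] */
    int capacity;
} ParticlePool;

/* Allocate the whole array up front: the only expensive step. */
int pool_init(ParticlePool *pool, int capacity)
{
    assert(capacity > 0);
    pool->items = malloc((size_t)capacity * sizeof(Particle));
    pool->count = 0;
    pool->capacity = capacity;
    return pool->items != NULL;
}

void pool_free(ParticlePool *pool)
{
    free(pool->items);
    pool->items = NULL;
    pool->count = pool->capacity = 0;
}

/* O(1) append; returns NULL when full instead of reallocating. */
Particle *pool_spawn(ParticlePool *pool)
{
    if (pool->count == pool->capacity)
        return NULL;
    return &pool->items[pool->count++];
}

/* Age and move every particle. A dead particle is replaced by the
   last live one ("swap and pop"), so no elements are shifted. */
void pool_update(ParticlePool *pool, float dt)
{
    int i = 0;
    while (i < pool->count) {
        Particle *p = &pool->items[i];
        p->life -= dt;
        if (p->life <= 0.0f) {
            *p = pool->items[--pool->count];
            /* do not advance i: the particle moved here is not updated yet */
        } else {
            p->x += p->vx * dt;
            p->y += p->vy * dt;
            i++;
        }
    }
}
```

The trade-off is that particle order is not preserved, which usually does not matter for an explosion unless the blending mode makes draw order visible.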
Use batching to reduce the number of draw calls. With SDL_gpu, the best way to draw a lot of primitives is GPU_TriangleBatch. It adds a lot of code, but you can't get anything faster than that.
unsigned short indices[3] = {0, 1, 2}; // These are references to the vertex array's vertices, 3 for a triangle
float values[8 * 3]; // Each vertex will need x, y, s, t, r, g, b, a (8 floats per vertex)
values[0] = x1; // Position
values[1] = y1;
values[2] = s1; // Texture coordinates (normalized)
values[3] = t1;
values[4] = r1; // Color (normalized)
values[5] = g1;
values[6] = b1;
values[7] = a1;
values[8] = x2; // Position
values[9] = y2;
values[10] = s2; // Texture coordinates (normalized)
values[11] = t2;
values[12] = r2; // Color (normalized)
values[13] = g2;
values[14] = b2;
values[15] = a2;
values[16] = x3; // Position
values[17] = y3;
values[18] = s3; // Texture coordinates (normalized)
values[19] = t3;
values[20] = r3; // Color (normalized)
values[21] = g3;
values[22] = b3;
values[23] = a3;
GPU_TriangleBatch(image, target, 3, values, 3, indices, GPU_BATCH_XY_ST_RGBA); // 3 vertices, 3 indices: one triangle
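The snippet above draws a single triangle; the gain comes from putting many particles into one call. Here is a minimal sketch of filling such a batch, assuming each particle is drawn as a textured quad (two triangles) with the same GPU_BATCH_XY_ST_RGBA layout of 8 floats per vertex. BatchParticle, fill_particle_batch and MAX_BATCH_QUADS are hypothetical names. Because the indices are unsigned short, the sketch caps a batch at 16383 quads (65532 vertices) so that every index and the vertex count fit in 16 bits; 100,000 particles would then take several calls.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-particle data needed for drawing. */
typedef struct {
    float x, y;       /* center, in pixels */
    float size;       /* half-width of the quad */
    float r, g, b, a; /* normalized color */
} BatchParticle;

/* 16383 quads * 4 vertices = 65532, which fits in an unsigned short. */
#define MAX_BATCH_QUADS 16383

/* Writes one quad (4 vertices, 2 triangles) per particle into values
   (x, y, s, t, r, g, b, a per vertex) and indices.
   Returns the vertex count and stores the index count in *num_indices. */
size_t fill_particle_batch(const BatchParticle *particles, size_t count,
                           float *values, unsigned short *indices,
                           unsigned int *num_indices)
{
    static const float corner[4][2] = { {-1, -1}, {1, -1}, {1, 1}, {-1, 1} };
    static const float tex[4][2]    = { {0, 0}, {1, 0}, {1, 1}, {0, 1} };
    assert(count <= MAX_BATCH_QUADS);

    for (size_t i = 0; i < count; i++) {
        const BatchParticle *p = &particles[i];
        float *v = values + i * 4 * 8;
        for (int c = 0; c < 4; c++, v += 8) {
            v[0] = p->x + corner[c][0] * p->size; /* position */
            v[1] = p->y + corner[c][1] * p->size;
            v[2] = tex[c][0];                     /* texture coordinates */
            v[3] = tex[c][1];
            v[4] = p->r; v[5] = p->g; v[6] = p->b; v[7] = p->a; /* color */
        }
        /* Two triangles sharing the quad's diagonal: (0,1,2) and (0,2,3). */
        unsigned short base = (unsigned short)(i * 4);
        unsigned short *idx = indices + i * 6;
        idx[0] = base; idx[1] = base + 1; idx[2] = base + 2;
        idx[3] = base; idx[4] = base + 2; idx[5] = base + 3;
    }
    *num_indices = (unsigned int)(count * 6);
    return count * 4;
}
```

Each filled chunk would then be drawn with a single call in the same form as above, for example GPU_TriangleBatch(image, target, (unsigned short)num_vertices, values, num_indices, indices, GPU_BATCH_XY_ST_RGBA);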
With plain SDL, you will need to use OpenGL VBOs to achieve efficient batching.