Late Input Composition

I have been working on my game engine and started researching frame timing strategies. The current big thing here is Nvidia Reflex, and it's a good solution for GPU-bound games. But I am making a 2D game, and 2D games tend to be CPU-bound. I went down the rabbit hole and found what I think is the ideal solution for my game. I call it Late Input Composition.

Once upon a time there was screen tearing

We all know VSync, right? The thing that prevents screen tearing.

Screen tearing occurs when the GPU swaps in a new frame while the monitor is still drawing the previous one, which happens when the GPU's frame rate does not match the monitor's refresh rate. VSync holds each finished frame until the monitor's next refresh (the vertical blank) before swapping it in. This locks the GPU's frame rate to the monitor's refresh rate.

VSync rightfully has a bad reputation because it comes at a cost.

That wait means the frame drawn to the screen was built from old data, including old input. Players perceive this as input lag.
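To put numbers on that cost, here is a back-of-the-envelope model. It is a simplification I am assuming for illustration: a 60 Hz display, input read right after a vblank, the frame's work done, and the frame shown at the next vblank. The function name `vsync_frame` and the 5 ms and 12 ms frame costs are hypothetical.

```python
def vsync_frame(refresh_hz, work_ms):
    """Return (idle_wait_ms, input_age_ms) for one frame under classic VSync.

    Assumed model (not measured): input is read right after a vblank, the
    frame's work takes work_ms, and the frame is shown at the next vblank.
    Scanout time and any extra driver buffering are ignored.
    """
    interval_ms = 1000.0 / refresh_hz
    if work_ms > interval_ms:
        raise ValueError("frame does not fit in one refresh interval")
    idle_wait_ms = interval_ms - work_ms   # time spent blocked waiting for VSync
    input_age_ms = interval_ms             # input is one full interval old when shown
    return idle_wait_ms, input_age_ms

for work in (5.0, 12.0):   # hypothetical frame costs on a 60 Hz display
    wait, age = vsync_frame(60.0, work)
    print(f"work {work:4.1f} ms -> waits {wait:5.2f} ms, input is {age:5.2f} ms old on screen")
```

Under this model the input is always a full refresh interval old when it reaches the screen; a faster frame only turns work time into idle time.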

VRR (variable refresh rate)

You have probably heard of VRR under its brand names: G-Sync, FreeSync, etc.

What it comes down to is that VRR makes the monitor the slave of the GPU. A VRR monitor can refresh at a varying rate within its supported range, so the GPU renders at its own pace and the monitor refreshes whenever a new frame is ready.
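A toy model makes the difference concrete. It assumes frames become ready at given times (in ms), a fixed 60 Hz display whose vblanks fall on multiples of the refresh interval, and a VRR display with an example maximum of 144 Hz that refreshes as soon as a frame is ready. The function names and frame times are hypothetical, and the VRR minimum refresh rate and LFC are not modeled.

```python
import math

def vsync_present_times(ready_ms, refresh_hz):
    """Fixed refresh with VSync: a finished frame waits for the next vblank.

    Assumes vblanks at multiples of the refresh interval and at most one
    new frame per vblank.
    """
    interval = 1000.0 / refresh_hz
    shown, last = [], -math.inf
    for t in ready_ms:
        # Small epsilon so float rounding doesn't skip a vblank hit exactly.
        slot = math.ceil(t / interval - 1e-9) * interval
        if slot <= last:            # that vblank is taken; wait for the next one
            slot = last + interval
        shown.append(slot)
        last = slot
    return shown

def vrr_present_times(ready_ms, max_hz):
    """VRR: the panel refreshes as soon as a frame is ready,
    but never faster than its maximum refresh rate (minimum rate ignored)."""
    min_gap = 1000.0 / max_hz
    shown, last = [], -math.inf
    for t in ready_ms:
        slot = max(t, last + min_gap)
        shown.append(slot)
        last = slot
    return shown

ready = [10.0, 27.0, 41.0]   # hypothetical frame-ready times in ms
for t, v, r in zip(ready, vsync_present_times(ready, 60.0), vrr_present_times(ready, 144.0)):
    print(f"ready at {t:4.1f} ms -> VSync shows at {v:4.1f} ms, VRR shows at {r:4.1f} ms")
```

With VSync every frame sits idle until the next fixed vblank; with VRR the display simply follows the GPU.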

VRR is a good solution, but it is not a silver bullet. It has technical downsides such as brightness and gamma flickering, overdrive artifacts and ghosting, and LFC (Low Framerate Compensation) stutter.

The main reason I am not considering VRR is that not every gamer has a VRR-capable monitor. This is especially true for my target audience: VRR monitors are less common among players of slower-paced games than among competitive players.

Reflex

Reflex is the current industry darling for latency reduction, and for good reason. It works by eliminating the "render queue."

In modern high-fidelity 3D games, the CPU is often much faster than the GPU. The CPU prepares frame instructions and lines them up in a buffer (the queue) for the GPU to process. While this keeps the GPU fed, it creates input lag because your latest inputs are stuck in line behind old frames.

Reflex solves this by synchronizing the CPU to the GPU. It forces the CPU to wait and only sample input right before the GPU is ready to receive commands. It effectively empties the waiting line.

But this relies on one major assumption: that the GPU is the bottleneck.

In my case (and for most 2D games), we are CPU-bound. My GPU is powerful; it renders my 2D sprites almost instantly and then sits idle, waiting for the CPU to finish the game logic for the next frame. There is no render queue building up because the GPU is eating the frames faster than the CPU can cook them.

Since there is no queue to eliminate, enabling Reflex in a CPU-bound scenario gains little to nothing. We need a different strategy.
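A tiny discrete-event model shows why. It is a simplification I made up for illustration, not how drivers or Reflex actually schedule work: each frame costs a fixed `cpu_ms` and then `gpu_ms`, input is sampled when the CPU starts a frame, and "Reflex-like" simply means the CPU delays its start so it submits exactly when the GPU frees up. The function `simulate` and all costs are hypothetical.

```python
def simulate(cpu_ms, gpu_ms, frames_ahead=None, frames=100):
    """Steady-state ms from input sampling to the GPU finishing that frame.

    Toy model (an illustration, not how drivers or Reflex really schedule):
    - every frame costs a fixed cpu_ms on the CPU, then gpu_ms on the GPU;
    - input is sampled when the CPU starts a frame;
    - frames_ahead=N: render queue, the CPU may start frame i as soon as
      the GPU has finished frame i - N;
    - frames_ahead=None: Reflex-like pacing, the CPU delays its start so it
      submits exactly when the GPU becomes free.
    """
    cpu_end = 0.0
    gpu_end = []       # GPU completion time of each frame
    latencies = []
    for i in range(frames):
        start = cpu_end
        if frames_ahead is None:
            if gpu_end:
                start = max(start, gpu_end[-1] - cpu_ms)
        elif i >= frames_ahead:
            start = max(start, gpu_end[i - frames_ahead])
        cpu_end = start + cpu_ms
        gpu_start = max(cpu_end, gpu_end[-1] if gpu_end else 0.0)
        gpu_end.append(gpu_start + gpu_ms)
        latencies.append(gpu_end[-1] - start)
    steady = latencies[frames // 2:]   # skip the warm-up while the queue fills
    return sum(steady) / len(steady)

for label, cpu, gpu in [("GPU-bound", 4.0, 10.0), ("CPU-bound", 10.0, 2.0)]:
    queued = simulate(cpu, gpu, frames_ahead=3)
    paced = simulate(cpu, gpu)
    print(f"{label}: render queue {queued:.1f} ms, Reflex-like {paced:.1f} ms")
```

With these made-up costs, the GPU-bound case drops from 30 ms to 14 ms, while the CPU-bound case stays at 12 ms (CPU plus GPU time) either way: the queue never fills, so there is nothing for Reflex-style pacing to remove.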

Late Input Composition

Late Input Composition shares Reflex's goal of reducing input latency, but it is designed for CPU-bound games.

It targets perceived input latency. It won't compute frames any faster; instead, it reduces input lag by reordering the work within a frame.

Think about this: How do you notice the input lag?

You notice it when you press a button but your player character waits a few milliseconds before it responds.

You don't notice any lag when it comes to the environment or NPCs moving around. You can't notice because you don't know when the AI decided to throw a fireball at you. You only notice when you see the fireball on the screen.

Normally we have two parts:

1. Read inputs and calculate physics, animations, AI, etc.
2. Render the frame.

Why not split this up?

1. Calculate physics, animations, AI, etc., for everything except the player character.
2. Render the frame.
3. Read inputs.
4. Calculate physics, animations, etc., for the player character only.
5. Draw the player character over the rendered frame, or fully redraw the frame.
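The reordering above can be sketched as a list of steps with per-step costs. The step names and millisecond costs are hypothetical; the only point is where reading input lands relative to the end of the frame.

```python
# Hypothetical per-step costs in ms (illustrative numbers, not measurements).
COST_MS = {
    "read_input": 0.1,
    "update_world": 6.0,    # physics, animation, AI for everything except the player
    "update_player": 0.5,   # physics and animation for the player character only
    "render_world": 2.0,
    "draw_player": 0.3,     # draw the player over the already-rendered frame
}

# Usual order: input first, everything else after it.
CLASSIC = ["read_input", "update_world", "update_player", "render_world", "draw_player"]
# Late Input Composition: world first, input-dependent work last.
LATE = ["update_world", "render_world", "read_input", "update_player", "draw_player"]

def input_age_at_frame_end(order):
    """Milliseconds between reading input and finishing the frame."""
    t = 0.0
    read_at = None
    for step in order:
        if step == "read_input":
            read_at = t
        t += COST_MS[step]
    return t - read_at

for name, order in [("classic", CLASSIC), ("late", LATE)]:
    total = sum(COST_MS[s] for s in order)
    print(f"{name}: frame takes {total:.1f} ms, input is {input_age_at_frame_end(order):.1f} ms old")
```

The frame takes just as long either way; only the age of the input changes. One consequence of this ordering, in this sketch, is that the world update only sees the player's state from the previous frame.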

The point is that perceived input lag comes from the gap between the player's input and the player character's reaction. By moving everything that depends on user input to the end of the frame, we shrink that gap.

But this is not the end of the story.

On its own, this barely matters. We have shaved a few milliseconds off the input lag, but we are still stuck with the delay from VSync.

Just-In-Time Frame Pacing

Let's take a look at VSync again. It's actually nearly perfect; we just need to invert it.

VSync is pretty simple. We do all of our calculations and rendering, then we wait for the monitor to be ready.

But what if we wait first, then do all of our calculations? If we time it right, the frame finishes just as the monitor is ready for it. We get perfect synchronization and low input latency, because the input is read as late as possible.
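Here is the idea in numbers, under idealized assumptions: a 60 Hz display with vblanks at multiples of the refresh interval, and a hypothetical frame cost of 5 ms that is known exactly in advance. The helper names are made up for this sketch.

```python
import math

def next_vblank(t_ms, interval_ms):
    """First vblank at or after t_ms, assuming vblanks at multiples of interval_ms."""
    # The small epsilon keeps float rounding from skipping a vblank we hit exactly.
    return math.ceil(t_ms / interval_ms - 1e-9) * interval_ms

def input_age_at_display(start_ms, work_ms, interval_ms):
    """Input is read at start_ms; the frame is shown at the first vblank after the work."""
    return next_vblank(start_ms + work_ms, interval_ms) - start_ms

interval = 1000.0 / 60.0   # 60 Hz display
work = 5.0                 # hypothetical, perfectly predictable frame cost

# Classic VSync: start right after a vblank, finish early, then block.
classic = input_age_at_display(0.0, work, interval)

# Just-in-time: wait first, so the work ends exactly at the next vblank.
jit_start = interval - work
jit = input_age_at_display(jit_start, work, interval)

print(f"classic VSync: {classic:.2f} ms, just-in-time: {jit:.2f} ms")
```

With perfect knowledge of the frame cost, the input is only as old as the work itself (5 ms here) instead of a full refresh interval.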

In theory, this is the best we can do.

But in practice, it's not possible to get the timing this perfect.

The problem is that not every frame takes the same amount of time to be calculated.

And even if every frame took exactly the same time to compute, the OS might schedule other processes in the meantime, making the frame timing fluctuate.

If we wait too long before starting the frame calculations, we miss the monitor refresh and the frame has to wait a whole extra refresh interval.
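A small Monte Carlo sketch shows the trade-off. The model is an assumption for illustration: each frame costs `base_work_ms` plus a uniform random extra of up to `jitter_ms` (standing in for both variable frame cost and OS scheduling), frames are independent, and each frame is started `base_work_ms + margin_ms` before its target vblank. `simulate_jit` and all numbers are hypothetical.

```python
import math
import random

def simulate_jit(interval_ms, base_work_ms, jitter_ms, margin_ms, frames=1000, seed=1):
    """Return (miss_rate, mean_input_age_ms) for just-in-time frame starts.

    Assumed model: each frame's work is base_work_ms plus a uniform random
    extra in [0, jitter_ms]; frames are independent. Each frame starts
    base_work_ms + margin_ms before its target vblank; if the work overruns,
    the frame is shown at a later vblank instead.
    """
    rng = random.Random(seed)   # fixed seed: every call sees the same work times
    budget = base_work_ms + margin_ms
    if budget > interval_ms:
        raise ValueError("budget larger than one refresh interval")
    misses = 0
    total_age = 0.0
    for _ in range(frames):
        work = base_work_ms + rng.uniform(0.0, jitter_ms)
        if work <= budget:
            age = budget                  # input read `budget` ms before the vblank
        else:
            misses += 1
            # Overrun: push to the next vblank(s) after the work finishes.
            age = budget + interval_ms * math.ceil((work - budget) / interval_ms)
        total_age += age
    return misses / frames, total_age / frames

for margin in (0.0, 1.0, 2.0, 4.0):
    miss, age = simulate_jit(1000.0 / 60.0, 5.0, 3.0, margin)
    print(f"margin {margin:.0f} ms: {miss:.1%} missed, mean input age {age:.2f} ms")
```

A margin of zero misses nearly every refresh, because any jitter at all pushes the frame past its vblank. A margin as large as the worst-case jitter never misses but gives back some of the latency we were trying to win, and real frame times rarely have a known upper bound.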

We need an approximation.