Skip to content

Fix acks locking up after packet gets unacknowledged (Chatbug no. 2)

Hanicef requested to merge Hanicef/SRB2Classic:fix-unack-lockup into next
srb2's tech debt is as high as the us's debt I swear
 - AirFox, 2024

So, after months of searching, spending countless hours scourging the code and using multiple tools to dissect and diagnose SRB2's networking behavior, I have finally fixed it.

Another chatbug has been fixed.

I don't know if this is the last one, or how many there are left if it isn't, but I can confirm after testing that this fixes another one of these bugs. The core problem here is how packets are processed by the server, where the fact that all acks are expected to have a response. This is normal, since without it, we can't have reliable packets in netcode which would make the netcode ridiculously fragile, so this is a good thing.

However, this is only under the assumption that the server actually acknowledges them. The problem with Net_UnAcknowledgePacket is that it actually removes the ack from the packet list, causing the server to ignore that ack. The server will still send an ackreturn, but it will only be able to return properly if the next packet actually reaches the client before another reliable packet is sent. This is where PT_TextCmd broke pretty much everything: it used that function to erase the ack from the server, so if the client manages to send another packet before that, it will register a new ack over the old ackreturn. In response, the server will think the old ack was responded to and respond to the new one instead, but the client still hasn't gotten it's previous ack responded to. The result is that it will think that the old ack is not replied to, and thus stall other reliable packets until it gets a response - a response the server will never send since it just erased the old ack.

And looking at the code, it just looks flat out wrong, since no other packet type behaves this way. All other packet handlers either drop the connection with a synth failure or just returns from the function, causing it to be acked as usual. Why Net_UnAcknowledgePacket even exist is beyond me, but as you'd expect, it is causing issues in this case, and it raises a pretty important question: how do we even get into this situation in the first place? For a packet to be unacknowledged, it must be invalid in some way, and that shouldn't be able to happen under normal circumstances, right?

Well, there is one case that can happen under normal circumstances: Textcmd too long. This error is the result if a NetXCmd buffer is exhausted on specifically the server, which can happen with enough players. I reproduced this by intentionally crafting multiple invalid TextCmd packets and sending them in the same tic, but this can happen under normal circumstances as long as you have enough players on a server. It is also possible that there might be another bug somewhere else in the code that causes bad packets to be sent, but either way, the cause and result is the same.

Here's how you can reproduce it yourself locally:

  1. Create a netem network environment on localhost, with packet loss configured (can be done with sudo tc qdisc add dev lo root netem drop 20%).
  2. Run the provided chatbug command; you can increase it's chances by invoking it multiple times on the same line using ;.
  3. Spam any command that uses NetXCmd, preferrably a Lua function of some form.
  4. If you don't get chatbugged, go back to step 2 and repeat the following steps.
  5. Remove the netem environment once you're done with sudo tc qdisc del dev lo root netem so your internet connection goes back to normal.

I recommend only copying Command_Chatbug and related things first so you can reproduce it locally before applying the entire patch to verify it's fixed. I will also keep a server running for a few hours with this patch applied on 2.2.14pre4 for anyone who wants to join in for some field testing - don't hesitate to join if you want to help me test it out, and if you still chatbug, please report back to me!

Merge request reports

Loading