How I cut GTA Online loading times by 70% (nee.lv)
3883 points by kuroguro on Feb 28, 2021 | 699 comments



Holy cow. I'm a very casual gamer; I was excited about the game, but when it came out I decided I didn't want to wait that long and would hold off until they sorted it out. Two years later it still sucked, so I abandoned it. But... this?! This is unbelievable. I'm certain that many people left this game because of the waiting time. That adds up to man-years wasted (in a way different than desired).

Parsing JSON?! I thought it was some network game logic finding session magic. If this is true that's the biggest WTF I saw in the last few years and we've just finished 2020.

Stunning work with just the binary at hand. But how could R* not do this? GTAV is so full of great engineering. But if it was a CPU bottleneck, then who works there that wouldn't just be irked enough to try to nail it? I mean, it seems like a natural thing to try to understand what's going on inside when the time is much higher than expected, even in cases where performance is not crucial. It was crucial here. It almost directly translates to profits. Unbelievable.


I don’t think the lesson here is “be careful when parsing json” so much as it’s “stop writing quadratic code.” The json quadratic algorithm was subtle. I think most people’s mental model of sscanf is that it would be linear in the number of bytes it scans, not that it would be linear in the length of the input. With smaller test data this may have been harder to catch. The linear search was also an example of bad quadratic code that works fine for small inputs.
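The shape of the bug is easy to reproduce. As a hedged sketch (hypothetical helper names, not the game's actual code): a tokenizer that repeatedly calls sscanf on the unconsumed tail goes quadratic on libc implementations whose sscanf does a strlen of its input, while an equivalent strtol loop stays linear because it only touches the characters it consumes.

```cpp
#include <cstdio>
#include <cstdlib>
#include <string>

// Hypothetical tokenizer in the problematic style: call sscanf on the
// unconsumed tail for every token. On libc implementations where sscanf
// does a strlen() of its input, each call rescans the whole remaining
// tail, so total work is O(n^2) in the input length.
int count_ints_sscanf(const std::string& input) {
    const char* p = input.c_str();
    int value = 0, consumed = 0, count = 0;
    while (std::sscanf(p, "%d%n", &value, &consumed) == 1) {
        p += consumed;  // advance past the parsed integer...
        ++count;        // ...but the next call may strlen() the tail again
    }
    return count;
}

// Equivalent loop with strtol, which only looks at the characters it
// actually consumes: O(n) total, matching the intuitive mental model.
int count_ints_strtol(const std::string& input) {
    const char* p = input.c_str();
    int count = 0;
    for (;;) {
        char* end = nullptr;
        (void)std::strtol(p, &end, 10);
        if (end == p) break;  // no digits parsed: done
        p = end;
        ++count;
    }
    return count;
}
```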

Some useful lessons might be:

- try to make test data more like prod.

- actually measure performance and try to improve it

- it’s very easy to write accidentally quadratic code and the canonical example is this sort of triangular computation where you do some linear amount of work processing all the finished or remaining items on each item you process.
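The triangular pattern described in the last bullet can be sketched like this (illustrative C++, not the game's code): a de-duplication pass that scans every already-accepted item costs O(k) for the k-th item, versus O(1) expected with a hash set.

```cpp
#include <string>
#include <unordered_set>
#include <vector>

// The triangular shape: for each new item, scan everything accepted so
// far. The k-th item costs O(k), so n items cost O(n^2) total -- fine
// for a hundred entries, painful for tens of thousands.
std::vector<std::string> dedup_quadratic(const std::vector<std::string>& in) {
    std::vector<std::string> out;
    for (const auto& item : in) {
        bool seen = false;
        for (const auto& prev : out) {        // O(k) scan per item
            if (prev == item) { seen = true; break; }
        }
        if (!seen) out.push_back(item);
    }
    return out;
}

// Same result with a hash-set membership test: O(n) expected total.
std::vector<std::string> dedup_linear(const std::vector<std::string>& in) {
    std::vector<std::string> out;
    std::unordered_set<std::string> seen;
    for (const auto& item : in) {
        if (seen.insert(item).second) {       // .second is true if newly inserted
            out.push_back(item);
        }
    }
    return out;
}
```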

As I read the article, my guess was that it was some terrible synchronisation bug (eg download a bit of data -> hand off to two sub tasks in parallel -> each tries to take out the same lock on something (eg some shared data or worse, a hash bucket but your hash function is really bad so collisions are frequent) -> one process takes a while doing something, the other doesn’t take long but more data can’t be downloaded until it’s done -> the slow process consistently wins the race on some machines -> downloads get blocked and only 1 cpu is being used)


> actually measure performance and try to improve it

This really rings truest to me: I find it hard to believe nobody ever plays their own game but I’d easily believe that the internal culture doesn’t encourage anyone to do something about it. It’s pretty easy to imagine a hostile dev-QA relationship or management keeping everyone busy enough that it’s been in the backlog since it’s not causing crashes. After all, if you cut “overhead” enough you might turn a $1B game into a $1.5B one, right?


Lots of possibilities. Another one I imagined is that "only the senior devs know how to use a profiler, and they're stuck in meetings all the time."


I could easily imagine variations of that where this is in maintenance mode with a couple of junior programmers because the senior ones either burnt out or moved on to another project. I’ve definitely gotten the impression that most games studios have roughly the same attitude towards their employees as a strip-miner has towards an Appalachian hilltop.


If this were anyone else but Rockstar, I'd agree with you.

But Rockstar essentially only have GTA and Red Dead to take care of, it's not like they're making an annual title or something :)


True, but they could still be understaffing and have their senior people working on the next big version rather than maintenance. It’s definitely penny wise, pound foolish no matter the exact details.


- do not implement your own JSON parser (I mean, really?).

- if you do write a parser, do not use scanf (which is complex and subtle) for parsing, write a plain loop that dispatches on characters in a switch. But really, don't.
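For illustration, a character-dispatch integer parser of the kind suggested might look like this (a minimal sketch: non-negative integers only, no overflow or sign handling):

```cpp
#include <cstddef>
#include <string>

// A plain character-dispatch parser for a non-negative integer: no
// format-string machinery, no hidden strlen, each character examined
// exactly once. Sketch only -- no sign, overflow, or error handling.
long parse_uint(const std::string& s, std::size_t& pos) {
    long value = 0;
    while (pos < s.size()) {
        switch (s[pos]) {
            case '0': case '1': case '2': case '3': case '4':
            case '5': case '6': case '7': case '8': case '9':
                value = value * 10 + (s[pos] - '0');
                ++pos;
                break;
            default:
                return value;   // stop at the first non-digit
        }
    }
    return value;
}

// Convenience overload when the caller doesn't need the end position.
long parse_uint(const std::string& s) {
    std::size_t pos = 0;
    return parse_uint(s, pos);
}
```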


I think sscanf is subtle because when you think about what it does (for a given format string), it’s reasonably straightforward. The code in question did sscanf("%d", ...), which you read as “parse the digits at the start of the string into a number,” which is obviously linear. The subtlety is that sscanf doesn’t do what you expect. I think that “don’t use library functions that don’t do what you expect” is impossible advice.

I don’t use my own json parser but I nearly do. If this were some custom format rather than json and the parser still used sscanf, the bug would still happen. So I think json is somewhat orthogonal to the matter.


> The code in question did sscanf("%d", ...), which you read as “parse the digits at the start of the string into a number,” which is obviously linear.

I think part of the problem is that scanf has a very broad API and many features via its format string argument. I assume that's where the slowdown comes from here - scanf needs to implement a ton of features, some of which need the input length, and the implementor expected it to be run on short strings.

> The subtlety is that sscanf doesn’t do what you expect. I think that “don’t use library functions that don’t do what you expect” is impossible advice.

I don't know, at face value it seems reasonable to expect programmers to carefully check whether the library function they use does what they want it to do? How would you otherwise ever be sure what your program does?

There might be an issue that scanf doesn't document its performance well. But using a more appropriate and tighter function (atoi?) would have avoided the issue as well.

Or, you know, don't implement your own parser. JSON is deceptively simple, but there's still enough subtlety to screw things up, QED.


But sscanf does do what they want it to do by parsing numbers. The problem is that it also calls strlen. I’m still not convinced that it’s realistically possible to have people very carefully understand the performance characteristics of every function they use.

Every programmer I know thinks about performance of functions either by thinking about what the function is doing and guessing linear/constant, or by knowing what the data structure is and guessing (eg if you know you’re doing some insert operation on a binary tree, guess that it’s logarithmic), or by knowing that the performance is subtle (eg “you would guess that this is log but it needs to update some data on every node so it’s linear”). When you write your own library you can hopefully avoid having functions with subtle performance and make sure things are documented well (but then you also don’t think they should be writing their own library). When you use the C stdlib you’re a bit stuck. Maybe most of the functions there should just be banned from the codebase, but I would guess that would be hard.


> I assume that's where the slowdown comes from here - scanf needs to implement a ton of features, some of which need the input length, and the implementor expected it to be run on short strings.

I didn't get that impression. It sounded like the slowdown comes from the fact that someone expected sscanf to terminate when all directives were successfully matched, whereas it actually terminates when either (1) the input is exhausted; or (2) a directive fails. There is no expectation that you run sscanf on short strings; it works just as well on long ones. The expectation is that you're intentionally trying to read all of the input you have. (This expectation makes a little more sense for scanf than it does for sscanf.)

The scanf man page isn't very clear, but it looks to me like replacing `sscanf("%d", ...)` with `sscanf("%d\0", ...)` would solve the problem. "%d" will parse an integer and then dutifully read and discard the rest of the input. "%d\0" will parse an integer and immediately fail to match '\0', forcing a termination.

EDIT: on my xubuntu install, scanf("%d") does not clear STDIN when it's called, which conflicts with my interpretation here.


No it would not. Think about what the function would see as its format string in both cases.

The root cause here isn't formatting or scanned items. It is C library implementations that implement the "s" versions of these functions by turning the input string into a nonce FILE object on every call, which requires an initial call to strlen() to set up the end of read buffer point. (C libraries do not have to work this way. Neither P.J. Plauger's Standard C library nor mine implement sscanf() this way. I haven't checked Borland's or Watcom's.)

See https://news.ycombinator.com/item?id=26298300 and indeed Roger Leigh six months ago at https://news.ycombinator.com/item?id=24460852 .
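A hedged sketch of the mechanism described above — purely illustrative, not any real libc's code: building a FILE-like object over a string forces the end of the buffer to be known before scanning starts, which is where the per-call strlen() comes from, whereas a character-at-a-time reader can stop at the first NUL with O(1) setup.

```cpp
#include <cstring>

// Illustrative only -- not any real libc's code. A FILE-style reader
// needs the end of its buffer before scanning begins, so wrapping a
// C string in one costs a strlen() of the whole string on every call.
struct FakeStringFile {
    const char* read_ptr;
    const char* read_end;   // must be known up front
};

FakeStringFile make_string_file(const char* s) {
    // This is the hidden O(n) setup cost paid per sscanf call.
    return FakeStringFile{s, s + std::strlen(s)};
}

// By contrast, a character-at-a-time reader can stop at the first NUL
// it sees, with O(1) setup.
int next_char(const char*& p) {
    return *p ? *p++ : -1;   // -1 plays the role of EOF
}
```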


Yes, it looks that way. On the unix/linux side of things, glibc also implements scanf() by converting to a FILE* object, as does the OpenBSD implementation.

It looks like this approach is taken by the majority of sscanf() implementations!

I honestly would not personally have expected sscanf() to implicitly call strlen() on every call.


> If this were some custom format rather than json and the parser still used sscanf, the bug would still happen. So I think json is somewhat orthogonal to the matter.

What's the point of using standard formats if you're not taking advantage of off-the-shelf software for handling it?


This is probably good advice but not even relevant. It's down one level from the real problem: when your game spends 6 minutes on a loading screen, *profile* the process first. You can't optimize what you haven't measured. Now, if you've identified that JSON parsing is slow, you can start worrying about how to fix that (which, obviously, should be "find and use a performant and well-tested library".)


Is there some reason sscanf can not be implemented without calling strlen?


It could be, and the article acknowledges that possibility. For example, a cursory check of the musl sscanf [0] suggests that it does not (though I may have missed something). However, whichever implementation Rockstar used apparently does.

[0]: https://git.musl-libc.org/cgit/musl/tree/src/stdio/vfscanf.c


A lot of libc implementations seem to implement sscanf() this way: as well as the Windows libc ones mentioned above, I checked the OpenBSD & glibc implementations & they worked the same way.


Part of this is that game companies are notorious for re-implementing standard libraries for "performance". I suspect both shitty implementations of sscanf and the not-a-hashmap stem from this.



> actually measure performance and try to improve it

This reminds me that I used to do that all the time when programming in Matlab. I have stopped investigating performance bottlenecks after switching to Python. It is as if I traded performance profiling for unit testing in my switch from Matlab to Python.

I wonder if there are performance profilers which I could easily plug into PyCharm to do what I used to do with Matlab's default IDE (with a built-in profiler) and catch up with good programming practices. Or maybe PyCharm does that already and I was not curious enough to investigate.


The JSON parsing is forgivable (I actually didn't know that scanf computed the length of the string for every call) but the deduplication code is a lot less so, especially in C++ where maps are available in the STL.

It also comforts me into my decision of never using scanf, instead preferring manual parsing with strtok_r and strtol and friends. It's just not robust and flexible enough.
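That preference might look like the following sketch (a hypothetical function, and note that strtok_r is POSIX rather than standard C): each helper only examines the characters it consumes, and errors are explicit.

```cpp
#include <cstdlib>
#include <cstring>
#include <vector>

// Hypothetical parser in the style described: strtok_r (POSIX, not
// standard C) splits on delimiters, strtol converts, and both only
// touch the characters they consume. strtok_r mutates its input, so
// we work on a bounded local copy.
std::vector<long> parse_csv_longs(const char* line) {
    std::vector<long> values;
    char buf[256];
    std::strncpy(buf, line, sizeof buf - 1);
    buf[sizeof buf - 1] = '\0';

    char* save = nullptr;
    for (char* tok = strtok_r(buf, ",", &save); tok != nullptr;
         tok = strtok_r(nullptr, ",", &save)) {
        char* end = nullptr;
        long v = std::strtol(tok, &end, 10);
        if (end != tok) {        // at least one digit was parsed
            values.push_back(v);
        }
    }
    return values;
}
```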


I thought the lesson is "listen to your customers and fix the issues they complain about".


> Parsing JSON?!

Many developers I have spoken to out there in the wild in my role as a consultant have wildly distorted mental models of performance, often many orders of magnitude incorrect.

They hear somewhere that "JSON is slow", which it is, but you and I know that it's not this slow. But "slow" can encompass something like 10 orders of magnitude, depending on context. Is it slow relative to a non-validating linear binary format? Yes. Is it minutes slow for a trivial amount of data? No. But in their mind... it is, and there's "nothing" that can be done about it.

Speaking of which: An HTTPS REST API call using JSON encoding between two PaaS web servers in Azure is about 3-10ms. A local function call is 3-10ns. In other words, a lightweight REST call is one million times slower than a local function call, yet many people assume that a distributed mesh microservices architecture has only "small overheads"! Nothing could be further from the truth.

Similarly, a disk read on a mechanical drive is hundreds of thousands of times slower than local memory, which is a thousand times slower than L1 cache.

With ratios like that being involved on a regular basis, it's no wonder that programmers make mistakes like this...


The funny thing is, as a long time SDET, I had to give up trying to get people to write or architect in a more "local first" manner.

Everyone thinks the network is free... until it isn't. Every bit moved in a computer has a time cost, and yes, it's small... but when you have processors as fast as those that exist today, it seems a sin that we delegate so much functionality out to some other machine across a network boundary when the same work could be done locally. The reason why, though?

Monetizability and trust. All trivial computation must be done on my services so they can be metered and charged for.

We're hamstringing the programs we run for the sole reason that we don't want to make tools. We want to make invoices.


And like so many things, we're blind to how our economic systems are throwing sand in the gears of our technical ones.

I love your point that shipping a library (of code to locally execute) with a good API would outperform an online HTTPS API for almost all tasks.


> But how could R* not do this? GTAV is so full of great engineering

I assume there were different people working on the core game engine and mechanics vs. loading. It could as well be some modular system, where someone just implemented the task "load items during online mode loading screen".


My twice-a-week gaming group and I enjoyed GTA V but abandoned it years ago simply because of the load times. We have two short slots (90-120 minutes) each week to play and don't want to waste them in loading screens.

We all would have picked this game back up in a second if the load times were reduced. Although I must say even with the same results as this author, 2 minutes is still too long. But I'll bet that, given the source code, there are other opportunities to improve.


I wonder if a paid subscription would have fixed this? If you left a paid MMO, they'd probably ask you to fill out an exit survey, and you could say "I'm canceling because load times are terrible", which would (hopefully) raise the priority of reducing load times. But since GTA online is "free", there's not a single exit point where they can ask "why did you stop playing".


GTA has made billions off of its Shark Card microtransaction system, so the incentives are probably pretty similar for player retention. Granted, the players leaving over load times are probably not the players who are invested enough to spend thousands on microtransactions.


That's my point though, you don't get to survey people not buying the microtransaction because they quit due to terrible load times, whereas you could survey people who cancel a subscription. I guess they could still gather data by reading reviews, looking through forums / Reddit / whatever, and tallying up complaints, though.


It gets worse: their brand new game Red Dead Online does the same thing. As soon as it did it the first time, I logged out and charged back.


This is why I come to HN, I was going to skip this because I thought it was about video games, but really glad to have read it, and loved every line of the article.

So much to get from this.

Even if you don't have the source, you can make a change if you are annoyed enough.

If you don't like something, and the source code is out there, really go contribute.

Performance matters, know how to profile and if using an external dependency, then figure out their implementation details.

Algorithms & Data structures matter, often I see devs talking about how it doesn't matter much but the difference between using a hashmap vs array is evident.

Attentive code reviews matter, chances are they gave this to a junior dev/intern, and it worked with a small dataset and no one noticed.


I think this is a perfect example of “algorithms and data structures emphasis is overblown.” Real world performance problems don’t look like LeetCode Hard, they look like doing obviously stupid, wasteful work in tight loops.


... that's the exact opposite of what I took from this.

The obviously stupid, wasteful work is at heart an algorithmic problem. And it cropped up even in the simplest of data structures. A constant amount of wasteful work often isn't a problem even in tight loops. A linear amount of wasted work, per loop, absolutely is.


It's not something that requires deep algorithms/data structures knowledge, is the point. Knowing how to invert a binary tree won't move the needle on whether you can spot this kind of problem. Knowing how to operate a profiler is a lot more useful.


True that it's rare that you need to pull out obscure algorithms or data structures, but in many projects you'll be _constantly_ constructing, composing, and walking data structures, and it only takes one or two places that are accidentally quadratic to make something that should take milliseconds take minutes.

The mindset of constantly considering the big-O category of the code you're writing and reviewing pays off big. And neglecting it costs big as well.


Except that you need to test your software and if you see performance problems, profile them to identify the cause. It's not like you have one single chance to get everything right.


The later in development a problem is caught, the more expensive it is. The farther it gets along the pipeline of concept -> prototype -> testing -> commit -> production, the longer it's going to take to notice, repro, identify the responsible code, and fix.

It's true that you don't just have one shot to get it right, but you can't afford to be littering the codebase with accidentally quadratic algorithms.

I fairly regularly encounter code that performed all right when it was written, then something went from X0 cases to X000 cases and now this bit of N^2 code is taking minutes when it should take milliseconds.


People complain about Big-O once they reach the end of its usefulness. Your algorithm is O(n) or O(n log n) but it is still too slow.


And trying to optimize them gets you stink eye at code review time. Someone quotes Knuth, they replace your fast 200 lines with slow-as-molasses 10 lines and head to the bar.


Unfortunately this. Or they will say "don't optimize it until it proves to be slow in production" - at which point it is too dangerous to change it.


And here what matters is not your programming skills, it’s your profiling skills. Every dev writes code that’s not the most optimized piece from the start, hell we even say “don’t optimise prematurely”. But good devs know how to profile and flamegraph their app, not leetcode their app.


Actually, "don't optimize prematurely" is poor advice. Just recently I was doing a code review that had the same issue, where they were counting the size of an array in a loop while stuff was being added to the array in that same loop. The obvious solution was to track the length. The original:

   arr = []
   while ...:
      if something:
         arr.append(foo)
      ...
      if count(arr) == x:
        stuff
      ...
changed to

   arr = []
   arr_size = 0
   while ...:
      if something:
         arr.append(foo)
         arr_size += 1
      ...
      if arr_size == x:
        stuff
      ...
This is clearly an optimization, but it's not premature. The original might just pass code review, but when it wreaks havoc, the amount of time it will cost will not be worth it: jira tickets, figuring out why the damn thing is slower, then having to recreate it in dev, fixing it, reopening another pull request, review, deploy, etc. Sometimes "optimizing prematurely" is the right thing to do if it doesn't cost much time or overly complicate the initial solution. Of course, this depends on the language: some languages will track the length of the array, so checking the size is O(1), but not all languages do, so checking the length can be expensive. Knowing the implementation details matters.


I'm not sure I would prefer the second version in a code review. I find the first version is conceptually nicer because it's easy to see that you will always get the correct count. In the second version you have to enforce that invariant yourself and future code changes could break it. If this is premature optimization or not depends on the size of the array, number of loop iterations and how often that procedure is called. If that's an optimization you decide to do, I think it would be nice to extract this into an "ArrayWithLength" data structure that encapsulates the invariant.
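A minimal sketch of that encapsulation idea in C++ (the name ArrayWithLength is the commenter's suggestion, the code is illustrative; std::vector already maintains exactly this invariant internally):

```cpp
#include <cstddef>
#include <vector>

// Sketch of the "ArrayWithLength" idea from the comment (hypothetical
// type). The count is updated in the single mutator, so the O(1)
// size() can never drift out of sync with the contents.
template <typename T>
class ArrayWithLength {
    std::vector<T> items_;
    std::size_t count_ = 0;
public:
    void append(const T& item) {
        items_.push_back(item);
        ++count_;               // invariant maintained in one place
    }
    std::size_t size() const { return count_; }          // O(1)
    const T& operator[](std::size_t i) const { return items_[i]; }
};
```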


> In the second version you have to enforce that invariant yourself and future code changes could break it.

Yes, that's a real issue. But we've been given two options:

- Does the correct thing, and will continue to do the correct thing regardless of future changes to the code. Will break if the use case changes, even if the code never does.

- Does the correct thing, but will probably break if changes are made to the code. Will work on any input.

It actually seems a lot more likely to me that the input given to the code might change than that the code itself might change. (That's particularly the case for the original post, where the code serves to read a configuration file, but it's true in general.)


Yes, I absolutely see the reasoning and I think if one does go the route of encapsulating the more efficient array logic one can have the best of both options.


> if one does go the route of encapsulating the more efficient array logic one can have the best of both options.

Do you see a way to do this that doesn't involve rolling your own array-like or list-like data type and replacing all uses of ordinary types with the new one? (This is actually already the implementation of standard Python types, but if you're encountering the problem, it isn't the implementation of your types.)


I guess it depends on the language and library you are using but I have the feeling that in most cases one would probably need to replace the usage of the old data type with the new one.


With these things, I have always had the hope that an optimizing compiler would catch this. I think it is an allowed optimization if the count function is considered `const` in C or C++ at least.


leetcode style thinking will allow you to spot obviously stupid wasteful work in tight loops.


Exactly - though to add a little nuance to your post, it’s about having a million loops in a 10M line code base and exactly one of them is operating maximally slowly. So preventing the loop from entering the code base is tough - finding it is key.


I always tell a story about an application we ran; it generated its own interface based on whatever was in inventory. Someone did something really stupid and duplicated each inventory item for each main unit we sold... so you had a recursive mess. Assigning 100,000 items when previously it was 100-ish.

Anyway, everyone just rolled their eyes and blamed the fact that the app was written in Java.

It ended up generating an XML file during that minute long startup....so we just saved the file to the network and loaded it on startup. If inventory changed, we’d re-generate the file once and be done with it.


It's a lot easier to blame a language for being slow because it's obvious. Blaming algorithms requires putting in the time to figure things out.


There’s also a confound between a language and its communities. I’ve seen so many cases where a “slow” language like Python or (older) Perl smoked Java or C++, because the latter developers were trying to follow cultural norms which said that Real Developers™ don’t write simple code. They had huge memory churn with dense object hierarchies and indirection, so performance ended up being limited by O(n) XML property lookups for a config setting which nobody had ever changed, whereas the “slow” language developer had just implemented a simple algorithm directly and most of the runtime was in highly-optimized stdlib native code: a fast regex instead of a naive textbook parser which devolved into an object churn benchmark, etc.

Languages like Java get a lot of bad reputation for that because of popularity: not just that many people are hired into broken-by-design environments (or ones where they’re using some framework from a big consultancy or a vendor who makes most of their revenue from consulting services) but also because many people learn the language as their first language and often are deeply influenced by framework code without realizing the difference between widely used long-term reusable code and what most projects actually need and are staffed for. It’s easy to see the style of the Java standard frameworks or one of the major Apache projects and think that everyone is supposed to write code like that, forgetting that they have to support a greater number of far more diverse projects over a longer timeframe than your in-house business app nobody else works on. Broader experience helps moderate this but many places choose poor metrics and neglect career development.


> devolved into an object churn benchmark

I'm stealing this phrase.


Java is a RAM guzzler and is somewhat hampered by its lack of value types. In its class (managed programming languages without value types) it is pretty much as fast as it gets.

The two performance flaws that exist are:

1. Old Java frameworks were not written with performance in mind

2. Your entire app is written in Java so you don't benefit from C++ libraries


> Even if you don't have the source, you can make a change if you are annoyed enough.

Well, until you get flagged by the anti cheat and get your account and motherboard banned...


Imagine getting banned for fixing their insane load times lol


Getting banned for DLL injection seems very likely to me. It certainly is a risk.

Heck, it might be against the EULA, which probably doesn't hold up legally, but is decent grounds for a ban.


Getting banned for modifying the game process seems very commonplace and likely? It'll be a part of any anti-cheat system, it's basically table stakes.


This was probably a compiler bug. I don't think the programmers coding the business logic were using 'strlen' and 'sscanf' directly.


Honestly, while this horrible code is mildly offensive to me, I'm pretty impressed by this person's persistence. It's one thing to identify a bug in a compiled program, but it's another to patch it without fully understanding what's going on. Caching strlen was a particularly clever trick that sidestepped a bunch of more complicated solutions.


I played through GTA V, enjoyed it, and tried out the online mode afterward.

I've logged in exactly twice. Load times like that may be worth it to a hardcore gamer, but I have no patience for it. There's no shortage of entertainment available to someone with a PC, a reasonable internet connection, and a modicum of disposable income. Waste my time and I'll go elsewhere for my entertainment.


Wow, many people argue how optimized GTA was and then this. I wonder how much money they lost because of this. I often stopped playing because it just took too long to load.


GTA, at least the core gameplay and the single player mode, is quite well optimised. The game ran well even on the cheaper side of gaming PC hardware.

This... this is GTA online. It's a cash cow designed to suck cash out of your pocket. Ads for things you can spend your money on are shown while "connecting", so if this delay wasn't introduced intentionally, it sure isn't a high priority fix. The code isn't part of the optimised, streamlined, interactive part of the game, it's part of the menu and loader system.

Most of these online games/services have so-called "whales" that contribute most if not all of the income the platform makes. If these whales are willing to spend the wads of cash they throw at the platform, they won't even care about another five minutes of ads. The amounts of cash some of these people spend is obscene; the millions Take Two profit from GTA every year are generally generated by only a tiny fraction (usually a single-digit percentage) of the total player base.

In the end, I doubt they've lost much money on this. They might've even made some from the extra ads.


> GTA, at least the core gameplay and the single player mode, is quite well optimised. The game ran well even on the cheaper side of gaming PC hardware.

It's easy to forget that GTA5/GTA:O was originally a 360/PS3 game, getting a game of that scope running at all on a system with just 256MB of RAM and VRAM was an incredible achievement.

The A-Team developers who made that happen were probably moved over to Red Dead Redemption 2 though, with GTA5's long tail being handled by the B-Team.


GTA V (the single player game) is quite well optimized and needs a frame rate limiter on most newer systems because it will run at over ~180 fps, at which point the engine starts to barf all over itself.

GTA Online is a huge, enormously buggy and slow mess that will generally struggle to run at 80 fps on a top-of-the-line 2020 system (think 10900K at over 5 GHz with a 3090) and will almost never cross the 120 fps threshold no matter how fast your system is and how low the settings are.


> at which point the engine starts to barf all over itself.

I’m really confused as to why games are determining anything whatsoever based on the refresh rate of the screen.

Skyrim has this same problem and not being able to play over 60fps is the reason I haven’t touched the game in years.


It's a coding strategy intended to optimize around console refresh targets first and then "do whatever" for the PC build.

Generally, a AAA console game will target 30 Hz or 60 Hz. Therefore the timing loop is built to serve updates at a steady pace of 30 or 60, with limiting if it goes faster. Many game engines also interpolate animations separately from the rest of the gameplay, allowing them to float at different refresh rates. Many game engines will further decouple AI routine tick rates from physics to spread out the load. Many now also interleave updates and kick off the next frame before the first is complete, using clever buffering and multithreaded job code. All told, timing in games is one of those hazard zones where the dependencies are both numerous and invisible.

When you bring this piece of intricate engineering over to PC and crank up the numbers, you hit edge cases. Things break. It's usually possible to rework the design to get better framerate independence, but doing so would be invasive - you'd be changing assumptions that are now baked into the content. It isn't just fixing one routine.
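The decoupling described above is often implemented as a fixed-timestep loop with render-side interpolation — a minimal sketch under those assumptions (toy state and physics, invented names):

```cpp
// Toy fixed-timestep loop with render-side interpolation (invented
// state and physics, purely illustrative). The simulation always ticks
// at a fixed 60 Hz; rendering runs at whatever rate the machine allows
// and blends between the last two simulation states, so frame rate
// cannot change gameplay outcomes.
struct GameState { double x = 0.0; };

void simulate(GameState& s, double dt) { s.x += 10.0 * dt; }  // toy physics

GameState interpolate(const GameState& a, const GameState& b, double alpha) {
    return GameState{a.x + (b.x - a.x) * alpha};
}

// One frame of the loop: accumulate elapsed wall time, run as many
// fixed ticks as fit, then render using the leftover fraction as the
// blend factor.
void step(double frame_seconds, double& accumulator,
          GameState& prev, GameState& curr) {
    const double dt = 1.0 / 60.0;      // fixed simulation timestep
    accumulator += frame_seconds;
    while (accumulator >= dt) {
        prev = curr;
        simulate(curr, dt);
        accumulator -= dt;
    }
    double alpha = accumulator / dt;   // position between the two ticks
    GameState rendered = interpolate(prev, curr, alpha);
    (void)rendered;                    // a real loop would draw this
}
```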


Because if you run the simulation at a different frame-rate from the rendering it is a huge amount more work. Suddenly you have to start marshaling data through intermediate data structures and apply interpolation to everything.

If you then try to run the simulation in parallel with the rendering (rather than between some frames) it is even more work, since inter-thread communication is hard.

This stuff might seem easy to very good programmers, but a game studio hires across a wide range of programming skill, and "game-play programmers" tend to be weaker on the pure programming front (their skills lie elsewhere).


>Skyrim has this same problem and not being able to play over 60fps is the reason I haven’t touched the game in years.

https://www.nexusmods.com/skyrimspecialedition/mods/34705

> Refresh rate uncap for exclusive fullscreen mode.


Because unfortunately, in order to draw something on the screen, you need to determine what you draw first. So, before each screen refresh, a thing called "game simulation tick" happens.


Yep, sometimes I feel like having a drive around, and then I remember how long it takes to load and play something else instead. If you end up in a lobby with undesirables and are forced to switch, you've got another long wait.


It always fascinates me how sometimes people defend things just because they're a fan, even if the particular aspect they're defending doesn't make sense!

I've seen this happen with some other games which are not the best optimised for PCs, but the fans will still defend the developers, just because they like the brand


It's part of social validation. You inherently want other people to like the things you like, because it validates your choices. This in turn means you'll defend things you like.

Even "rebels", who supposedly relish having fringe tastes, want other rebels to approve of their tastes.

The more strongly you stake your identity as "fan of X", the more social disapproval of X hurts.


They did some good stuff: http://www.adriancourreges.com/blog/2015/11/02/gta-v-graphic...

And just thinking about the size of the game and popularity, to make it work at all, requires some skill. Which makes the OP even more unbelievable.


This is common with disgraced entertainers.


I wouldn't be surprised if the long wait increases profits -- as you wait, Rockstar shows you ads for on-sale items.


I'm willing to bet the developers who wrote the in game store are not the same developers who optimized the rendering pipeline


I tried GTA Online once and indeed got fed up with the load times.


I disagree. There is no contradiction. JSON can be a different beast for C, C++, Java, backend coders. You can implement complex 3D graphics while struggling with JSON.

For example, my backend Java guys struggled heavily with JSON mappers. It took them forever to apply changes safely. My department consumes a lot of data from backend systems and we have to aggregate and transform them. Unfortunately the consumed structure changes often.

While the JSON mapper in our Java case was exceptionally hard to handle, a simple Node.js layer in JavaScript did the job exceptionally easily and quickly. So we used a small Node.js layer to handle the mapping instead of doing it in Java.

Moral of the story: Sometimes there are better tools outside your view, and this often seems to be the case with JSON. JSON means JavaScript Object Notation, and it is still tough for OO languages to handle.

This is my observation.


What are you talking about? The performance improvement has been demonstrated in the real world, and you're denying it?


I don't know why but denial is one of the most common strategies of our time.


The 3D engine is highly optimised.


They must have hired a Solarwinds intern to write the loading screen.


Also note the way he fixed it: the patched strlen caches only the last call, returning quickly on an immediate second call with the same string.

Another reason why C styled null terminated strings suck. Use a class or structure and store both the string pointer and its length.

I have seen other programs where strlen was gobbling up 95% of execution time.


Not that C strings don't suck, but with Pascal strings we could be discussing in this thread how implicitly copying a slowly shrinking part of a 10 MB string on every tokenizer iteration escaped a developer's attention. It's completely unrelated to C strings and completely related to bytefucking by hand. Once you throw a task like "write a low-level, edge-case-riddled primitive-value shuffling machine" into a typical mid-level corporate production pipeline, you're screwed by default.


I'm with you. I hate null terminated strings. In my first experiences with C, I specifically hated them because of strlen needing to scan them fully. When C++ introduced string_view, my hate grew when I realized that I could have zero-copy slices all the way up until I needed to interface with a C API. At that point you're forced to copy, even if your string_view came from something that was null terminated!


Could this be worked into a compiler/stdlib from the back-end? Could a compiler/stdlib quietly treat all strings as a struct of {length,string} and redefine strlen to just fetch the length field? Perhaps setting a hook to transparently update "length" when "string" is updated is not trivial.

Edit: hah, I'm decades late to the party, here we go:

> Most modern libraries replace C strings with a structure containing a 32-bit or larger length value (far more than were ever considered for length-prefixed strings), and often add another pointer, a reference count, and even a NUL to speed up conversion back to a C string. Memory is far larger now, such that if the addition of 3 (or 16, or more) bytes to each string is a real problem the software will have to be dealing with so many small strings that some other storage method will save even more memory (for instance there may be so many duplicates that a hash table will use less memory). Examples include the C++ Standard Template Library std::string...

https://en.wikipedia.org/wiki/Null-terminated_string


I don't think you could do it transparently, because it's expected to pass the tail of a character array by doing &s[100] or s + 100, etc. I don't think that would be easy to catch all of those and turn them into a string fragment reference.

Coming from a C++ background, std::string was easy enough to use everywhere, with foo.c_str() when you needed to send it to a C library. But that may drag in a lot of assumptions about memory allocation and whatnot. Clearly, we don't want to allocate when taking 6 minutes to parse 10 megs of JSON! :)


If only C had introduced an actual string type...


Maybe even more surprising to me is that sscanf() relies on strlen().

I would have expected libc to take that use case in consideration and use a different algorithm when the string exceeds a certain size. Even if the GTA parser is not optimal, I would blame libc here. The worst part is that some machines may have an optimized libc and others don't, making the problem apparent only in some configuration.

I believe standard libraries should always have a reasonable worst case by default. It doesn't have to be perfectly optimized, but I think it is important to have the best reasonable complexity class, to avoid these kinds of problems. The best implementations usually have several algorithms for different cases. For example, a sort function may do insertion (n^2, good for small n) -> quicksort (avg. nlog(n), worst n^2, good overall) -> heapsort (guaranteed nlog(n), slower than quicksort except in edge cases). This way, you will never hit n^2 but not at the cost of slow algorithm for the most common cases.

The pseudo hash table is all GTA devs fault though.


I don't understand why it would need to use strlen anyway. Why wouldn't it treat the string like a stream when scanf is already coded to operate on an actual stream?


It's an implementation strategy, one of at least two, explained at https://news.ycombinator.com/item?id=26298300 on this very page.


> For example, a sort function may do insertion […]

That’s generally called “adaptive”. A famous example of that is timsort.

Your version has issues though: insertion sort is stable, which can dangerously mislead users.


I gave up on playing GTA:O because everything took so long to load, having never spent a dime. I have to imagine there is so much lost revenue because of this bug; I hate to be harsh but it is truly an embarrassment that someone external to R* debugged and fixed this before they did (given they had 6 years!).


Load times is absolutely the primary reason I quit playing.


Same here, being slow as molasses really spoiled the game for me. I wonder if the same software quality caused other unnecessary in-game loading times. I felt that time spent in GTA:O was about 60% waiting in lobbies and loading screens, and only the remaining 40% actual play time. Same with RDR2, by the way.


These problems come down to which programmers were allowed to remain on the team and how much they were allowed to change. So all similar performance problems will remain.


I can stomach a long load time (but that's not to excuse how godawful this load time is), just as long as it's worth it. I need some kind of resolution/ultimate goal, but in GTA:O there really isn't one; it's just grind-grind-grind so that you can get yet another car, or another house, or another thing that you already have plenty of. Lather, rinse, repeat in perpetuity.

I just felt like every time I logged in I went, "So what's the point here?".


I have a friend who works at Rockstar. I have forwarded the blog post to him.


I don't play GTA Online, but I'll just say thanks for forwarding, since it's a concrete step towards possibly fixing the bugs.


ask him from me what on earth is wrong with R*


Short answer: fixing it doesn't make enough money, so it doesn't matter.


GTA V has made 6 billion in revenue, it's probably the most profitable video game in history by a large margin.


GTA V is profitable, fixing bugs is not.


Fixing bugs that reduce engagement is profitable. I am pretty confident this kind of bug has easily cost Rockstar on the order of 10-100 million dollars _per year_.


You could easily model how increased retention/engagement rates would translate to increased microtransaction revenue from drastically decreased load times. You could even bucket test it if you wanted to directly quantify the results, but I'm sure that the cost payback would be a matter of days.


But why bother when you've got so much money anyway? It's shockingly lazy, but are you as productive as you could be? They're no different.


Management decided it doesn't make enough money.


It would make a great addition to the Accidentally Quadratic blog: https://accidentallyquadratic.tumblr.com/ (haven't been updated in ~2 years, but maybe the author is around).


There's some great information here. Thanks so much for sharing!


GTA:O shows advertisements for in-game purchases on the loading screen. How many advertisements you see is a function of how long the loading screen takes.

Something tells me this "problem" was discovered long ago at Rockstar HQ, and quietly deemed not to be a problem worth solving.


I was going to say surely this has extremely diminishing or even negative return past 30-60 seconds, but then I remembered lots of people are willing to sit through 10 minutes of commercials to watch 20-30 minutes of TV. So I guess for the right type of customer it works?


But on network TV, 8 minutes of commercials are interspersed among 22 minutes of content.

Virtually nobody's willing to sit through 10 minutes of commercials straight.


The GTA thing is even worse than that; television doesn't air commercials before the beginning of the show!


Especially if you have to watch the full ten minutes first.

Then again, that's what they do at movie theaters...


I remember realizing I was 10-20 minutes into an infomercial instead of regular commercials late at night when I was young. I was probably distracted at the time, but man did I feel stupid when I realized it wasn't just a really long P90X commercial.


It depends on the channel, really. I remember once I was sitting at an eatery in the Philippines (basically a cafeteria with immediately-available food) and the TV just played commercials from the time I sat down until I left.


They either:

found that they get more purchases BECAUSE of the long loading time, despite it bouncing other players (the ad theory, with the ad placement slots a happy coincidence of shoddy engineering),

were told bullshit by the engineering team about how impossible the issue is to fix,

or are just making enough money not to care.


Or

the "engagement team" got fooled by longer session times


Hahaha, true. The A/B test showed them what they wanted to see, and they never stopped to ask users if it's what they wanted.

“Roll that out to everyone!”


Could be, but I can imagine people giving up on GTA: Online altogether because it takes too much time to load.


> Could be, but I can imagine people giving up on GTA: Online altogether because it takes too much time to load.

Luckily for them, churn is usually a different problem solved by a different part of the team/org with different priorities and insights.


Yep this was me about 4 years ago. I distinctly remember sitting through the loading screen and being absolutely astonished at how long it took. I never fired it up again because I just couldn’t be bothered to wait that long.


I'm one that had pre-order bonus on GTA Online but gave up in part due to the loading times. If I have one hour to play I don't want to spend 10 minutes on loading screens.

It's funny to think that I would probably pay some amount to have GTA Online without these absurd loading times and without modders/hackers :D


But for the ones that don't, it equals a virtual commute, so it might be a good filter and the audience beyond that watershed is more serious about spending money there. ;)


Incredible work. Hopefully R* acknowledge this and compensate you in some way. I won’t be holding my breath.

Maybe set up a donation page? I’d be more than happy to send some beer money your way for your time spent on this!


Agree, this is up there in the top tier of amazing stories I've read here on HN. I admire T0ST's technical and writing skills; a first-rate combination. Massive kudos; I'd like to shout them a cup of coffee.

(I also really like the design and presentation of the article; I'm running out of superlatives here.)


Fwiw, done - using link at bottom.


The author's reward will probably be a DMCA takedown.


Thanks for the suggestion, probably missed most of the traffic but just added it :)

https://buymeacoffee.com/t0st


Hi,

I loved the article. Are you planning to write any guides/tutorials about reverse engineering games? Seems like you have a lot of practical experience. I (and probably many other people) would be really excited if you started writing about how you do all these in detail with practical examples. I would even be glad to pay for such content.


This is really cool - how did you develop the background knowledge to solve this? I'm trying to learn more about low-level stuff and I would have no idea how to approach solving a problem like this


I'd recommend searching HN for threads about learning reverse engineering. Here are a few that I've found:

Reverse Engineering Course: https://news.ycombinator.com/item?id=22061842

Reverse Engineering For Beginners: https://news.ycombinator.com/item?id=21640669

Introduction to reverse engineering for beginners: https://news.ycombinator.com/item?id=16104958


I love it but I don’t have the focus or patience to ever do anything more than basic analysis.

A lot of the reverse engineers I know seemingly have deep platform knowledge and can do things like cite Win32 docs from memory.


I was a reverse engineer for years and never was able to do anything like quoting docs. I'd be constantly googling or using reference material. The only real attribute I'd suggest is tenacity.


Tenacity is everything.

Both when looking at a particular problem, but also in sticking to RE in general for long enough to pick up the skills and tricks that make you quick. There are countless tricks you pick up that cleave off huge amounts of time that would otherwise be wasted.


The number one way to get the patience to do this is to have impatience for software that is slow.


And with most RE software (especially the inaccessible interactive disassembling tool mentioned) you'll get that by default; waiting for a 70 MB binary to analyze isn't fun :)


Fortunately, software engineers can be bad at making time value judgments ;)


Here's a good book on software reverse engineering (SRE): https://www.amazon.com/Practical-Reverse-Engineering-Reversi...

It won't help you with PowerPC, but the chapter list is: x86 and x64, ARM, The Windows Kernel, Debugging and Automation, and Obfuscation.

The book was written in 2014, so it should be reasonably relevant for modern purposes, and especially so if digging into any software older than 2014.


"They’re parsing JSON. A whopping 10 megabytes worth of JSON with some 63k item entries."

Ahh. Modern software rocks.


Parsing 63k items in a 10 MB JSON string is pretty much a breeze on any modern system, including a Raspberry Pi. I wouldn't even consider JSON an anti-pattern for storing that much data if it's going over the wire (compressed with gzip).

Down a little in the article and you'll see one of the real issues:

> But before it’s stored? It checks the entire array, one by one, comparing the hash of the item to see if it’s in the list or not. With ~63k entries that’s (n^2+n)/2 = (63000^2+63000)/2 = 1984531500 checks if my math is right. Most of them useless.


Check out https://github.com/simdjson/simdjson

More than 3 GB/s are possible. Like you said 10 MB of JSON is a breeze.


The JSON parser patch took out more of the elapsed time. Granted, it was a terrible parser. But I still think JSON is a poor choice here: 63k x X checks for colons, balanced quotes/braces and so on just aren't needed.

  Time with only duplication check patch: 4m 30s
  Time with only JSON parser patch:       2m 50s


> But I still think JSON is a poor choice here.

It’s an irrelevant one. The JSON parser from the Python stdlib parses a 10 MB document patterned after the sample in a few dozen ms, and it’s hardly a fast parser.


At least parse it into SQLite. Once.


They probably add more entries over time (and maybe update/delete old ones), so you’d have to be careful about keeping the local DB in sync.


So just have the client download the entire DB each time. Can’t be that many megabytes.


I did a very very ugly quick hack in python. Took the example JSON, made the one list entry a string (lazy hack), repeated it 56,000 times. That resulted in a JSON doc that weighed in at 10M. My initial guess at 60,000 times was a pure fluke!

Dumped it in to a very simple sqlite db:

    $ du -hs gta.db
    5.2M    gta.db
Even 10 MB is peanuts for most of their target audience. Stick it in an SQLite db sent across the wire and they'd cut out all of the parsing time too.


I think just using a length encoded serialization format would have made this work reasonably fast.


Or just any properly implemented JSON parser. That's a laughable small amount of JSON, which should easily be parsed in milliseconds.


why not embed node.js to do this efficiently :D


Excellent investigation, and an elegant solution.

There's a "but" though: you might end up getting banned from GTA Online altogether if this DLL injection is detected by some anti-cheat routine. The developers should really fix it on their end.


Yeah, there are some disclaimers in the PoC repo. Definitely use at your own risk.


It's highly unlikely you are going to get banned on GTA even with cheats. The anti-cheat is a joke. The game is filled to the brim with cheaters. If me and my friends play, we play with cheats just to protect ourselves from other cheaters.


Game cheat dev here: Just to provide some context, the GTA Online client is woefully horrible at doing client-side validation on the packets it receives from other peers. (there isn't an authoritative server)

This means that anyone in your session can send you a weirdly-formed packet to crash your game. Most cheats have protections against this by just doing Rockstar's job and adding better validation around packet interpretation routines.

Using "cheats just to protect [your]selves" actually makes a lot of sense.


I am absolutely shocked about this finding. The amount of money on microtransactions Rockstar lost because of this single issue must be gigantic. The amount of people that got turned off by the loading times over the years is massive. It's mind boggling.


Well, that's embarrassing. I can't even imagine the level of shame I would feel if I had written the offending code.

But, you know, premature optimization yadda yadda.


Probably this was written under a very strict release deadline and it worked ok at the time (less items for microtransactions). The problem lies with the management that never picked up the issue once it became a big problem. Pretty sure that any developer in R* is capable of optimizing this parser.


It's the kind of thing that's very easy to accidentally write, it's not that shameful. What's shameful is not investigating the load times at all, since the problem is so easy to see when any measurement is done.


If 63k entries with 10MB of actual data takes minutes to process on a current computer I'd consider that shameful.

10MB is less than the cache in modern CPUs. How can this take minutes(!).


It's easy to write because it doesn't run noticeably slowly on smaller data, and it's easy to accidentally introduce quadratic behaviour in some systems. Obviously if someone tested this on 10MB of JSON and saw it took minutes and thought that was reasonable, that's a bit ridiculous, but I doubt at the time the code was written anyone expected it to be fed such a large JSON object.


I imagine there is a senior programmer working for another game company. They are currently kicking themselves about the poorly performing and rushed loading code they wrote while still working at R*. But there is nothing they can do about it now, since they have moved on.


As they say, a lot of classified stuff and closed-source code remains classified and closed not because it contains important secrets, but because those who hold the lock feel too ashamed and embarrassed to show the contents.


It's not a premature optimisation to use a hashset instead of a list though!


The bug is more devious than that. The code looks linear at a glance and the culprit is that sscanf is actually O(N) on the length of the string. How many people would expect that?


Yep, O(n^2) has the problem that no matter how fast you upgrade your hardware, it will still lag.

Another pet peeve of mine: Civ 6's loading time for a saved game is atrocious. I'm sure there's an O(n^2) loop in there somewhere.


My personal pet peeve is Windows Update (and their product installation routines in general). I bet it's n^3 somewhere deep, and they've been carefully curbing that n for decades.


Good call. I'd love to read a post-mortem on why it was even possible for Windows XP's update check to be as slow as it was. I've definitely waited around 2 hours one time just for the check to complete after finishing an installation.


I loved reading this so much. I was thinking that if someone were to write a fictional, Sherlock Holmes-like story where our Sherlock takes some (maybe fictional) widely used software in each episode, investigates it like this, and reveals some (fictional) grand bug at the end, I'd totally read it.

Yeah, I know it sounds stupid, but I suspect the real Sherlock Holmes was inspired by true stories like this one too, and at least some contemporary detectives started to enjoy reading them.


There’s no need for the examples to be fictional, there are more than enough real world cases to share. Sadly, many of my personal ones end in “I filed a bug with an annotated screenshot of decompiled code indicating where they should fix it but nothing happened”.


Reading things like these is bittersweet. One one hand, I am glad to see that the art of figuring out “why is this thing slow” is still alive and well, even in the face of pushback from multiple fronts. On the other hand, it’s clear that the bar is continually rising for people who are looking to do this as a result of that pushback. Software has always had a bottleneck of the portion of the code written by the person with the least knowledge or worst priorities, but the ability to actually work around this as an end user has gotten harder and harder.

The first hurdle is the technical skill required: there has always been closed source software, but these days the software is so much more complex, often even obfuscated, that the level of knowledge necessary to diagnose an issue and fix it has gone up significantly. It used to be that you could hold an entire program in your head and fix it by patching a couple bytes, but these days things have many, many libraries and you may have to do patching at much stranger boundaries (“function level but when the arguments are these values and the call stack is this”). And that’s not to say anything of increasing codesigning/DRM schemes that raise the bar for this kind of thing anyways.

The other thing I’ve been seeing is that the separation between the perceived skills of software authors and software users has increased, which I think has discouraged people from trying to make sense of the systems they use. Software authors are generally large, well funded teams, and together they can make “formidable” programs, which leads many to forget that code is written by individuals who write bugs like individuals. So even when you put in the work to find the bug there will be people who refuse to believe you know what you are doing on account of “how could you possibly know better than $GIANT_CORPORATION”.

If you’re looking for ways to improve this, as a user you should strive to understand how the things you use work and how you might respond to it not working–in my experience this is a perpetually undervalued skill. As a software author you should look to make your software introspectable, be that providing debug symbols or instructions on how users can diagnose issues. And from both sides, more debugging tools and stories like this one :)


Doesn't surprise me at all. It's an O(n^2) algorithm (strlen called in a loop) in a part of the code where N is likely much smaller in the test environment (in-app purchases).

Overwatch is another an incredibly popular game with obvious bugs (the matchmaking time) front and center. And gamers are quick to excuse it as some sort of incredibly sophisticated matchmaking - just like the gamers mentioned in OP.

It's easy to to say it's something about gamers / gaming / fandom - but I have a locked down laptop issued by my bigcorp which is unbelievably slow. I'd bet a dollar there's a bug in the enterprise management software that spins the CPU. A lot of software just sucks and people use it anyway.


I am not sure Overwatch's matchmaking time is a bug per se. The time estimates are bad for sure. But the matchmaker can really only be sure of one state -- if you queue for a match, a match will be made. The rest is predicting who will show up, along with some time-based scale for making a suboptimal match in the interest of time. Players absolutely hate these suboptimal matches, so the time threshold ends up being pretty high. The rest seems to just be luck; will the right combination of 11 other people be in the right place at the right time?

I think it could be improved, but it doesn't strike me as being buggy.

(Overwatch itself, lots of bugs. Tons of bugs. If they have any automated tests for game mechanics I would be pretty surprised.)


No, it doesn't add up. I can see dozens of groups filling up and queueing at my group's level as I wait for matches. Worse, many of my matches just aren't that evenly balanced. Even if you believe the game is dead now, things were just as bad (10+ minute queues for full 6 stacks) at the peak. They don't do tight geographic bindings - I live on the US west coast and regularly get Brits and Aussies in my games.

I guess what they are probably doing is batching groups of games and then matching for the entire batch, to ensure nobody gets a "bad" match. What they've missed is that well - 5% of matches are bad baseline because somebody is being a jerk or quits or has an internet disconnect or smurfs or whatever other reasons. They could have picked an algorithm that gave fast matches 99% of the time at the cost of having bad matches 1% of the time and nobody would have noticed, because their baseline bad match rate is so high. Optimization from too narrow a perspective.

Honestly, the OW matches I get aren't any more balanced than the COD matches I used to get, and I got those in a minute, not 15.


I think that player variance is the hardest thing for the matchmaker to account for, and a factor that makes a big difference in the quality of the match. Bronze duo'd with Top 500 is always a shit show, even if the other team has the same duo. I don't think it can be made to work.

(My experience shows this is the case; quickplay matches, with no grouping restrictions, are always more of a shitshow than ranked, which has some grouping restrictions.)

Similarly, an individual's performance variance makes a big difference that a mere arithmetic average can't account for. A 3900 player probably plays like a 4100 player when fresh and warmed up, but like a 3500 player when drunk, sleepy, and mad. The actual SR/MMR will converge to a time weighted average of those two states, but if you're in a 4100 game with that player when they're in their 3500 state, it's probably unwinnable. Maybe the matchmaker could attempt to predict this, but it would probably make people mad. Personally, I think that's the nature of a 12 person game composed of 12 randos. There is very little an algorithm can do to tune that (short of changing abilities/hitboxes/cooldowns of each player, which would make people mad).

Then there are people that abuse the matchmaker by exploiting uncertainty in their favor (creating a new account). Every time I have played in a 6 stack in ranked, we've always played against 6 brand new accounts that are clearly better than the matchmaker thinks they are, which sucks. (In quickplay, I generally have very good games in a 6 stack though.)

Overall, I hate to say this, but I think they're doing the best they can. I don't think there's enough data to make a good match 100% of the time, which is why people find 11 players and play "scrims" instead of ranked/quickplay. I don't know where you're able to see other groups and decide that they're a suitable match -- 90% of players have private profiles, and those that don't only play like 10 ranked games a season, so you can't really gauge how good or bad they are from public data.

(As for brits and aussies joining your west coast games... people VPN in to do that on purpose. Legend has it that west coast gamers are better than east coast gamers, so people all over the world VPN to the west coast and make it a self-fulfilling prophecy.)


I think you might be missing my point. Maybe an analogy - the classic problem in distributed database is CAP - consistency, availability, partition tolerance - pick two. Historically, a lot of people picked AP to build their highly available databases. In the context of Spanner, Google basically said this was the wrong tradeoff - availability is more highly impacted by external events like client to server networking incidents than by the design architecture - so you should pick CP instead.

I'm making the same point with regard to matchmaking. Overwatch has tried to optimize for good matches, ignoring all the issues you rightly describe above, and thinking about the tradeoff between time and good matches without regard to external events that impact matches like smurfs, uneven play, DCs, etc. They'd have been better off optimizing for fast matchmaking. It's bad engineering in plain sight, and gamers go out of their way to justify it.


That makes sense. The question is would more players quit the game because of long queue times, or because they got stomped when they were new. (That would be the result of just picking the first 12 people in the queue and throwing them in a match.)

I think there is some value in caring about that case. I started playing Overwatch with no FPS background (at 31!) and I never felt like I was in unwinnable games. All the players were as bad as me. (I still remember my first games when a D.va bomb would reliably get 6 kills.)


It's certainly not unique to game consumers. People in general just write off every fault as "physically impossible to solve". It's one big reason why corporations get away with creating ever-worse products.


The surprising part is that sscanf calls strlen behind the scenes.


not surprising - the game industry is absolutely notorious for cutting corners. didn't know they cut corners this much though.

Will R* fix it? Maybe, especially since some person literally did half of the work for them. But given that R* is a large company, this probably won't be fixed for a long time, and GTAO is probably being maintained by the lowest-bid contractor group.

They probably have all of their full time assets working on the next iteration of GTA.


>But given R* is a large company, this probably won't be fixed for a long time, and GTAO is probably being maintained by the lowest bid contractor group.

They've also made just an absolute assload of money from GTA:O in spite of the godawful load times. Why bother spending the money to fix it when people are happy to deal with it and keep giving you their own cash?


> especially since some person literally did half of the work for them

All of the work.


Even after cutting loading by 70%, it still takes a minute? I haven't played any AAA titles for a long time, but even 30s is way too long, especially since I used to play from an HDD, and a modern SSD can be ~30x faster in sequential reads and up to 200x in random reads.

Is 1 min loading time even normal? Why did it take so long? I never played GTA Online so could someone explain?



Could be due to decompression of lots of huge assets.


Thank god. I always suspected that those loading times were caused by some boneheaded implementation detail. GTA V is not so complex that it justifies that kind of loading time, and hardware has scaled massively since launch, yet it barely made a difference.


It so often is. This aspect of modern computing annoys me so much - modern computers, networks and cdns are so fast nowadays that most actions should be instant. My OS should boot basically instantly. Applications should launch instantly. Websites should load almost instantly. I’ll give a pass for 3D modelling, video editing, AAA video games and maybe optimized release builds of code. But everything else should happen faster than I can blink.

But most programs are somehow still really slow! And when you look into why, the reason is always something like this. The code was either written by juniors and never optimized because they don’t know how, or written by mids at the limit of their intelligence. And full of enough complex abstractions that nobody on the team can reason holistically about how the whole program works. Then things get slow at a macro level because fixing it feels hard.

Either way it’s all avoidable. The only thing that makes your old computer feel sluggish for everyday computing is that programmers got faster computers, and then got lazy, and then shipped you crappy software.


The most infuriating response to this is “the programs are doing so much these days!” Well, yes, a chat app might do emoji and stuff now. But it’s certainly not doing 1000x the number of things…


Yes! And a lot of those things are pointless, buggy busywork for the project managers involved. I’d rather well designed, minimal, fast, intuitive software with not many features over something like Xcode - packed full of buggy features that lag or crash the whole ide regularly. Polish is unsexy, but usually way more important than we give it credit for.

As they said at Zynga, move slow and fix your shit.


Well, if the chat app is implemented in Electron it has to load a fully-fledged browser before even starting to load.


Which is clearly not a “feature” I want.


Why not? You can now have all browser vulnerabilities directly in your chat app!


Wow. I always assumed that profiling would be part of the pre-release test processes for AAA games...


When it was released the game didn't have all the microtransactions so it probably took no time at all to process the JSON even with this issue.

Then over time they slowly added data to the JSON, and this O(n^2) behaviour starts to creep up and up. But the farther you get from release, the less likely it is that the kind of engineers who do optimisation are paying any attention.

They are all off working on the next game.


I had heard about this giant JSON from friends in the GTA V modding community. OP's idea of what it is used for is right. My guess is that this JSON was quite smaller when the game released and has been increasing in size as they add more and more items to sell in-game. Additionally, I speculate that most of the people with the knowledge to do this sort of profiling moved on to work on other Rockstar titles, and the "secondary team(s)" maintaining GTA Online throughout most of its lifespan either didn't notice the problem, since it's something that has become worse slowly over the years, or don't have enough bandwidth to focus on it and fix it.

It's also possible they are very aware of it and are saving up this improvement for the next iteration of GTA Online, running on a newer version of their game engine :)


> are aware of it and are saving up

Still much better than e.g. Ubisoft which repainted safety borders from red to blue and removed shadows in TM2 Canyon few years after release, also breaking a lot of user tracks. (If you’re not sure why, it was before a new iteration of the game)


More importantly how do you release a game that takes 6 minutes to load? This is why mobile gaming has the advantage. In those 6 minutes I could have played a round of a game that’s quite satisfying and put it down already. This just seems sloppy.


I've actually opened a game with long loading times and alt tabbed out, because I knew it would take a while. I booted up another game to play a little bit until the first game finishes loading. 3 hours later I was done playing and realized that I was supposed to play game #1.


It’s quite possible that there were way less items in the store on launch, or during testing. Then it could easily be overlooked. Of course no excuse to not fix it later.


OK but how come even without the store it took more than 20-30 seconds? This was a beefy computer 7 years ago.


It was only one minute at first. As the game had content added, it got longer and longer and nobody cared.


Could be that at release the JSON was just 200kb with 1000 entries or something and this quadratic "algorithm" wasn't the slowest part of the process


Could it be that MMO players are just more accustomed to long load times? (Lack of loading patience is one of the reasons I don't play SWTOR.)


When I used to play World of Warcraft, it never took more than 30 seconds or so to load. It got much faster over the years - when I was playing a few years ago it was more like 5 seconds from character selection to being in the world.

Nothing like the 6 minutes people are talking about for GTA. That’s ridiculous.


Same. I wonder if the dev didn’t bother to fix it because they assumed profiling would identify it as a non-issue.


I wonder if the list was much shorter at release and wasn't super slow on development systems.


I played the game on PS3 and PC. The loading time at launch for PS3 (and later for PC, albeit on SSD) wasn't great, but it also wasn't nearly this terrible.

From a game programming perspective, I'm sure at launch there were very few extras to obtain, so this method ran fast and didn't raise any red flags.

But as time has worn on they've added a ton of extra items and it's become a real problem. What it does show is that probably most of their team are working on creating new things they can sell, vs testing and maintaining the codebase for the last N years.


It's never been "good". I've played since 2013, on Xbox 360 and later repurchasing the game for PS4, and the online load times were not just annoying, they outright broke the game for me. To be having fun and then hit a multi-minute delay while pinned to the screen (because after loading from a mission you're never in a safe place).

Looking down through the clouds at the streets of San Andreas with that wispy air sound, waiting for those large thud noises which could come at random, will forever be etched into my memory as something that completely broke the fun I was having, especially when playing with friends and trying to do Heists later in the product's life.

And because of this, getting people to play was really difficult: the combination of huge updates which took hours to download (the PS4 has slow access to its drive even if you upgrade to an SSD) and the insanely long loading times once you have the patch culminated in many hours of lost gameplay.

I remember a quote from Steve Jobs which fits here: "Let's say you can shave 10 seconds off of the boot time. Multiply that by five million users and that's 50 million seconds, every single day. Over a year, that's probably dozens of lifetimes. So if you make it boot ten seconds faster, you've saved a dozen lives. That's really worth it, don't you think?"[0]

[0]: https://www.folklore.org/StoryView.py?story=Saving_Lives.txt


The part that puzzles me the most was this comment about sscanf:

> To be fair I had no idea most sscanf implementations called strlen so I can’t blame the developer who wrote this.

Is this true? Is sscanf really O(N) on the size of the string? Why does it need to call strlen in the first place?


I think that the author hasn't checked them all. Even this isn't checking them all.

The musl C library's sscanf() does not do this, but it does call memchr() on limited substrings of the input string as it refills its input buffer, so it's not entirely free of this behaviour.

* https://git.musl-libc.org/cgit/musl/tree/src/stdio/vsscanf.c

The sscanf() in Microsoft's C library does this because it all passes through a __stdio_common_vsscanf() function which uses length-counted rather than NUL-terminated strings internally.

* https://github.com/tpn/winsdk-10/blob/master/Include/10.0.16...

* https://github.com/huangqinjin/ucrt/blob/master/inc/corecrt_...

The GNU C library does something similar, using a FILE structure alongside a special "operations" table, with a _rawmemchr() in the initialization.

* https://github.com/bminor/glibc/blob/master/libio/strops.c#L...

* https://github.com/bminor/glibc/blob/master/libio/strfile.h#...

The FreeBSD C library does not use a separate "operations" table.

* https://github.com/freebsd/freebsd-src/blob/main/lib/libc/st...

A glib summary is that sscanf() in these implementations has to set up state on every call that fscanf() has the luxury of keeping around over multiple calls in the FILE structure. They're setting up special nonce FILE objects for each sscanf() call, and that involves finding out how long the input string is every time.

It is food for thought. How much could life be improved if these implementations exported the way to set up these nonce FILE structures from a string, and callers used fscanf() instead of sscanf()? How many applications are scanning long strings with lots of calls to sscanf()?


Addendum: There are C library implementations that definitely do not work this way. It is possible to implement a C library sscanf() that doesn't call strlen() first thing every time or memchr() over and over on the same block of memory.

Neither P.J. Plauger's Standard C library nor mine (which I wrote in the 1990s and used for my 32-bit OS/2 programs) works this way. We both use simple callback functions that take "void*"s that are opaque to the common internals of *scanf() but that are cast to "FILE*" or "const char*" in the various callback functions.

OpenWatcom's C library does the same. Things don't get marshalled into nonce FILE objects on every call. Rather, the callback functions simply look at the next character to see whether it is NUL. They aren't even using memchr() calls to find a NUL in the first position of a string. (-:

* http://perforce.openwatcom.org:4000/@md=d&cd=//depot/V2/src/...


Addendum: The C library on Tru64 Unix didn't work that way either, reportedly.

* https://groups.google.com/g/comp.lang.c/c/SPOnRZ3nEHk/m/dAoB...


Wow. Thanks for looking.

> limited substrings of the input string as it refills its input buffer,

As far as I can tell, that copying helper function set to the read member of the FILE* never actually gets called in this path. I see no references to f->read() or anything that would call it. All of the access goes through shgetc and shunget, shlim, and shcnt, which directly reference the buf, with no copying. The called functions __intscan() and __floatscan() do the same. __toread() is called but just ensures it is readable, and possibly resets some pointers.

Even if it did, that pretty much does make it entirely free of this behavior, though not of added overhead. That operations structure stuffed into the file buffer doesn't scan the entire string, only copying an at most fixed amount more than asked for (stopping if the string terminates earlier than that). That leaves it linear, just with some unfortunate overhead.

I do find the exceedingly common choice of funneling all the scanf variants through fscanf to be weird. But I guess if they already have one structure for indirecting input, it's easy to overload that. (And somehow _not_ have a general "string as a FILE" facility, and building on top of that. (Posix 2008 does have fmemopen(), but it's unsuitable, as it is buffer with specified size (which would need to be calculated, as in the MS case), rather than not worried about until a NUL byte is reached.))


You've missed what happens in __uflow() when __toread() does not return EOF. (And yes, that does mean occasional memchr() of single characters and repeated memchr()s of the same memory block.)


Ah, I did indeed. Wacky.


> Posix 2008 does have fmemopen(), but it's unsuitable, as it is buffer with specified size (which would need to be calculated, as in the MS case), rather than not worried about until a NUL byte is reached.

With fmemopen(), you only need to calculate the length once at the start, right? And then you can use the stream instead.


Yes, you can do that. But libc can't use that as an implementation strategy without also having this linear-turned-quadratic behavior.


Oh dear... that's one of the biggest footguns I've ever seen in all my years of working with C.


It is! It's not mentioned anywhere in the manpages either, and there's no a priori reason for sscanf() to need to call strlen() on the input string, so most programmers would never expect it to.

Pretty sure I would have made this error in the same situation, no question.


> How much could life be improved if these implementations exported the way to set up these nonce FILE structures from a string

That's fmemopen. Not widespread, but at least part of POSIX these days.


OpenBSD is also doing the same thing. It seems almost universal, unless the libc author has specifically gone out of their way to do something different!


fmemopen is standard these days.

Just wrap the string into a FILE, explicitly setting the buffer size to strlen(s), use fscanf in the loop, and fasten your seatbelts...

https://pubs.opengroup.org/onlinepubs/9699919799/functions/f...


This is why people should use commonly-available packages instead of rolling their own version of whatever dumb algorithm they think they can write. This happens all the time. Bugs have been fixed by others, but everyone is too smart to use someone else’s code.


Sometimes those commonly used packages end up being whatever dumb algorithm the author came up with, and nobody spends the time to verify if the package is worth its popularity.


Doubt that. Popularity comes with heaps of PRs, some useful some less.

If anyone used a JSON parser that took 4 minutes to parse a file you bet the author would know by the time the 100th user comes around.

I had a tiny barely-used detection library that didn’t detect it correctly in a new Edge browser update. Someone complained about it within the first month. The library has 20 stars and Edge has 500 users.

Edit: Correction, it was 9 days after the browser release.


Seems more like a bad copy-and-paste situation to me. It's sort of what you get when you take the lowest bid from contractors.


This is some first rate debugging, and great writing to boot. I hope Rockstar sees this, fixes the bug and then pays this fella something. Great post, thanks for sharing!


How many bets they'll leave it like this on the current version and fix it for the PS5 version, to show how dramatic of a change it is between consoles?


Red Dead Redemption 2 introduced a bug where the exe reduces the volume to 19% on start.

It's now at least 4 months old.

So what's the problem? When you alt+tab out of fullscreen to change your sound volume every time you start the game, you have to redo the graphics configuration as well.

I worked around it with some .NET code I found on GitHub which runs for 5 minutes and puts the volume back up as soon as it finds the RDR2 process...


Spicy hot take: the root cause here is the awful c++ library ecosystem.


Yeah no. While the C++ library ecosystem is painful to use, it still doesn't justify hand-rolling a JSON parser and there are certainly high quality hash-based container implementations available too, but even the standard one should beat the one used here.


But there is no "standard one"... reaffirming the point you disagreed with. The C++ standard library is blatantly missing key pieces given how long the language has been around.


Are you arguing that there are no standard hash-based containers in the standard library of C++? Because there definitely are, even though nowadays there often are better alternatives outside the standard library.


Parsing text is always expensive on the CPU; that's why it's often better to prefer binary formats when possible.

It's also the reason the DOM is so slow, in my view.

I remember spotting a lot of fast JSON parsers around, but again, there doesn't seem to be any popular, open, flexible, binary file format out there.

Meanwhile, games are always larger and larger, machine learning requires more and more data.

There is ALWAYS money, battery and silicon to be saved by improving performance.


It is absolutely unbelievable (and unforgivable) that a cash cow such as GTA V has a problem like this present for over 6 years and it turns out to be something so absolutely simple.

I do not agree with the sibling comment saying that this problem only looks simple and that we are missing context.

This online game mode made $1 billion in 2017 alone.

Tweaking two functions to go from a load time of 6 minutes to less than two minutes is something any developer worth their salt should be able to do in a codebase like this equipped with a good profiler.

Instead, someone with no source code managed to do this to an obfuscated executable loaded with anti-cheat measures.

The fact that this problem is caused by Rockstar's excessive microtransaction policy (the 10MB of JSON causing this bottleneck are all available microtransaction items) is the cherry on top.

(And yes, I might also still be salty because their parent company unjustly DMCA'd re3 (https://github.com/GTAmodding/re3), the reverse engineered version of GTA III and Vice City. A twenty-year-old game. Which wasn't even playable without purchasing the original game.)


> The fact that this problem is caused by Rockstar's excessive microtransaction policy (the 10MB of JSON causing this bottleneck are all available microtransaction items) is the cherry on top.

For what it's worth, 10MB of JSON is not much. Duplicating the example entry from the article 63000 times (replacing `key` by a uuid4 for unicity) yields 11.5MB JSON.

Deserialising that JSON then inserting each entry in a dict (indexed by key) takes 450ms in Python.

But as Bruce Dawson oft notes, quadratic behaviour is the sweet spot because it's "fast enough to go into production, and slow enough to fall over once it gets there". Here odds are there were only dozens or hundreds of items during dev so nobody noticed it would become slow as balls beyond a few thousand items.

Plus load times are usually the one thing you start ignoring early on, just start the session, go take a coffee or a piss, and by the time you're back it's loaded. Especially after QA has notified of slow load times half a dozen times, the devs (with fast machines and possibly smaller development dataset) go "works fine", and QA just gives up.


> Plus load times are usually the one thing you start ignoring early on, just start the session, go take a coffee or a piss, and by the time you're back it's loaded.

In GTA V, when I tried to enjoy multiplayer with my friends the abysmal load times were what killed it for me.

You actually have to load into the game world - which takes forever - before having a friend invite you to their multiplayer world - which takes forever, again.

So both a coffee, and a piss. Maybe they fixed that now?


Then when you want to actually do an activity like a deathmatch you have to wait for matchmaking and then the loading - takes forever. Once you are finally in a match it's okay but as soon as the match ends you have to wait for the world to load again and then queue again which takes bloody forever. Spend 2hrs playing the game and have only a few matches, more time spent looking at loading screens than actually playing anything.


> more time spent looking at loading screens than actually playing anything.

This could easily compete for the most expensive bug in history, up there with the Pentium Bug. It might have halved the revenue potential of a billion dollar franchise.


Judging from your word choice "deathmatch" and your experience with long loading/matchmaking times I guess you might be a fellow Quake Champions player. Even if you are not, I agree that long loading times are a mood killer when you just want to play a couple of quick matches after work in your limited free time. It is even worse when you know the game's development is abandoned and it will never get fixed, even though you enjoy the game itself.


GTA V has a deathmatch mode and the parent comment sounds like it's talking about that. Especially the "once it's over, you need to load into the primary session, then wait for matchmaking, then wait for loading again" sounds exactly like GTA V.


Ah I see, thanks for the clarification.


I agree. I played GTA online for a bit and quite enjoyed it but I haven't touched it in a while and the insane loading times are a big reason why.

It kind of baffles me that they haven't bothered to fix this trivial issue when the result is to cut 4 entire minutes of loading time.


Back in dialup/DSL days I discovered a texture compression issue in America's Army (the free US Army game) that doubled its download/install size. Typical download times were about a day and the resume mechanism was poor, so this had the potential to save a lot of grief, not to mention hosting costs. I emailed them, they banned me for hacking, and the next version still had the same issue. Shrug.


That's hilarious. Out of interest who in the company did you email?


I was only able to find contact information for one person who I knew was probably technical (from their posts), so I sent it to them.

I never learned the "other side" of this story, but a few years later the same dev team tried to recruit me at a CS contest, to which I politely declined.

More details: I was young, without credit card, and gaming on a mac. AA was free and mac compatible. For a while -- apparently mac ports of unreal engine games were approximately all done by a single very productive contractor and from what I understand the US Army, uhh, stopped paying him at some point. So he stopped releasing the mac ports. From my point of view, this meant that I could only play with other mac users and couldn't use any of the fancy new maps.

Logs indicated that the compatibility problems with the new maps were not particularly deep, so I got to parsing the unreal map files and was able to restore compatibility by deleting the offending objects. I implemented texture decoding/encoding mostly for curiosity and because textures were well documented in the reverse engineering "literature." I imagined a workflow where someone would bulk export and re-import textures and aside from the texture names the one piece of metadata I needed was the format: RGBA (uncompressed) or DXT (compressed)? I realized that I could easily identify DXT compression from the image histogram, so I didn't need to store separate metadata. Nifty! But it didn't work.

Lots of textures stored in uncompressed RGBA8888 "erroneously" round-tripped to DXT. After poring over my own code, I eventually realized that this was because on many of the textures someone had enabled DXT compression and then disabled it, dropping the texture quality to that of DXT while bloating the texture size to that of RGBA8888 (other textures were still stored as DXT, so compression itself was still working). I wrote a quick tool to add up the wasted space from storing DXT compressed textures in uncompressed RGB format and it came out to about half the total disk space, both before and after the top level installer's lossless compression.
They could have re-enabled compression on most of the textures where they had disabled it without loss in quality, and if they had wanted a list of such textures I would have been able to provide it, but it didn't go down that way. When I shared what happened with my father, who had served, his reaction was "Now that's the Army I know!"


huh, I wonder if the technical person send it to management for a decision?


Oh man, this brings back memories. That game was a great tactical shooter back in the day. Sadly my PC was unable to keep up with the higher requirements of its updates.


> So both a coffee, and a piss.

Reminds me of loading "G.I. Joe" (from Epyx) on the C64 with a 1541 floppy drive. However, the long loads came every time you died, and meant you also had to swap 4 disks.


I remember as a kid I went to someone's birthday party in the 80s and we wanted to play a karate themed game on something that used a cassette tape. It took so long to load we went and played outside!

To be fair to GTA V, I don't think my installation was on a SSD because it was 50GB or something at the time (now it's 95GB?), but that said when it released SSDs were not as cheap or widespread as they are now so that's their problem. The linked article shows the initial load to be much shorter which did not match my experience.


Not to one up but we didn’t have any storage for our c64 for the first year or so. We would team up to enter a game from Byte or whatever (one reader, one typer, one proofreader) and then protect that thing with our lives all weekend to keep people from unplugging it. The machine code games were the easiest to type but if they didn’t work you were kind of hosed lol.


Oh, wow! This was some real hardcore sh*t! :-)


C64 tape with turbo was actually faster (~500 bytes/s) than the non-turbo floppy (~400 bytes/s).

Many 8bit Atari owners will have horror memories of aborted Tape loads. After over 20 years someone finally discovered a bug in original ATARI Tape loading ROM routine resulting in randomly corrupted loads, no amount of sitting motionless while the game loads would help in that case :)


Oh gods, now I'm reminded of a cassette based game where there was one enemy where if he got a solid hit on you, you got bumped back to the beginning of the level.

Which meant the game displayed "rewind to mark 500 and then hit play" and you had to do that to restart lolsob.


Maybe this is what they really meant by "GOTO considered harmful".


And yet we can say a 50-gig install bundle with a straight face? Not long ago, games that were 8 gigs caused mayhem.

I suppose devs don't care about that as long as their QA allows it.


I read that after patches GTA V is now around 95GB.

Call of Duty: Black Ops Cold War is around 200GB fully installed with all features (HD textures etc).

Some people have insinuated this is intentional to crowd out other games on console hard disks and make moving away from CoD have an opportunity cost. It's probably just laziness.

I haven't looked into it recently, but some prior offenders wasted a lot of space on uncompressed multi-lingual audio. Instead of letting us choose a language, the game installs them all so you can switch in game, and leaves them uncompressed to save CPU for game logic. For CoD the optional HD texture pack is 38GB, so that's still a lot unaccounted for.


Titanfall did this. If I recall correctly, the full install size was about 48GB, 35GB of which was just uncompressed audio. And that was back in the days when 120GB (or less) SSDs were common. A total self-own and never fixed or understood.

It's not like decoding audio takes enough time on any modern multi-core processor to disrupt the game loop. It's not even on the radar.


I mean, Jesus Christ. Devs used to be proud they could optimize their programs to run on slow machines. What happened? That's like making a game for casual players that only < 5% of the global population can even afford to play. I'd understand if it were a game breaking new ground with some hectic VR or something like that, but it isn't.


They didn't fix it. I tried a few days ago, because it's a really fun game... except for these seemingly easy to fix issues that are huge barriers.


> You actually have to load into the game world - which takes forever - before having a friend invite you to their multiplayer world - which takes forever, again.

Is that... the same problem? Is microtransaction data different in your friend's multiplayer world than it is in the normal online world?


The article mentions story mode loading as well as online loading, but as I mentioned in another comment the story time shown there is much lower than what I experienced, probably because SSDs are now standard and were rarer in 2013 (I could not justify 50GB+ on this one game at the time). So it may be a mixture of factors.


I was the new guy at a startup and soon noticed that Chuck Norris was in our compiled JavaScript. Turned out someone had included the entire test suite in the production deploy.

It had been like that for nearly a year. A few minutes of work brought our client JS file from 12MB down to less than 1MB.


Ha, this is one of the reasons why I also include outlandish and wrong-looking stuff in unit tests. If we see where it doesn't belong, then we know for sure that we are doing something wrong.

Most often I use unicode strings in unexpected alphabets (i.e. from languages that are not supported by our application and that are not used by the mother tongue of any developer from our team). This includes Chinese, Malayalam, Arabic and a few more. There was a time when I wanted to test the "wrong data" cases for some deserialising function, and I was part annoyed and part amused to discover that doing Integer.parseInt("٤٣٠٤٦٧٢١") in Java does parse the Arabic digits correctly even without specifying any locale.


> Soon noticed that chuck Norris was in our compiled JavaScript

Is that a library? Or the string "Chuck Norris"?


The string. Used as factory test data for populating objects in tests.

It certainly caught my attention.


I was assuming that someone used the string as a name in the test?


Yes I think they meant they saw a weird name that stood out (I've seen Donald Duck at work) and they investigated more, finding it was test data.


I've seen "Carmen Electra" myself :-) And other funny (more gross) names in the dark corners of huge databases...

I also had a customer discover a Playboy centerfold at the end of a test we sent to them. One of the devs thought it'd be a nice reward. Things went very bad for him right after the phone call...


Always assume others have the sense of humor of a 99-year-old stick of TNT.

Had someone at work mention they needed to drop off a dog at grooming on the way to pick up Chinese.

I had to walk away for a bit, as I couldn’t hold it in, but didn’t know if that sort of humor would fly with that crowd.


Or the actor/martial artist?


bravo! i'll take the downvotes that this content-free comment will garner, but you just made my morning.



You mention quadratic behaviours and there's probably some truth to that, but it seems to me that it's partly a C++ problem. In any other language nobody would even consider hacking up JSON parsing using a string function. They'd use the stdlib functionality if available, or import a library, and this problem wouldn't exist.


A lot of other languages make use of the c standard library functions to parse floats (and to do various trigonometric functions), so they may be more similar than you imagine.


Not super relevant, though. The average standard library from another language is a lot more hardened than the function written by Joe Sixpack in C last night.


But C++ has had a hash_set/hash_map since forever (or just set/map, which are still better than this)

I'm sure there are libraries to parse json in C++, or at least they should have built something internally if it's critical. Instead they had someone less experienced build it and didn't stress test it?
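For reference, here's the difference a hash container makes for the second bug in the article (the item deduplication loop). This is a generic sketch, not the actual R* code:

```python
# Deduplicating N items by scanning a list is O(N^2):
# every insertion re-checks all previously seen items.
def dedup_list(items):
    seen = []
    for item in items:
        if item not in seen:   # linear scan on every item
            seen.append(item)
    return seen

# A hash set makes each membership check O(1) on average,
# so the whole pass is O(N):
def dedup_set(items):
    seen = set()
    out = []
    for item in items:
        if item not in seen:
            seen.add(item)
            out.append(item)
    return out
```

With ~63k entries as in the article, the set version does ~63k constant-time lookups, while the list version does on the order of 2 billion comparisons.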


>I'm sure there are libraries to parse json in C++

There certainly are, but adding a library is much more difficult in C++ than pretty much any other language which seems to tempt people into hacky self-built parsing when they really ought to know better.


The first C++ JSON library that appears when you google it is a single header file.

https://github.com/nlohmann/json

I know because I used it years ago in school.

One of the fastest libraries (https://github.com/miloyip/nativejson-benchmark#parsing-time) is also header-only and compares its speed to strlen.

https://github.com/Tencent/rapidjson


Rapidjson


That QA bit is too true, "it works for me!" shrug.


> Here odds are there were only dozens or hundreds of items during dev so nobody noticed it would become slow as balls beyond a few thousand items.

Might be, but this particular issue has been raised by thousands of players and ignored for *years*.


Yea, given how easy it was for the author of the post to find it, I would guess that literally nobody in the last decade bothered to run a profiler to see where they were spending time.

The only possible explanation is that management never made it a priority.

I could see this happening. A project/product manager thinking "We could spend $unknown hours looking for potential speedups, or we could spend $known hours implementing new features directly tied to revenue"

Which is kind of ironic since this fix would keep players playing for more time, increasing the chances that they spend more money.


> management never made it a priority

I think we need a new vocabulary to cover situations like this one. It's not just that other issues took priority here, it's that this wasn't even entered into the list of things to give a crap about. It's something like digital squalor.


For the online games I worked on (a few of the recent NFS games) the items database was similar to the final set quite early in production and we kept an ongoing discussion about load times.

I really liked this article, but I am a bit surprised that this made it into production. I have seen a few instances of this type of slowdowns live for very long, but they tend to be in compile times or development workflow, not in the product itself.


And because a merchandising push in many games may be another 10-50 items, the first couple times the % increase is high but the magnitude is low (.5s to 1s) and by the time you're up to 1000, the % increase is too small to notice. Oh it took 30 seconds last week and now it's 33.

Boiling the frog, as it were. This class of problems is why I want way more charts on the projects I work on, especially after we hit production. I may not notice an extra 500ms a week, but I'm for damn sure going to notice the slope of a line on a 6 month chart.


I heard that the Chrome team had this KPI from very early on: how much time it takes for Chrome to load, and it has stayed the same to date. i.e. they can't make any changes that will increase this parameter. Very clever if you ask me


Google lately "optimized" Chrome "time for the first page to load" by no longer waiting for extensions to initialize properly. First website you load bypasses all privacy/ad blocking extensions.


Yeah, I think that's the kind of odd behaviour that those KPIs end up causing; they 'cheat' the benchmark by avoiding certain behaviour, like loading extensions later.

I mean I can understand it, a lot of extensions don't need to be on the critical path.

But at the same time, I feel like Chrome could do things a lot better with extensions, such as better review policy and compiling them to wasm from the extensions store.


Thank you for confirming this, I thought I was going crazy seeing it happen a bunch recently. I assumed my system was just on the fritz.


Wow, had no idea about this! Can you link me to a writeup or something?



I hope the Edge team never merges this in.


It's been in Chrome since 81. I'd wager a guess it's in Edge and nobody noticed.


It would be interesting to see what JSON library they used that uses scanf for parsing numbers. Nothing like a painter's algorithm type scenario to really slow things down, but also JSON numbers are super simple and don't need all that work. That is hundreds of MBs of unneeded searching for terminating 0s


Unlikely to be a library, either it's libc or it's homegrown.

The only thing most game companies do when it comes to external libraries is to copy the source code of it into their repo and never update it, ever.

OpenSSL is this way; it's a required installation for PlayStation, but debugging it is seriously hard, and Perforce (the game industry's version control of choice) can't handle external dependencies. Not to mention Visual Studio (the game industry's IDE of choice..) can't handle debugging external libraries well either.

So, most game studios throw up the hands, say "fuck it" and practice a heavy amount of NIH.


Visual Studio keeps toying with the idea of a "NuGet for C++" and it is amazing that it still hasn't happened yet. It may seem to indicate that it isn't necessarily the IDE that can fix it, but the user's attitude. How much of the NIH and "just copy that dependency into the tree" is still encouraged for "security" [0] and "control"/"proprietary source"/"management" reasons?

[0] It's an obvious anti-pattern: you aren't going to update dependencies that require copy/paste and manual merge reviews, so security problems should be more rampant than in systems where updating a dependency to its latest security patch is a single install command (or an update button in a GUI). Yet there still seem to be so many C++ devs who chime in on every HN thread about a package manager vulnerability to say they don't have those vulnerabilities. They don't "have" dependencies to manage, no matter how many stale 0-days they copied and pasted from outside projects; those don't count as "dependencies" because they are hidden who knows where in the source tree.


I suspect vcpkg is the choice they made; it will/does have support for private and binary repos too


That certainly is the most recent attempt. They've had projects of one sort or another going back at least as far as 2013 from mentions in public blog posts but so far none of them seem to have got much traction with the community. Here's hoping it works this time?


In another decade, there's going to be a story here about somebody getting their hands on the original source for this game, and the JSON parser will be a 10-line function with "//TODO: optimize later" at the top.


hmm.. the entire pricing table for Google Cloud (nearly 100k skus and piles of weirdness) was only ~2mb... seems pretty big.


But is quadratic the real issue? Isn't that a developer answer?

The best algorithms for small, medium or large sizes are not the same, and each generally behaves poorly in the other cases. And what is small? Medium? Large?

The truth is that there is no one size fits all and assumptions need to be reviewed periodically and adapted accordingly. And they never are... Ask a DBA.


> But is quadratic the real issue ?

Yes. That is literally the entirety of the issue: online loading takes 5 minutes because there are two accidentally quadratic loops spinning their wheels.

> The best algorithm for small, medium or a large size are not the same and generally behave poorly in the other cases.

“Behaves poorly” tends to have very different consequences: an algorithm for large sizes tends to have significant setup and thus constant overhead at small sizes. This is easy to notice and remediate.

A naive quadratic algorithm will blow up your production unless you dev with production data, and possibly even then (if production data keeps growing long after the initial development).


quadratic is a fancy way of saying "this code is super fast with no data, super slow once you have a decent amount"

The problem is that when you double the amount of stuff in the JSON document, you quadruple (or more) the scanning penalty in both the string and the list.

Why quadruple? Because you end up scanning a list which is twice as long. You have to scan that list twice as many times. 2x2 = 4. The larger list no longer fits in the fast (cache) memory, among other issues. The cache issue alone can add another 10x (or more!) penalty.
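The string half of the bug can be modeled directly. The article found that sscanf effectively strlen's the whole remaining buffer on every call, so a sketch that counts characters examined per token (a hypothetical simulation, not the actual game code) shows the quadrupling:

```python
# Model of a parser that, like the sscanf in the article, scans to
# the end of the input on every token it extracts.
# 'touches' counts how many characters get examined in total.
def parse_all(tokens):
    buf = " ".join(tokens)
    touches = 0
    pos = 0
    while pos < len(buf):
        touches += len(buf) - pos   # the strlen() over the remaining input
        pos = buf.find(" ", pos)    # advance to the next token
        pos = len(buf) if pos == -1 else pos + 1
    return touches

small = parse_all(["123"] * 1000)
big = parse_all(["123"] * 2000)
print(big / small)  # ~4: doubling the token count quadruples the work
```

With n tokens this does roughly n scans averaging half the buffer each, i.e. about n²/2 character touches per pass, which is exactly the "triangular computation" from the top comment.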


> quadratic is a fancy way of saying "this code is super fast with no data, super slow once you have a decent amount"

Well, that is an abuse of the term by people who sometimes don't actually know what it really means. Up to a point, quadratic IS faster than linear, after all. Too many developers love to abuse the word blindly.

If it is badly tested with no data, it is badly tested with no data. Period. Not "quadratic".

> The problem is that when you double the amount of stuff in the JSON document, you quadruple (or more) the scanning penalty in both the string and the list.

My point was precisely that it depends on the data, and initial assumptions are to be routinely revised. I was making a general point.

Maybe the guy was pinky-sworn that the JSON would hardly change and that the items were supposed to be ordered, sequential and no more than 101. For all you know it is even documented, and nobody cared/remembered/checked when updating the JSON. But we don't know; obfuscated code doesn't come with comments and context ...

Or, it is actually a real rookie mistake. It probably was, but we don't have all the facts.


> Well, that is an abuse of the term, by people that sometimes don't actually know what that really means. Up to a point, quadratic IS faster than linear after all for example. Too many developer love too abuse the word blindly.

There is absolutely no guarantee that a quadratic algorithm has to be faster than a linear algorithm for small N. It can be, in some situations for some algorithms, but the complexity class of the algorithm has nothing to do with that. A quadratic algorithm may well be slower than a linear algorithm for any N.

The only thing the complexity class tells us is that starting from some N the linear algorithm is faster than the quadratic algorithm. That N could be 0, it could be 100, or it could be 1 billion.

In my experience it's usually between 0 and 1000, but again, that depends. The complexity class makes no such guarantees. The complexity class tells us the general shape of the performance graph, but not exactly where the graphs will intersect: this depends on the constants.
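To put numbers on "this depends on the constants", here's a toy cost model (hypothetical constants, chosen only for illustration):

```python
# Two toy cost models: a linear algorithm with a large per-item
# constant (say, hashing overhead) vs. a quadratic one with a
# tiny constant (a naive nested scan).
def linear_cost(n):
    return 100 * n

def quadratic_cost(n):
    return n * n

# The crossover point: the first n where the linear algorithm wins.
crossover = next(n for n in range(1, 10**6)
                 if linear_cost(n) < quadratic_cost(n))
print(crossover)  # 101: below this, the "worse" algorithm is cheaper
```

Shrink the linear constant to 10 and the crossover drops to 11; the shape of the curves is fixed by the complexity class, but where they cross is entirely down to the constants.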

> If it is badly tested with no data, it is badly tested with no data. Period. Not "quadratic".

It is both. The problem is that the algorithm has quadratic complexity. The fact that it was badly tested caused this fact to remain hidden while writing the code, and turn into a real problem later on.


>Well, that is an abuse of the term, by people that sometimes don't actually know what that really means. Up to a point, quadratic IS faster than linear after all for example. Too many developer love too abuse the word blindly.

The problem with this argument is that if the data size and constants are sufficiently small, people don't care whether the algorithm is slow. In the case of JSON parsing the constants are exactly the same no matter what string length algorithm you use. Thus when n is small you don't care, since the overall loading time is short anyway. When n is big you benefit from faster loading times.

I honestly don't understand what goal you are trying to accomplish. By your logic it is more important to keep short loading times short and long loading times long than to follow the conventional engineering wisdom of lowering the average or median loading time, which will sometimes decrease the duration of long loading screens at the expense of increasing the duration of short ones.


Great explanation. Thanks


In the small case here, there is no meaningful difference in speed between parsers. Using a quadratic algorithm has no advantage and is just an incorrect design.


The popular view is that companies who write software know how to prioritise, so if a problem like this isn't fixed, it's because they've done the calculations and decided it's not worthwhile.

I disagree. If there are no internal incentives for the people who know how to fix this to fix it, or if there's no path from them thinking fixing it could improve revenues to being assigned the ticket, things like this won't get fixed. I can fully believe the load times will result in fewer users and lower expenditure.

I think we'll see this happen with Facebook Messenger. Both the apps and the website have become slow and painful to use and get worse every month. I think we'll start to see engagement numbers dropping because of this.


You have just described why I laugh anytime someone complains that government is inefficient. ANY organization of sufficient size is "inefficient" because what a large organization optimizes for (for reasons I cannot explain) cannot align with what that organization's customers want optimized.


With the added difference that governments also have to be far more procedural by virtue of the way they are set up. Regardless of size they are accountable and responsible to a far higher degree in the eyes of the population they represent so there is a legitimate reason to be "slow".

In games the added reason to be slow is that game code is by definition some of the least mission critical code one could find (competes with 90% of the internet web code). Your Linux or Windows code might run a hospital's infrastructure or a rover on another planet. A game on the other hand can launch with bugs the size of your windshield, and can stay like that forever as long as people still pay. And people will pay because games are not unlike a drug for many people.

As such most game coding teams and coders are "trained" to cut every corner and skimp on every precaution. They're not needed beyond a very low baseline as far as software is concerned.

Look at the amount of bugs or cheats incredibly popular games like GTA or CoD have. These are billion dollar a year franchises that leave all this crap on the table despite all the money they make. They have all the resources needed, it's a conscious call to proceed like this, to hire teams that will never be qualified enough to deliver a high quality product and will be encouraged to cut corners on top of that.

Source: a long time ago I worked for a major game developer in a senior management role (unrelated to the dev activity) and left after feeling like "facepalm" for too long in every single SM meeting.


Remember this story from a few days ago?

https://randomascii.wordpress.com/2021/02/16/arranging-invis...

I've seen plenty of interesting bugs, best I found personally was a compiler that was outputting files 1 byte at a time.

Games are rife with these sorts of bugs, but the volume of released games vs other types of software makes any sort of comparison unfair.


> but the volume of released games vs other types of software makes any sort of comparison unfair.

No comparison is entirely fair but I think your objection is unfounded. The quantity of games being released is irrelevant when we're talking about the quality of their code. Which is really bad for most big titles.

If anything the comparison was unfair in the other direction. Sure, an OS or browser have a lot of bugs. But if games and their code had a fraction of the scrutiny something like Windows gets you might just find out that 4 lines of code were written "by the book". It's something any honest game dev will confirm, game code is a stack of spaghetti code on top of more spaghetti code. The philosophy is that you can just go ahead with bad code because you can always fix it with a patch later on. Then you notice there's no widespread pushback from gamers (because unless it's an absolute bomb there won't be) and move on with the next round of "features", nobody has time for fixing bugs or combing the spaghetti.

One other problem is that eventually some people coming from the gaming industry will end up switching to other types of software development but will stick to the philosophy. I've done a lot of hiring over my career and of one thing I'm certain. Whenever someone came with most of their career in game development or most of the recent experience I asked for the CV to be put at the bottom of the stack. It was a lesson I learned the hard way.

Issues like the low criticality of game code, the "crunch" work style, the idea that you should just get it out there as quickly as possible, the lack of serious scrutiny into the matter, etc. all compound each other to create a coding (and coordination) style that's hard to shake off.


> No comparison is entirely fair but I think your objection is unfounded.

I should have clarified.

Doing an absolute numerical comparison in terms of "bad game code" to "bad B2C code" is unfair, because nowadays there is a lot more game dev code than B2C code (if we ignore the web, which is a rather lot of code to ignore, I'll admit :) ).

All that said, most of the games on my phone run very well. Most of the games on my PC run very well. Do the massive over budget AAA titles have issues? Of course. Human organizations are really bad at creating large projects, we all know that social interaction overhead scales non-linearly with # of people, large projects get bogged down.

(Now throw in B2B software where the CTO makes the purchasing decisions and only people lower level in the company have to use it. All of a sudden enterprise software starts to suck as much as AAA games!)

> The philosophy is that you can just go ahead with bad code because you can always fix it with a patch later on.

Sure, for AAA games. But plenty of games that I play don't have those sorts of issues.

Meanwhile my Windows start menu only opens every other time I press the start menu button. This happened for years, then it was fixed for a bit, then a few patches later it started happening again.

The start menu on Windows should be the 2nd to last thing that breaks, right behind the mouse driver.

Without huge testing efforts, large software projects will collapse under their own weight. The type of software isn't material to the problem.

> One other problem is that eventually some people coming from the gaming industry will end up switching to other types of software development but will stick to the philosophy.

I've hired plenty of ex-game devs, and I agree they have some holes in their skills. But I've also found that if you sit them down they can write code that traditional software engineers couldn't manage. Spending too long in any one paradigm fixes one's mindset into that paradigm.

I also agree that the quality of code someone puts out is highly influenced by the environment they work in. But I've worked with plenty of ex game devs who take pride in writing correct code the first time around.

> Whenever someone came with most of their career in game development or most of the recent experience I asked for the CV to be put at the bottom of the stack.

That's a shame, because I wouldn't have accomplished some of the things I've done in my career if I hadn't hired a few game developers along the way! I've seen some stupid crazy stuff get pulled off, code that "shouldn't be able to exist", but did exist, and was rock solid reliable, all written by game developers.

People are people, and sure 9 out of 10 the person who bangs on kernels for a living is probably super careful about every line of code they write, but I'm not going to judge an entire field because the most bureaucratic outputs of that field are horrible.

That'd be like saying everyone who writes web apps is an incompetent developer because Jira is slow.

Or that all software developers are bad because, honestly, the majority of users only notice what we make when it breaks.

Well, we only hear about the small % of games that have huge problems, not the larger number of games that work just fine out of the box.


> for reasons I cannot explain

Any sufficiently large institution, over time, will prioritise self-preservation over achieving their core mission. This is sociology 101. Once a company has enough users to make it hard or impossible to measure immediate performance, self-preservation is achieved with internal manoeuvering and selling to execs.


What's the remedy, apart from reducing the size of the organisation?


It's not a perfect remedy, but you have to loop in the people affected by decisions as part of the decision making structure. That is, for example, customers and workers have to be part of the management structure.

This doesn't happen because it would reduce the power of top decision makers and potentially impact profits. e.g. a customer might ask for a chronologically ordered timeline on Facebook, but that would harsh engagement metrics, revenue, etc. If stuff like this did happen more often though, you'd get products and services that more often achieve their stated aims.


I understand what you're saying. When I was working as a Product Manager, I had to try really hard to do things for the customer (after user testing etc.)

Normally the response from management was "but will that increase our profits?", to which my answer was "eventually, yes"


Even if you take engineering you see that adding more links on a chain increases latency and demands more auxiliary circuitry. But at least in engineering you can design each part to do what you want it to do close to perfection. And it will scale better because you can build everything to spec and bin, which is why we automate a lot of tasks. With humans that can't happen. Human "binning" is a constantly moving target.

After tangentially working on this for a long time I'd say that the core issue is so deeply ingrained in human psyche that it may not even be a matter of education (starts early), let alone organization (starts happening when everything else is "set in stone"). There's no organizational structure that fits the types of activities we humans tend to do these days and that can deliver low latency, consistent results at scale. We mitigate issues in one area by accentuating them in others.

You can have one large flat structure but the load on the single coordinating circuit (the manager) will compromise quality. You can split the thing in multiple, individually coordinated units but the added layer of coordination and latency will compromise performance.

Maybe some form of general purpose but not quite full AI, something that combines human like intelligence and engineering like consistency, might be able to do what humans are supposed to but without the variability of humans (which is both good and bad).


Introducing AI into work relations is how you turn every org into Uber or Amazon delivery: platforms where the worker has no real agency on his work, the apotheosis of alienation. I have no doubt that someone will try it (already we see it creeping in for hiring), I just think it will be Fundamentally Bad.


> Introducing AI into work relations is

We don't really know what it is simply because we haven't introduced any "real" AI anywhere, let alone:

>> some form of general purpose but not quite full AI

Talking about efficiency as you scale up large organizations, it's inevitable that humans will introduce delays and variability in the work, which cannot be eliminated because it's human nature, biologically and psychologically. Since humans can't change on a timescale that makes this discussion relevant, the only way for very large organizations (like large companies or governments) to operate as efficiently as small ones is if they rely (quasi)exclusively on some AI that's as capable as top individuals at delivering results but with fewer of the drawbacks. Not only would it not operate as 100,000 distinct entities but as one or a few, it would also run consistently and predictably.


There was a fantastic discussion some years ago on ways to design an organization to minimize the tendency to drift towards self-preservation instead of remaining customer-focused.

The HN discussion[1] was started by an article that provided numbers that seemed to suggest that Wikipedia's spending was slowly spiraling out of control.

1: https://news.ycombinator.com/item?id=14287235


That really is a fantastic quote and a fantastic thread!

What really stuck out is that Buffet said its the most important thing about Business and he didn't learn it in Business school. Same! I did an MBA and it was the biggest elephant in the room.


But he didn't specify how he arranged things to avoid the problem.


Fair point. The next poster explains how Amazon identifies and tries to mitigate it:

"if you’re not watchful, the process can become the thing. This can happen very easily in large organizations. The process becomes the proxy for the result you want. You stop looking at outcomes and just make sure you’re doing the process right"


Yeah, I thought that was very insightful.


Many people would say "more accountability" but I've seen that used successfully to deflect lightning strikes to innocent people who were then fired so... I'd like to know as well.


Competition/choice, which means that self-preservation requires that they care about efficiency. Obviously that wasn't enough here, but it definitely tames some inefficiencies.


I doubt there is a remedy and personally accept this as a fact of life.


Organizations are like spheres. Only a small part of the sphere has an exposed surface that is in contact with the outside world. As you grow the sphere most of the mass will be inside the sphere, not near the surface.


And I laugh every time people claim that it is only governments that can be inefficient. Most large commercial companies are inefficient and almost not functioning at all.


That the process breaks down in some cases doesn't mean they don't know how to prioritize. They clearly know how to prioritize well enough to make a wildly successful and enjoyable game. That doesn't mean no bad decisions were made over the last decade or so of development.

Like anything else, things will be as bad as the market allows. So I'd expect monopolies to do a worse and worse job of making good decisions and companies in competitive fields to do a better and better job over time. Thus the difference between TakeTwo and Facebook, and the need for lower cost of entry and greater competition in all economic endeavors where efficiency or good decision making is important.


Does it? Just because something is good or successful doesn't mean it was made well. That's why we are seeing stories of crunch, and failures of management compensated by extreme overwork.


Now you are moving the goalposts to a philosophical discussion as to what types of business you like or don't like, and I'm not sure any progress can be made with that approach. Some will share your values, others will find them repugnant, and in any case it doesn't have much bearing on rockstar load times.


I don't think anyone likes excessive loading times or crunch. You might argue about the importance of optimization or the necessity of crunch, but I don't think anyone can argue this isn't a failure of management and communication


> I think we'll see this happen with Facebook Messenger. Both the apps and the website have become slow and painful to use and get worse every month.

The messenger website has been atrocious for me lately. On my high-powered desktop, it often lags a bit, and on my fairly high-end laptop, it's virtually unusable. I thought it must be something I changed in my setup, but it's oddly comforting to hear that I'm not the only one with such issues.


For me it’s Google Maps; it has gotten so freaking slow both on mobile and on desktop. Actually Google Docs and Sheets are the same.


Gmail. Typing in gmail is painfully laggy.



[tinfoil hat] A part of me believes that this was their intention in slowing down their browser site to the extent it has (and there is absolutely zero reason as to why it should load as slow as it does) in order to drive downloads for this application. [/tinfoil hat]

It's funny. That page says "A simple app that lets you text, video chat and stay close to people you care about." If it's so simple, then why does the browser version of the site take forever to load?


Yeah, it used to work well in mobile browser, then only in desktop mode in mobile browser, then not at all. Now it only "kinda sorta" works in desktop browser on ridiculously overpowered PC. Gotta keep the walls tight in that private garden.


Unfortunately I'm on Linux, and they don't seem to provide a Linux client. I did try out https://github.com/sindresorhus/caprine/, which worked really well, but I'm one of those weirdos who prefers websites to desktop apps.


> if a problem like this isn't fixed, it's because they've done the calculations and decided it's not worthwhile.

If it ever reached the point where it had to be an item in a priority list, it's already a failure. Some developer should have seen the quadratic behavior and fixed it. It's not the type of thing that should ever even be part of a prioritized backlog. It's a showstopper bug and it's visible for every developer.


> I can fully believe the load times will result in fewer users and lower expenditure.

Does GTA Online still attract new users in droves? I doubt it.

If the old users live with the loading time for years, they are likely to continue living with it. It would be nice if Rockstar fixes it, but I doubt it would be anything except a PR win.


I rarely play it and the load time is actually the main factor. If I have an hour to play a game waiting for GTA V to load unfortunately feels like a waste of time and a chore, so I play something else.


Before GTA online they entertained themselves in other ways. Eventually, they'll move on. The more friction there is to continue playing GTA online, the easier it is for there to be something to pull them away. Rockstar are now competing to be a use of someone's time, not for them to buy the game.


1. people experiencing this issue have already bought the game, so there's little incentive here.

2. we can be reasonably sure people will buy newer GTA installments regardless of whether this bug is fixed or not.

but:

3. if there's still money to be made from microtransactions this is a huge issue and would absolutely be worthwhile, imo.


> I think we'll see this happen with Facebook Messenger. Both the apps and the website have become slow and painful to use and get worse every month. I think we'll start to see engagement numbers dropping because of this.

In fact, I think the iOS app for FB Messenger did get a redesign due to these problems, and it was rewritten from scratch? I remember being pleasantly surprised after the big update… It became lightweight, integrates well with iOS, and supports platform features.

On the other hand, the desktop app or the website is a shitshow :-(


The iOS app is really much more performant now than it was a couple of years ago. Significantly smaller, and quicker to start up


I would think this would be one of the biggest revenue / developer_time changes in company history, considering how incredibly profitable online is.


I imagine the conversation between the programmer(s) and management went exactly like this:

Management: So, what can we do about the loading times?

Programmer(s): That's just how long it takes to load JSON. After all, the algorithm/function couldn't be more straightforward. Most of the complaints are probably coming from older hardware. And with new PC's and next-gen consoles it probably won't be noticeable at all.

Management: OK, guess that's that then. Sucks but nothing we can do.

Management had no way of knowing whether this was true or not -- they have to trust what their devs tell them. And every time over the years someone asked "hey, why is loading so slow?" they got told "yeah, they looked into it when it was built; turns out there was no way to speed it up, so it's not worth looking into again."

And I'm guessing that while Rockstar's best devs are put on the really complex in-game performance stuff... their least experienced ones are put on stuff like... loading a game's JSON config from servers.

I've seen it personally in the past where the supposedly "easy" dev tasks are given to a separate team entirely, accountable to management directly, instead of accountable to the highly capable tech lead in charge of all the rest. I've got to assume that was basically the root cause here.

But I agree, this is incredibly embarrassing and unforgiveable. Whatever chain of accountability allowed this to happen... goddamn there's got to be one hell of an internal postmortem on this one.


I can pretty much guarantee that there was no discussion with management like that. From experience, live ops games are essentially a perpetually broken code base that was rushed into production, then a breakneck release schedule for new features and monetization. I've personally had this conversation a few times:

Programmer: Loading times are really slow, I want to look into it next sprint.

Management: Feature X is higher priority, put it in the backlog and we'll get to it.


At my last job I had that conversation several times :( Our website would regularly take 30s+ for a page to load, and we had an hour of scheduled downtime each week, because that’s how long it took the webapp to restart each time code was pushed. “Scheduled downtime doesn’t count as downtime, we still have the three 9s that meet our SLA, and there’s nothing in the SLA about page load times. Now get back to building that feature which Sales promised a client was already finished”...

Aside from being generally shameful, the real kicker was that this was a "website performance & reliability" company x__x


Reminds me of working for a company in the 1990s that ran constant television ads convincing everyone their experts could fix everyone's networking problems. OTOH, for the engineers working at the company, the file share used for editing code, building, etc. died for an hour or two seemingly every day when the network took its daily vacation.

Many of us, after having run out of other crap to do, would sit around and wonder whether the "B" grade network engineers were assigned to run the company LAN, or whether the ones we sent onsite were just as incompetent.


> Many of us, after having run out of other crap to do, would sit around and wonder if the "B" grade network engineers were assigned to run the company LAN

Internal IT is almost invariably a cost center, while the technicians providing the service you sell to customers work in a profit center. So, yeah, probably that, plus being managed in a way focused on minimizing cost rather than maximizing internal customer satisfaction or other quality metrics.


Minimizing costs doesn't have to mean cutting until you no longer get the quality you need.


Former gamedev here. I can vouch for this conversation. The idea that management would ask about slow loading times doesn't seem to be too realistic in my experience.


I once did get to look at optimising a bad load time... only because at that point it was crashing due to running out of memory (32-bit process).

In this case it was JSON, but using .NET, so Newtonsoft is at least efficient. The issues were many cases of:

* Converting strings to lowercase to compare them, since the keys were case insensitive (I replaced this with StringComparison.OrdinalIgnoreCase)

* Redundant Dictionarys

* Using Dictionary<Key, Key> as a set (replaced with HashSet)

The data wasn't meant to be that big, but when a big client was migrated it ended up with 200MB of JSON. (If their data had been organised differently it would've been split across many JSON blobs instead.)

It would also be nice to handle it all as UTF-8 like System.Text.Json does. That would halve all the strings, saving a fair bit. (I mean, the JSON blob it starts with is converted to UTF-16 because .NET.)


Ughh. Every time I want to fix anything I have to sneak it in with something else I'm working on, or wait so long for approval that I've forgotten all the details.


I stopped doing this after realizing that all the extra work was just taking time away from my family. The last time I did anything out of work hours was when I told my manager that I was doing some optimization in my free time and he responded "I don't want you working on that in your free time, you could be working on this instead!". I don't plan on doing anything out of work hours again. He left, and now we're hiring more people, instead, since I can't get everything done. Win for everyone.


Oh i never work outside of work hours. I meant that I'll just include fixes to things that are tangentially related to my current user story, or even sometimes not related at all. It helps that the team implicitly trusts my code, so anytime I say "oh and I also fixed this", it's fine. The problem is that if I say "oh I want to fix this" it gets put in the system, and ranked, and just kills my momentum.


Normally I'd agree with you, but this particular problem is SO visible, and experienced by everyone, that I have to think management must have looked into it. I mean, it's the loading screen. They couldn't not encounter it themselves, every time.

But I could be wrong. I've only worked with sites and apps, not gaming.


Do you really think management at Rockstar is using their product?


Do you really think they're not?

I don't think you choose to become a manager at one of the world's top video game companies if you don't love video games.

Hell, I don't think one of the world's top video game companies would hire you as a manager if you didn't convince them, in interviews, that you understand and love their products by using them yourself.

Using your own company's products whenever possible is generally a pretty important part of being a manager.


Their children are, at least.


I think someone in QA likely might have the scenario you describe, but I think management might only be playing the game 2-3 times a day at most. They're likely trying to keep the studio organized enough to have the game ship decently.


I experienced this to incredible extremes. It was taking our clients two hours to load their data into an external tool that they use, daily, to view that data. This was caused by our use of a data store that was only half supported by that tool. I showed that if we took two weeks to switch to the other data store, their load times would be 30 seconds. It took two years, and a new manager, to get that implemented. Unfortunately, most of the clients are still using the old data store. They haven't had time to switch to the new version...


I had that exact conversation at my old job. They only started listening to me when some requests started timing out because of Heroku's 30 second request duration limit.

Same thing happened with memory consumption. The problem was ignored until we regularly had background jobs using > 4 GB of memory, causing cascading failures of our EC2 instances and a whole bunch of strange behaviour as the OOM killer sniped random processes.


Alternatively (from my experience):

Programmer(s): Can we set aside some time to fix the long loading times?

Management: No, that won't earn us any money, focus on adding features


That's not how it's been in any of my past gamedev jobs.

I work LiveOps and usually long loading times are something that we would take seriously, as it negatively impacts the reputation of the game.


The old maxim of "Premature optimization is the root of all evil" has over time evolved to "If you care one iota about performance, you are not a good programmer".


That belief is getting a bit outdated now that computing efficiency is hitting walls. Even when compute is cheaper than development, you're still making a morally suspect choice to pollute the environment over doing useful work if you spend $100k/yr on servers instead of $120k/yr on coding. When time and energy saved are insignificant compared to development expense is of course when you shouldn't be fussing with performance.

I don't think the anti-code-optimization dogma will go away, but good devs already know optimality is multi-dimensional and problem specific, and performance implications are always worth considering. Picking your battles is important, never fighting them nor knowing how is not the trick.


I agree 100% - the whole cheery lack of care around optimization to the point of it becoming 'wisdom' could only have happened in the artifice of the huge gains in computing power year on year.

Still, people applying optimizations that sacrifice maintainability for very little gain, or that increase bugs, are doing a disservice. People who understand data flow and design optimal systems from the get-go are where it's at.


And fwiw, the full quote is:

We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil.

Yet we should not pass up our opportunities in that critical 3%.


Also, it’s not “never optimise”. It’s “only optimise once you’ve identified a bottleneck”. I guess in a profit-making business you only care about bottlenecks that are costing money. Perhaps this one isn’t costing money.


This one isn’t measured to be costing money. Trying to figure out how much money you’re losing as a result of performance problems is extremely difficult to do.


Not all performance problems have obvious "bottlenecks", some are entirely second-order effects.

For instance, everyone tells you not to optimize CPU time outside of hotspots, but with memory usage even a brief peak of memory usage in cold code is really bad for system performance because it'll kick everything else out of memory.


Precisely. Hasn't GTA Online done over a billion in revenue?

Given how incredibly successful it's been, it's conceivable the suits decided the opportunity cost of investing man-hours to fix the issue was too high, and that effort would be better spent elsewhere.


They're making lots of money but the ridiculous load times absolutely cost them money. It's not worth an unlimited amount of dev time to fix, but they definitely should have assigned a dev to spend one day estimating how hard fixes would be.


It's costing them money just in the time wasted by their own internal QAs and devs loading the game to test it.


Arguably, it could actually make them money, since it provides a window of free advertising. I have no data either way, but I wouldn’t assume long load screens are necessarily bad for business.


They don't need more than a full minute of advertising every load.


N=1 but one of the reasons I don't partake in modern gaming is ridiculous loading times and mandatory update sizes.


That's also possible. I haven't played it myself, so I really can't comment.


Where does the belief that corporations are somehow perfect at doing cost-benefit analysis come from? Has anyone worked in a place where that seemed to be the case?

We're talking about an issue that has been loudly complained about for 7 years (and I am evidence that it makes people play the game much less often than they would like to), that some person without access to the source code was able to identify and fix relatively quickly. It would be surprising if it took an FTE even a week to find this with access to the source code.

This is one of the most profitable games ever, they could hire an entire team to track this down for half a year and it wouldn't even be noticeable on their balance sheet. And I would bet money that it would have increased their active player base (which they care about because of the micro-transactions) in a noticeable way, as long as players were aware of the improvement so they would try it again.


Perhaps this one is costing them a lot of money.


ding ding ding ding


Then again, using the correct data structure is not really optimisation. I usually think of premature optimisation as unnecessary effort, but using a hash map isn't it.


My belief is that your first goal should be cognitive optimization, i.e. make it simple and clear. That includes using hash maps when that is the proper data structure, since that's what is called for at a base design level.

The next step is to optimize away from the cognitively optimal, but only when necessary. So yeah it’s really crazy this was ever a problem at all.
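To make the hash map point concrete, here's a minimal Python sketch (toy data, nothing to do with Rockstar's actual code) contrasting the linear-scan deduplication that goes quadratic with the hash-based version that stays linear:

```python
def dedup_linear_scan(items):
    """O(n^2): a list membership test scans every previously seen item."""
    seen = []
    for item in items:
        if item not in seen:  # linear scan, repeated n times
            seen.append(item)
    return seen

def dedup_hash_set(items):
    """O(n): hash-set membership is O(1) on average."""
    seen = set()
    out = []
    for item in items:
        if item not in seen:
            seen.add(item)
            out.append(item)
    return out

# Both keep first occurrences in order; only the cost differs.
items = ["hat", "car", "hat", "gun", "car"]
assert dedup_linear_scan(items) == dedup_hash_set(items) == ["hat", "car", "gun"]
```

On a few dozen items the two are indistinguishable; on tens of thousands (the size the GTA item list eventually grew to) the quadratic version dominates the runtime, which is why the right structure is the cognitively simple choice, not an optimization.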


OTOH, at some point what is good from a cognitive viewpoint depends on what idioms you use. So it can be helpful to actively choose which idioms you use and make certain not to use those that tend to blow up performance-wise.


Given the code example in question, I don't even think idioms come into it. Here a hash map was called for regardless of whatever idioms come into other parts of the design. This is just fundamental and has nothing to do with functional programming, OOP, etc.


This is the key point: premature optimization would be e.g. denormalising a database because you think you might have a performance problem at some point, and breaking the data model for no good reason.

Here the wrong data structure has been used in the first place.


If it doesn't change semantics, it's optimization.

It's just a zero-cost one. So if anybody complains that choosing the correct data structure is premature optimization (yes, it has happened to me), that person is just stupid. But not calling it an optimization will just add confusion and distrust.


Exactly. And not only being able to pick the correct data structure for the problem, (and possibly the correct implementation), but being able to know what is going to need attention even before a single line of code is written.

Most of my optimization work is still done with pencil, paper and pocket calculator. Well, actually, most of the time you won't even need a calculator.


That doesn't really apply here. I don't even play GTA V, but the #1 complaint I've heard for the past 6 years is that the load times are the worst thing about the game. Once something is known to be the biggest bottleneck in the enjoyment of your game, it's no longer "premature optimization". The whole point of that saying is that you should first make things, then optimize the things that bring the most value. The load time is one of the highest-value things you can cut down on. And the fact that these two low-hanging fruit made such a big difference tells me they never gave it a single try in the past 6 years.


Sure it does apply. These complaints came out after the game was released. They should have optimized this before release, back when they designed the system. However, that would have been called premature optimization, when in fact it's just bad design.


> while they even designed the system

See, that's exactly why you're wrong. This wasn't a bad "design". If the fix had required rebuilding the whole thing from scratch, then you would have a point, and thinking about it "prematurely" would've been a good idea. In this case, both fixes were bugs that could be fixed after the game was finished without having to undo much of the work done.

The whole point of the saying is that you don't know what's gonna be a bottleneck until after. Yes by optimizing prematurely, you would've caught those two bugs early, but you would've also spent time analyzing a bunch of other things that didn't need to be analyzed. Whereas if you analyze it at the end once people complain about it being slow, you're only spending the minimum amount of time necessary on fixing the things that matter.


> Whereas if you analyze it at the end once people complain about it being slow

I think that we should also stop doing crash tests in cars. Just release the car to the public and analyze human crashes afterwards.


You do understand that "my entertainment takes 6 minutes to load" is a very different problem from "my essential transportation kills people"? And therefore call for different approaches?


Putting aside the bad analogy that the other comment covers, the overall point I was making also covers doing the optimization before game release, but after the game is more or less complete. Rockstar optimizing during the final few months of playtest wouldn't be premature optimization.

Premature optimization is about trying to optimize the micro (a given function), while you don't yet have the macro (the game itself). If the function accounts for 0.1% of the performance of the final game, it doesn't matter if you make it 5x faster since that will only make the game 0.08% faster. You could've spent that time optimizing a different function that accounts for 10% of the performance, and even optimizing that function by 5% would make the game 0.5% faster, which is more impactful.
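The arithmetic in the paragraph above is just Amdahl's law. A quick sanity check of those numbers (the helper function is purely illustrative):

```python
def overall_speedup_pct(fraction, speedup):
    """Amdahl's law: percentage reduction in total runtime when a
    `fraction` of the program is made `speedup` times faster."""
    new_time = (1 - fraction) + fraction / speedup
    return (1 - new_time) * 100

# A function that is 0.1% of runtime, made 5x faster -> ~0.08% overall gain.
assert abs(overall_speedup_pct(0.001, 5) - 0.08) < 1e-6

# A function that is 10% of runtime, made 5% faster (i.e. its time drops
# to 0.95 of what it was) -> 0.5% overall gain.
assert abs(overall_speedup_pct(0.10, 1 / 0.95) - 0.5) < 1e-6
```

Both figures quoted in the comment check out, which is the whole argument for profiling first: the fraction matters far more than the per-function speedup.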


We used to start up Quake while we waited then we'd forget about GTAO. Later we'd discover GTA had kicked us out for being idle too long. Then we'd just close it.

That should be embarrassing for Rockstar but I don't think they would even notice.


The problem here isn't a lack of optimization, it's a lack of profiling. Premature optimization is a problem because you will waste time and create more complex code optimizing in places that don't actually need it, since it's not always intuitive what your biggest contributors to inefficiency are. Instead of optimizing right away, you should profile your code and figure out where you need to optimize. The problem is that they didn't do that.
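As an illustration of profile-then-optimize, here's a toy Python sketch (function names are made up, not from any real codebase): the profiler, not intuition, points at the quadratic step.

```python
import cProfile
import io
import pstats

def parse_items(n):
    """Deliberately quadratic stand-in for a slow loading step."""
    seen = []
    for i in range(n):
        if i not in seen:  # linear scan on every iteration -> O(n^2)
            seen.append(i)
    return seen

def everything_else(n):
    """Stand-in for the rest of the loading work."""
    return sum(range(n))

def load_game(n=3000):
    parse_items(n)
    everything_else(n)

profiler = cProfile.Profile()
profiler.enable()
load_game()
profiler.disable()

stream = io.StringIO()
pstats.Stats(profiler, stream=stream).sort_stats("cumulative").print_stats(5)
report = stream.getvalue()
# parse_items dominates the cumulative-time column; no guessing required.
print(report)
```

Ten lines of harness like this, run once against a production-sized input, would have surfaced the GTA hotspot immediately; the equivalent native tooling (perf, VTune, etc.) is just as cheap to run.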


I'd like to add: while GTA is running, I'm really impressed by its performance, even / especially back on the PS3; you could drive or fly at high speed through the whole level, see for miles, and never hit a loading screen. It is a really optimized game, and that same level of optimization continues in RDR2.

Which makes the persisting loading issue all the weirder.


I think this part of the Knuth's quote is central:

> Programmers waste enormous amounts of time thinking about, or worrying about, the speed of noncritical parts of their programs, and these attempts at efficiency actually have a strong negative impact when debugging and maintenance are considered.

And he is explicitly advocating for optimizing the critical part:

> Yet we should not pass up our opportunities in that critical 3%.

And somehow, people have latched onto the catchphrase about "early optimization" that was taken out of context.


I went back and had a look at that maxim a few years ago and found that it actually doesn't say what many people claim it says. And definitely not the blanket excuse for slow code that it has, to some degree, always been used as.

The reason for the misunderstanding is that the kinds of practices it actually talks about are uncommon today. People would often take stuff written in higher-level languages and reimplement it in assembler or machine code, which makes it more time-consuming to change and evolve.

It also isn't as if it's hard to figure out which part of a piece of software is taking up your runtime these days. All worthwhile languages have profilers, most of them free, so there is zero excuse for not knowing what to optimize. Heck, it isn't all that uncommon for people to run profiling in production.

Also, it isn't as if you can't know ahead of time which bits need to be fast. Usually you have some idea, so you will know what to benchmark. Long startup times probably won't kill you, but when they are so long that they become a UX issue, it wouldn't have killed them to have a look.


Back in the day, when people talked about premature optimization it was about trivial things like asking on Stack Overflow whether a++, ++a or a += 1 is faster. That is obviously a net loss, since ultimately it literally doesn't matter. If it matters to you, you are already an expert in the subject and should just benchmark your code.


I am not sure I would call that "evolution".


It's not just believable, it's normal. I have spent quite a bit of my career maintaining software, and I don't recall one employer where low-hanging fruit like this wasn't available everywhere.

The problem is not that developers can't optimize things: you will find some developers capable of figuring this problem out anywhere. What makes this low hanging fruit so popular is the fact that we aren't measuring enough, and even when we do, we aren't necessarily prioritizing looking into things that are suspiciously slow.

In the case of this example, the issue is also client-side, so it's not as if it's costing Rockstar CPU time, and it's unlikely anyone can claim their job description includes wondering whether the load times are worth optimizing. When problems like this one get solved, it's because someone who is very annoyed by the problem convinces a developer to even look into it. Most of the time, the people that suffer, the people that get to decide how to allocate the time, and the people that are in a position to evaluate the root cause of the problem never even get to talk to each other. It's the price we often pay for specialization and organizations with poor communication.

Organizations where the company culture dictates that the way forward is either to complete more tickets faster or to find ways to be liked by your manager are not going to foster the kind of thinking that solves a problem like this one, but that's a lot of what you find in many places. A developer with a full plate who is just working on the next feature isn't going to spend their time wondering about load times.

But instead we end up blaming the developers themselves, instead of the culture that they swim in.


Hear hear. We should make a punch-a-dummy-manager/BA/code-standards-"lead" day...

This code looks like something someone with almost no experience hacked together, but because they were an intern, and because Rockstar is likely a toxic place to work, fixing it never got prioritized.

I think if managers prioritized cycle-time metrics more, they'd find that they are encouraging a lot of practices which lead to horrible inefficiencies. "Measure twice, cut once" is a positive mantra which leads to more solid designs with fewer bugs.

Agile sort of addressed this problem, but unfortunately only at small scales. Iteration and story capacity got prioritized over quality, customer engagement, and self-leading teams.

Plus, things such as scaled agile suffer from the oxymoron of planning fast iteration: if you have a master plan, you lose the ability to respond to change and iterate; if you allow iteration, you must accept that whenever any team iterates, the whole group must discard the plan. At some point that means you either accept high cycle times, or you decouple functionality to the point where the planning becomes the standard waterfall fallacy: wasting meeting time going over a plan that isn't based on anything.


I suspect that the core engine programmers moved onto other projects long ago, leaving GTA:O running with mostly artists and scenario designers to produce more DLC.

This bug wouldn't present in the first couple years with the limited amount of DLC, so by the time it got ridiculous there wasn't anyone left with the confidence to profile the game and optimize it. A junior dev could fix this, but would probably assume that slow loads are a deep complex engine problem that they won't be able to fix.

Alternatively, management would declare that there's too much risk doing more technical engine work, and not sign off on any proposed "minor" optimizations because it's too risky.


> This bug wouldn't present in the first couple years with the limited amount of DLC

GTA Online loading times have been infamous for a very long time. They were already unjustifiably bad when they released the game for PC, and at that point engine programmers would surely be involved.


The 1.5 minute default load time is also ridiculous.


This is very much the likely scenario. The money is in further DLC. The existing GTAO engine is "done" from their perspective.

I'd guess also that the next version of the base engine is in RDR2 or later and doesn't have these issues. But at the same time they likely wouldn't backport the changes for fear of cost overruns.


Something I've noticed in highly successful companies is that problems never get fixed because the sound of the money printer from the core business is deafening.

Our customer portal loads 2 versions of React, Angular, Knockout and jQuery on the same page? Doesn't matter, it's printing billions of dollars.

Rockstar's money printer is so loud that they don't care about problems.

Same thing for Valve, their money printer is so loud that they barely bother to make games anymore and let the Steam client languish for years (how did they let Discord/Twitch happen?).


>Same thing for Valve, their money printer is so loud that they barely bother to make games anymore and let the Steam client languish for years (how did they let Discord/Twitch happen?).

Not sure that's a fair criticism.

Alyx was widely praised. Artifact... wasn't. I don't know about Dota Underlords. And that's just the last couple of years.

They've also developed hardware like SteamLink, Steam Controller, some high-end VR gear...

They develop a LOT. They just don't release a whole lot.

I agree there should be a lot more work and effort in the client. And they constantly fuck up handling their esports.

But I don't think "barely bother to make games anymore" is a fair one.


It's fair. They are not a games developer anymore. Alyx was good but a boutique game made by a behemoth games marketplace company and before it came out, almost a decade had passed since Portal 2...


Valve released Dota 2 and then ported it to a brand-new engine; they've been consistently updating that game for 10 years, and they've been working on CS:GO to similar effect. I think you lot forget that the majority of Valve's developers were working on Team Fortress and Counter-Strike as mods before Valve hired them; they're primarily a multiplayer house.

Valve unlike most companies maintains their games for more than 10 years.

That they haven't released anything obvious to the casual observer, like easily marketed new single-player titles, only shows your ignorance of Valve's entire catalog of games and its attitude to development.


Look at SharePoint. It's a total nightmare of a platform to develop on, but people just adapted and built their businesses on it.


I suspect sharepoint is the platform version of excel & VBA. You & I might hate it, but it gets powerful capabilities into the hands of regular people.

SharePoint is probably many non-engineers' very first exposure to actual version control, with checkouts, checkins, version history, and merge.


SharePoint has version control? I thought it's mostly a poorly functioning Dropbox copy.


This comment is absolutely hilarious to me, because the main selling point for SharePoint (for the longest time) was that you could version control Microsoft Office documents. Then intranets happened, someone wanted to build a Turing machine with it, and now it's the monstrosity we know now.


What makes it a nightmare? Can you share a few issues you've run into?


In my experience, Sharepoint is stupidly slow, and it feels like it's trying to be everything.

Like...I don't need it to notify me that I have new e-mail. I already have either Outlook or an Outlook tab running.

Sharepoint is a shining example of feature creep run wild.


The JS API is total crap. One has to use it to develop anything that gets a bit more complex, but MS chose to ignore things that exist in the ES standard (like normal iterators, wtf) for quite some time to preserve backwards compatibility with IE. This makes neither debugging nor any further exposure to the API any fun.


SP, like any standard platform (think e.g. SAP), invites systems integrators to customize the crap out of it, which tends to make the already bad performance even worse.


I see. Thanks for sharing your experience!


I've never seen search work properly (where "properly" = 90+% of searches return desired document in first 2 results)


I see. We have some internal-facing stuff that our IT has done using SharePoint. It's not pretty/modern by any stretch, but it works just fine. I was curious what the world looks like from the other side.

Isn't the search issue more of an indexing-engine problem though? Can you plug in other engines?


I still have no idea what Sharepoint even is. The way I’ve always seen it used is a way to host sites with file hosting tied to them. It feels like an over engineered CMS.


Some parts of it are a sort of list-based CMS with poor naming.


What does Valve / Steam have to do with Discord?


In addition to the other comments: Steam Chat had significant in-roads with the gaming audience that would eventually form the foundation of Discord's user base. It is quite plausible that, had Steam improved its chat earlier, Discord might never have gotten the traction it did.

Nowadays, I find Steam Chat is a ghost town.


Valve only recently implemented a semi-Discord-clone (with way better quality voice chat; give it a try some time if you haven't yet).

Their chat system had been famously bad and mostly unchanged since the early 2010s, and was only very recently reworked into this.


>(with way better quality voice chat, give it a try some time if you haven't yet).

Well, I've been using VoIP apps basically every day for over a decade, and voice quality was never something I cared about (all the software I've used was reasonably decent in that regard).

The most important thing is: how do you get everyone onto the same program? And I find it not very realistic to get all my friends onto Steam.


I think the parent was implying that steam groups could have improved to the point where Discord would not be necessary.


Steam is not a standalone chat product. I mostly use discord for non gaming chats.


is it really unbelievable? companies this big tend to prioritize hiring a shit ton of middlemen (VPs, project managers, developer managers, offshore managers) in order to avoid paying out for talent to build and constantly maintain the project. I guess paying a shit ton of money to 1 person to manage 10+ poorly paid contractors works out for them, accounting wise.

If one really examined the accounting for GTAO, I would bet that most of the billions of dollars that were earned in micro transactions went to marketing, product research, and to middle management in the form of bonuses.


Even if you view this as a business decision rather than a technical one, any smart project manager would realise a 6 minute loading time literally costs the company millions per year in lost revenue. (How many times have you felt like firing up GTA Online only to reconsider due to the agonising load time). I would guess this was simply a case of business folk failing to understand that such a technical issue could be so easily solved plus developers never being allowed the opportunity to understand and fix the issue in their spare time.


The insane loading times are literally the exact reason I haven’t played in years. Every time I played I just ended up frustrated and got distracted doing something else while waiting, so I just quit playing altogether. I don’t know how people stand the loading times.


Maybe the load times inadvertently work like the intentional spelling mistakes in a Nigerian scammer's email.

It's bait for the non-discerning customer who is more likely to empty their wallet for microtransactions because they have less experience with games so don't know what is normal :)


The fact that this scenario is not immediately ludicrous to me is saddening.


Microtransaction heavy games are primarily funded by whales who spend thousands on a game. If you even make one of them leave that's a net loss.


Totally agree, I loved the story mode and never got into online due to the amount of time spent loading and finding interesting stuff to do.


The people who observe the slow loading time already paid for the game, so I guess R* won't lose much revenue because of this nasty bug.


The online mode has microtransactions. People not playing anymore aren't paying for microtransactions either.


The game has microtransactions. Coincidentally, also a large reason why load times were so slow.


I didn’t buy the game to play with my friends because I heard of how terrible the loading situation was.


IIRC R* makes orders of magnitude more from microtransactions than from the game box


It's kind of hard to believe. GTA5's online mode is their cash cow, and 6 minute load times are common?! It's kind of amazing people even play it with those load times. It's such a simple problem that one dev could have found and fixed it within a day.


It's not at all hard to believe if you've been playing video games for a while.

Everything is getting slower and slower, and nobody cares.

When I played the Atari 2600, I had to wait for the TV to warm up, but otherwise there were no games with anything approaching load times (with 128 bytes of RAM in the console, who would know). The NES didn't have much in the way of load times either, but you did have to fiddle with the cartridge slot. SNES and Genesis didn't usually load (Air Buster being a notable exception). CD based systems sure did like to load, but that's somewhat understandable. In the mean time, more and longer boot screens. The Switch uses a cartridge (or system flash/SD cards), but it likes to load forever too.

PC Gaming has had loading for longer, but it's been getting longer and longer.

Some arcade games have lengthy loading sequences, but only when you turn them on while they read their entire storage into RAM so they can be fast for the rest of the time they're on (which in arcades is usually all day).


Shorter loading times were one of the main selling points of this console generation.


> Everything is getting slower and slower, and nobody cares.

It really depends. The latest crop of games I’ve played (Doom Eternal, Cyberpunk) loads way faster than games from a few years back (aforementioned GTA-V, Shadow Warrior 2…).

This is also on the same machine, so it’s not the hardware that makes it faster.


Doom Eternal's load times were so good I didn't even bother moving it to my SSD (I junction to a larger HDD by default).


Direct Storage will allow for hardware-accelerated decompression straight from an NVMe into GPU memory, without involving the CPU and system RAM.

https://devblogs.microsoft.com/directx/directstorage-is-comi...


The more important thing about DirectStorage is probably that it will encourage games to use multithreaded async IO rather than serializing all their IO requests even when the underlying storage device requires dozens of simultaneous requests to deliver its full throughput.


I’m not entirely convinced that DirectStorage can do DMA directly from the device to the GPU. I suspect that even current NVMe devices aren’t quite fast enough for this to be a huge deal yet.

I think, but I’m not entirely sure, that Linux can do the peer to peer DMA trick. One nasty bit on any OS is that, if a portion of the data being read is cached, then some bookkeeping is needed to maintain coherence, and this adds overhead to IO. I wouldn’t be surprised if consoles had a specific hack to avoid this overhead for read-only game asset volumes.


Does it need "multithreaded async IO" or just "async IO"? It's usually async _or_ multithreaded; the native multi-request I/O APIs are single threaded, but if you have multithreaded I/O using simpler APIs, the system is batching them into one request at the cost of a little latency.


Kernel-mediated async disk IO is still a mess on all major platforms except for newer Linux kernels with io_uring. There's no way to call the APIs in a way that won't block sometimes, and to even have a snowball's chance in hell of not blocking requires giving up on the kernel's disk cache.

Also you're probably going to want to do multithreaded decompression anyway, and it'll be more efficient if you have the threads completing the reads do the decompression themselves. So in any case you probably want multiple threads handling the completion events.


NVMe is natively a multi-queue storage protocol, so there's no reason for the application or OS to collect IO requests into a single thread before issuing them to the lower layers. The normal configuration is for each CPU core to be allocated its own IO queue for the drive. But multithreaded synchronous (blocking) IO often isn't enough to keep a high-end NVMe SSD properly busy; you run out of cores and get bogged down in context switching overhead at a few hundred thousand IOPS even with a storage benchmark program that doesn't need any CPU time left over for productive work.

With a sufficiently low overhead async API (ie. io_uring) you can saturate a very fast SSD with a single thread, but I'm not sure it would actually make sense for a game engine to do this when it could just as easily have multiple threads independently performing IO with most of it requiring no synchronization between cores/threads.


Lol

"Hey we have this great new tech that makes things even faaster!!"

2 years later

"GTA 6 found to have double online load times, denies claims that game performs worse than GTA 5, tells people to upgrade their systems"

4 years later:

"Tech blogger reverses code, realizes someone managed to loop access between hard drive and gpu despite extremely common modern tech, gets 10x boost after spending a day fixing junk product"

Better technology just hasn't met its match from dumber management and more bureaucratic dev shops...


>It's not at all hard to believe if you've been playing video games for a while.

>Everything is getting slower and slower, and nobody cares.

Yes, modern games are so inefficient and laggy nowadays. Once your game world reaches a certain size it becomes unplayable and that's just in single player. Once you add 10 players you start to hit performance limits very quickly.


There were a few systems for the 2600 that used a cassette tape to load larger games than would fit on a ROM cartridge.

I can’t recall the name, but I had the hack and slash adventure game variant. The connector on the custom cartridge was fiddly and required a stout rubber band to reliably work.


I am always amused by comments like this. You have no idea what development practices they follow (neither do I) but it's hilarious to read your tone.

GTA has achieved tremendous success both as an entertaining game and as a business. It's enjoyed by millions of people and generates billions in revenue. As per this article, it has startup problems (which don't seem to actually really hurt the overall product but I agree sound annoying) but the bigger picture is: it's a huge success.

So - Rockstar has nailed it. What exactly is your platform for analyzing/criticizing their processes, or even having a shot at understanding what they are? What have you built that anyone uses? (not saying you haven't, but.. have you been involved with anything remotely close to that scale?)

And if not, whence the high horse?


You can be right in many places and still wrong in some, and enjoy enormous success as a result of all you have done well. That does not mean nobody can criticize you for something that you have clearly done wrong.


Successful people and businesses can be wrong. You are not making a case for why those development practices are okay, but are simply appealing to authority.

I and most other customers would argue that 6 minute loading times are atrocious, and if there is an easy fix like this, it makes me lose a lot of respect for the developer who doesn’t fix it. It maybe would even make me avoid them in the future.

A reputation is built over years, but can be lost pretty much instantly. Companies have to continue serving their customers to enjoy ongoing success.


They didn’t make an appeal to authority but an appeal to commercial success, and they’re right on.

The fact that GTAO is so popular should make most HNers rethink what they know about the commercial necessity of optimization vs building a compelling product.


The loading times were not that long initially, and a slow CPU makes a big difference.

This really only goes to show how much you can get away with if you have an outstandingly popular product that has no direct competition. Chances are your product is not that compelling; if it performs poorly, that will hurt adoption, and it will never become outstandingly popular in the first place.


I don’t need to be successful to have a platform to be outraged. It doesn’t matter that it’s Rockstar, if anything, the fact that they’re so successful and couldn’t be bothered to save so many people literal hours of their lives in loading time makes it worse.


Why is this getting voted down? Are there that many cynical people out there?


They're getting voted down because they're making a ridiculous argument.

They're basically saying that GTAV's massive commercial success should grant it immunity to criticism.


GTA is fine ... but the storytelling is meh. Missions keep repeating, and there's little to draw you in. You drive somewhere, somebody gets whacked, you drive back. Rinse and repeat. The makers try to compensate with shocking and crass violence and humor, but at some point it just feels kind of juvenile.

Maybe it got better in recent releases, I kind of stopped following after GTA4.


Tell that to Valve. The Source engine and all its games (Half Life 1, 2, Portal, Alyx) have horrible load times. They might not be as bad as the GTA example but they're long and extremely frustrating.

And yet, no one cares. Those games (and GTA5) all sold millions of copies.

The only way this stuff gets fixed is if (a) some programmer takes pride in load times or (b) customers stop buying games with slow load times.

(b) never happens. If the game itself is good then people put up with the load times. If the game is bad and it has bad load times they'll point to the load times as a reason it's bad but the truth is it's the game itself that's bad because plenty of popular games have bad load times

Also, any game programmer loading and parsing text at runtime by definition, doesn't care about load times. If you want fast load times you setup your data so you can load it directly into memory, fix a few pointers and then use it where it is. If you have to parse text or even parse binary and move things around then you've already failed.


I think there may sort of be another thing going on: Basically, that the length of load time is an indicator, "This is a really serious program." I've sort of noticed the same thing with test machines: the more expensive the machine, the longer it takes to actually get to the grub prompt.

Six minutes is probably excessive, but having GTA take 1-2 minutes to load almost certainly makes people feel better about the money they spent on the game than if it loaded up in 5 seconds like some low-production 2D adventure game.


> but having GTA take 1-2 minutes to load almost certainly makes people feel better about the money they spent on the game than if it loaded up in 5 seconds like some low-production 2D adventure game.

Given that it has been the most common criticism of the game since it launched, I don't think anyone views it as a sign of quality.


Do you have a link to how one would achieve this data pointer magic? I wouldn't know what to search for.


In the simplest case, in C you can read file data into memory, cast it as a struct, then just use that struct without ever doing any parsing.

As things get more complex you're probably going to need to manually set some pointers after loading blobs of data and casting them.

It's just the standard way of dealing with binary files in C. I'm not sure what you'd need for search terms.


Oh I understand now, thank you.


Just use flatbuffers/capnproto.


Thank you.


It was probably fast 10 years ago when the store had a couple of items; the dev back then never thought it would grow to 60k items. Classic programming right there.

As for profiling, Windows Performance Toolkit is the best available no?


Meh. It's ok to assume a low number of items and code accordingly. What is not ok is for the company to ignore such a problem for years, instead of detecting and fixing it.


Small nitpick: I believe these are items/prices for the in-game currency, not micro-transactions.

You can buy in-game currency for real world money tho: https://gta.fandom.com/wiki/Cash_Cards

Not 100% sure, never bought anything.


So what you are saying is they are in fact microtransactions.


I've worked a number of places where the engineering culture discourages any sort of fishing expeditions at all, and if I weren't so stubborn everything would take 2-4x as long to run as it does. I say engineering culture, because at 2 of these places everyone was frustrated with engineering because the customers wanted something better, but the engineers would point at flat flame charts, shrug, and say there's nothing that can be done.

Bull. Shit.

There's plenty that can be done because there are parts of a process that don't deserve 15% of the overall budget. The fact that they are taking 1/6 of the time like 5 other things is a failure, not a hallmark of success. Finding 30% worth of improvements with this perspective is easy. 50% often just takes work, but post-discovery much of it is straightforward, if tedious.

My peers are lying with charts to get out of doing "grunt work" when there's a new feature they could be implementing. But performance is a feature.


>It is absolutely unbelievable (and unforgivable) that a cash cow such as GTA V has a problem like this present for over 6 years and it turns out to be something so absolutely simple.

Having played the game, it's not surprising to me in the least.

I have never yet encountered another such 'wild-west' online experience.

It's the only game so unmoderated that I've ever played, where the common reaction to meeting a hacker who is interested in griefing you is to call your white-knight hacker friend and ask him to boot the griefer-hacker from the lobby.

Reports do next to nothing -- and 'modders' have some very real power in-game, with most fights between 'modders' ending in one of them being booted to desktop by the other exploiting a CTD bug (which are usually chat text-parser based..)

On top of all this, Rockstar attempts to have an in-game economy , even selling money outright to players in the form of 'Shark Cards' for real-life currency , while 'modders' (hackers) easily dupe gold for anyone that may ask in a public lobby.

This isn't just all coincidence; the game lacks any kind of realistic checks/balances with the server for the sake of latency and interoperability -- but this results in every 13 year old passing around the Cheat Engine structs on game-cheating forums and acting like virtual gods while tossing legitimate players around lobbies like ragdolls -- meanwhile Rockstar continues releasing GTA Online content while ignoring playerbase pleas for supervision.

It's truly unique though -- an online battlefield where one can literally watch battles between the metaphorical white hat and black hat hackers; but it's a definite indicator of a poorly run business when vigilante customers need to replace customer service.

Also, an aside, most 'mod-menus' -- the small applets put together using publicly available memory structs for game exploit -- most all have a 'quick connect' feature that allows hackers to join lobbies much faster than the GTA V client usually allows for. This feature has existed for years and years, and I believe it performs tricks similar to those listed in the article.


>Reports do next to nothing -- and 'modders' have some very real power in-game, with most fights between 'modders' ending in one of them being booted to desktop by the other exploiting a CTD bug (which are usually chat text-parser based..)

Interesting, back in the day the coolest CTD I did was to simply crank up the combo multiplier in S4 League to crash the game client on every player in the room except mine since that game was peer to peer and thus any form of hacking (teleportation, infinite melee range, instant kill, immortality, etc) was possible. The combo multiplier was set to 256 and thus every single particle was duplicated 256 times and this caused the game to crash.


> This online gamemode alone made $1 billion in 2017 alone.

There's the answer right there. They figure it's making $1B/yr, leave it alone. Maintenance? That cuts into the billion. Everyone moved onto the next project.


Or they fix it, see that their "in game time" average drops, and then back it out...


I would not at all be surprised if the long load time made for a sunk cost that kept people playing for longer sessions rather than picking it up for less than half an hour at a time.


I enjoyed GTA online but haven't touched it in well over a year, and the insane loading times are definitely a big reason why. For those who haven't played the game it's important to emphasize that it's usually in the 5 minute range, and even then you'll regularly end up with connectivity issues or other problems that will kick you out of the lobby for yet another ~5 minute load.

When I played it wasn't uncommon to spend 30 minutes mostly looking at the loading screen while you were trying to set up a play session with a couple of friends.

If you're an adult with limited playtime it's just a complete dealbreaker. You can't just decide to have a quick 20 minute play session if you know that you'll have to spend at least half of it looking at loading screens.


I don't know. In GTA Online you encounter loading every 15min.


This is actually very possible - if they allowed people to dip in and dip out, they will. I remember when CP2077 came out on GeForce NOW and the wait times were 30+mins. I'd play a 6 hour session until I was booted off simply to make it worth the wait.


You might be onto something here...


I stopped playing GTAV online a few years back because of the crazy load times, not only that but you have to go through 6+ minute load screens multiple times in many sessions.

This oversight has cost them millions of dollars easy.


If it made over $1b in a year previously, and had such insane load times, it's very plausible this bad coding has cost them north of another $1b.

Probably ranks pretty highly up there in terms of damage to company financials, due to a lack of care.


In my experience most engineers have never used a profiler even once. They write the code, and if you're lucky they get it working correctly.


Let's call them "code technicians" instead of engineers, ok? (that's a euphemism for "code monkeys")


> the reverse engineered version of GTA III and Vice City

Ohhh. Thank you for telling me about this. I just found a mirror and successfully built it for macOS. Runs so much better than the wine version. But I guess I'll never finish that RC helicopter mission anyway lol


“worth their salt” is doing a lot of work here. No true Scotsman fallacy?

I think you might be surprised by how few programmers even know what a profiler is, let alone how to run one.


That seems like a misapplication of the fallacy. If we assume 'worth their salt' is a synonym for 'good', then saying any good developer can operate a profiler is entirely reasonable.


I used to play this game a lot on PS4. I actually dropped it due to the ridiculous loading times... I still assumed it was doing something useful though. I can't believe they wasted so much of my time and electricity because of this. Even cheaply-made mobile games don't have bugs like this.

> their parent company unjustly DMCA'd re3

Wow, this is EA games level scumbaggery... I don't think I'm gonna buy games from them again.


> salty because their parent company unjustly DMCA'd re3

Unjustly, but legally. The people you should be salty at are the lawmakers.


Why can't I be salty at both? People have responsibility for their actions even if those actions are legal.


That still remains to be seen. A DMCA is not a court order. Anyone can file one and take a repository offline for two weeks.


Yes but the code was clearly derived directly from a decompiled binary; not ‘clean room’ reverse engineering. Hence, illegal, regardless of whether a dmca takedown notice is filed.


Deriving something from a decompiled binary isn't illegal in itself.


That still leaves it as a derivative work, which is protected from copying and distribution by copyright.


That doesn't make it illegal, there are plenty of jurisdictions where non-clean-room reverse engineering is perfectly legal.


But distributing it without a license from the copyright holder is.


Distributing what, exactly? It required you already owned a copy of the game to install/build it.


Distributing whatever was in the repo. Requiring a copy of the game doesn't magically make it legal. A modded game is a derivative work.


> obfuscated executable loaded with anti-cheat measures

I'm impressed that gamecopyworld.com is still online, updated, and has the same UI that it did in 2003


Whoa, what a (refreshing) blast from the past.


Also: Rockstar being too cheap to implement anti-cheat on the by far most successful online shooter on the planet.

Also

> I don’t think there’s any easier way out.

lmfao


This may be obvious, but is GTAV the most successful online shooter on the planet? (Never played it)


I don't think it's obvious. In terms of player count (which is how I would personally rank success), it is not the most successful by a long way: https://en.wikipedia.org/wiki/List_of_most-played_video_game...

However, games like PUBG and Fortnite are free-to-play (PUBG is only free on mobile?) so in terms of actual sales, you could say GTA is more successful. Still not sure I'd class it as an "online shooter", though.


GTA is many things for many different people. For some people, it's a racing game, for others it is about socializing and modding cars, for some it's a dogfighting game, some spend thousands of hours grinding the same set of PvE missions (for some reason), and for many it's a third and first person shooter / warzone simulator.


I totally agree, the same way many games are. But in the context of saying "x is by far most successful y on the planet" I think we have to stop somewhere. Otherwise an open game like GTA V becomes the most successful game (in terms of copies sold) in pretty much every category you can't reasonably stick Minecraft in. I don't think there's much value in that.


It's the second best selling game of all time[1], and because Minecraft doesn't have guns, I suppose that would qualify it as the most successful online shooter (even if I think another genre would be more applicable).

[1]https://en.wikipedia.org/wiki/List_of_best-selling_video_gam...


Well but that's a bit like saying that since there are technically races in GTA it's also the most successful racing game. It will also be the most successful flying game and the most successful sailing simulator....etc etc.

I don't know, even though there is shooting in GTA I don't think I'd call it a shooter.


I agree that there is probably some other genre that would describe it better, but I think it wouldn't be unfair by any means to describe GTA as a third-person shooter.


That list is incomplete. CSGO is the most played game on Steam and is now free to play. I would not be surprised in the slightest if more people own CSGO.


Interesting. Steamspy no longer works, but its last report in 2016 said it had 25 million sales (at the time it was roughly on par with Minecraft sales). Since then, the concurrent player count has more than doubled, but it's very difficult to get information about sales.


Wikipedia has it with 46 million owners on Steam, with the following footnote:

> Not official; the numbers were estimated following a Steam API data leak in July 2018, which gave player ownership estimates for games on the platform that have achievements.


Not very surprising. Twitter doesn't work properly on my desktop, google freezes when showing the cookiewall, github freezes my phone. These are all important projects of billion dollar companies.


> It is absolutely unbelievable [...] that a cash cow [...] has a problem like this

Likely it wasn't fixed precisely because it's such a cash cow. "It's making money, don't fuck with it".


Maybe long load times are advantageous? Because it creates longer user sessions on average? If you devote 10 minutes to loading the game you will probably want to play for at least 30 minutes.


Wouldn't the most impatient customers be more likely to pay for items, rather than earn them in game?


Does anyone have a link to a copy of re3? Iirc, there was a gitrepo that kept a copy of all DMCA'd repos


try the hacker news search (bottom of the page) and you'll find stories on the takedown where there are links to backups posted in the comments.


I am not saying it is the case, nor do I understand the details of the solution in depth enough to comment on it, but by analogy this reads to me like yelling at the person who figured out the steps to solve a Rubik's Cube, because once the steps are known the solution is simple.


No, other people have pointed this out, this should have been very easy to recognize as inefficient in the source code. More likely the code was written hastily and only tested against very small inputs, and then nobody ever actually tried to improve the famously long load times.


The sscanf issue was not obvious: it looks linear. And should be, on a better sscanf implementation.

The duplicate checking on the other hand is a classic "accidentally quadratic" case that is obvious.


> This online gamemode alone made $1 billion in 2017 alone.

which of course goes to show that at least from a business side, this issue is completely inconsequential and all resources should be used to push for more monetization (and thus adding to the problem by adding more items to the JSON file) rather than fixing this issue, because, clearly, people don't seem to mind 6 minutes loading time.

I'm being snarky here, yes, but honestly: once you make $1 billion per year with that issue present, do you really think this issue matters at all in reality? Do you think they could make $1+n billion a year with this fixed?


The bigger the scale, the bigger a few percentage points of improvement would be worth. I would generally think if you're at 1bn in revenue you should devote 1%+ of your workforce towards finding low hanging fruit like this. If 1% of employees deployed to find issues that, when fixed, yield a 2% improvement in revenue, that's likely a winning scenario.


This is losing them money. If they fixed the issue they absolutely would get $1+n billion instead of just $1 billion and that n alone is big enough to pay multiple years worth of 6 digit salaries just to fix this single bug.


I work in a large multi-billion-dollar company and we have people staring at a slow problem for a decade before a noob comes along with a profiler and finds they iterate over every key of a Map instead of calling get, or issue 1 million db queries on a GUI startup...

Not surprised they didn't bother for 6 minutes when it takes us 10 years to fix a 30 minute locked startup.


I find it absolutely believable that a for-profit company does not prioritize fixing a game that is already a cash cow anyway.


> It is absolutely unbelievable (and unforgivable) that a cash cow such as GTA V has a problem like this present for over 6 years

Agree. I found the slow behavior of sscanf while writing one of my first C programs during an internship^^ You literally just have to google "scanf slow" and find lots of information.


It could be that when GTA Online first launched, the list-instead-of-hashmap wasn't too much of an issue due to the limited catalog, but it got progressively worse as the inventory grew.

Ofc this is just a hypothesis, but I see the hesitation to change legacy code if it ain't broken as a widespread mentality.


> I see the hesitation to change legacy code if it ain't broken as a wide spread mentality.

Load times measured in double digit minutes on a significant number of machines meets absolutely every reasonable definition of "broken".


Maybe it was outsourced. I don't understand how a team could make such an excellent game and fail to resolve such a simple bottleneck.


Well this sprint we didn't release any new features but we reduced the load.... Dammit hackernews!


What is complicated about it is that a modern online 3D game is huge, and there are 95,000 places where a dumb mistake could hurt performance a lot for some customers. You catch 94,999 of them and then "unforgivable"


If it was that way for a few months and then fixed... still pretty shoddy but sure. However, it has been that way for YEARS, and is one of the most common complaints among players. I wonder how much of the remaining loading time could actually be shaved off if someone with the source code took a crack at it.


> It is absolutely unbelievable (and unforgivable) that a cash cow such as GTA V has a problem like this present for over 6 years and it turns out to be something so absolutely simple.

It is both believable and - by virtue of the fact that, as you said, the series continues to be a cash cow - is apparently forgivable.

Here's the thing: the company has zero reasons to fix this, or other ostensibly egregious affronts like DRM, because gamers keep buying the product. There is literally no economic incentive to 'fix' it.


How many players have they lost to excessive load times?


> How many players have they lost to excessive load times?

Judging by the number of successful sequels the franchise has spawned, the answer is 'an insignificant number'.

The fact of the matter, and the point my comment was trying to make, is that the overwhelming majority of players do not sufficiently care about load times or other complaints to deter them from paying for the game. That is the reality of the situation.


That doesn't make sense: the fact that the game still managed to be profitable doesn't show that the loss was insignificant. Those loading times are really annoying and detract from the game, and although I can't quantify the loss either, I'd be very surprised if it was insignificant.

I'd say so especially since the people most likely to be affected by these issues are working adults with limited playtime, who won't want to sit in front of their monitor for 5 minutes waiting for the game to load, and who also happen to have disposable income to pour into a game.


I've lost interest in a lot of online games because the clients always do their mandatory updates on launch, which is exactly the time you want to play the game.

(Same thing for websites - they show you all the annoying popup signup sheets/survey questions the instant you load the page.)


For sure. I fit my gaming into about 2 hours a week. If it takes 6 minutes to load there's no way I'm going to play it. I know I'm not the target market but I'm still a missed opportunity.


Attitudes like yours are why gamedevs keep to themselves.

"Unbelievable" and "unforgivable" eh? It's a greedy attitude. Instead of viewing GTA5 as a success that's brought a lot of people happiness, you view it as a money cow designed to extract every last bit of profit – and time, since this bug caused 70% longer loading times.

Perhaps it's both. But you, sitting here behind a keyboard with (correct me if I'm wrong) no gamedev experience, have no idea what it's like on a triple-A gamedev team with various priorities. The fact that the game works at all is a minor miracle, given the sheer complexity of the entire codebase.

The fact that someone was able to optimize the obfuscated executable is a wonderful thing. But they weren't a part of the team that shipped GTA 5. If they were, they certainly wouldn't have been able to spend their time on this.


This kind of excuse-making is one of the reasons I got out of software development. It's not just gamedev. Priorities are way out of whack when you have time to put in binary obfuscation, but no time to fix such a huge performance bottleneck. The idea that "it's a miracle software works at all" demonstrates the chronic prioritization and project management competence problem in the industry.

It’s ok to recognize a thing as a business success but a technical failure. In fact many software projects are business successes despite awful and unforgivable quality compromises. You don’t get to whitewash it just because the thing prints money.


How do we then address chronic incompetence? Never complain about it?

This is not small. This kind of incompetence, if employed in a different sector such as security, would lead to losing the personal data of millions.

> “it’s a miracle software works at all”

This is not the case here. Please re-evaluate your calibration on this topic.


1. you replied to the wrong person.

2. this kind of incompetence exists in all other sectors. That's why pentests are so crucial, and why they guard the security of millions.

3. we'll have to agree to disagree that it's a minor miracle. Having seen the complexity firsthand, it's quite amazing.


Agreed that this kind of incompetence exists in all sectors, and I think if we don't talk about it, it becomes acceptable. We're not trying to blame a single developer; that'd be inappropriate. But the management and QA culture in a AAA game studio that rakes in billions ought to be better.

The complexity is in reverse engineering the binary. The developer has access to the full source code and profiling tools, I presume.

Another one is in Microsoft Flight Simulator: instead of downloading multiple archives in parallel, it downloads one, unzips it using a single CPU core, and only then downloads the next one. MSFS 2020 takes a few hours to install, and that's not just because of the internet connection, but because of this shitty installation code.


If loading times were prioritized, features would be cut. Which features would you cut out of the game in order to have fast loading times?

This is what you'd need to decide. And then afterwards, it might not print as much money as you think it will.

It's easy looking at it from the outside. Not so easy from the inside.


Did you read the article? Zero features needed to be cut, this is a 30 minute fix.


Building the engine alone takes 30 minutes after each change. You're not going to get anything done in 30 minutes. And the more you work on this, the less you're working on shippable features that make money.


In the mobile space, an app opening faster is so valuable you can make a career out of just that. I haven't worked desktop/console games, but having load times be 6 minutes longer than it needs to be, when you're trying to make money on ongoing microtransactions, has got to be losing you so much more money than the time spent fixing.


That's a fine argument, and I'm sure that if management knew they could get a 70% decrease in load times in exchange for focusing on this one area, they would have done so. But nobody knew. And discovering that would have been expensive.

I'll meet you halfway though: they should have had profiling sessions that pointed to the JSON parsing code as the issue. I imagine that all of their profiling efforts were focused on the runtime performance, not the load time. Simply put, no one did that profiling, and I don't fault them for focusing on runtime performance (which is where the real money is, as Cyberpunk 2077 demonstrated by not having it, and subsequently having their PS4 orders yanked and refunded).


You do realize Cyberpunk 2077 shipped running better on base PS4 than "critically acclaimed" Control by Remedy?


Cyberpunk 2077 ran so horribly on PS4 that Sony had to refund everyone who purchased it.


No, Sony pulled Cyberpunk 2077 after CD Projekt promised refunds and then forced Sony to handle them on its end. Sony didn't like that.

Once again: Control by Remedy ran on the base PS4 at 10 frames per second, TEN frames. Critically acclaimed, reviewers loved it and didn't tend to mention TEN frames per second on base consoles, and it was not pulled from the store.


This fix would bring in millions of dollars on its own. When your customer base is earning you $1 billion per year, a 1% improvement is $10 million per year. You could hire an entire team of 100 engineers for that just to fix this single bug.


> Which features would you cut out of the game in order to have fast loading times?

If this is a serious question, I'd say cut any of the new vehicles introduced in the last 2 years. None of them are nearly as impactful as this optimization. In fact, I have trouble imagining any individual feature that's as important as this fix.


I’d have to have been there, seen the list of features with eng estimates and trade offs, but yes I would have happily made the call to chuck one of them if it meant a measurably higher quality product, like this massive load time improvement. Hell, that zoom-out-zoom-in animation when you switch characters probably took as much time to code as it would have to fix this bug. I think anyone with good product sense and the balls to make a priority call that might get someone upset could make the right call here.


But after it's already made record-breaking profits and is a huge cash cow with recurring revenue. You could just say "hey, I'll hire one single contract developer to do these kinds of quality of life things" and make a fraction less profit.


> make a fraction less profit

Unfortunately this part often kills quality initiatives. Why fix bugs for your existing customers when you can deploy the engineering resources on a DLC or a sequel which will milk those customers for more? There is no more craftsmanship or pride in good work left in software.

"When you're a carpenter making a beautiful chest of drawers, you're not going to use a piece of plywood on the back, even though it faces the wall and nobody will see it. You'll know it's there, so you're going to use a beautiful piece of wood on the back. For you to sleep well at night, the aesthetic, the quality, has to be carried all the way through." -Steve Jobs


GTA-5 broke even within 48 hours of its release. Nearly a decade later, it still costs $60 for a digital copy with (practically) zero distribution costs. It has made over $6Bn in revenue, and is said to be the most profitable entertainment product of all time.

How much would it have cost to fix this issue?

Is anyone saying that it is a game developer's fault? I mean, what is it that you think would prevent a game developer from fixing this?

Because I think, anyone even vaguely familiar with the software industry in general is going to come up with answers like:

1. It would not cost very much.

2. No, it isn't a developer's fault; it's clear that even an intern could fix this.

3. Management isn't interested, is too disorganized, or is focused on cynical extraction of every last bit of profit.

And from that perspective, it certainly does make it seem like a cynical cash cow.

I don't know many game developers, but I do know people in other parts of the software industry and professionals in general. And I think that they keep to themselves because they have first-hand experience of how the industry works and understand it better than anyone. They probably sympathise with the public's right to feel ripped off.

That said, I still paid for the game, I think it's fun. Apparently there is "no alternative" to this state of affairs.


> GTA-5 broke even within 48 hours of it's release. Nearly a decade later, it still costs $60 for a digital copy with (practically) zero distribution costs

Well, that's the nominal full retail price against which the various discounts are measured, sure, but I doubt that's what most people buying it these days pay except if they are getting something else with it. I'm pretty sure it's in the stack of things I've gotten free this year on Epic that's in my “I might check it out sometime” queue, it's $29.98 right now from Humble Bundle, etc.


> Nearly a decade later, it still costs $60 for a digital copy

It's actually only $30 and frequently goes on sale for $15. It hasn't been $60 (on Steam at least) since June 2018.


I don’t think the OP was specifically calling out any game devs. Any engineer who has worked on any software project knows that you usually can only do as well as your deadlines and management's priorities allow for. Unless the stars line up and you have the resources and autonomy to fix the issue yourself, and love working for free on your own time.


Unfortunately, they were calling out gamedevs:

Tweaking two functions to go from a load time of 6 minutes to less than two minutes is something any developer worth their salt should be able to do in a codebase like this equipped with a good profiler.

But, I fully agree with your assessment, for what it's worth.


I read that to mean "If a manager would say to one of his developers, who are probably all worth their salt, take a profiler and go figure out why our load times suck, then this would be fixed."

But a manager shouldn't even have to do that, because in a well-functioning team, if the dev leads come back with such a fix, they won't get punished for going off the reservation and would probably be doing this of their own initiative.

But if things like that get met with "why have you wasted time on this? we gave you the list of priorities and it does not include load times" then dev leads will make sure all developers time is filled with things which are prioritized.

Edit: grammar


I did mean to reply to you (72 days ago, plus in another thread). I just can't make a Twitter account; I've tried and it doesn't like my IP or email addresses or something, not sure.


Hmm, that's unfortunate. Well, I hope you find a way to get on Twitter. Your thoughts are always welcome, and there are a lot of interesting people in the ML scene there.

