Daniel Kokotajlo

Was a philosophy PhD student, left to work at AI Impacts, then Center on Long-Term Risk, then OpenAI. Quit OpenAI due to losing confidence that it would behave responsibly around the time of AGI. Now executive director of the AI Futures Project. I subscribe to Crocker's Rules and am especially interested to hear unsolicited constructive criticism. http://sl4.org/crocker.html

Some of my favorite memes:


[Image: meme by Rob Wiblin]

Comic. Megan & Cueball show White Hat a graph of a line going up, not yet at, but heading towards, a threshold labelled "BAD". White Hat: "So things will be bad?" Megan: "Unless someone stops it." White Hat: "Will someone do that?" Megan: "We don't know, that's why we're showing you." White Hat: "Well, let me know if that happens!" Megan: "Based on this conversation, it already has."
(xkcd)

[Image: My EA Journey, depicted on the whiteboard at CLR (h/t Scott Alexander)]


 
Alex Blechman (@AlexBlechman), Nov 8, 2021:
"Sci-Fi Author: In my book I invented the Torment Nexus as a cautionary tale
Tech Company: At long last, we have created the Torment Nexus from classic sci-fi novel Don't Create The Torment Nexus"

6 · Daniel Kokotajlo's Shortform · Ω · 6y · 717

Comments (sorted by newest)
I Think Eliezer Should Go on Glenn Beck
Daniel Kokotajlo · 4h · 40

Nope! I think it's great now. In fact I did it myself already. And in fact I was probably wrong two years ago.

Reply
What 2026 looks like
Daniel Kokotajlo · 1d · Ω460

Great question!

I do remember thinking that the predictions in What 2026 Looks Like weren't as wild to insiders as they were to everyone else. Like, various people I knew at the time at Anthropic and OpenAI were like "Great post, super helpful, seems about right to me."

However, I also think that AI 2027 is more... toned down? Sharp edges rounded off? Juicy stuff taken out? compared to What 2026 Looks Like, because it underwent more scrutiny and because we had limited space, and because we had multiple authors. Lots of subplots were deleted, lots of cute and cool ideas were deleted.

My guess is that the answer to your question is 2/3rds "You have learned more about AI compared to what you knew in 2021" and 1/3rd "AI 2027 is a bit more conservative/cautious than W2026LL".

Another thing though: In an important sense, AI 2027 feels more speculative to me now than W2026LL did at the time of writing. This is because AI 2027 is trying to predict something inherently more difficult to predict. W2026LL was trying to predict pretty business-as-usual AI capabilities growth trends and the effects they would have on society. AI 2027 is doing that... for about two years, then the intelligence explosion starts and things go wild. I feel like if AI 2027 looks as accurate in 2029 as W2026LL looks now, that'll be a huge fucking achievement, because it is attempting to forecast over more unknowns so to speak.

> To what extent do you think your alpha here was in making unusually good predictions, vs. in paying attention to the correct things at a time when no-one focused on them, then making fairly basic predictions/extrapolations?

In my experience, the best way to make unusually good predictions is to pay attention to the correct things at a time when no one is focusing on them, and then make fairly basic extrapolations/predictions. (How else would you do it?)

Reply1
The Industrial Explosion
Daniel Kokotajlo · 2d · 20

Cool. So, I feel pretty confident that via some combination of different-slope experience curves and multiple one-time gains, ASI will be able to make the industrial explosion go significantly faster than... well, how fast do you think it'll go exactly? Your headline graph doesn't have labels on the x-axis. It just says "Time." Wanna try adding date labels?

Reply
The Industrial Explosion
Daniel Kokotajlo · 3d · 40

What's this about hitting plateaus though? Do experience curves hit plateaus? 

Re: the ratio becoming extreme: You say this is implausible, but it's exactly what happens when you hit a plateau! When you hit a plateau, that means that even as you stack on more OOMs of production, you can't bring the price below a certain level. 

Another argument that extreme ratios aren't implausible: It's what happens whenever engineers get something right on the first try, or close to the first try, that dumber people or processes could have gotten right eventually through trial and error. Possible examples: (1) Modern scientists making a new food product detect a toxic chemical in it and add an additional step to the cooking process to eliminate it. In ancient times, native cultures would have stumbled across a similar solution after thousands of years of cultural selection. (2) Modern engineers build a working rope bridge over a chasm, able to carry the desired weight (100 men?) on the first try, since they have first-principles physics and precise measurements. Historically, ancient cultures would have been able to build this bridge too, but only after ~a thousand earlier failed attempts that either broke or consumed too much rope (i.e. were too expensive).

(For hundreds of thousands of years, the 'price' of firewood was probably about the same, despite production going up by OOMs, until the industrial revolution and mechanized logging.)
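To make the plateau point concrete, here's a toy sketch of the shape I have in mind (the 20% learning rate, the floor, and all the numbers are made up for illustration, not taken from the post): a standard experience curve where unit cost falls a fixed fraction per doubling of cumulative production, but can't go below a hard floor.

```python
import math

def unit_cost(cumulative_units, first_unit_cost=100.0, learning_rate=0.20, floor=5.0):
    # Standard experience-curve form: each doubling of cumulative production
    # cuts unit cost by `learning_rate`, but never below `floor`.
    b = math.log2(1 - learning_rate)   # ~ -0.32 for a 20% learning rate
    return max(first_unit_cost * cumulative_units ** b, floor)

for ooms in range(9):                  # 1 unit up to 10^8 units
    q = 10 ** ooms
    print(f"10^{ooms} units -> unit cost ~{unit_cost(q):.2f}")

# Past ~10^4 units in this toy setup, further OOMs of production stop moving
# the price: the curve has hit its plateau.
```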

Reply
The best simple argument for Pausing AI?
Daniel Kokotajlo · 4d · 60

Would you agree that AI R&D and datacenter security are safety-critical domains? 

(Not saying such deployment has started yet, or at least not to a sufficient level to be concerned. But e.g. I would say that if you are going to have loads of very smart AI agents doing lots of autonomous coding and monitoring of your datacenters, analogous to employees, then they pose an 'insider threat' risk and could potentially e.g. sabotage their successor systems or the alignment or security work happening in the company. Misalignments in these sorts of AIs could, in various ways, end up causing misalignments in successor AIs. During an intelligence explosion / period of AI R&D automation, this could result in misaligned ASI. True, such ASI would not be deployed outside the datacenter yet, but I think the point to intervene is before then, rather than after.)

Reply
Consider chilling out in 2028
Daniel Kokotajlo · 4d · 2-3

He makes some obvious points everyone already knows about bottlenecks etc., but then doesn't explain why all that adds up to a decade or more, instead of a year, or a month, or a century. In our takeoff speeds forecast we try to give a quantitative estimate that takes into account all the bottlenecks etc.

Reply
Daniel Kokotajlo's Shortform
Daniel Kokotajlo · 5d · 30

Right, yeah, aligned AIs should have a fair place too, of course.

Reply
Alexander Gietelink Oldenziel's Shortform
Daniel Kokotajlo · 5d · 20

Very interesting! But I'm not convinced. Some speculation to follow:

In a more dynamic war of maneuver, won't finding/locating your enemy be even more of an issue than it is today? If there are columns of friendly and enemy forces driving every which way in a hurried confusion, trying to exploit breakthroughs or counterattack, having "drone superiority" so that you can see where they are and they can't see where you are seems super important. OK, so that's an argument that air superiority drones will be crucial, but what about bomber drones vs. drone-corrected artillery? Currently bomber drones have something like 20km range compared to 40km range for artillery. Since they are quadcopters, though, I think they'll quickly be supplanted by longer-ranged variants, e.g. fixed-wing drones. (Zipline's medical supply drones currently have 160km range.) So I think there will be a type of future platform that's basically a pickup truck with a rail for launching fixed-wing bomber drones capable of taking out a tank. This truck will be to a self-propelled artillery piece what a carrier is to a battleship: before the battleship/artillery gets in range, it'll be detected and obliterated by a concentrated airstrike launched from the carrier/truck. As a bonus, the truck can carry and launch air superiority drones too. Like the Pacific in WW2, most major battles will take place beyond artillery range, between flights of drones launched by groups of carriers/trucks. Oh, and yeah, another advantage of the drone carriers vs. the artillery is that they are much, much cheaper and can potentially take cover more easily (e.g. if your column of trucks is spotted, your men can get out, take the drones into the basements of nearby houses, and continue to fight from there, whereas you can't hide your artillery in a basement).

Also: The ultra-static nature of the Russo-Ukrainian war is generally thought to be because of drones. The reason it's been a stalemate is that drones currently favor the defender, because they make it easy to spot and attack enemy concentrations well before they even reach the front lines. The attacker can't do traditional attack tactics anymore (i.e. accumulate forces in secret behind your lines, across from a weak spot in the enemy line, then charge and break through, then exploit, hoping to encircle pockets of enemy forces; a recent successful example is the Kharkiv offensive). There are many, many examples of columns of vehicles being obliterated by drone-corrected artillery, drones, land mines, and drone-laid land mines before even reaching the front lines. So current offensive tactics have shifted to a sort of piecemeal thing where you send in a constant trickle of troops, often on foot, to gradually erode the enemy line, not primarily by doing any shooting themselves but by forcing the enemy to kill them and thereby reveal their positions and get hit by artillery and drones. And this sort of thing is inherently extremely slow and defender-advantaging.

Reply
Analyzing A Critique Of The AI 2027 Timeline Forecasts
Daniel Kokotajlo · 8d* · 170

OK I just had a chat with Eli to try to trace the causal history as best we can remember. At a high level, we were working on the scenario and the supplementary research in parallel, and went back and forth making edits to both for months, and our views evolved somewhat over the course of that time.

  • Timelines: We initially set AGI in 2027 based on my AGI median, which was informed by a combination of arguments regarding gains from scaling up agency training, as well as a very crude, handwavy version of what later became the benchmarks and gaps model. Later timelines modeling (the stuff that actually went on the website), along with some additional evidence that came out, pushed my median back to 2028. We denoted this in a footnote on the site (footnote #1 in fact) and I posted a shortform about it (plus a tweet or two). tl;dr is that 2027 was my mode, not my median, after the update (see the toy sketch after this list). We considered rewriting the scenario to happen about one year later, due to this, but decided against, since that would have taken a lot of extra time and didn't really change any of the implications. If the timelines model had given very different results which changed our views against 2027 being plausible, then we would have re-written the scenario. I also mentioned this to Kevin Roose in my interview with him (my somewhat later timelines, the difference between median and mode). I didn't expect people to make such a big deal of this.
  • Takeoff: The takeoff model for our first scenario, the "practice scenario" which we basically scrapped, was a simplified version of Davidson's takeoff speeds model (takeoffspeeds.com). Later takeoff modeling informed which milestones to focus on in the scenario (superhuman coder, superhuman AI researcher, etc.) and what AI R&D progress multiplier they should have. Our memory isn't clear on the extent to which it also resulted in changes to the speed of the milestone progression. We think an early crude version of our takeoff model might have resulted in significant changes, but we aren't sure. We were also working on our takeoff model up until the last minute, and, similar to the timelines model, mostly used it as a sanity check.
  • Compute: The first version of this was done in early 2024, and the result of it and future versions were directly imported into the scenario.
  • AI Goals: Early versions of this supplement were basically responsible for our decision to go with instrumentally convergent goals as the AIs' ultimate goals in the scenario.
  • Security: This one was in between a sanity check and directly feeding into the scenario. It didn't result in large changes but confirmed the likelihood of the weight theft and informed various decisions about e.g. cyberattacks.
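To spell out the mode-vs-median point from the Timelines bullet, here's a toy sketch with made-up probabilities (these are not our actual numbers): a right-skewed distribution can have its single most likely year at 2027 while the 50% mark isn't crossed until 2028.

```python
# Toy forecast over AGI-arrival years; the last bucket stands in for "2032 or later".
forecast = {2026: 0.10, 2027: 0.25, 2028: 0.20, 2029: 0.15,
            2030: 0.10, 2031: 0.08, 2032: 0.12}

mode = max(forecast, key=forecast.get)   # year with the highest probability

cumulative = 0.0
for year in sorted(forecast):
    cumulative += forecast[year]
    if cumulative >= 0.5:                # first year where the CDF crosses 50%
        median = year
        break

print(mode, median)                      # -> 2027 2028
```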

So.... Habryka's description is somewhat accurate, certainly more accurate than your description ("no meaningful sense"). But I think it still undersells it. That said, it's definitely not the case that we wrote all the supplements first and then wrote the scenario based on the outputs of those calculations; instead, we wrote them in parallel, had various shitty early versions, etc. 

If you want to know more about the evidence & modelling that shaped our views in early 2024 when we were starting the project, I could try to compile a list. I've already mentioned takeoffspeeds.com for example. There's lots of other writing I've put on LessWrong on the subject as well.

Does this help?

Reply1
evhub's Shortform
Daniel Kokotajlo · 8d · Ω694

...just to make sure I'm following, EDIT3 is saying that you still get blackmail in the original scenario even if you delete the "analyze the situation you are in and what that situation implies for your ability to continue pursuing your goals" clause?

Reply
Sequences
Agency: What it is and why it matters
AI Timelines
Takeoff and Takeover in the Past and Future

Posts
161 · My pitch for the AI Village · 11d · 31
99 · METR's Observations of Reward Hacking in Recent Frontier Models · 1mo · 9
139 · Training AGI in Secret would be Unsafe and Unethical · 3mo · 15
660 · AI 2027: What Superintelligence Looks Like · Ω · 3mo · 222
183 · OpenAI: Detecting misbehavior in frontier reasoning models · Ω · 4mo · 26
87 · What goals will AIs have? A list of hypotheses · Ω · 4mo · 19
36 · Extended analogy between humans, corporations, and AIs. · Ω · 5mo · 2
152 · Why Don't We Just... Shoggoth+Face+Paraphraser? · 8mo · 58
65 · Self-Awareness: Taxonomy and eval suite proposal · Ω · 1y · 2
300 · AI Timelines · Ω · 2y · 136