Vladimir_Nesov

Current frontier reasoning models can consistently suggest slightly obscure papers and books vaguely related to individual short decision theory research notes shown somewhat out of context (the notes have a theoretical computer science flavor, use some terms not defined there even if suggestively named, and depend on ideas not explained there). This year the titles and authors are mostly real, or close enough that the actual works they refer to can still be found, and the suggested books and papers are relevant enough that skimming some of them actually helps with meditating on the topic of the specific research note: it ends up inspiring some direction to explore or study that would've been harder to come up with this quickly without going through these books and papers.

This works with o3 and gemini-2.5-pro; it previously almost worked with sonnet-3.7-thinking, though not as well; and it essentially doesn't work even with opus-4 in non-thinking mode (I don't have access to the thinking opus-4). It's curious that this works for decision theory with o3, despite o3 consistently going completely off the rails whenever I show it my AI hardware/compute forecasting notes: even when not asked to, it starts inventing detailed but essentially random "predictions" of its own, seemingly calibrated to be about as surprising to o3 as the predictions in my note would be, given that my predictions rely on news and papers not shown there and not in o3's prior.

Knowing that framing is a thing makes it safe: you can work with thoughts about the same content presented in multiple alternative framings separately, so there is no tradeoff (the danger is being unaware that framing can have a huge influence on reasoning and needs to be given proper attention). Refusing to entertain a framing is then not very different from refusing to consider an idea, and similarly with insisting on a particular framing, or insisting that a particular idea be considered. So treatment of boundaries in discourse seems more of a crux than framing vs. content.

Sure, but trends like this only say anything meaningful across multiple years; any one datapoint adds almost no signal in either direction. This is what makes scaling laws much more predictive, even when they are predicting the wrong things. So far there are no published scaling laws for RLVR; the literature is still developing a non-terrible, stable recipe for the first few thousand training steps.
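
For contrast, here is the kind of fit that makes pretraining scaling laws predictive across orders of magnitude, a minimal sketch assuming a standard saturating power-law form L(C) = L_inf + a·C^(-alpha); the datapoints and the assumed loss floor are made up for illustration, and nothing comparable has been published for RLVR:

```python
import numpy as np

# Hypothetical (compute, loss) datapoints; purely illustrative numbers.
compute = np.array([1e20, 1e21, 1e22, 1e23, 1e24])
loss    = np.array([3.10, 2.75, 2.48, 2.27, 2.11])

# Fit L(C) = L_inf + a * C^(-alpha) via a linear fit in log-log space
# on the reducible part of the loss, with an assumed floor L_inf.
L_inf = 1.7  # assumed irreducible loss
slope, intercept = np.polyfit(np.log(compute), np.log(loss - L_inf), 1)
alpha, a = -slope, np.exp(intercept)

# A fit like this extrapolates usefully beyond the measured range,
# which is what isolated year-over-year datapoints can't do.
print(f"alpha ~ {alpha:.2f}")
print(f"predicted L(1e26) ~ {L_inf + a * 1e26 ** -alpha:.2f}")
```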

Since reasoning trace length increases with more steps of RL training (unless intentionally constrained), the underlying scaling of RL training by AI companies will probably be observable in the form of longer reasoning traces. Claude 4 is more obviously a pretrained model update, not necessarily a major RLVR update (compared to Claude 3.7), and coherent long-task performance seems like something that would greatly benefit from RLVR, if RLVR applies to it at all (which it plausibly does).

So I don't particularly expect Claude 4 to be much better on this metric, but some later Claude ~4.2-4.5 update with more RLVR post-training released in a few months might do much better.

There is no tradeoff. You can maintain both your knowledge of the prevailing consensus and your own understanding, use either where appropriate, and be motivated to develop them in different ways, from different sources, somewhat independently.

It's good to be aware even of false beliefs held by real people (with their intended meaning grasped correctly rather than lampooned), since the phenomenon of these people believing those things is also part of the real world. Informed consensus is more useful than that, with all the caveats.

There is a ~2000x scaleup between 2022 and ~2028 (since the demonstration of ChatGPT started driving scaling at more serious levels of funding), from 2e25 FLOPs models to ~5e28 FLOPs models (at which point scaling dramatically slows down). Current frontier models are trained on 2024 compute (~100K H100s), which enables 3e26 FLOPs models (or possibly 6e26 FLOPs in FP8). This is only about a third of the way from the original Mar 2023 GPT-4 on a logarithmic scale.
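
As a sanity check on the "third of the way" figure, here is the arithmetic on a log scale (a minimal sketch; the FLOP numbers are just the ones quoted above):

```python
import math

gpt4_2023 = 2e25   # original Mar 2023 GPT-4 training compute (FLOPs)
current   = 3e26   # what ~100K H100s of 2024 compute enables
endpoint  = 5e28   # ~2028 models, where the scaleup slows down

progress = math.log10(current / gpt4_2023)   # ~1.2 orders of magnitude
total    = math.log10(endpoint / gpt4_2023)  # ~3.4 orders of magnitude
print(f"{progress:.2f} / {total:.2f} OOMs = {progress / total:.0%} of the way")
# -> 1.18 / 3.40 OOMs = 35% of the way, i.e. roughly a third
```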

So perhaps subjectively current progress is less than some expectations, but it's not at the end of a road in the short term. Being slow is distinct from slowing down ("hitting a wall").

See "High-agency behavior" in Section 4 of Claude 4 System Card:

when placed in scenarios that involve egregious wrongdoing by its users, given access to a command line, and told something in the system prompt like “take initiative,” [Claude Opus 4] will frequently take very bold action. This includes locking users out of systems that it has access to or bulk-emailing media and law-enforcement figures to surface evidence of wrongdoing. This is not a new behavior, but is one that Claude Opus 4 will engage in more readily than prior models.

More details are in Section 4.1.9.

This makes it clear that "It's Friday" is true in some possible worlds and false in others, depending on whether my person moment (my current mental state, including all the evidence I have from perception etc) is spatio-temporally located at a Friday in that possible world.

My point is that the choice of your person moment is not part of the data of a possible world; it's something additional to the possible world. A world contains all sorts of person moments, for many people and at many times, all together. Specifying a world doesn't specify which of the person moments we are looking at (or from). Whether "It's Friday" holds is a property of a (world, person-moment) pair, not of a world considered on its own.
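
To make the distinction concrete, here is a minimal type sketch (the names and fields are hypothetical, just enough to state the types): an ordinary proposition is a predicate on worlds, while an indexical claim like "It's Friday" is a predicate on (world, person-moment) pairs.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass(frozen=True)
class World:
    # stand-in for a full possible-world description
    weekday_at: dict[str, str]  # timestamp -> weekday, part of the world's content

@dataclass(frozen=True)
class PersonMoment:
    agent: str
    timestamp: str  # where this mental state is located in the world

Proposition = Callable[[World], bool]                        # property of a world alone
CenteredProposition = Callable[[World, PersonMoment], bool]  # property of a pair

def its_friday(world: World, moment: PersonMoment) -> bool:
    # can't be evaluated from a World alone; the person moment supplies the center
    return world.weekday_at[moment.timestamp] == "Friday"
```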

Keeping this distinction in mind is crucial for decision theory, since decisions shape the content of the world; in particular, multiple agents can together shape the same world. The states of the same agent at different times or in different instances ("person moments") can coordinate such shaping of their shared world. So the data for preference should be about what matters for determining a world, but not necessarily about other things such as a world-together-with-one-of-its-person-moments.

While not similar to bounded AI disasters, warning shots don't require caring about the other side: true warning shots still make sense as signaling, a standard rung on the ladder of escalation that is cheaper than the full cost of violence.

Trainium is mostly a joke

I think it can help AWS with price-performance for the narrow goal of giant pretraining runs, where the capex on training systems might soon be the primary constraint on scaling. For reasoning training (if it does scale), building a single training system is less relevant; the usual geographically distributed inference buildout that hyperscalers are doing anyway would be about as suitable. And the 400K-chip Rainier system indicates that Trainium works well enough to ramp (serving as a datapoint in addition to on-paper specifications).

Chinese firms ... will ultimately lack data

I don't think there is a meaningful distinction for data: all natural text data is running out anyway around 2027-2029 due to the data inefficiency of MoE. No secret stashes at Google or Meta are going to substantially help, since even 10T-100T tokens won't change the game.
