yams

MIRI, formerly MATS, sometimes Palisade

yams*21

There are a bunch of weird sub-points and side things here, but I think the big one is that narrow intelligence is not some bounded 'slice' of general intelligence. It's a different kind of thing entirely. I wouldn't model interactions with a narrow intelligence in a bounded environment as at all representative of superintelligence (except as a lower bound on the capabilities one should expect of superintelligence!). A superintelligence also isn't an ensemble of individual narrow AIs (it may be an ensemble of fairly general systems à la MoE, but it won't be "stockfish for cooking plus stockfish for navigation plus stockfish for...", because that would leave a lot out).

Stockfish cannot, for instance, change the rules of the game or edit the board state in a text file or grow arms, punch you out, and take your wallet. A superintelligence given the narrow task of beating you at queen's odds chess could simply cheat in ways you wouldn't expect (esp. if we put the superintelligence into a literally impossible situation that conflicts with some goal it has outside the terms of the defined game).

What we lack is precisely a robust mechanism for reliably bounding the output space of such an entity (an alignment solution!). Something like mech interp just isn't targeted at producing that thing, either; it's foundational research with some (hopefully a lot of) safety-relevant implementations and insights.

I think this is the kind of thing Neel is pointing at with "how exceptionally hard it is to reason about how a system far smarter than me could evade my plans." You don't know how Stockfish is going to beat you in a fair game; you just know (or we can say 'strong prior', but like... 99%+, right?) that it will. And that 'fair game' is the meta-game: the objective in the broader world, not capturing the king.

(I give myself like a 4/10 for this explanation; feel even more affordance than usual to ask for clarification.)

yams30

Can you offer more explanation for: “the reason the queen’s odds thing works…”

My guess is that this would be true if Stockfish were mostly an LLM or similar (making something like ‘the most common move’ each time), but it seems less likely for Stockfish’s actual architecture (which leans heavily on tree search and, in the endgame, consults tablebases of solved positions). Perhaps this is what you meant by beginning your reply with ‘conceptually’, but I’m not sure.
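(To make the contrast concrete, here's a toy sketch. Every name and the whole game interface are hypothetical, and none of this is Stockfish's actual code; the point is just that a search player picks moves by looking ahead, while an imitation-style policy picks whatever was most common, so they fail in very different ways.)

```python
# Toy contrast between two move-selection strategies. All names and the
# game interface here are hypothetical; this is not how Stockfish is built.

def policy_move(position, move_counts, legal_moves):
    """Imitation-style player: pick the move seen most often for this
    position in some training data, with no lookahead at all."""
    return max(legal_moves(position),
               key=lambda m: move_counts.get((position, m), 0))

def search_move(position, legal_moves, apply_move, evaluate, depth):
    """Search-style player: plain negamax over the game tree. Even a
    shallow search can surface moves that never appear in human games,
    which is why 'most common move' is a poor model of a search engine."""
    def negamax(pos, d):
        moves = legal_moves(pos)
        if d == 0 or not moves:
            # evaluate() scores pos from the side-to-move's perspective.
            return evaluate(pos), None
        best_score, best_move = float("-inf"), None
        for m in moves:
            child_score, _ = negamax(apply_move(pos, m), d - 1)
            if -child_score > best_score:
                best_score, best_move = -child_score, m
        return best_score, best_move

    return negamax(position, depth)[1]
```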

[I do basically just think this particular example is a total disanalogy, and literally mean this as a question about Stockfish.] 

yams10

To the extent that you're saying "I'd like to have more conversations about why creating powerful agentic systems might not go well by default; for others this seems like a given, and I just don't see it", I applaud you and hope you get to talk about this a whole bunch with smart people in a mutually respectful environment. However, I do not believe that analogizing the positions of those who disagree with you to those of 19th-century Luddites (particularly when thousands of pages of publicly available writing, with which you are familiar, exist) is the best way to invite those conversations.

Quoting the first page of a book as though it contained a detailed roadmap of the central (60,000-word) argument's logical flow (which you apparently treat as interchangeable with a rigorous historical account of how the authors came to believe what they believe), when the page claims to do nothing of the sort, simply does not parse. If you read the book (which I recommend, based on your declared interests here), or modeled the pre-existing knowledge of the median reader of the book's website, you would not think "anything remotely like current techniques" meant "we are worried exclusively about deep learning, for deep-learning-exclusive reasons; trust us because we know so much about deep learning."

If you find evidence of Eliezer, Nate, or similar saying "The core reason I am concerned about AI safety is [something very specific about deep learning]; otherwise I would not be concerned", I would take your claims about MIRI's past messaging very seriously. As it stands, I have seen no evidence that I could consider in support of this claim.

Based on what you've said so far, you seem to think that all of the cruxes (or at least the most important ones) must either be purely intuitive or purely technical. If they're purely intuitive, then you dismiss them as the kind of reactionary thinking someone from the 19th century might have come up with. If they're purely technical, you'd be well-positioned to propose clever technical solutions (or else to discredit your interlocutor on the basis of their credentials).

Reality's simply messier than that. You likely have both intuitive and technical cruxes, as well as cruxes with irreducible intuitive and technical components (that is, what you see when you survey the technical evidence is shaped by your priors and your motivations, as is true for anyone; as was true for you when interpreting that book excerpt).

I think you're surrounded by smart people who would be excited to pour time into talking to you about this, conditional on not opening that discussion with a straw man of their position.

yams10

Is the crux that the more optimistic folks plausibly agree (2) is cause for concern, but believe that mundane utility can be reaped with (1), and they don't expect us to slide from (1) into (2) without noticing?

yams*84

I don’t think the mainline doom arguments claim to be rooted in deep learning?

Mostly they’re rigorized intuitive models about the nature of agency/intelligence/goal-directedness, which may go some way toward explaining certain phenomena we see in the behavior of LLMs (e.g. the Palisade Stockfish experiment). They’re theoretical arguments related to a broad class of intuitions, and in many cases they predate deep learning as a paradigm.

We can (and many do) argue over whether our lens ought to be top-down or bottom-up, but leaning toward the top-down approach isn’t the same thing as relying on the non-rigorous anxieties some felt 100 years ago.

yams126

This strikes me as straightforwardly not the purpose of the book. This is a general-audience book that makes the case, as Nate and Eliezer see it, for both the claim in the title and the need for a halt. This isn’t inside baseball on the exact probability of doom, whether the risks are acceptable given the benefits, whether someone should work at a lab, or any of the other favorite in-group arguments. This is For The Out Group.

Many people (my guess is more than 100), with many different viewpoints, have read the book and offered comments. Some of those comments can be shared publicly and some can’t, as is normal in the publishing industry. Some of those comments shaped the end result; some didn’t.

yams63

Oh, this is all familiar to me and I have my reservations about democracy (although none of them are race-flavored).

The thing I’m curious about is the story that makes the voting habits of 2-3 percent of the population The Problem.

yams32

What’s the model here?

yams3029

"Then your opponent can counter-argue that your statements are true but cherry-picked, or that your argument skips logical steps xyz and those steps are in fact incorrect. If your opponent instead chooses to say that for you to make those statements is unacceptable behavior, then it's unfortunate that your opposition is failing to represent its side well. As an observer, depending on my purposes and what I think I already know, I have many options, ranging from 'evaluating the arguments presented' to 'researching the issue myself'."

My entire point is that logical steps in the argument are being skipped (because they are), and that the facts are cherry-picked (because they are). My comment says as much, and it points out a single example (admittedly non-comprehensive) of an inconvenient (and obvious!) fact left out of the discussion altogether, as a proof of concept, precisely to avoid arguing the object-level point (which is irrelevant to whether Cremieux's statement has features that might lead one to reasonably dis-prefer being associated with him).

We move into 'this is unacceptable' territory when someone shows a habit of forcefully representing their side using these techniques in order to motivate their conclusion, which many have testified Cremieux does, and which is evident from his banning in a variety of (not especially leftist, not especially hostile to IQ-and-genetics research) spaces. If your rhetorical policies fail to defend against transparently adversarial tactics predictably peddled in the spirit of denying people their rights, you have a big hole in your map.

"OP didn't use the word 'threat'. He said he was 'very curious about aboriginals' and asked how do you live with them."

You quoted a section that has nothing to do with any of what I was saying. The exact line I'm referring to is:

"How do you have a peaceable democracy (or society in general) with a population composed of around 3% (and growing) mentally-retarded people whose vote matters just as much as yours?"

The whole first half of your comment is only referencing the parenthetical 'society in general' case, and not the voting case. I assume this is accidental on your part and not a deliberate derailment. To be clear about the stakes:

This is the conclusion of the statement. This is the whole thrust he is working up to. These facts are selected in service of an argument to deny people voting rights on the basis of their race. If the word 'threat' was too valenced for you, how about 'barrier' or 'impediment' to democracy? This is the clear implication of the writing. This is the hypothesis he's asking us to entertain: Australia would be a better country if Aborigines were banned from voting. Not just because their IQs are low, or because their society is regressive, but because they are retarded.

He's not expressing curiosity in this post. He's expressing bald-faced contempt ("Uncouth... dullards"). I'm not a particularly polite person, and this is language I reserve for my enemies. His username is a transphobic slur. Why are you wasting your charity on this person?

Decoupling isn't ignoring all relevant context within a statement to read it in the most generous possible light; decoupling is distinguishing the relevant from the irrelevant to better see the truth. Cremieux has displayed a pattern of abhorrent bigotry, and I am personally ashamed that my friends and colleagues would list him as an honored guest at their event.

yams5-3

"How do you have a peaceable democracy (or society in general) with a population...?"

Easy: we already do this. By the definition of the IQ scale (mean 100, SD 15), roughly 2 percent of people score below 70. I don't think we would commonly identify this as one of the biggest problems with democracy.
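(Back-of-the-envelope check, assuming the conventional normalization of IQ scores to a normal distribution with mean 100 and SD 15; the exact figure is about 2.3 percent.)

```python
# Share of a normal(mean=100, sd=15) population scoring below IQ 70.
from statistics import NormalDist

iq = NormalDist(mu=100, sigma=15)
print(f"{iq.cdf(70):.2%}")  # prints ~2.28%
```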

I think this demonstrates a failure mode of the 'is it true?' heuristic as a comprehensive methodology for evaluating statements. I can string together true premises (and omit others) to support a much broader range of conclusions than are supported by the actual preponderance of the evidence. (i.e., even if we accept all the premises presented here, the suggestion that letting members of a certain racial group vote is a threat to democracy completely dissolves with the introduction of one additional observation).

[for transparency: my actual belief here is that IQ is a very crude implement with results mediated by many non-genetic population-level factors, but I don't think I need to convince you of this in order to update you toward believing the author is engaged in motivated reasoning!]
