andrew sauer - LessWrong

Human takeover might be worse than AI takeover

Keep in mind also that humans often seem to just want to hurt each other, despite what they claim, and have more motivations and rationalizations for this than you can count. Religious dogma, notions of "justice", spitefulness, envy, hatred of any number of different human traits, deterrence, revenge, sadism, curiosity, reinforcement of hierarchy, preservation of traditions, ritual, "suffering adds meaning to life", sexual desire, and many more that I haven't even mentioned. Sometimes it seems half of human philosophy is just devoted to finding ever more rationalizations to cause suffering, or to avoid caring about the suffering of others.

AI would likely not have all this endless baggage causing it to be cruel. Causing human suffering is not an instrumentally convergent goal. So, most AIs will not have it as a persistent instrumental or terminal goal. Not unless some humans manage to "align" it. Most humans DO have causing or upholding some manner of suffering as a persistent instrumental or terminal goal.

Does this game have a name?

Answer by andrew sauer · Apr 12, 2025 · 10

This is equivalent to the game Westley played with Vizzini. You know, if Westley didn't cheat. I like to call it "Sicilian Chess" for that reason, though that's just me.

LWLW's Shortform

andrew sauer · 1mo · 93

Trump shot an arrow into the air; it fell to Earth, he knows not where...

Probably one of the best succinct summaries of every damn week that man is president lmao

Love is Love, Science is Fake

andrew sauer · 1mo · 10

LOL @ the AI-warped book in that guy's hands

What is Evil about creating House Elves?

andrew sauer · 1mo · 20

Now you can!

The "Intuitions" Behind "Utilitarianism"

andrew sauer · 1mo · 10

Gwern seems to think this would be used as a way to get rid of corrupt oligarchs, but... Wouldn't this just immediately be co-opted by those oligarchs to solidify their power by legally paying for the assassinations of their opponents? Markets aren't democratic, because a small percentage of the people have most of the money.

What fact that you know is true but most people aren't ready to accept it?

andrew sauer · 2mo · 10

To be fair, my position is less described by that Quirrell quote and more by Harry's quote when he's talking to Hermione about moral peer pressure:

"The way people are built, Hermione, the way people are built to feel inside, is that they hurt when they see their friends hurting. Someone inside their circle of concern, a member of their own tribe. That feeling has an off-switch, an off-switch labelled 'enemy' or 'foreigner' or sometimes just 'stranger'. That's how people are, if they don't learn otherwise."

Unlike Quirrell, I give people credit for actually caring about others, rather than merely pretending to. I just don't think that, for most people, this caring extends to very many others.

Scope Insensitivity

andrew sauer · 2mo · 30

Fun fact for those reading this in the far future: when Eliezer said "effective altruist" in this piece, he most likely was using the literal meaning, not referring to the EA movement, as that name hadn't been coined yet.

Trojan Sky

andrew sauer · 2mo · 72

Wildbow (the author of Worm) is currently writing a story with a quite similar premise.

What are the best arguments for/against AIs being "slightly 'nice'"?

Answer by andrew sauer · Feb 20, 2025 · 32

In fact I think it’s safe to say that we’d collectively allocate much more than 1/millionth of our resources towards protecting the preferences of whatever weak agents happen to exist in the world (obviously the cows get only a small fraction of that).

Sure, but extrapolating this to unaligned AI is NOT an encouraging sign. We may allocate more than 1/million of our resources to animal rights, but we allocate a whole lot more than that to goals which go diametrically against the preferences of those animals, such as eating meat and cheese and eggs; we allocate MUCH more resources to "animal wrongs" than animal rights, so to speak.

So to show an AI will be "nice" to humans at all, it is not enough to suppose that it might have some 1/million "nice to humans" term. It requires showing that that term won't be outweighed handily by the rest of its utility function.
