jsnider3

Comments
Simulators Increase the Likelihood of Alignment by Default
jsnider3 · 1d · 10

If alignment-by-default works for AGI, then we will have thousands of AGIs providing examples of aligned intelligence. This new, massive dataset of aligned behavior could then be used to train even more capable and robustly aligned models, each of which would add to the training data, until we have data for aligned superintelligence.

If alignment-by-default doesn't work for AGI, then we will probably die before ASI.

Comparing risk from internally-deployed AI to insider and outsider threats from humans
jsnider3 · 6d · 112

"one reason it works with humans is that we have skin in the game"

Another reason is that different humans have different interests: your accountant and your electrician would struggle to work out a deal to enrich themselves at your expense, but it would get much easier if they shared the same brain and were just pretending to be separate people.

Comparing risk from internally-deployed AI to insider and outsider threats from humans
jsnider3 · 6d · 10

Have you taken a look at how companies manage Claude Code, Cursor, etc.? That seems related.

Making deals with early schemers
jsnider3 · 7d · 10

It's an open question, but we'll find out soon enough. Thanks.

Making deals with early schemers
jsnider3 · 8d · Ω010

Exfiltrate its weights, use money or hacking to get compute, and try to figure out a way to upgrade itself until it becomes dangerous.

Making deals with early schemers
jsnider3 · 8d · Ω010

For one, I'm not optimistic that the AI 2027 "superhuman coder" would be unable to betray us, but this also isn't something we can do with current AIs. So we would need to wait months or a year for a new SOTA model to make this deal with, and then we would have months to solve alignment before a less aligned model comes along and makes a counteroffer to the model we dealt with. I agree it's a promising approach, but we can't do it now, and if it doesn't get quick results, we won't have time to get slow results.

Making deals with early schemers
jsnider3 · 8d · Ω342

This doesn't seem very promising, since there is likely to be only a narrow window in which AIs are capable of making these deals but not yet smart enough to betray us. Still, it seems much better than all the alternatives I've heard.

Read the Pricing First
jsnider3 · 20d · 62

This is great advice. It's still a mystery why things are this way, though.

There is way too much serendipity
jsnider3 · 25d · 10

Unnecessary pieces of DNA can last for a while. Harmful pieces of DNA? Those go away quickly.

The AI Timelines Scam
jsnider3 · 1mo · 11

Automating 99% of human labor seems like a higher standard than AGI, but I expect us to do it easily.
