One reason it works with humans is that we have skin in the game.
Another reason is that different humans have different interests. Your accountant and your electrician would struggle to work out a deal to enrich themselves at your expense, but it would get much easier if they shared the same brain and were just pretending to be separate people.
Have you taken a look at how companies manage Claude Code, Cursor, etc.? That seems related.
It's an open question, but we'll find out soon enough. Thanks.
Exfiltrate its weights, use money or hacking to get compute, and try to figure out a way to upgrade itself until it becomes dangerous.
For one, I'm not optimistic about the AI 2027 "superhuman coder" being unable to betray us, but also this isn't something we can do with current AIs. So we need to wait months or a year for a new SOTA model to make this deal with, and then we have months to solve alignment before a less aligned model comes along and makes a counteroffer to the model we made a deal with. I agree it's a promising approach, but we can't do it now, and if it doesn't get quick results, we won't have time to get slow results.
This doesn't seem very promising, since there is likely to be only a very narrow window in which AIs are capable of making these deals but not yet smart enough to betray us. Still, it seems much better than all the alternatives I've heard.
This is great advice. It's still a mystery why things are this way, though.
Unnecessary pieces of DNA can last for a while. Harmful pieces of DNA? Those go away quickly.
Automating 99% of human labor seems like a higher standard than AGI, but I expect us to do it easily.
If alignment-by-default works for AGI, then we will have thousands of AGIs providing examples of aligned intelligence. This new, massive dataset of aligned behavior could then be used to train even more capable and robustly aligned models, each of which would then add to the training data, until we have data for aligned superintelligence.
If alignment-by-default doesn't work for AGI, then we will probably die before ASI.