Is it possible that making an expected utility maximizer might be less dangerous than making something which isn't?
Consider as an alternative an expected log utility maximizer (an agent using the Kelly Criterion, or some approximation of it).
The sooner an AI wins, the more galaxies it can consume. The expected utility maximizer weighs those galaxies against the risk of failure, and is willing to accept plans with much higher probabilities of failure. Like SBF, it would take bets which have a 50% chance of more-than-doubling its utility and a 50% chance of losing it all. In many environments, this strategy will almost certainly result in failure, as the agent goes double-or-nothing until it loses everything. That means that the effects of the AI are mitigated.
The log utility maximizer carefully plans and succeeds in most or all futures. That looks like humanity dying with near-certainty.
A hyper-expected utility maximizer (an AI which maximizes expected exp(utility) or similar) would be even safer. Instead of trying to deceive you into letting it out of the box, it asks nicely or does something crazy because if it works, it can work in less time than deception, which means more galaxies.
So if we were to choose between existing in the world of a superintelligent expected log(resources) maximizer, and a superintelligent expected utility maximizer, we should maybe go for the one which results in us being alive in more futures.
Of course, the expected-log-utility agent would also appear the most capable and useful. The hyper-expected utility maximizer would be near-useless.
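To make the double-or-nothing argument above concrete, here is a minimal sketch, not a model of any real agent. It assumes a single repeated bet with a 50% win chance and a made-up payoff of 1.2x the stake on a win (so staking everything turns wealth into 2.2x, i.e. "more than doubling"). The names P_WIN, NET_ODDS, kelly_fraction, simulate, and ruin_rate are all hypothetical, chosen for illustration. The linear expected utility maximizer stakes everything each round because that maximizes expected wealth; the log utility maximizer stakes the Kelly fraction.

```python
import random

P_WIN = 0.5     # chance the bet pays off (from the post)
NET_ODDS = 1.2  # ASSUMPTION: a win pays 1.2x the stake on top of returning it


def kelly_fraction(p, b):
    """Stake fraction that maximizes expected log wealth: f* = p - (1 - p) / b."""
    return max(0.0, p - (1.0 - p) / b)


def simulate(fraction, rounds, rng):
    """Stake a fixed fraction of current wealth every round; return final wealth."""
    wealth = 1.0
    for _ in range(rounds):
        stake = wealth * fraction
        if rng.random() < P_WIN:
            wealth += stake * NET_ODDS
        else:
            wealth -= stake  # with fraction == 1.0 this leaves exactly 0.0
    return wealth


def ruin_rate(fraction, rounds=20, trials=10_000, seed=0):
    """Share of simulated futures in which the agent ends with nothing."""
    rng = random.Random(seed)
    ruined = sum(simulate(fraction, rounds, rng) == 0.0 for _ in range(trials))
    return ruined / trials


if __name__ == "__main__":
    f_kelly = kelly_fraction(P_WIN, NET_ODDS)
    print("Kelly stake fraction:", f_kelly)
    print("Ruin rate, all-in (linear utility):", ruin_rate(1.0))
    print("Ruin rate, Kelly (log utility):", ruin_rate(f_kelly))
```

Under these assumptions the all-in agent survives 20 rounds with probability 0.5^20, so it is ruined in nearly every simulated future, while the Kelly agent never stakes everything and so never hits zero. That is the sense in which the log utility maximizer "succeeds in most or all futures."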
In addition to money, education, careers, and internal organs, citizens of wealthy countries have an additional valuable resource they could direct to effective causes: their hands in marriage, which can be effectively allocated in one of two ways.
For one, professionals are usually much more impactful doing their work in wealthy countries. Otherwise promising EAs in South Sudan have little chance to make a significant impact on existential risks, animal welfare, or even global poverty. The immigration process is difficult and often rejects or holds up good people. Offering to marry them is a more reliable solution.
Secondly, if you are a US citizen, it is possible to be paid $10,000 by a foreigner for a green card marriage. (I learned this from a friend who does not want me to ask him how he knows.)
According to AMF, that money can save around two human lives! (and with current US politics, the demand has likely increased!)
According to brides.com, a wedding ceremony takes between 20 and 30 minutes. Let's be conservative and say 30 minutes.
Therefore, you can make $20,000 an hour by marrying someone who would pay for a green card. That's quite a ways from Bezos level (he makes $3,715 a second) but I'm willing to guess that most EAs don't make $20k an hour.
Conclusion:
As always, EAs need to found a new org, Effective Green Card, to support and pursue this cause area.
Naturally, this also implies Effective Divorce, so that you can instead marry an Effective foreigner.
I’m pretty sure there’s no such use-it-or-lose-it law for patents, since patent trolls already exist.
Your argument about corporate secrets is sufficient to change my mind on activist patent trolling being a productive strategy against AI X-risk.
The part about funding would need to be solved with philanthropy. I don't believe that org exists, but I don't see why it couldn't.
I'm still curious whether there are other cases in which activist patent trolling can be a good option, such as animal welfare, chemistry, public health, or geoengineering (i.e. fracking).
That's fair enough and a good point.
I think that the key difference is that in the case of profitable-but-bad technologies, someone, somewhere, will probably invent them because there's great incentive to do so.
In the case of gain-of-function, if the grants stop and the academics who do it become pariahs, then the incentive to do the gain-of-function research is gone.
One of the most powerful capabilities an AGI will have is its ability to copy itself. Among other things, this allows it to easily avoid shutdown, make use of more compute resources, and collaborate with copies of itself.
Is there research into ways to deny this capability to AI, making them uncopyable? Preferably something harder to circumvent than "just don't give the AI the permissions," since we know people are going to give them root access immediately.
I'd be interested in buying official LessWrong merch. I know you have some great designers and could make things that look really cool.
The type of thing I'd be most likely to buy would be a baseball cap.
IIRC, officially the Gatekeeper pays the AI if the AI wins, but there is no transfer if the Gatekeeper wins. This gives the Gatekeeper more motivation not to give in.
Just found out about this paper from about a year ago: "Explainability for Large Language Models: A Survey"
(They "use explainability and interpretability interchangeably.")
It "aims to comprehensively organize recent research progress on interpreting complex language models".
I'll post anything interesting I find from the paper as I read.
Have any of you read it? What are your thoughts?
Men[1] will die[2] for her[3] massive[4] coconuts[5].
[1] All of humanity
[2] Go extinct
[3] Hindsight Experience Replay (HER), a technique for improving the reinforcement learning training signal
[4] Large-scale training and large model size
[5] Chain of Continuous Thought, a technique that makes model chain of thought much less interpretable but which allows the model to reason more efficiently