Andrew Critch lists several research areas that seem important to AI existential safety, and evaluates them for direct helpfulness, educational value, and neglect. Along the way, he argues that the main way he sees present-day technical research helping is by anticipating, legitimizing and fulfilling governance demands for AI technology that will arise later.
I think more people should say what they actually believe about AI dangers, loudly and often. Even (and perhaps especially) if you work in AI policy.
I’ve been beating this drum for a few years now. I have a whole spiel about how your conversation-partner will react very differently if you share your concerns while feeling ashamed about them versus if you share your concerns while remembering how straightforward and sensible and widely supported the key elements are, because humans are very good at picking up on your social cues. If you act as if it’s shameful to believe AI will kill us all, people are more prone to treat you that way. If you act as if it’s an obvious serious threat, they’re more likely to take it...
(see e.g. here)
This link doesn't seem to include people like Quintin Pope and the AI Optimists, who are the most notorious AI risk skeptics I can think of who have nonetheless written about Eliezer's arguments (example). If I recall correctly, I think Pope said sometime before his departure from this site that his P(doom) is around 1%.
By 'bedrock liberal principles', I mean things like: respect for individual liberties and property rights, respect for the rule of law and equal treatment under the law, and a widespread / consensus belief that authority and legitimacy of the state derive from the consent of the governed.
Note that "consent of the governed" is distinct from simple democracy / majoritarianism: a 90% majority that uses state power to take all the stuff of the other 10% might be democratic but isn't particularly liberal or legitimate according to the principle of consent of the governed.
I believe a healthy liberal society of humans will usually tend towards some form of democracy, egalitarianism, and (traditional) social justice, but these are all secondary to the more foundational kind of thing I'm getting...
Alright so there's an acknowledgement that at the very least, the people who originally occupied that nice area are losing out.
Your breakdown of the benefits seems more or less fair. The only thing I take issue with is "who would like to purchase a house in a nice area where they have access to good jobs". I don't think it's fair to take it for granted that the area will stay nice post-YIMBY change (in fact the core acknowledgement here is that the character of the neighborhood is going to change) or that jobs will stay.
Would you consider that a "load bear...
Claim: if moral realism is true, then the Orthogonality Thesis is false, and superintelligent agents are very likely to be moral.
I'm arguing against Armstrong's version of Orthogonality[1]:
The fact of being of high intelligence provides extremely little constraint on what final goals an agent could have (as long as these goals are of feasible complexity, and do not refer intrinsically to the agent’s intelligence).
1. Assume moral realism; there are true facts about morality.
2. Intelligence is causally[2] correlated with having true beliefs.[3]
3. Intelligence is causally correlated with having true moral beliefs.[4]
4. Moral beliefs constrain final goals; believing “X is morally wrong” is a very good reason and motivator for not doing X.[5]
5. Superintelligent agents will likely have final goals that cohere with their (likely true) moral beliefs.
6. Superintelligent agents are likely to be moral.
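To make the inferential structure explicit, here is a minimal Lean sketch of the argument's logical skeleton. This is my own formalization, not the original post's; the predicate names (Superintelligent, TrueMoralBeliefs, GoalsCohereWithMoralBeliefs) are placeholders introduced purely for illustration. It only shows that the conclusion follows from the premises; all the philosophical weight sits in the premises themselves.

```lean
-- A minimal sketch of the argument's logical skeleton (my formalization,
-- not the post author's). Predicate names are illustrative placeholders.
theorem superintelligent_agents_are_moral
    (Agent : Type)
    (Superintelligent TrueMoralBeliefs GoalsCohereWithMoralBeliefs : Agent → Prop)
    -- Premises 1-3: given moral realism, intelligence tends toward true
    -- moral beliefs, since moral facts are part of reality.
    (premise3 : ∀ a, Superintelligent a → TrueMoralBeliefs a)
    -- Premise 4: moral beliefs constrain final goals (moral internalism).
    (premise4 : ∀ a, TrueMoralBeliefs a → GoalsCohereWithMoralBeliefs a) :
    -- Premises 5-6: superintelligent agents' final goals cohere with
    -- their (likely true) moral beliefs, i.e. such agents are moral.
    ∀ a, Superintelligent a → GoalsCohereWithMoralBeliefs a :=
  fun a hS => premise4 a (premise3 a hS)
```

The sketch itself is trivially valid; the action is entirely in whether premises 3 and 4 hold, which is exactly where the footnoted objections (e.g. strong moral externalism) push back.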
Though this argument basically applies to most other versions, including the strong form: "There can exist arbitrarily intelligent agents pursuing any kind of goal [and] there's no extra difficulty or complication in the existence of an intelligent agent that pursues a goal."
This argument as is does not work against Yudkowsky's weak form: "Since the goal of making paperclips is tractable, somewhere in the design space is an agent that optimizes that goal."
Correction: there was a typo in the original post here. Instead of 'causally', it read 'casually'.
A different way of saying this: Intelligent agents tend towards a comprehensive understanding of reality; towards having true beliefs. As intelligence increases, agents will (in general, on average) be less wrong.
A different way of saying this: Intelligent agents tend toward having true moral beliefs.
To spell this out a bit more:
A. Intelligent agents tend toward having true beliefs.
B. Moral facts (under moral realism) are (an important!) part of reality.
C. Intelligent agents tend toward having true moral beliefs.
One could reject this proposition by taking a strong moral externalism stance. If moral claims are not intrinsically motivating and there is no general connection between moral beliefs and motivation, then this proposition does not follow. See here for discussion of moral internalism and orthogonality, and for discussion of moral motivation.
I think I agree with Bostrom's 2012 position here (on how it's still a problem even if moral realism is true; and though I think moral realism is false, believing it's true doesn't quite help people design friendly AI, as other commenters have highlighted):
The Orthogonality Thesis
Intelligence and final goals are orthogonal axes along which possible agents can freely vary. In other words, more or less any level of intelligence could in principle be combined with more or less any final goal[...]
...The orthogonality thesis, as formulated here, makes
Recently, I've been ratcheting up my probability estimate of some of Less Wrong's core doctrines (shut up and multiply, beliefs require evidence, brains are not a reliable guide as to whether brains are malfunctioning, the Universe has no fail-safe mechanisms) from "Hmm, this is an intriguing idea" to somewhere in the neighborhood of "This is most likely correct."
This leaves me confused and concerned and afraid. There are two things in particular that are bothering me. On the one hand, I feel obligated to try much harder to identify my real goals and then to do what it takes to actually achieve them -- I have much less faith that just being a nice, thoughtful, hard-working person will result in me having a pleasant life, let alone in...
I mostly just got older and therefore calmer. I've crossed off most of the highest-priority items from my bucket list, so while I would prefer to continue living for a good long while, my personal death and/or defeat doesn't seem so catastrophically bad anymore. To cope with the loss of civilization/humanity, I read a lot of history and sci-fi and anthropology and other works that help me zoom out and see that there has already been great loss, and that while I do want to spend my resources fighting to reduce the risk of that loss, it's not something I need to spend a lot of time or energy personally suffering over, especially not in advance. Worry is interest paid on trouble before it's due.
How do you feel about conciseness in educational resources? I'm in two minds. On the one hand, as someone learning, I am often battling jargon defined by jargon: I see a word, I want to know what it means so I can keep reading, and I don't want to read a chain of three Wikipedia articles first. Filler and fluff can also distract and make the lesson or knowledge harder to retain; the reader/student is more likely to misunderstand or to miss core information as they are assaulted by tangents.
On the other hand, I do accept that repetition can improve reten...
I'm a bad prompt engineer myself, and I get quite envious when people announce these amazing results they're getting from the LLM; getting anything similar feels like pulling teeth from a chicken to me.
The reason I ask is that, if it's more something you come across incidentally while researching other things, I was wondering whether there's a way of reverse-engineering that process of recognizing it after the fact?
Alignment by Default is the idea that achieving alignment in artificial general intelligence (AGI) may be more straightforward than initially anticipated. When an AI possesses a comprehensive and detailed world model, it inherently represents human values within that model. To align the AGI, it's merely necessary to extract these values and direct the AI towards optimizing the abstraction it already comprehends.
In a summary of this concept, John Wentworth estimates a 10% chance of this strategy being successful, a perspective I generally agree with.
However, in light of recent advancements, I have revised my outlook, now believing that Alignment by Default has a higher probability of success, perhaps around 30%. This update was prompted by the accomplishments of ChatGPT, GPT-4, and subsequent developments. I believe these systems are approaching...
If alignment-by-default works for AGI, then we will have thousands of AGIs providing examples of aligned intelligence. This new, massive dataset of aligned behavior could then be used to train even more capable and robustly aligned models, each of which would then add to the training data, until we have data for aligned superintelligence.
If alignment-by-default doesn't work for AGI, then we will probably die before ASI.
Hello everyone! 👋
At the start of the year, I was trying to figure out what my goals should be regarding AI safety. I ended up making a list of 130 small, concrete ideas ranging across all kinds of domains. A lot of them are about gathering and summarizing information, skilling up myself, and helping others be more effective.
I needed a rough way to prioritize them, so I built a prioritization framework inspired by 80,000 Hours. I won't go into the details here, but...