leogao

Comments
leogao72

I am confused why you think my claims are only semi-related. to me my claim is very straightforward, and the things i'm saying are straightforwardly conveying a world model that seems to me to explain why i believe my claim. i'm trying to explain in good faith, not trying to say random things. i'm claiming a theory of how people parse information, to justify my opening statement, which i can clarify as:

  • sometimes, people use the rhetorical move of saying something like "people think 95% doom is overconfident, yet 5% isn't. but that's also being 95% confident in not-doom, and yet they don't consider that overconfident. curious." followed by "well actually, it's only a big claim under your reference class. under mine, i.e. the set of all instances of a more intelligent thing emerging, 95% doom is actually less overconfident than 5% doom". this post was inspired by seeing one such tweet, but i see claims like this every once in a while that play reference class tennis.
  • i think this kind of argument is really bad at persuading people who don't already agree (from empirical observation). my opening statement is saying "please stop doing this, if you do it, and thank you for not doing this, if you don't already do it". the rest of my paragraphs provide an explanation of my theory for why this is bad for changing people's minds. this seems pretty obviously relevant for justifying why we should stop doing the thing. i sometimes see people out there talk like this (including my past self at some point), and then fail to convince people, and then feel very confused about why people don't see the error of their ways when presented with an alternative reference class. if my theory is correct (maybe it isn't, this isn't a super well thought out take, it's more a shower thought), then it would explain this, and people who are failing to convince people would probably want to know why they're failing. i did not spell this out in my opening statement because i thought it was clear, but in retrospect it wasn't.
  • i don't think the root cause is people being irrational epistemically. i think there is a fundamental reason why people do this that is very reasonable. i think you disagree with this on the object level and many of my paragraphs are attempting to respond to what i view as the reason you disagree. this does not explicitly show up in the opening statement, but since you disagree with this, i thought it would make sense to respond to that too
  • i am not saying you should explicitly say "yeah i think you should treat me as a scammer until i prove otherwise"! i am also not saying you should try to argue with people who have already stopped listening to you because they think you're a scammer! i am merely saying we should be aware that people might be entertaining that as a hypothesis, and if you try to argue by using this particular class of rhetorical move, you will only trigger their defenses further, and that you should instead just directly provide the evidence for why you should be taken seriously, in a socially appropriate manner. if i understand correctly, i think the thing you are saying one should do is the same as the thing i'm saying one should do, but phrased in a different way; i'm saying not to do a thing that you seem to already not be doing.

i think i have not communicated myself well in this conversation, and my mental model is that we aren't really making progress, and therefore this conversation has not brought value and joy into the world in the way i intended. so this will probably be my last reply, unless you think doing so would be a grave error.

leogao52

i'm not even saying people should not evaluate evidence for and against a proposition in general! it's just that this is expensive, and so it is perfectly reasonable to have heuristics to decide which things to evaluate, and so you should first prove with costly signals that you are not pwning them, and then they can weigh the evidence. until you can provide enough evidence that you're not pwning them for it to be worth their time to evaluate your claims in detail, it should not be surprising that many people won't listen to the evidence. and even if they do listen, if there is still lingering suspicion that they are being pwned, you need to provide the type of evidence that could persuade someone that they aren't getting pwned (for which being credibly very honest and truth seeking is necessary but not sufficient), which is sometimes different from mere compellingness of argument.

leogao42

in practice many of the claims you hear will be optimized for memetic fitness, even if the people making the claims are genuine. well intentioned people can still be naive, or have blind spots, or be ideologically captured.

also, presumably the people you are trying to convince are on average less surrounded by truth seeking people than you are (because being in the alignment community is strongly correlated with caring about seeking truth).

i don't think this gives up your ability to communicate with people. you simply have to signal in some credible way that you are not only well intentioned but also not merely the carrier of some very memetic idea that slipped past your antibodies. there are many ways to accomplish this. for example, you can build up a reputation of being very scrupulous and unmindkilled. this lets you convey ideas freely to other people in your circles that are also very scrupulous and unmindkilled. when interacting with people outside this circle, for whom this form of reputation is illegible, you need to find something else. depending on who you're talking to and what kinds of things they take seriously, this could be leaning on the credibility of someone like geoff hinton, or of sam/demis/dario, or the UK government, or whatever.

this might already be what you're doing, in which case there's no disagreement between us.

leogao51

i am also trying to accurately describe reality. what i'm saying is, even from the perspective of someone smart and truth-seeking but who doesn't know much about the object-level, it is very reasonable to use bigness of claim as a heuristic for how much evidence you need before you're satisfied, and that if you don't do this, you will be worse at finding the truth in practice. my guess is this applies even more so to the average person.

i think this is very analogous to occam's razor / trust region optimization. clearly, we need to discount theories based on complexity, because there are exponentially more complex theories than simple ones, many of which have no easily observable difference from the simpler ones, opening you up to being pwned. and empirically it seems a good heuristic to live life by. complex theories can still be true! but given two theories that both accurately describe reality, you want the simpler one. similarly, given two equally complex claims that accurately describe the evidence, you want the one that is less far fetched from your current understanding of the world / requires changing less of your worldview.

also, it doesn't have to be something you literally personally experienced. it's totally valid to read the wikipedia page on the branch davidians or whatever and feel slightly less inclined to take things that have similar vibes seriously, or even to absorb the vibe from your environs (your aversion to scammers and cranks surely did not come ex nihilo, right?)

for most of the examples i raised, i didn't necessarily mean the claim was literally 100% human extinction, and i don't think it matters that it wasn't. first, because the important thing is the vibe of the claim (catastrophic) - since we're talking about heuristics on how seriously to take things that you don't have time to deep dive on, the rule has to be relatively cheap to implement. i think most people, even quite smart people, genuinely don't feel much of an emotional difference between literal human extinction vs collapse of society vs half of people dying painfully, unless they first spend a half hour carefully thinking about the implications of extinction. (and even then depending on their values they may still not feel a huge difference)

also, it would be really bad if you could weasel your way out of a reference class that easily; it would be ripe for abuse by bad actors - "see, our weird sect of christianity claims that after armageddon, not only will all actual sinners' souls be tortured forever, but that the devil will create every possible sinner's soul to torture forever! this is actually fundamentally different from all existing christian theories, and it would be unfathomably worse, so it really shouldn't be thought of as the same kind of claim"

even if most people are trying to describe the world accurately (which i think is not true, and we only get this impression because we live in a strange bubble of very truth seeking people + are above average at understanding things at the object level and therefore at quickly detecting scams), ideas are still selected for memeticness. i'm sure that 90% of conspiracy theorists genuinely believe that humanity is controlled by lizards and are trying their best to spread what they believe to be true. many (not all) of the worst atrocities in history have been committed by people who genuinely thought they were on the side of truth and good.

(actually, i think people do get pwned all the time, even in our circles. rationalists are probably more likely than average (controlling for intelligence) to get sucked into obviously culty things (e.g. zizians), largely because they don't have the memetic antibodies needed to not get pwned, for one reason or another. so probably many rationalists would benefit from evaluating things a little bit more on vibes/bigness and a little bit less on the object level)

leogao42

I agree this reference class is better, and implies a higher prior, but I think it's reasonable for the prior over "arbitrary credentialed people warning about something" to still be relatively low in an absolute sense - lots of people have impressive sounding credentials that are not actually good evidence of competence (consider: it's basically a meme at this point that whenever you see a book where the author puts "PhD" after their name, they probably are a grifter / their phd was probably kinda bs), and there is also a real negativity bias where fearmongering is amplified by both legacy and social media. Also, for the purposes of understanding normal people, it's useful to keep in mind that trust in credentials and institutions is not very high right now in the US among genpop.

leogao23-13

I claim it is a lot more reasonable to use the reference class of "people claiming the end of the world" than "more powerful intelligences emerging and competing with less intelligent beings" when thinking about AI x-risk. further, we should not try to convince people to adopt the latter reference class - this sets off alarm bells, and rightly so (as I will argue in short order) - but rather to bite the bullet, start from the former reference class, and provide arguments and evidence for why this case is different from all the other cases.

this raises the question: how should you pick which reference class to use, in general? how do you prevent reference class tennis, where you argue back and forth about what is the right reference class to use? I claim the solution is that you want to use reference classes that have a consistent track record of leading to good decisions irl. the point of reference classes is to provide a heuristic to quickly apply judgement to large swathes of situations that you don't have time to carefully examine. this is important because otherwise it's easy to get tied up by bad actors who avoid being refuted by making their beliefs very complex and therefore hard to argue against.

the big problem with the latter reference class is it's not like anyone has had many experiences using it to make decisions ex ante, and if you squint really hard to find day to day examples, they don't all work out the same way. smarter humans do mostly tend to win over less smart humans. but if you work at a zoo, you will almost always be more worried about physical strength and aggressiveness when putting different species in the same enclosure. if you run a farm (or live in Australia), you're very worried about relatively dumb invasive animals like locusts and rabbits.

on the other hand, everyone has personally experienced a dozen different doomsday predictions. whether that's your local church or faraway cult warning about Armageddon, or Y2K, or global financial collapse in 2008, or the maximally alarmist climate people, or nuclear winter, or peak oil. for basically all of them, the right action empirically in retrospect was to not think too much about it. there are many concrete instances of people saying "but this is different" and then getting burned.

and if you allow any reference class to be on as strong a footing as very well established reference classes, then you open yourself up to getting pwned ideologically. "all complex intricate objects we have seen created have been created by something intelligent, therefore the universe must also have an intelligent creator." it's a very important memetic defense mechanism.

(to be clear this doesn't mean you can only believe things others believe, or that humans taking over earth is not important evidence, or that doomsday is impossible!! I personally think AGI will probably kill everyone. but this is a big claim and should be treated as such. if we don't accept this, then we will forever fail to communicate with people who don't already agree with us on AGI x-risk.)

leogao160

my summary of these two papers: https://arxiv.org/pdf/1805.12152 https://arxiv.org/pdf/1905.02175

the first paper observes a phenomenon where adversarial accuracy and normal accuracy are at odds with each other. the authors present a toy example to explain this. 

the construction involves giving each input one channel that is 90% accurate for predicting the binary label, and a bazillion iid gaussian channels that are as noisy as possible individually, so that when you take the average across all of them you get ~100% accuracy. they show that when you do ℓ∞-adversarial training on the input you learn to only use the 90% accurate feature, whereas normal training uses all bazillion weak channels.
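
to make this concrete, here's a minimal numpy sketch of my reading of that construction (all parameter values are hypothetical): one 90%-accurate channel plus d weak gaussian channels whose average is nearly perfect on clean data, but which an ℓ∞ perturbation of size 2η turns anti-correlated with the label:

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 10_000, 1_000
eta = 4 / np.sqrt(d)   # per-channel signal of the weak features (hypothetical choice)

y = rng.choice([-1, 1], size=n)
x_robust = np.where(rng.random(n) < 0.9, y, -y)       # the single 90%-accurate channel
x_weak = rng.normal(eta * y[:, None], 1.0, (n, d))    # d weak iid gaussian channels

# "standard" classifier: average the weak channels; "robust" classifier: the robust channel alone
print("weak-average accuracy:", np.mean(np.sign(x_weak.mean(axis=1)) == y))   # ~1.0
print("robust-channel accuracy:", np.mean(np.sign(x_robust) == y))            # ~0.9

# a worst-case ℓ∞ perturbation of size 2*eta shifts every weak channel against the label,
# flipping their mean from +eta*y to -eta*y, so the weak-average classifier collapses
eps = 2 * eta
x_weak_adv = x_weak - eps * y[:, None]
print("weak-average accuracy under attack:", np.mean(np.sign(x_weak_adv.mean(axis=1)) == y))  # ~0.0
```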

the key to this construction is that they consider an ε-ball in ℓ∞ on the input (ℓ∞ distance is the max across all coordinates). so this means by adding more and more features, you can move further and further in ℓ2 space (specifically, ε√d, where d is the number of features). but the ℓ2 distance between the means of the two high dimensional gaussians stays constant, so no matter what your ε is, at some point with enough channels you can perturb anything from one class into the other class and vice versa.
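
and a quick numeric check of that scaling argument (numbers again hypothetical): if the ℓ2 gap between the class means is held fixed (so the per-channel signal shrinks like 1/√d), the ℓ2 reach of a fixed-ε ℓ∞ perturbation grows like ε√d and eventually spans the whole gap:

```python
import numpy as np

eps = 0.05   # fixed ℓ∞ budget
gap = 2.0    # ℓ2 distance between the two class means, held constant as d grows
for d in [10, 100, 1_000, 10_000, 100_000]:
    eta = gap / (2 * np.sqrt(d))    # per-channel signal needed to keep the total gap at `gap`
    reach = eps * np.sqrt(d)        # ℓ2 length of the corner of the ε ℓ∞-ball
    print(f"d={d:>6}  eta={eta:.4f}  ℓ2 reach of attack={reach:.2f}  spans the gap: {reach >= gap}")
```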

in the second paper, the authors do further experiments on real models to show that you can separate out the robust features and the unrobust ones, and recombine them into frankenstein images that look like dogs to humans but cats to the unrobust model and dogs to the robust model.

they also generalize the toy example in the previous paper. they argue that in general, adversarial examples arise exactly when the adversarial attack metric and the loss metric differ. in other words, the loss function (and downstream part of the model, in a multilayer model) implies some loss surface around any data point, and some directions on that surface will be a lot more important for the loss than some other directions. but your ε-ball (in, say, ℓ2) that you do your attack in will treat all those directions equally importantly. so you can pick the direction that maximizes the amount of loss change.

their new example is a classification task on two features, where the two classes are very stretched out gaussians placed diagonally from each other, so that an ℓ2 ε-ball from each mean reaches into the distribution of the other gaussian. during normal training, the classification boundary learned falls right along the line where the mahalanobis distance from the two means is the same (intuitively, the classification boundary falls along exactly those points where a data point is equally likely to be sampled from either distribution). but the mahalanobis distance is different from the ℓ2 norm! it treats distances along the low-variance axis of the gaussian as being much larger, so it doesn't mind putting the boundary close (in ℓ2 norm) to the mean. this lets the ℓ2 perturbation step over the boundary.
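
here's a minimal numpy sketch of that 2d picture (all numbers hypothetical, chosen just to make the geometry visible): the max-likelihood boundary sits at equal mahalanobis distance from the two means, which in ℓ2 terms is very close to both of them, so a modest ℓ2 perturbation steps right over it despite near-perfect clean accuracy:

```python
import numpy as np

rng = np.random.default_rng(0)

# two stretched gaussians placed diagonally from each other
mu = np.array([1.0, 1.0])                        # class +1 mean; class -1 mean is -mu
theta = np.deg2rad(35)                           # long axis, roughly aligned with the line between the means
v_long = np.array([np.cos(theta), np.sin(theta)])
v_short = np.array([-np.sin(theta), np.cos(theta)])
cov = 3.0**2 * np.outer(v_long, v_long) + 0.05**2 * np.outer(v_short, v_short)

# max-likelihood boundary: equal mahalanobis distance from the two means,
# i.e. the hyperplane through the origin with normal w = Σ^{-1}(μ₊ - μ₋)
w = np.linalg.solve(cov, 2 * mu)

n = 20_000
y = rng.choice([-1.0, 1.0], size=n)
x = y[:, None] * mu + rng.multivariate_normal(np.zeros(2), cov, size=n)

print("clean accuracy:", np.mean(np.sign(x @ w) == y))                              # ~1.0
print("ℓ2 distance between the means:", np.linalg.norm(2 * mu))                     # ~2.83
print("ℓ2 distance from a mean to the boundary:", abs(mu @ w) / np.linalg.norm(w))  # ~0.25

# worst-case ℓ2 perturbation of modest size: step straight across the nearby boundary
eps = 0.4
x_adv = x - eps * y[:, None] * (w / np.linalg.norm(w))
print("accuracy under ℓ2 attack of size", eps, ":", np.mean(np.sign(x_adv @ w) == y))  # ~0.0
```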

leogao20

BB grows so fast that in practice it doesn't seem worth distinguishing any of these cases for programs of nontrivial size.

leogao8461

the modern world has many flaws, but I'm still deeply grateful for the modern era of unprecedented peace, prosperity, and freedom in the developed world. 99% of people reading these words have never had to worry about dying in a cholera epidemic, or malaria or smallpox or the plague, or childbirth, or in war, or from a famine, or due to a political purge. this is not true for other times in history, or other places in the world today.

(extremely unoriginal thought, but still important to acknowledge periodically because it's easy to take for granted. especially because it's much more common to complain about ways the world is broken than to acknowledge what has improved over time.)

leogao40

why is it convergent to invent foods which consist of some form of carbs enveloping meat?
