Adrià Garriga-alonso

Comments

I think it's still true that a lot of individual AI researchers are motivated by something approximating play (they call it "solving interesting problems"), even if the entity we call Anthropic is not. Such researchers exist both inside Anthropic and outside it. This applies to all companies, of course.

Agreed that humans would likely outperform by more! At the moment we don't have a human baseline for AmongUs vs. language models, so we wouldn't be able to tell if performance improved, but it's a good follow-up.

I'm curious what you mean, but I don't entirely understand. If you give me a text representation of the level, I'll run it! :) Or you can do so yourself.

Here's the text representation for level 53:

##########
##########
##########
#######  #
######## #
#   ###.@#
#   $ $$ #
#. #.$   #
#     . ##
##########
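For anyone who wants to try a level themselves, here is a minimal Python sketch for reading this kind of ASCII grid into coordinate sets. It assumes the standard Sokoban symbols (`#` wall, `$` box, `.` goal, `@` player, `*`/`+` for a box/player standing on a goal); the constant name `LEVEL_53` is just an illustrative choice.

```python
# Minimal sketch: parse an ASCII Sokoban level into coordinate sets.
# Assumed standard symbols: '#' wall, '@' player, '$' box, '.' goal,
# '*' box on goal, '+' player on goal, ' ' floor.

LEVEL_53 = """\
##########
##########
##########
#######  #
######## #
#   ###.@#
#   $ $$ #
#. #.$   #
#     . ##
##########"""

def parse_level(text):
    walls, boxes, goals = set(), set(), set()
    player = None
    for r, row in enumerate(text.splitlines()):
        for c, ch in enumerate(row):
            pos = (r, c)
            if ch == "#":
                walls.add(pos)
            if ch in "$*":       # a box, possibly already on a goal
                boxes.add(pos)
            if ch in ".*+":      # a goal square, possibly occupied
                goals.add(pos)
            if ch in "@+":       # the single player position
                player = pos
    return walls, boxes, goals, player

walls, boxes, goals, player = parse_level(LEVEL_53)
```

A quick sanity check on the parse: the level above has four boxes and four goal squares, as a solvable Sokoban level should.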

Maybe in this case it's a "confusion" shard? While it seems to be planning and producing optimizing behavior, it's not clear that it will behave as a utility maximizer.

Thank you!! I agree it's a really good mesa-optimizer candidate; it remains to be seen exactly how good. It's a shame that I only found out about it a year ago :)

Asking for an acquaintance. Suppose I know some graduate-level machine learning, have read most of the recent mechanistic interpretability literature, and have made good progress understanding a smallish neural network in the last few months.

Is ARENA for me, or will it teach things I mostly already know?

(I advised this person that they already have ARENA-graduate level, but I want to check in case I'm wrong.)

How did you feed the data into the model and get predictions? Was there a prompt, and did you then take the model's answer? Did you get the logits from the API? What was the prompt?

Thank you for working on this Joseph!

Thank you! Could you please provide more context? I don't know what 'E' you're referring to.

That's a lot of things done, congratulations!
