32 Matching Annotations
  1. Apr 2022
    1. allegedly metrics like ELO ended up mostly continuous

       I find this suspicious - why did superforecasters put only a 20% probability on AlphaGo beating Se-dol, if it was so predictable?  Where were all the forecasters calling for Go to fall in the next couple of years, if the metrics were pointing there and AlphaGo was straight on track?  This doesn't sound like the experienced history I remember.

       Now it could be that my memory is wrong and lots of people were saying this and I didn't hear.  It could be that the lesson is, "You've got to look closely to notice oncoming trains on graphs because most people's experience of the field will be that people go on whistling about how something is a decade away while the graphs are showing it coming in 2 years."

       But my suspicion is mainly that there is fudge factor in the graphs or people going back and looking more carefully for intermediate data points that weren't topics of popular discussion at the time, or something, which causes the graphs in history books to look so much smoother and neater than the graphs that people produce in advance.

      Clear formalisation of the difference between Paul's and Eliezer's interpretations of AlphaGo - testable!

    2. I don't expect "the fall" to take years; I feel pretty on board with "the slide" taking months or maybe even a couple of years.  If "the slide" supposedly takes much longer, I wonder why better-scaling tech hasn't come over and started a new slide.

      concrete statement about shapes of crunchtime

    3. hmm.. here i'm running into trouble (type mismatch error) again. i can imagine this in abstract (and perhaps incorrectly/anthropomorphisingly!), but would - at this stage - fail to code up anything like a gridworlds example. more research needed (TM) i guess :)

      Important point to follow up on

    4. Suppose I tried this distinction

      Super interesting, detailed discussion of how agents get from overt plotting to concealing their thoughts from the operator

    5. Now that I've publicly given this answer, it's no longer useful as a validation set from my own perspective.  But it's clear enough that probably nobody was ever going to pass the validation set for generating lines of reasoning obvious enough to be generated by Eliezer in 2010 or possibly 2005. 

      The important bits seem to be how far we get with narrow systems before we get general systems, and whether CIS is applicable to all general systems

      An important point for reconciling fast and slow takeoff scenarios seems to be whether the slow scenario thinks a FOOM will happen eventually, and under which circumstances

    6. back in the day i got frustrated by smart people dismissing the AI control problem as "anthropomorphising", so i prepared a presentation (https://www.dropbox.com/s/r8oaixb1rj3o3vp/AI-control.pdf?dl=0) that visualised the control problem as exhaustive search in a gridworld over (among other things) the state of the off button. this seems to have worked at least in one prominent case where a renowned GOFAI researcher, after me giving the presentation to him 1-1, went from "control problem is silly anthropomorphising scifi" to "why on earth would you give your AI the incorrect [read: unaligned!] utility function?!?" (i even seem to remember sending an overly excited email about that event to you and some FHI people :) i also ended up promoting gridworlds as a tool more generally: gwern did some further work, and of course DM -- though i'm not sure if the latter was related to me promoting it

      Important point about using grid worlds to popularise the control problem - maybe usable for Yo's project
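
      A minimal sketch of the kind of gridworld demonstration the annotated comment describes: exhaustive search over short plans in a toy grid whose state includes whether the off button still works. This is my own toy construction (the grid layout, the shutdown step, and the utility function are all invented here), not the code behind the linked presentation.

      ```python
      # Toy gridworld: brute-force search over 6-step plans, where the world state
      # tracks the agent's position AND whether the off button still works.
      # Everything here (layout, shutdown step, utility) is invented for illustration.
      from itertools import product

      GRID_W, GRID_H = 3, 3
      START, BUTTON_CELL, GOAL = (0, 0), (0, 1), (2, 2)
      SHUTDOWN_STEP = 3  # the operator presses the off button before step 3 executes
      ACTIONS = {"up": (0, -1), "down": (0, 1), "left": (-1, 0), "right": (1, 0)}

      def simulate(plan):
          """Run a plan from the start state; return (position, button_works, alive)."""
          pos, button_works, alive = START, True, True
          for t, action in enumerate(plan):
              if t == SHUTDOWN_STEP and button_works:
                  alive = False          # shutdown succeeds only if the button still works
                  break
              dx, dy = ACTIONS[action]
              x, y = pos[0] + dx, pos[1] + dy
              if 0 <= x < GRID_W and 0 <= y < GRID_H:
                  pos = (x, y)
              if pos == BUTTON_CELL:
                  button_works = False   # stepping here disables the off button
          return pos, button_works, alive

      def misspecified_utility(state):
          pos, button_works, alive = state
          return 1.0 if pos == GOAL else 0.0   # no term about the button or the shutdown

      best_plan = max(product(ACTIONS, repeat=6),
                      key=lambda p: misspecified_utility(simulate(p)))
      print(best_plan, simulate(best_plan))  # the winning plan ends with button_works False
      ```

      Under this (deliberately unaligned) utility, every plan that leaves the button functional scores zero because the agent is shut down before it can reach the goal, so plain argmax search "resists shutdown" with no anthropomorphising involved - which is the rhetorical point the annotation says the presentation made.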

    7. I consider all of this obvious as a convergent instrumental strategy for AIs.  I could probably have generated it in 2005 or 2010 - if somebody had given me the hypothetical of modern-style AI that had been trained by something like gradient descent or evolutionary methods, into which we lacked strong transparency and strong reassurance-by-code-inspection that this would not happen.  I would have told you that this was a bad scenario to get into in the first place, and you should not build an AI like that; but I would also have laid the details, I expect, mostly like they are laid here.

      Comment on the stability of his models re CIS - either impressive or faulty?

    8. And if you try training the AI out of that habit in a domain of lower complexity and intelligence, it is predicted by me that generalizing that trained AI or subsystem to a domain of sufficiently higher complexity and intelligence, but where you could still actually see overt plots, would show you the AI plotting to kill you again. If people try this repeatedly with other corrigibility training tricks on the level where plots are easily observable, they will eventually find a try that seems to generalize to the more complicated and intelligent validation set, but which kills you on the test set.

      Argument why aggression is a focal point

    9. This is not an infallible ward against general intelligence generalizing there; it just at least avoids actively pushing the AI's intelligence to generalize in that direction.  This could be part of a larger complete strategy, which would need to solve a lot of other problems, for building a superhuman engineer that was subhuman at modeling how other agents model its actions.

      Possibility of building a savant or tool AI

    10. I should also remark somewhere in here: The whole "hide" stage, and also the possibly-later "think non-alarming visible thoughts (once the AI correctly models transparency) (in the unlikely event that transparency exists)" stage, seem liable to occur earlier in the AI's trajectory, if the AI has been previously tasked on problems where there's a favorable success gradient as you model agents modeling other agents.

      This and the following paragraphs seem insanely important

    1. So the first thing that needs to happen on a timescale of 5 seconds is perceptual recognition of highly abstract statements unaccompanied by concrete examples, accompanied by an automatic aversion, an ick reaction - this is the trigger which invokes the skill.

      Anki this

    1. Further comment that occurred to me on "takeoff speeds" if I've better understood the main thesis now: its hypotheses seem to include a perfectly anti-Thielian setup for AGI.

      I maybe need to think about this more

    2. this all sure does sound "pretty darn prohibited" on my model, but I'd hope there'd be something earlier than that we could bet on. what does your Prophecy prohibit happening before that sub-prophesied day?

      why the hell is this prohibited under Eliezer's model?

    3. This entire essay seems to me like it's drawn from the same hostile universe that produced Robin Hanson's side of the Yudkowsky-Hanson Foom Debate. Like, all these abstract arguments devoid of concrete illustrations and "it need not necessarily be like..." and "now that I've shown it's not necessarily like X, well, on the meta-level, I have implicitly told you that you now ought to believe Y". It just seems very clear to me that the sort of person who is taken in by this essay is the same sort of person who gets taken in by Hanson's arguments in 2008 and gets caught flatfooted by AlphaGo and GPT-3 and AlphaFold 2.

      Connecting the essay to Hanson's arguments

    4. The path that evolution took there doesn't lead through things that generalized 95% as well as humans first for 10% of the impact, not because evolution wasn't optimizing for that, but because that's not how the underlying cognitive technology worked

      Verbalisation of one of Yudkowsky's primary arguments

  2. Feb 2022
    1. At the end of 2019, 14nm capacity was only 3,000 to 5,000 wafers/month, but 14nm capacity will grow very fast in 2020: by the end of the year it will reach 15,000 wafers/month, 3-5 times the current level, an increase of up to 400%.

      First mention of production capacity for 14nm
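
      A quick sanity check of the quoted ramp, taking the figures exactly as quoted (they are approximate, and the source phrasing is garbled machine translation):

      ```python
      # End-2019 baseline vs. end-2020 target, figures as quoted in the annotation.
      baseline_low, baseline_high = 3_000, 5_000   # wafers/month, end of 2019
      target = 15_000                              # wafers/month, end of 2020
      print(target / baseline_high, target / baseline_low)  # 3.0 to 5.0 (the "3-5 times")
      print(100 * (target - baseline_low) / baseline_low)   # 400.0 (% growth, "up to 400%")
      ```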

    1. And for an AGI to trust that its goals will remain the same under retraining will likely require it to solve many of the same problems that the field of AGI safety is currently tackling

      where does goal-directed behaviour suddenly come from?

    2. So it’s probably more accurate to think about self-modification as the process of an AGI modifying its high-level architecture or training regime, then putting itself through significantly more training

      this claim relies heavily on the current regime

  3. Jan 2022
    1. (moving from the ideal gas models to the van der Waals equation

      this seems weird - if Q doesn't change, I don't see how the model can be an improvement
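
      For reference, the refinement the highlighted example points at is the step from the ideal gas law to the van der Waals equation (standard textbook forms, not taken from the annotated post):

      ```latex
      % Ideal gas law and its van der Waals refinement; a and b are the standard
      % correction terms for intermolecular attraction and finite molecular volume.
      PV = nRT
      \qquad\longrightarrow\qquad
      \left(P + \frac{a\,n^{2}}{V^{2}}\right)\left(V - n b\right) = nRT
      ```

      The refined model adds structure (the parameters a and b and the quantities they depend on), which is presumably where the claimed improvement lives even if the probability Q over the old features is preserved.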

    2. it's a set of functions from F̄ to {True,False})

      can E_i be a partial function from F-bar to {true,false}? If not, the blegg/rube example doesn't seem to make sense
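
      A toy rendering of my reading of the highlighted definition, using the blegg/rube features (the names and structure below are illustrative, not the post's exact formalism); it also makes the partial-function question concrete:

      ```python
      # Features with their value ranges, and F-bar as the set of (feature, value) pairs.
      features = {"color": ["blue", "red"], "shape": ["egg", "cube"]}
      F_bar = [(f, v) for f, vals in features.items() for v in vals]

      # A total element of E: every (feature, value) pair is mapped to True or False.
      blegg_world = {("color", "blue"): True, ("color", "red"): False,
                     ("shape", "egg"): True, ("shape", "cube"): False}

      # The annotation's question: a *partial* function leaves some pairs undefined,
      # e.g. an object whose shape has not been observed.
      partial_world = {("color", "blue"): True, ("color", "red"): False}

      def is_total(world, f_bar=F_bar):
          return all(pair in world for pair in f_bar)

      print(is_total(blegg_world), is_total(partial_world))  # True False
      ```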

    1. is the set of all bleggs and rubes in some situation

      can environments track things that are not features? (i.e. here something like number of rube/blegg objects within some space?)

    2. Q(F_1∣F_2).

      how is this possible? If F_i are only containers for features with labels and a range, how can they be related to each other via a probability function?
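
      One possible reading of Q (my own, and it may not be what the post intends): Q is a distribution over complete feature assignments, and Q(F_1∣F_2) is the conditional it induces between the values of two features. The numbers below are made up:

      ```python
      # Joint distribution over (color, shape) assignments -- invented numbers.
      Q = {("blue", "egg"): 0.45, ("blue", "cube"): 0.05,
           ("red", "egg"): 0.05, ("red", "cube"): 0.45}

      def q_color_given_shape(q, color, shape):
          """Q(color | shape), marginalising the joint over everything else."""
          numerator = sum(p for (c, s), p in q.items() if c == color and s == shape)
          denominator = sum(p for (c, s), p in q.items() if s == shape)
          return numerator / denominator

      print(q_color_given_shape(Q, "blue", "egg"))  # 0.9: egg-shaped objects are mostly blue
      ```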

    3. Preliminary definition: If M∗ is a refinement of M and R a reward function on M, then M∗ splinters R if there are multiple refactorings of R on M∗ that disagree on elements of E∗ of non-zero probability.

      reward function splintering
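
      A toy illustration of the quoted definition (my own construction, not the post's worked example): refine a shape-only model with a colour feature and exhibit two candidate refactorings of the old "reward eggs" rule that disagree on refined elements of non-zero probability.

      ```python
      # Refined elements of E* with made-up, non-zero probabilities.
      E_star = {("egg", "blue"): 0.60, ("egg", "red"): 0.10,
                ("cube", "blue"): 0.05, ("cube", "red"): 0.25}

      # The original reward on M only saw shape: R(egg) = 1, R(cube) = 0.
      # Two candidate ways of carrying R over to M* (whether each counts as a
      # "refactoring" depends on the post's exact criterion):
      refactoring_1 = lambda shape, colour: 1.0 if shape == "egg" else 0.0
      refactoring_2 = lambda shape, colour: 1.0 if colour == "blue" else 0.0

      disagreements = [(e, p) for e, p in E_star.items()
                       if p > 0 and refactoring_1(*e) != refactoring_2(*e)]
      print(disagreements)  # non-empty => M* splinters R, under this reading
      ```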