4 Matching Annotations
  1. Last 7 days
    1. our GTPO hybrid advantage formulation eliminates the advantage misalignment problem

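      Most people assume that computing and optimizing the advantage function in reinforcement learning is a relatively straightforward process, but the authors point out that an 'advantage misalignment problem' exists and propose the GTPO hybrid advantage formulation to solve it. This challenges a basic assumption in reinforcement learning, showing that even a concept as central as the advantage function needs careful design to work effectively in multi-turn tasks.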

  2. Jan 2024
  3. Jan 2019
    1. Contrary to mainstream thinking that this new technology is unregulated, it’s really quite the opposite. These systems apply the strictest of rules under highly deterministic and predictable models that are regulated through mathematics. In the future, industry will be regulated not just by institutions and committees but by algorithms and mathematics. The new technology will gradually out-regulate the regulators and, in many cases, make them obsolete because the new system offers more certainty. Antonopoulos explains that “the opposite of authoritarianism is not chaos, but autonomy.”

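      <big>Comment:</big><br/><br/>In 1933 the Nazis closed the Bauhaus school of design in Germany, and most of its teachers and students emigrated to the United States, bringing their architectural style with them. Although people sensed a coldness in its strict geometric forms, the Bauhaus movement was committed to reconciling fine art with industrial society and sought a new unity of art and technology, prompting the public to ask: "How does one become a more complete person?" This indirectly shaped the American character we know today.<br/><br/>Will blockchain eventually move beyond "rule by man" and reach a state of "algorithmic self-governance"? Similar debates are heard constantly in the field of artificial intelligence as well. "Absolute rationality" stands opposed to the complete personality, and this cold quality marks humanity's retreat after its encounter with machines. Skeptics used to worry that humans were actually pulling the strings behind the algorithms, but with the emergence of algorithms "generated by algorithms," and even of algorithms that are "self-inherited across generations, from grandparent to grandchild," that worry has grown feeble. We now have a bigger anxiety: could "blockchain-based authoritarianism" emerge?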

  4. Nov 2018
    1. how does misrepresentative information make it to the top of the search result pile—and what is missing in the current culture of software design and programming that got us here?

      Two core questions in one? As to "how" bad info bubbles to the top of our search results, we know that the algorithms are proprietary—but the humans who design them bring their biases. As to "what is missing," Safiya Noble suggests here and elsewhere that the engineers in Silicon Valley could use a good dose of the humanities and social sciences in their decision-making. Is she right?