5 Matching Annotations
  1. Last 7 days
    1. Current Chat"` abuse detection online evals

      Does this mean that the model should ideally not do a current chat search unless there is a compaction?

      What happens when we switch from a higher context window model to a lower?

    1. The LLM judges should_have_searched blind to what the agent actually did; the verdict is derived by combining that with did_search

      What does this mean? What are these variables for