"Current Chat" abuse detection online evals
Does this mean that the model should ideally not do a current chat search unless there is a compaction?
What happens when we switch from a higher context window model to a lower?
RECENCY_WINDOW
What does this do?
Verdict
From the judge's PoV?
Search was needed
How is this judged?
The LLM judge decides should_have_searched without seeing what the agent actually did; the verdict is then derived by combining that with did_search.
What does this mean? What are these variables for?
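One possible reading, sketched in Python: should_have_searched is the judge's blind opinion on whether a search was warranted, did_search is the observed agent behavior, and the verdict is a lookup over the two booleans. Only those two variable names come from the quoted text. The function name derive_verdict and the four verdict labels are hypothetical, chosen for illustration, and may not match what the eval actually emits.

```python
from enum import Enum


class Verdict(Enum):
    # Hypothetical labels; the real eval may use different names.
    CORRECT_SEARCH = "correct_search"          # search needed, agent searched
    MISSED_SEARCH = "missed_search"            # search needed, agent did not search
    UNNECESSARY_SEARCH = "unnecessary_search"  # search not needed, agent searched anyway
    CORRECT_SKIP = "correct_skip"              # search not needed, agent did not search


def derive_verdict(should_have_searched: bool, did_search: bool) -> Verdict:
    """Combine the judge's blind label with the agent's observed action.

    should_have_searched: judged by the LLM without seeing the agent's action.
    did_search: read from the agent's trace, not judged by the LLM.
    """
    if should_have_searched:
        return Verdict.CORRECT_SEARCH if did_search else Verdict.MISSED_SEARCH
    return Verdict.UNNECESSARY_SEARCH if did_search else Verdict.CORRECT_SKIP


if __name__ == "__main__":
    # Print the full 2x2 table of outcomes.
    for should in (True, False):
        for did in (True, False):
            print(should, did, derive_verdict(should, did).value)
```

Under this reading, the "abuse" case would be the cell where the judge said no search was needed but the agent searched anyway. Keeping the judge blind to did_search means its label cannot be biased by what the agent happened to do.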