2 Matching Annotations
  1. Last 7 days
    1. current approaches often rely on decoupled trigger-response pipelines or are limited to captioning-style narration, reducing their effectiveness for open-ended question answering and long-horizon interaction

      大多数人认为现有的视频大模型可以通过简单的触发-响应管道或描述式叙述来处理实时视频流,但作者认为这种方法对于开放式问答和长时程交互效果有限。这是一个反直觉的观点,因为它挑战了当前视频处理领域的常规做法,暗示需要更集成的端到端方法来真正实现实时视频理解。

  2. Dec 2021