16 Matching Annotations
  1. Last 7 days
    1. We found that using MINE directly gave identical performance when the task was nontrivial, but became very unstable if the target was easy to predict from the context (e.g., when predicting a single step in the future and the target overlaps with the context).

      all content that points to important caveats and gotchas that I might consider when leaning too heavily on the results of this paper
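The instability the quote describes is one reason CPC uses the InfoNCE objective rather than MINE: InfoNCE scores the true target against a set of negatives with a softmax, so each example's loss is a bounded cross-entropy term rather than an unbounded density-ratio estimate. A minimal sketch of the InfoNCE loss (the function name and toy score matrix are illustrative, not from the paper):

```python
import numpy as np

def info_nce_loss(scores):
    """InfoNCE loss for a batch of (context, target) pairs.

    scores[i, j] is a learned compatibility score between context i and
    candidate target j; the diagonal holds the positive pairs, and the
    other entries in each row act as negative samples.
    """
    # numerically stable row-wise log-softmax, evaluated on the diagonal
    m = scores.max(axis=1, keepdims=True)
    log_probs = scores - (m + np.log(np.exp(scores - m).sum(axis=1, keepdims=True)))
    return -np.mean(np.diag(log_probs))

# Strong diagonal scores: positives are easy to identify, so the loss is
# near zero, and it can never go below zero no matter how easy the task is.
loss = info_nce_loss(np.array([[10.0, 0.0], [0.0, 10.0]]))
```

The lower bound at zero is the point of contrast with MINE, whose estimate can blow up exactly in the easy-to-predict regime the authors flag.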

    2. We note that better results [49, 27] have been published on these target datasets, by transfer learning from a different source task.

    3. We also found that not all the information encoded is linearly accessible. When we used a single hidden layer instead, the accuracy increased from 64.6 to 72.5, which is closer to the accuracy of the fully supervised model.
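The probing setup behind this caveat can be reproduced in miniature: fit a linear classifier and a one-hidden-layer classifier on the same frozen features and compare accuracy. The XOR-style synthetic data below is purely illustrative (the paper probes learned speech representations); it just shows how information that is present in the features can be invisible to a linear probe:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)

# Synthetic "frozen representations": the label is the XOR of two binary
# factors, so it is fully encoded but not linearly accessible.
X = rng.integers(0, 2, size=(400, 2)).astype(float)
y = np.logical_xor(X[:, 0], X[:, 1]).astype(int)
X += rng.normal(scale=0.05, size=X.shape)  # small feature noise

linear_probe = LogisticRegression().fit(X, y)
mlp_probe = MLPClassifier(hidden_layer_sizes=(16,), max_iter=2000,
                          random_state=0).fit(X, y)

acc_linear = linear_probe.score(X, y)  # near chance: no linear boundary works
acc_mlp = mlp_probe.score(X, y)        # near perfect: one hidden layer suffices
```

The same pattern explains the jump from 64.6 to 72.5 in the quote: the gap measures linear accessibility, not how much the representation actually encodes.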

    4. For lasertag_three_opponents_small, the contrastive loss neither helps nor hurts. We suspect that this is due to the task design, which does not require memory and thus yields a purely reactive policy.

    5. Although this is a standard transfer learning benchmark, we found that models that learn better relationships in the children's books did not necessarily perform better on the target tasks (which are very different: movie reviews, etc.).

    6. We found that more advanced sentence encoders did not significantly improve the results, which may be due to the simplicity of the transfer tasks (e.g., in MPQA most datapoints consist of one or a few words), and the fact that bag-of-words models usually perform well on many NLP tasks [48].

    7. It is important to note that the window size (maximum context size for the GRU) has a big impact on the performance, and longer segments would give better results. Our model had a maximum of 20480 timesteps to process, which is slightly longer than a second.
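The "slightly longer than a second" figure follows directly from the sampling rate, assuming the 16 kHz LibriSpeech audio the paper uses for this experiment:

```python
SAMPLE_RATE_HZ = 16_000   # LibriSpeech audio is sampled at 16 kHz
WINDOW_SAMPLES = 20_480   # maximum GRU context window from the quote

window_seconds = WINDOW_SAMPLES / SAMPLE_RATE_HZ  # 1.28 s
```

So comparisons against this model implicitly assume a ~1.28 s receptive field; the quote warns that results would shift with longer segments.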

    1. Finally, our study is limited to short-form question-answering; future work should extend this analysis to longer-form generation settings.

    2. While our work demonstrates a promising new approach to generating calibrated confidences through verbalization, there are limitations that could be addressed in future work. First, our experiments are focused on factual recall-oriented problems, and the extent to which our observations would hold for reasoning-heavy settings is an interesting open question.

    3. The 1-stage and 2-stage verbalized numerical confidence prompts sometimes differ drastically in the calibration of their confidences. How can we reduce the sensitivity of a model's calibration to the prompt?

    4. Additionally, the lack of technical details available for many state-of-the-art closed RLHF-LMs may limit our ability to understand what factors enable a model to verbalize well-calibrated confidences and differences in this ability across different models.

    5. With Llama2-70B-Chat, verbalized calibration provides improvement over conditional probabilities across some metrics, but the improvement is much less consistent compared to GPT-* and Claude-*.

    6. The verbal calibration of the open source model Llama-2-70b-chat is generally weaker than that of closed source models but still demonstrates improvement over its conditional probabilities by some metrics, and does so most clearly on TruthfulQA.
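When the quotes above say improvement holds only "across some metrics", expected calibration error (ECE) is one of the standard metrics such comparisons use: bucket predictions by stated confidence and average the gap between each bucket's mean confidence and its accuracy. A minimal equal-width-bin sketch (the toy inputs are illustrative):

```python
import numpy as np

def expected_calibration_error(confidences, correct, n_bins=10):
    """Equal-width-bin ECE: sum_b (n_b / N) * |accuracy_b - confidence_b|."""
    confidences = np.asarray(confidences, dtype=float)
    correct = np.asarray(correct, dtype=float)
    bins = np.minimum((confidences * n_bins).astype(int), n_bins - 1)
    ece = 0.0
    for b in range(n_bins):
        mask = bins == b
        if mask.any():
            gap = abs(correct[mask].mean() - confidences[mask].mean())
            ece += mask.mean() * gap  # bin weight times calibration gap
    return ece

# An overconfident toy model: says 0.9 but is right half the time there,
# and underconfidently says 0.6 where it is always right.
ece = expected_calibration_error([0.9, 0.9, 0.6, 0.6], [1, 0, 1, 1])
```

Because ECE depends on binning and on the confidence distribution, two prompts (e.g., the 1-stage vs 2-stage prompts above) can rank differently under ECE than under, say, AUROC, which is why per-metric consistency matters in these comparisons.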

  2. Apr 2020
  3. Jun 2017