Incorporated with CSV, the model becomes capable of using code to verify answers
Self-refine checks the code only at the logical level, whereas many math problems can be checked for correctness by substituting the answer back in, and thus can be verified through code.
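As a concrete illustration of substitution-based checking (the equation and all names here are my own toy example, not from the paper):

```python
import sympy as sp

# Toy example: verify a candidate answer by substituting it back into
# the original equation, rather than re-checking the derivation itself.
x = sp.symbols('x')
equation = sp.Eq(x**2 - 5*x + 6, 0)   # problem: solve x^2 - 5x + 6 = 0
candidate = 2                          # model's proposed answer

# Substitution check: the candidate is correct iff the equation holds.
is_correct = sp.simplify(equation.lhs.subs(x, candidate) - equation.rhs) == 0
print(is_correct)  # True
```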
Generating code in brief and frequent segments
Possibly because each generated code segment is shorter, it is less prone to errors.
can improve computational capability more than the natural language chains C_NL
Code can eliminate calculation errors.
This study provides the first systematic analysis of code generation, execution, and self-debugging’s role in mathematical problem-solving.
Self-refine actually did similar work, just not on top of gpt4-code.
in one round,
Very likely overfitting: the generator fails to produce meaningful data for training the verifier. Training from the initial checkpoint in every round might alleviate this.
The results clearly demonstrate the capability of CoRe to greatly boost PLMs' reasoning ability.
The main improvement comes from the verifier rather than from self-thinking.
In summary, the overall training objective for verifiers is given by
In practice, the two parts of this objective are trained separately.
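The formula itself is cut off in the excerpt above; a common form of such a joint verifier objective (e.g., GSM8K-style verifiers combine a verification loss with an auxiliary language-modeling loss) would be something like:

```latex
% Hypothetical reconstruction, not the paper's exact formula;
% \lambda weights the auxiliary language-modeling term.
\mathcal{L}_{\text{verifier}}
  = \mathcal{L}_{\text{verification}} + \lambda \, \mathcal{L}_{\text{LM}}
```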
This helps explain the effectiveness of our method.
The effectiveness of this work comes mainly from two things: (1) the model generates instructions from unlabeled outputs, and (2) the model iteratively selects the better instruction-output pairs by itself. That the model can generate instructions from outputs, and that the resulting data contains high-quality pairs, is probably because generating a high-quality output is much harder than generating a high-quality instruction; producing an instruction for an existing output is the easier direction, so it more readily yields high-quality pairs.
Figure 1: Running Examples of Evol-Instruct.
To apply this pipeline to the vision-language domain, the main problems are: 1. designing the in-breadth and in-depth evolution directions; 2. designing how GPT and the LVLM interact.
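A rough sketch of what such an adaptation might look like (all prompt templates and the `llm`/`evolve` names are my own illustrative assumptions, not from Evol-Instruct):

```python
# Hypothetical sketch of adapting Evol-Instruct to vision-language data.

IN_DEPTH = (
    "Rewrite the following instruction about the given image so that it "
    "requires one extra step of visual reasoning (e.g., counting, spatial "
    "relations, OCR):\n{instruction}"
)
IN_BREADTH = (
    "Write a new instruction about the same image that covers a different "
    "skill than this one:\n{instruction}"
)

def evolve(instruction: str, image_caption: str, llm) -> list[str]:
    """One evolution round: GPT evolves the text; the LVLM only answers.

    `llm` is a stand-in for any chat-completion callable. Since GPT cannot
    see the image, the caption (or detailed region descriptions) serves as
    a textual surrogate for the visual content.
    """
    evolved = []
    for template in (IN_DEPTH, IN_BREADTH):
        prompt = f"Image description: {image_caption}\n" + template.format(
            instruction=instruction
        )
        evolved.append(llm(prompt))
    return evolved
```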
Our GPT4Tools stands distinct from previous and concurrent studies [5–11] in three ways
Unlike most works, which use GPT to generate vision-language instruction-tuning data, this work generates instruction data for decomposing tasks and invoking APIs.
Percentage (%) of full-mark answers on LMExamQA
The two fine-tuned models achieve such a high proportion of full-mark answers that this benchmark is arguably not hard enough either.
Overview of our benchmarking method.
If the goal is to generate a benchmark that many language models are bad at, breadth means finding the directions where LLMs are weak, and depth means continually deepening along those directions. For example: broadly generate tasks of different categories, test LLMs on them, pick the worst-performing directions from the results, and deepen the difficulty there. Difficulty then stays layered: each deepening step raises it one level.
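A minimal sketch of this breadth-then-depth loop, assuming hypothetical `generate`, `grade`, and `harden` helpers:

```python
def build_adversarial_benchmark(model, categories, generate, grade, harden,
                                rounds=3, per_cat=20):
    """`generate(category, n)` makes questions, `grade(model, questions)`
    returns the model's score (higher = easier for it), and `harden(q)`
    rewrites a question to be more difficult. All three are hypothetical
    callables standing in for LLM-backed steps."""
    # Breadth: seed questions across many task categories.
    benchmark = {c: generate(c, per_cat) for c in categories}
    for _ in range(rounds):
        scores = {c: grade(model, qs) for c, qs in benchmark.items()}
        # Depth: pick the weakest direction and raise its difficulty.
        worst = min(scores, key=scores.get)
        benchmark[worst] = [harden(q) for q in benchmark[worst]]
    return benchmark
```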
the visual neck of LaVIN is 6 times smaller than that of LLaVA [18],
The linear layers used first down-project and then up-project, so there are fewer parameters.
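Back-of-envelope arithmetic with illustrative dimensions (ViT-style 1024-d features into a 4096-d LLM embedding space; the exact sizes are assumptions, not taken from the papers):

```python
# Compare a direct projection against a down-then-up bottleneck.
d_vis, d_llm, d_bottleneck = 1024, 4096, 128

direct = d_vis * d_llm                                    # one linear map
bottleneck = d_vis * d_bottleneck + d_bottleneck * d_llm  # down, then up

print(direct, bottleneck, direct / bottleneck)  # ~4.2M vs ~0.66M, ~6.4x fewer
```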
In the image encoder, we insert the adapters before the multi-head attention modules.
This paper also adds adapters in the image encoder, introducing more trainable parameters.
the number of optimized parameters is still kept at a very small scale, e.g., 3∼5M
LLaVA fine-tunes the entire LLM, so compared with LLaVA this tunes far fewer parameters; but MiniGPT-4 tunes only a single linear layer and should be the one with the fewest tuned parameters.
Mixture-of-Modality Adapter (MM-Adapter)
Different adapters provide the capabilities for different modalities, and a router then performs modality selection.
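A minimal PyTorch sketch of this reading of the MM-Adapter (per-modality bottleneck adapters plus a soft router; a sketch of the idea, not LaVIN's exact implementation):

```python
import torch
import torch.nn as nn

class MMAdapter(nn.Module):
    def __init__(self, dim: int, bottleneck: int = 128, n_modalities: int = 2):
        super().__init__()
        # One down-then-up bottleneck adapter per modality.
        self.adapters = nn.ModuleList(
            nn.Sequential(nn.Linear(dim, bottleneck), nn.GELU(),
                          nn.Linear(bottleneck, dim))
            for _ in range(n_modalities)
        )
        self.router = nn.Linear(dim, n_modalities)  # soft modality selection

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Route on the mean token representation, then mix adapter outputs.
        weights = self.router(x.mean(dim=1)).softmax(-1)           # (B, M)
        outs = torch.stack([a(x) for a in self.adapters], dim=-1)  # (B, T, D, M)
        return x + (outs * weights[:, None, None, :]).sum(-1)      # residual
```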
can maintain the NLP capabilities of LLMs
MiniGPT-4 freezes the LLM part, so it also fully preserves NLP capability. The comparison here is presumably with LLaVA, which, since it uses LLaMA, unfreezes LLaMA's weights during the VL instruction-tuning stage.
To amplify the correspondence signal between the input instance and the correct label
The model only learns to predict from the label distribution implied by the instruction, ignoring the relation between the input instance and the label.
We hypothesize that during inference, LLMs learn the correspondence between answer choice in the instruction (e.g. Determine the speaker of the dialogue, "agent" or "customer".) and the label (e.g. agent) from demonstrations
Perhaps what the LLM mainly learns is to solve the problem according to the instruction.
NLI
Natural language inference: judging whether the hypothesis can be derived from the premise. It can be seen as a multiple-choice question with only two options.
Sentence Completion
These are all natural-language commonsense reasoning problems.
‘no prompt text’ prompt of COSMOS-QA dataset
COSMOS-QA is a commonsense-based reading-comprehension dataset in multiple-choice QA format.
The model trained on SVIT can describe abundant details accurately
This presumably benefits from the extra region information provided in the dataset.
we follow their setting and only feed the detail description subset of SVIT into the model
More detailed image descriptions can reduce the model's hallucinations when perceiving images.
Errors in original annotations.
Could existing vision-language models be used to detect and filter out such errors?
This indicates the importance of keeping the training data diverse and balanced across different categories in IFT
This exposes a problem with the method: for harder or more niche questions, GPT's ratings may skew low, so such instruction data has a lower probability of being selected.
∼6k high-quality data suffices to finetune LLaMA achieving similar performance as the original ALPACA
Is the surplus data redundant because of low quality, or because of poor diversity?
ALPAGASUS trained on 3k/6k/9k selected data.
More low-quality data hurts performance, whereas for high-quality data, more is better.
we designate “accuracy” as the dimension for rating purposes
A single dimension is surely insufficient, and rating each instruction example in isolation also fails to account for many factors, such as diversity.
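For reference, the selection scheme boils down to something like the following sketch (the `judge` callable and the 4.5 threshold are assumptions standing in for the paper's LLM-as-judge setup):

```python
def filter_ift_data(pairs, judge, threshold=4.5):
    """AlpaGasus-style selection sketch: rate each (instruction, response)
    pair on the single "accuracy" dimension, keep high scorers.

    `judge` is any callable mapping a prompt string to a 1-5 float score,
    e.g., a wrapper around a chat-completion API."""
    kept = []
    for instruction, response in pairs:
        prompt = (
            "Rate the accuracy of the response to the instruction on a "
            f"1-5 scale.\nInstruction: {instruction}\nResponse: {response}"
        )
        if judge(prompt) >= threshold:
            kept.append((instruction, response))
    return kept
```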
nearest neighbour score, which is a metric of dataset diversity
This metric appears to be the most important one, probably because the validation set has no overlap with the training set, so what is tested is generalization after instruction tuning, and higher diversity yields stronger generalization.
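One plausible formulation of such a nearest-neighbour diversity score (whether it matches the paper's exact definition is an assumption):

```python
import numpy as np

def nn_diversity_score(embeddings: np.ndarray) -> float:
    """Average distance from each example to its closest other example.
    Larger values = less redundancy in the dataset."""
    diff = embeddings[:, None, :] - embeddings[None, :, :]
    dists = np.linalg.norm(diff, axis=-1)   # pairwise Euclidean distances
    np.fill_diagonal(dists, np.inf)         # ignore self-distance
    return float(dists.min(axis=1).mean())
```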
GRIPS works best when models can follow declarative instructions and are responsive to changes to instructions (shown in Appendix D)
Yet it is precisely on instruction-following models that instruction search brings the smallest gains. Perhaps instruction tuning greatly improves the model's generalization in understanding instructions, making it far less sensitive to small differences in wording.
We define the partial order between y_1 and the candidates behind it as y_{1,2:n} = y_1 ≻ {y_2, …, y_n}; then the objective of Bradley-Terry becomes
This extends the probability of being the better of two to the probability of being the best among many.
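The formula after "becomes" is not quoted above; the standard one-vs-rest extension of Bradley-Terry (Plackett-Luce style) would read:

```latex
% Presumed form, reconstructed from the note above rather than quoted.
P\left(y_{1,2:n} \mid x\right)
  = \frac{\exp\big(r(x, y_1)\big)}{\sum_{i=1}^{n} \exp\big(r(x, y_i)\big)}
```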
Step-aware Verifier
The advantage of the step-aware verifier comes from the denser supervision signal during training.
Step-aware Voting Verifier
Why not aggregate the results over all steps as the verdict for an answer?
We regard the reasoning paths that match the ground truth final answer as positive, and the others as negative.
This may have a minor issue: incorrect reasoning steps can still lead to a correct final answer.
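A minimal sketch of the voting-verifier aggregation, with a note on the step-level variant the question above asks about (names illustrative):

```python
from collections import defaultdict

def voting_verifier(paths):
    """`paths` is a list of (final_answer, verifier_score) pairs, one per
    sampled reasoning path. Instead of plain majority voting, each path's
    vote is weighted by its verifier score. A step-aware variant could
    replace the score with an aggregate (e.g., mean or min) of per-step
    scores, which is what the question above suggests."""
    totals = defaultdict(float)
    for answer, score in paths:
        totals[answer] += score
    return max(totals, key=totals.get)

# Example: three paths reach "42", one path reaches "41".
print(voting_verifier([("42", 0.9), ("42", 0.2), ("42", 0.4), ("41", 0.8)]))
```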
feedback-based imitation learning
In fact, methods can be divided into three levels according to how hard the feedback is to obtain.
several advantages
These are advantages shared by all feedback-without-RL methods.
The modest performance gains in Math Reasoning can be traced back to the inability to accurately identify whether there is any error.
Then could one specifically fine-tune a model to check for and localize errors?
Could step-supervised methods be combined with decoding, as a variant of rerank methods? Reranking fits better with the output-supervised idea.
This step is not intended to teach the generator new skills; it is intended only to teach the generator to produce solutions in the desired format
In the verifier paper, the generator was given a small amount of fine-tuning. This is presumably because GPT-4 is inherently much stronger than GPT-3: sampling directly from GPT-3 rarely yields meaningful solutions, whereas GPT-4 does not need such fine-tuning.
It provides more precise feedback, since it specifies the exact location of any errors that occur.
A good reward model should be able to identify where in a solution the error occurs. Generalizing this ability from outcome supervision alone is very difficult; finer-grained supervision signals can simplify the learning process.
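One common way such step-level (process) supervision is reduced to a solution score is a product of per-step correctness probabilities, since a single wrong step invalidates the whole solution. A sketch:

```python
import math

def solution_score(step_probs: list[float]) -> float:
    """Process-reward-model-style scoring: the solution is only as good
    as its weakest step, so multiply per-step correctness probabilities."""
    return math.prod(step_probs)

print(solution_score([0.99, 0.98, 0.95]))  # ~0.92: all steps sound
print(solution_score([0.99, 0.30, 0.95]))  # ~0.28: one weak step tanks it
```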
dropout significantly improves solution-level verifiers
A 7B model used for classification overfits very easily.
Unfortunately, test@100 performance degrades much more sharply than test@1 as we increase the number of epochs
This shows that when using majority voting or rerank-like methods, one must guard against generator overfitting.
Supervised finetuning
Supervised finetuning can also be divided into outcome-based and process-based.
PAIRRANKER outperforms other rankers.
When used only for ranking and not for RL, listwise or pairwise methods are surely more effective than pointwise ones.
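A minimal sketch of pairwise win-count ranking in the PairRanker spirit (the `prefer` comparator is a hypothetical stand-in for the trained pair scorer):

```python
from itertools import combinations

def pairwise_rank(candidates, prefer):
    """Compare candidates in pairs and rank by number of pairwise wins.
    `prefer(a, b)` returns True if the comparator prefers `a` over `b`."""
    wins = {c: 0 for c in candidates}
    for a, b in combinations(candidates, 2):
        wins[a if prefer(a, b) else b] += 1
    return sorted(candidates, key=lambda c: wins[c], reverse=True)
```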
since a single logical error is enough to derail a much larger solution
showing that they do not introduce domain-label bias
Domain-label bias is mitigated by using random in-domain words.
we use random words sampled from the unlabeled evaluation dataset as the content-free text
This way, domain-label bias is taken into account as well.
Domain Label Bias
Domain-label bias and vanilla label bias do not seem inherently tied to ICL; these forms of bias are not exclusive to ICL.
Using random words limits the semantic meaning of the input, allowing us to estimate the vanilla-label and context-label biases while using in-domain words accounts for the effect of the task corpus
Overall, we want the prompt as a whole to carry only useful, unbiased information, and calibration achieves this to some extent.
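A minimal sketch of contextual-calibration-style correction, assuming the bias estimate comes from content-free inputs (here, random in-domain words, as quoted above); the numbers are illustrative:

```python
import numpy as np

def calibrate(label_probs: np.ndarray, bias_probs: np.ndarray) -> np.ndarray:
    """`label_probs`: model's label distribution on a real input.
    `bias_probs`: average label distribution on content-free inputs.
    Divide out the estimated bias, then renormalize."""
    scores = label_probs / bias_probs
    return scores / scores.sum()

p = np.array([0.7, 0.3])     # raw prediction, biased toward label 0
bias = np.array([0.8, 0.2])  # labels' prior under content-free input
print(calibrate(p, bias))    # -> roughly [0.37, 0.63] after correction
```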
On the other hand, the models do not improve (or even decline in accuracy) on evaluation datasets for which there is little support.