Hypothesis

38 Matching Annotations

Last 7 days
huggingface.co huggingface.co

https://huggingface.co/baidu/Unlimited-OCR

4
1. fxp007 26 Jun 2026
  
  in Public
  
  no_repeat_ngram_size= 35
  
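  Most people assume OCR systems need no special handling for n-gram repetition, since that mainly matters in text generation. The authors explicitly set the no_repeat_ngram_size parameter to 35, which indicates that their OCR system must guard against repeated patterns in long text. This challenges the mainstream view that OCR merely extracts text and need not deal with the characteristics of text generation.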
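  To make the parameter concrete, here is a minimal sketch of the general no-repeat-n-gram idea used in text generation: before emitting the next token, ban any token that would complete an n-gram already present in the output. It is an illustration only, not the Unlimited-OCR implementation; the function name `banned_next_tokens` is hypothetical.

```python
def banned_next_tokens(tokens, n):
    """Return the tokens that would complete an n-gram already present in `tokens`.

    `tokens` is the sequence generated so far; `n` plays the role of
    no_repeat_ngram_size. Illustrative sketch, not a library API.
    """
    if n <= 0 or len(tokens) < n:
        return set()  # too short to contain any complete n-gram
    # The last n-1 tokens form the prefix that the next token would extend.
    prefix = tuple(tokens[len(tokens) - (n - 1):]) if n > 1 else ()
    banned = set()
    for i in range(len(tokens) - n + 1):
        # If an earlier n-gram starts with the same prefix, its final token is banned.
        if tuple(tokens[i:i + n - 1]) == prefix:
            banned.add(tokens[i + n - 1])
    return banned


# With n=3, the output "a b c a b" may not continue with "c" (that would repeat "a b c").
print(banned_next_tokens(["a", "b", "c", "a", "b"], 3))
```

  With a value as large as 35, only long verbatim repeats are blocked, so short legitimate repetitions in a document (table headers, recurring words) remain possible.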
  
  non-consensus ocr-text-generation
2. fxp007 26 Jun 2026
  
  in Public
  
  max_length= 32768
  
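  Most people assume the text length an OCR model can handle is limited by its architecture, typically to around a few thousand words. The authors set max_length as high as 32768, far beyond what traditional OCR systems handle, which suggests the model can process very long documents without losing context and challenges the accepted view of OCR length limits.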
  
  non-consensus ocr-capability
3. fxp007 26 Jun 2026
  
  in Public
  
  Single image supports two configs: gundam or base
  
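  Most people assume OCR models need dedicated configuration for a specific task or document type, but the authors state that a single image can support two very different configs ('gundam' or 'base'). This challenges the industry consensus that OCR systems usually need scenario-specific configuration.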
  
  non-consensus ocr-flexibility
4. fxp007 26 Jun 2026
  
  in Public
  
  Welcome the Era of One-shot Long-horizon Parsing.
  
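  Most people assume OCR requires multiple passes or fine-tuning for different document types, but the authors claim that Unlimited-OCR achieves 'one-shot long-horizon parsing'. This challenges the conventional view that OCR needs multiple passes and suggests that one model can handle a variety of complex documents without specialized training.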
  
  non-consensus ocr-innovation
Visit annotations in context

Tags

ocr-text-generation

non-consensus

ocr-capability

ocr-innovation

ocr-flexibility

Annotators

fxp007

URL

huggingface.co/baidu/Unlimited-OCR
Oct 2025
ocrmypdf.readthedocs.io ocrmypdf.readthedocs.io

OCRmyPDF documentation — ocrmypdf 13.6.1.dev2+g6e439ee8 documentation

1
1. earthworm 08 Oct 2025
  
  in Public
  
  OCRmyPDF adds an optical character recognition (OCR) text layer to scanned PDF files, allowing them to be searched.
  
  Converts an image or scanned PDF to an OCRed PDF. Command line on Windows, when used with a winget installation: py -m ocrmypdf --sidecar R.txt --output-type pdf R.pdf R_01.pdf
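  The same invocation can be assembled from a script. The sketch below only builds the argument list with the flags quoted in the annotation (`--sidecar`, `--output-type pdf`); the helper name `build_ocrmypdf_command` is hypothetical, and `sys.executable` stands in for the Windows `py` launcher as an assumption about the environment.

```python
import sys


def build_ocrmypdf_command(input_pdf, output_pdf, sidecar_txt):
    """Assemble the argument list for the ocrmypdf call quoted in the annotation."""
    return [
        sys.executable, "-m", "ocrmypdf",
        "--sidecar", sidecar_txt,   # also write the recognized text to a plain-text file
        "--output-type", "pdf",     # plain PDF output, as in the annotation
        input_pdf, output_pdf,
    ]


# File names are the ones from the annotation.
command = build_ocrmypdf_command("R.pdf", "R_01.pdf", "R.txt")
print(" ".join(command))
```

  Passing the list to `subprocess.run(command, check=True)` would execute it, assuming ocrmypdf and its dependencies are installed for that interpreter.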
  
  PDF OCR OCRMYPDF
Visit annotations in context

Tags

PDF

OCR

OCRMYPDF

Annotators

earthworm

URL

ocrmypdf.readthedocs.io/en/latest/
Mar 2025
arstechnica.com arstechnica.com

Why extracting data from PDFs is still a nightmare for data experts

1
1. dominicboisvert 11 Mar 2025
  
  in Public
  
  OCR IA
Visit annotations in context

Tags

IA

OCR

Annotators

dominicboisvert

URL

arstechnica.com/ai/2025/03/why-extracting-data-from-pdfs-is-still-a-nightmare-for-data-experts/
Jan 2025
www.cnblogs.com www.cnblogs.com

Partially solved | Extra spaces after running ocrmypdf OCR on a Chinese PDF - IssacNew - 博客园 (cnblogs)

1
1. structseeker 12 Jan 2025
  
  in Public
  
  mypdfocr: extra-space problem in Chinese text recognition
  
  mypdfocr tool OCR PDF
Visit annotations in context

Tags

mypdfocr

PDF

tool

OCR

Annotators

structseeker

URL

cnblogs.com/issacnew/p/17468697.html
bend1031.github.io bend1031.github.io

Untitled document

1
1. structseeker 12 Jan 2025
  
  in Public
  
  Make PDF file searchable OCRmyPDF
  
  PDF tool OCR
Visit annotations in context

Tags

PDF

tool

OCR

Annotators

structseeker

URL

bend1031.github.io/2019/07/02/扫描版-PDF-文字识别并合并入原-PDF_new/
Jul 2024
scholar.google.com scholar.google.com

‪Reversible_Object-Oriented Intertgfeters‬

1
1. mrcolbyrussell 23 Jul 2024
  
  in Public
  
  Reversible_Object-Oriented Intertgfeters
  
  That's Reversible Object-Oriented Interpreters.
  
  correction bad OCR
Visit annotations in context

Tags

bad OCR

correction

Annotators

mrcolbyrussell

URL

scholar.google.com/citations
haintz.media haintz.media

E-book "The Great Taking" banned by the Digital Services Act

1
1. milahu 22 Jul 2024
  
  in Public
  
  how about some self-help?!
  
  this passive "we are consumers" crap is exactly the problem...
  
  i bought the print book for 22 euros, cut off the spine with a circular saw, and ran the 208 pages through my ADF scanner (Brother ADS-3000N, 150 EUR used). without preparation that is maybe half an hour of work. then rotate, crop, and level the scans, and run them through tesseract. tesseract needs a fast CPU.
  
  right now i am proofreading the hOCR files from tesseract; later i will make a PDF out of them and upload it to annas-archive.org via libgen.rs - one problem less.
  
  i have uploaded the hOCR files to https://github.com/milahu/enteignung - maybe someone wants to help with proofreading, then it will go 1 or 2 days faster.
  
  man oh man... as an "IT insider" i am so bored by the normies who got stuck 20 years ago when it comes to IT and have no clue about linux, git, python, torproject, monero, ... but the main thing is talking crap in telegram >: (
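  The tesseract step in the workflow above can be sketched as follows. This is only an illustration of the standard command-line form `tesseract IMAGE OUTPUTBASE -l LANG hocr`; the helper name `build_tesseract_hocr_command`, the file name, and the language code `deu` (assumed because the book is German) are not taken from the annotation.

```python
from pathlib import Path


def build_tesseract_hocr_command(image_path, lang="deu"):
    """Build a tesseract call that writes an hOCR file next to the scanned image."""
    image = Path(image_path)
    # tesseract appends ".hocr" to the output base itself, so drop the image suffix.
    output_base = image.with_suffix("")
    return ["tesseract", str(image), str(output_base), "-l", lang, "hocr"]


# Hypothetical file name for one scanned page.
print(" ".join(build_tesseract_hocr_command("page-001.png")))
```

  Running the resulting list with `subprocess.run(..., check=True)` for each scanned page would produce one hOCR file per page, assuming tesseract and the chosen language data are installed.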
  
  enteignung the great taking david rogers webb die große enteignung ebook pdf ocr hocr tesseract scheiss normies
Visit annotations in context

Tags

enteignung

tesseract

hocr

david rogers webb

ocr

die große enteignung

pdf

ebook

scheiss normies

the great taking

Annotators

milahu

URL

haintz.media/artikel/international/e-book-the-great-taking-durch-digital-service-act-verboten/
Mar 2024
www.youtube.com www.youtube.com

ChatGPT Vision: The Best Way to Transform Your Paper Notes Into Digital Text - YouTube

1
1. M.AKilic50 07 Mar 2024
  
  in Public
  
  ChatGPT Vision: The Best Way to Transform Your Paper Notes Into Digital Text
  
  Upload a photo to ChatGPT and ask it to transcribe the photo into text. Better than OCR? It infers meaning from the surrounding context, even though individual words may be wrong.
  
  ChatGPT OCR Tiago Forte
Visit annotations in context

Tags

ChatGPT

OCR

Tiago Forte

Annotators

M.AKilic50

URL

youtube.com/watch
Nov 2023
docdrop.org docdrop.org

Drag and Drop a document

1
1. chrisaldrich 16 Nov 2023
  
  in Public
  
  https://docdrop.org/
  
  Can be used to run optical character recognition on .pdf documents and return documents with selectable, machine-readable text.
  
  OCR docdrop EdTech SoCal Meetup 2023-11-15
Visit annotations in context

Tags

docdrop

OCR

EdTech SoCal Meetup 2023-11-15

Annotators

chrisaldrich

URL

docdrop.org/
Sep 2023
docdrop.org docdrop.org

DocDrop | OCR

1
1. carmelitaoosthuizen 12 Sep 2023
  
  in Public
  
  hyp.is-tooling-ocr
Visit annotations in context

Tags

hyp.is-tooling-ocr

Annotators

carmelitaoosthuizen

URL

docdrop.org/ocr/
Jan 2023
docdrop.org docdrop.org

DocDrop | OCR

1
1. kael 09 Jan 2023
  
  in Public
  
  docdrop ocr pdf
Visit annotations in context

Tags

docdrop

ocr

pdf

Annotators

kael

URL

docdrop.org/ocr/
Oct 2022
www.reddit.com www.reddit.com

r/antinet - Worried about paper cards being lost or destroyed

1
1. chrisaldrich 18 Oct 2022
  
  in Public
  
  Worried about paper cards being lost or destroyed. I am loving using paper index cards. I am, however, worried that something could happen to the cards and I could lose years of work. I did not have this worry when my notes were all online. Are there any apps that you are using to make a digital copy of the notes? Ideally, I would love to have a digital mirror, but I am not willing to do 2x the work.
  
  u/LBHO https://www.reddit.com/r/antinet/comments/y77414/worried_about_paper_cards_being_lost_or_destroyed/
  
  As a firm believer in the programming principle of DRY (Don't Repeat Yourself), I can appreciate the desire not to do the work twice.
  
  Note card loss and destruction is definitely a thing folks have worried about. The easiest thing may be to spend a minute or two every day and make quick photo backups of your cards as you make them. Then, if things are lost, you'll have a backup, and you can likely find OCR (optical character recognition) software to pull your notes from the photos and recreate them if necessary. I've outlined some details I've used in the past. Incidentally, opening a photo in Google Docs will automatically do a pretty reasonable OCR on it.
  
  I know some have written about bringing old notes into their (new) zettelkasten practice, and the general advice here has been to only pull in new things as needed or as heavily interested to ease the cognitive load of thinking you need to do everything at once. If you did lose everything and had to restore from backup, I suspect this would probably be the best advice for proceeding as well.
  
  Historically many have worried about loss, but the only actual example of loss I've run across is that of Hans Blumenberg whose zettelkasten from the early 1940s was lost during the war, but he continued apace in another dating from 1947 accumulating over 30,000 cards at the rate of about 1.5 per day over 50 some odd years.
  
  reply note collection loss and damage don't repeat yourself zettelkasten Hans Blumenberg's zettelkasten OCR
Visit annotations in context

Tags

note collection loss and damage

don't repeat yourself

OCR

Hans Blumenberg's zettelkasten

zettelkasten

reply

Annotators

chrisaldrich

URL

reddit.com/r/antinet/comments/y77414/worried_about_paper_cards_being_lost_or_destroyed/
Sep 2022
paperwebsite.com paperwebsite.com

Paper Website: Create a Website Right From Your Notebook

1
1. kael 22 Sep 2022
  
  in Public
  
  paperwebsite ocr notes paper
Visit annotations in context

Tags

ocr

notes

paperwebsite

paper

Annotators

kael

URL

paperwebsite.com/
www.ahp-numerique.fr www.ahp-numerique.fr

Tools for transcription, OCR, HTR, and semantic annotation of texts

1
1. kael 22 Sep 2022
  
  in Public
  
  ocr annotations
Visit annotations in context

Tags

ocr

annotations

Annotators

kael

URL

ahp-numerique.fr/2018/11/16/texte-transcription-annotation-ocr-htr/
tesseract.projectnaptha.com tesseract.projectnaptha.com

Tesseract.js | Pure Javascript OCR for 100 Languages!

1
1. kael 22 Sep 2022
  
  in Public
  
  tesseract js ocr
Visit annotations in context

Tags

tesseract

ocr

js

Annotators

kael

URL

tesseract.projectnaptha.com/
Aug 2022
www.reddit.com www.reddit.com

r/antinet - Digitizing and compressing notes - Question

1
1. chrisaldrich 25 Aug 2022
  
  in Public
  
  Digitizing and compressing notes - Question
  
  reply to: https://www.reddit.com/r/antinet/comments/wv9hvq/digitizing_and_compressing_notes_question/
  
  I've got a process I still use, though less frequently, that does both photos as well as optical character recognition (OCR) to digitize the words: https://boffosocko.com/2021/12/20/handwriting-my-website-with-a-digital-amanuensis/ The comments have some rich commentary with related ideas as well.
  
  reply digital amanuensis OCR analog zettelkasten
Visit annotations in context

Tags

digital amanuensis

OCR

reply

analog zettelkasten

Annotators

chrisaldrich

URL

reddit.com/r/antinet/comments/wv9hvq/digitizing_and_compressing_notes_question/
Jun 2022
pdf.abbyy.com pdf.abbyy.com

PDF Software: Open, Read & Edit PDFs | FineReader PDF

1
1. chrisaldrich 17 Jun 2022
  
  in Public
  
  I've used ABBYY FineReader (best on Windows) and it was much better at correcting OCR than Adobe Acrobat. - Dana Conard
  
  optical character recognition OCR ABBYY FineReader .pdf tools
Visit annotations in context

Tags

ABBYY FineReader

tools

OCR

.pdf

optical character recognition

Annotators

chrisaldrich

URL

pdf.abbyy.com/
vision.cornell.edu vision.cornell.edu

COCO-Text: Dataset for Text Detection and Recognition | SE(3) Computer Vision Group at Cornell Tech

1
1. robertknight 13 Jun 2022
  
  in Public
  
  COCO-Text: Dataset for Text Detection and Recognition
  
  63K images
  
  145K text instances
  
  Feature labels: machine printed / handwritten, legible / illegible, English / non-English script
  
  See also the COCO-Text V2 site.
  
  ocr
Visit annotations in context

Tags

ocr

Annotators

robertknight

URL

vision.cornell.edu/se3/coco-text-2/
Feb 2022
deftpdf.com deftpdf.com

DeftPDF | Free PDF Software to Edit, Convert, Sign & More.

1
1. zhy 17 Feb 2022
  
  in Public
  
  Free All-in-one PDF tools A reliable, intuitive and productive PDF Software
  
  pdf ocr
Visit annotations in context

Tags

ocr

pdf

Annotators

zhy

URL

deftpdf.com/
Dec 2021
www.hitechnectar.com www.hitechnectar.com

Complete List of 10 Best OCR Apps for Mobile Phones (Android & iOS)

1
1. chrisaldrich 30 Dec 2021
  
  in Public
  
  https://www.hitechnectar.com/blogs/ocr-apps-mobile/
  
  bookmark handwirting OCR apps
Visit annotations in context

Tags

bookmark

OCR

apps

handwirting

Annotators

chrisaldrich

URL

hitechnectar.com/blogs/ocr-apps-mobile/
Nov 2021
www.myscript.com www.myscript.com

MyScript

1
1. chrisaldrich 24 Nov 2021
  
  in Public
  
  https://www.myscript.com/
  
  handwriting handwriting recognition OCR note taking handwritten websites Notability MyScript e-ink bookmark Livescribe
Visit annotations in context

Tags

OCR

note taking

handwriting recognition

Livescribe

bookmark

handwritten websites

handwriting

e-ink

MyScript

Notability

Annotators

chrisaldrich

URL

myscript.com/
Jul 2021
textsniper.app textsniper.app

TextSniper - Capture and extract any text from your Mac's screen | images

1
1. chrisaldrich 29 Jul 2021
  
  in Public
  
  https://textsniper.app/
  
  A paid Apple-based tool for text recognition and extraction.
  
  OCR Apple text recognition tools
Visit annotations in context

Tags

tools

OCR

text recognition

Apple

Annotators

chrisaldrich

URL

textsniper.app/
Local file Local file

Titi Lucreti Cari De rerum natura libri sex

1
1. chrisaldrich 13 Jul 2021
  
  in Public
  
  T.LUCRETICARI
  
  Not going to be the prettiest version, but at least somewhat OCR'd for annotating!
  
  OCR Lucretius
Tags

Lucretius

OCR

Annotators

chrisaldrich
drive.google.com drive.google.com

Revista 6-2 jul-dec 2007.p65

1
1. chrisaldrich 13 Jul 2021
  
  in Public
  
  Titi Lucreti Cari De Rerum Natura Libri Sex With a Translation and Notes Volume 1 Edited by H. A. J. Munro Lucretius
  
  Testing out the OCR functionality of docdrop.org.
  
  I'm noticing that the pdf fingerprint of this text somehow matches that of other texts as there are a lot of non-related annotations on this page.
  
  Is docdrop doing something squirrelly with the fingerprint @dwhly?
  
  Lucretius docdrop OCR .pdf annotations Hypothes.is
Visit annotations in context

Tags

Lucretius

OCR

docdrop

.pdf

Hypothes.is

annotations

Annotators

chrisaldrich

URL

drive.google.com/uc
Feb 2021
web.hypothes.is web.hypothes.is

Annotating the law | Hypothes.is

1
1. chrisaldrich 27 Feb 2021
  
  in Public
  
  tools Hypothes.is OCR .pdf bookmark read
Visit annotations in context

Tags

bookmark

tools

OCR

.pdf

read

Hypothes.is

Annotators

chrisaldrich

URL

web.hypothes.is/help/how-to-join-a-private-group/
Jan 2021
dev.clariah.nl dev.clariah.nl

Kraken - an Universal Text Recognizer for the Humanities

1
1. mromanello 21 Jan 2021
  
  in Public
  
  Apart from a basic segmenter taken from OCRopus a trainable line extractor is in the process of being implemented. Full trainability of layout analysis is of utmost importance to a truly universal OCR system, as text layout and its semantics varies widely across time and space, e.g. hand-crafted methods for printed Latin text are unlikely to work reliably on Arabic text or manuscripts with extensive interlinear annotation.
  
  wip implementation of line segmentation in kraken
  
  ocr olr
Visit annotations in context

Tags

ocr

olr

Annotators

mromanello

URL

dev.clariah.nl/files/dh2019/boa/0673.html
www.morethantechnical.com www.morethantechnical.com

Creating a searchable PDF with opensource tools ghostscript, hocr2pdf and tesseract-ocr

1
1. mromanello 15 Jan 2021
  
  in Public
  
  nice recipe for quickly turning a scanned PDF into a searchable one
  
  ocr tesseract pdf bash-recipe
Visit annotations in context

Tags

tesseract

ocr

pdf

bash-recipe

Annotators

mromanello

URL

morethantechnical.com/2013/11/21/creating-a-searchable-pdf-with-opensource-tools-ghostscript-hocr2pdf-and-tesseract-ocr/
Oct 2020
myscript.com myscript.com

Apps & Demos | MyScript

1
1. chrisaldrich 10 Oct 2020
  
  in Public
  
  MyScript MathPad
  
  This looks like something I could integrate into my workflow.
  
  LaTeX Mathematics handwriting to text OCR
Visit annotations in context

Tags

LaTeX

Mathematics

OCR

handwriting to text

Annotators

chrisaldrich

URL

myscript.com/technology/technical-demonstrations/
Jul 2020
www.digitalhumanities.org www.digitalhumanities.org

DHQ: Digital Humanities Quarterly: Textension: Digitally Augmenting Document Spaces in Analog Texts

1
1. dominicboisvert 30 Jul 2020
  
  in Public
  
  ARV3054 OCR numérisation
Visit annotations in context

Tags

numérisation

OCR

ARV3054

Annotators

dominicboisvert

URL

digitalhumanities.org/dhq/vol/13/3/000426/000426.html
Apr 2020
web.hypothes.is web.hypothes.is

OCRing a PDF

1
1. diegodlh 29 Apr 2020
  
  in Public
  
  Adobe AcrobatPro.
  
  gImageReader is an excellent open source alternative. It runs both on Windows and Linux, and it provides a simple (yet powerful) frontend GUI to Google's robust open source OCR engine, Tesseract.
  
  I think an open source tool such as this is a better fit for the open annotation ecosystem, based on libre software and standards, that Hypothesis promotes, than a proprietary (and expensive) tool such as Adobe AcrobatPro.
  
  ocr hypothes.is
Visit annotations in context

Tags

hypothes.is

ocr

Annotators

diegodlh

URL

web.hypothes.is/ocring-a-pdf/
tesseract-ocr.github.io tesseract-ocr.github.io

Technical Documentation

1
1. raj_reddy 24 Apr 2020
  
  in Public
  
  tessdoc Tesseract documentation
  
  ocr tesseract documentation
Visit annotations in context

Tags

tesseract

ocr

documentation

Annotators

raj_reddy

URL

tesseract-ocr.github.io/tessdoc/Documentation.html
Apr 2019
www.archimag.com www.archimag.com

Launch of OCR4all, a free and open-source tool for recognizing historical typefaces, for history researchers and archivists

1
1. dominicboisvert 28 Apr 2019
  
  in Public
  
  ARV3054 OCR numérisation
Visit annotations in context

Tags

numérisation

OCR

ARV3054

Annotators

dominicboisvert

URL

archimag.com/archives-patrimoine/2019/04/24/ocr4all-open-source-gratuit-reconnaissance-caracteres-anciens
Sep 2015
groups.google.com groups.google.com

Suggestion and question on Hypothesis service - Google Groups

1
1. judell 11 Sep 2015
  
  in Public
  
  h_support pdf ocr
Visit annotations in context

Tags

h_support

pdf

ocr

Annotators

judell

URL

groups.google.com/a/hypothes.is/forum/
Aug 2015
biostor.org biostor.org

The southern forms of Serinus canicottis (Swainson)

2
1. rdmpage 31 Aug 2015
  
  in Public
  
  $?
  
  ♀♀
  
  ocr
2. rdmpage 31 Aug 2015
  
  in Public
  
  <^S
  
  ♂♂
  
  ocr
Visit annotations in context

Tags

ocr

Annotators

rdmpage

URL

biostor.org/reference/112808/page/3
