17 Matching Annotations
  1. Nov 2023
    1. Thalia (Text mining for Highlighting, Aggregating and Linking Information in Articles) is a semantic search engine for exploring 27 million PubMed abstracts. In its current version, it can recognize eight types of entities: 1. chemicals 2. diseases 3. drugs 4. genes 5. metabolites 6. proteins 7. species 8. anatomical entities

      PubMed

  2. Aug 2021
  3. Jul 2021
    1. they do not form the basis for discovery,

      I don't entirely agree with this part of the statement, because the digital tools we have allow us both to view information in an entirely new way and to see connections that we could not readily have seen before. For example, the ability to take any written work and create a concordance of its words can give us insight that simply reading the work would not. If we wanted to see to what degree society is viewed from a male vs. female perspective between 1920 and 2020, we could analyze specific words in several pieces of literature from those time periods to see how significantly each gender is represented. Before digital tools this would have been, if not impossible, certainly so laborious as to render it an insignificant goal in the scheme of humanistic inquiry. Thus there is a basis for discovery within digital tools.
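
The concordance idea in the comment above can be sketched in a few lines of Python. This is only a minimal illustration, not the commenter's method: the helper names (`tokenize`, `concordance`, `gender_term_counts`), the two word lists, and the sample sentence are all assumptions made up for the example, and a real study would need a far more careful choice of terms and texts.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def concordance(text, keyword, width=3):
    """Return (left context, keyword, right context) for each occurrence."""
    tokens = tokenize(text)
    hits = []
    for i, tok in enumerate(tokens):
        if tok == keyword:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            hits.append((left, tok, right))
    return hits


# Hypothetical, deliberately tiny word lists (an assumption for this sketch).
MALE_TERMS = {"he", "him", "his", "man", "men"}
FEMALE_TERMS = {"she", "her", "hers", "woman", "women"}


def gender_term_counts(text):
    """Count how often words from each list occur in the text."""
    counts = Counter(tokenize(text))
    return {
        "male": sum(counts[w] for w in MALE_TERMS),
        "female": sum(counts[w] for w in FEMALE_TERMS),
    }


# Invented sample text, used only to exercise the functions.
sample = "She said that he had left. Her brother told him that the women were waiting."
print(concordance(sample, "he", width=2))
print(gender_term_counts(sample))
```

Running the same counts over texts grouped by decade would give the kind of comparison the comment describes, though raw counts alone say little without normalizing for text length.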

  4. Mar 2021
  5. Aug 2020
  6. Jun 2020
  7. May 2020
  8. Apr 2020
    1. there is also strong encouragement to make code re-usable, shareable, and citable, via DOI or other persistent link systems. For example, GitHub projects can be connected with Zenodo for indexing, archiving, and making them easier to cite alongside the principles of software citation [25].
      • GitHub and GitLab focus on text-based formats that machines can easily recognize and read (machine readable).

      • Text mining is currently a key technology that is developing quickly. Machine learning cannot run without the raw material that text mining provides.

      • For this reason, journals, especially those published abroad, have long offered two versions of every paper they release: a PDF version (which is really no different from the paper of the old days) and an HTML version (which machines can read).

      • Binary word processors such as MS Word depend heavily on software technology owned by business entities. Naturally, the code needed to read their files is locked away.

      • Even PDF, which is regarded as the easiest and safest way to share files, cannot easily be read by machines.
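
To illustrate the point about HTML being machine readable, here is a minimal sketch that pulls the visible text out of an HTML page using only Python's standard library. The class name `TextExtractor`, the helper `html_to_text`, and the sample markup are assumptions invented for this example. Real journal pages are much messier and would normally need a dedicated parser.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect visible text, skipping the contents of <script> and <style>."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that is outside script/style blocks.
        if not self._skip_depth and data.strip():
            self.parts.append(data.strip())


def html_to_text(html):
    """Return the visible text of an HTML string as one space-joined line."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.parts)


# Invented sample markup standing in for the HTML version of a paper.
sample_html = """
<html><head><title>Sample paper</title><style>p {color: black}</style></head>
<body><h1>Abstract</h1><p>Text mining needs <em>machine-readable</em> input.</p></body></html>
"""
print(html_to_text(sample_html))
```

Doing the same for a PDF or a binary word-processor file requires format-specific tooling, which is the contrast the notes above draw.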

  9. Oct 2017
  10. Jul 2017
    1. This third research question led to the formulation of agile text mining, a new methodology to support the development of efficient TMAs. Agile text mining copes with the unpredictable realities of creating text-mining applications.
  11. Mar 2017
    1. In addition, Neylon suggested that some low-level TDM goes on below the radar. ‘Text and data miners at universities often have to hide their location to avoid auto cut-offs of traditional publishers. This makes them harder to track. It’s difficult to draw the line between what’s text mining and what’s for researchers’ own use, for example, putting large volumes of papers into Mendeley or Zotero,’ he explained.

      Without a clear understanding of what reference managers can do and what text and data mining is, it seems that some publishers will block the download of full texts on their platforms.

  12. Apr 2016
    1. preferably

      Delete "preferably". Limiting the scope of text mining to exclude societal and commercial purposes limits the usefulness to enterprises (especially SMEs that cannot mine on their own) as well as to society. These limitations have ramifications in terms of limiting the research questions that researchers can and will pursue.

    2. Encourage researchers not to transfer the copyright on their research outputs before publication.

      This statement applies more generally than just to TDM. Besides, "Encourage" is too weak a word here. From a societal perspective, it would be far better if researchers were to retain their copyright (where it applies) but make their copyrightable works available under open licenses that allow publishers to publish the works, and others to use and reuse them.

  13. Feb 2014
    1. National governments are also weighing in on the issue. The UK government aims this April to make text-mining for non-commercial purposes exempt from copyright, allowing academics to mine any content they have paid for.

      UK government intervening to make text-mining for non-commercial purposes exempt from copyright.

    2. “Our plan is just to wait for the copyright exemption to come into law in the United Kingdom so we can do our own content-mining our own way, on our own platform, with our own tools,” says Mounce. “Our project plans to mine Elsevier’s content, but we neither want nor need the restricted service they are announcing here.”

      This seems to be a sensible move: the hindrance here is not copyright but the onerous contract that Elsevier wants to put in place.

    3. some researchers feel that a dangerous precedent is being set. They argue that publishers wrongly characterize text-mining as an activity that requires extra rights to be granted by licence from a copyright holder, and they feel that computational reading should require no more permission than human reading. “The right to read is the right to mine,” says Ross Mounce of the University of Bath, UK, who is using content-mining to construct maps of species’ evolutionary relationships.

      "The right to read is the right to mine."

  14. Nov 2013
    1. In a Literary Lab project on 18th-century novels, English students study a database of nearly 2,000 early books to tease out when “romances,” “tales” and “histories” first emerged as novels, and what the different terms signified.

      This may be a reference to the Eighteenth Century Collections Online Text Creation Partnership (ECCO-TCP) project, which transcribed and marked up in XML ~2,200 eighteenth-century books from the Eighteenth Century Collections Online (ECCO) database. The ECCO-TCP corpus is in the public domain and available for anyone to use: http://www.textcreationpartnership.org/tcp-ecco/