  1. Jul 2023
    1. CRISP-DM has not been built in a theoretical, academic manner working from technical principles, nor did elite committees of gurus create it behind closed doors.
  2. Jun 2023
    1. Learning heterogeneous graph embedding for Chinese legal document similarity

      The paper proposes L-HetGRL, an unsupervised approach using a legal heterogeneous graph and incorporating legal domain-specific knowledge, to improve Legal Document Similarity Measurement (LDSM) with superior performance compared to other methods.

  3. Apr 2023
    1. After struggling with this problem for a while and still being far from solving this issue, I realized that I was making too many requests to the website, which made me come up with the idea of saving all the pages I needed to scrape on my local computer. Next, I started sending requests to these local HTML files instead and kept adapting my code.

      I had a similar problem with this.
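
The save-locally-then-parse idea from the highlight above can be sketched as follows. This is a minimal sketch using only the Python standard library, not the original author's code: the helper names (`fetch_from_web`, `cached_page`, `extract_title`), the hashed file naming, and the choice to extract the page title are all assumptions made for illustration.

```python
import hashlib
from html.parser import HTMLParser
from pathlib import Path
from typing import Callable
from urllib.request import urlopen


def fetch_from_web(url: str) -> str:
    """Download a page. This is the only function that touches the network."""
    with urlopen(url, timeout=30) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


def cached_page(url: str, cache_dir: Path,
                fetch: Callable[[str], str] = fetch_from_web) -> str:
    """Return the HTML for `url`, downloading it only if no local copy exists."""
    cache_dir.mkdir(parents=True, exist_ok=True)
    # Hypothetical naming scheme: hash the URL to get a safe file name.
    file_name = hashlib.sha256(url.encode("utf-8")).hexdigest() + ".html"
    local_file = cache_dir / file_name
    if not local_file.exists():
        local_file.write_text(fetch(url), encoding="utf-8")
    return local_file.read_text(encoding="utf-8")


class TitleParser(HTMLParser):
    """Collects the text inside the <title> element."""

    def __init__(self) -> None:
        super().__init__()
        self._in_title = False
        self.title = ""

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def extract_title(html: str) -> str:
    """Example parsing step that runs against the local copy only."""
    parser = TitleParser()
    parser.feed(html)
    return parser.title.strip()
```

With this layout, the parsing code can be rerun and adapted as often as needed while each page is requested from the website at most once. Passing a different `fetch` function also makes it possible to exercise the cache without network access.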

  4. Jun 2022
  5. Feb 2022
    1. Data mining and knowledge discovery in databases comprise methods for extracting information and knowledge from structured datasets [99].

  6. Feb 2021
  7. Mar 2020
    1. multiple scandals have highlighted some very shady practices being enabled by consent-less data-mining — making both the risks and the erosion of users’ rights clear
  8. Sep 2018
      Predictive analytics encompasses a variety of statistical techniques from data mining, predictive modelling, and machine learning, that analyze current and historical facts to make predictions about future or otherwise unknown events.

  9. Aug 2018
  10. Jun 2018
  11. Mar 2018
    1. Introducing Subscribe with Google

      Interesting to see this roll out as Facebook is having some serious data collection problems. This looks a bit like a means for Google to directly link users with the content they're consuming online and then leverage it much the same way that Facebook did with apps and companies like Cambridge Analytica.

  12. Mar 2017
    1. In addition, Neylon suggested that some low-level TDM goes on below the radar. ‘Text and data miners at universities often have to hide their location to avoid auto cut-offs of traditional publishers. This makes them harder to track. It’s difficult to draw the line between what’s text mining and what’s for researchers’ own use, for example, putting large volumes of papers into Mendeley or Zotero,’ he explained.

      Without a clear understanding of what reference managers can do and what text and data mining is, it seems that some publishers will block the download of full texts on their platforms.

  13. Dec 2016
    1. ‘In the past, if you were an alcohol distiller, you could throw up your hands and say, look, I don’t know who’s an alcoholic,’ he said. ‘Today, Facebook knows how much you’re checking Facebook. Twitter knows how much you’re checking Twitter. Gaming companies know how much you’re using their free-to-play games. If these companies wanted to do something, they could.’
  14. Apr 2016
      Delete "preferably". Limiting the scope of text mining to exclude societal and commercial purposes limits the usefulness to enterprises (especially SMEs that cannot mine on their own) as well as to society. These limitations have ramifications in terms of limiting the research questions that researchers can and will pursue.

    2. Encourage researchers not to transfer the copyright on their research outputs before publication.

      This statement is more generally applicable than just to TDM. Besides, "Encourage" is too weak a word here. From a societal perspective, it would be far better if researchers were to retain their copyright (where it applies) but make their copyrightable works available under open licenses that allow publishers to publish the works, and others to use and reuse them.

  15. Feb 2016
    1. I read my first books on data mining back in the early 1990s and one thing I read was that "80% of the effort in a data mining project goes into data cleaning."