39 Matching Annotations
  1. Feb 2023
    1. A huge percentage of the data that gets processed is less than 24 hours old. By the time data gets to be a week old, it is probably 20 times less likely to be queried than data from the most recent day. After a month, data mostly just sits there.
    2. Customer data sizes followed a power-law distribution. The largest customer had double the storage of the next-largest customer, that customer had double the storage of the one after it, and so on.
    3. The vast majority of customers had less than a terabyte of data in total data storage. There were, of course, customers with huge amounts of data, but most organizations, even some fairly large enterprises, had moderate data sizes.
    4. Most applications do not need to process massive amounts of data. This has led to a resurgence in data management systems with traditional architectures; SQLite, Postgres, MySQL are all growing strongly, while “NoSQL” and even “NewSQL” systems are stagnating.

      SQL still shines over NoSQL

    5. The most surprising thing that I learned was that most of the people using “BigQuery” don’t really have Big Data. Even the ones who do tend to run workloads that touch only a small fraction of their data.
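The second quote's halving pattern can be made concrete with a little arithmetic. The sketch below is a hypothetical illustration, not data from the source. It assumes each customer has exactly half the storage of the customer ranked above it. For comparison, it also computes a classic Zipf law, in which the customer at rank r has 1/r of the largest customer's storage. The function names and the 1024 TB figure are invented for illustration.

Strict halving is a geometric sequence, which is steeper than Zipf's law. Because 1 + 1/2 + 1/4 + ... approaches 2, the top customer ends up holding about half of all storage however many customers there are.

```python
"""Hypothetical illustration of skewed customer storage sizes (not data from the source)."""


def halving_sizes(largest_tb: float, n: int) -> list[float]:
    """Each customer has half the storage of the customer ranked just above it."""
    return [largest_tb / 2**rank for rank in range(n)]


def zipf_sizes(largest_tb: float, n: int) -> list[float]:
    """Zipf's law: the customer at rank r (1-based) has largest_tb / r."""
    return [largest_tb / r for r in range(1, n + 1)]


def top_share(sizes: list[float], k: int = 1) -> float:
    """Fraction of total storage held by the k largest customers."""
    ordered = sorted(sizes, reverse=True)
    return sum(ordered[:k]) / sum(ordered)


if __name__ == "__main__":
    n = 10
    print(f"halving: top customer holds {top_share(halving_sizes(1024, n)):.1%}")
    print(f"zipf:    top customer holds {top_share(zipf_sizes(1024, n)):.1%}")
```

With ten customers, the halving rule gives the top customer about 50.0% of all storage. Under Zipf's law the same customer holds about 34.1%.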
  2. Nov 2022
  3. Nov 2021
    1. Big Data Needs Thick Data, by Tricia Wang (May 13, 2013)

       Editor’s Note: Tricia provides an excellent segue between last month’s “Ethnomining” Special Edition and this month’s on “Talking to Companies about Ethnography.” She offers further thoughts building on our collective discussion (perhaps bordering on obsession?) with the big data trend. With nuance, she tackles and reinvents some of the terminology circulating in the various industries that wish to make use of social research. In the wake of big data, ethnographers, she suggests, can offer thick data. In the face of derisive mentions of “anecdotes,” we ought to stand up to defend the value of stories.

       [Image: Mark Smiciklas, Intersection Consulting]

       Big Data can have enormous appeal. Who wants to be thought of as a small thinker when there is an opportunity to go BIG? The positivistic bias in favor of Big Data (a term often used to describe the quantitative data produced through analysis of enormous datasets) as an objective way to understand our world presents challenges for ethnographers. What are ethnographers to do when our research is seen as insignificant or of little value? Can we simply ignore Big Data as too muddled in hype to be useful? No. Ethnographers must engage with Big Data. Otherwise our work can all too easily be shoved into another department, minimized as a small line item on a budget, and relegated to the small-data corner. But how can our kind of research be seen as equally important to algorithmically processed data? What is the ethnographer’s 10-second elevator pitch to a room of data scientists? …and GO! Big Data produces so much information that it needs something more to bridge and/or reveal knowledge gaps. That’s why ethnographic work holds such enormous value in the era of Big Data.

       Lacking the conceptual words to quickly position the value of ethnographic work in the context of Big Data, I have begun, over the last year, to employ the term Thick Data (with a nod to Clifford Geertz!) to advocate for integrative approaches to research. Thick Data uncovers the meaning behind Big Data visualization and analysis. Thick Data analysis primarily relies on human brain power to process a small “N,” while Big Data analysis requires computational power (with humans, of course, writing the algorithms) to process a large “N.” Big Data reveals insights within a particular range of data points, while Thick Data reveals the social context of, and connections between, data points. Big Data delivers numbers; Thick Data delivers stories. Big Data relies on machine learning; Thick Data relies on human learning.
  4. Mar 2021
  5. Feb 2021
    1. There are two directions to explore: first, applying the principle of independence between the sources and the knowledge management layer; and second, fine-tuning the balance between automatic processing and manual curation.
  6. Mar 2020
    1. With regard to citizens, they must have data literacy[44], that is, the ability to navigate their own data ecosystems in order to produce, appropriate, communicate, and use data

      Data literacy

    2. data-driven innovation refers to exploiting data through analytics techniques in order to improve or create new goods, services, or processes that contribute to the diversification and sophistication of the economy and to the generation of social value, as a new source of growth (OECD, 2015).

      Innovation with data means creating goods, services, or processes using analytics techniques.

    3. The collection, storage, and processing of data gives rise to information, from which it is possible to obtain knowledge

      Definition of data, information, and knowledge. This definition had already been put forward by the field of Knowledge Management more than 40 years ago

    4. the conditions for data exploitation need to be driven by public intervention, correcting the government failures that prevent enabling elements from emerging. This is to be achieved by leveraging a public asset that is generated routinely and at scale and that, by its nature, is not created by the market: public data.

      This document addresses Open Data and Big Data for the creation of new markets.

    5. the PND 2014-2018 (Plan Nacional de Desarrollo, Colombia's National Development Plan) is the only direct precedent that expressly establishes the need for a public policy on data exploitation

      Data exploitation in Colombia is first raised, at the regulatory level, in the PND 2014-2018

  7. Aug 2019
    1. “patient sovereignty” will now become an important debate. In particular, the ownership of data in healthcare, while already an important topic of discussion, will become an even more complex argument.

      Discussion about data in, and of, health

    2. m-health predominantly offers interconnectivity between patients and healthcare professionals, while IoT devices offer the ability to collect information and perform procedures with increasingly minimal invasiveness. Finally, big data gives healthcare professionals an opportunity to spot trends and patterns for both individual patients and groups of patients, improving the speed of diagnosis and disease prevention. In the next section, the third and final pillar of Health 4.0, design, is discussed

      How technologies interact in Health 4.0

  8. Jun 2019
    1. In marketing, familiar uses of big data include “recommendation engines” like those used by companies such as Netflix and Amazon to make purchase suggestions based on the prior interests of one customer as compared to millions of others.

      Jonathan Shaw presents "Big Data" as a beneficial tool for our society. He describes how Big Data can help reveal awareness and trends, especially in industry; one example is identifying consumers' purchasing patterns from a large body of information. However, when there is so much information, it can become an obstacle to finding the specific details you are looking for. Understanding the strengths and weaknesses of how Big Data is handled will be key to our society's development.
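The quote's “recommendation engine” idea compares one customer's prior interests with those of many others. This technique is usually called collaborative filtering. The following is a minimal, hypothetical sketch of user-based collaborative filtering, not how Netflix or Amazon actually build their systems. It works in three steps:

- Each customer is represented by the set of items they bought.
- Similarity between two customers is the cosine similarity of those sets.
- Items bought by the most similar customers are suggested.

The customer names, item names, and the `recommend` helper are invented for illustration.

```python
from math import sqrt


def cosine(a: set[str], b: set[str]) -> float:
    """Cosine similarity between two purchase sets treated as 0/1 vectors."""
    if not a or not b:
        return 0.0
    return len(a & b) / sqrt(len(a) * len(b))


def recommend(target: str, purchases: dict[str, set[str]], k: int = 2) -> list[str]:
    """Suggest items the target has not bought, scored by the summed similarity
    of the k most similar customers who did buy them."""
    mine = purchases[target]
    # Rank every other customer by how similar their purchases are to the target's.
    neighbours = sorted(
        ((cosine(mine, items), other) for other, items in purchases.items() if other != target),
        reverse=True,
    )[:k]
    scores: dict[str, float] = {}
    for similarity, other in neighbours:
        for item in purchases[other] - mine:
            scores[item] = scores.get(item, 0.0) + similarity
    # Highest score first; ties broken alphabetically for a stable result.
    return sorted(scores, key=lambda item: (-scores[item], item))


purchases = {
    "alice": {"book_a", "book_b"},
    "bob": {"book_a", "book_b", "book_c"},
    "carol": {"book_a", "book_d"},
    "dave": {"book_e"},
}
print(recommend("alice", purchases))  # ['book_c', 'book_d']
```

This sketch compares the target against every other customer, so the cost grows linearly with the customer base. At the “millions of others” scale the quote mentions, real systems generally rely on more scalable variants such as item-based filtering.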

  9. Nov 2017
    1. They have a very simplistic view of the activity being monitored, distilling it down into only a few dimensions for the rule to interrogate

      The number of dimensions needs to be large. In typical database systems, the number of dimensions is small.

  10. Sep 2016
    1. often private companies whose technologies power the systems universities use for predictive analytics and adaptive courseware
    2. the use of data in scholarly research about student learning; the use of data in systems like the admissions process or predictive-analytics programs that colleges use to spot students who should be referred to an academic counselor; and the ways colleges should treat nontraditional transcript data, alternative credentials, and other forms of documentation about students’ activities, such as badges, that recognize them for nonacademic skills.

      Useful breakdown. Research, predictive models, and recognition are quite distinct from one another, and the approaches to data that they imply are quite different. In a way, the “personalized learning” model at the core of the second topic is close to the Big Data attitude (collect all the things and sense will come through eventually), with corresponding ethical problems. Though projects vary greatly, research has a much more solid base in both ethics and epistemology than the kind of Big Data approach used by technocentric outlets. The part about recognition, though, opens the most interesting door. Microcredentials and badges are part of a broader picture. The data shared in those cases need not be so comprehensive, and learners have a lot of agency in the matter. In fact, when Charles Tsai, then at Ashoka, interviewed Mozilla executive director Mark Surman about badges, the message was quite clear: badges are a way to rethink education as a learner-driven “create your own path” adventure. The contrast between the three models reveals a lot. It runs from the abstract world of research, through the top-down, Minority Report-style models of predictive education, all the way to a form of heutagogy. Lots to chew on.

  11. Jul 2016
    1. I could have easily chosen a different prepositional phrase. "Convivial Tools in an Age of Big Data.” Or “Convivial Tools in an Age of DRM.” Or “Convivial Tools in an Age of Venture-Funded Education Technology Startups.” Or “Convivial Tools in an Age of Doxxing and Trolls."

      The Others.

  12. Apr 2016
    1. “fundamentally, if we want to realize the potential of human networks to change how we work, then we need analytics to transform information into insight; otherwise we will be drowning in a sea of content and deafened by a cacophony of voices”

      Marie Wallace's perspective on the potential of big data analytics, specifically the analysis of human networks, in the context of creating a smarter workplace.

  13. Jan 2016
  14. Dec 2015
    1. your system is able to flag at least a critical mass of videos taught in the Mueller method as having a bigger educational impact on students than the average educational video, by some measure you have identified

      Sounds like a neat description of what many Big Data enthusiasts are actually trying to do. Some Big Data positivists do go so far as to claim that the “inference engine” will eventually be powerful enough to find meaning. But this distinction is within the Big Data field, not between it and other fields.

    2. sufficiently rich information
    3. It’s educators who come up with hypotheses and test them using a large data set.

      And we need an ever-larger data set, right?

    4. Some were done this way on purpose but based on intuitions by classroom teachers.

      Isn’t Big Data partly about reverse-engineering these intuitions?

    5. Hadoop thingamabob back end
    6. a good example of the kind of insight that big data is completely blind to

      Not sure it follows directly, but also important to point out.

    1. As long as the content in SmartBooks is locked down, then it is possible to run machine learning algorithms against the clicks of millions of students using that content. To the degree that the platform is opened up for custom, newly created books, the controlled experiment goes away and the possibility of big data analysis goes with it.

      Not sure it follows…

    2. they are making a bet against the software as a replacement for the teacher and against big data
  15. Nov 2015
  16. Jun 2015
    1. A US start-up is testing a new tool to minimize the consequences of data theft: it checks whether other people's data is being offered for sale in hidden parts of the web.