3,479 Matching Annotations
  1. Oct 2019
    1. I frequently talk with people who are not that concerned about surveillance, or who feel that the positives outweigh the risks. Here, I want to share some important truths about surveillance: Surveillance can facilitate human rights abuses and even genocide Data is often used for different purposes than why it was collected Data often contains errors Surveillance typically operates with no accountability Surveillance changes our behavior Surveillance disproportionately impacts the marginalized Data privacy is a public good We don’t have to accept invasive surveillance

    Tags

    Annotators

    URL

    1. Terminar los proyectos que empezamos en 2019, con prioridad en Documentatón, ya que no es un cover, sino que es nuestro propio libro.

      Para mí el tema de acabarlo son recursos (tiempo y dinero, etc). Podemos ir avanzando de a trozos un capítulo a la vez, haciéndolo de encuentro en encuentro, pero esto daría un ritmo muy lento. La experiencia previa muestra que esto no es sostenible y que si queremos un libro terminado, más que un esfuerzo colectivo, se requerirá un alto esfuerzo individual. Como muestra la gráfica de reportes de la documentatón, una persona puede hacer más que la suma de las restantes (hablando de no temerle a la soledad):

      Sin embargo, si esto es lo que está pasando, reflejando las métricas de muchos proyectos de software libre, dependemos fuertemente de los tiempos de esos individuos. En mi caso, no puedo continuar con la documentatón hasta no resolver el tema de los artículos de mi graduación, que se volvió una verdadera telenovela (eso merece su entrada de blog aparte) y preferí asuntos como los Data Haiku, precisamente porque son actividades más puntuales que todo un libro, que luego pueden convertirse en capítulos de uno (por ejemplo el de Datactivismos) pero transmitiendo ese aire de lo ágil y de lo terminado, que precisamente quisiéramos comunicar.

      Creo que tenemos que reconocer en las dinámicas comunitarias, qué podemos hacer en ellas y en cuáles ritmos, y entender que lo otro requerirá de recursos extra (económicos, personales, etc) que tendremos que proveer como personas naturales o jurídicas, con nuestro propio esfuerzo o el de nuestras empresas/fundaciones.

    2. Selección comunitaria de temas para Data Weeks o Data Rodas. Apoyo de proyectos de los participantes de la comunidad. Reuniones periódicas de la Comunidad, algo así como Data Roda el primer viernes de cada mes, así sea para saludarnos síncronamente y ver en qué andamos y hacer un encera - brilla de bacanes.

      Estos tres puntos se podrían juntar con la idea de que los participantes propongan sus propios proyectos y se apropien de la planeación y ejecución de las Data Rodas o Data Weeks venideros.

      Sólo quitaría el carácter periódico, pues creo que una de las potencias de nuestra comunidad es responder flexiblemente a lo eventual. Por ejemplo, ahora tenemos un periodo electoral en Colombia. De allí surgió mi preocupación por visualizar financiación de campañas, pero los eventos de la semana pasada derivaron en blikis, con soporte de comentarios. Una reacción ágil a la contingencia y no el seguimiento riguroso de algo pre-planeado (a mi me gustaría retomar lo de financiación de campañas, pero será luego).

      De nuevo la sugerencia, como dije en mi entrada de respuesta a esta, y en otras ocasiones es sustituir la planeación por la coordinación. Mi propuesta de coordinación es la siguiente:

      • Los miembros que quieren ver otras temáticas las proponen en los canales comunitarios y se apersonan de su preparación y ejecución.
      • Los otros miembros respondemos a esas iniciativas autónomas, en solidaridad, acompañando esas sesiones y aportándoles.
      • Al final de cada evento, miramos hacia dónde podemos llevar los otros.
    3. Si puedo aportar una herramienta más a Grafoscopio, quiero que sea la querendura.

      Me parece muy potente la querendura como metodología. Sin embargo, por lo pronto siento que es un listado amplio de ideas sueltas en el enlace que nos presentas y me gustaría indagar por las prácticas concretas que la hacen posible.

  2. Sep 2019
    1. if n is very small (for example n = 3), rather than showing error bars and statistics, it is better to simply plot the individual data points.
    1. Keep the ergonomics of stable reference and directly mutable objects. In other words; be able to have a variable pointing to an object, and make subsequent reads or writes to it. Without needing to fear that you’re working with old data. While, in the background,..State is stored in an immutable, structurally shared tree.
    1. With MobX you don't need to normalize your data.

      flip side: https://codeburst.io/the-curious-case-of-mobx-state-tree-7b4e22d461f:

      MobX cannot guarantee your data is JSON serializable,

    1. Estimated economic benefit of data linkage

      the potential value from linking Census data to administrative data sets is only beginning to be realised and holds immense potential.(In other work for the Population Health Research Network, Lateral Economics concluded that data linkage generated over $16 for every dollar invested).

    2. Cost reduction suggestion

      there may be ways to reduce costs associated with the development of Census-equivalent statistics, including relying less on the general public to answer questions every five years

    1. “But then again,” a person who used information in this way might say, “it’s not like I would be deliberately discriminating against anyone. It’s just an unfortunate proxy variable for lack of privilege and proximity to state violence.

      In the current universe, Twitter also makes a number of predictions about users that could be used as proxy variables for economic and cultural characteristics. It can display things like your audience's net worth as well as indicators commonly linked to political orientation. Triangulating some of this data could allow for other forms of intended or unintended discrimination.

      I've already been able to view a wide range (possibly spurious) information about my own reading audience through these analytics. On September 9th, 2019, I started a Twitter account for my 19th Century Open Pedagogy project and began serializing installments of critical edition, The Woman in White: Grangerized. The @OPP19c Twitter account has 62 followers as of September 17th.

      Having followers means I have access to an audience analytics toolbar. Some of the account's followers are nineteenth-century studies or pedagogy organizations rather than individuals. Twitter tracks each account as an individual, however, and I was surprised to see some of the demographics Twitter broke them down into. (If you're one of these followers: thank you and sorry. I find this data a bit uncomfortable.)

      Within this dashboard, I have a "Consumer Buying Styles" display that identifies categories such as "quick and easy" "ethnic explorers" "value conscious" and "weight conscious." These categories strike me as equal parts confusing and problematic: (Link to image expansion)

      I have a "Marital Status" toolbar alleging that 52% of my audience is married and 49% single.

      I also have a "Home Ownership" chart. (I'm presuming that the Elizabeth Gaskell House Museum's Twitter is counted as an owner...)

      ....and more

    1. On the other hand, a resource may be generic in that as a concept it is well specified but not so specifically specified that it can only be represented by a single bit stream. In this case, other URIs may exist which identify a resource more specifically. These other URIs identify resources too, and there is a relationship of genericity between the generic and the relatively specific resource.

      I was not aware of this page when the Web Annotations WG was working through its specifications. The word "Specific Resource" used in the Web Annotations Data Model Specification always seemed adequate, but now I see that it was actually quite a good fit.

  3. Aug 2019
    1. Material Design Material System Introduction Material studies About our Material studies Basil Crane Fortnightly Owl Rally Reply Shrine Material Foundation Foundation overview Environment Surfaces Elevation Light and shadows Layout Understanding layout Pixel density Responsive layout grid Spacing methods Component behavior Applying density Navigation Understanding navigation Navigation transitions Search Color The color system Applying color to UI Color usage Text legibility Dark theme Typography The type system Understanding typography Language support Sound About sound Applying sound to UI Sound attributes Sound choreography Sound resources Iconography Product icons System icons Animated icons Shape About shape Shape and hierarchy Shape as expression Shape and motion Applying shape to UI Motion Understanding motion Speed Choreography Customization Interaction Gestures Selection States Material Guidelines Communication Confirmation & acknowledgement Data formats Data visualization Principles Types Selecting charts Style Behavior Dashboards Empty states Help & feedback Imagery Launch screen Onboarding Offline states Writing Guidelines overview Material Theming Overview Implementing your theme Components App bars: bottom App bars: top Backdrop Banners Bottom navigation Buttons Buttons: floating action button Cards Chips Data tables Dialogs Dividers Image lists Lists Menus Navigation drawer Pickers Progress indicators Selection controls Sheets: bottom Sheets: side Sliders Snackbars Tabs Text fields Tooltips Usability Accessibility Bidirectionality Platform guidance Android bars Android fingerprint Android haptics Android icons Android navigating between apps Android notifications Android permissions Android settings Android slices Android split-screen Android swipe to refresh Android text selection toolbar Android widget Cross-platform adaptation Data visualization Data visualization depicts information in graphical form. Contents Principles Types Selecting charts Style Behavior Dashboards Principles Data visualization is a form of communication that portrays dense and complex information in graphical form. The resulting visuals are designed to make it easy to compare data and use it to tell a story – both of which can help users in decision making. Data visualization can express data of varying types and sizes: from a few data points to large multivariate datasets. AccuratePrioritize data accuracy, clarity, and integrity, presenting information in a way that doesn’t distort it. HelpfulHelp users navigate data with context and affordances that emphasize exploration and comparison. ScalableAdapt visualizations for different device sizes, while anticipating user needs on data depth, complexity, and modality. Types Data visualization can be expressed in different forms. Charts are a common way of expressing data, as they depict different data varieties and allow data comparison.The type of chart you use depends primarily on two things: the data you want to communicate, and what you want to convey about that data. These guidelines provide descriptions of various different types of charts and their use cases.Types of chartsChange over time charts show data over a period of time, such as trends or comparisons across multiple categories. Common use cases include: Category comparison...Read MoreChange over timeChange over time charts show data over a period of time, such as trends or comparisons across multiple categories.Common use cases include: Stock price performanceHealth statisticsChronologies Change over time charts include:1. Line charts 2. Bar charts 3. Stacked bar charts 4. Candlestick charts 5. Area charts 6. Timelines 7. Horizon charts 8. Waterfall charts Category comparisonCategory comparison charts compare data between multiple distinct categories. Use cases include: Income across different countriesPopular venue timesTeam allocations Category comparison charts include: 1. Bar charts 2. Grouped bar charts 3. Bubble charts 4. Multi-line charts 5. Parallel coordinate charts 6. Bullet charts RankingRanking charts show an item’s position in an ordered list.Use cases include: Election resultsPerformance statistics Ranking charts include: 1. Ordered bar charts 2. Ordered column charts 3. Parallel coordinate charts Part-to-wholePart-to-whole charts show how partial elements add up to a total.Use cases include: Consolidated revenue of product categoriesBudgets Part-to-whole charts include: 1. Stacked bar charts 2. Pie charts 3. Donut charts 4. Stacked area charts 5. Treemap charts 6. Sunburst charts CorrelationCorrelation charts show correlation between two or more variables.Use cases include: Income and life expectancy Correlation charts include: 1. Scatterplot charts 2. Bubble charts 3. Column and line charts 4. Heatmap charts DistributionDistribution charts show how often each values occur in a dataset. Use cases include: Population distributionIncome distribution Distribution charts include: 1. Histogram charts 2. Box plot charts 3. Violin charts 4. Density charts FlowFlow charts show movement of data between multiple states.Use cases include: Fund transfersVote counts and election results Flow charts include: 1. Sankey charts 2. Gantt charts 3. Chord charts 4. Network charts RelationshipRelationship charts show how multiple items relate to one other.Use cases includeSocial networksWord charts Relationship charts include: 1. Network charts 2. Venn diagrams 3. Chord charts 4. Sunburst charts Selecting charts Multiple types of charts can be suitable for depicting data. The guidelines below provide insight into how to choose one chart over another. Showing change over timeChange over time can be expressed using a time series chart, which is a chart that represents data points in chronological order. Charts that express...Read MoreChange over time can be expressed using a time series chart, which is a chart that represents data points in chronological order. Charts that express change over time include: line charts, bar charts, and area charts.Type of chartUsageBaseline value * Quantity of time seriesData typeLine chartTo express minor variations in dataAny valueAny time series (works well for charts with 8 or more time series)ContinuousBar chartTo express larger variations in data, how individual data points relate to a whole, comparisons, and rankingZero4 or fewerDiscrete or categoricalArea chartTo summarize relationships between datasets, how individual data points relate to a wholeZero (when there’s more than one series)8 or fewerContinuous* The baseline value is the starting value on the y-axis.Bar and pie chartsBoth bar charts and pie charts can be used to show proportion, which expresses a partial value in comparison to a total value. Bar charts,...Read MoreBoth bar charts and pie charts can be used to show proportion, which expresses a partial value in comparison to a total value. Bar charts express quantities through a bar’s length, using a common baselinePie charts express portions of a whole, using arcs or angles within a circleBar charts, line charts, and stacked area charts are more effective at showing change over time than pie charts. Because all three of these charts share the same baseline of possible values, it’s easier to compare value differences based on bar length. Do.Use bar charts to show changes over time or differences between categories. Don’t.Don’t use multiple pie charts to show changes over time. It’s difficult to compare the difference in size across each slice of the pie. Area chartsArea charts come in several varieties, including stacked area charts and overlapped area charts: Overlapping area charts are not recommended with more than two time...Read MoreArea charts come in several varieties, including stacked area charts and overlapped area charts:Stacked area charts show multiple time series (over the same time period) stacked on top of one another Overlapped area charts show multiple time series (over the same time period) overlapping one anotherOverlapping area charts are not recommended with more than two time series, as doing so can obscure the data. Instead, use a stacked area chart to compare multiple values over a time interval (with time represented on the horizontal axis). Do.Use a stacked area chart to represent multiple time series and maintain a good level of legibility. Don’t.Don’t use overlapped area charts as it obscures data values and reduces readability. Style Data visualizations use custom styles and shapes to make data easier to understand at a glance, in ways that suit the user’s needs and context.Charts can benefit from customizing the following: Graphical elementsTypographyIconographyAxes and labelsLegends and annotationsStyling different types of dataVisual encoding is the process of translating data into visual form. Unique graphical attributes can be applied to both quantitative data (such as temperature, price,...Read MoreVisual encoding is the process of translating data into visual form. Unique graphical attributes can be applied to both quantitative data (such as temperature, price, or speed) and qualitative data (such as categories, flavors, or expressions). These attributes include:ShapeColorSizeAreaVolumeLengthAnglePosition DirectionDensityExpressing multiple attributesMultiple visual treatments can be applied to more than one aspect of a data point. For example, a bar color can represent a category, while a bar’s length can express a value (like population size). Shape can be used to represent qualitative data. In this chart, each category is represented by a specific shape (circles, squares, and triangles), which makes it easy to compare data both within a specific range or against other categories. ShapeCharts can use shapes to display data in a range of ways. A shape can be styled as playful and curvilinear, or precise and high-fidelity,...Read MoreCharts can use shapes to display data in a range of ways. A shape can be styled as playful and curvilinear, or precise and high-fidelity, among other ways in between. Level of shape detailCharts can represent data at varying levels of precision. Data intended for close exploration should be represented by shapes that are suitable for interaction (in terms of touch target size and related
  4. Jul 2019
    1. Every time your child opens the email, that person knows generally where they are (or specifically, if they have other info to triangulate against).
    1. In contrast to such pseudonymous social networking, Facebook is notable for its longstanding emphasis on real identities and social connections.

      Lack of anonymity also increases Facebook's ability to properly link shadow profiles purchased from other data brokers.

    1. our sum of squares is 41.187941.187941.1879

      Just considering the Y, and not the X. Calculating the residuals from the average/mean Y.

    1. in clustering analyses, standardization may be especially crucial in order to compare similarities between features based on certain distance measures. Another prominent example is the Principal Component Analysis, where we usually prefer standardization over Min-Max scaling, since we are interested in the components that maximize the variance

      Use standardization, not min-max scaling, for clustering and PCA.

    2. As a rule of thumb I’d say: When in doubt, just standardize the data, it shouldn’t hurt.
    1. driven by data—where schools use data to identify a problem, select a strategy to address the problem, set a target for improvement, and iterate to make the approach more effective and improve student achievement.

      Gates data model.

    1. many elements used in the objective function of a learning algorithm (such as the RBF kernel of Support Vector Machines or the l1 and l2 regularizers of linear models) assume that all features are centered around zero and have variance in the same order. If a feature has a variance that is orders of magnitude larger than others, it might dominate the objective function and make the estimator unable to learn from other features correctly as expected.
  5. Jun 2019
  6. varsellcm.r-forge.r-project.org varsellcm.r-forge.r-project.org
    1. missing values are managed, without any pre-processing, by the model used to cluster with the assumption that values are missing completely at random.

      VarSelLCM package

    1. Success ina data science project comes not from access to any one exotic tool, but from having quantifiablegoals, good methodology, crossdiscipline interactions, and a repeatable workflow.

    Tags

    Annotators

    1. Academicsarealsoatfaulthere:arecentanalysisof29millionpapersinover15,000peer-reviewedtitlespublishedaroundthetimeoftheZikaandEbolaepidemicsfoundthatlessthan1%exploredthegenderedimpactoftheoutbreaks

      How do we prevent this pattern here at Georgia Tech? There is a very obvious gender gap, especially in STEM where bad data in medicine and engineering are collected? What are some mini steps we can take to encourage pursuing data for different backgrounds? Education is always first, starting with class similar to this one informing people about how gender plays a role. Perhaps then we can create projects exploring the issue in data related to each person's major.

    Tags

    Annotators

    1. However, this doesn’t mean that Min-Max scaling is not useful at all! A popular application is image processing, where pixel intensities have to be normalized to fit within a certain range (i.e., 0 to 255 for the RGB color range). Also, typical neural network algorithm require data that on a 0-1 scale.

      Use min-max scaling for image processing & neural networks.

    2. The result of standardization (or Z-score normalization) is that the features will be rescaled so that they’ll have the properties of a standard normal distribution with μ=0μ=0\mu = 0 and σ=1σ=1\sigma = 1 where μμ\mu is the mean (average) and σσ\sigma is the standard deviation from the mean
  7. May 2019
    1. 1RWDOOPRYLHVKDYHWREHGRFXPHQWDULHVDQGQRWDOOYLVXDOL]DWLRQKDVWREHWUDGLWLRQDOFKDUWVDQGJUDSKV

      This is an interesting fact, usually when I think of visualization and data I go to the classic default charts and data. I'll have to keep this iin mind.

    2. 7KHEDVHRIWKHJUDSKLFLVVLPSO\DOLQHFKDUW+RZHYHUGHVLJQHOHPHQWVKHOSWHOOWKHVWRU\EHWWHU/DEHOLQJDQGSRLQWHUVSURYLGHFRQWH[WDQGKHOS\RXVHHZK\WKHGDWDLVLQWHUHVWLQJDQGOLQHZLGWKDQGFRORUGLUHFW\RXUH\HVWRZKDW¶VLPSRUWDQW

      I really like this because I don't see it often and it actually does draw my eye to the data and capture my interest.

    1. Virtually all BPMs have utilities for creating simple, data-gathering forms. And in many types of workflows, these simple forms may be adequate. However, in any workflow that includes complex document assembly (such as loan origination workflows), BPM forms are not likely to get the job done. Automating the assembly of complex documents requires ultra-sophisticated data-gathering forms, which can only be designed and created after the documents themselves have been automated. Put another way, you won't know which questions need to be asked to generate the document(s) until you've merged variables and business logic into the documents themselves. The variables you merge into the document serve as question fields in the data gathering forms. And here's the key point - since you have to use the document assembly platform to create interviews that are sophisticated enough to gather data for your complex documents, you might as well use the document assembly platform to generate all data-gathering forms in all of your workflows.
    1. El ritmo de las actividades de diseño e instalación de redes comunitarias en veredas del municipio de Fusagasugá se ve acrecentado por las convocatorias internas de investigación de la Universidad de Cundinamarca que a lo largo del tiempo de vida de Red FusaLibrehan sido un músculo financiero que les permite acelerar los proc

      Interesante vínculo entre comunidad y universidad. En nuestro caso, no hemos logrado un vínculo permanente y si bien algunos dineros de convocatorias de investigación universitaria y convocatorias internacionales permitieron pagar parte de los Data Weeks, junto con una contribución menor de algunos asistentes, en general ha sido un proyecto financiado con recursos propios y préstamos familiares.

    1. Developing economies’ copper demand has steadily grown over the last decades, fueling economic and social improvement. By 2011, China already represented 40% of the demand.

      Why does China need so much.

    2. Codelco is a state-owned Chilean mining company and the world’s largest copper producer. Based on their annual report and USGS statistics, they produced ~10% of the world’s copper in 2015 and own 8% of global reserves. They are also a large producer of greenhouse gas emissions. Last year, Codelco produced 3,2 t CO2e/millions tmf from both indirect and direct effects, and in 2011 it consumed 12% of the total national electricity supply.

      Goddamn they should start recylcling

    1. Methodology The classic OSINT methodology you will find everywhere is strait-forward: Define requirements: What are you looking for? Retrieve data Analyze the information gathered Pivoting & Reporting: Either define new requirements by pivoting on data just gathered or end the investigation and write the report.

      Etienne's blog! Amazing resource for OSINT; particularly focused on technical attacks.

  8. Apr 2019
    1. Powered by Data wrote 4 of the resources on this page. "Measuring Outcomes" is about admin data. "Understanding the Philanthropic Landscape" is about open data - sp. open grants data. "Effective Giving" is an intro. And "Emerging Data Practices" is a tech backgrounder from June 2015.

    1. Instead of encouraging more “data-sharing”, the focus should be the cultivation of “data infrastructure”,¹⁴ maintained for the public good by institutions with clear responsibilities and lines of accountability.

  9. Mar 2019
  10. www.archivogeneral.gov.co www.archivogeneral.gov.co
    1. Normalización de las entradas descriptivas: Personas, Lugares, Instituciones (utilización de Linked Open Data (LOD) cuando sea posible.

      ¿Qué sistema de organización de conocimiento se los posibilita? ¿Qué están usando para enlazar datos y en qué formato?

    1. The government needs to place tough restrictions on data collection and storage by businesses to limit the amount of damage in the event of a cyber breach.

      I find it hard to imagine how this could be usefully implemented. How is monitoring of data collection going to be done?

      Even simpler ideas, like the Do Not Call registry, have difficulty clamping down on businesses that breach regulations.

    1. Mithering about the unmodellable. "Sometime late last year I went to the Euro IA conference with Anya and Silver to give a talk on the domain modelling work we've been doing in UK Parliament."

    1. DXtera Institute is a non-profit, collaborative member-based consortium dedicated to transforming student and institutional outcomes in higher education.

      DXtera Institute is a non-profit, collaborative member-based consortium dedicated to transforming student and institutional outcomes in higher education. We specialize in helping higher education professionals drive more efficient access to information and insights for effective decision-making and realize long-term cost savings, by simplifying and removing barriers to systems integration and improving data aggregation and control.

      With partners across the U.S. and Europe, our consortium includes some of the brightest minds in education and technology, all working together to solve critical higher education issues on a global scale.

    1. Data journalism produced by two of the nation’s most prestigious news organizations — The New York Times and The Washington Post — has lacked transparency, often failing to explain the methods journalists or others used to collect or analyze the data on which the articles were based, a new study finds. In addition, the news outlets usually did not provide the public with access to that data

      While this is a worthwhile topic, I would like to see more exploration of data journalism in the 99.99999 percent of news organizations that are NOT the New York Times or the Washington Post and don't have the resources to publish so many data stories despite the desperate need for them across the nation. Also, why no digital news outlets included?

    2. Worse yet, it wouldn’t surprise me if we saw more unethical people publish data as a strategic communication tool, because they know people tend to believe numbers more than personal stories. That’s why it’s so important to have that training on information literacy and methodology.”

      Like the way unethical people use statistics in general? This should be a concern, especially as government data, long considered the gold standard of data, undergoes attacks that would skew the data toward political ends. (see the census 2020)

    3. fall short of the ideal of data journalism

      Is this the ideal of data journalism? Where is this ideal spelled out, and is there any sign that the NYT and WaPo have agreed to abide by this ideal?

  11. Feb 2019
    1. set; if this is higher, the tree 2can be considered to fit the data less well

      To test the fit between data and more than one alternative tree, you can just do a bootstrap analysis, and map the results on a neighbour-net splits graph based on the same data.

      Note that the phangorn library includes functions to transfer information between trees/tree samples and trees and networks:<br/> Schliep K, Potts AJ, Morrison DA, Grimm GW. 2017. Intertwining phylogenetic trees and networks. Methods in Ecology and Evolution (DOI:10.1111/2041-210X.12760.)[http://onlinelibrary.wiley.com/doi/10.1111/2041-210X.12760/full] – the basic functions and script templates are provided in the associated vignette.

    1. These models are emerging, which is why its exciting to be involved in the ground floor of this sector, however some models clearly make sense already and thats largely because they closely follow the models free software itself has shaped. If you want status, then you can make a name for yourself by leading a team to write the docs ala free software itself, if you want money then build the reputation for the documentation team and contract out your knowledge (eg. extend the docs on contract ala free software).

      Creo que hay que conectarlo con modelos de microfinanciación y tiendas independientes tipo Itch.io y que el experimento debería ser progresivo pero dejar un mapa posible de su propio futuro. Algo así intentaremos en la edición 13a del Data Week.

    1. Dissecting Flavivirus Biology in Salivary Gland Cultures from Fed and Unfed Ixodes scapularis (Black-Legged Tick)

      Data worth viewing: a tick trachea with viral infection in its salivary glands.

    1. !..�P'�r\0CA \= e,;4 ��'-"-'

      Could empirical data made up of experiences present in the form of an ethnography? Or autoethnography? I'm not sure if this is what you were getting at here, but it is a thought that came to mind!

  12. Jan 2019
    1. Nyhan and Reifler also found that presenting challenging information in a chart or graph tends to reduce disconfirmation bias. The researchers concluded that the decreased ambiguity of graphical information (as opposed to text) makes it harder for test subjects to question or argue against the content of the chart.

      Amazingly important double-edged finding for discussions of data visualization!

  13. demandlab.weebly.com demandlab.weebly.com
    1. y bosses want to see quick wins, but I know we can achieve big w

      add "My data (database) quality sucks"

    1. You may not access or use the Site in any manner that could damage or overburden any MIT server, or any network connected to any MIT server. You may not use the Site in any manner that would interfere with any other party’s use of the Site.

      Vamos a realizar pequeños scrapping, que no sobrecargarán el servidor, así que estamos cumpliendo con esta parte y de hecho, después de que trabajemos, permitiran repartir la carga del servidor, pues una copia estará en nuestros servidores.

    1. Adoption of good practice to generate high quality data will depend on sharing the burden of capacity building in some way. That in turn, can-not happen until there is a framework that provides sufficient trust to allow the sharing and compar-ison of data and its management.

      harkening to the 'data trust' concept being discussed from U.S. Mellon-funded projects, also co-authored by the authors of this paper.

  14. Dec 2018
    1. Outliers : All data sets have an expected range of values, and any actual data set also has outliers that fall below or above the expected range. (Space precludes a detailed discussion of how to handle outliers for statistical analysis purposes, see: Barnett & Lewis, 1994 for details.) How to clean outliers strongly depends on the goals of the analysis and the nature of the data.

      Outliers can be signals of unanticipated range of behavior or of errors.

    2. Understanding the structure of the data : In order to clean log data properly, the researcher must understand the meaning of each record, its associated fi elds, and the interpretation of values. Contextual information about the system that produced the log should be associated with the fi le directly (e.g., “Logging system 3.2.33.2 recorded this fi le on 12-3-2012”) so that if necessary the specifi c code that gener-ated the log can be examined to answer questions about the meaning of the record before executing cleaning operations. The potential misinterpretations take many forms, which we illustrate with encoding of missing data and capped data values.

      Context of the data collection and how it is structured is also a critical need.

      Example, coding missing info as "0" risks misinterpretation rather than coding it as NIL, NDN or something distinguishable from other data

    3. Data transformations : The goal of data-cleaning is to preserve the meaning with respect to an intended analysis. A concomitant lesson is that the data-cleaner must track all transformations performed on the data .

      Changes to data during clean up should be annotated.

      Incorporate meta data about the "chain of change" to accompany the written memo

    4. Data Cleaning A basic axiom of log analysis is that the raw data cannot be assumed to correctly and completely represent the data being recorded. Validation is really the point of data cleaning: to understand any errors that might have entered into the data and to transform the data in a way that preserves the meaning while removing noise. Although we discuss web log cleaning in this section, it is important to note that these principles apply more broadly to all kinds of log analysis; small datasets often have similar cleaning issues as massive collections. In this section, we discuss the issues and how they can be addressed. How can logs possibly go wrong ? Logs suffer from a variety of data errors and distortions. The common sources of errors we have seen in practice include:

      Common sources of errors:

      • Missing events

      • Dropped data

      • Misplaced semantics (encoding log events differently)

    5. In addition, real world events, such as the death of a major sports fi gure or a political event can often cause people to interact with a site differently. Again, be vigilant in sanity checking (e.g., look for an unusual number of visitors) and exclude data until things are back to normal.

      Important consideration for temporal event RQs in refugee study -- whether external events influence use of natural disaster metaphors.

    6. Recording accurate and consistent time is often a challenge. Web log fi les record many different timestamps during a search interaction: the time the query was sent from the client, the time it was received by the server, the time results were returned from the server, and the time results were received on the client. Server data is more robust but includes unknown network latencies. In both cases the researcher needs to normalize times and synchronize times across multiple machines. It is common to divide the log data up into “days,” but what counts as a day? Is it all the data from midnight to midnight at some common time reference point or is it all the data from midnight to midnight in the user’s local time zone? Is it important to know if people behave differently in the morning than in the evening? Then local time is important. Is it important to know everything that is happening at a given time? Then all the records should be converted to a common time zone.

      Challenges of using time-based log data are similar to difficulties in the SBTF time study using Slack transcripts, social media, and Google Sheets

    7. Log Studies collect the most natural observations of people as they use systems in whatever ways they typically do, uninfl uenced by experimenters or observers. As the amount of log data that can be collected increases, log studies include many different kinds of people, from all over the world, doing many different kinds of tasks. However, because of the way log data is gathered, much less is known about the people being observed, their intentions or goals, or the contexts in which the observed behaviors occur. Observational log studies allow researchers to form an abstract picture of behavior with an existing system, whereas experimental log stud-ies enable comparisons of two or more systems.

      Benefits of log studies:

      • Complement other types of lab/field studies

      • Provide a portrait of uncensored behavior

      • Easy to capture at scale

      Disadvantages of log studies:

      • Lack of demographic data

      • Non-random sampling bias

      • Provide info on what people are doing but not their "motivations, success or satisfaction"

      • Can lack needed context (software version, what is displayed on screen, etc.)

      Ways to mitigate: Collecting, Cleaning and Using Log Data section

    8. Two common ways to partition log data are by time and by user. Partitioning by time is interesting because log data often contains signifi cant temporal features, such as periodicities (including consistent daily, weekly, and yearly patterns) and spikes in behavior during important events. It is often possible to get an up-to-the- minute picture of how people are behaving with a system from log data by compar-ing past and current behavior.

      Bookmarked for time reference.

      Mentions challenges of accounting for time zones in log data.

    9. An important characteristic of log data is that it captures actual user behavior and not recalled behaviors or subjective impressions of interactions.

      Logs can be captured on client-side (operating systems, applications, or special purpose logging software/hardware) or on server-side (web search engines or e-commerce)

    10. Table 1 Different types of user data in HCI research

    11. Large-scale log data has enabled HCI researchers to observe how information diffuses through social networks in near real-time during crisis situations (Starbird & Palen, 2010 ), characterize how people revisit web pages over time (Adar, Teevan, & Dumais, 2008 ), and compare how different interfaces for supporting email organi-zation infl uence initial uptake and sustained use (Dumais, Cutrell, Cadiz, Jancke, Sarin, & Robbins, 2003 ; Rodden & Leggett, 2010 ).

      Wide variety of uses of log data

    12. Behavioral logs are traces of human behavior seen through the lenses of sensors that capture and record user activity.

      Definition of log data

    1. Ethnographic findings are not privileged, just particular: another country heard from. To regard them as anything more (or anything less) than that distorts both them and their implications, which are far profounder than mere primitivity, for social theory.

      This tension exists in HCI as well.

      Interpreted data vs empirical data and how each is systematically analyzed.

  15. Nov 2018
    1. One way to think about "core" biodiversity data is as a network of connected entities, such as taxa, taxonomic names, publications, people, species, sequences, images, collections, etc. (Fig. 1)
    1. “It’s about embracing the inscrutable nature of human interactions,” says Chang. Evidence-based medicine was a massive improvement over intuition-based medicine, he says, but it only covers traditionally quantifiable data, or those things that are easy to measure. But we’re now quantifying information that was considered qualitative a generation ago.

      Biggest challenges to redesigning the health care system in a way that would work better for patients and improve health

    2. “Our biggest opportunity is leaning into that. It’s either embracing the qualitative nature of that and designing systems that can act just on the qualitative nature of their experience, or figuring how to quantitate some of those qualitative measures,” says Chang. “That’ll get us much further, because the real value in health care systems is in the human interactions. My relationship with you as a doctor and a patient is far more valuable than the evidence that some trial suggests.”

      Biggest challenges to redesigning the health care system in a way that would work better for patients and improve health

    1. The Chinese place a higher value on community good versus individual rights, so most feel that, if social credit will bring a safer, more secure, more stable society, then bring it on
    1. Unless you need to push the boundaries of what these technologies are capable of, you probably don’t need a highly specialized team of dedicated engineers to build solutions on top of them. If you manage to hire them, they will be bored. If they are bored, they will leave you for Google, Facebook, LinkedIn, Twitter, … – places where their expertise is actually needed. If they are not bored, chances are they are pretty mediocre. Mediocre engineers really excel at building enormously over complicated, awful-to-work-with messes they call “solutions”. Messes tend to necessitate specialization.
    1. For the second, we could try to detect inconsistencies, eitherby inspecting samples of the class hierarchy

      Yes, that's what I do when doing quality work on the taxonomy (with the tool wdtaxonomy)

    2. Possible relations between Items

      This only includes properties of data-type item?! It should be made more clear because the majority of Wikidata classes has other data types.

    3. A KG typically spans across several domains and is built on topof a conceptual schema, orontology, which defines what types of entities (classes) are allowed inthe graph, alongside the types ofpropertiesthey can have

      Wikidata differs from typical KG as it is not build on top of classes (entity types). Any item (entity) can be connected by any property. Wikidata's only strict "classes" in the sense of KG classes are its data types (item, lemma, monolingual string...).

    Tags

    Annotators

    1. Entscheidend ist, dass sie Herren des Verfahrens bleiben - und eine Vision für das neue Maschinenzeitalter entwickeln.

      Es sieht für mich nicht eigentlich so aus als wären wir jemals die "Herren des Verfahrens" gewesen. Und auch darum geht es ja bei Marx. Denke ich.

    1. Does the widespread and routine collection of student data in ever new and potentially more-invasive forms risk normalizing and numbing students to the potential privacy and security risks?

      What happens if we turn this around - given a widespread and routine data collection culture which normalizes and numbs students to risk as early as K-8, what are our responsibilities (and strategies) to educate around this culture? And how do our institutional practices relate to that educational mission?

  16. Oct 2018
    1. As a recap, Chegg discovered on September 19th a data breach dating back to April that "an unauthorized party" accessed a data base with access to "a Chegg user’s name, email address, shipping address, Chegg username, and hashed Chegg password" but no financial information or social security numbers. The company has not disclosed, or is unsure of, how many of the 40 million users had their personal information stolen.

    1. tl;dr: data engineer = software, coding, cleaning data sets data architects = structure the technology to manage data models and database admin data scientist = stats + math models business analysts = communication and domain expertise

    1. research publications are not research data

      they could be, if used as part of a text mining corpus, for example

  17. Sep 2018
    1. I love the voice of their help page. Someone very opinionated (in a good way) is building this product. I particularly like this quote: Your data is a liability to us, not an asset.
    1. End-Users

      Because Grafoscopio was used in critical digital literacy workshops, dealing with data activism and journalism, the intended users are people who don't know how to program necessarily, but are not afraid of learning to code to express their concerns (as activists, journalists and citizens in general) and if fact are wiling to do so.

      Tool adaptation was "natural" of the workshops, because the idea was to extend the tool so it can deal with authentic problems at hand (as reported extensively in the PhD thesis) and digital citizenship curriculum was build in the events as a memory of how we deal with the problems. But critical digital literacy is a long process, so coding as a non-programmers knowledge in service of wider populations able to express in code, data and visualizations citizen concerns is a long time process.

      Visibility, scalability and sustainablitiy of such critical digital literacy endeavors where communities and digital tools change each other mutually is still an open problem, even more considering their location in the Global South (despite addressing contextualized global problems).

    1. In October 2014 the Open Knowledge Foundation recommends the Creative Commons CC0 license to dedicate content to the public domain,[51][52] and the Open Data Commons Public Domain Dedication and License (PDDL) for data.[53]
    1. predictive analysis

      Predictive analytics encompasses a variety of statistical techniques from data mining, predictive modelling, and machine learning, that analyze current and historical facts to make predictions about future or otherwise unknown events.

  18. Aug 2018
    1. this possibility of increased ownership and agency over technology and a somewhat romantic idea I have that this can transfer to inspire ownership and agency over learning
    1. A file containing personal information of 14.8 million Texas residents was discovered on an unsecured server. It is not clear who owns the server, but the data was likely compiled by Data Trust, a firm created by the GOP.