Another important distinction is between data and metadata. Here, the term “data” refers to the part of a file or dataset which contains the actual representation of an object of inquiry, while the term “metadata” refers to data about that data: metadata explicitly describes selected aspects of a dataset, such as the time of its creation, or the way it was collected, or what entity external to the dataset it is supposed to represent.
This part is notably helpful for understanding the differences that separate "metadata" from "data". I was writing a blog post for my weekly assignment. Knowing that data is the representation of the object and that metadata describes the data helps me build the definitions of these terms in my schema of knowledge. In many cases, metadata even provides resources that give insight into how the data was collected and/or introduce possible perspectives on how the data can be viewed or used in the future. Data can survive without metadata, but metadata cannot exist without the data. However, data that lacks metadata may remain undeciphered, potentially making it useless for the fundamental and economic growth of human beings.
Companies need to actually have an ethics panel, and discuss what the issues are and what the needs of the public really are. Any ethics board must include a diverse mix of people and experiences. Where possible, companies should look to publish the results of these ethics boards to help encourage public debate and to shape future policy on data use.
Most of us are familiar with data visualization: charts, graphs, maps and animations that represent complex series of numbers. But visualization is not the only way to explain and present data. Some scientists are trying to sonify storms with global weather data. It could be easier to get a sense of interrelated storm dynamics by hearing them.
It is important to note that in practice it is sometimes considered necessary to start from traditional representation models, such as the relational model, for data modeling, and to apply appropriate transformations in order to then make the data available according to Linked Open Data principles. However, this practice is not necessarily the most appropriate one: there are situations in which it may be more convenient to start from an ontology of the domain to be modeled and from the use of semantic web standards in order to govern data management processes.
Honestly, I don't find what is written here useful. Many more systems are now natively linked open data, so, besides the fact that talking about linked open data as enrichment is wrong, I would drop this sentence altogether.
they use various standards and techniques, including the RDF framework
I would rephrase this as "they are based on various standards, including RDF, and often use RDF controlled vocabularies to represent the controlled terminology of the relevant application domain"
to four-star data formats such as RDF serializations or JSON-LD
JSON-LD is an RDF serialization in the JSON world. Note that the Italian translation of the Publications Office document did not come out well here (they say "data format such as RDF or JSON-LD", which would also be imprecise. RDF is a model for representing data on the Web. RDF serializations are things like N-Triples, RDF/Turtle, RDF/XML, JSON-LD). Moreover, in the technical annex on open data formats, whose text is taken from the previous guidelines, JSON-LD is listed as an RDF serialization.
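To illustrate the point that RDF is the model while JSON-LD and N-Triples are just two ways of writing it down, here is a minimal Python sketch using only the standard library. The document, the `example.org` IRI, and the helper name `to_ntriples` are assumptions made for illustration. The conversion is deliberately naive: it handles only a flat object whose terms map to full IRIs in `@context`, and it is not the JSON-LD expansion algorithm from the specification.

```python
import json

# A hypothetical, flat JSON-LD document (no nesting, no typed literals).
doc = json.loads("""
{
  "@context": {"name": "http://xmlns.com/foaf/0.1/name"},
  "@id": "http://example.org/alice",
  "name": "Alice"
}
""")

def to_ntriples(document):
    """Naively rewrite a flat JSON-LD object as N-Triples lines.

    Only plain string values are handled, and no escaping is performed,
    so this is a teaching aid rather than a real converter.
    """
    context = document["@context"]
    subject = document["@id"]
    lines = []
    for key, value in document.items():
        if key.startswith("@"):
            continue  # skip JSON-LD keywords such as @context and @id
        predicate = context[key]  # term -> full predicate IRI
        lines.append(f'<{subject}> <{predicate}> "{value}" .')
    return lines

for line in to_ntriples(doc):
    print(line)
```

Both forms carry the same single triple (subject, predicate, object), which is why JSON-LD is better described as a serialization of RDF than as an alternative to it. For real data, a dedicated RDF library would be the safer choice.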
linked data
Are they open or not?
linking is a very important feature and can in fact be considered a particular form of enrichment. Its particularity lies in the fact that the enrichment happens through interlinking between datasets of different origin, typically between different administrations or institutions, but also, at the limit, within a single administration”
There is a fundamental, conceptual problem here. The Linked Open Data paradigm has been downgraded to enrichment, which in the guidelines cited here was only one phase of a general process for managing linked open data. Doing linked open data does not only mean enriching data: data can be managed natively as linked open data from the moment it is created. That was the spirit of the guidelines cited here. By extracting only one part, you have somewhat distorted the whole. I recommend treating the topic as it was treated in the previous guidelines. It is also a pity that the metro-map figure, which helped a lot, has disappeared.
As mentioned, linking data can increase its value by creating new relationships and thus enabling new kinds of analysis.
In any case, given everything Italy has written about linked open data, I would make an extra effort to write sentences that are not just a word-for-word Italian translation of the English document.
of standard licenses,
standard open licenses. Add the word "open", which is essential.
The reason these apps are great for such a broad range of use cases is they give users really strong data structures to work within.
Inside the very specific realm of personal knowledge bases, TiddlyWiki is the killer app when it comes to using blocks and having structured, translatable data behind them.
80% of data analysis is spent on the process of cleaning and preparing the data
Imagine having unnecessary and wrong data in your document: you would most likely experience the concept of time demarcation -- the reluctance to go through every single row and column to eliminate this "garbage data". Clearly, owning all kinds of data without organizing it feels like stuffing your closet with clothes that you should have donated 5 years ago. It is a time-consuming and soul-destroying process. Luckily, R has the "tidyverse" collection of packages, which I believe the author talks about in the next paragraph, to make life easier for everyone. I personally use dplyr and ggplot2 when I deal with data cleaning, and they are extremely helpful. Without these packages, I have no idea when I would be able to reach the final step of data visualization.
On a new clone of the Canva monorepo, git status takes 10 seconds on average while git fetch can take anywhere from 15 seconds to minutes due to the number of changes merged by engineers.
Over the last 10 years, the code base has grown from a few thousand lines to just under 60 million lines of code in 2022. Every week, hundreds of engineers work across half a million files generating close to a million lines of change (including generated files), tens of thousands of commits, and merging thousands of pull requests.
The goal is to gain “digital sovereignty.”
the age of borderless data is ending. What we're seeing is a move to digital sovereignty
nothing is permanent in the digital world
Either ironic or maybe not the best advice when suggesting people might choose something like Notion or Evernote which could disappear with your data...
23.0G com.txt # 23 gigs uncompressed
23 GB txt file <--- list of all the existing .com domains
Some of the basic outline of this looks like OER (Open Educational Resources) and its "five Rs": Retain, Reuse, Revise, Remix and/or Redistribute content. (To which I've already suggested the sixth: Request update (or revision control).)
Some of this is similar to:
The Read Write Web is no longer sufficient. I want the Read Fork Write Merge Web. #osb11 lunch table. #diso #indieweb [Tantek Çelik](http://tantek.com/2011/174/t1/read-fork-write-merge-web-osb11)
Idea of collections of learning as collections or "playlists" or "readlists". Similar to the old tool Readlist which bundled articles into books relatively easily. See also: https://boffosocko.com/2022/03/26/indieweb-readlists-tools-and-brainstorming/
Use of Wiki version histories
Some of this has the form of a Wiki but with smaller nuggets of information (sort of like Tiddlywiki perhaps, which also allows for creating custom orderings of things which had specific URLs for displaying and sharing them.) The Zettelkasten idea has some of this embedded into it. Shared zettelkasten could be an interesting thing.
Data is the new soil. A way to reframe "data is the new oil" but as a part of the commons. This fits well into the gardens and streams metaphor.
Jerry, have you seen Matt Ridley's work on Ideas Have Sex? https://www.ted.com/talks/matt_ridley_when_ideas_have_sex Of course you have: https://app.thebrain.com/brains/3d80058c-14d8-5361-0b61-a061f89baf87/thoughts/3e2c5c75-fc49-0688-f455-6de58e4487f1/attachments/8aab91d4-5fc8-93fe-7850-d6fa828c10a9
I've heard Jerry mention the idea of "crystallization of knowledge" before. How can we concretely link this version with Cesar Hidalgo's work, esp. Why Information Grows?
Cross reference Jerry's Brain: https://app.thebrain.com/brains/3d80058c-14d8-5361-0b61-a061f89baf87/thoughts/4bfe6526-9884-4b6d-9548-23659da7811e/notes
Expected to come into force on June 27, India's new data retention law will force VPN companies to keep users' data - like IP addresses, real names and usage patterns - for up to five years. They will also be required to hand this information over to authorities upon request.
Some draconian Indian data-retention laws are coming.
“Data is the new oil,” she said.
Oft repeated phrase and one I wouldn't have expected in this article.
Recognizing that the CEC hyperthreat operates at micro and macro scales across most forms of human activity and that a whole-of-society approach is required to combat it, the approach to the CEC hyperthreat partly relies on a philosophical pivot. The idea here is that a powerful understanding of the CEC hyperthreat (how it feels, moves, and operates), as well as the larger philosophical and survival-based reasons for hyper-reconfiguration, enables all actors and groups to design their own bespoke solutions. Consequently, the narrative and threat description act as a type of orchestration tool across many agencies. This is like the “shared consciousness” idea in retired U.S. Army general Stanley A. McChrystal’s “team of teams” approach to complexity. Such an approach is heavily dependent on exceptional communication of both the CEC hyperthreat and hyper-response pathways, as well as providing an enabling environment in terms of capacity to make decisions, access information and resources. This idea informs Operation Visibility and Knowability (OP VAK), which will be described later.
Such an effort will require a supporting worldwide digital ecosystem. In the recent past, major evolutionary transitions (MET) (Robin et al, 2021) of our species have been triggered by radical new information systems such as spoken language, and then inscribed language. Something akin to a Major Competitive Transition (MCT) may be required to accompany a radical transition to a good anthropocene. (See annotation: https://hyp.is/go?url=https%3A%2F%2Fwww.frontiersin.org%2Farticles%2F10.3389%2Ffevo.2021.711556%2Ffull&group=world)
If large data is ingested into a public Indyweb, because Indyweb is naturally a graph database, a salience landscape can be constructed of the hyperthreat and data visualized in its multiple dimensions and scales.
Metaphorically, it can manifest as a hydra with multiple tentacles reaching out to multiple scales and dimensions. VR and AR technology can be used to expose the hyperobject and its progression.
The proper hyperthreat is not climate change alone, although that is the most time sensitive dimension of it, but rather the totality of all blowbacks of human progress...the aggregate of all progress traps that have been allowed to grow, through a myopic prioritization of profit over global wellbeing due to the invisibility of the hyperobject, from molehills into mountains.
I explore how moves towards ‘objective’ data as the basis for decision-making orientated teachers’ judgements towards data in ways that worked to standardise judgement and exclude more multifaceted, situated and values-driven modes of professional knowledge that were characterised as ‘human’ and therefore inevitably biased.
But, aren't these multifaceted, situated, and values-driven modes also constituted of data? Isn't everything represented by data? Even 'subjective' understanding of the world is articulated as data.
Is there some 'standard' definition of data that I'm not aware of in the context of this domain?
Recommended by Ben Williamson. Purpose: It may have some relevance for the project with Ben around chat bots and interviews, as well as implications for the introduction of portfolios for assessment.
Each developer on average wastes 30 minutes before and after the meeting to context switch and the time is otherwise non-value adding. (See this study for the cost of context switching).
<small><cite class='h-cite via'>ᔥ <span class='p-author h-card'>Maria Farrell</span> in What is Ours is Only Ours to Give — Crooked Timber (<time class='dt-published'>05/18/2021 11:28:17</time>)</cite></small>
For example, the idea of “data ownership” is often championed as a solution. But what is the point of owning data that should not exist in the first place? All that does is further institutionalise and legitimate data capture. It’s like negotiating how many hours a day a seven-year-old should be allowed to work, rather than contesting the fundamental legitimacy of child labour. Data ownership also fails to reckon with the realities of behavioural surplus. Surveillance capitalists extract predictive value from the exclamation points in your post, not merely the content of what you write, or from how you walk and not merely where you walk. Users might get “ownership” of the data that they give to surveillance capitalists in the first place, but they will not get ownership of the surplus or the predictions gleaned from it – not without new legal concepts built on an understanding of these operations.
www.nytimes.com
And it’s easy to leave. Unlike on Facebook or Twitter, Substack writers can simply take their email lists and direct connections to their readers with them.
Owning your audience is key here.
We believe that Facebook is also actively encouraging people to use tools like Buffer Publish for their business or organization, rather than personal use. They are continuing to support the use of Facebook Pages, rather than personal Profiles, for things like scheduling and analytics.
Of course they're encouraging people to do this. Pushing them to the business side is where they're making all the money.
Manton says owning your domain so you can move your content without breaking URLs is owning your content, whereas I believe if your content still lives on someone else's server, and requires them to run the server and run their code so you can access your content, it's not really yours at all, as they could remove your access at any time.
This is a slippery slope problem, but people are certainly capable of taking positions along a broad spectrum here.
The one thing I might worry about--particularly given micro.blog's size--is the relative bus factor of one represented by Manton himself. If something were to happen to him, what recourse has he built in to make sure that people could export their data easily and leave the service if the worst were to happen? Is that documented somewhere?
Aside from this the service has one of the most reasonable turn-key solutions for domain and data ownership I've seen out there without running all of your own infrastructure.
First, Manton's business model is for users to not own their content. You might be able to own your domain name, but if you have a hosted Micro.blog blog, the content itself is hosted on Micro.blog servers, not yours. You can export your data, or use an RSS feed to auto-post it to somewhere you control directly, but if you're not hosting the content yourself, how does having a custom domain equal self-hosting your content and truly owning it? Compared to hosting your own blog and auto-posting it to Micro.blog, which won't cost you and won't make Micro.blog any revenue, posting for a hosted blog seems to decrease your ownership.
I'm not sure that this is the problem that micro.blog is trying to solve. It's trying to solve the problem of how to be online as simply and easily as possible without maintaining the overhead of hosting and managing your own website.
As long as one can easily export their data at will and redirect their domain to another host, one should be fine. In some sense micro.blog makes it easier than changing phone carriers, which in most cases will abandon one's text messages unless one jumps through lots of hoops.
One step that micro.blog could set up is providing a download dump of all content every six months to a year so that people have it backed up in an accessible fashion. Presently, to my knowledge, one could request this at any time and move when they wished.
The ad lists various data that WhatsApp doesn’t collect or share. Allaying data collection concerns by listing data not collected is misleading. WhatsApp doesn’t collect hair samples or retinal scans either; not collecting that information doesn’t mean it respects privacy because it doesn’t change the information WhatsApp does collect.
An important logical point. Listing what they don't keep isn't as good as saying what they actually do with one's data.
indiedigitalmedia.com
The main thing Smith has learned over the past seven years is “the importance of ownership.” He admitted that Tumblr initially helped him “build a community around the idea of digital news.” However, it soon became clear that Tumblr was the only one reaping the rewards of its growing community. As he aptly put it, “Tumblr wasn’t seriously thinking about the importance of revenue or business opportunities for their creators.”
write your own blog post on your own damn site
And isn't this what everyone should really be doing anyway so that they own their own work and words?
Third, the post-LMS world should protect the pedagogical prerogatives and intellectual property rights of faculty members at all levels of employment. This means, for example, that contingent faculty should be free to take the online courses they develop wherever they happen to be teaching. Similarly, professors who choose to tape their own lectures should retain exclusive rights to those tapes. After all, it’s not as if you have to turn over your lecture notes to your old university whenever you change jobs.
Own your pedagogy. Send just like anything else out there...
And yes, some add-ons exist, but I just wish the feature was native to the browser. And I do not want to rely on a third party service. My quotes are mine only and should not necessarily be shared with a server on someone else's machine.
Ownership of the data is important. One could certainly set up their own Hypothes.is server if they liked.
I personally take the data from my own Hypothes.is account and dump it into my local Obsidian.md vault for saving, crosslinking, and further thought.
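As a rough sketch of what such a dump could look like, the following Python function renders one annotation as a Markdown note. It assumes annotations arrive as dictionaries shaped like the JSON returned by the Hypothes.is API (`uri`, `text`, `tags`, and a `target` list holding a `TextQuoteSelector`). The function name, the sample values, and the output layout are hypothetical, and this is not necessarily how the workflow described above is implemented.

```python
def annotation_to_markdown(annotation):
    """Render one annotation dict as Markdown: quote, note, tags, source."""
    quote = ""
    # The highlighted passage lives in a TextQuoteSelector, when present.
    for target in annotation.get("target", []):
        for selector in target.get("selector", []):
            if selector.get("type") == "TextQuoteSelector":
                quote = selector.get("exact", "")
    lines = []
    if quote:
        lines += ["> " + quote, ""]
    if annotation.get("text"):
        lines += [annotation["text"], ""]
    # Turn tags into hashtag-style tags; spaces are not allowed inside them.
    tags = " ".join("#" + t.replace(" ", "-") for t in annotation.get("tags", []))
    if tags:
        lines += [tags, ""]
    lines.append("Source: " + annotation.get("uri", ""))
    return "\n".join(lines)

# Hypothetical sample record, for illustration only.
sample = {
    "uri": "https://example.com/post",
    "text": "Ownership of the data is important.",
    "tags": ["data ownership"],
    "target": [{"selector": [{"type": "TextQuoteSelector",
                              "exact": "My quotes are mine only"}]}],
}
print(annotation_to_markdown(sample))
```

Writing each result to a `.md` file inside the vault folder keeps a plain-text copy that does not depend on any service staying online.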
With Alphabet Inc.’s Google, and Facebook Inc. and its WhatsApp messaging service used by hundreds of millions of Indians, India is examining methods China has used to protect domestic startups and take control of citizens’ data.
Governments owning citizens' data directly?? Why not have the government empower citizens to own their own data?
The highlights you made in FreeTime are preserved in My Clippings.txt, but you can’t see them on the Kindle unless you are in FreeTime mode. Progress between FreeTime and regular mode are tracked separately, too. I now pretty much only use my Kindle in FreeTime mode so that my reading statistics are tracked. If you are a data nerd and want to crunch the data on your own, it is stored in a SQLite file on your device under system > freetime > freetime.db.
FreeTime mode on the Amazon Kindle will provide you with reading statistics. You can find the raw data as an SQLite file under system > freetime > freetime.db.
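For anyone who wants to look inside such a file, a minimal first step is to list the tables it contains. The sketch below uses only Python's standard `sqlite3` module. The helper name `list_tables` is an assumption, and since the schema of `freetime.db` is not described here, the code discovers table names rather than assuming any.

```python
import sqlite3

def list_tables(db_path):
    """Return the names of all tables in the SQLite file at db_path."""
    conn = sqlite3.connect(db_path)
    try:
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
    finally:
        conn.close()
    return [name for (name,) in rows]

# Example usage (hypothetical path; copy the file off the device first):
# print(list_tables("freetime.db"))
```

Working on a copy rather than on the file on the device avoids any risk of altering the Kindle's own records.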
wildland.io
This looks intriguing... A client for abstracting data stores for use anywhere.
I tried very hard in that book, when it came to social media, to be platform agnostic, to emphasize that social media sites come and go, and to always invest first and foremost in your own media. (Website, blog, mailing list, etc.)
Facebook provides some data portability, but makes an odd plea for regulation to make more functionality possible.
Why do this when they could choose to do the right thing? They don't need to be forced and could certainly try to enforce security. It wouldn't be any worse than unveiling the tons of personal data they've managed not to protect in the past.
Any subscriber data that’s collected (like phone numbers and emails) is owned by the host and can be exported if they leave the service.
Goodreads lost my entire account last week. Nine years as a user, some 600 books and 250 carefully written reviews all deleted and unrecoverable. Their support has not been helpful. In 35 years of being online I've never encountered a company with such callous disregard for their users' data.
A clarion call for owning your own data.
I like how Dr. Pacheco-Vega outlines some of his research process here.
Sharing it on Twitter is great, and so is storing a copy on his website. I do worry that it looks like the tweets are embedded via a simple URL method and not done individually, which means that if Twitter goes down or disappears, so does all of his work. Better would be to do a full blockquote embed method, so that if Twitter disappears he's got the text at least. Images would also need to be saved separately.
www.buildingasecondbrain.com
Common Pitfalls to Avoid When Choosing Your App
What are the common pitfalls when choosing a note taking application or platform?
Own your data
Prefer note taking systems that don't rely on a company's long term existence. While Evernote or OneNote have been around for a while, there's nothing to say they'll be around forever or even your entire lifetime. That shiny new startup note taking company may not gain traction in the market and exist in two years. If your notes are trapped inside a company's infrastructure and aren't exportable to another location, you're simply dead in the water. Make sure you have a method to be able to export and own the raw data of your notes.
Test driving many and not choosing or sticking with one (or even a few)
Don't get stunned into inaction by the number of choices.
Shiny object syndrome is the situation where people focus all attention on something that is new, current or trendy, yet drop this as soon as something new takes its place.
There will always be new and perhaps interesting note taking applications. Some may look fun and you'll be tempted to try them out and fragment your notes. Don't waste your time unless the benefits are manifestly clear and the pathway to exporting your notes is simple and easy. Otherwise you'll spend all your time importing/exporting and managing your notes and not taking and using them. Paper and pencil have been around for centuries and they work, so at a minimum do this. True innovation in this space is exceedingly rare. Even small affordances like the ability to have [[wikilinks]] and/or bi-directional links may save a few seconds here and there, but in the long run these can still be done manually, and having a system far exceeds the value of having the best system.
(Relate this to the same effect in the blogosphere of people switching CMSes and software and never actually writing content on their website. The purpose of the tool is using it and not collecting all the tools as a distraction for not using them. Remember which problem you're attempting to solve.)
Future needs and whataboutisms
Surely there will be future innovations in the note taking space or you may find some niche need that your current system doesn't solve. Given the maturity of the space even in a pen and paper world, this will be rare. Don't worry inordinately about the future, imitate what has worked for large numbers of people in the past and move forward from there.
Others? Probably...
Even with data that’s less fraught than our genome, our decisions about what we expose to the world have externalities for the people around us.
We need to think more about the externalities of our data decisions.
It's the feedback that's motivating A-list bloggers like Digg founder Kevin Rose to shut down their blogs and redirect traffic to their Google+ profiles. I have found the same to be true.
This didn't work out too well for them did it?
The European Commission has prepared to legislate to require interoperability, and it calls being able to use your data wherever and whenever you like “multi-homing”. (Not many other people like this term, but it describes something important – the ability for people to move easily between platforms
an interesting neologism to describe something that many want
the decentralised and open source nature of these systems, where anyone can host an instance, may protect their communities from the kinds of losses experienced by users of the many commercial platforms that have gone out of business over the last decades (e.g. Geocities, Wikispaces or Google + to name just a few).
https://indieweb.org/site-deaths names a large number of others
anvaka.github.io
Subsidiarity, which uses “data cooperatives, collaboratives, and trusts with privacy-preserving and -enhancing techniques for data processing, such as federated learning and secure multiparty computation.”
Another value of the data cooperative model might be that each individual might not have time to research and administer possible new data-sharing requests/opportunities, and it would be helpful to entrust that work to a cooperative entity that already has one's trust.
mutabit.com
For the continuous part I would try to illustrate my workflow with the following diagram
www.nngroup.com
A 20-year age difference (for example, from 20 to 40, or from 30 to 50 years old) will, on average, correspond to reading 30 WPM slower, meaning that a 50-year old user will need about 11% more time than a 30-year old user to read the same text.
Users’ age had a strong impact on their reading speed, which dropped by 1.5 WPM for each year of age.
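The two quoted figures can be checked against each other with a little arithmetic. The sketch below assumes that reading time is inversely proportional to reading speed and that the 1.5 WPM yearly decline is linear. The quotes do not give a baseline speed, so the 300 WPM used for the 30-year-old is an assumed value chosen for illustration only.

```python
WPM_LOSS_PER_YEAR = 1.5  # from the quoted finding

def extra_time_fraction(base_wpm, years_older):
    """Fraction of extra reading time needed by a reader `years_older`
    years older than one who reads at `base_wpm`."""
    slower_wpm = base_wpm - WPM_LOSS_PER_YEAR * years_older
    # Time for a fixed text scales as 1 / speed.
    return base_wpm / slower_wpm - 1.0

# Assumed baseline of 300 WPM (not given in the source).
print(f"{extra_time_fraction(300, 20):.1%}")
```

Under that assumption the 20-year gap costs 30 WPM and roughly 11% more reading time, which matches the quoted figure. A different baseline would shift the percentage.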
linter.structured-data.org
Overall, having spent a significant amount of time building this project, scaling it up to the size it’s at now, as well as analysing the data, the main conclusion is that it is not worth building your own solution, and investing this much time. When I first started building this project 3 years ago, I expected to learn way more surprising and interesting facts. There were some, and it’s super interesting to look through those graphs, however retrospectively, it did not justify the hundreds of hours I invested in this project. I’ll likely continue tracking my mood, as well as a few other key metrics, however will significantly reduce the amount of time I invest in it.
Words of the author of https://krausefx.com//blog/how-i-put-my-whole-life-into-a-single-database
It seems as if excessive personal data tracking is not worth it
www.canada.ca
www.alastore.ala.org
Besançon, L., Peiffer-Smadja, N., Segalas, C., Jiang, H., Masuzzo, P., Smout, C., Billy, E., Deforet, M., & Leyrat, C. (2021). Open science saves lives: Lessons from the COVID-19 pandemic. BMC Medical Research Methodology, 21(1), 117. https://doi.org/10.1186/s12874-021-01304-y
kit.svelte.dev
The combined stuff is available to components using the page store as $page.stuff, providing a mechanism for pages to pass data 'upward' to layouts.
bidirectional data flow ?! That's a game changer.
analogue in Rails: content_for
www.nytimes.com
Local file
For this reason, the Secretary of State set out a vision for health and care to have national open standards for data and interoperability that are mandated throughout the NHS and social care.
Melton, J., & Sinclair, R. (2021). COVID-19 Infection Rates Are Related to Population Rates of Vaccination: A Response to Subramanian and Kumar.
www.imperial.ac.uk
Imperial News. ‘“Issue of Inequalities” for Long COVID Patients Needs to Be Addressed | Imperial News | Imperial College London’. Accessed 22 April 2022. https://www.imperial.ac.uk/news/232234/issue-inequalities-long-covid-patients-needs/.
newsnodes.com
bmjopen.bmj.com
www.poverty-action.org
www.southampton.ac.uk
Locally, Curry said troopers handled 275 distracted driving crashes in 2020, 322 in 2021 and 47 so far this year. He added local troopers issued 267 distracted driving citations in 2020, 299 in 2021 and 51 so far this year.
This article was published on April 9th, (Day 99 of 2022) which is 27% through the year. So based on the data provided, what can we expect?
In terms of CRASHES, we've had 47 so far, whereas 27% of the way through 2020 and 2021 we would have had about 75 and 87 crashes respectively (assuming that distracted driving crashes are generally evenly distributed through the year). That puts us on track for about 173 distracted driving crashes this year, which is only a little over half (54%) of last year's number.
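The back-of-the-envelope projection can be reproduced in a few lines of Python. This is a sketch under the same assumption stated in the note, namely that crashes are spread evenly through the year. It also applies the 2022 year fraction to 2020 even though 2020 was a leap year, which is a small approximation. The variable and function names are illustrative.

```python
from datetime import date

def year_fraction(d):
    """Fraction of the calendar year elapsed through date d (inclusive)."""
    day_of_year = d.timetuple().tm_yday
    days_in_year = date(d.year, 12, 31).timetuple().tm_yday
    return day_of_year / days_in_year

published = date(2022, 4, 9)          # publication date of the article
frac = year_fraction(published)       # day 99 of 365

crashes = {2020: 275, 2021: 322}      # full-year totals from the article
crashes_2022_so_far = 47

# Crashes each earlier year would have had by the same point in the year.
expected_by_now = {year: total * frac for year, total in crashes.items()}
# Full-year projection for 2022 if the current pace holds.
projected_2022 = crashes_2022_so_far / frac
share_of_2021 = projected_2022 / crashes[2021]

print(f"{frac:.0%} of the year elapsed")
print({year: round(n) for year, n in expected_by_now.items()})
print(round(projected_2022), f"{share_of_2021:.0%}")
```

The same calculation could be repeated for the citation counts in the article (267, 299, and 51 so far) by swapping in those numbers.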
As they said in the first paragraph:
The Delaware Post of the Ohio State Highway Patrol is stepping up enforcement this month in an effort to curb distracted driving, which the agency reports is leading to increased traffic crashes and deaths statewide.
...so they're doing this on a PR schedule - not because the numbers are up - in fact, the numbers are down locally by a huge margin.
With Troopers focusing on this, it means they're not focusing on safety problems that are increasing.
github.com
A python script from karlicoss to export/access your Hypothes.is data: annotations and profile info
link to https://hyp.is/VZ2G7IPiEeutw1PTsBrlLw/github.com/collignon/annotation-tools
Let’s look at a recent paper by Xia, Bao, Lo, Xing, Hassan, & Li entitled Measuring Program Comprehension: A Large-Scale Field Study with Professionals and published in IEEE Transactions on Software Engineering, 44, 951-976, 2018. This paper is quite interesting in that it describes in great detail how the figures are obtained. And it says that Comprehension took on average ~58%.
Developers spend most of their time figuring the system out
www.core-econ.org
A New York Times article uses the same temperature dataset you have been using to investigate the distribution of temperatures and temperature variability over time. Read through the article, paying close attention to the descriptions of the temperature distributions.
Unfortunately, like most NYT content, this article is behind a paywall. I'm partly reading this as I plan to develop a set of open education resources myself and the problem of how to manage dead/unavailable links looks like a key stumbling block.
Tyler Black, MD. (2021, December 10). Statistics Canada has been asking kids about mental health during the pandemic. Initially, after the first 5 months (with school shutdowns, summer break, lots of restrictions), more kids said they were better than worse, most reported no change. 86% “No change or better” [/1] https://t.co/3shKtrxEVU [Tweet]. @tylerblack32. https://twitter.com/tylerblack32/status/1469380405451100162
twitter.com twitter.com
twitter.com twitter.com
ourworldindata.org ourworldindata.org
twitter.com twitter.com
Barnes, O., & Burn-Murdoch, J. (2022, January 7). Covid hospital admissions in Greater Manchester surpass last winter’s peak. Financial Times. https://www.ft.com/content/6a8a52f3-d940-4d70-a30b-111a3a646089
- Mar 2022
www.whitehouse.gov www.whitehouse.gov
Strategic, cost-efficient evidence-building relies on strong data governance that facilitates the access, protection, and use of program and other administrative data to enable and support secondary uses, including for
The statute makes agency evidence-building plans, known as Learning Agendas, foundational to building a culture of evidence generation and use.
www.thelancet.com www.thelancet.com
twitter.com twitter.com
www.theguardian.com www.theguardian.com
www.theguardian.com www.theguardian.com
www.nature.com www.nature.com
The audit found that the CIO has limited insight into each Sector’s entire data holdings given a decentralized model, and lack of centralized guidance, standard definitions, and corporate data management system. CMSS representatives acknowledged that the NRCan Data Inventory is not a complete listing of NRCan datasets; however, it was found that it serves as a good starting point in identifying datasets held within the Department. However, per TBS guidance, a complete departmental inventory should include a list of all datasets even if they are identified as not eligible for release.
twitter.com twitter.com
psyarxiv.com psyarxiv.com
twitter.com twitter.com
twitter.com twitter.com
twitter.com twitter.com
twitter.com twitter.com
Trevor Bedford on Twitter. (n.d.). Twitter. Retrieved 29 March 2022, from https://twitter.com/trvrb/status/1466076761427304453
Prof. Christina Pagel 🇺🇦. (2021, December 6). This is very high for a Monday https://t.co/q1zTsJcYiM [Tweet]. @chrischirp. https://twitter.com/chrischirp/status/1467894035666939911
twitter.com twitter.com
psyarxiv.com psyarxiv.com
twitter.com twitter.com
www.pnas.org www.pnas.org
Kerr, P. J., Cattadori, I. M., Liu, J., Sim, D. G., Dodds, J. W., Brooks, J. W., Kennett, M. J., Holmes, E. C., & Read, A. F. (2017). Next step in the ongoing arms race between myxoma virus and wild rabbits in Australia is a novel disease phenotype. Proceedings of the National Academy of Sciences, 114(35), 9397–9402. https://doi.org/10.1073/pnas.1710336114
twitter.com twitter.com
twitter.com twitter.com
www.owlstown.com www.owlstown.com
Desirables:
* data export
* data import (POSSE/PESOS)
* collaboration (wiki/fanclub, annotations)
intellipaat.com intellipaat.com
Learn Data Science from IIT Madras faculty & Industry experts and earn a Data Science certification from India's best Engineering College. Become a Data Scientist through multiple data Science courses covered in this 7-month data science certification program with hands-on exercises & Project work.
This Data Science Course is offered by Intellipaat in collaboration with IIT Madras (one of the renowned institutes in India) to help you master Data Science skills like Python, programming, Data Visualization, Statistical analysis and computing, Deep Learning, etc.
Eager to step into the field of Data Science? Explore the Page now!
twitter.com twitter.com
ReconfigBehSci on Twitter: ‘RT @CT_Bergstrom: I also dislike the choice of axis scales. I don’t mind line graphs with axes that don’t go to zero (https://t.co/EpPNR9Lx…’ / Twitter. (n.d.). Retrieved 22 March 2022, from https://twitter.com/SciBeh/status/1477181158425251840
Data integrity is a good thing. Constraining the values allowed by your application at the database-level, rather than at the application-level, is a more robust way of ensuring your data stays sane.
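A minimal sketch of the idea in the passage above, using Python's built-in sqlite3 module rather than the Rails and database stack the quoted source presumably refers to. The `users` table, the `status` column, and the allowed values are hypothetical, chosen only for illustration. The point is that the CHECK constraint lives in the schema, so the database rejects a bad value even if application code forgets to validate it.

```python
import sqlite3

# In-memory database; table and column names are illustrative assumptions.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE users (
        id     INTEGER PRIMARY KEY,
        nick   TEXT NOT NULL UNIQUE,
        status TEXT NOT NULL CHECK (status IN ('active', 'suspended'))
    )
    """
)


def try_insert(nick, status):
    """Return True if the database accepted the row, False if a constraint rejected it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO users (nick, status) VALUES (?, ?)", (nick, status)
            )
        return True
    except sqlite3.IntegrityError:
        return False


accepted = try_insert("@ka8725", "active")  # satisfies the CHECK constraint
rejected = try_insert("@someone", "bogus")  # violates the CHECK constraint
row_count = conn.execute("SELECT COUNT(*) FROM users").fetchone()[0]
```

Because the rule is enforced by the database itself, any other client writing to the same table is held to the same constraint.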
This is particularly useful in cases where you want to separate your data migrations from your schema migrations or where you have multiple steps in your migration process that must have other steps invoked throughout.
railsguides.net railsguides.net
The code will work without raising an exception, but it doesn’t set the correct association, because the defined classes are under the namespace AddStatusToUser. This is what happens in reality:
    role = AddStatusToUser::Role.create!(name: 'admin')
    AddStatusToUser::User.create!(nick: '@ka8725', role: role)
github.com github.com
There are three keys to backfilling safely: batching, throttling, and running it outside a transaction. Use the Rails console or a separate migration with disable_ddl_transaction!.
Active Record creates a transaction around each migration, and backfilling in the same transaction that alters a table keeps the table locked for the duration of the backfill.
    class AddSomeColumnToUsers < ActiveRecord::Migration[7.0]
      def change
        add_column :users, :some_column, :text
        # The backfill runs in the same transaction as add_column
        User.update_all some_column: "default_value"
      end
    end
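The three keys named above (batching, throttling, and keeping the backfill out of the schema-change transaction) can be sketched roughly as follows. This is not Rails code: it uses Python's sqlite3 as a stand-in, and the function name `backfill_in_batches`, the batch size, and the pause length are assumptions made for illustration. Each batch is committed in its own short transaction, so no single long transaction holds a lock for the whole backfill.

```python
import sqlite3
import time


def backfill_in_batches(conn, batch_size=1000, pause_seconds=0.01):
    """Fill NULL some_column values in small, separately committed batches."""
    total = 0
    while True:
        with conn:  # one short transaction per batch
            cur = conn.execute(
                "UPDATE users SET some_column = 'default_value' "
                "WHERE id IN ("
                "  SELECT id FROM users WHERE some_column IS NULL LIMIT ?"
                ")",
                (batch_size,),
            )
        if cur.rowcount == 0:
            break  # nothing left to backfill
        total += cur.rowcount
        time.sleep(pause_seconds)  # throttle so other work can proceed
    return total


# Demo on an in-memory table; the schema change has already been committed.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, some_column TEXT)")
conn.executemany("INSERT INTO users (some_column) VALUES (?)", [(None,)] * 2500)
conn.commit()

updated = backfill_in_batches(conn, batch_size=1000, pause_seconds=0)
remaining = conn.execute(
    "SELECT COUNT(*) FROM users WHERE some_column IS NULL"
).fetchone()[0]
```

In a real deployment the batch size and pause would be tuned against the load the database can tolerate.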
github.com github.com
No need to construct strings that then need to be deconstructed later.
I believe we need to break free of these anachronistic designs and use event loggers, not message loggers
µ/log's idea is to replace the "3 Pillars of Observability" with a more fundamental concept: "the event"
bold goal
Event-based data is easy to index, search, augment, aggregate and visualise therefore can easily replace traditional logs, metrics and traces.
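To make the contrast with message logging concrete, here is a small sketch in Python. It does not use µ/log or its API; `log_event` and the field names are hypothetical. Each event is recorded as structured key-value data rather than a formatted string, so it can be aggregated and filtered directly without parsing text back apart.

```python
import json
from collections import Counter

events = []  # stand-in for an event store or log pipeline


def log_event(event_type, **fields):
    """Record an event as structured data and return its JSON form."""
    event = {"event": event_type, **fields}
    events.append(event)
    return json.dumps(event, sort_keys=True)


log_event("http.request", path="/users", status=200, duration_ms=12)
log_event("http.request", path="/users", status=500, duration_ms=340)
log_event("http.request", path="/health", status=200, duration_ms=1)

# Metric-like aggregation and trace-like filtering from the same events,
# with no string parsing involved.
status_counts = Counter(e["status"] for e in events)
slow_paths = [e["path"] for e in events if e["duration_ms"] > 100]
```

The same list of events answers both a counting question and a latency question, which is the sense in which events can stand in for separate logs and metrics.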
psyarxiv.com psyarxiv.com
twitter.com twitter.com
www.cs.sfu.ca www.cs.sfu.ca
Their alignment rule is based on the principle that any primitive object of K bytes must have an address that is a multiple of K.
What is the principle of data alignment?
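The rule quoted above can be worked through with a little arithmetic. The sketch below is illustrative only: `align_up` and `layout` are hypothetical helpers, each field is assumed to be a primitive whose alignment equals its size K, and the trailing padding to the largest alignment is a common convention assumed here rather than something stated in the quoted passage.

```python
def align_up(address, k):
    """Return the smallest multiple of k that is >= address."""
    return (address + k - 1) // k * k


def layout(fields):
    """Compute field offsets for (name, size) pairs.

    Each primitive of K bytes is placed at an offset that is a multiple of K.
    The total size is padded to the largest alignment (an assumed convention).
    """
    offsets = {}
    offset = 0
    max_align = 1
    for name, size in fields:
        offset = align_up(offset, size)  # insert padding if needed
        offsets[name] = offset
        offset += size
        max_align = max(max_align, size)
    total = align_up(offset, max_align)
    return offsets, total


# A 1-byte char, a 4-byte int, and an 8-byte double, in that order.
offsets, total = layout([("c", 1), ("i", 4), ("d", 8)])
```

Here the 4-byte field cannot start at offset 1, so three bytes of padding are inserted before it, and the 8-byte field lands at offset 8.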
dorian.substack.com dorian.substack.com
Linked data makes it possible to completely decouple computable information from the system that ordinarily houses it.
www.axios.com www.axios.com
75% of people in the U.S. never tweet. On an average weeknight in January, just 1% of U.S. adults watched primetime Fox News (2.2 million). 0.5% tuned into MSNBC (1.15 million). Nearly three times more Americans (56%) donated to charities during the pandemic than typically give money to politicians and parties (21%).
twitter.com twitter.com
www.youtube.com www.youtube.com
Government of Canada Professional Development Data Strategy
twitter.com twitter.com
twitter.com twitter.com
Working on a new data visceralization. I’m particularly interested in the tactile quality of this one. Covid deaths from 3/2020-6/2021
— Jacqueline Wernimont (@profwernimont) March 1, 2022
www.colbyrussell.com www.colbyrussell.com
let zeta = getProcessControl.bind(this);
Object.setPrototypeOf(zeta, Object.getPrototypeOf(this));
return zeta;
useful pattern
www.sigs-datacom.de www.sigs-datacom.de
difficulties often [lead] to inefficient processes and, as a result, to an unnecessarily long “time-to-data”.
- Challenge: time-to-data is too long
blog.bib.uni-mannheim.de blog.bib.uni-mannheim.de
Linked Data here refers to the technical preparation of the data so that the data can be linked. The data model used for this is RDF, which was originally developed for the Semantic Web.
coronavirus.data.gov.uk coronavirus.data.gov.uk
Support search by postcode!
twitter.com twitter.com
twitter.com twitter.com
twitter.com twitter.com
www.nytimes.com www.nytimes.com
When the C.D.C. published the first significant data on the effectiveness of boosters in adults younger than 65 two weeks ago, it left out the numbers for a huge portion of that population: 18- to 49-year-olds, the group least likely to benefit from extra shots, because the first two doses already left them well-protected.
The US is not only the worst country from a deaths/cases standpoint, but its governmental health services are also not up to the task.
US is a failed state in many domains outside defense & security.
www.youtube.com www.youtube.com
twitter.com twitter.com
twitter.com twitter.com
twitter.com twitter.com
www.nejm.org www.nejm.org
twitter.com twitter.com
twitter.com twitter.com
twitter.com twitter.com
twitter.com twitter.com
www.dataversity.net www.dataversity.net
To achieve nimbleness, we can simplify the data landscape by using a semantic fabric, popularly called data fabric, based on a strong Metadata Management operating model.
data fabric
news.ycombinator.com news.ycombinator.com
Wordle's spread on social media was enabled in part by its low-tech approach for e.g. sharing scores.
One low-tech approach that could've been used here for data persistence would be to generate and prompt the user to save their latest scorecard in PDF or Word format—only it's not a PDF or Word format, but instead "wordlescore.html" file, albeit one that they are able to save to disk and double click to open all the same. When they need to update their scorecard with today's data, you use window.open to show a page that prompts the user to open their most recent scorecard (using either Ctrl+/Cmd+O, or by navigating to the place where they saved it on disk via bookmark). What's not apparent on sight alone is that their wordlescore.html also contains a JS payload as an inline script. When wordlescore.html is opened, it's able to communicate with the Wordle tab via postMessage to window.opener, request the newest data from the app, and then update wordlescore.html itself as appropriate.
twitter.com twitter.com
www.faps.fau.de www.faps.fau.de
Data mining and knowledge discovery in databases comprise methods for extracting information and knowledge from structured datasets [99].
Data mining systems
Decision-support and executive information systems are mostly based on subject-oriented, integrated, and time-variant data collections, so-called data warehouses.
gidmk.medium.com gidmk.medium.com
twitter.com twitter.com
twitter.com twitter.com
twitter.com twitter.com
twitter.com twitter.com
www.schneier.com www.schneier.com
twitter.com twitter.com
twitter.com twitter.com
twitter.com twitter.com
www.ipvs.uni-stuttgart.de www.ipvs.uni-stuttgart.de
a new type of data platform established for the storage, integration, and analysis of all kinds of (raw) data
Data lake as a new type of data platform
They help, for example, to open up a company's heterogeneous data silos, link them intelligently, reinterpret them, and make them available in a targeted way on the company intranet.
Potential of semantic technologies: breaking up heterogeneous data silos. Technology: Linked Data
www.trendreport.de www.trendreport.de
In addition, an important trend is to establish Linked Data in the enterprise environment in order to develop, establish, and successfully market a new generation of semantic, networked data applications based on the Linked Data paradigm. In the BMBF “Wachstumskern” (growth core) project “Linked Enterprise Data Services”, for example, a technology platform is being created for this purpose that is intended to enable companies to establish new services in Web 3.0.
BMBF “Wachstumskern” (growth core) project “Linked Enterprise Data Services”