Hypothesis

3,505 Matching Annotations

Dec 2015
blogs.edweek.org blogs.edweek.org

Textbooks Out of Step With Scientists on Climate Change, Study Says

1
1. otterscotter 03 Dec 2015
  
  in Public
  
  Textbooks Out of Step With Scientists on Climate Change, Study Says
  
  OER open textbooks open data
Visit annotations in context

Tags

open data

open textbooks

OER

Annotators

otterscotter

URL

blogs.edweek.org/edweek/curriculum/2015/12/textbooks_out_of_step_with_scientists_on_climate_change.html
www.sr.ithaka.org www.sr.ithaka.org

Office of Scholarly Communication | Ithaka S+R

1
1. ryerbanta 02 Dec 2015
  
  in Public
  
  As of May 1, 2015, there is a new requirement from some research councils that research data must also be openly available,
  
  data requirements
  
  policy data
Visit annotations in context

Tags

policy

data

Annotators

ryerbanta

URL

sr.ithaka.org/publications/office-of-scholarly-communication/
www.meanboyfriend.com www.meanboyfriend.com

What it means to be Open | Overdue Ideas

1
1. Enkerli 01 Dec 2015
  
  in Public
  
  Among the most useful summaries I have found for Linked Data, generally, and in relationship to libraries, specifically. After first reading it, got to hear of the acronym LODLAM: “Linked Open Data for Libraries, Archives, and Museums”. Been finding uses for this tag, in no small part because it gets people to think about the connections between diverse knowledge-focused institutions, places where knowledge is constructed. Somewhat surprised academia, universities, colleges, institutes, or educational organisations like schools aren’t explicitly tied to those others. In fact, it’s quite remarkable that education tends to drive much development in #OpenData, as opposed to municipal or federal governments, for instance. But it’s still very interesting to think about Libraries and Museums as moving from a focus on (a Web of) documents to a focus on (a Web of) data.
  
  #LODLAM Linked Data Open Data Open Education Open Standards #OpenWeb Linked Open Data Education Higher Education Academia Open World Assumption
Visit annotations in context

Tags

Open Education

Linked Data

Open Standards

Higher Education

Open World Assumption

#LODLAM

Education

Open Data

Academia

Linked Open Data

#OpenWeb

Annotators

Enkerli

URL

meanboyfriend.com/overdue_ideas/2015/05/what-it-means-to-be-open/
Nov 2015
news.mit.edu news.mit.edu

How to make better visualizations

1
1. daveh70 29 Nov 2015
  
  in Public
  
  The effectiveness of infographics, or any other form of communication, can be measured in terms of whether people:
  
  pay attention to it
  
  understand it
  
  remember it later
  
  Titles are important. Ideally, the title should concisely state the main point you want people to grasp.
  
  Recall of both labels and data can be improved by using redundancy -- text as well as images. For example:
  
  flags in addition to country names
  
  proportional bubbles in addition to numbers.
  
  infographics visualization data visualization
Visit annotations in context

Tags

data visualization

infographics

visualization

Annotators

daveh70

URL

news.mit.edu/2015/how-make-better-infographic-visualizations-1105
chronicle.com chronicle.com

Beyond Textbooks and OER: reflecting on #OpenEd15 – ProfHacker - Blogs - The Chronicle of Higher Education

1
1. Enkerli 27 Nov 2015
  
  in Public
  
  it apparently meant allowing students to see the syllabus before they register
  
  There are initiatives to do much more than this, including using Open Data on syllabi to delve down into course content.
  
  Open Syllabus Open Data
Visit annotations in context

Tags

Open Data

Open Syllabus

Annotators

Enkerli

URL

chronicle.com/blogs/profhacker/beyond-textbooks-and-oer-reflecting-on-opened15/61342
europa.eu europa.eu

European Commission - PRESS RELEASES - Press release - Neelie Kroes, Vice-President of the European Commission responsible for the Digital Agenda: Digital Agenda and Open Data, From Crisis of Trust to Open Governing, Bratislava, 5 March 2012

1
1. Enkerli 27 Nov 2015
  
  in Public
  
  That's why I say that data is the new oil for the digital age
  
  Data Linked Data Open Data #BigData Data Economy
Visit annotations in context

Tags

#BigData

Open Data

Data

Linked Data

Data Economy

Annotators

Enkerli

URL

europa.eu/rapid/press-release_SPEECH-12-149_en.htm
www.randalolson.com www.randalolson.com

Introducing TPOT, the Data Science Assistant

1
1. daveh70 16 Nov 2015
  
  in Public
  
  TPOT is a Python tool that automatically creates and optimizes machine learning pipelines using genetic programming. Think of TPOT as your “Data Science Assistant”: TPOT will automate the most tedious part of machine learning by intelligently exploring thousands of possible pipelines, then recommending the pipelines that work best for your data.
  
  https://github.com/rhiever/tpot TPOT (Tree-based Pipeline Optimization Tool). Built on numpy, scipy, pandas, scikit-learn, and deap.
  
  machine learning artificial intelligence data science
Visit annotations in context

Tags

artificial intelligence

machine learning

data science

Annotators

daveh70

URL

randalolson.com/2015/11/15/introducing-tpot-the-data-science-assistant/
booktype.okfn.org booktype.okfn.org

/chapter: About-This-Book / Open Education Handbook 2014

1
1. otterscotter 06 Nov 2015
  
  in Public
  
  Open Education Handbook 2014
  
  All about open education
  
  open education OER open textbooks open access open pedagogy open research open data
Visit annotations in context

Tags

open textbooks

open access

open research

open education

open data

OER

open pedagogy

Annotators

otterscotter

URL

booktype.okfn.org/open-education-handbook-2014/about-this-book/
Oct 2015
www.campuscomputing.net www.campuscomputing.net

CC2015 - Exec Summary

1
1. otterscotter 29 Oct 2015
  
  in Public
  
  The Coming of OER. Related to the enthusiasm for digital instructional resources, four-fifths (81 percent) of the survey participants agree that “Open Source textbooks/Open Education Resource (OER) content “will be an important source for instructional resources in five yea
  
  OER campus computing survey data research
Visit annotations in context

Tags

campus computing

research

survey

OER

data

Annotators

otterscotter

URL

campuscomputing.net/sites/www.campuscomputing.net/files/CC2015 - Exec Summary & Graphics.pdf
web.hypothes.is web.hypothes.is

Annotating the law | Hypothes.is

2
1. Enkerli 13 Oct 2015
  
  in Public
  
  why not annotate, say, the Eiffel Tower itself
  
  As long as it has some URI, it can be annotated. Any object in the world can be described through the Semantic Web. Especially with Linked Open Data.
  
  #LODLAM Linked Data Semantic Annotation Semantic Web Semantic images
2. Enkerli 09 Oct 2015
  
  in Public
  
  If you deal with PDFs online, you’ve probably noticed that some are different from others. Some are really just images.
  
  First step in Linked Open Data is moving away from image PDFs.
  
  Open Data #LODLAM
Visit annotations in context

Tags

#LODLAM

Open Data

Semantic images

Semantic Annotation

Linked Data

Semantic Web

Annotators

Enkerli

URL

web.hypothes.is/help/keyboard-shortcuts-for-hypothesis/
hackpad.com hackpad.com

Open Research II: Week Four (Special Guest Speaker Session)

1
1. otterscotter 07 Oct 2015
  
  in Public
  
  Open Research MOOC
  
  mooc hackpad open data open research
Visit annotations in context

Tags

open data

mooc

hackpad

open research

Annotators

otterscotter

URL

hackpad.com/Open-Research-II-Week-Four-Special-Guest-Speaker-Session-Bayn7fyH0w3
www.iowastatedaily.com www.iowastatedaily.com

Faculty Senate wants work published through Open Access

1
1. otterscotter 07 Oct 2015
  
  in Public
  
  The second level of Open Access is Gold Open Access, which requires the author to pay the publishing platform a fee to have their work placed somewhere it can be accessed for free. These fees can range in the hundreds to thousands of dollars.
  
  Not necessarily true. This is a misconception. "About 70 percent of OA journals charge no APCs at all. We’ve known this for a decade but it’s still widely overlooked by people who should know better." -Suber http://lj.libraryjournal.com/2015/09/opinion/not-dead-yet/an-interview-with-peter-suber-on-open-access-not-dead-yet/#_
  
  suber open access open data
Visit annotations in context

Tags

open data

open access

suber

Annotators

otterscotter

URL

iowastatedaily.com/news/politics_and_administration/campus/article_867177b2-6861-11e5-a886-739d913b3ea8.html
Sep 2015
docs.google.com docs.google.com

Controlled-Vocabulary-Considerations-xAPI

3
1. jessiechuang 20 Sep 2015
  
  in Public
  
  In a nutshell, an ontology answers the question, “What things can we say exist in a domain, and how do we describe those things that relate to each other?”
  
  Ontology Linked Data
2. jessiechuang 20 Sep 2015
  
  in Public
  
  According to inventor of the World Wide Web, Tim Berners-Lee, there are four key principles of Linked Data (Berners-Lee, 2006):
  
  Use URIs to denote things.
  
  Use HTTP URIs so that these things can be referred to and looked up (dereferenced) by people and user agents.
  
  Provide useful information about the thing when its URI is dereferenced, leveraging standards such as RDF, SPARQL.
  
  Include links to other related things (using their URIs) when publishing data on the web.
  
  Linked Data
3. jessiechuang 20 Sep 2015
  
  in Public
  
  In section 4.1.3.2 of the xAPI specification, it states “Activity Providers SHOULD use a corresponding existing Verb whenever possible.”
  
  xAPI Linked Data
Visit annotations in context

Tags

Linked Data

Ontology

xAPI

Annotators

jessiechuang

URL

docs.google.com/document/d/1zBPKryuF1tXHTI-AYjXd0ctdWoq4o4P-Uq9SAhJfus0/edit
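
The four Linked Data principles quoted in the annotations above can be sketched in a few lines. This is only an illustration under stated assumptions: the example.org URIs, the DATASET dictionary, and the helper names are hypothetical, and the lookup is simulated in memory rather than performed over HTTP.

```python
from urllib.parse import urlparse

# Toy in-memory "web of data". All URIs below are hypothetical examples.
# Principles 1 and 2: things are named with HTTP URIs.
# Principle 3: looking a URI up returns useful, structured information.
# Principle 4: descriptions link to other things by their URIs.
DATASET = {
    "http://example.org/id/tim": {
        "http://example.org/vocab/name": "Tim",
        "http://example.org/vocab/invented": "http://example.org/id/web",
    },
    "http://example.org/id/web": {
        "http://example.org/vocab/name": "World Wide Web",
    },
}


def is_http_uri(value):
    """Return True if value is a string that parses as an HTTP(S) URI."""
    if not isinstance(value, str):
        return False
    parsed = urlparse(value)
    return parsed.scheme in ("http", "https") and bool(parsed.netloc)


def dereference(uri):
    """Simulate an HTTP lookup: return the description stored for uri."""
    if not is_http_uri(uri):
        raise ValueError(f"not an HTTP URI: {uri!r}")
    return DATASET[uri]


def linked_things(uri):
    """Follow outgoing links: property values that are themselves URIs."""
    return [v for v in dereference(uri).values() if is_http_uri(v)]


links = linked_things("http://example.org/id/tim")
```

Here links holds the URI of the one related thing, which can in turn be passed to dereference to fetch its description. A real deployment would serve RDF over HTTP instead of reading a dictionary.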
europepmc.org europepmc.org

Restoring Study 329: efficacy and harms of paroxetine and imipramine in treatment of major depression in adolescence

2
1. memartone 17 Sep 2015
  
  in Public
  
  This is problematic because the article has been influential in the literature supporting the use of antidepressants in adolescents.
  
  Example of the type of harm that lack of transparency can lead to.
  
  Data sharing Reproducibility
2. memartone 17 Sep 2015
  
  in Public
  
  Access to primary data from trials has important implications for both clinical practice and research, including that published conclusions about efficacy and safety should not be read as authoritative. The reanalysis of Study 329 illustrates the necessity of making primary trial data and protocols available to increase the rigour of the evidence base.
  
  How can anyone argue that science isn't served by making primary data available? We must recognize that more people are harmed by not sharing data than are harmed by data being shared.
  
  Data sharing
Visit annotations in context

Tags

Data sharing

Reproducibility

Annotators

memartone

URL

europepmc.org/abstract/MED/26376805
www.sciencedirect.com www.sciencedirect.com

Distinct Subpopulations of Nucleus Accumbens Dynorphin Neurons Drive Aversion and Reward

1
1. tradman 15 Sep 2015
  
  in Public
  
  (B) Dyn labeling in dyn-IRES-cre x Ai9-tdTomato compared to in situ images from the Allen Institute for Brain Science in a sagittal section highlighting presence of dyn in the striatum, the hippocampus, BNST, amygdala, hippocampus, and substantia nigra. All images show tdTomato (red) and Nissl (blue) staining. (C) Coronal section highlighting dynorphinergic cell labeling in the NAc as compared to the Allen Institute for Brain Science.
  
  Allen Brain Institute
  
  data reuse
Visit annotations in context

Tags

data reuse

Annotators

tradman

URL

sciencedirect.com/science/article/pii/S0896627315007138
www.sciencemag.org www.sciencemag.org

Reward-Predictive Cues Enhance Excitatory Synaptic Strength onto Midbrain Dopamine Neurons

1
1. tradman 15 Sep 2015
  
  in Public
  
  Because cue-evoked DA release developed throughout learning, we examined whether DA release correlated with conditioned-approach behavior. Figure 1E and table S1 show that the ratio of the CS-related DA release to the reward-related DA release was significantly (r² = 0.68; P = 0.0005) correlated with number of CS nosepokes in a conditioning session (also see fig. S4).
  
  single trial analysis
  
  data reuse
Visit annotations in context

Tags

data reuse

Annotators

tradman

URL

sciencemag.org/content/321/5896/1690.full
www.confluent.io www.confluent.io

Using logs to build a solid data infrastructure (or: why dual writes are a bad idea)

1
1. robertknight 04 Sep 2015
  
  in Public
  
  This approach is called change data capture, which I wrote about recently (and implemented on PostgreSQL). As long as you’re only writing to a single database (not doing dual writes), and getting the log of writes from the database (in the order in which they were committed to the DB), then this approach works just as well as making your writes to the log directly.
  
  Interesting section on applying log-orientated approaches to existing systems.
  
  data-infrastructure
Visit annotations in context

Tags

data-infrastructure

Annotators

robertknight

URL

confluent.io/blog/using-logs-to-build-a-solid-data-infrastructure-or-why-dual-writes-are-a-bad-idea/
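
The change data capture idea quoted above (write to a single database, then feed other systems from its ordered log of committed writes) can be sketched as follows. This is a toy model, not the article's PostgreSQL implementation: PrimaryDB, DerivedStore, and catch_up are hypothetical names, and the "log" is a plain Python list.

```python
class PrimaryDB:
    """Single source of truth; every committed write is appended to a log."""

    def __init__(self):
        self.rows = {}
        self.log = []  # ordered entries: (sequence_number, key, value)

    def write(self, key, value):
        self.rows[key] = value
        self.log.append((len(self.log), key, value))


class DerivedStore:
    """A cache or search index that is updated only by replaying the log."""

    def __init__(self):
        self.rows = {}
        self.next_seq = 0  # position of the next log entry to apply

    def catch_up(self, log):
        # Apply unseen entries in commit order, so the derived store
        # converges on the same state as the primary database.
        for seq, key, value in log[self.next_seq:]:
            self.rows[key] = value
            self.next_seq = seq + 1


db = PrimaryDB()
cache = DerivedStore()

db.write("x", 1)
db.write("x", 2)        # a later write to the same key
cache.catch_up(db.log)
db.write("y", 3)
cache.catch_up(db.log)  # resumes where it left off
```

Because the application never writes to the cache directly, there is no second write that can fail or arrive in a different order, which is the hazard of dual writes that the article's title refers to.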
Aug 2015
www.edudemic.com www.edudemic.com

A Teacher’s Guide to Wikipedia | Edudemic

1
1. Enkerli 28 Aug 2015
  
  in Public
  
  Shared information
  
  The “social”, with an embedded emphasis on the data part of knowledge building and a nod to solidarity. Cloud computing does go well with collaboration and spelling out the difference can help lift some confusion.
  
  social media Open Education Cloud Computing Big Data Open Access Open Standards Open Data
Visit annotations in context

Tags

Big Data

Open Education

Open Data

Cloud Computing

social media

Open Access

Open Standards

Annotators

Enkerli

URL

edudemic.com/social-media-education/
hypothes.is hypothes.is

I Annotate 2013: Our Take | Hypothesis

1
1. Enkerli 21 Aug 2015
  
  in Public
  
  publisher or museum
  
  Potential for LODLAM!
  
  #LODLAM Linked Data Libraries Archives Museums
Visit annotations in context

Tags

Linked Data

#LODLAM

Libraries Archives Museums

Annotators

Enkerli

URL

hypothes.is/blog/iannotate-2013-our-take/
www.w3.org www.w3.org

What do HTTP URIs Identify? - Design Issues

1
1. tilgovi 14 Aug 2015
  
  in Public
  
  I feel that there is a great benefit to fixing this question at the spec level. Otherwise, what happens? I read a web page, I like it and I am going to annotate it as being a great one -- but first I have to find out whether the URI my browser is used, conceptually by the author of the page, to represent some abstract idea?
  
  Tim Berners-Lee annotation semantic web linked data
Visit annotations in context

Tags

annotation

linked data

semantic web

Tim Berners-Lee

Annotators

tilgovi

URL

w3.org/DesignIssues/HTTP-URI.html
blogs.nature.com blogs.nature.com

We’ve clarified our policies on institutional repositories : Scientific Data

1
1. libriomancer 12 Aug 2015
  
  in Public
  
  data deposition is limited to researchers working at the same institution,
  
  Not necessarily. For many institutions, as long as one of the researchers is affiliated, the data can be deposited
  
  research data institutional repositories
Visit annotations in context

Tags

research data

institutional repositories

Annotators

libriomancer

URL

blogs.nature.com/scientificdata/2015/08/05/institutional-repositories/
europepmc.org europepmc.org

Sizing the Problem of Improving Discovery and Access to NIH-Funded Data: A Preliminary Study

4
1. mavery 11 Aug 2015
  
  in Public
  
  Big data to knowledge (BD2K)
  
  would like to know more about this term and the HHS initiative
  
  big data BD2K
2. mavery 11 Aug 2015
  
  in Public
  
  the definition of a “dataset,”
  
  this is interesting, and will be interesting to track within and across disciplines
  
  data dataset
3. memartone 11 Aug 2015
  
  in Public
  
  Approximately 87% of the invisible datasets consist of data newly collected for the research reported; 13% reflect reuse of existing data. More than 50% of the datasets were derived from live human or non-human animal subjects.
  
  Another good statistic to have
  
  Unrecovered data
4. memartone 11 Aug 2015
  
  in Public
  
  Among articles with invisible datasets, we found an average of 2.9 to 3.4 datasets, suggesting there were approximately 200,000 to 235,000 invisible datasets generated from NIH-funded research published in 2011.
  
  This is a good statistic to have handy.
  
  Unrecovered data
Visit annotations in context

Tags

BD2K

dataset

big data

Unrecovered data

data

Annotators

memartone

mavery

URL

europepmc.org/abstract/MED/26207759
dublincore.org dublincore.org

DCMI Metadata Terms

1
1. libriomancer 07 Aug 2015
  
  in Public
  
  Dataset
  
  what about unstructured data?
  
  data metadata dublin core
Visit annotations in context

Tags

metadata

data

dublin core

Annotators

libriomancer

URL

dublincore.org/documents/dcmi-terms/
Jun 2015
www.aaai.org www.aaai.org

Improving Your Chances: Boosting Citizen Science Discovery

1
1. ethanwhite 30 Jun 2015
  
  in Public
  
  The comparison between the model and the experts is based on the species distribution models (SDMs), not on actual species occurrences, so the observed difference could be due to weakness in the SDM predictions rather than the model outperforming the experts. The explanation for this choice in Footnote 4 is reasonable, but I wonder if it could be addressed by rarefying the sampling appropriately.
  
  ecology data modeling
Visit annotations in context

Tags

ecology data modeling

Annotators

ethanwhite

URL

aaai.org/ocs/index.php/HCOMP/HCOMP13/paper/viewFile/7477/7420
chronicle.com chronicle.com

Where Should You Keep Your Data?

1
1. libriomancer 24 Jun 2015
  
  in Public
  
  If you can’t find the correct web page, ask a reference librarian.
  
  YES, ASK US. Also, we love to work with faculty on managing their data!
  
  libraries data management
Visit annotations in context

Tags

libraries

data management

Annotators

libriomancer

URL

chronicle.com/article/Where-Should-You-Keep-Your/231065/
gizmodo.com gizmodo.com

Amazon Will Soon Start Paying Authors Based on e-Book Pages Read

1
1. jeremydean 22 Jun 2015
  
  in Public
  
  possible with modern technology,
  
  This is terrifying but also fascinating. Imagine the data for MFA programs on the content/style whatever on the last page readers thumbed before stopping the turning!
  
  Also, couldn't this system be easily gamed: creating bots to "peruse" texts at the right pace repeatedly?
  
  Amazon Kindle data
Visit annotations in context

Tags

Kindle

Amazon

data

Annotators

jeremydean

URL

gizmodo.com/amazon-will-soon-start-paying-authors-based-on-e-book-p-1712821327
docs.gatesfoundation.org docs.gatesfoundation.org

Literacy-Courseware-Challege-RFP.pdf

1
1. jeremydean 17 Jun 2015
  
  in Public
  
  Generating student performance data that can help students, teachers, and parents identify areas for further teaching or practice
  
  Data, data, data
  
  Common Core Gates data
Visit annotations in context

Tags

data

Gates

Common Core

Annotators

jeremydean

URL

docs.gatesfoundation.org/documents/Literacy-Courseware-Challege-RFP.pdf
nepanode.anl.gov nepanode.anl.gov

Endangered Species Act: Section 7&10 Consultation - Terrestrial and Freshwater Species — Explore Maps - NEPAnode

1
1. jjediny 05 Jun 2015
  
  in Public
  
  Critical Habitat - Terrestrial - Polygon [USFWS] Critical Habitat - Terrestrial - Line [USFWS]
  
  Critical Habitat Layers need to be updated
  
  Data Refresh
Visit annotations in context

Tags

Refresh

Data

Annotators

jjediny

URL

nepanode.anl.gov/maps/775
May 2015
www.theengineroom.org www.theengineroom.org

Bringing a text to life: 6 platforms for annotating text online

1
1. judell 27 May 2015
  
  in Public
  
  The book would need to be set up on a website first
  
  Not necessarily. If PDF is in the mix, it can be the medium for annotations that might later anchor to a website -- even if PDFs are distributed to participants and used locally as mentioned above.
  
  responsible-data HypothesisInThePress
Visit annotations in context

Tags

HypothesisInThePress

responsible-data

Annotators

judell

URL

theengineroom.org/platforms-for-annotating-text-online/
dlib.nyu.edu dlib.nyu.edu

It’s about time: historical periodization and Linked Ancient World Data

1
1. gethia 24 May 2015
  
  in Public
  
  periods have proven to work poorly with Linked Data principles, which require well-defined entities for linking.
  
  linked data
Visit annotations in context

Tags

linked data

Annotators

gethia

URL

dlib.nyu.edu/awdl/isaw/isaw-papers/7/rabinowitz/
Apr 2015
pywb-h.herokuapp.com pywb-h.herokuapp.com

Rationale for WHO's New Position Calling for Prompt Reporting and Public Disclosure of Interventional Clinical Trial Results

2
1. memartone 16 Apr 2015
  
  in Public
  
  There is now a strong body of evidence showing failure to comply with results-reporting requirements across intervention classes, even in the case of large, randomised trials [3–7]. This applies to both industry and investigator-driven trials. I
  
  Compliance not mechanism
  
  data sharing
2. memartone 16 Apr 2015
  
  in Public
  
  “the registration of all interventional trials is a scientific, ethical, and moral responsibility”
  
  World Health Organization's statement
  
  data sharing
Visit annotations in context

Tags

data sharing

Annotators

memartone

URL

pywb-h.herokuapp.com/journals.plos.org/plosmedicine/article
europepmc.org europepmc.org

How to Get All Trials Reported: Audit, Better Data, and Individual Accountability

2
1. memartone 16 Apr 2015
  
  in Public
  
  Anyone withholding the methods and results of a clinical trial is already in breach of multiple codes and regulations, including the Declaration of Helsinki, various promises from industry and professional bodies, and, in many cases, the United States Food and Drug Administration (FDA) Amendment Act of 2007. Indeed, a recently published cohort study of trials in clinicaltrials.gov found that more than half had failed to post results; and even though the FDA is entitled to issue fines of $10,000 a day for transgressions, no such fines have ever been levied [3].
  
  Sticks don't work if they aren't used. I find this rather disturbing.
  
  data sharing mandate
2. memartone 16 Apr 2015
  
  in Public
  
  The best currently available evidence shows that the methods and results of clinical trials are routinely withheld from doctors, researchers, and patients [2–5], undermining our best efforts at informed decision making.
  
  open data data sharing
Visit annotations in context

Tags

mandate

open data

data sharing

Annotators

memartone

URL

europepmc.org/abstract/MED/25874719
www.badscience.net www.badscience.net

I foresee that nobody will do anything about this problem – Bad Science

1
1. memartone 16 Apr 2015
  
  in Public
  
  This week there was an amazing landmark announcement from the World Health Organisation: they have come out and said that everyone must share the results of their clinical trials, within 12 months of completion, including old trials (since those are the trials conducted on currently used treatments).
  
  Data sharing
Visit annotations in context

Tags

Data sharing

Annotators

memartone

URL

badscience.net/2008/06/all-time-classic-creationist-pwnage/
www.ncbi.nlm.nih.gov www.ncbi.nlm.nih.gov

NeuroLex.org: an online framework for neuroscience knowledge

1
1. memartone 09 Apr 2015
  
  in Public
  
  First, the domain is a poor candidate because the domain of all entities relevant to neurobiological function is extremely large, highly fragmented into separate subdisciplines, and riddled with lack of consensus (Shirky, 2005).
  
  Probably a good thing to add to the Complex Data integration workshop write up
  
  Complex data integration notes
Visit annotations in context

Tags

Complex data integration notes

Annotators

memartone

URL

ncbi.nlm.nih.gov/pmc/articles/PMC3757470/
dmm.biologists.org dmm.biologists.org

Shining a light on dark data

1
1. judell 03 Apr 2015
  
  in Public
  
  Wouldn’t it be useful, both to the scientific community or the wider world, to increase the publication of negative results?
  
  science dark-data
Visit annotations in context

Tags

science

dark-data

Annotators

judell

URL

dmm.biologists.org/content/2/11-12/521
Mar 2015
flowingdata.com flowingdata.com

Future of visualization

1
1. pietroblu 24 Mar 2015
  
  in Public
  
  the future of visualization
  
  really true!
  
  show data variations, not design variations
  
  visualization data design
Visit annotations in context

Tags

design

visualization

data

Annotators

pietroblu

URL

flowingdata.com/2015/03/23/future-of-visualization-2/
iopscience.iop.org iopscience.iop.org

C:

1
1. augustmuench 18 Mar 2015
  
  in Public
  
  Geneva group “high” mass-loss evolutionary tracks
  
  Is there an HTTP link for these evolutionary models?
  
  data link
Visit annotations in context

Tags

data link

Annotators

augustmuench

URL

iopscience.iop.org/0004-637X/774/2/100/pdf/0004-637X_774_2_100.pdf
Feb 2015
wiki.chn.io wiki.chn.io

Smallest Federated Wiki

1
1. almereyda 15 Feb 2015
  
  in Public
  
  Num / Num summarizes a graph with nodes / arcs.
  
  The underlying graph model is not explicitly mentioned here or in the README of the plugin.
  
  federated wiki graphs data model
Visit annotations in context

Tags

federated wiki

data model

graphs

Annotators

almereyda

URL

wiki.chn.io/about-data-plugin.html
go.coverity.com go.coverity.com

2014-Coverity-Scan-Spotlight-Big-Data.pdf

1
1. vonhaller 04 Feb 2015
  
  in Public
  
  the critical role that big data open source projects play in the Internet of Things (IoT).
  
  Big Data Open source IoT
Visit annotations in context

Tags

Open source

IoT

Big Data

Annotators

vonhaller

URL

go.coverity.com/rs/coverity/images/2014-Coverity-Scan-Spotlight-Big-Data.pdf
Jan 2015
readwrite.com readwrite.com

The Internet Of Things Will Be A Hotel California For Your Data

2
1. BigBlueHat 29 Jan 2015
  
  in Public
  
  Make no mistake, in today's digital age, we are most definitely "renters" with virtually no rights—including rights to our data.
  
  data ownership renting
2. BigBlueHat 29 Jan 2015
  
  in Public
  
  The Internet of Things promises to create mountains upon mountains of data, but none of it will be yours.
  
  Internet of Things data ownership
Visit annotations in context

Tags

data ownership

renting

Internet of Things

Annotators

BigBlueHat

URL

readwrite.com/2015/01/23/internet-of-things-data-privacy-hotel-california
newleftreview.org newleftreview.org

New Left Review - Evgeny Morozov: Socialize the Data Centres!

2
1. adamsaltiel 28 Jan 2015
  
  in Public
  
  The big question, of course, is whether that player has to be a private capitalist corporation, or some federated, publicly-run set of services that could reach a data-sharing agreement free of monitoring by intelligence agencies.
  
  So there we are. It is pretty straightforward really.
  
  internet data ownership morozov
2. adamsaltiel 28 Jan 2015
  
  in Public
  
  But if you turn data into a money-printing machine for citizens, whereby we all become entrepreneurs, that will extend the financialization of everyday life to the most extreme level, driving people to obsess about monetizing their thoughts, emotions, facts, ideas—because they know that, if these can only be articulated, perhaps they will find a buyer on the open market. This would produce a human landscape worse even than the current neoliberal subjectivity. I think there are only three options. We can keep these things as they are, with Google and Facebook centralizing everything and collecting all the data, on the grounds that they have the best algorithms and generate the best predictions, and so on. We can change the status of data to let citizens own and sell them. Or citizens can own their own data but not sell them, to enable a more communal planning of their lives. That’s the option I prefer.
  
  Very well thought out. Obviously must know about read write web, TSL certificate issues etc. But what does neoliberal subjectivity mean? An interesting phrase.
  
  SSL TSL cryptography internet story politics data security morozov
Visit annotations in context

Tags

cryptography

politics

internet

TSL

SSL

story

data

security

ownership

morozov

Annotators

adamsaltiel

URL

newleftreview.org/II/91/evgeny-morozov-socialize-the-data-centres
Dec 2014
www.openthesaurus.de www.openthesaurus.de

Synonyme - OpenThesaurus - Deutscher Thesaurus

1
1. sofias 20 Dec 2014
  
  in Public
  
  open data thesaurus deutsch
Visit annotations in context

Tags

open data

thesaurus

deutsch

Annotators

sofias

URL

openthesaurus.de/
publishing.aip.org publishing.aip.org

Supporting data | American Institute of Physics

1
1. augustmuench 10 Dec 2014
  
  in Public
  
  This is the redirected link for the "Physics Auxiliary Publication Service". But there is no data here.
  
  data data sharing supplementary material journal archives
Visit annotations in context

Tags

journal archives

supplementary material

data

data sharing

Annotators

augustmuench

URL

publishing.aip.org/authors/supporting-data
Nov 2014
www.hackeducation.com www.hackeducation.com

From "Open" to Justice #OpenCon2014

2
1. tilgovi 17 Nov 2014
  
  in Public
  
  If we believe in equality, if we believe in participatory democracy and participatory culture, if we believe in people and progressive social change, if we believe in sustainability in all its environmental and economic and psychological manifestations, then we need to do better than slap that adjective “open” onto our projects and act as though that’s sufficient or — and this is hard, I know — even sound.
  
  open participatory democracy democracy open source participatory culture open data
2. tilgovi 17 Nov 2014
  
  in Public
  
  that the moments when students generate “education data” is, historically, moments when they come into contact with the school and more broadly the school and the state as a disciplinary system
  
  open data discipline power
Visit annotations in context

Tags

open source

open data

participatory democracy

discipline

democracy

participatory culture

power

open

Annotators

tilgovi

URL

hackeducation.com/2014/11/16/from-open-to-justice/
May 2014
www-group.slac.stanford.edu www-group.slac.stanford.edu

Untitled document

1
1. aculich 20 May 2014
  
  in Public
  
  SSPP # 7.2 Power Usage Effectiveness (PUE) (Electronic Maximum annual weighted average PUE of 1.4 by FY15 )
  
  SLAC target PUE of 1.4 by FY15
  
  PUE SLAC data center efficiency data centers
Visit annotations in context

Tags

data centers

PUE

data center efficiency

SLAC

Annotators

aculich

URL

www-group.slac.stanford.edu/fac/docs/SLAC_Sustainability_Plan_FY13.pdf
blogs.berkeley.edu blogs.berkeley.edu

Untitled document

1
1. aculich 20 May 2014
  
  in Public
  
  Google’s ultra-efficient data centers, with a PUE of 1.12, are beating the PUE curve by miles.
  
  Google's PUE is 1.12
  
  PUE data center efficiency data centers Google
Visit annotations in context

Tags

Google

PUE

data center efficiency

data centers

Annotators

aculich

URL

blogs.berkeley.edu/2014/04/01/how-bit-met-watt/comment-page-1/
www.i2sl.org www.i2sl.org

Untitled document

1
1. aculich 20 May 2014
  
  in Public
  
  When the project is complete later this year (all done while the existing data center remained in operation!), the data center's annual PUE will drop from 1.5 to 1.2, saving 20 percent of its annual electrical cost.
  
  Warren Hall target efficiency: 1.2 as of 2011
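  The 20 percent figure in the quoted passage follows from the definition of PUE (total facility energy divided by IT equipment energy) if the IT load stays constant and cost is proportional to energy; the source states neither assumption. A minimal Python sketch of that arithmetic, with a hypothetical function name:

```python
def facility_energy_savings(pue_before: float, pue_after: float) -> float:
    """Fractional drop in total facility energy for a fixed IT load.

    PUE = total facility energy / IT equipment energy, so with the IT
    load held constant (an assumption), total energy scales with PUE.
    """
    if pue_before <= 0 or pue_after <= 0:
        raise ValueError("PUE values must be positive")
    return (pue_before - pue_after) / pue_before


# Warren Hall figures from the quoted passage: PUE 1.5 -> 1.2
savings = facility_energy_savings(1.5, 1.2)
print(f"{savings:.0%}")
```

  The same function can be applied to the other PUE figures collected in these annotations, as long as the constant-IT-load assumption is kept in mind.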
  
  UCBerkeley PUE data center efficiency data centers

Tags

PUE

data center efficiency

UCBerkeley

data centers

Annotators

aculich

URL

i2sl.org/labs21/conference/2011/abstracts/e6_soladay_1.html
www.mghpcc.org

Untitled document

1
1. aculich 20 May 2014
  
  in Public
  
  The MGHPCC is targeting a PUE of less than 1.3. A recent report cites typical data center PUEs at 1.9. This means that our facility can expect to
  
  Target PUE of less than 1.3 (vs. typical data centers around 1.9)
  
  MGHPCC PUE data center efficiency data centers

Tags

PUE

data center efficiency

MGHPCC

data centers

Annotators

aculich

URL

mghpcc.org/about/what-are-the-green-design-aspects-of-the-mghpcc/
Apr 2014
www.dbms2.com

Untitled document

1
1. aculich 30 Apr 2014
  
  in Public
  
  Mike Olson of Cloudera is on record as predicting that Spark will be the replacement for Hadoop MapReduce. Just about everybody seems to agree, except perhaps for Hortonworks folks betting on the more limited and less mature Tez. Spark’s biggest technical advantages as a general data processing engine are probably: The Directed Acyclic Graph processing model. (Any serious MapReduce-replacement contender will probably echo that aspect.) A rich set of programming primitives in connection with that model. Support also for highly-iterative processing, of the kind found in machine learning. Flexible in-memory data structures, namely the RDDs (Resilient Distributed Datasets). A clever approach to fault-tolerance.
  
  Spark's advantages:
  
  DAG processing model
  
  programming primitives for DAG model
  
  highly-iterative processing suited for ML
  
  RDD in-memory data structures
  
  clever approach to fault-tolerance
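  To make the DAG and fault-tolerance points above more concrete, here is a toy sketch in plain Python. It is not Spark's API: `LazyDataset` and `lose_cache` are hypothetical names invented for illustration. Each dataset remembers its parent and the operation that derives it (its lineage), so a lost in-memory result can be recomputed rather than restored from a replica.

```python
from typing import Callable, Iterable, List, Optional


class LazyDataset:
    """Toy stand-in for an RDD: one node in a DAG of transformations.

    Hypothetical class for illustration only; this is not Spark's API.
    """

    def __init__(self,
                 source: Optional[Iterable[int]] = None,
                 parent: Optional["LazyDataset"] = None,
                 op: Optional[Callable[[List[int]], List[int]]] = None):
        self._source = list(source) if source is not None else []
        self._parent = parent  # edge back to the upstream node
        self._op = op          # how to derive this node from its parent
        self._cache: Optional[List[int]] = None  # in-memory result

    def map(self, fn: Callable[[int], int]) -> "LazyDataset":
        return LazyDataset(parent=self, op=lambda rows: [fn(r) for r in rows])

    def filter(self, pred: Callable[[int], bool]) -> "LazyDataset":
        return LazyDataset(parent=self,
                           op=lambda rows: [r for r in rows if pred(r)])

    def collect(self) -> List[int]:
        # Transformations are lazy: nothing runs until collect() is called.
        if self._cache is None:
            if self._parent is None or self._op is None:
                self._cache = list(self._source)
            else:
                self._cache = self._op(self._parent.collect())
        return list(self._cache)

    def lose_cache(self) -> None:
        # Simulate losing the in-memory result (e.g. a worker failure).
        self._cache = None


numbers = LazyDataset(source=range(1, 11))
even_squares = numbers.map(lambda x: x * x).filter(lambda x: x % 2 == 0)

first = even_squares.collect()
even_squares.lose_cache()        # drop the cached result
second = even_squares.collect()  # rebuilt from the recorded lineage
print(first == second)
```

  The sketch ignores partitioning and distribution entirely; it only shows why recording lineage makes recomputation possible.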
  
  brc data spark

Tags

spark

data

brc

Annotators

aculich

URL

dbms2.com/2014/04/30/spark-on-fire/
Feb 2014
www.shirky.com

Untitled document

4
1. aculich 05 Feb 2014
  
  in Public
  
  1960 and 1975, states more than doubled their rate of appropriations for higher education, from four dollars per thousand in state revenue to ten.
  
  data verify
2. aculich 05 Feb 2014
  
  in Public
  
  From 1945 to 1975, the number of undergraduates increased five-fold, and graduate students nine-fold. PhDs graduating one year got jobs teaching the ever-larger cohort of freshman arriving the next.
  
  data verify
3. aculich 05 Feb 2014
  
  in Public
  
  In the first half of the 20th century, higher education was a luxury and a rarity in the U.S. Only 5% or so of adults, overwhelmingly drawn from well-off families, had attended college.
  
  data verify
4. aculich 05 Feb 2014
  
  in Public
  
  The proportion of part-time and non-tenure track teachers went from less than half of total faculty, before 1975, to over two-thirds now.
  
  non-tenure track teachers data verify

Tags

verify

non-tenure track teachers

data

Annotators

aculich

URL

shirky.com/weblog/2014/01/there-isnt-enough-money-to-keep-educating-adults-the-way-were-doing-it/
www.tweaktown.com

Untitled document

2
1. aculich 03 Feb 2014
  
  in Public
  
  The Backblaze environment is the exact opposite. I do not believe I could dream up worse conditions to study and compare drive reliability. It's hard to believe they plotted this out and convened a meeting to outline a process to buy the cheapest drives imaginable, from all manner of ridiculous sources, install them into varying (and sometimes flawed) chassis, then stack them up and subject them to entirely different workloads and environmental conditions... all with the purpose of determining drive reliability.
  
  The conditions and process described here mirror the process many organizations go through in an attempt to cut costs by cutting through what is perceived as marketing hype. The cost differences are compelling enough to continually tempt people down a path of considerably reducing costs while believing that they've done enough due diligence to avoid raising the risk to an unacceptable level.
  
  Backblaze data-stories risk calculus
2. aculich 03 Feb 2014
  
  in Public
  
  The enthusiast in me loves the Backblaze story. They are determined to deliver great value to their customers, and will go to any length to do so. Reading the blog posts about the extreme measures they took was engrossing, and I'm sure they enjoyed rising to the challenge. Their Storage Pod is a compelling design that has been field-tested extensively, and refined to provide a compelling price point per GB of storage.
  
  An anecdote with data to quantify the experience has some value for tentatively drawing conclusions about future decisions, but the temptation to make decisions on that single story is high given the void of quantified stories and data from other sources. What is a responsible way to collect these data-stories and publish them with disclaimers sufficient to avoid the spin that invariably comes along with them?
  
  In part the industry opens itself up to this kind of spin when the data at-scale is not made available publicly and we're all subject to the marketing-spin in the purchase decision-making process.
  
  backblaze data-stories spin

Tags

Backblaze

spin

risk

data-stories

calculus

backblaze

Annotators

aculich

URL

tweaktown.com/print/articles/6028/index.html
Jan 2014
www.ncbi.nlm.nih.gov

Untitled document

8
1. aculich 31 Jan 2014
  
  in Public
  
  Less than half (45%) of the respondents are satisfied with their ability to integrate data from disparate sources to address research questions
  
  The most important take-away I see in this whole section on reasons for not making data electronically available is not mentioned here directly!
  
  Here are the raw numbers for "I am satisfied with my ability to integrate data from disparate sources to address research questions":
  
  156 (12.2%) Agree Strongly
  
  419 (32.7%) Agree Somewhat
  
  363 (28.3%) Neither Agree nor Disagree
  
  275 (21.5%) Disagree Somewhat
  
  69 (5.4%) Disagree Strongly
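  As a quick check, the listed percentages can be reproduced from the raw counts. The short Python sketch below assumes the percentages are taken over the sum of these five responses (1,282), which is consistent with the figures above and with the 45% "satisfied" figure quoted from the paper.

```python
# Raw counts as listed in the annotation above.
counts = {
    "Agree Strongly": 156,
    "Agree Somewhat": 419,
    "Neither Agree nor Disagree": 363,
    "Disagree Somewhat": 275,
    "Disagree Strongly": 69,
}

# Assumption: the denominator is the sum of these five responses.
total = sum(counts.values())

# Percentage share of each response, rounded to one decimal place.
shares = {label: round(100 * n / total, 1) for label, n in counts.items()}

# Combined "agree" and "disagree" shares, rounded to whole percents.
satisfied = round(100 * (counts["Agree Strongly"] + counts["Agree Somewhat"]) / total)
dissatisfied = round(100 * (counts["Disagree Somewhat"] + counts["Disagree Strongly"]) / total)

for label, pct in shares.items():
    print(f"{counts[label]:>4} ({pct:>4}%) {label}")
print(f"satisfied: {satisfied}%, dissatisfied: {dissatisfied}%")
```

  On these numbers, roughly a quarter of respondents are dissatisfied in some way, which is the group the questions below are aimed at.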
  
  Of the people who are not satisfied in some way, how many of those think current data sharing mechanisms are sufficient for their needs?
  
  Of the ~5% of people who are strongly dissatisfied, how many of those are willing to spend time, energy, and money on new sharing mechanisms, especially ones that are not yet proven? If they are willing to do so, then what measurable result or impact will the new mechanism have over the status quo?
  
  Who feels that current sharing mechanisms stand in the way of publications, tenure, promotion, or being cited?
  
  Of those who are dissatisfied, how many have existing investment in infrastructure versus those who are new and will be investing versus those who cannot invest in old or new?
  
  Ten years ago, how would you have convinced someone they needed an iPad or Android smartphone?
  
  RIT status quo survey data satisfaction barriers question controversy
2. aculich 31 Jan 2014
  
  in Public
  
  Reasons for not making data electronically available. Regarding their attitudes towards data sharing, most of the respondents (85%) are interested in using other researchers' datasets, if those datasets are easily accessible. Of course, since only half of the respondents report that they make some of their data available to others and only about a third of them (36%) report their data is easily accessible, there is a major gap evident between desire and current possibility. Seventy-eight percent of the respondents said they are willing to place at least some their data into a central data repository with no restrictions. Data repositories need to make accommodations for varying levels of security or access restrictions. When asked whether they were willing to place all of their data into a central data repository with no restrictions, 41% of the respondents were not willing to place all of their data. Nearly two thirds of the respondents (65%) reported that they would be more likely to make their data available if they could place conditions on access. Less than half (45%) of the respondents are satisfied with their ability to integrate data from disparate sources to address research questions, yet 81% of them are willing to share data across a broad group of researchers who use data in different ways. Along with the ability to place some restrictions on sharing for some of their data, the most important condition for sharing their data is to receive proper citation credit when others use their data. For 92% of the respondents, it is important that their data are cited when used by other researchers. Eighty-six percent of survey respondents also noted that it is appropriate to create new datasets from shared data. Most likely, this response relates directly to the overwhelming response for citing other researchers' data. The breakdown of this section is presented in Table 13.
  
  Categories of data sharing considered:
  
  I would use other researchers' datasets if their datasets were easily accessible.
  
  I would be willing to place at least some of my data into a central data repository with no restrictions.
  
  I would be willing to place all of my data into a central data repository with no restrictions.
  
  I would be more likely to make my data available if I could place conditions on access.
  
  I am satisfied with my ability to integrate data from disparate sources to address research questions.
  
  I would be willing to share data across a broad group of researchers who use data in different ways.
  
  It is important that my data are cited when used by other researchers.
  
  It is appropriate to create new datasets from shared data.
  
  RIT data curation data sharing categories survey results
3. aculich 31 Jan 2014
  
  in Public
  
  Data sharing practices. Only about a third (36%) of the respondents agree that others can access their data easily, although three-quarters share their data with others (see Table 11). This shows there is a willingness to share data, but it is difficult to achieve or is done only on request.
  
  There is a willingness, but not a way!
  
  RIT data curation data sharing
4. aculich 31 Jan 2014
  
  in Public
  
  Nearly one third of the respondents chose not to answer whether they make their data available to others. Of those who did respond, 46% reported they do not make their data electronically available to others. Almost as many reported that at least some of their data are available somehow, either on their organization's website, their own website, a national network, a global network, a personal website, or other (see Table 10). The high percentage of non-respondents to this question most likely indicates that data sharing is even lower than the numbers indicate. Furthermore, the less than 6% of scientists who are making “All” of their data available via some mechanism, tends to re-enforce the lack of data sharing within the communities surveyed.
  
  RIT data curation data sharing
5. aculich 31 Jan 2014
  
  in Public
  
  Adding descriptive metadata to datasets helps makes the dataset more accessible by others and into the future. Respondents were asked to indicate all metadata standards they currently use to describe their data. More than half of the respondents (56%) reported that they did not use any metadata standard and about 22% of respondents indicated they used their own lab metadata standard. This could be interpreted that over 78% of survey respondents either use no metadata or a local home grown metadata approach.
  
  Not surprising that roughly 80% use no or ad hoc metadata.
  
  RIT data curation metadata ad hoc
6. aculich 31 Jan 2014
  
  in Public
  
  Data reuse. Respondents were asked to indicate whether they have the sole responsibility for approving access to their data. Of those who answered this question, 43% (n=545) have the sole responsibility for all their datasets, 37% (n=466) have for some of their datasets, and 21% (n=266) do not have the sole responsibility.
  
  RIT data curation reuse responsibility
7. aculich 31 Jan 2014
  
  in Public
  
  Policies and procedures sometimes serve as an active rather than passive barrier to data sharing. Campbell et al. (2003) reported that government agencies often have strict policies about secrecy for some publicly funded research. In a survey of 79 technology transfer officers in American universities, 93% reported that their institution had a formal policy that required researchers to file an invention disclosure before seeking to commercialize research results. About one-half of the participants reported institutional policies that prohibited the dissemination of biomaterials without a material transfer agreement, which have become so complex and demanding that they inhibit sharing [15].
  
  Policies and procedures are barriers, but there are many more barriers beyond that which get in the way first.
  
  RIT data curation policy barriers technology transfer
8. aculich 31 Jan 2014
  
  in Public
  
  data practices of researchers – data accessibility, discovery, re-use, preservation and, particularly, data sharing
  
  data accessibility
  
  discovery
  
  re-use
  
  preservation
  
  data sharing
  
  RIT data curation practices

Tags

question

barriers

technology transfer

RIT

metadata

responsibility

data curation

data sharing

survey data

categories

controversy

practices

status quo

reuse

ad hoc

survey results

satisfaction

policy

Annotators

aculich

URL

ncbi.nlm.nih.gov/pmc/articles/PMC3126798/
www.dataone.org

Untitled document

10
1. aculich 31 Jan 2014
  
  in Public
  
  The Data Life Cycle: An Overview The data life cycle has eight components: Plan : description of the data that will be compiled, and how the data will be managed and made accessible throughout its lifetime Collect : observations are made either by hand or with sensors or other instruments and the data are placed a into digital form Assure : the quality of the data are assured through checks and inspections Describe : data are accurately and thoroughly described using the appropriate metadata standards Preserve : data are submitted to an appropriate long-term archive (i.e. data center ) Discover : potentially useful data are located and obtained, along with the relevant information about the data ( metadata ) Integrate : data from disparate sources are combined to form one homogeneous set of data that can be readily analyzed Analyze : data are analyzed
  
  The lifecycle according to who? This 8-component description is from the point of view of only the people who obsessively think about this "problem".
  
  Ask a researcher and I think you'll hear that lifecycle means something like:
  
  collect -> analyze -> publish
  
  or a more complex data management plan might be:
  
  ask someone -> receive data in email -> analyze -> cite -> publish -> tenure
  
  To most people lifecycle means "while I am using the data" and archiving means "my storage guy makes backups occasionally".
  
  Asking people to be aware of the whole cycle outlined here is a non-starter, but I think there is another approach to achieve what we want... dramatic pause [to be continued]
  
  What parts of this cycle should the individual be responsible for vs which parts are places where help is needed from the institution?
  
  RIT data curation jargon lifecycle critique opinion responsibility
2. aculich 31 Jan 2014
  
  in Public
  
  Data represent important products of the scientific enterprise that are, in many cases, of equivalent or greater value than the publications that are originally derived from the research process. For example, addressing many of the grand challenge scientific questions increasingly requires collaborative research and the reuse , integration, and synthesis of data.
  
  Who else might care about this other than Grand Challenge Question researchers?
  
  data curation grand challenge questions RIT
3. aculich 31 Jan 2014
  
  in Public
  
  Journals and sponsors want you to share your data
  
  What is the sharing standard? What are the consequences of not sharing? What is the enforcement mechanism?
  
  There are three primary sharing mechanisms I can think of today: email, USB stick, and Dropbox (née FTP).
  
  The Dropbox option is supplanting FTP, which comes from another era but still satisfies an important niche for larger data sets and/or higher-volume or anonymous traffic.
  
  Dropbox, email, and USB are all easily accessible parts of the day-to-day consumer workflow; they are all trivial to set up without institutional support or, importantly, permission.
  
  An email account is already provisioned by default for everyone or, if the institutional email offerings are not sufficient, a person may easily set up a 3rd-party email account with no permission or hassle.
  
  Data management alternatives to these three options will have slow or no adoption until the barriers to access and use are as low as email; the cost of entry needs to be no more than "a web browser, an email address, and no special permission required".
  
  RIT data curation data management data sharing sharing standards barriers adoption
4. aculich 31 Jan 2014
  
  in Public
  
  An effective data management program would enable a user 20 years or longer in the future to discover , access , understand, and use particular data [ 3 ]. This primer summarizes the elements of a data management program that would satisfy this 20-year rule and are necessary to prevent data entropy .
  
  Who cares most about the 20-year rule? This is an ideal that appeals to some, but in practice even the most zealous adherents can't picture what this looks like in a concrete way, except in the most traditional ways: physical paper journals in libraries are tangible examples of the 20-year rule.
  
  Until we have a digital equivalent for data, I don't blame people looking for tenure or jobs for not caring about this ideal if we can't provide a clear picture of how to achieve this widely at an institutional level. For digital materials I think the picture people have in their minds is of tape backup. Maybe this is generational? Perhaps only when new generations are no longer widely exposed to cassette tapes, DVDs, and other physical media that "old people" remember will it be possible to have a new ideal that people can see in their mind's eye.
  
  RIT data curation data management 20-year rule critique opinion ideals vision tangible jargon data entropy tape backup
5. aculich 31 Jan 2014
  
  in Public
  
  A key component of data management is the comprehensive description of the data and contextual information that future researchers need to understand and use the data. This description is particularly important because the natural tendency is for the information content of a data set or database to undergo entropy over time (i.e. data entropy ), ultimately becoming meaningless to scientists and others [ 2 ].
  
  I agree with the key component mentioned here, but I feel the term data entropy is an unhelpful crutch.
  
  RIT data curation data management key component jargon data entropy
6. aculich 31 Jan 2014
  
  in Public
  
  data entropy Normal degradation in information content associated with data and metadata over time (paraphrased from [ 2 ]).
  
  I'm not sure what this really means and I don't think data entropy is a helpful term. Poor practices certainly lead to disorganized collections of data, but I think this notion comes from a time when people were very concerned about degradation of physical media on which data is stored. That is, of course, still a concern, but I think the term data entropy really lends itself as an excuse for people who don't use good practices to manage data and is a cover for the real problem which is a kind of data illiteracy in much the same way we also face computational illiteracy widely in the sciences. Managing data really is hard, but let's not mask it with fanciful notions like data entropy.
  
  RIT data curation jargon data entropy entropy illiteracy critique opinion
7. aculich 31 Jan 2014
  
  in Public
  
  Although data management plans may differ in format and content, several basic elements are central to managing data effectively.
  
  What are the "several basic elements"?
  
  RIT data curation question
8. aculich 31 Jan 2014
  
  in Public
  
  By documenting your data and recommending appropriate ways to cite your data, you can be sure to get credit for your data products and their use
  
  Citation is an incentive. An answer to the question "What's in it for me?"
  
  RIT data curation citation incentive
9. aculich 31 Jan 2014
  
  in Public
  
  This primer describes a few fundamental data management practices that will enable you to develop a data management plan, as well as how to effectively create, organize, manage, describe, preserve and share data
  
  Data management practices:
  
  create
  
  organize
  
  manage
  
  describe
  
  preserve
  
  share
  
  RIT data curation data management practices
10. aculich 31 Jan 2014
  
  in Public
  
  The goal of data management is to produce self-describing data sets. If you give your data to a scientist or colleague who has not been involved with your project, will they be able to make sense of it? Will they be able to use it effectively and properly?
  
  RIT data curation question

Tags

lifecycle

opinion

question

sharing standards

data entropy

key component

barriers

critique

RIT

20-year rule

responsibility

grand challenge questions

data curation

data sharing

citation

incentive

data management

tape backup

vision

tangible

jargon

practices

entropy

ideals

adoption

illiteracy

Annotators

aculich

URL

dataone.org/sites/all/documents/DataONE_BP_Primer_020212.pdf
www.alexandria.ucsb.edu

Untitled document

14
1. aculich 31 Jan 2014
  
  in Public
  
  One respondent noted that NSF doesn't have an enforcement policy. This is presumably true of other mandate sources as well, and brings up the related and perhaps more significant problem that mandates are not always (if they are ever) accompanied by the funding required to satisfy them. Another respondent wrote that funding agencies expect universities to contribute to long-term data storage.
  
  RIT data curation mandate enforcement
2. aculich 31 Jan 2014
  
  in Public
  
  Data management activities, grouped. The data management activities mentioned by the survey can be grouped into five broader categories: "storage" (comprising backup or archival data storage, identifying appropriate data repositories, day-to-day data storage, and interacting with data repositories); "more information" (comprising obtaining more information about curation best practices and identifying appropriate data registries and search portals); "metadata" (comprising assigning permanent identifiers to data, creating and publishing descriptions of data, and capturing computational provenance); "funding" (identifying funding sources for curation support); and "planning" (creating data management plans at proposal time). When the survey results are thus categorized, the dominance of storage is clear, with over 80% of respondents requesting some type of storage-related help. (This number may also reflect a general equating of curation with storage on the part of respondents.) Slightly fewer than 50% of respondents requested help related to metadata, a result explored in more detail below.
  
  Categories of data management activities:
  
  storage: backup/archival data storage; identifying appropriate data repositories; day-to-day data storage; interacting with data repositories
  
  more information: obtaining more information about curation best practices; identifying appropriate data registries and search portals
  
  metadata: assigning permanent identifiers to data; creating/publishing descriptions of data; capturing computational provenance
  
  funding: identifying funding sources for curation support
  
  planning: creating data management plans at proposal time
  
  RIT data curation data management categories
3. aculich 30 Jan 2014
  
  in Public
  
  Data management activities, grouped. The data management activities mentioned by the survey can be grouped into five broader categories: "storage" (comprising backup or archival data storage, identifying appropriate data repositories, day-to-day data storage, and interacting with data repositories); "more information" (comprising obtaining more information about curation best practices and identifying appropriate data registries and search portals); "metadata" (comprising assigning permanent identifiers to data, creating and publishing descriptions of data, and capturing computational provenance); "funding" (identifying funding sources for curation support); and "planning" (creating data management plans at proposal time). When the survey results are thus categorized, the dominance of storage is clear, with over 80% of respondents requesting some type of storage-related help. (This number may also reflect a general equating of curation with storage on the part of respondents.) Slightly fewer than 50% of respondents requested help related to metadata, a result explored in more detail below.
  
  Storage is a broad topic and is a very frequently mentioned topic in all of the University-run surveys.
  
  http://www.alexandria.ucsb.edu/~gjanee/dc@ucsb/survey/plots/q4.2.png
  
  Highlight by Chris during today's discussion.
  
  RIT data curation diagram
4. aculich 30 Jan 2014
  
  in Public
  
  Distribution of departments with respect to responsibility spheres. Ignoring the "Myself" choice, consider clustering the parties potentially responsible for curation mentioned in the survey into three "responsibility spheres": "local" (comprising lab manager, lab research staff, and department); "campus" (comprising campus library and campus IT); and "external" (comprising external data repository, external research partner, funding agency, and the UC Curation Center). Departments can then be positioned on a tri-plot of these responsibility spheres, according to the average of their respondents' answers. For example, all responses from FeministStds (Feminist Studies) were in the campus sphere, and thus it is positioned directly at that vertex. If a vertex represents a 100% share of responsibility, then the dashed line opposite a vertex represents a reduction of that share to 20%. For example, only 20% of ECE's (Electrical and Computer Engineering's) responses were in the campus sphere, while the remaining 80% of responses were evenly split between the local and external spheres, and thus it is positioned at the 20% line opposite the campus sphere and midway between the local and external spheres. Such a plot reveals that departments exhibit different characteristics with respect to curatorial responsibility, and look to different types of curation solutions.
  
  This section contains an interesting diagram showing the distribution of departments with respect to responsibility spheres:
  
  http://www.alexandria.ucsb.edu/~gjanee/dc@ucsb/survey/plots/q2.5.png
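  The positioning rule described in the quoted passage amounts to a weighted average of the three vertices. The Python sketch below illustrates it under assumptions of my own: the vertex layout (campus at the top of an equilateral triangle, local and external at the base) and the function name are hypothetical and not taken from the survey report.

```python
import math
from typing import Tuple

# Assumed vertex layout for the tri-plot (not specified in the source).
VERTICES = {
    "local": (0.0, 0.0),
    "external": (1.0, 0.0),
    "campus": (0.5, math.sqrt(3) / 2),
}


def triplot_position(local: float, campus: float, external: float) -> Tuple[float, float]:
    """Map a department's responsibility shares to 2D plot coordinates.

    Shares are normalized to sum to 1, then used as weights on the
    three vertices (barycentric coordinates).
    """
    total = local + campus + external
    if total <= 0:
        raise ValueError("at least one share must be positive")
    weights = {
        "local": local / total,
        "campus": campus / total,
        "external": external / total,
    }
    x = sum(weights[name] * VERTICES[name][0] for name in VERTICES)
    y = sum(weights[name] * VERTICES[name][1] for name in VERTICES)
    return x, y


# Examples from the quoted passage.
feminist_studies = triplot_position(local=0, campus=100, external=0)
ece = triplot_position(local=40, campus=20, external=40)
print(feminist_studies, ece)
```

  With this layout, Feminist Studies lands exactly on the campus vertex, and ECE sits midway between local and external at one fifth of the triangle's height, matching the 20% line described in the text.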
  
  RIT data curation responsibility departments diagram distribution
5. aculich 30 Jan 2014
  
  in Public
  
  In the course of your research or teaching, do you produce digital data that merits curation? 225 of 292 (77%) of respondents answered "yes" to this first question, which corresponds to 25% of the estimated population of 900 faculty and researchers who received the survey.
  
  For those who do not feel they have data that merits curation, I would at least like to hear a description of the kinds of data they have and why they feel it does not need to be curated.
  
  Some people may already be using well-curated data sets. Others may feel their data is not useful to anyone outside their own research group, so they see no need to curate it for anyone else, even though under some definition of "curation" there may be important unmet needs for internal use that are visible only to the grad students or researchers who work with the data hands-on daily.
  
  UPDATE: My question is essentially answered here: https://hypothes.is/a/xBpqzIGTRaGCSmc_GaCsrw
  
  RIT data curation survey question question merit
6. aculich 30 Jan 2014
  
  in Public
  
  Responsibility, myself versus others. It may appear that responses to the question of responsibility are bifurcated between "Myself" and all other parties combined. However, respondents who identified themselves as being responsible were more likely than not to identify additional parties that share that responsibility. Thus, curatorial responsibility is seen as a collaborative effort. (The "Nobody" category is a slight misnomer here as it also includes non-responses to this question.)
  
  This answers my previous question about this survey item:
  
  https://hypothes.is/a/QrDAnmV8Tm-EkDuHuknS2A
  
  RIT data curation survey question responsibility collaborative effort answer
7. aculich 30 Jan 2014
  
  in Public
  
  Awareness of data and commitment to its preservation are two key preconditions for successful data curation.
  
  Great observation!
  
  RIT data curation survey question preconditions awareness commitment
8. aculich 30 Jan 2014
  
  in Public
  
  Which parties do you believe have primary responsibility for the curation of your data? Almost all respondents identified themselves as being personally responsible.
  
  Would those who identify themselves as personally responsible identify themselves (or their group) as the only ones responsible for the data? Or is there a belief that the institution should also be responsible in some way in addition to themselves?
  
  RIT data curation survey question question responsibility
9. aculich 30 Jan 2014
  
  in Public
  
  Availability of the raw survey data is subject to the approval of the UCSB Human Subjects Committee.
  
  http://www.research.ucsb.edu/compliance/human-subjects/
  
  RIT data curation survey data restrictions human subjects
10. aculich 30 Jan 2014
  
  in Public
  
  Survey design The survey was intended to capture as broad and complete a view of data production activities and curation concerns on campus as possible, at the expense of gaining more in-depth knowledge.
  
  Summary of the survey design
  
  RIT data curation summary survey survey design
11. aculich 30 Jan 2014
  
  in Public
  
  Researchers may be underestimating the need for help using archival storage systems and dealing with attendant metadata issues.
  
  In my mind this is a key challenge: even if people can describe what they need for themselves (that in itself is a very hard problem), it is not obvious what to do from the infrastructure standpoint to implement services that aid the individual researcher and also aid collaboration across individuals in the same domain, as well as across domains and institutions, in a long-term sustainable way.
  
  In essence... how do we translate needs that we don't yet fully understand into infrastructure with low barrier to adoption, use, and collaboration?
  
  RIT data curation key challenge question
12. aculich 30 Jan 2014
  
  in Public
  
  Researchers view curation as a collaborative activity and collective responsibility.
  
  RIT data curation collaboration responsibility
13. aculich 30 Jan 2014
  
  in Public
  
  To summarize the survey's findings: Curation of digital data is a concern for a significant proportion of UCSB faculty and researchers. Curation of digital data is a concern for almost every department and unit on campus. Researchers almost universally view themselves as personally responsible for the curation of their data. Researchers view curation as a collaborative activity and collective responsibility. Departments have different curation requirements, and therefore may require different amounts and types of campus support. Researchers desire help with all data management activities related to curation, predominantly storage. Researchers may be underestimating the need for help using archival storage systems and dealing with attendant metadata issues. There are many sources of curation mandates, and researchers are increasingly under mandate to curate their data. Researchers under curation mandate are more likely to collaborate with other parties in curating their data, including with their local labs and departments. Researchers under curation mandate request more help with all curation-related activities; put another way, curation mandates are an effective means of raising curation awareness. The survey reflects the concerns of a broad cross-section of campus.
  
  Summary of survey findings.
  
  RIT data curation survey findings summary
14. aculich 30 Jan 2014
  
  in Public
  
  In 2012 the Data Curation @ UCSB Project surveyed UCSB campus faculty and researchers on the subject of data curation, with the goals of 1) better understanding the scope of the digital curation problem and the curation services that are needed, and 2) characterizing the role that the UCSB Library might play in supporting curation of campus research outputs.
  
  1) better understanding the scope of the digital curation problem and the curation services that are needed
  
  2) characterizing the role that the UCSB Library might play in supporting curation of campus research outputs.
  
  RIT data curation goals
Visit annotations in context

Tags

survey question

restrictions

findings

question

awareness

preconditions

RIT

diagram

responsibility

data curation

merit

survey design

departments

key challenge

collaboration

goals

data management

commitment

categories

survey data

enforcement

mandate

collaborative effort

answer

survey

summary

human subjects

distribution

Annotators

aculich

URL

alexandria.ucsb.edu/~gjanee/dc@ucsb/survey/
en.wikioffuture.org en.wikioffuture.org

Untitled document

2
1. aculich 16 Jan 2014
  
  in Public
  
  The project will develop an analysis package in the open-source language R and complement it with a step-by-step hands-on manual to make tools available to a broad, international user community that includes academics, scientists working for governments and non-governmental organizations, and professionals directly engaged in conservation practice and land management. The software package will be made publicly available under http://www.clfs.umd.edu/biology/faganlab/movement/.
  
  Output of the project:
  
  analysis package written in R
  
  step-by-step hands-on manual
  
  make tools available to a broad, international community
  
  software made publicly available
  
  Question: What software license will be used? The Apache License is potentially a good choice here: it is a strong open source license, supported by a wide range of communities, with few obligations or barriers to access and use, which supports the goal of a broad international audience.
  
  Question: Will the data be made available under a license, as well? Maybe a CC license of some sort?
  
  bioinformatics data software licenses question
2. aculich 16 Jan 2014
  
  in Public
  
  These species represent not only different types of movement (on land, in air, in water) but also different types of relocation data (from visual observations of individually marked animals to GPS relocations to relocations obtained from networked sensor arrays).
  
  Movement types:
  
  land
  
  air
  
  water
  
  Types of relocation data:
  
  visual observations
  
  GPS
  
  networked sensor arrays
  
  bioinformatics movement types relocation data sources
Visit annotations in context

Tags

software

relocation data sources

licenses

bioinformatics

question

movement types

data

Annotators

aculich

URL

en.wikioffuture.org/ABI_Innovation:_Informatics_Tools_for_Population-level_Movement_Dynamics
onlinelibrary.wiley.com onlinelibrary.wiley.com

Untitled document

2
1. aculich 16 Jan 2014
  
  in Public
  
  Once a searchable atlas has been constructed there are fundamentally two approaches that can be used to analyze the data: one visual, the other mathematical.
  
  data analysis visual analysis mathematical analysis
2. aculich 16 Jan 2014
  
  in Public
  
  The initial inputs for deriving quantitative information of gene expression and embryonic morphology are raw image data, either of fluorescent proteins expressed in live embryos or of stained fluorescent markers in fixed material. These raw images are then analyzed by computational algorithms that extract features, such as cell location, cell shape, and gene product concentration. Ideally, the extracted features are then recorded in a searchable database, an atlas, that researchers from many groups can access. Building a database with quantitative graphical and visualization tools has the advantage of allowing developmental biologists who lack specialized skills in imaging and image analysis to use their knowledge to interrogate and explore the information it contains.
  
  1) Initial input is raw image data
  
  2) Feature extraction on raw image data
  
  3) Extracted features stored in shared, searchable database
  
  4) Database available to researchers from many groups
  
  5) Quantitative graphical and visualization tools allow access to those without specialized skill in imaging and image analysis
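  
  The steps summarized above (raw image in, features extracted, features stored in a searchable atlas) can be sketched roughly as follows. This is only an illustration under stated assumptions: the toy "images" are small grids of intensities, the thresholding in `extract_features` is a stand-in for real feature extraction, and the function names, table layout, and threshold are hypothetical rather than taken from the article.

```python
import sqlite3


def extract_features(image, threshold):
    """Toy feature extraction (assumption): centroid, mean intensity and
    pixel count of all pixels at or above the threshold."""
    pixels = [
        (r, c, value)
        for r, row in enumerate(image)
        for c, value in enumerate(row)
        if value >= threshold
    ]
    if not pixels:
        return None
    n = len(pixels)
    return {
        "centroid_row": sum(p[0] for p in pixels) / n,
        "centroid_col": sum(p[1] for p in pixels) / n,
        "mean_intensity": sum(p[2] for p in pixels) / n,
        "area": n,
    }


def build_atlas(images, threshold=0.5):
    """Store one feature record per named image in an in-memory SQLite table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE features ("
        "embryo TEXT, centroid_row REAL, centroid_col REAL, "
        "mean_intensity REAL, area INTEGER)"
    )
    for name, image in images.items():
        feats = extract_features(image, threshold)
        if feats is not None:
            conn.execute(
                "INSERT INTO features VALUES (?, ?, ?, ?, ?)",
                (
                    name,
                    feats["centroid_row"],
                    feats["centroid_col"],
                    feats["mean_intensity"],
                    feats["area"],
                ),
            )
    conn.commit()
    return conn


def query_by_min_intensity(conn, min_intensity):
    """Search the atlas: names of images whose mean intensity meets a minimum."""
    cur = conn.execute(
        "SELECT embryo FROM features WHERE mean_intensity >= ? ORDER BY embryo",
        (min_intensity,),
    )
    return [row[0] for row in cur.fetchall()]
```

  Because the atlas is queried with plain SQL, someone without imaging expertise could search the extracted features without touching the raw images, which is the advantage the quoted passage describes.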
  
  raw image data database atlases access
Visit annotations in context

Tags

data analysis

atlases

visual analysis

access

mathematical analysis

database

raw image data

Annotators

aculich

URL

onlinelibrary.wiley.com/doi/10.1002/wdev.107/full
about.jstor.org about.jstor.org

Untitled document

1
1. aculich 12 Jan 2014
  
  in Public
  
  We regularly provide scholars with access to content for this purpose. Our Data for Research site (http://dfr.jstor.org)
  
  The access to this is exceedingly slow. Note that it is still in beta.
  
  publications research data
Visit annotations in context

Tags

publications

research

data

Annotators

aculich

URL

about.jstor.org/news/jstor-statement-misuse-incident-and-criminal-case
Nov 2013
igor.gold.ac.uk igor.gold.ac.uk

Untitled document

1
1. zukile 09 Nov 2013
  
  in Public
  
  Not even gephi is very good at visualising temporal networks.
  
  Hmm, I disagree. In the version of Gephi everything is cool.
  
  Data Datavisualisation
Visit annotations in context

Tags

Data

Datavisualisation

Annotators

zukile

URL

igor.gold.ac.uk/~so301gb/blog/
