77 Matching Annotations
  1. Aug 2020
    1. have pretty well-specif

      here!

    2. Explore useful behavior, then nail it down with tests, and only then optimize the heck out of it

      foo!

  2. Aug 2018
  3. Jul 2018
  4. Apr 2018
    1. recruit a cohort of data scientists and others with expertise in areas such as project management, systems engineering, and computer science from the private sector and academia for short-term (1- to 3-year) national service sabbaticals.

      What is the incentive here?

    2. Establish partnerships to allow systems integrators/engineers from the private sector to refine and optimize prototype tools developed in academia to make them efficient, cost-effective, and widely useful for biomedical research.

      The data science ecosystem rewards partnerships but not for refining and optimizing - I take a lot of inspiration from the Python/Jupyter and R/Rstudio ecosystems, which are less focused on efficiency and cost-effectiveness and more focused on building on top of open source. From this perspective, actually supporting significant open source software development efforts would grow the ecosystem substantially. But, conversely, most faculty are not good at writing software or managing software development, so you must provide a career path to support research-integrated computational faculty.

    3. Use appropriate funding mechanism, scientific review, and management for tool development.

      I am not aware of any successful rubrics for this.

  5. Mar 2018
    1. To make the best tools available to the research community, NIH will leverage existing vibrant tool-sharing systems to help establish a more competitive “marketplace” for tool developers and providers than currently exists. By separating the evaluation and funding for tool development and dissemination from support for databases and knowledgebases, innovative new tools and methods should rapidly overtake and supplant older, obsolete ones. The goal of creating a more competitive marketplace, in which open-source programs, workflows, and other applications can be provided directly to users, could also allow direct linkages to key data resources for real-time data analysis.

      The Marketplace analogy seems flawed here. We do not need a place to purchase tools, necessarily; we need a place to discover and evaluate them. The latter cannot be left up to the tool authors, if we want good, honest science to be done.

    2. One of the first steps NIH is taking to modernize the biomedical research data ecosystem is funding the NIH Data Commons pilot:

      Yay w00t

    3. There is currently no general system to transform, or harden, innovative algorithms and tools created by academic scientists into enterprise-ready resources that meet industry standards of ease of use and efficiency of operation.

      I'd be curious to know what these industry standards are.

  6. Feb 2018
    1. ls -a

      ls -l

    2. following

      make sure people are starting from their dc_sample_data/untrimmed_fastq directory

    3. redirect output

      Not yet discussed

    4. usr

      This should be /usr/bin

  7. Mar 2017
  8. Feb 2017
  9. Jan 2017
    1. This is a pretty swanky list of schools: only one of them has "State" in its name.

      Hi Greg,

      doesn't seem like your disqus comments are working, so I’m reverting to hypothesis ;).

      good sentiments as usual, but I'm frustrated by the analysis.

      • I don't know what the word "State" means to you if it’s in the uni name. Michigan State has one of the top ranked micro departments in the country. Conversely, UC Davis is more or less the ag school of the UCs and hence "lower status" in some people's eyes. I agree that the tendency is for “State” universities to be viewed as lower tier than “University” universities, but this isn’t true across the board, nor is a top-tier school necessarily all that research-relevant in any particular area. A better proxy might be department rank.

      • At least two of the Moore DDD folk (me and Ethan) received the Moore DDD Investigator award while at schools that (by your criteria) are less prestigious than our current locations.

      • A bunch of the Moore DDD investigators, as well as the Data Science Environments overall, recruit from wherever - so they're likely to be taking in postdocs with a diversity of experience. (This is certainly true of me.)

      • Data Carpentry and Project Jupyter (two of the Practice grants that I’d guess were invited to send people) were founded in part by people at 'state' schools (Cal State and Michigan State).

      • As Brian McFee pointed out on Twitter, there's likely to be a lot of exchange between these labs and other universities, including teaching intensive universities.

      I'd agree with the post more if a distinction was made between R1 and non-R1, but that introduces another set of complications because I don’t think there are that many postdocs at most non-R1 institutions. Perhaps junior faculty from non-R1 institutions should be invited? I’ve had great interactions with faculty from teaching-intensive universities at some of my workshops and am hoping to work with a much larger group this summer.

      All in all the situation is a lot more complicated than represented here…

      A final thought - if you were the Moore Foundation, and you cared about data science, and you cared about democratization of data science, you could maybe fund some people who could then reach out in whatever ways they felt appropriate to do the kinds of things you’re talking about. Like, say, Data Carpentry, my lab, and Ethan’s lab. Places like the Data Science Environments could support people to become Software Carpentry Instructors then teach hundreds or thousands of people at many institutions about data science. Like, say, a number of people you could name. :)

      —titus

  10. Dec 2016
  11. Sep 2016
    1. ## Project Organization

      Formatting typo.

    2. CSV is good for tabular data, JSON, YAML, or XML for non-tabular data such as graphs5, and HDF5 for certain kinds of structured data.

      grammar is weird in this sentence

    3. the raw data

      It's no longer raw at this point, is it?

    4. Unlike some other guides,

      Suggest eliminating this.

    1. Ten Simple Rules for Making Research Software More Robust

      Missing rules/points:

      • make it open source if possible. It's implicit in a lot of what you say but people will always want to be able to adjust your software in ways you didn't intend.
      • provide a link to a user forum or issue tracker. This is tough to do in some cases b/c you might not want to provide support, but specifying a place that users can go to talk about your software might be useful.
    2. Reuse software (within reason).

      "Build on other software" perhaps?

      I am also not sure this leads to robustness :)

    3. There has been extended discussion over the past few years of the sustainability of research software, but this question is meaningless in isolation: any piece of software can be sustained if its users are willing to put in enough effort. The real equation is the ratio between the skill and effort available, and the ease with which software can be installed, understood, used, maintained, and extended. Following the ten rules we outline here reduce the denominator, and thereby enable researchers to build on each other’s work more easily.

      This seems out of line with the rest of the article, which was about robustness and reuse, which is, to my mind, somewhat separate from sustainability. I would ditch the sustainability argument and summarize according to the original intent of this paper...

    4. Many

      For example, ... (also, not limited to biology :)

    5. users will want to be able to turn off the feature entirely to have a baseline comparison.

      I don't entirely understand this. Parameter == feature?

    6. To strike a balance between these two, developers should document all of the packages that theirs depends on, preferably in a machine-readable form

      Typically this is something that a user could also use (via pip, apt-get, whatever) instead of having the list of software in README. Doesn't this get a bit redundant?

    7. Unfortunately, the interface between two software packages can be a source of considerable frustration. Support requests descend into debugging errors produced by the other project.

      ...hence semantic versioning, suggested above!

    8. Using popular projects reduces the amount of code that needs to be maintained and adds the strength of vetted software to the final program.

      "Popular" doesn't always equate to "vetted".

    9. the

      "and the results"

    10. Programs

      Maybe "A program's authors"?

    11. Github

      GitHub (capital H)

    12. Sourceforge

      No.

    13. Bitbutcket

      typo

    14. The README file for [khmer]2 is a good model:

      thank you!

    15. If the dependency must remain closed, place it at an internal location on shared disk, remove all write permissions, and link to it from your README, although this method is discouraged because of the potential for sharing and risks accidental removal.

      I don't really understand what this means. This is for software that will be used internally only, or...?

    16. Often, multiple libraries exist with the same or very similar names,

      They do?

    17. This is entirely reasonable as long as it is properly documented.

      It is?

    18. Introduction

      My overall sense of this Introduction is that you are primarily preaching to the choir - only people who are already convinced that this is a problem will buy. While I personally appreciate the op-ed it's not going to be convincing to a broader audience. That may be OK, but I think it could be made about half as long and twice as convincing by focusing on integrating the citations you provide into a coherent, well-supported argument.

    19. That said, not every coding effort requires such rigor:

      This does not follow from previous paragraph. In fact, I'm not sure what "such rigor" refers to?

    20. many duplicated efforts

      But there are other reasons for duplication, including the drive to publish, no? It's not all about robustness. In fact, I think it would be bad for science if there was only one of each tool.

    21. Often, that software will be undocumented and work in unexpected ways (if it works at all). It will often rely on nonexistent paths or resources, be tuned for a single dataset, or simply be an older version than was used in published papers. The new user is then faced with two unpalatable options: hack the existing code to make it work, or start over. Being unable to

      Some citations to support this would be useful. How big a problem is this? Do we know?

    22. Everyone with a few years of experience feels a tremor of fear when told to use a graduated student’s code to analyze their data.

      citation?

  12. Feb 2016
    1. For example, if you're reading this blog post in a hypothes.is compatible browser (try Firefox or Google Chrome), you should see this sentence highlighted; click on it and you should see an annotation window pop out with some text in it.

      Hi, mom! Look, I'm attaching arbitrary text, with tags, to some content somewhere on the Internet!

  13. Dec 2015
  14. Nov 2015
    1. Topics

      Code coverage is omitted entirely but is incredibly useful for targeting new tests.

    2. simply that software cannot be called scientific unless it has been validated.

      So, I put in a single test, and my software is validated - right? Sarcasm aside, I think you should provide some additional perspective on what "validated" means here.

      One thought might be to include a mention of code coverage - if 80% of your code isn't run by any automated tests, then you really have no expectation as to whether that code is correct. Or some such statement.
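      To make the coverage point concrete, here is a minimal sketch (function and test names are invented for illustration): a single passing test can "validate" a function while never executing one of its branches.

      ```python
      # Hypothetical function with two branches.
      def classify(x):
          if x >= 0:
              return "non-negative"
          return "negative"  # never executed by the test below

      def test_classify():
          # One passing test, yet only the first branch is covered.
          assert classify(5) == "non-negative"
      ```

      Running this under a coverage tool (e.g. `coverage run -m pytest` followed by `coverage report`) would flag the `return "negative"` line as unexecuted, so we have no automated evidence that branch is correct.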

    3. fix a bug a second time

      "reintroduce an old bug"?

    4. If the software that governs the computational or physical experiment is wrong, then disasters (such as false claims in a publication) will result.

      This is a little too strong. Scientifically speaking, if you're relying on the output of your software as the sole basis for a claim, you should seek confirmation from elsewhere before believing it!

    1. This issue is further complicated by recent decisions not to give governments backdoor access to online properties to allow law enforcement — with proper legal instruments — to analyze user content. These decisions must be reconsidered and debated in view of evolving threats.

      Given that the Paris attackers operated in the open, I am confused by this comment.

      https://theintercept.com/2015/11/18/signs-point-to-unencrypted-communications-between-terror-suspects/

      Also note that many of the attackers were already known to law enforcement:

      https://www.washingtonpost.com/world/middle_east/who-were-the-paris-attackers-many-crossed-officials-radar/2015/11/23/982a1e5e-91e8-11e5-befa-99ceebcbb272_story.html

      So how would giving law enforcement more access to communications have helped here?

    1. “When you see something that is technically sweet, you go ahead and do it, and you argue about what to do about it only after you have had your technical success.”

      Never heard this quote before, but it's great!

    2. When a distinguished but elderly scientist states that something is possible, he is almost certainly right. When he states that something is impossible, he is very probably wrong.”

      One of my favorite quotes!

    3. “An ultraintelligent machine could design even better machines,” he wrote. “There would then unquestionably be an ‘intelligence explosion,’ and the intelligence of man would be left far behind. Thus the first ultraintelligent machine is the last invention that man need ever make, provided that the machine is docile enough to tell us how to keep it under control.

      An irreversible step.

    4. It is not far-fetched to suppose that there might be some possible technology which is such that (a) virtually all suffi­ciently advanced civilizations eventually discover it and (b) its discovery leads almost universally to existential disaster.”

      This is a pretty good point - but then why would we not see AI-based civilizations?

    1. Collaboration

      Missing: a simple example/test data set to run as a smoke test ("plug it in; does smoke emerge?") This is where my unit tests usually start anyway.
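      A smoke test of this kind can be tiny: run the tool on a small bundled input and check that something sensible comes out. A sketch, with the pipeline and data invented for illustration:

      ```python
      def run_pipeline(records):
          # Stand-in for the real tool: count records per label.
          counts = {}
          for label, _value in records:
              counts[label] = counts.get(label, 0) + 1
          return counts

      def test_smoke():
          # Tiny bundled "data set": plug it in, does smoke emerge?
          sample = [("a", 1), ("b", 2), ("a", 3)]
          assert run_pipeline(sample) == {"a": 2, "b": 1}
      ```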

    2. ermissive license (e.g., MIT/BSD/Apache) for software

      We're over GPL, eh? What about mentioning "OSI license" instead of just these three?

    3. And a notes.txt file containing the to-do list

      What the heck is a notes.txt? I've never seen this convention mentioned before :)

    4. Use data structures instead of variables like score1, score2, score3, etc.

      Either you're talking about 'scores' as a list, or composite data structures (in which case score1, score2, score3 don't make sense as an example) - which? Both? Neither?
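      The two readings lead to different code; a sketch of each (illustrative only):

      ```python
      # Reading 1: a list replaces the numbered variables entirely.
      scores = [88, 92, 75]  # instead of score1, score2, score3
      average = sum(scores) / len(scores)

      # Reading 2: a composite structure, where numbered names make
      # even less sense as the motivating example.
      players = [
          {"name": "alice", "score": 88},
          {"name": "bob", "score": 92},
      ]
      best = max(players, key=lambda p: p["score"])
      ```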

    1. “Now people aren’t talking on Facebook and Twitter,” Ms. Khalil, now a private consultant, said. “They are communicating through apps that are encrypted, and it’s very difficult for law enforcement, because even when we get warrants and subpoenas, we still can’t get information. Unless there are certain legislative changes made, we’ll have to go back to the basics of human intelligence collection.
    1. This workshop was actually cancelled

      This should show up only on October site.

    2. Sticky notes and how they work... + Minute Cards

      This should show up on both May and October sites.

    3. This workshop was given on May 4th and 5th, 2015,

      This is a comment that will only show up on the May site!

    1. The separate repository,

      Testing highlighting.

    2. Testing! This is a test of the emergency annotating system.