Hypothesis

5 Matching Annotations

May 2020
hackmd.io

Guide - Frictionless Data - Scratch (Plan, Drafting etc) - HackMD

1. irio_datopian 11 May 2020
  
  in Public
  
  Can we do validation in javascript?
  
  That's a really good question. Right now, there are no good data processing tools in JavaScript. We could be the ones to build one, as a key selling point for Frictionless Data in general.
2. irio_datopian 11 May 2020
  
  in Public
  
  There’s also scaling up: pandas is awesome but it doesn’t do everything. Suppose you now want to use some other tool and you want to pass your data across (as csv). You will need to share some basic info about that data e.g. the schema, dialect etc! Wouldn’t it be nice if you could do that in a standard way … and all in one (not 3-4 different objects but one combined object) … enter Table Schema, Data Resource etc.
  
  Really good selling point.
3. irio_datopian 11 May 2020
  
  in Public
  
  What you would like is something that just did type inference or dialect inference but which you then compose with other tools to build more complex functionality (this is the zen of unix again). This is scaling down.
  
  On the other hand, I feel this is a better argument in favor of Data Packages. I would readily recommend that people use Table Schemas all the time and start really simple data processing jobs with a lightweight datapackage-py, migrating to Pandas later when the need arises.
  
  I don't think we have either of these yet:
  
  a really lightweight datapackage-py, without any required dependencies;
  
  an easy way to migrate to Pandas when needed.
  
  We might have the latter, but I haven't seen a good tutorial for it.
4. irio_datopian 11 May 2020
  
  in Public
  
  starting data scientists using pandas to build simple data wrangling pipelines (e.g. load this csv, delete this row etc). This is quick but it also means that you now need to install pandas just to do that (and what happens if you want to start doing integrated testing, you are now installing a very substantial lib every time etc).
  
  I'd say this is not a strong argument against Pandas or in favor of Data Packages. Every data scientist working in Python will already have Pandas installed and will use it all the time. It's already a common pattern in data analysis.
5. irio_datopian 11 May 2020
  
  in Public
  
  monolithicness (bigger issue): pandas is an amazing tool. It also requires numpy just to run.
  
  In that case, if this is one of the issues we want to raise, we should aim not to reproduce it in the Frictionless Data libraries. I'm pretty sure that datapackage-py has multiple optional dependencies that are installed automatically.
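
To make the "scaling down" and "one combined object" points from annotations 2 and 3 concrete, here is a minimal sketch using only the Python standard library. It is not datapackage-py and not the guide's method. The helper names `infer_type` and `describe_csv`, the sample data, and the `fruit.csv` path are all hypothetical, and the type inference is deliberately naive (integer, number, or string only). The descriptor keys (`schema`, `dialect`, `delimiter`, `quoteChar`) follow the Table Schema, CSV Dialect and Tabular Data Resource specs as I understand them.

```python
import csv
import io
import json


def infer_type(values):
    """Naively guess a Table Schema type name for a column of strings."""
    values = [v for v in values if v != ""]  # ignore empty cells
    if not values:
        return "string"
    for type_name, cast in (("integer", int), ("number", float)):
        try:
            for v in values:
                cast(v)
        except ValueError:
            continue  # at least one value does not parse; try the next type
        return type_name
    return "string"


def describe_csv(text, name, path):
    """Build one combined descriptor (dialect + schema) for CSV text."""
    # Dialect inference: restrict candidates so letters are never picked.
    dialect = csv.Sniffer().sniff(text[:1024], delimiters=",;\t|")
    rows = list(csv.reader(io.StringIO(text), dialect))
    header, body = rows[0], rows[1:]
    fields = [
        {"name": col, "type": infer_type([r[i] for r in body if i < len(r)])}
        for i, col in enumerate(header)
    ]
    return {
        "name": name,
        "path": path,  # hypothetical path, for illustration only
        "profile": "tabular-data-resource",
        "dialect": {
            "delimiter": dialect.delimiter,
            "quoteChar": dialect.quotechar,
        },
        "schema": {"fields": fields},
    }


# Hypothetical sample data.
SAMPLE = "id,name,price\n1,apple,0.5\n2,pear,1.25\n"
resource = describe_csv(SAMPLE, name="fruit", path="fruit.csv")
print(json.dumps(resource, indent=2))
```

The resulting dictionary is plain JSON, so it can be handed to another tool alongside the CSV without either side needing pandas or any other dependency.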

Annotators

irio_datopian

URL

hackmd.io/Nhbzig5UTbKBxCNQWMsong
