Can we do validation in JavaScript?
That's a really good question. Right now, there are no good data processing tools in JavaScript. We could be the ones to build them, as a key selling point for Frictionless Data in general.
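To make "validation in JavaScript" concrete, here is a minimal TypeScript sketch of checking CSV rows against a Table Schema-style list of fields. It assumes rows are already split into string cells, supports only four field types and the `required` constraint, and the helpers `castable` and `validateRow` are hypothetical names. This is not the API of any existing Frictionless library.

```typescript
// Minimal subset of Table Schema field types (assumption: only these four).
type FieldType = "string" | "integer" | "number" | "boolean";

interface Field {
  name: string;
  type: FieldType;
  constraints?: { required?: boolean };
}

interface Schema {
  fields: Field[];
}

// Hypothetical helper: can a raw CSV cell be read as the given type?
function castable(value: string, type: FieldType): boolean {
  switch (type) {
    case "string":
      return true;
    case "integer":
      return /^[+-]?\d+$/.test(value);
    case "number":
      // Number(" ") is 0, so reject blank strings explicitly.
      return value.trim() !== "" && !Number.isNaN(Number(value));
    case "boolean":
      return ["true", "false"].includes(value.toLowerCase());
  }
}

// Hypothetical validator: checks one row against the schema and returns
// human-readable error messages (an empty array means the row is valid).
function validateRow(row: string[], schema: Schema): string[] {
  const errors: string[] = [];
  if (row.length !== schema.fields.length) {
    errors.push(`expected ${schema.fields.length} cells, got ${row.length}`);
    return errors;
  }
  schema.fields.forEach((field, i) => {
    const cell = row[i];
    if (cell === "") {
      if (field.constraints?.required) {
        errors.push(`${field.name}: value is required`);
      }
      return; // empty, non-required cells are treated as missing values
    }
    if (!castable(cell, field.type)) {
      errors.push(`${field.name}: "${cell}" is not a valid ${field.type}`);
    }
  });
  return errors;
}
```

Collecting every error for a row, instead of stopping at the first, mirrors what a validation report would typically want to show.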
There’s also scaling up: pandas is awesome, but it doesn’t do everything. Suppose you now want to use some other tool and pass your data across to it (as CSV). You will need to share some basic information about that data, e.g. the schema and the dialect. Wouldn’t it be nice if you could do that in a standard way, and all in one (not 3-4 different objects but one combined object)? Enter Table Schema, Data Resource, etc.
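As an illustration of the "one combined object" idea, the sketch below builds a single Tabular Data Resource descriptor that carries the file location, the CSV dialect and the Table Schema together, then serializes it to JSON so it can travel alongside the CSV. The property names follow the Frictionless Tabular Data Resource, CSV Dialect and Table Schema specs; the file `people.csv` and its fields are made up for the example.

```typescript
// Hypothetical descriptor for a made-up file "people.csv".
// Property names follow the Tabular Data Resource, CSV Dialect and Table Schema specs.
const resource = {
  profile: "tabular-data-resource",
  name: "people",
  path: "people.csv",
  format: "csv",
  // CSV Dialect: how to parse the file
  dialect: {
    delimiter: ",",
    quoteChar: '"',
    header: true,
  },
  // Table Schema: what the columns mean
  schema: {
    fields: [
      { name: "id", type: "integer", constraints: { required: true } },
      { name: "name", type: "string" },
      { name: "active", type: "boolean" },
    ],
    primaryKey: ["id"],
  },
};

// Serialize once; any tool that understands the specs can read the same JSON.
const descriptorJson: string = JSON.stringify(resource, null, 2);
console.log(descriptorJson);
```

The receiving tool then needs to read only one JSON document to know both how to parse the CSV and how to interpret its columns.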
Really good selling point.
What you would like is something that just does type inference or dialect inference, which you then compose with other tools to build more complex functionality (this is the zen of Unix again). This is scaling down.
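A single-purpose type-inference step could look roughly like the TypeScript sketch below: it does one thing (guess a Table Schema-style type per column from sample rows) and emits a plain object that another tool, such as a validator, can consume. The four candidate types, their ordering, and the names `inferType` and `inferSchema` are assumptions for illustration only.

```typescript
// Candidate types, ordered from most to least specific (assumption: only these four).
type FieldType = "integer" | "number" | "boolean" | "string";

const matchers: Array<[FieldType, (v: string) => boolean]> = [
  ["integer", (v) => /^[+-]?\d+$/.test(v)],
  ["number", (v) => v.trim() !== "" && !Number.isNaN(Number(v))],
  ["boolean", (v) => /^(true|false)$/i.test(v)],
  ["string", () => true],
];

// Hypothetical: pick the most specific type that matches every non-empty value.
function inferType(values: string[]): FieldType {
  const present = values.filter((v) => v !== "");
  if (present.length === 0) return "string"; // nothing to go on: fall back to string
  for (const [type, matches] of matchers) {
    if (present.every(matches)) return type;
  }
  return "string";
}

// Hypothetical: infer a minimal Table Schema-like object from a header and sample rows.
function inferSchema(
  header: string[],
  rows: string[][],
): { fields: { name: string; type: FieldType }[] } {
  return {
    fields: header.map((name, i) => ({
      name,
      type: inferType(rows.map((r) => r[i] ?? "")),
    })),
  };
}

// Emit plain JSON so the result can be piped into whatever tool comes next.
console.log(JSON.stringify(inferSchema(["id", "score"], [["1", "2.5"], ["2", "3"]])));
```

Keeping inference separate from validation and parsing is what lets each piece be swapped out or composed independently.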
On the other hand, I feel this is a better argument in favor of Data Packages. I would readily recommend that people use Table Schemas all the time and start really simple data processing jobs with a lightweight datapackage-py, then migrate to pandas later, when it's needed.
I don't think we have these yet:
We might have the latter, but I haven't seen a good tutorial for it.
beginner data scientists using pandas to build simple data wrangling pipelines (e.g. load this CSV, delete this row). This is quick, but it also means that you now need to install pandas just to do that (and if you want to start doing integration testing, you are now installing a very substantial library every time).
I'd say that this is not a strong reason against pandas or in favor of Data Packages. Practically every data scientist working in Python already has pandas installed and uses it all the time. It's already a common pattern in data analysis.
monolithicness (bigger issue): pandas is an amazing tool, but it also requires NumPy just to run.
If this is one of the issues we want to address, we should aim not to reproduce it in the Frictionless Data libraries. I'm pretty sure that datapackage-py has multiple optional dependencies that are installed automatically.