3,479 Matching Annotations
  1. Apr 2020
    1. Data Erasure and Storage Time The personal data of the data subject will be erased or blocked as soon as the purpose of storage ceases to apply. The data may be stored beyond that if the European or national legislator has provided for this in EU regulations, laws or other provisions to which the controller is subject. The data will also be erased or blocked if a storage period prescribed by the aforementioned standards expires, unless there is a need for further storage of the data for the conclusion or performance of a contract.
    2. The data is stored in log files to ensure the functionality of the website. In addition, the data serves us to optimize the website and to ensure the security of our information technology systems. An evaluation of the data for marketing purposes does not take place in this context. The legal basis for the temporary storage of the data and the log files is Art. 6 para. 1 lit. f GDPR. Our legitimate interests lie in the above-mentioned purposes.
    3. The temporary storage of the IP address by the system is necessary to enable the website to be delivered to the user's computer. For this the IP address of the user must remain stored for the duration of the session.
    4. In the case of data stored in log files, deletion takes place after seven days at the latest. Further storage is possible; in that case, the IP addresses of the users are erased or anonymized so that the requesting client can no longer be identified.
    5. The collection of the data for the provision of the website and the storage of the data in log files is absolutely necessary for the operation of the website. Consequently, there is no possibility of objection on the part of the user.
    6. The legal basis for the processing of personal data using cookies is Art. 6 para. 1 lit. f GDPR. Our legitimate interests lie in the above-mentioned purposes.
    1. Someone, somewhere has screwed up to the extent that data got hacked and is now in the hands of people it was never intended to be. No way, no how does this give me license to then treat that data with any less respect than if it had remained securely stored and I reject outright any assertion to the contrary. That's a fundamental value I operate under
    1. A "breach" is an incident where data is inadvertently exposed in a vulnerable system, usually due to insufficient access controls or security weaknesses in the software.
    1. One mistake that we made when creating the import/export experience for Blogger was relying on one HTTP transaction for an import or an export. HTTP connections become fragile when the size of the data that you're transferring becomes large. Any interruption in that connection voids the action and can lead to incomplete exports or missing data upon import. These are extremely frustrating scenarios for users and, unfortunately, much more prevalent for power users with lots of blog data.
    2. data liberation is best provided through APIs, and data portability is best provided by building code using those APIs to perform cloud-to-cloud migration
    3. The data liberation effort focuses specifically on data that could hinder users from switching to another service or competing product—that is, data that users create in or import into Google products
    4. At Google, our attitude has always been that users should be able to control the data they store in any of our products, and that means that they should be able to get their data out of any product. Period. There should be no additional monetary cost to do so, and perhaps most importantly, the amount of effort required to get the data out should be constant, regardless of the amount of data. Individually downloading a dozen photos is no big inconvenience, but what if a user had to download 5,000 photos, one at a time, to get them out of an application? That could take weeks of their time.
    5. Want to keep your users? Just make it easy for them to leave.
    6. Users are starting to realize, however, that as they store more and more of their personal data in services that are not physically accessible, they run the risk of losing vast swaths of their online legacy if they don't have a means of removing their data.
    1. The second way we avoid locking you into 1Password is through the ability to export data to a more neutral format. Not all versions are yet where we want them to be with respect to export, and we’re working on that. But there is usually some path, if not always a simple click away, to export your 1Password data.
    1. In addition to strings, Redis supports lists, sets, sorted sets, hashes, bit arrays, and hyperloglogs. Applications can use these more advanced data structures to support a variety of use cases. For example, you can use Redis Sorted Sets to easily implement a game leaderboard that keeps a list of players sorted by their rank.

      Redis supports more data structures; memcached is key-value only.

      Memcached is also not highly available, because it lacks the replication support that Redis has.
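
      A minimal sketch of the leaderboard idea from the quote, using the redis-py client in Python (it assumes a Redis server on localhost; the key and member names are made up):

      # pip install redis
      import redis

      r = redis.Redis(host="localhost", port=6379, decode_responses=True)

      # A sorted set keeps members ordered by score, so the leaderboard sorts itself.
      r.zadd("leaderboard", {"alice": 3200, "bob": 2750, "carol": 4100})
      r.zincrby("leaderboard", 150, "bob")  # bob scores some points

      # Top three players, highest score first.
      print(r.zrevrange("leaderboard", 0, 2, withscores=True))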

    1. “Big Data reveals insights with a particular range of data points, while Thick Data reveals the social context of and connections between data points.”21
    2. Adapting Tricia Wang’s notion of “thick data,”

      https://dscout.com/people-nerds/people-are-your-data-tricia-wang So what is thick data? It’s a term you coined, right? "Thick data is simply my re-branding of qualitative data, or design thinking, or user research, gut intuition, tribal knowledge, sensible thinking—I’ve heard it called so many things. Being open to the unknown means you will embrace data that is thick. Ostensibly it’s data that has yet to be quantified, is data that you may not even know that you need to collect and you don’t know it until you’re in the moment being open to it, and open to the invitation of the unknown in receiving that."

    3. User subjects and data objects are treated as programmable matter, which is to say extractable matter.

      yes.

    1. Covid-19 is an emergency on such a huge scale that, if anonymity is managed appropriately, internet giants and social media platforms could play a responsible part in helping to build collective crowd intelligence for social good, rather than profit
    2. Google's move to release location data highlights concerns around privacy. According to Mark Skilton, director of the Artificial Intelligence Innovation Network at Warwick Business School in the UK, Google's decision to use public data "raises a key conflict between the need for mass surveillance to effectively combat the spread of coronavirus and the issues of confidentiality, privacy, and consent concerning any data obtained."
  2. Mar 2020
    1. legitimate interest triggers when “processing is necessary for the purposes of the legitimate interests pursued by the controller or by a third party, except where such interests are overridden by the interests or fundamental rights and freedoms of the data subject
    2. of the six lawful, GDPR-compliant ways companies can get the green light to process individual personal data, consent is the “least preferable.” According to guidelines in Article 29 Working Party from the European Commission, "a controller must always take time to consider whether consent is the appropriate lawful ground for the envisaged processing or whether another ground should be chosen instead." 
    3. “But, if you’re unsure or haven’t mapped out entirely your processing activities,” he said, “it’s impossible to accurately reflect what your users or clients are consenting to when they complete a consent request.”
    4. “It is unfortunate that a lot of companies are blindly asking for consent when they don’t need it because they have either historically obtained the consent to contact a user,” said digital policy consultant Kristina Podnar. “Or better yet, the company has a lawful basis for contact. Lawful basis is always preferable to consent, so I am uncertain why companies are blindly dismissing that path in favor of consent.”
    1. Data has become a “natural resource” for advertising technology. “And, just as with every other precious resource, we all bear responsibility for its consumption,”
    2. They can form a new company that handles all operations within the EU but nowhere else. This subsidiary company can license and segregate European data from the parent company,
    3. To join the Privacy Shield Framework, a U.S.-based organization is required to self-certify to the Department of Commerce and publicly commit to comply with the Framework’s requirements. While joining the Privacy Shield is voluntary, the GDPR goes far beyond it.
    1. This protects you from government overreach, and protects your privacy, and makes identity theft a lot harder.
    2. For example, personal information. Each governmental ministry might have any amount of information on a given voter. Finance knows about your tax, health knows about your medicare claims, justice knows about your criminal record, the RTA the status of your drivers licence. But they don’t know what the OTHER ministries know about you. The data is segregated. Just because one group of people knows about a given part of your life or existence, doesn’t mean that they can get access to everything else. You, by design, do not have one single file that has everything on you. The data is segregated.
    1. Users have the right to obtain (in a machine readable format) their personal data for the purpose of transferring it from one controller to another, without being prevented from doing so by the data processor.
    2. Users have the right to access their personal data and information about how their personal data is being processed. If the user requests it, data controllers must provide an overview of the categories of data being processed, a copy of the actual data and details about the processing. The details should include the purpose, how the data was acquired and with whom it was shared.
    3. The Right to access is closely linked to the Right to data portability, but these two rights are not the same.
    1. Data privacy now a global movement: We’re pleased to say we’re not the only ones who share this philosophy, web browsing companies like Brave have made it possible so you can browse the internet more privately; and you can use a search engine like DuckDuckGo to search with the freedom you deserve.
    2. our values remain the same – advocating for 100% data ownership, respecting user-privacy, being reliable and encouraging people to stay secure. Complete analytics, that’s 100% yours.
    3. Due to that, you have 100% data ownership as Matomo is hosted on your own servers and we have absolutely no way of gaining access to your data.

      Technically impossible for them to get your data if the data doesn't pass through them at all.

    4. when you choose Matomo Cloud, we acknowledge in our Terms that you own all rights, titles, and interest to your users’ data. We obtain no rights from you to your users’ data. This means we can’t on-sell it to third parties, we can’t claim ownership of it, and you can export your data anytime

      Technically impossible for them to sell your data if the data doesn't pass through them at all.

    5. the privacy of your users is respected
    6. The relationship is between the website owner (you) and the visitor, with no external sources looking in
  3. www.graphitedocs.com
    1. Own Your Encryption Keys: You would never trust a company to keep a record of your password for use anytime they want. Why would you do that with your encryption keys? With Graphite, you don't have to. You own and manage your keys so only YOU can decrypt your content.
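
      The quote describes client-side encryption with user-held keys. A minimal sketch of that general idea in Python, using the cryptography library's Fernet; this only illustrates the concept, it is not Graphite's actual implementation:

      # pip install cryptography
      from cryptography.fernet import Fernet

      # The key is generated and kept by the user; the service never sees it.
      key = Fernet.generate_key()
      f = Fernet(key)

      # Only ciphertext would be stored on the server...
      ciphertext = f.encrypt(b"my private document")

      # ...and only the holder of the key can read it back.
      assert f.decrypt(ciphertext) == b"my private document"
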
    1. it would appear impossible to require a publisher to provide information on and obtain consent for the installation of cookies on his own website also with regard to those installed by “third parties”
    2. Our solution goes a bit further than this by pointing to the browser options, third-party tools and by linking to the third party providers, who are ultimately responsible for managing the opt-out for their own tracking tools.
    3. You are also not required to manage consent for third-party cookies directly on your site/app as this responsibility falls to the individual third-parties. You are, however, required to at least facilitate the process by linking to the relevant policies of these third-parties.
    4. the publisher would be required to check, from time to time, that what is declared by the third parties corresponds to the purposes they are actually aiming at via their cookies. This is a daunting task because a publisher often has no direct contacts with all the third parties installing cookies via his website, nor does he/she know the logic underlying the respective processing.
    5. When you think about data law and privacy legislation, cookies easily come to mind as they’re directly related to both. This often leads to the common misconception that the Cookie Law (ePrivacy Directive) has been repealed by the General Data Protection Regulation (GDPR), which in fact it has not. You can instead think of the ePrivacy Directive and GDPR as working together and complementing each other, where, in the case of cookies, the ePrivacy Directive generally takes precedence.
    1. Decision point #2 – Do you send any data to third parties, directly or inadvertently? [Flowchart: GDPR cookie consent] Remember, inadvertently transmitting data to third parties can occur through the plugins you use on your website. You don't necessarily have to be doing this proactively. If the answer is “Yes,” then to comply with GDPR, you should use a cookie consent popup.
    1. You must clearly identify each party that may collect, receive, or use end users’ personal data as a consequence of your use of a Google product. You must also provide end users with prominent and easily accessible information about that party’s use of end users’ personal data.
    1. GDPR introduces a list of data subjects’ rights that should be obeyed by both data processors and data collectors. The list includes: Right of access by the data subject (Section 2, Article 15). Right to rectification (Section 3, Art 16). Right to object to processing (Section 4, Art 21). Right to erasure, also known as ‘right to be forgotten’ (Section 3, Art 17). Right to restrict processing (Section 3, Art 18). Right to data portability (Section 3, Art 20).
    1. An example of reliance on legitimate interests includes a computer store, using only the contact information provided by a customer in the context of a sale, serving that customer with direct regular mail marketing of similar product offerings — accompanied by an easy-to-select choice of online opt-out.
    1. This is no different where legitimate interests applies – see the examples below from the DPN. It should also be made clear that individuals have the right to object to processing of personal data on these grounds.
    2. Individuals can object to data processing for legitimate interests (Article 21 of the GDPR) with the controller getting the opportunity to defend themselves, whereas where the controller uses consent, individuals have the right to withdraw that consent and the ‘right to erasure’. The DPN observes that this may be a factor in whether companies rely on legitimate interests.

      .

    1. Legitimate Interest may be used for marketing purposes as long as it has a minimal impact on a data subject’s privacy and it is likely the data subject will not object to the processing or be surprised by it.
    1. 15. Data sharing: practices and needs of the transport research community. In view of the growing importance of data sharing in transport research and based on recommendations of a dedicated expert group on the Transport Research Cloud, a new study will establish: 1. What are the needs and objections of transport researchers in relation to data sharing; 2. What are the training requirements needed for the transport research community to facilitate data sharing; and 3. What potential user communities would expect from a Transport Research Cloud? Type of Action: Provision of technical/scientific services by the Joint Research Centre. Indicative timetable: Second quarter of 2020. Indicative budget: EUR 0.20 million from the 2020 budget.
    1. multiple scandals have highlighted some very shady practices being enabled by consent-less data-mining — making both the risks and the erosion of users’ rights clear
    2. Earlier this year it began asking Europeans for consent to processing their selfies for facial recognition purposes — a highly controversial technology that regulatory intervention in the region had previously blocked. Yet now, as a consequence of Facebook’s confidence in crafting manipulative consent flows, it’s essentially figured out a way to circumvent EU citizens’ fundamental rights — by socially engineering Europeans to override their own best interests.
    3. The deceitful obfuscation of commercial intention certainly runs all the way through the data brokering and ad tech industries that sit behind much of the ‘free’ consumer Internet. Here consumers have plainly been kept in the dark so they cannot see and object to how their personal information is being handed around, sliced and diced, and used to try to manipulate them.
    4. design choices are being selected to be intentionally deceptive. To nudge the user to give up more than they realize. Or to agree to things they probably wouldn’t if they genuinely understood the decisions they were being pushed to make.
    1. sleep Student's Sleep Data

      Data which show the effect of two soporific drugs (increase in hours of sleep compared to control) on 10 patients.

    2. Pulse Pulse Rates and Exercise

      Students in a Stat2 class recorded resting pulse rates (in class), did three "laps" walking up/down a nearby set of stairs, and then measured their pulse rate after the exercise. They provided additional information about height, weight, exercise, and smoking habits via a survey.

    3. HSAUR agefat Total Body Composition Data

      Dataset used for KSB

    1. Consent is one of six lawful grounds for processing data. It may be arguable that anti-spam measures such as reCaptcha can fall under "legitimate interests" (ie you don't need to ask for consent)
    1. A 1% sample of AddThis Data (“Sample Dataset”) is retained for a maximum of 24 months for business continuity purposes.
    1. Not only are public transport datasets useful for benchmarking route planning systems, they are also highly useful for benchmarking geospatial [13, 14] and temporal [15, 16] RDF systems due to the intrinsic geospatial and temporal properties of public transport datasets. While synthetic dataset generators already exist in the geospatial and temporal domain [17, 18], no systems exist yet that focus on realism, and specifically look into the generation of public transport datasets. As such, the main topic that we address in this work, is solving the need for realistic public transport datasets with geospatial and temporal characteristics, so that they can be used to benchmark RDF data management and route planning systems. More specifically, we introduce a mimicking algorithm for generating realistic public transport data, which is the main contribution of this work.
    1. Right now, if you want to know what data Facebook has about you, you don’t have the right to ask them to give you all of the data they have on you, and the right to know what they’ve done with it. You should have that right. You should have the right to know and have access to your data.
    1. Good data privacy practices by companies are good for the world. We wake up every day excited to change the world and keep the internet that we all know and love safe and transparent.
    2. startup focused on creating transparency in data. All that stuff you keep reading about the shenanigans with companies mishandling people's data? That's what we are working on fixing.
  4. Feb 2020
    1. Research ethics concerns issues, such as privacy, anonymity, informed consent and the sensitivity of data. Given that social media is part of society’s tendency to liquefy and blur the boundaries between the private and the public, labour/leisure, production/consumption (Fuchs, 2015a: Chapter 8), research ethics in social media research is particularly complex.
    2. One important aspect of critical social media research is the study of not just ideologies of the Internet but also ideologies on the Internet. Critical discourse analysis and ideology critique as research method have only been applied in a limited manner to social media data. Majid KhosraviNik (2013) argues in this context that ‘critical discourse analysis appears to have shied away from new media research in the bulk of its research’ (p. 292). Critical social media discourse analysis is a critical digital method for the study of how ideologies are expressed on social media in light of society’s power structures and contradictions that form the texts’ contexts.
    3. It has, for example, been common to study contemporary revolutions and protests (such as the 2011 Arab Spring) by collecting large amounts of tweets and analysing them. Such analyses can, however, tell us nothing about the degree to which activists use social and other media in protest communication, what their motivations are to use or not use social media, what their experiences have been, what problems they encounter in such uses and so on. If we only analyse big data, then the one-sided conclusion that contemporary rebellions are Facebook and Twitter revolutions is often the logical consequence (see Aouragh, 2016; Gerbaudo, 2012). Digital methods do not outdate but require traditional methods in order to avoid the pitfall of digital positivism. Traditional sociological methods, such as semi-structured interviews, participant observation, surveys, content and critical discourse analysis, focus groups, experiments, creative methods, participatory action research, statistical analysis of secondary data and so on, have not lost importance. We do not just have to understand what people do on the Internet but also why they do it, what the broader implications are, and how power structures frame and shape online activities
    4. Challenging big data analytics as the mainstream of digital media studies requires us to think about theoretical (ontological), methodological (epistemological) and ethical dimensions of an alternative paradigm

      Making the case for the need for digitally native research methodologies.

    5. Who communicates what to whom on social media with what effects? It forgets users’ subjectivity, experiences, norms, values and interpretations, as well as the embeddedness of the media into society’s power structures and social struggles. We need a paradigm shift from administrative digital positivist big data analytics towards critical social media research. Critical social media research combines critical social media theory, critical digital methods and critical-realist social media research ethics.
    6. de-emphasis of philosophy, theory, critique and qualitative analysis advances what Paul Lazarsfeld (2004 [1941]) termed administrative research, research that is predominantly concerned with how to make technologies and administration more efficient and effective.
    7. Big data analytics’ trouble is that it often does not connect statistical and computational research results to a broader analysis of human meanings, interpretations, experiences, attitudes, moral values, ethical dilemmas, uses, contradictions and macro-sociological implications of social media.
    8. Such funding initiatives privilege quantitative, computational approaches over qualitative, interpretative ones.
    9. There is a tendency in Internet Studies to engage with theory only on the micro- and middle-range levels that theorize single online phenomena but neglect the larger picture of society as a totality (Rice and Fuller, 2013). Such theories tend to be atomized. They just focus on single phenomena and miss society’s big picture
    1. Data extraction is a piece of a larger puzzle called data integration (getting the data you want to a single place, from different systems, the way you want it), which people have been working on since the early 1980s.

      definition

  5. Jan 2020
    1. Sociedad Quimica y Minera de Chile (NYSE:SQM): Known as SQM, this company also extracts potash from the Atacama Desert and is Chile’s largest fertilizer producer.

      Potassium mines in 3rd world countries.

    1. Annotation extends that power to a web made not only of linked resources, but also of linked segments within them. If the web is a loom on which applications are woven, then annotation increases the thread count of the fabric. Annotation-powered applications exploit the denser weave by defining segments and attaching data or behavior to them.

      I remember the first time I truly understood what Jon meant when he said this. One web page can have an unlimited number of specific addresses pointing into its parts--and through annotation these parts can be connected to an unlimited number of parts of other things. Jon called it: Exploding the web! How far we've come from Vannevar Bush's musings...

    1. The Web Annotation Data Model specification describes a structured model and format to enable annotations to be shared and reused across different hardware and software platforms.

      The publication of this web standard changed everything. I look forward to true testing of interoperable open annotation. The publication of the standard nearly three years ago was a game changer, but the game is still in progress. The future potential is unlimited!
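
      For concreteness, here is roughly what a minimal annotation looks like in the Web Annotation Data Model (JSON-LD), built with Python's standard json module; the identifier, source URL and quoted text are placeholders:

      import json

      annotation = {
          "@context": "http://www.w3.org/ns/anno.jsonld",
          "id": "http://example.org/anno1",          # placeholder identifier
          "type": "Annotation",
          "body": {
              "type": "TextualBody",
              "value": "A comment about the selected passage.",
              "format": "text/plain",
          },
          "target": {
              "source": "http://example.org/page1",  # the annotated resource
              "selector": {                          # the segment within it
                  "type": "TextQuoteSelector",
                  "exact": "linked segments within them",
              },
          },
      }

      print(json.dumps(annotation, indent=2))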

    1. A final word: when we do not understand something, it does not look like there is anything to be understood at all - it just looks like random noise. Just because it looks like noise does not mean there is no hidden structure.

      Excellent statement! Could this be the guiding principle of the current big data boom in biology?

  6. Dec 2019
    1. You can create sub-projects (or sub-contexts) by adding a backslash
    2. Double-clicking on a project/context selects all of its sub-projects/contexts, and therefore shows their tasks
    3. arborescence

      First sighting of word arborescence. I thought they were just doing that for fun, as a play on "tree", but I guess it's a real graph theory concept (https://en.wikipedia.org/wiki/Arborescence_(graph_theory)).

    1. Your task list is a plain text file, not some proprietary format owned by a company or locked to a specific application.
    2. A simple and timeless format Plain text is the simplest file format there is. It will always be accessible, by some kind of application, forever.
  7. plaintext-productivity.net
    1. Avoiding complicated outlining or mind-mapping software saves a bunch of mouse clicks or dreaming up complicated visualizations (it helps if you are a linear thinker).

      Hmm. I'm not sure I agree with this thought/sentiment (though it's hard to tell since it's an incomplete sentence). I think visualizations and mind-mapping software might be an even better way to go, in terms of efficiency of editing (since they are specialized for the task), enjoyment of use, etc.

      The main thing text files have going for them is flexibility, portability, client-neutrality, the ability to get started right now without researching and evaluating a zillion competing GUI app alternatives.

    2. Plaintext files are tiny, simple, quick to work with, editable by tons of great programs, searchable by all modern operating systems, easy to back up, perfect for versioning, trivial to sync between devices, and are amazingly flexible in their uses and formats.
    3. In this system, plaintext files are used for most of the backbone of your organizational system.
  8. burnsoftware.wordpress.com
    1. made to work alongside the various plain-text, Dropbox-syncing mobile notes apps, such as Denote for Android and Jottings for iPhone, from an app for the Ubuntu desktop. Plain text notes anywhere you want. Easily synced between your desktop and phone. Notes, plain and simple.
  9. burnsoftware.wordpress.com
    1. Future proofs your journal entries by saving them as plain text and organizing them as you go. This means you can read or create entries when you don’t have DayJournal.
    1. Plain text is software and operating system agnostic. It's searchable, portable, lightweight, and easily manipulated. It's unstructured. It works when someone else's web server is down or your Outlook .PST file is corrupt. There's no exporting and importing, no databases or tags or flags or stars or prioritizing or insert company name here-induced rules on what you can and can't do with it.
    1. Countless productivity apps and sites store your tasks in their own proprietary database and file format. But you can work with your todo.txt file in every text editor ever made, regardless of operating system or vendor.
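
      A small sketch of how little machinery the plain-text todo.txt format needs: a few lines of standard-library Python pull out the priority, +projects and @contexts (the sample task line is made up):

      import re

      task = "(A) 2020-03-01 Draft privacy policy review +gdpr @work"

      priority = re.match(r"^\(([A-Z])\)", task)      # optional "(A)" priority marker
      projects = re.findall(r"\+(\S+)", task)         # "+project" tags
      contexts = re.findall(r"@(\S+)", task)          # "@context" tags

      print(priority.group(1), projects, contexts)    # A ['gdpr'] ['work']
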
    1. greater integration of data, data security, and data sharing through the establishment of a searchable database.

      Would be great to connect these efforts with others who work on this from the data end, e.g. RDA as mentioned above.

      Also, the presentation at http://www.gfbr.global/wp-content/uploads/2018/12/PG4-Alpha-Ahmadou-Diallo.pptx states

      This data will be made available to the public and to scientific and humanitarian health communities to disseminate knowledge about the disease, support the expansion of research in West Africa, and improve patient care and future response to an outbreak.

      but the notion of public access is not clearly articulated in the present article.

    2. platform

      Does it have a name and online presence? The details provided here go beyond what's given in reference 13, but some more detail would still be useful, e.g. to connect the initiative to efforts directed at data management and curation more generally, for instance in the framework of the Research Data Alliance, https://www.rd-alliance.org/ .

    1. Practical highlights in my opinion:

      • It's important to know about data padding in PG.
      • Be conscious of column ordering when modelling data tables, but don't be purist about it; do it on a best-effort basis.
      • Gains of up to 25% in wasted storage are impressive, but always keep the scope of the system in mind. For me, the gains are not worth it in the short term. Whenever a system grows, it is possible to migrate data to more storage-efficient tables, but mind the operational burden.

      Here follow my own commands from trying out the article's points. I added pg_column_size(row()) to each projection to have clear absolute sizes.

      -- How does row function work?
      
      SELECT pg_column_size(row()) AS empty,
             pg_column_size(row(0::SMALLINT)) AS byte2,
             pg_column_size(row(0::BIGINT)) AS byte8,
             pg_column_size(row(0::SMALLINT, 0::BIGINT)) AS byte16,
             pg_column_size(row(''::TEXT)) AS text0,
             pg_column_size(row('hola'::TEXT)) AS text4,
             0 AS term
      ;
      
      -- My own take on that
      
      SELECT pg_column_size(row()) AS empty,
             pg_column_size(row(uuid_generate_v4())) AS uuid_type,
             pg_column_size(row('hola mundo'::TEXT)) AS text_type,
             pg_column_size(row(uuid_generate_v4(), 'hola mundo'::TEXT)) AS uuid_text_type,
             pg_column_size(row('hola mundo'::TEXT, uuid_generate_v4())) AS text_uuid_type,
             0 AS term
      ;
      
      CREATE TABLE user_order (
        is_shipped    BOOLEAN NOT NULL DEFAULT false,
        user_id       BIGINT NOT NULL,
        order_total   NUMERIC NOT NULL,
        order_dt      TIMESTAMPTZ NOT NULL,
        order_type    SMALLINT NOT NULL,
        ship_dt       TIMESTAMPTZ,
        item_ct       INT NOT NULL,
        ship_cost     NUMERIC,
        receive_dt    TIMESTAMPTZ,
        tracking_cd   TEXT,
        id            BIGSERIAL PRIMARY KEY NOT NULL
      );
      
      SELECT a.attname, t.typname, t.typalign, t.typlen
        FROM pg_class c
        JOIN pg_attribute a ON (a.attrelid = c.oid)
        JOIN pg_type t ON (t.oid = a.atttypid)
       WHERE c.relname = 'user_order'
         AND a.attnum >= 0
       ORDER BY a.attnum;
      
      -- What is it about pg_class, pg_attribute and pg_type tables? For future investigation.
      
      -- SELECT sum(t.typlen)
      -- SELECT t.typlen
      SELECT a.attname, t.typname, t.typalign, t.typlen
        FROM pg_class c
        JOIN pg_attribute a ON (a.attrelid = c.oid)
        JOIN pg_type t ON (t.oid = a.atttypid)
       WHERE c.relname = 'user_order'
         AND a.attnum >= 0
       ORDER BY a.attnum
      ;
      
      -- Whoa! I need to master mocking data directly into db.
      
      INSERT INTO user_order (
          is_shipped, user_id, order_total, order_dt, order_type,
          ship_dt, item_ct, ship_cost, receive_dt, tracking_cd
      )
      SELECT true, 1000, 500.00, now() - INTERVAL '7 days',
             3, now() - INTERVAL '5 days', 10, 4.99,
             now() - INTERVAL '3 days', 'X5901324123479RROIENSTBKCV4'
        FROM generate_series(1, 1000000);
      
      -- New item to learn, pg_relation_size. 
      
      SELECT pg_relation_size('user_order') AS size_bytes,
             pg_size_pretty(pg_relation_size('user_order')) AS size_pretty;
      
      SELECT * FROM user_order LIMIT 1;
      
      SELECT pg_column_size(row(0::NUMERIC)) - pg_column_size(row()) AS zero_num,
             pg_column_size(row(1::NUMERIC)) - pg_column_size(row()) AS one_num,
             pg_column_size(row(9.9::NUMERIC)) - pg_column_size(row()) AS nine_point_nine_num,
             pg_column_size(row(1::INT2)) - pg_column_size(row()) AS int2,
             pg_column_size(row(1::INT4)) - pg_column_size(row()) AS int4,
             pg_column_size(row(1::INT2, 1::NUMERIC)) - pg_column_size(row()) AS int2_one_num,
             pg_column_size(row(1::INT4, 1::NUMERIC)) - pg_column_size(row()) AS int4_one_num,
             pg_column_size(row(1::NUMERIC, 1::INT4)) - pg_column_size(row()) AS one_num_int4,
             0 AS term
      ;
      
      SELECT pg_column_size(row(''::TEXT)) - pg_column_size(row()) AS empty_text,
             pg_column_size(row('a'::TEXT)) - pg_column_size(row()) AS len1_text,
             pg_column_size(row('abcd'::TEXT)) - pg_column_size(row()) AS len4_text,
             pg_column_size(row('abcde'::TEXT)) - pg_column_size(row()) AS len5_text,
             pg_column_size(row('abcdefgh'::TEXT)) - pg_column_size(row()) AS len8_text,
             pg_column_size(row('abcdefghi'::TEXT)) - pg_column_size(row()) AS len9_text,
             0 AS term
      ;
      
      SELECT pg_column_size(row(''::TEXT, 1::INT4)) - pg_column_size(row()) AS empty_text_int4,
             pg_column_size(row('a'::TEXT, 1::INT4)) - pg_column_size(row()) AS len1_text_int4,
             pg_column_size(row('abcd'::TEXT, 1::INT4)) - pg_column_size(row()) AS len4_text_int4,
             pg_column_size(row('abcde'::TEXT, 1::INT4)) - pg_column_size(row()) AS len5_text_int4,
             pg_column_size(row('abcdefgh'::TEXT, 1::INT4)) - pg_column_size(row()) AS len8_text_int4,
             pg_column_size(row('abcdefghi'::TEXT, 1::INT4)) - pg_column_size(row()) AS len9_text_int4,
             0 AS term
      ;
      
      SELECT pg_column_size(row(1::INT4, ''::TEXT)) - pg_column_size(row()) AS int4_empty_text,
             pg_column_size(row(1::INT4, 'a'::TEXT)) - pg_column_size(row()) AS int4_len1_text,
             pg_column_size(row(1::INT4, 'abcd'::TEXT)) - pg_column_size(row()) AS int4_len4_text,
             pg_column_size(row(1::INT4, 'abcde'::TEXT)) - pg_column_size(row()) AS int4_len5_text,
             pg_column_size(row(1::INT4, 'abcdefgh'::TEXT)) - pg_column_size(row()) AS int4_len8_text,
             pg_column_size(row(1::INT4, 'abcdefghi'::TEXT)) - pg_column_size(row()) AS int4_len9_text,
             0 AS term
      ;
      
      SELECT pg_column_size(row()) - pg_column_size(row()) AS empty_row,
             pg_column_size(row(''::TEXT)) - pg_column_size(row()) AS no_text,
             pg_column_size(row('a'::TEXT)) - pg_column_size(row()) AS min_text,
             pg_column_size(row(1::INT4, 'a'::TEXT)) - pg_column_size(row()) AS two_col,
             pg_column_size(row('a'::TEXT, 1::INT4)) - pg_column_size(row()) AS round4;
      
      SELECT pg_column_size(row()) - pg_column_size(row()) AS empty_row,
             pg_column_size(row(1::SMALLINT)) - pg_column_size(row()) AS int2,
             pg_column_size(row(1::INT)) - pg_column_size(row()) AS int4,
             pg_column_size(row(1::BIGINT)) - pg_column_size(row()) AS int8,
             pg_column_size(row(1::SMALLINT, 1::BIGINT)) - pg_column_size(row()) AS padded,
             pg_column_size(row(1::INT, 1::INT, 1::BIGINT)) - pg_column_size(row()) AS not_padded;
      
      SELECT a.attname, t.typname, t.typalign, t.typlen
        FROM pg_class c
        JOIN pg_attribute a ON (a.attrelid = c.oid)
        JOIN pg_type t ON (t.oid = a.atttypid)
       WHERE c.relname = 'user_order'
         AND a.attnum >= 0
       ORDER BY t.typlen DESC;
      
      DROP TABLE user_order;
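
      -- Recreate the table with columns ordered roughly from largest alignment to
      -- smallest (fixed-width columns first, variable-width last) to minimize padding: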
      
      CREATE TABLE user_order (
        id            BIGSERIAL PRIMARY KEY NOT NULL,
        user_id       BIGINT NOT NULL,
        order_dt      TIMESTAMPTZ NOT NULL,
        ship_dt       TIMESTAMPTZ,
        receive_dt    TIMESTAMPTZ,
        item_ct       INT NOT NULL,
        order_type    SMALLINT NOT NULL,
        is_shipped    BOOLEAN NOT NULL DEFAULT false,
        order_total   NUMERIC NOT NULL,
        ship_cost     NUMERIC,
        tracking_cd   TEXT
      );
      
      -- And, what about other varying size types as JSONB?
      
      SELECT pg_column_size(row('{}'::JSONB)) - pg_column_size(row()) AS empty_jsonb,
             pg_column_size(row('{}'::JSONB, 0::INT4)) - pg_column_size(row()) AS empty_jsonb_int4,
             pg_column_size(row(0::INT4, '{}'::JSONB)) - pg_column_size(row()) AS int4_empty_jsonb,
             pg_column_size(row('{"a": 1}'::JSONB)) - pg_column_size(row()) AS basic_jsonb,
             pg_column_size(row('{"a": 1}'::JSONB, 0::INT4)) - pg_column_size(row()) AS basic_jsonb_int4,
             pg_column_size(row(0::INT4, '{"a": 1}'::JSONB)) - pg_column_size(row()) AS int4_basic_jsonb,
             0 AS term;
      
    1. Remarkably, studies receiving mainly public funding can, a quarter of a century on, still survive without making their data available in a useful way. In the UK a series of studies—the Avon Longitudinal Study of Parents and Children (ALSPAC) (100), UK Biobank (101), and Born in Bradford (102), among others—have surely been exemplary in promoting data accessibility.

      Critical points!

    1. view-helpers form-helpers form-helper view-helper button buttons form forms

      Since I didn't know which variant was canonical, I tagged with both/all variants. Gross.

    1. I'll give a little bit of the history to provide context. My own involvement in this started around 2008 after we had shipped our key-value store. My next project was to try to get a working Hadoop setup going, and move some of our recommendation processes there. Having little experience in this area, we naturally budgeted a few weeks for getting data in and out, and the rest of our time for implementing fancy prediction algorithms. So began a long slog. We originally planned to just scrape the data out of our existing Oracle data warehouse. The first discovery was that getting data out of Oracle quickly is something of a dark art. Worse, the data warehouse processing was not appropriate for the production batch processing we planned for Hadoop—much of the processing was non-reversible and specific to the reporting being done. We ended up avoiding the data warehouse and going directly to source databases and log files. Finally, we implemented another pipeline to load data into our key-value store for serving results. This mundane data copying ended up being one of the dominant items for the original development. Worse, any time there was a problem in any of the pipelines, the Hadoop system was largely useless—running fancy algorithms on bad data just produces more bad data. Although we had built things in a fairly generic way, each new data source required custom configuration to set up. It also proved to be the source of a huge number of errors and failures. The site features we had implemented on Hadoop became popular and we found ourselves with a long list of interested engineers. Each user had a list of systems they wanted integration with and a long list of new data feeds they wanted. [Image caption: ETL in Ancient Greece. Not much has changed.]

      A great anecdote / story on the (pains) of data integration

    2. Effective use of data follows a kind of Maslow's hierarchy of needs. The base of the pyramid involves capturing all the relevant data, being able to put it together in an applicable processing environment (be that a fancy real-time query system or just text files and python scripts). This data needs to be modeled in a uniform way to make it easy to read and process. Once these basic needs of capturing data in a uniform way are taken care of it is reasonable to work on infrastructure to process this data in various ways—MapReduce, real-time query systems, etc. It's worth noting the obvious: without a reliable and complete data flow, a Hadoop cluster is little more than a very expensive and difficult to assemble space heater. Once data and processing are available, one can move concern on to more refined problems of good data models and consistent well understood semantics. Finally, concentration can shift to more sophisticated processing—better visualization, reporting, and algorithmic processing and prediction. In my experience, most organizations have huge holes in the base of this pyramid—they lack reliable complete data flow—but want to jump directly to advanced data modeling techniques. This is completely backwards. So the question is, how can we build reliable data flow throughout all the data systems in an organization?
    3. Data integration is making all the data an organization has available in all its services and systems.
  10. Nov 2019
    1. TrackMeNot is user-installed and user-managed, residing wholly on users' system and functions without the need for 3rd-party servers or services. Placing users in full control is an essential feature of TrackMeNot, whose purpose is to protect against the unilateral policies set by search companies in their handling of our personal information.
    1. Google has confirmed that it partnered with health heavyweight Ascension, a Catholic health care system based in St. Louis that operates across 21 states and the District of Columbia.

      What happened to 'thou shalt not steal'?

    1. Speaking with MIT Technology Review, Rohit Prasad, Alexa’s head scientist, has now revealed further details about where Alexa is headed next. The crux of the plan is for the voice assistant to move from passive to proactive interactions. Rather than wait for and respond to requests, Alexa will anticipate what the user might want. The idea is to turn Alexa into an omnipresent companion that actively shapes and orchestrates your life. This will require Alexa to get to know you better than ever before.

      This is some next-level onslaught.

  11. codeactsineducation.wordpress.com
    1. SEL measurement is being done in myriad ways, involving multiple different conceptualizations of SEL, different political positions, and different sectoral interests.

      Here I am reminded of the book Counting What Counts

    1. This booklet tells you how to use the R statistical software to carry out some simple analyses that are common in analysing time series data.

      what is time series?

    1. "While we hope that Google will lift these unwarranted sanctions for AdNauseam, it highlights a much more serious problem for Chrome users," the AdNauseam team adds. "It is frightening to think that at any moment Google can quietly make your extensions and data disappear, without so much as a warning."
  12. Oct 2019
    1. "Element" SelectorsEach component has a data-reach-* attribute on the underlying DOM element that you can think of as the "element" for the component.