3,374 Matching Annotations
  1. Mar 2024
    1. Video summary [00:00:01] - [00:22:59]:

      This video presents the goals and activities of the Fabrique des Mobilités (FabMob), an association that aims to promote sustainable, lower-carbon mobility. It explains the concept of a digital commons and its practical application in the mobility sector, emphasizing cooperation among heterogeneous actors and shared governance of digital resources.

      Highlights:

      • [00:00:01] Introduction to FabMob
        • Presentation of its goals
        • Definition of a digital commons
      • [00:04:00] Role of the DGITM
        • Collaboration with FabMob
        • Importance of commons in mobility
      • [00:08:03] How to participate
        • Questions are encouraged
        • Work cycle on digital tools
      • [00:09:01] Academic definition of a commons
        • Three pillars: resource, community, governance
        • Examples of digital commons
      • [00:13:20] Institutional overview
        • Various institutions involved in digital commons
        • European and French examples
      • [00:20:36] Distinction between Open Data, Open Source, and digital commons
        • Explanation of the terms
        • Importance of data governance

      Video summary [00:23:01] - [00:45:09]:

      Part 2 of the video covers the logic of Open Data, Open Source, and digital commons in the French context, emphasizing the importance of openness, varied licenses, and collective governance for sharing digital resources.

      Highlights:

      • [00:23:01] Open Data and Open Source
        • Free access to software
        • Varied licenses
      • [00:23:37] Digital commons
        • Serves its community
        • Not necessarily open
      • [00:25:02] Advantages of digital resources
        • Network effects
        • Low replication costs
      • [00:27:00] Collective governance
        • Importance of federation
        • Management of the resource
      • [00:31:11] Concrete examples
        • Public transit crowding (Affluence TC) in Grenoble
        • Artificial intelligence in transport
      • [00:43:42] Public policy through the commons
        • Cost reduction
        • Transparency and durability

      Video summary [00:45:11] - [01:05:36]:

      The video discusses the importance of making road-regulation data accessible and usable for local authorities, particularly for integration into GPS systems. It stresses the need for community collaboration to build a comprehensive, useful database.

      Highlights:

      • [00:45:11] Data accessibility
        • Simplify data use for local authorities
        • Create intuitive navigation tools
      • [00:46:01] GPS integration
        • Integrate traffic rules into GPS devices
        • Adapt navigation to the specifics of each vehicle
      • [00:47:03] Logistics benefits
        • Make it easier to translate regulations for foreign drivers
        • Improve coordination among network-management services
      • [00:48:00] Future applications
        • Imagine dynamic regulatory uses
        • Enable regulatory creativity with digital data

      Video summary [01:05:38] - [01:25:41]:

      This video discusses financial incentives for carpooling in France, the challenges of fraud, and the creation of a carpooling proof registry to secure trips and encourage the adoption of carpooling.

      Highlights:

      • [01:05:38] Incentives for carpooling
        • Free for passengers
        • Payment for drivers
      • [01:06:14] Sustainable mobility allowance (forfait mobilité durable)
        • Up to €800 per year for employees
      • [01:06:28] State bonuses
        • €100 for new carpoolers
      • [01:07:10] Fraud challenges
        • Risks tied to financial incentives
      • [01:07:58] Carpooling proof registry
        • Digital infrastructure against fraud
      • [01:11:02] Community and governance
        • More than 700 local authorities involved

      Video summary [01:15:00] - [01:22:59]:

      The video covers the concept of digital commons, their importance in the ecological transition and in mobility, and how they foster cooperation among diverse actors. It stresses the importance of collective governance and presents concrete examples of digital commons in the transport sector.

      Key points:

      • [01:15:00] Definition of digital commons
        • Three pillars: shared resource, heterogeneous community, governance rules
      • [01:17:00] Examples of digital commons
        • Open Street Map, software, data, servers
      • [01:19:00] Institutions and digital commons
        • Ministerial directorates, national agencies, local authorities
      • [01:21:00] Difference between Open Data, Open Source, and digital commons
        • Open Data: freely accessible data
        • Open Source: open source code
        • Digital commons: collective management of digital resources

    1. Empower doesn't allow you to import legacy data from other sources (like Mint) or input manual transactions such as cash. The latter isn't important to me, but the former certainly is. I have 16 years of transaction data in Mint that I want to preserve.
  2. www.monarchmoney.com www.monarchmoney.com
    1. Our diagrams and charts make it easy to see where every dollar of your hard-earned money is flowing, so you can track your spending patterns at a glance.
    1. some of our older applications rely substantially on manual extract, transform and load (ETL) processes to pass data from one system to another. This substantially increases the volume of customer and staff data in transit on the network, which in a modern data management and reporting infrastructure would be encapsulated in secure, automated end-to-end

      Reliance on ETL seen as risky

      I’m not convinced about this. Real-time API connectivity between systems is a great goal…very responsive to changes filtering through disparate systems. But a lot of “modern” processing is still done by ETL batches (sometimes daily, sometimes hourly, sometimes every minute).

    1. Ironically, data brokers need to collect additional info to verify your identity and ensure they’re deleting the right person’s data.
    1. What if I don’t live in California? Only California residents have the right to data deletion under CCPA. (Why companies have the right to your data and you do not is another story. And here’s another. And another.) But some companies have said they’ll honor deletion requests no matter where you live. Spotify, Uber and Twitter said they treat deletion requests from any geographic location the same. Netflix, Microsoft, Starbucks and UPS have also said they’ll extend CCPA rights to all Americans.
    2. The company will probably ask for you to send over additional information or set up an appointment to verify your identity — that’s so no one can pretend to be you and steal or delete your data. To verify, you may need to confirm your account username and password, provide a piece of data like your phone number for the company to cross-check, or, rarely, show your government-issued ID. You should never be required to set up an account to get your data deleted, according to CCPA.
    3. People have to verify their identities before companies can delete data, which poses an extra obstacle.
    1. Nobody can see deleted accounts - not even developers. Deleted accounts are fully deleted. There's nothing to see basically by definition.
    2. There is no active user with that ID, so you cannot search by it. The whole point of deleting an account is to make it inaccessible, unreferenced, and unlinked. We're not going to implement "soft" account deletion.
    3. You cannot. And you're not supposed to. When an account is deleted, it is disassociated from all existing posts by design.
    1. Udi Dahan wrote about this in Don't Delete - Just Don't. There is always some sort of task, transaction, activity, or (my preferred term) event which actually represents the "delete". It's OK if you subsequently want to denormalize into a "current state" table for performance, but do that after you've nailed down the transactional model, not before. In this case you have "users". Users are essentially customers. Customers have a business relationship with you. That relationship does not simply vanish into thin air because they canceled their account. What's really happening is:
    2. The truth is that both of these approaches are wrong. Deleting is wrong. If you're actually asking this question then it means you're modelling the current state instead of the transactions. This is a bad, bad practice in database-land.
    3. In any system even remotely tied to money, hard-deletion violates all sorts of accounting expectations, even if moved to an archive/tombstone table. The correct way to handle this is a retroactive event.
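
The idea running through the three passages above (record the cancellation as an event, then derive a "current state" view afterwards) can be sketched roughly as follows. This is a minimal Python illustration, not code from the quoted answers: `AccountEvent`, `AccountLog`, and the event kinds `"opened"` and `"closed"` are hypothetical names chosen for the example.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class AccountEvent:
    """One immutable fact about an account (hypothetical event shape)."""
    user_id: int
    kind: str       # "opened" or "closed" in this sketch
    at: datetime


class AccountLog:
    """Append-only log: events are added, never removed."""

    def __init__(self) -> None:
        self._events = []

    def record(self, user_id: int, kind: str) -> None:
        self._events.append(AccountEvent(user_id, kind, datetime.now(timezone.utc)))

    def history(self, user_id: int) -> list:
        """Every event kind ever recorded for this user, in order."""
        return [e.kind for e in self._events if e.user_id == user_id]

    def active_users(self) -> set:
        """Derive the 'current state' view by replaying events in order."""
        active = set()
        for e in self._events:
            if e.kind == "opened":
                active.add(e.user_id)
            elif e.kind == "closed":
                active.discard(e.user_id)
        return active


log = AccountLog()
log.record(1, "opened")
log.record(2, "opened")
log.record(1, "closed")   # the "delete" is just another event
```

After this runs, user 1 no longer appears in `log.active_users()`, yet `log.history(1)` still shows that the account was opened and later closed. A denormalized current-state table, as the first passage suggests, would simply cache what `active_users()` computes.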
  3. Feb 2024
    1. The data on the 463 courses at UT Austin can be found in the evals data frame included in the moderndive package
    1. (The more modification a library demands of each MARC record, the more it costs.) In Harvard’s case she typically accepts the record as is, even when the original card bears additional subject headings or enriching notes of various kinds.

      Information loss in digitizing catalog cards...

    1. Molly White on 'ownership' wrt digital stuff. Check for the various aspects she lifts out. Wrt 'your data', cf. [[On Selling Access to Your Data and Ownership of Data – Interdependent Thoughts 20220209114247]] and [[Saying My Data Is Too Imprecise]]. For (personal) data, ownership is not a useful concept.

    1. The UCLA Loneliness Index has shot up in recent years, and I think we’ve mentioned loneliness already. And it's one of the key things that people say when you ask them about their lives. They say they're lonely.

      Where can you find data for this? I hear this statistic quoted often, but I'm not sure where I can find per-country data on it.

  4. Jan 2024
    1. I have compiled a list of database sources for global information about energy so you can save time.

      List of database sources for global information about energy

  5. Dec 2023
    1. There is a growing need for open standards for formats used to represent text, images, video and other collections of data, so that one producer's data will be accessible to another's software.

      Data formats are like currency. Either standardize it or make sure there are converters. Money exchange. Most used formats are valuable but also valuable content in a rare format makes the converter more valuable.

    1. 26

      What's going on with this drop? It doesn't make sense.

    2. hurn

      The churn rates in this table are higher for girls, and yet in the first table the churn rate for girls is lower than for boys.

      • for: remote COP29 project proposal - demographic data

      • comment

        • this is the first year the full participants list has been published. If it shows city/country of origin, that would be very useful for this proposal
  6. Nov 2023
    1. ActiveRecord::Base.normalizes declares an attribute normalization. The normalization is applied when the attribute is assigned or updated, and the normalized value will be persisted to the database. The normalization is also applied to the corresponding keyword argument of query methods, allowing records to be queried using unnormalized values.

      Guess I don't need to use mdeering/attribute_normalizer gem anymore...
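
To make the behaviour described in the quoted passage concrete, here is a rough Python analogy. It is not the Rails API and does not show how `ActiveRecord::Base.normalizes` is implemented; `UserStore`, `normalize_email`, and the trim-and-lowercase rule are all assumptions made up for this sketch. It only mirrors the two effects the passage names: the stored value is normalized, and lookups accept unnormalized input.

```python
class UserStore:
    """Toy in-memory store that normalizes `email` on write and on lookup."""

    @staticmethod
    def normalize_email(value: str) -> str:
        # Hypothetical rule for this sketch: trim whitespace, then lowercase.
        return value.strip().lower()

    def __init__(self) -> None:
        self._by_email = {}

    def create(self, email: str, name: str) -> dict:
        # The normalized value is what gets stored.
        record = {"email": self.normalize_email(email), "name": name}
        self._by_email[record["email"]] = record
        return record

    def find_by(self, email: str):
        # The query argument goes through the same normalization, so callers
        # can search with unnormalized input.
        return self._by_email.get(self.normalize_email(email))


store = UserStore()
store.create("  Ada@Example.COM ", "Ada")
found = store.find_by("ADA@example.com\n")
```

Here `found` is the record created one line earlier, even though neither the stored input nor the search string was in canonical form.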

  7. Oct 2023
    1. Description: The European Language Social Science Thesaurus (ELSST) is a broad-based, multilingual thesaurus for the social sciences. It is owned and published by the Consortium of European Social Science Data Archives (CESSDA) and its national Service Providers. The thesaurus consists of over 3,000 concepts and covers the core social science disciplines: politics, sociology, economics, education, law, crime, demography, health, employment, information and communication technology and, increasingly, environmental science.

    1. In September 2023, most of the surveyed occupations recorded year-on-year declines in the number of job offers. The largest decline is in the IT sector: employers published 52 percent fewer offers year on year.
    1. The important part, as is so often the case with technology, isn’t coming up with a solution to the post portability problem, but coming up with a solution together so that there is mutual buy-in and sustainability in the approach.

      The solution is to not keep creating these fucking problems in the first place.

  8. Sep 2023
    1. This September, the sea ice around Antarctica covers less ocean area than in any September on record. September is when it reaches its maximum extent. This year that maximum is 1.75 million square kilometres below the long-term average and one million square kilometres below the previous lowest September maximum. In February, the minimum extent of Antarctic sea ice also set a record. Whether and how this development is linked to global heating is still unclear. The top 300 m of the ocean around Antarctica is markedly warmer than it used to be. https://www.theguardian.com/world/2023/sep/26/antarctic-sea-ice-shrinks-to-lowest-annual-maximum-level-on-record-data-shows

    1. It is pretty good to see the mapping innovation taking several shapes, from the starting narrative to this one.

      Regarding feedback on this one, I would call for making it more visible where the data and code behind the map are hosted and how to reproduce the results.

      In a more general sense, I think it is important to see how the different narratives can be better connected and which values they embody and make explicit. I would propose these values:

      1. Utility:

        • internal: helping us to make short- or long-lasting peer-to-peer connections, like the one between the Copincha (Habana, Cuba) and HackBo/Grafoscopio (Bogotá, Colombia) communities resulting from DOTS 202.
        • external: showcasing what innovation people and communities are doing and how they are connected now or could be in the future.
      2. Reproducibility: The data narratives should be reproducible.

      3. Portability: Functionality bundles, including data, code, and software, should be packaged so they can be used in local contexts, particularly those with low/intermittent internet connectivity.

      4. Recontextualization: Our data narratives should empower reuse, adaptation, and extension by other communities and in other contexts.

      5. Commons/Community oriented: licenses on data/code should be explicit to allow the previous qualities. Sometimes that would require a copyfarleft license that prevents third parties from extracting value from the data narratives and their bundles against the community interest (cf. the current discussion on data collection by AI projects against the community of creators).
    1. “There are no safeguards on what information it can ask for.”

      This is wrong. Section 36 of the Act says:

      The Central Government may, for the purposes of this Act, require the Board and any Data Fiduciary or intermediary to furnish such information as it may call for.

    1. Migration from pre-existing non-flatpak installations

      In order to migrate from a pre-existing non-flatpak installation and preserve all settings, please copy or move the entire ~/.thunderbird folder into ~/.var/app/org.mozilla.Thunderbird/.thunderbird

      In case Thunderbird opens a new profile instead of the existing one, run:

      flatpak run org.mozilla.Thunderbird -P

      then select the right profile and tick the "Use the selected profile without asking on startup" box.
    1. A new class for containing value objects: it is somewhat similar to Struct (and reuses some of the implementation internally), but is intended to be immutable, and have more modern and cleaner API.
    1. Recent work has revealed several new and significant aspects of the dynamics of theory change. First, statistical information, information about the probabilistic contingencies between events, plays a particularly important role in theory-formation both in science and in childhood. In the last fifteen years we’ve discovered the power of early statistical learning.

      The data of the past is congruent with the current psychological trends that face the education system of today. Developmentalists have charted how children construct and revise intuitive theories. In turn, a variety of theories have developed because of the greater use of statistical information that supports probabilistic contingencies that help to better inform us of causal models and their distinctive cognitive functions. These studies investigate the physical, psychological, and social domains. In the case of intuitive psychology, or "theory of mind," developmentalism has traced a progression from an early understanding of emotion and action to an understanding of intentions and simple aspects of perception, to an understanding of knowledge vs. ignorance, and finally to a representational and then an interpretive theory of mind.

      The mechanisms by which life evolved—from chemical beginnings to cognizing human beings—are central to understanding the psychological basis of learning. We are the product of an evolutionary process and it is the mechanisms inherent in this process that offer the most probable explanations to how we think and learn.

      Bada, & Olusegun, S. (2015). Constructivism Learning Theory : A Paradigm for Teaching and Learning.

    1. The Hyperdocument "Library System" where hyperdocuments can be submitted to a library-like service that catalogs them and guarantees access when referenced by its catalog number, or "jumped to" with an appropriate link. Links within newly submitted hyperdocuments can cite any passages within any of the prior documents, and the back-link service lets the online reader of a document detect and "go examine" any passage of a subsequent document that has a link citing that passage.

      That this isn't possible with open systems like the Web is well-understood (I think*). But is it feasible to do it with as-yet-untested closed (and moderated) systems? Wikis do something like this, but I'm interested in a service/community that behaves more closely in the concrete details to what is described here.

      * I think that this is understood, that is. That it's impossible is not what I'm uncertain about.

  9. Aug 2023
    1. So far, smart city systems are being set up to appropriate and commercialize individual and community data. So far, communities are not waking up to the realization that a capacity they need is being stolen from them before they have it.”
      • for: smart cities, doughnut cities, cosmolocal, downscaled planetary boundaries, cross-scale translation of earth system boundaries, TPF, community data, local data, open data, community data ownership, quote, quote - Garth Graham, quote - community owned data
      • quote
      • paraphrase
        • Innovation in the creation and sustainability of social institutions acts predominantly at the local level.
        • In the Internet of Things, for those capacities to emerge in smart cities, communities need the capacity to own and analyse the data created that models what they are experiencing.
        • Local data needs to be seen as a common, pool resource.
        • Where that occurs, communities will have the capacity to learn or innovate their way forward.
        • So far, smart city systems are being set up to appropriate and commercialize individual and community data.
        • So far, communities are not waking up to the realization that a capacity they need is being stolen from them before they have it.
      • author: Garth Graham
        • leader of Telecommunities Canada
    2. We lived in a relatively unregulated digital world until now. It was great until the public realized that a few companies wield too much power today in our lives. We will see significant changes in areas like privacy, data protection, algorithm and architecture design guidelines, and platform accountability, etc. which should reduce the pervasiveness of misinformation, hate and visceral content over the internet.
      • for: quote, quote - Prateek Raj, quote - internet regulation, quote - reducing misinformation, fake news, indyweb - support
      • quote
        • We lived in a relatively unregulated digital world until now.
        • It was great until the public realized that a few companies wield too much power today in our lives.
        • We will see significant changes in areas like
          • privacy,
          • data protection,
          • algorithm and
          • architecture design guidelines, and
          • platform accountability, etc.
        • which should reduce the pervasiveness of
          • misinformation,
          • hate and visceral content
        • over the internet.
        • These steps will also reduce the power wielded by digital giants.
        • Beyond these immediate effects, it is difficult to say if these social innovations will create a more participative and healthy society.
        • These broader effects are driven by deeper underlying factors, like
          • history,
          • diversity,
          • cohesiveness and
          • social capital, and also
          • political climate and
          • institutions.
        • In other words,
          • just as digital world is shaping the physical world,
          • physical world shapes our digital world as well.
      • author: Prateek Raj
        • assistant professor in strategy, Indian Institute of Management, Bangalore
    1. According to Section 18(2)(a) of the Bill, the Central Government can issue a notification exempting any “instrumentality of the State” from the provisions of this Bill in the interests of the sovereignty and integrity of India, security of the State, friendly relations with foreign States, maintenance of public order; or preventing incitement to any cognizable offence relating to any of the above.

      Uses language from Art. 12 (which has been read to include all "instrumentalities of the State" as "State"), and from Art. 19(1)(a).

  10. Jul 2023
    1. Conceptual data model: describes the semantics of a domain, being the scope of the model. For example, it may be a model of the interest area of an organization or industry. This consists of entity classes, representing kinds of things of significance in the domain, and relationship assertions about associations between pairs of entity classes. A conceptual schema specifies the kinds of facts or propositions that can be expressed using the model. In that sense, it defines the allowed expressions in an artificial 'language' with a scope that is limited by the scope of the model.
    2. "Data models for different systems are arbitrarily different. The result of this is that complex interfaces are required between systems that share data. These interfaces can account for between 25-70% of the cost of current systems".
    3. The term data model can refer to two distinct but closely related concepts
    4. A data model can sometimes be referred to as a data structure, especially in the context of programming languages.
    5. Sometimes it refers to an abstract formalization of the objects and relationships found in a particular application domain
    6. A data model[1][2][3][4][5] is an abstract model that organizes elements of data and standardizes how they relate to one another and to the properties of real-world entities.
    1. Visualizing freely available citation data using VOSviewer
      • Title
        • Visualizing freely available citation data using VOSviewer
      • Author
        • Nees Jan van Eck
        • Ludo Waltman
      • Date
        • Oct 23, 2017
      • Source
      • Description
        • Today we released version 1.6.6 of our VOSviewer software for constructing and visualizing bibliometric networks.
        • The most important new feature in this version is the support for working with Crossref data.
        • Recently, the Initiative for Open Citations (I4OC) managed to convince a large number of scientific publishers to make the reference lists of publications in their journals freely available through Crossref.
        • Thanks to I4OC, Crossref has become a valuable data source for VOSviewer users.
        • In this blog post, we discuss how users of the new version 1.6.6 of VOSviewer can benefit from Crossref data.
    1. Unispace finds that nearly half (42%) of companies that mandated office returns witnessed a higher level of employee attrition than they had anticipated. And almost a third (29%) of companies enforcing office returns are struggling with recruitment. Imagine that — nearly half!
    1. Code for processing data samples can get messy and hard to maintain; we ideally want our dataset code to be decoupled from our model training code for better readability and modularity.

      Code for data processing and model training should be separated as different modules.
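
A small sketch of that separation, in plain Python so it runs without any framework. The class follows the common map-style convention of `__len__` plus `__getitem__`; `DoubledDataset`, `fit_slope`, and the toy data are hypothetical and chosen only for illustration.

```python
class DoubledDataset:
    """Data side: owns storage and per-sample processing (hypothetical example)."""

    def __init__(self, raw_values):
        self._raw = list(raw_values)

    def __len__(self):
        return len(self._raw)

    def __getitem__(self, idx):
        x = float(self._raw[idx])   # per-sample cleaning lives here
        return x, 2.0 * x           # (feature, label)


def fit_slope(dataset, epochs=200, lr=0.01):
    """Training side: fits y = w * x by gradient descent on mean squared error.

    It relies only on len(dataset) and dataset[i], never on the raw storage.
    """
    w = 0.0
    n = len(dataset)
    for _ in range(epochs):
        grad = 0.0
        for i in range(n):
            x, y = dataset[i]
            grad += 2 * (w * x - y) * x / n
        w -= lr * grad
    return w


# Raw values arrive as strings; only the dataset class needs to know that.
w = fit_slope(DoubledDataset(["1", "2", "3"]))
```

Because `fit_slope` touches the data only through indexing and `len`, the parsing in `__getitem__` can change (different file format, extra cleaning) without any edit to the training loop. Here `w` converges to roughly 2.0.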

    1. Active competence

      Producing or maintaining open metadata for (one's own) publications oneself, and using it in one's own science communication (e.g., for posts with thumbnails on social media), is in my opinion an activity and strategy that could get a passing mention here; cf. https://redaktionsblog.hypotheses.org/5219. You can also index a publication yourself: https://scholia.toolforge.org/doi/ + 10.17175/wp_2023b leads to https://scholia.toolforge.org/doi/10.17175/wp_2023b; cf. the workshop example: https://de.wikiversity.org/wiki/Open_Science_Festival/Forschen_im_Wikiversum_(2023)#Publizieren_und_Erschlie%C3%9Fen

    1. Such efforts to protect data privacy go beyond the abilities of the technology involved to also encompass the design process. Some Indigenous communities have created codes of use that people must follow to get access to community data. And most tech platforms created by or with an Indigenous community follow that group’s specific data principles. Āhau, for example, adheres to the Te Mana Raraunga principles of Māori data sovereignty. These include giving Māori communities authority over their information and acknowledging the relationships they have with it; recognizing the obligations that come with managing data; ensuring information is used for the collective benefit of communities; practicing reciprocity in terms of respect and consent; and exercising guardianship when accessing and using data. Meanwhile Our Data Indigenous is committed to the First Nations principles of ownership, control, access and possession (OCAP). “First Nations communities are setting their own agenda in terms of what kinds of information they want to collect,” especially around health and well-being, economic development, and cultural and language revitalization, among others, Lorenz says. “Even when giving surveys, they’re practicing and honoring local protocols of community interaction.”

      Colonized groups such as these Indigenous peoples feel an urgent need to avoid the colonization of their data, and they are doing something about it.

    1. CRISP-DM has not been built in a theoretical, academic manner working from technical principles, nor did elite committees of gurus create it behind closed doors.
  11. Jun 2023
    1. How might humanists adopt STEM-oriented norms around data sharing

      This seems to be a fairly packed sentence. Why should they?

    2. the “Nelson memo” requires all publications and supporting data produced with federal funds be made freely and publicly available without an embargo period and points towards future mandates that would require all data generated with federal funds (not just data associated with publications) to be made public.

      This is the first problem: what does it mean? If I write a book on Middlemarch, what is my data? My notes? The quotations I use? (Note that the exemption is not limited to publications.)

    1. What I have seen is situations where things were made horribly complicated to get around protections for which there was no need, and to try to guard the consistency of data structures that were horribly over-complicated and un-normalized.
    2. Using a property or a method to access the field enables you to maintain encapsulation, and fulfill the contract of the declaring class.
    3. Exposing properties gives you a way to hide the implementation. It also allows you to change the implementation without changing the code that uses it (e.g. if you decide to change the way data are stored in the class)
    4. Anything that isn't explicitly enforced by contract is vulnerable to misunderstandings. It's doing your teammates a great service, and reducing everyone's effort, by eliminating ambiguity and enforcing information flow by design.
    5. Far more preferable is to minimize data structure so that it tends to be normalized and not to have inconsistent states. Then, if a member of a class is changed, it is simply changed, rather than damaged.
    6. you nailed it! A consumer should only be able to set an object's state at initialization (via the constructor). Once the object has come to life, it should be internally responsible for its own state lifecycle. Allowing consumers to affect the state adds unnecessary complexity and risk.
    7. Making a property writable adds an order of magnitude in complexity. In the real world it's definitely not realistic for every class to be immutable, but if most of your classes are, it's remarkably easier to write bug-free code. I had that revelation once and I hope to help others have it.
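
Two of the points argued above can be illustrated with a short Python sketch: a property that hides how a value is stored, and an object whose state is fixed at construction. The `Temperature` and `Money` classes are hypothetical examples, not taken from the quoted discussion, which appears to concern a different language.

```python
from dataclasses import dataclass, FrozenInstanceError


class Temperature:
    """Exposes a read-only property so storage can change without breaking callers."""

    def __init__(self, celsius: float) -> None:
        self._kelvin = celsius + 273.15   # internal representation stays hidden

    @property
    def celsius(self) -> float:
        return self._kelvin - 273.15


@dataclass(frozen=True)
class Money:
    """State is set once, in the constructor; there are no setters."""
    amount: int
    currency: str

    def add(self, other: "Money") -> "Money":
        if other.currency != self.currency:
            raise ValueError("currency mismatch")
        # Return a new value instead of mutating this one.
        return Money(self.amount + other.amount, self.currency)


price = Money(500, "EUR")
total = price.add(Money(250, "EUR"))

try:
    price.amount = 0          # consumers cannot change state after creation
    mutated = True
except FrozenInstanceError:
    mutated = False
```

Callers of `Temperature` read `.celsius` and never learn that the value is kept in kelvin, so the storage could be switched later. `Money` rejects assignment after construction, which is the "set state only at initialization" rule from the sixth comment.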
    1. Let me preface this by saying I'm talking primarily about method access here, and to a slightly lesser extent, marking classes final, not member access.
    1. Digital nomads must earn at least €2,800 per month to qualify for its new visa, around four times Portugal’s minimum wage. According to Nomad List nearly 16,000 people were remote working in Lisbon last December, where they now find themselves blamed for rocketing rents and house prices.

      Digital nomads in Portugal

    2. According to a March survey, 36 per cent of digital nomads have an annual income of between $100,000 and $250,000. Another eight per cent earn between $250,000 and one million. Attracted by these bank balances, dozens of countries have now introduced so-called “digital nomad visas” (permitting extended stays to work remotely).

      Income of digital nomads

    1. The conversation will no longer be accessible via the shared link, but if a user imported the conversation into their chat history, deleting your link will not remove the conversation from their chat history.
    1. colleges will also have to invest in a major effort to “convince graduates that part of paying it forward is to respond to surveys aimed at determining what worked for them.”
    1. Learning heterogeneous graph embedding for Chinese legal document similarity

      The paper proposes L-HetGRL, an unsupervised approach using a legal heterogeneous graph and incorporating legal domain-specific knowledge, to improve Legal Document Similarity Measurement (LDSM) with superior performance compared to other methods.

  12. May 2023
    1. Trakt Data Recovery

      IMPORTANT

      On December 11 at 7:30 pm PST our main database crashed and corrupted some of the data. We're deeply sorry for the extended downtime and we'll do better moving forward. Updates to our automated backups are already in place and they will be tested on an ongoing basis.

      • Data prior to November 7 is fully restored.
      • Watched history between November 7 and December 11 has been recovered. There is a separate message on your dashboard allowing you to review and import any recovered data.
      • All other data (besides watched history) after November 7 has already been restored and imported.
      • Some data might be permanently lost due to data corruption.
      • Trakt API is back online as of December 20.
      • Active VIP members will get 2 free months added to their expiration date.
    1. Open data (within constraints of privacy laws) – For an infrastructure to be forked it will be necessary to replicate all relevant data. The CC0 waiver is best practice in making data legally available. Privacy and data protection laws will limit the extent to which this is possible

      {Open Data}

    1. It is also important to note that this positive evidence for low-income certificate-earners stands in contrast to findings for other historically underserved groups; studies indicate that individuals of color and older individuals go on to stack credentials at lower rates and see smaller earnings gains relative to White individuals and younger individuals (Bohn and McConville, 2018; Bohn, Jackson and McConville, 2019; Daugherty et al., 2020; Daugherty and Anderson, 2021). Although we suspect many low-income individuals are also individuals of color, the findings suggest that there are inequities within stackable credential pipelines that might be more strongly tied to race, ethnicity, and age than to socioeconomic status. It is also possible that many low-income individuals never complete a first certificate and thus do not enter a stackable credential pathway

  13. Apr 2023
    1. Recommended Resource

      I recommend adding the webpage "Open Access in Australia" on Wikiwand that documents Australia's history for accepting and promoting open access and open publication in its country.

      The site contains a timeline that documents key years in which the open movement, open access, open government, and open data concepts were introduced. The year that CC Australia was established is included in the timeline.

    1. **Recommend Resource: ** Under the "More Information About Other Open Movements" I recommended adding Higashinihon Daishinsai Shashin Hozon Purojekuto, (trans. Great Earthquake of Eastern Japan Photo Archiving Project) which is one of Japan's open government and open data efforts to document all photographs about Japan's 2011 earthquake.

      The site currently contains close to 40,000 photographs of the aftermath of the natural disaster.

      The photos are hosted by Yahoo! Japan and are published under a non-commercial clause for open access to the public.

    1. Once the awarding and registration systems are in place, institutions should also integrate with a modern CRM solution to attract and manage student interest, support, and personalized communications to increase enrollment and engagement. The CRM needs to support career services and other experiential learning departments as the school looks to build outside relationships with organizations and industry partners to provide real-world learning experiences and assessment opportunities for students

      CRM focus that goes beyond the academic unit to include others. Also think about Alumni Affairs, Foundation, and lifelong learning.

    1. Why do so many businesses share their data openly, for free? Most often, the answer is scale. As companies grow, the staff within those companies realize they have more ideas than they have the time and resources to develop them. It’s typically easier to work with other external companies that specialize in these ideas than build them in-house. By creating APIs, a company allows third-party developers to build applications that improve adoption and usage of its platform. That way, a business can build an ecosystem that becomes dependent on the data from their API, which often leads to additional revenue opportunities.
    1. After struggling with this problem for a while and still being far from solving this issue, I realized that I was making too many requests to the website; which made me come up with the idea of saving all the pages I needed to scrape on my local computer. Next, I started sending requests to these local HTML files instead and kept adapting my code.

      I had a similar problem with this.
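
The save-locally approach described in the quote can be sketched roughly as follows. This is a minimal illustration, not the author's actual code: the function names `download` and `fetch_cached`, the cache layout, and the hashed file names are all assumptions made for the example.

```python
import hashlib
from pathlib import Path
from urllib.request import urlopen


def download(url: str) -> bytes:
    """Fetch a page over the network (the only place a request is made)."""
    with urlopen(url, timeout=30) as response:
        return response.read()


def fetch_cached(url: str, cache_dir: Path, downloader=download) -> bytes:
    """Return the page body, calling the downloader only on the first request."""
    cache_dir.mkdir(parents=True, exist_ok=True)
    # Hash the URL so that any URL maps to a safe, fixed-length file name.
    name = hashlib.sha256(url.encode("utf-8")).hexdigest() + ".html"
    path = cache_dir / name
    if not path.exists():
        path.write_bytes(downloader(url))
    return path.read_bytes()
```

With something like this in place, the parsing code can be rerun and adjusted as often as needed, because every call after the first reads the saved file instead of contacting the website.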

  14. Mar 2023
    1. [There’s also] a big new study from Cambridge University, in which researchers looked at 84,000 people…and found that social media was strongly associated with worse mental health during certain sensitive life periods, including for girls ages 11 to 13…One explanation is that teenagers (and teenage girls in particular) are uniquely sensitive to the judgment of friends, teachers, and the digital crowd.
    1. In order to throw light on the question whether exceptionally bright children are specially likely to be one-sided, nervous, delicate, morally abnormal, socially unadaptable, or otherwise peculiar, the writer has secured rather extensive information regarding 31 children whose mental age was found by intelligence tests to be 25 per cent above the actual age. This degree of intelligence is possessed by about 2 children out of 100, and is nearly as far above average intelligence as high-grade feeble-mindedness is below. The supplementary information, which was furnished in most cases by the teachers, may be summarized as follows: -- Ability special or general. In the case of 20 out of 31 the ability is decidedly general, and with 2 it is mainly general. The talents of 5 are described as more or less special, but only in one case is it remarkably so. Doubtful 4. Health. 15 are said to be perfectly healthy; 13 have one or more physical defects; 4 of the 13 are described as delicate; 4 have adenoids; 4 have eye-defects; 1 lisps; and 1 stutters. These figures are about the same as one finds in any group of ordinary children. Studiousness. "Extremely studious," 15; "usually studious" or "fairly studious," 11; "not particularly studious," 5; "lazy," 0. Moral traits. Favorable traits only, 19; one or more unfavorable traits, 8; no answer, 4. The eight with unfavorable moral traits are described as follows: 2 are "very self-willed"; 1 "needs close watching"; 1 is "cruel to animals"; 1 is "untruthful"; 1 is "unreliable"; 1 is "a bluffer"; 1 is "sexually abnormal," "perverted," and "vicious." It will be noted that with the exception of the last child, the moral irregularities mentioned can hardly be regarded, from the psychological point of view, as essentially abnormal. It is perhaps a good rather than a bad sign for a child to be self-willed; most children "need close watching"; and a certain amount of untruthfulness in children is the rule and not the exception.
Social adaptability. Socially adaptable, 25; not adaptable, 2; doubtful, 4. Attitude of other children. "Favorable," "friendly," "liked by everybody," "much admired," "popular," etc., 26; "not liked," 1; "inspires repugnance," 1; no answer, 1. Is child a leader? "Yes," 14; "no," or "not particularly," 12; doubtful, 5. Is play life normal? "Yes," 26; "no," 1; "hardly," 1; doubtful, 3. Is child spoiled or vain? "No," 22; "yes," 5; "somewhat," 2; no answer, 2. According to the above data, exceptionally intelligent children are fully as likely to be healthy as ordinary children; their ability is far more often general than special, they are studious above the average, really serious faults are not common among them, they are nearly always socially adaptable, are sought after as playmates and companions, their play life is usually normal, they are leaders far oftener than other children, and notwithstanding their many really superior qualities they are seldom vain or spoiled.

      The data shows that children who are more superior are seen as healthy. I think children that are superior are seen as more healthy because they have a more positive outlook on life.

    1. There are two main reasons to use logarithmic scales in charts and graphs.
      • respond to skewness towards large values / outliers by spreading out the data.
      • show multiplicative factors rather than additive (ex: b is twice that of a).

        The data values are spread out better with the logarithmic scale. This is what I mean by responding to skewness of large values.

      In Figure 2 the difference is multiplicative. Since 2^7 = 2^6 times 2, we see that the revenues for Ford Motor are about double those for Boeing. This is what I mean by saying that we use logarithmic scales to show multiplicative factors.
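
The multiplicative reading of a log axis can be checked with a few lines of arithmetic. This is only a sketch: it assumes a base-2 axis, and the values 64 and 128 stand in for two tick positions one step apart rather than for any actual revenue figure. The helper name `ratio_from_log_positions` is made up for the example.

```python
import math


def ratio_from_log_positions(pos_a: float, pos_b: float, base: float = 2.0) -> float:
    """Ratio b/a implied by two positions on a logarithmic axis of the given base."""
    return base ** (pos_b - pos_a)


a = 2 ** 6  # tick at position 6
b = 2 ** 7  # tick at position 7

# One step apart on a base-2 axis means a factor of two, whatever the starting value.
gap = math.log2(b) - math.log2(a)
print(gap, ratio_from_log_positions(6, 7), b / a)
```

On a linear axis the same visual gap would mean a fixed additive difference; on the log axis it means a fixed ratio.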

    2. One reason for choosing a dot plot rather than a bar chart is that it is less cluttered. We will be learning other benefits of dot plots in this and future posts.
      • Length of bar/line has no meaning in a log-scale

        A dot plot is judged by its position along an axis; in this case, the horizontal or x axis. A bar chart is judged by the length of the bar. I don’t like using lengths with logarithmic scales. That is a second reason that I prefer dot plots over bar charts for these data.

    1. A new breach involving data from nine million AT&T customers is a fresh reminder that your mobile provider likely collects and shares a great deal of information about where you go and what you do with your mobile device — unless and until you affirmatively opt out of this data collection. Here’s a primer on why you might want to do that, and how.
    1. As a teacher of English to secondary school students, and as an online doctoral student, I am excited to explore and possibly integrate Hypothesis into my work. I love research and everything involved with it. Thank you to the creators of this tool --

    1. "For this campaign, we surveyed 930 Americans to explore their retirement plans. Among them, 16% were retired, 22% were still working, and 62% were retirees who had returned to work." So, 149 of those surveyed were retired. Of those 149, 25 (1 in 6) are considering returning to work. 13 of those want remote positions.
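
The head counts in the note can be reproduced from the quoted percentages. A minimal sketch, assuming the counts are simply rounded to the nearest whole person; the helper name `share_to_count` is made up for the example.

```python
def share_to_count(total: int, share: float) -> int:
    """Convert a share of respondents into a head count, rounded to the nearest person."""
    return round(total * share)


respondents = 930
retired = share_to_count(respondents, 0.16)  # 16% of 930 is 148.8
considering_return = round(retired / 6)      # "1 in 6" of the retired group

print(retired, considering_return)
```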
  15. Feb 2023
    1. Where information that a controller would otherwise be required to provide to a data subject pursuant to subsection (1) includes personal data relating to another individual that would reveal, or would be capable of revealing, the identity of the individual, the controller— (a) shall not, subject to subsection (8), provide the data subject with the information that constitutes such personal data relating to the other individual, and (b) shall provide the data subject with a summary of the personal data concerned that— (i) in so far as is possible, permits the data subject to exercise his or her rights under this Part, and

      There's a right to provide a summary where it would be hard to avoid revealing the identity of another individual.

    2. Subject to subsection (2), a controller, with respect to personal data for which it is responsible, may restrict, wholly or partly, the exercise of a right of a data subject specified in subsection (4)

      Can restrict, but must be necessary and proportionate (and under one of the restriction rights)

    3. Subsection (1) shall not apply— (a) in respect of personal data relating to the data subject that consists of an expression of opinion about the data subject by another person given in confidence or on the understanding that it would be treated as confidential, or (b) to information specified in paragraph (b)(i)(III) of that subsection in so far as a recipient referred to therein is a public authority which may receive data in the context of a particular inquiry in accordance with the law of the State.

      Access doesn't need to include opinions made in confidence, or information obtained by a public authority that receives data in the context of a particular inquiry.

    1. And it constitutes an important but overlooked signpost in the 20th-century history of information, as ‘facts’ fell out of fashion but big data became big business.

      Of course the hardest problem in big data has come to be realized as the issue of cleaning up messy and misleading data!

    1. a peer-reviewed article

      This peer-reviewed article titled "The Safety of COVID-19 Vaccinations—We Should Rethink the Policy" uses the mishandling of data provided by scientists to spread disinformation claiming that the Covid-19 vaccine is killing people. This is an example of disinformation because this study is peer reviewed, so the people involved in it are well educated and versed in the development and usage of the vaccine.

    1. I used SjoerdV / ConvertOneNote2MarkDown PowerShell script. The key is running PowerShell and OneNote as Administrator. It will crash a bunch of times depending on the size of your OneNote repository. However, if you keep restarting the program as administrator it seems to start back where it left off. Here are my notes: https://www.dropbox.com/s/au66hamcv71sggk/202211151246%20OneNote%20to%20Markdown%20Procedure.pdf?dl=0

      Details for converting OneNote to Obsidian using Markdown

      • Nora Bateson
      • great example of
      • warm data:
        • a doctor who used to visit her mother at her home
          • the doctor's report of her mother's condition
          • make up the "cold data"
          • but it only told a part of the story
          • the other part of the story was not recorded in the formal medical transcripts
          • but was recorded in the living, breathing doctor
          • who experienced the conditions Nora's mother lived in
            • Was the room warm, or cold?
            • Was there a lot of family support?
            • Was there a lot of love in the human relationships? etc
    1. student outcomes, including learning, persistence, or attitudes.

      I would think that this would be one of the easiest things to measure and also would provide significant and useful data. We should check in with Brian (?) to see what data is currently being tracked.

  16. Jan 2023
    1. 3.1 Guest Lecture: Lauren Klein » Q&A on "What is Feminist Data Science?" https://www.complexityexplorer.org/courses/162-foundations-applications-of-humanities-analytics/segments/15631

      https://www.youtube.com/watch?v=c7HmG5b87B8

      Theories of Power

      Patricia Hill Collins' matrix of domination - no hierarchy, thus the matrix format

      What are other broad theories of power? are there schools?

      Relationship to Mary Parker Follett's work?

      Bright, Liam Kofi, Daniel Malinsky, and Morgan Thompson. “Causally Interpreting Intersectionality Theory.” Philosophy of Science 83, no. 1 (January 2016): 60–81. https://doi.org/10.1086/684173.

      about Bayesian modeling for intersectionality


      Where is Foucault in all this? Klein may have references, as I've not got the context.


      How do words index action? —Lauren Klein


      The power to shape discourse and choose words - relationship to soft power - linguistic memes

      Color Conventions Project


      20:15 Word embeddings as a method within her research


      General result (outside of the proximal research) discussed: women are more likely to change language... references for this?


      [[academic research skills]]: It's important to be aware of the current discussions within one's field. (LK)


      36:36 quantitative imperialism is not the goal of humanities analytics, lived experiences are incredibly important as well. (DK)

    1. https://www.complexityexplorer.org/courses/162-foundations-applications-of-humanities-analytics/segments/15630

      https://www.youtube.com/watch?v=HwkRfN-7UWI


      Seven Principles of Data Feminism

      • Examine power
      • Challenge power
      • Rethink binaries and hierarchies
      • Elevate emotion and embodiment
      • Embrace pluralism
      • Consider context
      • Make labor visible

      Abolitionist movement

      There are some interesting analogies to be drawn between the abolitionist movement in the 1800s and modern day movements like abolition of police and racial justice, etc.


      Topic modeling - What would topic modeling look like for corpuses of commonplace books? Over time?


      wrt article: Soni, Sandeep, Lauren F. Klein, and Jacob Eisenstein. “Abolitionist Networks: Modeling Language Change in Nineteenth-Century Activist Newspapers.” Journal of Cultural Analytics 6, no. 1 (January 18, 2021). https://doi.org/10.22148/001c.18841. - Brings to mind the difference in power and invisible labor between literate societies and oral societies. It's easier to erase oral cultures with the overwhelm available to literate cultures because the former are harder to see.

      How to find unbiased datasets to study these?


      aspirational abolitionism driven by African Americans in the 1800s over and above (basic) abolitionism

    1. Big tech has benefited from an educational dynamic that consistently underfunds public education but demands increased technology to prepare the workers of the future, providing low-cost solutions in exchange for data and the potential for future product loyalty

      This is a pattern most of us are familiar with. The best example I know is Apple's launch of the iPad in LA schools without saying, or knowing, how it will be used. Apple has a long history of testing its products out on users. Google habitually does the same, offering products for "free" in exchange for data and expanding a user base for its products.

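    1. The claim that individual learning may depend on the behavior of others highlights the importance of viewing the learning environment as a system involving multiple interacting participants.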
    1. In March, Fortum and Microsoft announced our joint plan for a ground-breaking data centre region in the Helsinki, Finland metropolitan area.

      Data centers and district heating - a perfect match. Clean electricity and then output for heat.

    1. Blind news audiences are being left behind in the data visualisation revolution: here's how we fix that

      !- Title : Blind news audiences are being left behind in the data visualisation revolution: here's how we fix that

    1. When engaging in data literacy work in our classrooms, it’s helpful to keep two ideas at play at once: on the one hand, these algorithmic systems are nowhere near as “smart” as these platforms want to lead us to believe they are; and on the other hand, concerns about accuracy can distract us from the bigger picture, that these platforms are built on a logic of prediction that, one nudge at a time, may ultimately infringe upon users’ ability to make up their own mind.
    1. If you have experienced trouble in remembering dates try the following system which has proved beneficial to at least one student.

      Maxfield suggests drawing out a timeline as a possible visual cue for helping to remember dates. He seemingly misses any mention of ars memoria techniques here.

    1. ProPublica recently reported that breathing machines purchased by people with sleep apnea are secretly sending usage data to health insurers, where the information can be used to justify reduced insurance payments.

      !- surveillance capitalism : example- - Propublica reported breathing machines for sleep apnea secretly send data to insurance companies

    1. Actually I’m not sure most people do this, I just hope I’m not the only one.

      You are not. I will hoard this blog post on my hypothes.is :)

    1. We believe that the numeric notational marks associated with the animals constituted a calendar, and given that it references natural behaviour in terms of seasons relative to a fixed point in time, we may refer to it as a phenological calendar, with a meteorological basis.
    2. We have proposed the existence of a notational system associated with an unambiguous animal subject, relating to biologically significant events informed by the ethological record, which allows us for the first time to understand a Palaeolithic notational system in its entirety. This utilized/allowed the function of ordinality (and, later, place value), which were revolutionary steps forward in information recording.
    1. Data Viz with Python and R: Learn to Make Plots in Python and R

      data viz with python and R

    1. We can have a machine learning model which gives more than 90% accuracy for classification tasks but fails to recognize some classes properly due to imbalanced data or the model is actually detecting features that do not make sense to be used to predict a particular class.

      Quality metrics for a machine learning model
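
The gap between overall accuracy and per-class behaviour is easy to reproduce on a toy example. A minimal sketch using made-up labels ("common" and "rare") and a model that always predicts the majority class; the data and the function names are illustrative assumptions, not taken from the quoted article.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def per_class_recall(y_true, y_pred):
    """For each true class, the fraction of its examples predicted correctly."""
    totals = Counter(y_true)
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return {label: hits[label] / totals[label] for label in totals}


# Imbalanced toy data: 95 examples of one class, 5 of the other.
y_true = ["common"] * 95 + ["rare"] * 5
# A degenerate model that always answers with the majority class.
y_pred = ["common"] * 100

print(accuracy(y_true, y_pred))          # high overall accuracy
print(per_class_recall(y_true, y_pred))  # but the rare class is never recognized
```

Reporting recall (or precision) per class alongside accuracy makes this kind of failure visible.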

  17. Dec 2022
    1. According to an analysis from the Wall Street Journal, the top 1% of Twitch streamers made over 50% of all money paid out by the platform in 2021. Furthermore, just 5% of users had made over $1,000 in the same year. Only 0.06% had made over the U.S. median household income of $67,521. In a survey of 5,000 community members composed of smaller Twitch streamers, Stream Scheme found that 76% were not able to reach Twitch’s $100 minimum payout threshold. Most others were making between $25-130 per month on the platform. 
    2. In a 2021 leak of Twitch’s user data that included creator payouts, it was revealed that from August 2019 to October 2021, the top 100 streamers on the platform made anywhere between $9,626,712.16 and $886,999.17. 
    1. Best times to post on social media overall: Tuesdays through Thursdays at 9 a.m. or 10 a.m. Best days to post on social media: Tuesdays through Thursdays Worst days to post on social media: Sundays
    1. Remember the book title and its genre. You will need to define the term "memoir," and recognize the publisher, title, and author for bibliographic information including the year of publication.

  18. Nov 2022
    1. https://whatever.scalzi.com/2022/11/25/how-to-weave-the-artisan-web/

      “But Scalzi,” I hear you say, “How do we bring back that artisan, hand-crafted Web?” Well, it’s simple, really, and if you’re a writer/artist/musician/other sort of creator, it’s actually kind of essential:

    1. Our annotators achieve the highest precision with OntoNotes, suggesting that most of the entities identified by crowdworkers are correct for this dataset.

      interesting that the mention detection algorithm gives poor precision on OntoNotes and the annotators get high precision. Does this imply that there are a lot of invalid mentions in this data and the guidelines for ontonotes are correct to ignore generic pronouns without pronominals?

    2. an algorithm with high precision on LitBank or OntoNotes would miss a huge percentage of relevant mentions and entities on other datasets (constraining our analysis)

      these datasets have the most limited/constrained definitions for co-reference and what should be marked up so it makes sense that precision is poor in these datasets

    3. Procedure: We first launch an annotation tutorial (paid $4.50) and recruit the annotators on the AMT platform. At the end of the tutorial, each annotator is asked to annotate a short passage (around 150 words). Only annotators with a B3 score (Bagga

      Annotators are asked to complete a quality control exercise and only annotators who achieve a B3 score of 0.9 or higher are invited to do more annotation
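
For reference, a B3 (B-cubed) score compares a predicted clustering of mentions against a gold one, mention by mention. The sketch below follows the standard per-mention definition of B3 precision and recall; the function name, the toy clusters, and the choice to apply the 0.9 threshold from the note to the F1 value are assumptions, since the quoted passage is cut off before it says which figure is thresholded.

```python
def b_cubed(gold, predicted):
    """Return (precision, recall, f1) for two clusterings of the same mentions.

    gold and predicted are lists of sets of mention ids; this sketch assumes
    every mention appears in exactly one cluster on each side.
    """
    gold_of = {m: cluster for cluster in gold for m in cluster}
    pred_of = {m: cluster for cluster in predicted for m in cluster}
    mentions = list(gold_of)
    # Per-mention overlap between its gold cluster and its predicted cluster.
    precision = sum(len(gold_of[m] & pred_of[m]) / len(pred_of[m]) for m in mentions) / len(mentions)
    recall = sum(len(gold_of[m] & pred_of[m]) / len(gold_of[m]) for m in mentions) / len(mentions)
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


gold = [{1, 2, 3}, {4, 5}]
annotator = [{1, 2}, {3, 4, 5}]  # mention 3 was put in the wrong cluster

precision, recall, f1 = b_cubed(gold, annotator)
qualifies = f1 >= 0.9  # threshold taken from the note above; applying it to F1 is an assumption
print(precision, recall, f1, qualifies)
```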

    4. Annotation structure: Two annotation approaches are prominent in the literature: (1) a local pairwise approach, annotators are shown a pair of mentions and asked whether they refer to the same entity (Hladká et al., 2009; Chamberlain et al., 2016a; Li et al., 2020; Ravenscroft et al., 2021), which is time-consuming; or (2) a cluster-based approach (Reiter, 2018; Oberle, 2018; Bornstein et al., 2020), in which annotators group all mentions of the same entity into a single cluster. In ezCoref we use the latter approach, which can be faster but requires the UI to support more complex actions for creating and editing cluster structures.

      ezCoref presents clusters of coreferences all at the same time - this is a nice efficient way to do annotation versus pairwise annotation (like we did for CD^2CR)
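
The two annotation structures are related: a set of pairwise "same entity" judgments can be merged into clusters. The sketch below uses a union-find structure to do that and also counts how many pairs an exhaustive pairwise pass would need. It is illustrative only; it is not the ezCoref implementation, and the exhaustive-pairs count is a worst-case assumption rather than a figure from the paper.

```python
def clusters_from_pairs(mentions, same_entity_pairs):
    """Merge pairwise 'same entity' judgments into clusters using union-find."""
    parent = {m: m for m in mentions}

    def find(m):
        # Walk up to the root, shortening the path along the way.
        while parent[m] != m:
            parent[m] = parent[parent[m]]
            m = parent[m]
        return m

    for a, b in same_entity_pairs:
        parent[find(a)] = find(b)

    groups = {}
    for m in mentions:
        groups.setdefault(find(m), set()).add(m)
    return sorted(sorted(group) for group in groups.values())


def exhaustive_pair_count(n_mentions: int) -> int:
    """Number of distinct mention pairs if every pair had to be judged."""
    return n_mentions * (n_mentions - 1) // 2


print(clusters_from_pairs(["a", "b", "c", "d"], [("a", "b"), ("b", "c")]))
print(exhaustive_pair_count(20))
```

The quadratic growth in pairs is one way to see why a cluster-based interface can be faster, at the cost of a more complex UI.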

    5. However, these datasets vary widely in their definitions of coreference (expressed via annotation guidelines), resulting in inconsistent annotations both within and across domains and languages. For instance, as shown in Figure 1, while ARRAU (Uryupina et al., 2019) treats generic pronouns as non-referring, OntoNotes chooses not to mark them at all

      One of the big issues is that different co-reference datasets have significant differences in annotation guidelines even within the coreference family of tasks - I found this quite shocking as one might expect coreference to be fairly well defined as a task.