344 Matching Annotations
  1. Oct 2023
    1. indefinite: the provider has no particular commitment to the object.

      Expectations

    2. finite: availability is expected to end on or around a given date (e.g., limited support for software versions not marked “long term stable”) or trigger event (e.g., single-use link).

      Expectations

    3. finite: availability is expected to end on or around a given date (e.g., limited support for software versions not marked “long term stable”) or trigger event (e.g., single-use link). indefinite: the provider has no particular commitment to the object. lifetime: the object is expected to be available as long as the provider exists. subinfinite: due to succession arrangements, the object is expected to be available beyond the provider organization’s lifetime.

      Expectations

      'Indefinite' should rather be 'Undefined'

    4. We define content variance to be a description of the ways in which provider policy or practice anticipates how an object’s content will change over time. Approaches to content variance differ depending on the object, version, service, and provider.

      Expectations

    5. molting: Previously recorded content may be entirely overwritten at any time with content that preserves thematic continuity. For example, an organization’s homepage may be completely reworked while continuing to be its homepage, and a weather or financial service page may reflect dramatic changes in conditions several times a day.

      Expectations

    6. rising: Previously recorded content may be improved at any time, for example, with better metadata (datasets), new features (software), or new insights (pre- and post-prints). This encompasses any change under “fixing”

      Expectations

    7. fixing: Previously recorded content may be corrected at any time, in addition to any change under “keeping”

      Expectations

    8. keeping: Previously recorded content will not change, but character, compression, and markup encodings may change during a format migration, and high-priority security concerns will be acted upon (e.g., software virus decontamination, security patching).

      Expectations

    9. frozen: The bit stream representing previously recorded content will not change

      Expectations
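The five content-variance terms quoted above can be read as an ordered scale. The sketch below is a minimal illustration, not part of the quoted source. The quotes state that "rising" encompasses "fixing" and that "fixing" includes "keeping"; placing "frozen" below "keeping" and "molting" above "rising" is an assumption made here for illustration. The names `ContentVariance` and `permits` are hypothetical.

```python
from enum import IntEnum


class ContentVariance(IntEnum):
    """Content-variance levels, ordered from least to most permitted change.

    Assumption: the full ordering is cumulative. The source only states
    keeping <= fixing <= rising explicitly.
    """
    FROZEN = 0   # bit stream will not change
    KEEPING = 1  # content fixed; encodings may migrate, security fixes allowed
    FIXING = 2   # corrections allowed, plus anything under KEEPING
    RISING = 3   # improvements allowed, plus anything under FIXING
    MOLTING = 4  # content may be entirely overwritten


def permits(policy: ContentVariance, change: ContentVariance) -> bool:
    """Return True if a change of the given kind is allowed under the policy."""
    return change <= policy


if __name__ == "__main__":
    print(permits(ContentVariance.RISING, ContentVariance.FIXING))
    print(permits(ContentVariance.KEEPING, ContentVariance.FIXING))
```

Using an `IntEnum` keeps the comparison a single integer check; a provider whose levels are not strictly cumulative would need an explicit set of permitted changes per level instead.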

    10. id string: the sequence of characters that is the identifier string itself, possibly modified by adding a well-known prefix (often starting with http://) in order to turn it into a URL. identifier: an association between an id string and a thing; e.g., an identifier “breaks” when the association breaks, but to act on an identifier requires its id string. actionable identifier: an identifier whose id string may be acted upon by widely available software systems such as web browsers; e.g., URLs are actionable identifiers.

      Classes of identifier
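The step from id string to actionable identifier described above (adding a well-known prefix to obtain a URL) might be sketched as follows. This is an illustration only: the function name `to_actionable` is hypothetical, and the choice of `https://doi.org/` and `https://n2t.net/` as resolver prefixes is an assumption about which resolvers a reader would use.

```python
def to_actionable(id_string: str) -> str:
    """Turn an id string into a URL by adding a well-known resolver prefix.

    Assumed prefixes: https://doi.org/ for DOIs, https://n2t.net/ for ARKs.
    Strings that are already http(s) URLs are returned unchanged.
    """
    scheme, sep, rest = id_string.partition(":")
    if not sep or not rest:
        raise ValueError(f"no scheme found in id string: {id_string!r}")
    scheme = scheme.lower()
    if scheme in ("http", "https"):
        return id_string  # already actionable
    if scheme == "doi":
        return "https://doi.org/" + rest
    if scheme == "ark":
        return "https://n2t.net/ark:" + rest
    raise ValueError(f"no known resolver prefix for scheme: {scheme!r}")


if __name__ == "__main__":
    print(to_actionable("doi:10.5281/zenodo.7100578"))
```

The function only builds the URL. Whether the identifier still "works" depends on the association between id string and thing, which no string manipulation can check.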

    11. On the other hand, the universal numeric fingerprint (Altman & King 2007) is a PID that supports citation of numeric data in a way that is largely immune to the syntactic formatting and packaging of the data

      Versioning

    12. By contrast, repositories such as figshare (figshare 2016) and Merritt (Abrams et al. 2011) tolerate changes to metadata under the PID assigned originally, but create a new “versioned” PID if the object title or a component file changes, and in the latter case, the original non-versioned PID always references the latest version

      Versioning

    13. The DataONE federated data network (Michener et al. 2011) assigns a PID to immutable data objects and a “series identifier” that resolves to the latest version of an object (DataONE 2015).

      Versioning

    14. Hey, T, Tansley, S and Tolle, K (2009). The Fourth Paradigm: Data-Intensive Scientific Discovery. Redmond, Washington: Microsoft Research.

      Citation stability

    15. At a minimum it implies a prediction about an archive’s commitment and capacity to provide some specific kind of long-term functionality

      Persistence

    16. persistence is purely a matter of service

      Persistence

    1. Content drift describes the case where the resource identified by its URI changes over time and hence, as time goes by, the request returns content that becomes less and less representative of what was originally referenced.

      Content Drift

    1. The Handle System was first implemented in autumn 1994, and was administered and operated by CNRI until December 2015, when a new "multi-primary administrator" (MPA) mode of operation was introduced

      Handle system introduction

    1. These findings provide strong indicators that scholarly content providers reply to DOI requests differently, depending on the request method, the originating network environment, and institutional subscription levels

      PID Resolution factors

    1. In addition, PIDs may be local to an individual organization (e.g. identifiers in an internal human resources system), national (e.g. the DAI – Digital Author Identifier, used in the Netherlands), or global (all the examples in the paragraph above).

      PID Scope

    2. identifiers for research objects and outputs, for example, DOIs (digital object identifiers), Archival Resource Key identifiers (ARKs), handles and IGSNs (International Geo Sample Number).

      PID Entities - research outputs

    3. identifiers for organizations, including GRID (Global Research Identifier Database), Ringgold IDs, ISNIs (International Standard Name Identifiers), LEIs (legal entity identifiers) and the identifiers that will be provided by the recently announced Research Organization Registry

      PID Entities - organisations

    4. identifiers for researchers, such as ORCID iDs, ResearcherIDs and Scopus IDs

      PID Entities - Researchers

    1. ARK systems such as Noid and N2T can record and provide metadata about any resource with an ARK.  That metadata becomes available via APIs, and can be seen when you add “?” to the end of an ARK URL. (See “Inflections” below) ARK metadata is very flexible, with no initial required metadata, but with support for multiple metadata schemas.  This flexibility is intentional: ARKs are designed to support a full digital object workflow, including the earliest stages before a resource is well-understood or described.

      ARK Metadata

    1. The ARK Alliance maintains a complete registry of all assigned NAANs, currently at the California Digital Library. The registry is mirrored at the (U.S.) National Library of Medicine and the National Library of France.

      PID Registry

  2. Sep 2023
    1. PIDs are exemplary implementations of FAIR data in their own right, but they also help to provide FAIR access to research entities like articles and datasets.

      Benefits

    2. DOIs are a great solution for the problem of URIs that change over time, but this approach does depend on journal publishers, repositories, libraries, and other major hosting organizations to be responsible for maintaining current link information within the DOI records that they have created

      Integrity

    3. Ultimately the knowledge graph will permit a much clearer understanding of global research networks, research impact, and the ways in which knowledge is created in a highly interconnected world.

      Use Case

    4. PIDs infrastructure promises much more accurate and timely reporting for key metrics including the number of publications produced at an institution in a given year, the total number of grants, and the amount of grant funding received.

      Use Case

    5. They can also much more easily see whether researchers have met mandated obligations for open access publishing and open data sharing.

      Use Case

    6. The global knowledge graph created by the interlinking of PIDs can help funders to much more easily identify the publications, patents, collaborations, and open knowledge resources that are generated through their various granting programs.

      Use Case

    7. benefit from efficiencies

      Benefits to Institutions

    8. Benefits to Researchers

      Time-saving: reduction in administrative burden

    9. identifiers will continue to resolve indefinitely.

      Resolvability

    10. All PID Registration Agencies must have highly redundant storage and hosting infrastructure in order to ensure that services are globally available 24-7

      Redundancy

    11. Persistent

      Persistent

    12. Machine-Readable

      {Machine-Readable}

    13. Globally Unique Names

      {Globally Unique}

    1. Systems such as DOI can thus support resolution mechanisms that are likely to be able to maintain the resolution of identifiers regardless of changes in technology or to one particular system.

      {Protocol Independence}

    1. Brown, Josh, Jones, Phill, Meadows, Alice, & Murphy, Fiona. (2022). Incentives to invest in identifiers: A cost-benefit analysis of persistent identifiers in Australian research systems. Zenodo. https://doi.org/10.5281/zenodo.7100578

      P1: Benefits of PIDs

    1. PIDs for research data; PIDs for instruments; PIDs for academic events; PIDs for cultural objects and their contexts; PIDs for organizations and projects; PIDs for researchers and contributors; PIDs for physical objects; PIDs for open-access publishing services and current research information systems (CRIS); PIDs for software; PIDs for text publications

      PID Use Case Elements, entities

    1. Although the DOIs assigned to relatively large aggregations of datasets are well suited for citation and acknowledgment purposes, they are not issued at fine enough granularity to meet the scientific imperative that published results should be traceable and verifiable

      Reproducibility

    2. Persistent identifiers for acknowledgment and citation

      PID Use Cases

    3. One key element is to generate a dataset-centric rather than system-centric focus, with an aim to making the infrastructure less prone to systemic failure.

      PID Motivations

    4. scientific reproducibility and accountability

      PID Motivations

    1. To reuse and/or reproduce research it is desirable that research output be available with sufficient context and details for both humans and machines to be able to interpret the data as described in the FAIR principles

      Reusability and reproducibility of research output

    2. Registration of research output is necessary to report to funders like NWO, ZonMW, SIA, etc. for monitoring and evaluation of research (e.g. according to SEP or BKO protocols). Persistent identifiers can be applied to ease the administrative burden. This results in better reporting, better information management and in the end better research information.

      Registering and reporting research

    3. 1. Registration and reporting research; 2. Reusability and reproducibility of research; 3. Evaluation and recognition of research; 4. Grant application; 5. Researcher profiling; 6. Journal rankings

      PID Use Cases

    1. Deduplication of researchers; Linkage with awards; Authoritative attribution of affiliation and works | ORCID iD | Recommended
       Identification of datasets, software and other types of research outputs | DataCite DOI | Recommended
       Identification of organisations | GRID/ROR | Recommended
       Identification of organisations in NZRIS | NZBN | Required for data providers

      PID Use Cases

    1. Function PID type (Examples:) Recommended or required?

      PID Use Cases

    2. The progress and impact of the project will be measured and monitored through the collection of quantitative indicators. The different systems of the project partners as well as ORCID Inc. and ROR will be queried. If possible, indicators for all 10 PID use cases should be measured. These include for example the following indicators:
       ● Number of registered DataCite DOIs by scientific institutions in Germany.
       ● Number of registered DataCite-DOIs that have a link to further resources via a related-IDentifier relationship.
       ● Number of ROR implementations at scientific institutions in Germany.
       ● Number of GND records that have an ORCID iD or a ROR ID.
       ● Etc.

      PID Use Cases

    1. Key features
       ● KISTI’s mission is to curate, collect, consolidate, and provide scientific information to Korean researchers and institutions. It includes but is not limited to:
         ■ Curating Korean R&D outputs. Curate them higher state of identification for better curation, tracking research impact, analysing research outcomes.
         ■ DOI RA management. Issuing DOIs to Korean research outputs, intellectual properties, research data.
         ■ Support Korean societies to stimulate better visibilities of their journal articles around the world.
         ■ Collaborate for better curation (identification and interlinking) with domestic and global scientific information management institutions, publishers and identifier managing agencies.

      PID Use Cases

    1. Name of infrastructure | Key purpose | List of integrated PIDs
       Fairdata.fi | Research data publication, metadata hub and preservation service | DOI, URN, ORCID (upda
       Research.fi | National research data hub. | Current draft: ADSbibcode - Astrophysics Data System - Bibliographic Reference Code (en); ARK - Archival Resource Key (en); arXiv - arXiv identifier scheme (en); BusinessID - Y-tunnus (fi) (en); Crossref_funders - Crossref Funder Registry (en); DOI - Digital Object Identifier (en)

      PID Use Cases

    1. Name of infrastructure | Key purpose | List of integrated PIDs
       e-infra | This large infrastructure will build the National Repository Platform in the upcoming years. That should greatly facilitate adoption of PIDs. | TBD
       National CRIS - IS VaVaI (R&D Information System) | National research information system. We plan on working with Research, Development and Innovation Council (in charge of IS VaVaI) on integrating global PIDs into their submission processes as required. Nowadays it uses mostly local identifiers. | TBD
       Institutional CRIS systems | Various institutional CRIS systems at Czech RPOs. OBD (Personal Bibliographic Database) application is an outstanding case of an institutional CRIS system in the Czech Republic developed locally by a Czech company DERS. An ORCID integration for OBD is currently in development. | TBD, OBD ORCID in process
       Institutional or subject repositories | There are several repositories in the Czech republic collecting different objects, some are already using PIDs but there is still enough room to improve and really integrate those PIDs, not only allow their evidence. | Handle, DOI, maybe other
       Major research funders | Grant application processes | TBD
       Local publishers | Content submission processes | TBD

      PID Use Cases

    2. TARGET INSTITUTIONS:
       ● Public research performing organisations (RPOs): Higher Education Institutions and Research organizations
       ● Research funding organizations (RFOs): Ministry of Education, Youth and Sports, Czech Science Foundation, Technology Agency of the Czech Republic etc.
       ● Policymakers: Ministry of Education, Youth and Sports; Research, Development and Innovation Council (R&D&I Council)
       ● Libraries: National library, National Library of Technology, academic libraries
       ● Publishers based in Czechia
       ● Service providers, research infrastructures
       TARGET GROUPS:
       ● Researchers
       ● Librarians
       ● Open Science/Open Access managers/coordinators
       ● CRIS system managers
       ● Repository managers
       ● Other research support positions, e.g. data stewards, data curators

      PID Stakeholders and Target Groups

    3. Function PID type Recommended or required?

      PID Use Cases

    1. Function PID type Recommended or required?

      PID Use Cases in the Netherlands

    2. 1. Registration and reporting research; 2. Reusability and reproducibility of research; 3. Evaluation and recognition of research; 4. Grant application; 5. Researcher profiling; 6. Journal rankings

      PID Use Cases in the Netherlands

    1. PIDs comparison table (from Pathways to National PID Strategies: Guide and Checklist to facilitate uptake and alignment)
       Case study | Function | PID type
       Finland | Researchers, persons | ORCID; ISNI
       Finland | Organisations | VAT-number (not resolvable yet); RoR; ISNI

      PID usage by country

  3. Aug 2023
  4. Jul 2023
  5. May 2023
    1. Deep Learning (DL): A Technique for Implementing Machine Learning. Subfield of ML that uses specialized techniques involving multi-layer (2+) artificial neural networks. Layering allows cascaded learning and abstraction levels (e.g. line -> shape -> object -> scene). Computationally intensive; enabled by clouds, GPUs, and specialized HW such as FPGAs, TPUs, etc.

      [29] AI - Deep Learning

    1. The object of the present volume is to point out the effects and the advantages which arise from the use of tools and machines ;—to endeavour to classify their modes of action ;—and to trace both the causes and the consequences of applying machinery to supersede the skill and power of the human arm.

      [28] AI - precedents...

    1. Epidemiologist Michael Abramson, who led the research, found that the participants who texted more often tended to work faster but score lower on the tests.

      [21] AI - Skills Erosion

    1. An AI model taught to view racist language as normal is obviously bad. The researchers, though, point out a couple of more subtle problems. One is that shifts in language play an important role in social change; the MeToo and Black Lives Matter movements, for example, have tried to establish a new anti-sexist and anti-racist vocabulary. An AI model trained on vast swaths of the internet won’t be attuned to the nuances of this vocabulary and won’t produce or interpret language in line with these new cultural norms. It will also fail to capture the language and the norms of countries and peoples that have less access to the internet and thus a smaller linguistic footprint online. The result is that AI-generated language will be homogenized, reflecting the practices of the richest countries and communities.

      [21] AI Nuances

    1. According to him, there are several goals connected to AI alignment that need to be addressed:

      [20] AI - Alignment Goals

    1. The following table lists the results that we visualized in the graphic.

      [18] AI - Increased sophistication

  6. artificialintelligenceact.eu
    1. A summary presentation on the Act by the European Commission can be downloaded here.

      [3] AI - Risk Classification

    1. Images: Generative AI can create new images based on existing ones, such as creating a new portrait based on a person’s face or a new landscape based on existing scenery

      [17] AI- Features - Image Synthesis

    1. To evaluate the information for yourself, you can also expand your view to see how the response is corroborated, and click to go deeper.

      [14] AI Features - Provenance

    2. You’ll see an AI-powered snapshot of key information to consider, with links to dig deeper.

      [14] AI - Summary

    1. Actors: Language models could drive down the cost of running influence operations, placing them within reach of new actors and actor types. Likewise, propagandists-for-hire that automate production of text may gain new competitive advantages. Behavior: Influence operations with language models will become easier to scale, and tactics that are currently expensive (e.g., generating personalized content) may become cheaper. Language models may also enable new tactics to emerge, like real-time content generation in chatbots. Content: Text creation tools powered by language models may generate more impactful or persuasive messaging compared to propagandists, especially those who lack requisite linguistic or cultural knowledge of their target. They may also make influence operations less discoverable, since they repeatedly create new content without needing to resort to copy-pasting and other noticeable time-saving behaviors.

      [10] AI - Influencing Concerns

    1. The new Bing also cites all its sources, so you’re able to see links to the web content it references.

      [13] AI Features - Provenance

    2. empowers you to refine your search until you get the complete answer you are looking for by asking for more details, clarity and ideas – with links available so you can immediately act on your decisions.

      [13] AI Features - Refinement

    3. reviews results from across the web to find and summarize the answer you’re looking for.

      [13] AI Features - Summaries

    1. Editable metadata – identifiers’ metadata must be able to be edited in order to allow their owners to update details of the thing they are referring to, such as its location, as they will inevitably change.

      {Dynamic}

    2. Ownership – identifiers created must be able to have their management restricted to a particular agent;

      {Single Agent}

    3. articulates requirements for readability stating that identifiers must be: Any printable characters from the Universal Character Set of ISO/IEC 10646 (ISO 2012): UTF-8 encoding is required; Case insensitive: Only ASCII case folding is allowed.

      {UTF-8} {ASCII Case Folding}

    4. The reason for broadening them is that identifier resolution systems may be forced to change protocols over time and what is acceptable for one protocol may not be for another.

      {Protocol Independence}

    5. Creating identifiers that are independent of any particular technology or organisation and are able to be unambiguously understood is a well-known requirement for PID systems.

      {Independence}

    6. Uniqueness – within some scope, not necessarily globally, to avoid clashes;

      {Unique}
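The readability requirement quoted above (UTF-8 encoding, case insensitivity limited to ASCII case folding) can be made concrete with a small sketch. It is an illustration under stated assumptions, not a reference implementation: the helper names `ascii_casefold` and `same_identifier` are hypothetical, and folding to lower case (rather than upper case) is an arbitrary choice made here.

```python
def ascii_casefold(id_string: str) -> str:
    """Fold only the ASCII letters A-Z to lower case.

    Non-ASCII characters are left untouched, unlike str.lower() or
    str.casefold(), which also change characters outside ASCII.
    """
    return "".join(
        chr(ord(ch) + 32) if "A" <= ch <= "Z" else ch
        for ch in id_string
    )


def same_identifier(a: str, b: str) -> bool:
    """Compare two id strings as UTF-8 bytes after ASCII-only case folding."""
    return ascii_casefold(a).encode("utf-8") == ascii_casefold(b).encode("utf-8")


if __name__ == "__main__":
    print(same_identifier("10.1000/ABC", "10.1000/abc"))
    print(same_identifier("10.1000/\u00c9", "10.1000/\u00e9"))
```

The second comparison differs only in a non-ASCII letter (É versus é), so under ASCII-only folding the two strings count as different identifiers.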

    1. In addition, data model policy requires that RAs maintain a record of the date of allocation of a DOI name, and the identity of the registrant on whose behalf the DOI name was allocated.

      {Record-keeping}

    2. The policy provides a simple test of an RA’s competence: the ability to make a DOI Kernel Declaration, which requires that the RA has an internal system which can support the unambiguous allocation of a DOI name, and is fundamentally sound enough to support interoperability within the network.

      {Competence} {Unambiguous Allocation}

    3. The second aim of DOI data model policy is “to ensure minimum standards of quality of administration of DOI names by Registration Agencies, and facilitate the administration of the DOI system as a whole”.

      {Administrative Capacity}

    1. Designing and implementing specific operational processes for e.g. quality control of input data and output data; Integrating the community into other DOI related activities and services.

      {Quality Assurance}

    2. Providing applications, services, marketing, outreach, business cases etc. to introduce the DOI system to the community; Designing and implementing specific operational processes for e.g. quality control of input data and output data;

      {Services}

    3. Providing information and advice to the community

      {Community Advice}

    4. Registration Agencies must comply with the policies and technical standards established by the IDF, but are free to develop their own business model for running their businesses. There is no appropriate “one size fits all” model; RAs may be for-profit or not-for-profit organisations. The costs of providing DOI registration may be included in the services offered by an RA provision and not separately distinguished from these. Examples of possible business models may involve explicit charging based on the number of prefixes allocated or the number of DOI names allocated; volume discounts, usage discounts, stepped charges, or any mix of these; indirect charging through inclusion of the basic registration functions in related value added services; and cross-subsidy from other sources.

      {Fee-for-Service}

    5. Integrating the community into other DOI related activities and services

      {Community}

    1. URNs are globally unique persistent identifiers assigned within defined namespaces so they will be available for a long period of time, even after the resource which they identify ceases to exist or becomes unavailable.

      {Global}

    2. Approximately sixty formal URN namespace identifiers have been registered.

      {Unambiguous Allocation}

    3. In order to ensure the global uniqueness of URN namespaces, their identifiers (NIDs) are required to be registered with the IANA. Registered namespaces may be "formal" or "informal".

      {Unique}

    4. A Uniform Resource Name (URN) is a Uniform Resource Identifier (URI) that uses the urn scheme. URNs are globally unique persistent identifiers assigned within defined namespaces so they will be available for a long period of time, even after the resource which they identify ceases to exist or becomes unavailable

      {Persistence}
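As a small illustration of the namespace structure described in the URN quotes above, the sketch below splits a URN into its namespace identifier (NID) and namespace-specific string. It assumes the basic `urn:<NID>:<NSS>` layout and treats the `urn` scheme and the NID as case-insensitive; it does not validate the full URN grammar, and the function name `parse_urn` is hypothetical.

```python
def parse_urn(urn: str) -> tuple[str, str]:
    """Split a URN into (NID, namespace-specific string).

    The NID is lower-cased because it is case-insensitive; the
    namespace-specific string is returned unchanged.
    Raises ValueError if the string does not have the urn:<NID>:<NSS> shape.
    """
    parts = urn.split(":", 2)
    if len(parts) != 3 or parts[0].lower() != "urn" or not parts[1] or not parts[2]:
        raise ValueError(f"not a URN: {urn!r}")
    return parts[1].lower(), parts[2]


if __name__ == "__main__":
    print(parse_urn("URN:ISBN:0451450523"))
    print(parse_urn("urn:nbn:de:bvb:19-146642"))
```

Splitting at most twice keeps any further colons inside the namespace-specific string, which namespaces such as `nbn` rely on.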

    1. existence, and ability to be used in services outside the direct control of the issuing assigner, without a stated time limit

      {Persistence}

    2. specification by a DOI name (3.2) of one and only one referent (3.16)

      {Unique}

    3. process of submitting a DOI name (3.2) to a network service and receiving in return one or more pieces of current information related to the identified object such as metadata or a location of the object or of metadata

      {Resolvable}

    4. — dynamic updating of metadata, applications and services.

      {Dynamic}

    5. — single management of data for multiple output formats (platform independence),

      {Platform Independence}

    6. — interoperability with other data from other sources,

      {Interoperable}

    7. — persistence, if material is moved, rearranged, or bookmarked,

      {Persistence}

    8. — extensibility by adding new features and services through management of groups of DOI names,

      {Extensible}

    1. Patent non-assertion – The organisation should commit to a patent non-assertion covenant. The organisation may obtain patents to protect its own operations, but not use them to prevent the community from replicating the infrastructure.

      {No Patents}

    2. Open source – All software required to run the infrastructure should be available under an open source license. This does not include other software that may be involved with running the organisation.

      {Open Source}

    3. Open data (within constraints of privacy laws) – For an infrastructure to be forked it will be necessary to replicate all relevant data. The CC0 waiver is best practice in making data legally available. Privacy and data protection laws will limit the extent to which this is possible

      {Open Data}

    4. Available data (within constraints of privacy laws) – It is not enough that the data be made “open” if there is not a practical way to actually obtain it. Underlying data should be made easily available via periodic data dumps.

      {Accessible}

    5. Revenue based on services, not data – data related to the running of the research enterprise should be a community property. Appropriate revenue sources might include value-added services, consulting, API Service Level Agreements or membership fees.

      {Sustainable Operational Revenue}

    6. Mission-consistent revenue generation – potential revenue sources should be considered for consistency with the organisational mission and not run counter to the aims of the organisation

      {Mission-Consistent}

    7. Goal to create contingency fund to support operations for 12 months – a high priority should be generating a contingency fund that can support a complete, orderly wind down (12 months in most cases). This fund should be separate from those allocated to covering operating risk and investment in development.

      {Contingency}

    8. Goal to generate surplus – organisations which define sustainability based merely on recovering costs are brittle and stagnant. It is not enough to merely survive, it has to be able to adapt and change. To weather economic, social and technological volatility, they need financial resources beyond immediate operating costs.

      {Surplus}

    9. Time-limited funds are used only for time-limited activities – day to day operations should be supported by day to day sustainable revenue sources. Grant dependency for funding operations makes them fragile and more easily distracted from building core infrastructure.

      {Time-Limited}

    10. Formal incentives to fulfil mission & wind-down – infrastructures exist for a specific purpose and that purpose can be radically simplified or even rendered unnecessary by technological or social change. If it is possible the organisation (and staff) should have direct incentives to deliver on the mission and wind down.

      {Formal Incentives}

    11. Living will – a powerful way to create trust is to publicly describe a plan addressing the condition under which an organisation would be wound down, how this would happen, and how any ongoing assets could be archived and preserved when passed to a successor organisation. Any such organisation would need to honour this same set of principles.

      {Living Will}

    12. Cannot lobby – the community, not infrastructure organisations, should collectively drive regulatory change. An infrastructure organisation’s role is to provide a base for others to work on and should depend on its community to support the creation of a legislative environment that affects it.

      {Cannot Lobby}

    13. Transparent operations – achieving trust in the selection of representatives to governance groups will be best achieved through transparent processes and operations in general (within the constraints of privacy laws).

      {Transparent}

    14. Non-discriminatory membership – we see the best option as an “opt-in” approach with a principle of non-discrimination where any stakeholder group may express an interest and should be welcome. The process of representation in day to day governance must also be inclusive with governance that reflects the demographics of the membership.

      {Membership}

    15. Stakeholder Governed – a board-governed organisation drawn from the stakeholder community builds more confidence that the organisation will take decisions driven by community consensus and consideration of different interests.

      {Stakeholder Governed}

    16. Coverage across the research enterprise – it is increasingly clear that research transcends disciplines, geography, institutions and stakeholders. The infrastructure that supports it needs to do the same.

      {Coverage}

    1. this specification permits several other cases of URN resolution as well as URNs for resources that do not involve information retrieval systems. This is true either individually for particular URNs or (as defined below) collectively for entire URN namespaces.

      {Resolvable}