- Last 7 days
-
www.sciencedirect.com www.sciencedirect.com
-
Learning heterogeneous graph embedding for Chinese legal document similarity
The paper proposes L-HetGRL, an unsupervised approach using a legal heterogeneous graph and incorporating legal domain-specific knowledge, to improve Legal Document Similarity Measurement (LDSM) with superior performance compared to other methods.
-
- May 2023
-
random-blather.com random-blather.com
-
https://random-blather.com/2014/04/28/information-isnt-power/
Illustration by David Somerville based on the original by Hugh McLeod.
Link to: https://hypothes.is/a/ysRBGgACEe6UNPvIvmWBkQ
This diagram is roughly a cartoon of the zettelkasten process, especially if the panels are labeled: reading, excerpting/synopsis, linking, serendipity, writing.
-
-
-
Trakt Data Recovery IMPORTANT: On December 11 at 7:30 pm PST our main database crashed and corrupted some of the data. We're deeply sorry for the extended downtime and we'll do better moving forward. Updates to our automated backups are already in place and they will be tested on an ongoing basis. Data prior to November 7 is fully restored. Watched history between November 7 and December 11 has been recovered. There is a separate message on your dashboard allowing you to review and import any recovered data. All other data (besides watched history) after November 7 has already been restored and imported. Some data might be permanently lost due to data corruption. Trakt API is back online as of December 20. Active VIP members will get 2 free months added to their expiration date.
From late 2022
-
-
www.doi.org www.doi.org
-
More sophisticated functionality available, e.g., multiple resolution, data typing
{Data Typing} {Multiple Resolution}
-
-
openscholarlyinfrastructure.org openscholarlyinfrastructure.org
-
Open data (within constraints of privacy laws) – For an infrastructure to be forked it will be necessary to replicate all relevant data. The CC0 waiver is best practice in making data legally available. Privacy and data protection laws will limit the extent to which this is possible
{Open Data}
-
-
www.rand.org www.rand.org
-
It is also important to note that this positive evidence for low-income certificate-earners stands in contrast to findings for other historically underserved groups; studies indicate that individuals of color and older individuals go on to stack credentials at lower rates and see smaller earnings gains relative to White individuals and younger individuals (Bohn and McConville, 2018; Bohn, Jackson and McConville, 2019; Daugherty et al., 2020; Daugherty and Anderson, 2021). Although we suspect many low-income individuals are also individuals of color, the findings suggest that there are inequities within stackable credential pipelines that might be more strongly tied to race, ethnicity, and age than to socioeconomic status. It is also possible that many low-income individuals never complete a first certificate and thus do not enter a stackable credential pathway.
-
- Apr 2023
-
certificates.creativecommons.org certificates.creativecommons.org
-
Recommended Resource
I recommend adding the webpage "Open Access in Australia" on Wikiwand, which documents Australia's history of accepting and promoting open access and open publication.
The site contains a timeline that documents key years in which the open movement, open access, open government, and open data concepts were introduced. The year that CC Australia was established is included in the timeline.
-
-
certificates.creativecommons.org certificates.creativecommons.org
-
**Recommended Resource:** Under "More Information About Other Open Movements" I recommend adding Higashinihon Daishinsai Shashin Hozon Purojekuto (trans. Great Earthquake of Eastern Japan Photo Archiving Project), one of Japan's open government and open data efforts, which aims to document photographs of Japan's 2011 earthquake.
The site currently contains close to 40,000 photographs of the aftermath of the natural disaster.
The photos are hosted by Yahoo! Japan and are published under a non-commercial clause for open access to the public.
-
-
campustechnology.com campustechnology.com
-
Once the awarding and registration systems are in place, institutions should also integrate with a modern CRM solution to attract and manage student interest, support, and personalized communications to increase enrollment and engagement. The CRM needs to support career services and other experiential learning departments as the school looks to build outside relationships with organizations and industry partners to provide real-world learning experiences and assessment opportunities for students
CRM focus that goes beyond the academic unit to include others. Also think about Alumni Affairs, Foundation, and lifelong learning.
-
-
en.wikipedia.org en.wikipedia.org
-
blog.hubspot.com blog.hubspot.com
-
Why do so many businesses share their data openly, for free? Most often, the answer is scale. As companies grow, the staff within those companies realize they have more ideas than they have the time and resources to develop them. It’s typically easier to work with other external companies that specialize in these ideas than build them in-house. By creating APIs, a company allows third-party developers to build applications that improve adoption and usage of its platform. That way, a business can build an ecosystem that becomes dependent on the data from their API, which often leads to additional revenue opportunities.
-
-
betterprogramming.pub betterprogramming.pub
-
After struggling with this problem for a while and still being far from solving this issue, I realized that I was making too many requests to the website; which made me come up with the idea of saving all the pages I needed to scrape on my local computer. Next, I started sending requests to these local HTML files instead and kept adapting my code.
I had a similar problem with this.
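The local-copy approach described in the quote can be sketched as follows. This is only an illustration under assumptions: the function names `download` and `cached_fetch`, the cache directory, and the hashing scheme for file names are hypothetical and not taken from the article.

```python
import hashlib
import urllib.request
from pathlib import Path
from typing import Callable


def download(url: str) -> str:
    """Fetch a page over the network (only used on a cache miss)."""
    with urllib.request.urlopen(url) as response:
        return response.read().decode("utf-8", errors="replace")


def cached_fetch(url: str, cache_dir: Path,
                 fetch: Callable[[str], str] = download) -> str:
    """Return the HTML for `url`, reading the local copy when one exists."""
    cache_dir.mkdir(parents=True, exist_ok=True)
    # Hash the URL so that any URL maps to a safe, fixed-length file name.
    name = hashlib.sha256(url.encode("utf-8")).hexdigest() + ".html"
    path = cache_dir / name
    if path.exists():
        # Cache hit: no request is sent to the website.
        return path.read_text(encoding="utf-8")
    html = fetch(url)
    path.write_text(html, encoding="utf-8")
    return html
```

With this in place, the parsing code can be rerun and adapted as often as needed while each page is requested from the site at most once.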
-
- Mar 2023
-
noahpinion.substack.com noahpinion.substack.com
-
[There’s also] a big new study from Cambridge University, in which researchers looked at 84,000 people…and found that social media was strongly associated with worse mental health during certain sensitive life periods, including for girls ages 11 to 13…One explanation is that teenagers (and teenage girls in particular) are uniquely sensitive to the judgment of friends, teachers, and the digital crowd.
-
-
psychclassics.yorku.ca psychclassics.yorku.ca
-
In order to throw light on the question whether exceptionally bright children are specially likely to be one-sided, nervous, delicate, morally abnormal, socially unadaptable, or otherwise peculiar, the writer has secured rather extensive information regarding 31 children whose mental age was found by intelligence tests to be 25 per cent above the actual age. This degree of intelligence is possessed by about 2 children out of 100, and is nearly as far above average intelligence as high-grade feeble-mindedness is below. The supplementary information, which was furnished in most cases by the teachers, may be summarized as follows: -- Ability special or general. In the case of 20 out of 31 the ability is decidedly general, and with 2 it is mainly general. The talents of 5 are described as more or less special, but only in one case is it remarkably so. Doubtful 4. Health. 15 are said to be perfectly healthy; 13 have one or more physical defects; 4 of the 13 are described as delicate; 4 have adenoids; 4 have eye-defects; 1 lisps; and 1 stutters. These figures are about the same as one finds in any group of ordinary children. Studiousness. "Extremely studious," 15; "usually studious" or "fairly studious," 11; "not particularly studious," 5; "lazy," 0. Moral traits. Favorable traits only, 19; one or more unfavorable traits, 8; no answer, 4. The eight with unfavorable moral traits are described as follows: 2 are "very self-willed"; 1 "needs close watching"; 1 is "cruel to animals"; 1 is "untruthful"; 1 is "unreliable"; 1 is "a bluffer"; 1 is "sexually abnormal," "perverted," and "vicious." It will be noted that with the exception of the last child, the moral irregularities mentioned can hardly be regarded, from the psychological point of view, as essentially abnormal. It is perhaps a good rather than a bad sign for a child to be self-willed; most children "need close watching"; and a certain amount of untruthfulness in children is the rule and not the exception.
Social adaptability. Socially adaptable, 25; not adaptable, 2; doubtful, 4. Attitude of other children. "Favorable," "friendly," "liked by everybody," "much admired," "popular," etc., 26; "not liked," 1; "inspires repugnance," 1; no answer, 1. Is child a leader? "Yes," 14; "no," or "not particularly," 12; doubtful, 5. Is play life normal? "Yes," 26; "no," 1; "hardly," 1; doubtful, 3. Is child spoiled or vain? "No," 22; "yes," 5; "somewhat," 2; no answer, 2. According to the above data, exceptionally intelligent children are fully as likely to be healthy as ordinary children; their ability is far more often general than special, they are studious above the average, really serious faults are not common among them, they are nearly always socially adaptable, are sought after as playmates and companions, their play life is usually normal, they are leaders far oftener than other children, and notwithstanding their many really superior qualities they are seldom vain or spoiled.
The data show that exceptionally intelligent children are seen as healthy. I think they are seen as healthier because they have a more positive outlook on life.
-
-
-
There are two main reasons to use logarithmic scales in charts and graphs.
- respond to skewness towards large values / outliers by spreading out the data.
- show multiplicative factors rather than additive (ex: b is twice that of a).
The data values are spread out better with the logarithmic scale. This is what I mean by responding to skewness of large values.
In Figure 2 the difference is multiplicative. Since 2^7 = 2^6 times 2, we see that the revenues for Ford Motor are about double those for Boeing. This is what I mean by saying that we use logarithmic scales to show multiplicative factors.
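The arithmetic behind both reasons can be checked with a few lines of Python. The numbers below are made up for illustration (they are not the revenue figures from the article), and `log2_positions` is a hypothetical helper name.

```python
import math

# Hypothetical values chosen only to illustrate the arithmetic.
value_a = 2 ** 6   # 64
value_b = 2 ** 7   # 128

# Additive view: the raw difference depends on the magnitude of the data.
difference = value_b - value_a                       # 64

# Multiplicative view: one unit on a log2 axis means "times 2".
log_gap = math.log2(value_b) - math.log2(value_a)    # 1.0
ratio = 2 ** log_gap                                 # 2.0


def log2_positions(values):
    """Positions of the values along a base-2 logarithmic axis."""
    return [math.log2(v) for v in values]


# A skewed data set: on a linear axis the first four points crowd together
# near zero, while on the log axis they sit at evenly spaced positions.
skewed = [1, 2, 4, 8, 1024]
positions = log2_positions(skewed)                   # [0.0, 1.0, 2.0, 3.0, 10.0]
```

Equal distances along the log axis correspond to equal ratios, which is why a gap of one unit reads as "double" regardless of where it falls on the axis.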
-
One reason for choosing a dot plot rather than a bar chart is that it is less cluttered. We will be learning other benefits of dot plots in this and future posts.
- Length of bar/line has no meaning in a log-scale
A dot plot is judged by its position along an axis; in this case, the horizontal or x axis. A bar chart is judged by the length of the bar. I don’t like using lengths with logarithmic scales. That is a second reason that I prefer dot plots over bar charts for these data.
- Length of bar/line has no meaning in a log-scale
-
-
krebsonsecurity.com krebsonsecurity.com
-
A new breach involving data from nine million AT&T customers is a fresh reminder that your mobile provider likely collects and shares a great deal of information about where you go and what you do with your mobile device — unless and until you affirmatively opt out of this data collection. Here’s a primer on why you might want to do that, and how.
-
-
web.hypothes.is web.hypothes.is
-
As a teacher of English to secondary school students, and as an online doctoral student, I am excited to explore and possibly integrate Hypothesis into my work. I love research and everything involved with it. Thank you to the creators of this tool --
-
-
news.ycombinator.com news.ycombinator.com
-
"For this campaign, we surveyed 930 Americans to explore their retirement plans. Among them, 16% were retired, 22% were still working, and 62% were retirees who had returned to work."So, 149 of those surveyed were retired. Of those 149, 25 (1 in 6) are considering returning to work. 13 of those want remote positions.
-
-
newamerica.org newamerica.org
-
www.ncbi.nlm.nih.gov www.ncbi.nlm.nih.gov
-
-
Back to the basics: Identifying and addressing underlying challenges in achieving high quality and relevant health statistics for indigenous populations in Canada
-
-
investinopen.org investinopen.org
-
The compiled dataset is made available on Zenodo.
Great to see IOI walking the walk here and releasing data with the recommended CC0 public domain dedication.
-
-
www.theguardian.com www.theguardian.com
-
Since satellite observations began four decades ago, Antarctic sea ice has never shrunk as much as it did in February 2023.
Tags
- climate tipping points
- time: 2023-02
- expert: Will Hopp
- Mode: study
- time: 1979-2023
- expert: Rob Massom
- expert: Ted Scambos
- expert: Matt England
- Region: Antarctica
- expert: Phil Reid
- Project: Australian Antarctic Program Partnership
- institution: National Snow and Ice Data Center
- Parameter: m sq km
- Thwaites-Gletscher
- Region: west antarctic ice shield
- expert: Ariaan Purich
- process: sea ice loss
-
-
revolutionpopuli.com revolutionpopuli.com
-
Flancian thought this was interesting.
-
- Feb 2023
-
www.irishstatutebook.ie www.irishstatutebook.ie
-
Where information that a controller would otherwise be required to provide to a data subject pursuant to subsection (1) includes personal data relating to another individual that would reveal, or would be capable of revealing, the identity of the individual, the controller— (a) shall not, subject to subsection (8), provide the data subject with the information that constitutes such personal data relating to the other individual, and (b) shall provide the data subject with a summary of the personal data concerned that— (i) in so far as is possible, permits the data subject to exercise his or her rights under this Part, and
The controller must provide a summary, rather than the data itself, where the data would reveal or could reveal the identity of another individual.
-
Subject to subsection (2), a controller, with respect to personal data for which it is responsible, may restrict, wholly or partly, the exercise of a right of a data subject specified in subsection (4)
Can restrict, but must be necessary and proportionate (and under one of the restriction rights)
-
Subsection (1) shall not apply— (a) in respect of personal data relating to the data subject that consists of an expression of opinion about the data subject by another person given in confidence or on the understanding that it would be treated as confidential, or (b) to information specified in paragraph (b)(i)(III) of that subsection in so far as a recipient referred to therein is a public authority which may receive data in the context of a particular inquiry in accordance with the law of the State.
Access doesn't need to include opinions given in confidence, or information about a recipient that is a public authority receiving data in the context of a particular inquiry.
-
-
www.irishstatutebook.ie www.irishstatutebook.ie
-
Data Protection Act 2018
Irish Data Protection Act 2018
-
-
Local file Local file
-
And it constitutes an important but overlooked signpost in the 20th-century history of information, as ‘facts’ fell out of fashion but big data became big business.
Of course, the hardest problem in big data has turned out to be cleaning up messy and misleading data!
-
-
www.science.org www.science.org
-
a peer-reviewed article
This peer-reviewed article, titled "The Safety of COVID-19 Vaccinations—We Should Rethink the Policy," uses the mishandling of data provided by scientists to spread disinformation claiming that the COVID-19 vaccine is killing people. This is an example of disinformation because this study is peer-reviewed, so the people involved in it are well educated and versed in the development and usage of the vaccine.
-
-
www.reddit.com www.reddit.com
-
I used SjoerdV / ConvertOneNote2MarkDown PowerShell script. The key is running PowerShell and OneNote as Administrator. It will crash a bunch of times depending on the size of your OneNote repository. However, if you keep restarting the program as administrator it seems to start back where it left off. Here are my notes: https://www.dropbox.com/s/au66hamcv71sggk/202211151246%20OneNote%20to%20Markdown%20Procedure.pdf?dl=0
Details for converting OneNote to Obsidian using Markdown
-
-
docdrop.org docdrop.org
-
- Nora Bateson
- great example of
- warm data:
- a doctor who used to visit her mother at her home
- the doctor's report of her mother's condition
- make up the "cold data"
- but it only told a part of the story
- the other part of the story was not recorded in the formal medical transcripts
- but was recorded in the living, breathing doctor
- who experienced the conditions Nora's mother lived in
- Was the room warm, or cold?
- Was there a lot of family support?
- Was there a lot of love in the human relationships? etc
-
-
journals.publishing.umich.edu journals.publishing.umich.edu
-
student outcomes, including learning, persistence, or attitudes.
I would think that this would be one of the easiest things to measure and also would provide significant and useful data. We should check in with Brian (?) to see what data is currently being tracked.
-
-
www.lifewire.com www.lifewire.com
-
news.microsoft.com news.microsoft.com
-
Microsoft today announced it intends to build a new datacenter region in Southern Finland.
Microsoft data center region in Finland
-
-
meerenergie.amsterdam meerenergie.amsterdam
-
At data centers in the Science Park, heat is released when the servers are cooled.
Meer Energie: data center heat from the Science Park
-
- Jan 2023
-
www.complexityexplorer.org www.complexityexplorer.org
-
3.1 Guest Lecture: Lauren Klein » Q&A on "What is Feminist Data Science?"
https://www.complexityexplorer.org/courses/162-foundations-applications-of-humanities-analytics/segments/15631
https://www.youtube.com/watch?v=c7HmG5b87B8
Theories of Power
Patricia Hill Collins' matrix of domination - no hierarchy, thus the matrix format
What are other broad theories of power? are there schools?
Relationship to Mary Parker Follett's work?
Bright, Liam Kofi, Daniel Malinsky, and Morgan Thompson. “Causally Interpreting Intersectionality Theory.” Philosophy of Science 83, no. 1 (January 2016): 60–81. https://doi.org/10.1086/684173.
about Bayesian modeling for intersectionality
Where is Foucault in all this? Klein may have references, as I've not got the context.
How do words index action? -- Lauren Klein
The power to shape discourse and choose words - relationship to soft power - linguistic memes
Color Conventions Project
20:15 Word embeddings as a method within her research
General result (outside of the proximal research) discussed: women are more likely to change language... references for this?
[[academic research skills]]: It's important to be aware of the current discussions within one's field. (LK)
36:36 quantitative imperialism is not the goal of humanities analytics, lived experiences are incredibly important as well. (DK)
-
-
www.complexityexplorer.org www.complexityexplorer.org
-
https://www.youtube.com/watch?v=HwkRfN-7UWI
Seven Principles of Data Feminism
- Examine power
- Challenge power
- Rethink binaries and hierarchies
- Elevate emotion and embodiment
- Embrace pluralism
- Consider context
- Make labor visible
Abolitionist movement
There are some interesting analogies to be drawn between the abolitionist movement in the 1800s and modern day movements like abolition of police and racial justice, etc.
Topic modeling - What would topic modeling look like for corpuses of commonplace books? Over time?
wrt article: Soni, Sandeep, Lauren F. Klein, and Jacob Eisenstein. “Abolitionist Networks: Modeling Language Change in Nineteenth-Century Activist Newspapers.” Journal of Cultural Analytics 6, no. 1 (January 18, 2021). https://doi.org/10.22148/001c.18841. - Brings to mind the difference in power and invisible labor between literate societies and oral societies. It's easier to erase oral cultures with the overwhelm available to literate cultures because the former are harder to see.
How to find unbiased datasets to study these?
aspirational abolitionism driven by African Americans in the 1800s over and above (basic) abolitionism
Tags
- algorithms
- invisible labor
- watch
- topic modeling
- aspirational abolitionism
- Data Feminism
- Lauren F. Klein
- operationalization
- abolitionists
- emotional labor
- power frameworks
- intersectional feminism
- Catherine D'Ignazio
- data science
- slavery
- defunding police
- orality vs. literacy
- dodging the memory hole
-
-
ssha.org ssha.org
-
citejournal.org citejournal.org
-
Big tech has benefited from an educational dynamic that consistently underfunds public education but demands increased technology to prepare the workers of the future, providing low-cost solutions in exchange for data and the potential for future product loyalty
This is a pattern most of us are familiar with. The best example I know is Apple's launch of the iPad in LA schools without saying, or knowing, how it would be used. Apple has a long history of testing its products out on users. Google habitually does the same, offering products for "free" in exchange for data and expanding a user base for its products.
-
-
hypothes.is hypothes.is
-
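The claim that individual learning may depend on the behavior of others highlights the importance of viewing the learning environment as a system involving multiple interacting participants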
-
-
-
In March, Fortum and Microsoft announced our joint plan for a ground-breaking data centre region in the Helsinki, Finland metropolitan area.
Data centers and district heating - a perfect match. Clean electricity and then output for heat.
-
-
reutersinstitute.politics.ox.ac.uk reutersinstitute.politics.ox.ac.uk
-
Blind news audiences are being left behind in the data visualisation revolution: here's how we fix that
!- Title : Blind news audiences are being left behind in the data visualisation revolution: here's how we fix that
-
-
www.civicsoftechnology.org www.civicsoftechnology.org
-
When engaging in data literacy work in our classrooms, it’s helpful to keep two ideas at play at once: on the one hand, these algorithmic systems are nowhere near as “smart” as these platforms want to lead us to believe they are; and on the other hand, concerns about accuracy can distract us from the bigger picture, that these platforms are built on a logic of prediction that, one nudge at a time, may ultimately infringe upon users’ ability to make up their own mind.
-
-
Local file Local file
-
If you have experienced trouble in remembering dates try the following system which has proved beneficial to at least one student.
Maxfield suggests drawing out a timeline as a possible visual cue for helping to remember dates. He seemingly misses any mention of ars memoria techniques here.
-
-
news.harvard.edu news.harvard.edu
-
ProPublica recently reported that breathing machines purchased by people with sleep apnea are secretly sending usage data to health insurers, where the information can be used to justify reduced insurance payments.
!- surveillance capitalism : example- - Propublica reported breathing machines for sleep apnea secretly send data to insurance companies
-
-
news.stanford.edu news.stanford.edu
-
Layoffs increase the odds of suicide by two and a half times.
-
-
andreisurugiu.com andreisurugiu.com
-
Actually I’m not sure most people do this, I just hope I’m not the only one.
You are not. I will hoard this blog post on my hypothes.is :)
-
-
www.cambridge.org www.cambridge.org
-
We believe that the numeric notational marks associated with the animals constituted a calendar, and given that it references natural behaviour in terms of seasons relative to a fixed point in time, we may refer to it as a phenological calendar, with a meteorological basis.
-
We have proposed the existence of a notational system associated with an unambiguous animal subject, relating to biologically significant events informed by the ethological record, which allows us for the first time to understand a Palaeolithic notational system in its entirety. This utilized/allowed the function of ordinality (and, later, place value), which were revolutionary steps forward in information recording.
-
-
datavizpyr.com datavizpyr.com
-
Data Viz with Python and R: Learn to Make Plots in Python and R
data viz with python and R
-
-
coderzcolumn.com coderzcolumn.com
-
We can have a machine learning model which gives more than 90% accuracy for classification tasks but fails to recognize some classes properly due to imbalanced data or the model is actually detecting features that do not make sense to be used to predict a particular class.
Quality metrics for a machine learning model
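The point in the quoted passage, that high overall accuracy can hide a class the model never recognizes, can be illustrated with a small sketch. The labels, class sizes, and helper names (`accuracy`, `per_class_recall`) are invented for illustration and do not come from the article.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of examples whose predicted label matches the true label."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def per_class_recall(y_true, y_pred):
    """For each true class, the fraction of its examples predicted correctly."""
    totals = Counter(y_true)
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return {label: hits[label] / totals[label] for label in totals}


# Made-up imbalanced data: 95 "common" examples and 5 "rare" ones.
y_true = ["common"] * 95 + ["rare"] * 5
# A degenerate model that always predicts the majority class.
y_pred = ["common"] * 100

acc = accuracy(y_true, y_pred)
recall = per_class_recall(y_true, y_pred)
```

Here the overall accuracy is 95%, yet recall for the "rare" class is zero, which is why per-class metrics are worth checking alongside accuracy on imbalanced data.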
-
- Dec 2022
-
every.to every.to
-
According to an analysis from the Wall Street Journal, the top 1% of Twitch streamers made over 50% of all money paid out by the platform in 2021. Furthermore, just 5% of users had made over $1,000 in the same year. Only 0.06% had made over the U.S. median household income of $67,521. In a survey of 5,000 community members composed of smaller Twitch streamers, Stream Scheme found that 76% were not able to reach Twitch’s $100 minimum payout threshold. Most others were making between $25-130 per month on the platform.
-
In a 2021 leak of Twitch’s user data that included creator payouts, it was revealed that from August 2019 to October 2021, the top 100 streamers on the platform made anywhere between $9,626,712.16 and $886,999.17.
-
-
sproutsocial.com sproutsocial.com
-
Best times to post on social media overall: Tuesdays through Thursdays at 9 a.m. or 10 a.m. Best days to post on social media: Tuesdays through Thursdays Worst days to post on social media: Sundays
-
-
www.aub.edu.lb www.aub.edu.lb
-
Data Services
-
-
aub.edu.lb.libcal.com aub.edu.lb.libcal.com
-
How to Manage your Research Data Effectively
This workshop is offered once per semester to faculty members and research assistants
-
-
docs.docker.com docs.docker.com
-
Introduction of the Compose specification makes a clean distinction between the Compose YAML file model and the docker-compose implementation.
-
-
Local file Local file
-
Remember the book title and its genre. You will need to define the term "memoir," and recognize the publisher, title, and author for bibliographic information including the year of publication.
-
-
zephoria.medium.com zephoria.medium.com
-
One interesting concept in organizational sociology is “normal accidents theory.” Studying Three Mile Island, Charles Perrow created a 2x2 grid
-
- Nov 2022
-
medium.com medium.com
-
What we want is to be able to leave Facebook and still talk to our friends, instead of having many Facebooks.
What about Matrix?
-
-
brainsteam.co.uk brainsteam.co.uk
-
https://brainsteam.co.uk/annotations/
Example of someone owning their Hypothes.is annotations and publishing them on their own website.
-
-
whatever.scalzi.com whatever.scalzi.com
-
https://whatever.scalzi.com/2022/11/25/how-to-weave-the-artisan-web/
“But Scalzi,” I hear you say, “How do we bring back that artisan, hand-crafted Web?” Well, it’s simple, really, and if you’re a writer/artist/musician/other sort of creator, it’s actually kind of essential:
-
-
arxiv.org arxiv.org
-
Our annotators achieve the highest precision with OntoNotes, suggesting that most of the entities identified by crowdworkers are correct for this dataset.
interesting that the mention detection algorithm gives poor precision on OntoNotes and the annotators get high precision. Does this imply that there are a lot of invalid mentions in this data and the guidelines for ontonotes are correct to ignore generic pronouns without pronominals?
-
an algorithm with high precision on LitBank or OntoNotes would miss a huge percentage of relevant mentions and entities on other datasets (constraining our analysis)
these datasets have the most limited/constrained definitions for co-reference and what should be marked up so it makes sense that precision is poor in these datasets
-
Procedure: We first launch an annotation tutorial (paid $4.50) and recruit the annotators on the AMT platform. At the end of the tutorial, each annotator is asked to annotate a short passage (around 150 words). Only annotators with a B3 score (Bagga
Annotators are asked to complete a quality control exercise and only annotators who achieve a B3 score of 0.9 or higher are invited to do more annotation
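For reference, a B3 (B-cubed) score compares a predicted clustering of mentions with a gold clustering, mention by mention. The sketch below is a minimal version that assumes both clusterings cover exactly the same set of mentions; the excerpt does not say which B3 variant or mention-matching rule the paper uses, and the function name `b_cubed` is hypothetical.

```python
def b_cubed(key_clusters, response_clusters):
    """Return (precision, recall, F1) averaged over mentions.

    Assumes every mention appears in exactly one key cluster and exactly
    one response cluster (a simplification made for this sketch).
    """
    key_of = {m: frozenset(c) for c in key_clusters for m in c}
    resp_of = {m: frozenset(c) for c in response_clusters for m in c}
    if not key_of or key_of.keys() != resp_of.keys():
        raise ValueError("both clusterings must cover the same non-empty mention set")

    n = len(key_of)
    # Per-mention precision: share of the mention's response cluster that is
    # also in its key cluster. Per-mention recall: the same overlap measured
    # against the size of the key cluster.
    precision = sum(len(key_of[m] & resp_of[m]) / len(resp_of[m]) for m in key_of) / n
    recall = sum(len(key_of[m] & resp_of[m]) / len(key_of[m]) for m in key_of) / n
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Toy example: two gold entities, but the annotator merged everything.
gold = [["a", "b"], ["c", "d"]]
merged = [["a", "b", "c", "d"]]
p, r, f = b_cubed(gold, merged)
```

In the toy example the merged clustering keeps full recall but only half precision, so its F1 falls well below a 0.9 cut-off like the one mentioned in the note above.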
-
Annotation structure: Two annotation approaches are prominent in the literature: (1) a local pairwise approach, annotators are shown a pair of mentions and asked whether they refer to the same entity (Hladká et al., 2009; Chamberlain et al., 2016a; Li et al., 2020; Ravenscroft et al., 2021), which is time-consuming; or (2) a cluster-based approach (Reiter, 2018; Oberle, 2018; Bornstein et al., 2020), in which annotators group all mentions of the same entity into a single cluster. In ezCoref we use the latter approach, which can be faster but requires the UI to support more complex actions for creating and editing cluster structures.
ezCoref presents clusters of coreferences all at the same time - this is a nice efficient way to do annotation versus pairwise annotation (like we did for CD^2CR)
-
However, these datasets vary widely in their definitions of coreference (expressed via annotation guidelines), resulting in inconsistent annotations both within and across domains and languages. For instance, as shown in Figure 1, while ARRAU (Uryupina et al., 2019) treats generic pronouns as non-referring, OntoNotes chooses not to mark them at all
One of the big issues is that different co-reference datasets have significant differences in annotation guidelines even within the coreference family of tasks - I found this quite shocking as one might expect coreference to be fairly well defined as a task.
-
Specifically, our work investigates the quality of crowdsourced coreference annotations when annotators are taught only simple coreference cases that are treated uniformly across existing datasets (e.g., pronouns). By providing only these simple cases, we are able to teach the annotators the concept of coreference, while allowing them to freely interpret cases treated differently across the existing datasets. This setup allows us to identify cases where our annotators disagree among each other, but more importantly cases where they unanimously agree with each other but disagree with the expert, thus suggesting cases that should be revisited by the research community when curating future unified annotation guidelines
The aim of the work is to examine a simplified subset of co-reference phenomena which are generally treated the same across different existing datasets.
This makes spotting inter-annotator disagreement easier - presumably because for simpler cases there are fewer modes of failure?
-
In this work, we develop a crowdsourcing-friendly coreference annotation methodology, ezCoref, consisting of an annotation tool and an interactive tutorial. We use ezCoref to re-annotate 240 passages from seven existing English coreference datasets (spanning fiction, news, and multiple other domains) while teaching annotators only cases that are treated similarly across these datasets
this paper describes a new efficient coreference annotation tool which simplifies co-reference annotation. They use their tool to re-annotate passages from widely used coreference datasets.
-
-
www.wikiverse.io www.wikiverse.io
-
An independent initiative made by Owen Cornec who has also made many other beautiful data visualizations. Wikiverse vividly captures the fact that Wikipedia is an awe-inspiring universe to explore.
-
-
scribe.rip scribe.rip
-
One example could be putting all files into an Amazon S3 bucket. It’s versatile, cheap and integrates with many technologies. If you are using Redshift for your data warehouse, it has great integration with that too.
Essentially the raw data needs to be vaguely homogenised and put into a single place
-
-
www.cs.ucr.edu www.cs.ucr.edu
-
Dr. Miho Ohsaki re-examined work she and her group had previously published and confirmed that the results are indeed meaningless in the sense described in this work (Ohsaki et al., 2002). She has subsequently been able to redefine the clustering subroutine in her work to allow more meaningful pattern discovery (Ohsaki et al., 2003)
Look into what Dr. Miho Ohsaki changed about the clustering subroutine in her work and how it allowed for "more meaningful pattern discovery"
-
Eamonn Keogh is an assistant professor of Computer Science at the University of California, Riverside. His research interests are in Data Mining, Machine Learning and Information Retrieval. Several of his papers have won best paper awards, including papers at SIGKDD and SIGMOD. Dr. Keogh is the recipient of a 5-year NSF Career Award for “Efficient Discovery of Previously Unknown Patterns and Relationships in Massive Time Series Databases”.
Look into Eamonn Keogh's papers that won "best paper awards"
-
-
blogs.perficient.com blogs.perficient.com
-
This is different than row-level security because row-level security is going to allow you to restrict the actual data that’s shown to them not the actual report that’s shown
-
-
rmoff.net rmoff.net
-
It took me a while to grok where dbt comes in the stack but now that I (think) I have it, it makes a lot of sense. I can also see why, with my background, I had trouble doing so. Just as Apache Kafka isn’t easily explained as simply another database, another message queue, etc, dbt isn’t just another Informatica, another Oracle Data Integrator. It’s not about ETL or ELT - it’s about T alone. With that understood, things slot into place. This isn’t just my take on it either - dbt themselves call it out on their blog:
Also - just because their "pricing" page caught me off guard and their website isn't that clear (until you click through to the technical docs) - I think it's worth calling out that dbt appears to be an open-core platform. They have a SaaS offering and also an open-source Python command-line tool; it seems that these articles are about the latter.
-
Of course, despite what the "data is the new oil" vendors told you back in the day, you can’t just chuck raw data in and assume that magic will happen on it, but that’s a rant for another day ;-)
Love this analogy - imagine chucking some crude into a black box and hoping for ethanol at the other end. Then, when you end up with diesel you have no idea what happened.
-
Working with the raw data has lots of benefits, since at the point of ingest you don’t know all of the possible uses for the data. If you rationalise that data down to just the set of fields and/or aggregate it up to fit just a specific use case then you lose the fidelity of the data that could be useful elsewhere. This is one of the premises and benefits of a data lake done well.
Absolutely right. There's also a data provenance angle here: it is useful to be able to point to a data point that is 5 or 6 transformations from the raw input and say "yes, I know exactly where this came from, here are all the steps that came before".
-
-
developer.mozilla.org developer.mozilla.org
-
binary string (i.e., a string in which each character in the string is treated as a byte of binary data)
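A minimal Python sketch of the idea in this highlight, assuming the Latin-1 codec as the mapping between bytes and characters (each byte value from 0 to 255 becomes the character with the same code point). The helper names are hypothetical and are not part of any browser API.

```python
def bytes_to_binary_string(data: bytes) -> str:
    """Map each byte to the character with the same code point (0-255)."""
    return data.decode("latin-1")


def binary_string_to_bytes(text: str) -> bytes:
    """Inverse mapping; rejects characters that do not fit in one byte."""
    for ch in text:
        if ord(ch) > 0xFF:
            raise ValueError(f"character {ch!r} is outside the byte range")
    return text.encode("latin-1")


raw = bytes([0, 72, 105, 255])
as_text = bytes_to_binary_string(raw)

# Every character in the string stands for exactly one byte.
assert [ord(ch) for ch in as_text] == [0, 72, 105, 255]
assert binary_string_to_bytes(as_text) == raw
```

The explicit range check is there because a string containing any character above code point 255 is not a binary string in this sense.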
-
-
watermark.silverchair.com watermark.silverchair.com
-
Introduction to Daniel Rosiak's spectacular "Sheaf Theory through Examples", available open access from MIT Press Direct: https://doi.org/10.7551/mitpress/12581.003.0003
-
-
docdrop.org docdrop.org
-
okay so remind you what is a sheath so a sheep is something that allows me to 00:05:37 translate between physical sources or physical realms of data and physical regions so these are various 00:05:49 open sets or translation between them by taking a look at restrictions overlaps 00:06:02 and then inferring
Fixed typos in transcript:
Just generally speaking, what can I do with this sheaf-theoretic data structure that I've got? Okay, [I'll] remind you what is a sheaf. A sheaf is something that allows me to translate between physical sources or physical realms of data [in the left diagram] and the data that are associated with those physical regions [in the right diagram]
So these [on the left] are various open sets [an example being] simplices in a [simplicial complex which is an example of a] topological space.
And these [on the right] are the data spaces and I'm able to make some translation between [the left and the right diagrams] by taking a look at restrictions of overlaps [a on the left] and inferring back to the union.
So that's what a sheaf is [regarding data structures]. It's something that allows me to make an inference, an inferential machine.
-
-
listfollowers.com listfollowers.com
-
stackoverflow.com stackoverflow.com
-
it seems like a perversion of my beautiful REST/JSON server
-
-
beepb00p.xyz beepb00p.xyz
-
I also think being able to self-host and export parts of your data to share with others would be great.
This might be achievable through Holochain application framework. One promising project built on Holochain is Neighbourhoods. Their "Social-Sensemaker Architecture" across "neighbourhoods" is intriguing
-
-
www.weforum.org www.weforum.org
-
Your data could warm you up this winter, here’s how
WEC - Circular Economy - $2.5 by 2025
-
-
www.prisma.io www.prisma.io
-
with Prisma you never create application models in your programming language by manually defining classes, interfaces, or structs. Instead, the application models are defined in your Prisma schema
-
-
martinfowler.com martinfowler.com
-
high friction and cost of discovering, understanding, trusting, and ultimately using quality data. If not addressed, this problem only exacerbates with data mesh, as the number of places and teams who provide data - domains - increases.
Another link with https://frictionlessdata.io/
-
-
martinfowler.com martinfowler.com
-
building common infrastructure
A solution to the duplication of effort and data.
-
A data product owner makes decisions around the vision and the roadmap for the data products, concerns herself with satisfaction of her consumers and continuously measures and improves the quality and richness of the data her domain owns and produces. She is responsible for the lifecycle of the domain datasets, when to change, revise and retire data and schemas. She strikes a balance between the competing needs of the domain data consumers.
Resembles the roles and responsibilities of our data stewards.
-
-
pruvisto.org pruvisto.org
-
https://pruvisto.org/debirdify/
Tool for moving some of your Twitter data over to Mastodon or other parts of the Fediverse.
-
-
-
CEO Mike Tung was on a data science podcast. Seems to be solving a problem that Google search doesn't: how seriously should you take the results that come up? What confidence do you have in their truth or falsity?
-
- Oct 2022
-
queue.acm.org queue.acm.org
-
only by examining a constellation of metrics in tension can we understand and influence developer productivity
I love this framing! In my experience companies don't generally acknowledge that metrics can be in tension, which usually means they're only tracking a subset of the metrics they ought to be if they want to have a more complete/realistic understanding of the state of things.
-
-
developerpitstop.com developerpitstop.com
-
Software engineers typically stay at one job for an average of two years before moving somewhere different. They spend less than half the amount of time at one company compared to the national average tenure of 4.2 years.
-
The average performance pay rise for most employees is 3% a year. That is minuscule compared to the 14.8% pay raise the average person gets when they switch jobs.
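A rough Python sketch of how the two quoted figures compare over one typical stint. It assumes the 3% raise compounds annually, that the 14.8% raise is a one-time jump at the moment of switching, and that the two-year tenure from the highlight above is the comparison window; none of these assumptions come from the article itself.

```python
annual_raise = 0.03        # quoted average performance raise per year
switch_raise = 0.148       # quoted average raise when switching jobs
years_between_moves = 2    # quoted average tenure for software engineers

# Cumulative growth from staying put, with raises compounding each year.
stay_growth = (1 + annual_raise) ** years_between_moves - 1

print(f"staying for {years_between_moves} years: {stay_growth:.1%}")
print(f"switching once: {switch_raise:.1%}")
print(f"ratio: {switch_raise / stay_growth:.1f}x")
```

Under these assumptions, two years of performance raises add up to a little over 6%, well under half of the single raise quoted for switching.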
-
-
innerjoin.bit.io innerjoin.bit.io
-
There are a lot of PostgreSQL servers connected to the Internet: we searched shodan.io and obtained a sample of more than 820,000 PostgreSQL servers connected to the Internet between September 1 and September 29. Only 36% of the servers examined had SSL certificates. More than 523,000 PostgreSQL servers listening on the Internet did not use SSL (64%)
-
At most 15% of the approximately 820,000 PostgreSQL servers listening on the Internet require encryption. In fact, only 36% even support encryption. This puts PostgreSQL servers well behind the rest of the Internet in terms of security. In comparison, according to Google, over 96% of page loads in Chrome on a Mac are encrypted. The top 100 websites support encryption, and 97 of those default to encryption.
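The percentages in these two highlights can be checked with a few lines of Python. This is only a sanity check on the quoted figures, which the article gives as approximate ("more than 820,000", "more than 523,000"); the variable names are illustrative.

```python
total_servers = 820_000   # approximate sample size quoted in the article
without_ssl = 523_000     # approximate count of servers not using SSL

share_without_ssl = without_ssl / total_servers
share_with_ssl = 1 - share_without_ssl

# Upper bound on servers that require encryption ("at most 15%").
max_requiring_encryption = total_servers * 15 // 100

print(f"without SSL: {share_without_ssl:.0%}")
print(f"with SSL:    {share_with_ssl:.0%}")
print(f"requiring encryption: at most {max_requiring_encryption:,} servers")
```

The rounded shares agree with the 64% and 36% figures in the text, and the 15% bound corresponds to roughly 123,000 servers.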
-
-
Local file Local file
-
one recognizes in the tactile reality that so many of the cards are on flimsy copy paper, on the verge of disintegration with each use.
Deutsch used flimsy copy paper, much like Niklas Luhmann, and as a result some are on the verge of disintegration through use over time.
The wear of the paper here, however, is indicative of active use over time as well as potential care in use, a useful historical fact.
-
-
blog.didomi.io blog.didomi.io
-
However, in many respects, Law 25 is the most stringent of the three regimes.
-
-
cdn-contenu.quebec.ca cdn-contenu.quebec.ca
-
In the event of non-compliance with the Act, the Commission d’accès à l’information may impose significant penalties, which could reach up to $25 million or 4% of worldwide turnover. The penalty will be proportionate to, among other things, the seriousness of the violation and the company's ability to pay.
-
-
cdn-contenu.quebec.ca cdn-contenu.quebec.ca
-
some information held by the ministère de l’Éducation et de l’Enseignement supérieur was stolen. As a result, 360,000 teachers may be potential victims.
-
-
archive.org archive.org
-
Noting the dates of available materials within archives or sources can be useful on bibliography notes for either planning or revisiting sources. (p16,18)
Similarly one ought to note missing dates, data, volumes, or resources at locations to prevent unfruitfully looking for data in these locations or as a note to potentially look for the missing material in other locations. (p16)
-
- Sep 2022
-
douglasorr.github.io douglasorr.github.io
-
First, to clarify - what is "code", what is "data"? In this article, when I say "code", I mean something a human has written, that will be read by a machine (another program or hardware). When I say "data", I mean something a machine has written, that may be read by a machine, a human, or both. Therefore, a configuration file where you set logging.level = DEBUG is code, while virtual machine instructions emitted by a compiler are data. Of course, code is data, but I think this over-simplified view (humans write code, machines write data) will serve us best for now...
-
-
citeseerx.ist.psu.edu citeseerx.ist.psu.edu
-
The authors propose, based on these experiences, that the cause of a number of unexpected difficulties in human-computer interaction lies in users’ unwillingness or inability to make structure, content, or procedures explicit
I'm curious if this is because of unwillingness or difficulty.
-
-
projects.fivethirtyeight.com projects.fivethirtyeight.com
-
-
no
Saying that Microsoft Word doesn't have export seems disingenuous as the program defaults to file ownership from the start.
-
- Aug 2022
-
www.w3.org www.w3.org
-
In practice, a system in which different parts of the web have different capabilities cannot insist on bidirectional links. Imagine, for example, the publisher of a large and famous book to which many people refer but who has no interest in maintaining his end of their links or indeed in knowing who has referred to the book.
Why it's pointless to insist that links should have been bidirectional: it's unenforceable.
-
-
Local file Local file
-
If the key, or the device on which it is stored is compromised, or if a vulnerability can be exploited, then the data asset can be irrevocably stolen
Another scenario: if the key or the device storing it is compromised, or if a vulnerability is exploited, the data asset can be stolen.
-
If a key is lost, this invariably means that the secured data asset is irrevocably lost
Conversely, be careful: if a key is lost, the secured data asset is lost too.
-
-
www.nytimes.com www.nytimes.com
-
Bloom, J., & Cobey, S. (2021, December 12). Opinion | A Scientist’s Guide to Understanding Omicron. The New York Times. https://www.nytimes.com/2021/12/12/opinion/covid-omicron-data.html
-
-
www.faz.net www.faz.net
-
Kinderimpfstoff gegen Corona: Stiko-Chef Mertens würde eigene Kinder jetzt nicht impfen lassen. (2021, December 2). FAZ.NET. https://www.faz.net/aktuell/politik/inland/stiko-chef-mertens-wuerde-eigene-kinder-nicht-gegen-corona-impfen-17662194.html
-
-
www.npr.org www.npr.org
-
Summers, J. (2021). Little Difference In Vaccine Hesitancy Among White And Black Americans, Poll Finds. NPR.Org. Retrieved March 17, 2021, from https://www.npr.org/sections/coronavirus-live-updates/2021/03/12/976172586/little-difference-in-vaccine-hesitancy-among-white-and-black-americans-poll-find
-
-
twitter.com twitter.com
-
ReconfigBehSci. (2021, October 13). RT @PaulMainwood: Wales is first with their vaccine stock data this week, with a solid 1.6m new doses made available to the UK roll-out pro… [Tweet]. @SciBeh. https://twitter.com/SciBeh/status/1448311825557118977
-
-
twitter.com twitter.com
-