378 Matching Annotations
  1. Apr 2019
    1. The monograph under review comprises 47 chapters devoted to various branches of onomastics and written by 43 authors from 13 countries. The team of authors was led by Carole Hough, professor at the University of Glasgow and former president of the International Council of Onomastic Sciences
    1. precedent text, precedent utterance, precedent name, precedent situation
    2. According to the researcher, precedent texts are "texts that are (1) significant for a given individual in cognitive and emotional terms, (2) supra-personal in character, that is, well known to that individual's wider circle, including predecessors and contemporaries, and, finally, (3) referred to repeatedly in the discourse of that linguistic personality" [1, p. 216].
    3. Appellativization, that is, the shift of proper names (onyms) into common nouns (appellatives), is fairly widespread in Russian.
    4. With the appellativized onym Othello, the synonym pair jealous man (ревнивец) – Othello was formed (Othello is the hero of the tragedy of the same name by W. Shakespeare (1564–1616), who killed his wife out of jealousy).
    5. The appellativized onym Pechkin joined the synonym series headed by postman (почтальон) (Pechkin is a postman, a character in the children's stories by the contemporary Russian writer E. Uspensky and in the animated films based on them)
    6. In examining why the appellativized onyms Sherlock Holmes (Holmes) and Pinkerton are used more often than the appellativized onyms Maigret and Poirot, we concluded that this is unrelated to how well known a given literary character is. Thus, the stories and novellas about Sherlock Holmes and the films based on them are very popular, whereas the Pinkerton novels have not been reprinted for many years. Yet the units Sherlock Holmes (Holmes) and Pinkerton differ only slightly in frequency of use
    7. A precedent name is "an individual name associated either (1) with a widely known text that, as a rule, belongs to the precedent texts (Anna Karenina, Oblomov), or (2) with a situation that is widely known to speakers of the language and functions as a precedent (Ivan Susanin); precedent names also include (3) symbol-names that point to a certain benchmark set of qualities (Napoleon, Salieri)" [2]
    8. The appellativized onym Mitrofanushka (Mitrofan) joined the synonym series headed by ignoramus (невежда) (Mitrofanushka is the ignorant hero, unwilling to learn, of D. I. Fonvizin's (1745–1792) comedy "The Minor")
    9. pronominatio: "the replacement of a common noun with a proper name (or vice versa), for example: Othello instead of jealous man"
    10. Thus, the names of literary characters that have undergone appellativization enter synonym series and form synonym pairs with the corresponding appellatives. In the overwhelming majority of cases, the appellativized unit acts as a stylistic synonym of the series' dominant.
    11. The appellativized onym Plyushkin joined the synonym series headed by miser (скупец) (Plyushkin is a landowner of extraordinary miserliness, a character in N. V. Gogol's (1809–1852) poem "Dead Souls").
    1. Distant reading offers new ways to challenge assumptions about genre, narrative and other aspects of literature, by facilitating the analysis of large-scale collections of literary works. Numerous approaches have been proposed and tested for this purpose, including those based on statistical topic models [10], character profiling [6], character frequency analysis [5, 22], and sentiment analysis [4].
    1. DeepWalk was introduced by Perozzi et al. in 2014 [28]. The authors brought in ideas from recent work on word embedding. They emphasize interesting similarities between NLP and SNA (as seen in II.1). The main idea is to give sentences of nodes (instead of words) as input to word embedding algorithms. They used random walks to produce such sentences from an unweighted graph (the overall architecture of DeepWalk can be seen in figure III.5).
    2. Negative sampling (NS) is a simplified version of noise-contrastive estimation (NCE) [35]. Both are sampling approaches: instead of calculating a cross product for each word in the vocabulary, they use only samples from the vocabulary set. Unlike the hierarchical softmax, sampling approaches do not have a softmax layer. Sampling approaches are only useful at training time. A full softmax must be computed during evaluation (that is not a problem here, since we do not use the model after training; we only keep the feature vectors)
    3. According to Mikolov et al., the non-linear hidden layer seems to bring too much complexity [26]. In this paper, the authors propose two shallow neural networks without a hidden layer: the Continuous Bag of Words (CBOW) and Skip-Gram models.
    4. As stated before, Social Network Analysis often deals with millions of nodes, and the size of the adjacency matrix is the square of the number of nodes. Not to mention the obvious performance issues, this embedding would be hardly exploitable because of the curse of dimensionality
    5. The simplest graph embedding we can consider is the adjacency matrix of the network
    6. Producing vector representations of nodes builds a bridge between social network analysis, data analytics and statistics. The vectors can therefore be used by machine learning algorithms that take vectors as input for prediction, distance computation or clustering.
    7. Node role identification aims to infer from a social graph the role a node plays in the network.
    8. Community detection aims to identify groups of highly interconnected nodes
    9. Link prediction aims to predict the connections that will appear between the actors of a social network.
    10. Node classification aims to retrieve individual data about an actor (age, gender, interests, etc.)
    11. The study of information diffusion aims to understand how information spreads through the network
    12. Network modeling aims to simulate the behavior of the network with a simple model.
    13. This idea is summed up by Degenne and Forsé [1]: “Instead of thinking of reality in terms of relations between actors, many of those who analyze empirical data limit themselves to thinking of it in categories (for example: young people, women, executives, developing countries, etc.). These categories are built by aggregating people with similar features that are, a priori, relevant to the current issue.
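
The DeepWalk and Skip-Gram excerpts above describe turning random walks over an unweighted graph into "sentences" of nodes and feeding them to a word embedding model. A minimal sketch of that data preparation step follows. The function names, the toy edge list, and the parameters `walks_per_node`, `length`, and `window` are all illustrative assumptions, not taken from the annotated thesis, and the embedding training itself is left out.

```python
import random


def build_adjacency(edges):
    """Return an adjacency list {node: [neighbors]} for an undirected, unweighted graph."""
    adjacency = {}
    for u, v in edges:
        adjacency.setdefault(u, []).append(v)
        adjacency.setdefault(v, []).append(u)
    return adjacency


def random_walk(adjacency, start, length, rng):
    """Build one node 'sentence': start at `start`, then hop to a uniformly random neighbor."""
    walk = [start]
    while len(walk) < length:
        neighbors = adjacency[walk[-1]]
        if not neighbors:  # isolated node: nothing to walk to
            break
        walk.append(rng.choice(neighbors))
    return walk


def generate_walks(adjacency, walks_per_node, length, seed=0):
    """Start `walks_per_node` walks from every node, visiting nodes in shuffled order."""
    rng = random.Random(seed)
    nodes = list(adjacency)
    walks = []
    for _ in range(walks_per_node):
        rng.shuffle(nodes)
        for node in nodes:
            walks.append(random_walk(adjacency, node, length, rng))
    return walks


def skipgram_pairs(walk, window):
    """List (center, context) pairs as a Skip-Gram model would see them."""
    pairs = []
    for i, center in enumerate(walk):
        lo = max(0, i - window)
        hi = min(len(walk), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((center, walk[j]))
    return pairs


# Hypothetical toy graph: a triangle a-b-c with a pendant node d.
edges = [("a", "b"), ("b", "c"), ("c", "a"), ("c", "d")]
adjacency = build_adjacency(edges)
walks = generate_walks(adjacency, walks_per_node=2, length=5)
pairs = skipgram_pairs(walks[0], window=2)
```

The adjacency list used here stores only existing edges, which sidesteps the quadratic size of the full adjacency matrix mentioned in the excerpts; the walks can then be passed to any word embedding implementation in place of text sentences.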
    1. Downsides of NMF: can only be applied to non-negative data; interpretability is hit or miss; non-convex optimization, requires initialization; not orthogonal
    2. Why NMF? Meaningful signs; positive weights; no “cancellation” like in PCA; can learn over-complete representation
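
The slide notes above point out that NMF works only on non-negative data and needs an initialization because the problem is non-convex. A minimal sketch of one well-known fitting scheme, the Lee and Seung multiplicative updates for the squared Frobenius error, is given here as an assumption; the slides do not say which solver they have in mind. The helper names, the toy matrix `v`, and the parameters `rank`, `iterations`, `seed`, and `eps` are illustrative.

```python
import random


def matmul(a, b):
    """Plain-Python matrix product of two lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(m):
    return [list(row) for row in zip(*m)]


def frobenius_error(v, w, h):
    """Frobenius norm of the residual V - WH."""
    wh = matmul(w, h)
    return sum((v[i][j] - wh[i][j]) ** 2
               for i in range(len(v)) for j in range(len(v[0]))) ** 0.5


def nmf(v, rank, iterations=200, seed=0, eps=1e-9):
    """Factor a non-negative matrix V (n x m) into W (n x rank) and H (rank x m)."""
    rng = random.Random(seed)
    n, m = len(v), len(v[0])
    # Random positive start; a different seed may end in a different local optimum.
    w = [[rng.random() + eps for _ in range(rank)] for _ in range(n)]
    h = [[rng.random() + eps for _ in range(m)] for _ in range(rank)]
    for _ in range(iterations):
        # H <- H * (W^T V) / (W^T W H), element-wise
        wt = transpose(w)
        num = matmul(wt, v)
        den = matmul(matmul(wt, w), h)
        h = [[h[i][j] * num[i][j] / (den[i][j] + eps) for j in range(m)]
             for i in range(rank)]
        # W <- W * (V H^T) / (W H H^T), element-wise
        ht = transpose(h)
        num = matmul(v, ht)
        den = matmul(w, matmul(h, ht))
        w = [[w[i][j] * num[i][j] / (den[i][j] + eps) for j in range(rank)]
             for i in range(n)]
    return w, h


# Hypothetical non-negative toy data (4 rows, 3 columns).
v = [[1.0, 0.0, 2.0],
     [0.0, 3.0, 1.0],
     [2.0, 1.0, 0.0],
     [1.0, 1.0, 1.0]]
w, h = nmf(v, rank=2)
```

Because the updates only multiply by non-negative ratios, the factors stay non-negative throughout, which is the "no cancellation" property the notes contrast with PCA.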
    1. We extend the fact that NMF is similar to pLSI and LDA generative models and model the outliers using the ℓ1,2-norm. This particular formulation of NMF is non-standard, and requires careful design of optimization methods to solve the problem. We solve the resulting optimization problem using block coordinate descent technique.
    2. One advantage of matrix factorization methods is that they decompose the term-document structure of the underlying corpus into a set of semantic term clusters and document clusters. The semantic nature of this decomposition provides the context in which a document may be interpreted for outlier analysis.
    3. there are surprisingly few methods which are specifically focused on this domain, even though many generic methods such as distance-based methods can be easily adapted to this domain [13,20], and are often used for text outlier analysis
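
The first excerpt in this group mentions modelling outliers with the ℓ1,2-norm. As a rough illustration of the norm only, and not of the paper's block coordinate descent solver, the sketch below assumes the common convention that the norm is the sum of the Euclidean norms of a matrix's columns, and that each column corresponds to one document. The matrix `z` and both function names are hypothetical.

```python
import math


def column_l2_norms(z):
    """Euclidean norm of every column of z (a list of rows)."""
    return [math.sqrt(sum(row[j] ** 2 for row in z)) for j in range(len(z[0]))]


def l12_norm(z):
    """Sum of column-wise Euclidean norms (the convention assumed here)."""
    return sum(column_l2_norms(z))


# Hypothetical residual matrix: 2 terms (rows) x 3 documents (columns).
z = [[0.0, 3.0, 0.0],
     [0.0, 4.0, 0.0]]
scores = column_l2_norms(z)
# Rank documents from most to least outlying under this score.
ranking = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
```

Under this reading, a penalty on the norm pushes whole columns toward zero, so the few documents that keep a large column norm stand out as outlier candidates.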
    1. The notion of an event differs from a broader category of events both in spatial/temporal localization and in specificity. For example, the eruption of Mount Pinatubo on June 15th, 1991 is considered to be an event, whereas volcanic eruption in general is considered to be a class of events
    2. Events might be unexpected, such as the eruption of a volcano, or expected, such as a political election
    3. During the first portion of this study, the notion of a “topic” was modified and sharpened to be an “event”, meaning some unique thing that happens at some point in time
    4. The tracking task is defined to be the task of associating incoming stories with events known to the system. An event is defined (“known”) by its association with stories that discuss the event. Thus each target event is defined by a list of stories that discuss it
    5. This study corpus spans the period from July 1, 1994 to June 30, 1995 and includes nearly 16,000 stories, with about half taken from Reuters newswire and half from CNN broadcast news transcripts.
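
The tracking task described above defines each known event by a list of stories and asks which event an incoming story belongs to. One simple way to sketch this, assumed here and not stated in the excerpts, is to compare bag-of-words vectors with cosine similarity and accept the best match above a threshold. The event names, the example stories, and the `threshold` value are made up for illustration.

```python
import math
from collections import Counter


def bag_of_words(text):
    """Lowercase, split on whitespace, and count tokens."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two token-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def track(story, events, threshold=0.2):
    """Return the name of the best-matching known event, or None if nothing is close enough."""
    story_vec = bag_of_words(story)
    best_name, best_score = None, threshold
    for name, stories in events.items():
        # Represent an event by the summed token counts of its defining stories.
        centroid = Counter()
        for s in stories:
            centroid.update(bag_of_words(s))
        score = cosine(story_vec, centroid)
        if score >= best_score:
            best_name, best_score = name, score
    return best_name


# Hypothetical known events, each defined by a short list of stories.
events = {
    "pinatubo": ["mount pinatubo erupts in the philippines",
                 "ash from pinatubo eruption covers towns"],
    "election": ["voters go to the polls in national election",
                 "election results announced"],
}
assigned = track("pinatubo eruption forces evacuation", events)
unassigned = track("stock markets rally", events)
```

Returning `None` for a story that matches no known event keeps tracking separate from detecting new events, which the excerpts treat as a matter of events already known to the system.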
    1. Relational Topic Models (RTM) are another extension. RTM is a hierarchical model of networks and per-node attribute data. First, each document is generated from topics as in LDA. Then the connections between documents are modelled as binary variables, one for each pair of documents. These are distributed according to a distribution that depends on the topics used to generate each of the constituent documents. In this way, the content of the documents is statistically linked to the link structure between them, and the model can be used to summarize a network of documents [69]
    2. MedLDA: the maximum entropy discrimination latent Dirichlet allocation (MedLDA) model combines the mechanism behind hierarchical Bayesian models (such as LDA) with max-margin learning (such as SVMs) in a unified constrained optimization framework.
    3. LLDA is a supervised algorithm that builds topics using manually assigned labels. Therefore, LLDA can obtain meaningful topics, with words that map well to the labels applied. As a disadvantage, Labeled LDA is limited in its support for latent subtopics within a given label or for any global topics. To overcome this problem, partially labeled LDA (PLDA) was proposed
    4. DTM: the Dynamic Topic Model (DTM) was introduced by Blei and Lafferty as an extension of LDA that captures the evolution of topics over time in a sequentially arranged corpus of documents and exhibits the evolution of the word-topic distribution, which makes it easy to see topic trends [73]. As an advantage, DTM is well suited to extracting topics from collections that change slowly over a period of time.
    5. The Author-Topic model [75] is a popular and simple probabilistic model in topic modeling for finding relationships among authors, topics, words and documents. This model provides a distribution over topics for each author and a distribution over words for each topic. For evaluation, the authors used 1,700 NIPS conference papers and 160,000 abstracts from the CiteSeer dataset. Gibbs sampling was applied to estimate the topic and author distributions. According to their results, the approach predicts authors' interests significantly well, as measured by perplexity.
    6. Undeniably, this period (2003 to 2009) is very important because key and baseline approaches were introduced, such as CorrLDA, the Author-Topic Model, DTM, and RTM
    7. In summary, this paper makes four main contributions: – We investigate scholarly articles (from 2003 to 2016) related to topic modeling based on LDA to discover the research development, current trends and intellectual structure of topic modeling based on LDA. – We investigate topic modeling applications in various sciences. – We summarize challenges in topic modeling, such as image processing, visualizing topic models, group discovery, user behavior modeling, etc. – We introduce some of the most famous data and tools in topic modeling
    8. The analysis of geographic information is another issue, addressed in [17]. They introduced a novel method, called GeoFolk, based on multi-modal Bayesian models to describe social media by merging text features and spatial knowledge.
    9. Another group of researchers focused on topic modeling in software engineering; in [8], for the first time, they used LDA to extract topics from source code and visualize software similarity,
  2. Mar 2019
  3. Feb 2019
  4. Jan 2019
  5. Dec 2018
    1. The emergence of cognitive linguistics is one episode in a general methodological shift that began in linguistics in the late 1950s and amounts to lifting the ban on introducing theoretical (model) constructs that are "far from the surface" and inaccessible to direct observation

      post-behaviorism?

    1. A semantic treebank is a collection of natural language sentences annotated with a meaning representation. These resources use a formal representation of each sentence's semantic structure.
    1. A syntactically annotated corpus (treebank) is part of the Russian National Corpus.[2] It contains 40,000 sentences (600,000 words) which are fully syntactically and morphologically annotated. The primary annotation was made by ETAP-3 and then manually verified by competent linguists.
  6. Nov 2018
  7. Oct 2018
    1. In English and many other languages using some form of the Latin alphabet, the space is a good approximation of a word divider (word delimiter), although this concept has limits because of the variability with which languages emically regard collocations and compounds.

      The space is a good approximation of a word divider, but there are collocations and compound words.
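
The limit noted in this annotation can be seen with a whitespace split, sketched here under the assumption that "word" means a space-delimited token. The function name and the example sentence are illustrative.

```python
def space_tokenize(text):
    """Split on runs of whitespace; any punctuation stays attached to its token."""
    return text.split()


tokens = space_tokenize("She ate ice cream in New York")
# The compound "ice cream" and the name "New York" each come out as two tokens,
# although a speaker may treat each of them as a single lexical unit.
```

A tokenizer that should keep such units together needs extra knowledge, for instance a list of multiword expressions, which the space alone cannot supply.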

    1. Since the question is about determining the morphological profile of a language, the issue of determining word boundaries is quite central.

      Since this concerns the morphological profile, the problem of word segmentation (word boundaries) is central.

    1. Agglutination is a linguistic process pertaining to derivational morphology in which complex words are formed by stringing together morphemes without changing them in spelling or phonetics.
    1. Since hanja were never simplified centrally, their characters are in most cases identical to traditional Chinese and Japanese characters. Very few hanja characters have cursive forms or are unique to Korean.

      Characters get simplified. Apparently to speed up writing. What else for?