71 Matching Annotations
  1. Apr 2025
    1. I would argue that "whole tree" thinking is enhanced by --follow being the default. When I want to see the history of the code within a file, I usually don't care whether the file was renamed or not; I just want to see the history of the code, regardless of renames. So in my opinion it makes sense for --follow to be the default: --follow helps me ignore individual file renames, which are usually pretty inconsequential.
    1. (validBytes - lastIndexEntry > indexIntervalBytes)

      During recovery the index is reset, and then each batch is iterated over to add index entries. This avoids the problem that, when the messages were originally written, a single write of multiple batches would add at most one index entry, which makes lookups inefficient.
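
      A minimal sketch of that recovery pass, under my reading of the quoted condition (the batch iterator and field names below are assumptions, not Kafka's actual API):

        # Sketch of rebuilding a sparse offset index during recovery; the batch
        # fields and iterator below are assumptions, not Kafka's actual API.
        def rebuild_index(batches, index_interval_bytes):
            index = []            # (last_offset, byte_position) entries
            valid_bytes = 0       # bytes scanned so far = file position of current batch
            last_index_entry = 0  # file position of the most recent index entry

            for batch in batches:
                # At most one index entry per index_interval_bytes of data.
                if valid_bytes - last_index_entry > index_interval_bytes:
                    index.append((batch.last_offset, valid_bytes))
                    last_index_entry = valid_bytes
                valid_bytes += batch.size_in_bytes
            return index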

  2. Jul 2024
    1. Especially users working with Microsoft Office 365 and therefore Outlook noticed very often that login was not possible. Closer analysis found that the MS/Bing crawlers are particularly persistent and repeatedly call the reset links, regardless of server configuration or the like. For this reason, a text field was implemented in the backend via the Drupal State API, in which selected user agents (always one per line) can be entered. These are checked by 'Shy One Time'; in case of a hit, a redirect to the login form with a 302 status code occurs and the reset link is not invalidated.
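
      Roughly the idea, sketched in Python (the real implementation is a Drupal module; the agent list and function names here are hypothetical):

        # Hypothetical sketch: send known crawler user agents back to the login
        # form with a 302 so the one-time reset link is not consumed.
        BLOCKED_AGENTS = ["bingbot", "msnbot", "BingPreview"]   # one per line in the backend field

        def handle_reset_link(user_agent, login_url="/user/login"):
            if any(agent.lower() in user_agent.lower() for agent in BLOCKED_AGENTS):
                return 302, {"Location": login_url}   # crawler: redirect, keep link valid
            return 200, {}                            # real user: continue the normal flow
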
  3. Apr 2024
    1. Rather than seeing document history as a linear sequence of versions, we could see it as a multitude of projected views on top of a database of granular changes

      That would be nice.

      As well as preserving where ops come from.

      This screams for gossip-about-gossip. I'm surprised people do not discover it.

    2. Another approach is to maintain all operations for a particular document version in an ordered log, and to append new operations at the end when they are generated. To merge two document versions with logs 𝐿1 and 𝐿2 respectively, we scan over the operations in 𝐿1, ignoring any operations that already exist in 𝐿2; any operations that do not exist in 𝐿2 are applied to the 𝐿1 document and appended to 𝐿1 in the order they appear in 𝐿2.

      This is much like Chronofold's subjective logs.

      Do the docs need to be shared in full? The only thing we need is delta ops.
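
      A small sketch of the merge as I read it (identifying operations by a unique id is my assumption):

        # Merge ordered operation logs: apply and append the ops from other_log
        # that this replica has not seen, preserving their order in other_log.
        def merge_logs(doc, own_log, other_log):
            seen = {op.id for op in own_log}
            for op in other_log:
                if op.id in seen:
                    continue              # already have this operation
                doc = op.apply(doc)       # apply the missing operation to our document
                own_log.append(op)        # append in the order it appears in other_log
                seen.add(op.id)
            return doc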

  4. Nov 2023
    1. A subscriber periodically checkpoints the latest LSN it has processed to stable storage. When a subscriber crashes, upon recovery it resumes processing from the latest checkpointed LSN. Thus, a subscriber may process some events twice (those processed between the last checkpoint and the crash), but it never skips any events. Events in the log are processed at least once by each subscriber.
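
      A minimal sketch of that checkpointing loop (the log reader and stable-storage calls are hypothetical):

        # At-least-once subscriber: resume from the last checkpointed LSN after a
        # crash; events between that checkpoint and the crash get processed twice.
        def run_subscriber(log, stable_storage, process, checkpoint_every=100):
            lsn = stable_storage.load_checkpoint(default=0)   # resume point
            processed = 0
            for event in log.read_from(lsn + 1):
                process(event)                # must tolerate occasional replays
                lsn = event.lsn
                processed += 1
                if processed % checkpoint_every == 0:
                    stable_storage.save_checkpoint(lsn)       # periodic, not per event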

    2. Moreover, if processing an event may have external side-effects besides updating a replica state – for example, if it may trigger an email to be sent – then the time warp approach requires some way of undoing or compensating for those side-effects in the case where a previously processed event is affected by a late-arriving event with an earlier timestamp. It is not possible to un-send an email once it has been sent, but it is possible to send a follow-up email with a correction, if necessary. If the possibility of such corrections is unacceptable, optimistic replication cannot be used, and SMR or another strongly consistent approach must be used instead. In many business systems, corrections or apologies arise from the regular course of business anyway [27], so maybe occasional corrections due to out-of-order events are also acceptable in practice.
    3. If the application developers wish to change the logic for processing an event, for example to change the resulting database schema or to fix a bug, they can set up a new replica, replay the existing event log using the new processing function, switch clients to reading from the new replica instead of the old one, and then decommission the old replica [34].
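
      A sketch of that migrate-by-replay pattern (all names are illustrative):

        # Build a new replica by replaying the full event log through the new
        # processing function, then switch readers over and retire the old one.
        def migrate_replica(event_log, new_process, new_replica, router, old_replica):
            for event in event_log.read_all():     # replay history with the new logic
                new_process(event, new_replica)
            router.switch_reads_to(new_replica)    # clients now read the new replica
            old_replica.decommission()
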
  5. Aug 2023
  6. Mar 2023
    1. There are two main reasons to use logarithmic scales in charts and graphs.
      • respond to skewness towards large values / outliers by spreading out the data.
      • show multiplicative factors rather than additive (ex: b is twice that of a).

        The data values are spread out better with the logarithmic scale. This is what I mean by responding to skewness of large values.

      In Figure 2 the difference is multiplicative. Since 2⁷ = 2⁶ × 2, we see that the revenues for Ford Motor are about double those for Boeing. This is what I mean by saying that we use logarithmic scales to show multiplicative factors.
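
      A quick matplotlib illustration of both points; the revenue numbers are made up (2⁶ and 2⁷), only the shape of the comparison matters:

        import matplotlib.pyplot as plt

        companies = ["Boeing", "Ford Motor"]
        revenues = [64, 128]                   # illustrative: 2**6 and 2**7

        fig, (linear_ax, log_ax) = plt.subplots(1, 2)
        for ax, scale in ((linear_ax, "linear"), (log_ax, "log")):
            ax.bar(companies, revenues)
            ax.set_yscale(scale)               # on a log scale, equal steps mean equal ratios
            ax.set_title(f"{scale} scale")
        plt.show()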

  7. Feb 2023
  8. Oct 2022
    1. Additionally, make sure to use both forward and side scatter on log scale when measuring microparticles or microbiological samples like bacteria. These types of particles generate dim scatter signals that are close to the cytometer’s noise, so it’s often necessary to visualize signal on a log scale in order to separate the signal from scatter noise.
  9. Sep 2022
  10. Aug 2022
  11. Jan 2022
    1. Most developers are familiar with MySQL and PostgreSQL. They are great RDBMS and can be used to run analytical queries with some limitations. It’s just that most relational databases are not really designed to run queries on tens of millions of rows. However, there are databases specially optimized for this scenario - column-oriented DBMS. One good example of such a database is ClickHouse.

      How to use Relational Databases to process logs
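
      For example, a typical analytical aggregation over a log table might look like this (the clickhouse-driver client exists, but the `logs` table and its columns are assumptions):

        from clickhouse_driver import Client   # pip install clickhouse-driver

        client = Client(host="localhost")

        # Count log events per day and status. A column store only has to read
        # the `timestamp` and `status` columns, even over tens of millions of rows.
        rows = client.execute("""
            SELECT toDate(timestamp) AS day, status, count() AS events
            FROM logs
            GROUP BY day, status
            ORDER BY day
        """)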

  12. Sep 2021
  13. Jun 2021
    1. From pretty format documentation: '%w([<w>[,<i1>[,<i2>]]])': switch line wrapping, like the -w option of git-shortlog[1]. And from shortlog: -w[<width>[,<indent1>[,<indent2>]]] Linewrap the output by wrapping each line at width. The first line of each entry is indented by indent1 spaces, and the second and subsequent lines are indented by indent2 spaces. width, indent1, and indent2 default to 76, 6 and 9 respectively. If width is 0 (zero) then indent the lines of the output without wrapping them.
  14. Feb 2021
  15. Oct 2020
  16. May 2020
    1. Amazon S3 event notifications are designed to be delivered at least once. Typically, event notifications are delivered in seconds but can sometimes take a minute or longer.

      S3 event notifications might take a minute or longer

      BTW,

      CloudWatch does not support S3, but CloudTrail does
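
      Because delivery is at least once (and sometimes late), the consumer needs to be idempotent. A rough sketch, with a hypothetical dedup store:

        # Idempotent S3 event consumer: at-least-once delivery means duplicates,
        # so key each record by bucket/key/version and skip repeats.
        def handle_s3_event(event, seen_keys, process_object):
            for record in event.get("Records", []):
                s3 = record["s3"]
                dedup_key = (s3["bucket"]["name"],
                             s3["object"]["key"],
                             s3["object"].get("versionId", ""))
                if dedup_key in seen_keys:    # duplicate delivery: skip
                    continue
                seen_keys.add(dedup_key)      # in practice a durable store, not a set
                process_object(*dedup_key)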

  17. Apr 2020
    1. Events can self-trigger based on a schedule; alarms don't do this. Alarms invoke actions only for sustained changes. Alarms watch a single metric and respond to changes in that metric; events can respond to actions (such as a lambda being created or some other change in your AWS environment). Alarms can be added to CloudWatch dashboards, but events cannot. Events are processed by targets, with many more options than the actions an alarm can trigger.

      Event vs Alarm
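
      A boto3 sketch of the contrast (the metric, thresholds, and schedule are illustrative):

        import boto3

        cloudwatch = boto3.client("cloudwatch")
        events = boto3.client("events")

        # Alarm: watches a single metric and fires only on a sustained change.
        cloudwatch.put_metric_alarm(
            AlarmName="high-cpu",
            Namespace="AWS/EC2",
            MetricName="CPUUtilization",
            Statistic="Average",
            Period=300,
            EvaluationPeriods=3,          # must breach for 3 periods in a row
            Threshold=80.0,
            ComparisonOperator="GreaterThanThreshold",
        )

        # Event rule: can self-trigger on a schedule and is handled by targets.
        events.put_rule(Name="every-five-minutes",
                        ScheduleExpression="rate(5 minutes)")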

  18. Jan 2020
  19. Oct 2019
  20. Jul 2019
  21. Mar 2019
  22. Dec 2018
    1. Outliers: All data sets have an expected range of values, and any actual data set also has outliers that fall below or above the expected range. (Space precludes a detailed discussion of how to handle outliers for statistical analysis purposes; see Barnett & Lewis, 1994 for details.) How to clean outliers strongly depends on the goals of the analysis and the nature of the data.

      Outliers can be signals of unanticipated range of behavior or of errors.

    2. Understanding the structure of the data: In order to clean log data properly, the researcher must understand the meaning of each record, its associated fields, and the interpretation of values. Contextual information about the system that produced the log should be associated with the file directly (e.g., “Logging system 3.2.33.2 recorded this file on 12-3-2012”) so that if necessary the specific code that generated the log can be examined to answer questions about the meaning of the record before executing cleaning operations. The potential misinterpretations take many forms, which we illustrate with encoding of missing data and capped data values.

      Context of the data collection and how it is structured is also a critical need.

      Example: coding missing info as "0" risks misinterpretation; better to code it as NIL, NDN, or something else distinguishable from other data
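
      A tiny pandas illustration of why "0" is a risky stand-in for a missing value:

        import pandas as pd

        # Three observed response times plus one missing measurement.
        as_zero = pd.Series([120, 150, 130, 0])      # missing coded as 0
        as_nan = pd.Series([120, 150, 130, None])    # missing coded as NaN

        print(as_zero.mean())   # 100.0  -- the fake zero drags the average down
        print(as_nan.mean())    # ~133.3 -- NaN is simply excluded from the mean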

    3. Data transformations: The goal of data-cleaning is to preserve the meaning with respect to an intended analysis. A concomitant lesson is that the data-cleaner must track all transformations performed on the data.

      Changes to data during clean up should be annotated.

      Incorporate meta data about the "chain of change" to accompany the written memo
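
      One way to keep that chain of change is to record each transformation alongside the cleaned data; a sketch (the step functions are placeholders):

        # Apply cleaning steps while keeping a provenance log of every transformation.
        def clean_with_provenance(records, steps):
            provenance = []
            for name, step in steps:          # steps: list of (label, function) pairs
                before = len(records)
                records = step(records)
                provenance.append({"step": name,
                                   "rows_before": before,
                                   "rows_after": len(records)})
            return records, provenance

        # e.g. clean_with_provenance(rows, [("drop_bot_traffic", drop_bots),
        #                                   ("dedupe_events", dedupe)])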

    4. Data Cleaning: A basic axiom of log analysis is that the raw data cannot be assumed to correctly and completely represent the data being recorded. Validation is really the point of data cleaning: to understand any errors that might have entered into the data and to transform the data in a way that preserves the meaning while removing noise. Although we discuss web log cleaning in this section, it is important to note that these principles apply more broadly to all kinds of log analysis; small datasets often have similar cleaning issues as massive collections. In this section, we discuss the issues and how they can be addressed. How can logs possibly go wrong? Logs suffer from a variety of data errors and distortions. The common sources of errors we have seen in practice include:

      Common sources of errors:

      • Missing events

      • Dropped data

      • Misplaced semantics (encoding log events differently)

    5. In addition, real world events, such as the death of a major sports figure or a political event can often cause people to interact with a site differently. Again, be vigilant in sanity checking (e.g., look for an unusual number of visitors) and exclude data until things are back to normal.

      Important consideration for temporal event RQs in refugee study -- whether external events influence use of natural disaster metaphors.
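
      For the sanity check, even something as simple as flagging days whose visitor counts sit far outside the usual range can help; a sketch:

        from statistics import mean, stdev

        # Flag days with unusually high or low visitor counts for manual review
        # (and possible exclusion) before analysis.
        def unusual_days(daily_counts, z_threshold=3.0):
            counts = list(daily_counts.values())
            mu, sigma = mean(counts), stdev(counts)
            return [day for day, n in daily_counts.items()
                    if sigma > 0 and abs(n - mu) / sigma > z_threshold]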

    6. Recording accurate and consistent time is often a challenge. Web log files record many different timestamps during a search interaction: the time the query was sent from the client, the time it was received by the server, the time results were returned from the server, and the time results were received on the client. Server data is more robust but includes unknown network latencies. In both cases the researcher needs to normalize times and synchronize times across multiple machines. It is common to divide the log data up into “days,” but what counts as a day? Is it all the data from midnight to midnight at some common time reference point or is it all the data from midnight to midnight in the user’s local time zone? Is it important to know if people behave differently in the morning than in the evening? Then local time is important. Is it important to know everything that is happening at a given time? Then all the records should be converted to a common time zone.

      Challenges of using time-based log data are similar to difficulties in the SBTF time study using Slack transcripts, social media, and Google Sheets
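
      A small example of the two "day" choices using Python's zoneinfo (the timestamp and zones are illustrative):

        from datetime import datetime, timezone
        from zoneinfo import ZoneInfo

        # A server-side timestamp recorded in UTC.
        event_utc = datetime(2012, 12, 3, 23, 30, tzinfo=timezone.utc)

        # Common reference point: bucket by UTC day.
        utc_day = event_utc.date()                                       # 2012-12-03

        # User's local time: the same instant can fall on a different "day".
        local_day = event_utc.astimezone(ZoneInfo("Asia/Tokyo")).date()  # 2012-12-04

        print(utc_day, local_day)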

    7. Log Studies collect the most natural observations of people as they use systems in whatever ways they typically do, uninfluenced by experimenters or observers. As the amount of log data that can be collected increases, log studies include many different kinds of people, from all over the world, doing many different kinds of tasks. However, because of the way log data is gathered, much less is known about the people being observed, their intentions or goals, or the contexts in which the observed behaviors occur. Observational log studies allow researchers to form an abstract picture of behavior with an existing system, whereas experimental log studies enable comparisons of two or more systems.

      Benefits of log studies:

      • Complement other types of lab/field studies

      • Provide a portrait of uncensored behavior

      • Easy to capture at scale

      Disadvantages of log studies:

      • Lack of demographic data

      • Non-random sampling bias

      • Provide info on what people are doing but not their "motivations, success or satisfaction"

      • Can lack needed context (software version, what is displayed on screen, etc.)

      Ways to mitigate: Collecting, Cleaning and Using Log Data section

    8. Two common ways to partition log data are by time and by user. Partitioning by time is interesting because log data often contains significant temporal features, such as periodicities (including consistent daily, weekly, and yearly patterns) and spikes in behavior during important events. It is often possible to get an up-to-the-minute picture of how people are behaving with a system from log data by comparing past and current behavior.

      Bookmarked for time reference.

      Mentions challenges of accounting for time zones in log data.
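
      A pandas sketch of both partitions (the column layout is an assumption):

        import pandas as pd

        # Assumed layout: one row per event, with a user id and a UTC timestamp.
        logs = pd.DataFrame({
            "user": ["a", "a", "b", "b", "b"],
            "timestamp": pd.to_datetime([
                "2020-05-01 09:00", "2020-05-01 17:00",
                "2020-05-01 10:00", "2020-05-02 10:00", "2020-05-02 11:00",
            ]),
        })

        by_day = logs.groupby(logs["timestamp"].dt.date).size()   # partition by time
        by_user = logs.groupby("user").size()                     # partition by user
        print(by_day)
        print(by_user)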

    9. An important characteristic of log data is that it captures actual user behavior and not recalled behaviors or subjective impressions of interactions.

      Logs can be captured on client-side (operating systems, applications, or special purpose logging software/hardware) or on server-side (web search engines or e-commerce)

    10. Large-scale log data has enabled HCI researchers to observe how information diffuses through social networks in near real-time during crisis situations (Starbird & Palen, 2010), characterize how people revisit web pages over time (Adar, Teevan, & Dumais, 2008), and compare how different interfaces for supporting email organization influence initial uptake and sustained use (Dumais, Cutrell, Cadiz, Jancke, Sarin, & Robbins, 2003; Rodden & Leggett, 2010).

      Wide variety of uses of log data

  23. Jan 2018
  24. Mar 2017
  25. May 2015
    1. That is, the human annotators are likely to assign different relevance labels to a document, depending on the quality of the last document they had judged for the same query. In addition to manually assigned labels, we further show that the implicit relevance labels inferred from click logs can also be affected by anchoring bias. Our experiments over the query logs of a commercial search engine suggested that searchers’ interaction with a document can be highly affected by the documents visited immediately beforehand.