1,102 Matching Annotations
  1. Jul 2025
    1. In today’s fast-moving, AI-powered era, autonomous agents are playing a bigger role than ever. They are helping businesses run smoother and making decisions affecting millions of lives every day. While these systems are designed to make our lives easier and unlock new opportunities, we can’t get carried away—we need to implement proper AI Agent Evaluation frameworks and best practices to ensure these systems actually work as intended and follow ethical AI principles.

      Explore the key metrics, tools, and frameworks used for AI agent evaluation. Learn how to assess performance, reliability, and efficiency of AI agents in real-world scenarios.

  2. Jun 2025
  3. May 2025
  4. Mar 2025
  5. Feb 2025
  6. Oct 2024
  7. Sep 2024
    1. Disable all observers in your test suite by default. They should not be complicating your model tests because they should have separate concerns anyway. You don't need to unit test that observers actually fire, because ActiveRecord's test suite does that, and your integration tests will cover it.
    2. I emphatically disagree with BlueFish about observers being difficult to properly unit test. This is precisely the biggest point that distinguishes them from lifecycle callbacks: you can test observers in isolation, and doing so discourages you from falling into many of the state- and order-heavy design pitfalls BlueFish refers to (which again I think is more often true of lifecycle callbacks).
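
      The "test observers in isolation" idea can be sketched roughly as follows. This is a minimal, plain-Ruby illustration, not the rails-observers API: `SignupObserver`, `FakeUser`, and `FakeMailer` are hypothetical names, and a real observer would subclass `ActiveRecord::Observer` rather than take a constructor argument. The point is only that the hook method can be called directly with a stand-in record, with no model lifecycle involved.

      ```ruby
      # Stand-in for an observer: a plain object with a lifecycle hook method.
      # In a Rails app using the rails-observers gem this would subclass
      # ActiveRecord::Observer; that is omitted so the sketch runs on its own.
      class SignupObserver
        def initialize(mailer)
          @mailer = mailer
        end

        # Hook the framework would call after a record is created.
        def after_create(user)
          @mailer.deliver_welcome(user.email)
        end
      end

      # Test doubles (hypothetical names) replacing the model and the mailer.
      FakeUser = Struct.new(:email)

      class FakeMailer
        attr_reader :sent

        def initialize
          @sent = []
        end

        def deliver_welcome(address)
          @sent << address
        end
      end

      # Call the hook directly: no database, no callback chain, no ordering concerns.
      mailer = FakeMailer.new
      SignupObserver.new(mailer).after_create(FakeUser.new("new@example.com"))
      puts mailer.sent.inspect
      ```

      Because the observer only sees the record it is handed, the test does not depend on what other callbacks ran before it.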
  8. Aug 2024

  9. Jul 2024
  10. May 2024
    1. [00:53:00-00:54:02] The study I conducted over five years with Anne-Claire Defossez in the Hautes-Alpes, near the Italian border, in the area around the Col de Montgenèvre, one of the two main points of entry into France from the southeast, shows that the protocol laid out in the legal texts is generally not followed, even though since 2016 the proportion of boys from sub-Saharan Africa declaring themselves minors has often been high. At the border, first, there were several years of almost systematic pushbacks by the police officers guarding it, who disputed the age the young person declared or even tore up his birth certificate, until several rulings against the State by the administrative courts led it to show more respect for the law by referring these boys to child welfare services (aide sociale à l'enfance) via a local association. That is now done in most cases, although some recalcitrant officers nonetheless continue to send them back to Italy, sometimes after falsifying their documents, as noted by the associations on the Italian side of the border, which keep copies of the originals.
    2. [00:54:02-00:54:51] Today it is in the departmental services that those who declare themselves minors run into the most insurmountable obstacles. Sometimes a young person is turned away as soon as he requests an appointment, by the administrative officer at reception, who simply tells him he is not a minor, based solely on [transcript garbled], without even giving him the chance to submit an application. Most often, however, it is the assessment of minority status itself that proves the impassable step. This assessment is often carried out by staff with no specific training and is subject to political pressure from the departmental council, whose elected officials worry about the additional spending incurred by taking charge of unaccompanied minors, even though a national plan has been put in place to distribute them across the whole country.
  11. Apr 2024
    1. if your treatments are ordered, don't compare each mean with each other mean (multiple comparisons), instead do one test for trend to ask if the outcome is linearly related with treatment number

      How do you do hypothesis testing for trends for an ordered categorical variable?

      Could you convert x to numbers (1, 2, 3) and run a linear regression y ~ x? Or can ordered categorical variables be entered into a linear regression directly?
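
      One way to try the idea in the question is sketched below: code the ordered treatments as integers and test whether the least-squares slope differs from zero. This is a rough illustration under loud assumptions: the data are made up, the integer coding treats the levels as equally spaced, and `linear_trend` is a hypothetical helper written here, not a library function. It returns the t statistic and degrees of freedom; turning those into a p-value needs a t-distribution table or a statistics package.

      ```ruby
      # Trend test sketch: code ordered treatments as 1, 2, 3, ... and fit y ~ x.

      # Returns the least-squares slope, intercept, t statistic for H0: slope = 0,
      # and the residual degrees of freedom (n - 2).
      def linear_trend(xs, ys)
        n = xs.length.to_f
        mean_x = xs.sum / n
        mean_y = ys.sum / n
        sxx = xs.sum { |x| (x - mean_x)**2 }
        sxy = xs.zip(ys).sum { |x, y| (x - mean_x) * (y - mean_y) }
        slope = sxy / sxx
        intercept = mean_y - slope * mean_x
        # Residual sum of squares, then the standard error of the slope.
        rss = xs.zip(ys).sum { |x, y| (y - (intercept + slope * x))**2 }
        se_slope = Math.sqrt(rss / (n - 2) / sxx)
        { slope: slope, intercept: intercept, t: slope / se_slope, df: (n - 2).to_i }
      end

      # Made-up data: three ordered treatment levels, three observations each.
      treatment = [1, 1, 1, 2, 2, 2, 3, 3, 3]
      outcome   = [4.1, 3.8, 4.4, 5.0, 5.3, 4.9, 6.2, 5.8, 6.1]

      result = linear_trend(treatment, outcome)
      puts format("slope=%.3f t=%.2f df=%d", result[:slope], result[:t], result[:df])
      ```

      This is a single one-degree-of-freedom test, in contrast to comparing every pair of treatment means.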

    1. We quote because we are afraid to change words, lest there be a change in meaning.

      Quotations are easier to collect than writing things out in one's own words, not only because it requires no work, but we may be afraid of changing the original meaning by changing the original words or by collapsing the context and divorcing the words from their original environment.

      Perhaps some may be afraid that the words sound "right" and that they have a sense of understanding them without quite having a full grasp of the situation. Of course, the reader or listener may remedy this by putting the stories they have heard into their own words and by providing additional concrete, illustrative examples of the concepts. These exercises are meant to ensure that one has properly heard/read and understood a concept. Psychologists call this paraphrasing or repetition the "echo effect" (others might say parroting or mirroring) and have found that it can help to build understanding, connection, and likeability between people. Great leaders who do this will make sure that credit for the original ideas goes to the originator and not to themselves simply because they repeated it, especially in group settings where their words may have more primacy amidst their underlings.

      (I can't find it at the moment, but there's a name/tag for this in my notes? looping?)

      Beyond this, can one place the idea into a more clear language than the original? Add some poetry perhaps? Make the concept into a concrete meme to make it more memorable?

      Journalists like to quote because it gives primacy of voice to the speaker and provides the reader with the sense that they're getting the original from which they might make up their own minds. It also provides a veneer of vérité to their reportage.

      Link this back to Terrence's comedy: https://hypothes.is/a/xe15ZKPGEe6NJkeL77Ji4Q

    2. Description and illustration are complementary, they give together a more complete picture than either without the other.

      Kaiser says that "description and illustration are complementary, they give together a more complete picture than either without the other" and this sentiment is similar to Mortimer J. Adler and Charles Van Doren's pedagogy of restatement and providing concrete examples as a means of testing understanding.

      See: - https://hypothes.is/a/RgUa-mOcEe6PChv_seYXZA - https://hypothes.is/a/B3sDhlm5Ee6wF0fRYO0OQg

  12. Mar 2024
  13. Feb 2024
  14. Dec 2023
  15. Nov 2023
  16. Oct 2023
    1. Barzun, Jacques. “Opinion | Multiple Choice Flunks Out.” The New York Times, October 11, 1988, sec. Opinion. https://www.nytimes.com/1988/10/11/opinion/multiple-choice-flunks-out.html.

      Archived copy at https://web.archive.org/web/20231022192353/https://www.nytimes.com/1988/10/11/opinion/multiple-choice-flunks-out.html. Internet Archive.

      Barzun takes standardized multiple-choice tests to task.

      A version of this article appears in Barzun's book: Barzun, Jacques. Begin Here: The Forgotten Conditions of Teaching and Learning. University of Chicago Press, 1991. http://archive.org/details/begin-here-the-forgotten-conditions-of-teaching-and-learning.

    2. He pointed out that these questions penalize the more imaginative and favor those who are content to collect facts. Therefore, multiple-choice test statistics, in all their uses, are misleading.

      He = Banesh Hoffmann

      This is tangentially similar to Malcolm Gladwell's claim that standardized testing for law school privileges certain types of thinkers over others, something which creates thinkers who are good at quick things with respect to time pressures rather than slower and more deliberate thinkers who are needed at higher level functions like the Supreme Court.

      See: The Tortoise and the Hare, S4 E2 of Revisionist History https://www.pushkin.fm/podcasts/revisionist-history/the-tortoise-and-the-hare

      testing imagination versus fact memorization/simple recall compared with thinking quickly under pressure or slowly with time and increased ability to reason

    1. You must apprehend the unity with definiteness. There is only one way to know that you have succeeded. You must be able to tell yourself or anybody else what the unity is, and in a few words. (If it requires too many words, you have not seen the unity but a multiplicity.) Do not be satisfied with "feeling the unity" that you cannot express. The reader who says, "I know what it is, but I just can't say it," probably does not even fool himself.

      Adler/Van Doren use the statement of unity of a work as an example of testing one's understanding of a work and its contents.

      (Again, did this exist in the 1940 edition?)

      Whom do McDaniel and Donnelly (1996) cite in their work as predecessors of their idea? It certainly existed before them.


      Examples in the literature of this same idea/method after this:
      - https://hypothes.is/a/TclhyMfqEeyTkQdZl43ZyA (Feynman Technique in ZK; relationship to Ahrens)
      - explain it to me like I'm a 5th grader
      - https://hypothes.is/a/BKhfvuIyEeyZj_v7eMiYcg ("People talk" in Algebra Project)
      - https://hypothes.is/a/m0KQSDlZEeyYFLulG9z0vw (Intellectual Life version)
      - https://hypothes.is/a/OyAAflm5Ee6GStMjUMCKbw (earlier version of statement in this same work)
      - https://hypothes.is/a/iV5MwjivEe23zyebtBagfw (Ahrens' version of elaboration citing McDaniel and Donnelly 1996, this uses both restatement and application to a situation as a means of testing understanding)
      - https://hypothes.is/a/B3sDhlm5Ee6wF0fRYO0OQg (Adler's version for testing understanding from his video)
      - https://hypothes.is/a/rh1M5vdEEeut4pOOF7OYNA (Manfred Kuehn and Luhmann's reformulating writing)

  17. Sep 2023
    1. https://www.reddit.com/r/Zettelkasten/comments/10jx7gg/wooden_antinet_zettelkasten/

      Scott Scheper commissioned a two drawer solid wood (cedar) zettelkasten box similar to those from the early 20th century. He had it listed on his website initially for $995 and then later for a reduced $495.

      He created a waitlist sign up for it, ostensibly to test the interest in manufacturing/selling them as a product. To my knowledge he never made any beyond the initial prototype.

      The high cost likely dampened interest compared to the much cheaper primary and secondary markets for these sorts of storage containers.

      See also:
      - $995 https://web.archive.org/web/20230124062200/https://www.antinet.org/wooden-antinet-waitlist
      - $495 reduction https://web.archive.org/web/20230306195625/https://www.antinet.org/wooden-antinet-waitlist

  18. sendersupport.olc.protection.outlook.com sendersupport.olc.protection.outlook.com
  19. Aug 2023
    1. I ran into the same problem and never really found a good answer via the test objects. The only solution I saw was to actually update the session via a controller. I defined a new action in one of my controllers from within test_helper (so the action does not exist when actually running the application). I also had to create an entry in routes. Maybe there's a better way to update routes while testing. So from my integration test I can do the following and verify: `assert(session[:fake].nil?, "starts empty")`, `v = 'Yuck'`, `get '/user_session', :fake => v`, `assert_equal(v, session[:fake], "value was set")`
  20. Jul 2023
    1. REPLs are nice but they work well only for reasonably isolated code with few dependencies. It's hard to set up a complex object to pass into a function. It's harder still to set up an elaborate context of dependencies around that function.

      I wonder how much of this is accomplishable by automatically parameterizing code by the types that aren't used internally, so the implementation can forget about the specifics. In addition, some sort of meta-programming capability to automatically generate arbitrary instances, or a richer form of trace types for user types, would go a long way to simplifying the trace generation.

  21. Jun 2023
  22. May 2023
  23. Mar 2023
    1. Industrial concerns doubtless suffer enormous losses from the employment of persons whose mental ability is not equal to the tasks they are expected to perform. The present methods of trying out new employees, transferring them to simpler and simpler jobs as their inefficiency becomes apparent, is wasteful and to a great extent unnecessary. A cheaper and more satisfactory method would be to employ a psychologist to examine applicants for positions and to weed out the unfit. Any business employing as many as five hundred or a thousand workers, as, for example, a large department store, could save in this way several times the salary of a well-trained psychologist.

      I think this is interesting because they are saying that intelligence testing could be used to decide who is placed in which job. I agree that employing a psychologist to examine applicants would be beneficial, because the employer would not have to worry about the things the psychologist screens for. Using a psychologist to screen out applicants could also be effective, because many people apply while employers want only certain people for a given job. I think this is relevant to the history of psychology because some companies do use assessors to determine who is fit for the company, and this is what they wanted to start doing so they could find the best employees for a particular job.

  24. Feb 2023
    1. What we ultimately should care about is being able to use our knowledge to produce something new, whatever that may be. To not merely reproduce you must understand the material. And understanding requires application, a hermeneutic principle that particularly Gadamer worked out extensively. If you really want to measure your level of understanding, you should try to apply or explain something to yourself or someone else.
  25. Jan 2023
    1. I've seen a bunch of people sharing this and repeating the conclusion: that the success is because the CEO loves books t/f you need passionate leaders and... while I think that's true, I don't think that's the conclusion to draw here. The winning strategy wasn't love, it was delegation and local, on the ground, knowledge.

      This win comes from a leader who acknowledges people in the stores know their communities and can see and react faster to sales trends in store...
      Aram Zucker-Scharff (@Chronotope@indieweb.social) https://indieweb.social/@Chronotope/109597430733908319 Dec 29, 2022, 06:27 · Mastodon for Android

      Also heavily at play here in their decentralization of control is regression toward the mean (Galton, 1886) by spreading out buying decisions over a more diverse group which is more likely to reflect the buying population than one or two corporate buyers whose individual bad decisions can destroy a company.

      How is one to balance these sorts of decisions at the center of a company? What role do examples of tastemakers and creatives have in spaces like fashion for this? How about the control exerted by Steve Jobs at Apple in shaping the purchasing decisions of the users vis-a-vis auteur theory? (Or more broadly, how does one retain the idea of a central vision or voice with the creative or business inputs of dozens, hundreds, or thousands of others?)

      How can you balance the regression to the mean with potentially cutting edge internal ideas which may give the company a more competitive edge versus the mean?

  26. Dec 2022
  27. Nov 2022
    1. I've developed additional perspective on this issue: I have DNS settings in my hosts file that resolve the visits to localhost but also preserve the subdomain in the request (this latter point is important because Rails path helpers care which subdomain is being requested). To sum up the scope of the problem as it stands now: I need a way within Heroku/Capybara system tests to both route requests to localhost and maintain the subdomain information of the request. I've been able to accomplish one or the other, but haven't found a configuration that provides both yet.
    1. Honestly, at this point, I don't even know what tools I'm using, and which is responsible for what feature. Diving into the code of capybara and cucumber yields hundreds of lines of metaprogramming magic that somehow accretes into a testing framework. It's really making me loathe TDD despite my previous youthful enthusiasm.

      opinion: too much metaprogramming magic

      I'm not so sure it's "too much" though... Any framework or large software project is going to feel that way to a newcomer looking at the code, due to the number of layers of abstractions, etc. that eventually were added/needed by the maintainers to make it maintainable, decoupled, etc.

  28. Oct 2022
  29. Sep 2022
  30. Aug 2022
  31. Jul 2022
    1. It really slows down your test suite accessing the disk. So yes, in principle it slows down your tests. There is a "school of testing" in which the developer isolates the layer responsible for retrieving state, sets some state in memory, and tests the functionality against that (as with the Repository pattern). The thing is, Rails is tightly coupled to the implementation logic of state retrieval at its core and prefers a "school of testing" in which you couple logic with state retrieval to some degree. A good example of this is how models are tested in Rails. You could build an entire test suite calling `FactoryBot.build`, never use `FactoryBot.create`, and stub methods all around, and your tests would be lightning fast (like 5s to run your entire test suite). This is highly unproductive to achieve, and I failed many times trying, because I was spending more time maintaining my tests than writing something productive for the business. Or you can take a more pragmatic route and save a database record where it is too difficult to just 'build' the factory (e.g. controller tests, association tests, etc.). I would say the same for saving a file to disk. Yes, you are right: you could just "not save the file to disk" and save a few milliseconds. But in the future you will stumble upon scenarios where your tests fail because the file is not there (e.g. file processing validations). Is it really worth it? I have never worked on a project where saving a file to disk slowed down tests enough to be an issue (and I work for a company whose core business is file uploading). Especially now that we have SSD drives in every laptop/server, it's blazing fast, so at best you would save 1 second for the entire test suite (given you call FactoryBot traits to set/store the file only where it makes sense, not every time you build an object).
    1. The gauge in Hải Bối commune, Đông Anh district, recorded heavy rainfall of nearly 250 mm. In the inner city, Bắc Từ Liêm district had the heaviest rain at 240 mm, Cầu Giấy nearly 140 mm, and the districts of Nam Từ Liêm, Bắc Từ Liêm, and Hà Đông over 100 mm.

      What is this passage supposed to mean?

  32. Jun 2022
    1. When a few of his friends became interested in the topic, he took eight minutes to progressively summarize the best excerpts before sharing the summarized article with them. The time that he had spent reading and understanding a complex subject paid off in time savings for his friends, while also giving them a new interest to connect over.

      To test one's own understanding of a topic one has read about and studied, it can be useful to discuss it or describe one's understanding to friends or colleagues in conversations. This will help you discover where the holes are, based on the other person's understanding and comprehension of what you've said. Can you fill in all the holes where they have questions? Are their questions your new questions, which have exposed holes that need to be filled in your understanding or in the space itself?

      I do this regularly in conversations with people. It makes the topics of conversation more varied and interesting and helps out your thinking at the same time. In particular I've been doing this method in Dan Allosso's book club. It's almost like trying on a new idea the way one might try on a piece of clothing to see how it fits or how one likes it for potential purchase. If an idea "fits" then continue refining it and add it to your knowledge base. These conversations also help to better link ideas in my thought space to those of what we're reading. (I wonder if others are doing these same patterns, Dan seems to, but I don't have as good a grasp on this with other participants).

      Link to:
      - Ahrens' idea of writing to expose understanding
      - Feynman technique
      - Socratic method (this is sort of a side or tangential method to this) <- define this better/refine

  33. May 2022
  34. Apr 2022
    1. Ashish K. Jha, MD, MPH. (2020, October 27). President keeps saying we have more cases because we are testing more This is not true But wait, how do we know? Doesn’t more testing lead to identifying more cases? Actually, it does So we look at other data to know if its just about testing or underlying infections Thread [Tweet]. @ashishkjha. https://twitter.com/ashishkjha/status/1321118890513080322

    1. Dr Nisreen Alwan 🌻. (2021, March 14). Exactly a year ago we wrote this letter in the Times. We were gobsmacked! We just didn’t understand what the government was basing all its decisions on including stopping testing and the herd immunity by natural infection stuff. We wanted to see the evidence backing them. [Tweet]. @Dr2NisreenAlwan. https://twitter.com/Dr2NisreenAlwan/status/1371168531669258242

    1. Denise Dewald, MD 🗽. (2021, August 12). Here are some modeling predictions for the delta variant from COVSIM (group at North Carolina State): PLEASE CHECK THIS OUT - RESOURCES TO SHARE WITH YOUR SCHOOL DISTRICT School-level COVID-19 Modeling Results for North Carolina for #DeltaVariant https://t.co/zU5hB9bKlY [Tweet]. @denise_dewald. https://twitter.com/denise_dewald/status/1425626289399009288

    1. Manual testing is a type of software test in which testers manually carry out test cases without using automation tools. Testers are actually behind the screen of the application, carry out test cases and see what the result is.

      What is manual testing?


  35. Mar 2022
    1. A test case is a series of actions that are performed to determine a specific function or functionality of your application. Test scenarios are rather vague and include a wide range of variables. However, testing is all about being very specific. That is why we need elaborate test cases.

      Test cases, examples and Best Practices

    1. Capybara can get us part of the way there. It allows us to work with an API rather than manipulating the HTML directly, but what it provides isn't an application specific API. It gives us low-level API methods like find, fill_in, and click_button, but it doesn't provide us with high-level methods to do things like "sign in to the app" or "click the Dashboard item in the navigation bar".
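
      The gap described above (low-level calls versus "sign in to the app") is commonly bridged with a small page object. The following is a minimal sketch, assuming a session object that responds to Capybara-style `visit`, `fill_in`, and `click_button`; `SignInPage`, the field labels, the "/sign_in" path, and the `RecordingSession` stand-in are all hypothetical and exist only so the example runs without a browser.

      ```ruby
      # Hypothetical page object: wraps low-level, Capybara-style calls behind
      # an application-specific method. Labels and path are illustrative only.
      class SignInPage
        # session is any object responding to visit, fill_in, and click_button
        # (in a real suite this would be a Capybara session).
        def initialize(session)
          @session = session
        end

        # High-level step: "sign in to the app".
        def sign_in(email:, password:)
          @session.visit "/sign_in"
          @session.fill_in "Email", with: email
          @session.fill_in "Password", with: password
          @session.click_button "Sign in"
          self
        end
      end

      # Minimal stand-in that records calls instead of driving a browser.
      class RecordingSession
        attr_reader :calls

        def initialize
          @calls = []
        end

        def visit(path)
          @calls << [:visit, path]
        end

        def fill_in(locator, with:)
          @calls << [:fill_in, locator, with]
        end

        def click_button(locator)
          @calls << [:click_button, locator]
        end
      end

      session = RecordingSession.new
      SignInPage.new(session).sign_in(email: "user@example.com", password: "secret")
      session.calls.each { |call| p call }
      ```

      Tests then call `sign_in` and stay unchanged if the form's labels or markup change; only the page object needs updating.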
  36. Feb 2022
    1. Because CENS was an academic research lab, faculty members held a large amount of power to decide which projects students pursued and what issues students faced during design, testing, and implem

      CENS seems like it takes its job seriously. Like I said in my other annotation for week 5. Just because data scientists are trying to root out bias in all forms doesn't mean it is always effective or that what is effective can't be improved.

  37. Jan 2022
    1. What should you do if you receive no written reply? Check the acknowledgment of receipt (accusé de réception, AR) for your request. The AR states the date on which, absent a written reply, your request is deemed accepted or refused. If the AR states that the request is accepted absent a written reply: this is an implicit acceptance decision (décision implicite d'acceptation), meaning that when the administration does not respond to a request, its silence means the request is accepted. The AR states that you can ask the administration for a certificate. If the AR states that the request is refused absent a written reply: this is an implicit rejection decision (décision implicite de rejet), meaning that when the administration does not respond to an informal or hierarchical appeal (recours gracieux ou hiérarchique), its silence is treated as a rejection. The AR explains how to contest the refusal (avenues and deadlines for appeal). Note: barring exceptions, if you receive no reply within 2 months, your request is accepted. This is known as the rule that silence means acceptance (silence vaut acceptation, SVA).
    1. Elliott, P., Eales, O., Bodinier, B., Tang, D., Wang, H., Jonnerby, J., Haw, D., Elliott, J., Whitaker, M., Walters, C., Atchison, C., Diggle, P., Page, A., Trotter, A., Ashby, D., Barclay, W., Taylor, G., Ward, H., Darzi, A., … Donnelly, C. (2022). Post-peak dynamics of a national Omicron SARS-CoV-2 epidemic during January 2022 [Working Paper]. http://spiral.imperial.ac.uk/handle/10044/1/93887

    1. Dr. Thrasher wrote a book! (2022, January 8). My cousin wanted to get tested. She waited in an auto testing line for 6.5 hours, and stayed in it bc she was traveling to bury her Daddy. How many people give up in such long lines? How many cases upwards of a million are we losing bc Biden et all failed on home tests? Https://t.co/Q7WVy5qD4v [Tweet]. @thrasherxy. https://twitter.com/thrasherxy/status/1479826389142491146