2,339 Matching Annotations
  1. Aug 2025
    1. Doesn't delete files from the destination. If you want to also delete files from destination, to make it match source, use the sync command instead.

      rclone copy
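
      A minimal illustration of the difference (paths hypothetical):

      rclone copy /data/photos remote:photos   # adds/updates files at the destination, never deletes there
      rclone sync /data/photos remote:photos   # makes the destination match the source, deleting extra files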

    1. researchers don’t consider AI today to be especially useful in guiding the scientific process,

      I see full sentences copied verbatim from this reference into the current article.

      Are these also being written by AI?

    2. have access to a vast corpus of high-quality open-access papers

      This must be a mistake - are there really that many open-access papers?

    1. if all you had done was divert money from another charity about which another equally motivated fundraiser was equally passionate.

      Interesting. What other aspects would make it positive-sum? What if non-charity money were brought in? Even if only a tiny amount of outside money came in, this would be a positive-sum gain for charity/humanity (assuming charity is a good use of money).

    2. lawyers

      Are lawyers not adding to human welfare? On a basic level they do, right?

      I guess apart from the basic functions of keeping the machinery of law running, sophisticated lawyering only serves a zero-sum role, pitting one client's interest against another's.

    1. If the only way to make gains in the stock market was for someone else to take a loss, then the stock market wouldn't be able to go up.

      What does it mean though for a "stock market" to go up?

    2. Over time, trading gains outweigh trading losses for investors as a group.

      If this were zero-sum, though, could the value be accumulating at the loss of some other entity that's not in the stock market, possibly customers of the actual products?

      But one could argue that customers paid for whatever value they got from the product, so that is a positive-sum transaction too.

    1. feel more intuitive, powerful, and cohesive thanks to its heavily customized take on GNOME, adding custom tools and extensions. It's also one of the first distros to offer NVIDIA drivers out of the box, making it a hassle-free option for NVIDIA GPU owners—such as myself.

      Pop!_OS

  2. Jul 2025
    1. While these genes would be classified as paralogs within individual genomes [32], at the population level they may be more accurately described as metaparalogs

      Doesn't ortholog already describe this in different species/sub-types?

    1. Due to higher relative abundances of viruses on skin in some cohorts of IEIs (Tirosh et al., Nat. Med. 2018 [16]; Blaustein et al., Cell Rep. Med. 2023 [17]), we also included targeted 16S rRNA gene amplicon sequencing (n = 534 samples) to analyze bacterial communities due to the risk of recurrent bacterial infections

      Was 16S more usable then?

    1. in a series that are expected to have similar communities (i.e. longitudinal time series or cross-sectional studies where a significant portion of the strains are shared across samples

      What kinds of cross-sectional studies would fit here? Does this one qualify?

      A cross-sectional study of the gut metagenomes of 10 dairy farm workers and 6 community controls

    1. In our case, we didn't explicitly mark anything else as an output, so there's nothing else there.

      Where do you mark things as output?
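
      For reference, a minimal DSL2 sketch (process and file names hypothetical, in the style of the training material): outputs are declared in the process's output: block, and only declared values/paths are emitted.

      process sayHello {
          input:
          val greeting

          output:
          path 'output.txt'   // only what is declared here becomes a process output

          script:
          """
          echo '$greeting' > output.txt
          """
      }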

    1. functional dark matter

      Read this good rant against the use of “dark matter” in biology by Murat Eren:

      The dark matter in physics is a place holder for a not-yet-characterized form of matter that is distinct from ordinary matter

      On the other hand, the dark matter in microbiology describes microbes that are right in front of us. We can lyse them, we can see them under our microscopes, we can get pieces of their genomes sequenced, and we can occasionally cultivate them.

      ... stick a swab in your mouth and reconstruct a genome or cultivate a member from the “dark matter” of the human oral cavity.

    1. On the other hand, the dark matter in microbiology describes microbes that are right in front of us. We can lyse them, we can see them under our microscopes, we can get pieces of their genomes sequenced, and we can occasionally cultivate them.

      I think the core argument is that it is far too easy to bring something to light from the "dark matter" of biology with just a few sequencing runs and some clever analysis.

      So the analogy to dark matter in physics isn't justified.

    1. distant proteins with essentially no sequence homology may share the same catalytic residues

      If the definition of catalytic residues is generous enough to include even 2-3 residues, is there a likelihood that random, evolutionarily unrelated proteins might share such residues by chance?

      This evolutionary argument rests on very thin evidence and will be hard to prove; you would need to show that the same connection does not occur in random proteins, or in a curated set of proteins/domains/motifs that are definitely not related evolutionarily.

    2. This approach can also be reversed to identify novel protein functions based on unusual catalytic residues.

      This application is definitely more defensible since there is a tangible way to test the hypothesis when looking for proteins of related functions

    3. Nice article, but some of the arguments are somewhat vague. It would benefit from an enhanced discussion of how the evolution would occur, or a concrete example of how these distantly related protein fragments are connected through evolution.

    4. hinting at a potential evolutionary trajectory

      Is the tower height an easily evolvable feature of these proteins, though? (Quick thoughts, didn't read the reference here.) I bet looking at other/unrelated proteins for arbitrarily defined features such as this tower height could give you a lot of confounding hypotheses.

    1. (Cormode, 2011).

      Read this for an example of what sketching means.

      This blog would benefit from a simple example illustration for each of these paradigms! A quick attempt for sketching follows below.
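
      In plain Groovy (staying close to the Nextflow/Groovy used elsewhere in these notes), a toy count-min sketch as a concrete sketching example; sizes and hash seeds are arbitrary choices, not from Cormode (2011). It answers approximate frequency queries in fixed memory and only ever overestimates:

      class CountMinSketch {
          int width, depth
          long[][] counts
          int[] seeds

          CountMinSketch(int w, int d) {
              width = w; depth = d
              counts = new long[depth][width]
              seeds = (1..depth).collect { (int) (it * 2654435761L) } as int[]
          }

          private int bucket(Object x, int row) {
              Math.abs((x.hashCode() ^ seeds[row]) % width)   // one cheap hash per row
          }

          void add(Object x)      { (0..<depth).each { counts[it][bucket(x, it)]++ } }
          long estimate(Object x) { (0..<depth).collect { counts[it][bucket(x, it)] }.min() }
      }

      def cms = new CountMinSketch(1024, 4)
      ['a', 'b', 'a', 'c', 'a'].each { cms.add(it) }
      assert cms.estimate('a') >= 3   // may overestimate due to collisions, never underestimates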

    2. algorithms are allowed a single or small number of passes over the data; an excellent tutorial-style overview is by Muthukrishnan (2003) .

      Streaming

    1. consider an alignment-based approach like Hostile.

      I thought that alignment-based approaches are computationally expensive compared to mapping-based ones? How does Hostile with minimap2 do this better than Kraken2?


    1. might delay plasmid evolution

      Hmm, does the segregation really affect the diversity pool as much if we consider all copies of the plasmid present in the whole community as a single pool?

  3. wwood.github.io
    1. It is currently aimed at the analysis of metagenomes sequenced using Illumina short read technology.

      Does this work with long-reads as well?

      (Todd) likely not, due to indels in long reads and other issues we had to fight with for SeqScreen-Nano etc.

    1. ‘SingleM’, which estimates community composition using conserved regions within universal marker genes.

      By "conserved regions" do you mean regions that are not identical but have some variation that you can use for taxonomic profiling?

    1. meta, fastq ->

      Something seems wrong: this map can't take two closure parameters here. Getting this Groovy error:

      ERROR ~ Invalid method invocation `call` with arguments: /home/pbk1/practice-nextflow/training/hello-nf-core/greetings.csv (sun.nio.fs.UnixPath) on _closure3 type
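
      A minimal sketch of why (channel contents hypothetical): a two-parameter closure only works when each channel element is a tuple.

      // works: each element is a [meta, fastq] pair, so two closure parameters destructure it
      Channel.of( [ [id: 'sample1'], file('sample1.fastq') ] )
             .map { meta, fastq -> meta.id }

      // the failing case: Channel.fromPath emits bare paths, so a two-parameter
      // closure cannot be invoked with a single UnixPath (the error above)

      // fix: match the closure arity to the element shape
      Channel.fromPath('greetings.csv')
             .map { f -> f.getName() }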

    1. When a process is invoked in a workflow, it must be provided a channel for each channel in the process input section

      Are the inputs optional?
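
      The quote suggests not. A sketch, reusing the hypothetical sayHello process from the earlier example:

      workflow {
          greetings_ch = Channel.of('Hello', 'Bonjour')
          sayHello(greetings_ch)   // one argument per declared input channel; none may be omitted
      }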

    1. SYNBIO AUCTION HANDBOOK (22.4.19). Made with ❤ for D.A.V. Public School, Velachery. iGEM 2019 SASTRA Team - Human Practices #1

      (from Slack chats: CR) What is everyone’s favorite resource to show to undergraduates new to synthetic biology? Bonus points for printable/written/non-video stuff.

      (GP) Not directly a resource, as this assumes some level of background has been given, but I was introduced to the different parts of a genetic construct via this game during my undergrad:

    1. Physics constrained ML models are the best of both worlds of flexible fitting (include higher order interactions that won’t be included in first principles models) and interpretability.

      • Such models take the form of ODEs with coefficient-matrix multiplications of known parameters, where the matrix is provided by a trained neural network.

      • This mixes the best of both worlds: known interactions/dependencies are enforced while the unknown coefficients are fit using black-box models. The authors did a great job finding some nice experimental data and fitting it to the models.

      Read more to figure out exactly how these models differ, and read the discussion about take-aways.

      • But how do we make sure the structure is not so rigid that it disallows higher-order interactions, etc.?
    2. NSM is more accurate than mechanistic or machine learning components on experimental datasets

      Interesting, is the constrained ML model more accurate than the unconstrained one?

    1. During the next 25 years, hundreds of scholarly articles cited the letter—in many cases overgeneralizing or omitting key details, potentially helping drive overprescription of opioids in the 1990s and contributing to the ensuing wave of overdose deaths, a 2017 analysis found.

      This is a wonderful example and some great detective work tracking the citation count of this paper.

      But is it really possible that the citations were that significant a driver of the opioid crisis? Citations could influence policy makers, who might have a larger effect. Drug makers could also ride on the scientific sentiment created by the misleading citations.

    1. I will not be arrested and deported for writing this essay. In that respect, the legal value of my citizenship remains secure. But what that citizenship is worth, on a deeper level, feels imperiled.

      Interesting meta-comment. Might be relevant when someone dismisses criticism by pointing to the more limited freedom to voice it in other countries such as India.

    1. Nice post arguing that you shouldn't reach for an AI agent as the first thing. Start with simpler LLM and RAG systems that are easier to debug.

      When people say "agent," they mean that last step: the LLM output controls the workflow. Most people skip straight to letting the LLM control the workflow without realizing that simpler patterns often work better. Using an agent means handing control to the LLM. But unless your task is so dynamic that its flow can’t be defined upfront, that kind of freedom usually hurts more than it helps. Most of the time, simpler workflows with humans in charge still outperform full-blown agents.

    1. nextflow pull nf-core/demo Nextflow will pull the pipeline code, meaning it will download the full repository to your local drive.

      This is similar to git pull; Nextflow downloads the full pipeline repository for you.

      Note this works because nf-core/demo is a pipeline hosted in the nf-core organization. For individual modules from nf-core, you instead need to use nf-core modules install ..
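
      Roughly (module name picked as an arbitrary example):

      nextflow pull nf-core/demo       # clones the whole pipeline repo into $NXF_HOME/assets
      nf-core modules install fastqc   # copies a single module into modules/nf-core/ of your pipeline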

    1. Many binning methods have been developed

      Overview of binning methods in this paragraph. The argument for using deep learning is not very persuasive to me:

      • methods use composition (k-mers) or abundance (coverage), or the two combined
      • hybrid models are "superior"
      • integrating such different features efficiently is not easy, so enter the black-box deep learning stuff

    1. “PacBio CCS” datapoint was taken from a 2019 publication that does not represent today’s performance. For example, Karst et al. (2021) described HiFi sequencing of full-length (4.4 kb) rDNA amplicons of the same sample at 99.9993% (Q51.5) accuracy, almost two orders of magnitude greater than the previous study.

      This is a neat way to show improving technologies: overlay them onto famous benchmarking figures from previous publications, and follow the same methodology as the published papers so nobody can accuse you of using biased comparisons to show your method in a better light.

    1. We would like to thank E. Simon and J. Silberg for their helpful discussions of this work.

      Very curious to see if this is Joff Silberg, Rice University?

    2. Interesting paper using multiple AI agents for multi-step reasoning using interdisciplinary tools to develop binders to new COVID variants

    3. requires reasoning across diverse fields from biology to computer science

      I doubt there is much computer science expertise involved in using already-existing software?

    4. Virtual Lab consists of an LLM principal investigator agent guiding a team of LLM agents with different scientific backgrounds

      I wonder how such a multi-agent framework compares to a single large agent?

    1. shows that the internal organization of concepts in the AI model corresponds, at least somewhat, to our human notions of similarity.

      This could be confirmation bias, since you are likely only flagging/commenting on things that already match the human notion of similarity.

    2. measure a kind of "distance" between features based on which neurons appeared in their activation patterns

      Is this a measure of shared neuron subsets within features?

    1. All non-mandatory command-line tool non-file arguments MUST be provided as a string via the $task.ext.args variable.

      Explanation from Seqera AI

      Why this convention exists:

      Flexibility: Users can easily customize module behavior without modifying the module code itself
      Separation of concerns: Module logic stays separate from parameter configuration
      Reusability: The same module can be used with different parameters across different pipelines
      Consistency: All nf-core modules follow the same pattern for optional arguments
      

      What goes in ext.args vs. input channels:

      Input channels: Mandatory non-file arguments that are essential for the tool to function (e.g., required modes, essential parameters)
      ext.args: Optional flags, parameters with defaults, or any non-essential command-line options
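
      A minimal sketch of the pattern (tool name and flags hypothetical): the module reads task.ext.args, and the pipeline sets it in conf/modules.config without touching the module code.

      // inside the module's script block
      script:
      def args = task.ext.args ?: ''
      """
      mytool $args --input $reads --output out.txt
      """

      // conf/modules.config on the pipeline side
      process {
          withName: 'MYTOOL' {
              ext.args = '--min-quality 20 --verbose'
          }
      }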
      
    1. (reading in progress) Tool for bias correction (what kinds?) in microbiome datasets.

      • Older tools for batch correction use the outcome variable (means?), hence risk overfitting, and are non-interpretable.

      Questions

      • How is this ML model interpretable?

      • How do you have enough data to learn factors for each microbe and each batch

      • How does this fit in with Amy Willis’ framework for unobserved taxa etc.?

    1. Web application that comprehensively determines the composition of a shotgun sequence data set. It should take about 10 seconds to process an uploaded sample.

      Composition => a very high-level overview of what % of reads belong to these categories (human, 8 animals, microbes, plants), and approximate depth.

      • How is it so fast? Do you do pre-processing/cleanup and QC of the reads before using sourmash-like tools for profiling?
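
      On the speed question, one guess at the sketch-based profiling step (standard sourmash CLI; the database name is hypothetical):

      sourmash sketch dna -p k=31,scaled=1000 reads.fastq.gz -o reads.sig   # compress reads into a tiny signature
      sourmash gather reads.sig refs.sbt.zip                                # match the signature against a reference database
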
    1. anytime a justice writes separately, the question is always why: Who is it for, and what does the justice hope to accomplish by voluntarily doing extra work?

      Each justice has a different gallery to play to? That seems to be the summary.

    1. A central bank able to control domestic interest rates is a sufficient condition to allow a government to freely pursue countercyclical fiscal policy with no danger of a runaway increase in the debt ratio.

      Interesting. Since the central bank is supposed to be independent from the executive/government, are you relying on fiscal and monetary policy decisions not bleeding into each other?

      This could only work if the fiscal policy decisions proposed by MMT that would increase the debt:GDP ratio don't affect prices or other market features, unless the state actively intervenes in the monetary policy of the central/reserve bank. I think the mandate of the central bank is to keep inflation under control and keep a check on unemployment?

    1. Thankfully, the development of AI technologies, especially Large Language Models (LLMs) [8, 9, 10] with strong reasoning, adequate knowledge reserve and excellent coding capabilities [11], is reshaping the paradigms and precepts of how people leverage bioinformatics data.

      introduction

    1. Provided the manuscript, FuncFetch extracted data such as species information, enzyme names, sequence identifiers, substrates, and products, which were subjected to extensive quality analyses
    2. identified multiple extraction errors including incorrect associations, nontarget enzymes, and hallucinations, which highlight the need for further manual curation
    1. Performing these tasks requires the installation, integration, and tuning of multiple software packages, which is not trivial even for groups with extensive bioinformatics expertise. As a result, most studies rely on ad hoc pipelines based on custom scripts and intensive manual analyses, making it difficult to reproduce or extend analysis results and hampering collaboration.

      useful text

    2. that the choice of assembler has a strong influence on the final assembly results and choosing the ideal assembler requires taking into account both contiguity and correctness

      This is something the Omi workflow could help with if there are clearly defined rules in the field. Where do we get such rules from? Maybe review papers? This might not be fully automatable, but clearly defined (expert-curated) rules could be implemented in Python and interact with user prompts via an LLM.

    3. level of improvement provided by MetAMOS over other assembly tools is highly dependent on the specific characteristics of the dataset being assembled
      • library-size re-estimation within MetAMOS (it was incorrect in the tongue dorsum dataset, so re-estimation helped a lot)
      • Number of regions of genomic variation (helps real datasets that have a high number of these since scaffold building pipeline is good)
    4. aggressive assembly approaches sometimes result in more contiguous assemblies, but often introduce errors of the most severe kind (chimeras)

      There are trade-offs. Could we ask the user whether the tool should err toward more contiguous assemblies or fewer chimeras, with a set percentage? Or summarize results from both and ask the user to choose?

    5. allowing scientists to focus their attention on individual components without having to re-implement all the components of a metagenomic pipeline
    6. most studies rely on ad hoc pipelines based on custom scripts and intensive manual analyses, making it difficult to reproduce or extend analysis results and hampering collaboration
    7. MetAMOS represents an important step towards fully automated metagenomic analysis, starting with next-generation sequencing reads and producing genomic scaffolds, open-reading frames and taxonomic or functional annotations
    1. the advantages of computational pipelines over ad hoc scripts, even for simple tasks, are all more apparent with increasingly complex datasets and the use of parallel processing.

      Why pipelines vs ad hoc scripts:

      • dependency tracking (statically inferred DAGs)
      • rules reused for many files (parallelization)
      • data tracking (rapid development on subsets of the pipeline, e.g. changing parameters; avoids duplicate work when resuming workflows)

    2. Automatic data tracking in pipelines allows only the out-of-date parts of the analyses to be rescheduled and recalculated, with minimal redundancy
    1. In contrast to Pwrake and GXP Make, Snakemake does not rely on any password-less SSH setup or custom server processes running on the cluster nodes
    1. Despite the short history of bioinformatics, several software efforts have evolved into a pipeline model

      history of toolchaining for bioinformatics

    2. running analysis in a serial rule-dependent fashion (workflow) and (2) the ability to run these tasks in parallel where possible (high-throughput).
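
      To make the rule-dependent + high-throughput point concrete, a minimal Snakemake-style rule (file names and tools hypothetical): the DAG is inferred from the input/output patterns, and independent samples run in parallel.

      rule align:
          input: "reads/{sample}.fastq"
          output: "aligned/{sample}.bam"
          shell: "minimap2 -a ref.fa {input} | samtools sort -o {output}"
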
    1. A clear statement of the nature of the intrusive judicial inquiry a parent company could be subjected to in such cases was provided by Lord Bingham when the litigation reached the House of Lords as follows:

      Intrusive Judicial Inquiry into Parent Company Liability

      Lord Bingham’s statement in the House of Lords highlights the level of scrutiny that a parent company may face in transnational tort claims. Courts assess whether the parent company played an active role in controlling the subsidiary’s operations, particularly in matters of health, safety, and environmental standards. This includes an inquiry into:

      • Corporate Oversight – the extent to which the parent company exercised control over subsidiaries.
      • Knowledge and Responsibility – what the parent company’s directors and employees knew or ought to have known about the subsidiary’s activities.
      • Decision-Making and Action – whether the parent company took positive steps to ensure compliance or failed to act, leading to harm.
      • Documentary Evidence – courts examine internal company records, including board meeting minutes, reports from directors and employees, and correspondence related to oversight of the subsidiary.

      Jurisdiction and Access to Justice

      The House of Lords upheld jurisdiction in the UK by applying the Connelly principle, which states that English courts should hear cases if there is a real risk that justice would not be accessible in the foreign jurisdiction. This was based on:

      • the complexity of the litigation, making it difficult to fund and pursue in South Africa;
      • the need for extensive corporate records, which were primarily located in the UK parent company’s offices.

      Precedents in Parent Company Liability

      By 2001, English courts had ruled on three key cases affirming parent company liability, establishing that the legal principle was not controversial and that UK courts should retain jurisdiction on forum non conveniens grounds when justice could not be obtained abroad.

      Impact on Transnational Litigation

      This judicial approach set an important precedent, paving the way for future cases like Chandler v Cape (2012) and Okpabi v Shell (2021), reinforcing the principle that parent companies may owe a duty of care to individuals harmed by the actions of their foreign subsidiaries.

  4. academic.oup.com
    1. “When I began my work, jazz was a stunt,” was Duke Ellington’s later critique of some of this music [11], but the slick professionalism of the Harlem stride style also served to expand the audience for African American music in the face of discrimination from cultural elites, both within and without the black community, and despite a severe economic downturn.

      for final

    1. within the internal perspective. They are first-order claims about what is right or wrong in specific counterfactual conditions, and can thus be glossed in expressivist terms. This is underpinned by the fact that our moral attitudes respond to natural features of the world. We judge that kicking dogs is wrong because of the pain they suffer when kicked, not because we happen to disapprove of such behaviour. Quasi-realists can therefore hold that kicking dogs remains wrong in worlds at which our counterparts approve of it, for o


    1. Moreover, just like cognitive disinhibition, schizotypy is correlated with creativity, verbal and visual, with one caveat: Desiring isolation, being introvertive, lacking a capacity for pleasure—these do not predict creativity. They don’t make you more creative, according to the studies.

      ??

    2. with no regard for the truth of the assertion. To others and to himself, he willfully defied reality. He’d reverse himself, too. If particular lines of argument failed to persuade, he’d advocate others. He’d throw people off by adopting their position as his own. He’d say an idea was crazy, then a week later call it great.

      this sounds horrible

    1. Run a generative AI chatbot on Jetson Orin Nano Super Developer Kit. This chatbot features Ollama with Open WebUI, a widely used, open-source, chatbot server interface that connects to locally running LLMs.

      Deploying Omi: Open WebUI could be used to run a local LLM through API calls on the T8 server?
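
      A rough sketch of the wiring, assuming Docker and Ollama are installed on the server (ports and the host-gateway detail may vary by setup):

      ollama serve &                                       # serves local models on :11434
      docker run -d -p 3000:8080 \
          -e OLLAMA_BASE_URL=http://host.docker.internal:11434 \
          ghcr.io/open-webui/open-webui:main               # Open WebUI pointed at Ollama's API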

    1. npm is a couple of things. First and foremost, it is an online repository for the publishing of open-source Node.js projects. Second, it is a CLI tool that helps you install those packages and manage their versions and dependencies.


    1. many hurdles, caused by the heterogeneity of domain data, the sophistication of domain knowledge, the uniqueness of domain objectives, and the diversity of the constraints
    2. we present a comprehensive survey on domain specification techniques for large language models, an emerging direction critical for large language model applications
    3. LLMs, significantly outperforming smaller models in understanding and generating human-like text,have emerged as a promising AI research trend
    4. domain specialization of Large Language Models (LLMs) is defined as the process of customizing general-purpose LLMs according to specific domain contextual data, augmented by domain-specific knowledge, optimized by the domain’s objective, and regulated by domain-specific constraints

      for introduction

    1. sketching, a popular data compression technique, can serve as an efficient adaptation strategy for LLMs while avoiding low-rank assumptions
    1. predefined workflows and rigid models, SpatialAgent employs adaptive reasoning and dynamic tool integration, allowing it to adjust to new datasets, tissue types, and biological questions
    2. Key modules. The action module (left) executes tasks such as retrieving reference datasets, converting gene names, verifying ligand–receptor interactions using existing databases, processing data with established software packages (e.g., numpy) or generating and executing custom code, while reasoning over and aggregating information from multiple sources
    1. open family of heterogeneous reasoning models that deliver exceptional reasoning capabilities, inference efficiency, and an open license for enterprise use.
    2. performs competitively with state-of-the-art reasoning models such as DeepSeek-R1 while offering superior inference throughput and memory efficiency
    3. using neural architecture search from Llama 3 models for accelerated inference, knowledge distillation, and continued pretraining, followed by a reasoning-focused post-training stage
    1. anvi’o [75] was used to profile and visualize the different Turicibacter strain DNA sequences to locate putative bile salt hydrolase and 7α-HSDH homologs in contig groups, generate variability profiles, and measure gene coverage and detection statistics.
    1. Yet, taxonomic insights offer limited utility to understand functional drivers of biological systems, a pinnacle desire that brings together many corners of microbiology
    1. introduce Lyra, a subquadratic architecture for sequence modeling, grounded in the biological framework of epistasis for understanding sequence-to-function relationships
    1. we propose the use of a pangenome graph, built from assembly graphs produced by assembling short reads of the same sample with different assemblers
    2. highlights similarities between contigs from different assemblies while retaining information on contigs that appear only in one of the input assemblies
    1. present the first assembly-free and mapping-free approach for augmenting an existing pangenome graph using unassembled long reads from an individual not already present in the pangenome.
    1. agentic technology uses tool calling on the backend to obtain up-to-date information, optimize workflows and create subtasks autonomously to achieve complex goals.
    2. ability to store past interactions in memory and plan future actions encourages a personalized experience and comprehensive responses