2,340 Matching Annotations
  1. Jul 2025
    1. output: path "UPPER-${input_file}"
       script: """ cat '$input_file' | tr '[a-z]' '[A-Z]' > 'UPPER-${input_file}' """

      how can I minimize the repetition in the output path name in this Nextflow process?
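      One option, assuming Nextflow DSL2: declare the output with a glob pattern so the filename is spelled out only once, in the script block. A sketch (the process name TO_UPPER is hypothetical, not from the annotated page):

```nextflow
process TO_UPPER {
    input:
    path input_file

    output:
    path "UPPER-*"      // glob: matches whatever the script writes

    script:
    """
    cat '$input_file' | tr '[a-z]' '[A-Z]' > 'UPPER-${input_file}'
    """
}
```

      The trade-off is that the glob is less explicit than a fully spelled-out name, so a script bug that writes the wrong filename would go unnoticed by the output declaration.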

    2. things could get a little tricky, because we need to be able to handle an arbitrary number of input files. Specifically, we can't write the command up front, so we need to tell Nextflow how to compose it at runtime based on what inputs flow into the process.
    1. write another pipeline that calls on one of those processes, you just need to type one short import statement to use the relevant module. This is better than just copy-pasting the code, because if later you decide to improve the module, all your pipelines will inherit the improvements.
    1. They encapsulate applications and dependencies in portable, self-contained packages that can be easily distributed. Containers are also key to enabling predictable and reproducible results.
    2. Nextflow was one of the first workflow technologies to fully embrace containers for data analysis pipelines.

      as opposed to using conda as much as possible before containerization?

    3. Today, workflows may comprise dozens of distinct container images. Pipeline developers must manage and maintain these containers and ensure that their functionality precisely aligns with the requirements of every pipeline task.
    4. Wave — a container provisioning and augmentation service that is fully integrated with the Nextflow and Nextflow Tower ecosystems.
    1. Our platform combines novel hardware with AI-enabled bioinformatics to unlock the personalized medicine potential of the gut microbiome

      I wonder what bioinformatics they are doing that could be useful for omi

    1. Specifying the Conda environments in a separate configuration profile is therefore recommended to allow the execution via a command line option and to enhance the workflow portability
    2. process.conda = 'samtools'

      does this mean all tools / processes using conda need to be pre-specified in the conda.profile? seems dumb..
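      For reference, the usual pattern (a sketch; the environment file path is hypothetical) is to keep the Conda setting in a profile in nextflow.config, so it applies only when running with `-profile conda` rather than being baked into every process:

```nextflow
profiles {
    conda {
        process.conda = "${projectDir}/environment.yml"
    }
}
```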

    3. You can also download Conda lock files from Wave build pages. These files list every package and its dependencies, so Conda doesn’t need to resolve the environment. This makes environment setup faster and more reproducible.
    1. The only difference when compared with legacy syntax is that the process is not bound with specific input and output channels, as was previously required using the from and into keywords respectively
    2. Another exciting feature of Nextflow DSL 2 is the ability to compose built-in operators, pipeline processes and sub-workflows with the pipe (|) operator
    1. (Table 3) Might be relevant to decontamination approaches suggested in the SOMATEM pipeline?


    2. due to their inherent complexity and the limited availability of decontamination pipelines compared with those for marker gene datasets
    1. Please only use Conda as a last resort i.e. when it’s not possible to run the pipeline with Docker, Singularity, Podman, Shifter, Charliecloud, or Apptainer.

      Why?: seqera AI chat says nf-core has reproducibility issues with conda: run-time resolution causing hash verification failures. Here's the summary -


      When Conda is Still Appropriate

      Conda remains useful for:

      • Development and prototyping: when you need flexibility to update packages
      • Custom/proprietary software: when containers aren't available
      • Resource-constrained environments: where container overhead is problematic
      • Legacy systems: where container runtimes aren't available

      Best Practice Recommendation

      For production workflows, the recommended approach is:

      • Primary: use Docker/Singularity containers
      • Development: use Wave to generate containers from conda specs
      • Fallback: use conda only when containers aren't feasible
      • Future: leverage conda lock files for maximum reproducibility

      The "last resort" recommendation reflects the hard-learned lessons from managing nearly 1,500 nf-core modules and the practical challenges of maintaining reproducible bioinformatics workflows at scale.

    1. Process directives allow the specification of settings for the task execution such as cpus, memory, container, and other resources in the workflow script.
    1. Seqera Platform access token is not mandatory, but it is recommended in order to access private container repositories and pull public containers without being affected by service rate limits

      Usage limits here

    1. by not conflating these two steps, FastGA can be used for other downstream tasks such as finding recurrent insertions due to transposable elements
    2. The key idea is to reduce the number of k-mers inspected for seed matches by using only those that are minimizers in a window of some small size
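      A toy Python sketch of the minimizer idea (illustrative only, not the paper's implementation): keep just the lexicographically smallest k-mer in each window of w consecutive k-mers, which shrinks the set of seeds inspected:

```python
def kmers(seq, k):
    """All contiguous k-mers of seq."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]

def minimizers(seq, k, w):
    """(position, k-mer) pairs that are the lexicographically smallest
    k-mer in at least one window of w consecutive k-mers."""
    km = kmers(seq, k)
    chosen = set()
    for start in range(len(km) - w + 1):
        window = km[start:start + w]
        offset = min(range(w), key=lambda i: window[i])
        chosen.add((start + offset, km[start + offset]))
    return chosen
```

      Because adjacent windows usually share the same minimum, far fewer distinct k-mers survive than the full k-mer set.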
    1. You can define a command stub, which replaces the actual process command when the -stub-run or -stub command-line option is enabled:
    1. In this Review, we examine the main machine learning concepts, tasks and applications that are relevant for experimental and clinical microbiologists
    1. We then built the database using recent versions of NCBI RefSeq: version 221 for both bacteria (329,194 assemblies) and archaea (1,911) and version 222 for fungi (564)

      https://zenodo.org/records/10802546

      Need to create a reproducible process/script to update the database with newer versions of NCBI RefSeq!

    2. we introduce Lemur and Magnet, a pair of tools optimized for lightweight and accurate taxonomic profiling for long-read shotgun metagenomic datasets

      What makes this long-read compatible? The EM (expectation maximization) technique similar to Emu?

    1. BugBuster is a fully containerized, modular, and reproducible workflow implemented in Nextflow. The pipeline streamlines analysis at level of reads, contigs, and metagenome-assembled genomes (MAGs), offering dedicated modules for taxonomic profiling and resistome characterization.
    2. Thanks to the use of containers, BugBuster can be deployed with minimal configuration on workstations, high-performance clusters, or cloud platforms

      Does this really require containers or can be done with conda as well?

    1. an update to GTDB-Tk that uses a divide-and-conquer approach where user genomes are initially placed into a bacterial reference tree with family-level representatives followed by placement into an appropriate class-level subtree comprising species representatives
    1. Genome Taxonomy Database (GTDB; https://gtdb.ecogenomic.org) provides a phylogenetically consistent and rank normalized genome-based taxonomy for prokaryotic genomes sourced from the NCBI Assembly database.
    2. GTDB uses relative evolutionary divergence (RED) to delineate higher-rank taxa and average nucleotide identity (ANI) to delineate species clusters
    1. unlike the linear number of k-mers in a sequence, the number of subsequences grows exponentially

      What is k-mer vs subsequence difference?

      (duckduckgo-AI generated) A k-mer is a specific type of subsequence that consists of a fixed length (k) of nucleotides from a biological sequence, while a subsequence can be any sequence derived from another sequence by deleting some elements without changing the order of the remaining elements. In bioinformatics, k-mers are often used for tasks like DNA sequence assembly and analysis.
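      The difference in growth rates is easy to see with a toy count (illustrative Python, not from the paper): contiguous k-mers are linear in sequence length, while subsequences are exponential:

```python
def num_kmers(seq, k):
    # k-mers are contiguous substrings of length k: at most len(seq) - k + 1
    return max(len(seq) - k + 1, 0)

def num_subsequences(seq):
    # every position is independently kept or dropped (order preserved),
    # giving 2^n index subsets, including the empty one
    return 2 ** len(seq)
```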

    1. best practices guidelines enforced by the project further ensure that the pipelines are robust, well-documented, and validated against real-world datasets.

      something to learn from and emulate?

    1. potential limitation of shotgun sequencing is the complexity of bioinformatics pipelines required for its analysis

      This is a great statement for making the Somatem pipeline more accessible

    2. Unexpectedly, k-mer approaches resulted in rather high false positive rates, which may lead to misinterpretations of microbial community composition.

      Could this be improved with tool choices and databases? - There are more recent tools than Kraken2-bracken and sourmash for this. - Centrifuger is a recent tool; and sylph is known to be more stringent / have fewer false positives

    1. nf-core project enforces strong guidelines for how pipelines are structured, and how the code is organized, configured and documented.
    1. It may seem like a lot of work to accomplish the same result as the original pipeline, but you do get all those lovely reports generated automatically
    1. unstructured data does not have a predefined data model, it is not easily processed and analyzed through conventional data tools and methods. It is best managed in nonrelational or NoSQL databases or in data lakes, which are designed to handle massive amounts of raw data in any format.
    1. language server parses scripts and config files according to the Nextflow language specification, which is more strict than the Nextflow CLI
    2. Include declarations in scripts and config files act as links, and ctrl-clicking them opens the corresponding script or config file.
    1. There are three most popular pipelines used for NGS analyses: QIIME, mothur and MetAMOS

      Don't know if this statement is justified given that the citation counts differ by 2 orders of magnitude - Mothur: 20 K - Qiime: 37 K - MetAMOS: 230

    1. Using first-passage analysis validated by Monte Carlo simulations, we quantitatively characterize nucleotide-specific error rates during RNA polymerase II transcription

      (comments before reading in full:) Curious how you got all the rates mentioned in Fig 1C. - Appendix table S1 shows most rate constant parameters are fitted; I wonder how they were fitted - The rate constants that were fixed, did you get those from literature..?

    1. New taxa are added to the Taxonomy database as data are deposited for them.

      How often should I update this in a classifier like centrifuger?

    1. knowledge gap between microbes that are only studied en masse as communities and those select few species whose molecular, genetic, or physiological diversity is studied in detail.

      Since you are not capturing species not present in the title/abstract. You would also not capture a future study that employs automated robotics to study multiple organism like you mentioned in the previous paragraph!

    2. apply these tools to the myriad species that live in the understudied corners of our world.

      Roboticizing microbiology involves moving parts (shaking cultures), changing temperatures etc. and it will be harder to automate the study of understudied species with finicky behaviours. For example, certain streptomyces species (roseosporus) form aggregates if not grown with the proper shaking in a bevelled flask within viscous media with glass beads put in.

      How on earth do you automate your way out when you cannot standardize culture conditions for finicky organisms?

    3. Statisticians have taught for decades that the most efficient and robust experimental designs vary multiple factors simultaneously and then deconvolve the effects and interactions with simple statistical models

      This will be a nightmare in biology with low sample sizes and limited data. You will need a lot more depth of data to de-convolve factors efficiently even with the newer AI methods

    4. counted the number of PubMed articles that refer to each species in their title or abstract

      What about microbes mentioned in the body of the paper or even tables of supplementary material etc. Are these not significant enough to count as "understanding" these microbes?

      New AI based methods would make it possible to scrape such references given contextual keywords etc. that discriminate between casual references vs emphasis enough that the microbe is being "studied".

      Also, what does it mean for a microbe to be "understood" anyways? Do these all qualify, and at the same magnitude?
      1. Microbiology (culture methods, media, growth rate calculations)
      2. Synthetic bio (figuring out regulatory elements.. promoters, RBS and such that enable expressing genes on plasmids or chromosomal integration)
      3. Bioinformatic explorations involving function (insights from meta-transcriptomic studies)

    1. few sips can help lower your overall body temperature, mimicking its natural decline before you sleep

      How does this compare with drinking hot milk, which is advised by some sources?

    1. Notes from Todd:

      Huge caveat that the study assumes plasmids are all detected across variable sequencing depths, and it's impossible to speak to copy numbers across varying sequencing technologies

      Interesting at the exploratory level but just scratching the surface and may be biased due to the biased nature of the samples in the SRA

    1. Besides the review itself, it's a nice organization of longitudinal data, so can be useful when looking for datasets (Nick Sapoval)

    1. A k-mer based taxonomic classification tool ; Much smaller database size than Kraken. Uses compressed k-mer indexing using BWT compression and FM index

      • (compression) Uses only unique portions of new genomes to reduce redundancy in the index. - Fig 1

      • FM-index provides a means to exploit both large and small k-mer matches by enabling rapid search of k-mers of any length

      • Centrifuge can assign a sequence to multiple taxonomic categories


      Centrifuger is a more compressed version? what are the trade offs of this?

      In Centrifuger, the Burrows-Wheeler transformed genome sequences are losslessly compressed using a novel scheme called run-block compression

    1. This is very neat. This could be a good complex dataset with known ground truth to benchmark tools for strain resolution / time-course tracking / HGT tracking methods like rhea

      I would re-create the bulk sequencing by truncating the droplet-specific barcode and collecting all the sequences together; I wish they had parallelly sequenced the full sample without the droplets for this purpose though..

    2. enables us to follow the relative abundances of these strains over time in the human donor

      Isn't there a cheaper way to track strains without needing single-cell sequenced genomes?

    1. For each microbiome sample, its MNS was derived by searching its sequence against those of all samples produced by past studies

      What does a "sample" mean: - A metagenome - a collection of sequences in a microbiome - a single sequence from a microbiome collection?

    1. Pairs of short reads with small edit distances, along with their unique molecular identifier tags, have been exploited to correct sequencing errors in both reads and tags.

      nice summary of UMI working principle

    1. First, we create a symbolic link to the unit file in the /etc/systemd/system directory.

      Symlinking has an issue where the service can fail to load on startup. Copying the file is better (Windsurf AI)

      This is a common issue with systemd services that are symlinked from a user's home directory. The problem occurs because the home directory isn't mounted when systemd tries to read the service file during early boot. Here's how to fix it: 1. Copy the service file instead of symlinking it:

    1. It’s easier than teaching kids, and it’s more exciting in some ways,” said an AI trainer

      Is it more satisfying than teaching humans though?

    1. Minimap2 is a new paradigm in mapping and by extension pairwise alignment. Uses concepts from full-genome aligners (seed-chain-align) and works for short, long reads (noisy) and RNA-seq as well. - Uses: read mapper, long-read overlapper, full-genome aligner

      capability of minimap2 comes from a fast base-level alignment algorithm and an accurate chaining algorithm..

      Minimap2 indexes reference k-mers with a hash table

    1. feature extraction attempts to reduce the dimensionality of a dataset by building a compressed representation of the input features

      Example: Go from species to higher taxa ~ genera, family..
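      The species-to-genus example can be sketched as a trivial aggregation (the counts below are hypothetical):

```python
from collections import defaultdict

def collapse_to_genus(species_counts):
    """Feature extraction by aggregation: sum species-level counts
    into genus-level features (genus = first word of the binomial)."""
    genus_counts = defaultdict(int)
    for name, count in species_counts.items():
        genus_counts[name.split()[0]] += count
    return dict(genus_counts)
```

      Two Escherichia species collapse into one Escherichia feature, shrinking the feature space at the cost of species-level signal.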

    2. Methods like t-stochastic neighbor embedding (t-SNE) and uniform manifold approximation and projection (UMAP) faithfully capture and reveal local and non-linear relationships in complex microbiome datasets, but their tuning is finicky
    1. Different channels can have the same package, so conda must handle these channel collisions.

      Biopython has this issue. It sometimes causes package-resolution errors since it is present in both bioconda (older versions) and conda-forge (more recent, maintained)

      Error: libmamba Could not solve for environment specs

      To solve this, put conda-forge at higher priority than bioconda
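      A minimal `.condarc` sketch of that fix (channel order is priority order, top = highest):

```yaml
channels:
  - conda-forge   # higher priority: resolved first
  - bioconda
  - defaults
channel_priority: strict
```

      With `channel_priority: strict`, a package is always taken from the highest-priority channel that provides it, which sidesteps the cross-channel collision entirely.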

    1. many environments can still not be classified well at the species level11 because of database incompleteness. Our strategy for tackling this problem was to design sylph so that researchers can create customized databases from their novel genomes or MAGs, although this requires the generation of new genomes for researchers working in undercharacterized microbiomes.

      How can users create their own customized databases? - Find any reference to this in the methods/suppl

    1. avoid rigorous environmental review

      Does damage to the environment have to be in the same bucket as not wanting any change in their backyards (NIMBY)?

    1. Advocating for shallow metagenomics for better taxonomic resolution (sub-species) compared to 16S ; this is important for low microbial density samples (skin microbiomes).

    2. 16S amplicon sequencing exhibited extreme bias toward the most abundant taxon

      I assume the PCR step causes most of the issue, and qPCR with species-specific primers doesn't reproduce this since there is no competition?

  2. Jun 2025


    1. especially interested in candidates with prior wet lab experience and a generalist quantitative mindset.

      How do you show generalist quantitative mindset in resume? - maybe easier in cover letter?


    1. Minimap2 follows a typical seed-chain-align procedure as is used by most full-genome aligners
      • anchor = exact matches of minimizers from the query (seeds) in the reference (from database)
      • chain = sets of colinear anchors
      • align = extending the chain + filling in the gaps
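      The seed and chain steps above can be caricatured in a few lines of Python (a toy, not minimap2's actual algorithm, which uses minimizers and gap-cost-aware chaining):

```python
def find_anchors(query, ref, k):
    """Seed step: exact k-mer matches, returned as (query_pos, ref_pos) anchors."""
    index = {}
    for i in range(len(ref) - k + 1):
        index.setdefault(ref[i:i + k], []).append(i)  # hash-table k-mer index
    return [(q, r)
            for q in range(len(query) - k + 1)
            for r in index.get(query[q:q + k], [])]

def best_chain_length(anchors):
    """Chain step: longest set of colinear anchors (increasing in query AND ref)."""
    anchors = sorted(anchors)
    best = [1] * len(anchors)
    for i, (qi, ri) in enumerate(anchors):
        for j in range(i):
            qj, rj = anchors[j]
            if qj < qi and rj < ri:
                best[i] = max(best[i], best[j] + 1)
    return max(best, default=0)
```

      The align step would then extend the winning chain with base-level dynamic programming to fill the gaps between anchors.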
    1. Myers-Briggs Type Indicator

      May benefit from this during interviews. Getting to know yourself better. strengths, weaknesses

      Email Raylea for the code. The other 2 (Strong and Focus2) are more for undergrads

    1. We extracted 202 question-answer pairs from the KB and 39 questions generated by GPT-4 for training and testing purposes

      Isn't it weird to train and test on the same questions?

    1. capability to create mirror life is likely at least a decade away and would require large investments and major technical advances

      I believe, making these self-replicating is a long way off. Will need all the machinery including polymerases, ribosomes etc. as well as a way to make the necessary monomers from available forms in the environment

    1. Nice guideline document for thinking about contamination in low biomass samples and wet-lab + computational approaches of dealing with it

    1. This method merges an embedding based protein homolog search with a genomic context similarity. This needs a multi-modal LM including aa and DNA seqs. - Genomic context examples: CRISPR/defense islands.

      Modalities of protein homolog (sequence similarity) search:
      1. Amino acid sequence based: BLAST, HMMER
      2. Embedding based search: using ESM2 embeddings
      3. Structural search, using AlphaFold structures
      But all of these lack the extra boost provided by adding in genomic context, which currently is only done manually!

    1. If you make a 1 byte change and push the file again, you'll use another 500 MB of storage and no bandwidth

      this seems insane; what if I don't want to track the versions of this large file and only keep the final versions? - There should be some option to just change the link to the latest version and dump the old version without using the bandwidth during download as well - See the latest version of git-lfs for info on this

    1. A higher resolution but still quick method to compare multiple genomes (unassembled also). Uses a full k-mer spectra instead of minHash methods

      Unlike MinHash-based methods that produce distances and have lower resolution, KPop is able to accurately map sequences onto a low-dimensional space.

      Questions: (before reading paper..)

      • By unassembled genomes, do you mean contigs?

      • How does this k-mer spectra make it higher resolution than minHash?

      • Does dataset dependence of these transformation make this a hurdle in some way?

    2. KPop, a novel versatile method based on full k-mer spectra and dataset-specific transformations, through which thousands of assembled or unassembled microbial genomes can be quickly compared

      Does dataset dependence of these transformation make this a hurdle in some way?

    3. simplified signatures (“sketches”) based on some dataset-independent choices

      Does dataset independence make these tools better than current one in some way?

    4. KPop is able to accurately map sequences onto a low-dimensional space

      The claim is that this is higher resolution than minHash methods like mash?

    1. Cool study that re-queries a wide range of metagenomic data to raise some new thoughts on phage host range questions

      Read later to clarify thoughts in the hypothesis comments

    2. we observed surprising cases of viruses targeted by microbes not expected to be viable hosts.

      Interesting, need to read to find out why you won't expect something to be viable host. Is it mismatched environmental source of the phage vs the host / phylogenetic mismatch between expected host of the phage and spacer source?

    1. Previous efforts to design enzymes have largely focused on finding geometric matches between model active sites and preexisting protein structures, an approach akin to buying a suit from a thrift store; it is unlikely the fit will be perfect.

      Great analogy!🤣

    1. Unlike hard links, which point directly to the file data on the disk, symlinks are independent files that contain a path to another file or directory

      Hard link vs soft link

      I'm curious how a hard link would operate when synced to another computer via git/cloud drives. In my experience, I found that a hard link I made in Windows broke when I used rclone sync with OneDrive onto a Linux PC
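      A quick POSIX-only Python experiment makes the difference concrete (hard links are an inode-level construct, which is exactly the thing cloud-drive sync tools can't carry across filesystems, and likely why the rclone sync broke it):

```python
import os
import tempfile

def link_demo():
    with tempfile.TemporaryDirectory() as d:
        target = os.path.join(d, "data.txt")
        with open(target, "w") as f:
            f.write("hello")

        hard = os.path.join(d, "hard.txt")
        soft = os.path.join(d, "soft.txt")
        os.link(target, hard)     # hard link: a second name for the same inode
        os.symlink(target, soft)  # symlink: a tiny file holding a path string

        same_inode = os.stat(hard).st_ino == os.stat(target).st_ino
        os.remove(target)         # delete the original name
        hard_survives = open(hard).read() == "hello"
        soft_dangles = os.path.islink(soft) and not os.path.exists(soft)
        return same_inode, hard_survives, soft_dangles
```

      After removing the original, the hard link still reaches the data while the symlink dangles, since it only stored the now-dead path.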

    2. Git treats symbolic links as special files that store the path to the target file. When you add a symbolic link to a Git repository, Git records the link information rather than the contents of the target file. Here’s how Git handles symbolic links during various operations:

      is this only for softlinks?

    1. Summary: Uses short-reads from metagenomes to give strain level composition. employs tree-based k-mer indexing. Briefly they do: s1. cluster similar strains + tree index for searching, s2. generate strain specific k-mers (collinear blocks within same cluster) > build a matrix.

      employs a novel tree-based k-mers indexing structure to strike a balance between the strain identification accuracy and the computational complexity…

      By searching strains inside the identified clusters, StrainScan achieves a higher resolution than cluster-level tools such as StrainGE and StrainEst

      Note: Also contrast with a newer Strainify tool?

    1. Macromolecular binding pockets, on the other hand, are located on the protein surface and are often shallower

      protein-protein interactions?

    1. taking full advantage of our algorithm might involve coordination between multiple colleagues in a lab who are constructing plasmids with different expected sequences.

      This is something a local core like GCEC can help with

    2. it could be further reduced by executing time-consuming dynamic programming only for some query-reference pairs that necessitate high levels of accuracy and by introducing parallel computing

      Nice, Any other ideas to reduce RAM use?

    3. theoretical minimum number of reads that is required for the reliable consensus calculation is 30 reads per plasmid

      Does this depend on the plasmid length and the preparation kit before sequencing that determines fragmentation?

    1. Please only use Conda as a last resort, i.e., when it’s not possible to run the pipeline with Docker or Singularity.

      Why is conda not recommended?

    1. To enable data augmentations and stitching of multiple contigs together, we introduce two special tokens. The ‘#’ token is used to join sequences from the same species with uncertain distance to each other, while the ‘@’ token is used for sequences that are from the same contig/strand and are near each other.

      Do you ensure that the stitched contigs are in the same order within the chromosome - Is this better than using an assembly tool?

      How are these delimiters # and @ processed at the output stage? - If these delimiters are de-emphasized during the calculations, would this promote evo2 to learn a false sense of continuity between contigs that are not connected within the actual genome?

    2. Evo 2 can also leverage its unique representation of biological complexity to generate new genomic sequences

      What is the point in generating genomic sequences with some vague notion such as "naturalness"? - Assuming future adaptations would include prompting to generate specific sequence features; maybe it makes more sense in this context?

    3. previously demonstrated that machine learning models trained on prokaryotic genomic sequences can model the function of DNA, RNA, and proteins

      Elaborate "Can model the function"

    1. VOGDB, which is a database of virus orthologous groups. VOGDB is a multi-layer database that progressively groups viral genes into groups connected by increasingly remote similarity

      Layers: 1. pair-wise sequence similarity 2. sequence profile alignment 3. predicted protein structures

      The first layer is based on pair-wise sequence similarities, the second layer is based on the sequence profile alignments, and the third layer uses predicted protein structures to find the most remote similarity

    1. Specific gut species distinguish left-sided versus right-sided CRC (area under the curve = 0.66) with an enrichment of oral-typical microbes

      It is very surprising that the left and right sides of colorectal cancers have heterogeneity!

    1. The Trump administration is preparing to cancel a large swath of federal funding for California

      How do you prevent such partisan and vindictive actions by federal government on states?

      Same thing is happening in India - Is there any framework people have seen in a more federated country, maybe Germany?

    1. or methodological differences in screening algorithms.

      This could have been elaborated a bit more. It is too generic and rather obvious

      TODO: try bringing this up to Todd on Slack..

    1. Interesting study that expands the similarity metric used to mark core-genes that determine clade membership and phylogeny by their homology. They sub-sample the similarity problem by predicting 3Di structural strings as opposed to full structure prediction

      • To identify core-genes, we traditionally use amino acid similarity (better than nucleotide.. codon usage differences)
      • Going one step ahead, we can use protein structures/folds to generalize this further for deep clades where amino acid homology is quite low.

      Read more to see how they implement this, and how robust the homology is when inferred via an approximate subsampling-like scheme onto 3Di structural strings generated from amino acids (as in AlphaFold)

    1. I increasingly use the Nim programming language for data processing tasks. Nim is under-appreciated in computational science but it is a very capable Python replacement for non-numerical data processing. At a high level, Nim is as easy to write as Python and as fast as C

      nim = Interesting cool and fast python like programming language

      How does this compare to Julia?

    1. It’s really helpful to be able to dictate my academic papers using my phone when inspiration hits me, wherever that may be.

      Audio transcription is available with a plug in?

    1. removed potential virulence genes and secretion systems (T3SS and T6SS) to ensure safety

      Would keeping the secretion system enable easier protein purification through secretion tags?

    1. BioReason reasons over unseen biological entities and articulates decision-making through interpretable, step-by-step biological traces, offering a transformative approach for AI in biology that enables deeper mechanistic insights

      Text in and text out: - Answers questions like

      What are biological effects of this mutation and what disease it causes - (Summarized the example from fig 1)

    1. Designs primers for clade specific (pan-specific) viral qPCR, single or tiled amplicon sequencing

      • Can include degeneracy..

      • How is single different from qPCR? only amplicon length maybe?

      • Is it really better than Olivar? Why?

      • Can this make the MSA as well? That’s too specialized - there are already CLI tools out there

    1. Extremophile Campaign: In Your Home (ECIYH)

      Really cool citizen science campaign!

      I'm curious who are the scientists behind this. Find and document here/in a page note

    2. really cold places like near your air conditioner drip tray

      Is the drip tray really that cold? Would be nice to get a temperature measurement as well!

    1. identified 90% more genus-level phage-host interactions than traditional assembly-based methods

      how do you know their ground truth / if these are false calls?

    1. While some tech giants neared or imposed widespread layoffs last year, compensation for their CEOs climbed as much as tens of millions of dollars, according to an ABC News analysis of data released by research firm Equilar in May and June.

      How much of this is due to automatic factors such as stock valuation gains?

    1. Metabolome analyses identify 15 mediating metabolites in pregnancy that improve ADHD prediction

      Good to add more support than self reported diet by surveys

  3. May 2025
    1. ISCB will grant remote presentation options for reasons associated to maternity/paternity leave, care for a family member, personal/medical disability, sickness, financial hardship, or potential visa problems.

      potential visa problems: argue for USA re-entry issues

    1. Interesting paper about nucleotide sequence divergence using SCO genes (single-copy ortholog). They are thinking about a threshold for species identification in Eukaryotes like we do in prokaryotes

      Neat part is they bring different taxa of wide ranging kingdoms to compare on the same plots! (don’t know if this is novel…)

      In prokaryotes, homologous recombination, the basis of gene flow, depends directly on the degree of genomic sequence divergence, whereas in sexually reproducing eukaryotes, reproductive incompatibility can stem from changes in very few genes

      Although no single threshold delineates species, eukaryotic populations with >1% genome-wide sequence divergence are likely separate species, whereas prokaryotic populations with 1% divergence are still able to recombine and thus can be considered the same species.

    2. Measuring the sequence divergence (eukANI) between 173 pairs of sister species representing 65 orders of eukaryotes, we find that the degree of sequence similarity between species varies considerably across taxonomic groups and is not consistent for species within a genus

      Does sister species = same genus?

    3. 67 eukaryotic “odb10” datasets from the Benchmarking Universal Single-Copy Orthologs (BUSCO) website (busco.ezlab.org), which specifies the SCOs common to members of selected taxonomic group

      On avg, how many SCOs are common to taxonomic groups. Is that a large enough number (> 10s?..) to estimate variation like this?

    4. In prokaryotes, homologous recombination, the basis of gene flow, depends directly on the degree of genomic sequence divergence, whereas in sexually reproducing eukaryotes, reproductive incompatibility can stem from changes in very few genes.

      Does the data in the fig 1 change drastically if any of the BUSCO genes are involved in reproductive compatibility? Since variation in these won't be representative of the rest of the eukaryotic genome?

    5. even in cases in which organisms themselves are neither in hand nor witnessed

      Interesting choice of phrases: - in hand = isolated - witnessed = ? / microscopy?

      These must be experimentalists writing this!

    1. See how this tool is different from seqera.

      (Peter van Heusden @ Slack) The problem has been finding the right combination of platform and business model. So you've got Seven Bridges and DNANexus, but they're not playing in our world. Funding from Gates et al have turned Terra into a low cost to use platform and Theiagen has built what seems like a effective business on that. Again, platform and business model seem key. Seqera, those others I just mentioned, the platform is the business. Because of nextflow, Seqera can leverage the vast volunteer effort of the nf-core community but it still is different to the low-cost platform with paid support that Theiagen has developed (and made useable through their investment in workflows, containers, etc). I think that some of those other platforms that were discussed over on #infrastructure give potential for similar developments.

    1. Interesting paper that claims to improve 16S taxonomic classification -- on par with WGS using ML. Read more to figure out -

      • How does this work? What’s the ML magic doing here?

      • Is it really as good as WGS?:

        • Considering that the 16S region itself doesn’t have full information to resolve species..!?

        • And this is also using short reads only: 16S V3,V4

    2. compared the taxonomic profiles at multiple levels derived from both 16S amplicon sequencing and WGS using an in-house produced microbiome dataset

      This is short read data of 16S V3,V4 regions. Not 16S full length // Should have been clarified in the paper!

      V3-V4 hypervariable region of the 16S rRNA gene was amplified using the primers 338 F (ACTCCTACGGGAGGCAGCAG) and 806R (GGACTACHVGGGTWTCTAAT).

    1. Bcell and BOTU, which represent the genome-sequenced proportions of cells and taxa (at 100%, > 98.6%, or > 97% identities in the 16S-V4 region) in a specific prokaryotic biome, respectively

      How are Cells and Taxa defined here?

    2. the cell and taxon proportions of genome-sequenced bacteria or archaea on earth remain unknown.

      They are calculating the fraction of taxa within metagenomic datasets (like earth microbiome project) that have been fully genome sequenced.

      They are doing this by sequence alignment of the 16S-V4 region - For cells: 100% identity to a genome - For taxa: >97% identity to {some set of genomes?}

    3. we conducted a large-scale sequence alignment between the data released by the EMP and the sequenced bacterial or archaeal genomes in the public database

      How is this different from taxonomic profiling that the earth microbiome project would have already done?