Resources
Here’s a friendly, handpicked list of bioinformatics resources to help you kick off your analyses or sharpen your skills. It’s not everything out there, just stuff based on my experience. And whenever there’s a great tutorial already online, I’ll point you to it. No need to reinvent the wheel!
Programming
The Essentials: Key Programming Languages for Bioinformatics and Data Science.
Languages

Bash
The command-line powerhouse. Bash is your go-to for automating repetitive tasks, managing files, and running pipelines on servers or clusters. In bioinformatics, it’s essential for processing large datasets, running tools like BWA or SAMtools, and chaining commands into efficient workflows.
Python
The Swiss Army knife of bioinformatics. Python’s simplicity and vast ecosystem make it ideal for data analysis, scripting, and machine learning. Whether you need to parse genomic data, build pipelines, or create interactive visualizations, Python does it all.
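To give a flavor of what "parsing genomic data" can look like, here is a minimal sketch using only the standard library. The function names (`read_fasta`, `gc_content`) and the toy records are made up for illustration; for real projects a dedicated library such as Biopython is usually the better choice.

```python
from io import StringIO
from typing import Iterator, TextIO, Tuple


def read_fasta(handle: TextIO) -> Iterator[Tuple[str, str]]:
    """Yield (name, sequence) pairs from a FASTA-formatted text stream."""
    name, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            # Keep only the first word of the header as the record name
            name = line[1:].split(maxsplit=1)[0] if len(line) > 1 else ""
            chunks = []
        else:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def gc_content(sequence: str) -> float:
    """Return the fraction of G and C bases in a sequence (0.0 if empty)."""
    seq = sequence.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


# Toy input standing in for a real FASTA file (hypothetical data)
example = StringIO(">seq1 toy record\nATGC\nGGCC\n>seq2\nATAT\n")
gc_by_record = {name: gc_content(seq) for name, seq in read_fasta(example)}
print(gc_by_record)
```

To read an actual file, replace the `StringIO` object with `open("my_genome.fasta")`, where the file name is a placeholder.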

R
The statistician’s best friend. R shines in exploratory data analysis, statistical modeling, and publication-ready visualizations. In bioinformatics, it’s the tool of choice for differential expression analysis, clustering, and interpreting high-throughput sequencing data.
Workflow management
Web applications
Integrated Development Environments
IDEs are essential tools for developers, offering a range of features that facilitate coding, debugging, and project management.
Visual Studio Code
Visual Studio Code (VS Code) is a highly popular, open-source code editor developed by Microsoft. It supports a wide range of programming languages and comes with features like IntelliSense code completion, debugging tools, Git integration, and a vast marketplace for extensions.

PyCharm
PyCharm is an IDE from JetBrains, specifically designed for Python development. It provides features like code analysis, graphical debugging, and an integrated terminal. PyCharm also supports web development with Django, Flask, and other frameworks.
RStudio
RStudio is a powerful and widely used integrated development environment (IDE) designed to enhance productivity in coding, data analysis, and visualization, primarily for R but also supporting Python. It is the go-to choice for most R programmers, making it the de facto standard in the field.

Positron
Positron is a next-generation data science IDE developed by Posit, the company behind RStudio. It supports both R and Python, and it is often seen as the successor to RStudio, bringing new features and a smoother user experience.
Bioinformatics tools
Tools that you need for most sequencing data analyses.
Raw data quality check

MultiQC
MultiQC scans the designated directory for the output of other analysis tools, such as FastQC reports and alignment statistics, and aggregates everything into a single interactive report within seconds. Rather than replacing FastQC, it complements it by summarizing results across many samples and many tools at once.
Read filtering and trimming
Genome assembly

Unicycler
Unicycler is the genome assembler you need. It was designed for hybrid assembly (long reads + short reads) but also works very well with short reads only. It uses SPAdes in the background and produces high-quality assemblies.

Flye
If you need to perform genome assembly using long reads, use Flye. It has been designed for long reads either from PacBio or Nanopore technologies.

QUAST
Once your genome is assembled, evaluate its quality or compare it with other assemblies. QUAST is the tool for the job: it analyzes your contigs and reports key metrics, such as the number of contigs, total length, and N50, to assess assembly quality.
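N50 is one of the contiguity metrics commonly reported for assemblies: it is the length of the shortest contig among the largest contigs that together cover at least half of the total assembly length. The sketch below shows the calculation in plain Python. The function name `n50` and the contig lengths are illustrative only, and this is not QUAST's own code.

```python
from typing import Iterable


def n50(contig_lengths: Iterable[int]) -> int:
    """Return the N50 of a set of contig lengths (0 if there are none)."""
    lengths = sorted(contig_lengths, reverse=True)
    half_total = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        # The first contig that pushes the running sum past half is the N50
        if running >= half_total:
            return length
    return 0


# Hypothetical contig lengths (total = 300, so half = 150)
toy_lengths = [100, 80, 50, 30, 20, 20]
print(n50(toy_lengths))  # 100 + 80 = 180 >= 150, so N50 is 80
```

A higher N50 generally means a more contiguous assembly, but it says nothing about correctness, which is why it is usually read alongside the other metrics in the report.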
Phylogeny

IQ-TREE
Fast and user-friendly tool for phylogenetic tree inference based on maximum likelihood. It supports DNA and protein sequence alignments and includes features such as automated model selection and ultrafast bootstrap, making it well suited for efficient and accurate phylogenetic analyses.
Annotation

RAST
RAST (Rapid Annotation using Subsystem Technology) is an automated pipeline for the annotation of bacterial and archaeal genomes. It provides functional predictions for genes and assigns them to biological subsystems. Because it is only accessible online, it is system independent, but the annotation may take quite a long time to finish.

Bakta
Fast and standardized tool for the annotation of bacterial genomes. It provides high-quality structural and functional annotations using curated reference databases, ensuring consistent results across analyses. Bakta is designed for ease of use and integration into bioinformatics pipelines, making it well suited for large-scale bacterial genome projects.

AMRFinder
AMRFinder is a tool developed by NCBI for identifying antimicrobial resistance (AMR) genes and associated mutations in bacterial genomes. It uses curated AMR reference data to detect resistance determinants from DNA or protein sequences, providing reliable annotations important for surveillance, research, and clinical microbiology.

ResFinder
Web-based tool for detecting acquired antimicrobial resistance genes in bacterial whole-genome sequences. Developed by the Center for Genomic Epidemiology, it compares input sequences against curated resistance gene databases to support AMR surveillance and epidemiological studies.

GenoScanner
GenomeScanner is a lightweight bioinformatics tool for the taxonomic classification of microbial genomes.
Peak calling

MACS3
MACS3 (Model-based Analysis of ChIP-Seq) is a widely used tool for identifying enriched regions in ChIP-seq and related sequencing data. It models background noise to accurately detect peaks corresponding to protein–DNA interactions, making it a standard choice for transcription factor and epigenomic analyses.

HOMER
HOMER is a suite of tools for analyzing ChIP-seq and other high-throughput sequencing data, with a strong focus on peak detection and motif discovery. It enables identification of enriched genomic regions and associated regulatory motifs, supporting studies of transcriptional regulation and epigenomics.
Variant calling

Snippy
Rapid bacterial variant calling and core genome alignment tool. It identifies SNPs and small indels from whole-genome sequencing data by mapping reads to a reference genome, making it ideal for comparative genomics and outbreak investigations.

GATK
GATK (Genome Analysis Toolkit) is a comprehensive software suite for variant discovery and genotyping in high-throughput sequencing data. Widely used in human and model organism genomics, it provides robust tools for calling SNPs, indels, and structural variants, along with best-practice workflows for accurate and reproducible analysis.
Metagenomics

DADA2
Software package for high-resolution analysis of amplicon sequencing data, particularly 16S rRNA gene and ITS sequences. It models and corrects sequencing errors to infer exact biological sequences, enabling accurate identification of microbial taxa and community composition.

metaSPAdes
metaSPAdes is a specialized genome assembler designed for metagenomic sequencing data. It reconstructs microbial genomes from complex communities by efficiently handling uneven coverage and mixed populations, making it ideal for environmental and clinical metagenomics studies.

QIIME2
Powerful, open-source platform for analyzing and visualizing microbiome sequencing data. It supports reproducible workflows for tasks such as quality control, taxonomic classification, diversity analysis, and data visualization, making it widely used in microbial ecology and microbiome research.
Others

skani
Tool for rapid and scalable k-mer based comparison of genomic sequences, most notably for estimating average nucleotide identity (ANI) between genomes. It enables fast comparison and clustering of large-scale genome datasets, making it suitable for genome similarity studies, phylogenomics, and microbial population analyses.
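As a rough illustration of the general idea of comparing sequences through shared k-mers, here is a toy Jaccard similarity in plain Python. This is not the tool's actual algorithm, which is considerably more sophisticated. The function names, the value of `k`, and the sequences are all made up for the example.

```python
from typing import Set


def kmers(sequence: str, k: int) -> Set[str]:
    """Return the set of all k-mers (substrings of length k) in a sequence."""
    seq = sequence.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard_similarity(seq_a: str, seq_b: str, k: int = 4) -> float:
    """Return the fraction of distinct k-mers shared by two sequences."""
    a, b = kmers(seq_a, k), kmers(seq_b, k)
    union = a | b
    if not union:
        return 0.0  # both sequences are shorter than k
    return len(a & b) / len(union)


# Hypothetical toy sequences; real genomes would use a much larger k
identical = jaccard_similarity("ACGTACGT", "ACGTACGT")
different = jaccard_similarity("ACGTACGT", "ACGTTTTT")
print(identical, different)
```

Identical sequences score 1.0 and the score drops as fewer k-mers are shared, which is what makes k-mer approaches fast: no full alignment is needed.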

mge-cluster
Tool for clustering and analyzing mobile genetic elements (MGEs) in microbial genomes. It enables identification of related MGEs across datasets, facilitating studies of horizontal gene transfer, antibiotic resistance, and genome evolution.

BEDTools
Powerful suite of utilities for comparing, manipulating, and analyzing genomic features in BED, GFF/GTF, VCF, and other formats. It enables tasks such as intersecting, merging, and subtracting genomic intervals, making it a staple tool for genome annotation and sequence analysis workflows.
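To make the interval operations concrete, here is a minimal pure-Python sketch of merging and intersecting intervals on a single chromosome. It only illustrates the concepts. The function names and coordinates are invented, and real BED files also carry chromosome names, strands, and other fields that this sketch ignores.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # half-open [start, end), as in BED coordinates


def merge_intervals(intervals: List[Interval]) -> List[Interval]:
    """Merge overlapping or directly adjacent intervals."""
    merged: List[Interval] = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            # Extend the previous interval instead of starting a new one
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def intersect_intervals(a: List[Interval], b: List[Interval]) -> List[Interval]:
    """Return the overlapping portion of every pair of intervals from a and b."""
    overlaps: List[Interval] = []
    for a_start, a_end in a:
        for b_start, b_end in b:
            start, end = max(a_start, b_start), min(a_end, b_end)
            if start < end:  # keep only non-empty overlaps
                overlaps.append((start, end))
    return overlaps


# Hypothetical features on one chromosome
features = [(10, 20), (15, 30), (40, 50), (30, 35)]
merged = merge_intervals(features)
shared = intersect_intervals(merged, [(15, 45)])
print(merged, shared)
```

The nested loop in `intersect_intervals` is fine for a handful of intervals but scales poorly, which is one reason to reach for a dedicated tool on genome-scale data.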

deepTools
Toolkit for the analysis and visualization of high-throughput sequencing data, particularly ChIP-seq, RNA-seq, and ATAC-seq. It provides functions for normalization, coverage calculation, and generation of publication-quality heatmaps and profiles, facilitating exploration of genomic signal patterns.

BLAST+
Suite of tools for comparing nucleotide or protein sequences against sequence databases. It enables rapid identification of homologous sequences, functional annotation, and evolutionary analysis, making it a cornerstone of bioinformatics research and genomic studies.

featureCounts
High-performance tool for counting reads mapped to genomic features such as genes, exons, or transcripts. It efficiently processes large RNA-seq datasets and provides accurate read assignments, making it a key step in gene expression analysis workflows.
SeqKit
Fast and versatile toolkit for manipulating and analyzing FASTA and FASTQ sequence files. It provides a wide range of functions, including filtering, sorting, sampling, and statistics calculation, making it a convenient utility for routine bioinformatics workflows.

bioconvert
Versatile tool for converting between a wide range of bioinformatics file formats, including sequence, alignment, and annotation files. It streamlines data interoperability, making it easier to integrate different tools and pipelines in genomics and transcriptomics analyses.
Online resources
Convenient websites, databases, and other resources to facilitate your analysis and data manipulation.
Plots and visualizations

From data to Viz
Find the most appropriate graph for your data with great examples and code in R and Python.

GenoVi
GenoVi generates circular genome representations for complete, draft, and multiple bacterial and archaeal genomes.

Fundamentals of Data Visualization
A complete guide to what to do and what not to do when creating figures. And more!
Databases

Glittr
Curated, searchable collection of Git repositories containing bioinformatics training material.

Data & ML
Database
A collection of 200 real-world data science and machine learning case studies across companies.

What statistical test to do?
A guide to choosing the right statistical test for your data. Part of an amazing blog about statistics.

Bioinformatics_toolkit
A GitHub repository packed with tutorials, analysis examples, and other valuable resources.
sandbox.bio
Practice your Bash commands in a safe sandbox. Experiment freely—no risk of breaking anything!

SequencEnG
Explore a beautiful interactive resource to deepen your understanding of sequencing techniques, because knowing where your data comes from is key.












