Extract coverage from BAM files in BED format

What is coverage

Sequence coverage (or sequencing depth) refers to the number of reads that include a specific nucleotide of a reference genome. In the screenshot below a small region of the Human genome is shown in a genome browser, where the alignment of a re-sequencing experiment was loaded. Each gray arrow in the main area of the window represents a single read, while the highlighted area is a coverage plot, summarizing for each nucleotide how many times was sequenced in this experiment.

Image for post

If the sequencing was performed using paired ends or mate pairs, it’s possible to physical coverage is the number of times a base is read or spanned by paired ends or mate paired reads¹. The picture below shows the difference.

Bedtools genome coverage

Bedtools has an even better option to calculate coverage, as it produce a standard BED file.

bedtools genomecov -bga FILE.bam > FILE.coverage.bed

The output is a standard BED file: a tabular file with these columns: chromosome, start, end, coverage. The difference with the previous output is that adjacent bases with the same coverage will be collapsed in the same interval.

If we want to quickly detect the zero coverage regions, we can exploit the fact that the last column indicates the coverage:

bedtools genomecov -bga FILE.bam | grep -w "0$" > FILE.0X.bed

Both samtools and bedtools can extract coverage information from a BAM file. Using text manipulation tools (like grep or awk) we can further filter the output they produce to select regions of interest.

Getting the regions covered within a coverage range

Image for post
A BED file produced from the coverage profile of a BAM file

A quick tool to produce a coverage profile is covtobed, that takes a sorted BAM file as input and will produce a BED file, optionally specifying to only prints those intervals having a coverage within a range (or only the regions with no coverage).

To install the tool you can use Miniconda:

conda install -y -c bioconda covtobed

An then the usage is as simple as:

covtobed -m 0 -M 1 input.bam > not_covered.bed

the full manual is available in the repository wiki.

efetch failed

efetch installed via conda, this time failed in one of the VM i use. Trying to manually install LWP::protocol::https result in further errors, arriving to complain for lacking a C compiler…

Bash went to the rescue:

curl -s https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?id=LT906492.1\&db=nuccore\&rettype=fasta 

Dump html content with lynx

Lynx is an incredibly useful text-only browser, that you can use to read webpages (interactively) from the command line. I used it sometimes to access websites restricted to some areas where I could access a server, for example.

It can be used to output the webpage (non interactively) to the standard output, both the HTML source:

lynx -source "URL" > page.html

Or the plain text:

lynx -dump "URL" > page.txt

~/.screenrc

The common and widespread status bar for screen, plus minor tweaks for the GNU screen configuration file (.screenrc). Copy from below or download it from here.

# Bind F11 and F12 (NOT F1 and F2) to previous and next screen window
bindkey -k F1 prev bindkey -k F2 next startup_message off

# Window list at the bottom.
hardstatus alwayslastline
hardstatus string "%-w%{= BW}%50>%n %t%{-}%+w%< | %c | %D %d"

# Enable mouse scrolling and scroll bar history scrolling
termcapinfo xterm* ti@:te@