Extract coverage from BAM files in BED format

What is coverage

Sequence coverage (or sequencing depth) refers to the number of reads that include a specific nucleotide of a reference genome. In the screenshot below a small region of the Human genome is shown in a genome browser, where the alignment of a re-sequencing experiment was loaded. Each gray arrow in the main area of the window represents a single read, while the highlighted area is a coverage plot, summarizing for each nucleotide how many times was sequenced in this experiment.

Image for post

If the sequencing was performed using paired ends or mate pairs, it’s possible to physical coverage is the number of times a base is read or spanned by paired ends or mate paired reads¹. The picture below shows the difference.

Bedtools genome coverage

Bedtools has an even better option to calculate coverage, as it produce a standard BED file.

bedtools genomecov -bga FILE.bam > FILE.coverage.bed

The output is a standard BED file: a tabular file with these columns: chromosome, start, end, coverage. The difference with the previous output is that adjacent bases with the same coverage will be collapsed in the same interval.

If we want to quickly detect the zero coverage regions, we can exploit the fact that the last column indicates the coverage:

bedtools genomecov -bga FILE.bam | grep -w "0$" > FILE.0X.bed

Both samtools and bedtools can extract coverage information from a BAM file. Using text manipulation tools (like grep or awk) we can further filter the output they produce to select regions of interest.

Getting the regions covered within a coverage range

Image for post
A BED file produced from the coverage profile of a BAM file

A quick tool to produce a coverage profile is covtobed, that takes a sorted BAM file as input and will produce a BED file, optionally specifying to only prints those intervals having a coverage within a range (or only the regions with no coverage).

To install the tool you can use Miniconda:

conda install -y -c bioconda covtobed

An then the usage is as simple as:

covtobed -m 0 -M 1 input.bam > not_covered.bed

the full manual is available in the repository wiki.

Advertisement