CNAnorm is a Bioconductor package to estimate Copy Number Aberrations (CNA) in cancer samples.
It is described in the paper:
Gusnanto, A., Wood, H.M., Pawitan, Y., Rabbitts, P. and Berri, S. Correcting for cancer genome size and tumour cell content enables better estimation of copy number alterations from next generation sequence data. 2012. Bioinformatics, 28(1):40-47
CNAnorm performs ratio calculation, GC content correction and normalization of data obtained using very low coverage (one read every 100-10,000 bp) high-throughput sequencing. It performs a “discrete” normalization, looking for the ploidy of the genome. It also estimates tumour content if at least two ploidy states can be found.
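The ratio step can be pictured with a minimal Python sketch. It is not CNAnorm's implementation: the function name `window_ratios` and the scaling by total read counts are assumptions made for illustration, and GC content correction and the ploidy-based normalization are not shown.

```python
def window_ratios(test_counts, control_counts):
    """Return the per-window test/control read ratio.

    Assumption for illustration: ratios are rescaled so that two samples
    with different total read depth become comparable. Windows with no
    control reads get None because the ratio is undefined there.
    """
    if len(test_counts) != len(control_counts):
        raise ValueError("both samples must cover the same windows")
    total_test = sum(test_counts)
    total_control = sum(control_counts)
    if total_test == 0 or total_control == 0:
        raise ValueError("both samples need at least one read")

    # Factor that compensates for the difference in sequencing depth.
    depth_scale = total_control / total_test

    ratios = []
    for test, control in zip(test_counts, control_counts):
        if control > 0:
            ratios.append(test / control * depth_scale)
        else:
            ratios.append(None)
    return ratios


# Three windows; the middle one has twice as many test reads as its neighbours.
example = window_ratios([10, 20, 10], [10, 10, 10])
```

In this toy input the middle window ends up with a ratio twice that of the other two, which is the kind of step a later normalization would have to assign to a ploidy state.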
Get the latest (recommended) version of CNAnorm and its documentation from Bioconductor. You might need a Fortran compiler and make to compile the latest version (Linux/Unix users). If you install it using biocLite("CNAnorm") from within R, you will install the latest release version.
You can also obtain the Perl script bam2windows.pl, or its latest version from googlecode, to convert sam/bam files to the text files required by CNAnorm. For documentation on usage, run the script without arguments.
For further information on both programs, please contact Stefano Berri.
NGSoptwin is an R package designed to choose the optimal window size for CNAnorm. It is available here.
Additional data files
We provide gc1000Base.txt.gz, an example file for GC content (build GRCh37/hg19) to optionally use with bam2windows.pl. It provides average GC content every 1000 bp. The window size in the GC content file should be at least an order of magnitude smaller than the window used for CNAnorm to minimise boundary effects. If you require higher resolution, you can download the gc5Base tables from UCSC and/or make your own. The smaller the window size in the GC content file, the larger the file will be, and the longer bam2windows.pl will take to process it.
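The rule of thumb and the file-size trade-off can be checked with a few lines of Python. This is a sketch only: `gc_window_ok` and `gc_file_rows` are hypothetical helper names, and the genome length of 3.1 billion bp is a rough assumption for GRCh37/hg19.

```python
GENOME_BP = 3_100_000_000  # assumed approximate length of GRCh37/hg19


def gc_window_ok(gc_window_bp, cnanorm_window_bp):
    """True if the GC window is at least ten times smaller than the CNAnorm window."""
    return cnanorm_window_bp >= 10 * gc_window_bp


def gc_file_rows(genome_bp, gc_window_bp):
    """Approximate number of GC windows (rows) needed to cover the genome."""
    # Ceiling division, so that a final partial window is still counted.
    return -(-genome_bp // gc_window_bp)


# A 1000 bp GC file is fine for a 100,000 bp analysis window ...
ok_for_100kb = gc_window_ok(1000, 100_000)
# ... but too coarse for a 5,000 bp analysis window.
ok_for_5kb = gc_window_ok(1000, 5_000)

# Going from 1000 bp to 5 bp windows multiplies the row count by 200.
rows_1000bp = gc_file_rows(GENOME_BP, 1000)
rows_5bp = gc_file_rows(GENOME_BP, 5)
```

Under these assumptions the 1000 bp file has about 3.1 million windows, while a 5 bp resolution would need about 620 million, which is why finer GC files are slower to process.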
LS041 bam files
We provide the bam files used to produce the dataset included in
They contain 500,000 reads randomly extracted from the following larger and unsorted files
To produce the text file used as input for CNAnorm, enter the following:
perl bam2windows.pl --readNum 50 --gc_file gc1000Base.txt.gz LS041_tumour_500K_sorted.bam LS041_control_500K_sorted.bam > LS041.tab
It will produce this file
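As a rough illustration of how a reads-per-window target relates to window size, the arithmetic below assumes that `--readNum 50` in the command above asks for about 50 reads per window on average (run the script without arguments to confirm the exact meaning), that reads are spread evenly along the genome, and that the genome is roughly 3.1 billion bp long. `estimate_window_bp` is a hypothetical helper, not part of bam2windows.pl.

```python
def estimate_window_bp(genome_bp, total_reads, reads_per_window):
    """Window size (bp) that would hold `reads_per_window` reads on average.

    Assumes reads are uniformly distributed, which real data only approximates.
    """
    if total_reads <= 0 or reads_per_window <= 0:
        raise ValueError("read counts must be positive")
    return genome_bp * reads_per_window // total_reads


# 500,000 reads (as in the LS041 example files), 50 reads per window,
# and an assumed genome length of 3.1 billion bp.
window_bp = estimate_window_bp(3_100_000_000, 500_000, 50)
```

With these numbers the estimate is about 310,000 bp per window, more than ten times the 1000 bp window of gc1000Base.txt.gz.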