Clare86
The Eve Premium app on sequencing.com offers several options for calling variants from sequencing data: Strelka, GATK, Platypus and Samtools. This has made me wonder which tool Dante uses for variant calling.

Does anyone know? 
Randy H

Dante uses the Dragen Pipeline tool suite (formerly Edico Genome, now Illumina Basespace) made available on the Amazon Marketplace: https://aws.amazon.com/marketplace/pp/B07CZ3F5HY?ref_=srh_res_product_title  The tool suite has options for both BCFTools and GATK. Most 30x WGS runs from Dante Labs that I have seen use a GATK pipeline with the hs37d5 reference model.

If you look in the header of the VCF and BAM files, the dragen command and software version are usually listed, although even that will not necessarily give you the exact pipeline calls. Note that the new Illumina release has lower version numbers (Dragen had gone up to 8 under Edico but is now at 3.5 for Illumina).

While the software is available for local servers (https://www.illumina.com/products/by-type/informatics-products/dragen-bio-it-platform.html), its cost is prohibitive. Edico Genome developed an FPGA accelerator for certain functions, which reduces the overall alignment time to under an hour, with similar improvements in the variant calling pipeline. Because of the custom algorithm and hardware, you cannot exactly replicate the results elsewhere.

Strelka is generally used for calling small-variant indels and structural variants (SVs). Platypus is a Python-based tool out of the Wellcome Centre and is available on GitHub. You could run it yourself, and it could be integrated into WGSExtract 🙂  I have not seen wide adoption. Technically, the variant caller is in BCFTools and not SAMTools.
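If you want to do the header check mentioned above on your own VCF, here is a minimal sketch in Python. It only pulls out the `##` header lines that look like provenance information. The keyword list and the function names (`provenance_lines`, `vcf_provenance`) are my own assumptions for illustration; the exact header keys a given pipeline writes will vary, so adjust the keywords to what you actually see in your file.

```python
import gzip

# Substrings that often mark provenance lines in a VCF header.
# This list is an assumption for illustration, not an official set of keys.
KEYWORDS = ("dragen", "gatk", "commandline", "source", "reference")


def open_text(path):
    """Open a plain or gzip/bgzip-compressed VCF as text."""
    if str(path).endswith(".gz"):
        return gzip.open(path, "rt")
    return open(path, "r")


def provenance_lines(lines, keywords=KEYWORDS):
    """Return '##' header lines that mention any keyword (case-insensitive)."""
    hits = []
    for line in lines:
        if not line.startswith("##"):
            # The meta-information header ends at the '#CHROM' column line.
            break
        lowered = line.lower()
        if any(k in lowered for k in keywords):
            hits.append(line.rstrip("\n"))
    return hits


def vcf_provenance(path):
    """Read only the header of the VCF at `path` and return matching lines."""
    with open_text(path) as handle:
        return provenance_lines(handle)
```

Calling `vcf_provenance("my_sample.vcf.gz")` (file name hypothetical) returns the matching header lines as a list. For a BAM file, `samtools view -H file.bam` prints the header, and the `@PG` lines there record the programs that were run.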

Another acceleration technique uses the GPUs of high-end graphics processors built for gaming. These are designed with 128 to 256 parallel compute channels, normally for processing 3D graphics in parallel and in "real time". Translating bioinformatics to this platform is not straightforward: graphics pipelines are highly parallelized and independent, as well as compute-bound and heavily pipelineable. Some of this is true for bioinformatics, but bioinformatics has heavy I/O requirements with fewer compute-bound aspects and so cannot be pipelined as deeply. Even so, we are starting to see a creep toward GPU-based algorithms, something used in the new BGI machines. In a similar way, the algorithms and hardware are custom, so you cannot exactly replicate the results in a desktop, software-only environment unless it becomes available in AWS or similar as well. The key point is that the more custom acceleration techniques are used, the less likely we are to be able to replicate results, because the software is custom.

There is a nice comparison and overview of some variant callers in an article from 2018.
