HLA*IMP:03 v0.1.0


This page describes how to prepare your data for upload to the HLA*IMP:03 web server. For more general documentation, please go to the Documentation page.

Your SNP data must be in the form of phased haplotypes in the Oxford HAPS/SAMPLE file format. This means you will need to perform your own haplotype phasing before uploading your data.

The SNP array used to genotype your data should be specified when uploading your data. For each SNP array there is a set of SNPs required by HLA*IMP:03 (as specified in the SNP information summary file) that must be present in your uploaded data. If any of the required SNPs are missing from your data (e.g. due to quality control filtering), you will need to perform SNP imputation so that all required SNPs are present. There must also be no sporadically missing data; note, however, that phasing software such as SHAPEIT will automatically impute sporadically missing SNPs.
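As a rough illustration of these requirements, the sketch below checks a HAPS/SAMPLE pair for two common problems. It relies on the Oxford format layout (two header lines in the SAMPLE file; five leading SNP columns in the HAPS file followed by two haplotype columns per individual). It assumes an uncompressed, whitespace-separated HAPS file, and it treats any haplotype entry other than 0 or 1 as a problem. The function name check_haps is hypothetical and not part of HLA*IMP:03.

```shell
# check_haps HAPS_FILE SAMPLE_FILE
# Prints one line per problem found and returns non-zero if there were any.
# Assumption: HAPS_FILE is uncompressed and whitespace-separated.
check_haps() {
    haps=$1
    sample=$2
    # Oxford SAMPLE files have two header lines; each later line is one individual.
    n=$(awk 'NR > 2 && NF > 0 { n++ } END { print n + 0 }' "$sample")
    awk -v n="$n" '
        # Each HAPS row: 5 SNP columns, then two haplotype columns per individual.
        NF != 5 + 2 * n {
            printf "line %d: expected %d columns, found %d\n", NR, 5 + 2 * n, NF
            bad = 1
            next
        }
        {
            # Report the first entry on the row that is not exactly 0 or 1.
            for (i = 6; i <= NF; i++) {
                if ($i != "0" && $i != "1") {
                    printf "line %d (%s): non-0/1 entry \"%s\" in column %d\n", NR, $2, $i, i
                    bad = 1
                    break
                }
            }
        }
        END { exit bad }
    ' "$haps"
}
```

For example, check_haps myHLAIMP03input.haps myHLAIMP03input.sample should print nothing for a file pair that passes both checks (the file names are placeholders for your own files).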

If your SNP array is not on our current list of SNP arrays, please contact us to discuss alternative options. Note that you may also choose to supply data for all SNPs in the HLA*IMP:03 reference panel, rather than only those on a specific SNP array. Selecting this option will likely require you to impute some of the required SNPs.

The following steps describe how to get your data into the required format.

1. Ensure correct genome build and strand

Please ensure that your SNP data use GRCh37/hg19 coordinates and are on the '+' strand.

GRCh37 is currently the most commonly used genome build (although not the most recent). If your data do not use GRCh37 coordinates then you should use a tool such as liftOver to convert them.
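liftOver works on BED files rather than PLINK files, so the SNP positions need to be exported first. The following is a minimal sketch of that export, assuming your data are in PLINK .ped/.map format. The function name map_to_bed and all file names are hypothetical, and the chain file shown in the comment assumes your data are currently on GRCh38/hg38.

```shell
# map_to_bed MAP_FILE
# Converts a PLINK .map file (chromosome, SNP ID, genetic distance, bp position)
# into 4-column BED on standard output. BED intervals are 0-based and
# half-open, so a SNP at position P becomes the interval [P-1, P).
map_to_bed() {
    awk 'BEGIN { OFS = "\t" } { print "chr" $1, $4 - 1, $4, $2 }' "$1"
}

# Example (liftOver and the chain file must be obtained separately):
#   map_to_bed myData.map > myData.bed
#   liftOver myData.bed hg38ToHg19.over.chain.gz myData.hg19.bed myData.unmapped.bed
```

The lifted positions would then need to be written back to your PLINK files (for example with PLINK's --update-map option), and any SNPs that fail to lift over should be removed.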

Use of the '+' strand is standard for haplotype reference panels such as those from the 1000 Genomes Project and the Haplotype Reference Consortium. The following resource is valuable for resolving strand issues: Genotype Chip Strand Files.

2. Extract SNPs from across the HLA region

HLA*IMP:03 only requires SNPs from across the HLA region. For convenience, we recommend extracting SNPs on chromosome 6 from coordinates 20,000,000 to 40,000,000 bp. This retains SNPs flanking the HLA region, which aids haplotype phasing, while keeping file sizes manageable. (If your files are larger than 100 MB, you will need to break up your data into batches of individuals to be submitted as separate jobs.) Relevant SNPs can be extracted, for example, with the following PLINK command (all on one line):

plink --file myData --chr 6 --from-mb 20 --to-mb 40 --recode --out myHLAregionData
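If you do need to split your data into batches of individuals, one possible approach is to write lists of individual IDs and pass each list to PLINK's --keep option. The sketch below only builds the ID lists. The function name make_batches, the batch size, and the file names are all hypothetical.

```shell
# make_batches PED_FILE BATCH_SIZE PREFIX
# Writes PREFIXaa, PREFIXab, ... each listing the family ID and individual ID
# (the first two .ped columns) of up to BATCH_SIZE individuals.
make_batches() {
    awk '{ print $1, $2 }' "$1" | split -l "$2" - "$3"
}

# Example: batches of 500 individuals, then one PLINK run per batch.
#   make_batches myHLAregionData.ped 500 batch_
#   for keep in batch_*; do
#       plink --file myHLAregionData --keep "$keep" --recode --out "myHLAregionData.$keep"
#   done
```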

3. Haplotype phasing and SNP imputation

Below we describe how to perform haplotype phasing and SNP imputation with an online imputation service. The advantage of using an online service is that much of the work is automated, although you may have to wait for your results depending on current demand. These services perform both phasing and SNP imputation, so there is no need to check if you are missing any of the required SNPs as they will be imputed automatically.

Use of an online SNP imputation service

The following assumes use of the Michigan Imputation Server, although it is possible to make use of the Sanger Imputation Service instead.

a) Prepare your data for upload as per the Michigan Imputation Server instructions. You only need to upload data on chromosome 6 from coordinates 20 Mbp to 40 Mbp, as mentioned above.

b) Submit your job to the Michigan Imputation Server. We recommend setting the reference panel to '1000G Phase 3 v5'. Ensure that the phasing algorithm is set to a method that provides 'phased output' (currently 'Eagle' and 'HapiUR' provide phased output, but 'ShapeIT' does not).

c) Download your results from the Michigan Imputation Server when notified. The results will be in the form of a gzipped VCF file.
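Before converting the downloaded file, it can be worth confirming that the genotypes really are phased. The sketch below counts genotype calls that use the unphased separator "/" instead of the phased separator "|". It assumes GT is the first FORMAT subfield, as the VCF specification requires when GT is present. The function name count_unphased is hypothetical.

```shell
# count_unphased VCF_GZ
# Prints the number of genotype calls whose GT field contains "/" (unphased)
# rather than "|" (phased).
count_unphased() {
    gzip -dc "$1" | awk -F '\t' '
        /^#/ { next }                       # skip header lines
        {
            for (i = 10; i <= NF; i++) {    # sample columns start at column 10
                split($i, f, ":")           # GT is the first subfield
                if (index(f[1], "/") > 0) n++
            }
        }
        END { print n + 0 }
    '
}
```

Running count_unphased myImputedPhasedSNPs.vcf.gz should print 0 for fully phased output. A non-zero count suggests the phasing setting in step b) should be rechecked.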

d) Make sure you have bcftools installed. Installation instructions are available on the bcftools website.

e) Convert your phased and imputed data from vcf.gz format to Oxford HAPS/SAMPLE format, retaining only the SNPs required by HLA*IMP:03, as follows. First, download the full SNP list reference file, which contains the list of SNPs required by HLA*IMP:03. Then run the following command (all on one line):

bcftools view -p -T hlaimp3.all.snps.txt myImputedPhasedSNPs.vcf.gz | bcftools convert --hapsample myHLAIMP03input

f) You should now have two files, myHLAIMP03input.haps and myHLAIMP03input.sample, ready for upload to the HLA*IMP:03 web server. Note: it is a good idea to check that myHLAIMP03input.haps contains all of the SNPs in hlaimp3.all.snps.txt.
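One way to carry out this check is sketched below. It assumes that hlaimp3.all.snps.txt is whitespace-separated with the chromosome in column 1 and the position in column 2 and no header line (the layout expected by the bcftools -T option), and that the HAPS file is uncompressed with the SNP position in column 3. The function name missing_snps is hypothetical, and the comparison is by position only.

```shell
# missing_snps SNP_LIST HAPS_FILE
# Prints the chromosome and position of every SNP_LIST entry whose position
# does not appear in column 3 of HAPS_FILE.
missing_snps() {
    awk '
        FILENAME == ARGV[1] { seen[$3] = 1; next }   # first file: HAPS positions
        !($2 in seen) { print $1, $2 }               # second file: SNP list
    ' "$2" "$1"
}
```

Running missing_snps hlaimp3.all.snps.txt myHLAIMP03input.haps should print nothing if every required SNP is present. Any positions it prints would need to be imputed before upload.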

Performing phasing and SNP imputation locally

Alternatively, you may perform phasing and imputation yourself; however, we recommend using an online imputation service unless you are an experienced user of phasing and imputation software. Performing phasing and SNP imputation locally essentially requires you to replicate the steps carried out by an online imputation service. You will need to download a haplotype reference panel (we recommend 1000 Genomes Phase 3) and use software for phasing and SNP imputation. First phase your data (e.g. with SHAPEIT). Then check whether you are missing any SNPs required for your SNP array (listed in the SNP information summary file). If you are missing required SNPs, you must impute them (e.g. with IMPUTE2).