KIR*IMP v1.2.0

Documentation

Registration

To use KIR*IMP, you first need to register an account and log in. Both can be done on the home page. Please read our Terms and Conditions before registering.

Data preparation

You will need to convert your SNP data into a specific format before uploading. The full requirements are as follows:

  • Your data needs to be phased. We recommend using the SHAPEIT software for phasing SNP genotypes.
  • Your data should be provided in the HAPS/SAMPLE file format. This is the native output format for SHAPEIT. Both the HAPS and SAMPLE files need to be uploaded when creating a KIR*IMP job.
  • SNP positions should be specified in GRCh37 ('build 37') coordinates. The SNP IDs provided are ignored; only the positions are used to identify SNPs and relate them to our reference panel.
  • SNP alleles should be encoded in the same way as the reference panel. In particular, they must be on the same strand. A SNP information summary file is provided to help you convert your data to the same encoding (see below).
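
As a rough pre-upload check, the sketch below parses one line of a HAPS file and validates its shape. It assumes the usual SHAPEIT layout of five leading columns (chromosome, SNP ID, position, first allele, second allele) followed by two 0/1 haplotype columns per sample. The names HapsRecord and parse_haps_line, and the example line itself, are made up for illustration and are not part of KIR*IMP.

```python
from typing import List, NamedTuple


class HapsRecord(NamedTuple):
    chrom: str
    snp_id: str            # ignored by KIR*IMP; only the position is used
    position: int          # expected to be a GRCh37 coordinate
    allele0: str           # allele coded as 0
    allele1: str           # allele coded as 1
    haplotypes: List[int]  # two entries per sample, each 0 or 1


def parse_haps_line(line: str) -> HapsRecord:
    """Parse one whitespace-separated HAPS line (assumed SHAPEIT layout)."""
    fields = line.split()
    n_hap = len(fields) - 5
    if n_hap < 2 or n_hap % 2 != 0:
        raise ValueError("expected 5 leading columns plus two haplotype columns per sample")
    chrom, snp_id, pos, a0, a1 = fields[:5]
    if any(h not in ("0", "1") for h in fields[5:]):
        raise ValueError("haplotype columns must be 0 or 1")
    return HapsRecord(chrom, snp_id, int(pos), a0, a1, [int(h) for h in fields[5:]])


# Made-up example line for two samples (four haplotypes).
record = parse_haps_line("19 snp_example 55200000 A G 0 1 1 0")
print(record.position, record.haplotypes)
```

A check like this only confirms the file layout. It cannot tell whether the positions are really in GRCh37 coordinates or whether the alleles are on the correct strand.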

Reference files

  • SNP information summary file. This shows the allele encoding and allele frequencies for the UK reference panel, which is used for fitting the KIR*IMP model. You can use this to ensure that the SNP alleles in your data are encoded in the same way.
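
One way to use the allele encoding from the summary file is to compare, SNP by SNP, the allele pair in your data with the pair in the reference panel. The sketch below is a generic illustration of that comparison and not KIR*IMP's own procedure. The function names and result labels are hypothetical, and the layout of the summary file is not assumed: you would need to extract the allele pairs yourself.

```python
from typing import Tuple

COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def is_palindromic(alleles: Tuple[str, str]) -> bool:
    """A/T and C/G SNPs look the same on both strands, so the alleles
    alone cannot reveal a strand problem for them."""
    a, b = (x.upper() for x in alleles)
    return COMPLEMENT.get(a) == b


def compare_encoding(ref_alleles: Tuple[str, str],
                     input_alleles: Tuple[str, str]) -> str:
    """Classify how an input (allele0, allele1) pair relates to the reference pair."""
    ref = tuple(a.upper() for a in ref_alleles)
    inp = tuple(a.upper() for a in input_alleles)
    if inp == ref:
        return "match"
    if inp == ref[::-1]:
        return "swapped"              # same alleles, 0/1 coding reversed
    flipped = tuple(COMPLEMENT.get(a, a) for a in inp)
    if flipped == ref:
        return "strand_flip"          # input is on the opposite strand
    if flipped == ref[::-1]:
        return "strand_flip_swapped"  # opposite strand and 0/1 reversed
    return "mismatch"


print(compare_encoding(("A", "G"), ("T", "C")))
```

For palindromic SNPs the labels above are not reliable on their own, so comparing allele frequencies against the reference panel is a useful additional check.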

Imputation

To submit an imputation job:
  1. Log in and go to the My jobs page.
  2. Use the form at the top of the page to upload your data and submit a new job.
  3. Some basic data checks are run on your files before a job is accepted. If these fail, an error message will pop up to let you know.
  4. Once your job is accepted, it is added to the job queue. You will receive an email to let you know once it is complete.
  5. All files will be deleted after 30 days. Please ensure you download your results before then.

The imputation process involves re-fitting the model so that it is tailored to your dataset. Specifically, the intersection of the SNPs in the input dataset and the reference panel is taken, and a model is fitted to the reference panel using only these SNPs. This model is then used to carry out the imputation for the input dataset.

The greater the SNP intersection, the more accurate the imputation is likely to be. To ensure adequate performance, we currently require at least 10 SNPs in the intersection before a job can be carried out.
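
The intersection step can be approximated locally before uploading. The sketch below matches SNPs by position only, as described under 'Data preparation', and applies the minimum of 10 stated above. The function names and the positions are made up for illustration, and the server-side checks may differ in detail.

```python
from typing import Iterable, List

MIN_INTERSECTION = 10  # minimum number of shared SNPs stated in this documentation


def snp_intersection(input_positions: Iterable[int],
                     reference_positions: Iterable[int]) -> List[int]:
    """Return the sorted positions present in both datasets."""
    return sorted(set(input_positions) & set(reference_positions))


def check_intersection(input_positions: Iterable[int],
                       reference_positions: Iterable[int]) -> List[int]:
    """Raise if too few SNPs are shared, otherwise return the shared positions."""
    shared = snp_intersection(input_positions, reference_positions)
    if len(shared) < MIN_INTERSECTION:
        raise ValueError(
            f"only {len(shared)} shared SNPs; at least {MIN_INTERSECTION} are required"
        )
    return shared


# Made-up positions: 12 reference SNPs, of which the input dataset covers 11.
reference = [55_200_000 + 1_000 * i for i in range(12)]
user_input = reference[:11] + [55_999_999]
print(len(check_intersection(user_input, reference)))
```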

Output

At the conclusion of a KIR*IMP job you will be sent an email with an access key that will allow you to download the results. The steps to do this are:

  1. Log in and go to the My jobs page.
  2. In the table of jobs, find the one you wish to download and follow the 'Download results' link.
  3. When prompted, enter the access key from your email. This will then start the download.

The download will be a zip archive containing the following files:

imputations.csv

The imputed KIR types. One row for each input haplotype and KIR locus, showing the most likely (i.e. posterior mode) allele and its associated posterior probability.
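
To make 'posterior mode' concrete: for each haplotype and locus, the reported allele is the one with the highest posterior probability, and the reported probability is that maximum. The sketch below shows the idea on a made-up distribution. The allele labels are placeholders, and the full distribution is not necessarily available in the output files.

```python
from typing import Dict, Tuple


def posterior_mode(posterior: Dict[str, float]) -> Tuple[str, float]:
    """Return the allele with the highest posterior probability and that probability."""
    allele = max(posterior, key=posterior.get)
    return allele, posterior[allele]


# Made-up posterior distribution over three placeholder alleles at one locus.
example = {"allele_a": 0.2, "allele_b": 0.7, "allele_c": 0.1}
print(posterior_mode(example))
```

A posterior probability close to 1 indicates a confident call, while a low value means that several alleles remain plausible for that haplotype.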

accuracy.csv

Estimates of the average per-haplotype imputation accuracy for each KIR locus. These relate to the model fitted with the SNP intersection (see 'Imputation' above). The estimates are the out-of-bag (OOB) accuracy calculated during the model fitting process (see our Publications for details).

accuracy.pdf

A plot of the imputation accuracy estimates (see above).

alleles_kir.csv

Counts of the KIR alleles for both the input dataset (imputed) and the reference panel (known), for each KIR locus. These are also shown in a plot (see below).

alleles_kir.pdf

A plot comparing the KIR allele counts (see above). The diagonal lines indicate equal frequency in both datasets. If the KIR loci are imputed well, and the input dataset and reference panel are representative of the same population, then the frequencies in the two datasets should be similar (indicated by the points lying close to the diagonal line).
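
Because the input dataset and the reference panel usually differ in size, counts are easier to compare once converted to relative frequencies. The sketch below does this for one locus using made-up counts and placeholder allele labels. It does not assume any particular column layout for alleles_kir.csv, and the function names are hypothetical.

```python
from typing import Dict


def to_frequencies(counts: Dict[str, int]) -> Dict[str, float]:
    """Convert allele counts to relative frequencies that sum to 1."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no alleles counted")
    return {allele: n / total for allele, n in counts.items()}


def frequency_differences(imputed: Dict[str, int],
                          reference: Dict[str, int]) -> Dict[str, float]:
    """Imputed minus reference frequency per allele (0 means on the diagonal)."""
    f_imp = to_frequencies(imputed)
    f_ref = to_frequencies(reference)
    alleles = set(f_imp) | set(f_ref)
    return {a: f_imp.get(a, 0.0) - f_ref.get(a, 0.0) for a in alleles}


# Made-up counts for one locus: 100 imputed haplotypes, 1000 reference haplotypes.
imputed_counts = {"allele_a": 70, "allele_b": 20, "allele_c": 10}
reference_counts = {"allele_a": 500, "allele_b": 300, "allele_c": 200}
for allele, diff in sorted(frequency_differences(imputed_counts, reference_counts).items()):
    print(allele, round(diff, 3))
```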

alleles_snp.csv

Allele frequencies of the SNPs in the intersection between the input dataset and the reference panel. These are also shown in a plot (see below).

alleles_snp.pdf

A plot comparing the SNP allele frequencies between the input dataset and the reference panel. If the two datasets are representative of the same population, and the SNP alleles use the same encoding (e.g. same strand), then these frequencies should be similar.

posteriors.pdf

A plot showing the distribution of the posterior probabilities of the most likely alleles (the same as in imputations.csv; see above) for each KIR locus. This is shown as an empirical cumulative distribution function. Two curves are plotted: one for the input dataset and, for comparison, one for OOB imputations from the reference panel. If the KIR loci are imputed well, the two distributions should be similar.
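
For readers unfamiliar with the term, an empirical cumulative distribution function gives, for any threshold, the proportion of observed values at or below that threshold. The sketch below computes it for a made-up set of posterior probabilities. It is a generic illustration and not the code used to produce posteriors.pdf.

```python
from bisect import bisect_right
from typing import Sequence


def ecdf_at(values: Sequence[float], threshold: float) -> float:
    """Proportion of values that are less than or equal to the threshold."""
    if not values:
        raise ValueError("no values supplied")
    ordered = sorted(values)
    return bisect_right(ordered, threshold) / len(ordered)


# Made-up posterior probabilities of the most likely allele for four haplotypes.
posteriors = [0.5, 0.9, 0.9, 1.0]
for t in (0.4, 0.9, 1.0):
    print(t, ecdf_at(posteriors, t))
```

With this definition, a curve that stays low until close to 1 means that most haplotypes were imputed with high posterior probability.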

Data retention

All data files will be deleted 30 days after processing. This includes all uploaded files as well as all output files associated with each job. Please ensure you download the results from your imputation jobs before they are deleted.

Changelog

All notable changes to this project are documented here. This project adheres to Semantic Versioning.

1.2.0 (2017-07-20)

Changed:

  • Moved to a new server at the University of Melbourne.
  • New Terms and Conditions, to reflect the transfer from MCRI to the University of Melbourne.
  • Substantial upgrades to underlying libraries.
  • Minor changes and improvements to the web interface.
  • Updated the contact details.

1.1.1 (2016-06-24)

Changed:

  • Substantial internal changes to the implementation. These are purely infrastructural; no changes were made to the statistical model.
  • Implemented a 30-day data retention policy. Updated the Documentation page and the Terms and Conditions (clause 3.4) to describe this.
  • Fixed the numbering of clauses in Section 14 of the Terms and Conditions.
  • Minor changes to wording on some webpages.
  • Updated the contact details.

1.1.0 (2015-12-09)

Added:

  • SNP encoding re-alignment. Previously it was required that the input dataset have exactly the same SNP encoding as the reference panel, including which alleles are coded as 0 and 1. Now the latter requirement is relaxed: the 0/1 encoding is automatically adjusted to ensure the input dataset matches the reference panel. It is nevertheless still necessary to ensure that the SNPs in the input dataset are on the same strand as the reference panel. A plot comparing the SNP allele frequencies is now provided as a diagnostic tool to check if there are any alignment problems.
  • Extra output files (useful as diagnostics): comparison of KIR and SNP allele frequencies, comparison of the distributions of the imputation posteriors.
  • New email address and a link to our new mailing list, on the Contact page.
  • Version numbering and a changelog.

Changed:

  • Changed output data files to be comma-separated (CSV) rather than tab-separated (TSV).
  • Improved the names of the output files.
  • Updated and expanded the documentation.

1.0.0 (2015-10-01)

Initial release. Full details of the methodology and validation are in the Publications.