KIR*IMP v1.2.0
Documentation
Registration
To use KIR*IMP, you first need to register an account and log in. This can be
done on the home page. Remember to consult our Terms & Conditions first.
Data preparation
You will need to prepare your SNP data in a specific format before
uploading. The full requirements are as follows:
- Your data needs to be phased. We recommend using the SHAPEIT
software for phasing SNP genotypes.
- Your data should be provided in the HAPS/SAMPLE file format. This is the
native output format for SHAPEIT. Both
the HAPS and SAMPLE files need to be uploaded when creating a KIR*IMP
job.
- SNP positions should be specified in GRCh37 ('build 37') coordinates. The
SNP IDs provided are ignored; only the
positions are used to identify SNPs and relate them to our reference
panel.
- SNP alleles should be encoded in the same way as the reference
panel. A SNP information summary file is provided to help you
convert your data to the same encoding (see below).
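The format requirements above can be checked locally before uploading. The following is a minimal sketch, assuming the usual SHAPEIT layout: each HAPS line has five leading columns (chromosome, SNP ID, position, first allele, second allele) followed by two 0/1 columns per individual, and the SAMPLE file has two header lines followed by one line per individual. The names `HapsRecord`, `parse_haps_line` and `count_sample_individuals` are hypothetical and are not part of KIR*IMP; the checks that KIR*IMP itself runs on upload may differ.

```python
from typing import List, NamedTuple


class HapsRecord(NamedTuple):
    chrom: str
    snp_id: str
    position: int
    allele0: str
    allele1: str
    haplotypes: List[int]


def parse_haps_line(line: str) -> HapsRecord:
    """Parse and sanity-check one line of a HAPS file (assumed SHAPEIT layout)."""
    fields = line.split()
    if len(fields) < 7:
        raise ValueError("expected 5 leading columns plus at least one haplotype pair")
    chrom, snp_id, pos, allele0, allele1 = fields[:5]
    hap_fields = fields[5:]
    # Phased data has exactly two haplotype columns per individual.
    if len(hap_fields) % 2 != 0:
        raise ValueError("odd number of haplotype columns")
    if any(h not in ("0", "1") for h in hap_fields):
        raise ValueError("haplotype alleles must be coded 0 or 1")
    return HapsRecord(chrom, snp_id, int(pos), allele0, allele1,
                      [int(h) for h in hap_fields])


def count_sample_individuals(sample_lines: List[str]) -> int:
    """Count individuals in a SAMPLE file, skipping its two header lines."""
    return sum(1 for line in sample_lines[2:] if line.strip())
```

A useful consistency check is that every HAPS record carries twice as many haplotype values as there are individuals in the SAMPLE file.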
Reference files
- SNP information summary file. This shows the allele encoding and allele
frequencies for the UK reference panel, which is used for fitting the
KIR*IMP model. You can use this to ensure that the SNP alleles in your
data are encoded in the same way.
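As an illustration of what comparing encodings can involve, the sketch below classifies one SNP by comparing its allele pair in the input data with the pair listed for the reference panel. It does not assume any particular column layout for the summary file: the allele pairs are passed in as plain tuples, and the function name `classify_encoding` and its labels are hypothetical. As the changelog notes, from version 1.1.0 a swapped 0/1 coding is adjusted automatically, but the strand still has to match the reference panel.

```python
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def classify_encoding(input_alleles, ref_alleles):
    """Compare (allele0, allele1) in the input with the reference panel.

    Returns one of: "match", "swapped", "strand_flip", "mismatch".
    Note that A/T and C/G SNPs are strand-ambiguous: alleles alone cannot
    distinguish a swap from a strand flip, so these are reported as
    "match" or "swapped" and are best checked using allele frequencies.
    """
    in0, in1 = input_alleles
    ref0, ref1 = ref_alleles
    if (in0, in1) == (ref0, ref1):
        return "match"
    if (in0, in1) == (ref1, ref0):
        return "swapped"  # same alleles, 0/1 coding reversed
    flipped = (COMPLEMENT.get(in0), COMPLEMENT.get(in1))
    if flipped in ((ref0, ref1), (ref1, ref0)):
        return "strand_flip"  # alleles reported on the opposite strand
    return "mismatch"
```

SNPs reported as "strand_flip" or "mismatch" would need to be converted or removed before uploading.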
Imputation
To submit an imputation job:
- Log in and go to the My jobs page.
- At the top of the page is a form that allows you to upload your
data to submit a new job.
- Some basic data checks are run on your files before a job is
accepted. If these fail, an error message will pop up to let you
know.
- Once your job is accepted it is added to the job queue. You
will receive an email to let you know once it is complete.
- All files will be deleted after 30 days. Please ensure you
download your results before then.
The imputation process involves re-fitting the model to be tailored to your
dataset. Specifically, the SNP intersection between the input dataset and the
reference panel will be taken, and a model will be fitted to the reference
panel using only these SNPs. This model is then used for carrying out the
imputation for the input dataset.
The greater the SNP intersection, the more accurate the imputation is likely
to be. To ensure adequate performance, we currently require a minimum of 10
SNPs in the intersection in order to carry out a job.
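The size of the intersection can be estimated before submitting a job. The following is a minimal sketch, assuming you have the GRCh37 positions from your HAPS file and the positions listed for the reference panel as two lists of integers. The function names are hypothetical; matching is by position only, as described under 'Data preparation'.

```python
MIN_INTERSECTION = 10  # minimum number of shared SNPs stated above


def snp_intersection(input_positions, reference_positions):
    """Return the sorted positions present in both datasets."""
    return sorted(set(input_positions) & set(reference_positions))


def check_intersection(input_positions, reference_positions,
                       minimum=MIN_INTERSECTION):
    """Return the shared positions, or raise if there are too few."""
    shared = snp_intersection(input_positions, reference_positions)
    if len(shared) < minimum:
        raise ValueError(
            f"only {len(shared)} SNPs in common; at least {minimum} are required"
        )
    return shared
```

Meeting the minimum only means that the job can run; a larger intersection is still likely to give more accurate imputation.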
Output
At the conclusion of a KIR*IMP job you will be sent an email with an access key
that will allow you to download the results. The steps to do this are:
- Log in and go to the My jobs page.
- In the table of jobs, find the one you wish to download and follow
the 'Download results' link.
- When prompted, enter the access key from your email. This will then
start the download.
The download will be a zip archive containing the following files:
imputations.csv
The imputed KIR types. One row for each input haplotype and KIR locus,
showing the most likely (i.e. posterior mode) allele and its associated
posterior probability.
accuracy.csv
Estimates of the average per-haplotype imputation accuracy for each KIR
locus. These relate to the model fitted with the SNP intersection (see
'Imputation' above). The estimates are the out-of-bag (OOB) accuracy
calculated during the model fitting process (see our Publications for
details).
accuracy.pdf
A plot of the imputation accuracy estimates (see above).
alleles_kir.csv
Counts of the KIR alleles for both the input dataset (imputed) and the
reference panel (known), for each KIR locus. These are also shown in a plot
(see below).
alleles_kir.pdf
A plot comparing the KIR allele counts (see above). The diagonal lines
indicate equal frequency in both datasets. If the KIR loci are imputed
well, and the input dataset and reference panel are representative of the same
population, then the frequencies in the two datasets should be similar
(indicated by the points lying close to the diagonal line).
alleles_snp.csv
Allele frequencies of the SNPs in the intersection between the input
dataset and the reference panel. These are also shown in a plot (see
below).
alleles_snp.pdf
A plot comparing the SNP allele frequencies between the input dataset
and the reference panel. If the two datasets are representative of the same
population, and the SNP alleles use the same encoding (e.g. same strand), then
these frequencies should be similar.
posteriors.pdf
A plot showing the distribution of the posterior probabilities of the
most likely alleles (the same as in imputations.csv; see above)
for each KIR locus. This is shown as an empirical
cumulative distribution function. Two curves are plotted: one for the
input dataset and, for comparison, one for OOB imputations from the reference
panel. If the KIR loci are imputed well, the two distributions should be
similar.
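For readers unfamiliar with the empirical cumulative distribution function used in posteriors.pdf, the sketch below shows how one is computed from a list of posterior probabilities. The function names and the values in the usage note are illustrative only; this is not the code used to produce the plot.

```python
def ecdf(values):
    """Return (sorted values, cumulative proportions) for plotting an ECDF."""
    xs = sorted(values)
    n = len(xs)
    if n == 0:
        raise ValueError("need at least one value")
    # The i-th smallest value (1-based) has cumulative proportion i / n.
    ps = [(i + 1) / n for i in range(n)]
    return xs, ps


def ecdf_at(values, threshold):
    """Proportion of values less than or equal to the threshold."""
    if not values:
        raise ValueError("need at least one value")
    return sum(1 for v in values if v <= threshold) / len(values)
```

For example, with made-up posterior probabilities `[0.9, 0.5, 1.0, 0.7]`, `ecdf_at(..., 0.8)` gives 0.5: half of the imputations have a posterior probability of 0.8 or less. A curve that rises early indicates many low-confidence imputations.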
Data retention
All data files will be deleted 30 days after processing. This includes all
uploaded files as well as all output files associated with each job. Please
ensure you download the results from your imputation jobs before they are
deleted.
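Since the files cannot be recovered after deletion, it may be worth confirming that a downloaded archive is complete. The following is a minimal sketch that checks a zip archive for the file names listed under 'Output'. The function name is hypothetical, and the sketch assumes nothing about the folder structure inside the archive: only base file names are compared.

```python
import zipfile

# File names as listed in the 'Output' section.
EXPECTED_FILES = {
    "imputations.csv",
    "accuracy.csv",
    "accuracy.pdf",
    "alleles_kir.csv",
    "alleles_kir.pdf",
    "alleles_snp.csv",
    "alleles_snp.pdf",
    "posteriors.pdf",
}


def missing_result_files(archive_path):
    """Return the expected file names that are absent from the archive."""
    with zipfile.ZipFile(archive_path) as archive:
        # Compare base names only, in case the files sit inside a folder.
        present = {name.rsplit("/", 1)[-1] for name in archive.namelist()}
    return sorted(EXPECTED_FILES - present)
```

An empty list means all of the listed files are present.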
Changelog
All notable changes to this project are documented here. This project adheres
to Semantic Versioning.
1.2.0 (2017-07-20)
Changed:
- Moved to a new server at the University of Melbourne.
- New Terms and Conditions, to reflect the transfer from MCRI to the
University of Melbourne.
- Substantial upgrades to underlying libraries.
- Minor changes and improvements to the web interface.
- Updated the contact details.
1.1.1 (2016-06-24)
Changed:
- Substantial internal changes to the implementation. These are only
of an infrastructural nature; no changes were made to the
statistical model.
- Implemented a 30-day data retention policy. Updated the
Documentation page and the Terms and Conditions (clause 3.4) to
describe this.
- Fixed the numbering of clauses in Section 14 of the Terms and
Conditions.
- Minor changes to wording on some webpages.
- Updated the contact details.
1.1.0 (2015-12-09)
Added:
- SNP encoding re-alignment. Previously it was required that the
input dataset have exactly the same SNP encoding
as the reference panel, including which alleles are coded as 0 and
1. Now the latter requirement is relaxed: the 0/1 encoding is
automatically adjusted to ensure the input dataset matches the
reference panel. It is nevertheless still necessary to ensure
that the SNPs in the input dataset are on the same strand as the
reference panel. A plot comparing the SNP allele frequencies is
now provided as a diagnostic tool to check if there are any
alignment problems.
- Extra output files (useful as diagnostics): comparison of KIR and
SNP allele frequencies, comparison of the distributions of the
imputation posteriors.
- New email address and a link to our new mailing list, on the Contact page.
- Version numbering and a changelog.
Changed:
- Changed output data files to be comma-separated (CSV) rather than
tab-separated (TSV).
- Improved the names of the output files.
- Updated and expanded the documentation.
1.0.0 (2015-10-01)
Initial release. Full details of the methodology and validation are in the
Publications.