Sample input data (Can contain NA entries): here

The expression data should be in the format:

   Sample <tab> cell_name_1 <tab> cell_name_2 <tab> ...
   Gene name 1 <tab> expression in TPM <tab> expression in TPM <tab> ...
   Gene name 2 <tab> expression in TPM <tab> expression in TPM <tab> ...
   .
   .
   .
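The layout above can be produced with a few lines of Python. This is a minimal sketch, not the server's own tooling: the function name `format_expression_table`, the gene symbols, and the TPM values are illustrative assumptions. It also assumes that missing measurements are written as the literal string NA.

```python
import csv
import io
import math


def format_expression_table(cell_names, gene_rows):
    """Return a tab-delimited table: a header row starting with the literal
    word 'Sample', then one row per gene with its TPM value in each cell."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    writer.writerow(["Sample", *cell_names])
    for gene, values in gene_rows:
        if len(values) != len(cell_names):
            raise ValueError(
                f"{gene}: expected {len(cell_names)} values, got {len(values)}"
            )
        # Assumption: missing measurements (None or NaN) are written as NA.
        cells = [
            "NA" if v is None or math.isnan(v) else repr(float(v))
            for v in values
        ]
        writer.writerow([gene, *cells])
    return buffer.getvalue()


# Hypothetical example data: two cells, two genes, one missing value.
table = format_expression_table(
    ["cell_name_1", "cell_name_2"],
    [("Actb", [1523.4, 987.0]), ("Gapdh", [None, 12.5])],
)
print(table, end="")
```

Writing the returned string to a text file gives an input file with one gene per row and one cell per column.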


Sample output file: here

The output file contains the top 100 nearest neighbors of each cell in the input data, ordered from nearest to farthest. The cell type and dataset reference information for each nearest neighbor are also provided.

The word "Sample" should be written as-is. We use gene symbols as gene names (case-insensitive). We will use this gene list (9437 genes) to perform the retrieval task: here

Missing genes will be imputed based on the input data. All genes in the input data (even those not among the 9437 genes) will be used for imputation.

Please make sure that your input data contains most of these genes so that our model can achieve reasonable performance.
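Before uploading, it may help to check how much of the gene list your input covers. The sketch below compares gene symbols case-insensitively, as described above. The function name `gene_coverage` and the example symbols are assumptions for illustration, and the server's exact matching rules are not documented here. No minimum coverage is stated on this page, so the function only reports the fraction and the missing symbols.

```python
def gene_coverage(input_genes, reference_genes):
    """Return (fraction of reference genes present, sorted missing genes).

    Symbols are compared case-insensitively after trimming whitespace.
    """
    reference = {g.strip().upper() for g in reference_genes}
    present = {g.strip().upper() for g in input_genes} & reference
    missing = sorted(reference - present)
    fraction = len(present) / len(reference) if reference else 0.0
    return fraction, missing


# Hypothetical example: a four-gene reference list and a three-gene input.
fraction, missing = gene_coverage(
    ["actb", "Gapdh", "Xist"],
    ["ACTB", "GAPDH", "TP53", "MYC"],
)
print(f"{fraction:.0%} of reference genes present; missing: {missing}")
```

In practice, `reference_genes` would be read from the downloaded gene list and `input_genes` from the first column of the input file, skipping the "Sample" header row.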

The maximum upload file size is 200 MB. Please be patient, since uploading a large input file can take a while.
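The file size can be checked locally before starting a long upload. This is a small sketch under one assumption: the limit is read as decimal megabytes (200 × 10^6 bytes), the stricter of the two common readings of "MB". The names `MAX_UPLOAD_BYTES` and `fits_upload_limit` are illustrative.

```python
import os
import tempfile

# Assumption: 200 MB is interpreted as decimal megabytes.
MAX_UPLOAD_BYTES = 200 * 1000 * 1000


def fits_upload_limit(path, limit=MAX_UPLOAD_BYTES):
    """Return True if the file at `path` is no larger than `limit` bytes."""
    return os.path.getsize(path) <= limit


# Demonstration on a small temporary file with hypothetical contents.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as handle:
    handle.write("Sample\tcell_name_1\nActb\t1523.4\n")
    demo_path = handle.name

demo_fits = fits_upload_limit(demo_path)
demo_fits_one_byte = fits_upload_limit(demo_path, limit=1)
os.remove(demo_path)
print(demo_fits, demo_fits_one_byte)
```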

The password is (no spaces): scRNA-Seq










The single-cell RNA-seq data (in TPM units) used for our paper are here.

The supporting tables are here.

The released code is here.
To contact us, please email "zivbj" at the address "cs.cmu.edu".