How to build a reference genome?

Scenario 1: Building a reference genome that is compatible with single-cell data from different platforms.

If you have single-cell data from both 10X Genomics and SeekSpace® products, it is recommended to build the reference genome with 10X CellRanger. SeekSpace® Tools is compatible with reference genomes built by CellRanger.

The code for filtering the gene annotation file (GTF file) and building the reference genome is as follows:

/path/to/cellranger mkgtf Homo_sapiens.GRCh38.ensembl.gtf Homo_sapiens.GRCh38.ensembl.filtered.gtf \
    --attribute=gene_biotype:protein_coding \
    --attribute=gene_biotype:lncRNA \
    --attribute=gene_biotype:antisense \
    --attribute=gene_biotype:IG_LV_gene \
    --attribute=gene_biotype:IG_V_gene \
    --attribute=gene_biotype:IG_V_pseudogene \
    --attribute=gene_biotype:IG_D_gene \
    --attribute=gene_biotype:IG_J_gene \
    --attribute=gene_biotype:IG_J_pseudogene \
    --attribute=gene_biotype:IG_C_gene \
    --attribute=gene_biotype:IG_C_pseudogene \
    --attribute=gene_biotype:TR_V_gene \
    --attribute=gene_biotype:TR_V_pseudogene \
    --attribute=gene_biotype:TR_D_gene \
    --attribute=gene_biotype:TR_J_gene \
    --attribute=gene_biotype:TR_J_pseudogene \
    --attribute=gene_biotype:TR_C_gene
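# Build the reference; --genes must point to the filtered GTF written by mkgtf above.
/path/to/cellranger mkref --genome=GRCh38 --fasta=GRCh38.fa --genes=Homo_sapiens.GRCh38.ensembl.filtered.gtf
# Write an uncompressed copy of the annotation next to genes.gtf.gz.
cd GRCh38/genes
gunzip -dc genes.gtf.gz > genes.gtf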

Note

  • If the reference genome built by CellRanger is not compatible with the STAR version used by SeekSpace® Tools, you can point SeekSpace® Tools to the STAR binary shipped with CellRanger by passing --star_path /path/to/cellranger-5.0.1/lib/bin/STAR.

  • The chromosome names in the FASTA file must match the chromosome names in the GTF file. For example, if chromosome 1 is named chr1 in the FASTA file, it must also be named chr1 in the GTF file.
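
The chromosome-name requirement can be checked before building the reference. The following is a minimal sketch, not part of CellRanger or SeekSpace® Tools: the function name missing_in_fasta is hypothetical, and it assumes an uncompressed FASTA and a tab-separated GTF.

```shell
# Hypothetical helper: print every sequence name that appears in column 1
# of the GTF but has no matching '>' header in the FASTA.
# Usage: missing_in_fasta GENOME.fa ANNOTATION.gtf
missing_in_fasta() {
    awk -F'\t' '
        # First file (FASTA): record the header name up to the first whitespace.
        NR == FNR {
            if (/^>/) { sub(/^>/, ""); split($0, a, /[ \t]/); seen[a[1]] = 1 }
            next
        }
        # Second file (GTF): skip comment and empty lines.
        /^#/ || NF == 0 { next }
        # Report each unmatched sequence name once.
        !($1 in seen) && !reported[$1]++ { print $1 }
    ' "$1" "$2" | sort
}

# Example call (file names taken from the commands above):
# missing_in_fasta GRCh38.fa Homo_sapiens.GRCh38.ensembl.filtered.gtf
```

Empty output means that every sequence named in the GTF file is also present in the FASTA file. Any name that is printed (for example 1 when the FASTA file uses chr1) indicates a mismatch to resolve before running mkref or STAR.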


Scenario 2: If you only have data from SeekSpace® products, there is no need to consider platform compatibility.

The code for building the genome index with STAR is as follows:

/demo/seekspacetools_v1.0.0/bin/STAR \
  --runMode genomeGenerate \
  --runThreadN 16 \
  --genomeDir /path/to/star \
  --genomeFastaFiles /path/to/genome.fa \
  --sjdbGTFfile /path/to/genome.gtf \
  --sjdbOverhang 149 \
  --limitGenomeGenerateRAM 17179869184
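
The values passed to --sjdbOverhang and --limitGenomeGenerateRAM can be derived rather than hard-coded. The STAR manual recommends setting --sjdbOverhang to the read length minus 1, and --limitGenomeGenerateRAM is given in bytes. The following is a minimal sketch, assuming 150 bp reads and a 16 GiB memory budget. Both values are assumptions chosen because they reproduce the numbers in the command above; they are not requirements stated by SeekSpace® Tools, and the variable names are illustrative.

```shell
# Assumed inputs: adjust to your sequencing run and machine.
read_length=150   # assumption: length of the reads to be aligned, in bases
ram_gib=16        # assumption: memory budget for index generation, in GiB

# STAR manual: the ideal --sjdbOverhang is ReadLength - 1.
sjdb_overhang=$((read_length - 1))

# --limitGenomeGenerateRAM expects bytes: GiB * 1024^3.
ram_bytes=$((ram_gib * 1024 * 1024 * 1024))

echo "--sjdbOverhang $sjdb_overhang --limitGenomeGenerateRAM $ram_bytes"
# prints: --sjdbOverhang 149 --limitGenomeGenerateRAM 17179869184
```

Large genomes may need a higher memory budget than the one assumed here.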

Note

  • The chromosome names in the FASTA file must match the chromosome names in the GTF file. For example, if chromosome 1 is named chr1 in the FASTA file, it must also be named chr1 in the GTF file.
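
If the names differ only by the chr prefix, one option is to rewrite column 1 of the GTF file before building the index. The following is a minimal sketch, assuming an Ensembl-style GTF file (1, 2, ..., X, Y, MT) and a UCSC-style FASTA file (chr1, ..., chrX, chrY, chrM). The function name add_chr_prefix is hypothetical, and scaffold or patch names are deliberately left unchanged because they do not map by a simple prefix.

```shell
# Hypothetical helper: write a copy of the GTF to stdout in which primary
# chromosome names gain a "chr" prefix and MT becomes chrM.
# Usage: add_chr_prefix IN.gtf > OUT.gtf
add_chr_prefix() {
    awk 'BEGIN { FS = OFS = "\t" }
         # Pass comment lines through untouched.
         /^#/ { print; next }
         {
             if ($1 == "MT") {
                 $1 = "chrM"
             } else if ($1 ~ /^([0-9]+|X|Y)$/) {
                 $1 = "chr" $1
             }
             # Any other name (scaffolds, already-prefixed names) is kept as-is.
             print
         }' "$1"
}

# Example call (paths taken from the STAR command above; the output name is illustrative):
# add_chr_prefix /path/to/genome.gtf > /path/to/genome.chr.gtf
```

After renaming, pass the rewritten file to --sjdbGTFfile and confirm that the remaining sequence names also exist in the FASTA file.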