Quick start
============

This page is a quick start guide for immunopepper.

Immunopepper is a software tool for the detection of neoantigens from a splicing graph. It generates the set of all theoretically feasible peptide sequences (or kmers) through direct translation of all walks along the graph.

How immunopepper is used depends on the aim of the user. The following flowchart can help to decide which modes to use, and in which combination, according to the user's needs.

.. image:: imgs/flowchart_immunopepper.png
   :width: 600
   :align: center
   :alt: Immunopepper flowchart

Installation
-------------

To install from source, clone the git repository of immunopepper. Some additional prerequisites need to be installed depending on the mode that is run.

The commands needed to install immunopepper are:

.. code-block::

    git clone https://github.com/ratschlab/immunopepper.git
    conda create -n immunopepper python=3.9
    conda activate immunopepper
    conda install cython
    conda install -c bioconda 'pyvcf3==1.0.3'
    make install

More information on the installation and on the prerequisites for the cancerspecif, mhcbind and pepquery modes can be found in the :ref:`installation` section.

To check that the installation succeeded, run:

.. code-block::

    immunopepper -h

Installation test and usage
-----------------------------

Once immunopepper is installed, the installation can be tested further by running the build mode on a small dummy example. More information about the example can be found in the :ref:`tutorials` section.

**Data location:** The data for the example can be found in the GitHub repository under the folder "immunopepper/tests/data_simulated/data".

**Command line:** The command to run the example is:

.. code-block::

    immunopepper build \
        --output-dir immunopepper_usecase/ \
        --ann-path immunopepper/tests/data_simulated/data/build_mode/simulated_Ipp.gtf \
        --splice-path immunopepper/tests/data_simulated/data/build_mode/genes_graph_conf3.merge_graphs.pickle \
        --ref-path immunopepper/tests/data_simulated/data/build_mode/genome.fa \
        --kmer 9 \
        --count-path immunopepper/tests/data_simulated/data/build_mode/genes_graph_conf3.merge_graphs.count.hdf5 \
        --parallel 1 \
        --batch-size 1 \
        --start-id 0 \
        --process-num 0 \
        --output-fasta \
        --verbose 2

**Terminal output:** If the run was successful, the terminal shows the following output:

.. code-block::

    2023-06-22 12:48:54,100 INFO Command lineNamespace(output_dir='immunopepper_usecase/', ann_path='immunopepper/tests/data_simulated/data/build_mode/simulated_Ipp.gtf', splice_path='immunopepper/tests/data_simulated/data/build_mode/genes_graph_conf3.merge_graphs.pickle', ref_path='immunopepper/tests/data_simulated/data/build_mode/genome.fa', kmer=9, libsize_extract=False, all_read_frames=False, count_path='immunopepper/tests/data_simulated/data/build_mode/genes_graph_conf3.merge_graphs.count.hdf5', output_samples=[], heter_code=0, compressed=True, parallel=1, batch_size=1, pickle_samples=[], process_chr=None, complexity_cap=None, genes_interest=None, start_id=0, process_num=0, skip_annotation=False, libsize_path=None, output_fasta=True, force_ref_peptides=False, filter_redundant=False, kmer_database=None, gtex_junction_path=None, disable_concat=False, disable_process_libsize=False, mutation_sample=None, germline='', somatic='', sample_name_map=None, use_mut_pickle=False, verbose=2)
    2023-06-22 12:48:54,100 INFO >>>>>>>>> Build: Start Preprocessing
    2023-06-22 12:48:54,100 INFO Building lookup structure ...
    2023-06-22 12:48:54,101 INFO Time spent: 0.000 seconds
    2023-06-22 12:48:54,102 INFO Memory usage: 0.159 GB
    2023-06-22 12:48:54,102 INFO Loading count data ...
    2023-06-22 12:48:54,104 INFO Time spent: 0.002 seconds
    2023-06-22 12:48:54,104 INFO Memory usage: 0.160 GB
    2023-06-22 12:48:54,104 INFO Loading splice graph ...
    2023-06-22 12:48:54,105 INFO Time spent: 0.000 seconds
    2023-06-22 12:48:54,105 INFO Memory usage: 0.161 GB
    2023-06-22 12:48:54,105 INFO Add reading frame to splicegraph ...
    2023-06-22 12:48:54,107 INFO Time spent: 0.002 seconds
    2023-06-22 12:48:54,107 INFO Memory usage: 0.161 GB
    2023-06-22 12:48:54,107 INFO >>>>>>>>> Finish Preprocessing
    2023-06-22 12:48:54,107 INFO >>>>>>>>> Start traversing splicegraph
    2023-06-22 12:48:54,107 INFO >>>> Processing output_sample cohort, there are 9 graphs in total
    2023-06-22 12:48:54,108 INFO Saving results to immunopepper_usecase/cohort_mutNone
    2023-06-22 12:48:54,108 INFO Not Parallel
    2023-06-22 12:48:54,108 INFO >>>>>>>>> Start Background processing
    2023-06-22 12:48:54,111 INFO Saved ref_annot_peptides.fa.gz with 40 lines in 0.0003s
    2023-06-22 12:48:54,111 INFO Saved ref_annot_kmer.gz with 294 lines in 0.0002s
    2023-06-22 12:48:54,113 DEBUG ....cohort: annotation graph from batch all/9 processed, max time cost: 0.0, memory cost: 0.16 GB
    2023-06-22 12:48:54,113 INFO >>>>>>>>> Start Foreground processing
    2023-06-22 12:48:54,175 INFO Saved gene_expression_detail.gz with 9 lines in 0.0006s
    2023-06-22 12:48:54,176 INFO Saved ref_sample_peptides.fa.gz with 88 lines in 0.0004s
    2023-06-22 12:48:54,177 INFO Saved ref_sample_peptides_meta.gz with 44 lines in 0.0005s
    2023-06-22 12:48:54,177 DEBUG ....cohort: output_sample graph from batch all/9 processed, max time cost: 0.02, memory cost: 0.16 GB
    2023-06-22 12:48:54,188 INFO Saved library size results to immunopepper_usecase/expression_counts.libsize.tsv

**Output files:** If the run was successful, the output directory should contain the following:

.. code-block::

    immunopepper_usecase/
    ├── cohort_mutNone
    │   ├── Annot_IS_SUCCESS
    │   ├── gene_expression_detail.gz
    │   ├── output_sample_IS_SUCCESS
    │   ├── ref_annot_kmer.gz
    │   ├── ref_annot_peptides.fa.gz
    │   ├── ref_graph_kmer_JuncExpr
    │   │   ├── part-*.gz
    │   │   ├── part-*.gz
    │   │   └── part-*.gz
    │   ├── ref_graph_kmer_SegmExpr
    │   │   ├── part-*.gz
    │   │   ├── part-*.gz
    │   │   ├── part-*.gz
    │   │   ├── part-*.gz
    │   │   ├── part-*.gz
    │   │   ├── part-*.gz
    │   │   ├── part-*.gz
    │   │   ├── part-*.gz
    │   │   └── part-*.gz
    │   ├── ref_sample_peptides.fa.gz
    │   └── ref_sample_peptides_meta.gz

.. note:: The "*" in some file names refers to a sequence of numbers and letters that is unique to each run and part.

Other links of interest
-----------------------

- The input parameters and a more detailed description of each mode can be found in the :ref:`modes` section.
- A more detailed description of the output files and what they contain can be found in the :ref:`outputs` section.
- More tutorials for the different modes are located in the :ref:`tutorials` section.
- The input data for the build tutorial can be found in the folder **immunopepper/tests/data_simulated/data**. The inputs for the other modes are the outputs of build mode.
- The output data for each of the tutorials can be found under the folder **immunopepper/immunopepper_usecase**:
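
As a quick sanity check on the build run from the installation test, the contents of the output directory can be compared against the listing under **Output files**. The following is a minimal sketch, not part of immunopepper: the function name ``missing_build_outputs`` and its ``cohort`` parameter are illustrative assumptions, and the file names are copied from the listing above.

.. code-block:: python

    from pathlib import Path

    # Names copied from the "Output files" listing of the build example.
    EXPECTED_FILES = [
        "Annot_IS_SUCCESS",
        "gene_expression_detail.gz",
        "output_sample_IS_SUCCESS",
        "ref_annot_kmer.gz",
        "ref_annot_peptides.fa.gz",
        "ref_sample_peptides.fa.gz",
        "ref_sample_peptides_meta.gz",
    ]
    EXPECTED_PART_DIRS = ["ref_graph_kmer_JuncExpr", "ref_graph_kmer_SegmExpr"]


    def missing_build_outputs(output_dir, cohort="cohort_mutNone"):
        """Return the expected entries that are absent from the cohort folder.

        Hypothetical helper for illustration; immunopepper does not ship it.
        """
        cohort_dir = Path(output_dir) / cohort
        missing = [
            name for name in EXPECTED_FILES if not (cohort_dir / name).is_file()
        ]
        for name in EXPECTED_PART_DIRS:
            # The part file names differ per run, so only require at least one.
            if not any((cohort_dir / name).glob("part-*.gz")):
                missing.append(name + "/part-*.gz")
        return missing

Calling ``missing_build_outputs("immunopepper_usecase")`` from the directory where the example was run returns an empty list when every listed entry is present, and otherwise the names that could not be found.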