{ "cells": [ { "cell_type": "markdown", "id": "6f3d8ca8", "metadata": {}, "source": [ "# Step 2. Learn peptide-HLA representation" ] }, { "cell_type": "markdown", "id": "69cbc476", "metadata": {}, "source": [ "Peptides are short fragments of proteins that are generated through enzymatic cleavage within cells. These peptides are then presented on the cell surface by HLA (Human Leukocyte Antigen) molecules, allowing the immune system to scan and monitor the internal state of the cell. Some of these presented peptides can trigger T cell activation, forming the core mechanism of adaptive immune surveillance.\n", "\n", "For example, peptides derived from pathogens such as bacteria and viruses, or from cancer cells, often differ from those found in normal human cells. When such foreign or abnormal peptides are presented by HLA molecules, they can be recognized by cytotoxic T cells, leading to the targeted killing of infected or malignant cells. However, if presented self-derived peptides are mistakenly recognized as foreign, this may lead to autoimmune diseases, where the immune system attacks the body’s own healthy tissues." ] }, { "cell_type": "markdown", "id": "cc26934b", "metadata": {}, "source": [ "```{image} ../_static/peptide.png\n", ":alt: Model result figure\n", ":width: 500px\n", ":align: center\n", "```" ] }, { "cell_type": "markdown", "id": "c3753d2a", "metadata": {}, "source": [ "## Load peptide-HLA binding data" ] }, { "cell_type": "markdown", "id": "42c19f28", "metadata": {}, "source": [ "Import the necessary libraries and load the peptide-HLA binding data.
" ] }, { "cell_type": "code", "execution_count": 14, "id": "31e7d96d", "metadata": {}, "outputs": [], "source": [ "import os\n", "\n", "os.environ[\"OMP_NUM_THREADS\"] = \"8\"\n", "os.environ[\"OPENBLAS_NUM_THREADS\"] = \"8\"\n", "os.environ[\"MKL_NUM_THREADS\"] = \"8\"\n", "os.environ[\"VECLIB_MAXIMUM_THREADS\"] = \"8\"\n", "os.environ[\"NUMEXPR_NUM_THREADS\"] = \"8\"" ] }, { "cell_type": "code", "execution_count": 15, "id": "d61512a0", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "trimap version: 1.0.5\n" ] } ], "source": [ "import numpy as np\n", "import torch\n", "import pandas as pd\n", "import trimap\n", "from trimap import utils\n", "from trimap.model import HLA_vae, PEP_vae\n", "\n", "print('trimap version:', trimap.__version__)\n", "\n", "seed = 1234\n", "np.random.seed(seed)\n", "torch.manual_seed(seed)\n", "torch.cuda.manual_seed(seed)\n", "torch.cuda.manual_seed_all(seed)\n", "torch.backends.cudnn.benchmark = False\n", "torch.backends.cudnn.deterministic = True" ] }, { "cell_type": "markdown", "id": "02013dde", "metadata": {}, "source": [ "The data is typically in a CSV format, where each row represents a peptide-HLA pair along. We focus on top 20 HLA alleles with the most data points. [Download pmhc_data.csv](https://drive.google.com/file/d/1_rTAQcoxoy-wuv9h4LQSZLmtKkGuaGXz/view?usp=drive_link)" ] }, { "cell_type": "code", "execution_count": 16, "id": "c1f9e796", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
|  | Epitope | HLA |
|---|---|---|
| 0 | TQGTPLKY | HLA-A*01:01 |
| 1 | LDYHMEKY | HLA-A*01:01 |
| 2 | VTDVGIRY | HLA-A*01:01 |
| 3 | LVDLRNYQ | HLA-A*01:01 |
| 4 | LVDDNYFY | HLA-A*01:01 |
| ... | ... | ... |
| 123926 | QHQLQQQQL | HLA-B*38:01 |
| 123927 | LHQQESQSL | HLA-B*38:01 |
| 123928 | EHANDQQHI | HLA-B*38:01 |
| 123929 | QHNYPGIKL | HLA-B*38:01 |
| 123930 | SHIVQHERI | HLA-B*38:01 |
123931 rows × 2 columns
\n", "