{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Step 2. Scan peptides from gut microbial proteomes" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Autoimmune diseases occur when the immune system erroneously targets the body’s own tissues. While genetic predisposition provides a crucial foundation, mounting evidence highlights the role of environmental factors — particularly microbial exposure — in the initiation and progression of autoimmunity. Among the proposed mechanisms, **molecular mimicry** is one of the most well-characterized.\n", "\n", "In this process, microbial peptides exhibit sequence or structural similarity to self-peptides. As a result, T cells initially activated against the pathogen may cross-react with self-antigens, leading to unintended autoimmune responses.\n", "\n", "**Example**: Certain gut bacterial peptides closely resemble self-peptides presented by HLA-B*27, a major genetic risk allele in ankylosing spondylitis (AS). T cells primed by these microbial peptides may subsequently recognize and attack host tissues, triggering chronic inflammation." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "```{image} ../_static/gut_bacterial.png\n", ":alt: 模型结果图\n", ":width: 800px\n", ":align: center" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Scan all possible 9-mer peptides from microbial proteomes" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We collected 16 bacterial strains that are known to be associated with AS. The proteomes of these strains were downloaded from the NCBI database. We will scan all possible 9-mer peptides from these proteomes and check if they match any of the self-peptides presented by HLA-B*27.\n", "\n", "All 16 bacterial proteomes can be downloaded [Here](https://drive.google.com/drive/folders/18VGxJh_6d-OJAexfdKDrd5KaSr450OTA?usp=drive_link)" ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Total proteins: 5562\n", "Total peptides: 1658189\n" ] } ], "source": [ "from Bio import SeqIO\n", "import gzip\n", "import pandas as pd\n", "\n", "name = 'RJX1596' # for example, change to the desired protein database name\n", "file_path = 'Data/{}.faa.gz'.format(name)\n", "\n", "all_seqs = []\n", "protein_seqs = []\n", "peptide_dict = {}\n", "with gzip.open(file_path, \"rt\") as handle:\n", " for record in SeqIO.parse(handle, \"fasta\"):\n", " protein_id = record.description\n", " sequence = str(record.seq)\n", " all_seqs.append(sequence)\n", " protein_seqs.append(protein_id)\n", "print(f\"Total proteins: {len(all_seqs)}\")\n", "\n", "def scan_strings(input_list, protein_seqs, length=9):\n", " all_peptides = []\n", " for item, protein in zip(input_list, protein_seqs):\n", " for i in range(0, len(item) - length+1, 1):\n", " new_str = item[i:i+length]\n", " peptide_seq = new_str\n", " all_peptides.append(protein)\n", " if peptide_seq not in peptide_dict:\n", " peptide_dict[peptide_seq] = [protein]\n", " else:\n", " peptide_dict[peptide_seq].append(protein)\n", "\n", "scan_strings(all_seqs, protein_seqs, length=9)\n", "print(f\"Total peptides: {len(peptide_dict)}\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Save 9mers to a .pep file" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "peptide_df = pd.DataFrame(peptide_dict.keys(), columns=['Peptide'])\n", "peptide_df.to_csv('{}.pep'.format(name), index=False, header=False)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## NetMHCpan4.1" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We use [NetMHCpan4.1](https://services.healthtech.dtu.dk/service.php?NetMHCpan-4.1) to predict the binding affinity of the peptides to HLA-B*27. NetMHCpan is a widely used tool for predicting peptide-MHC binding, and it has been shown to be effective for a variety of MHC alleles, including HLA-B*27.\n", "\n", "Download the Linux Version 4.1b [Here](https://services.healthtech.dtu.dk/services/NetMHCpan-4.1/)\n", "\n", "Follow the instructions in the netMHCpan-4.1.readme file to install NetMHCpan4.1.\n", "\n", "**Run NetMHCpan to predict HLA affinity**\n", "\n", "In the 'netMHCpan-4.1/test' directory test the software:\n", "\n", "**Predict HLA-27:05 affinity** by running the following command:\n", "\n", "```bash\n", "../netMHCpan -p RJX1596.pep -BA -xls -a HLA-B2705 -xlsfile RJX1596.xls\n", "```\n", "NetMHCpan-4.1 will output a file named `RJX1596.xls` containing the predicted binding affinities of the peptides to HLA-B*27:05." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Select peptides with EL_Rank<5 and BA_Rank<5 (Ranking top 5% of the peptides)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Download the output files for 16 bacterial strains [Here](https://drive.google.com/drive/folders/1WjUtQSiI8V5mFa7ZIpy1JFAUDX61SwFE?usp=drive_link)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Total peptides with high affinity with HLA-27:05: 49123\n", "\n", "RangeIndex: 49123 entries, 0 to 49122\n", "Data columns (total 12 columns):\n", " # Column Non-Null Count Dtype \n", "--- ------ -------------- ----- \n", " 0 Pos 49123 non-null int64 \n", " 1 Peptide 49123 non-null object \n", " 2 ID 49123 non-null object \n", " 3 core 49123 non-null object \n", " 4 icore 49123 non-null object \n", " 5 EL-score 49123 non-null float64\n", " 6 EL_Rank 49123 non-null float64\n", " 7 BA-score 49123 non-null float64\n", " 8 BA_Rank 49123 non-null float64\n", " 9 Ave 49123 non-null float64\n", " 10 NB 49123 non-null int64 \n", " 11 Protein_ID 49123 non-null object \n", "dtypes: float64(5), int64(2), object(5)\n", "memory usage: 4.5+ MB\n", "None\n" ] } ], "source": [ "df = pd.read_csv('{}.xls'.format(name), sep='\\t', header=1)\n", "df = df[df['NB']==1]\n", "df = df[df['EL_Rank']<5]\n", "df = df[df['BA_Rank']<5]\n", "df.reset_index(drop=True, inplace=True)\n", "all_peptides = df['Peptide'].values.tolist()\n", "print(f\"Total peptides with high affinity with HLA-27:05: {len(all_peptides)}\")\n", "df['Protein_ID'] = df['Peptide'].apply(lambda x: peptide_dict[x] if x in peptide_dict else x)\n", "print(df.info())" ] } ], "metadata": { "kernelspec": { "display_name": "general", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.9.21" } }, "nbformat": 4, "nbformat_minor": 2 }