{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "Transcriptomes\n", "==============\n", "\n", "RNAvigate can extract transcript-coordinate data from\n", "genomic-coordinate data files." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import rnavigate as rnav\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Transcripts\n", "-----------\n", "\n", "First, we set up the genome and transcriptome annotations. Then we can retrieve information about our transcript of interest, here SERPINA1 (Ensembl ID: ENST00000393087.9).\n", "\n", "As we'll see later, this `Transcript` object provides useful tools on its own, and can be used with BED files to extract transcript-coordinate profiles or annotations.\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Build the transcriptome from a genome FASTA file and a GTF annotation file.\n", "GRCh38 = rnav.transcriptomics.Transcriptome(\n", "    genome=\"GCF_000001405.26_GRCh38_genomic.fna\",\n", "    annotation=\"MANE.GRCh38.v1.0.ensembl_genomic.gtf\"\n", ")\n", "\n", "# Look up SERPINA1 by its Ensembl transcript ID.\n", "SERPINA1 = GRCh38.get_transcript(\"ENST00000393087.9\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "eCLIP Peaks\n", "-----------\n", "\n", "RNAvigate parses BED6 and narrowPeak (BED6+4) files, and includes specific functions to download peak files from the ENCORE eCLIP database.\n", "\n", "First, we can use `rnav.transcriptomics.download_eclip_peaks` to retrieve the eCLIP peaks from\n", "[ENCORE](https://www.encodeproject.org/encore-matrix/?type=Experiment&status=released&internal_tags=ENCORE).\n", "This downloads one narrowPeak file for each combination of protein target and cell line (K562 and HepG2).\n", "We only need to do this once.\n", "The data can be saved to a central location and reused in other notebooks.\n", "\n", "With these files, we can create the eCLIP \"database\" using `rnav.transcriptomics.eCLIPDatabase`.\n", "\n", "To help us to start thinking about this 
data, we can display all of the proteins that bind SERPINA1. Binding sites are displayed in transcript coordinates." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "eclip_path = \"../../../reference_data/eCLIP_downloads\"\n", "# One-time setup: uncomment the next two lines to download the peak files and build the eCLIP table.\n", "# rnav.transcriptomics.download_eclip_peaks(outpath=eclip_path)\n", "# rnav.transcriptomics.create_eclip_table(inpath=eclip_path, outpath=eclip_path)\n", "eclip = rnav.transcriptomics.eCLIPDatabase(inpath=eclip_path)\n", "\n", "eclip.print_all_peaks(SERPINA1)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Creating annotations and profiles\n", "---------------------------------\n", "\n", "We will use the methods of `Transcript` and `eCLIPDatabase` to create annotations and profiles, and assign these directly to data keywords.\n", "We can use any data keywords we like for this assignment.\n", "\n", "`eclip.get_eclip_density` creates a per-nucleotide profile.\n", "The value at each nucleotide is the total number of eCLIP peaks overlapping that position.\n", "This can be useful to get a sense of overall protein binding and which regions may be functional protein-binding scaffolds.\n", "\n", "`eclip.get_annotation` creates an annotation of protein binding regions for a given protein target and cell line.\n", "\n", "`transcript.get_cds_annotation` creates a span annotation to highlight the coding sequence.\n", "\n", "`transcript.get_junctions_annotation` creates a span annotation to highlight exon-exon junctions.\n", "Each span is two nucleotides: the 3' end of the 5' exon, and the 5' end of the 3' exon.\n", "\n", "`transcript.get_exon_annotation` creates a span annotation to highlight a specified exon.\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "test = rnav.Sample(\n", "    sample=\"SERPINA1 mRNA\",\n", "    SERPINA1=SERPINA1,\n", "    eCLIP=eclip.get_eclip_density(transcript=SERPINA1, cell_line=\"HepG2\"),\n", "    
cds=SERPINA1.get_cds_annotation(color=\"red\"),\n", "    ddx3x=eclip.get_annotation(SERPINA1, \"HepG2\", \"DDX3X\", color=\"blue\"),\n", "    junctions=SERPINA1.get_junctions_annotation(color=\"black\"),\n", "    exon3=SERPINA1.get_exon_annotation(3),\n", ")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Plotting\n", "--------\n", "\n", "With these profiles and annotations, we can start creating plots.\n", "\n", "For example, here is a profile of eCLIP peak density over SERPINA1.\n", "\n", "- red bar: coding sequence\n", "- blue bars: DDX3X binding regions (in the 5' UTR)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "plot = rnav.plot_profile(\n", "    [test],\n", "    sequence=\"SERPINA1\",\n", "    profile=\"eCLIP\",\n", "    annotations=[\"cds\", \"ddx3x\"],\n", ")\n" ] } ], "metadata": { "kernelspec": { "display_name": "RNAvigate", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.9.2" }, "nbsphinx": { "execute": "never" } }, "nbformat": 4, "nbformat_minor": 2 }