rnavigate.transcriptomics package
Submodules
rnavigate.transcriptomics.bed module
- class rnavigate.transcriptomics.bed.BedFile(bedfile)
Bases: object
Reads a BED6 file and extracts annotations and profiles.
Parameters
- bedfile : str
Path to the bed file.
- get_annotation(transcript, **kwargs)
Get annotations for a single transcript.
Parameters
- transcript : data.Transcript
The transcript for which annotations are to be extracted.
- **kwargs
Additional keyword arguments to be passed to the data.Annotation object.
Returns
- data.Annotation
An Annotation object containing the extracted annotations.
- get_annotations(transcripts, **kwargs)
Get annotations for a list of transcripts.
Parameters
- transcripts : list
A list of data.Transcript objects for which annotations are to be extracted.
- **kwargs
Additional keyword arguments to be passed to the data.Annotation object.
Returns
- dict
A dictionary of data.Annotation objects with the transcripts as keys.
- get_density_profile(transcript, **kwargs)
Get a density profile for a single transcript.
Parameters
- transcript : data.Transcript
The transcript for which the profile is to be extracted.
- **kwargs
Additional keyword arguments to be passed to the data.Profile object.
Returns
- data.Profile
A Profile object containing the extracted profile.
- get_profile(transcript, **kwargs)
Get a profile for a single transcript.
Parameters
- transcript : data.Transcript
The transcript for which the profile is to be extracted.
- **kwargs
Additional keyword arguments to be passed to the data.Profile object.
Returns
- data.Profile
A Profile object containing the extracted profile.
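To make the genome-to-transcript conversion that BedFile performs more concrete, here is a minimal sketch in plain Python. It is not RNAvigate's implementation. It parses BED6 lines and counts how many intervals cover each nucleotide of a spliced transcript, which is one plausible reading of a "density profile". The helper names (`parse_bed6`, `density_profile`), the exon representation (0-based, half-open genome intervals in ascending order), and the choice to count only same-strand intervals are all assumptions made for illustration.

```python
# Hypothetical sketch: not RNAvigate's BedFile implementation.
from dataclasses import dataclass


@dataclass
class BedRecord:
    chrom: str
    start: int  # 0-based, inclusive (BED convention)
    end: int  # 0-based, exclusive (BED convention)
    name: str
    score: float
    strand: str


def parse_bed6(lines):
    """Parse BED6 lines, skipping blank, comment, track and browser lines."""
    records = []
    for line in lines:
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        # Extra columns (e.g. BED6+4) are ignored by taking the first six.
        chrom, start, end, name, score, strand = line.rstrip("\n").split("\t")[:6]
        records.append(
            BedRecord(chrom, int(start), int(end), name, float(score), strand)
        )
    return records


def density_profile(records, chrom, strand, exons):
    """Count the BED intervals covering each transcript nucleotide, 5' to 3'.

    Assumption: exons are 0-based, half-open genome intervals listed in
    ascending genome order; only records on the same strand are counted.
    """
    positions = [pos for start, end in exons for pos in range(start, end)]
    if strand == "-":
        # The 5' end of a minus-strand transcript is its highest coordinate.
        positions.reverse()
    counts = [0] * len(positions)
    for rec in records:
        if rec.chrom != chrom or rec.strand != strand:
            continue
        for i, pos in enumerate(positions):
            if rec.start <= pos < rec.end:
                counts[i] += 1
    return counts


if __name__ == "__main__":
    bed_lines = ["chr1\t100\t105\tpeakA\t0\t+", "chr1\t103\t110\tpeakB\t0\t+"]
    exons = [(100, 104), (108, 112)]
    print(density_profile(parse_bed6(bed_lines), "chr1", "+", exons))
```

The nested loop is quadratic and only meant to show the coordinate logic; a real implementation over genome-scale files would use an interval index.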
rnavigate.transcriptomics.eclip module
- rnavigate.transcriptomics.eclip.create_eclip_table(inpath, outpath)
Create a table file to look up eCLIP filenames from target and cell type.
Parameters
- inpath : string
input directory path containing eCLIP bed files
- outpath : string
output directory path
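The idea of a lookup table keyed by cell line and target can be sketched with the standard `csv` module. This is a hypothetical illustration only: the tab-separated layout, the column names `cell_line`, `target` and `filename`, and the example filenames are assumptions, not the format that `create_eclip_table` actually writes.

```python
# Hypothetical sketch of a (cell line, target) -> filename lookup table.
import csv
import io


def write_lookup_table(entries, handle):
    """Write (cell_line, target, filename) rows as a tab-separated table."""
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    writer.writerow(["cell_line", "target", "filename"])
    for cell_line, target, filename in entries:
        writer.writerow([cell_line, target, filename])


def read_lookup_table(handle):
    """Read the table back into a {(cell_line, target): filename} dict."""
    reader = csv.DictReader(handle, delimiter="\t")
    return {(row["cell_line"], row["target"]): row["filename"] for row in reader}


if __name__ == "__main__":
    buffer = io.StringIO()  # stands in for a file on disk
    write_lookup_table([("K562", "TARGET_A", "peaks_1.bed")], buffer)
    buffer.seek(0)
    print(read_lookup_table(buffer)[("K562", "TARGET_A")])
```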
- rnavigate.transcriptomics.eclip.download_eclip_peaks(outpath, assembly='GRCh38')
Download eCLIP narrowPeak files from the ENCODE database.
Parameters
- outpath : string
output directory path
- assembly : “hg19” or “GRCh38”, default: “GRCh38”
reference genome assembly
- class rnavigate.transcriptomics.eclip.eCLIPDatabase(inpath)
Bases: object
Class to handle eCLIP data and to extract annotations and profiles.
Parameters
- inpath : string
input directory path containing eCLIP bed files and eclip table file.
- get_annotation(transcript, cell_line, target, **kwargs)
Get eCLIP annotation for a transcript.
Parameters
- transcript : data.Transcript
The transcript for which eCLIP annotation is to be extracted.
- cell_line : “K562” or “HepG2”
Cell line for which eCLIP annotation is to be extracted.
- target : string
Target for which eCLIP annotation is to be extracted.
- kwargs : dict
Additional keyword arguments to be passed to the get_annotation method.
Returns
- data.Annotation
An Annotation object containing the eCLIP annotation.
- get_cell_target_data(cell_line, target)
Get the eCLIP data for a specific cell line and target.
Parameters
- cell_line : “K562” or “HepG2”
Cell line for which eCLIP data is to be extracted.
- target : string
Target for which eCLIP data is to be extracted.
Returns
- transcriptomics.NarrowPeak
eCLIP data for the specified cell line and target.
- get_eclip_data()
Get eCLIP data for all cell lines and targets.
Returns
- dict
A dictionary of eCLIP data with cell lines as keys and targets as subkeys.
- get_eclip_density(transcript, cell_line, targets=None)
Get eCLIP density profile for a transcript.
Parameters
- transcript : data.Transcript
The transcript for which eCLIP density is to be extracted.
- cell_line : “K562” or “HepG2”
Cell line for which eCLIP density is to be extracted.
- targets : list of strings, optional
Targets for which eCLIP density is to be extracted. By default, all targets are considered.
Returns
- data.Profile
A Profile object containing the eCLIP density values.
- get_profile(transcript, cell_line, target)
Get eCLIP profile for a transcript.
Parameters
- transcript : data.Transcript
The transcript for which eCLIP profile is to be extracted.
- cell_line : “K562” or “HepG2”
Cell line for which eCLIP profile is to be extracted.
- target : string
Target for which eCLIP profile is to be extracted.
Returns
- data.Profile
A Profile object containing the eCLIP profile values.
- print_all_peaks(transcript)
Print all eCLIP peaks for a transcript.
- print_peaks(transcript, cell_line, target)
Print eCLIP peaks for a given transcript, cell line, and target.
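`get_eclip_density` combines peaks from several targets in one cell line, considering all targets by default. One plausible way to compute such a density is to count, at each transcript position, how many targets have at least one covering peak. The sketch below does that. The nested `{cell_line: {target: [(start, end), ...]}}` layout, the 0-based half-open transcript coordinates, and the "one count per target" rule are assumptions, not a description of RNAvigate's internals.

```python
# Hypothetical sketch of a per-nucleotide eCLIP density; not RNAvigate's code.
def eclip_density(peaks_by_cell, cell_line, length, targets=None):
    """Count, at each transcript position, the targets with a covering peak.

    peaks_by_cell: {cell_line: {target: [(start, end), ...]}}, with peaks
    already converted to 0-based, half-open transcript coordinates.
    """
    by_target = peaks_by_cell[cell_line]
    if targets is None:
        targets = list(by_target)  # default: consider every target
    density = [0] * length
    for target in targets:
        covered = [False] * length
        for start, end in by_target.get(target, []):
            # Clip peaks to the transcript; overlapping peaks of one target count once.
            for i in range(max(start, 0), min(end, length)):
                covered[i] = True
        for i, hit in enumerate(covered):
            if hit:
                density[i] += 1
    return density


if __name__ == "__main__":
    peaks = {"K562": {"TARGET_A": [(0, 3), (2, 5)], "TARGET_B": [(4, 6)]}}
    print(eclip_density(peaks, "K562", 6))
```

`TARGET_A` and `TARGET_B` are placeholder names, not real eCLIP targets.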
rnavigate.transcriptomics.transcriptome module
This submodule defines the Transcriptome and Transcript classes.
A Transcriptome object requires a genome FASTA file and an annotation GTF file. Given a transcript ID, it returns a Transcript object.
A Transcript object stores a transcript sequence and its genome coordinates. It can return annotations for the CDS, UTRs, and exon junctions, and it can be used with BedFile objects to convert genome-coordinate data into transcript-coordinate RNAvigate data classes.
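The genome-to-transcript conversion described above can be sketched as a strand-aware walk over exons. This is a minimal illustration, not the code behind `get_tx_coordinate` or `get_tx_range`. It assumes exons given as 1-based, inclusive genome intervals (the GTF convention) and returns 1-based transcript positions; the function names are hypothetical.

```python
# Hypothetical sketch of genome -> transcript coordinate mapping.
def genome_to_tx(position, exons, strand):
    """Map a 1-based genome position to a 1-based transcript position.

    exons: list of (start, end) 1-based inclusive genome intervals (GTF style),
    in any order. Returns None if the position lies outside every exon.
    """
    # Walk exons 5' -> 3': ascending on "+", descending on "-".
    ordered = sorted(exons, reverse=(strand == "-"))
    offset = 0  # transcript nucleotides in the exons already passed
    for start, end in ordered:
        if start <= position <= end:
            if strand == "+":
                return offset + (position - start) + 1
            return offset + (end - position) + 1
        offset += end - start + 1
    return None


def genome_range_to_tx(start, stop, exons, strand):
    """Map both ends of a genome range; on "-" the ends swap order."""
    a = genome_to_tx(start, exons, strand)
    b = genome_to_tx(stop, exons, strand)
    if a is None or b is None:
        return None
    return (min(a, b), max(a, b))


if __name__ == "__main__":
    exons = [(100, 109), (200, 204)]
    print(genome_to_tx(200, exons, "+"), genome_to_tx(200, exons, "-"))
```

Returning `None` for intronic positions is a design choice for the sketch; a real API might raise an error instead.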
- class rnavigate.transcriptomics.transcriptome.Transcript(parent, name, sequence, chromosome, strand, coordinates, tx_info, cds_coors=None, other_features=None)
Bases: Sequence
Transcript object for a single transcript.
Parameters
- parent : Transcriptome
Parent Transcriptome object
- name : str
Transcript ID
- sequence : str
Transcript sequence
- chromosome : str
Chromosome ID
- strand : str
Strand of the transcript
- coordinates : tuple
A pair of sequences of genome coordinates for the transcript (start positions and stop positions), e.g.: [(start1, start2, …), (stop1, stop2, …)]
- tx_info : dict
Dictionary of transcript information from the GTF file
- cds_coors : list, optional
List of genome coordinates for the CDS
- other_features : list, optional
List of dictionaries of other features from the GTF file
- get_cds_annotation(**kwargs)
Return an Annotation object for the CDS.
Parameters
- **kwargs
Additional keyword arguments for the Annotation object
- get_cds_domains()
Return a Domains object for the 5’ UTR, CDS and 3’ UTR.
- get_coordinate_df()
Return a DataFrame of transcript coordinates.
- get_exon_annotation(exon_number, **kwargs)
Return an Annotation object for a single exon.
Parameters
- exon_number : int
Exon number
- **kwargs
Additional keyword arguments for the Annotation object
- get_exon_domains()
Return a Domains object for the exons.
- get_junctions_annotation(**kwargs)
Return an Annotation object for the exon junctions.
Parameters
- **kwargs
Additional keyword arguments for the Annotation object
- get_tx_coordinate(coordinate)
Return the transcript coordinate for a genome coordinate.
- get_tx_range(start, stop)
Return the transcript coordinates for a genome range.
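Once the CDS and exon boundaries are in transcript coordinates, the 5’ UTR / CDS / 3’ UTR split and the exon-junction positions reduce to simple arithmetic. The sketch below shows that arithmetic. It assumes 1-based, inclusive transcript coordinates, omits empty UTRs, and reports each junction as the last nucleotide of the exon before it; the names and the plain tuple output are hypothetical stand-ins for RNAvigate's Domains and Annotation objects.

```python
# Hypothetical sketch of CDS/UTR domains and exon junctions in transcript space.
from itertools import accumulate


def cds_domains(tx_length, cds_start, cds_end):
    """Split a transcript into 5' UTR, CDS and 3' UTR (1-based, inclusive).

    UTRs of length zero are omitted.
    """
    domains = []
    if cds_start > 1:
        domains.append(("5' UTR", 1, cds_start - 1))
    domains.append(("CDS", cds_start, cds_end))
    if cds_end < tx_length:
        domains.append(("3' UTR", cds_end + 1, tx_length))
    return domains


def junction_positions(exon_lengths):
    """Last transcript position of every exon that is followed by another exon.

    exon_lengths must be listed in transcript (5' -> 3') order.
    """
    # Running totals of exon lengths, dropping the final exon's total.
    return list(accumulate(exon_lengths))[:-1]


if __name__ == "__main__":
    print(cds_domains(100, 11, 70))
    print(junction_positions([10, 5, 20]))
```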
- class rnavigate.transcriptomics.transcriptome.Transcriptome(genome, annotation, path, chr_ids=None)
Bases: object
Transcriptome object for a genome and annotation file.
Parameters
- genome : str
Path to the genome fasta file
- annotation : str
Path to the annotation gtf file
- path : str or Path
Path to the genome and annotation files
- chr_ids : dict, optional
Dictionary of chromosome IDs
- get_sequence(chromosome, coordinates, strand)
Return a transcript sequence for a single transcript.
- get_sequences(chromosomes, coordinates, strands)
Return a dictionary of transcript sequences.
- get_transcript(transcript_id)
Return a Transcript object for a single transcript ID.
- get_transcripts(transcript_ids)
Return a dictionary of Transcript objects for a list of transcript IDs.
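Building a Transcript from a genome FASTA and a GTF file involves two steps: grouping GTF `exon` records by `transcript_id`, and concatenating the exon sequences, reverse-complementing them for minus-strand transcripts. The sketch below shows both steps with an in-memory genome dictionary in place of a FASTA file. It is an illustration under those assumptions, not how `Transcriptome.get_transcript` or `get_sequence` are implemented; the function names are hypothetical.

```python
# Hypothetical sketch of GTF exon grouping and spliced-sequence extraction.
import re

COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def parse_gtf_exons(lines):
    """Group GTF exon records by transcript_id.

    Returns {transcript_id: (chromosome, strand, [(start, end), ...])}
    with 1-based, inclusive coordinates as in the GTF format.
    """
    transcripts = {}
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "exon":
            continue  # keep only exon features
        match = re.search(r'transcript_id "([^"]+)"', fields[8])
        if match is None:
            continue
        chrom, strand = fields[0], fields[6]
        start, end = int(fields[3]), int(fields[4])
        entry = transcripts.setdefault(match.group(1), (chrom, strand, []))
        entry[2].append((start, end))
    return transcripts


def spliced_sequence(genome, chromosome, exons, strand):
    """Concatenate exon sequences 5' -> 3'; reverse-complement on "-"."""
    chrom_seq = genome[chromosome]
    # 1-based inclusive (start, end) -> Python slice [start - 1:end].
    seq = "".join(chrom_seq[start - 1:end] for start, end in sorted(exons))
    if strand == "-":
        seq = seq.translate(COMPLEMENT)[::-1]
    return seq


if __name__ == "__main__":
    genome = {"chr1": "AAAACCCCGGGGTTTT"}
    gtf = [
        'chr1\tsrc\texon\t1\t4\t.\t+\t.\tgene_id "G1"; transcript_id "T1";',
        'chr1\tsrc\texon\t9\t12\t.\t+\t.\tgene_id "G1"; transcript_id "T1";',
    ]
    chrom, strand, exons = parse_gtf_exons(gtf)["T1"]
    print(spliced_sequence(genome, chrom, exons, strand))
```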
Module contents
- class rnavigate.transcriptomics.BedFile(bedfile)
Re-exported from rnavigate.transcriptomics.bed; identical to rnavigate.transcriptomics.bed.BedFile documented above.
- class rnavigate.transcriptomics.NarrowPeak(bedfile)
Bases: BedFile
Reads a narrowPeak (BED6+4) file and extracts annotations and profiles.
Parameters
- bedfile : str
Path to the narrowPeak file.
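The "+4" in BED6+4 refers to four extra columns after the six BED columns. In the ENCODE narrowPeak format these are signalValue, pValue and qValue (the two statistics as -log10 values, -1 when not assigned) and peak, the summit offset from chromStart (-1 when no summit was called). The sketch below parses one such line. The record class and function names are hypothetical and do not describe how RNAvigate's NarrowPeak stores these fields.

```python
# Hypothetical sketch of parsing one narrowPeak (BED6+4) line.
from dataclasses import dataclass
from typing import Optional


@dataclass
class NarrowPeakRecord:
    chrom: str
    start: int  # 0-based, inclusive
    end: int  # 0-based, exclusive
    name: str
    score: int
    strand: str
    signal_value: float
    p_value: float  # -log10(p); -1 if not assigned
    q_value: float  # -log10(q); -1 if not assigned
    peak: int  # summit offset from start; -1 if not called

    def summit(self) -> Optional[int]:
        """Absolute 0-based genome position of the summit, or None."""
        return None if self.peak == -1 else self.start + self.peak


def parse_narrowpeak_line(line: str) -> NarrowPeakRecord:
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 10:
        raise ValueError(f"expected 10 columns, got {len(fields)}")
    f = fields
    return NarrowPeakRecord(
        f[0], int(f[1]), int(f[2]), f[3], int(f[4]), f[5],
        float(f[6]), float(f[7]), float(f[8]), int(f[9]),
    )


if __name__ == "__main__":
    rec = parse_narrowpeak_line("chr1\t100\t150\tpeak1\t1000\t+\t5.2\t3.1\t-1\t20")
    print(rec.summit())
```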
- class rnavigate.transcriptomics.Transcript(parent, name, sequence, chromosome, strand, coordinates, tx_info, cds_coors=None, other_features=None)
Re-exported from rnavigate.transcriptomics.transcriptome; identical to rnavigate.transcriptomics.transcriptome.Transcript documented above.
- class rnavigate.transcriptomics.Transcriptome(genome, annotation, path, chr_ids=None)
Re-exported from rnavigate.transcriptomics.transcriptome; identical to rnavigate.transcriptomics.transcriptome.Transcriptome documented above.
- rnavigate.transcriptomics.create_eclip_table(inpath, outpath)
Re-exported from rnavigate.transcriptomics.eclip; identical to rnavigate.transcriptomics.eclip.create_eclip_table documented above.
- rnavigate.transcriptomics.download_eclip_peaks(outpath, assembly='GRCh38')
Re-exported from rnavigate.transcriptomics.eclip; identical to rnavigate.transcriptomics.eclip.download_eclip_peaks documented above.
- class rnavigate.transcriptomics.eCLIPDatabase(inpath)
Re-exported from rnavigate.transcriptomics.eclip; identical to rnavigate.transcriptomics.eclip.eCLIPDatabase documented above.