rnavigate.transcriptomics package

Submodules

rnavigate.transcriptomics.bed module

class rnavigate.transcriptomics.bed.BedFile(bedfile)

Bases: object

Reads a BED6 file and extracts annotations and profiles.

Parameters

bedfile : str

Path to the bed file.
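For orientation, a BED6 line has six tab-delimited columns: chrom, chromStart, chromEnd, name, score, and strand. The sketch below shows one way to parse such a line. It is not RNAvigate's implementation; `Bed6Record` and `parse_bed6_line` are hypothetical names used only for illustration, and the example line is made up.

```python
from typing import NamedTuple


class Bed6Record(NamedTuple):
    """One line of a BED6 file (hypothetical helper, not part of RNAvigate)."""
    chrom: str
    start: int    # 0-based, inclusive (BED convention)
    end: int      # 0-based, exclusive (BED convention)
    name: str
    score: float
    strand: str


def parse_bed6_line(line: str) -> Bed6Record:
    """Split a tab-delimited BED6 line into typed fields."""
    chrom, start, end, name, score, strand = line.rstrip("\n").split("\t")[:6]
    return Bed6Record(chrom, int(start), int(end), name, float(score), strand)


# Made-up example line covering 51 nucleotides on the plus strand.
record = parse_bed6_line("chr1\t999\t1050\tpeak_1\t500\t+\n")
print(record.chrom, record.start, record.end, record.strand)
```

Because BED intervals are half-open, the feature length is simply `end - start`.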

get_annotation(transcript, **kwargs)

Get annotations for a single transcript.

Parameters

transcript : data.Transcript

The transcript for which annotations are to be extracted.

**kwargs

Additional keyword arguments to be passed to the data.Annotation object.

Returns

data.Annotation

An Annotation object containing the extracted annotations.

get_annotations(transcripts, **kwargs)

Get annotations for a list of transcripts.

Parameters

transcripts

A list of data.Transcript objects for which annotations are to be extracted.

**kwargs

Additional keyword arguments to be passed to the data.Annotation object.

Returns

dict

A dictionary of data.Annotation objects with the transcripts as keys.

get_density_profile(transcript, **kwargs)

Get a density profile for a single transcript.

Parameters

transcript : data.Transcript

The transcript for which the profile is to be extracted.

**kwargs

Additional keyword arguments to be passed to the data.Profile object.

Returns

data.Profile

A Profile object containing the extracted profile.
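The documentation does not say how the density is computed. One plausible reading is per-nucleotide coverage: the number of BED intervals overlapping each transcript position. The sketch below implements that reading with a difference array. It is an illustration under that assumption, not RNAvigate's code; `density_profile` is a hypothetical function, and the intervals are assumed to be already converted to 1-based, inclusive transcript coordinates.

```python
def density_profile(length, intervals):
    """Count how many intervals cover each transcript position.

    length: transcript length in nucleotides.
    intervals: iterable of (start, end) pairs in 1-based, inclusive
        transcript coordinates (an assumption of this sketch).
    """
    # Difference array: +1 where an interval starts, -1 just past its end.
    deltas = [0] * (length + 1)
    for start, end in intervals:
        start = max(start, 1)       # clip to the transcript bounds
        end = min(end, length)
        if start > end:
            continue                # interval lies entirely outside
        deltas[start - 1] += 1
        deltas[end] -= 1
    # A running sum of the deltas gives the coverage at each position.
    profile, running = [], 0
    for delta in deltas[:length]:
        running += delta
        profile.append(running)
    return profile


# The third interval extends past the transcript end and is clipped.
profile = density_profile(10, [(2, 5), (4, 8), (9, 12)])
print(profile)
```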

get_profile(transcript, **kwargs)

Get a profile for a single transcript.

Parameters

transcript : data.Transcript

The transcript for which the profile is to be extracted.

**kwargs

Additional keyword arguments to be passed to the data.Profile object.

Returns

data.Profile

A Profile object containing the extracted profile.

class rnavigate.transcriptomics.bed.NarrowPeak(bedfile)

Bases: BedFile

Reads a narrowPeak (BED6+4) file and extracts annotations and profiles.

Parameters

bedfile : str

Path to the narrowPeak file.
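The ENCODE narrowPeak format extends BED6 with four columns: signalValue, pValue, qValue, and peak (the summit offset from chromStart, or -1 if no summit was called). The sketch below parses one such line into a dictionary. It is not RNAvigate's parser; `NARROWPEAK_COLUMNS` and `parse_narrowpeak_line` are hypothetical names, and the example line is made up.

```python
NARROWPEAK_COLUMNS = [
    "chrom", "start", "end", "name", "score", "strand",  # BED6 columns
    "signal_value", "p_value", "q_value", "peak",        # the extra four
]


def parse_narrowpeak_line(line):
    """Parse one tab-delimited narrowPeak line into a dict of typed values."""
    fields = line.rstrip("\n").split("\t")
    record = dict(zip(NARROWPEAK_COLUMNS, fields))
    for key in ("start", "end", "peak"):
        record[key] = int(record[key])
    for key in ("score", "signal_value", "p_value", "q_value"):
        record[key] = float(record[key])
    return record


# Made-up line: q-value and summit are both unset (-1).
peak = parse_narrowpeak_line(
    "chr7\t100\t180\texample_peak\t1000\t+\t4.2\t8.5\t-1\t-1\n"
)
print(peak["chrom"], peak["start"], peak["end"], peak["signal_value"])
```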

rnavigate.transcriptomics.eclip module

rnavigate.transcriptomics.eclip.create_eclip_table(inpath, outpath)

Create a table file to look up eCLIP filenames from target and cell type.

Parameters

inpath : string

input directory path containing eCLIP bed files

outpath : string

output directory path

rnavigate.transcriptomics.eclip.download_eclip_peaks(outpath, assembly='GRCh38')

Download eCLIP narrowPeak files from the ENCODE database.

Parameters

outpath : string

output directory path

assembly : “hg19” or “GRCh38”, default: “GRCh38”

reference genome assembly

class rnavigate.transcriptomics.eclip.eCLIPDatabase(inpath)

Bases: object

Class to handle eCLIP data and to extract annotations and profiles.

Parameters

inpath : string

input directory path containing eCLIP bed files and eclip table file.

get_annotation(transcript, cell_line, target, **kwargs)

Get eCLIP annotation for a transcript.

Parameters

transcript : data.Transcript

The transcript for which eCLIP annotation is to be extracted.

cell_line : “K562” or “HepG2”

Cell line for which eCLIP annotation is to be extracted.

target : string

Target for which eCLIP annotation is to be extracted.

**kwargs

Additional keyword arguments to be passed to the get_annotation method.

Returns

data.Annotation

An Annotation object containing the eCLIP annotation.

get_cell_target_data(cell_line, target)

Get the eCLIP data for a specific cell line and target.

Parameters

cell_line : “K562” or “HepG2”

Cell line for which eCLIP data is to be extracted.

target : string

Target for which eCLIP data is to be extracted.

Returns

transcriptomics.NarrowPeak

eCLIP data for the specified cell line and target.

get_eclip_data()

Get eCLIP data for all cell lines and targets.

Returns

dict

A dictionary of eCLIP data with cell lines as keys and targets as subkeys.
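A dictionary keyed first by cell line and then by target can be pictured as follows. This is only a sketch of the nesting described above: `build_lookup` is a hypothetical helper, the filenames are invented, and plain strings stand in for whatever objects the real dictionary holds.

```python
def build_lookup(table_rows):
    """Nest (cell_line, target, value) rows as lookup[cell_line][target]."""
    lookup = {}
    for cell_line, target, value in table_rows:
        lookup.setdefault(cell_line, {})[target] = value
    return lookup


# Invented rows; the real table contents are not shown in this reference.
rows = [
    ("K562", "RBFOX2", "file_a.bed"),
    ("K562", "PTBP1", "file_b.bed"),
    ("HepG2", "RBFOX2", "file_c.bed"),
]
eclip_data = build_lookup(rows)
print(sorted(eclip_data["K562"]))
```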

get_eclip_density(transcript, cell_line, targets=None)

Get eCLIP density profile for a transcript.

Parameters

transcript : data.Transcript

The transcript for which eCLIP density is to be extracted.

cell_line : “K562” or “HepG2”

Cell line for which eCLIP density is to be extracted.

targets : list of strings, optional

Targets for which eCLIP density is to be extracted. By default, all targets are considered.

Returns

data.Profile

A Profile object containing the eCLIP density values.

get_profile(transcript, cell_line, target)

Get eCLIP profile for a transcript.

Parameters

transcript : data.Transcript

The transcript for which eCLIP profile is to be extracted.

cell_line : “K562” or “HepG2”

Cell line for which eCLIP profile is to be extracted.

target : string

Target for which eCLIP profile is to be extracted.

Returns

data.Profile

A Profile object containing the eCLIP profile values.

print_all_peaks(transcript)

Print all eCLIP peaks for a transcript.

print_peaks(transcript, cell_line, target)

Print eCLIP peaks for a given transcript, cell line, and target.

rnavigate.transcriptomics.transcriptome module

This submodule defines the Transcriptome and Transcript classes.

A Transcriptome object requires a genome FASTA file and an annotation GTF file. Given a transcript ID, it returns a Transcript object.

A Transcript object contains a transcript sequence and its genome coordinates. It can return annotations for the CDS, UTRs, and exon junctions, and it can be used with BedFile objects to convert genome-coordinate data into transcript-coordinate RNAvigate data classes.

class rnavigate.transcriptomics.transcriptome.Transcript(parent, name, sequence, chromosome, strand, coordinates, tx_info, cds_coors=None, other_features=None)

Bases: Sequence

Transcript object for a single transcript.

Parameters

parent : Transcriptome

Parent Transcriptome object

name : str

Transcript ID

sequence : str

Transcript sequence

chromosome : str

Chromosome ID

strand : str

Strand of the transcript

coordinates : tuple

Tuple of two lists of genome coordinates for the transcript, e.g.: [(start1, start2, …), (stop1, stop2, …)]

tx_info : dict

Dictionary of transcript information from the GTF file

cds_coors : list

List of genome coordinates for the CDS

other_features : list

List of dictionaries of other features from the GTF file

get_cds_annotation(**kwargs)

Return an Annotation object for the CDS.

Parameters

**kwargs

Additional keyword arguments for the Annotation object

get_cds_domains()

Return a Domains object for the 5’ UTR, CDS and 3’ UTR.

get_coordinate_df()

Return a DataFrame of transcript coordinates.

get_exon_annotation(exon_number, **kwargs)

Return an Annotation object for a single exon.

Parameters

exon_number : int

Exon number

**kwargs

Additional keyword arguments for the Annotation object

get_exon_domains()

Return a Domains object for the exons.

get_junctions_annotation(**kwargs)

Return an Annotation object for the exon junctions.

Parameters

**kwargs

Additional keyword arguments for the Annotation object
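In transcript coordinates, exon junctions fall at the cumulative exon lengths. The sketch below computes the position of the last nucleotide of every exon except the final one. It assumes 1-based, inclusive exon bounds listed in ascending genome order, and it does not claim to match how the Annotation object represents junctions; `junction_positions` is a hypothetical helper.

```python
from itertools import accumulate


def junction_positions(starts, stops, strand):
    """Return 1-based transcript positions of the last base of each
    exon, excluding the final exon.

    starts, stops: exon bounds, 1-based inclusive, ascending genome order
        (an assumption of this sketch).
    """
    lengths = [stop - start + 1 for start, stop in zip(starts, stops)]
    if strand == "-":
        lengths.reverse()   # minus-strand transcripts start at the last exon
    return list(accumulate(lengths))[:-1]


# Three exons of 10, 20, and 5 nucleotides.
plus_junctions = junction_positions((100, 200, 300), (109, 219, 304), "+")
minus_junctions = junction_positions((100, 200, 300), (109, 219, 304), "-")
print(plus_junctions, minus_junctions)
```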

get_tx_coordinate(coordinate)

Return the transcript coordinate for a genome coordinate.

get_tx_range(start, stop)

Return the transcript coordinates for a genome range.
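Converting a genome coordinate to a transcript coordinate amounts to walking the exons in transcript order and summing the lengths of the exons already passed. The sketch below illustrates the idea under stated assumptions: exon bounds are 1-based and inclusive, listed in ascending genome order, and positions outside any exon map to None. It is not the library's implementation, and `genome_to_transcript` is a hypothetical name.

```python
def genome_to_transcript(position, starts, stops, strand):
    """Map a genome position to a 1-based transcript position.

    Returns None if the position is intronic or outside the transcript.
    starts, stops: exon bounds, 1-based inclusive, ascending genome order
        (assumptions of this sketch).
    """
    exons = list(zip(starts, stops))
    if strand == "-":
        exons.reverse()   # the 5' end sits at the highest genome coordinate
    offset = 0            # nucleotides in exons already passed
    for start, stop in exons:
        if start <= position <= stop:
            within = position - start if strand == "+" else stop - position
            return offset + within + 1
        offset += stop - start + 1
    return None


starts, stops = (100, 200), (109, 209)   # two 10-nucleotide exons
print(genome_to_transcript(200, starts, stops, "+"))
print(genome_to_transcript(200, starts, stops, "-"))
```

A genome range can be handled the same way by converting both endpoints, keeping in mind that on the minus strand the order of the two transcript coordinates is swapped.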

class rnavigate.transcriptomics.transcriptome.Transcriptome(genome, annotation, path, chr_ids=None)

Bases: object

Transcriptome object for a genome and annotation file.

Parameters

genome : str

Path to the genome fasta file

annotation : str

Path to the annotation gtf file

path : str or Path

Path to the genome and annotation files

chr_ids : dict

Dictionary of chromosome IDs

get_sequence(chromosome, coordinates, strand)

Return a transcript sequence for a single transcript.
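Conceptually, a transcript sequence is built by concatenating the exon sequences from the chromosome and, for minus-strand transcripts, taking the reverse complement. The sketch below shows that idea on a toy string. It assumes 1-based, inclusive exon bounds and does not reflect how RNAvigate reads the genome file; `spliced_sequence` is a hypothetical helper.

```python
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def spliced_sequence(chromosome_seq, starts, stops, strand):
    """Join exon sequences; reverse-complement for the minus strand.

    starts, stops: exon bounds, 1-based inclusive, ascending genome order
        (assumptions of this sketch).
    """
    pieces = [chromosome_seq[start - 1:stop] for start, stop in zip(starts, stops)]
    sequence = "".join(pieces)
    if strand == "-":
        sequence = sequence.translate(COMPLEMENT)[::-1]
    return sequence


genome = "AAACCCGGGTTTACGT"   # toy chromosome
plus = spliced_sequence(genome, (1, 7), (3, 9), "+")
minus = spliced_sequence(genome, (1, 7), (3, 9), "-")
print(plus, minus)
```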

get_sequences(chromosomes, coordinates, strands)

Return a dictionary of transcript sequences.

get_transcript(transcript_id)

Return a Transcript object for a single transcript ID.

get_transcripts(transcript_ids)

Return a dictionary of Transcript objects for a list of transcript IDs.

Module contents

class rnavigate.transcriptomics.BedFile(bedfile)

Bases: object

Reads a BED6 file and extracts annotations and profiles.

Parameters

bedfile : str

Path to the bed file.

get_annotation(transcript, **kwargs)

Get annotations for a single transcript.

Parameters

transcript : data.Transcript

The transcript for which annotations are to be extracted.

**kwargs

Additional keyword arguments to be passed to the data.Annotation object.

Returns

data.Annotation

An Annotation object containing the extracted annotations.

get_annotations(transcripts, **kwargs)

Get annotations for a list of transcripts.

Parameters

transcripts

A list of data.Transcript objects for which annotations are to be extracted.

**kwargs

Additional keyword arguments to be passed to the data.Annotation object.

Returns

dict

A dictionary of data.Annotation objects with the transcripts as keys.

get_density_profile(transcript, **kwargs)

Get a density profile for a single transcript.

Parameters

transcript : data.Transcript

The transcript for which the profile is to be extracted.

**kwargs

Additional keyword arguments to be passed to the data.Profile object.

Returns

data.Profile

A Profile object containing the extracted profile.

get_profile(transcript, **kwargs)

Get a profile for a single transcript.

Parameters

transcript : data.Transcript

The transcript for which the profile is to be extracted.

**kwargs

Additional keyword arguments to be passed to the data.Profile object.

Returns

data.Profile

A Profile object containing the extracted profile.

class rnavigate.transcriptomics.NarrowPeak(bedfile)

Bases: BedFile

Reads a narrowPeak (BED6+4) file and extracts annotations and profiles.

Parameters

bedfile : str

Path to the narrowPeak file.

class rnavigate.transcriptomics.Transcript(parent, name, sequence, chromosome, strand, coordinates, tx_info, cds_coors=None, other_features=None)

Bases: Sequence

Transcript object for a single transcript.

Parameters

parent : Transcriptome

Parent Transcriptome object

name : str

Transcript ID

sequence : str

Transcript sequence

chromosome : str

Chromosome ID

strand : str

Strand of the transcript

coordinates : tuple

Tuple of two lists of genome coordinates for the transcript, e.g.: [(start1, start2, …), (stop1, stop2, …)]

tx_info : dict

Dictionary of transcript information from the GTF file

cds_coors : list

List of genome coordinates for the CDS

other_features : list

List of dictionaries of other features from the GTF file

get_cds_annotation(**kwargs)

Return an Annotation object for the CDS.

Parameters

**kwargs

Additional keyword arguments for the Annotation object

get_cds_domains()

Return a Domains object for the 5’ UTR, CDS and 3’ UTR.

get_coordinate_df()

Return a DataFrame of transcript coordinates.

get_exon_annotation(exon_number, **kwargs)

Return an Annotation object for a single exon.

Parameters

exon_number : int

Exon number

**kwargs

Additional keyword arguments for the Annotation object

get_exon_domains()

Return a Domains object for the exons.

get_junctions_annotation(**kwargs)

Return an Annotation object for the exon junctions.

Parameters

**kwargs

Additional keyword arguments for the Annotation object

get_tx_coordinate(coordinate)

Return the transcript coordinate for a genome coordinate.

get_tx_range(start, stop)

Return the transcript coordinates for a genome range.

class rnavigate.transcriptomics.Transcriptome(genome, annotation, path, chr_ids=None)

Bases: object

Transcriptome object for a genome and annotation file.

Parameters

genome : str

Path to the genome fasta file

annotation : str

Path to the annotation gtf file

path : str or Path

Path to the genome and annotation files

chr_ids : dict

Dictionary of chromosome IDs

get_sequence(chromosome, coordinates, strand)

Return a transcript sequence for a single transcript.

get_sequences(chromosomes, coordinates, strands)

Return a dictionary of transcript sequences.

get_transcript(transcript_id)

Return a Transcript object for a single transcript ID.

get_transcripts(transcript_ids)

Return a dictionary of Transcript objects for a list of transcript IDs.

rnavigate.transcriptomics.create_eclip_table(inpath, outpath)

Create a table file to look up eCLIP filenames from target and cell type.

Parameters

inpath : string

input directory path containing eCLIP bed files

outpath : string

output directory path

rnavigate.transcriptomics.download_eclip_peaks(outpath, assembly='GRCh38')

Download eCLIP narrowPeak files from the ENCODE database.

Parameters

outpath : string

output directory path

assembly : “hg19” or “GRCh38”, default: “GRCh38”

reference genome assembly

class rnavigate.transcriptomics.eCLIPDatabase(inpath)

Bases: object

Class to handle eCLIP data and to extract annotations and profiles.

Parameters

inpath : string

input directory path containing eCLIP bed files and eclip table file.

get_annotation(transcript, cell_line, target, **kwargs)

Get eCLIP annotation for a transcript.

Parameters

transcript : data.Transcript

The transcript for which eCLIP annotation is to be extracted.

cell_line : “K562” or “HepG2”

Cell line for which eCLIP annotation is to be extracted.

target : string

Target for which eCLIP annotation is to be extracted.

**kwargs

Additional keyword arguments to be passed to the get_annotation method.

Returns

data.Annotation

An Annotation object containing the eCLIP annotation.

get_cell_target_data(cell_line, target)

Get the eCLIP data for a specific cell line and target.

Parameters

cell_line : “K562” or “HepG2”

Cell line for which eCLIP data is to be extracted.

target : string

Target for which eCLIP data is to be extracted.

Returns

transcriptomics.NarrowPeak

eCLIP data for the specified cell line and target.

get_eclip_data()

Get eCLIP data for all cell lines and targets.

Returns

dict

A dictionary of eCLIP data with cell lines as keys and targets as subkeys.

get_eclip_density(transcript, cell_line, targets=None)

Get eCLIP density profile for a transcript.

Parameters

transcript : data.Transcript

The transcript for which eCLIP density is to be extracted.

cell_line : “K562” or “HepG2”

Cell line for which eCLIP density is to be extracted.

targets : list of strings, optional

Targets for which eCLIP density is to be extracted. By default, all targets are considered.

Returns

data.Profile

A Profile object containing the eCLIP density values.

get_profile(transcript, cell_line, target)

Get eCLIP profile for a transcript.

Parameters

transcript : data.Transcript

The transcript for which eCLIP profile is to be extracted.

cell_line : “K562” or “HepG2”

Cell line for which eCLIP profile is to be extracted.

target : string

Target for which eCLIP profile is to be extracted.

Returns

data.Profile

A Profile object containing the eCLIP profile values.

print_all_peaks(transcript)

Print all eCLIP peaks for a transcript.

print_peaks(transcript, cell_line, target)

Print eCLIP peaks for a given transcript, cell line, and target.