rnavigate.data package

Submodules

rnavigate.data.alignments module

Alignment objects map coordinates, vectors, and dataframes to a new sequence

Classes

BaseAlignment (ABC): abstract base class for alignments
SequenceAlignment (BaseAlignment): aligns one sequence to another sequence
RegionAlignment (BaseAlignment): cuts a sequence between a start and end position
AlignmentChain (BaseAlignment): allows chaining of above alignments

class rnavigate.data.alignments.AlignmentChain(*alignments)

Bases: BaseAlignment

Combines a list of alignments into one.

Parameters

alignments (list of Alignment objects): the alignments to chain together

Attributes

alignments (list): the constituent alignments
starting_sequence (str): starting sequence of alignments[0]
target_sequence (str): target sequence of alignments[-1]
mapping (numpy.array): an array that maps from starting_sequence to target_sequence. Index i of starting_sequence corresponds to index mapping[i] of target_sequence.

get_inverse_alignment(): Alignments require a method to get the inverted alignment

get_mapping()

Combines mappings from each alignment.

Returns

mapping (numpy.array): mapping from the initial starting sequence to the final target sequence. Index i of starting_sequence corresponds to index mapping[i] of target_sequence.
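The composition of mappings described above can be sketched in plain Python. This is a minimal illustration, not RNAvigate's implementation: the function name chain_mappings is hypothetical, mappings are plain lists rather than numpy arrays, and -1 is assumed to mark an unmapped index, as in map_indices.

```python
def chain_mappings(mappings):
    """Compose a list of index mappings into a single mapping.

    Each mapping is a list where mapping[i] is the target index of
    starting index i, or -1 if that index is unmapped (assumed convention).
    """
    combined = list(mappings[0])
    for next_mapping in mappings[1:]:
        # An index that is already unmapped stays unmapped.
        combined = [
            next_mapping[idx] if idx != -1 else -1
            for idx in combined
        ]
    return combined


first = [0, 1, -1, 2]   # 4-nt starting sequence mapped onto 3 nts
second = [2, -1, 0]     # those 3 nts mapped onto a final sequence
print(chain_mappings([first, second]))
```

An index is carried through only if every alignment in the chain maps it; once it drops out, it stays unmapped.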

class rnavigate.data.alignments.BaseAlignment(starting_sequence, target_length)

Bases: ABC

Abstract base class for alignments

Parameters

starting_sequence (string): the sequence to be aligned
target_length (int): the length of the target sequence

Attributes

starting_sequence (string): the beginning sequence
mapping (numpy.array): the alignment map array. Index i of starting_sequence corresponds to index mapping[i] of target_sequence.
target_sequence (string): the portion of starting sequence that is mapped
target_length (integer): the length of the target sequence

abstractmethod get_inverse_alignment(): Alignments require a method to get the inverted alignment

abstractmethod get_mapping(): Alignments require a mapping from starting to target sequence

get_target_sequence(): Gets the portion of starting sequence that fits the alignment

map_dataframe(dataframe, position_columns)

Takes a dataframe and maps position columns to target sequence.

Rows with unmapped positions are dropped.

Parameters

dataframe (pandas.DataFrame): a dataframe with position columns
position_columns (list of str): a list of columns containing positions to map

Returns

pandas.DataFrame: a new dataframe (copy) with position columns mapped or dropped

map_indices(indices, keep_minus_one=True)

Takes a list of indices (0-indexed) and maps them to the target sequence.

Parameters

indices (int or list of int): a single integer index or a list of integer indices
keep_minus_one (bool, defaults to True): whether to keep unmapped starting sequence indices (-1) in the returned array.

Returns

numpy.array: the equivalent indices in target sequence

map_nucleotide_dataframe(dataframe, position_column='Nucleotide', sequence_column='Sequence')

Takes a per-nucleotide dataframe and maps it to the target sequence.

The dataframe must have one row per nucleotide in the starting sequence, with a position column and a sequence column. The returned dataframe has the same format, but for target sequence nucleotides and positions.

Parameters

dataframe (pandas.DataFrame): a per-nucleotide dataframe
position_column (string, defaults to “Nucleotide”): name of the position column.
sequence_column (string, defaults to “Sequence”): name of the sequence column.

Returns

pandas.DataFrame: a new dataframe (copy) mapped to target sequence. Unmapped starting sequence positions are dropped and unmapped target sequence positions are filled.

map_positions(positions, keep_zero=True)

Takes a list of positions (1-indexed) and maps them to the target sequence.

Parameters

positions (int or list of int): a single integer position or a list of integer positions
keep_zero (bool, defaults to True): whether to keep unmapped starting sequence positions (0) in the returned array.

Returns

numpy.array: the equivalent positions in target sequence

map_values(values, fill=nan)

Takes an array of values equal in length to the starting sequence and maps it to the target sequence. Unmapped positions in the starting sequence are dropped, and unmapped positions in the target sequence are filled with the fill value.

Parameters

values (iterable): values to map to target sequence.
fill (any, defaults to np.nan): a value for unmapped positions in target sequence.

Returns

numpy.array: an array of values equal in length to target sequence
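The behavior documented for map_positions and map_values can be illustrated with a small sketch. It assumes plain lists instead of numpy arrays and a mapping in which -1 marks an unmapped index. The function names mirror the documented methods, but the bodies are illustrative guesses, not the library's code.

```python
import math


def map_positions(mapping, positions, keep_zero=True):
    """Map 1-indexed positions; unmapped positions come back as 0."""
    # mapping is 0-indexed, so shift down, look up, and shift back up.
    mapped = [mapping[pos - 1] + 1 for pos in positions]
    if not keep_zero:
        mapped = [pos for pos in mapped if pos != 0]
    return mapped


def map_values(mapping, values, target_length, fill=math.nan):
    """Place per-nucleotide values at their target indices."""
    result = [fill] * target_length
    for start_idx, target_idx in enumerate(mapping):
        if target_idx != -1:  # drop unmapped starting positions
            result[target_idx] = values[start_idx]
    return result


mapping = [0, -1, 2]  # hypothetical: 3-nt start, 4-nt target
print(map_positions(mapping, [1, 2, 3]))
print(map_values(mapping, [0.5, 0.7, 0.9], target_length=4, fill=None))
```

Because an unmapped index is -1, adding 1 turns it into position 0. That is why 0 can serve as the "unmapped" marker for 1-indexed positions.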

class rnavigate.data.alignments.SequenceAlignment(sequence1, sequence2, align_kwargs=None, full=False, use_previous=True)

Bases: BaseAlignment

The most useful feature of RNAvigate. Maps positions from one sequence to a totally different sequence using user-defined pairwise alignment or automatic pairwise alignment.

Parameters

sequence1 (string): the sequence to be aligned
sequence2 (string): the sequence to align to
align_kwargs (dict, defaults to None): a dictionary of arguments to pass to pairwise2.align.globalms
full (bool, defaults to False): whether to keep unmapped starting sequence positions.
use_previous (bool, defaults to True): whether to use previously set alignments

Attributes

sequence1 (str): the sequence to be aligned
sequence2 (str): the sequence to align to
alignment1 (str): the alignment string matching sequence1 to sequence2
alignment2 (str): the alignment string matching sequence2 to sequence1
starting_sequence (str): sequence1
target_sequence (str): sequence2 if full is False, else alignment2
mapping (numpy.array): the alignment map array. Index i of starting_sequence corresponds to index mapping[i] of target_sequence.

get_alignment()

Gets an alignment that has either been user-defined or previously calculated or produces a new pairwise alignment between two sequences.

Returns

alignment1, alignment2 (tuple of 2 str): the alignment strings matching sequence1 and sequence2, respectively.

get_inverse_alignment(): Gets an alignment that maps from sequence2 to sequence1.

get_mapping()

Calculates a mapping from starting sequence to target sequence.

Returns

mapping (numpy.array): an array that maps each index of the starting sequence to an index of the target sequence. Index i of starting_sequence corresponds to index mapping[i] of target_sequence.
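As an illustration of how such a mapping can be derived from two gapped alignment strings, here is a minimal sketch. It assumes the full=False case, where the target is the ungapped sequence2, and it uses -1 for starting indices aligned to a gap. The function name is hypothetical and this is not RNAvigate's code.

```python
def mapping_from_alignment(alignment1, alignment2):
    """Build a start-to-target index mapping from gapped alignment strings."""
    mapping = []
    target_idx = 0
    for nt1, nt2 in zip(alignment1, alignment2):
        if nt1 != "-":
            # A starting nucleotide maps to the current target index,
            # or to -1 when it is aligned to a gap in the target.
            mapping.append(target_idx if nt2 != "-" else -1)
        if nt2 != "-":
            target_idx += 1
    return mapping


print(mapping_from_alignment("AC-GU", "A-UGU"))
```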

print(print_format='full')

Print the alignment in a human-readable format.

Parameters

print_format (“full”, “cigar”, “long”, or “short”, defaults to “full”): how to format the alignment.
“full”: the full-length alignment with changes labeled “X”
“cigar”: the CIGAR string
“long”: locations and sequences of each change
“short”: total number of matches, mismatches, and indels

print_all_changes(): Print location and sequence of all changes.

print_cigar(): Print the CIGAR string

print_number_of_changes(): Print the total numbers of matches, mismatches, and indels.

class rnavigate.data.alignments.StructureAlignment(sequence1, sequence2, structure1=None, structure2=None, full=False)

Bases: BaseAlignment

Experimental secondary structure alignment based on RNAlign2D algorithm (https://doi.org/10.1186/s12859-021-04426-8)

Parameters

sequence1 (string): the sequence to be aligned
sequence2 (string): the sequence to align to
structure1 (string, defaults to None): the secondary structure of sequence1
structure2 (string, defaults to None): the secondary structure of sequence2
full (bool, defaults to False): whether to align to full length of sequence2 or just mapped length

Attributes

sequence1 (str): the sequence to be aligned
sequence2 (str): the sequence to align to
structure1 (str): the secondary structure of sequence1
structure2 (str): the secondary structure of sequence2
alignment1 (str): the alignment string matching sequence1 to sequence2
alignment2 (str): the alignment string matching sequence2 to sequence1
starting_sequence (str): sequence1
target_sequence (str): sequence2 if full is False, else alignment2
mapping (numpy.array): the alignment map array. Index i of starting_sequence corresponds to index mapping[i] of target_sequence.

get_alignment()

Aligns pseudo-amino-acid sequences according to RNAlign2D rules.

Returns

alignment1, alignment2 (tuple of 2 str): the alignment strings matching sequence1 and sequence2, respectively.

get_inverse_alignment(): Gets an alignment that maps from sequence2 to sequence1.

get_mapping()

Calculates a mapping from starting sequence to target sequence.

Returns

mapping (numpy.array): an array that maps indices to the target sequence. starting_sequence[idx] == target_sequence[self.mapping[idx]]

set_as_default_alignment(): Set this as the default alignment between sequence1 and sequence2.

rnavigate.data.alignments.convert_sequence(aas, nts, dbn)

Convert pseudo-amino-acid sequence to nucleotide and dotbracket or vice versa.

Parameters

aas (string or True): the amino acid sequence. If True, returns the amino acid translation of nts and dbn.
nts (string or True): the nucleotide sequence. If True, returns the nucleotide translation of aas.
dbn (string or True): the dot-bracket notation string. If True, returns the dot-bracket translation of aas.

Returns

string: sequence of the specified translation. If nts and dbn are True, returns a tuple.

Example

convert_sequence(aas="ACDEFGHIKLMNPQRSTVWY", nts=True, dbn=True) returns ("AAAAACCCCCUUUUUGGGGG", "([.])([.])([.])([.])")
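The documented example implies a one-to-one lookup between each pseudo-amino-acid letter and a (nucleotide, dot-bracket character) pair. The sketch below builds that lookup directly from the three example strings. The helper names are hypothetical, and whether RNAvigate implements the conversion this way is an assumption.

```python
# The three strings from the documented example, position by position.
AAS = "ACDEFGHIKLMNPQRSTVWY"
NTS = "AAAAACCCCCUUUUUGGGGG"
DBN = "([.])([.])([.])([.])"

AA_TO_PAIR = {aa: (nt, db) for aa, nt, db in zip(AAS, NTS, DBN)}
PAIR_TO_AA = {pair: aa for aa, pair in AA_TO_PAIR.items()}


def to_pseudo_amino_acids(nts, dbn):
    """Encode a nucleotide sequence and its structure as one string."""
    return "".join(PAIR_TO_AA[(nt, db)] for nt, db in zip(nts, dbn))


def from_pseudo_amino_acids(aas):
    """Decode a pseudo-amino-acid string into (nts, dbn)."""
    pairs = [AA_TO_PAIR[aa] for aa in aas]
    nts = "".join(nt for nt, _ in pairs)
    dbn = "".join(db for _, db in pairs)
    return nts, dbn


encoded = to_pseudo_amino_acids("GGAAACC", "((...))")
print(encoded, from_pseudo_amino_acids(encoded))
```

Encoding sequence and structure in a single alphabet lets a standard pairwise aligner score both at once, which appears to be the idea behind the RNAlign2D approach cited for StructureAlignment.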

rnavigate.data.alignments.lookup_alignment(sequence1, sequence2, t_or_u='U')

Look up a previously set alignment in the _alignments_cache.

Parameters

sequence1 (string): the first sequence to align
sequence2 (string): the second sequence to be aligned to
t_or_u (“T”, “U”, or False, defaults to “U”): “T” converts “U”s to “T”s, “U” converts “T”s to “U”s, False does nothing

Returns

dictionary, if an alignment is found, otherwise None

{“seqA”: sequence1 with gap characters representing alignment, “seqB”: sequence2 with gap characters representing alignment}

rnavigate.data.alignments.set_alignment(sequence1, sequence2, alignment1, alignment2, t_or_u='U')

Add an alignment to be used as the default between two sequences.

When objects with these sequences are aligned for visualization, RNAvigate uses this alignment instead of an automated pairwise sequence alignment. Alignment 1 and 2 must have matching lengths. alignment(1,2) and sequence(1,2) must differ only by dashes “-”.

e.g.:

sequence1 ="AAGCUUCGGUACAUGCAAGAUGUAC"
sequence2 ="AUCGAUCGAGCUGCUGUGUACGUAC"
alignment1="AAGCUUCG---------GUACAUGCAAGAUGUAC"
alignment2="AUCGAUCGAGCUGCUGUGUAC---------GUAC"
             |mm|   | indel |    | indel |

Parameters

sequence1 (string): the first sequence
sequence2 (string): the second sequence
alignment1 (string): first sequence, plus dashes “-” indicating indels
alignment2 (string): second sequence, plus dashes “-” indicating indels
t_or_u (“T”, “U”, or False, defaults to “U”): “T” converts “U”s to “T”s, “U” converts “T”s to “U”s, False does nothing
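The two requirements stated above (matching alignment lengths, and alignments that differ from their sequences only by dashes) can be checked before calling set_alignment. The helper below is a hypothetical sketch, not part of RNAvigate. The example strings follow the documentation, with each gap run written as nine hyphens, which is an assumption about the original example.

```python
def check_alignment(sequence1, sequence2, alignment1, alignment2):
    """Return True if the alignment strings are consistent with the sequences."""
    if len(alignment1) != len(alignment2):
        return False
    # Removing gap characters must recover the original sequences exactly.
    return (
        alignment1.replace("-", "") == sequence1
        and alignment2.replace("-", "") == sequence2
    )


sequence1 = "AAGCUUCGGUACAUGCAAGAUGUAC"
sequence2 = "AUCGAUCGAGCUGCUGUGUACGUAC"
alignment1 = "AAGCUUCG" + "-" * 9 + "GUACAUGCAAGAUGUAC"
alignment2 = "AUCGAUCGAGCUGCUGUGUAC" + "-" * 9 + "GUAC"
print(check_alignment(sequence1, sequence2, alignment1, alignment2))
```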

rnavigate.data.alignments.set_multiple_sequence_alignment(fasta, set_pairwise=False)

Set alignments from a multiple sequence alignment Pearson fasta file.

Sets alignments to a base sequence, then returns the base sequence to be used when a multiple sequence alignment plot is desired. Also sets all pairwise alignments, if desired. When setting pairwise alignments, dashes that are shared between pairwise sequences are removed first.

Parameters

fasta (string): location of Pearson fasta file
set_pairwise (bool, defaults to False): whether to set every pairwise alignment as well as the multiple sequence alignment.
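The removal of shared dashes mentioned above can be sketched as follows. When two rows are taken from a multiple sequence alignment, any column that is a gap in both rows carries no pairwise information and can be dropped. The function name is hypothetical and this is only an illustration of the idea.

```python
def remove_shared_gaps(aligned1, aligned2):
    """Drop alignment columns where both sequences have a gap."""
    kept = [
        (nt1, nt2)
        for nt1, nt2 in zip(aligned1, aligned2)
        if not (nt1 == "-" and nt2 == "-")
    ]
    return (
        "".join(nt1 for nt1, _ in kept),
        "".join(nt2 for _, nt2 in kept),
    )


print(remove_shared_gaps("AC--GU", "A-C-GU"))
```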

rnavigate.data.annotation module

annotation.py contains the Annotation class and its subclasses.

class rnavigate.data.annotation.Annotation(input_data, annotation_type, sequence, name=None, color='blue')

Bases: Sequence

Basic annotation class to store 1D features of an RNA sequence

Each feature type must be a separate instance. Feature types include:

a group of separated nucleotides (e.g. binding pocket)
regions of interest (e.g. coding sequence, Alu elements)
sites of interest (e.g. m6A locations)
primer binding sites

Parameters

input_data (list)

List will be treated according to the annotation_type argument. Expected behavior for each value of annotation_type:

“sites” or “group”: 1-indexed locations of sites of interest. Example: [1, 10, 20, 30] is four sites: 1, 10, 20, and 30.
“spans”: 1-indexed, inclusive locations of spans of interest. Example: [[1, 10], [20, 30]] is two spans, 1 to 10 and 20 to 30.
“primers”: similar to spans, but 5’/3’ direction is preserved. Example: [[1, 10], [30, 20]] is a forward primer from 1 to 10 and a reverse primer from 30 to 20.

annotation_type (“group”, “sites”, “spans”, or “primers”)

The type of annotation.

sequence (str or pandas.DataFrame)

Nucleotide sequence, path to fasta file, or dataframe containing a “Sequence” column.

name (str, defaults to None)

Name of annotation.

color (matplotlib color-like, defaults to “blue”)

Color to be used for displaying this annotation on plots.

Attributes

data (pandas.DataFrame): Stores the list of sites or regions
name (str): The label for this annotation for use on plots
color (valid matplotlib color): Color to represent annotation on plots
sequence (str): The reference sequence string

property boolean: Return a boolean array of the annotation on the sequence.

classmethod from_boolean_array(values, sequence, annotation_type, name, color='blue', window=1)

Create an Annotation from an array of boolean values.

True values are used to create the Annotation.

Parameters

values (list of True or False): the boolean array
sequence (string or rnav.data.Sequence): the sequence of the Annotation
annotation_type (“spans”, “sites”, “primers”, or “group”): the type of the new annotation. If “spans” or “primers”, adjacent True values, or values within window, are collapsed into a region.
name (string): a name for labelling the annotation.
color (string, defaults to “blue”): a color for plotting the annotation
window (integer, defaults to 1): a window around True values to include in the annotation.

Returns

rnavigate.data.Annotation: the new Annotation
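To make the relationship between a boolean array and “spans” concrete, the sketch below collapses runs of adjacent True values into 1-indexed, inclusive spans and converts spans back to a boolean list. Both helper names are hypothetical and the window argument is deliberately ignored, so this only approximates what from_boolean_array and the boolean property are described as doing.

```python
def boolean_to_spans(values):
    """Collapse runs of True values into 1-indexed, inclusive [start, end] spans."""
    spans = []
    start = None
    for position, value in enumerate(values, start=1):
        if value and start is None:
            start = position                     # a new run begins
        elif not value and start is not None:
            spans.append([start, position - 1])  # the run just ended
            start = None
    if start is not None:                        # run reaches the last position
        spans.append([start, len(values)])
    return spans


def spans_to_boolean(spans, length):
    """Mark every position covered by a span as True."""
    values = [False] * length
    for start, end in spans:
        for position in range(start, end + 1):
            values[position - 1] = True
    return values


print(boolean_to_spans([True, True, False, True, False, False, True]))
```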

from_sites(sites): Create the self.data dataframe from a list of sites.

from_spans(spans): Create the self.data dataframe from a list of spans.

get_aligned_data(alignment)

Aligns this Annotation to a new sequence and returns a copy.

Parameters

alignment (rnavigate.data.Alignment): Alignment object used to align to a new sequence.

Returns

rnavigate.data.Annotation: A new Annotation with the same name, color, and annotation type, but with the input data aligned to the target sequence.

get_sites()

Returns a list of nucleotide positions included in this annotation.

Returns

sites (tuple): the nucleotide positions

get_subsequences(buffer=0)

class rnavigate.data.annotation.Motif(input_data, sequence, name=None, color='blue')

Bases: Annotation

Automatically annotates the occurrences of a sequence motif as spans.

Parameters

input_data (str): sequence motif to search for. Uses conventional nucleotide codes, e.g. “DRACH” = [AGTU] [AG] A C [ATUC]
sequence (str or pandas.DataFrame): Nucleotide sequence, path to fasta file, or dataframe containing a “Sequence” column.
name (str, defaults to None): Name of annotation.
color (matplotlib color-like, defaults to “blue”): Color to be used for displaying this annotation on plots.

Attributes

data (pandas.DataFrame): Stores the list of regions that match the motif
name (str): The label for this annotation for use on plots
color (valid matplotlib color): Color to represent annotation on plots
sequence (str): The reference sequence string

get_aligned_data(alignment)

Searches the new sequence for the motif and returns a new Motif annotation.

Parameters

alignment (rnavigate.data.Alignment): Alignment object used to align to a new sequence.

Returns

rnavigate.data.Motif: A new Motif with the same name, color, and motif but with the input data aligned to the target sequence.

get_spans_from_motif(sequence, motif)

Returns a list of spans for each location of motif found within sequence.

Parameters

sequence (string): sequence to be searched
motif (string): sequence motif to search for.

Returns

spans (list of lists): list of [start, end] positions of each motif occurrence
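One way to find motif spans is to translate the nucleotide codes into a regular expression, as the “DRACH” example suggests. The sketch below is an assumption about the approach, not the library's code. The IUPAC table, the helper names, and the choice to report overlapping matches are all illustrative.

```python
import re

# IUPAC nucleotide codes; T and U are treated as interchangeable here.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "TU", "U": "TU",
    "R": "AG", "Y": "CTU", "S": "CG", "W": "ATU", "K": "GTU", "M": "AC",
    "B": "CGTU", "D": "AGTU", "H": "ACTU", "V": "ACG", "N": "ACGTU",
}


def motif_to_regex(motif):
    """Translate a motif such as DRACH into a regular expression string."""
    return "".join(f"[{IUPAC[code]}]" for code in motif.upper())


def get_spans(sequence, motif):
    """Return 1-indexed, inclusive [start, end] spans for each motif match."""
    # A lookahead keeps matches zero-width so overlapping hits are found.
    pattern = re.compile(f"(?=({motif_to_regex(motif)}))")
    return [
        [match.start() + 1, match.start() + len(motif)]
        for match in pattern.finditer(sequence.upper())
    ]


print(get_spans("GGACUAAGACA", "DRACH"))
```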

class rnavigate.data.annotation.ORFs(input_data, name=None, sequence=None, color='blue')

Bases: Annotation

Automatically annotates occurrences of open reading frames as spans.

Parameters

input_data (“longest” or “all”): which ORFs to annotate. “longest” annotates the longest ORF. “all” annotates all potential ORFs.
sequence (str or pandas.DataFrame): Nucleotide sequence, path to fasta file, or dataframe containing a “Sequence” column.
name (str, defaults to None): Name of annotation.
color (matplotlib color-like, defaults to “blue”): Color to be used for displaying this annotation on plots.

Attributes

data (pandas.DataFrame): Stores the list of ORF regions
name (str): The label for this annotation for use on plots
color (valid matplotlib color): Color to represent annotation on plots
sequence (str): The reference sequence string

get_aligned_data(alignment)

Searches the new sequence for ORFs and returns a new ORFs annotation.

Parameters

alignment (rnavigate.data.Alignment): Alignment object used to align to a new sequence.

Returns

rnavigate.data.ORFs: A new ORFs annotation with the same name, color, and input_data but with the input data aligned to the target sequence.

get_spans_from_orf(sequence, which='all')

Given a sequence string, returns spans for specified ORFs

Parameters

sequence (string): RNA nucleotide sequence
which (“longest” or “all”, defaults to “all”): “all” returns all spans, “longest” returns the longest span

Returns

list of tuples: (start, end) positions of each ORF, 1-indexed and inclusive
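A simple ORF search can be sketched as follows. It assumes that an ORF starts at an AUG, ends at the first in-frame stop codon (UAA, UAG, or UGA), and that the reported span includes the stop codon. These are illustrative assumptions, and RNAvigate's own definition may differ. The function name find_orfs is hypothetical.

```python
STOP_CODONS = {"UAA", "UAG", "UGA"}


def find_orfs(sequence, which="all"):
    """Return 1-indexed, inclusive (start, end) spans of candidate ORFs."""
    sequence = sequence.upper().replace("T", "U")
    spans = []
    for start in range(len(sequence) - 2):
        if sequence[start:start + 3] != "AUG":
            continue
        # Walk codon by codon, in frame, until the first stop codon.
        for pos in range(start + 3, len(sequence) - 2, 3):
            if sequence[pos:pos + 3] in STOP_CODONS:
                spans.append((start + 1, pos + 3))
                break
    if which == "longest" and spans:
        return [max(spans, key=lambda span: span[1] - span[0])]
    return spans


print(find_orfs("GGAUGAAAUAGCC"))
```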

rnavigate.data.annotation.domains(input_data, names, colors, sequence)

Create a list of Annotations from a list of spans.

Currently, domains functionality in RNAvigate just uses a list of spans. In the future, this should be a dedicated class. Generally, domains should cover an entire sequence without overlap, but this is not enforced. e.g. [[1, 100], [101, 200]] for a 200 nt sequence.

Parameters

input_data (list of lists): list of spans for each domain
names (list of strings): list of names for each domain
colors (list of valid matplotlib colors): list of colors for each domain
sequence (string): sequence to be annotated

Returns

list of rnavigate.data.Annotation: list of Annotations

rnavigate.data.colors module

class rnavigate.data.colors.ScalarMappable(cmap, normalization, values, title='', tick_labels=None, **cbar_args)

Bases: _ScalarMappable

Used to map scalar values to a color and to create a colorbar plot.

Parameters

cmap (str, tuple, float, or list): A valid mpl color, list of valid colors, or a valid colormap name
normalization (“min_max”, “0_1”, “none”, or “bins”): The type of normalization to use when mapping values to colors
values (list): The values to use when normalizing the data
title (str, defaults to “”): The title of the colorbar.
tick_labels (list, defaults to None): The labels to use for the colorbar ticks. If None, values are determined automatically.
**cbar_args (dict): Additional arguments to pass to the colorbar function

Attributes

rnav_norm (str): The type of normalization to use when mapping values to colors
rnav_vals (list): The values to use when normalizing the data
rnav_cmap (list): The colors to use when mapping values to colors
cbar_args (dict): Additional arguments to pass to the colorbar function
tick_labels (list): The labels to use for the colorbar ticks. If None, values are determined automatically.
title (str): The title of the colorbar.

get_cmap(cmap)

Converts a cmap specification to a matplotlib colormap object.

Parameters

cmap (string, tuple, float, or list): A valid mpl color, list of valid colors, or a valid colormap name

Returns

matplotlib colormap: a colormap matching the input

get_norm(normalization, values, cmap)

Given a normalization type and values, return a normalization object.

Parameters

normalization (“min_max”, “0_1”, “none”, or “bins”): The type of normalization to use when mapping values to colors
values (list): The values to use when normalizing the data
cmap (matplotlib colormap): The colormap to use when normalizing the data

Returns

matplotlib.colors normalization object: Used to normalize data before mapping to colors

is_equivalent_to(cmap2)

Check if two ScalarMappable objects are equivalent.

Parameters

cmap2 (ScalarMappable): The ScalarMappable object to compare to

Returns

bool: True if the two ScalarMappable objects are equivalent, False otherwise

values_to_hexcolors(values, alpha=1.0)

Map values to colors and return a list of hex colors.

Parameters

values (list): The values to map to colors
alpha (float, defaults to 1.0): The alpha value to use for the colors

Returns

list of strings: A list of hex colors

rnavigate.data.data module

Classes for storing and manipulating data for RNAvigate.

This module contains the base classes for RNAvigate data classes:

Sequence: represents a nucleotide sequence
Data: represents a data table with a sequence

class rnavigate.data.data.Data(input_data, sequence, metric, metric_defaults, read_table_kw=None, name=None)

Bases: Sequence

The base class for RNAvigate Profile and Interactions classes.

Parameters

input_data (pandas.DataFrame or str): a pandas dataframe or path to a data file
sequence (string or rnavigate.data.Sequence): the sequence to use for the data
metric (string or dict): the column of the dataframe to use as the default metric to visualize
metric_defaults (dict): a dictionary of metric defaults
read_table_kw (dict, optional): kwargs dictionary passed to pd.read_table
name (string, optional): the name of the data, defaults to None

Attributes

data (pandas.DataFrame): the data table
filepath (string): the path to the data file
sequence (string or rnavigate.data.Sequence): the sequence to use for the data
metric (string or dict): the column of the dataframe to use as the metric to visualize
metric_defaults (dict): a dictionary of metric values and default settings for visualization
default_metric (string): the default metric to use for visualization

add_metric_defaults(metric_defaults): Add metric defaults to self.metric_defaults

property cmap: Get the colormap to use for colorbars and to retrieve colors.

property color_column: Get the column of the dataframe to use as the color for visualization.

property colors: Get one matplotlib color-like value for each nucleotide in self.sequence.

property error_column: Get the column of the dataframe to use as the error for visualization.

property metric: Get the column of the dataframe to use as the metric for visualization.

read_file(filepath, read_table_kw)

Convert data file to pandas dataframe and store as self.data

Parameters

filepath (string): path to data file containing interactions
read_table_kw (dict): kwargs dictionary passed to pd.read_table

Returns

dataframe (pandas.DataFrame): the data table

class rnavigate.data.data.Sequence(input_data, name=None, entry=0)

Bases: object

A class for storing and manipulating RNA sequences.

Parameters

input_data (string or pandas.DataFrame): sequence string, path to a fasta file, or a pandas dataframe containing a “Sequence” column
name (string, optional): The name of the sequence, defaults to None
entry (int, defaults to 0): The index of the sequence in the fasta file if a fasta file is provided

Attributes

sequence (string): The sequence string
name (string): The name of the sequence
other_info (dict): A dictionary of additional information about the sequence
null_alignment (SequenceAlignment): An alignment of the sequence to itself

get_aligned_data(alignment)

Get a copy of the sequence positionally aligned to another sequence.

Parameters

alignment (rnavigate.data.Alignment): the alignment to use

Returns

aligned_sequence (rnavigate.data.Sequence): the aligned sequence

get_colors(source, pos_cmap='rainbow', profile=None, structure=None, annotations=None)

Get colors and colormap representing information about the sequence.

Parameters

source (str, list, or matplotlib color-like)

The source of the color information. If a string, must be one of “sequence”, “position”, “profile”, “structure”, or “annotations”. If a list, must be a list of matplotlib color-like values; colormap will be None. If a matplotlib color-like value, all nucleotides will be colored that color; colormap will be None.

pos_cmap (str, defaults to “rainbow”)

The cmap used for position colors if source is “position”.

profile (rnavigate.data.Profile, optional)

The profile to use to get colors if source is “profile”.

structure (rnavigate.data.SecondaryStructure, optional)

The structure to use to get colors if source is “structure”.

annotations (list of rnavigate.data.Annotations, optional)

The annotations to use to get colors if source is “annotations”.

Returns

colors (numpy array): one matplotlib color-like value for each nucleotide in self.sequence
colormap (rnavigate.data.ScalarMappable): a colormap used for creating a colorbar

get_colors_from_annotations(annotations, default_color='gray')

Get colors and colormap representing sequence annotations.

Parameters

annotations (list of rnavigate.data.Annotations): the annotations to use to get colors.
default_color (matplotlib color-like, defaults to “gray”): the color to use for nucleotides not in any annotation

Returns

colors (numpy array): one matplotlib color-like value for each nucleotide in self.sequence
colormap (rnavigate.data.ScalarMappable): a colormap used for creating a colorbar

get_colors_from_positions(pos_cmap='rainbow')

Get colors and colormap representing the nucleotide position.

Parameters

pos_cmap (str, defaults to “rainbow”): cmap used for position colors

Returns

colors (numpy array): one matplotlib color-like value for each nucleotide in self.sequence
colormap (rnavigate.data.ScalarMappable): a colormap used for creating a colorbar

get_colors_from_profile(profile)

Get colors and colormap representing per-nucleotide data.

Parameters

profile (rnavigate.data.Profile): the profile to use to get colors.

Returns

colors (numpy array): one matplotlib color-like value for each nucleotide in self.sequence
colormap (rnavigate.data.ScalarMappable): a colormap used for creating a colorbar

get_colors_from_sequence()

Get colors and colormap representing the nucleotide sequence.

Returns

colors (numpy array): one matplotlib color-like value for each nucleotide in self.sequence
colormap (rnavigate.data.ScalarMappable): a colormap used for creating a colorbar

get_colors_from_structure(structure)

Get colors and colormap representing base-pairing status.

Parameters

structure (rnavigate.data.SecondaryStructure): the structure to use to get colors.

Returns

colors (numpy array): one matplotlib color-like value for each nucleotide in self.sequence
colormap (rnavigate.data.ScalarMappable): a colormap used for creating a colorbar

get_region(region='all')

Checks region input for validity and returns start and end positions.

If region is “all”, returns 1, self.length. Otherwise, ensures that region is between these values and returns the values, sorted.

Parameters

region (list of 2 int, defaults to “all”): start and end positions of the region

Returns

start, end (int, int): the starting and ending positions
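The region check described above might look like the following sketch. It takes the sequence length as an explicit argument instead of reading self.length, and it raises a ValueError for out-of-range input. Both choices are assumptions, since the documentation does not say whether invalid regions raise or are clamped. The function name is hypothetical.

```python
def get_region_sketch(length, region="all"):
    """Validate a region and return sorted (start, end) positions."""
    if region == "all":
        return 1, length
    start, end = sorted(region)
    if start < 1 or end > length:
        # Assumed behavior: reject regions outside 1..length.
        raise ValueError(f"region must lie between 1 and {length}")
    return start, end


print(get_region_sketch(100))
print(get_region_sketch(100, [50, 10]))
```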

get_region_data(region='all')

Get a copy of the data object containing only the specified region.

Parameters

region (list of 2 int, defaults to “all”): start and end positions of the region

Returns

region_data (rnavigate.data.Sequence): the sequence containing only the specified region

get_seq_from_dataframe(dataframe)

Parse a dataframe for the sequence string, store as self.sequence.

Parameters

dataframe (pandas.DataFrame): must contain a “Sequence” column

property length

Get the length of the sequence

Returns

length (int): the length of self.sequence

normalize_sequence(t_or_u='U', uppercase=True)

Converts sequence to all uppercase nucleotides and corrects T or U.

Parameters

t_or_u (“T”, “U”, or False, defaults to “U”): “T” converts “U”s to “T”s, “U” converts “T”s to “U”s, False does nothing.
uppercase (bool, defaults to True): Whether to make sequence all uppercase

read_fasta(fasta, entry)

Parse a fasta file for the sequence at the given entry index.

Parameters

fasta (string): path to fasta file
entry (int): the index of the sequence in the fasta file

Returns

sequence (string): the sequence string

write_fasta(file, name)

Write the sequence to a fasta file.

Parameters

file (string): path to output fasta file
name (string): the name of the sequence to write in the fasta file

rnavigate.data.data.normalize_sequence(sequence, t_or_u='U', uppercase=True)

Returns sequence as all uppercase nucleotides and/or corrects T or U.

Parameters

sequence (string or RNAvigate Sequence): The sequence. If given an RNAvigate Sequence, the sequence string is retrieved.
t_or_u (“T”, “U”, or False, defaults to “U”): “T” converts “U”s to “T”s, “U” converts “T”s to “U”s, False does nothing
uppercase (bool, defaults to True): Whether to make sequence all uppercase

Returns

string: the cleaned-up sequence string
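The documented behavior can be reproduced with plain string methods. The sketch below handles only string input (not RNAvigate Sequence objects) and uses a distinct, hypothetical name so it is not mistaken for the library function.

```python
def normalize_sequence_sketch(sequence, t_or_u="U", uppercase=True):
    """Uppercase a sequence and/or convert between T and U."""
    if uppercase:
        sequence = sequence.upper()
    if t_or_u == "U":
        # Convert T to U, preserving case if uppercase is False.
        sequence = sequence.replace("T", "U").replace("t", "u")
    elif t_or_u == "T":
        sequence = sequence.replace("U", "T").replace("u", "t")
    return sequence


print(normalize_sequence_sketch("acgt"))
print(normalize_sequence_sketch("ACGU", t_or_u="T"))
```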

rnavigate.data.interactions module

class rnavigate.data.interactions.AllPossible(sequence, metric='data', input_data=None, metric_defaults=None, read_table_kw=None, window=1, name=None)

Bases: Interactions

A class for storing and manipulating all possible interactions.

Parameters

sequence (string or rnavigate.data.Sequence)

The sequence string corresponding to the pairing probability data.

metric (string, defaults to “data”)

The column name to use for visualization.

metric_defaults (dict)

Keys are metric names and values are dictionaries of metric-specific defaults. These defaults include:

“metric_column” (string): the column name to use for visualization
“cmap” (string or matplotlib.colors.Colormap): the colormap to use for visualization
“normalization” (“min_max”, “0_1”, “none”, or “bins”): the type of normalization to use when mapping values to colors
“values” (list of float): the values to use with normalization of the data
“title” (string): the title to use for colorbars
“extend” (“min”, “max”, “both”, or “neither”): which ends to extend when drawing the colorbar
“tick_labels” (list of string): the labels to use for the colorbar ticks

read_table_kw (dict, optional)

kwargs passed to pandas.read_table() when reading input_data.

window (int, defaults to 1)

The window size used to generate the pairing probability data.

name (str, optional)

A name for the AllPossible object.

Attributes

data (pandas.DataFrame): The pairing probability data.

class rnavigate.data.interactions.Interactions(input_data, sequence, metric, metric_defaults, read_table_kw=None, window=1, name=None)

Bases: Data

A class for storing and manipulating interactions data.

Parameters

input_data (string or pandas.DataFrame)

If string, a path to a file containing interactions data. If dataframe, the dataframe containing interactions data. The dataframe must contain columns “i”, “j”, and self.metric. Dataframe may also include other columns.

sequence (string or rnavigate.data.Sequence)

The sequence string corresponding to the interactions data.

metric (string)

The column name to use for visualization.

metric_defaults (dict)

Keys are metric names and values are dictionaries of metric-specific defaults. These defaults include:

“metric_column” (string): the column name to use for visualization
“cmap” (string or matplotlib.colors.Colormap): the colormap to use for visualization
“normalization” (“min_max”, “0_1”, “none”, or “bins”): the type of normalization to use when mapping values to colors
“values” (list of float): the values to use with normalization of the data
“title” (string): the title to use for colorbars
“extend” (“min”, “max”, “both”, or “neither”): which ends to extend when drawing the colorbar
“tick_labels” (list of string): the labels to use for the colorbar ticks

read_table_kw (dict)

kwargs passed to pandas.read_table() when reading input_data.

window (int)

The window size used to generate the interactions data.

name (str)

The name of the data object.

Attributes

data (pandas.DataFrame): The interactions data.
window (int): The window size that is being represented by i-j pairs.

copy(apply_filter=False)

Returns a copy of the interactions, optionally with masked rows removed.

Parameters

apply_filter (bool, defaults to False): If True, masked rows (“mask” == False) are dropped.

Returns

rnavigate.data.Interactions: A copy of the interactions.
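The mask-then-copy pattern can be illustrated without pandas. In this sketch the interactions table is a list of dictionaries, each with “i”, “j”, and a boolean “mask” entry. The function name and data layout are hypothetical stand-ins for the dataframe that Interactions actually stores.

```python
import copy


def copy_interactions(rows, apply_filter=False):
    """Return an independent copy of rows, optionally dropping masked rows."""
    rows_copy = copy.deepcopy(rows)
    if apply_filter:
        # Keep only rows that passed the filters (mask is True).
        rows_copy = [row for row in rows_copy if row["mask"]]
    return rows_copy


rows = [
    {"i": 1, "j": 20, "mask": True},
    {"i": 5, "j": 8, "mask": False},
]
print(copy_interactions(rows, apply_filter=True))
```

Keeping filtered-out rows in the table and tracking them with a mask means filters can be reset and reapplied without reloading the data.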

count_filter(**kwargs): Counts the number of interactions that pass the given filters.

data_specific_filter(**kwargs)

Does nothing for the base Interactions class, can be overwritten in subclasses.

Returns

dict: dictionary of keyword argument pairs

filter(prefiltered=False, reset_filter=True, structure=None, min_cd=None, max_cd=None, paired_only=False, ss_only=False, ds_only=False, profile=None, min_profile=None, max_profile=None, compliments_only=False, nts=None, max_distance=None, min_distance=None, exclude_nts=None, isolate_nts=None, resolve_conflicts=None, **kwargs)

Convenience function that applies the above filters simultaneously.

Parameters

prefilteredbool, defaults to False

If True, the mask is not updated.

reset_filterbool, defaults to True

If True, the mask is reset before applying filters.

structurernavigate.data.SecondaryStructure, defaults to None

The structure to use for filtering.

min_cdint, defaults to None

The minimum contact distance to allow.

max_cdint, defaults to None

The maximum contact distance to allow.

paired_onlybool, defaults to False

If True, only keep interactions that are paired in the structure.

ss_onlybool, defaults to False

If True, only keep interactions between single-stranded nucleotides.

ds_onlybool, defaults to False

If True, only keep interactions between double-stranded nucleotides.

profilernavigate.data.Profile, defaults to None

The profile to use for masking.

min_profilefloat, defaults to None

The minimum profile value to allow.

max_profilefloat, defaults to None

The maximum profile value to allow.

compliments_onlybool, defaults to False

If True, only keep interactions where i and j are complementary nucleotides.

ntsstr, defaults to None

If compliments_only is False, only keep interactions where i and j are in nts.

max_distanceint, defaults to None

The maximum distance to allow. If None, no maximum distance is set.

min_distanceint, defaults to None

The minimum distance to allow. If None, no minimum distance is set.

exclude_ntslist of int, defaults to None

A list of positions to exclude.

isolate_ntslist of int, defaults to None

A list of positions to isolate.

resolve_conflictsstr, defaults to None

If not None, conflicting windows are resolved using the Maximal Weighted Independent Set. The weights are taken from the metric value. The graph is first broken into components to speed up the identification of the MWIS. Then the mask is updated to only include the MWIS.

**kwargsdict

Each keyword should have the format “column_operator” where column is a valid column name of the dataframe and operator is one of:

“ge”: greater than or equal to
“le”: less than or equal to
“gt”: greater than
“lt”: less than
“eq”: equal to
“ne”: not equal to

The values given to these keywords are then used in the comparison, and rows for which the comparison is False are filtered out. For example:

self.mask_on_values(Statistic_ge=23) evaluates to: self.update_mask(self.data["Statistic"] >= 23)

Returns

masknumpy array: a boolean array of the same length as self.data

get_aligned_data(alignment, apply_filter=True)

Returns a copy mapped to a new sequence with masked rows removed.

Parameters

alignmentrnavigate.data.SequenceAlignment: The alignment to use for mapping the interactions.
apply_filterbool, defaults to True: If True, masked rows (“mask” == False) are dropped.

Returns

rnavigate.data.Interactions: Interactions mapped to a new sequence.

get_ij_colors()

Gets i, j, and colors lists for plotting interactions.

i and j are the 5’ and 3’ ends of each interaction, and colors is the color to use for each interaction. Values of self.data[self.metric] are normalized to the range 0 to 1, where 0 and 1 correspond to the self.min_max values. The normalized values are then mapped to colors using self.cmap.

Returns

ilist: 5’ ends of each interaction
jlist: 3’ ends of each interaction
colorslist: colors to use for each interaction

get_sorted_data()

Returns a copy of the data sorted by self.metric.

Returns

pandas.DataFrame: a copy of the data sorted by self.metric

mask_on_distance(max_dist=None, min_dist=None)

Mask interactions based on their distance in sequence space.

Parameters

max_distint, defaults to None: The maximum distance to allow. If None, no maximum distance is set.
min_distint, defaults to None: The minimum distance to allow. If None, no minimum distance is set.

Returns

masknumpy array: a boolean array of the same length as self.data

mask_on_position(exclude=None, isolate=None)

Mask interactions based on their i and j positions.

Parameters

excludelist of int, defaults to None: A list of positions to exclude.
isolatelist of int, defaults to None: A list of positions to isolate.

Returns

masknumpy array: a boolean array of the same length as self.data

mask_on_profile(profile, min_profile=None, max_profile=None)

Masks interactions based on per-nucleotide measurements.

Parameters

profilernavigate.data.Profile: The profile to use for masking.
min_profilefloat, defaults to None: The minimum profile value to allow.
max_profilefloat, defaults to None: The maximum profile value to allow.

Returns

masknumpy array: a boolean array of the same length as self.data

mask_on_sequence(compliment_only=None, nts=None)

Mask interactions based on sequence.

Parameters

compliment_onlybool, defaults to None: If True, only keep interactions where i and j are complementary nucleotides.
ntsstr, defaults to None: If compliment_only is False, only keep interactions where i and j are in nts.

Returns

numpy array: a boolean array of the same length as self.data

mask_on_structure(structure, min_cd=None, max_cd=None, ss_only=False, ds_only=False, paired_only=False)

Masks interactions based on a secondary structure.

Parameters

structurernavigate.data.SecondaryStructure: The secondary structure to use for masking.
min_cdint, defaults to None: The minimum contact distance to allow.
max_cdint, defaults to None: The maximum contact distance to allow.
ss_onlybool, defaults to False: If True, only keep interactions between single-stranded nucleotides.
ds_onlybool, defaults to False: If True, only keep interactions between double-stranded nucleotides.
paired_onlybool, defaults to False: If True, only keep interactions that are paired in the structure.

Returns

masknumpy array: a boolean array of the same length as self.data

mask_on_values(**kwargs)

Mask interactions based on values in self.data.

Parameters

kwargsdict

Each keyword should have the format “column_operator” where column is a valid column name of the dataframe and operator is one of:

“ge”: greater than or equal to
“le”: less than or equal to
“gt”: greater than
“lt”: less than
“eq”: equal to
“ne”: not equal to

The values given to these keywords are then used in the comparison, and rows for which the comparison is False are filtered out. For example:

self.mask_on_values(Statistic_ge=23) evaluates to: self.update_mask(self.data["Statistic"] >= 23)

Returns

masknumpy array: a boolean array of the same length as self.data
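
The “column_operator” keyword convention described above can be sketched in plain Python. This is a minimal illustration and not the rnavigate implementation: mask_from_kwargs is a hypothetical helper, a dict of lists stands in for the pandas DataFrame in self.data, and a list of booleans stands in for the numpy mask.

```python
import operator

# Operator suffixes listed in the documentation, mapped to Python comparisons.
OPERATORS = {
    "ge": operator.ge,
    "le": operator.le,
    "gt": operator.gt,
    "lt": operator.lt,
    "eq": operator.eq,
    "ne": operator.ne,
}


def mask_from_kwargs(table, **kwargs):
    """Hypothetical helper: build a boolean mask from column_operator keywords.

    `table` maps column names to lists of values (a stand-in for self.data).
    """
    n_rows = len(next(iter(table.values())))
    mask = [True] * n_rows
    for key, threshold in kwargs.items():
        # Split on the last underscore so column names may contain underscores.
        column, _, suffix = key.rpartition("_")
        if column not in table or suffix not in OPERATORS:
            raise ValueError(f"{key} is not a valid column_operator keyword")
        compare = OPERATORS[suffix]
        # Rows failing any comparison are masked out (logical AND).
        mask = [keep and compare(value, threshold)
                for keep, value in zip(mask, table[column])]
    return mask


table = {"i": [1, 5, 9], "j": [20, 30, 40], "Statistic": [10.0, 23.0, 57.5]}
mask = mask_from_kwargs(table, Statistic_ge=23, j_lt=40)
print(mask)
```

Splitting on the last underscore is a design choice of this sketch; it lets a column such as “log10p” or a name containing underscores be combined with any of the six operators.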

print_new_file(outfile=None)

Create a new file with mapped and filtered interactions.

Parameters

outfilestr, defaults to None: path to an output file. If None, file string is printed to console.

reset_mask(): Resets the mask to all True (removes previous filters)

resolve_conflicts(metric=None)

Uses an experimental method to resolve conflicts.

Resolves conflicting windows using the Maximal Weighted Independent Set. The weights are taken from the metric value. The graph is first broken into components to speed up the identification of the MWIS. Then the mask is updated to only include the MWIS. This method is computationally expensive for large or dense datasets.

Parameters

metricstr, defaults to None: The metric to use for weighting the graph. If None, self.metric is used.

Returns

masknumpy array: a boolean array of the same length as self.data
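
The component-wise search described above can be sketched as follows. This is only an illustration under stated assumptions, not the rnavigate algorithm: the conflict rule (two interactions conflict when their windows overlap at the i end or at the j end) is assumed, all function names are hypothetical, and each component is solved by brute force, which is only practical for small components.

```python
from itertools import combinations


def conflicts(a, b, window):
    """Assumed rule: windows overlap at the i end or at the j end."""
    return abs(a["i"] - b["i"]) < window or abs(a["j"] - b["j"]) < window


def connected_components(n, edges):
    """Group node indices 0..n-1 into components of the conflict graph."""
    neighbors = {node: set() for node in range(n)}
    for a, b in edges:
        neighbors[a].add(b)
        neighbors[b].add(a)
    seen, components = set(), []
    for start in range(n):
        if start in seen:
            continue
        stack, component = [start], []
        seen.add(start)
        while stack:
            node = stack.pop()
            component.append(node)
            for nxt in neighbors[node] - seen:
                seen.add(nxt)
                stack.append(nxt)
        components.append(component)
    return components, neighbors


def best_independent_set(component, neighbors, weights):
    """Brute-force the highest-weight conflict-free subset of one component."""
    best, best_weight = (), float("-inf")
    for size in range(len(component) + 1):
        for subset in combinations(component, size):
            chosen = set(subset)
            if any(neighbors[node] & chosen for node in subset):
                continue  # at least two chosen interactions conflict
            weight = sum(weights[node] for node in subset)
            if weight > best_weight:
                best, best_weight = subset, weight
    return set(best)


def resolve_conflicts_sketch(interactions, window=1):
    """Return a boolean mask keeping only the selected interactions."""
    n = len(interactions)
    edges = [(a, b) for a, b in combinations(range(n), 2)
             if conflicts(interactions[a], interactions[b], window)]
    components, neighbors = connected_components(n, edges)
    weights = [row["weight"] for row in interactions]
    keep = set()
    for component in components:
        keep |= best_independent_set(component, neighbors, weights)
    return [node in keep for node in range(n)]


interactions = [
    {"i": 10, "j": 50, "weight": 5.0},
    {"i": 11, "j": 51, "weight": 8.0},
    {"i": 12, "j": 60, "weight": 4.0},
    {"i": 30, "j": 80, "weight": 2.0},
]
keep_mask = resolve_conflicts_sketch(interactions, window=3)
print(keep_mask)
```

Solving each connected component separately keeps the search space small, because the subsets of one component never need to be combined with the subsets of another.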

set_3d_distances(pdb, atom): Wrapper for set_distances for backwards compatibility.

set_distances(structure, atom="O2'")

Sets the Distance column value based on nt distances in the given structure.

If structure is a SecondaryStructure, contact distances are calculated, and if structure is a PDB, 3D distances are calculated. These distances are averaged across the window and stored in a new “Distance” column in self.data.

Parameters

structurernavigate.data.SecondaryStructure or rnavigate.data.PDB: Structure object to use for calculating distances
atomstr: atom id to use for calculating distances in a PDB structure

update_mask(mask): Updates the mask by ANDing the current mask with the given mask.
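
A minimal sketch of how ANDing lets successive filters accumulate, using plain lists of booleans in place of numpy arrays. The function name mirrors the method above, but the code is illustrative only.

```python
def update_mask(current_mask, new_mask):
    """Keep a row only if it passes both the current mask and the new one."""
    return [old and new for old, new in zip(current_mask, new_mask)]


current = [True, True, True, True]      # state after reset_mask()
current = update_mask(current, [True, True, False, True])   # first filter
current = update_mask(current, [False, True, True, True])   # second filter
print(current)
```

Because each update can only turn True into False, a row removed by one filter stays removed until the mask is reset.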

class rnavigate.data.interactions.PAIRMaP(input_data, sequence=None, metric='Class', metric_defaults=None, read_table_kw=None, window=1, name=None)

Bases: RINGMaP

A class for storing and manipulating PAIRMaP data.

Parameters

input_datastring or pandas.DataFrame

If string, a path to a file containing PAIRMaP data. If dataframe, the dataframe containing PAIRMaP data. The dataframe must contain columns “i”, “j”, “Statistic”, and “Class”. Dataframe may also include other columns.

sequencestring or rnavigate.data.Sequence

The sequence string corresponding to the PAIRMaP data.

metricstring, defaults to “Class”

The column name to use for visualization.

metric_defaultsdict

Keys are metric names and values are dictionaries of metric-specific defaults. These defaults include:

“metric_column”string
the column name to use for visualization

“cmap”string or matplotlib.colors.Colormap
the colormap to use for visualization

“normalization”“min_max”, “0_1”, “none”, or “bins”
The type of normalization to use when mapping values to colors

“values”list of float
The values to use with normalization of the data

“title”string
the title to use for colorbars

“extend”“min”, “max”, “both”, or “neither”
Which ends to extend when drawing the colorbar.

“tick_labels” : list of string

read_table_kwdict, optional

kwargs passed to pandas.read_table() when reading input_data.

windowint, defaults to 1

The window size used to generate the PAIRMaP data. If an input file is provided, this value is overwritten by the value in the header.

namestr, optional

A name for the interactions object.

Attributes

datapandas.DataFrame: The PAIRMaP data.

data_specific_filter(all_pairs=False, **kwargs)

Used by Interactions.filter(). By default, pairs that are neither primary nor secondary are removed. Setting all_pairs=True keeps all pairs.

Parameters

all_pairsbool, defaults to False: whether to include all PAIRs.

Returns

kwargsdict: any additional keyword-argument pairs are returned
masknumpy array: a boolean array of the same length as self.data

get_sorted_data()

Same as parent function, unless metric is set to “Class”, in which case ij pairs are returned in a different order.

Returns

pandas.DataFrame: a copy of the data sorted by self.metric

read_file(filepath, read_table_kw=None)

Parses a pairmap.txt file and stores data as a dataframe.

Also sets self.window (usually 3) from the header.

Parameters

filepathstr: path to pairmap.txt file
read_table_kwdict, defaults to None: This argument is ignored.

class rnavigate.data.interactions.PairingProbability(input_data, extension=None, sequence=None, metric='Probability', metric_defaults=None, read_table_kw=None, window=1, name=None)

Bases: Interactions

A class for storing and manipulating pairing probability data.

Parameters

input_datastring or pandas.DataFrame

If string, a path to a file containing pairing probability data. If dataframe, the dataframe containing pairing probability data. The dataframe must contain columns “i”, “j”, “Probability”, and “log10p”. Dataframe may also include other columns.

extensionstring, defaults to None

The file extension of the input_data. If None, the extension is determined from the input_data string. Options are “.bps”, “.txt”, and “.dp”. If the extension is “.bps”, the sequence is parsed from the file. If the extension is “.txt” or “.dp”, the sequence must be provided via the sequence argument.

sequencestring or rnavigate.data.Sequence

The sequence string corresponding to the pairing probability data.

metricstring, defaults to “Probability”

The column name to use for visualization.

metric_defaultsdict

Keys are metric names and values are dictionaries of metric-specific defaults. These defaults include:

“metric_column”string
the column name to use for visualization

“cmap”string or matplotlib.colors.Colormap
the colormap to use for visualization

“normalization”“min_max”, “0_1”, “none”, or “bins”
The type of normalization to use when mapping values to colors

“values”list of float
The values to use with normalization of the data

“title”string
the title to use for colorbars

“extend”“min”, “max”, “both”, or “neither”
Which ends to extend when drawing the colorbar.

“tick_labels” : list of string

read_table_kwdict, optional

kwargs passed to pandas.read_table() when reading input_data.

windowint, defaults to 1

The window size used to generate the pairing probability data.

namestr, optional

A name for the PairingProbability object.

Attributes

datapandas.DataFrame: The pairing probability data.

data_specific_filter(**kwargs)

By default, interactions with probabilities less than 0.03 are removed.

Returns

kwargsdict: any additional keyword-argument pairs are returned
masknumpy array: a boolean array of the same length as self.data

get_entropy_profile(print_out=False, save_file=None)

Calculates per-nucleotide Shannon entropy from pairing probabilities.

Parameters

print_outbool, defaults to False: If True, entropy values are printed to console.
save_filestr, defaults to None: If not None, entropy values are saved to this file.

Returns

rnavigate.data.Profile: a Profile object containing the entropy data
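
A minimal sketch of a per-nucleotide Shannon entropy calculation. The exact formula used by rnavigate is not stated here, so this assumes S_i = -sum over j of p_ij * log10(p_ij), summed over the listed partners j of nucleotide i; the log base and the omission of an unpaired term are assumptions, and entropy_profile is a hypothetical helper.

```python
import math


def entropy_profile(pairs, length):
    """Assumed formula: S_i = -sum_j p_ij * log10(p_ij) over partners j of i.

    `pairs` holds (i, j, probability) tuples with 1-indexed positions.
    """
    entropy = [0.0] * length
    for i, j, probability in pairs:
        if probability <= 0:
            continue  # p * log(p) tends to 0 as p tends to 0
        term = -probability * math.log10(probability)
        # Each pair contributes to both of its nucleotides.
        entropy[i - 1] += term
        entropy[j - 1] += term
    return entropy


pairs = [(1, 10, 0.5), (1, 9, 0.5), (2, 8, 1.0)]
profile = entropy_profile(pairs, length=10)
print(profile)
```

In this toy input, nucleotide 1 splits its probability between two partners and gets the highest entropy, while nucleotide 2 pairs with certainty and gets zero.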

read_bps()

Parses a bps file and returns sequence as a string and data as a dataframe.

Returns

str: the sequence string
pandas.DataFrame: the pairing probability data

read_txt()

Parses a pairing probability file and returns data as a dataframe.

Parameters

filepathstr: path to pairing probability file
read_table_kwdict, defaults to None: This argument is ignored.

Returns

pandas.DataFrame: the pairing probability data

class rnavigate.data.interactions.RINGMaP(input_data, sequence=None, metric='Statistic', metric_defaults=None, read_table_kw=None, window=1, name=None)

Bases: Interactions

A class for storing and manipulating RINGMaP data.

Parameters

input_datastring or pandas.DataFrame

If string, a path to a file containing RINGMaP data. If dataframe, the dataframe containing RINGMaP data. The dataframe must contain columns “i”, “j”, “Statistic”, and “Zij”. Dataframe may also include other columns.

sequencestring or rnavigate.data.Sequence

The sequence string corresponding to the RINGMaP data.

metricstring, defaults to “Statistic”

The column name to use for visualization.

metric_defaultsdict

Keys are metric names and values are dictionaries of metric-specific defaults. These defaults include:

“metric_column”string
the column name to use for visualization

“cmap”string or matplotlib.colors.Colormap
the colormap to use for visualization

“normalization”“min_max”, “0_1”, “none”, or “bins”
The type of normalization to use when mapping values to colors

“values”list of float
The values to use with normalization of the data

“title”string
the title to use for colorbars

“extend”“min”, “max”, “both”, or “neither”
Which ends to extend when drawing the colorbar.

“tick_labels” : list of string

read_table_kwdict, optional

kwargs passed to pandas.read_table() when reading input_data.

windowint, defaults to 1

The window size used to generate the RINGMaP data. If an input file is provided, this value is overwritten by the value in the header.

namestr, optional

A name for the interactions object.

Attributes

datapandas.DataFrame: The RINGMaP data.

data_specific_filter(positive_only=False, negative_only=False, **kwargs)

Adds filters for “Sign” column to parent filter() function

Parameters

positive_onlybool, defaults to False: If True, only keep positive correlations.
negative_onlybool, defaults to False: If True, only keep negative correlations.

Returns

kwargsdict: any additional keyword-argument pairs are returned
masknumpy array: a boolean array of the same length as self.data

get_sorted_data()

Sorts on the product of the self.metric and “Sign” columns, except when self.metric is “Distance”.

Returns

pandas.DataFrame: a copy of the data sorted by (self.metric * “Sign”) columns
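
Sorting on the product of a metric and its sign can be sketched as below. A list of dicts stands in for the dataframe, sort_by_signed_metric is a hypothetical helper, and ascending order is an assumption, since the sort direction is not stated above.

```python
rows = [
    {"i": 3, "j": 40, "Statistic": 50.0, "Sign": 1},
    {"i": 7, "j": 22, "Statistic": 80.0, "Sign": -1},
    {"i": 9, "j": 31, "Statistic": 20.0, "Sign": 1},
]


def sort_by_signed_metric(rows, metric="Statistic"):
    """Return a sorted copy; the original list is left untouched."""
    return sorted(rows, key=lambda row: row[metric] * row["Sign"])


ordered = sort_by_signed_metric(rows)
signed = [row["Statistic"] * row["Sign"] for row in ordered]
print(signed)
```

Multiplying by the sign places strong negative correlations at one end of the ordering and strong positive correlations at the other.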

read_file(filepath, read_table_kw=None)

Parses a RINGMaP correlations file and stores data as a dataframe.

Also sets self.window (usually 1, from header).

Parameters

filepathstr: path to correlations file.
read_table_kwdict, defaults to {}: kwargs passed to pandas.read_table().

Returns

pandas.DataFrame: the RINGMaP data

class rnavigate.data.interactions.SHAPEJuMP(input_data, sequence=None, metric='Percentile', metric_defaults=None, read_table_kw=None, window=1, name=None)

Bases: Interactions

A class for storing and manipulating SHAPEJuMP data.

Parameters

input_datastring or pandas.DataFrame

If string, a path to a file containing SHAPEJuMP data. If dataframe, the dataframe containing SHAPEJuMP data. The dataframe must contain columns “i”, “j”, “Metric” (JuMP rate) and “Percentile” (percentile ranking). Dataframe may also include other columns.

sequencestring or rnavigate.data.Sequence

The sequence string corresponding to the SHAPEJuMP data.

metricstring, defaults to “Percentile”

The column name to use for visualization.

metric_defaultsdict

Keys are metric names and values are dictionaries of metric-specific defaults. These defaults include:

“metric_column”string
the column name to use for visualization

“cmap”string or matplotlib.colors.Colormap
the colormap to use for visualization

“normalization”“min_max”, “0_1”, “none”, or “bins”
The type of normalization to use when mapping values to colors

“values”list of float
The values to use with normalization of the data

“title”string
the title to use for colorbars

“extend”“min”, “max”, “both”, or “neither”
Which ends to extend when drawing the colorbar.

“tick_labels” : list of string

read_table_kwdict

kwargs passed to pandas.read_table() when reading input_data.

windowint

The window size used to generate the SHAPEJuMP data.

namestr

A name for the interactions object.

Attributes

datapandas.DataFrame: The SHAPEJuMP data.

read_file(input_data, read_table_kw=None)

Parses a deletions.txt file and stores it as a dataframe.

Also calculates a “Percentile” column.

Parameters

input_datastr: path to deletions.txt file
read_table_kwdict, defaults to {}: kwargs passed to pandas.read_table().

Returns

pandas.DataFrame: the SHAPEJuMP data
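
One way a percentile ranking of JuMP rates could be computed is sketched below. The definition used by rnavigate is not given here, so this assumes the percentile of a value is the fraction of all values less than or equal to it; percentile_rank is a hypothetical helper.

```python
def percentile_rank(values):
    """Assumed definition: fraction of values less than or equal to each value."""
    n = len(values)
    return [sum(other <= value for other in values) / n for value in values]


rates = [0.001, 0.02, 0.005, 0.02]   # stand-in for the "Metric" column
percentiles = percentile_rank(rates)
print(percentiles)
```

Under this definition, tied values share the same percentile and the largest value always ranks at 1.0.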

class rnavigate.data.interactions.StructureAsInteractions(input_data, sequence, metric=None, metric_defaults=None, window=1, name=None)

Bases: Interactions

A class for storing and manipulating structure data.

Parameters

input_datastring or pandas.DataFrame

If string, a path to a file containing structure data. If dataframe, the dataframe containing structure data. The dataframe must contain columns “i”, “j”, and “Structure”. Dataframe may also include other columns.

sequencestring or rnavigate.data.Sequence

The sequence string corresponding to the structure data.

metricstring, defaults to “Structure”

The column name to use for visualization.

metric_defaultsdict

Keys are metric names and values are dictionaries of metric-specific defaults. These defaults include:

“metric_column”string
the column name to use for visualization

“cmap”string or matplotlib.colors.Colormap
the colormap to use for visualization

“normalization”“min_max”, “0_1”, “none”, or “bins”
The type of normalization to use when mapping values to colors

“values”list of float
The values to use with normalization of the data

“title”string
the title to use for colorbars

“extend”“min”, “max”, “both”, or “neither”
Which ends to extend when drawing the colorbar.

“tick_labels” : list of string

read_table_kwdict, optional

kwargs passed to pandas.read_table() when reading input_data.

windowint, defaults to 1

The window size used to generate the structure data.

namestr, optional

A name for the StructureAsInteractions object.

Attributes

datapandas.DataFrame: The structure data.

class rnavigate.data.interactions.StructureCompareMany(input_data, sequence, metric=None, metric_defaults=None, window=1, name=None)

Bases: Interactions

A class for storing and manipulating a comparison of many structures.

Parameters

input_datastring or pandas.DataFrame

If string, a path to a file containing structure data. If dataframe, the dataframe containing structure data. The dataframe must contain columns “i”, “j”, and “Structure”. Dataframe may also include other columns.

sequencestring or rnavigate.data.Sequence

The sequence string corresponding to the structure data.

metricstring, defaults to “Structure”

The column name to use for visualization.

metric_defaultsdict

Keys are metric names and values are dictionaries of metric-specific defaults. These defaults include:

“metric_column”string
the column name to use for visualization

“cmap”string or matplotlib.colors.Colormap
the colormap to use for visualization

“normalization”“min_max”, “0_1”, “none”, or “bins”
The type of normalization to use when mapping values to colors

“values”list of float
The values to use with normalization of the data

“title”string
the title to use for colorbars

“extend”“min”, “max”, “both”, or “neither”
Which ends to extend when drawing the colorbar.

“tick_labels” : list of string

read_table_kwdict, optional

kwargs passed to pandas.read_table() when reading input_data.

windowint, defaults to 1

The window size used to generate the structure data.

namestr, optional

A name for the StructureCompareMany object.

Attributes

datapandas.DataFrame: The structure data.

class rnavigate.data.interactions.StructureCompareTwo(input_data, sequence, metric=None, metric_defaults=None, window=1, name=None)

Bases: Interactions

A class for storing and manipulating a comparison of two structures.

Parameters

input_datastring or pandas.DataFrame

If string, a path to a file containing structure data. If dataframe, the dataframe containing structure data. The dataframe must contain columns “i”, “j”, and “Structure”. Dataframe may also include other columns.

sequencestring or rnavigate.data.Sequence

The sequence string corresponding to the structure data.

metricstring, defaults to “Structure”

The column name to use for visualization.

metric_defaultsdict

Keys are metric names and values are dictionaries of metric-specific defaults. These defaults include:

“metric_column”string
the column name to use for visualization

“cmap”string or matplotlib.colors.Colormap
the colormap to use for visualization

“normalization”“min_max”, “0_1”, “none”, or “bins”
The type of normalization to use when mapping values to colors

“values”list of float
The values to use with normalization of the data

“title”string
the title to use for colorbars

“extend”“min”, “max”, “both”, or “neither”
Which ends to extend when drawing the colorbar.

“tick_labels” : list of string

read_table_kwdict, optional

kwargs passed to pandas.read_table() when reading input_data.

windowint, defaults to 1

The window size used to generate the structure data.

namestr, optional

A name for the StructureCompareTwo object.

Attributes

datapandas.DataFrame: The structure data.

rnavigate.data.pdb module

The PDB object to represent tertiary structures with atomic coordinates.

This data can be used to filter interactions by 3D distance, and to visualize profile and interactions data on interactive 3D structures.

class rnavigate.data.pdb.PDB(input_data, chain, sequence=None, name=None)

Bases: Sequence

A class to represent RNA tertiary structures with atomic coordinates.

This data can be used to filter interactions by 3D distance, and to visualize profile and interactions data on interactive 3D structures.

Parameters

input_datastr: path to a PDB or CIF file
chainstr: chain identifier of RNA of interest
sequencernavigate.Sequence or str, optional: A sequence to use as the reference sequence. This is required if the sequence cannot be found in the header. Defaults to None.
namestr, optional: A name for the data set. Defaults to None.

Attributes

sequencestr: The RNA sequence
lengthint: The length of the RNA sequence
namestr: A name for the data set
pathstr: The path to the PDB or CIF file
chainstr: The chain identifier of the RNA of interest
offsetint: The offset between the sequence positions and the PDB residue indices
pdbBio.PDB.Structure.Structure: The PDB structure
pdb_idxnp.array: The PDB indices of the RNA
pdb_seqnp.array: The PDB sequence of the RNA
distance_matrixdict: A dictionary of distance matrices for each atom type

get_distance(i, j, atom="O2'")

Get the distance between given atom in nucleotides i and j (1-indexed).

Parameters

iint: The first nucleotide
jint: The second nucleotide
atomstring or dict, defaults to “O2’”: The atom to use for distance calculations. If a string, the same atom will be used for all residues. If a dict, the atom will be chosen based on the nucleotide type. If “DMS”, the N1 atom will be used for A and G, and the N3 atom will be used for U and C.

Returns

distancefloat: The distance between the atoms

get_distance_matrix(atom="O2'")

Get the pairwise atomic distance matrix for all residues.

Parameters

atomstring or dict, defaults to “O2’”: The atom to use for distance calculations. If a string, the same atom will be used for all residues. If a dict, the atom will be chosen based on the nucleotide type. If “DMS”, the N1 atom will be used for A and G, and the N3 atom will be used for U and C.

Returns

matrixNxN numpy.ndarray: A 2D array of pairwise distances. N is the length of the RNA.
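
The atom selection and pairwise distance calculation described above can be sketched with the standard library. This is not the rnavigate implementation: distance_matrix is a hypothetical helper, the coordinates are made up, and nested lists stand in for the numpy array. The “DMS” mapping follows the parameter description above.

```python
import math

# Atom choice per nucleotide for the "DMS" option, as described above.
DMS_ATOMS = {"A": "N1", "G": "N1", "U": "N3", "C": "N3"}


def distance_matrix(sequence, coordinates, atom="O2'"):
    """Hypothetical helper: pairwise distances between one atom per residue.

    coordinates[k] maps atom names to (x, y, z) tuples for residue k.
    """
    if atom == "DMS":
        atom = DMS_ATOMS
    chosen = []
    for nucleotide, atoms in zip(sequence, coordinates):
        # A dict selects the atom by nucleotide type; a string applies to all.
        name = atom[nucleotide] if isinstance(atom, dict) else atom
        chosen.append(atoms[name])
    return [[math.dist(a, b) for b in chosen] for a in chosen]


sequence = "AU"
coordinates = [
    {"O2'": (0.0, 0.0, 0.0), "N1": (1.0, 0.0, 0.0)},   # made-up coordinates
    {"O2'": (3.0, 4.0, 0.0), "N3": (1.0, 0.0, 2.0)},
]
matrix = distance_matrix(sequence, coordinates)
dms_matrix = distance_matrix(sequence, coordinates, atom="DMS")
print(matrix[0][1], dms_matrix[0][1])
```

The result is symmetric with zeros on the diagonal, so entry [i][j] can be read in either order.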

get_pdb_idx(seq_idx): Return the PDB index given the sequence index (0-indexed).

get_seq_idx(pdb_idx): Return the sequence index given the PDB index.

get_sequence(pdb)

Find the sequence in the provided CIF or PDB file.

Parameters

pdbstr: path to a PDB or CIF file

Returns

sequencestring: The RNA sequence

get_sequence_from_seqres(seqres)

Used by get_sequence to parse the SEQRES entries.

Parameters

seqreslist: A list of SEQRES entries for the RNA chain of interest

Returns

sequencestring: The RNA sequence

get_xyz_coord(nt, atom)

Return the x, y, and z coordinates for a given residue and atom.

Parameters

ntint: The nucleotide of interest (1-indexed)
atomstring or dict, defaults to “O2’”: The atom to use for distance calculations. If a string, the same atom will be used for all residues. If a dict, the atom will be chosen based on the nucleotide type. If “DMS”, the N1 atom will be used for A and G, and the N3 atom will be used for U and C.

Returns

xyzlist: A list of x, y, and z coordinates

is_valid_idx(pdb_idx=None, seq_idx=None)

Determines if a PDB or sequence index is in the PDB structure.

Parameters

pdb_idxint, optional: A PDB index (1-indexed). Defaults to None.
seq_idxint, optional: A sequence index (1-indexed). Defaults to None.

Returns

bool: True if the index is in the PDB structure, False otherwise.

read_pdb(pdb)

Read a PDB or CIF file into the data structure.

Parameters

pdbstr: path to a PDB or CIF file

set_indices(): Uses self.data and self.sequence to set self.offset

rnavigate.data.profile module

class rnavigate.data.profile.DanceMaP(input_data, component, read_table_kw=None, sequence=None, metric='Norm_profile', metric_defaults=None, name=None)

Bases: SHAPEMaP

A class to represent per-nucleotide DanceMaP data.

Parameters

input_datastr or pandas.DataFrame: path to a DanceMapper reactivities.txt file or a pandas DataFrame
componentint: Which component of the DanceMapper ensemble to read in (0-indexed).
read_table_kwdict, optional: Keyword arguments to pass to pandas.read_table. These are not necessary for reactivities.txt files. Defaults to None.
sequencernavigate.Sequence or str, optional: A sequence to use as the reference sequence. This is not necessary for reactivities.txt files. Defaults to None.
metricstr, defaults to “Norm_profile”: The name of the set of value-to-color options to use.

read_file(input_data, read_table_kw={})

Convert data file to pandas dataframe and store as self.data

Parameters

filepathstring: path to data file containing interactions
read_table_kwdict: kwargs dictionary passed to pd.read_table

Returns

dataframepandas.DataFrame: the data table

property recreation_kwargs: A dictionary of keyword arguments to pass when recreating the object.

class rnavigate.data.profile.DeltaProfile(profile1, profile2, metric=None, metric_defaults=None, name=None)

Bases: Profile

A class to represent the difference between two profiles.

Parameters

profile1Profile: The first profile to compare.
profile2Profile: The second profile to compare.
metricstr, optional: The name of the metric to use. Defaults to the metric of profile1.
metric_defaultsdict, optional: Keys are metric names, to be used with metric. Values are dictionaries of plotting parameters. Defaults to None.
namestr, optional: A name for the data set. Defaults to None.

class rnavigate.data.profile.Profile(input_data, metric='default', metric_defaults=None, read_table_kw=None, sequence=None, name=None)

Bases: Data

A class to represent per-nucleotide data.

Parameters

input_datastr or pandas.DataFrame

Path to a csv or tab file, or a pandas DataFrame. The table must have 1 row for each nucleotide in the sequence and must contain these columns:

A nucleotide position column labelled “Nucleotide”.
A sequence column labelled “Sequence” with 1 of (A, C, G, U, T) per row.
These will be added to the table if sequence is provided.

A data measurement column labelled “Profile” with a float or integer.
The label may be another name if specified in metric_defaults.

Optionally, a measurement error column.
Its label must be specified in metric_defaults.

Other columns may be present and can be set up using metric_defaults.
See metric_defaults for more information.

read_table_kwdict, optional

Keyword arguments to pass to pandas.read_table. Defaults to None.

sequencernavigate.Sequence or str, optional

A sequence to use as the reference sequence. This is required if input_data does not contain a “Sequence” column. Defaults to None.

metricstr, defaults to “default”

The name of the set of value-to-color options to use. “default” specifies:

“Profile” column is used
No error rates are present
Values are normalized to the range [0, 1]
Values are mapped to colors using the “viridis” colormap

“Distance” specifies:

(3-D) “Distance” column is used
No error rates are present
Values in the range [5, 50] are normalized to the range [0, 1]
Values are mapped to colors using the “cool” colormap

Other options may be defined in metric_defaults.

metric_defaultsdict, optional

Keys are metric names, to be used with metric. Values are dictionaries of plotting parameters:

“metric_column”str
The name of the column to use as the metric. Plots and analyses that use per-nucleotide data will use this column. If “color_column” is not provided, this column also defines colors.

“error_column”str or None
The name of the column to use as the error. If None, no error bars are plotted.

“color_column”str or None
The name of the column to use for coloring. If None, colors are defined by “metric_column”.

“cmap”str or list
The name of the colormap to use. If a list, the list of colors to use.

“normalization”str
The type of normalization to use. In order to be used with colormaps, values are normalized to either be integers for categorical colormaps, or floats in the range [0, 1] for continuous colormaps.
“none”: no normalization is performed
“min_max”: values are scaled to floats in the range [0, 1] based on the upper and lower bounds defined in “values”
“0_1”: values are scaled to floats in the range [0, 1] based on the minimum and maximum values in the data
“bins”: values are scaled to an integer based on bins defined by the list of bounds defined in “values”
“percentiles”: values are scaled to floats in the range [0, 1] based on upper and lower percentile bounds defined by “values”

“values”list or None
The values to use when normalizing the data.
If “normalization” is “min_max”, this should be a list of two values defining the upper and lower bounds.
If “normalization” is “bins”, this should be a list of values of length 1 less than the length of cmap. Example: [5, 10, 20] defines 4 bins: (-infinity, 5), [5, 10), [10, 20), [20, infinity)
If “normalization” is “percentiles”, this should be a list of two values defining the upper and lower percentile bounds.
If “normalization” is “0_1” or “none”, this should be None.

“title”str, defaults to “”
The title of the colorbar.

“ticks”list, defaults to None
The tick locations to use for the colorbar. If None, values are determined automatically.

“tick_labels”list, defaults to None
The labels to use for the colorbar ticks. If None, values are determined automatically from “ticks”.

“extend”“neither”, “both”, “min”, or “max”, defaults to “neither”
Which ends of the colorbar to extend (places an arrow head).

Defaults to None.
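
As a rough illustration of how a metric_defaults entry and the “bins” normalization fit together, the sketch below builds a hypothetical entry and assigns values to bins with the standard-library bisect module. The metric name “MyBins”, the column name “Reactivity”, and the helper bin_index are invented for this example; this is not RNAvigate's internal code, only a sketch of the bin boundaries described above.

```python
from bisect import bisect_right

# Hypothetical metric_defaults entry; "MyBins" and "Reactivity" are invented names.
metric_defaults = {
    "MyBins": {
        "metric_column": "Reactivity",
        "error_column": None,
        "color_column": None,
        "cmap": ["grey", "black", "orange", "red"],
        "normalization": "bins",
        "values": [5, 10, 20],  # 3 bounds -> 4 bins, one per color in cmap
        "title": "",
        "extend": "neither",
    }
}


def bin_index(value, bounds):
    """Return the 0-based bin of value: (-inf, b0), [b0, b1), ..., [bn, inf)."""
    return bisect_right(bounds, value)


options = metric_defaults["MyBins"]
colors = [options["cmap"][bin_index(v, options["values"])] for v in (4.9, 5, 12, 20)]
```

Each lower bound is inclusive, so 5 falls in the second bin and 20 in the last.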

namestr, optional

A name for the data set. Defaults to None.

Attributes

datapandas.DataFrame: The data table

calculate_gini_index(values): Calculate the Gini index of an array of values.
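
For orientation, a minimal sketch of the textbook Gini index for non-negative values follows. The function name gini_index is hypothetical, and the library method may handle missing values or edge cases differently; this only shows the standard formula.

```python
def gini_index(values):
    """Textbook Gini index: 0 is perfectly even, values near 1 are very uneven."""
    ordered = sorted(values)
    n = len(ordered)
    total = sum(ordered)
    # Weight each value by its 1-based rank in the sorted list.
    weighted = sum(rank * value for rank, value in enumerate(ordered, start=1))
    return 2 * weighted / (n * total) - (n + 1) / n


even = gini_index([1, 1, 1, 1])
uneven = gini_index([0, 0, 0, 1])
```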

calculate_windows(column, window, method='median', new_name=None, minimum_points=None, mask_na=True)

calculates a windowed operation over a column of data.

Result is stored in a new column. Value of each window is assigned to the center position of the window.

Parameters

columnstr: name of column to perform operation on
windowint: window size, must be an odd number
methodstring or function, defaults to “median”: operation to perform over windows. If a string, must be “median”, “mean”, “minimum”, or “maximum”. If a function, must take a 1D numpy array as input and return a scalar.
new_namestr, defaults to f”{method}_{window}_nt”: name of new column for stored result.
minimum_pointsint, defaults to value of window: minimum number of points within each window.
mask_nabool, defaults to True: whether to mask the result of the operation where the original column has a nan value.
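
The windowing behavior described above can be sketched in plain Python. This is an illustration under stated assumptions, not the library's implementation: the function name windowed is hypothetical, positions too close to either end for a full window are left as NaN, and NaN values inside a window are not counted toward minimum_points.

```python
import math
from statistics import median


def windowed(values, window, operation=median, minimum_points=None, mask_na=True):
    """Apply operation over centered windows; the result goes to the center position."""
    if window % 2 == 0:
        raise ValueError("window must be an odd number")
    if minimum_points is None:
        minimum_points = window
    half = window // 2
    result = [math.nan] * len(values)
    for center in range(half, len(values) - half):
        points = [v for v in values[center - half:center + half + 1] if not math.isnan(v)]
        if len(points) < minimum_points:
            continue  # too few usable points in this window
        if mask_na and math.isnan(values[center]):
            continue  # keep NaN where the original value is NaN
        result[center] = operation(points)
    return result


smoothed = windowed([1.0, 2.0, 9.0, 4.0, 5.0], window=3)
```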

copy(): Returns a copy of the Profile.

classmethod from_array(input_data, sequence, **kwargs)

Construct a Profile object from an array of values.

Parameters

input_datalist or np.array: A list or array of values to use as the metric.
sequencestr: The RNA sequence.
**kwargs: Additional keyword arguments to pass to the Profile constructor.

Returns

Profile: A Profile object with the provided values.

get_aligned_data(alignment)

Returns a new Profile object with the data aligned to a sequence.

Parameters

alignmentrnavigate.data.SequenceAlignment: The alignment to use to map rows of self.data to a new sequence.

Returns

Profile: A new Profile object with the data aligned to the sequence in the alignment.

get_plotting_dataframe()

Returns a dataframe with the data to be plotted.

Returns

pandas.DataFrame: A dataframe with the columns “Nucleotide”, “Values”, “Errors”, and “Colors”.

norm_boxplot(values)

removes outliers (> 1.5 * IQR) and scales the mean to 1.

NOTE: This method varies slightly from the normalization method used in the ShapeMapper pipeline. ShapeMapper sets undefined values to 0, and then uses these values when computing the IQR and 90th percentile. Including these values can skew the results. This method excludes such NaN values. Other elements are the same.

Parameters

values1D numpy array: values to normalize

Returns

(float, float): scaling factor and error propagation factor

norm_eDMS(values)

Calculates normalization factors following the eDMS per-nucleotide scheme in ShapeMapper 2.2

Parameters

values1D numpy array: values to normalize

Returns

(float, float): scaling factor and error propagation factor

norm_percentiles(values, lower_bound=90, upper_bound=99, median_or_mean='mean')

Calculates factors to scale the median between percentile bounds to 1.

Parameters

values1D numpy array: values to normalize
lower_boundint or float, optional: percentile of lower bound, Defaults to 90
upper_boundint or float, optional: percentile of upper bound, Defaults to 99
median_or_meanstring, optional: whether to use the median or mean of the values between the bounds.

Returns

(float, float): scaling factor and error propagation factor

normalize(profile_column=None, new_profile=None, error_column=None, new_error=None, norm_method='boxplot', nt_groups=None, profile_factors=None, **norm_kwargs)

Normalize values in a column, and store in a new column.

By default, performs ShapeMapper2 boxplot normalization on self.metric and stores the result as “Norm_profile”.

Parameters

profile_columnstring, defaults to self.metric

column name of values to normalize

new_profilestring, defaults to “Norm_profile”

column name of new normalized values

error_columnstring, defaults to self.error_column

column name of error values to propagate

new_errorstring, defaults to “Norm_error”

column name of new propagated error values

norm_methodstring, defaults to “boxplot”

normalization method to use.
“DMS” uses self.norm_percentiles and nt_groups=[‘AC’, ‘UG’]: scales the median of 90th to 95th percentiles to 1. As and Cs are normalized separately from Us and Gs.
“eDMS” uses self.norm_eDMS and nt_groups=[‘A’, ‘U’, ‘C’, ‘G’]: applies the new eDMS-MaP normalization. Each nucleotide is normalized separately.
“boxplot” uses self.norm_boxplot and nt_groups=[‘AUCG’]: removes outliers (> 1.5 iqr) and scales median to 1. Scales nucleotides together unless specified with nt_groups.
“percentile” uses self.norm_percentiles and nt_groups=[‘AUCG’]: scales the median of 90th to 95th percentiles to 1. Scales nucleotides together unless specified with nt_groups.

Defaults to “boxplot”: the default normalization of ShapeMapper

nt_groupslist of strings, defaults to None

A list of nucleotides to group, e.g.:
[‘AUCG’] groups all nts together
[‘AC’, ‘UG’] groups As with Cs and Us with Gs
[‘A’, ‘C’, ‘U’, ‘G’] scales each nt separately

Default depends on norm_method

profile_factorsdictionary, defaults to None

A scaling factor (float) for each nucleotide. Keys must be ‘A’, ‘C’, ‘U’, ‘G’.

Note: using this argument overrides any calculation of scaling. Defaults to None.

**norm_kwargs

these are passed to the norm_method function

Returns

profile_factorsdict: the new profile scaling factors dictionary

normalize_external(profiles, **kwargs)

Normalize reactivities using scaling factors computed from other profiles.

Parameters

profileslist of rnavigate.data.Profile: a list of other profiles used to compute scaling factors

Returns

profile_factorsdict: the new profile scaling factors dictionary

normalize_sequence(t_or_u='U', uppercase=True)

Changes the values in self.data[“Sequence”] to the normalized sequence.

Parameters

t_or_u“T” or “U”, Defaults to “U”.: Whether to replace T with U or U with T.
uppercasebool, Defaults to True.: Whether to convert the sequence to uppercase.
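
The effect of the two options can be sketched on a plain string. This standalone function is hypothetical and only mirrors the documented parameters; the method itself operates on self.data[“Sequence”].

```python
def normalize_sequence(sequence, t_or_u="U", uppercase=True):
    """Return the sequence with case and T/U usage made consistent."""
    if uppercase:
        sequence = sequence.upper()
    if t_or_u == "U":
        sequence = sequence.replace("T", "U").replace("t", "u")
    elif t_or_u == "T":
        sequence = sequence.replace("U", "T").replace("u", "t")
    return sequence


rna = normalize_sequence("acgt")
dna = normalize_sequence("AUGc", t_or_u="T", uppercase=False)
```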

property recreation_kwargs: A dictionary of keyword arguments to pass when recreating the object.

winsorize(column, lower_bound=None, upper_bound=None)

Winsorize the data between bounds.

If either bound is set to None, one-sided Winsorization is performed.

Parameters

columnstring: the column of data to be winsorized
lower_boundNumber or None, defaults to None: Data below this value is set to this value. If None, no lower bound is applied.
upper_boundNumber or None, defaults to None: Data above this value is set to this value. If None, no upper bound is applied.
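
A minimal sketch of one-sided and two-sided Winsorization as described above, written for a plain list rather than a DataFrame column. The standalone function here is illustrative only.

```python
def winsorize(values, lower_bound=None, upper_bound=None):
    """Clamp values to the given bounds; a bound of None is not applied."""
    clipped = []
    for value in values:
        if lower_bound is not None and value < lower_bound:
            value = lower_bound
        if upper_bound is not None and value > upper_bound:
            value = upper_bound
        clipped.append(value)
    return clipped


two_sided = winsorize([-1, 0.5, 3], lower_bound=0, upper_bound=2)
one_sided = winsorize([-1, 3], upper_bound=2)
```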

class rnavigate.data.profile.RNPMaP(input_data, read_table_kw=None, sequence=None, metric='NormedP', metric_defaults=None, name=None)

Bases: Profile

Represents per-nucleotide RNPMaP data.

Parameters

input_datastr or pandas.DataFrame: path to an RNAModMapper reactivities.txt file or a pandas DataFrame
read_table_kwdict, optional: Keyword arguments to pass to pandas.read_table. These are not necessary for reactivities.txt files. Defaults to None.
sequencernavigate.Sequence or str, optional: A sequence to use as the reference sequence. This is not necessary for reactivities.txt files. Defaults to None.
metricstr, defaults to “NormedP”: The name of the set of value-to-color options to use.
metric_defaultsdict, optional: Keys are metric names, to be used with metric. Values are dictionaries of plotting parameters. Defaults to None.
namestr, optional: A name for the data set. Defaults to None.

class rnavigate.data.profile.SHAPEMaP(input_data, normalize=None, read_table_kw=None, sequence=None, metric='Norm_profile', metric_defaults=None, log=None, name=None)

Bases: Profile

A class to represent per-nucleotide SHAPE-MaP data.

Parameters

input_datastr or pandas.DataFrame

path to a ShapeMapper2 profile.txt or .map file or a pandas DataFrame

normalize“DMS”, “eDMS”, “boxplot”, “percentiles”, or None, defaults to None

The normalization method to use.
“DMS” uses self.norm_percentiles and nt_groups=[‘AC’, ‘UG’]: scales the median of 90th to 95th percentiles to 1. As and Cs are normalized separately from Us and Gs.
“eDMS” uses self.norm_eDMS and nt_groups=[‘A’, ‘U’, ‘C’, ‘G’]: applies the new eDMS-MaP normalization. Each nucleotide is normalized separately.
“boxplot” uses self.norm_boxplot and nt_groups=[‘AUCG’]: removes outliers (> 1.5 iqr) and scales median to 1. Scales nucleotides together unless specified with nt_groups.
“percentiles” uses self.norm_percentiles and nt_groups=[‘AUCG’]: scales the median of 90th to 95th percentiles to 1. Scales nucleotides together unless specified with nt_groups.

Defaults to None: no normalization is performed

read_table_kwdict, optional

Keyword arguments to pass to pandas.read_table. These are not necessary for profile.txt and .map files. Defaults to None.

sequencernavigate.Sequence or str, optional

A sequence to use as the reference sequence. This is not necessary for profile.txt and .map files. Defaults to None.

metricstr, defaults to “Norm_profile”

The name of the set of value-to-color options to use. “Norm_profile” specifies:

“Norm_profile” column is used
“Norm_stderr” column is used for error bars
Values are normalized to bins: (-inf, -0.4), [-0.4, 0.4), [0.4, 0.85), [0.85, 2), [2, inf)
Bins are mapped to “grey”, “black”, “orange”, “red”, “red”

Other options may be defined in metric_defaults.

metric_defaultsdict, optional

Keys are metric names, to be used with metric. Values are dictionaries of plotting parameters:

“metric_column”str
The name of the column to use as the metric. Plots and analyses that use per-nucleotide data will use this column. If “color_column” is not provided, this column also defines colors.

“error_column”str or None
The name of the column to use as the error. If None, no error bars are plotted.

“color_column”str or None
The name of the column to use for coloring. If None, colors are defined by “metric_column”.

“cmap”str or list
The name of the colormap to use. If a list, the list of colors to use.

“normalization”str
The type of normalization to use. In order to be used with colormaps, values are normalized to either be integers for categorical colormaps, or floats in the range [0, 1] for continuous colormaps.
“none”: no normalization is performed
“min_max”: values are scaled to floats in the range [0, 1] based on the upper and lower bounds defined in “values”
“0_1”: values are scaled to floats in the range [0, 1] based on the minimum and maximum values in the data
“bins”: values are scaled to an integer based on bins defined by the list of bounds defined in “values”
“percentiles”: values are scaled to floats in the range [0, 1] based on upper and lower percentile bounds defined by “values”

“values”list or None
The values to use when normalizing the data.
If “normalization” is “min_max”, this should be a list of two values defining the upper and lower bounds.
If “normalization” is “bins”, this should be a list of values of length 1 less than the length of cmap. Example: [5, 10, 20] defines 4 bins: (-infinity, 5), [5, 10), [10, 20), [20, infinity)
If “normalization” is “percentiles”, this should be a list of two values defining the upper and lower percentile bounds.
If “normalization” is “0_1” or “none”, this should be None.

“title”str, defaults to “”
The title of the colorbar.

“ticks”list, defaults to None
The tick locations to use for the colorbar. If None, values are determined automatically.

“tick_labels”list, defaults to None
The labels to use for the colorbar ticks. If None, values are determined automatically from “ticks”.

“extend”“neither”, “both”, “min”, or “max”, defaults to “neither”
Which ends of the colorbar to extend (places an arrow head).

Defaults to None.

logstr, optional

Path to a ShapeMapper v2 shapemap_log.txt file with mutations-per-molecule and read-length histograms. These will be present if the --per-read-histogram flag was used when running ShapeMapper v2. Currently, this is not working with ShapeMapper v2.2 files. Defaults to None.

namestr, optional

A name for the data set. Defaults to None.

Attributes

datapandas.DataFrame: The data table

classmethod from_rnaframework(input_data, normalize=None)

Construct a SHAPEMaP object from an RNAFramework output file.

Parameters

input_datastr

path to an RNAFramework .xml reactivities file

normalize“DMS”, “eDMS”, “boxplot”, “percentiles”, or None, defaults to None

The normalization method to use.
“DMS” uses self.norm_percentiles and nt_groups=[‘AC’, ‘UG’]: scales the median of 90th to 95th percentiles to 1. As and Cs are normalized separately from Us and Gs.
“eDMS” uses self.norm_eDMS and nt_groups=[‘A’, ‘U’, ‘C’, ‘G’]: applies the new eDMS-MaP normalization. Each nucleotide is normalized separately.
“boxplot” uses self.norm_boxplot and nt_groups=[‘AUCG’]: removes outliers (> 1.5 iqr) and scales median to 1. Scales nucleotides together unless specified with nt_groups.
“percentiles” uses self.norm_percentiles and nt_groups=[‘AUCG’]: scales the median of 90th to 95th percentiles to 1. Scales nucleotides together unless specified with nt_groups.

Defaults to None: no normalization is performed

Returns

SHAPEMaP: A SHAPEMaP object with the provided values.

read_log(log)

Read the ShapeMapper log file.

Parameters

logstr: Path to a ShapeMapper v2 shapemap_log.txt file with mutations-per-molecule and read-length histograms.

Returns

read_lengthspandas.DataFrame: A dataframe with the columns “Read_length”, “Modified_read_length”, and “Untreated_read_length”.
mutations_per_moleculepandas.DataFrame: A dataframe with the columns “Mutation_count”, “Modified_mutations_per_molecule”, and “Untreated_mutations_per_molecule”.

write_bpp2seq_file(output_file)

Write the data to a ShapeMapper2 .bpp2seq file (for Contra/EternaFold).

Parameters

output_filestr: The path to write the output file.

write_shape_file(output_file)

Write the data to a ShapeMapper2 .shape file (for RNAstructure programs).

Parameters

output_filestr: The path to write the output file.

rnavigate.data.secondary_structure module

class rnavigate.data.secondary_structure.SecondaryStructure(input_data, extension=None, autoscale=True, name=None, **kwargs)

Bases: Sequence

Base class for secondary structures.

Parameters

input_datastr or pandas.DataFrame

A dataframe or filepath containing a secondary structure.
A DataFrame should contain these columns: [“Nucleotide”, “Sequence”, “Pair”]. The “Pair” column must be redundant (each base pair is listed at both of its positions).
Filepath parsing is determined by file extension: varna, xrna, nsd, cte, ct, dbn, bracket, json (R2DT), forna

extensionstr, optional

The file extension of the input_data file. If not provided, the extension will be inferred from the input_data filepath.

autoscalebool, optional

Whether to automatically scale the x and y coordinates. Defaults to True.

namestr, optional

The name of the RNA sequence. Defaults to None.

Attributes

datapandas.DataFrame: DataFrame storing base-pairs
filepathstr: The path to the input file, if provided, otherwise “dataframe”
sequencestr: The RNA sequence
ntsnumpy.array: The “Nucleotide” column of data
pair_ntsnumpy.array: The “Pair” column of data
headerstr: Header information from CT file
xcoordinatesnumpy.array: The “X_coordinate” column of data
ycoordinatesnumpy.array: The “Y_coordinate” column of data
distance_matrixnumpy.array: The contact distance matrix of the RNA structure

add_pairs(pairs, break_conflicting_pairs=False)

Add base pairs to current secondary structure.

Parameters

pairslist: 1-indexed list of paired residues. e.g. [(1, 20), (2, 19)]
break_conflicting_pairsbool, defaults to False: Whether to break existing pairs if there is a conflict

as_interactions(structure2=None)

Returns an rnavigate.Interactions representation of this structure and, optionally, additional structures.

Parameters

structure2SecondaryStructure or list of these, defaults to None: If provided, basepairs from all structures are included and labeled by which structures contain them and how many structures contain them.

property boolean: Return a boolean array of paired and unpaired nucleotides.

break_noncanonical_pairs()

Removes non-canonical basepairs from the secondary structure.

WARNING: this deletes information.

break_pairs_nts(nt_positions)

break base pairs at the given list of positions.

WARNING: this deletes information.

Parameters

nt_positionslist of int: 1-indexed positions to break pairs

break_pairs_region(start, end, break_crossing=True, inverse=False)

Removes pairs from the specified region (1-indexed, inclusive).

WARNING: this deletes information

Parameters

startint: start position (1-indexed, inclusive)
endint: end position (1-indexed, inclusive)
break_crossingbool, defaults to True: Whether to break pairs that cross over the specified region
inversebool, defaults to False: Invert the behavior, i.e. remove pairs that are not in this region

break_singleton_pairs()

Removes singleton basepairs from the secondary structure.

WARNING: This deletes information.

compute_ppv_sens(structure2, exact=True)

Compute the PPV and sensitivity between this and another structure.

True and False are determined from this structure. Positive and Negative are determined from structure2.

PPV = TP / (TP + FP)
Sensitivity = TP / (TP + FN)

Parameters

structure2SecondaryStructure: The SecondaryStructure to compare to.
exactbool, defaults to True: True requires BPs to be exactly correct. False allows +/-1 bp slippage.

Returns

float: sensitivity
float: PPV
3-tuple of floats: (TP, TP+FP, TP+FN)
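
The two formulas can be sketched for the exact-match case with plain sets of base pairs. The function name ppv_sens is hypothetical and the sketch does not model the +/-1 slippage allowed by exact=False. Following the description above, the reference pairs play the role of this structure and the predicted pairs play the role of structure2.

```python
def ppv_sens(reference_pairs, predicted_pairs):
    """Return (sensitivity, PPV, (TP, TP+FP, TP+FN)) for exact pair matches."""
    reference = {tuple(sorted(pair)) for pair in reference_pairs}
    predicted = {tuple(sorted(pair)) for pair in predicted_pairs}
    true_positives = len(reference & predicted)
    # TP + FP is every predicted pair; TP + FN is every reference pair.
    ppv = true_positives / len(predicted) if predicted else 0.0
    sensitivity = true_positives / len(reference) if reference else 0.0
    return sensitivity, ppv, (true_positives, len(predicted), len(reference))


sensitivity, ppv, counts = ppv_sens(
    reference_pairs=[(1, 20), (2, 19), (3, 18)],
    predicted_pairs=[(1, 20), (2, 19), (5, 15), (6, 14)],
)
```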

contact_distance(i, j): Returns the contact distance between positions i and j

copy()

fill_mismatches(mismatch=1)

Adds base pairs to fill 1,1 and optionally 2,2 mismatches.

Parameters

mismatchint, defaults to 1: 1 will fill only 1,1 mismatches; 2 will fill 1,1 and 2,2 mismatches

classmethod from_pairs_list(input_data, sequence)

Creates a SecondaryStructure from a list of pairs and a sequence.

Parameters

input_datalist: 1-indexed list of base pairs. e.g. [(1, 20), (2, 19)]
sequencestr: The RNA sequence. e.g., “AUCGUGUCAUGCUA”

classmethod from_sequence(input_data)

Creates a SecondaryStructure from a sequence string.

This structure is initialized with no base pairs. If base pairs are needed, use SecondaryStructure.from_pairs_list().

get_aligned_data(alignment)

Returns a new SecondaryStructure object matching the alignment target.

Parameters

alignmentdata.Alignment: An alignment object used to map values

get_distance_matrix(recalculate=False, max_cd=50)

Get a matrix of pair-wise shortest path distances through the structure.

This function uses a breadth-first search (BFS) algorithm. The structure is represented as a graph with nucleotides as vertices and base-pairs and backbone as edges. All edges are length 1. Matrix is stored as an attribute for future use.

If the attribute is set (not None) and recalculate is False, the attribute will be returned.

Based on Tom’s contact_distance, but expanded to return the pairwise matrix. New contact_distance method added to return the distance between two positions.

By default, the maximum contact distance is set to 50. This will be the maximum value reported in the matrix, i.e. a value of 50 in the matrix means >= 50. This prevents the algorithm from running for a very long time on long RNAs. If you need a larger value, set max_cd to a higher value.

Parameters

recalculatebool, defaults to False: Set to True to recalculate the matrix even if the attribute is set.
max_cdint, defaults to 50: The maximum contact distance to calculate.
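
The breadth-first search described above can be sketched for a single starting nucleotide. Repeating it for every start position gives the full matrix. The function name contact_distances and the list-based input are assumptions made for this illustration, not the library's code.

```python
from collections import deque


def contact_distances(pair_nts, start, max_cd=50):
    """Distances from nucleotide start (1-indexed) to every nucleotide.

    pair_nts is a redundant, 1-indexed partner list (0 = unpaired).
    Distances of max_cd or more are reported as max_cd.
    """
    length = len(pair_nts)
    distances = [max_cd] * length  # max_cd doubles as "not reached yet"
    distances[start - 1] = 0
    queue = deque([start])
    while queue:
        nt = queue.popleft()
        next_distance = distances[nt - 1] + 1
        if next_distance >= max_cd:
            continue  # neighbors this far away keep the capped value
        # Backbone neighbors and the base-pairing partner are all one step away.
        for neighbor in (nt - 1, nt + 1, pair_nts[nt - 1]):
            in_range = 1 <= neighbor <= length
            if in_range and neighbor != start and distances[neighbor - 1] == max_cd:
                distances[neighbor - 1] = next_distance
                queue.append(neighbor)
    return distances


# A 10-nt hairpin with pairs (1, 10), (2, 9), (3, 8).
hairpin = [10, 9, 8, 0, 0, 0, 0, 3, 2, 1]
full = contact_distances(hairpin, start=1)
capped = contact_distances(hairpin, start=1, max_cd=3)
```

Capping the search at max_cd is what keeps the run time bounded on long RNAs.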

get_dotbracket()

Get a dotbracket notation string representing the secondary structure.

Pseudoknot levels: 1: (), 2: [], 3: {}, 4: <>, 5: Aa, 6: Bb, 7: Cc, etc.

Returns

str: A dot-bracket representation of the secondary structure
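
One simple way to assign pseudoknot levels is a greedy pass: each pair takes the lowest level in which it crosses no previously placed pair. The sketch below uses that approach with the bracket order listed above. It is an assumption about how levels could be assigned, not necessarily the algorithm this method uses, and the function names are hypothetical.

```python
import string

# Level 1: (), 2: [], 3: {}, 4: <>, then Aa, Bb, Cc, ...
BRACKETS = ["()", "[]", "{}", "<>"] + [
    upper + lower for upper, lower in zip(string.ascii_uppercase, string.ascii_lowercase)
]


def crosses(pair1, pair2):
    """True if the two base pairs are non-nested (they cross each other)."""
    (i, j), (k, l) = pair1, pair2
    return i < k < j < l or k < i < l < j


def dotbracket(length, pairs):
    """Build a dot-bracket string from 1-indexed (i, j) pairs with i < j."""
    levels = []  # levels[n] holds the pairs drawn with BRACKETS[n]
    symbols = ["."] * length
    for pair in sorted(pairs):
        for level, members in enumerate(levels):
            if not any(crosses(pair, other) for other in members):
                break
        else:
            levels.append([])
            level = len(levels) - 1
        levels[level].append(pair)
        open_char, close_char = BRACKETS[level]
        symbols[pair[0] - 1] = open_char
        symbols[pair[1] - 1] = close_char
    return "".join(symbols)


dbn = dotbracket(12, [(1, 9), (2, 8), (5, 12), (6, 11)])
```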

get_helices(fill_mismatches=True, split_bulge=True, keep_singles=False)

Get a dictionary of helices from the secondary structure.

Keys are equivalent to list indices. Values are lists of paired nucleotides (1-indexed) in that helix, e.g. {0: [(1, 50), (2, 49), (3, 48)]}

Parameters

fill_mismatchesbool, defaults to True: Whether 1-1 and 2-2 bulges are replaced with base pairs
split_bulgebool, defaults to True: Whether to split helices on bulges
keep_singlesbool, defaults to False: Whether to return helices that contain only 1 base-pair

Returns

dict: A dictionary of helices
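
To make the return format concrete, the sketch below groups directly stacked base pairs into helices. It is a simplified illustration: the function name group_helices is hypothetical, and mismatch filling, bulge handling, and the keep_singles filter are not modeled.

```python
def group_helices(pairs):
    """Group stacked pairs (i, j), (i+1, j-1), ... into a dict of helices."""
    helices = {}
    for i, j in sorted(pairs):
        last = helices.get(len(helices) - 1)
        if last and last[-1] == (i - 1, j + 1):
            last.append((i, j))  # this pair stacks on the previous one
        else:
            helices[len(helices)] = [(i, j)]  # start a new helix
    return helices


helices = group_helices([(1, 50), (2, 49), (3, 48), (10, 20), (11, 19)])
```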

get_human_dotbracket()

Get a human-readable dotbracket string representing the secondary structure.

This is an experimental format designed to be more human readable, i.e. no counting of brackets required.

Letters, instead of brackets, are used to denote nested base pairs.
Each helix is assigned a letter, which is incremented one letter alphabetically from the nearest enclosing stem.
Non-nested helices (pseudoknots) are assigned canonical brackets.

From this canonical dbn string:

How many bases are in the base stem? How many nested helices are there?

((((....(((.[[..)))))(((...(((..]].))))))))

Same question, new format:

AABB....CCC.[[..cccbbBBB...CCC..]].cccbbbaa

Read this as:

((_______________________________________)) (level 1 = A)
  ((_______________))(((______________)))   (level 2 = B)
        (((_____)))        (((_____)))      (level 3 = C)
            [[__________________]]          (pseudoknot = [])

Pseudoknot levels:

1: Aa, Bb, Cc, etc. 2: [], 3: {}, 4: <>

get_interactions_df()

Returns a DataFrame of i, j basepairs.

Returns

pandas.DataFrame

A DataFrame with columns:
i: the 5’ (1-indexed) position of the base pair
j: the 3’ (1-indexed) position of the base pair
Structure: always 1

get_junction_nts()

Get a list of junction nucleotides (paired, but at the end of a chain).

Returns

list: A list of 1-indexed positions of junction nucleotides

get_nonredundant_ct()

Returns the ct attribute in a non-redundant form.

Only returns pairs in which i < j. For example:

self.ct[i-1] == j
self.ct[j-1] == i
BUT self.get_nonredundant_ct()[j-1] == 0

Returns

numpy.array: A non-redundant array of base pairs
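
The relationship between the redundant and non-redundant forms can be sketched with plain lists. The function name nonredundant_ct and the use of a list instead of a numpy array are choices made for this illustration.

```python
def nonredundant_ct(ct):
    """Keep each pair only at its 5' position (i < j); other entries become 0."""
    return [j if j > i else 0 for i, j in enumerate(ct, start=1)]


# Redundant form: pairs (1, 6) and (2, 5) are listed at both positions.
ct = [6, 5, 0, 0, 2, 1]
nonredundant = nonredundant_ct(ct)
# The same information as a list of (i, j) tuples with i < j.
pairs = [(i, j) for i, j in enumerate(nonredundant, start=1) if j != 0]
```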

get_paired_nts()

Get a list of residues that are paired.

Returns

list: A list of 1-indexed positions of paired nucleotides

get_pairs()

Get a non-redundant list of base pairs (i < j) as a list of tuples.

Returns

list: A list of 1-indexed positions. e.g., [(1, 50), (2, 49), …]

get_pseudoknots(fill_mismatches=True)

Get the pk1 and pk2 pairs from the secondary structure.

Ignores single base pairs. PK1 is defined as the helix crossing the most other base pairs. If there is a tie, the most 5’ helix is called pk1. Returns pk1 and pk2 as lists of base pairs, e.g. [(1, 10), (2, 9), …]

Parameters

fill_mismatchesbool, defaults to True: Whether 1-1 and 2-2 bulges are replaced with base pairs

Returns

list of 2 lists of 2-tuples: A list of base pairs for pk1 and pk2

get_structure_elements()

This code is not yet implemented.

Returns a string with a character for each nucleotide, indicating what kind of structure element it is a part of.

Characters:
Dangling Ends (E)
Stems (S)
Hairpin Loops (H)
Bulges (B)
Internal Loops (I)
MultiLoops (M)
External Loops (X)
Pseudoknot (P)

get_unpaired_nts()

Get a list of residues that are unpaired.

Returns

list: A list of 1-indexed positions of unpaired nucleotides

normalize_dtypes(): Convert dtypes of SecondaryStructure dataframe for consistency.

normalize_sequence(t_or_u='U', uppercase=True): Normalize the sequence attribute (fix case and/or U <-> T).

property nts

property pair_nts

read_ct(structure_number=0)

Loads secondary structure information from a given ct file.

Requires a properly formatted header.

Parameters

structure_numberint, defaults to 0: 0-indexed structure number to load from the ct file.

read_cte()

Generates SecondaryStructure object data from a CTE file

Resulting SecondaryStructure object will include nucleotide x and y coordinates and is compatible with plot_ss.

read_dotbracket()

Generates SecondaryStructure object data from a dot-bracket file.

Resulting SecondaryStructure object will include nucleotide x and y coordinates and is compatible with plot_ss.

read_forna()

Generates SecondaryStructure object data from a FORNA JSON file.

Resulting SecondaryStructure object will include nucleotide x and y coordinates and is compatible with plot_ss.

read_nsd(structure_number=0)

Generates SecondaryStructure object data from an NSD file.

Resulting SecondaryStructure object will include nucleotide x and y coordinates and is compatible with plot_ss.

read_r2dt()

Generates SecondaryStructure object data from an R2DT JSON file.

Resulting SecondaryStructure object will include nucleotide x and y coordinates and is compatible with plot_ss.

read_varna()

Generates SecondaryStructure object data from a VARNA file.

Resulting SecondaryStructure object will include nucleotide x and y coordinates and is compatible with plot_ss.

read_xrna()

Generates SecondaryStructure object data from an XRNA file.

Resulting SecondaryStructure object will include nucleotide x and y coordinates and is compatible with plot_ss.

transform_coordinates(flip=None, scale=None, center=None, rotate_degrees=None)

Perform transformations on X and Y structure coordinates.

To achieve vertical and horizontal flip together, rotate 180 degrees.

Parameters

flipstr, optional: “horizontal” or “vertical”
scalefloat, optional: new median distance of basepairs
centertuple of floats, optional: new center x and y coordinate
rotate_degreesfloat, optional: number of degrees to rotate structure

write_ct(out_file): Write structure to a ct file.

write_cte(out_file): Write structure to CTE format for Structure Editor.

write_dbn(rna_name, region='all', out_file=None)

Write the structure to a dot-bracket file.

Parameters

rna_namestr: The name of the RNA sequence
regionlist of 2 integers, optional: The region (start and end positions) of the RNA to write to file. Defaults to “all”.
out_filestr, optional: The name of the output file. If not provided, the dbn file is printed.

write_sto(out_file, name='seq'): Write structure to Stockholm (STO) file to use in infernal searches.

property xcoordinates

property ycoordinates

class rnavigate.data.secondary_structure.SequenceCircle(input_data, gap=30, name=None, **kwargs)

Bases: SecondaryStructure

A circular SecondaryStructure-like representation of RNA sequence.

class rnavigate.data.secondary_structure.StructureCoordinates(x, y, pairs=None)

Bases: object

Helper class to perform structure coordinate transformations

Parameters

xnumpy.array: x coordinates
ynumpy.array: y coordinates
pairslist of pairs, optional: list of base-paired positions required if scaling coordinates

center(x=0, y=0)

Center structure on the given x, y coordinate

Parameters

xint, defaults to 0: x coordinate of structure center
yint, defaults to 0: y coordinate of structure center

flip(horizontal=True)

Flip structure vertically or horizontally.

Parameters

horizontalbool, defaults to True: whether to flip structure horizontally, otherwise vertically

get_center_point()

Get the x, y coordinates for the center of structure.

Returns

float: x coordinate of structure center
float: y coordinate of structure center

rotate(degrees)

Rotate structure on current center point.

Parameters

degreesfloat: number of degrees to rotate structure
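
A rotation about the center point can be sketched with the standard 2-D rotation formulas. Two assumptions are made here that the text above does not state: the center is taken as the mean of the coordinates, and positive angles rotate counterclockwise. The standalone function is illustrative only.

```python
import math


def rotate(x, y, degrees):
    """Rotate points counterclockwise about their mean position."""
    center_x = sum(x) / len(x)
    center_y = sum(y) / len(y)
    angle = math.radians(degrees)
    cos_a, sin_a = math.cos(angle), math.sin(angle)
    new_x, new_y = [], []
    for xi, yi in zip(x, y):
        dx, dy = xi - center_x, yi - center_y
        new_x.append(center_x + dx * cos_a - dy * sin_a)
        new_y.append(center_y + dx * sin_a + dy * cos_a)
    return new_x, new_y


new_x, new_y = rotate([0.0, 2.0], [0.0, 0.0], degrees=90)
```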

scale(median_bp_distance=1.0)

Scale structure such that median base-pair distance is constant.

Parameters

median_bp_distancefloat, defaults to 1.0: New median distance between all base-paired nucleotides.
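
The scaling step can be sketched as a single multiplicative factor chosen so that the median distance between paired nucleotides equals the requested value. This standalone function is an illustration, and it shows why the pairs argument is required for scaling.

```python
import math
from statistics import median


def scale(x, y, pairs, median_bp_distance=1.0):
    """Scale coordinates so the median base-pair distance matches the target."""
    distances = [
        math.dist((x[i - 1], y[i - 1]), (x[j - 1], y[j - 1])) for i, j in pairs
    ]
    factor = median_bp_distance / median(distances)
    return [value * factor for value in x], [value * factor for value in y]


# Two base pairs, (1, 4) and (2, 3), each 4 units apart before scaling.
scaled_x, scaled_y = scale(
    x=[0.0, 0.0, 4.0, 4.0],
    y=[0.0, 2.0, 2.0, 0.0],
    pairs=[(1, 4), (2, 3)],
    median_bp_distance=2.0,
)
```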

Module contents

class rnavigate.data.AlignmentChain(*alignments)

Bases: BaseAlignment

Combines a list of alignments into one.

Parameters

alignmentslist of Alignment objects: the alignments to chain together

Attributes

alignmentslist: the constituent alignments
starting_sequencestr: starting sequence of alignments[0]
target_sequencestr: target sequence of alignments[-1]
mappingnumpy.array: an array which maps from starting_sequence to target_sequence. index of starting_sequence is mapping[index] of target sequence

get_inverse_alignment(): Alignments require a method to get the inverted alignment

get_mapping()

combines mappings from each alignment.

Returns

mappingnumpy.array: mapping from initial starting sequence to final target sequence index of starting_sequence is mapping[index] of target sequence
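
Combining mappings amounts to following each index through one mapping after another. The sketch below assumes 0-indexed mappings stored as plain lists and uses -1 to mark positions with no counterpart in the target. Both conventions, and the function name chain_mappings, are assumptions made for this illustration.

```python
def chain_mappings(*mappings):
    """Compose mappings so that result[i] is the final target index of position i."""
    combined = list(mappings[0])
    for mapping in mappings[1:]:
        # Unmapped positions (-1) stay unmapped through every later step.
        combined = [mapping[index] if index >= 0 else -1 for index in combined]
    return combined


first = [2, 3, 4, -1]        # starting sequence -> intermediate sequence
second = [-1, -1, 0, 1, 2]   # intermediate sequence -> target sequence
combined = chain_mappings(first, second)
```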

class rnavigate.data.AllPossible(sequence, metric='data', input_data=None, metric_defaults=None, read_table_kw=None, window=1, name=None)

Bases: Interactions

A class for storing and manipulating all possible interactions.

Parameters

sequencestring or rnavigate.data.Sequence

The sequence string corresponding to the pairing probability data.

metricstring, defaults to “data”

The column name to use for visualization.

metric_defaultsdict

Keys are metric names and values are dictionaries of metric-specific defaults. These defaults include:

“metric_column”string
the column name to use for visualization

“cmap”string or matplotlib.colors.Colormap
the colormap to use for visualization

“normalization”“min_max”, “0_1”, “none”, or “bins”
The type of normalization to use when mapping values to colors

“values”list of float
The values to used with normalization of the data

“title”string
the title to use for colorbars

“extend”“min”, “max”, “both”, or “neither”
Which ends to extend when drawing the colorbar.

“tick_labels” : list of string

read_table_kwdict, optional

kwargs passed to pandas.read_table() when reading input_data.

windowint, defaults to 1

The window size used to generate the pairing probability data.

namestr, optional

A name for the AllPossible object.

Attributes

datapandas.DataFrame: The pairing probability data.

class rnavigate.data.Annotation(input_data, annotation_type, sequence, name=None, color='blue')

Bases: Sequence

Basic annotation class to store 1D features of an RNA sequence

Each feature type must be a separate instance. Feature types include:
a group of separated nucleotides (e.g. binding pocket)
regions of interest (e.g. coding sequence, Alu elements)
sites of interest (e.g. m6A locations)
primer binding sites

Parameters

input_datalist

List will be treated according to annotation_type argument. Expected behaviors for each value of annotation_type:
“sites” or “group”: 1-indexed location of sites of interest. Example: [1, 10, 20, 30] is four sites, 1, 10, 20, and 30
“spans”: 1-indexed, inclusive locations of spans of interest. Example: [[1, 10], [20, 30]] is two spans, 1 to 10 and 20 to 30
“primers”: Similar to spans, but 5’/3’ direction is preserved. Example: [[1, 10], [30, 20]] forward 1 to 10, reverse 30 to 20

annotation_type“group”, “sites”, “spans”, or “primers”

The type of annotation.

sequencestr or pandas.DataFrame

Nucleotide sequence, path to fasta file, or dataframe containing a “Sequence” column.

namestr, defaults to None

Name of annotation.

colormatplotlib color-like, defaults to “blue”

Color to be used for displaying this annotation on plots.

Attributes

datapandas.DataFrame: Stores the list of sites or regions
namestr: The label for this annotation for use on plots
colorvalid matplotlib color: Color to represent annotation on plots
sequencestr: The reference sequence string
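The input_data conventions for “spans” and “primers” described above can be illustrated with a small standalone sketch. It does not use RNAvigate itself; the helper names spans_to_sites and primer_direction are hypothetical and only show how 1-indexed, inclusive spans relate to individual sites and how primer direction can be read from the order of the two positions.

```python
def spans_to_sites(spans):
    """Expand 1-indexed, inclusive [start, end] spans into a sorted list of sites."""
    sites = set()
    for start, end in spans:
        # Primers may be written 3' to 5', so order the two ends first.
        low, high = sorted((start, end))
        sites.update(range(low, high + 1))
    return sorted(sites)


def primer_direction(primer):
    """Return "forward" when start <= end, otherwise "reverse" (hypothetical helper)."""
    start, end = primer
    return "forward" if start <= end else "reverse"


print(spans_to_sites([[1, 3], [7, 8]]))
print([primer_direction(p) for p in [[1, 10], [30, 20]]])
```

Because spans are inclusive, a span of [1, 10] covers ten nucleotides, not nine.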

property boolean: Return a boolean array of the annotation on the sequence.

classmethod from_boolean_array(values, sequence, annotation_type, name, color='blue', window=1)

Create an Annotation from an array of boolean values.

True values are used to create the Annotation.

Parameters

valueslist of True or False: the boolean array
sequencestring or rnav.data.Sequence: the sequence of the Annotation
annotation_type“spans”, “sites”, “primers”, or “group”: the type of the new annotation. If “spans” or “primers”, adjacent True values, or values within window, are collapsed into a region.
namestring: a name for labelling the annotation.
colorstring, defaults to “blue”: a color for plotting the annotation
windowinteger, defaults to 1: a window around True values to include in the annotation.

Returns

rnavigate.data.Annotation: the new Annotation
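As a rough illustration of how a boolean array can be collapsed into spans, the sketch below merges True positions whose indices differ by at most window. This is one plausible reading of the window parameter, not a confirmed description of the library's behavior, and boolean_to_spans is a hypothetical name.

```python
def boolean_to_spans(values, window=1):
    """Collapse True positions into 1-indexed, inclusive [start, end] spans.

    Assumption: two True positions join the same span when their positions
    differ by at most `window` (window=1 merges only adjacent positions).
    """
    spans = []
    for position, flag in enumerate(values, start=1):
        if not flag:
            continue
        if spans and position - spans[-1][1] <= window:
            spans[-1][1] = position  # extend the current span
        else:
            spans.append([position, position])  # start a new span
    return spans


values = [False, True, True, False, False, True]
print(boolean_to_spans(values, window=1))
print(boolean_to_spans(values, window=3))
```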

from_sites(sites): Create the self.data dataframe from a list of sites.

from_spans(spans): Create the self.data dataframe from a list of spans.

get_aligned_data(alignment)

Aligns this Annotation to a new sequence and returns a copy.

Parameters

alignmentrnavigate.data.Alignment: Alignment object used to align to a new sequence.

Returns

rnavigate.data.Annotation: A new Annotation with the same name, color, and annotation type, but with the input data aligned to the target sequence.

get_sites()

Returns a list of nucleotide positions included in this annotation.

Returns

sitestuple: a list of nucleotide positions

get_subsequences(buffer=0)

class rnavigate.data.DanceMaP(input_data, component, read_table_kw=None, sequence=None, metric='Norm_profile', metric_defaults=None, name=None)

Bases: SHAPEMaP

A class to represent per-nucleotide DanceMaP data.

Parameters

input_datastr or pandas.DataFrame: path to a DanceMapper reactivities.txt file or a pandas DataFrame
componentint: Which component of the DanceMapper ensemble to read in (0-indexed).
read_table_kwdict, optional: Keyword arguments to pass to pandas.read_table. These are not necessary for reactivities.txt files. Defaults to None.
sequencernavigate.Sequence or str, optional: A sequence to use as the reference sequence. This is not necessary for reactivities.txt files. Defaults to None.
metricstr, defaults to “Norm_profile”: The name of the set of value-to-color options to use.

read_file(input_data, read_table_kw={})

Convert data file to pandas dataframe and store as self.data

Parameters

filepathstring: path to data file containing interactions
read_table_kwdict: kwargs dictionary passed to pd.read_table

Returns

dataframepandas.DataFrame: the data table

property recreation_kwargs: A dictionary of keyword arguments to pass when recreating the object.

class rnavigate.data.Data(input_data, sequence, metric, metric_defaults, read_table_kw=None, name=None)

Bases: Sequence

The base class for RNAvigate Profile and Interactions classes.

Parameters

input_datapandas.DataFrame or str: a pandas dataframe or path to a data file
sequencestring or rnavigate.data.Sequence: the sequence to use for the data
metricstring or dict: the column of the dataframe to use as the default metric to visualize
metric_defaultsdict: a dictionary of metric defaults
read_table_kwdict, optional: kwargs dictionary passed to pd.read_table
namestring, optional: the name of the data, defaults to None

Attributes

datapandas.DataFrame: the data table
filepathstring: the path to the data file
sequencestring or rnavigate.data.Sequence: the sequence to use for the data
metricstring or dict: the column of the dataframe to use as the metric to visualize
metric_defaultsdict: A dictionary of metric values and default settings for visualization
default_metricstring: the default metric to use for visualization

add_metric_defaults(metric_defaults): Add metric defaults to self.metric_defaults

property cmap: Get the colormap to use for colorbars and to retrieve colors.

property color_column: Get the column of the dataframe to use as the color for visualization.

property colors: Get one matplotlib color-like value for each nucleotide in self.sequence.

property error_column: Get the column of the dataframe to use as the error for visualization.

property metric: Get the column of the dataframe to use as the metric for visualization.

read_file(filepath, read_table_kw)

Convert data file to pandas dataframe and store as self.data

Parameters

filepathstring: path to data file containing interactions
read_table_kwdict: kwargs dictionary passed to pd.read_table

Returns

dataframepandas.DataFrame: the data table

class rnavigate.data.DeltaProfile(profile1, profile2, metric=None, metric_defaults=None, name=None)

Bases: Profile

A class to represent the difference between two profiles.

Parameters

profile1Profile: The first profile to compare.
profile2Profile: The second profile to compare.
metricstr, optional: The name of the metric to use. Defaults to the metric of profile1.
metric_defaultsdict, optional: Keys are metric names, to be used with metric. Values are dictionaries of plotting parameters. Defaults to None.
namestr, optional: A name for the data set. Defaults to None.

class rnavigate.data.Interactions(input_data, sequence, metric, metric_defaults, read_table_kw=None, window=1, name=None)

Bases: Data

A class for storing and manipulating interactions data.

Parameters

input_datastring or pandas.DataFrame

If string, a path to a file containing interactions data. If dataframe, the dataframe containing interactions data. The dataframe must contain columns “i”, “j”, and self.metric. Dataframe may also include other columns.

sequencestring or rnavigate.data.Sequence

The sequence string corresponding to the interactions data.

metricstring

The column name to use for visualization.

metric_defaultsdict

Keys are metric names and values are dictionaries of metric-specific defaults. These defaults include:

“metric_column”string
the column name to use for visualization

“cmap”string or matplotlib.colors.Colormap
the colormap to use for visualization

“normalization”“min_max”, “0_1”, “none”, or “bins”
The type of normalization to use when mapping values to colors

“values”list of float
The values to use with normalization of the data

“title”string
the title to use for colorbars

“extend”“min”, “max”, “both”, or “neither”
Which ends to extend when drawing the colorbar.

“tick_labels” : list of string

read_table_kwdict

kwargs passed to pandas.read_table() when reading input_data.

windowint

The window size used to generate the interactions data.

namestr

The name of the data object.

Attributes

datapandas.DataFrame: The interactions data.
windowint: The window size that is being represented by i-j pairs.

copy(apply_filter=False)

Returns a copy of the interactions, optionally with masked rows removed.

Parameters

apply_filterbool, defaults to False: If True, masked rows (“mask” == False) are dropped.

Returns

rnavigate.data.Interactions: A copy of the interactions.

count_filter(**kwargs): Counts the number of interactions that pass the given filters.

data_specific_filter(**kwargs)

Does nothing for the base Interactions class, can be overwritten in subclasses.

Returns

dict: dictionary of keyword argument pairs

filter(prefiltered=False, reset_filter=True, structure=None, min_cd=None, max_cd=None, paired_only=False, ss_only=False, ds_only=False, profile=None, min_profile=None, max_profile=None, compliments_only=False, nts=None, max_distance=None, min_distance=None, exclude_nts=None, isolate_nts=None, resolve_conflicts=None, **kwargs)

Convenience function that applies the above filters simultaneously.

Parameters

prefilteredbool, defaults to False

If True, the mask is not updated.

reset_filterbool, defaults to True

If True, the mask is reset before applying filters.

structurernavigate.data.SecondaryStructure, defaults to None

The structure to use for filtering.

min_cdint, defaults to None

The minimum contact distance to allow.

max_cdint, defaults to None

The maximum contact distance to allow.

paired_onlybool, defaults to False

If True, only keep interactions that are paired in the structure.

ss_onlybool, defaults to False

If True, only keep interactions between single-stranded nucleotides.

ds_onlybool, defaults to False

If True, only keep interactions between double-stranded nucleotides.

profilernavigate.data.Profile, defaults to None

The profile to use for masking.

min_profilefloat, defaults to None

The minimum profile value to allow.

max_profilefloat, defaults to None

The maximum profile value to allow.

compliments_onlybool, defaults to False

If True, only keep interactions where i and j are complementary nucleotides.

ntsstr, defaults to None

If compliments_only is False, only keep interactions where i and j are in nts.

max_distanceint, defaults to None

The maximum distance to allow. If None, no maximum distance is set.

min_distanceint, defaults to None

The minimum distance to allow. If None, no minimum distance is set.

exclude_ntslist of int, defaults to None

A list of positions to exclude.

isolate_ntslist of int, defaults to None

A list of positions to isolate.

resolve_conflictsstr, defaults to None

If not None, conflicting windows are resolved using the Maximal Weighted Independent Set. The weights are taken from the metric value. The graph is first broken into components to speed up the identification of the MWIS. Then the mask is updated to only include the MWIS.

**kwargsdict

Each keyword should have the format “column_operator” where column is a valid column name of the dataframe and operator is one of:

“ge”: greater than or equal to
“le”: less than or equal to
“gt”: greater than
“lt”: less than
“eq”: equal to
“ne”: not equal to

The values given to these keywords are then used in the comparison and False comparisons are filtered out. e.g.:

self.mask_on_values(Statistic_ge=23) evaluates to: self.update_mask(self.data[“Statistic”] >= 23)

Returns

masknumpy array: a boolean array of the same length as self.data

get_aligned_data(alignment, apply_filter=True)

Returns a copy mapped to a new sequence with masked rows removed.

Parameters

alignmentrnavigate.data.SequenceAlignment: The alignment to use for mapping the interactions.
apply_filterbool, defaults to True: If True, masked rows (“mask” == False) are dropped.

Returns

rnavigate.data.Interactions: Interactions mapped to a new sequence.

get_ij_colors()

Gets i, j, and colors lists for plotting interactions.

i and j are the 5’ and 3’ ends of each interaction, and colors is the color to use for each interaction. Values of self.data[self.metric] are normalized to 0 to 1, which correspond to self.min_max values. These are then mapped to a color using self.cmap.

Returns

ilist: 5’ ends of each interaction
jlist: 3’ ends of each interaction
colorslist: colors to use for each interaction

get_sorted_data()

Returns a copy of the data sorted by self.metric.

Returns

pandas.DataFrame: a copy of the data sorted by self.metric

mask_on_distance(max_dist=None, min_dist=None)

Mask interactions based on their distance in sequence space.

Parameters

max_distint, defaults to None: The maximum distance to allow. If None, no maximum distance is set.
min_distint, defaults to None: The minimum distance to allow. If None, no minimum distance is set.

Returns

masknumpy array: a boolean array of the same length as self.data

mask_on_position(exclude=None, isolate=None)

Mask interactions based on their i and j positions.

Parameters

excludelist of int, defaults to None: A list of positions to exclude.
isolatelist of int, defaults to None: A list of positions to isolate.

Returns

masknumpy array: a boolean array of the same length as self.data

mask_on_profile(profile, min_profile=None, max_profile=None)

Masks interactions based on per-nucleotide measurements.

Parameters

profilernavigate.data.Profile: The profile to use for masking.
min_profilefloat, defaults to None: The minimum profile value to allow.
max_profilefloat, defaults to None: The maximum profile value to allow.

Returns

masknumpy array: a boolean array of the same length as self.data

mask_on_sequence(compliment_only=None, nts=None)

Mask interactions based on sequence.

Parameters

compliment_onlybool, defaults to None: If True, only keep interactions where i and j are complementary nucleotides.
ntsstr, defaults to None: If compliment_only is False, only keep interactions where i and j are in nts.

Returns

numpy array: a boolean array of the same length as self.data

mask_on_structure(structure, min_cd=None, max_cd=None, ss_only=False, ds_only=False, paired_only=False)

Masks interactions based on a secondary structure.

Parameters

structurernavigate.data.SecondaryStructure: The secondary structure to use for masking.
min_cdint, defaults to None: The minimum contact distance to allow.
max_cdint, defaults to None: The maximum contact distance to allow.
ss_onlybool, defaults to False: If True, only keep interactions between single-stranded nucleotides.
ds_onlybool, defaults to False: If True, only keep interactions between double-stranded nucleotides.
paired_onlybool, defaults to False: If True, only keep interactions that are paired in the structure.

Returns

masknumpy array: a boolean array of the same length as self.data

mask_on_values(**kwargs)

Mask interactions based on values in self.data.

Parameters

kwargsdict

Each keyword should have the format “column_operator” where column is a valid column name of the dataframe and operator is one of:

“ge”: greater than or equal to
“le”: less than or equal to
“gt”: greater than
“lt”: less than
“eq”: equal to
“ne”: not equal to

The values given to these keywords are then used in the comparison and False comparisons are filtered out. e.g.:

self.mask_on_values(Statistic_ge=23) evaluates to: self.update_mask(self.data[“Statistic”] >= 23)

Returns

masknumpy array: a boolean array of the same length as self.data
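The “column_operator” keyword convention can be sketched without pandas. The example below is an illustration only: rows are plain dictionaries standing in for dataframe rows, mask_from_kwargs is a hypothetical name, and splitting on the last underscore is an assumption about how column names containing underscores would be handled. Each comparison is ANDed into the running mask, mirroring the description of update_mask.

```python
import operator

# Map the documented operator suffixes to Python comparison functions.
OPERATORS = {
    "ge": operator.ge,
    "le": operator.le,
    "gt": operator.gt,
    "lt": operator.lt,
    "eq": operator.eq,
    "ne": operator.ne,
}


def mask_from_kwargs(rows, **kwargs):
    """Return one boolean per row; a row passes only if every comparison is True."""
    mask = [True] * len(rows)
    for key, threshold in kwargs.items():
        # Split on the last underscore so "log10p_ge" gives ("log10p", "ge").
        column, _, op_name = key.rpartition("_")
        compare = OPERATORS[op_name]
        mask = [
            keep and compare(row[column], threshold)
            for keep, row in zip(mask, rows)
        ]
    return mask


rows = [
    {"i": 1, "j": 20, "Statistic": 30.0},
    {"i": 5, "j": 9, "Statistic": 12.5},
]
print(mask_from_kwargs(rows, Statistic_ge=23))
```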

print_new_file(outfile=None)

Create a new file with mapped and filtered interactions.

Parameters

outfilestr, defaults to None: path to an output file. If None, file string is printed to console.

reset_mask(): Resets the mask to all True (removes previous filters)

resolve_conflicts(metric=None)

Uses an experimental method to resolve conflicts.

Resolves conflicting windows using the Maximal Weighted Independent Set. The weights are taken from the metric value. The graph is first broken into components to speed up the identification of the MWIS. Then the mask is updated to only include the MWIS. This method is computationally expensive for large or dense datasets.

Parameters

metricstr, defaults to None: The metric to use for weighting the graph. If None, self.metric is used.

Returns

masknumpy array: a boolean array of the same length as self.data
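The idea of splitting the conflict graph into components and solving each one separately can be sketched as follows. This is a brute-force illustration under stated assumptions, not the library's implementation: conflicts are supplied as explicit pairs of row indices, weights are positive, and the function names are hypothetical. The exhaustive search is exponential in component size, which is consistent with the note above that the method is expensive for large or dense datasets.

```python
from itertools import combinations


def connected_components(nodes, edges):
    """Group nodes into connected components; also return the adjacency map."""
    neighbors = {node: set() for node in nodes}
    for a, b in edges:
        neighbors[a].add(b)
        neighbors[b].add(a)
    seen, components = set(), []
    for node in nodes:
        if node in seen:
            continue
        stack, component = [node], []
        seen.add(node)
        while stack:
            current = stack.pop()
            component.append(current)
            for other in neighbors[current]:
                if other not in seen:
                    seen.add(other)
                    stack.append(other)
        components.append(component)
    return components, neighbors


def best_independent_set(component, neighbors, weights):
    """Exhaustively find the heaviest subset with no two conflicting members."""
    best, best_weight = [], 0.0
    for size in range(1, len(component) + 1):
        for subset in combinations(component, size):
            chosen = set(subset)
            if any(neighbors[node] & chosen for node in subset):
                continue  # at least two members conflict
            total = sum(weights[node] for node in subset)
            if total > best_weight:
                best, best_weight = list(subset), total
    return best


def resolve(weights, conflicts):
    """Return a boolean mask (one entry per node) keeping the chosen set."""
    nodes = list(weights)
    components, neighbors = connected_components(nodes, conflicts)
    keep = set()
    for component in components:
        keep.update(best_independent_set(component, neighbors, weights))
    return [node in keep for node in nodes]


weights = {0: 5.0, 1: 4.0, 2: 5.0, 3: 1.0}
conflicts = [(0, 1), (1, 2)]
print(resolve(weights, conflicts))
```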

set_3d_distances(pdb, atom): Wrapper for set_distances for backwards compatibility.

set_distances(structure, atom="O2'")

Sets the Distance column value based on nt distances in the given structure.

If structure is a SecondaryStructure, contact distances are calculated, and if structure is a PDB, 3D distances are calculated. These distances are averaged across the window and stored in a new “Distance” column in self.data.

Parameters

structurernavigate.data.SecondaryStructure or rnavigate.data.PDB: Structure object to use for calculating distances
atomstr: atom id to use for calculating distances in a PDB structure

update_mask(mask): Updates the mask by ANDing the current mask with the given mask.

class rnavigate.data.Motif(input_data, sequence, name=None, color='blue')

Bases: Annotation

Automatically annotates the occurrences of a sequence motif as spans.

Parameters

input_datastr: sequence motif to search for. Uses conventional nucleotide codes. e.g. “DRACH” = [AGTU] [AG] A C [ATUC]
sequencestr or pandas.DataFrame: Nucleotide sequence, path to fasta file, or dataframe containing a “Sequence” column.
namestr, defaults to None: Name of annotation.
colormatplotlib color-like, defaults to “blue”: Color to be used for displaying this annotation on plots.

Attributes

datapandas.DataFrame: Stores the list of regions that match the motif
namestr: The label for this annotation for use on plots
colorvalid matplotlib color: Color to represent annotation on plots
sequencestr: The reference sequence string

get_aligned_data(alignment)

Searches the new sequence for the motif and returns a new Motif annotation.

Parameters

alignmentrnavigate.data.Alignment: Alignment object used to align to a new sequence.

Returns

rnavigate.data.Motif: A new Motif with the same name, color, and motif but with the input data aligned to the target sequence.

get_spans_from_motif(sequence, motif)

Returns a list of spans for each location of motif found within sequence.

Parameters

sequencestring: sequence to be searched
motifstring: sequence motif to search for.

Returns

spanslist of lists: list of [start, end] positions of each motif occurrence
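A motif search of this kind can be sketched with the standard re module by translating each nucleotide code into a character class. The sketch below is an assumption about the approach, not the library's code; find_motif_spans and the IUPAC table are written for illustration, T and U are treated as interchangeable, and a lookahead is used so that overlapping matches are reported (whether the library reports overlaps is not stated in this reference).

```python
import re

# IUPAC nucleotide codes; T and U are treated as equivalent.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "U": "TU", "T": "TU",
    "R": "AG", "Y": "CTU", "S": "CG", "W": "ATU", "K": "GTU", "M": "AC",
    "B": "CGTU", "D": "AGTU", "H": "ACTU", "V": "ACG", "N": "ACGTU",
}


def motif_to_regex(motif):
    """Translate a motif such as "DRACH" into a regular expression string."""
    return "".join(f"[{IUPAC[code]}]" for code in motif.upper())


def find_motif_spans(sequence, motif):
    """Return 1-indexed, inclusive [start, end] spans for every motif match."""
    # The lookahead makes matches zero-width, so overlapping hits are found.
    pattern = re.compile(f"(?=({motif_to_regex(motif)}))")
    length = len(motif)
    return [
        [match.start() + 1, match.start() + length]
        for match in pattern.finditer(sequence.upper())
    ]


print(motif_to_regex("DRACH"))
print(find_motif_spans("AAGGACUUU", "DRACH"))
```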

class rnavigate.data.ORFs(input_data, name=None, sequence=None, color='blue')

Bases: Annotation

Automatically annotates occurrences of open reading frames as spans.

Parameters

input_data“longest” or “all”: which ORFs to annotate. “longest” annotates the longest ORF. “all” annotates all potential ORFs.
sequencestr or pandas.DataFrame: Nucleotide sequence, path to fasta file, or dataframe containing a “Sequence” column.
namestr, defaults to None: Name of annotation.
colormatplotlib color-like, defaults to “blue”: Color to be used for displaying this annotation on plots.

Attributes

datapandas.DataFrame: Stores the list of regions that are annotated as ORFs
namestr: The label for this annotation for use on plots
colorvalid matplotlib color: Color to represent annotation on plots
sequencestr: The reference sequence string

get_aligned_data(alignment)

Searches the new sequence for ORFs and returns a new ORF annotation.

Parameters

alignmentrnavigate.data.Alignment: Alignment object used to align to a new sequence.

Returns

rnavigate.data.ORFs: A new ORFs annotation with the same name, color, and input_data but with the input data aligned to the target sequence.

get_spans_from_orf(sequence, which='all')

Given a sequence string, returns spans for specified ORFs

Parameters

sequencestring: RNA nucleotide sequence
which“longest” or “all”, defaults to “all”: “all” returns all spans, “longest” returns the longest span

Returns

list of tuples: (start, end) position of each ORF 1-indexed, inclusive
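A simple ORF scan can be sketched as follows. The definition used here is an assumption made for illustration: an ORF starts at an AUG, continues in frame, and ends at the first in-frame stop codon, which is included in the span. The library may define ORFs differently (for example, regarding nested starts or whether the stop codon is included), and find_orf_spans is a hypothetical name.

```python
STOP_CODONS = {"UAA", "UAG", "UGA"}


def find_orf_spans(sequence, which="all"):
    """Return 1-indexed, inclusive (start, end) spans of AUG-to-stop ORFs."""
    rna = sequence.upper().replace("T", "U")
    spans = []
    for start in range(len(rna) - 2):
        if rna[start:start + 3] != "AUG":
            continue
        # Walk codon by codon from the start until an in-frame stop is found.
        for pos in range(start + 3, len(rna) - 2, 3):
            if rna[pos:pos + 3] in STOP_CODONS:
                spans.append((start + 1, pos + 3))
                break
    if which == "longest" and spans:
        return [max(spans, key=lambda span: span[1] - span[0])]
    return spans


print(find_orf_spans("GGAUGAAAUAAGG"))
print(find_orf_spans("AUGAUGUAA", which="longest"))
```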

class rnavigate.data.PAIRMaP(input_data, sequence=None, metric='Class', metric_defaults=None, read_table_kw=None, window=1, name=None)

Bases: RINGMaP

A class for storing and manipulating PAIRMaP data.

Parameters

input_datastring or pandas.DataFrame

If string, a path to a file containing PAIRMaP data. If dataframe, the dataframe containing PAIRMaP data. The dataframe must contain columns “i”, “j”, “Statistic”, and “Class”. Dataframe may also include other columns.

sequencestring or rnavigate.data.Sequence

The sequence string corresponding to the PAIRMaP data.

metricstring, defaults to “Class”

The column name to use for visualization.

metric_defaultsdict

Keys are metric names and values are dictionaries of metric-specific defaults. These defaults include:

“metric_column”string
the column name to use for visualization

“cmap”string or matplotlib.colors.Colormap
the colormap to use for visualization

“normalization”“min_max”, “0_1”, “none”, or “bins”
The type of normalization to use when mapping values to colors

“values”list of float
The values to use with normalization of the data

“title”string
the title to use for colorbars

“extend”“min”, “max”, “both”, or “neither”
Which ends to extend when drawing the colorbar.

“tick_labels” : list of string

read_table_kwdict, optional

kwargs passed to pandas.read_table() when reading input_data.

windowint, defaults to 1

The window size used to generate the PAIRMaP data. If an input file is provided, this value is overwritten by the value in the header.

namestr, optional

A name for the interactions object.

Attributes

datapandas.DataFrame: The PAIRMaP data.

data_specific_filter(all_pairs=False, **kwargs)

Used by Interactions.filter(). By default, pairs that are neither primary nor secondary are removed. all_pairs=True changes this behavior.

Parameters

all_pairsbool, defaults to False: whether to include all PAIRs.

Returns

kwargsdict: any additional keyword-argument pairs are returned
masknumpy array: a boolean array of the same length as self.data

get_sorted_data()

Same as parent function, unless metric is set to “Class”, in which case ij pairs are returned in a different order.

Returns

pandas.DataFrame: a copy of the data sorted by self.metric

read_file(filepath, read_table_kw=None)

Parses a pairmap.txt file and stores data as a dataframe

Sets self.window (usually 3), from header.

Parameters

filepathstr: path to pairmap.txt file
read_table_kwdict, defaults to None: This argument is ignored.

class rnavigate.data.PDB(input_data, chain, sequence=None, name=None)

Bases: Sequence

A class to represent RNA tertiary structures with atomic coordinates.

This data can be used to filter interactions by 3D distance, and to visualize profile and interactions data on interactive 3D structures.

Parameters

input_datastr: path to a PDB or CIF file
chainstr: chain identifier of RNA of interest
sequencernavigate.Sequence or str, optional: A sequence to use as the reference sequence. This is required if the sequence cannot be found in the header. Defaults to None.
namestr, optional: A name for the data set. Defaults to None.

Attributes

sequencestr: The RNA sequence
lengthint: The length of the RNA sequence
namestr: A name for the data set
pathstr: The path to the PDB or CIF file
chainstr: The chain identifier of the RNA of interest
offsetint: The offset between the sequence positions and the PDB residue indices
pdbBio.PDB.Structure.Structure: The PDB structure
pdb_idxnp.array: The PDB indices of the RNA
pdb_seqnp.array: The PDB sequence of the RNA
distance_matrixdict: A dictionary of distance matrices for each atom type

get_distance(i, j, atom="O2'")

Get the distance between given atom in nucleotides i and j (1-indexed).

Parameters

iint: The first nucleotide
jint: The second nucleotide
atomstring or dict, defaults to “O2’”: The atom to use for distance calculations. If a string, the same atom will be used for all residues. If a dict, the atom will be chosen based on the nucleotide type. If “DMS”, the N1 atom will be used for A and G, and the N3 atom will be used for U and C.

Returns

distancefloat: The distance between the atoms
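The atom argument accepts a string, a dictionary, or “DMS”. The sketch below shows one way this selection and the distance calculation could work on plain coordinate tuples. It does not use Bio.PDB; the coords dictionary layout and the function names are hypothetical, and only the “DMS” mapping (N1 for A and G, N3 for U and C) is taken from the description above.

```python
import math

# Atom choice for "DMS", as described in the parameter documentation.
DMS_ATOMS = {"A": "N1", "G": "N1", "U": "N3", "C": "N3"}


def pick_atom(nucleotide, atom="O2'"):
    """Resolve the atom name for one nucleotide from a string, dict, or "DMS"."""
    if atom == "DMS":
        return DMS_ATOMS[nucleotide]
    if isinstance(atom, dict):
        return atom[nucleotide]
    return atom


def distance(coords, sequence, i, j, atom="O2'"):
    """Euclidean distance between the chosen atoms of nucleotides i and j.

    coords is a hypothetical mapping of (position, atom_name) -> (x, y, z),
    with 1-indexed positions.
    """
    xyz_i = coords[(i, pick_atom(sequence[i - 1], atom))]
    xyz_j = coords[(j, pick_atom(sequence[j - 1], atom))]
    return math.dist(xyz_i, xyz_j)


coords = {(1, "N1"): (0.0, 0.0, 0.0), (2, "N3"): (3.0, 4.0, 0.0)}
print(distance(coords, "AC", 1, 2, atom="DMS"))
```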

get_distance_matrix(atom="O2'")

Get the pairwise atomic distance matrix for all residues.

Parameters

atomstring or dict, defaults to “O2’”: The atom to use for distance calculations. If a string, the same atom will be used for all residues. If a dict, the atom will be chosen based on the nucleotide type. If “DMS”, the N1 atom will be used for A and G, and the N3 atom will be used for U and C.

Returns

matrixNxN numpy.ndarray: A 2D array of pairwise distances. N is the length of the RNA.

get_pdb_idx(seq_idx): Return the PDB index given the sequence index (0-indexed).

get_seq_idx(pdb_idx): Return the sequence index given the PDB index.

get_sequence(pdb)

Find the sequence in the provided CIF or PDB file.

Parameters

pdbstr: path to a PDB or CIF file

Returns

sequencestring: The RNA sequence

get_sequence_from_seqres(seqres)

Used by get_sequence to parse the SEQRES entries.

Parameters

seqreslist: A list of SEQRES entries for the RNA chain of interest

Returns

sequencestring: The RNA sequence

get_xyz_coord(nt, atom)

Return the x, y, and z coordinates for a given residue and atom.

Parameters

ntint: The nucleotide of interest (1-indexed)
atomstring or dict, defaults to “O2’”: The atom to use for distance calculations. If a string, the same atom will be used for all residues. If a dict, the atom will be chosen based on the nucleotide type. If “DMS”, the N1 atom will be used for A and G, and the N3 atom will be used for U and C.

Returns

xyzlist: A list of x, y, and z coordinates

is_valid_idx(pdb_idx=None, seq_idx=None)

Determines if a PDB or sequence index is in the PDB structure.

Parameters

pdb_idxint, optional: A PDB index (1-indexed). Defaults to None.
seq_idxint, optional: A sequence index (1-indexed). Defaults to None.

Returns

bool: True if the index is in the PDB structure, False otherwise.

read_pdb(pdb)

Read a PDB or CIF file into the data structure.

Parameters

pdbstr: path to a PDB or CIF file

set_indices(): Uses self.data and self.sequence to set self.offset

class rnavigate.data.PairingProbability(input_data, extension=None, sequence=None, metric='Probability', metric_defaults=None, read_table_kw=None, window=1, name=None)

Bases: Interactions

A class for storing and manipulating pairing probability data.

Parameters

input_datastring or pandas.DataFrame

If string, a path to a file containing pairing probability data. If dataframe, the dataframe containing pairing probability data. The dataframe must contain columns “i”, “j”, “Probability”, and “log10p”. Dataframe may also include other columns.

extensionstring, defaults to None

The file extension of the input_data. If None, the extension is determined from the input_data string. Options are “.bps”, “.txt”, and “.dp”. If the extension is “.bps”, the sequence is parsed from the file. If the extension is “.txt” or “.dp”, the sequence must be provided via the sequence argument.

sequencestring or rnavigate.data.Sequence

The sequence string corresponding to the pairing probability data.

metricstring, defaults to “Probability”

The column name to use for visualization.

metric_defaultsdict

Keys are metric names and values are dictionaries of metric-specific defaults. These defaults include:

“metric_column”string
the column name to use for visualization

“cmap”string or matplotlib.colors.Colormap
the colormap to use for visualization

“normalization”“min_max”, “0_1”, “none”, or “bins”
The type of normalization to use when mapping values to colors

“values”list of float
The values to use with normalization of the data

“title”string
the title to use for colorbars

“extend”“min”, “max”, “both”, or “neither”
Which ends to extend when drawing the colorbar.

“tick_labels” : list of string

read_table_kwdict, optional

kwargs passed to pandas.read_table() when reading input_data.

windowint, defaults to 1

The window size used to generate the pairing probability data.

namestr, optional

A name for the PairingProbability object.

Attributes

datapandas.DataFrame: The pairing probability data.
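The extension rules described for the extension parameter can be summarized in a short sketch. It only restates the documented behavior (the extension is inferred from the path when None, “.bps” files carry their own sequence, and “.txt” or “.dp” files need a sequence argument); the function names are hypothetical, and raising ValueError for other extensions is an assumption.

```python
import os


def resolve_extension(input_data, extension=None):
    """Return the extension to use, inferring it from the path when None."""
    if extension is None:
        extension = os.path.splitext(input_data)[1]
    if extension not in (".bps", ".txt", ".dp"):
        # Assumption: unsupported extensions are rejected.
        raise ValueError(f"unsupported extension: {extension!r}")
    return extension


def sequence_required(extension):
    """".bps" files include the sequence; ".txt" and ".dp" files do not."""
    return extension in (".txt", ".dp")


extension = resolve_extension("probabilities.dp")
print(extension, sequence_required(extension))
```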

data_specific_filter(**kwargs)

By default, interactions with probabilities less than 0.03 are removed.

Returns

kwargsdict: any additional keyword-argument pairs are returned
masknumpy array: a boolean array of the same length as self.data

get_entropy_profile(print_out=False, save_file=None)

Calculates per-nucleotide Shannon entropy from pairing probabilities.

Parameters

print_outbool, defaults to False: If True, entropy values are printed to console.
save_filestr, defaults to None: If not None, entropy values are saved to this file.

Returns

rnavigate.data.Profile: a Profile object containing the entropy data
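A per-nucleotide Shannon entropy can be sketched from a list of (i, j, probability) pairs. The formula below is an assumption for illustration: each pair contributes -p * log10(p) to both of its nucleotides, and the probability of being unpaired is ignored. The log base and the treatment of unpaired probability in the library are not stated in this reference, and entropy_profile is a hypothetical name.

```python
import math


def entropy_profile(pairs, length):
    """Sum -p * log10(p) over all pairs that involve each nucleotide.

    pairs: iterable of (i, j, probability) with 1-indexed positions.
    Returns a list with one entropy value per nucleotide.
    """
    entropy = [0.0] * length
    for i, j, probability in pairs:
        if probability <= 0:
            continue  # log is undefined; such pairs contribute nothing
        term = -probability * math.log10(probability)
        entropy[i - 1] += term
        entropy[j - 1] += term
    return entropy


pairs = [(1, 4, 0.5), (1, 3, 0.1)]
print(entropy_profile(pairs, length=4))
```

Under this definition, a nucleotide with a single pairing partner at probability 1.0 has zero entropy, and nucleotides with many low-probability partners have high entropy.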

read_bps()

Parses a bps file and returns sequence as a string and data as a dataframe.

Returns

str: the sequence string
pandas.DataFrame: the pairing probability data

read_txt()

Parses a pairing probability file and returns data as a dataframe.

Parameters

filepathstr: path to pairing probability file
read_table_kwdict, defaults to None: This argument is ignored.

Returns

pandas.DataFrame: the pairing probability data

class rnavigate.data.Profile(input_data, metric='default', metric_defaults=None, read_table_kw=None, sequence=None, name=None)

Bases: Data

A class to represent per-nucleotide data.

Parameters

input_datastr or pandas.DataFrame

Path to a csv or tab file, or a pandas DataFrame. The table must have 1 row for each nucleotide in the sequence and must contain these columns:

A nucleotide position column labelled “Nucleotide”
A sequence column labelled “Sequence” with 1 of (A, C, G, U, T) per row
These will be added to the table if sequence is provided.

A data measurement column labelled “Profile” with a float or integer
Label may be another name if specified in metric_defaults

Optionally: A measurement error column.
Label must be specified in metric_defaults

Other columns may be present, and set up using metric_defaults.
See metric_defaults for more information.

read_table_kwdict, optional

Keyword arguments to pass to pandas.read_table. Defaults to None.

sequencernavigate.Sequence or str, optional

A sequence to use as the reference sequence. This is required if input_data does not contain a “Sequence” column. Defaults to None.

metricstr, defaults to “default”

The name of the set of value-to-color options to use.

“default” specifies:
“Profile” column is used
No error rates are present
Values are normalized to the range [0, 1]
Values are mapped to colors using the “viridis” colormap

“Distance” specifies:
(3-D) “Distance” column is used
No error rates are present
Values in the range [5, 50] are normalized to the range [0, 1]
Values are mapped to colors using the “cool” colormap

Other options may be defined in metric_defaults.

metric_defaultsdict, optional

Keys are metric names, to be used with metric. Values are dictionaries of plotting parameters:

“metric_column”str
The name of the column to use as the metric. Plots and analyses that use per-nucleotide data will use this column. If “color_column” is not provided, this column also defines colors.

“error_column”str or None
The name of the column to use as the error. If None, no error bars are plotted.

“color_column”str or None
The name of the column to use for coloring. If None, colors are defined by “metric_column”.

“cmap”str or list
The name of the colormap to use. If a list, the list of colors to use.

“normalization”str
The type of normalization to use. In order to be used with colormaps, values are normalized to either be integers for categorical colormaps, or floats in the range [0, 1] for continuous colormaps.

“none”: no normalization is performed
“min_max”: values are scaled to floats in the range [0, 1] based on the upper and lower bounds defined in “values”
“0_1”: values are scaled to floats in the range [0, 1] based on the minimum and maximum values in the data
“bins”: values are scaled to an integer based on bins defined by the list of bounds defined in “values”
“percentiles”: values are scaled to floats in the range [0, 1] based on upper and lower percentile bounds defined by “values”

“values”list or None
The values to use when normalizing the data.

If “normalization” is “min_max”, this should be a list of two values defining the upper and lower bounds.
If “normalization” is “bins”, this should be a list of values of length 1 less than the length of cmap. example: [5, 10, 20] defines 4 bins: (-infinity, 5), [5, 10), [10, 20), [20, infinity)
If “normalization” is “percentiles”, this should be a list of two values defining the upper and lower percentile bounds.
If “normalization” is “0_1” or “none”, this should be None.

“title”str, defaults to “”
The title of the colorbar.

“ticks”list, defaults to None
The tick locations to use for the colorbar. If None, values are determined automatically.

“tick_labels”list, defaults to None
The labels to use for the colorbar ticks. If None, values are determined automatically from “ticks”.

“extend”“neither”, “both”, “min”, or “max”, defaults to “neither”
Which ends of the colorbar to extend (places an arrow head).

Defaults to None.

namestr, optional

A name for the data set. Defaults to None.

Attributes

datapandas.DataFrame: The data table
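The “bins” normalization described under metric_defaults can be pictured with a short standalone sketch. It does not call RNAvigate: the function name `assign_bins` and the color list are hypothetical, and the only thing taken from the text is the left-closed bin convention of the [5, 10, 20] example.

```python
from bisect import bisect_right


def assign_bins(values, bounds):
    """Return an integer bin index for each value.

    Bins are left-closed, as in the documented example:
    (-inf, b0), [b0, b1), ..., [b_last, inf).
    """
    return [bisect_right(bounds, value) for value in values]


# Hypothetical colormap: one more color than there are bounds.
colors = ["grey", "black", "orange", "red"]
bounds = [5, 10, 20]

bins = assign_bins([4.9, 5, 9.99, 10, 25], bounds)
print(bins)                        # [0, 1, 1, 2, 3]
print([colors[i] for i in bins])   # one color name per value
```

Three bounds give four bins, which is why “values” must be one element shorter than “cmap”.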

calculate_gini_index(values): Calculate the Gini index of an array of values.
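The exact formula used by this method is not stated here. As a point of reference, the sketch below implements the standard Gini index for non-negative values (0 means perfectly even, values near 1 mean highly concentrated). The function name `gini_index` is hypothetical and the code is independent of RNAvigate.

```python
def gini_index(values):
    """Standard Gini index of non-negative values, via the sorted-rank formula."""
    xs = sorted(values)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total == 0:
        return 0.0
    # Equivalent to sum(|xi - xj|) / (2 * n**2 * mean), but O(n log n).
    weighted = sum(rank * x for rank, x in enumerate(xs, start=1))
    return 2 * weighted / (n * total) - (n + 1) / n


print(gini_index([1, 1, 1, 1]))  # evenly distributed values
print(gini_index([0, 0, 0, 1]))  # all signal in one position
```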

calculate_windows(column, window, method='median', new_name=None, minimum_points=None, mask_na=True)

calculates a windowed operation over a column of data.

Result is stored in a new column. Value of each window is assigned to the center position of the window.

Parameters

columnstr: name of column to perform operation on
windowint: window size, must be an odd number
methodstring or function, defaults to “median”: operation to perform over windows. If a string, must be “median”, “mean”, “minimum”, or “maximum”. If a function, must take a 1D numpy array as input and return a scalar.
new_namestr, defaults to f”{method}_{window}_nt”: name of new column for stored result.
minimum_pointsint, defaults to value of window: minimum number of points within each window.
mask_nabool, defaults to True: whether to mask the result of the operation where the original column has a nan value.
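A minimal pure-Python sketch of the windowed operation described above, assuming that positions too close to either end for a full window are left as NaN (the source does not say how edges are handled). The function name `windowed` is hypothetical, and mask_na is not implemented here.

```python
import math
from statistics import median


def windowed(values, window, func=median, minimum_points=None):
    """Apply func over centered windows; each result goes to the window center."""
    if window % 2 == 0:
        raise ValueError("window must be an odd number")
    if minimum_points is None:
        minimum_points = window
    half = window // 2
    result = [math.nan] * len(values)
    for center in range(half, len(values) - half):
        chunk = values[center - half:center + half + 1]
        # NaN values do not count toward minimum_points.
        chunk = [v for v in chunk if not math.isnan(v)]
        if len(chunk) >= minimum_points:
            result[center] = func(chunk)
    return result


print(windowed([1, 2, 3, 4, 5], window=3))
print(windowed([1, math.nan, 3, 4, 5], window=3, minimum_points=2))
```

Lowering minimum_points lets windows that contain missing values still report a result.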

copy(): Returns a copy of the Profile.

classmethod from_array(input_data, sequence, **kwargs)

Construct a Profile object from an array of values.

Parameters

input_datalist or np.array: A list or array of values to use as the metric.
sequencestr: The RNA sequence.
**kwargs: Additional keyword arguments to pass to the Profile constructor.

Returns

Profile: A Profile object with the provided values.

get_aligned_data(alignment)

Returns a new Profile object with the data aligned to a sequence.

Parameters

alignmentrnavigate.data.SequenceAlignment: The alignment to use to map rows of self.data to a new sequence.

Returns

Profile: A new Profile object with the data aligned to the sequence in the alignment.

get_plotting_dataframe()

Returns a dataframe with the data to be plotted.

Returns

pandas.DataFrame: A dataframe with the columns “Nucleotide”, “Values”, “Errors”, and “Colors”.

norm_boxplot(values)

removes outliers (> 1.5 * IQR) and scales the mean to 1.

NOTE: This method varies slightly from the normalization method used in the ShapeMapper pipeline. ShapeMapper sets undefined values to 0, and then uses these values when computing the IQR and 90th percentile. Including these values can skew the result. This method excludes such NaN values. Other elements are the same.

Parameters

values1D numpy array: values to normalize

Returns

(float, float): scaling factor and error propagation factor

norm_eDMS(values)

Calculates norm factors following the eDMS per-nucleotide scheme in ShapeMapper 2.2.

Parameters

values1D numpy array: values to normalize

Returns

(float, float): scaling factor and error propagation factor

norm_percentiles(values, lower_bound=90, upper_bound=99, median_or_mean='mean')

Calculates factors to scale the median or mean of the values between percentile bounds to 1.

Parameters

values1D numpy array: values to normalize
lower_boundint or float, optional: percentile of lower bound, Defaults to 90
upper_boundint or float, optional: percentile of upper bound, Defaults to 99
median_or_meanstring, optional: whether to use the median or mean of the values between the bounds. Defaults to “mean”.

Returns

(float, float): scaling factor and error propagation factor
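The percentile scheme can be sketched without RNAvigate as follows. This is an illustration only: the helper names are hypothetical, the percentile uses simple linear interpolation, the factor is defined here as a multiplier (the source does not say whether the factor multiplies or divides), and the error propagation factor is not reproduced.

```python
from statistics import mean, median


def percentile(sorted_values, q):
    """Linearly interpolated percentile (q in 0-100) of pre-sorted values."""
    position = (len(sorted_values) - 1) * q / 100
    lower = int(position)
    upper = min(lower + 1, len(sorted_values) - 1)
    fraction = position - lower
    return sorted_values[lower] + fraction * (sorted_values[upper] - sorted_values[lower])


def percentile_scaling_factor(values, lower_bound=90, upper_bound=99,
                              median_or_mean="mean"):
    """Factor that scales the center of the values between the bounds to 1."""
    xs = sorted(values)
    low = percentile(xs, lower_bound)
    high = percentile(xs, upper_bound)
    selected = [x for x in xs if low <= x <= high]
    if not selected:
        raise ValueError("no values fall between the percentile bounds")
    center = mean(selected) if median_or_mean == "mean" else median(selected)
    return 1 / center


factor = percentile_scaling_factor(range(1, 101))
print(factor)
```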

normalize(profile_column=None, new_profile=None, error_column=None, new_error=None, norm_method='boxplot', nt_groups=None, profile_factors=None, **norm_kwargs)

Normalize values in a column, and store in a new column.

By default, performs ShapeMapper2 boxplot normalization on self.metric and stores the result as “Norm_profile”.

Parameters

profile_columnstring, defaults to self.metric

column name of values to normalize

new_profilestring, defaults to “Norm_profile”

column name of new normalized values

error_columnstring, defaults to self.error_column

column name of error values to propagate

new_errorstring, defaults to “Norm_error”

column name of new propagated error values

norm_methodstring, defaults to “boxplot”

normalization method to use.

“DMS” uses self.norm_percentiles and nt_groups=[‘AC’, ‘UG’]: scales the median of 90th to 95th percentiles to 1. As and Cs are normalized separately from Us and Gs.
“eDMS” uses self.norm_eDMS and nt_groups=[‘A’, ‘U’, ‘C’, ‘G’]: applies the new eDMS-MaP normalization. Each nucleotide is normalized separately.
“boxplot” uses self.norm_boxplot and nt_groups=[‘AUCG’]: removes outliers (> 1.5 IQR) and scales median to 1. Scales nucleotides together unless specified with nt_groups.
“percentile” uses self.norm_percentiles and nt_groups=[‘AUCG’]: scales the median of 90th to 95th percentiles to 1. Scales nucleotides together unless specified with nt_groups.

Defaults to “boxplot”: the default normalization of ShapeMapper

nt_groupslist of strings, defaults to None

A list of nucleotides to group, e.g.:

[‘AUCG’] groups all nts together
[‘AC’, ‘UG’] groups As with Cs and Us with Gs
[‘A’, ‘C’, ‘U’, ‘G’] scales each nt separately

Default depends on norm_method

profile_factorsdictionary, defaults to None

a scaling factor (float) for each nucleotide. Keys must be ‘A’, ‘C’, ‘U’, ‘G’.

Note: using this argument overrides any calculation of scaling. Defaults to None.

**norm_kwargs

these are passed to the norm_method function

Returns

profile_factorsdict: the new profile scaling factors dictionary
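How nt_groups relates to the returned profile_factors dictionary can be sketched as below. This is not RNAvigate's implementation: `group_factors` and `inverse_mean` are hypothetical names, and `inverse_mean` merely stands in for norm_boxplot, norm_percentiles, or norm_eDMS.

```python
from statistics import mean


def group_factors(sequence, values, nt_groups, norm_func):
    """Compute one scaling factor per nucleotide type.

    Values from all nucleotides in a group are pooled, and every
    nucleotide in that group shares the resulting factor.
    """
    factors = {}
    for group in nt_groups:
        pooled = [v for nt, v in zip(sequence, values) if nt in group]
        factor = norm_func(pooled)
        for nt in group:
            factors[nt] = factor
    return factors


def inverse_mean(values):
    # Hypothetical stand-in for a real normalization method.
    return 1 / mean(values)


sequence = "AACCUUGG"
values = [2.0, 2.0, 4.0, 4.0, 1.0, 1.0, 3.0, 3.0]
print(group_factors(sequence, values, ["AC", "UG"], inverse_mean))
print(group_factors(sequence, values, ["AUCG"], inverse_mean))
```

With [‘AC’, ‘UG’] the dictionary holds two distinct factors; with [‘AUCG’] all four keys share one.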

normalize_external(profiles, **kwargs)

Normalize reactivities using scaling factors computed from other profiles.

Parameters

profileslist of rnavigate.data.Profile: a list of other profiles used to compute scaling factors

Returns

profile_factorsdict: the new profile scaling factors dictionary

normalize_sequence(t_or_u='U', uppercase=True)

Changes the values in self.data[“Sequence”] to the normalized sequence.

Parameters

t_or_u“T” or “U”, defaults to “U”: Whether to replace T with U or U with T.
uppercasebool, defaults to True: Whether to convert the sequence to uppercase.

property recreation_kwargs: A dictionary of keyword arguments to pass when recreating the object.

winsorize(column, lower_bound=None, upper_bound=None)

Winsorize the data between bounds.

If either bound is set to None, one-sided Winsorization is performed.

Parameters

columnstring: the column of data to be winsorized
lower_boundNumber or None, defaults to None: Data below this value is set to this value. If None, no lower bound is applied.
upper_boundNumber or None, defaults to None: Data above this value is set to this value. If None, no upper bound is applied.
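Winsorization with optional bounds amounts to clamping each value, as in this standalone sketch (a plain-list illustration, not the method itself, which operates on a column of self.data):

```python
def winsorize(values, lower_bound=None, upper_bound=None):
    """Clamp values to [lower_bound, upper_bound]; a bound of None is not applied."""
    clipped = []
    for value in values:
        if lower_bound is not None and value < lower_bound:
            value = lower_bound
        if upper_bound is not None and value > upper_bound:
            value = upper_bound
        clipped.append(value)
    return clipped


print(winsorize([-1, 0.5, 3], lower_bound=0, upper_bound=2))  # [0, 0.5, 2]
print(winsorize([-1, 0.5, 3], upper_bound=2))                 # [-1, 0.5, 2]
```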

class rnavigate.data.RINGMaP(input_data, sequence=None, metric='Statistic', metric_defaults=None, read_table_kw=None, window=1, name=None)

Bases: Interactions

A class for storing and manipulating RINGMaP data.

Parameters

input_datastring or pandas.DataFrame

If string, a path to a file containing RINGMaP data. If dataframe, the dataframe containing RINGMaP data. The dataframe must contain columns “i”, “j”, “Statistic”, and “Zij”. Dataframe may also include other columns.

sequencestring or rnavigate.data.Sequence

The sequence string corresponding to the RINGMaP data.

metricstring, defaults to “Statistic”

The column name to use for visualization.

metric_defaultsdict

Keys are metric names and values are dictionaries of metric-specific defaults. These defaults include:

“metric_column”string
the column name to use for visualization

“cmap”string or matplotlib.colors.Colormap
the colormap to use for visualization

“normalization”“min_max”, “0_1”, “none”, or “bins”
The type of normalization to use when mapping values to colors

“values”list of float
The values to use with normalization of the data

“title”string
the title to use for colorbars

“extend”“min”, “max”, “both”, or “neither”
Which ends to extend when drawing the colorbar.

“tick_labels”list of string

read_table_kwdict, optional

kwargs passed to pandas.read_table() when reading input_data.

windowint, defaults to 1

The window size used to generate the RINGMaP data. If an input file is provided, this value is overwritten by the value in the header.

namestr, optional

A name for the interactions object.

Attributes

datapandas.DataFrame: The RINGMaP data.

data_specific_filter(positive_only=False, negative_only=False, **kwargs)

Adds filters for “Sign” column to parent filter() function

Parameters

positive_onlybool, defaults to False: If True, only keep positive correlations.
negative_onlybool, defaults to False: If True, only keep negative correlations.

Returns

kwargsdict: any additional keyword-argument pairs are returned
masknumpy array: a boolean array of the same length as self.data

get_sorted_data()

Sorts on the product of self.metric and “Sign” columns.

Except when self.metric is “Distance”.

Returns

pandas.DataFrame: a copy of the data sorted by (self.metric * “Sign”) columns

read_file(filepath, read_table_kw=None)

Parses a RINGMaP correlations file and stores data as a dataframe.

Also sets self.window (usually 1, from header).

Parameters

filepathstr: path to correlations file.
read_table_kwdict, defaults to {}: kwargs passed to pandas.read_table().

Returns

pandas.DataFrame: the RINGMaP data

class rnavigate.data.RNPMaP(input_data, read_table_kw=None, sequence=None, metric='NormedP', metric_defaults=None, name=None)

Bases: Profile

Represents per-nucleotide RNPMaP data.

Parameters

input_datastr or pandas.DataFrame: path to an RNAModMapper reactivities.txt file or a pandas DataFrame
read_table_kwdict, optional: Keyword arguments to pass to pandas.read_table. These are not necessary for reactivities.txt files. Defaults to None.
sequencernavigate.Sequence or str, optional: A sequence to use as the reference sequence. This is not necessary for reactivities.txt files. Defaults to None.
metricstr, defaults to “NormedP”: The name of the set of value-to-color options to use.
metric_defaultsdict, optional: Keys are metric names, to be used with metric. Values are dictionaries of plotting parameters. Defaults to None.
namestr, optional: A name for the data set. Defaults to None.

class rnavigate.data.SHAPEJuMP(input_data, sequence=None, metric='Percentile', metric_defaults=None, read_table_kw=None, window=1, name=None)

Bases: Interactions

A class for storing and manipulating SHAPEJuMP data.

Parameters

input_datastring or pandas.DataFrame

If string, a path to a file containing SHAPEJuMP data. If dataframe, the dataframe containing SHAPEJuMP data. The dataframe must contain columns “i”, “j”, “Metric” (JuMP rate) and “Percentile” (percentile ranking). Dataframe may also include other columns.

sequencestring or rnavigate.data.Sequence

The sequence string corresponding to the SHAPEJuMP data.

metricstring, defaults to “Percentile”

The column name to use for visualization.

metric_defaultsdict

Keys are metric names and values are dictionaries of metric-specific defaults. These defaults include:

“metric_column”string
the column name to use for visualization

“cmap”string or matplotlib.colors.Colormap
the colormap to use for visualization

“normalization”“min_max”, “0_1”, “none”, or “bins”
The type of normalization to use when mapping values to colors

“values”list of float
The values to use with normalization of the data

“title”string
the title to use for colorbars

“extend”“min”, “max”, “both”, or “neither”
Which ends to extend when drawing the colorbar.

“tick_labels”list of string

read_table_kwdict

kwargs passed to pandas.read_table() when reading input_data.

windowint

The window size used to generate the SHAPEJuMP data.

namestr

A name for the interactions object.

Attributes

datapandas.DataFrame: The SHAPEJuMP data.

read_file(input_data, read_table_kw=None)

Parses a deletions.txt file and stores it as a dataframe.

Also calculates a “Percentile” column.

Parameters

input_datastr: path to deletions.txt file
read_table_kwdict, defaults to {}: kwargs passed to pandas.read_table().

Returns

pandas.DataFrame: the SHAPEJuMP data

class rnavigate.data.SHAPEMaP(input_data, normalize=None, read_table_kw=None, sequence=None, metric='Norm_profile', metric_defaults=None, log=None, name=None)

Bases: Profile

A class to represent per-nucleotide SHAPE-MaP data.

Parameters

input_datastr or pandas.DataFrame

path to a ShapeMapper2 profile.txt or .map file or a pandas DataFrame

normalize“DMS”, “eDMS”, “boxplot”, “percentiles”, or None, defaults to None

The normalization method to use.

“DMS” uses self.norm_percentiles and nt_groups=[‘AC’, ‘UG’]: scales the median of 90th to 95th percentiles to 1. As and Cs are normalized separately from Us and Gs.
“eDMS” uses self.norm_eDMS and nt_groups=[‘A’, ‘U’, ‘C’, ‘G’]: applies the new eDMS-MaP normalization. Each nucleotide is normalized separately.
“boxplot” uses self.norm_boxplot and nt_groups=[‘AUCG’]: removes outliers (> 1.5 IQR) and scales median to 1. Scales nucleotides together unless specified with nt_groups.
“percentiles” uses self.norm_percentiles and nt_groups=[‘AUCG’]: scales the median of 90th to 95th percentiles to 1. Scales nucleotides together unless specified with nt_groups.

Defaults to None: no normalization is performed

read_table_kwdict, optional

Keyword arguments to pass to pandas.read_table. These are not necessary for profile.txt and .map files. Defaults to None.

sequencernavigate.Sequence or str, optional

A sequence to use as the reference sequence. This is not necessary for profile.txt and .map files. Defaults to None.

metricstr, defaults to “Norm_profile”

The name of the set of value-to-color options to use. “Norm_profile” specifies:

“Norm_profile” column is used
“Norm_stderr” column is used for error bars
Values are normalized to bins: (-inf, -0.4), [-0.4, 0.4), [0.4, 0.85), [0.85, 2), [2, inf)
Bins are mapped to “grey”, “black”, “orange”, “red”, “red”

Other options may be defined in metric_defaults.

metric_defaultsdict, optional

Keys are metric names, to be used with metric. Values are dictionaries of plotting parameters:

“metric_column”str
The name of the column to use as the metric. Plots and analyses that use per-nucleotide data will use this column. If “color_column” is not provided, this column also defines colors.

“error_column”str or None
The name of the column to use as the error. If None, no error bars are plotted.

“color_column”str or None
The name of the column to use for coloring. If None, colors are defined by “metric_column”.

“cmap”str or list
The name of the colormap to use. If a list, the list of colors to use.

“normalization”str
The type of normalization to use. In order to be used with colormaps, values are normalized to either be integers for categorical colormaps, or floats in the range [0, 1] for continuous colormaps.

“none”: no normalization is performed
“min_max”: values are scaled to floats in the range [0, 1] based on the upper and lower bounds defined in “values”
“0_1”: values are scaled to floats in the range [0, 1] based on the minimum and maximum values in the data
“bins”: values are scaled to an integer based on bins defined by the list of bounds defined in “values”
“percentiles”: values are scaled to floats in the range [0, 1] based on upper and lower percentile bounds defined by “values”

“values”list or None
The values to use when normalizing the data.

If “normalization” is “min_max”, this should be a list of two values defining the upper and lower bounds.
If “normalization” is “bins”, this should be a list of values of length 1 less than the length of cmap. Example: [5, 10, 20] defines 4 bins: (-infinity, 5), [5, 10), [10, 20), [20, infinity)
If “normalization” is “percentiles”, this should be a list of two values defining the upper and lower percentile bounds.
If “normalization” is “0_1” or “none”, this should be None.

“title”str, defaults to “”
The title of the colorbar.

“ticks”list, defaults to None
The tick locations to use for the colorbar. If None, values are determined automatically.

“tick_labels”list, defaults to None
The labels to use for the colorbar ticks. If None, values are determined automatically from “ticks”.

“extend”“neither”, “both”, “min”, or “max”, defaults to “neither”
Which ends of the colorbar to extend (places an arrow head).

Defaults to None.

logstr, optional

Path to a ShapeMapper v2 shapemap_log.txt file with mutations-per-molecule and read-length histograms. These will be present if the --per-read-histogram flag was used when running ShapeMapper v2. Currently, this is not working with ShapeMapper v2.2 files. Defaults to None.

namestr, optional

A name for the data set. Defaults to None.

Attributes

datapandas.DataFrame: The data table

classmethod from_rnaframework(input_data, normalize=None)

Construct a SHAPEMaP object from an RNAFramework output file.

Parameters

input_datastr

path to an RNAFramework .xml reactivities file

normalize“DMS”, “eDMS”, “boxplot”, “percentiles”, or None, defaults to None

The normalization method to use.

“DMS” uses self.norm_percentiles and nt_groups=[‘AC’, ‘UG’]: scales the median of 90th to 95th percentiles to 1. As and Cs are normalized separately from Us and Gs.
“eDMS” uses self.norm_eDMS and nt_groups=[‘A’, ‘U’, ‘C’, ‘G’]: applies the new eDMS-MaP normalization. Each nucleotide is normalized separately.
“boxplot” uses self.norm_boxplot and nt_groups=[‘AUCG’]: removes outliers (> 1.5 IQR) and scales median to 1. Scales nucleotides together unless specified with nt_groups.
“percentiles” uses self.norm_percentiles and nt_groups=[‘AUCG’]: scales the median of 90th to 95th percentiles to 1. Scales nucleotides together unless specified with nt_groups.

Defaults to None: no normalization is performed

Returns

SHAPEMaP: A SHAPEMaP object with the provided values.

read_log(log)

Read the ShapeMapper log file.

Parameters

logstr: Path to a ShapeMapper v2 shapemap_log.txt file with mutations-per-molecule and read-length histograms.

Returns

read_lengthspandas.DataFrame: A dataframe with the columns “Read_length”, “Modified_read_length”, and “Untreated_read_length”.
mutations_per_moleculepandas.DataFrame: A dataframe with the columns “Mutation_count”, “Modified_mutations_per_molecule”, and “Untreated_mutations_per_molecule”.

write_bpp2seq_file(output_file)

Write the data to a ShapeMapper2 .bpp2seq file (for Contra/EternaFold).

Parameters

output_filestr: The path to write the output file.

write_shape_file(output_file)

Write the data to a ShapeMapper2 .shape file (for RNAstructure programs).

Parameters

output_filestr: The path to write the output file.
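For orientation, RNAstructure reads SHAPE data as two columns, a 1-indexed nucleotide number and a reactivity, with -999 conventionally marking positions without data. The sketch below formats such lines. It is an assumption-laden illustration: the function name `format_shape_lines`, the tab separator, and the six-decimal precision are not taken from RNAvigate.

```python
import math


def format_shape_lines(reactivities):
    """Format reactivities as 'position<TAB>value' lines; missing data becomes -999."""
    lines = []
    for position, value in enumerate(reactivities, start=1):
        if value is None or math.isnan(value):
            value = -999.0
        lines.append(f"{position}\t{value:.6f}")
    return "\n".join(lines) + "\n"


print(format_shape_lines([0.5, float("nan"), 1.25]), end="")
```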

class rnavigate.data.ScalarMappable(cmap, normalization, values, title='', tick_labels=None, **cbar_args)

Bases: _ScalarMappable

Used to map scalar values to a color and to create a colorbar plot.

Parameters

cmapstr, tuple, float, or list: A valid mpl color, list of valid colors or a valid colormap name
normalization“min_max”, “0_1”, “none”, or “bins”: The type of normalization to use when mapping values to colors
valueslist: The values to use when normalizing the data
titlestr, defaults to “”: The title of the colorbar.
tick_labelslist, defaults to None: The labels to use for the colorbar ticks. If None, values are determined automatically.
**cbar_argsdict: Additional arguments to pass to the colorbar function

Attributes

rnav_normstr: The type of normalization to use when mapping values to colors
rnav_valslist: The values to use when normalizing the data
rnav_cmaplist: The colors to use when mapping values to colors
cbar_argsdict: Additional arguments to pass to the colorbar function
tick_labelslist: The labels to use for the colorbar ticks. If None, values are determined automatically.
titlestr: The title of the colorbar.

get_cmap(cmap)

Converts a cmap specification to a matplotlib colormap object.

Parameters

cmapstring, tuple, float, or list: A valid mpl color, list of valid colors or a valid colormap name

Returns

matplotlib colormap: a colormap matching the input

get_norm(normalization, values, cmap)

Given a normalization type and values, return a normalization object.

Parameters

normalization“min_max”, “0_1”, “none”, or “bins”: The type of normalization to use when mapping values to colors
valueslist: The values to use when normalizing the data
cmapmatplotlib colormap: The colormap to use when normalizing the data

Returns

matplotlib.colors normalization object: Used to normalize data before mapping to colors
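As an illustration of what a “min_max” normalization computes, the sketch below scales values linearly between two bounds. It is pure Python rather than a matplotlib normalization object, the name `min_max_scale` is hypothetical, and clipping out-of-range values to [0, 1] is an assumption made here for clarity.

```python
def min_max_scale(values, lower, upper, clip=True):
    """Linearly map [lower, upper] onto [0, 1]."""
    scaled = [(v - lower) / (upper - lower) for v in values]
    if clip:
        # Assumption: values outside the bounds saturate at 0 or 1.
        scaled = [min(1.0, max(0.0, s)) for s in scaled]
    return scaled


print(min_max_scale([0, 5, 10, 15], lower=0, upper=10))  # [0.0, 0.5, 1.0, 1.0]
```

matplotlib.colors.Normalize(vmin, vmax) performs the same linear mapping, with clipping off by default.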

is_equivalent_to(cmap2)

Check if two ScalarMappable objects are equivalent.

Parameters

cmap2ScalarMappable: The ScalarMappable object to compare to

Returns

bool: True if the two ScalarMappable objects are equivalent, False otherwise

values_to_hexcolors(values, alpha=1.0)

Map values to colors and return a list of hex colors.

Parameters

valueslist: The values to map to colors
alphafloat, defaults to 1.0: The alpha value to use for the colors

Returns

list of strings: A list of hex colors

class rnavigate.data.SecondaryStructure(input_data, extension=None, autoscale=True, name=None, **kwargs)

Bases: Sequence

Base class for secondary structures.

Parameters

input_datastr or pandas.DataFrame

A dataframe or filepath containing a secondary structure.

A DataFrame should contain these columns: [“Nucleotide”, “Sequence”, “Pair”]. The “Pair” column must be redundant, i.e. each base pair is listed at both of its positions.

Filepath parsing is determined by file extension: varna, xrna, nsd, cte, ct, dbn, bracket, json (R2DT), forna

extensionstr, optional

The file extension of the input_data file. If not provided, the extension will be inferred from the input_data filepath.

autoscalebool, optional

Whether to automatically scale the x and y coordinates. Defaults to True.

namestr, optional

The name of the RNA sequence. Defaults to None.

Attributes

datapandas.DataFrame: DataFrame storing base-pairs
filepathstr: The path to the input file, if provided, otherwise “dataframe”
sequencestr: The RNA sequence
ntsnumpy.array: The “Nucleotide” column of data
pair_ntsnumpy.array: The “Pair” column of data
headerstr: Header information from CT file
xcoordinatesnumpy.array: The “X_coordinate” column of data
ycoordinatesnumpy.array: The “Y_coordinate” column of data
distance_matrixnumpy.array: The contact distance matrix of the RNA structure

add_pairs(pairs, break_conflicting_pairs=False)

Add base pairs to current secondary structure.

Parameters

pairslist: 1-indexed list of paired residues. e.g. [(1, 20), (2, 19)]
break_conflicting_pairsbool, defaults to False: Whether to break existing pairs if there is a conflict

as_interactions(structure2=None)

Returns rnavigate.Interactions representation of this, or more, structures.

Parameters

structure2SecondaryStructure or list of these, defaults to None: If provided, basepairs from all structures are included and labeled by which structures contain them and how many structures contain them.

property boolean: Return a boolean array of paired and unpaired nucleotides.

break_noncanonical_pairs()

Removes non-canonical basepairs from the secondary structure.

WARNING: this deletes information.

break_pairs_nts(nt_positions)

break base pairs at the given list of positions.

WARNING: this deletes information.

Parameters

nt_positionslist of int: 1-indexed positions to break pairs

break_pairs_region(start, end, break_crossing=True, inverse=False)

Removes pairs from the specified region (1-indexed, inclusive).

WARNING: this deletes information

Parameters

startint: start position (1-indexed, inclusive)
endint: end position (1-indexed, inclusive)
break_crossingbool, defaults to True: Whether to also break pairs that cross over the specified region
inversebool, defaults to False: Invert the behavior, i.e. remove pairs that are not in this region

break_singleton_pairs()

Removes singleton basepairs from the secondary structure.

WARNING: This deletes information.

compute_ppv_sens(structure2, exact=True)

Compute the PPV and sensitivity between this and another structure.

True and False are determined from this structure. Positive and Negative are determined from structure2.

PPV = TP / (TP + FP)
Sensitivity = TP / (TP + FN)

Parameters

structure2SecondaryStructure: The SecondaryStructure to compare to.
exactbool, defaults to True: True requires BPs to be exactly correct. False allows +/-1 bp slippage.

Returns

float: sensitivity
float: PPV
3-tuple of floats: (TP, TP+FP, TP+FN)
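The two formulas can be worked through on plain lists of base pairs. This sketch covers only exact matching (exact=True); the +/-1 slippage mode is not reproduced, and the function name `ppv_sensitivity` is hypothetical. The first argument plays the role of this structure (the reference), the second of structure2 (the prediction).

```python
def ppv_sensitivity(reference_pairs, predicted_pairs):
    """Return (sensitivity, PPV, (TP, TP+FP, TP+FN)) for exact pair matches."""
    reference = {tuple(sorted(pair)) for pair in reference_pairs}
    predicted = {tuple(sorted(pair)) for pair in predicted_pairs}
    true_positives = len(reference & predicted)
    # TP + FP is every predicted pair; TP + FN is every reference pair.
    ppv = true_positives / len(predicted) if predicted else 0.0
    sensitivity = true_positives / len(reference) if reference else 0.0
    return sensitivity, ppv, (true_positives, len(predicted), len(reference))


reference = [(1, 20), (2, 19), (3, 18)]
predicted = [(1, 20), (2, 19), (5, 15), (6, 14)]
print(ppv_sensitivity(reference, predicted))
```

Here two of four predicted pairs are correct (PPV = 0.5) and two of three reference pairs are recovered (sensitivity = 2/3).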

contact_distance(i, j): Returns the contact distance between positions i and j

copy()

fill_mismatches(mismatch=1)

Adds base pairs to fill 1,1 and optionally 2,2 mismatches.

Parameters

mismatchint, defaults to 1: 1 will fill only 1,1 mismatches; 2 will fill 1,1 and 2,2 mismatches

classmethod from_pairs_list(input_data, sequence)

Creates a SecondaryStructure from a list of pairs and a sequence.

Parameters

input_datalist: 1-indexed list of base pairs. e.g. [(1, 20), (2, 19)]
sequencestr: The RNA sequence. e.g., “AUCGUGUCAUGCUA”
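A list of pairs maps onto a redundant “Pair” column as sketched below, assuming the ct-style convention that 0 marks an unpaired nucleotide. The function name `pairs_to_pair_column` is hypothetical and the code does not use RNAvigate.

```python
def pairs_to_pair_column(pairs, length):
    """Build a redundant pair column: each pair is recorded at both positions."""
    pair_column = [0] * length  # 0 = unpaired (assumed ct-style convention)
    for i, j in pairs:
        pair_column[i - 1] = j
        pair_column[j - 1] = i
    return pair_column


print(pairs_to_pair_column([(1, 6), (2, 5)], length=6))  # [6, 5, 0, 0, 2, 1]
```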

classmethod from_sequence(input_data)

Creates a SecondaryStructure from a sequence string.

This structure is initialized with no base pairs. If base pairs are needed, use SecondaryStructure.from_pairs_list().

get_aligned_data(alignment)

Returns a new SecondaryStructure object matching the alignment target.

Parameters

alignmentdata.Alignment: An alignment object used to map values

get_distance_matrix(recalculate=False, max_cd=50)

Get a matrix of pair-wise shortest path distances through the structure.

This function uses a breadth-first search (BFS) algorithm. The structure is represented as a graph with nucleotides as vertices and base-pairs and backbone as edges. All edges are length 1. Matrix is stored as an attribute for future use.

If the attribute is set (not None) and recalculate is False, the attribute will be returned.

Based on Tom’s contact_distance, but expanded to return the pairwise matrix. New contact_distance method added to return the distance between two positions.

By default, the maximum contact distance is set to 50. This will be the maximum value reported in the matrix, i.e. a value of 50 in the matrix means >= 50. This prevents the algorithm from running for a very long time on long RNAs. If you need a larger value, set max_cd to a higher value.

Parameters

recalculatebool, defaults to False: Set to True to recalculate the matrix even if the attribute is set.
max_cdint, defaults to 50: The maximum contact distance to calculate.
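The breadth-first search described above can be sketched for a single starting nucleotide; repeating it for every position yields the full matrix. This is an independent illustration, not the library code: `contact_distances_from` is a hypothetical name and the structure is given as a redundant pair column with 0 for unpaired positions.

```python
from collections import deque


def contact_distances_from(start, pair_column, max_cd=50):
    """Shortest path lengths from one 1-indexed nucleotide to all others.

    Edges are backbone neighbors and base pairs, all of length 1.
    Distances of max_cd or more are reported as max_cd.
    """
    length = len(pair_column)
    distances = [max_cd] * length
    distances[start - 1] = 0
    queue = deque([start])
    while queue:
        current = queue.popleft()
        next_distance = distances[current - 1] + 1
        if next_distance >= max_cd:
            continue  # anything farther keeps the capped value
        neighbors = (current - 1, current + 1, pair_column[current - 1])
        for neighbor in neighbors:
            in_range = 1 <= neighbor <= length
            if in_range and distances[neighbor - 1] > next_distance:
                distances[neighbor - 1] = next_distance
                queue.append(neighbor)
    return distances


# Hairpin with pairs (1, 10), (2, 9), (3, 8).
hairpin = [10, 9, 8, 0, 0, 0, 0, 3, 2, 1]
print(contact_distances_from(1, hairpin))
print(contact_distances_from(1, hairpin, max_cd=3))
```

Capping at max_cd lets the search stop early, which is what keeps long RNAs tractable.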

get_dotbracket()

Get a dotbracket notation string representing the secondary structure.

Pseudoknot levels: 1: (), 2: [], 3: {}, 4: <>, 5: Aa, 6: Bb, 7: Cc, etc.

Returns

str: A dot-bracket representation of the secondary structure
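One way to assign pseudoknot levels is a greedy pass over pairs in 5' to 3' order, placing each pair at the first level where it crosses nothing already placed. The sketch below shows that idea only; whether get_dotbracket() assigns levels the same way is an assumption, and `pairs_to_dotbracket` is a hypothetical name.

```python
def pairs_to_dotbracket(pairs, length):
    """Greedy dot-bracket string using (), [], {}, <>, Aa, Bb, Cc levels."""
    brackets = ["()", "[]", "{}", "<>", "Aa", "Bb", "Cc"]
    levels = [[] for _ in brackets]
    structure = ["."] * length
    for i, j in sorted(tuple(sorted(pair)) for pair in pairs):
        for level, (open_char, close_char) in zip(levels, brackets):
            # Two pairs cross when exactly one end of one lies inside the other.
            crosses = any(k < i < l < j or i < k < j < l for k, l in level)
            if not crosses:
                level.append((i, j))
                structure[i - 1] = open_char
                structure[j - 1] = close_char
                break
        else:
            raise ValueError("more pseudoknot levels than bracket types")
    return "".join(structure)


print(pairs_to_dotbracket([(1, 10), (2, 9), (5, 14), (6, 13)], length=14))
# ((..[[..))..]]
```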

get_helices(fill_mismatches=True, split_bulge=True, keep_singles=False)

Get a dictionary of helices from the secondary structure.

Keys are equivalent to list indices. Values are lists of paired nucleotides (1-indexed) in that helix, e.g. {0: [(1, 50), (2, 49), (3, 48)]}

Parameters

fill_mismatchesbool, defaults to True: Whether 1-1 and 2-2 bulges are replaced with base pairs
split_bulgebool, defaults to True: Whether to split helices on bulges
keep_singlesbool, defaults to False: Whether to return helices that contain only 1 base-pair

Returns

dict: A dictionary of helices
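The shape of the returned dictionary can be illustrated by grouping directly stacked pairs, as below. This simplified sketch splits on every interruption, so it does not reproduce fill_mismatches or split_bulge; `pairs_to_helices` is a hypothetical name.

```python
def pairs_to_helices(pairs, keep_singles=False):
    """Group stacked base pairs, (i, j) followed by (i + 1, j - 1), into helices."""
    helices = []
    for i, j in sorted(tuple(sorted(pair)) for pair in pairs):
        if helices and helices[-1][-1] == (i - 1, j + 1):
            helices[-1].append((i, j))
        else:
            helices.append([(i, j)])
    if not keep_singles:
        helices = [helix for helix in helices if len(helix) > 1]
    return dict(enumerate(helices))


pairs = [(1, 50), (2, 49), (3, 48), (10, 20), (11, 19), (30, 40)]
print(pairs_to_helices(pairs))
print(pairs_to_helices(pairs, keep_singles=True))
```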

get_human_dotbracket()

Get a human-readable dotbracket string representing the secondary structure.

This is an experimental format designed to be more human readable, i.e. no counting of brackets required.

Letters, instead of brackets, are used to denote nested base pairs.
Each helix is assigned a letter, which is incremented one letter alphabetically from the nearest enclosing stem.
Non-nested helices (pseudoknots) are assigned canonical brackets.

From this canonical dbn string, how many bases are in the base stem? How many nested helices are there?

    ((((....(((.[[..)))))(((...(((..]].))))))))

Same question, new format:

    AABB....CCC.[[..cccbbBBB...CCC..]].cccbbbaa

Read this as:

    ((_______________________________________))  (level 1 = A)
      ((_______________))(((______________)))    (level 2 = B)
            (((_____)))        (((_____)))       (level 3 = C)
                [[__________________]]           (pseudoknot = [])

Pseudoknot levels: 1: Aa, Bb, Cc, etc. 2: [], 3: {}, 4: <>

get_interactions_df()

Returns a DataFrame of i, j basepairs.

Returns

pandas.DataFrame

A DataFrame with columns:

i: the 5’ (1-indexed) position of the base pair
j: the 3’ (1-indexed) position of the base pair
Structure: always 1

get_junction_nts()

Get a list of junction nucleotides (paired, but at the end of a chain).

Returns

list: A list of 1-indexed positions of junction nucleotides

get_nonredundant_ct()

Returns the ct attribute in a non-redundant form.

Only returns pairs in which i < j. For example:

self.ct[i-1] == j
self.ct[j-1] == i
BUT self.get_nonredundant_ct()[j-1] == 0

Returns

numpy.array: A non-redundant array of base pairs
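On a plain list, dropping the redundant half of a pair column looks like this. It is a sketch of the idea only, using the hypothetical name `nonredundant_ct` and a list in place of the numpy array:

```python
def nonredundant_ct(ct):
    """Keep each pair only at its 5' position (i < j); zero out the 3' entry."""
    return [j if j > position else 0 for position, j in enumerate(ct, start=1)]


print(nonredundant_ct([6, 5, 0, 0, 2, 1]))  # [6, 5, 0, 0, 0, 0]
```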

get_paired_nts()

Get a list of residues that are paired.

Returns

list: A list of 1-indexed positions of paired nucleotides

get_pairs()

Get a non-redundant list of base pairs (i < j) as a list of tuples.

Returns

list: A list of 1-indexed positions. e.g., [(1, 50), (2, 49), …]

get_pseudoknots(fill_mismatches=True)

Get the pk1 and pk2 pairs from the secondary structure.

Ignores single base pairs. PK1 is defined as the helix crossing the most other base pairs. If there is a tie, the most 5’ helix is called pk1. Returns pk1 and pk2 as lists of base pairs, e.g. [(1, 10), (2, 9), …]

Parameters

fill_mismatchesbool, defaults to True: Whether 1-1 and 2-2 bulges are replaced with base pairs

Returns

list of 2 lists of 2-tuples: A list of base pairs for pk1 and pk2

get_structure_elements()

This code is not yet implemented.

Returns a string with a character for each nucleotide, indicating what kind of structure element it is a part of.

Characters:

Dangling Ends (E)
Stems (S)
Hairpin Loops (H)
Bulges (B)
Internal Loops (I)
MultiLoops (M)
External Loops (X)
Pseudoknot (P)

get_unpaired_nts()

Get a list of residues that are unpaired.

Returns

list: A list of 1-indexed positions of unpaired nucleotides

normalize_dtypes(): Convert dtypes of SecondaryStructure dataframe for consistency.

normalize_sequence(t_or_u='U', uppercase=True): Normalize the sequence attribute (fix case and/or U <-> T).

property nts

property pair_nts

read_ct(structure_number=0)

Loads secondary structure information from a given ct file.

Requires a properly formatted header.

Parameters

structure_numberint, defaults to 0: 0-indexed structure number to load from the ct file.

read_cte()

Generates SecondaryStructure object data from a CTE file

Resulting SecondaryStructure object will include nucleotide x and y coordinates and is compatible with plot_ss.

read_dotbracket()

Generates SecondaryStructure object data from a dot-bracket file.

Resulting SecondaryStructure object will include nucleotide x and y coordinates and is compatible with plot_ss.

read_forna()

Generates SecondaryStructure object data from a FORNA JSON file.

Resulting SecondaryStructure object will include nucleotide x and y coordinates and is compatible with plot_ss.

read_nsd(structure_number=0)

Generates SecondaryStructure object data from an NSD file.

Resulting SecondaryStructure object will include nucleotide x and y coordinates and is compatible with plot_ss.

read_r2dt()

Generates SecondaryStructure object data from an R2DT JSON file.

Resulting SecondaryStructure object will include nucleotide x and y coordinates and is compatible with plot_ss.

read_varna()

Generates SecondaryStructure object data from a VARNA file.

Resulting SecondaryStructure object will include nucleotide x and y coordinates and is compatible with plot_ss.

read_xrna()

Generates SecondaryStructure object data from an XRNA file.

Resulting SecondaryStructure object will include nucleotide x and y coordinates and is compatible with plot_ss.

transform_coordinates(flip=None, scale=None, center=None, rotate_degrees=None)

Perform transformations on X and Y structure coordinates.

To achieve vertical and horizontal flip together, rotate 180 degrees.

Parameters

flipstr, optional: “horizontal” or “vertical”
scalefloat, optional: new median distance of basepairs
centertuple of floats, optional: new center x and y coordinate
rotate_degreesfloat, optional: number of degrees to rotate structure
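
The note about combining flips can be checked with a small standalone sketch. It assumes that a horizontal flip mirrors x and a vertical flip mirrors y, both about the origin; the `flip` and `rotate` helpers are written for this example only and are not RNAvigate functions.

```python
import math


def flip(points, direction):
    """Mirror (x, y) points across an axis through the origin (assumed convention)."""
    if direction == "horizontal":
        return [(-x, y) for x, y in points]
    if direction == "vertical":
        return [(x, -y) for x, y in points]
    raise ValueError("direction must be 'horizontal' or 'vertical'")


def rotate(points, degrees):
    """Rotate (x, y) points counterclockwise about the origin."""
    theta = math.radians(degrees)
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    return [(x * cos_t - y * sin_t, x * sin_t + y * cos_t) for x, y in points]


points = [(1.0, 2.0), (-3.0, 0.5)]
flipped_twice = flip(flip(points, "horizontal"), "vertical")
rotated = rotate(points, 180)

# Compare with a tolerance because sin(pi) is not exactly zero in floating point.
same = all(
    math.isclose(a, b, abs_tol=1e-9)
    for p, q in zip(flipped_twice, rotated)
    for a, b in zip(p, q)
)
print(same)
```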

write_ct(out_file): Write structure to a ct file.

write_cte(out_file): Write structure to CTE format for Structure Editor.

write_dbn(rna_name, region='all', out_file=None)

Write the structure to a dot-bracket file.

Parameters

rna_namestr: The name of the RNA sequence
regionlist of 2 integers, optional: The region (start and end positions) of the RNA to write to file. Defaults to “all”.
out_filestr, optional: The name of the output file. If not provided, the dbn file is printed.

write_sto(out_file, name='seq'): Write structure to Stockholm (STO) file to use in infernal searches.

property xcoordinates

property ycoordinates

class rnavigate.data.Sequence(input_data, name=None, entry=0)

Bases: object

A class for storing and manipulating RNA sequences.

Parameters

input_datastring or pandas.DataFrame: sequence string, fasta file, or a Pandas dataframe containing a “Sequence” column
namestring, optional: The name of the sequence, defaults to None
entryint, defaults to 0: The index of the sequence in the fasta file if a fasta file is provided

Attributes

sequencestring: The sequence string
namestring: The name of the sequence
other_infodict: A dictionary of additional information about the sequence
null_alignmentSequenceAlignment: An alignment of the sequence to itself

get_aligned_data(alignment)

Get a copy of the sequence positionally aligned to another sequence.

Parameters

alignmentrnavigate.data.Alignment: the alignment to use

Returns

aligned_sequencernavigate.data.Sequence: the aligned sequence

get_colors(source, pos_cmap='rainbow', profile=None, structure=None, annotations=None)

Get colors and colormap representing information about the sequence.

Parameters

sourcestr, list, or matplotlib color-like

the source of the color information. If a string, must be one of:

“sequence”, “position”, “profile”, “structure”, “annotations”

if a list, must be a list of matplotlib color-like values; colormap will be None.
if a matplotlib color-like value, all nucleotides will be colored that color; colormap will be None.

pos_cmapstr, defaults to “rainbow”

cmap used for position colors if source is “position”

profilernavigate.data.Profile, optional

the profile to use to get colors if source is “profile”

structurernavigate.data.SecondaryStructure, optional

the structure to use to get colors if source is “structure”

annotationslist of rnavigate.data.Annotations, optional

the annotations to use to get colors if source is “annotations”

Returns

colorsnumpy array: one matplotlib color-like value for each nucleotide in self.sequence
colormaprnavigate.data.ScalarMappable: a colormap used for creating a colorbar

get_colors_from_annotations(annotations, default_color='gray')

Get colors and colormap representing sequence annotations.

Parameters

annotationslist of rnavigate.data.Annotations: the annotations to use to get colors.
default_colormatplotlib color-like, defaults to “gray”: the color to use for nucleotides not in any annotation

Returns

colorsnumpy array: one matplotlib color-like value for each nucleotide in self.sequence
colormaprnavigate.data.ScalarMappable: a colormap used for creating a colorbar

get_colors_from_positions(pos_cmap='rainbow')

Get colors and colormap representing the nucleotide position.

Parameters

pos_cmapstr, defaults to “rainbow”: cmap used for position colors

Returns

colorsnumpy array: one matplotlib color-like value for each nucleotide in self.sequence
colormaprnavigate.data.ScalarMappable: a colormap used for creating a colorbar

get_colors_from_profile(profile)

Get colors and colormap representing per-nucleotide data.

Parameters

profilernavigate.data.Profile: the profile to use to get colors.

Returns

colorsnumpy array: one matplotlib color-like value for each nucleotide in self.sequence
colormaprnavigate.data.ScalarMappable: a colormap used for creating a colorbar

get_colors_from_sequence()

Get colors and colormap representing the nucleotide sequence.

Returns

colorsnumpy array: one matplotlib color-like value for each nucleotide in self.sequence
colormaprnavigate.data.ScalarMappable: a colormap used for creating a colorbar

get_colors_from_structure(structure)

Get colors and colormap representing base-pairing status.

Parameters

structurernavigate.data.SecondaryStructure: the structure to use to get colors.

Returns

colorsnumpy array: one matplotlib color-like value for each nucleotide in self.sequence
colormaprnavigate.data.ScalarMappable: a colormap used for creating a colorbar

get_region(region='all')

Checks region input for validity and returns start and end positions.

If region is “all”, returns 1, self.length. Otherwise, ensures that region is between these values and returns the values, sorted.

Parameters

regionlist of 2 int: start and end positions of the region

Returns

start, endint, int: the starting and ending positions
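
The validation described above might look roughly like the following. This is a minimal sketch that takes the sequence length as an explicit argument; the error type and message are assumptions, and the real method may behave differently on invalid input.

```python
def get_region(region, length):
    """Return (start, end) as sorted 1-indexed positions within 1..length."""
    if region == "all":
        return 1, length
    start, end = sorted(region)
    if start < 1 or end > length:
        # Assumption: out-of-range regions are rejected with a ValueError.
        raise ValueError(f"region {region} is outside 1..{length}")
    return start, end


print(get_region("all", 10))
print(get_region([8, 3], 10))
```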

get_region_data(region='all')

Get a copy of the data object containing only the specified region.

Parameters

regionlist of 2 int, defaults to “all”: start and end positions of the region

Returns

region_datarnavigate.data.Sequence: the sequence containing only the specified region

get_seq_from_dataframe(dataframe)

Parse a dataframe for the sequence string, store as self.sequence.

Parameters

dataframepandas.DataFrame: must contain a “Sequence” column

property length

Get the length of the sequence

Returns

lengthint: the length of self.sequence

normalize_sequence(t_or_u='U', uppercase=True)

Converts sequence to all uppercase nucleotides and corrects T or U.

Parameters

t_or_u“T”, “U”, or False, defaults to “U”: “T” converts “U”s to “T”s; “U” converts “T”s to “U”s; False does nothing.
uppercasebool, defaults to True: Whether to make sequence all uppercase

read_fasta(fasta, entry)

Parse a fasta file and return the sequence at the given entry index.

Parameters

fastastring: path to fasta file
entryint: the index of the sequence in the fasta file

Returns

sequencestring: the sequence string
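
To show how an entry index selects one record from a multi-record fasta file, here is a small sketch that parses fasta text already split into lines. The function name `read_fasta_entry`, the line-based input, and the returned (name, sequence) pair are assumptions for illustration; the real method takes a file path.

```python
def read_fasta_entry(lines, entry=0):
    """Return (name, sequence) for the record at 0-indexed position `entry`."""
    names, sequences = [], []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A header line starts a new record.
            names.append(line[1:])
            sequences.append("")
        elif sequences:
            # Sequence lines are appended to the most recent record.
            sequences[-1] += line
    return names[entry], sequences[entry]


fasta_text = ">first\nACGU\nACG\n>second\nGGCC\n"
print(read_fasta_entry(fasta_text.splitlines(), entry=1))
```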

write_fasta(file, name)

Write the sequence to a fasta file.

Parameters

filestring: path to output fasta file
namestring: the name of the sequence to write in the fasta file

class rnavigate.data.SequenceAlignment(sequence1, sequence2, align_kwargs=None, full=False, use_previous=True)

Bases: BaseAlignment

The most useful feature of RNAvigate. Maps positions from one sequence to a totally different sequence using user-defined pairwise alignment or automatic pairwise alignment.

Parameters

sequence1string: the sequence to be aligned
sequence2string: the sequence to align to
align_kwargsdict, defaults to None: a dictionary of arguments to pass to pairwise2.align.globalms
fullbool, defaults to False: whether to keep unmapped starting sequence positions.
use_previousbool, defaults to True: whether to use previously set alignments

Attributes

sequence1str: the sequence to be aligned
sequence2str: the sequence to align to
alignment1str: the alignment string matching sequence1 to sequence2
alignment2str: the alignment string matching sequence2 to sequence1
starting_sequencestr: sequence1
target_sequencestr: sequence2 if full is False, else alignment2
mappingnumpy.array: the alignment map array. index of starting_sequence is mapping[index] of target_sequence

get_alignment()

Gets an alignment that has been user-defined or previously calculated, or produces a new pairwise alignment between the two sequences.

Returns

alignment1, alignment2tuple of 2 str: the alignment strings matching sequence1 and sequence2, respectively.

get_inverse_alignment(): Gets an alignment that maps from sequence2 to sequence1.

get_mapping()

Calculates a mapping from starting sequence to target sequence.

Returns

mappingnumpy.array: an array that maps to an index of target sequence. index of starting_sequence is mapping[index] of target_sequence
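
One way to picture the mapping is to walk the two gapped alignment strings column by column. The sketch below returns a plain list rather than a numpy array and uses -1 for starting positions with no counterpart in the target; both choices are assumptions of this example, not a description of RNAvigate's implementation.

```python
def mapping_from_alignment(alignment1, alignment2, gap="-"):
    """Map each 0-indexed position of sequence1 to a position in sequence2."""
    if len(alignment1) != len(alignment2):
        raise ValueError("alignment strings must have equal length")
    mapping = []
    target_index = 0
    for char1, char2 in zip(alignment1, alignment2):
        if char1 != gap:
            # A sequence1 base aligned to a gap has no target position.
            mapping.append(target_index if char2 != gap else -1)
        if char2 != gap:
            target_index += 1
    return mapping


print(mapping_from_alignment("AC-GU", "ACCG-"))
```

With this layout, index i of the starting sequence corresponds to index mapping[i] of the target sequence whenever mapping[i] is not -1.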

print(print_format='full')

Print the alignment in a human-readable format.

Parameters

print_format“full”, “cigar”, “long” or “short”, defaults to “full”: how to format the alignment. “full”: the full length alignment with changes labeled “X”; “cigar”: the CIGAR string; “long”: locations and sequences of each change; “short”: total number of matches, mismatches, and indels

print_all_changes(): Print location and sequence of all changes.

print_cigar(): Print the CIGAR string

print_number_of_changes(): Print the total numbers of matches, mismatches, and indels.
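
For readers unfamiliar with CIGAR strings, the following sketch derives one from a pair of gapped alignment strings. It assumes the common convention in which "M" covers any aligned column (match or mismatch), "D" marks a gap in the first string, and "I" marks a gap in the second; which convention RNAvigate prints is not stated here, so treat the operation letters as an assumption.

```python
from itertools import groupby


def cigar(alignment1, alignment2, gap="-"):
    """Build a run-length encoded CIGAR string from two gapped strings."""
    operations = []
    for char1, char2 in zip(alignment1, alignment2):
        if char1 == gap:
            operations.append("D")  # base present only in alignment2
        elif char2 == gap:
            operations.append("I")  # base present only in alignment1
        else:
            operations.append("M")  # aligned column, match or mismatch
    return "".join(f"{len(list(run))}{op}" for op, run in groupby(operations))


alignment1 = "AAGCUUCG" + "-" * 9 + "GUACAUGCAAGAUGUAC"
alignment2 = "AUCGAUCGAGCUGCUGUGUAC" + "-" * 9 + "GUAC"
print(cigar(alignment1, alignment2))
```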

class rnavigate.data.SequenceCircle(input_data, gap=30, name=None, **kwargs)

Bases: SecondaryStructure

A circular SecondaryStructure-like representation of RNA sequence.

class rnavigate.data.StructureAlignment(sequence1, sequence2, structure1=None, structure2=None, full=False)

Bases: BaseAlignment

Experimental secondary structure alignment based on the RNAlign2D algorithm (https://doi.org/10.1186/s12859-021-04426-8)

Parameters

sequence1string: the sequence to be aligned
sequence2string: the sequence to align to
structure1string, defaults to None: the secondary structure of sequence1
structure2string, defaults to None: the secondary structure of sequence2
fullbool, defaults to False: whether to align to full length of sequence2 or just mapped length

Attributes

sequence1str: the sequence to be aligned
sequence2str: the sequence to align to
structure1str: the secondary structure of sequence1
structure2str: the secondary structure of sequence2
alignment1str: the alignment string matching sequence1 to sequence2
alignment2str: the alignment string matching sequence2 to sequence1
starting_sequencestr: sequence1
target_sequencestr: sequence2 if full is False, else alignment2
mappingnumpy.array: the alignment map array. index of starting_sequence is mapping[index] of target_sequence

get_alignment()

Aligns pseudo-amino-acid sequences according to RNAlign2D rules.

Returns

alignment1, alignment2tuple of 2 str: the alignment strings matching sequence1 and sequence2, respectively.

get_inverse_alignment(): Gets an alignment that maps from sequence2 to sequence1.

get_mapping()

Calculates a mapping from starting sequence to target sequence.

Returns

mappingnumpy.array: an array which maps indices to the target sequence. starting_sequence[idx] == target_sequence[self.mapping[idx]]

set_as_default_alignment(): Set this as the default alignment between sequence1 and sequence2.

class rnavigate.data.StructureAsInteractions(input_data, sequence, metric=None, metric_defaults=None, window=1, name=None)

Bases: Interactions

A class for storing and manipulating structure data.

Parameters

input_datastring or pandas.DataFrame

If string, a path to a file containing structure data. If dataframe, the dataframe containing structure data. The dataframe must contain columns “i”, “j”, and “Structure”. Dataframe may also include other columns.

sequencestring or rnavigate.data.Sequence

The sequence string corresponding to the structure data.

metricstring, defaults to “Structure”

The column name to use for visualization.

metric_defaultsdict

Keys are metric names and values are dictionaries of metric-specific defaults. These defaults include:

“metric_column”string
the column name to use for visualization

“cmap”string or matplotlib.colors.Colormap
the colormap to use for visualization

“normalization”“min_max”, “0_1”, “none”, or “bins”
The type of normalization to use when mapping values to colors

“values”list of float
The values to use with normalization of the data

“title”string
the title to use for colorbars

“extend”“min”, “max”, “both”, or “neither”
Which ends to extend when drawing the colorbar.

“tick_labels” : list of string

read_table_kwdict, optional

kwargs passed to pandas.read_table() when reading input_data.

windowint, defaults to 1

The window size used to generate the structure data.

namestr, optional

A name for the StructureAsInteractions object.

Attributes

datapandas.DataFrame: The structure data.

class rnavigate.data.StructureCompareMany(input_data, sequence, metric=None, metric_defaults=None, window=1, name=None)

Bases: Interactions

A class for storing and manipulating a comparison of many structures.

Parameters

input_datastring or pandas.DataFrame

If string, a path to a file containing structure data. If dataframe, the dataframe containing structure data. The dataframe must contain columns “i”, “j”, and “Structure”. Dataframe may also include other columns.

sequencestring or rnavigate.data.Sequence

The sequence string corresponding to the structure data.

metricstring, defaults to “Structure”

The column name to use for visualization.

metric_defaultsdict

Keys are metric names and values are dictionaries of metric-specific defaults. These defaults include:

“metric_column”string
the column name to use for visualization

“cmap”string or matplotlib.colors.Colormap
the colormap to use for visualization

“normalization”“min_max”, “0_1”, “none”, or “bins”
The type of normalization to use when mapping values to colors

“values”list of float
The values to use with normalization of the data

“title”string
the title to use for colorbars

“extend”“min”, “max”, “both”, or “neither”
Which ends to extend when drawing the colorbar.

“tick_labels” : list of string

read_table_kwdict, optional

kwargs passed to pandas.read_table() when reading input_data.

windowint, defaults to 1

The window size used to generate the structure data.

namestr, optional

A name for the StructureCompareMany object.

Attributes

datapandas.DataFrame: The structure data.

class rnavigate.data.StructureCompareTwo(input_data, sequence, metric=None, metric_defaults=None, window=1, name=None)

Bases: Interactions

A class for storing and manipulating a comparison of two structures.

Parameters

input_datastring or pandas.DataFrame

If string, a path to a file containing structure data. If dataframe, the dataframe containing structure data. The dataframe must contain columns “i”, “j”, and “Structure”. Dataframe may also include other columns.

sequencestring or rnavigate.data.Sequence

The sequence string corresponding to the structure data.

metricstring, defaults to “Structure”

The column name to use for visualization.

metric_defaultsdict

Keys are metric names and values are dictionaries of metric-specific defaults. These defaults include:

“metric_column”string
the column name to use for visualization

“cmap”string or matplotlib.colors.Colormap
the colormap to use for visualization

“normalization”“min_max”, “0_1”, “none”, or “bins”
The type of normalization to use when mapping values to colors

“values”list of float
The values to use with normalization of the data

“title”string
the title to use for colorbars

“extend”“min”, “max”, “both”, or “neither”
Which ends to extend when drawing the colorbar.

“tick_labels” : list of string

read_table_kwdict, optional

kwargs passed to pandas.read_table() when reading input_data.

windowint, defaults to 1

The window size used to generate the structure data.

namestr, optional

A name for the StructureCompareTwo object.

Attributes

datapandas.DataFrame: The structure data.

class rnavigate.data.StructureCoordinates(x, y, pairs=None)

Bases: object

Helper class to perform structure coordinate transformations

Parameters

xnumpy.array: x coordinates
ynumpy.array: y coordinates
pairslist of pairs, optional: list of base-paired positions, required if scaling coordinates

center(x=0, y=0)

Center structure on the given x, y coordinate

Parameters

xint, defaults to 0: x coordinate of structure center
yint, defaults to 0: y coordinate of structure center

flip(horizontal=True)

Flip structure vertically or horizontally.

Parameters

horizontalbool, defaults to True: whether to flip structure horizontally, otherwise vertically

get_center_point()

Get the x, y coordinates for the center of structure.

Returns

float: x coordinate of structure center
float: y coordinate of structure center

rotate(degrees)

Rotate structure on current center point.

Parameters

degreesfloat: number of degrees to rotate structure

scale(median_bp_distance=1.0)

Scale structure such that median base-pair distance is constant.

Parameters

median_bp_distancefloat, defaults to 1.0: New median distance between all base-paired nucleotides.
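
The scaling rule can be sketched without RNAvigate as follows. The example assumes that pairs are given as 0-indexed (i, j) tuples and that coordinates are scaled about the origin; both are assumptions of this sketch, and `scale_to_median_bp_distance` is a hypothetical helper name.

```python
import math
from statistics import median


def scale_to_median_bp_distance(x, y, pairs, median_bp_distance=1.0):
    """Scale coordinates so the median base-pair distance equals the target."""
    distances = [math.dist((x[i], y[i]), (x[j], y[j])) for i, j in pairs]
    factor = median_bp_distance / median(distances)
    return [value * factor for value in x], [value * factor for value in y]


# Two base pairs with distances 2.0 and 4.0, so the median is 3.0.
x = [0.0, 2.0, 0.0, 4.0]
y = [0.0, 0.0, 1.0, 1.0]
pairs = [(0, 1), (2, 3)]
new_x, new_y = scale_to_median_bp_distance(x, y, pairs, median_bp_distance=1.5)
print(new_x, new_y)
```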

rnavigate.data.domains(input_data, names, colors, sequence)

Create a list of Annotations from a list of spans.

Currently, domains functionality in RNAvigate just uses a list of spans. In the future, this should be a dedicated class. Generally, domains should cover an entire sequence without overlap, but this is not enforced. e.g. [[1, 100], [101, 200]] for a 200 nt sequence.

Parameters

input_datalist of lists: list of spans for each domain
nameslist of strings: list of names for each domain
colorslist of valid matplotlib colors: list of colors for each domain
sequencestring: sequence to be annotated

Returns

list of rnavigate.data.Annotation: list of Annotations
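
Because full, non-overlapping coverage is recommended but not enforced, a caller may want to check spans before passing them in. The helper below is a hypothetical pre-check written for this example and is not part of RNAvigate; it assumes 1-indexed, inclusive spans as in the example above.

```python
def spans_tile_sequence(spans, length):
    """Return True if spans cover positions 1..length with no gaps or overlaps."""
    expected_start = 1
    for start, end in sorted(spans):
        if start != expected_start or end < start:
            return False
        expected_start = end + 1
    return expected_start == length + 1


print(spans_tile_sequence([[1, 100], [101, 200]], 200))
print(spans_tile_sequence([[1, 100], [90, 200]], 200))
```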

rnavigate.data.lookup_alignment(sequence1, sequence2, t_or_u='U')

Look up a previously set alignment in the _alignments_cache.

Parameters

sequence1string: The first sequence to align
sequence2string: The second sequence to be aligned to
t_or_u“T”, “U”, or False, defaults to “U”: “T” converts “U”s to “T”s; “U” converts “T”s to “U”s; False does nothing

Returns

dictionary if an alignment is found, otherwise None

{“seqA”: sequence1 with gap characters representing alignment, “seqB”: sequence2 with gap characters representing alignment}

rnavigate.data.normalize_sequence(sequence, t_or_u='U', uppercase=True)

Returns sequence as all uppercase nucleotides and/or corrects T or U.

Parameters

sequencestring or RNAvigate Sequence: The sequence. If given an RNAvigate Sequence, the sequence string is retrieved.
t_or_u“T”, “U”, or False, defaults to “U”: “T” converts “U”s to “T”s; “U” converts “T”s to “U”s; False does nothing
uppercasebool, defaults to True: Whether to make sequence all uppercase

Returns

string
the cleaned-up sequence string
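
The behavior described by these parameters can be approximated for plain strings as below. This is a minimal sketch under the documented parameter meanings; it handles only `str` input, not RNAvigate Sequence objects, and the function name `normalize` is chosen for this example.

```python
def normalize(sequence, t_or_u="U", uppercase=True):
    """Return the sequence with optional uppercasing and T/U conversion."""
    if uppercase:
        sequence = sequence.upper()
    if t_or_u == "U":
        # Convert T to U, preserving case if uppercase is False.
        sequence = sequence.replace("T", "U").replace("t", "u")
    elif t_or_u == "T":
        # Convert U to T, preserving case if uppercase is False.
        sequence = sequence.replace("U", "T").replace("u", "t")
    return sequence


print(normalize("acgt"))
print(normalize("ACGU", t_or_u="T"))
```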

rnavigate.data.set_alignment(sequence1, sequence2, alignment1, alignment2, t_or_u='U')

Add an alignment to be used as the default between two sequences.

When objects with these sequences are aligned for visualization, RNAvigate uses this alignment instead of an automated pairwise sequence alignment. alignment1 and alignment2 must have matching lengths, and each alignment must differ from its sequence only by dashes “-”.

e.g.:

    sequence1  = "AAGCUUCGGUACAUGCAAGAUGUAC"
    sequence2  = "AUCGAUCGAGCUGCUGUGUACGUAC"
    alignment1 = "AAGCUUCG---------GUACAUGCAAGAUGUAC"
    alignment2 = "AUCGAUCGAGCUGCUGUGUAC---------GUAC"
                   |mm|   | indel |    | indel |

Parameters

sequence1string: the first sequence
sequence2string: the second sequence
alignment1string: first sequence, plus dashes “-” indicating indels
alignment2string: second sequence, plus dashes “-” indicating indels
t_or_u“T”, “U”, or False, defaults to “U”: “T” converts “U”s to “T”s; “U” converts “T”s to “U”s; False does nothing
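
The two requirements on a user-defined alignment (equal lengths, and each alignment string equal to its sequence once dashes are removed) can be checked before calling set_alignment. The `is_valid_alignment` helper below is a hypothetical pre-check for illustration, not an RNAvigate function.

```python
def is_valid_alignment(sequence1, sequence2, alignment1, alignment2):
    """Check that the gapped strings are consistent with their sequences."""
    return (
        len(alignment1) == len(alignment2)
        and alignment1.replace("-", "") == sequence1
        and alignment2.replace("-", "") == sequence2
    )


sequence1 = "AAGCUUCGGUACAUGCAAGAUGUAC"
sequence2 = "AUCGAUCGAGCUGCUGUGUACGUAC"
alignment1 = "AAGCUUCG" + "-" * 9 + "GUACAUGCAAGAUGUAC"
alignment2 = "AUCGAUCGAGCUGCUGUGUAC" + "-" * 9 + "GUAC"
print(is_valid_alignment(sequence1, sequence2, alignment1, alignment2))
```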

rnavigate.data.set_multiple_sequence_alignment(fasta, set_pairwise=False)

Set alignments from a multiple sequence alignment Pearson fasta file.

Sets alignments to a base sequence, then returns the base sequence to be used when a multiple sequence alignment plot is desired. Also sets all pairwise alignments, if desired. When setting pairwise alignments, dashes that are shared between pairwise sequences are removed first.

Parameters

fastastring: location of Pearson fasta file
set_pairwisebool, defaults to False: whether to set every pairwise alignment as well as the multiple sequence alignment.
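
Removing shared dashes means dropping every alignment column in which both sequences of a pair have a gap. A minimal sketch of that step, assuming "-" as the gap character and using a hypothetical helper name, could look like this:

```python
def drop_shared_gaps(aligned1, aligned2, gap="-"):
    """Remove columns where both aligned sequences contain a gap."""
    kept = [
        (char1, char2)
        for char1, char2 in zip(aligned1, aligned2)
        if not (char1 == gap and char2 == gap)
    ]
    return (
        "".join(char1 for char1, _ in kept),
        "".join(char2 for _, char2 in kept),
    )


# Two rows taken from a larger multiple sequence alignment.
print(drop_shared_gaps("AC--GU", "A-C-GU"))
```

The resulting pair of strings has equal length and contains only the gaps that are meaningful for that specific pair.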