rnavigate.data package

Submodules

rnavigate.data.alignments module

Alignment objects map coordinates, vectors, and dataframes to a new sequence

Classes

BaseAlignment (ABC)

abstract base class for alignments

SequenceAlignment (BaseAlignment)

aligns one sequence to another sequence

RegionAlignment (BaseAlignment)

cuts a sequence between a start and end position

AlignmentChain (BaseAlignment)

allows chaining of above alignments

class rnavigate.data.alignments.AlignmentChain(*alignments)

Bases: BaseAlignment

Combines a list of alignments into one.

Parameters

alignmentslist of Alignment objects

the alignments to chain together

Attributes

alignmentslist

the constituent alignments

starting_sequencestr

starting sequence of alignments[0]

target_sequencestr

target sequence of alignments[-1]

mappingnumpy.array

an array which maps from starting_sequence to target_sequence. index of starting_sequence is mapping[index] of target sequence

get_inverse_alignment()

Alignments require a method to get the inverted alignment

get_mapping()

combines mappings from each alignment.

Returns

mappingnumpy.array

mapping from initial starting sequence to final target sequence. index of starting_sequence is mapping[index] of target sequence
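The composition of mappings described above can be sketched in plain Python. This is an illustration only, not RNAvigate's implementation: the helper name chain_mappings is hypothetical, plain lists stand in for numpy arrays, and -1 is assumed to mark an unmapped index (the convention described for map_indices).

```python
def chain_mappings(*mappings):
    """Compose index mappings; -1 marks an unmapped index (assumed convention)."""
    combined = list(mappings[0])
    for next_map in mappings[1:]:
        # an index that is already unmapped stays unmapped
        combined = [next_map[i] if i != -1 else -1 for i in combined]
    return combined


# first alignment: 4-nt starting sequence -> 3-nt intermediate (index 1 unmapped)
first = [0, -1, 1, 2]
# second alignment: 3-nt intermediate -> final target (index 2 unmapped)
second = [2, 3, -1]
chained = chain_mappings(first, second)
print(chained)  # [2, -1, 3, -1]
```

An index is lost as soon as any alignment in the chain cannot map it, which is why the last starting index ends up unmapped here.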

class rnavigate.data.alignments.BaseAlignment(starting_sequence, target_length)

Bases: ABC

Abstract base class for alignments

Parameters

starting_sequencestring

the sequence to be aligned

target_lengthint

the length of the target sequence

Attributes

starting_sequencestring

the beginning sequence

mappingnumpy.array

the alignment map array. index of starting_sequence is mapping[index] of target_sequence

target_sequencestring

the portion of starting sequence that is mapped

target_lengthinteger

the length of the target sequence

abstractmethod get_inverse_alignment()

Alignments require a method to get the inverted alignment

abstractmethod get_mapping()

Alignments require a mapping from starting to target sequence

get_target_sequence()

Gets the portion of starting sequence that fits the alignment

map_dataframe(dataframe, position_columns)

Takes a dataframe and maps position columns to target sequence.

Rows with unmapped positions are dropped.

Parameters

dataframepandas.DataFrame

a dataframe with position columns

position_columnslist of str

a list of columns containing positions to map

Returns

pandas.DataFrame

a new dataframe (copy) with position columns mapped or dropped

map_indices(indices, keep_minus_one=True)

Takes a list of indices (0-index) and maps them to target sequence

Parameters

indicesint or list of int

a single or list of integer indices

keep_minus_onebool, defaults to True

whether to keep unmapped starting sequence indices (-1) in the returned array.

Returns

numpy.array

the equivalent indices in target sequence

map_nucleotide_dataframe(dataframe, position_column='Nucleotide', sequence_column='Sequence')

Takes a per-nt dataframe and maps it to the target sequence.

Dataframe must have 1 row per nucleotide in starting sequence, with a position column and a sequence column. Dataframe is mapped to have the same format, but for target sequence nucleotides and positions.

Parameters

dataframepandas.DataFrame

a per-nucleotide dataframe

position_columnstring, defaults to “Nucleotide”

name of the position column.

sequence_columnstring, defaults to “Sequence”

name of the sequence column.

Returns

pandas.DataFrame

a new dataframe (copy) mapped to target sequence. Unmapped starting sequence positions are dropped and unmapped target sequence positions are filled.

map_positions(positions, keep_zero=True)

Takes a list of positions (1-index) and maps them to target sequence

Parameters

positionsint or list of int

a single or list of integer positions

keep_zerobool, defaults to True

whether to keep unmapped starting sequence positions (0) in the returned array.

Returns

numpy.array

the equivalent positions in target sequence
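The relationship between 0-indexed indices and 1-indexed positions can be sketched as follows. This is a minimal illustration under assumptions, not the library code: map_positions_sketch is a hypothetical name, the mapping is a plain list with -1 for unmapped indices, and the result is a list rather than a numpy array.

```python
def map_positions_sketch(mapping, positions, keep_zero=True):
    """Map 1-indexed positions through a 0-indexed mapping (-1 = unmapped)."""
    if isinstance(positions, int):
        positions = [positions]
    # shift to 0-indexed, look up, shift back; an unmapped -1 becomes position 0
    mapped = [mapping[position - 1] + 1 for position in positions]
    if not keep_zero:
        mapped = [position for position in mapped if position != 0]
    return mapped


mapping = [0, -1, 1, 2]
kept = map_positions_sketch(mapping, [1, 2, 3, 4])
dropped = map_positions_sketch(mapping, [1, 2, 3, 4], keep_zero=False)
print(kept)     # [1, 0, 2, 3]
print(dropped)  # [1, 2, 3]
```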

map_values(values, fill=nan)

Takes an array of length equal to starting sequence and maps it to target sequence. Unmapped positions in starting sequence are dropped, and unmapped positions in target sequence are filled with the fill value.

Parameters

valuesiterable

values to map to target sequence.

fillany, defaults to np.nan

a value for unmapped positions in target sequence.

Returns

numpy.array

an array of values equal in length to target sequence
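The drop-and-fill behavior can be sketched in plain Python. The function name map_values_sketch and the explicit target_length argument are assumptions made for illustration; the real method works on the alignment object and returns a numpy array.

```python
import math


def map_values_sketch(mapping, values, target_length, fill=math.nan):
    """Place each starting value at its target index; fill the gaps."""
    new_values = [fill] * target_length
    for start_index, target_index in enumerate(mapping):
        if target_index != -1:  # unmapped starting positions are dropped
            new_values[target_index] = values[start_index]
    return new_values


mapping = [0, -1, 1, 3]
mapped = map_values_sketch(mapping, [0.1, 0.2, 0.3, 0.4], target_length=5, fill=None)
print(mapped)  # [0.1, 0.3, None, 0.4, None]
```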

class rnavigate.data.alignments.SequenceAlignment(sequence1, sequence2, align_kwargs=None, full=False, use_previous=True)

Bases: BaseAlignment

The most useful feature of RNAvigate. Maps positions from one sequence to a totally different sequence using user-defined pairwise alignment or automatic pairwise alignment.

Parameters

sequence1string

the sequence to be aligned

sequence2string

the sequence to align to

align_kwargsdict, defaults to None

a dictionary of arguments to pass to pairwise2.align.globalms

fullbool, defaults to False

whether to keep unmapped starting sequence positions.

use_previousbool, defaults to True

whether to use previously set alignments

Attributes

sequence1str

the sequence to be aligned

sequence2str

the sequence to align to

alignment1str

the alignment string matching sequence1 to sequence2

alignment2str

the alignment string matching sequence2 to sequence1

starting_sequencestr

sequence1

target_sequencestr

sequence2 if full is False, else alignment2

mappingnumpy.array

the alignment map array. index of starting_sequence is mapping[index] of target_sequence

get_alignment()

Gets an alignment that has either been user-defined or previously calculated or produces a new pairwise alignment between two sequences.

Returns

alignment1, alignment2tuple of 2 str

the alignment strings matching sequence1 and sequence2, respectively.

get_inverse_alignment()

Gets an alignment that maps from sequence2 to sequence1.

get_mapping()

Calculates a mapping from starting sequence to target sequence.

Returns

mappingnumpy.array

an array that maps to an index of target sequence. index of starting_sequence is mapping[index] of target_sequence
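One way such a mapping could be derived from two gapped alignment strings is sketched below. It assumes the full=False case (the target is sequence2 without gaps), uses "-" as the gap character and -1 for unmapped indices, and returns a list; the function name is hypothetical and the library may compute this differently.

```python
def mapping_from_alignment(alignment1, alignment2):
    """Return, for each nucleotide of sequence1, its index in sequence2 or -1."""
    mapping = []
    target_index = 0
    for nt1, nt2 in zip(alignment1, alignment2):
        if nt1 != "-":
            # sequence1 nucleotide: mapped unless it faces a gap in sequence2
            mapping.append(target_index if nt2 != "-" else -1)
        if nt2 != "-":
            target_index += 1
    return mapping


# sequence1 = "AUGCA", sequence2 = "AGGCA"
mapping = mapping_from_alignment("AUG-CA", "A-GGCA")
print(mapping)  # [0, -1, 1, 3, 4]
```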

print(print_format='full')

Print the alignment in a human-readable format.

Parameters

print_format“full”, “cigar”, “long” or “short”, defaults to “full”

how to format the alignment.

“full”: the full length alignment with changes labeled “X”

“cigar”: the CIGAR string

“long”: locations and sequences of each change

“short”: total number of matches, mismatches, and indels

print_all_changes()

Print location and sequence of all changes.

print_cigar()

Print the CIGAR string

print_number_of_changes()

Print the total numbers of matches, mismatches, and indels.
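The kind of tally reported here can be sketched from a pair of alignment strings. This is an assumption-laden illustration: count_changes is a hypothetical helper, and it counts every gapped column as one indel, whereas RNAvigate may count contiguous gap runs instead.

```python
def count_changes(alignment1, alignment2):
    """Tally matches, mismatches, and gapped columns of a pairwise alignment."""
    counts = {"matches": 0, "mismatches": 0, "indels": 0}
    for nt1, nt2 in zip(alignment1, alignment2):
        if nt1 == "-" or nt2 == "-":
            counts["indels"] += 1
        elif nt1 == nt2:
            counts["matches"] += 1
        else:
            counts["mismatches"] += 1
    return counts


counts = count_changes("AUG-CA", "A-GGUA")
print(counts)  # {'matches': 3, 'mismatches': 1, 'indels': 2}
```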

class rnavigate.data.alignments.StructureAlignment(sequence1, sequence2, structure1=None, structure2=None, full=False)

Bases: BaseAlignment

Experimental secondary structure alignment based on RNAlign2D algorithm (https://doi.org/10.1186/s12859-021-04426-8)

Parameters

sequence1string

the sequence to be aligned

sequence2string

the sequence to align to

structure1string, defaults to None

the secondary structure of sequence1

structure2string, defaults to None

the secondary structure of sequence2

fullbool, defaults to False

whether to align to full length of sequence2 or just mapped length

Attributes

sequence1str

the sequence to be aligned

sequence2str

the sequence to align to

structure1str

the secondary structure of sequence1

structure2str

the secondary structure of sequence2

alignment1str

the alignment string matching sequence1 to sequence2

alignment2str

the alignment string matching sequence2 to sequence1

starting_sequencestr

sequence1

target_sequencestr

sequence2 if full is False, else alignment2

mappingnumpy.array

the alignment map array. index of starting_sequence is mapping[index] of target_sequence

get_alignment()

Aligns pseudo-amino-acid sequences according to RNAlign2D rules.

Returns

alignment1, alignment2tuple of 2 str

the alignment strings matching sequence1 and sequence2, respectively.

get_inverse_alignment()

Gets an alignment that maps from sequence2 to sequence1.

get_mapping()

Calculates a mapping from starting sequence to target sequence.

Returns

mappingnumpy.array

an array which maps indices of the starting sequence to the target sequence. starting_sequence[idx] == target_sequence[self.mapping[idx]]

set_as_default_alignment()

Set this as the default alignment between sequence1 and sequence2.

rnavigate.data.alignments.convert_sequence(aas, nts, dbn)

Convert pseudo-amino-acid sequence to nucleotide and dotbracket or vice versa.

Parameters

aasstring or True

the amino acid sequence. If True, returns the amino acid translation of nts and dbn

ntsstring or True

the nucleotide sequence. If True, returns the nucleotide translation of aas

dbnstring or True

the dot-bracket notation string. If True, returns the dot-bracket translation of aas

Returns

string

sequence of the specified translation. If nts and dbn are True, returns a tuple.

Example

convert_sequence(aas="ACDEFGHIKLMNPQRSTVWY", nts=True, dbn=True) returns ("AAAAACCCCCUUUUUGGGGG", "([.])([.])([.])([.])")
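The example above implies a one-to-one table between 20 amino acid letters and the 20 combinations of 4 nucleotides with 5 structure symbols. The sketch below rebuilds that table from the example alone; the helper names are hypothetical and the actual lookup used by the library is not shown here, so treat the table as inferred rather than authoritative.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# Inferred from the documented example: nucleotides in the order A, C, U, G,
# each paired with the structure symbols "(", "[", ".", "]", ")" in turn.
PAIRS = list(product("ACUG", "([.])"))
AA_TO_PAIR = dict(zip(AMINO_ACIDS, PAIRS))
PAIR_TO_AA = {pair: aa for aa, pair in AA_TO_PAIR.items()}


def to_pseudo_aa(nts, dbn):
    """Encode a nucleotide string and a dot-bracket string as pseudo-amino-acids."""
    return "".join(PAIR_TO_AA[pair] for pair in zip(nts, dbn))


def from_pseudo_aa(aas):
    """Decode pseudo-amino-acids back to (nucleotides, dot-bracket)."""
    nts, dbn = zip(*(AA_TO_PAIR[aa] for aa in aas))
    return "".join(nts), "".join(dbn)


decoded = from_pseudo_aa("ACDEFGHIKLMNPQRSTVWY")
print(decoded)  # ('AAAAACCCCCUUUUUGGGGG', '([.])([.])([.])([.])')
```

Encoding sequence and structure in a single alphabet is what lets an ordinary pairwise aligner score both at once.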

rnavigate.data.alignments.lookup_alignment(sequence1, sequence2, t_or_u='U')

look up a previously set alignment in the _alignments_cache

Parameters

sequence1string

The first sequence to align

sequence2string

The second sequence to be aligned to

t_or_u“T”, “U”, or False, defaults to “U”

“T” converts “U”s to “T”s. “U” converts “T”s to “U”s. False does nothing.

Returns

dictionary if an alignment is found, otherwise None

{“seqA”: sequence1 with gap characters representing alignment, “seqB”: sequence2 with gap characters representing alignment}

rnavigate.data.alignments.set_alignment(sequence1, sequence2, alignment1, alignment2, t_or_u='U')

Add an alignment to be used as the default between two sequences.

When objects with these sequences are aligned for visualization, RNAvigate uses this alignment instead of an automated pairwise sequence alignment. Alignment 1 and 2 must have matching lengths. alignment(1,2) and sequence(1,2) must differ only by dashes “-”.

e.g.:

    sequence1  = "AAGCUUCGGUACAUGCAAGAUGUAC"
    sequence2  = "AUCGAUCGAGCUGCUGUGUACGUAC"
    alignment1 = "AAGCUUCG---------GUACAUGCAAGAUGUAC"
    alignment2 = "AUCGAUCGAGCUGCUGUGUAC---------GUAC"
                   |mm|   | indel |    | indel |

Parameters

sequence1string

the first sequence

sequence2string

the second sequence

alignment1string

first sequence, plus dashes “-” indicating indels

alignment2string

second sequence, plus dashes “-” indicating indels

t_or_u“T”, “U”, or False, defaults to “U”

“T” converts “U”s to “T”s. “U” converts “T”s to “U”s. False does nothing.
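The two requirements stated above (matching lengths, and alignments that differ from their sequences only by dashes) can be checked with a few lines of Python. check_alignment is a hypothetical helper written for illustration; it is not part of RNAvigate, and the library may validate input differently or not at all.

```python
def check_alignment(sequence1, sequence2, alignment1, alignment2):
    """Raise ValueError if the alignment strings break the documented rules."""
    if len(alignment1) != len(alignment2):
        raise ValueError("alignment1 and alignment2 must have matching lengths")
    if alignment1.replace("-", "") != sequence1:
        raise ValueError("alignment1 must equal sequence1 apart from dashes")
    if alignment2.replace("-", "") != sequence2:
        raise ValueError("alignment2 must equal sequence2 apart from dashes")
    return True


valid = check_alignment("AUGCA", "AGGCA", "AUG-CA", "A-GGCA")
print(valid)  # True
```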

rnavigate.data.alignments.set_multiple_sequence_alignment(fasta, set_pairwise=False)

Set alignments from a multiple sequence alignment Pearson fasta file.

Sets alignments to a base sequence, then returns the base sequence to be used when a multiple sequence alignment plot is desired. Also sets all pairwise alignments, if desired. When setting pairwise alignments, dashes that are shared between pairwise sequences are removed first.

Parameters

fastastring

location of Pearson fasta file

set_pairwisebool, defaults to False

whether to set every pairwise alignment as well as the multiple sequence alignment.
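Removing shared dashes means dropping every alignment column in which both rows of the pair carry a gap. A minimal sketch, assuming "-" as the gap character and a hypothetical helper name:

```python
def remove_shared_gaps(row1, row2):
    """Drop columns where both rows of a multiple sequence alignment are gaps."""
    kept = [(a, b) for a, b in zip(row1, row2) if not (a == "-" and b == "-")]
    alignment1 = "".join(a for a, _ in kept)
    alignment2 = "".join(b for _, b in kept)
    return alignment1, alignment2


# column 4 is a gap in both rows, so it carries no pairwise information
pair = remove_shared_gaps("AU--GC-A", "A-U-GCCA")
print(pair)  # ('AU-GC-A', 'A-UGCCA')
```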

rnavigate.data.annotation module

annotations.py contains Annotations and subclasses.

class rnavigate.data.annotation.Annotation(input_data, annotation_type, sequence, name=None, color='blue')

Bases: Sequence

Basic annotation class to store 1D features of an RNA sequence

Each feature type must be a separate instance. Feature types include:

a group of separated nucleotides (e.g. binding pocket)

regions of interest (e.g. coding sequence, Alu elements)

sites of interest (e.g. m6A locations)

primer binding sites

Parameters

input_datalist

List will be treated according to annotation_type argument. Expected behaviors for each value of annotation_type:

“sites” or “group”: 1-indexed location of sites of interest

example: [1, 10, 20, 30] is four sites, 1, 10, 20, and 30

“spans”: 1-indexed, inclusive locations of spans of interest

example: [[1, 10], [20, 30]] is two spans, 1 to 10 and 20 to 30

“primers”: Similar to spans, but 5’/3’ direction is preserved.

example: [[1, 10], [30, 20]] forward 1 to 10, reverse 30 to 20

annotation_type“group”, “sites”, “spans”, or “primers”

The type of annotation.

sequencestr or pandas.DataFrame

Nucleotide sequence, path to fasta file, or dataframe containing a “Sequence” column.

namestr, defaults to None

Name of annotation.

colormatplotlib color-like, defaults to “blue”

Color to be used for displaying this annotation on plots.

Attributes

datapandas.DataFrame

Stores the list of sites or regions

namestr

The label for this annotation for use on plots

colorvalid matplotlib color

Color to represent annotation on plots

sequencestr

The reference sequence string

property boolean

Return a boolean array of the annotation on the sequence.
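How the four annotation types could translate into a per-nucleotide boolean array is sketched below. This is an illustration under assumptions: annotation_to_boolean is a hypothetical function, it returns a list rather than a numpy array, and it treats reverse primers by sorting their endpoints.

```python
def annotation_to_boolean(input_data, annotation_type, length):
    """Mark annotated 1-indexed positions as True in a list of given length."""
    boolean = [False] * length
    if annotation_type in ("sites", "group"):
        for site in input_data:
            boolean[site - 1] = True
    elif annotation_type in ("spans", "primers"):
        for start, end in input_data:
            # reverse primers are listed as [high, low], so sort the endpoints
            low, high = sorted((start, end))
            for position in range(low, high + 1):
                boolean[position - 1] = True
    else:
        raise ValueError(f"unknown annotation_type: {annotation_type}")
    return boolean


sites = annotation_to_boolean([1, 4], "sites", 6)
primers = annotation_to_boolean([[2, 3], [6, 5]], "primers", 6)
print(sites)    # [True, False, False, True, False, False]
print(primers)  # [False, True, True, False, True, True]
```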

classmethod from_boolean_array(values, sequence, annotation_type, name, color='blue', window=1)

Create an Annotation from an array of boolean values.

True values are used to create the Annotation.

Parameters

valueslist of True or False

the boolean array

sequencestring or rnav.data.Sequence

the sequence of the Annotation

annotation_type“spans”, “sites”, “primers”, or “group”

the type of the new annotation. If “spans” or “primers”, adjacent True values, or values within window, are collapsed to a region.

namestring

a name for labelling the annotation.

colorstring, defaults to “blue”

a color for plotting the annotation

windowinteger, defaults to 1

a window around True values to include in the annotation.

Returns

rnavigate.data.Annotation

the new Annotation
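Collapsing adjacent True values into spans can be sketched as follows. The sketch covers only strict adjacency; how the window argument widens or merges regions is not modeled, and boolean_to_spans is a hypothetical name.

```python
def boolean_to_spans(values):
    """Collapse runs of True into 1-indexed, inclusive [start, end] spans."""
    spans = []
    start = None
    for position, value in enumerate(values, start=1):
        if value and start is None:
            start = position  # a new run begins
        elif not value and start is not None:
            spans.append([start, position - 1])  # the run ended one position back
            start = None
    if start is not None:
        spans.append([start, len(values)])  # run reaches the end of the array
    return spans


spans = boolean_to_spans([True, True, False, False, True, False, True])
print(spans)  # [[1, 2], [5, 5], [7, 7]]
```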

from_sites(sites)

Create the self.data dataframe from a list of sites.

from_spans(spans)

Create the self.data dataframe from a list of spans.

get_aligned_data(alignment)

Aligns this Annotation to a new sequence and returns a copy.

Parameters

alignmentrnavigate.data.Alignment

Alignment object used to align to a new sequence.

Returns

rnavigate.data.Annotation

A new Annotation with the same name, color, and annotation type, but with the input data aligned to the target sequence.

get_sites()

Returns a list of nucleotide positions included in this annotation.

Returns

sitestuple

a list of nucleotide positions

get_subsequences(buffer=0)

class rnavigate.data.annotation.Motif(input_data, sequence, name=None, color='blue')

Bases: Annotation

Automatically annotates the occurrences of a sequence motif as spans.

Parameters

input_datastr

sequence motif to search for. Uses conventional nucleotide codes. e.g. “DRACH” = [AGTU] [AG] A C [ATUC]

sequencestr or pandas.DataFrame

Nucleotide sequence, path to fasta file, or dataframe containing a “Sequence” column.

namestr, defaults to None

Name of annotation.

colormatplotlib color-like, defaults to “blue”

Color to be used for displaying this annotation on plots.

Attributes

datapandas.DataFrame

Stores the list of regions that match the motif

namestr

The label for this annotation for use on plots

colorvalid matplotlib color

Color to represent annotation on plots

sequencestr

The reference sequence string

get_aligned_data(alignment)

Searches the new sequence for the motif and returns a new Motif annotation.

Parameters

alignmentrnavigate.data.Alignment

Alignment object used to align to a new sequence.

Returns

rnavigate.data.Motif

A new Motif with the same name, color, and motif but with the input data aligned to the target sequence.

get_spans_from_motif(sequence, motif)

Returns a list of spans for each location of motif found within sequence.

Parameters

sequencestring

sequence to be searched

motifstring

sequence motif to search for.

Returns

spanslist of lists

list of [start, end] positions of each motif occurrence
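A motif search of this kind can be sketched with a regular expression built from IUPAC nucleotide codes, with T and U treated as interchangeable as in the “DRACH” example. The function name, the code table, and the choice to report overlapping matches are assumptions for illustration, not a description of the library internals.

```python
import re

# IUPAC nucleotide codes; U and T are treated as equivalent
IUPAC = {
    "A": "A", "C": "C", "G": "G", "U": "UT", "T": "UT",
    "R": "AG", "Y": "CUT", "S": "CG", "W": "AUT", "K": "GUT", "M": "AC",
    "B": "CGUT", "D": "AGUT", "H": "ACUT", "V": "ACG", "N": "ACGUT",
}


def spans_from_motif(sequence, motif):
    """Return 1-indexed, inclusive [start, end] spans of each motif match."""
    pattern = "".join(f"[{IUPAC[code]}]" for code in motif.upper())
    # a lookahead reports overlapping occurrences as well
    regex = re.compile(f"(?=({pattern}))")
    return [
        [match.start() + 1, match.start() + len(motif)]
        for match in regex.finditer(sequence.upper())
    ]


spans = spans_from_motif("GGACUAAACAGACU", "DRACH")
print(spans)  # [[1, 5], [6, 10], [10, 14]]
```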

class rnavigate.data.annotation.ORFs(input_data, name=None, sequence=None, color='blue')

Bases: Annotation

Automatically annotates occurrences of open reading frames as spans.

Parameters

input_data“longest” or “all”

which ORFs to annotate. “longest” annotates the longest ORF. “all” annotates all potential ORFs.

sequencestr or pandas.DataFrame

Nucleotide sequence, path to fasta file, or dataframe containing a “Sequence” column.

namestr, defaults to None

Name of annotation.

colormatplotlib color-like, defaults to “blue”

Color to be used for displaying this annotation on plots.

Attributes

datapandas.DataFrame

Stores the list of ORF regions

namestr

The label for this annotation for use on plots

colorvalid matplotlib color

Color to represent annotation on plots

sequencestr

The reference sequence string

get_aligned_data(alignment)

Searches the new sequence for ORFs and returns a new ORF annotation.

Parameters

alignmentrnavigate.data.Alignment

Alignment object used to align to a new sequence.

Returns

rnavigate.data.ORFs

A new ORFs annotation with the same name, color, and input_data but with the input data aligned to the target sequence.

get_spans_from_orf(sequence, which='all')

Given a sequence string, returns spans for specified ORFs

Parameters

sequencestring

RNA nucleotide sequence

which“longest” or “all”, defaults to “all”

“all” returns all spans, “longest” returns the longest span

Returns

list of tuples

(start, end) position of each ORF, 1-indexed, inclusive
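A simple ORF search is sketched below. The definition used is an assumption: an ORF runs from an AUG to the first in-frame stop codon (UAA, UAG, or UGA), the stop codon is included in the span, and only the given strand is searched. RNAvigate's own criteria may differ, and the function name is hypothetical.

```python
STOP_CODONS = {"UAA", "UAG", "UGA"}


def spans_from_orfs(sequence, which="all"):
    """Return 1-indexed, inclusive (start, end) spans of ORFs in an RNA string."""
    sequence = sequence.upper().replace("T", "U")
    spans = []
    for start in range(len(sequence) - 2):
        if sequence[start:start + 3] != "AUG":
            continue
        # walk codon by codon until the first in-frame stop codon
        for position in range(start + 3, len(sequence) - 2, 3):
            if sequence[position:position + 3] in STOP_CODONS:
                spans.append((start + 1, position + 3))
                break
    if which == "longest" and spans:
        spans = [max(spans, key=lambda span: span[1] - span[0])]
    return spans


all_orfs = spans_from_orfs("AUGGCCAUGUAA")
longest = spans_from_orfs("AUGGCCAUGUAA", which="longest")
print(all_orfs)  # [(1, 12), (7, 12)]
print(longest)   # [(1, 12)]
```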

rnavigate.data.annotation.domains(input_data, names, colors, sequence)

Create a list of Annotations from a list of spans.

Currently, domains functionality in RNAvigate just uses a list of spans. In the future, this should be a dedicated class. Generally, domains should cover an entire sequence without overlap, but this is not enforced. e.g. [[1, 100], [101, 200]] for a 200 nt sequence.

Parameters

input_datalist of lists

list of spans for each domain

nameslist of strings

list of names for each domain

colorslist of valid matplotlib colors

list of colors for each domain

sequencestring

sequence to be annotated

Returns

list of rnavigate.data.Annotation

list of Annotations

rnavigate.data.colors module

class rnavigate.data.colors.ScalarMappable(cmap, normalization, values, title='', tick_labels=None, **cbar_args)

Bases: _ScalarMappable

Used to map scalar values to a color and to create a colorbar plot.

Parameters

cmapstr, tuple, float, or list

A valid mpl color, list of valid colors or a valid colormap name

normalization“min_max”, “0_1”, “none”, or “bins”

The type of normalization to use when mapping values to colors

valueslist

The values to use when normalizing the data

titlestr, defaults to “”

The title of the colorbar.

tick_labelslist, defaults to None

The labels to use for the colorbar ticks. If None, values are determined automatically.

**cbar_argsdict

Additional arguments to pass to the colorbar function

Attributes

rnav_normstr

The type of normalization to use when mapping values to colors

rnav_valslist

The values to use when normalizing the data

rnav_cmaplist

The colors to use when mapping values to colors

cbar_argsdict

Additional arguments to pass to the colorbar function

tick_labelslist

The labels to use for the colorbar ticks. If None, values are determined automatically.

titlestr

The title of the colorbar.

get_cmap(cmap)

Converts a cmap specification to a matplotlib colormap object.

Parameters

cmapstring, tuple, float, or list

A valid mpl color, list of valid colors or a valid colormap name

Returns

matplotlib colormap

a colormap matching the input

get_norm(normalization, values, cmap)

Given a normalization type and values, return a normalization object.

Parameters

normalization“min_max”, “0_1”, “none”, or “bins”

The type of normalization to use when mapping values to colors

valueslist

The values to use when normalizing the data

cmapmatplotlib colormap

The colormap to use when normalizing the data

Returns

matplotlib.colors normalization object

Used to normalize data before mapping to colors

is_equivalent_to(cmap2)

Check if two ScalarMappable objects are equivalent.

Parameters

cmap2ScalarMappable

The ScalarMappable object to compare to

Returns

bool

True if the two ScalarMappable objects are equivalent, False otherwise

values_to_hexcolors(values, alpha=1.0)

Map values to colors and return a list of hex colors.

Parameters

valueslist

The values to map to colors

alphafloat, defaults to 1.0

The alpha value to use for the colors

Returns

list of strings

A list of hex colors
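The idea of normalizing values and converting them to hex colors can be sketched without matplotlib. Note the substitution: the class above relies on matplotlib colormaps and normalization objects, while this sketch uses a plain linear interpolation between two RGB colors after min-max scaling. The function name and the default colors are hypothetical.

```python
def values_to_hex(values, vmin, vmax, low=(0, 0, 255), high=(255, 0, 0)):
    """Scale values to 0-1 between vmin and vmax, then blend low -> high."""
    hexcolors = []
    for value in values:
        fraction = (value - vmin) / (vmax - vmin)
        fraction = min(max(fraction, 0.0), 1.0)  # clip out-of-range values
        rgb = [round(lo + (hi - lo) * fraction) for lo, hi in zip(low, high)]
        hexcolors.append("#{:02x}{:02x}{:02x}".format(*rgb))
    return hexcolors


colors = values_to_hex([0, 2, 10, 20], vmin=0, vmax=10)
print(colors)  # ['#0000ff', '#3300cc', '#ff0000', '#ff0000']
```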

rnavigate.data.data module

Classes for storing and manipulating data for RNAvigate.

This module contains the base classes for RNAvigate data classes:

Sequence: represents a nucleotide sequence

Data: represents a data table with a sequence

class rnavigate.data.data.Data(input_data, sequence, metric, metric_defaults, read_table_kw=None, name=None)

Bases: Sequence

The base class for RNAvigate Profile and Interactions classes.

Parameters

input_datapandas.DataFrame or str

a pandas dataframe or path to a data file

sequencestring or rnavigate.data.Sequence

the sequence to use for the data

metricstring or dict

the column of the dataframe to use as the default metric to visualize

metric_defaultsdict

a dictionary of metric defaults

read_table_kwdict, optional

kwargs dictionary passed to pd.read_table

namestring, optional

the name of the data, defaults to None

Attributes

datapandas.DataFrame

the data table

filepathstring

the path to the data file

sequencestring or rnavigate.data.Sequence

the sequence to use for the data

metricstring or dict

the column of the dataframe to use as the metric to visualize

metric_defaultsdict

A dictionary of metric values and default settings for visualization

default_metricstring

the default metric to use for visualization

add_metric_defaults(metric_defaults)

Add metric defaults to self.metric_defaults

property cmap

Get the colormap to use for colorbars and to retrieve colors.

property color_column

Get the column of the dataframe to use as the color for visualization.

property colors

Get one matplotlib color-like value for each nucleotide in self.sequence.

property error_column

Get the column of the dataframe to use as the error for visualization.

property metric

Get the column of the dataframe to use as the metric for visualization.

read_file(filepath, read_table_kw)

Convert data file to pandas dataframe and store as self.data

Parameters

filepathstring

path to data file containing interactions

read_table_kwdict

kwargs dictionary passed to pd.read_table

Returns

dataframepandas.DataFrame

the data table

class rnavigate.data.data.Sequence(input_data, name=None, entry=0)

Bases: object

A class for storing and manipulating RNA sequences.

Parameters

input_datastring or pandas.DataFrame

sequence string, fasta file, or a Pandas dataframe containing a “Sequence” column

namestring, optional

The name of the sequence, defaults to None

entryint, defaults to 0

The index of the sequence in the fasta file if a fasta file is provided

Attributes

sequencestring

The sequence string

namestring

The name of the sequence

other_infodict

A dictionary of additional information about the sequence

null_alignmentSequenceAlignment

An alignment of the sequence to itself

get_aligned_data(alignment)

Get a copy of the sequence positionally aligned to another sequence.

Parameters

alignmentrnavigate.data.Alignment

the alignment to use

Returns

aligned_sequencernavigate.data.Sequence

the aligned sequence

get_colors(source, pos_cmap='rainbow', profile=None, structure=None, annotations=None)

Get colors and colormap representing information about the sequence.

Parameters

sourcestr, list, or matplotlib color-like

the source of the color information.

if a string, must be one of: “sequence”, “position”, “profile”, “structure”, “annotations”

if a list, must be a list of matplotlib color-like values, colormap will be None.

if a matplotlib color-like value, all nucleotides will be colored that color, colormap will be None.

pos_cmapstr, defaults to “rainbow”

cmap used for position colors if source is “position”

profilernavigate.data.Profile, optional

the profile to use to get colors if source is “profile”

structurernavigate.data.SecondaryStructure, optional

the structure to use to get colors if source is “structure”

annotationslist of rnavigate.data.Annotations, optional

the annotations to use to get colors if source is “annotations”

Returns

colorsnumpy array

one matplotlib color-like value for each nucleotide in self.sequence

colormaprnavigate.data.ScalarMappable

a colormap used for creating a colorbar

get_colors_from_annotations(annotations, default_color='gray')

Get colors and colormap representing sequence annotations.

Parameters

annotationslist of rnavigate.data.Annotations

the annotations to use to get colors.

default_colormatplotlib color-like, defaults to “gray”

the color to use for nucleotides not in any annotation

Returns

colorsnumpy array

one matplotlib color-like value for each nucleotide in self.sequence

colormaprnavigate.data.ScalarMappable

a colormap used for creating a colorbar

get_colors_from_positions(pos_cmap='rainbow')

Get colors and colormap representing the nucleotide position.

Parameters

pos_cmapstr, defaults to “rainbow”

cmap used for position colors

Returns

colorsnumpy array

one matplotlib color-like value for each nucleotide in self.sequence

colormaprnavigate.data.ScalarMappable

a colormap used for creating a colorbar

get_colors_from_profile(profile)

Get colors and colormap representing per-nucleotide data.

Parameters

profilernavigate.data.Profile

the profile to use to get colors.

Returns

colorsnumpy array

one matplotlib color-like value for each nucleotide in self.sequence

colormaprnavigate.data.ScalarMappable

a colormap used for creating a colorbar

get_colors_from_sequence()

Get colors and colormap representing the nucleotide sequence.

Returns

colorsnumpy array

one matplotlib color-like value for each nucleotide in self.sequence

colormaprnavigate.data.ScalarMappable

a colormap used for creating a colorbar

get_colors_from_structure(structure)

Get colors and colormap representing base-pairing status.

Parameters

structurernavigate.data.SecondaryStructure

the structure to use to get colors.

Returns

colorsnumpy array

one matplotlib color-like value for each nucleotide in self.sequence

colormaprnavigate.data.ScalarMappable

a colormap used for creating a colorbar

get_region(region='all')

Checks region input for validity and returns start and end positions.

If region is “all”, returns 1, self.length. Otherwise, ensures that region is between these values and returns the values, sorted.

Parameters

regionlist of 2 int

start and end positions of the region

Returns

start, endint, int

the starting and ending positions
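The region check described above can be sketched as a standalone function. Whether out-of-range input raises an error or is handled some other way is not stated, so raising ValueError here is an assumption, as is the function name.

```python
def get_region_sketch(length, region="all"):
    """Return sorted (start, end) positions, 1-indexed, within 1..length."""
    if region == "all":
        return 1, length
    start, end = sorted(region)
    if start < 1 or end > length:
        raise ValueError(f"region must lie between 1 and {length}")
    return start, end


whole = get_region_sketch(10)
part = get_region_sketch(10, region=[8, 3])
print(whole)  # (1, 10)
print(part)   # (3, 8)
```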

get_region_data(region='all')

Get a copy of the data object containing only the specified region.

Parameters

regionlist of 2 int, defaults to “all”

start and end positions of the region

Returns

region_datarnavigate.data.Sequence

the sequence containing only the specified region

get_seq_from_dataframe(dataframe)

Parse a dataframe for the sequence string, store as self.sequence.

Parameters

dataframepandas.DataFrame

must contain a “Sequence” column

property length

Get the length of the sequence

Returns

lengthint

the length of self.sequence

normalize_sequence(t_or_u='U', uppercase=True)

Converts sequence to all uppercase nucleotides and corrects T or U.

Parameters

t_or_u“T”, “U”, or False, defaults to “U”

“T” converts “U”s to “T”s. “U” converts “T”s to “U”s. False does nothing.

uppercasebool, defaults to True

Whether to make sequence all uppercase

read_fasta(fasta, entry)

Parse a fasta file for the sequence at the given entry index.

Parameters

fastastring

path to fasta file

entryint

the index of the sequence in the fasta file

Returns

sequencestring

the sequence string
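Selecting one entry from a multi-record fasta file can be sketched with the standard library alone. The function below reads from any file-like object so the example stays self-contained; its name and return value (name plus sequence) are assumptions, and the library may rely on a dedicated parser instead.

```python
import io


def read_fasta_entry(handle, entry=0):
    """Return (name, sequence) of the record at index entry."""
    names, sequences = [], []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            names.append(line[1:])  # header line starts a new record
            sequences.append("")
        elif sequences:
            sequences[-1] += line  # sequence may span several lines
    return names[entry], sequences[entry]


fasta_text = ">first\nAUGC\nAUGC\n>second\nGGGAAACCC\n"
name, sequence = read_fasta_entry(io.StringIO(fasta_text), entry=1)
print(name, sequence)  # second GGGAAACCC
```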

write_fasta(file, name)

Write the sequence to a fasta file.

Parameters

filestring

path to output fasta file

namestring

the name of the sequence to write in the fasta file

rnavigate.data.data.normalize_sequence(sequence, t_or_u='U', uppercase=True)

Returns sequence as all uppercase nucleotides and/or corrects T or U.

Parameters

sequencestring or RNAvigate Sequence

The sequence. If given an RNAvigate Sequence, the sequence string is retrieved

t_or_u“T”, “U”, or False, defaults to “U”

“T” converts “U”s to “T”s. “U” converts “T”s to “U”s. False does nothing.

uppercase bool, defaults to True

Whether to make sequence all uppercase

Returns

string

the cleaned-up sequence string
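The documented behavior of the two options can be sketched for plain strings. The sketch does not handle RNAvigate Sequence objects, and the function name is changed to make clear it is an illustration rather than the library function.

```python
def normalize_sequence_sketch(sequence, t_or_u="U", uppercase=True):
    """Optionally uppercase a sequence string and unify T/U."""
    if uppercase:
        sequence = sequence.upper()
    if t_or_u == "U":
        sequence = sequence.replace("T", "U").replace("t", "u")
    elif t_or_u == "T":
        sequence = sequence.replace("U", "T").replace("u", "t")
    # t_or_u=False leaves T and U untouched
    return sequence


print(normalize_sequence_sketch("acgt"))  # ACGU
print(normalize_sequence_sketch("ACGU", t_or_u="T"))  # ACGT
print(normalize_sequence_sketch("acgT", t_or_u=False, uppercase=False))  # acgT
```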

rnavigate.data.interactions module

class rnavigate.data.interactions.AllPossible(sequence, metric='data', input_data=None, metric_defaults=None, read_table_kw=None, window=1, name=None)

Bases: Interactions

A class for storing and manipulating all possible interactions.

Parameters

sequencestring or rnavigate.data.Sequence

The sequence string corresponding to the pairing probability data.

metricstring, defaults to “data”

The column name to use for visualization.

metric_defaultsdict

Keys are metric names and values are dictionaries of metric-specific defaults. These defaults include:

“metric_column”string

the column name to use for visualization

“cmap”string or matplotlib.colors.Colormap

the colormap to use for visualization

“normalization”“min_max”, “0_1”, “none”, or “bins”

The type of normalization to use when mapping values to colors

“values”list of float

The values to use with normalization of the data

“title”string

the title to use for colorbars

“extend”“min”, “max”, “both”, or “neither”

Which ends to extend when drawing the colorbar.

“tick_labels” : list of string

read_table_kwdict, optional

kwargs passed to pandas.read_table() when reading input_data.

windowint, defaults to 1

The window size used to generate the pairing probability data.

namestr, optional

A name for the AllPossible object.

Attributes

datapandas.DataFrame

The pairing probability data.

class rnavigate.data.interactions.Interactions(input_data, sequence, metric, metric_defaults, read_table_kw=None, window=1, name=None)

Bases: Data

A class for storing and manipulating interactions data.

Parameters

input_datastring or pandas.DataFrame

If string, a path to a file containing interactions data. If dataframe, the dataframe containing interactions data. The dataframe must contain columns “i”, “j”, and self.metric. Dataframe may also include other columns.

sequencestring or rnavigate.data.Sequence

The sequence string corresponding to the interactions data.

metricstring

The column name to use for visualization.

metric_defaultsdict

Keys are metric names and values are dictionaries of metric-specific defaults. These defaults include:

“metric_column”string

the column name to use for visualization

“cmap”string or matplotlib.colors.Colormap

the colormap to use for visualization

“normalization”“min_max”, “0_1”, “none”, or “bins”

The type of normalization to use when mapping values to colors

“values”list of float

The values to use with normalization of the data

“title”string

the title to use for colorbars

“extend”“min”, “max”, “both”, or “neither”

Which ends to extend when drawing the colorbar.

“tick_labels” : list of string

read_table_kwdict

kwargs passed to pandas.read_table() when reading input_data.

windowint

The window size used to generate the interactions data.

namestr

The name of the data object.

Attributes

datapandas.DataFrame

The interactions data.

windowint

The window size that is being represented by i-j pairs.

copy(apply_filter=False)

Returns a copy of the interactions, optionally with masked rows removed.

Parameters

apply_filterbool, defaults to False

If True, masked rows (“mask” == False) are dropped.

Returns

rnavigate.data.Interactions

A copy of the interactions.
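The effect of apply_filter can be sketched with a list of row dictionaries standing in for the interactions dataframe. The “mask” key follows the description above; the function name and the row layout are assumptions for illustration.

```python
import copy


def copy_interactions(rows, apply_filter=False):
    """Deep-copy the rows; optionally drop rows whose mask is False."""
    copied = copy.deepcopy(rows)
    if apply_filter:
        copied = [row for row in copied if row["mask"]]
    return copied


rows = [
    {"i": 1, "j": 20, "mask": True},
    {"i": 5, "j": 9, "mask": False},
]
unfiltered = copy_interactions(rows)
filtered = copy_interactions(rows, apply_filter=True)
print(len(unfiltered), len(filtered))  # 2 1
```

Because the copy is independent, later filtering of the copy leaves the original rows and their mask untouched.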

count_filter(**kwargs)

Counts the number of interactions that pass the given filters.

data_specific_filter(**kwargs)

Does nothing for the base Interactions class, can be overwritten in subclasses.

Returns

dict

dictionary of keyword argument pairs

filter(prefiltered=False, reset_filter=True, structure=None, min_cd=None, max_cd=None, paired_only=False, ss_only=False, ds_only=False, profile=None, min_profile=None, max_profile=None, compliments_only=False, nts=None, max_distance=None, min_distance=None, exclude_nts=None, isolate_nts=None, resolve_conflicts=None, **kwargs)

Convenience function that applies several filters (see the mask_on_* methods, resolve_conflicts, and data_specific_filter) in a single call.

Parameters

prefilteredbool, defaults to False

If True, the mask is not updated.

reset_filterbool, defaults to True

If True, the mask is reset before applying filters.

structurernavigate.data.SecondaryStructure, defaults to None

The structure to use for filtering.

min_cdint, defaults to None

The minimum contact distance to allow.

max_cdint, defaults to None

The maximum contact distance to allow.

paired_onlybool, defaults to False

If True, only keep interactions that are paired in the structure.

ss_onlybool, defaults to False

If True, only keep interactions between single-stranded nucleotides.

ds_onlybool, defaults to False

If True, only keep interactions between double-stranded nucleotides.

profilernavigate.data.Profile, defaults to None

The profile to use for masking.

min_profilefloat, defaults to None

The minimum profile value to allow.

max_profilefloat, defaults to None

The maximum profile value to allow.

compliments_onlybool, defaults to False

If True, only keep interactions where i and j are complementary nucleotides.

ntsstr, defaults to None

If compliments_only is False, only keep interactions where i and j are in nts.

max_distanceint, defaults to None

The maximum distance to allow. If None, no maximum distance is set.

min_distanceint, defaults to None

The minimum distance to allow. If None, no minimum distance is set.

exclude_ntslist of int, defaults to None

A list of positions to exclude.

isolate_ntslist of int, defaults to None

A list of positions to isolate.

resolve_conflictsstr, defaults to None

If not None, conflicting windows are resolved using the Maximal Weighted Independent Set. The weights are taken from the metric value. The graph is first broken into components to speed up the identification of the MWIS. Then the mask is updated to only include the MWIS.

**kwargsdict

Each keyword should have the format “column_operator” where column is a valid column name of the dataframe and operator is one of:

“ge”: greater than or equal to

“le”: less than or equal to

“gt”: greater than

“lt”: less than

“eq”: equal to

“ne”: not equal to

The values given to these keywords are then used in the comparison and False comparisons are filtered out. e.g.:

self.mask_on_values(Statistic_ge=23) evaluates to: self.update_mask(self.data[“Statistic”] >= 23)

Returns

masknumpy array

a boolean array of the same length as self.data

get_aligned_data(alignment, apply_filter=True)

Returns a copy mapped to a new sequence with masked rows removed.

Parameters

alignmentrnavigate.data.SequenceAlignment

The alignment to use for mapping the interactions.

apply_filterbool, defaults to True

If True, masked rows (“mask” == False) are dropped.

Returns

rnavigate.data.Interactions

Interactions mapped to a new sequence.

get_ij_colors()

Gets i, j, and colors lists for plotting interactions.

i and j are the 5’ and 3’ ends of each interaction, and colors is the color to use for each interaction. Values of self.data[self.metric] are normalized to 0 to 1, which correspond to self.min_max values. These are then mapped to a color using self.cmap.

Returns

ilist

5’ ends of each interaction

jlist

3’ ends of each interaction

colorslist

colors to use for each interaction

get_sorted_data()

Returns a copy of the data sorted by self.metric.

Returns

pandas.DataFrame

a copy of the data sorted by self.metric

mask_on_distance(max_dist=None, min_dist=None)

Mask interactions based on their distance in sequence space.

Parameters

max_distint, defaults to None

The maximum distance to allow. If None, no maximum distance is set.

min_distint, defaults to None

The minimum distance to allow. If None, no minimum distance is set.

Returns

masknumpy array

a boolean array of the same length as self.data
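The distance filter can be sketched in plain Python as follows. This is not the library's implementation: it assumes that distance in sequence space means |j - i|, that both bounds are inclusive, and that pairs are given as simple (i, j) tuples. The function name distance_mask is hypothetical.

```python
def distance_mask(pairs, max_dist=None, min_dist=None):
    """Return one boolean per (i, j) pair; True means the pair is kept."""
    mask = []
    for i, j in pairs:
        dist = abs(j - i)  # assumed definition of distance in sequence space
        keep = True
        if max_dist is not None and dist > max_dist:
            keep = False
        if min_dist is not None and dist < min_dist:
            keep = False
        mask.append(keep)
    return mask


pairs = [(1, 5), (10, 60), (20, 22)]
print(distance_mask(pairs, max_dist=40, min_dist=3))  # [True, False, False]
```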

mask_on_position(exclude=None, isolate=None)

Mask interactions based on their i and j positions.

Parameters

excludelist of int, defaults to None

A list of positions to exclude.

isolatelist of int, defaults to None

A list of positions to isolate.

Returns

masknumpy array

a boolean array of the same length as self.data

mask_on_profile(profile, min_profile=None, max_profile=None)

Masks interactions based on per-nucleotide measurements.

Parameters

profilernavigate.data.Profile

The profile to use for masking.

min_profilefloat, defaults to None

The minimum profile value to allow.

max_profilefloat, defaults to None

The maximum profile value to allow.

Returns

masknumpy array

a boolean array of the same length as self.data

mask_on_sequence(compliment_only=None, nts=None)

Mask interactions based on sequence.

Parameters

compliment_onlybool, defaults to None

If True, only keep interactions where i and j are complementary nucleotides.

ntsstr, defaults to None

If compliment_only is False, only keep interactions where i and j are in nts.

Returns

numpy array

a boolean array of the same length as self.data
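A minimal sketch of sequence-based masking is shown below for a window size of 1. The pairing rules (Watson-Crick pairs plus G-U wobble), the helper name sequence_mask, and the treatment of nts as a set of allowed letters for both ends are assumptions made for illustration; the library may apply different rules.

```python
# Assumed pairing rules: Watson-Crick pairs plus G-U (or G-T) wobble.
COMPLEMENTS = {"A": "UT", "U": "AG", "T": "AG", "G": "CUT", "C": "G"}


def sequence_mask(sequence, pairs, compliment_only=None, nts=None):
    """Return one boolean per (i, j) pair; positions are 1-indexed."""
    mask = []
    for i, j in pairs:
        nt_i = sequence[i - 1].upper()
        nt_j = sequence[j - 1].upper()
        keep = True
        if compliment_only:
            keep = nt_j in COMPLEMENTS.get(nt_i, "")
        elif nts is not None:
            keep = nt_i in nts and nt_j in nts
        mask.append(keep)
    return mask


sequence = "GGAUCC"
pairs = [(1, 6), (3, 4), (1, 3)]
print(sequence_mask(sequence, pairs, compliment_only=True))  # [True, True, False]
print(sequence_mask(sequence, pairs, nts="AU"))              # [False, True, False]
```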

mask_on_structure(structure, min_cd=None, max_cd=None, ss_only=False, ds_only=False, paired_only=False)

Masks interactions based on a secondary structure.

Parameters

structurernavigate.data.SecondaryStructure

The secondary structure to use for masking.

min_cdint, defaults to None

The minimum contact distance to allow.

max_cdint, defaults to None

The maximum contact distance to allow.

ss_onlybool, defaults to False

If True, only keep interactions between single-stranded nucleotides.

ds_onlybool, defaults to False

If True, only keep interactions between double-stranded nucleotides.

paired_onlybool, defaults to False

If True, only keep interactions that are paired in the structure.

Returns

masknumpy array

a boolean array of the same length as self.data

mask_on_values(**kwargs)

Mask interactions based on values in self.data.

Parameters

kwargsdict

Each keyword should have the format “column_operator” where column is a valid column name of the dataframe and operator is one of:

“ge”: greater than or equal to

“le”: less than or equal to

“gt”: greater than

“lt”: less than

“eq”: equal to

“ne”: not equal to

The values given to these keywords are then used in the comparison and False comparisons are filtered out. e.g.:

self.mask_on_values(Statistic_ge=23) evaluates to: self.update_mask(self.data[“Statistic”] >= 23)

Returns

masknumpy array

a boolean array of the same length as self.data
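The “column_operator” keyword format can be illustrated with a short, self-contained sketch. It uses plain lists rather than a pandas DataFrame, and the helper name values_mask and the choice to split each keyword on its last underscore are assumptions, not the library's code.

```python
import operator

OPERATORS = {
    "ge": operator.ge, "le": operator.le, "gt": operator.gt,
    "lt": operator.lt, "eq": operator.eq, "ne": operator.ne,
}


def values_mask(columns, **kwargs):
    """columns maps column names to equal-length lists of values."""
    n_rows = len(next(iter(columns.values())))
    mask = [True] * n_rows
    for key, threshold in kwargs.items():
        # Split on the last underscore so column names may contain underscores.
        column, op_name = key.rsplit("_", 1)
        compare = OPERATORS[op_name]
        mask = [m and compare(v, threshold)
                for m, v in zip(mask, columns[column])]
    return mask


columns = {"Statistic": [10, 23, 40], "Zij": [1.5, 2.5, 0.5]}
print(values_mask(columns, Statistic_ge=23))            # [False, True, True]
print(values_mask(columns, Statistic_ge=23, Zij_gt=2))  # [False, True, False]
```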

print_new_file(outfile=None)

Create a new file with mapped and filtered interactions.

Parameters

outfilestr, defaults to None

path to an output file. If None, file string is printed to console.

reset_mask()

Resets the mask to all True (removes previous filters)

resolve_conflicts(metric=None)

Uses an experimental method to resolve conflicts.

Resolves conflicting windows using the Maximal Weighted Independent Set. The weights are taken from the metric value. The graph is first broken into components to speed up the identification of the MWIS. Then the mask is updated to only include the MWIS. This method is computationally expensive for large or dense datasets.

Parameters

metricstr, defaults to None

The metric to use for weighting the graph. If None, self.metric is used.

Returns

masknumpy array

a boolean array of the same length as self.data
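The approach described above (split the conflict graph into components, then find a maximum weighted independent set in each) can be sketched as follows. This is a brute-force illustration, not the library's algorithm: it assumes the conflicts are already known and supplied as edges between row indices, it assumes positive weights, and its running time grows exponentially with component size. All function names are hypothetical.

```python
from itertools import combinations


def connected_components(nodes, edges):
    """Group nodes into connected components of the conflict graph."""
    neighbors = {n: set() for n in nodes}
    for a, b in edges:
        neighbors[a].add(b)
        neighbors[b].add(a)
    seen, components = set(), []
    for start in nodes:
        if start in seen:
            continue
        stack, component = [start], []
        seen.add(start)
        while stack:
            node = stack.pop()
            component.append(node)
            for nxt in neighbors[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        components.append(component)
    return components, neighbors


def best_independent_set(component, neighbors, weights):
    """Brute-force search; only suitable for small components."""
    best, best_weight = [], 0.0
    for size in range(1, len(component) + 1):
        for subset in combinations(component, size):
            chosen = set(subset)
            if any(neighbors[n] & chosen for n in subset):
                continue  # two chosen nodes conflict with each other
            weight = sum(weights[n] for n in subset)
            if weight > best_weight:
                best, best_weight = list(subset), weight
    return best


def resolve(nodes, edges, weights):
    """Return a boolean mask that keeps only the selected rows."""
    components, neighbors = connected_components(nodes, edges)
    keep = set()
    for component in components:
        keep.update(best_independent_set(component, neighbors, weights))
    return [n in keep for n in nodes]


nodes = [0, 1, 2, 3]
edges = [(0, 1), (1, 2)]  # row 1 conflicts with rows 0 and 2
weights = {0: 2.0, 1: 3.0, 2: 2.0, 3: 1.0}
print(resolve(nodes, edges, weights))  # [True, False, True, True]
```

Solving each component separately keeps the search small, because the best set for the whole graph is the union of the best sets of its components.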

set_3d_distances(pdb, atom)

Wrapper for set_distances for backwards compatibility.

set_distances(structure, atom="O2'")

Sets the Distance column value based on nt distances in the given structure.

If structure is a SecondaryStructure, contact distances are calculated, and if structure is a PDB, 3D distances are calculated. These distances are averaged across the window and stored in a new “Distance” column in self.data.

Parameters

structurernavigate.data.SecondaryStructure or rnavigate.data.PDB

Structure object to use for calculating distances

atomstr

atom id to use for calculating distances in a PDB structure

update_mask(mask)

Updates the mask by ANDing the current mask with the given mask.
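Because each update is an element-wise AND, successive filters can only remove rows, never restore them. A minimal sketch with plain Python lists (the library stores the mask alongside the data; this standalone function is only an illustration):

```python
def update_mask(current_mask, new_mask):
    """Element-wise AND: a row is kept only if both masks keep it."""
    return [a and b for a, b in zip(current_mask, new_mask)]


mask = [True, True, True, True]  # state after a reset: all rows kept
mask = update_mask(mask, [True, False, True, True])
mask = update_mask(mask, [True, True, False, True])
print(mask)  # [True, False, False, True]
```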

class rnavigate.data.interactions.PAIRMaP(input_data, sequence=None, metric='Class', metric_defaults=None, read_table_kw=None, window=1, name=None)

Bases: RINGMaP

A class for storing and manipulating PAIRMaP data.

Parameters

input_datastring or pandas.DataFrame

If string, a path to a file containing PAIRMaP data. If dataframe, the dataframe containing PAIRMaP data. The dataframe must contain columns “i”, “j”, “Statistic”, and “Class”. Dataframe may also include other columns.

sequencestring or rnavigate.data.Sequence

The sequence string corresponding to the PAIRMaP data.

metricstring, defaults to “Class”

The column name to use for visualization.

metric_defaultsdict

Keys are metric names and values are dictionaries of metric-specific defaults. These defaults include:

“metric_column”string

the column name to use for visualization

“cmap”string or matplotlib.colors.Colormap

the colormap to use for visualization

“normalization”“min_max”, “0_1”, “none”, or “bins”

The type of normalization to use when mapping values to colors

“values”list of float

The values to use with normalization of the data

“title”string

the title to use for colorbars

“extend”“min”, “max”, “both”, or “neither”

Which ends to extend when drawing the colorbar.

“tick_labels” : list of string

read_table_kwdict, optional

kwargs passed to pandas.read_table() when reading input_data.

windowint, defaults to 1

The window size used to generate the PAIRMaP data. If an input file is provided, this value is overwritten by the value in the header.

namestr, optional

A name for the interactions object.

Attributes

datapandas.DataFrame

The PAIRMaP data.

data_specific_filter(all_pairs=False, **kwargs)

Used by Interactions.filter(). By default, pairs that are neither primary nor secondary are removed. Setting all_pairs=True keeps all pairs.

Parameters

all_pairsbool, defaults to False

whether to include all PAIRs.

Returns

kwargsdict

any additional keyword-argument pairs are returned

masknumpy array

a boolean array of the same length as self.data

get_sorted_data()

Same as parent function, unless metric is set to “Class”, in which case ij pairs are returned in a different order.

Returns

pandas.DataFrame

a copy of the data sorted by self.metric

read_file(filepath, read_table_kw=None)

Parses a pairmap.txt file and stores data as a dataframe

Also sets self.window (usually 3) from the header.

Parameters

filepathstr

path to pairmap.txt file

read_table_kwdict, defaults to None

This argument is ignored.

class rnavigate.data.interactions.PairingProbability(input_data, extension=None, sequence=None, metric='Probability', metric_defaults=None, read_table_kw=None, window=1, name=None)

Bases: Interactions

A class for storing and manipulating pairing probability data.

Parameters

input_datastring or pandas.DataFrame

If string, a path to a file containing pairing probability data. If dataframe, the dataframe containing pairing probability data. The dataframe must contain columns “i”, “j”, “Probability”, and “log10p”. Dataframe may also include other columns.

extensionstring, defaults to None

The file extension of the input_data. If None, the extension is determined from the input_data string. Options are “.bps”, “.txt”, and “.dp”. If the extension is “.bps”, the sequence is parsed from the file. If the extension is “.txt” or “.dp”, the sequence must be provided via the sequence argument.

sequencestring or rnavigate.data.Sequence

The sequence string corresponding to the pairing probability data.

metricstring, defaults to “Probability”

The column name to use for visualization.

metric_defaultsdict

Keys are metric names and values are dictionaries of metric-specific defaults. These defaults include:

“metric_column”string

the column name to use for visualization

“cmap”string or matplotlib.colors.Colormap

the colormap to use for visualization

“normalization”“min_max”, “0_1”, “none”, or “bins”

The type of normalization to use when mapping values to colors

“values”list of float

The values to use with normalization of the data

“title”string

the title to use for colorbars

“extend”“min”, “max”, “both”, or “neither”

Which ends to extend when drawing the colorbar.

“tick_labels” : list of string

read_table_kwdict, optional

kwargs passed to pandas.read_table() when reading input_data.

windowint, defaults to 1

The window size used to generate the pairing probability data.

namestr, optional

A name for the PairingProbability object.

Attributes

datapandas.DataFrame

The pairing probability data.

data_specific_filter(**kwargs)

By default, interactions with probabilities less than 0.03 are removed.

Returns

kwargsdict

any additional keyword-argument pairs are returned

masknumpy array

a boolean array of the same length as self.data

get_entropy_profile(print_out=False, save_file=None)

Calculates per-nucleotide Shannon entropy from pairing probabilities.

Parameters

print_outbool, defaults to False

If True, entropy values are printed to console.

save_filestr, defaults to None

If not None, entropy values are saved to this file.

Returns

rnavigate.data.Profile

a Profile object containing the entropy data
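One way to compute a per-nucleotide Shannon entropy from pairing probabilities is sketched below. The details are assumptions rather than the library's documented formula: the sketch uses a base-10 logarithm, sums -p * log10(p) over every pair that involves a nucleotide, and ignores the unpaired state. The function name entropy_profile is hypothetical.

```python
import math


def entropy_profile(length, pairs):
    """pairs: iterable of (i, j, probability) with 1-indexed positions."""
    entropy = [0.0] * length
    for i, j, prob in pairs:
        if prob <= 0:
            continue  # p * log(p) tends to 0 as p approaches 0
        term = -prob * math.log10(prob)
        entropy[i - 1] += term
        entropy[j - 1] += term
    return entropy


pairs = [(1, 6, 0.5), (1, 5, 0.5), (2, 4, 1.0)]
for position, value in enumerate(entropy_profile(6, pairs), start=1):
    print(position, round(value, 4))
```

In this example nucleotide 1 has two equally likely partners and so receives the highest entropy, while nucleotides 2 and 4 pair with certainty and receive zero.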

read_bps()

Parses a bps file and returns sequence as a string and data as a dataframe.

Returns

str

the sequence string

pandas.DataFrame

the pairing probability data

read_txt()

Parses a pairing probability file and returns data as a dataframe.

Parameters

filepathstr

path to pairing probability file

read_table_kwdict, defaults to None

This argument is ignored.

Returns

pandas.DataFrame

the pairing probability data

class rnavigate.data.interactions.RINGMaP(input_data, sequence=None, metric='Statistic', metric_defaults=None, read_table_kw=None, window=1, name=None)

Bases: Interactions

A class for storing and manipulating RINGMaP data.

Parameters

input_datastring or pandas.DataFrame

If string, a path to a file containing RINGMaP data. If dataframe, the dataframe containing RINGMaP data. The dataframe must contain columns “i”, “j”, “Statistic”, and “Zij”. Dataframe may also include other columns.

sequencestring or rnavigate.data.Sequence

The sequence string corresponding to the RINGMaP data.

metricstring, defaults to “Statistic”

The column name to use for visualization.

metric_defaultsdict

Keys are metric names and values are dictionaries of metric-specific defaults. These defaults include:

“metric_column”string

the column name to use for visualization

“cmap”string or matplotlib.colors.Colormap

the colormap to use for visualization

“normalization”“min_max”, “0_1”, “none”, or “bins”

The type of normalization to use when mapping values to colors

“values”list of float

The values to use with normalization of the data

“title”string

the title to use for colorbars

“extend”“min”, “max”, “both”, or “neither”

Which ends to extend when drawing the colorbar.

“tick_labels” : list of string

read_table_kwdict, optional

kwargs passed to pandas.read_table() when reading input_data.

windowint, defaults to 1

The window size used to generate the RINGMaP data. If an input file is provided, this value is overwritten by the value in the header.

namestr, optional

A name for the interactions object.

Attributes

datapandas.DataFrame

The RINGMaP data.

data_specific_filter(positive_only=False, negative_only=False, **kwargs)

Adds filters for “Sign” column to parent filter() function

Parameters

positive_onlybool, defaults to False

If True, only keep positive correlations.

negative_onlybool, defaults to False

If True, only keep negative correlations.

Returns

kwargsdict

any additional keyword-argument pairs are returned

masknumpy array

a boolean array of the same length as self.data

get_sorted_data()

Sorts on the product of the self.metric and “Sign” columns, except when self.metric is “Distance”.

Returns

pandas.DataFrame

a copy of the data sorted by (self.metric * “Sign”) columns

read_file(filepath, read_table_kw=None)

Parses a RINGMaP correlations file and stores data as a dataframe.

Also sets self.window (usually 1, from header).

Parameters

filepathstr

path to correlations file.

read_table_kwdict, defaults to {}

kwargs passed to pandas.read_table().

Returns

pandas.DataFrame

the RINGMaP data

class rnavigate.data.interactions.SHAPEJuMP(input_data, sequence=None, metric='Percentile', metric_defaults=None, read_table_kw=None, window=1, name=None)

Bases: Interactions

A class for storing and manipulating SHAPEJuMP data.

Parameters

input_datastring or pandas.DataFrame

If string, a path to a file containing SHAPEJuMP data. If dataframe, the dataframe containing SHAPEJuMP data. The dataframe must contain columns “i”, “j”, “Metric” (JuMP rate) and “Percentile” (percentile ranking). Dataframe may also include other columns.

sequencestring or rnavigate.data.Sequence

The sequence string corresponding to the SHAPEJuMP data.

metricstring, defaults to “Percentile”

The column name to use for visualization.

metric_defaultsdict

Keys are metric names and values are dictionaries of metric-specific defaults. These defaults include:

“metric_column”string

the column name to use for visualization

“cmap”string or matplotlib.colors.Colormap

the colormap to use for visualization

“normalization”“min_max”, “0_1”, “none”, or “bins”

The type of normalization to use when mapping values to colors

“values”list of float

The values to use with normalization of the data

“title”string

the title to use for colorbars

“extend”“min”, “max”, “both”, or “neither”

Which ends to extend when drawing the colorbar.

“tick_labels” : list of string

read_table_kwdict

kwargs passed to pandas.read_table() when reading input_data.

windowint

The window size used to generate the SHAPEJuMP data.

namestr

A name for the interactions object.

Attributes

datapandas.DataFrame

The SHAPEJuMP data.

read_file(input_data, read_table_kw=None)

Parses a deletions.txt file and stores it as a dataframe.

Also calculates a “Percentile” column.

Parameters

input_datastr

path to deletions.txt file

read_table_kwdict, defaults to {}

kwargs passed to pandas.read_table().

Returns

pandas.DataFrame

the SHAPEJuMP data
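A percentile ranking of the kind stored in the “Percentile” column can be sketched as follows. The ranking rule here (the fraction of values less than or equal to each value, so ties share the higher rank) is an assumption; the documentation does not state which rule the library uses. The function name percentile_rank is hypothetical.

```python
def percentile_rank(values):
    """Fraction of values less than or equal to each value."""
    n = len(values)
    return [sum(other <= v for other in values) / n for v in values]


rates = [0.1, 0.4, 0.4, 0.9]
print(percentile_rank(rates))  # [0.25, 0.75, 0.75, 1.0]
```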

class rnavigate.data.interactions.StructureAsInteractions(input_data, sequence, metric=None, metric_defaults=None, window=1, name=None)

Bases: Interactions

A class for storing and manipulating structure data.

Parameters

input_datastring or pandas.DataFrame

If string, a path to a file containing structure data. If dataframe, the dataframe containing structure data. The dataframe must contain columns “i”, “j”, and “Structure”. Dataframe may also include other columns.

sequencestring or rnavigate.data.Sequence

The sequence string corresponding to the structure data.

metricstring, defaults to “Structure”

The column name to use for visualization.

metric_defaultsdict

Keys are metric names and values are dictionaries of metric-specific defaults. These defaults include:

“metric_column”string

the column name to use for visualization

“cmap”string or matplotlib.colors.Colormap

the colormap to use for visualization

“normalization”“min_max”, “0_1”, “none”, or “bins”

The type of normalization to use when mapping values to colors

“values”list of float

The values to use with normalization of the data

“title”string

the title to use for colorbars

“extend”“min”, “max”, “both”, or “neither”

Which ends to extend when drawing the colorbar.

“tick_labels” : list of string

read_table_kwdict, optional

kwargs passed to pandas.read_table() when reading input_data.

windowint, defaults to 1

The window size used to generate the structure data.

namestr, optional

A name for the StructureAsInteractions object.

Attributes

datapandas.DataFrame

The structure data.

class rnavigate.data.interactions.StructureCompareMany(input_data, sequence, metric=None, metric_defaults=None, window=1, name=None)

Bases: Interactions

A class for storing and manipulating a comparison of many structures.

Parameters

input_datastring or pandas.DataFrame

If string, a path to a file containing structure data. If dataframe, the dataframe containing structure data. The dataframe must contain columns “i”, “j”, and “Structure”. Dataframe may also include other columns.

sequencestring or rnavigate.data.Sequence

The sequence string corresponding to the structure data.

metricstring, defaults to “Structure”

The column name to use for visualization.

metric_defaultsdict

Keys are metric names and values are dictionaries of metric-specific defaults. These defaults include:

“metric_column”string

the column name to use for visualization

“cmap”string or matplotlib.colors.Colormap

the colormap to use for visualization

“normalization”“min_max”, “0_1”, “none”, or “bins”

The type of normalization to use when mapping values to colors

“values”list of float

The values to use with normalization of the data

“title”string

the title to use for colorbars

“extend”“min”, “max”, “both”, or “neither”

Which ends to extend when drawing the colorbar.

“tick_labels” : list of string

read_table_kwdict, optional

kwargs passed to pandas.read_table() when reading input_data.

windowint, defaults to 1

The window size used to generate the structure data.

namestr, optional

A name for the StructureCompareMany object.

Attributes

datapandas.DataFrame

The structure data.

class rnavigate.data.interactions.StructureCompareTwo(input_data, sequence, metric=None, metric_defaults=None, window=1, name=None)

Bases: Interactions

A class for storing and manipulating a comparison of two structures.

Parameters

input_datastring or pandas.DataFrame

If string, a path to a file containing structure data. If dataframe, the dataframe containing structure data. The dataframe must contain columns “i”, “j”, and “Structure”. Dataframe may also include other columns.

sequencestring or rnavigate.data.Sequence

The sequence string corresponding to the structure data.

metricstring, defaults to “Structure”

The column name to use for visualization.

metric_defaultsdict

Keys are metric names and values are dictionaries of metric-specific defaults. These defaults include:

“metric_column”string

the column name to use for visualization

“cmap”string or matplotlib.colors.Colormap

the colormap to use for visualization

“normalization”“min_max”, “0_1”, “none”, or “bins”

The type of normalization to use when mapping values to colors

“values”list of float

The values to use with normalization of the data

“title”string

the title to use for colorbars

“extend”“min”, “max”, “both”, or “neither”

Which ends to extend when drawing the colorbar.

“tick_labels” : list of string

read_table_kwdict, optional

kwargs passed to pandas.read_table() when reading input_data.

windowint, defaults to 1

The window size used to generate the structure data.

namestr, optional

A name for the StructureCompareTwo object.

Attributes

datapandas.DataFrame

The structure data.

rnavigate.data.pdb module

The PDB object to represent tertiary structures with atomic coordinates.

This data can be used to filter interactions by 3D distance, and to visualize profile and interactions data on interactive 3D structures.

class rnavigate.data.pdb.PDB(input_data, chain, sequence=None, name=None)

Bases: Sequence

A class to represent RNA tertiary structures with atomic coordinates.

This data can be used to filter interactions by 3D distance, and to visualize profile and interactions data on interactive 3D structures.

Parameters

input_datastr

path to a PDB or CIF file

chainstr

chain identifier of RNA of interest

sequencernavigate.Sequence or str, optional

A sequence to use as the reference sequence. This is required if the sequence cannot be found in the header. Defaults to None.

namestr, optional

A name for the data set. Defaults to None.

Attributes

sequencestr

The RNA sequence

lengthint

The length of the RNA sequence

namestr

A name for the data set

pathstr

The path to the PDB or CIF file

chainstr

The chain identifier of the RNA of interest

offsetint

The offset between the sequence positions and the PDB residue indices

pdbBio.PDB.Structure.Structure

The PDB structure

pdb_idxnp.array

The PDB indices of the RNA

pdb_seqnp.array

The PDB sequence of the RNA

distance_matrixdict

A dictionary of distance matrices for each atom type

get_distance(i, j, atom="O2'")

Get the distance between given atom in nucleotides i and j (1-indexed).

Parameters

iint

The first nucleotide

jint

The second nucleotide

atomstring or dict, defaults to “O2’”

The atom to use for distance calculations. If a string, the same atom will be used for all residues. If a dict, the atom will be chosen based on the nucleotide type. If “DMS”, the N1 atom will be used for A and G, and the N3 atom will be used for U and C.

Returns

distancefloat

The distance between the atoms

get_distance_matrix(atom="O2'")

Get the pairwise atomic distance matrix for all residues.

Parameters

atomstring or dict, defaults to “O2’”

The atom to use for distance calculations. If a string, the same atom will be used for all residues. If a dict, the atom will be chosen based on the nucleotide type. If “DMS”, the N1 atom will be used for A and G, and the N3 atom will be used for U and C.

Returns

matrixNxN numpy.ndarray

A 2D array of pairwise distances. N is the length of the RNA.
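The atom selection rules and the pairwise distance calculation described above can be sketched without a PDB parser. The coordinate layout (a list of per-residue dictionaries mapping atom names to coordinates) and the function names pick_atom and distance_matrix are assumptions made for this example; the library reads coordinates from a PDB or CIF file instead.

```python
import math

# "DMS" atom choice as described above: N1 for A and G, N3 for U and C.
DMS_ATOMS = {"A": "N1", "G": "N1", "U": "N3", "C": "N3"}


def pick_atom(nucleotide, atom="O2'"):
    """Resolve the atom name to use for one nucleotide."""
    if atom == "DMS":
        return DMS_ATOMS[nucleotide]
    if isinstance(atom, dict):
        return atom[nucleotide]
    return atom


def distance_matrix(sequence, coords, atom="O2'"):
    """coords[k] maps atom names to (x, y, z) for residue k (0-indexed here)."""
    xyz = [coords[k][pick_atom(nt, atom)] for k, nt in enumerate(sequence)]
    return [[math.dist(a, b) for b in xyz] for a in xyz]


sequence = "AU"
coords = [
    {"O2'": (0.0, 0.0, 0.0), "N1": (1.0, 0.0, 0.0)},
    {"O2'": (3.0, 4.0, 0.0), "N3": (1.0, 0.0, 2.0)},
]
print(distance_matrix(sequence, coords)[0][1])         # 5.0
print(distance_matrix(sequence, coords, "DMS")[0][1])  # 2.0
```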

get_pdb_idx(seq_idx)

Return the PDB index given the sequence index (0-indexed).

get_seq_idx(pdb_idx)

Return the sequence index given the PDB index.

get_sequence(pdb)

Find the sequence in the provided CIF or PDB file.

Parameters

pdbstr

path to a PDB or CIF file

Returns

sequencestring

The RNA sequence

get_sequence_from_seqres(seqres)

Used by get_sequence to parse the SEQRES entries.

Parameters

seqreslist

A list of SEQRES entries for the RNA chain of interest

Returns

sequencestring

The RNA sequence

get_xyz_coord(nt, atom)

Return the x, y, and z coordinates for a given residue and atom.

Parameters

ntint

The nucleotide of interest (1-indexed)

atomstring or dict, defaults to “O2’”

The atom to use for distance calculations. If a string, the same atom will be used for all residues. If a dict, the atom will be chosen based on the nucleotide type. If “DMS”, the N1 atom will be used for A and G, and the N3 atom will be used for U and C.

Returns

xyzlist

A list of x, y, and z coordinates

is_valid_idx(pdb_idx=None, seq_idx=None)

Determines if a PDB or sequence index is in the PDB structure.

Parameters

pdb_idxint, optional

A PDB index (1-indexed). Defaults to None.

seq_idxint, optional

A sequence index (1-indexed). Defaults to None.

Returns

bool

True if the index is in the PDB structure, False otherwise.

read_pdb(pdb)

Read a PDB or CIF file into the data structure.

Parameters

pdbstr

path to a PDB or CIF file

set_indices()

Uses self.data and self.sequence to set self.offset

rnavigate.data.profile module

class rnavigate.data.profile.DanceMaP(input_data, component, read_table_kw=None, sequence=None, metric='Norm_profile', metric_defaults=None, name=None)

Bases: SHAPEMaP

A class to represent per-nucleotide DanceMaP data.

Parameters

input_datastr or pandas.DataFrame

path to a DanceMapper reactivities.txt file or a pandas DataFrame

componentint

Which component of the DanceMapper ensemble to read in (0-indexed).

read_table_kwdict, optional

Keyword arguments to pass to pandas.read_table. These are not necessary for reactivities.txt files. Defaults to None.

sequencernavigate.Sequence or str, optional

A sequence to use as the reference sequence. This is not necessary for reactivities.txt files. Defaults to None.

metricstr, defaults to “Norm_profile”

The name of the set of value-to-color options to use.

read_file(input_data, read_table_kw={})

Convert data file to pandas dataframe and store as self.data

Parameters

input_datastring

path to data file containing interactions

read_table_kwdict

kwargs dictionary passed to pd.read_table

Returns

dataframepandas.DataFrame

the data table

property recreation_kwargs

A dictionary of keyword arguments to pass when recreating the object.

class rnavigate.data.profile.DeltaProfile(profile1, profile2, metric=None, metric_defaults=None, name=None)

Bases: Profile

A class to represent the difference between two profiles.

Parameters

profile1Profile

The first profile to compare.

profile2Profile

The second profile to compare.

metricstr, optional

The name of the metric to use. Defaults to the metric of profile1.

metric_defaultsdict, optional

Keys are metric names, to be used with metric. Values are dictionaries of plotting parameters. Defaults to None.

namestr, optional

A name for the data set. Defaults to None.

class rnavigate.data.profile.Profile(input_data, metric='default', metric_defaults=None, read_table_kw=None, sequence=None, name=None)

Bases: Data

A class to represent per-nucleotide data.

Parameters

input_datastr or pandas.DataFrame

Path to a csv or tab file, or a pandas DataFrame. The table must have one row for each nucleotide in the sequence and must contain these columns:

A nucleotide position column labelled “Nucleotide”.

A sequence column labelled “Sequence” with one of (A, C, G, U, T) per row.

These columns will be added to the table if sequence is provided.

A data measurement column labelled “Profile” with a float or integer. The label may be another name if specified in metric_defaults.

Optionally, a measurement error column. The label must be specified in metric_defaults.

Other columns may be present, and set up using metric_defaults.

See metric_defaults for more information.

read_table_kwdict, optional

Keyword arguments to pass to pandas.read_table. Defaults to None.

sequencernavigate.Sequence or str, optional

A sequence to use as the reference sequence. This is required if input_data does not contain a “Sequence” column. Defaults to None.

metricstr, defaults to “default”

The name of the set of value-to-color options to use. “default” specifies:

“Profile” column is used

No error rates are present

Values are normalized to the range [0, 1]

Values are mapped to colors using the “viridis” colormap

“Distance” specifies:

(3-D) “Distance” column is used

No error rates are present

Values in the range [5, 50] are normalized to the range [0, 1]

Values are mapped to colors using the “cool” colormap

Other options may be defined in metric_defaults.

metric_defaultsdict, optional

Keys are metric names, to be used with metric. Values are dictionaries of plotting parameters:

“metric_column”str

The name of the column to use as the metric. Plots and analyses that use per-nucleotide data will use this column. If “color_column” is not provided, this column also defines colors.

“error_column”str or None

The name of the column to use as the error. If None, no error bars are plotted.

“color_column”str or None

The name of the column to use for coloring. If None, colors are defined by “metric_column”.

“cmap”str or list

The name of the colormap to use. If a list, the list of colors to use.

“normalization”str

The type of normalization to use. In order to be used with colormaps, values are normalized to either be integers for categorical colormaps, or floats in the range [0, 1] for continuous colormaps.

“none”: no normalization is performed

“min_max”: values are scaled to floats in the range [0, 1] based on the upper and lower bounds defined in “values”

“0_1”: values are scaled to floats in the range [0, 1] based on the minimum and maximum values in the data

“bins”: values are scaled to an integer based on bins defined by the list of bounds defined in “values”

“percentiles”: values are scaled to floats in the range [0, 1] based on upper and lower percentile bounds defined by “values”

“values”list or None

The values to use when normalizing the data.

  If “normalization” is “min_max”, this should be a list of two values defining the upper and lower bounds.
  If “normalization” is “bins”, this should be a list of values of length 1 less than the length of cmap. Example: [5, 10, 20] defines 4 bins: (-infinity, 5), [5, 10), [10, 20), [20, infinity)
  If “normalization” is “percentiles”, this should be a list of two values defining the upper and lower percentile bounds.
  If “normalization” is “0_1” or “none”, this should be None.

“title”str, defaults to “”

The title of the colorbar.

“ticks”list, defaults to None

The tick locations to use for the colorbar. If None, values are determined automatically.

“tick_labels”list, defaults to None

The labels to use for the colorbar ticks. If None, values are determined automatically from “ticks”.

“extend”“neither”, “both”, “min”, or “max”, defaults to “neither”

Which ends of the colorbar to extend (places an arrow head).

Defaults to None.

namestr, optional

A name for the data set. Defaults to None.

Attributes

datapandas.DataFrame

The data table
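The “min_max” and “bins” normalization options described under metric_defaults can be illustrated with a minimal, standard-library sketch. This is not rnavigate's implementation: the helper names normalize_min_max and normalize_bins are hypothetical, and clipping out-of-range values to [0, 1] is an assumption.

```python
from bisect import bisect_right


def normalize_min_max(value, lower, upper):
    """Scale value to [0, 1] using fixed bounds (clipping is an assumption)."""
    scaled = (value - lower) / (upper - lower)
    return min(1.0, max(0.0, scaled))


def normalize_bins(value, bounds):
    """Return the integer bin index of value for ascending bin bounds.

    Bins are closed on the left: [5, 10, 20] gives
    (-inf, 5), [5, 10), [10, 20), [20, inf).
    """
    return bisect_right(bounds, value)


# The "bins" example from the text: three bounds define four bins.
bounds = [5, 10, 20]
bin_indices = [normalize_bins(v, bounds) for v in (4.9, 5, 9.9, 10, 20, 100)]

# The "Distance" metric maps the range [5, 50] onto [0, 1].
distance_scaled = normalize_min_max(27.5, 5, 50)
```

Each bin index then selects one color from a list-type “cmap”, which is why the list of bounds must be one shorter than the list of colors.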

calculate_gini_index(values)

Calculate the Gini index of an array of values.
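For orientation, a Gini index can be computed from sorted values as shown in this minimal sketch. The function name gini_index and the handling of empty or all-zero input are assumptions; the exact formula used by calculate_gini_index is not stated here and may differ.

```python
def gini_index(values):
    """Gini index of non-negative values.

    0 means all values are equal; values near 1 mean the total is
    concentrated in very few entries.
    """
    data = sorted(values)
    n = len(data)
    total = sum(data)
    if n == 0 or total == 0:
        return 0.0  # assumption: undefined cases are reported as 0
    # Rank-weighted sum over the sorted values (ranks start at 1).
    weighted = sum(rank * value for rank, value in enumerate(data, start=1))
    return 2 * weighted / (n * total) - (n + 1) / n
```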

calculate_windows(column, window, method='median', new_name=None, minimum_points=None, mask_na=True)

calculates a windowed operation over a column of data.

Result is stored in a new column. Value of each window is assigned to the center position of the window.

Parameters

columnstr

name of column to perform operation on

windowint

window size, must be an odd number

methodstring or function, defaults to “median”

operation to perform over windows.

  if string, must be “median”, “mean”, “minimum”, or “maximum”
  if function, must take a 1D numpy array as input and return a scalar

new_namestr, defaults to f”{method}_{window}_nt”

name of new column for stored result.

minimum_pointsint, defaults to value of window

minimum number of points within each window.

mask_nabool, defaults to True

whether to mask the result of the operation where the original column has a nan value.
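The windowing behavior described above (odd window size, result assigned to the center position, a minimum number of points per window) can be sketched as follows. This is a standard-library illustration, not the method itself: the function name windowed is hypothetical, and leaving edge positions as NaN is an assumption.

```python
import math
from statistics import median


def windowed(values, window, func=median, minimum_points=None):
    """Apply func over sliding windows and assign results to window centers."""
    if window % 2 == 0:
        raise ValueError("window must be an odd number")
    if minimum_points is None:
        minimum_points = window  # mirrors the documented default
    half = window // 2
    result = [math.nan] * len(values)
    for start in range(len(values) - window + 1):
        # Only non-NaN points count toward minimum_points.
        points = [v for v in values[start:start + window] if not math.isnan(v)]
        if len(points) >= minimum_points:
            result[start + half] = func(points)
    return result


smoothed = windowed([1, 5, 2, 8, 3], window=3)
```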

copy()

Returns a copy of the Profile.

classmethod from_array(input_data, sequence, **kwargs)

Construct a Profile object from an array of values.

Parameters

input_datalist or np.array

A list or array of values to use as the metric.

sequencestr

The RNA sequence.

**kwargs

Additional keyword arguments to pass to the Profile constructor.

Returns

Profile

A Profile object with the provided values.

get_aligned_data(alignment)

Returns a new Profile object with the data aligned to a sequence.

Parameters

alignmentrnavigate.data.SequenceAlignment

The alignment to use to map rows of self.data to a new sequence.

Returns

Profile

A new Profile object with the data aligned to the sequence in the alignment.

get_plotting_dataframe()

Returns a dataframe with the data to be plotted.

Returns

pandas.DataFrame

A dataframe with the columns “Nucleotide”, “Values”, “Errors”, and “Colors”.

norm_boxplot(values)

removes outliers (> 1.5 * IQR) and scales the mean to 1.

NOTE: This method varies slightly from the normalization method used in the ShapeMapper pipeline. ShapeMapper sets undefined values to 0, and then uses these values when computing the IQR and 90th percentile. Including these values can skew the result. This method excludes such nan values. Other elements are the same.

Parameters

values1D numpy array

values to normalize

Returns

(float, float)

scaling factor and error propagation factor

norm_eDMS(values)

Calculates normalization factors following the eDMS per-nucleotide scheme in ShapeMapper 2.2

Parameters

values1D numpy array

values to normalize

Returns

(float, float)

scaling factor and error propagation factor

norm_percentiles(values, lower_bound=90, upper_bound=99, median_or_mean='mean')

Calculates factors to scale the median or mean (see median_or_mean) of the values between percentile bounds to 1.

Parameters

values1D numpy array

values to normalize

lower_boundint or float, optional

percentile of lower bound, Defaults to 90

upper_boundint or float, optional

percentile of upper bound, Defaults to 99

median_or_meanstring, optional

whether to use the median or mean of the values between the bounds.

Returns

(float, float)

scaling factor and error propagation factor
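A percentile-based scaling factor of the kind norm_percentiles describes can be sketched with the standard library. The function name percentile_factor is hypothetical, only the scaling factor is computed (the error propagation factor is omitted), integer percentile bounds are assumed, and the inclusive interpolation method is an assumption.

```python
import math
from statistics import mean, median, quantiles


def percentile_factor(values, lower_bound=90, upper_bound=99,
                      median_or_mean="mean"):
    """Factor that scales the center of values between two percentiles to 1."""
    finite = [v for v in values if not math.isnan(v)]
    # 99 cut points; cuts[p - 1] is the p-th percentile.
    cuts = quantiles(finite, n=100, method="inclusive")
    low, high = cuts[lower_bound - 1], cuts[upper_bound - 1]
    selected = [v for v in finite if low <= v <= high]
    center = mean(selected) if median_or_mean == "mean" else median(selected)
    return 1 / center


# For the values 1..101, the 90th and 99th percentiles are 91 and 100,
# so the mean of the selected values is 95.5.
factor = percentile_factor(list(range(1, 102)))
```

Multiplying every value in a group by this factor places the mean (or median) of the selected high-reactivity values at 1.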

normalize(profile_column=None, new_profile=None, error_column=None, new_error=None, norm_method='boxplot', nt_groups=None, profile_factors=None, **norm_kwargs)

Normalize values in a column, and store in a new column.

By default, performs ShapeMapper2 boxplot normalization on self.metric and stores the result as “Norm_profile”.

Parameters

profile_columnstring, defaults to self.metric

column name of values to normalize

new_profilestring, defaults to “Norm_profile”

column name of new normalized values

error_columnstring, defaults to self.error_column

column name of error values to propagate

new_errorstring, defaults to “Norm_error”

column name of new propagated error values

norm_methodstring, defaults to “boxplot”

normalization method to use.

“DMS” uses self.norm_percentiles and nt_groups=[‘AC’, ‘UG’]
  scales the median of 90th to 95th percentiles to 1
  As and Cs are normalized separately from Us and Gs

“eDMS” uses self.norm_eDMS and nt_groups=[‘A’, ‘U’, ‘C’, ‘G’]
  Applies the new eDMS-MaP normalization.
  Each nucleotide is normalized separately.

“boxplot” uses self.norm_boxplot and nt_groups=[‘AUCG’]
  removes outliers (> 1.5 iqr) and scales median to 1
  scales nucleotides together unless specified with nt_groups

“percentile” uses self.norm_percentiles and nt_groups=[‘AUCG’]
  scales the median of 90th to 95th percentiles to 1
  scales nucleotides together unless specified with nt_groups

Defaults to “boxplot”: the default normalization of ShapeMapper

nt_groupslist of strings, defaults to None

A list of nucleotides to group, e.g.

  [‘AUCG’] groups all nts together
  [‘AC’, ‘UG’] groups As with Cs and Us with Gs
  [‘A’, ‘C’, ‘U’, ‘G’] scales each nt separately

Default depends on norm_method

profile_factorsdictionary, defaults to None
a scaling factor (float) for each nucleotide. Keys must be: ‘A’, ‘C’, ‘U’, ‘G’

Note: using this argument overrides any calculation of scaling factors. Defaults to None

**norm_kwargs

these are passed to the norm_method function

Returns

profile_factorsdict

the new profile scaling factors dictionary

normalize_external(profiles, **kwargs)

Normalize reactivities using normalization factors computed from other profiles.

Parameters

profileslist of rnavigate.data.Profile

a list of other profiles used to compute scaling factors

Returns

profile_factorsdict

the new profile scaling factors dictionary

normalize_sequence(t_or_u='U', uppercase=True)

Changes the values in self.data[“Sequence”] to the normalized sequence.

Parameters

t_or_u“T” or “U”, Defaults to “U”.

Whether to replace T with U or U with T.

uppercasebool, Defaults to True.

Whether to convert the sequence to uppercase.

property recreation_kwargs

A dictionary of keyword arguments to pass when recreating the object.

winsorize(column, lower_bound=None, upper_bound=None)

Winsorize the data between bounds.

If either bound is set to None, one-sided Winsorization is performed.

Parameters

columnstring

the column of data to be winsorized

lower_boundNumber or None, defaults to None

Data below this value is set to this value. If None, no lower bound is applied.

upper_boundNumber or None, defaults to None

Data above this value is set to this value. If None, no upper bound is applied.
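Winsorization as described above amounts to clamping values at the given bounds. The sketch below operates on a plain list rather than a DataFrame column, and the standalone function is for illustration only.

```python
def winsorize(values, lower_bound=None, upper_bound=None):
    """Clamp values to the given bounds; a bound of None is not applied."""
    result = []
    for value in values:
        if lower_bound is not None and value < lower_bound:
            value = lower_bound
        if upper_bound is not None and value > upper_bound:
            value = upper_bound
        result.append(value)
    return result


two_sided = winsorize([-1, 0.5, 3], lower_bound=0, upper_bound=2)
one_sided = winsorize([-1, 0.5, 3], upper_bound=2)
```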

class rnavigate.data.profile.RNPMaP(input_data, read_table_kw=None, sequence=None, metric='NormedP', metric_defaults=None, name=None)

Bases: Profile

Represents per-nucleotide RNPMaP data.

Parameters

input_datastr or pandas.DataFrame

path to an RNAModMapper reactivities.txt file or a pandas DataFrame

read_table_kwdict, optional

Keyword arguments to pass to pandas.read_table. These are not necessary for reactivities.txt files. Defaults to None.

sequencernavigate.Sequence or str, optional

A sequence to use as the reference sequence. This is not necessary for reactivities.txt files. Defaults to None.

metricstr, defaults to “NormedP”

The name of the set of value-to-color options to use.

metric_defaultsdict, optional

Keys are metric names, to be used with metric. Values are dictionaries of plotting parameters. Defaults to None.

namestr, optional

A name for the data set. Defaults to None.

class rnavigate.data.profile.SHAPEMaP(input_data, normalize=None, read_table_kw=None, sequence=None, metric='Norm_profile', metric_defaults=None, log=None, name=None)

Bases: Profile

A class to represent per-nucleotide SHAPE-MaP data.

Parameters

input_datastr or pandas.DataFrame

path to a ShapeMapper2 profile.txt or .map file or a pandas DataFrame

normalize“DMS”, “eDMS”, “boxplot”, “percentiles”, or None, defaults to None

The normalization method to use.

“DMS” uses self.norm_percentiles and nt_groups=[‘AC’, ‘UG’]
  scales the median of 90th to 95th percentiles to 1
  As and Cs are normalized separately from Us and Gs

“eDMS” uses self.norm_eDMS and nt_groups=[‘A’, ‘U’, ‘C’, ‘G’]
  Applies the new eDMS-MaP normalization.
  Each nucleotide is normalized separately.

“boxplot” uses self.norm_boxplot and nt_groups=[‘AUCG’]
  removes outliers (> 1.5 iqr) and scales median to 1
  scales nucleotides together unless specified with nt_groups

“percentiles” uses self.norm_percentiles and nt_groups=[‘AUCG’]
  scales the median of 90th to 95th percentiles to 1
  scales nucleotides together unless specified with nt_groups

Defaults to None: no normalization is performed

read_table_kwdict, optional

Keyword arguments to pass to pandas.read_table. These are not necessary for profile.txt and .map files. Defaults to None.

sequencernavigate.Sequence or str, optional

A sequence to use as the reference sequence. This is not necessary for profile.txt and .map files. Defaults to None.

metricstr, defaults to “Norm_profile”

The name of the set of value-to-color options to use. “Norm_profile” specifies:

  “Norm_profile” column is used
  “Norm_stderr” column is used for error bars
  Values are normalized to bins: (-inf, -0.4), [-0.4, 0.4), [0.4, 0.85), [0.85, 2), [2, inf)
  Bins are mapped to “grey”, “black”, “orange”, “red”, “red”

Other options may be defined in metric_defaults.

metric_defaultsdict, optional

Keys are metric names, to be used with metric. Values are dictionaries of plotting parameters:

“metric_column”str

The name of the column to use as the metric. Plots and analyses that use per-nucleotide data will use this column. If “color_column” is not provided, this column also defines colors.

“error_column”str or None

The name of the column to use as the error. If None, no error bars are plotted.

“color_column”str or None

The name of the column to use for coloring. If None, colors are defined by “metric_column”.

“cmap”str or list

The name of the colormap to use. If a list, the list of colors to use.

“normalization”str

The type of normalization to use. In order to be used with colormaps, values are normalized to either integers for categorical colormaps, or floats in the range [0, 1] for continuous colormaps.

  “none”: no normalization is performed
  “min_max”: values are scaled to floats in the range [0, 1] based on the upper and lower bounds defined in “values”
  “0_1”: values are scaled to floats in the range [0, 1] based on the minimum and maximum values in the data
  “bins”: values are scaled to an integer based on bins defined by the list of bounds defined in “values”
  “percentiles”: values are scaled to floats in the range [0, 1] based on upper and lower percentile bounds defined by “values”

“values”list or None

The values to use when normalizing the data.

  If “normalization” is “min_max”, this should be a list of two values defining the upper and lower bounds.
  If “normalization” is “bins”, this should be a list of values of length 1 less than the length of cmap. Example: [5, 10, 20] defines 4 bins: (-infinity, 5), [5, 10), [10, 20), [20, infinity)
  If “normalization” is “percentiles”, this should be a list of two values defining the upper and lower percentile bounds.
  If “normalization” is “0_1” or “none”, this should be None.

“title”str, defaults to “”

The title of the colorbar.

“ticks”list, defaults to None

The tick locations to use for the colorbar. If None, values are determined automatically.

“tick_labels”list, defaults to None

The labels to use for the colorbar ticks. If None, values are determined automatically from “ticks”.

“extend”“neither”, “both”, “min”, or “max”, defaults to “neither”

Which ends of the colorbar to extend (places an arrow head).

Defaults to None.

logstr, optional

Path to a ShapeMapper v2 shapemap_log.txt file with mutations-per-molecule and read-length histograms. These will be present if the --per-read-histogram flag was used when running ShapeMapper v2. Currently, this is not working with ShapeMapper v2.2 files. Defaults to None.

namestr, optional

A name for the data set. Defaults to None.

Attributes

datapandas.DataFrame

The data table

classmethod from_rnaframework(input_data, normalize=None)

Construct a SHAPEMaP object from an RNAFramework output file.

Parameters

input_datastr

path to an RNAFramework .xml reactivities file

normalize“DMS”, “eDMS”, “boxplot”, “percentiles”, or None, defaults to None

The normalization method to use.

“DMS” uses self.norm_percentiles and nt_groups=[‘AC’, ‘UG’]
  scales the median of 90th to 95th percentiles to 1
  As and Cs are normalized separately from Us and Gs

“eDMS” uses self.norm_eDMS and nt_groups=[‘A’, ‘U’, ‘C’, ‘G’]
  Applies the new eDMS-MaP normalization.
  Each nucleotide is normalized separately.

“boxplot” uses self.norm_boxplot and nt_groups=[‘AUCG’]
  removes outliers (> 1.5 iqr) and scales median to 1
  scales nucleotides together unless specified with nt_groups

“percentiles” uses self.norm_percentiles and nt_groups=[‘AUCG’]
  scales the median of 90th to 95th percentiles to 1
  scales nucleotides together unless specified with nt_groups

Defaults to None: no normalization is performed

Returns

SHAPEMaP

A SHAPEMaP object with the provided values.

read_log(log)

Read the ShapeMapper log file.

Parameters

logstr

Path to a ShapeMapper v2 shapemap_log.txt file with mutations-per-molecule and read-length histograms.

Returns

read_lengthspandas.DataFrame

A dataframe with the columns “Read_length”, “Modified_read_length”, and “Untreated_read_length”.

mutations_per_moleculepandas.DataFrame

A dataframe with the columns “Mutation_count”, “Modified_mutations_per_molecule”, and “Untreated_mutations_per_molecule”.

write_bpp2seq_file(output_file)

Write the data to a ShapeMapper2 .bpp2seq file (for Contra/EternaFold).

Parameters

output_filestr

The path to write the output file.

write_shape_file(output_file)

Write the data to a ShapeMapper2 .shape file (for RNAstructure programs).

Parameters

output_filestr

The path to write the output file.

rnavigate.data.secondary_structure module

class rnavigate.data.secondary_structure.SecondaryStructure(input_data, extension=None, autoscale=True, name=None, **kwargs)

Bases: Sequence

Base class for secondary structures.

Parameters

input_datastr or pandas.DataFrame

A dataframe or filepath containing a secondary structure.

A DataFrame should contain these columns: [“Nucleotide”, “Sequence”, “Pair”]. The “Pair” column must be redundant, i.e. each base pair is listed at both of its positions.

Filepath parsing is determined by file extension: varna, xrna, nsd, cte, ct, dbn, bracket, json (R2DT), forna

extensionstr, optional

The file extension of the input_data file. If not provided, the extension will be inferred from the input_data filepath.

autoscalebool, optional

Whether to automatically scale the x and y coordinates. Defaults to True.

namestr, optional

The name of the RNA sequence. Defaults to None.

Attributes

datapandas.DataFrame

DataFrame storing base-pairs

filepathstr

The path to the input file, if provided, otherwise “dataframe”

sequencestr

The RNA sequence

ntsnumpy.array

The “Nucleotide” column of data

pair_ntsnumpy.array

The “Pair” column of data

headerstr

Header information from CT file

xcoordinatesnumpy.array

The “X_coordinate” column of data

ycoordinatesnumpy.array

The “Y_coordinate” column of data

distance_matrixnumpy.array

The contact distance matrix of the RNA structure

add_pairs(pairs, break_conflicting_pairs=False)

Add base pairs to current secondary structure.

Parameters

pairslist

1-indexed list of paired residues. e.g. [(1, 20), (2, 19)]

break_conflicting_pairsbool, defaults to False

Whether to break existing pairs if there is a conflict

as_interactions(structure2=None)

Returns an rnavigate.Interactions representation of this structure, optionally combined with other structures.

Parameters

structure2SecondaryStructure or list of these, defaults to None

If provided, basepairs from all structures are included and labeled by which structures contain them and how many structures contain them.

property boolean

Return a boolean array of paired and unpaired nucleotides.

break_noncanonical_pairs()

Removes non-canonical basepairs from the secondary structure.

WARNING: this deletes information.

break_pairs_nts(nt_positions)

break base pairs at the given list of positions.

WARNING: this deletes information.

Parameters

nt_positionslist of int

1-indexed positions to break pairs

break_pairs_region(start, end, break_crossing=True, inverse=False)

Removes pairs from the specified region (1-indexed, inclusive).

WARNING: this deletes information

Parameters

startint

start position (1-indexed, inclusive)

endint

end position (1-indexed, inclusive)

break_crossingbool, defaults to True

Whether to break pairs that cross over the specified region

inversebool, defaults to False

Invert the behavior, i.e. remove pairs that are not in this region

break_singleton_pairs()

Removes singleton basepairs from the secondary structure.

WARNING: This deletes information.

compute_ppv_sens(structure2, exact=True)

Compute the PPV and sensitivity between this and another structure.

True and False are determined from this structure. Positive and Negative are determined from structure2.

  PPV = TP / (TP + FP)
  Sensitivity = TP / (TP + FN)

Parameters

structure2SecondaryStructure

The SecondaryStructure to compare to.

exactbool, defaults to True

True requires BPs to be exactly correct. False allows +/-1 bp slippage.

Returns

float

sensitivity

float

PPV

3-tuple of floats

(TP, TP+FP, TP+FN)
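The definitions above can be sketched on plain lists of base pairs. This illustration covers exact matching only (no +/-1 slippage); the function name ppv_sensitivity and the choice to return 0.0 when a structure has no pairs are assumptions.

```python
def ppv_sensitivity(reference_pairs, predicted_pairs):
    """Return (sensitivity, PPV, (TP, TP+FP, TP+FN)) for exact pair matches.

    reference_pairs plays the role of "this structure" (defines True/False);
    predicted_pairs plays the role of structure2 (defines Positive/Negative).
    """
    reference = set(reference_pairs)
    predicted = set(predicted_pairs)
    true_positives = len(reference & predicted)
    ppv = true_positives / len(predicted) if predicted else 0.0
    sensitivity = true_positives / len(reference) if reference else 0.0
    return sensitivity, ppv, (true_positives, len(predicted), len(reference))


sens, ppv, counts = ppv_sensitivity(
    reference_pairs=[(1, 20), (2, 19), (3, 18), (4, 17)],
    predicted_pairs=[(1, 20), (2, 19), (5, 16)],
)
```

Here two of the three predicted pairs are in the reference (PPV = 2/3), and two of the four reference pairs were predicted (sensitivity = 1/2).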

contact_distance(i, j)

Returns the contact distance between positions i and j

copy()

fill_mismatches(mismatch=1)

Adds base pairs to fill 1,1 and optionally 2,2 mismatches.

Parameters

mismatchint, defaults to 1

  1 will fill only 1,1 mismatches
  2 will fill 1,1 and 2,2 mismatches

classmethod from_pairs_list(input_data, sequence)

Creates a SecondaryStructure from a list of pairs and a sequence.

Parameters

input_datalist

1-indexed list of base pairs. e.g. [(1, 20), (2, 19)]

sequencestr

The RNA sequence. e.g., “AUCGUGUCAUGCUA”

classmethod from_sequence(input_data)

Creates a SecondaryStructure from a sequence string.

This structure is initialized with no base pairs. If base pairs are needed, use SecondaryStructure.from_pairs_list().

get_aligned_data(alignment)

Returns a new SecondaryStructure object matching the alignment target.

Parameters

alignmentdata.Alignment

An alignment object used to map values

get_distance_matrix(recalculate=False, max_cd=50)

Get a matrix of pair-wise shortest path distances through the structure.

This function uses a breadth-first search (BFS) algorithm. The structure is represented as a graph with nucleotides as vertices and base pairs and backbone bonds as edges. All edges are length 1. The matrix is stored as an attribute for future use.

If the attribute is set (not None) and recalculate is False, the attribute will be returned.

Based on Tom’s contact_distance, but expanded to return the pairwise matrix. New contact_distance method added to return the distance between two positions.

By default, the maximum contact distance is set to 50. This will be the maximum value reported in the matrix, i.e. a value of 50 in the matrix means >= 50. This prevents the algorithm from running for a very long time on long RNAs. If you need a larger value, set max_cd to a higher value.

Parameters

recalculatebool, defaults to False

Set to True to recalculate the matrix even if the attribute is set.

max_cdint, defaults to 50

The maximum contact distance to calculate.
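The breadth-first search described above can be sketched for a single starting nucleotide. This is an illustration rather than the library's code: the function name contact_distances is hypothetical, it returns one row of the matrix as a dictionary, and unreached positions are simply reported as max_cd.

```python
from collections import deque


def contact_distances(length, pairs, start, max_cd=50):
    """Shortest path lengths from start over backbone and base-pair edges."""
    neighbors = {i: set() for i in range(1, length + 1)}
    for i in range(1, length):  # backbone edges between adjacent nucleotides
        neighbors[i].add(i + 1)
        neighbors[i + 1].add(i)
    for i, j in pairs:  # base-pair edges
        neighbors[i].add(j)
        neighbors[j].add(i)

    # Every position starts at the cap, so a value of max_cd means ">= max_cd".
    distances = {i: max_cd for i in neighbors}
    distances[start] = 0
    queue = deque([start])
    while queue:
        current = queue.popleft()
        step = distances[current] + 1
        if step >= max_cd:
            continue  # stop expanding once the cap is reached
        for nxt in neighbors[current]:
            if distances[nxt] > step:
                distances[nxt] = step
                queue.append(nxt)
    return distances


# A 10-nt hairpin with pairs 1-10 and 2-9.
row = contact_distances(length=10, pairs=[(1, 10), (2, 9)], start=1)
capped = contact_distances(length=10, pairs=[(1, 10), (2, 9)], start=1, max_cd=3)
```

Capping the search is what keeps the run time bounded on long RNAs: nodes at the cap are never expanded.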

get_dotbracket()

Get a dotbracket notation string representing the secondary structure.

Pseudoknot levels:

1: (), 2: [], 3: {}, 4: <>, 5: Aa, 6: Bb, 7: Cc, etc.

Returns

str

A dot-bracket representation of the secondary structure
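How pseudoknot levels translate into bracket types can be sketched with a greedy assignment: each pair goes to the first level in which it crosses no pair already placed. The greedy strategy and the function name to_dotbracket are assumptions, and only the first four levels are handled.

```python
def to_dotbracket(length, pairs):
    """Build a dot-bracket string from 1-indexed (i, j) pairs with i < j."""
    brackets = ["()", "[]", "{}", "<>"]
    levels = [[] for _ in brackets]  # pairs already placed at each level
    symbols = ["."] * length
    for i, j in sorted(pairs):
        for (opener, closer), placed in zip(brackets, levels):
            # Two pairs cross if exactly one end of one lies inside the other.
            crosses = any(a < i < b < j or i < a < j < b for a, b in placed)
            if not crosses:
                placed.append((i, j))
                symbols[i - 1] = opener
                symbols[j - 1] = closer
                break
        else:
            raise ValueError("more than four pseudoknot levels are not handled")
    return "".join(symbols)


nested = to_dotbracket(10, [(1, 10), (2, 9), (3, 8)])
knotted = to_dotbracket(12, [(1, 8), (2, 7), (5, 12), (6, 11)])
```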

get_helices(fill_mismatches=True, split_bulge=True, keep_singles=False)

Get a dictionary of helices from the secondary structure.

Keys are equivalent to list indices. Values are lists of paired nucleotides (1-indexed) in that helix, e.g. {0: [(1, 50), (2, 49), (3, 48)]}

Parameters

fill_mismatchesbool, defaults to True

Whether 1-1 and 2-2 bulges are replaced with base pairs

split_bulgebool, defaults to True

Whether to split helices on bulges

keep_singlesbool, defaults to False

Whether to return helices that contain only 1 base-pair

Returns

dict

A dictionary of helices

get_human_dotbracket()

Get a human-readable dotbracket string representing the secondary structure.

This is an experimental format designed to be more human readable, i.e. no counting of brackets required.

  1. Letters, instead of brackets, are used to denote nested base pairs.

  2. Each helix is assigned a letter, which is incremented one letter alphabetically from the nearest enclosing stem.

  3. Non-nested helices (pseudoknots) are assigned canonical brackets.

From this canonical dbn string:

  how many bases are in the base stem?
  how many nested helices are there?
  ((((....(((.[[..)))))(((...(((..]].))))))))

Same question, new format:

  AABB....CCC.[[..cccbbBBB...CCC..]].cccbbbaa

Read this as:

  ((_______________________________________))  (level 1 = A)
    ((_______________))(((______________)))    (level 2 = B)
          (((_____)))        (((_____)))       (level 3 = C)
              [[__________________]]           (pseudoknot = [])

Pseudoknot levels:

1: Aa, Bb, Cc, etc. 2: [], 3: {}, 4: <>

get_interactions_df()

Returns a DataFrame of i, j basepairs.

Returns

pandas.DataFrame
A DataFrame with columns:

  i: the 5’ (1-indexed) position of the base pair
  j: the 3’ (1-indexed) position of the base pair
  Structure: always 1

get_junction_nts()

Get a list of junction nucleotides (paired, but at the end of a chain).

Returns

list

A list of 1-indexed positions of junction nucleotides

get_nonredundant_ct()

Returns the ct attribute in a non-redundant form.

Only returns pairs in which i < j. For example:

  self.ct[i-1] == j
  self.ct[j-1] == i
  BUT self.get_nonredundant_ct()[j-1] == 0

Returns

numpy.array

A non-redundant array of base pairs
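The relationship between the redundant and non-redundant forms can be sketched on a plain list, where entry i-1 holds the partner of nucleotide i (0 if unpaired). The function names below are hypothetical stand-ins for illustration.

```python
def nonredundant_ct(ct):
    """Keep each pair only at its 5' position (i < j); zero out the 3' entry."""
    return [j if j > i else 0 for i, j in enumerate(ct, start=1)]


def pairs_from_ct(ct):
    """List the 1-indexed (i, j) pairs with i < j."""
    return [(i, j) for i, j in enumerate(ct, start=1) if j > i]


# Redundant form: 1 pairs with 5, 2 pairs with 4, 3 is unpaired.
ct = [5, 4, 0, 2, 1]
reduced = nonredundant_ct(ct)
pairs = pairs_from_ct(ct)
```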

get_paired_nts()

Get a list of residues that are paired.

Returns

list

A list of 1-indexed positions of paired nucleotides

get_pairs()

Get a non-redundant list of base pairs (i < j) as a list of tuples.

Returns

list

A list of 1-indexed positions. e.g., [(1, 50), (2, 49), …]

get_pseudoknots(fill_mismatches=True)

Get the pk1 and pk2 pairs from the secondary structure.

Ignores single base pairs. PK1 is defined as the helix crossing the most other base pairs. If there is a tie, the most 5’ helix is called pk1. Returns pk1 and pk2 as lists of base pairs, e.g. [(1, 10), (2, 9), …]

Parameters

fill_mismatchesbool, defaults to True

Whether 1-1 and 2-2 bulges are replaced with base pairs

Returns

list of 2 lists of 2-tuples

A list of base pairs for pk1 and pk2

get_structure_elements()

This code is not yet implemented.

Returns a string with a character for each nucleotide, indicating what kind of structure element it is a part of.

Characters:

  Dangling Ends (E)
  Stems (S)
  Hairpin Loops (H)
  Bulges (B)
  Internal Loops (I)
  MultiLoops (M)
  External Loops (X)
  Pseudoknot (P)

get_unpaired_nts()

Get a list of residues that are unpaired.

Returns

list

A list of 1-indexed positions of unpaired nucleotides

normalize_dtypes()

Convert dtypes of SecondaryStructure dataframe for consistency.

normalize_sequence(t_or_u='U', uppercase=True)

Normalize the sequence attribute (fix case and/or U <-> T).

property nts

property pair_nts

read_ct(structure_number=0)

Loads secondary structure information from a given ct file.

Requires a properly formatted header.

Parameters

structure_numberint, defaults to 0

0-indexed structure number to load from the ct file.

read_cte()

Generates SecondaryStructure object data from a CTE file

Resulting SecondaryStructure object will include nucleotide x and y coordinates and is compatible with plot_ss.

read_dotbracket()

Generates SecondaryStructure object data from a dot-bracket file.

Resulting SecondaryStructure object will include nucleotide x and y coordinates and is compatible with plot_ss.

read_forna()

Generates SecondaryStructure object data from a FORNA JSON file.

Resulting SecondaryStructure object will include nucleotide x and y coordinates and is compatible with plot_ss.

read_nsd(structure_number=0)

Generates SecondaryStructure object data from an NSD file.

Resulting SecondaryStructure object will include nucleotide x and y coordinates and is compatible with plot_ss.

read_r2dt()

Generates SecondaryStructure object data from an R2DT JSON file.

Resulting SecondaryStructure object will include nucleotide x and y coordinates and is compatible with plot_ss.

read_varna()

Generates SecondaryStructure object data from a VARNA file.

Resulting SecondaryStructure object will include nucleotide x and y coordinates and is compatible with plot_ss.

read_xrna()

Generates SecondaryStructure object data from an XRNA file.

Resulting SecondaryStructure object will include nucleotide x and y coordinates and is compatible with plot_ss.

transform_coordinates(flip=None, scale=None, center=None, rotate_degrees=None)

Perform transformations on X and Y structure coordinates.

To achieve a vertical and horizontal flip together, rotate 180 degrees.

Parameters

flipstr, optional

“horizontal” or “vertical”

scalefloat, optional

new median distance of basepairs

centertuple of floats, optional

new center x and y coordinate

rotate_degreesfloat, optional

number of degrees to rotate structure
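The note that a 180 degree rotation equals a combined horizontal and vertical flip can be checked with a small sketch. The helper names are hypothetical, and two details are assumptions: the center is taken as the midpoint of the coordinate bounding box, and positive angles rotate counterclockwise.

```python
import math


def center_point(xs, ys):
    """Midpoint of the bounding box (an assumed definition of the center)."""
    return (min(xs) + max(xs)) / 2, (min(ys) + max(ys)) / 2


def rotate(xs, ys, degrees):
    """Rotate coordinates counterclockwise around the center point."""
    cx, cy = center_point(xs, ys)
    theta = math.radians(degrees)
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    new_xs = [cx + (x - cx) * cos_t - (y - cy) * sin_t for x, y in zip(xs, ys)]
    new_ys = [cy + (x - cx) * sin_t + (y - cy) * cos_t for x, y in zip(xs, ys)]
    return new_xs, new_ys


def flip_both(xs, ys):
    """Mirror coordinates horizontally and vertically through the center."""
    cx, cy = center_point(xs, ys)
    return [2 * cx - x for x in xs], [2 * cy - y for y in ys]


xs, ys = [0.0, 1.0, 2.0], [0.0, 3.0, 1.0]
rotated = rotate(xs, ys, 180)
flipped = flip_both(xs, ys)
```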

write_ct(out_file)

Write structure to a ct file.

write_cte(out_file)

Write structure to CTE format for Structure Editor.

write_dbn(rna_name, region='all', out_file=None)

Write the structure to a dot-bracket file.

Parameters

rna_namestr

The name of the RNA sequence

regionlist of 2 integers, optional

The region (start and end positions) of the RNA to write to file. Defaults to “all”.

out_filestr, optional

The name of the output file. If not provided, the dbn file is printed.

write_sto(out_file, name='seq')

Write structure to Stockholm (STO) file to use in infernal searches.

property xcoordinates

property ycoordinates

class rnavigate.data.secondary_structure.SequenceCircle(input_data, gap=30, name=None, **kwargs)

Bases: SecondaryStructure

A circular SecondaryStructure-like representation of RNA sequence.

class rnavigate.data.secondary_structure.StructureCoordinates(x, y, pairs=None)

Bases: object

Helper class to perform structure coordinate transformations

Parameters

xnumpy.array

x coordinates

ynumpy.array

y coordinates

pairslist of pairs, optional

list of base-paired positions required if scaling coordinates

center(x=0, y=0)

Center structure on the given x, y coordinate

Parameters

xint, defaults to 0

x coordinate of structure center

yint, defaults to 0

y coordinate of structure center

flip(horizontal=True)

Flip structure vertically or horizontally.

Parameters

horizontalbool, defaults to True

whether to flip structure horizontally, otherwise vertically

get_center_point()

Get the x, y coordinates for the center of structure.

Returns

float

x coordinate of structure center

float

y coordinate of structure center

rotate(degrees)

Rotate structure on current center point.

Parameters

degreesfloat

number of degrees to rotate structure

scale(median_bp_distance=1.0)

Scale structure such that median base-pair distance is constant.

Parameters

median_bp_distancefloat, defaults to 1.0

New median distance between all base-paired nucleotides.

Module contents

class rnavigate.data.AlignmentChain(*alignments)

Bases: BaseAlignment

Combines a list of alignments into one.

Parameters

alignmentslist of Alignment objects

the alignments to chain together

Attributes

alignmentslist

the constituent alignments

starting_sequencestr

starting sequence of alignments[0]

target_sequencestr

target sequence of alignments[-1]

mappingnumpy.array

an array which maps from starting_sequence to target_sequence. index of starting_sequence is mapping[index] of target sequence

get_inverse_alignment()

Alignments require a method to get the inverted alignment

get_mapping()

combines mappings from each alignment.

Returns

mappingnumpy.array

mapping from initial starting sequence to final target sequence index of starting_sequence is mapping[index] of target sequence

class rnavigate.data.AllPossible(sequence, metric='data', input_data=None, metric_defaults=None, read_table_kw=None, window=1, name=None)

Bases: Interactions

A class for storing and manipulating all possible interactions.

Parameters

sequencestring or rnavigate.data.Sequence

The sequence string corresponding to the pairing probability data.

metricstring, defaults to “data”

The column name to use for visualization.

metric_defaultsdict

Keys are metric names and values are dictionaries of metric-specific defaults. These defaults include:

“metric_column”string

the column name to use for visualization

“cmap”string or matplotlib.colors.Colormap

the colormap to use for visualization

“normalization”“min_max”, “0_1”, “none”, or “bins”

The type of normalization to use when mapping values to colors

“values”list of float

The values to use with normalization of the data

“title”string

the title to use for colorbars

“extend”“min”, “max”, “both”, or “neither”

Which ends to extend when drawing the colorbar.

“tick_labels” : list of string

read_table_kwdict, optional

kwargs passed to pandas.read_table() when reading input_data.

windowint, defaults to 1

The window size used to generate the pairing probability data.

namestr, optional

A name for the AllPossible object.

Attributes

datapandas.DataFrame

The pairing probability data.

class rnavigate.data.Annotation(input_data, annotation_type, sequence, name=None, color='blue')

Bases: Sequence

Basic annotation class to store 1D features of an RNA sequence

Each feature type must be a separate instance. Feature types include:

  a group of separated nucleotides (e.g. binding pocket)
  regions of interest (e.g. coding sequence, Alu elements)
  sites of interest (e.g. m6A locations)
  primer binding sites

Parameters

input_datalist

List will be treated according to annotation_type argument. Expected behaviors for each value of annotation_type:

“sites” or “group”: 1-indexed location of sites of interest
  example: [1, 10, 20, 30] is four sites, 1, 10, 20, and 30

“spans”: 1-indexed, inclusive locations of spans of interest
  example: [[1, 10], [20, 30]] is two spans, 1 to 10 and 20 to 30

“primers”: Similar to spans, but 5’/3’ direction is preserved.
  example: [[1, 10], [30, 20]] forward 1 to 10, reverse 30 to 20

annotation_type“group”, “sites”, “spans”, or “primers”

The type of annotation.

sequencestr or pandas.DataFrame

Nucleotide sequence, path to fasta file, or dataframe containing a “Sequence” column.

namestr, defaults to None

Name of annotation.

colormatplotlib color-like, defaults to “blue”

Color to be used for displaying this annotation on plots.

Attributes

datapandas.DataFrame

Stores the list of sites or regions

namestr

The label for this annotation for use on plots

colorvalid matplotlib color

Color to represent annotation on plots

sequencestr

The reference sequence string

property boolean

Return a boolean array of the annotation on the sequence.
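To illustrate what such a boolean array could look like, the sketch below converts 1-indexed, inclusive spans into one boolean per nucleotide. The helper name spans_to_boolean is hypothetical and is not part of rnavigate; the actual property may differ in its details and return type.

```python
def spans_to_boolean(spans, length):
    """Hypothetical helper: mark every position covered by a span as True.

    Spans are 1-indexed and inclusive, as described for annotation_type="spans".
    """
    mask = [False] * length
    for start, end in spans:
        # sorted() also handles primer-style spans written as [30, 20]
        low, high = sorted((start, end))
        for position in range(low, high + 1):
            mask[position - 1] = True  # convert 1-indexed to 0-indexed
    return mask


mask = spans_to_boolean([[1, 3], [6, 5]], length=8)
```

Positions 1 to 3 and 5 to 6 end up True; all other positions stay False.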

classmethod from_boolean_array(values, sequence, annotation_type, name, color='blue', window=1)

Create an Annotation from an array of boolean values.

True values are used to create the Annotation.

Parameters

valueslist of True or False

the boolean array

sequencestring or rnav.data.Sequence

the sequence of the Annotation

annotation_type“spans”, “sites”, “primers”, or “group”

the type of the new annotation. If “spans” or “primers”, adjacent True values, or True values within window of each other, are collapsed into a single region.

namestring

a name for labelling the annotation.

colorstring, defaults to “blue”

a color for plotting the annotation

windowinteger, defaults to 1

a window around True values to include in the annotation.

Returns

rnavigate.data.Annotation

the new Annotation
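The collapsing of True values into regions can be sketched as follows. This is a minimal illustration, not the library's implementation: the helper name boolean_to_spans is hypothetical, and it assumes that two True positions join the same span when they are at most window positions apart.

```python
def boolean_to_spans(values, window=1):
    """Hypothetical helper: collapse True values into 1-indexed, inclusive spans.

    Assumption: two True positions belong to the same span when they are no
    more than `window` positions apart (window=1 means directly adjacent).
    """
    spans = []
    for index, value in enumerate(values):
        if not value:
            continue
        position = index + 1  # convert 0-indexed to 1-indexed
        if spans and position - spans[-1][1] <= window:
            spans[-1][1] = position  # extend the current span
        else:
            spans.append([position, position])  # start a new span
    return spans


values = [True, True, False, True, False, False, True]
adjacent_only = boolean_to_spans(values)         # window=1
with_gap = boolean_to_spans(values, window=2)    # bridges single False gaps
```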

from_sites(sites)

Create the self.data dataframe from a list of sites.

from_spans(spans)

Create the self.data dataframe from a list of spans.

get_aligned_data(alignment)

Aligns this Annotation to a new sequence and returns a copy.

Parameters

alignmentrnavigate.data.Alignment

Alignment object used to align to a new sequence.

Returns

rnavigate.data.Annotation

A new Annotation with the same name, color, and annotation type, but with the input data aligned to the target sequence.

get_sites()

Returns a list of nucleotide positions included in this annotation.

Returns

sitestuple

the nucleotide positions included in this annotation

get_subsequences(buffer=0)

class rnavigate.data.DanceMaP(input_data, component, read_table_kw=None, sequence=None, metric='Norm_profile', metric_defaults=None, name=None)

Bases: SHAPEMaP

A class to represent per-nucleotide DanceMaP data.

Parameters

input_datastr or pandas.DataFrame

path to a DanceMapper reactivities.txt file or a pandas DataFrame

componentint

Which component of the DanceMapper ensemble to read in (0-indexed).

read_table_kwdict, optional

Keyword arguments to pass to pandas.read_table. These are not necessary for reactivities.txt files. Defaults to None.

sequencernavigate.Sequence or str, optional

A sequence to use as the reference sequence. This is not necessary for reactivities.txt files. Defaults to None.

metricstr, defaults to “Norm_profile”

The name of the set of value-to-color options to use.

read_file(input_data, read_table_kw={})

Convert data file to pandas dataframe and store as self.data

Parameters

filepathstring

path to data file containing interactions

read_table_kwdict

kwargs dictionary passed to pd.read_table

Returns

dataframepandas.DataFrame

the data table

property recreation_kwargs

A dictionary of keyword arguments to pass when recreating the object.

class rnavigate.data.Data(input_data, sequence, metric, metric_defaults, read_table_kw=None, name=None)

Bases: Sequence

The base class for RNAvigate Profile and Interactions classes.

Parameters

input_datapandas.DataFrame or str

a pandas dataframe or path to a data file

sequencestring or rnavigate.data.Sequence

the sequence to use for the data

metricstring or dict

the column of the dataframe to use as the default metric to visualize

metric_defaultsdict

a dictionary of metric defaults

read_table_kwdict, optional

kwargs dictionary passed to pd.read_table

namestring, optional

the name of the data, defaults to None

Attributes

datapandas.DataFrame

the data table

filepathstring

the path to the data file

sequencestring or rnavigate.data.Sequence

the sequence to use for the data

metricstring or dict

the column of the dataframe to use as the metric to visualize

metric_defaultsdict

A dictionary of metric values and default settings for visualization

default_metricstring

the default metric to use for visualization

add_metric_defaults(metric_defaults)

Add metric defaults to self.metric_defaults

property cmap

Get the colormap to use for colorbars and to retrieve colors.

property color_column

Get the column of the dataframe to use as the color for visualization.

property colors

Get one matplotlib color-like value for each nucleotide in self.sequence.

property error_column

Get the column of the dataframe to use as the error for visualization.

property metric

Get the column of the dataframe to use as the metric for visualization.

read_file(filepath, read_table_kw)

Convert data file to pandas dataframe and store as self.data

Parameters

filepathstring

path to data file containing interactions

read_table_kwdict

kwargs dictionary passed to pd.read_table

Returns

dataframepandas.DataFrame

the data table

class rnavigate.data.DeltaProfile(profile1, profile2, metric=None, metric_defaults=None, name=None)

Bases: Profile

A class to represent the difference between two profiles.

Parameters

profile1Profile

The first profile to compare.

profile2Profile

The second profile to compare.

metricstr, optional

The name of the metric to use. Defaults to the metric of profile1.

metric_defaultsdict, optional

Keys are metric names, to be used with metric. Values are dictionaries of plotting parameters. Defaults to None.

namestr, optional

A name for the data set. Defaults to None.

class rnavigate.data.Interactions(input_data, sequence, metric, metric_defaults, read_table_kw=None, window=1, name=None)

Bases: Data

A class for storing and manipulating interactions data.

Parameters

input_datastring or pandas.DataFrame

If string, a path to a file containing interactions data. If dataframe, the dataframe containing interactions data. The dataframe must contain columns “i”, “j”, and self.metric. Dataframe may also include other columns.

sequencestring or rnavigate.data.Sequence

The sequence string corresponding to the interactions data.

metricstring

The column name to use for visualization.

metric_defaultsdict

Keys are metric names and values are dictionaries of metric-specific defaults. These defaults include:

“metric_column”string

the column name to use for visualization

“cmap”string or matplotlib.colors.Colormap

the colormap to use for visualization

“normalization”“min_max”, “0_1”, “none”, or “bins”

The type of normalization to use when mapping values to colors

“values”list of float

The values to use when normalizing the data

“title”string

the title to use for colorbars

“extend”“min”, “max”, “both”, or “neither”

Which ends to extend when drawing the colorbar.

“tick_labels” : list of string

read_table_kwdict

kwargs passed to pandas.read_table() when reading input_data.

windowint

The window size used to generate the interactions data.

namestr

The name of the data object.

Attributes

datapandas.DataFrame

The interactions data.

windowint

The window size that is being represented by i-j pairs.

copy(apply_filter=False)

Returns a copy of the interactions, optionally with masked rows removed.

Parameters

apply_filterbool, defaults to False

If True, masked rows (“mask” == False) are dropped.

Returns

rnavigate.data.Interactions

A copy of the interactions.

count_filter(**kwargs)

Counts the number of interactions that pass the given filters.

data_specific_filter(**kwargs)

Does nothing for the base Interactions class; can be overridden in subclasses.

Returns:

dict: dictionary of keyword argument pairs

filter(prefiltered=False, reset_filter=True, structure=None, min_cd=None, max_cd=None, paired_only=False, ss_only=False, ds_only=False, profile=None, min_profile=None, max_profile=None, compliments_only=False, nts=None, max_distance=None, min_distance=None, exclude_nts=None, isolate_nts=None, resolve_conflicts=None, **kwargs)

Convenience function that applies the above filters simultaneously.

Parameters

prefilteredbool, defaults to False

If True, the mask is not updated.

reset_filterbool, defaults to True

If True, the mask is reset before applying filters.

structurernavigate.data.SecondaryStructure, defaults to None

The structure to use for filtering.

min_cdint, defaults to None

The minimum contact distance to allow.

max_cdint, defaults to None

The maximum contact distance to allow.

paired_onlybool, defaults to False

If True, only keep interactions that are paired in the structure.

ss_onlybool, defaults to False

If True, only keep interactions between single-stranded nucleotides.

ds_onlybool, defaults to False

If True, only keep interactions between double-stranded nucleotides.

profilernavigate.data.Profile, defaults to None

The profile to use for masking.

min_profilefloat, defaults to None

The minimum profile value to allow.

max_profilefloat, defaults to None

The maximum profile value to allow.

compliments_onlybool, defaults to False

If True, only keep interactions where i and j are complementary nucleotides.

ntsstr, defaults to None

If compliments_only is False, only keep interactions where i and j are in nts.

max_distanceint, defaults to None

The maximum distance to allow. If None, no maximum distance is set.

min_distanceint, defaults to None

The minimum distance to allow. If None, no minimum distance is set.

exclude_ntslist of int, defaults to None

A list of positions to exclude.

isolate_ntslist of int, defaults to None

A list of positions to isolate.

resolve_conflictsstr, defaults to None

If not None, conflicting windows are resolved using the Maximal Weighted Independent Set. The weights are taken from the metric value. The graph is first broken into components to speed up the identification of the MWIS. Then the mask is updated to only include the MWIS.

**kwargsdict

Each keyword should have the format “column_operator” where column is a valid column name of the dataframe and operator is one of:

“ge”: greater than or equal to

“le”: less than or equal to

“gt”: greater than

“lt”: less than

“eq”: equal to

“ne”: not equal to

The values given to these keywords are used in the comparison, and rows for which the comparison is False are filtered out. For example:

self.mask_on_values(Statistic_ge=23) evaluates to: self.update_mask(self.data["Statistic"] >= 23)

Returns

masknumpy array

a boolean array of the same length as self.data

get_aligned_data(alignment, apply_filter=True)

Returns a copy mapped to a new sequence with masked rows removed.

Parameters

alignmentrnavigate.data.SequenceAlignment

The alignment to use for mapping the interactions.

apply_filterbool, defaults to True

If True, masked rows (“mask” == False) are dropped.

Returns

rnavigate.data.Interactions

Interactions mapped to a new sequence.

get_ij_colors()

Gets i, j, and colors lists for plotting interactions.

i and j are the 5’ and 3’ ends of each interaction, and colors is the color to use for each interaction. Values of self.data[self.metric] are normalized to 0 to 1, which correspond to self.min_max values. These are then mapped to a color using self.cmap.

Returns

ilist

5’ ends of each interaction

jlist

3’ ends of each interaction

colorslist

colors to use for each interaction

get_sorted_data()

Returns a copy of the data sorted by self.metric.

Returns

pandas.DataFrame

a copy of the data sorted by self.metric

mask_on_distance(max_dist=None, min_dist=None)

Mask interactions based on their distance in sequence space.

Parameters

max_distint, defaults to None

The maximum distance to allow. If None, no maximum distance is set.

min_distint, defaults to None

The minimum distance to allow. If None, no minimum distance is set.

Returns

masknumpy array

a boolean array of the same length as self.data

mask_on_position(exclude=None, isolate=None)

Mask interactions based on their i and j positions.

Parameters

excludelist of int, defaults to None

A list of positions to exclude.

isolatelist of int, defaults to None

A list of positions to isolate.

Returns

masknumpy array

a boolean array of the same length as self.data

mask_on_profile(profile, min_profile=None, max_profile=None)

Masks interactions based on per-nucleotide measurements.

Parameters

profilernavigate.data.Profile

The profile to use for masking.

min_profilefloat, defaults to None

The minimum profile value to allow.

max_profilefloat, defaults to None

The maximum profile value to allow.

Returns

masknumpy array

a boolean array of the same length as self.data

mask_on_sequence(compliment_only=None, nts=None)

Mask interactions based on sequence.

Parameters

compliment_onlybool, defaults to None

If True, only keep interactions where i and j are complementary nucleotides.

ntsstr, defaults to None

If compliment_only is False, only keep interactions where i and j are in nts.

Returns

numpy array

a boolean array of the same length as self.data

mask_on_structure(structure, min_cd=None, max_cd=None, ss_only=False, ds_only=False, paired_only=False)

Masks interactions based on a secondary structure.

Parameters

structurernavigate.data.SecondaryStructure

The secondary structure to use for masking.

min_cdint, defaults to None

The minimum contact distance to allow.

max_cdint, defaults to None

The maximum contact distance to allow.

ss_onlybool, defaults to False

If True, only keep interactions between single-stranded nucleotides.

ds_onlybool, defaults to False

If True, only keep interactions between double-stranded nucleotides.

paired_onlybool, defaults to False

If True, only keep interactions that are paired in the structure.

Returns

masknumpy array

a boolean array of the same length as self.data

mask_on_values(**kwargs)

Mask interactions based on values in self.data.

Parameters

kwargsdict

Each keyword should have the format “column_operator” where column is a valid column name of the dataframe and operator is one of:

“ge”: greater than or equal to

“le”: less than or equal to

“gt”: greater than

“lt”: less than

“eq”: equal to

“ne”: not equal to

The values given to these keywords are used in the comparison, and rows for which the comparison is False are filtered out. For example:

self.mask_on_values(Statistic_ge=23) evaluates to: self.update_mask(self.data["Statistic"] >= 23)

Returns

masknumpy array

a boolean array of the same length as self.data
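The “column_operator” keyword convention can be illustrated with a small standalone sketch. It is an assumption-laden stand-in, not the rnavigate implementation: rows is a plain list of dicts in place of self.data, the helper name mask_from_keywords is hypothetical, and the “Zij” column is made up for the example.

```python
import operator

# operator suffixes listed in the documentation above
OPERATORS = {
    "ge": operator.ge, "le": operator.le, "gt": operator.gt,
    "lt": operator.lt, "eq": operator.eq, "ne": operator.ne,
}


def mask_from_keywords(rows, **kwargs):
    """Hypothetical helper: one boolean per row, True if all comparisons pass."""
    mask = [True] * len(rows)
    for keyword, threshold in kwargs.items():
        # split on the LAST underscore so column names may contain underscores
        column, suffix = keyword.rsplit("_", 1)
        compare = OPERATORS[suffix]
        mask = [keep and compare(row[column], threshold)
                for keep, row in zip(mask, rows)]
    return mask


rows = [
    {"Statistic": 10, "Zij": 3.0},
    {"Statistic": 25, "Zij": 1.0},
    {"Statistic": 40, "Zij": 2.5},
]
mask = mask_from_keywords(rows, Statistic_ge=23, Zij_gt=2)
```

Only the third row passes both comparisons, so only its mask entry is True.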

print_new_file(outfile=None)

Create a new file with mapped and filtered interactions.

Parameters

outfilestr, defaults to None

path to an output file. If None, file string is printed to console.

reset_mask()

Resets the mask to all True (removes previous filters)

resolve_conflicts(metric=None)

Uses an experimental method to resolve conflicts.

Resolves conflicting windows using the Maximal Weighted Independent Set. The weights are taken from the metric value. The graph is first broken into components to speed up the identification of the MWIS. Then the mask is updated to only include the MWIS. This method is computationally expensive for large or dense datasets.

Parameters

metricstr, defaults to None

The metric to use for weighting the graph. If None, self.metric is used.

Returns

masknumpy array

a boolean array of the same length as self.data
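The idea of a maximal weighted independent set solved per graph component can be sketched with a brute-force search. This is only an illustration under stated assumptions: the function names are hypothetical, the conflict edges are supplied directly rather than derived from overlapping windows, and the exhaustive search is practical only for small components. The method used by rnavigate may differ.

```python
from itertools import combinations


def connected_components(nodes, edges):
    """Group nodes into connected components of the conflict graph."""
    neighbors = {node: set() for node in nodes}
    for a, b in edges:
        neighbors[a].add(b)
        neighbors[b].add(a)
    seen, components = set(), []
    for node in nodes:
        if node in seen:
            continue
        stack, component = [node], []
        seen.add(node)
        while stack:
            current = stack.pop()
            component.append(current)
            for other in neighbors[current] - seen:
                seen.add(other)
                stack.append(other)
        components.append(sorted(component))
    return components, neighbors


def max_weight_independent_set(weights, edges):
    """Brute-force MWIS, solved one component at a time.

    weights maps node -> weight; edges lists pairs of conflicting nodes.
    """
    components, neighbors = connected_components(list(weights), edges)
    chosen = []
    for component in components:
        best, best_weight = (), float("-inf")
        for size in range(len(component) + 1):
            for subset in combinations(component, size):
                members = set(subset)
                if any(neighbors[node] & members for node in subset):
                    continue  # two conflicting nodes were both selected
                total = sum(weights[node] for node in subset)
                if total > best_weight:
                    best, best_weight = subset, total
        chosen.extend(best)
    return sorted(chosen)


weights = {0: 5.0, 1: 8.0, 2: 5.0, 3: 1.0}
edges = [(0, 1), (1, 2)]  # node 1 conflicts with nodes 0 and 2
kept = max_weight_independent_set(weights, edges)
mask = [node in kept for node in weights]
```

Nodes 0 and 2 together outweigh node 1, and node 3 has no conflicts, so nodes 0, 2, and 3 are kept.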

set_3d_distances(pdb, atom)

Wrapper for set_distances for backwards compatibility.

set_distances(structure, atom="O2'")

Sets the Distance column value based on nt distances in the given structure.

If structure is a SecondaryStructure, contact distances are calculated, and if structure is a PDB, 3D distances are calculated. These distances are averaged across the window and stored in a new “Distance” column in self.data.

Parameters

structurernavigate.data.SecondaryStructure or rnavigate.data.PDB

Structure object to use for calculating distances

atomstr

atom id to use for calculating distances in a PDB structure

update_mask(mask)

Updates the mask by ANDing the current mask with the given mask.
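Because each update is an element-wise AND, successive filters can only remove rows, never restore them. A minimal sketch with plain lists (the helper name and_masks is hypothetical):

```python
def and_masks(current, new):
    """Element-wise AND of two boolean masks of equal length."""
    if len(current) != len(new):
        raise ValueError("masks must have the same length")
    return [a and b for a, b in zip(current, new)]


mask = [True, True, True, True]                     # state after a reset
mask = and_masks(mask, [True, False, True, True])   # first filter drops row 2
mask = and_masks(mask, [True, True, False, True])   # second filter drops row 3
```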

class rnavigate.data.Motif(input_data, sequence, name=None, color='blue')

Bases: Annotation

Automatically annotates the occurrences of a sequence motif as spans.

Parameters

input_datastr

sequence motif to search for. Uses conventional nucleotide codes. e.g. “DRACH” = [AGTU] [AG] A C [ATUC]

sequencestr or pandas.DataFrame

Nucleotide sequence, path to fasta file, or dataframe containing a “Sequence” column.

namestr, defaults to None

Name of annotation.

colormatplotlib color-like, defaults to “blue”

Color to be used for displaying this annotation on plots.

Attributes

datapandas.DataFrame

Stores the list of regions that match the motif

namestr

The label for this annotation for use on plots

colorvalid matplotlib color

Color to represent annotation on plots

sequencestr

The reference sequence string

get_aligned_data(alignment)

Searches the new sequence for the motif and returns a new Motif annotation.

Parameters

alignmentrnavigate.data.Alignment

Alignment object used to align to a new sequence.

Returns

rnavigate.data.Motif

A new Motif with the same name, color, and motif but with the input data aligned to the target sequence.

get_spans_from_motif(sequence, motif)

Returns a list of spans for each location of motif found within sequence.

Parameters

sequencestring

sequence to be searched

motifstring

sequence motif to search for.

Returns

spanslist of lists

list of [start, end] positions of each motif occurrence
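A motif search of this kind can be sketched with a regular expression built from nucleotide codes. The function name motif_spans and the code table are assumptions made for illustration; in particular, treating T and U as interchangeable and reporting overlapping matches are choices of this sketch, not confirmed behavior of rnavigate.

```python
import re

# IUPAC nucleotide codes; T and U are treated as interchangeable here
IUPAC = {
    "A": "A", "C": "C", "G": "G", "U": "[TU]", "T": "[TU]",
    "R": "[AG]", "Y": "[CTU]", "S": "[CG]", "W": "[ATU]",
    "K": "[GTU]", "M": "[AC]", "B": "[CGTU]", "D": "[AGTU]",
    "H": "[ACTU]", "V": "[ACG]", "N": "[ACGTU]",
}


def motif_spans(sequence, motif):
    """Return 1-indexed, inclusive [start, end] spans of each motif match."""
    pattern = "".join(IUPAC[code] for code in motif.upper())
    # a lookahead reports overlapping occurrences as separate matches
    regex = re.compile(f"(?=({pattern}))")
    spans = []
    for match in regex.finditer(sequence.upper()):
        start = match.start() + 1  # convert 0-indexed to 1-indexed
        spans.append([start, start + len(motif) - 1])
    return spans


spans = motif_spans("CCGGACUAAACAGG", "DRACH")
```

In this example sequence, DRACH matches GGACU at positions 3 to 7 and AAACA at positions 8 to 12.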

class rnavigate.data.ORFs(input_data, name=None, sequence=None, color='blue')

Bases: Annotation

Automatically annotates occurrences of open reading frames as spans.

Parameters

input_data“longest” or “all”

which ORFs to annotate. “longest” annotates the longest ORF. “all” annotates all potential ORFs.

sequencestr or pandas.DataFrame

Nucleotide sequence, path to fasta file, or dataframe containing a “Sequence” column.

namestr, defaults to None

Name of annotation.

colormatplotlib color-like, defaults to “blue”

Color to be used for displaying this annotation on plots.

Attributes

datapandas.DataFrame

Stores the list of regions that correspond to ORFs

namestr

The label for this annotation for use on plots

colorvalid matplotlib color

Color to represent annotation on plots

sequencestr

The reference sequence string

get_aligned_data(alignment)

Searches the new sequence for ORFs and returns a new ORF annotation.

Parameters

alignmentrnavigate.data.Alignment

Alignment object used to align to a new sequence.

Returns

rnavigate.data.ORFs

A new ORFs annotation with the same name, color, and input_data but with the input data aligned to the target sequence.

get_spans_from_orf(sequence, which='all')

Given a sequence string, returns spans for specified ORFs

Parameters

sequencestring

RNA nucleotide sequence

which“longest” or “all”, defaults to “all”

“all” returns all spans, “longest” returns the longest span

Returns

list of tuples

(start, end) position of each ORF 1-indexed, inclusive
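One simple way to find such spans is to scan for AUG start codons and read codon by codon to the first in-frame stop codon. The sketch below follows that definition; the function name orf_spans is hypothetical, and whether rnavigate reports nested ORFs or requires a stop codon in the same way is an assumption.

```python
STOP_CODONS = {"UAA", "UAG", "UGA"}


def orf_spans(sequence, which="all"):
    """Hypothetical ORF finder returning 1-indexed, inclusive (start, end)."""
    rna = sequence.upper().replace("T", "U")
    spans = []
    for start in range(len(rna) - 2):
        if rna[start:start + 3] != "AUG":
            continue
        # walk codon by codon until the first in-frame stop codon
        for pos in range(start + 3, len(rna) - 2, 3):
            if rna[pos:pos + 3] in STOP_CODONS:
                spans.append((start + 1, pos + 3))
                break
    if which == "longest" and spans:
        return [max(spans, key=lambda span: span[1] - span[0])]
    return spans


all_orfs = orf_spans("AUGAAAAUGCCCUAAGG")
longest = orf_spans("AUGAAAAUGCCCUAAGG", which="longest")
```

Here both AUG codons share a reading frame and end at the same UAA, so “all” reports two spans and “longest” keeps only the outer one.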

class rnavigate.data.PAIRMaP(input_data, sequence=None, metric='Class', metric_defaults=None, read_table_kw=None, window=1, name=None)

Bases: RINGMaP

A class for storing and manipulating PAIRMaP data.

Parameters

input_datastring or pandas.DataFrame

If string, a path to a file containing PAIRMaP data. If dataframe, the dataframe containing PAIRMaP data. The dataframe must contain columns “i”, “j”, “Statistic”, and “Class”. Dataframe may also include other columns.

sequencestring or rnavigate.data.Sequence

The sequence string corresponding to the PAIRMaP data.

metricstring, defaults to “Class”

The column name to use for visualization.

metric_defaultsdict

Keys are metric names and values are dictionaries of metric-specific defaults. These defaults include:

“metric_column”string

the column name to use for visualization

“cmap”string or matplotlib.colors.Colormap

the colormap to use for visualization

“normalization”“min_max”, “0_1”, “none”, or “bins”

The type of normalization to use when mapping values to colors

“values”list of float

The values to use when normalizing the data

“title”string

the title to use for colorbars

“extend”“min”, “max”, “both”, or “neither”

Which ends to extend when drawing the colorbar.

“tick_labels” : list of string

read_table_kwdict, optional

kwargs passed to pandas.read_table() when reading input_data.

windowint, defaults to 1

The window size used to generate the PAIRMaP data. If an input file is provided, this value is overwritten by the value in the header.

namestr, optional

A name for the interactions object.

Attributes

datapandas.DataFrame

The PAIRMaP data.

data_specific_filter(all_pairs=False, **kwargs)

Used by Interactions.filter(). By default, pairs that are neither primary nor secondary are removed. Setting all_pairs=True keeps all pairs.

Parameters

all_pairsbool, defaults to False

whether to include all PAIRs.

Returns

kwargsdict

any additional keyword-argument pairs are returned

masknumpy array

a boolean array of the same length as self.data

get_sorted_data()

Same as parent function, unless metric is set to “Class”, in which case ij pairs are returned in a different order.

Returns

pandas.DataFrame

a copy of the data sorted by self.metric

read_file(filepath, read_table_kw=None)

Parses a pairmap.txt file and stores data as a dataframe

Sets self.window (usually 3), from header.

Parameters

filepathstr

path to pairmap.txt file

read_table_kwdict, defaults to None

This argument is ignored.

class rnavigate.data.PDB(input_data, chain, sequence=None, name=None)

Bases: Sequence

A class to represent RNA tertiary structures with atomic coordinates.

This data can be used to filter interactions by 3D distance, and to visualize profile and interactions data on interactive 3D structures.

Parameters

input_datastr

path to a PDB or CIF file

chainstr

chain identifier of RNA of interest

sequencernavigate.Sequence or str, optional

A sequence to use as the reference sequence. This is required if the sequence cannot be found in the header. Defaults to None.

namestr, optional

A name for the data set. Defaults to None.

Attributes

sequencestr

The RNA sequence

lengthint

The length of the RNA sequence

namestr

A name for the data set

pathstr

The path to the PDB or CIF file

chainstr

The chain identifier of the RNA of interest

offsetint

The offset between the sequence positions and the PDB residue indices

pdbBio.PDB.Structure.Structure

The PDB structure

pdb_idxnp.array

The PDB indices of the RNA

pdb_seqnp.array

The PDB sequence of the RNA

distance_matrixdict

A dictionary of distance matrices for each atom type

get_distance(i, j, atom="O2'")

Get the distance between given atom in nucleotides i and j (1-indexed).

Parameters

iint

The first nucleotide

jint

The second nucleotide

atomstring or dict, defaults to “O2’”

The atom to use for distance calculations. If a string, the same atom will be used for all residues. If a dict, the atom will be chosen based on the nucleotide type. If “DMS”, the N1 atom will be used for A and G, and the N3 atom will be used for U and C.

Returns

distancefloat

The distance between the atoms
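The atom selection rule and the distance itself can be sketched without a structure file. The helper names pick_atom and atom_distance are hypothetical and the coordinates are made up; only the “DMS” mapping (N1 for A and G, N3 for U and C) comes from the description above.

```python
import math

# atom choice described for atom="DMS": N1 for A and G, N3 for U and C
DMS_ATOMS = {"A": "N1", "G": "N1", "U": "N3", "C": "N3"}


def pick_atom(nucleotide, atom="O2'"):
    """Resolve the atom name to use for one nucleotide."""
    if atom == "DMS":
        return DMS_ATOMS[nucleotide]
    if isinstance(atom, dict):
        return atom[nucleotide]  # per-nucleotide choice supplied by the caller
    return atom  # the same atom for every residue


def atom_distance(xyz_i, xyz_j):
    """Euclidean distance between two (x, y, z) coordinates."""
    return math.dist(xyz_i, xyz_j)


distance = atom_distance((0.0, 0.0, 0.0), (3.0, 4.0, 12.0))
```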

get_distance_matrix(atom="O2'")

Get the pairwise atomic distance matrix for all residues.

Parameters

atomstring or dict, defaults to “O2’”

The atom to use for distance calculations. If a string, the same atom will be used for all residues. If a dict, the atom will be chosen based on the nucleotide type. If “DMS”, the N1 atom will be used for A and G, and the N3 atom will be used for U and C.

Returns

matrixNxN numpy.ndarray

A 2D array of pairwise distances. N is the length of the RNA.

get_pdb_idx(seq_idx)

Return the PDB index given the sequence index (0-indexed).

get_seq_idx(pdb_idx)

Return the sequence index given the PDB index.

get_sequence(pdb)

Find the sequence in the provided CIF or PDB file.

Parameters

pdbstr

path to a PDB or CIF file

Returns

sequencestring

The RNA sequence

get_sequence_from_seqres(seqres)

Used by get_sequence to parse the SEQRES entries.

Parameters

seqreslist

A list of SEQRES entries for the RNA chain of interest

Returns

sequencestring

The RNA sequence

get_xyz_coord(nt, atom)

Return the x, y, and z coordinates for a given residue and atom.

Parameters

ntint

The nucleotide of interest (1-indexed)

atomstring or dict, defaults to “O2’”

The atom to use for distance calculations. If a string, the same atom will be used for all residues. If a dict, the atom will be chosen based on the nucleotide type. If “DMS”, the N1 atom will be used for A and G, and the N3 atom will be used for U and C.

Returns

xyzlist

A list of x, y, and z coordinates

is_valid_idx(pdb_idx=None, seq_idx=None)

Determines if a PDB or sequence index is in the PDB structure.

Parameters

pdb_idxint, optional

A PDB index (1-indexed). Defaults to None.

seq_idxint, optional

A sequence index (1-indexed). Defaults to None.

Returns

bool

True if the index is in the PDB structure, False otherwise.

read_pdb(pdb)

Read a PDB or CIF file into the data structure.

Parameters

pdbstr

path to a PDB or CIF file

set_indices()

Uses self.data and self.sequence to set self.offset

class rnavigate.data.PairingProbability(input_data, extension=None, sequence=None, metric='Probability', metric_defaults=None, read_table_kw=None, window=1, name=None)

Bases: Interactions

A class for storing and manipulating pairing probability data.

Parameters

input_datastring or pandas.DataFrame

If string, a path to a file containing pairing probability data. If dataframe, the dataframe containing pairing probability data. The dataframe must contain columns “i”, “j”, “Probability”, and “log10p”. Dataframe may also include other columns.

extensionstring, defaults to None

The file extension of the input_data. If None, the extension is determined from the input_data string. Options are “.bps”, “.txt”, and “.dp”. If the extension is “.bps”, the sequence is parsed from the file. If the extension is “.txt” or “.dp”, the sequence must be provided via the sequence argument.

sequencestring or rnavigate.data.Sequence

The sequence string corresponding to the pairing probability data.

metricstring, defaults to “Probability”

The column name to use for visualization.

metric_defaultsdict

Keys are metric names and values are dictionaries of metric-specific defaults. These defaults include:

“metric_column”string

the column name to use for visualization

“cmap”string or matplotlib.colors.Colormap

the colormap to use for visualization

“normalization”“min_max”, “0_1”, “none”, or “bins”

The type of normalization to use when mapping values to colors

“values”list of float

The values to use when normalizing the data

“title”string

the title to use for colorbars

“extend”“min”, “max”, “both”, or “neither”

Which ends to extend when drawing the colorbar.

“tick_labels” : list of string

read_table_kwdict, optional

kwargs passed to pandas.read_table() when reading input_data.

windowint, defaults to 1

The window size used to generate the pairing probability data.

namestr, optional

A name for the PairingProbability object.

Attributes

datapandas.DataFrame

The pairing probability data.

data_specific_filter(**kwargs)

By default, interactions with probabilities less than 0.03 are removed.

Returns

kwargsdict

any additional keyword-argument pairs are returned

masknumpy array

a boolean array of the same length as self.data

get_entropy_profile(print_out=False, save_file=None)

Calculates per-nucleotide Shannon entropy from pairing probabilities.

Parameters

print_outbool, defaults to False

If True, entropy values are printed to console.

save_filestr, defaults to None

If not None, entropy values are saved to this file.

Returns

rnavigate.data.Profile

a Profile object containing the entropy data
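One common way to compute a per-nucleotide Shannon entropy from pairing probabilities is to sum -p * log10(p) over all pairs involving each nucleotide. The sketch below uses that definition as an assumption; the function name is hypothetical, and the base of the logarithm and the treatment of unpaired probability in rnavigate may differ.

```python
import math


def shannon_entropy_profile(pairs, length):
    """Hypothetical sketch: per-nucleotide sum of -p * log10(p).

    pairs is a list of (i, j, probability) with 1-indexed positions.
    Each pair contributes to both of its nucleotides.
    """
    entropy = [0.0] * length
    for i, j, probability in pairs:
        if probability <= 0:
            continue  # p * log(p) tends to 0 as p approaches 0
        term = -probability * math.log10(probability)
        entropy[i - 1] += term
        entropy[j - 1] += term
    return entropy


pairs = [(1, 5, 0.1), (1, 4, 1.0), (2, 4, 0.1)]
entropy = shannon_entropy_profile(pairs, length=5)
```

A pair with probability 1.0 contributes nothing, so well-determined pairs yield low entropy, while nucleotide 3, which appears in no pair, stays at zero.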

read_bps()

Parses a bps file and returns sequence as a string and data as a dataframe.

Returns

str

the sequence string

pandas.DataFrame

the pairing probability data

read_txt()

Parses a pairing probability file and returns data as a dataframe.

Parameters

filepathstr

path to pairing probability file

read_table_kwdict, defaults to None

This argument is ignored.

Returns

pandas.DataFrame

the pairing probability data

class rnavigate.data.Profile(input_data, metric='default', metric_defaults=None, read_table_kw=None, sequence=None, name=None)

Bases: Data

A class to represent per-nucleotide data.

Parameters

input_datastr or pandas.DataFrame

Path to a csv or tab file, or a pandas DataFrame. The table must have one row for each nucleotide in the sequence and must contain these columns:

A nucleotide position column labelled “Nucleotide”

A sequence column labelled “Sequence” with one of (A, C, G, U, T) per row

These two columns will be added to the table if sequence is provided.

A data measurement column labelled “Profile” containing floats or integers. The label may be another name if specified in metric_defaults.

Optionally, a measurement error column. Its label must be specified in metric_defaults.

Other columns may be present and can be set up using metric_defaults. See metric_defaults for more information.

read_table_kwdict, optional

Keyword arguments to pass to pandas.read_table. Defaults to None.

sequencernavigate.Sequence or str, optional

A sequence to use as the reference sequence. This is required if input_data does not contain a “Sequence” column. Defaults to None.

metricstr, defaults to “default”

The name of the set of value-to-color options to use. “default” specifies:

“Profile” column is used

No error rates are present

Values are normalized to the range [0, 1]

Values are mapped to colors using the “viridis” colormap

“Distance” specifies:

(3-D) “Distance” column is used

No error rates are present

Values in the range [5, 50] are normalized to the range [0, 1]

Values are mapped to colors using the “cool” colormap

Other options may be defined in metric_defaults.

metric_defaultsdict, optional

Keys are metric names, to be used with metric. Values are dictionaries of plotting parameters:

“metric_column”str

The name of the column to use as the metric. Plots and analyses that use per-nucleotide data will use this column. If “color_column” is not provided, this column also defines colors.

“error_column”str or None

The name of the column to use as the error. If None, no error bars are plotted.

“color_column”str or None

The name of the column to use for coloring. If None, colors are defined by “metric_column”.

“cmap”str or list

The name of the colormap to use. If a list, the list of colors to use.

“normalization”str

The type of normalization to use. In order to be used with colormaps, values are normalized to either be integers for categorical colormaps, or floats in the range [0, 1] for continuous colormaps.

“none”: no normalization is performed

“min_max”: values are scaled to floats in the range [0, 1] based on the upper and lower bounds defined in “values”

“0_1”: values are scaled to floats in the range [0, 1] based on the minimum and maximum values in the data

“bins”: values are scaled to an integer based on bins defined by the list of bounds defined in “values”

“percentiles”: values are scaled to floats in the range [0, 1] based on upper and lower percentile bounds defined by “values”

“values”list or None

The values to use when normalizing the data.

If “normalization” is “min_max”, this should be a list of two values defining the upper and lower bounds.

If “normalization” is “bins”, this should be a list of values of length 1 less than the length of cmap. Example: [5, 10, 20] defines 4 bins: (-infinity, 5), [5, 10), [10, 20), [20, infinity)

If “normalization” is “percentiles”, this should be a list of two values defining the upper and lower percentile bounds.

If “normalization” is “0_1” or “none”, this should be None.

“title”str, defaults to “”

The title of the colorbar.

“ticks”list, defaults to None

The tick locations to use for the colorbar. If None, values are determined automatically.

“tick_labels”list, defaults to None

The labels to use for the colorbar ticks. If None, values are determined automatically from “ticks”.

“extend”“neither”, “both”, “min”, or “max”, defaults to “neither”

Which ends of the colorbar to extend (places an arrow head).

Defaults to None.

namestr, optional

A name for the data set. Defaults to None.

Attributes

datapandas.DataFrame

The data table
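The continuous “normalization” options described for metric_defaults can be illustrated with a small sketch. The function name scale_to_unit is hypothetical and not part of rnavigate. It assumes the upper bound is larger than the lower bound, and it leaves out-of-range values unclipped, since the clipping behavior is not stated here.

```python
def scale_to_unit(data, bounds=None):
    """Scale numbers linearly so the lower bound maps to 0 and the upper to 1.

    bounds=None mimics "0_1" (use the minimum and maximum of the data).
    bounds=[low, high] mimics "min_max" (use the bounds given in "values").
    """
    low, high = (min(data), max(data)) if bounds is None else bounds
    span = high - low  # assumed to be non-zero in this sketch
    return [(x - low) / span for x in data]


print(scale_to_unit([2.0, 4.0, 6.0]))              # scaled by the data range
print(scale_to_unit([2.0, 4.0, 6.0], [0.0, 8.0]))  # scaled by explicit bounds
```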

calculate_gini_index(values)

Calculate the Gini index of an array of values.
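For orientation, a minimal sketch of the textbook Gini index follows. It is not taken from the rnavigate source: the function name gini_index is hypothetical, and the library's handling of nan or negative values may differ.

```python
def gini_index(values):
    """Gini index of non-negative numbers: 0 means perfectly even values."""
    ordered = sorted(values)
    n = len(ordered)
    total = sum(ordered)
    # Rank-weighted sum over the ascending-sorted values (ranks start at 1).
    weighted = sum(rank * x for rank, x in enumerate(ordered, start=1))
    return 2 * weighted / (n * total) - (n + 1) / n


print(gini_index([1, 1, 1, 1]))  # evenly distributed values
print(gini_index([0, 0, 0, 1]))  # all signal in one position
```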

calculate_windows(column, window, method='median', new_name=None, minimum_points=None, mask_na=True)

calculates a windowed operation over a column of data.

Result is stored in a new column. Value of each window is assigned to the center position of the window.

Parameters

columnstr

name of column to perform operation on

windowint

window size, must be an odd number

methodstring or function, defaults to “median”

operation to perform over windows. If a string, must be “median”, “mean”, “minimum”, or “maximum”. If a function, must take a 1D numpy array as input and return a scalar.

new_namestr, defaults to f”{method}_{window}_nt”

name of new column for stored result.

minimum_pointsint, defaults to value of window

minimum number of points within each window.

mask_nabool, defaults to True

whether to mask the result of the operation where the original column has a nan value.
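The windowing described above can be sketched in plain Python. This is an illustration under assumptions, not the library's code: the function name windowed is hypothetical, windows are truncated at the ends of the data, and positions with fewer than minimum_points usable values are left as nan. The mask_na option is not reproduced.

```python
import math
from statistics import median


def windowed(values, window, operation=median, minimum_points=None):
    """Apply `operation` over centered windows; assign results to window centers."""
    if window % 2 == 0:
        raise ValueError("window must be an odd number")
    if minimum_points is None:
        minimum_points = window
    half = window // 2
    result = []
    for center in range(len(values)):
        start = max(0, center - half)
        chunk = [v for v in values[start:center + half + 1] if not math.isnan(v)]
        # Too few usable points (e.g. near the ends): leave the position undefined.
        result.append(operation(chunk) if len(chunk) >= minimum_points else math.nan)
    return result


print(windowed([1.0, 2.0, 3.0, 4.0, 5.0], window=3))
print(windowed([1.0, 2.0, 3.0, 4.0, 5.0], window=3, minimum_points=2))
```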

copy()

Returns a copy of the Profile.

classmethod from_array(input_data, sequence, **kwargs)

Construct a Profile object from an array of values.

Parameters

input_datalist or np.array

A list or array of values to use as the metric.

sequencestr

The RNA sequence.

**kwargs

Additional keyword arguments to pass to the Profile constructor.

Returns

Profile

A Profile object with the provided values.

get_aligned_data(alignment)

Returns a new Profile object with the data aligned to a sequence.

Parameters

alignmentrnavigate.data.SequenceAlignment

The alignment to use to map rows of self.data to a new sequence.

Returns

Profile

A new Profile object with the data aligned to the sequence in the alignment.

get_plotting_dataframe()

Returns a dataframe with the data to be plotted.

Returns

pandas.DataFrame

A dataframe with the columns “Nucleotide”, “Values”, “Errors”, and “Colors”.

norm_boxplot(values)

removes outliers (> 1.5 * IQR) and scales the mean to 1.

NOTE: This method varies slightly from the normalization method used in the ShapeMapper pipeline. ShapeMapper sets undefined values to 0, and then uses these values when computing the IQR and 90th percentile. Including these values can skew the result. This method excludes such nan values. Other elements are the same.

Parameters

values1D numpy array

values to normalize

Returns

(float, float)

scaling factor and error propagation factor

norm_eDMS(values)

Calculates norm factors following eDMS pernt scheme in ShapeMapper 2.2

Parameters

values1D numpy array

values to normalize

Returns

(float, float)

scaling factor and error propagation factor

norm_percentiles(values, lower_bound=90, upper_bound=99, median_or_mean='mean')

Calculates factors to scale the median between percentile bounds to 1.

Parameters

values1D numpy array

values to normalize

lower_boundint or float, optional

percentile of lower bound, Defaults to 90

upper_boundint or float, optional

percentile of upper bound, Defaults to 99

median_or_meanstring, optional

whether to use the median or mean of the values between the bounds.

Returns

(float, float)

scaling factor and error propagation factor
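A rough sketch of the percentile-bounded scaling follows, using the default bounds and the mean. Several points are assumptions: the names percentile and percentile_scaling_factor are hypothetical, the factor is treated as a multiplier, percentiles use linear interpolation, and the error propagation factor is omitted because its definition is not given here.

```python
def percentile(ordered, q):
    """q-th percentile (0-100) of an ascending list, by linear interpolation."""
    position = (len(ordered) - 1) * q / 100
    lower = int(position)
    upper = min(lower + 1, len(ordered) - 1)
    return ordered[lower] + (ordered[upper] - ordered[lower]) * (position - lower)


def percentile_scaling_factor(values, lower_bound=90, upper_bound=99):
    """Multiplier that scales the mean of the values between the bounds to 1."""
    ordered = sorted(values)
    low = percentile(ordered, lower_bound)
    high = percentile(ordered, upper_bound)
    bounded = [v for v in ordered if low <= v <= high]
    return 1 / (sum(bounded) / len(bounded))


factor = percentile_scaling_factor(list(range(1, 101)))
print(factor)
```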

normalize(profile_column=None, new_profile=None, error_column=None, new_error=None, norm_method='boxplot', nt_groups=None, profile_factors=None, **norm_kwargs)

Normalize values in a column, and store in a new column.

By default, performs ShapeMapper2 boxplot normalization on self.metric and stores the result as “Norm_profile”.

Parameters

profile_columnstring, defaults to self.metric

column name of values to normalize

new_profilestring, defaults to “Norm_profile”

column name of new normalized values

error_columnstring, defaults to self.error_column

column name of error values to propagate

new_errorstring, defaults to “Norm_error”

column name of new propagated error values

norm_methodstring, defaults to “boxplot”

normalization method to use.

“DMS” uses self.norm_percentiles and nt_groups=[‘AC’, ‘UG’]: scales the median of 90th to 95th percentiles to 1. As and Cs are normalized separately from Us and Gs.

“eDMS” uses self.norm_eDMS and nt_groups=[‘A’, ‘U’, ‘C’, ‘G’]: applies the new eDMS-MaP normalization. Each nucleotide is normalized separately.

“boxplot” uses self.norm_boxplot and nt_groups=[‘AUCG’]: removes outliers (> 1.5 iqr) and scales median to 1. Scales nucleotides together unless specified with nt_groups.

“percentile” uses self.norm_percentiles and nt_groups=[‘AUCG’]: scales the median of 90th to 95th percentiles to 1. Scales nucleotides together unless specified with nt_groups.

Defaults to “boxplot”: the default normalization of ShapeMapper.

nt_groupslist of strings, defaults to None

A list of nucleotides to group, e.g.:

[‘AUCG’] groups all nts together

[‘AC’, ‘UG’] groups As with Cs and Us with Gs

[‘A’, ‘C’, ‘U’, ‘G’] scales each nt separately

Default depends on norm_method.

profile_factorsdictionary, defaults to None

a scaling factor (float) for each nucleotide. Keys must be ‘A’, ‘C’, ‘U’, ‘G’.

Note: using this argument overrides any calculation of scaling factors. Defaults to None.

**norm_kwargs

these are passed to the norm_method function

Returns

profile_factorsdict

the new profile scaling factors dictionary

normalize_external(profiles, **kwargs)

Normalize reactivities using scaling factors computed from other profiles.

Parameters

profileslist of rnavigate.data.Profile

a list of other profiles used to compute scaling factors

Returns

profile_factorsdict

the new profile scaling factors dictionary

normalize_sequence(t_or_u='U', uppercase=True)

Changes the values in self.data[“Sequence”] to the normalized sequence.

Parameters

t_or_u“T” or “U”, Defaults to “U”.

Whether to replace T with U or U with T.

uppercasebool, Defaults to True.

Whether to convert the sequence to uppercase.
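The effect of the two options can be shown on a plain string. This sketch only mirrors the documented behavior; the function name normalized is hypothetical, and the real method operates on self.data[“Sequence”].

```python
def normalized(sequence, t_or_u="U", uppercase=True):
    """Return a sequence with consistent case and a single T/U alphabet."""
    if uppercase:
        sequence = sequence.upper()
    if t_or_u == "U":
        return sequence.replace("T", "U").replace("t", "u")
    if t_or_u == "T":
        return sequence.replace("U", "T").replace("u", "t")
    raise ValueError('t_or_u must be "T" or "U"')


print(normalized("augcTt"))              # RNA alphabet, uppercase
print(normalized("augcTt", t_or_u="T"))  # DNA alphabet, uppercase
```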

property recreation_kwargs

A dictionary of keyword arguments to pass when recreating the object.

winsorize(column, lower_bound=None, upper_bound=None)

Winsorize the data between bounds.

If either bound is set to None, one-sided Winsorization is performed.

Parameters

columnstring

the column of data to be winsorized

lower_boundNumber or None, defaults to None

Data below this value is set to this value. If None, no lower bound is applied.

upper_boundNumber or None, defaults to None

Data above this value is set to this value. If None, no upper bound is applied.
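Winsorization as described above amounts to clipping values to the bounds. A minimal sketch on a plain list, with a hypothetical function name winsorized:

```python
def winsorized(values, lower_bound=None, upper_bound=None):
    """Clip values to [lower_bound, upper_bound]; a None bound is not applied."""
    clipped = []
    for v in values:
        if lower_bound is not None and v < lower_bound:
            v = lower_bound
        if upper_bound is not None and v > upper_bound:
            v = upper_bound
        clipped.append(v)
    return clipped


print(winsorized([1, 5, 10], lower_bound=2, upper_bound=8))  # two-sided
print(winsorized([1, 5, 10], upper_bound=8))                 # one-sided
```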

class rnavigate.data.RINGMaP(input_data, sequence=None, metric='Statistic', metric_defaults=None, read_table_kw=None, window=1, name=None)

Bases: Interactions

A class for storing and manipulating RINGMaP data.

Parameters

input_datastring or pandas.DataFrame

If string, a path to a file containing RINGMaP data. If dataframe, the dataframe containing RINGMaP data. The dataframe must contain columns “i”, “j”, “Statistic”, and “Zij”. Dataframe may also include other columns.

sequencestring or rnavigate.data.Sequence

The sequence string corresponding to the RINGMaP data.

metricstring, defaults to “Statistic”

The column name to use for visualization.

metric_defaultsdict

Keys are metric names and values are dictionaries of metric-specific defaults. These defaults include:

“metric_column”string

the column name to use for visualization

“cmap”string or matplotlib.colors.Colormap

the colormap to use for visualization

“normalization”“min_max”, “0_1”, “none”, or “bins”

The type of normalization to use when mapping values to colors

“values”list of float

The values to use with normalization of the data

“title”string

the title to use for colorbars

“extend”“min”, “max”, “both”, or “neither”

Which ends to extend when drawing the colorbar.

“tick_labels” : list of string

read_table_kwdict, optional

kwargs passed to pandas.read_table() when reading input_data.

windowint, defaults to 1

The window size used to generate the RINGMaP data. If an input file is provided, this value is overwritten by the value in the header.

namestr, optional

A name for the interactions object.

Attributes

datapandas.DataFrame

The RINGMaP data.

data_specific_filter(positive_only=False, negative_only=False, **kwargs)

Adds filters for “Sign” column to parent filter() function

Parameters

positive_onlybool, defaults to False

If True, only keep positive correlations.

negative_onlybool, defaults to False

If True, only keep negative correlations.

Returns

kwargsdict

any additional keyword-argument pairs are returned

masknumpy array

a boolean array of the same length as self.data

get_sorted_data()

Sorts on the product of self.metric and “Sign” columns.

Except when self.metric is “Distance”.

Returns

pandas.DataFrame

a copy of the data sorted by (self.metric * “Sign”) columns
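Sorting on the signed metric can be pictured with a few made-up rows. The row values below are invented for illustration, the function name sort_by_signed_metric is hypothetical, and ascending order is an assumption.

```python
# Invented example rows; real data would come from a RINGMaP correlations file.
rows = [
    {"i": 10, "j": 40, "Statistic": 25.0, "Sign": 1},
    {"i": 12, "j": 55, "Statistic": 90.0, "Sign": -1},
    {"i": 20, "j": 30, "Statistic": 60.0, "Sign": 1},
]


def sort_by_signed_metric(rows, metric="Statistic"):
    """Order rows by metric * Sign, so strong negative correlations come first."""
    return sorted(rows, key=lambda row: row[metric] * row["Sign"])


for row in sort_by_signed_metric(rows):
    print(row["i"], row["j"], row["Statistic"] * row["Sign"])
```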

read_file(filepath, read_table_kw=None)

Parses a RINGMaP correlations file and stores data as a dataframe.

Also sets self.window (usually 1, from header).

Parameters

filepathstr

path to correlations file.

read_table_kwdict, defaults to {}

kwargs passed to pandas.read_table().

Returns

pandas.DataFrame

the RINGMaP data

class rnavigate.data.RNPMaP(input_data, read_table_kw=None, sequence=None, metric='NormedP', metric_defaults=None, name=None)

Bases: Profile

Represents per-nucleotide RNPMaP data.

Parameters

input_datastr or pandas.DataFrame

path to an RNAModMapper reactivities.txt file or a pandas DataFrame

read_table_kwdict, optional

Keyword arguments to pass to pandas.read_table. These are not necessary for reactivities.txt files. Defaults to None.

sequencernavigate.Sequence or str, optional

A sequence to use as the reference sequence. This is not necessary for reactivities.txt files. Defaults to None.

metricstr, defaults to “NormedP”

The name of the set of value-to-color options to use.

metric_defaultsdict, optional

Keys are metric names, to be used with metric. Values are dictionaries of plotting parameters. Defaults to None.

namestr, optional

A name for the data set. Defaults to None.

class rnavigate.data.SHAPEJuMP(input_data, sequence=None, metric='Percentile', metric_defaults=None, read_table_kw=None, window=1, name=None)

Bases: Interactions

A class for storing and manipulating SHAPEJuMP data.

Parameters

input_datastring or pandas.DataFrame

If string, a path to a file containing SHAPEJuMP data. If dataframe, the dataframe containing SHAPEJuMP data. The dataframe must contain columns “i”, “j”, “Metric” (JuMP rate) and “Percentile” (percentile ranking). Dataframe may also include other columns.

sequencestring or rnavigate.data.Sequence

The sequence string corresponding to the SHAPEJuMP data.

metricstring, defaults to “Percentile”

The column name to use for visualization.

metric_defaultsdict

Keys are metric names and values are dictionaries of metric-specific defaults. These defaults include:

“metric_column”string

the column name to use for visualization

“cmap”string or matplotlib.colors.Colormap

the colormap to use for visualization

“normalization”“min_max”, “0_1”, “none”, or “bins”

The type of normalization to use when mapping values to colors

“values”list of float

The values to use with normalization of the data

“title”string

the title to use for colorbars

“extend”“min”, “max”, “both”, or “neither”

Which ends to extend when drawing the colorbar.

“tick_labels” : list of string

read_table_kwdict

kwargs passed to pandas.read_table() when reading input_data.

windowint

The window size used to generate the SHAPEJuMP data.

namestr

A name for the interactions object.

Attributes

datapandas.DataFrame

The SHAPEJuMP data.

read_file(input_data, read_table_kw=None)

Parses a deletions.txt file and stores it as a dataframe.

Also calculates a “Percentile” column.

Parameters

input_datastr

path to deletions.txt file

read_table_kwdict, defaults to {}

kwargs passed to pandas.read_table().

Returns

pandas.DataFrame

the SHAPEJuMP data

class rnavigate.data.SHAPEMaP(input_data, normalize=None, read_table_kw=None, sequence=None, metric='Norm_profile', metric_defaults=None, log=None, name=None)

Bases: Profile

A class to represent per-nucleotide SHAPE-MaP data.

Parameters

input_datastr or pandas.DataFrame

path to a ShapeMapper2 profile.txt or .map file or a pandas DataFrame

normalize“DMS”, “eDMS”, “boxplot”, “percentiles”, or None, defaults to None

The normalization method to use.

“DMS” uses self.norm_percentiles and nt_groups=[‘AC’, ‘UG’]: scales the median of 90th to 95th percentiles to 1. As and Cs are normalized separately from Us and Gs.

“eDMS” uses self.norm_eDMS and nt_groups=[‘A’, ‘U’, ‘C’, ‘G’]: applies the new eDMS-MaP normalization. Each nucleotide is normalized separately.

“boxplot” uses self.norm_boxplot and nt_groups=[‘AUCG’]: removes outliers (> 1.5 iqr) and scales median to 1. Scales nucleotides together unless specified with nt_groups.

“percentiles” uses self.norm_percentiles and nt_groups=[‘AUCG’]: scales the median of 90th to 95th percentiles to 1. Scales nucleotides together unless specified with nt_groups.

Defaults to None: no normalization is performed.

read_table_kwdict, optional

Keyword arguments to pass to pandas.read_table. These are not necessary for profile.txt and .map files. Defaults to None.

sequencernavigate.Sequence or str, optional

A sequence to use as the reference sequence. This is not necessary for profile.txt and .map files. Defaults to None.

metricstr, defaults to “Norm_profile”

The name of the set of value-to-color options to use. “Norm_profile” specifies:

“Norm_profile” column is used

“Norm_stderr” column is used for error bars

Values are normalized to bins: (-inf, -0.4), [-0.4, 0.4), [0.4, 0.85), [0.85, 2), [2, inf)

Bins are mapped to “grey”, “black”, “orange”, “red”, “red”

Other options may be defined in metric_defaults.

metric_defaultsdict, optional

Keys are metric names, to be used with metric. Values are dictionaries of plotting parameters:

“metric_column”str

The name of the column to use as the metric. Plots and analyses that use per-nucleotide data will use this column. If “color_column” is not provided, this column also defines colors.

“error_column”str or None

The name of the column to use as the error. If None, no error bars are plotted.

“color_column”str or None

The name of the column to use for coloring. If None, colors are defined by “metric_column”.

“cmap”str or list

The name of the colormap to use. If a list, the list of colors to use.

“normalization”str

The type of normalization to use. In order to be used with colormaps, values are normalized to either be integers for categorical colormaps, or floats in the range [0, 1] for continuous colormaps.

“none”: no normalization is performed.

“min_max”: values are scaled to floats in the range [0, 1] based on the upper and lower bounds defined in “values”.

“0_1”: values are scaled to floats in the range [0, 1] based on the minimum and maximum values in the data.

“bins”: values are scaled to an integer based on bins defined by the list of bounds defined in “values”.

“percentiles”: values are scaled to floats in the range [0, 1] based on upper and lower percentile bounds defined by “values”.

“values”list or None

The values to use when normalizing the data.

If “normalization” is “min_max”, this should be a list of two values defining the upper and lower bounds.

If “normalization” is “bins”, this should be a list of values of length 1 less than the length of cmap. Example: [5, 10, 20] defines 4 bins: (-infinity, 5), [5, 10), [10, 20), [20, infinity).

If “normalization” is “percentiles”, this should be a list of two values defining the upper and lower percentile bounds.

If “normalization” is “0_1” or “none”, this should be None.

“title”str, defaults to “”

The title of the colorbar.

“ticks”list, defaults to None

The tick locations to use for the colorbar. If None, values are determined automatically.

“tick_labels”list, defaults to None

The labels to use for the colorbar ticks. If None, values are determined automatically from “ticks”.

“extend”“neither”, “both”, “min”, or “max”, defaults to “neither”

Which ends of the colorbar to extend (places an arrow head).

Defaults to None.

logstr, optional

Path to a ShapeMapper v2 shapemap_log.txt file with mutations-per-molecule and read-length histograms. These will be present if the --per-read-histogram flag was used when running ShapeMapper v2. Currently, this is not working with ShapeMapper v2.2 files. Defaults to None.

namestr, optional

A name for the data set. Defaults to None.

Attributes

datapandas.DataFrame

The data table
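The “Norm_profile” bins and colors listed under the metric parameter can be reproduced with a sorted-bounds lookup. This is a sketch of the “bins” idea only, not the library's implementation: the names BOUNDS, COLORS and reactivity_color are hypothetical, and missing (nan) values are not handled.

```python
from bisect import bisect_right

BOUNDS = [-0.4, 0.4, 0.85, 2]                        # four bounds define five bins
COLORS = ["grey", "black", "orange", "red", "red"]   # one color per bin


def reactivity_color(value):
    """Look up the color of the half-open bin [lower, upper) containing value."""
    # bisect_right counts the bounds that are <= value, which is the bin index.
    return COLORS[bisect_right(BOUNDS, value)]


for reactivity in (-1.0, 0.1, 0.5, 1.2, 3.0):
    print(reactivity, reactivity_color(reactivity))
```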

classmethod from_rnaframework(input_data, normalize=None)

Construct a SHAPEMaP object from an RNAFramework output file.

Parameters

input_datastr

path to an RNAFramework .xml reactivities file

normalize“DMS”, “eDMS”, “boxplot”, “percentiles”, or None, defaults to None

The normalization method to use.

“DMS” uses self.norm_percentiles and nt_groups=[‘AC’, ‘UG’]: scales the median of 90th to 95th percentiles to 1. As and Cs are normalized separately from Us and Gs.

“eDMS” uses self.norm_eDMS and nt_groups=[‘A’, ‘U’, ‘C’, ‘G’]: applies the new eDMS-MaP normalization. Each nucleotide is normalized separately.

“boxplot” uses self.norm_boxplot and nt_groups=[‘AUCG’]: removes outliers (> 1.5 iqr) and scales median to 1. Scales nucleotides together unless specified with nt_groups.

“percentiles” uses self.norm_percentiles and nt_groups=[‘AUCG’]: scales the median of 90th to 95th percentiles to 1. Scales nucleotides together unless specified with nt_groups.

Defaults to None: no normalization is performed.

Returns

SHAPEMaP

A SHAPEMaP object with the provided values.

read_log(log)

Read the ShapeMapper log file.

Parameters

logstr

Path to a ShapeMapper v2 shapemap_log.txt file with mutations-per-molecule and read-length histograms.

Returns

read_lengthspandas.DataFrame

A dataframe with the columns “Read_length”, “Modified_read_length”, and “Untreated_read_length”.

mutations_per_moleculepandas.DataFrame

A dataframe with the columns “Mutation_count”, “Modified_mutations_per_molecule”, and “Untreated_mutations_per_molecule”.

write_bpp2seq_file(output_file)

Write the data to a ShapeMapper2 .bpp2seq file (for Contra/EternaFold).

Parameters

output_filestr

The path to write the output file.

write_shape_file(output_file)

Write the data to a ShapeMapper2 .shape file (for RNAstructure programs).

Parameters

output_filestr

The path to write the output file.
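For readers unfamiliar with the format, RNAstructure-style .shape files hold one line per nucleotide with a 1-indexed position and a reactivity, and use -999 to mark missing data. The sketch below writes that layout to any file-like object. The function name write_shape, the tab separator, and the number formatting are assumptions, and which column rnavigate writes is not shown here.

```python
import io
import math


def write_shape(reactivities, handle):
    """Write 'position<TAB>reactivity' lines, using -999 for missing values."""
    for position, value in enumerate(reactivities, start=1):
        if value is None or math.isnan(value):
            value = -999
        handle.write(f"{position}\t{value}\n")


buffer = io.StringIO()
write_shape([0.5, float("nan"), 1.25], buffer)
print(buffer.getvalue())
```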

class rnavigate.data.ScalarMappable(cmap, normalization, values, title='', tick_labels=None, **cbar_args)

Bases: _ScalarMappable

Used to map scalar values to a color and to create a colorbar plot.

Parameters

cmapstr, tuple, float, or list

A valid mpl color, list of valid colors or a valid colormap name

normalization“min_max”, “0_1”, “none”, or “bins”

The type of normalization to use when mapping values to colors

valueslist

The values to use when normalizing the data

titlestr, defaults to “”

The title of the colorbar.

tick_labelslist, defaults to None

The labels to use for the colorbar ticks. If None, values are determined automatically.

**cbar_argsdict

Additional arguments to pass to the colorbar function

Attributes

rnav_normstr

The type of normalization to use when mapping values to colors

rnav_valslist

The values to use when normalizing the data

rnav_cmaplist

The colors to use when mapping values to colors

cbar_argsdict

Additional arguments to pass to the colorbar function

tick_labelslist

The labels to use for the colorbar ticks. If None, values are determined automatically.

titlestr

The title of the colorbar.

get_cmap(cmap)

Converts a cmap specification to a matplotlib colormap object.

Parameters

cmapstring, tuple, float, or list

A valid mpl color, list of valid colors or a valid colormap name

Returns

matplotlib colormap

a colormap matching the input

get_norm(normalization, values, cmap)

Given a normalization type and values, return a normalization object.

Parameters

normalization“min_max”, “0_1”, “none”, or “bins”

The type of normalization to use when mapping values to colors

valueslist

The values to use when normalizing the data

cmapmatplotlib colormap

The colormap to use when normalizing the data

Returns

matplotlib.colors normalization object

Used to normalize data before mapping to colors

is_equivalent_to(cmap2)

Check if two ScalarMappable objects are equivalent.

Parameters

cmap2ScalarMappable

The ScalarMappable object to compare to

Returns

bool

True if the two ScalarMappable objects are equivalent, False otherwise

values_to_hexcolors(values, alpha=1.0)

Map values to colors and return a list of hex colors.

Parameters

valueslist

The values to map to colors

alphafloat, defaults to 1.0

The alpha value to use for the colors

Returns

list of strings

A list of hex colors
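The value-to-hex step can be sketched without matplotlib. Everything here is illustrative: to_hex and values_to_hex are hypothetical names, the two-color blend stands in for a real colormap, and the way alpha is encoded by the library is an assumption.

```python
def to_hex(rgb, alpha=1.0):
    """Format RGB floats in [0, 1] as '#rrggbb', or '#rrggbbaa' if alpha < 1."""
    channels = list(rgb) + ([alpha] if alpha < 1.0 else [])
    return "#" + "".join(f"{round(c * 255):02x}" for c in channels)


def values_to_hex(values, start=(1.0, 1.0, 1.0), end=(1.0, 0.0, 0.0)):
    """Blend linearly from start to end for values already scaled to [0, 1]."""
    colors = []
    for v in values:
        rgb = [s + (e - s) * v for s, e in zip(start, end)]
        colors.append(to_hex(rgb))
    return colors


print(values_to_hex([0.0, 1.0]))  # white to red
```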

class rnavigate.data.SecondaryStructure(input_data, extension=None, autoscale=True, name=None, **kwargs)

Bases: Sequence

Base class for secondary structures.

Parameters

input_datastr or pandas.DataFrame

A dataframe or filepath containing a secondary structure.

DataFrame should contain these columns: [“Nucleotide”, “Sequence”, “Pair”]. “Pair” column must be redundant, i.e. each base pair is recorded at both of its positions.

Filepath parsing is determined by file extension: varna, xrna, nsd, cte, ct, dbn, bracket, json (R2DT), forna

extensionstr, optional

The file extension of the input_data file. If not provided, the extension will be inferred from the input_data filepath.

autoscalebool, optional

Whether to automatically scale the x and y coordinates. Defaults to True.

namestr, optional

The name of the RNA sequence. Defaults to None.

Attributes

datapandas.DataFrame

DataFrame storing base-pairs

filepathstr

The path to the input file, if provided, otherwise “dataframe”

sequencestr

The RNA sequence

ntsnumpy.array

The “Nucleotide” column of data

pair_ntsnumpy.array

The “Pair” column of data

headerstr

Header information from CT file

xcoordinatesnumpy.array

The “X_coordinate” column of data

ycoordinatesnumpy.array

The “Y_coordinate” column of data

distance_matrixnumpy.array

The contact distance matrix of the RNA structure

add_pairs(pairs, break_conflicting_pairs=False)

Add base pairs to current secondary structure.

Parameters

pairslist

1-indexed list of paired residues. e.g. [(1, 20), (2, 19)]

break_conflicting_pairsbool, defaults to False

Whether to break existing pairs if there is a conflict
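The bookkeeping behind adding pairs can be sketched on a redundant pairing list, where entry i-1 holds the 1-indexed partner of nucleotide i, or 0 if unpaired. The function below is hypothetical and not the library's code. In particular, raising an error on a conflict when break_conflicting_pairs is False is an assumption of this sketch.

```python
def add_pairs(ct, pairs, break_conflicting_pairs=False):
    """Return a copy of ct with the given 1-indexed pairs added."""
    ct = list(ct)
    for i, j in pairs:
        for nt in (i, j):
            partner = ct[nt - 1]
            if partner not in (0, i, j):
                if not break_conflicting_pairs:
                    raise ValueError(f"nucleotide {nt} is already paired with {partner}")
                ct[partner - 1] = 0  # free the old partner
        # Record the pair at both positions (redundant representation).
        ct[i - 1], ct[j - 1] = j, i
    return ct


print(add_pairs([0] * 6, [(1, 6), (2, 5)]))
```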

as_interactions(structure2=None)

Returns an rnavigate.Interactions representation of this structure, optionally combined with other structures.

Parameters

structure2SecondaryStructure or list of these, defaults to None

If provided, basepairs from all structures are included and labeled by which structures contain them and how many structures contain them.

property boolean

Return a boolean array of paired and unpaired nucleotides.

break_noncanonical_pairs()

Removes non-canonical basepairs from the secondary structure.

WARNING: this deletes information.

break_pairs_nts(nt_positions)

break base pairs at the given list of positions.

WARNING: this deletes information.

Parameters

nt_positionslist of int

1-indexed positions to break pairs

break_pairs_region(start, end, break_crossing=True, inverse=False)

Removes pairs from the specified region (1-indexed, inclusive).

WARNING: this deletes information

Parameters

startint

start position (1-indexed, inclusive)

endint

end position (1-indexed, inclusive)

break_crossingbool, defaults to True

Whether to also break pairs that cross over the specified region

inversebool, defaults to False

Invert the behavior, i.e. remove pairs that are not in this region

break_singleton_pairs()

Removes singleton basepairs from the secondary structure.

WARNING: This deletes information.

compute_ppv_sens(structure2, exact=True)

Compute the PPV and sensitivity between this and another structure.

True and False are determined from this structure. Positive and Negative are determined from structure2.

PPV = TP / (TP + FP)

Sensitivity = TP / (TP + FN)

Parameters

structure2SecondaryStructure

The SecondaryStructure to compare to.

exactbool, defaults to True

True requires BPs to be exactly correct. False allows +/-1 bp slippage.

Returns

float

sensitivity

float

PPV

3-tuple of floats

(TP, TP+FP, TP+FN)
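With exact matching, the two formulas reduce to set operations on base-pair lists. The sketch below treats the first list as the reference (this structure) and the second as the prediction (structure2). The function name is hypothetical, the +/-1 slippage option is not covered, and empty lists would cause a division by zero.

```python
def ppv_sensitivity(reference_pairs, predicted_pairs):
    """Return (sensitivity, PPV, (TP, TP+FP, TP+FN)) for exact pair matches."""
    reference, predicted = set(reference_pairs), set(predicted_pairs)
    true_positives = len(reference & predicted)
    ppv = true_positives / len(predicted)          # TP / (TP + FP)
    sensitivity = true_positives / len(reference)  # TP / (TP + FN)
    return sensitivity, ppv, (true_positives, len(predicted), len(reference))


reference = [(1, 20), (2, 19), (3, 18), (4, 17)]
predicted = [(1, 20), (2, 19), (5, 16)]
print(ppv_sensitivity(reference, predicted))
```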

contact_distance(i, j)

Returns the contact distance between positions i and j

copy()

fill_mismatches(mismatch=1)

Adds base pairs to fill 1,1 and optionally 2,2 mismatches.

Parameters

mismatchint, defaults to 1

1 will fill only 1,1 mismatches. 2 will fill 1,1 and 2,2 mismatches.

classmethod from_pairs_list(input_data, sequence)

Creates a SecondaryStructure from a list of pairs and a sequence.

Parameters

input_datalist

1-indexed list of base pairs. e.g. [(1, 20), (2, 19)]

sequencestr

The RNA sequence. e.g., “AUCGUGUCAUGCUA”

classmethod from_sequence(input_data)

Creates a SecondaryStructure from a sequence string.

This structure is initialized with no base pairs. If base pairs are needed, use SecondaryStructure.from_pairs_list().

get_aligned_data(alignment)

Returns a new SecondaryStructure object matching the alignment target.

Parameters

alignmentdata.Alignment

An alignment object used to map values

get_distance_matrix(recalculate=False, max_cd=50)

Get a matrix of pair-wise shortest path distances through the structure.

This function uses a BFS algorithm. The structure is represented as a graph with nucleotides as vertices and base-pairs and backbone as edges. All edges are length 1. Matrix is stored as an attribute for future use.

If the attribute is set (not None) and recalculate is False, the attribute will be returned.

Based on Tom’s contact_distance, but expanded to return the pairwise matrix. New contact_distance method added to return the distance between two positions.

By default, the maximum contact distance is set to 50. This will be the maximum value reported in the matrix, i.e. a value of 50 in the matrix means >= 50. This prevents the algorithm from running for a very long time on long RNAs. If you need a larger value, set max_cd to a higher value.

Parameters

recalculatebool, defaults to False

Set to True to recalculate the matrix even if the attribute is set.

max_cdint, defaults to 50

The maximum contact distance to calculate.
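The breadth-first search can be sketched for a single starting nucleotide; one row of the matrix corresponds to one such search. This is an illustration, not the library's code: the function name contact_distances is hypothetical, and ct is assumed to be a redundant pairing list (ct[i-1] is the 1-indexed partner of nucleotide i, or 0 if unpaired).

```python
from collections import deque


def contact_distances(ct, source, max_cd=50):
    """Shortest path lengths from source (1-indexed) to every nucleotide."""
    n = len(ct)
    distances = [max_cd] * n  # max_cd also stands for "max_cd or farther"
    distances[source - 1] = 0
    queue = deque([source])
    while queue:
        nt = queue.popleft()
        step = distances[nt - 1] + 1
        if step >= max_cd:
            continue  # anything farther is reported as max_cd
        # Backbone neighbors plus the base-pairing partner, all edges length 1.
        for neighbor in (nt - 1, nt + 1, ct[nt - 1]):
            if 1 <= neighbor <= n and distances[neighbor - 1] > step:
                distances[neighbor - 1] = step
                queue.append(neighbor)
    return distances


hairpin = [10, 9, 8, 0, 0, 0, 0, 3, 2, 1]  # pairs (1,10), (2,9), (3,8)
print(contact_distances(hairpin, source=1))
```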

get_dotbracket()

Get a dotbracket notation string representing the secondary structure.

Pseudoknot levels:

1: (), 2: [], 3: {}, 4: <>, 5: Aa, 6: Bb, 7: Cc, etc.

Returns

str

A dot-bracket representation of the secondary structure
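One way to assign bracket levels is a greedy pass: each pair takes the lowest level on which it crosses no pair already placed. This is a sketch under that assumption; rnavigate may assign levels differently, the function name dotbracket is hypothetical, and only the first four bracket types are supported.

```python
def dotbracket(length, pairs):
    """Build a dot-bracket string from 1-indexed (i, j) pairs with i < j."""
    brackets = ["()", "[]", "{}", "<>"]
    levels = [[] for _ in brackets]  # pairs already placed on each level
    symbols = ["."] * length
    for i, j in sorted(pairs):
        for (opening, closing), placed in zip(brackets, levels):
            # (i, j) crosses (k, l) when exactly one end lies between k and l.
            if not any(k < i < l < j or i < k < j < l for k, l in placed):
                placed.append((i, j))
                symbols[i - 1], symbols[j - 1] = opening, closing
                break
        else:
            raise ValueError("more pseudoknot levels than this sketch supports")
    return "".join(symbols)


print(dotbracket(10, [(1, 10), (2, 9), (3, 8)]))
print(dotbracket(12, [(1, 8), (2, 7), (5, 12), (6, 11)]))
```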

get_helices(fill_mismatches=True, split_bulge=True, keep_singles=False)

Get a dictionary of helices from the secondary structure.

Keys are equivalent to list indices. Values are lists of paired nucleotides (1-indexed) in that helix. e.g. {0: [(1, 50), (2, 49), (3, 48)]}

Parameters

fill_mismatchesbool, defaults to True

Whether 1-1 and 2-2 bulges are replaced with base pairs

split_bulgebool, defaults to True

Whether to split helices on bulges

keep_singlesbool, defaults to False

Whether to return helices that contain only 1 base-pair

Returns

dict

A dictionary of helices
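The simplest case, grouping directly stacked pairs into helices, can be sketched as follows. The function name helices_from_pairs is hypothetical, and the sketch does not implement fill_mismatches, split_bulge, or keep_singles: every bulge or mismatch starts a new helix, and single pairs are kept.

```python
def helices_from_pairs(pairs):
    """Group 1-indexed (i, j) pairs into helices of directly stacked pairs."""
    helices = {}
    previous = None
    for i, j in sorted(pairs):
        # A pair extends the current helix when it stacks on the previous pair.
        if previous is not None and (i, j) == (previous[0] + 1, previous[1] - 1):
            helices[len(helices) - 1].append((i, j))
        else:
            helices[len(helices)] = [(i, j)]
        previous = (i, j)
    return helices


print(helices_from_pairs([(1, 50), (2, 49), (3, 48), (10, 20), (11, 19)]))
```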

get_human_dotbracket()

Get a human-readable dotbracket string representing the secondary structure.

This is an experimental format designed to be more human readable, i.e. no counting of brackets required.

  1. Letters, instead of brackets, are used to denote nested base pairs.

  2. Each helix is assigned a letter, which is incremented one letter alphabetically from the nearest enclosing stem.

  3. Non-nested helices (pseudoknots) are assigned canonical brackets.

From this canonical dbn string, how many bases are in the base stem? How many nested helices are there?

((((....(((.[[..)))))(((...(((..]].))))))))

Same question, new format:

AABB....CCC.[[..cccbbBBB...CCC..]].cccbbbaa

Read this as:

((_______________________________________))  (level 1 = A)
  ((_______________))(((______________)))    (level 2 = B)
        (((_____)))        (((_____)))       (level 3 = C)
            [[__________________]]           (pseudoknot = [])

Pseudoknot levels:

1: Aa, Bb, Cc, etc. 2: [], 3: {}, 4: <>

get_interactions_df()

Returns a DataFrame of i, j basepairs.

Returns

pandas.DataFrame

A DataFrame with columns:

i: the 5’ (1-indexed) position of the base pair

j: the 3’ (1-indexed) position of the base pair

Structure: always 1

get_junction_nts()

Get a list of junction nucleotides (paired, but at the end of a chain).

Returns

list

A list of 1-indexed positions of junction nucleotides

get_nonredundant_ct()

Returns the ct attribute in a non-redundant form.

Only returns pairs in which i < j. For example:

self.ct[i-1] == j

self.ct[j-1] == i

BUT self.get_nonredundant_ct()[j-1] == 0

Returns

numpy.array

A non-redundant array of base pairs
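The difference between the redundant and non-redundant forms can be shown on a plain list. The function names below are hypothetical, and the sketch assumes the same convention as above: entry i-1 holds the 1-indexed partner of nucleotide i, or 0 if unpaired.

```python
def nonredundant_ct(ct):
    """Keep the partner only at the 5' position of each pair (i < j)."""
    return [j if i < j else 0 for i, j in enumerate(ct, start=1)]


def pairs_from_ct(ct):
    """List each base pair once as an (i, j) tuple with i < j."""
    return [(i, j) for i, j in enumerate(ct, start=1) if i < j]


redundant = [6, 5, 0, 0, 2, 1]  # pairs (1, 6) and (2, 5), stored twice each
print(nonredundant_ct(redundant))
print(pairs_from_ct(redundant))
```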

get_paired_nts()

Get a list of residues that are paired.

Returns

list

A list of 1-indexed positions of paired nucleotides

get_pairs()

Get a non-redundant list of base pairs i < j as a list of tuples.

Returns

list

A list of 1-indexed positions. e.g., [(1, 50), (2, 49), …]

get_pseudoknots(fill_mismatches=True)

Get the pk1 and pk2 pairs from the secondary structure.

Ignores single base pairs. PK1 is defined as the helix crossing the most other bps. If there is a tie, the most 5’ helix is called pk1. Returns pk1 and pk2 as lists of base pairs, e.g. [(1,10),(2,9)…]

Parameters

fill_mismatchesbool, defaults to True

Whether 1-1 and 2-2 bulges are replaced with base pairs

Returns

list of 2 lists of 2-tuples

A list of base pairs for pk1 and pk2

get_structure_elements()

This code is not yet implemented.

Returns a string with a character for each nucleotide, indicating what kind of structure element it is a part of.

Characters:

Dangling Ends (E), Stems (S), Hairpin Loops (H), Bulges (B), Internal Loops (I), MultiLoops (M), External Loops (X), Pseudoknot (P)

get_unpaired_nts()

Get a list of residues that are unpaired.

Returns

list

A list of 1-indexed positions of unpaired nucleotides

normalize_dtypes()

Convert dtypes of SecondaryStructure dataframe for consistency.

normalize_sequence(t_or_u='U', uppercase=True)

Normalize the sequence attribute (fix case and/or U <-> T).

property nts

property pair_nts

read_ct(structure_number=0)

Loads secondary structure information from a given ct file.

Requires a properly formatted header.

Parameters

structure_numberint, defaults to 0

0-indexed structure number to load from the ct file.

read_cte()

Generates SecondaryStructure object data from a CTE file

Resulting SecondaryStructure object will include nucleotide x and y coordinates and is compatible with plot_ss.

read_dotbracket()

Generates SecondaryStructure object data from a dot-bracket file.

Resulting SecondaryStructure object will include nucleotide x and y coordinates and is compatible with plot_ss.

read_forna()

Generates SecondaryStructure object data from a FORNA JSON file.

Resulting SecondaryStructure object will include nucleotide x and y coordinates and is compatible with plot_ss.

read_nsd(structure_number=0)

Generates SecondaryStructure object data from an NSD file.

Resulting SecondaryStructure object will include nucleotide x and y coordinates and is compatible with plot_ss.

read_r2dt()

Generates SecondaryStructure object data from an R2DT JSON file.

Resulting SecondaryStructure object will include nucleotide x and y coordinates and is compatible with plot_ss.

read_varna()

Generates SecondaryStructure object data from a VARNA file.

Resulting SecondaryStructure object will include nucleotide x and y coordinates and is compatible with plot_ss.

read_xrna()

Generates SecondaryStructure object data from an XRNA file.

Resulting SecondaryStructure object will include nucleotide x and y coordinates and is compatible with plot_ss.

transform_coordinates(flip=None, scale=None, center=None, rotate_degrees=None)

Perform transformations on X and Y structure coordinates.

To achieve a vertical and horizontal flip together, rotate 180 degrees.

Parameters

flipstr, optional

“horizontal” or “vertical”

scalefloat, optional

new median distance of basepairs

centertuple of floats, optional

new center x and y coordinate

rotate_degreesfloat, optional

number of degrees to rotate structure
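
The note about combining flips can be checked with a small, self-contained sketch. It does not use RNAvigate: the helper names `flip_points` and `rotate_points` are hypothetical, the coordinates are assumed to be centered on the origin, and the sign convention chosen for “horizontal” versus “vertical” is an assumption.

```python
import math


def flip_points(points, direction):
    """Mirror (x, y) points. Assumed convention: "horizontal" negates x, "vertical" negates y."""
    if direction == "horizontal":
        return [(-x, y) for x, y in points]
    if direction == "vertical":
        return [(x, -y) for x, y in points]
    raise ValueError("direction must be 'horizontal' or 'vertical'")


def rotate_points(points, degrees):
    """Rotate (x, y) points counter-clockwise about the origin."""
    theta = math.radians(degrees)
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    return [(x * cos_t - y * sin_t, x * sin_t + y * cos_t) for x, y in points]


points = [(1.0, 2.0), (-3.0, 0.5)]
flipped_both = flip_points(flip_points(points, "horizontal"), "vertical")
rotated = rotate_points(points, 180)

# Compare with a tolerance because sin(pi) is not exactly zero in floating point.
same = all(
    math.isclose(a, c, abs_tol=1e-9) and math.isclose(b, d, abs_tol=1e-9)
    for (a, b), (c, d) in zip(flipped_both, rotated)
)
```

Under these assumptions, `same` is True: flipping in both directions and rotating by 180 degrees give the same coordinates.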

write_ct(out_file)

Write structure to a ct file.

write_cte(out_file)

Write structure to CTE format for Structure Editor.

write_dbn(rna_name, region='all', out_file=None)

Write the structure to a dot-bracket file.

Parameters

rna_namestr

The name of the RNA sequence

regionlist of 2 integers, optional

The region (start and end positions) of the RNA to write to file. Defaults to “all”.

out_filestr, optional

The name of the output file. If not provided, the dbn file is printed.
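
For orientation, the sketch below shows how a dot-bracket string can be derived from a ct-style pairing list, where entry i holds the 1-indexed partner of nucleotide i+1 and 0 means unpaired. The function `pairs_to_dotbracket` is hypothetical and handles nested (pseudoknot-free) structures only; it is not the code behind write_dbn.

```python
def pairs_to_dotbracket(pair_nts):
    """Convert a ct-style partner list into a dot-bracket string (nested pairs only)."""
    chars = []
    for position, partner in enumerate(pair_nts, start=1):
        if partner == 0:
            chars.append(".")  # unpaired nucleotide
        elif partner > position:
            chars.append("(")  # 5' side of a base pair
        else:
            chars.append(")")  # 3' side of a base pair
    return "".join(chars)


# A small hairpin: 1 pairs with 8, 2 pairs with 7, positions 3-6 form the loop.
dbn = pairs_to_dotbracket([8, 7, 0, 0, 0, 0, 2, 1])
```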

write_sto(out_file, name='seq')

Write structure to a Stockholm (STO) file for use in Infernal searches.

property xcoordinates

property ycoordinates

class rnavigate.data.Sequence(input_data, name=None, entry=0)

Bases: object

A class for storing and manipulating RNA sequences.

Parameters

input_datastring or pandas.DataFrame

sequence string, fasta file, or a Pandas dataframe containing a “Sequence” column

namestring, optional

The name of the sequence, defaults to None

entryint, defaults to 0

The index of the sequence in the fasta file if a fasta file is provided

Attributes

sequencestring

The sequence string

namestring

The name of the sequence

other_infodict

A dictionary of additional information about the sequence

null_alignmentSequenceAlignment

An alignment of the sequence to itself

get_aligned_data(alignment)

Get a copy of the sequence positionally aligned to another sequence.

Parameters

alignmentrnavigate.data.Alignment

the alignment to use

Returns

aligned_sequencernavigate.data.Sequence

the aligned sequence

get_colors(source, pos_cmap='rainbow', profile=None, structure=None, annotations=None)

Get colors and colormap representing information about the sequence.

Parameters

sourcestr, list, or matplotlib color-like

the source of the color information.

If a string, must be one of: “sequence”, “position”, “profile”, “structure”, “annotations”.

If a list, must be a list of matplotlib color-like values; colormap will be None.

If a matplotlib color-like value, all nucleotides will be colored that color; colormap will be None.

pos_cmapstr, defaults to “rainbow”

cmap used for position colors if source is “position”

profilernavigate.data.Profile, optional

the profile to use to get colors if source is “profile”

structurernavigate.data.SecondaryStructure, optional

the structure to use to get colors if source is “structure”

annotationslist of rnavigate.data.Annotations, optional

the annotations to use to get colors if source is “annotations”

Returns

colorsnumpy array

one matplotlib color-like value for each nucleotide in self.sequence

colormaprnavigate.data.ScalarMappable

a colormap used for creating a colorbar

get_colors_from_annotations(annotations, default_color='gray')

Get colors and colormap representing sequence annotations.

Parameters

annotationslist of rnavigate.data.Annotations

the annotations to use to get colors.

default_colormatplotlib color-like, defaults to “gray”

the color to use for nucleotides not in any annotation

Returns

colorsnumpy array

one matplotlib color-like value for each nucleotide in self.sequence

colormaprnavigate.data.ScalarMappable

a colormap used for creating a colorbar

get_colors_from_positions(pos_cmap='rainbow')

Get colors and colormap representing the nucleotide position.

Parameters

pos_cmapstr, defaults to “rainbow”

cmap used for position colors

Returns

colorsnumpy array

one matplotlib color-like value for each nucleotide in self.sequence

colormaprnavigate.data.ScalarMappable

a colormap used for creating a colorbar

get_colors_from_profile(profile)

Get colors and colormap representing per-nucleotide data.

Parameters

profilernavigate.data.Profile

the profile to use to get colors.

Returns

colorsnumpy array

one matplotlib color-like value for each nucleotide in self.sequence

colormaprnavigate.data.ScalarMappable

a colormap used for creating a colorbar

get_colors_from_sequence()

Get colors and colormap representing the nucleotide sequence.

Returns

colorsnumpy array

one matplotlib color-like value for each nucleotide in self.sequence

colormaprnavigate.data.ScalarMappable

a colormap used for creating a colorbar

get_colors_from_structure(structure)

Get colors and colormap representing base-pairing status.

Parameters

structurernavigate.data.SecondaryStructure

the structure to use to get colors.

Returns

colorsnumpy array

one matplotlib color-like value for each nucleotide in self.sequence

colormaprnavigate.data.ScalarMappable

a colormap used for creating a colorbar

get_region(region='all')

Checks region input for validity and returns start and end positions.

If region is “all”, returns 1, self.length. Otherwise, ensures that region is between these values and returns the values, sorted.

Parameters

regionlist of 2 int

start and end positions of the region

Returns

start, endint, int

the starting and ending positions
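
The behavior described above can be sketched in a few lines. This is an illustration only: `get_region_sketch` is a hypothetical stand-alone function that takes the sequence length as an argument, and raising ValueError for out-of-range input is an assumption about how invalid regions are reported.

```python
def get_region_sketch(region, length):
    """Return (start, end) as sorted 1-indexed positions within 1..length."""
    if region == "all":
        return 1, length
    start, end = sorted(region)
    if start < 1 or end > length:
        raise ValueError(f"region {region} is outside of 1..{length}")
    return start, end


whole = get_region_sketch("all", 50)
swapped = get_region_sketch([30, 10], 50)  # input order does not matter
```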

get_region_data(region='all')

Get a copy of the data object containing only the specified region.

Parameters

regionlist of 2 int, defaults to “all”

start and end positions of the region

Returns

region_datarnavigate.data.Sequence

the sequence containing only the specified region

get_seq_from_dataframe(dataframe)

Parse a dataframe for the sequence string, store as self.sequence.

Parameters

dataframepandas.DataFrame

must contain a “Sequence” column

property length

Get the length of the sequence

Returns

lengthint

the length of self.sequence

normalize_sequence(t_or_u='U', uppercase=True)

Converts sequence to all uppercase nucleotides and corrects T or U.

Parameters

t_or_u“T”, “U”, or False, defaults to “U”

“T” converts “U”s to “T”s; “U” converts “T”s to “U”s; False does nothing.

uppercasebool, defaults to True

Whether to make sequence all uppercase

read_fasta(fasta, entry)

Parse a fasta file for the sequence at the given entry index.

Parameters

fastastring

path to fasta file

entryint

the index of the sequence in the fasta file

Returns

sequencestring

the sequence string
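
As a minimal sketch of selecting one record by index, the function below reads from any iterable of lines rather than a file path. The name `read_fasta_entry` is hypothetical, and treating entry as a 0-indexed record number is an assumption based on the default of 0.

```python
import io


def read_fasta_entry(handle, entry=0):
    """Return the sequence of the record at the given 0-indexed entry."""
    record = -1  # index of the record currently being read
    chunks = []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            record += 1
            if record > entry:
                break  # the requested record is complete
        elif record == entry:
            chunks.append(line)  # sequences may span several lines
    if record < entry:
        raise IndexError(f"fasta has no entry {entry}")
    return "".join(chunks)


fasta_text = ">first\nAUCG\nAUCG\n>second\nGGGAAACCC\n"
first = read_fasta_entry(io.StringIO(fasta_text), entry=0)
second = read_fasta_entry(io.StringIO(fasta_text), entry=1)
```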

write_fasta(file, name)

Write the sequence to a fasta file.

Parameters

filestring

path to output fasta file

namestring

the name of the sequence to write in the fasta file

class rnavigate.data.SequenceAlignment(sequence1, sequence2, align_kwargs=None, full=False, use_previous=True)

Bases: BaseAlignment

The most useful feature of RNAvigate. Maps positions from one sequence to a totally different sequence using user-defined pairwise alignment or automatic pairwise alignment.

Parameters

sequence1string

the sequence to be aligned

sequence2string

the sequence to align to

align_kwargsdict, defaults to None

a dictionary of arguments to pass to pairwise2.align.globalms

fullbool, defaults to False

whether to keep unmapped starting sequence positions.

use_previousbool, defaults to True

whether to use previously set alignments

Attributes

sequence1str

the sequence to be aligned

sequence2str

the sequence to align to

alignment1str

the alignment string matching sequence1 to sequence2

alignment2str

the alignment string matching sequence2 to sequence1

starting_sequencestr

sequence1

target_sequencestr

sequence2 if full is False, else alignment2

mappingnumpy.array

the alignment map array. index of starting_sequence is mapping[index] of target_sequence

get_alignment()

Gets an alignment that has either been user-defined or previously calculated or produces a new pairwise alignment between two sequences.

Returns

alignment1, alignment2tuple of 2 str

the alignment strings matching sequence1 and sequence2, respectively.

get_inverse_alignment()

Gets an alignment that maps from sequence2 to sequence1.

get_mapping()

Calculates a mapping from starting sequence to target sequence.

Returns

mappingnumpy.array

an array that maps to an index of target sequence. index of starting_sequence is mapping[index] of target_sequence
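
To make the mapping convention concrete, the sketch below builds such a mapping from two gapped alignment strings using plain Python lists. The function `alignment_to_mapping` is hypothetical, indices are 0-based, and marking positions with no counterpart in the target as -1 is an assumption, not necessarily how RNAvigate represents unmapped positions.

```python
def alignment_to_mapping(alignment1, alignment2, unmapped=-1):
    """Map each index of the ungapped sequence1 to an index of the ungapped sequence2."""
    mapping = []
    target_index = 0  # next index in the ungapped target sequence
    for nt1, nt2 in zip(alignment1, alignment2):
        if nt1 != "-":
            # A starting nucleotide maps to the target nucleotide in the same column,
            # or to the unmapped marker when the target has a gap there.
            mapping.append(target_index if nt2 != "-" else unmapped)
        if nt2 != "-":
            target_index += 1
    return mapping


# sequence1 = "AUCGA", sequence2 = "ACUGA"
mapping = alignment_to_mapping("AUC-GA", "A-CUGA")
```

Here `mapping` is [0, -1, 1, 3, 4]: the U at index 1 of sequence1 has no partner, and the G at index 3 of sequence1 lands on index 3 of sequence2.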

print(print_format='full')

Print the alignment in a human-readable format.

Parameters

print_format“full”, “cigar”, “long” or “short”, defaults to “full”

how to format the alignment.

“full”: the full length alignment with changes labeled “X”

“cigar”: the CIGAR string

“long”: locations and sequences of each change

“short”: total number of matches, mismatches, and indels

print_all_changes()

Print location and sequence of all changes.

print_cigar()

Print the CIGAR string

print_number_of_changes()

Print the total numbers of matches, mismatches, and indels.
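
The “cigar” and “short” summaries can be illustrated with a short sketch that works directly on two gapped alignment strings. The helper names are hypothetical, sequence2 is treated as the reference, and the operator letters (“=”, “X”, “I”, “D”) follow the extended CIGAR convention; the exact output of the print methods may differ.

```python
from itertools import groupby


def column_op(nt1, nt2):
    """Classify one alignment column, treating sequence2 as the reference."""
    if nt1 == "-":
        return "D"  # present only in sequence2
    if nt2 == "-":
        return "I"  # present only in sequence1
    return "=" if nt1 == nt2 else "X"


def cigar_string(alignment1, alignment2):
    """Run-length encode the per-column operations."""
    ops = [column_op(a, b) for a, b in zip(alignment1, alignment2)]
    return "".join(f"{len(list(group))}{op}" for op, group in groupby(ops))


def count_changes(alignment1, alignment2):
    """Count matches, mismatches, and indels across all columns."""
    ops = [column_op(a, b) for a, b in zip(alignment1, alignment2)]
    return {
        "matches": ops.count("="),
        "mismatches": ops.count("X"),
        "indels": ops.count("I") + ops.count("D"),
    }


cigar = cigar_string("AAGC-UU", "AUGCA-U")
changes = count_changes("AAGC-UU", "AUGCA-U")
```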

class rnavigate.data.SequenceCircle(input_data, gap=30, name=None, **kwargs)

Bases: SecondaryStructure

A circular SecondaryStructure-like representation of RNA sequence.

class rnavigate.data.StructureAlignment(sequence1, sequence2, structure1=None, structure2=None, full=False)

Bases: BaseAlignment

Experimental secondary structure alignment based on RNAlign2D algorithm (https://doi.org/10.1186/s12859-021-04426-8)

Parameters

sequence1string

the sequence to be aligned

sequence2string

the sequence to align to

structure1string, defaults to None

the secondary structure of sequence1

structure2string, defaults to None

the secondary structure of sequence2

fullbool, defaults to False

whether to align to full length of sequence2 or just mapped length

Attributes

sequence1str

the sequence to be aligned

sequence2str

the sequence to align to

structure1str

the secondary structure of sequence1

structure2str

the secondary structure of sequence2

alignment1str

the alignment string matching sequence1 to sequence2

alignment2str

the alignment string matching sequence2 to sequence1

starting_sequencestr

sequence1

target_sequencestr

sequence2 if full is False, else alignment2

mappingnumpy.array

the alignment map array. index of starting_sequence is mapping[index] of target_sequence

get_alignment()

Aligns pseudo-amino-acid sequences according to RNAlign2D rules.

Returns

alignment1, alignment2tuple of 2 str

the alignment strings matching sequence1 and sequence2, respectively.

get_inverse_alignment()

Gets an alignment that maps from sequence2 to sequence1.

get_mapping()

Calculates a mapping from starting sequence to target sequence.

Returns

mappingnumpy.array

an array which maps indices to the target sequence. starting_sequence[idx] == target_sequence[self.mapping[idx]]

set_as_default_alignment()

Set this as the default alignment between sequence1 and sequence2.

class rnavigate.data.StructureAsInteractions(input_data, sequence, metric=None, metric_defaults=None, window=1, name=None)

Bases: Interactions

A class for storing and manipulating structure data.

Parameters

input_datastring or pandas.DataFrame

If string, a path to a file containing structure data. If dataframe, the dataframe containing structure data. The dataframe must contain columns “i”, “j”, and “Structure”. Dataframe may also include other columns.

sequencestring or rnavigate.data.Sequence

The sequence string corresponding to the structure data.

metricstring, defaults to “Structure”

The column name to use for visualization.

metric_defaultsdict

Keys are metric names and values are dictionaries of metric-specific defaults. These defaults include:

“metric_column”string

the column name to use for visualization

“cmap”string or matplotlib.colors.Colormap

the colormap to use for visualization

“normalization”“min_max”, “0_1”, “none”, or “bins”

The type of normalization to use when mapping values to colors

“values”list of float

The values to use with normalization of the data

“title”string

the title to use for colorbars

“extend”“min”, “max”, “both”, or “neither”

Which ends to extend when drawing the colorbar.

“tick_labels” : list of string

read_table_kwdict, optional

kwargs passed to pandas.read_table() when reading input_data.

windowint, defaults to 1

The window size used to generate the structure data.

namestr, optional

A name for the StructureAsInteractions object.

Attributes

datapandas.DataFrame

The structure data.

class rnavigate.data.StructureCompareMany(input_data, sequence, metric=None, metric_defaults=None, window=1, name=None)

Bases: Interactions

A class for storing and manipulating a comparison of many structures.

Parameters

input_datastring or pandas.DataFrame

If string, a path to a file containing structure data. If dataframe, the dataframe containing structure data. The dataframe must contain columns “i”, “j”, and “Structure”. Dataframe may also include other columns.

sequencestring or rnavigate.data.Sequence

The sequence string corresponding to the structure data.

metricstring, defaults to “Structure”

The column name to use for visualization.

metric_defaultsdict

Keys are metric names and values are dictionaries of metric-specific defaults. These defaults include:

“metric_column”string

the column name to use for visualization

“cmap”string or matplotlib.colors.Colormap

the colormap to use for visualization

“normalization”“min_max”, “0_1”, “none”, or “bins”

The type of normalization to use when mapping values to colors

“values”list of float

The values to use with normalization of the data

“title”string

the title to use for colorbars

“extend”“min”, “max”, “both”, or “neither”

Which ends to extend when drawing the colorbar.

“tick_labels” : list of string

read_table_kwdict, optional

kwargs passed to pandas.read_table() when reading input_data.

windowint, defaults to 1

The window size used to generate the structure data.

namestr, optional

A name for the StructureCompareMany object.

Attributes

datapandas.DataFrame

The structure data.

class rnavigate.data.StructureCompareTwo(input_data, sequence, metric=None, metric_defaults=None, window=1, name=None)

Bases: Interactions

A class for storing and manipulating a comparison of two structures.

Parameters

input_datastring or pandas.DataFrame

If string, a path to a file containing structure data. If dataframe, the dataframe containing structure data. The dataframe must contain columns “i”, “j”, and “Structure”. Dataframe may also include other columns.

sequencestring or rnavigate.data.Sequence

The sequence string corresponding to the structure data.

metricstring, defaults to “Structure”

The column name to use for visualization.

metric_defaultsdict

Keys are metric names and values are dictionaries of metric-specific defaults. These defaults include:

“metric_column”string

the column name to use for visualization

“cmap”string or matplotlib.colors.Colormap

the colormap to use for visualization

“normalization”“min_max”, “0_1”, “none”, or “bins”

The type of normalization to use when mapping values to colors

“values”list of float

The values to use with normalization of the data

“title”string

the title to use for colorbars

“extend”“min”, “max”, “both”, or “neither”

Which ends to extend when drawing the colorbar.

“tick_labels” : list of string

read_table_kwdict, optional

kwargs passed to pandas.read_table() when reading input_data.

windowint, defaults to 1

The window size used to generate the structure data.

namestr, optional

A name for the StructureCompareTwo object.

Attributes

datapandas.DataFrame

The structure data.

class rnavigate.data.StructureCoordinates(x, y, pairs=None)

Bases: object

Helper class to perform structure coordinate transformations

Parameters

xnumpy.array

x coordinates

ynumpy.array

y coordinates

pairslist of pairs, optional

list of base-paired positions; required if scaling coordinates

center(x=0, y=0)

Center structure on the given x, y coordinate

Parameters

xint, defaults to 0

x coordinate of structure center

yint, defaults to 0

y coordinate of structure center

flip(horizontal=True)

Flip structure vertically or horizontally.

Parameters

horizontalbool, defaults to True

whether to flip structure horizontally, otherwise vertically

get_center_point()

Get the x, y coordinates for the center of structure.

Returns

float

x coordinate of structure center

float

y coordinate of structure center

rotate(degrees)

Rotate structure on current center point.

Parameters

degreesfloat

number of degrees to rotate structure

scale(median_bp_distance=1.0)

Scale structure such that median base-pair distance is constant.

Parameters

median_bp_distancefloat, defaults to 1.0

New median distance between all base-paired nucleotides.
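
One way to read this scaling rule is: measure the distance between the two nucleotides of every base pair, take the median, and multiply all coordinates by the ratio of the requested median to the current one. The sketch below follows that reading with plain lists; `scale_coordinates` is a hypothetical helper, and the 1-indexed pair positions are an assumption.

```python
import math
from statistics import median


def scale_coordinates(x, y, pairs, median_bp_distance=1.0):
    """Scale x and y so the median base-pair distance equals median_bp_distance."""
    distances = [
        math.dist((x[i - 1], y[i - 1]), (x[j - 1], y[j - 1])) for i, j in pairs
    ]
    factor = median_bp_distance / median(distances)
    return [value * factor for value in x], [value * factor for value in y]


# Four nucleotides; pairs 1-4 and 2-3 are both 4 units apart.
x = [0.0, 0.0, 4.0, 4.0]
y = [0.0, 2.0, 2.0, 0.0]
new_x, new_y = scale_coordinates(x, y, pairs=[(1, 4), (2, 3)], median_bp_distance=1.0)
```

This also shows why the pairs argument is needed for scaling: without base pairs there is no distance to take the median of.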

rnavigate.data.domains(input_data, names, colors, sequence)

Create a list of Annotations from a list of spans.

Currently, domains functionality in RNAvigate just uses a list of spans. In the future, this should be a dedicated class. Generally, domains should cover an entire sequence without overlap, but this is not enforced. e.g. [[1, 100], [101, 200]] for a 200 nt sequence.

Parameters

input_datalist of lists

list of spans for each domain

nameslist of strings

list of names for each domain

colorslist of valid matplotlib colors

list of colors for each domain

sequencestring

sequence to be annotated

Returns

list of rnavigate.data.Annotation

list of Annotations
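
Because full, non-overlapping coverage is recommended but not enforced, a caller may want to check spans before passing them in. The sketch below is one such check; `spans_tile_sequence` is a hypothetical helper, not part of RNAvigate, and it assumes 1-indexed, inclusive spans.

```python
def spans_tile_sequence(spans, length):
    """Return True if spans cover positions 1..length exactly once, with no gaps or overlaps."""
    expected_start = 1
    for start, end in sorted(spans):
        if start != expected_start or end < start:
            return False  # gap, overlap, or reversed span
        expected_start = end + 1
    return expected_start == length + 1


ok = spans_tile_sequence([[1, 100], [101, 200]], 200)
overlapping = spans_tile_sequence([[1, 100], [100, 200]], 200)
gapped = spans_tile_sequence([[1, 100], [102, 200]], 200)
```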

rnavigate.data.lookup_alignment(sequence1, sequence2, t_or_u='U')

look up a previously set alignment in the _alignments_cache

Parameters

sequence1string

The first sequence to align

sequence2string

The second sequence to be aligned to

t_or_u“T”, “U”, or False, defaults to “U”

“T” converts “U”s to “T”s; “U” converts “T”s to “U”s; False does nothing

Returns

dictionary, if an alignment is found, otherwise None

{“seqA”: sequence1 with gap characters representing alignment, “seqB”: sequence2 with gap characters representing alignment}

rnavigate.data.normalize_sequence(sequence, t_or_u='U', uppercase=True)

Returns sequence as all uppercase nucleotides and/or corrects T or U.

Parameters

sequencestring or RNAvigate Sequence

The sequence. If given an RNAvigate Sequence, the sequence string is retrieved.

t_or_u“T”, “U”, or False, defaults to “U”

“T” converts “U”s to “T”s; “U” converts “T”s to “U”s; False does nothing

uppercase bool, defaults to True

Whether to make sequence all uppercase

Returns

string

the cleaned-up sequence string
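
The effect of the two options can be sketched with plain string operations. `normalize_sequence_sketch` is a hypothetical stand-in that accepts strings only; converting lowercase “t” and “u” when uppercase is False is an assumption.

```python
def normalize_sequence_sketch(sequence, t_or_u="U", uppercase=True):
    """Optionally uppercase a sequence string and convert between T and U."""
    if uppercase:
        sequence = sequence.upper()
    if t_or_u == "U":
        sequence = sequence.replace("T", "U").replace("t", "u")
    elif t_or_u == "T":
        sequence = sequence.replace("U", "T").replace("u", "t")
    # t_or_u=False leaves T and U untouched
    return sequence


as_rna = normalize_sequence_sketch("acgt")
as_dna = normalize_sequence_sketch("ACGU", t_or_u="T")
untouched = normalize_sequence_sketch("acgt", t_or_u=False, uppercase=False)
```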

rnavigate.data.set_alignment(sequence1, sequence2, alignment1, alignment2, t_or_u='U')

Add an alignment to be used as the default between two sequences.

When objects with these sequences are aligned for visualization, RNAvigate uses this alignment instead of an automated pairwise sequence alignment. alignment1 and alignment2 must have matching lengths, and each must differ from its sequence (sequence1 or sequence2, respectively) only by dashes “-”.

e.g.:

sequence1 ="AAGCUUCGGUACAUGCAAGAUGUAC"
sequence2 ="AUCGAUCGAGCUGCUGUGUACGUAC"
alignment1="AAGCUUCG---------GUACAUGCAAGAUGUAC"
alignment2="AUCGAUCGAGCUGCUGUGUAC---------GUAC"
             |mm|   | indel |    | indel |

Parameters

sequence1string

the first sequence

sequence2string

the second sequence

alignment1string

first sequence, plus dashes “-” indicating indels

alignment2string

second sequence, plus dashes “-” indicating indels

t_or_u“T”, “U”, or False, defaults to “U”

“T” converts “U”s to “T”s; “U” converts “T”s to “U”s; False does nothing
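
The two constraints on alignment strings (equal lengths, and each alignment equal to its sequence once dashes are removed) can be checked before calling this function. The sketch below is a hypothetical pre-check, not RNAvigate code. It reuses the example sequences from this entry, with each indel written as nine hyphens, which is the gap length that makes both alignments the same length.

```python
def check_alignment(sequence1, sequence2, alignment1, alignment2):
    """Raise ValueError if the alignment strings break the documented constraints."""
    if len(alignment1) != len(alignment2):
        raise ValueError("alignment1 and alignment2 must have matching lengths")
    if alignment1.replace("-", "") != sequence1:
        raise ValueError("alignment1 must match sequence1 apart from dashes")
    if alignment2.replace("-", "") != sequence2:
        raise ValueError("alignment2 must match sequence2 apart from dashes")
    return True


sequence1 = "AAGCUUCGGUACAUGCAAGAUGUAC"
sequence2 = "AUCGAUCGAGCUGCUGUGUACGUAC"
alignment1 = "AAGCUUCG---------GUACAUGCAAGAUGUAC"
alignment2 = "AUCGAUCGAGCUGCUGUGUAC---------GUAC"
valid = check_alignment(sequence1, sequence2, alignment1, alignment2)
```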

rnavigate.data.set_multiple_sequence_alignment(fasta, set_pairwise=False)

Set alignments from a multiple sequence alignment Pearson fasta file.

Sets alignments to a base sequence, then returns the base sequence to be used when a multiple sequence alignment plot is desired. Also sets all pairwise alignments, if desired. When setting pairwise alignments, dashes that are shared between pairwise sequences are removed first.

Parameters

fastastring

location of Pearson fasta file

set_pairwisebool, defaults to False

whether to set every pairwise alignment as well as the multiple sequence alignment.
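
The step of removing shared dashes can be pictured as dropping every column in which both rows of the multiple sequence alignment have a gap. The sketch below illustrates that idea on two rows; `remove_shared_gaps` is a hypothetical helper and not the function RNAvigate uses internally.

```python
def remove_shared_gaps(aligned1, aligned2):
    """Drop alignment columns where both sequences have a dash."""
    kept = [(a, b) for a, b in zip(aligned1, aligned2) if not (a == "-" and b == "-")]
    return "".join(a for a, _ in kept), "".join(b for _, b in kept)


# Columns 3 and 4 are gaps in both rows, so they carry no pairwise information.
pairwise1, pairwise2 = remove_shared_gaps("AU--CG-A", "A---CGUA")
```

The result is a valid pairwise alignment: both strings keep the same length, and each still matches its own sequence once dashes are removed.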