rnavigate.analysis package

Submodules

rnavigate.analysis.auroc module

Windowed AUROC assesses agreement between reactivities and base-pairing.

class rnavigate.analysis.auroc.WindowedAUROC(sample, window=81, profile='default_profile', structure='default_structure')

Bases: object

Compute and display windowed AUROC analysis.

This analysis computes the ROC curve over a sliding window for the performance of per-nucleotide data (usually SHAPE-MaP or DMS-MaP Normalized reactivity) in predicting the base-pairing status of each nucleotide. The area under this curve (AUROC) is displayed compared to the median across the RNA. Below, an arc plot displays the secondary structure and per-nucleotide profile.

AUROC values should range from 0.5 (no predictive power) to 1.0 (perfect predictive power). A value near 0.5 indicates that the reactivity profile does not fit the structure prediction well. Regions with such values are good candidates for further investigation with ensemble deconvolution.
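As an illustration of the windowed calculation described above, here is a minimal pure-Python sketch. It is not RNAvigate's implementation: the function names `auroc`, `windowed_auroc`, and `median_auroc`, and the convention that `pairs[i]` is 0 for an unpaired nucleotide, are assumptions made for this example. AUROC is computed as the probability that a randomly chosen unpaired nucleotide has a higher reactivity than a randomly chosen paired one, with ties counted as one half.

```python
import math
import statistics


def auroc(scores, is_unpaired):
    """AUROC of scores for predicting unpaired status (rank-based definition)."""
    pos = [s for s, u in zip(scores, is_unpaired) if u]
    neg = [s for s, u in zip(scores, is_unpaired) if not u]
    if not pos or not neg:
        return math.nan  # undefined unless both classes are present
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def windowed_auroc(reactivities, pairs, window=81):
    """AUROC in a sliding window, padded with NaN at both ends.

    pairs[i] is the 1-based pairing partner of nucleotide i+1, or 0 if
    unpaired (an assumed convention for this sketch).
    """
    n = len(reactivities)
    half = window // 2
    result = [math.nan] * n
    for center in range(half, n - half):
        # ignore nucleotides with missing reactivity
        keep = [i for i in range(center - half, center + half + 1)
                if not math.isnan(reactivities[i])]
        result[center] = auroc([reactivities[i] for i in keep],
                               [pairs[i] == 0 for i in keep])
    return result


def median_auroc(values):
    """Median of the defined (non-NaN) windowed AUROC values."""
    finite = [v for v in values if not math.isnan(v)]
    return statistics.median(finite) if finite else math.nan
```

Windows containing only paired or only unpaired nucleotides are left as NaN in this sketch, since the ROC curve is undefined there.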

References

Lan, T.C.T., Allan, M.F., Malsick, L.E. et al. Secondary structural ensembles of the SARS-CoV-2 RNA genome in infected cells. Nat Commun 13, 1128 (2022). https://doi.org/10.1038/s41467-022-28603-2

Methods

__init__: Computes the AUROC array and AUROC median.
plot_auroc: Displays the AUROC analysis over the given region. Returns Plot object.

Attributes

sample (rnavigate.Sample): sample to retrieve profile and secondary structure
structure (str): Data keyword of sample pointing to secondary structure, e.g. sample.data[structure]
profile (str): Data keyword of sample pointing to profile, e.g. sample.data[profile]
sequence: the sequence string of sample.data[structure]
window: the size of the windows
nt_length: the length of the sequence string
auroc: the AUROC numpy array, length = nt_length, padded with np.nan
median_auroc: the median of the auroc array

plot_auroc(region=None)

Plot the result of the windowed AUROC analysis, with arc plot of structure and reactivity profile.

Args:

region (list of int, length 2, optional): Start and end nucleotide positions to plot. Defaults to [1, RNA length].

rnavigate.analysis.check_sequence module

SequenceChecker analysis used to inspect sequence differences.

Given a list of samples, we can inspect which data keywords belong to the samples, which sequences match up perfectly, and inspect the differences between sequences.

class rnavigate.analysis.check_sequence.SequenceChecker(samples)

Bases: object

Check the sequences stored in a list of samples.

Attributes

samples (list): samples in which to check sequences
sequences (list): all unique sequence strings stored in the list of samples. These are converted to an all uppercase RNA alphabet.
keywords (list): all unique data keywords stored in the list of samples.
which_sequences (Pandas.DataFrame): each row is a sample, keyword, and index of self.sequences
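The bookkeeping behind these attributes can be sketched as follows. This is a simplified illustration, not the library's code: plain dictionaries stand in for rnavigate.Sample objects, a list of tuples stands in for the DataFrame, and the helper names `to_rna` and `index_sequences` are hypothetical.

```python
def to_rna(sequence):
    """Convert a sequence to an all-uppercase RNA alphabet (T becomes U)."""
    return sequence.upper().replace("T", "U")


def index_sequences(samples):
    """Collect unique sequences and record which one each data keyword uses.

    samples: dict mapping sample name -> dict of data keyword -> sequence.
    Returns (sequences, rows), where each row is
    (sample name, data keyword, index into sequences).
    """
    sequences = []
    rows = []
    for sample_name, data in samples.items():
        for keyword, sequence in data.items():
            rna = to_rna(sequence)
            if rna not in sequences:
                sequences.append(rna)
            rows.append((sample_name, keyword, sequences.index(rna)))
    return sequences, rows


# Example: two keywords share one sequence once case and T/U are normalized.
example = {
    "sample1": {"shapemap": "ggaTc", "ss": "GGAUC"},
    "sample2": {"shapemap": "GGAUCA"},
}
unique_sequences, which_sequences = index_sequences(example)
```

The integer in each row plays the role of the sequence ID used by print_alignments and write_fasta.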

get_keywords(): A list of all unique data keywords across samples.

get_sequences(): A list of all unique sequences (uppercase RNA) across samples.

get_which_sequences(): A DataFrame of sequence IDs (integers) for each data keyword.

print_alignments(print_format='long', which='all')

Print alignments in the given format for sequence IDs provided.

Parameters

print_format (string, defaults to “long”): What format to print the alignments in. “cigar” prints the CIGAR string; “short” prints the numbers of mismatches and indels; “long” prints the location and nucleotide identity of all mismatches, insertions and deletions.
which (tuple of two integers, defaults to “all” (every pairwise comparison)): two sequence IDs to compare.

print_mulitple_sequence_alignment(base_sequence)

Print the multiple sequence alignment with nice formatting.

Parameters

base_sequence (string): a sequence string that represents the longest common sequence. Usually, this is the return value from rnav.data.set_multiple_sequence_alignment()

print_which_sequences(): Print sequence ID (integer) for each data keyword and sample.

reset(): Reset keywords and sequences from sample list in case of changes.

write_fasta(filename, which='all')

Write all unique sequences to a fasta file.

This is very useful for using external multiple sequence aligners such as ClustalOmega:

1) go to https://www.ebi.ac.uk/Tools/msa/clustalo/
2) upload new fasta file
3) under STEP 2 output format, select Pearson/FASTA
4) click ‘Submit’
5) wait for your alignment to finish
6) download the alignment fasta file
7) use rnav.data.set_multiple_sequence_alignment()

Parameters

filename (string): path to a new file to which fasta entries are written
which (list of integers, defaults to “all” (every sequence)): Sequence IDs to write to file.

rnavigate.analysis.deltashape module

DeltaSHAPE for detecting meaningful changes in SHAPE reactivity between two samples.

Parameters are optimized for detecting in-cell vs. cell-free protein protections and enhancements, but the analysis can be used to identify any meaningful reactivity differences.

Copyright Matthew J. Smola 2015
Largely rewritten for RNAvigate by Patrick Irving 2023

class rnavigate.analysis.deltashape.DeltaSHAPE(sample1, sample2, profile='shapemap', smoothing_window=3, zf_coeff=1.96, ss_thresh=1, site_window=3, site_nts=2)

Bases: Sample

Detects meaningful differences in chemical probing reactivity

References

doi:10.1021/acs.biochem.5b00977

Algorithm

1. Extract SHAPE-MaP sequence, normalized profile, and normalized standard error from given samples.
2. Calculate smoothed profiles (mean) and propagate standard errors over rolling windows.
3. Subtract raw and smoothed normalized profiles and propagate errors.
4. Calculate Z-factors for smoothed data. This is the magnitude of the difference relative to the standard error.
5. Calculate Z-scores for smoothed data. This is the magnitude of the difference in standard deviations from the mean difference.
6. Call sites. Called sites must have a minimum number of nucleotides (site_nts) that pass the Z-factor and Z-score thresholds within each window (site_window).

Smoothing window size, Z factor threshold, Z score threshold, site-calling window size and minimum nucleotides per site can be specified.
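The steps above can be sketched in plain Python as follows. This is an illustrative reading of the algorithm, not the RNAvigate source. The helper names `smooth` and `call_sites`, the Z-factor form 1 - zf_coeff * (se1 + se2) / |difference|, the use of the population standard deviation, and the assumption of an odd smoothing window are all choices made for this example.

```python
import math
import statistics


def smooth(values, errors, window=3):
    """Rolling mean with propagated standard errors; edges are NaN."""
    n, half = len(values), window // 2
    sm_vals, sm_errs = [math.nan] * n, [math.nan] * n
    for i in range(half, n - half):
        vals = values[i - half:i + half + 1]
        errs = errors[i - half:i + half + 1]
        sm_vals[i] = sum(vals) / len(vals)
        # error of a mean of independent values
        sm_errs[i] = math.sqrt(sum(e ** 2 for e in errs)) / len(errs)
    return sm_vals, sm_errs


def call_sites(prof1, err1, prof2, err2, smoothing_window=3, zf_coeff=1.96,
               ss_thresh=1, site_window=3, site_nts=2):
    """Return (smoothed difference, per-nucleotide site flags)."""
    n = len(prof1)
    s1, e1 = smooth(prof1, err1, smoothing_window)
    s2, e2 = smooth(prof2, err2, smoothing_window)
    diff = [a - b for a, b in zip(s1, s2)]
    valid = [d for d in diff if not math.isnan(d)]
    mean, stdev = statistics.mean(valid), statistics.pstdev(valid)
    if stdev == 0:
        return diff, [False] * n  # no variation, nothing to call
    passes = [False] * n
    for i, d in enumerate(diff):
        if math.isnan(d) or d == 0:
            continue
        z_factor = 1 - zf_coeff * (e1[i] + e2[i]) / abs(d)
        z_score = (d - mean) / stdev
        passes[i] = z_factor > 0 and abs(z_score) >= ss_thresh
    sites = [False] * n
    for start in range(n - site_window + 1):
        idx = range(start, start + site_window)
        if sum(passes[i] for i in idx) >= site_nts:
            for i in idx:
                sites[i] = sites[i] or passes[i]
    return diff, sites
```

In this sketch a nucleotide is reported only if it passes both thresholds itself and sits in at least one window containing site_nts or more passing nucleotides.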

calculate_deltashape(smoothing_window=3, zf_coeff=1.96, ss_thresh=1, site_window=2, site_nts=3)

Calculate or recalculate deltaSHAPE profile and called sites

Parameters

smoothing_window (int, default=3): Size of windows for data smoothing
zf_coeff (float, default=1.96): Sites must have a difference more than zf_coeff standard errors
ss_thresh (int, default=1): Sites must have a difference that is ss_thresh standard deviations from the mean difference
site_window (int, default=3): Number of nucleotides to include when calling sites
site_nts (int, default=2): Number of nts within site_window that must pass thresholds

plot(region='all')

Plot the deltaSHAPE result

Parameters

region (list of 2 integers, default=“all”): start and end positions to plot

Returns

rnav.plots.Profile: The plot object

class rnavigate.analysis.deltashape.DeltaSHAPEProfile(input_data, metric='Smooth_diff', metric_defaults=None, sequence=None, name=None, **kwargs)

Bases: Profile

Profile data class for performing deltaSHAPE analysis

calculate_deltashape(smoothing_window=3, zf_coeff=1.96, ss_thresh=1, site_window=3, site_nts=2)

Calculate the deltaSHAPE profile metrics

Parameters

smoothing_window (int, default=3): Size of windows for data smoothing
zf_coeff (float, default=1.96): Sites must have a difference more than zf_coeff standard errors
ss_thresh (int, default=1): Sites must have a difference that is ss_thresh standard deviations from the mean difference
site_window (int, default=3): Number of nucleotides to include when calling sites
site_nts (int, default=2): Number of nts within site_window that must pass thresholds

get_enhancements_annotation(): Get an annotations object for the significant enhancements

get_protections_annotation(): Get an annotations object for the significant protections

rnavigate.analysis.fragmapper module

Fragmapper analysis tools.

Description: FragMapper compares reactivity profile differences between SHAPE-MaP profiles. The intended application of Fragmapper is to detect fragment or ligand crosslinking sites in RNA.

class rnavigate.analysis.fragmapper.FragMaP(input_data, parameters, metric='Delta_zscore', metric_defaults=None, read_table_kw=None, sequence=None, name=None)

Bases: Profile

get_annotation()

get_dataframe(profile1, profile2, mutation_rate_threshold, depth_threshold, delta_rate_threshold, zscore_threshold, zscore_min_threshold)

property recreation_kwargs: A dictionary of keyword arguments to pass when recreating the object.

class rnavigate.analysis.fragmapper.Fragmapper(sample1, sample2, parameters=None, profile='shapemap')

Bases: Sample

plot_scatter(column='Modified_rate')

Generates scatter plots useful for fragmapper quality control.

Args:

column (str, optional): Dataframe column containing data to plot (must be available for the sample and control). Defaults to “Modified_rate”.

Returns:

(matplotlib figure, matplotlib axis): Scatter plot with control values on the x-axis, sample values on the y-axis, and each point representing a nucleotide not filtered out in the fragmapper pipeline.

update_annotation()

class rnavigate.analysis.fragmapper.FragmapperReplicates(samples_1: list, samples_2: list, parameters=None, profile='shapemap')

Bases: Sample

average_columns(df: DataFrame, avg_columns: list[str] = ['Modified_mutations', 'Modified_effective_depth', 'Modified_rate'], sem_column: list[str] = ['Modified_rate'])

merge_samples(samples: list, profile: str = 'shapemap', suffix: str = 'rep', columns: list = ['Nucleotide', 'Sequence', 'Modified_mutations', 'Modified_effective_depth', 'Modified_rate'], exceptions: list = ['Nucleotide', 'Sequence'])

plot_scatter(column: str = 'Modified_rate', error: str = 'Std_err', label_size: int = None, ylabel: str = None, xlabel: str = None)

Generates scatter plots useful for fragmapper quality control.

Args:

column (str, optional): Dataframe column containing data to plot (must be available for the sample and control). Defaults to “Modified_rate”.

Returns:

(matplotlib figure, matplotlib axis): Scatter plot with control values on the x-axis, sample values on the y-axis, and each point representing a nucleotide not filtered out in the fragmapper pipeline.

update_annotation()

rnavigate.analysis.logcompare module

LogCompare compares reactivity profiles for significant differences.

This analysis requires replicates.

class rnavigate.analysis.logcompare.LogCompare(samples1, samples2, name1, name2, profile_kw, sequence=None, inherit=None)

Bases: Sample

Compares 2 experimental samples, given replicates of each sample.

Algorithm

1. Calculate the log10(modified/untreated) rate for each replicate.
2. Scale these values to minimize the median of the absolute difference between samples.
3. Calculate the standard error in these values for each replicate.
4. Calculate the difference between samples.
5. Calculate z-scores between samples.
6. Plot the results in two panels: (1) the scaled log10(modified/untreated) rate for each sample with error bars, and (2) the difference between samples, colored by z-score.
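The arithmetic in steps 1 and 3 to 5 can be sketched as below. This is a hedged illustration, not the LogCompare implementation: the function names are hypothetical, the rescaling in step 2 is omitted, and the z-score is assumed to be the difference divided by the combined standard error, which the text above does not specify.

```python
import math
import statistics


def log_profile(modified_rates, untreated_rates):
    """log10(modified/untreated) per nucleotide; NaN where undefined."""
    return [math.log10(m / u) if m > 0 and u > 0 else math.nan
            for m, u in zip(modified_rates, untreated_rates)]


def mean_and_stderr(replicates):
    """Per-nucleotide mean and standard error across replicate profiles.

    replicates: list of equal-length log profiles (at least two).
    """
    means, stderrs = [], []
    for values in zip(*replicates):
        means.append(statistics.mean(values))
        stderrs.append(statistics.stdev(values) / math.sqrt(len(values)))
    return means, stderrs


def compare(group1, group2):
    """Difference and z-score between two groups of replicate profiles."""
    mean1, se1 = mean_and_stderr(group1)
    mean2, se2 = mean_and_stderr(group2)
    diffs, z_scores = [], []
    for m1, m2, a, b in zip(mean1, mean2, se1, se2):
        diff = m1 - m2
        combined = math.sqrt(a ** 2 + b ** 2)  # assumed error model
        diffs.append(diff)
        z_scores.append(diff / combined if combined > 0 else math.nan)
    return diffs, z_scores
```

Because the standard error is estimated from the spread between replicates, this calculation needs at least two replicates per sample.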

Methods

__init__: computes log10(modified/untreated) rates, rescales the data, then calls make_plots()
get_profile_sequence: gets log10(m/u) rate and sequence from sample
rescale: rescales a profile to minimize difference to another profile
load_replicates: calculates average and standard error of replicates
make_plots: displays the two panels described above.

Attributes

data (str): a key of sample.data to retrieve per-nucleotide data
groups (dict): a dictionary with keys 1 and 2, each containing: self.data (averaged scaled log10(m/u)), “stderr” (standard errors), “stacked” (2d array of scaled log10(m/u) per replicate), “seq” (the sequence string)

class rnavigate.analysis.logcompare.LogProfile(input_data, metric='mean_diff', metric_defaults=None, sequence=None, **kwargs)

Bases: Profile

A class for log10(Modified_rate/Untreated_rate) profiles.

calc_profile(profile)

Calculate log10(Modified_rate/Untreated_rate) for the given sample/profile.

Args:

sample (rnavigate.Sample): an rnavigate sample

Returns:

np.array: log profile

load_replicates(profiles)

Calculates log profiles, average, and standard error for a group of replicates.

Args:

*profiles (list of rnavigate.Sample): replicates to load

rescale(profile, target_profile)

Scales profile to minimize difference to target_profile.

Args:

profile (np.array): log10 profile to scale
target_profile (np.array): 2nd log10 profile

Returns:

np.array: scaled profile

rnavigate.analysis.lowss module

Performs low SHAPE, low Shannon entropy analysis.

Citation: Siegfried, N., Busan, S., Rice, G. et al. RNA motif discovery by SHAPE and mutational profiling (SHAPE-MaP). Nat Methods 11, 959-965 (2014). https://doi.org/10.1038/nmeth.3029

Typical usage example:

import rnavigate as rnav
my_sample = rnav.Sample(
    sample="example sample",
    shapemap="my_shape_profile.txt",
    pairprob="pairing_probabilities.txt",
    ss="MFE_structure.ct"
)
lowss_sample = rnav.analysis.LowSS(my_sample)
plot = lowss_sample.plot_lowss()
plot.save("lowss_figure.svg")

class rnavigate.analysis.lowss.LowSS(sample, window=55, shapemap='shapemap', pairprob='pairprob', structure='ss')

Bases: Sample

Creates a new RNAvigate Sample which computes and displays Low SHAPE, low Shannon entropy regions (LowSS) given a sample containing SHAPE reactivities, pairing probabilities, and MFE structure.

Methods

__init__: performs the analysis
plot_lowss: displays the result and returns plot object

Attributes

sample (str): the new label for this Sample’s data on plots
parent (rnavigate.Sample): the sample from which data is retrieved
window (int): size of the windows, must be odd
median_shape (float): global median SHAPE reactivity
median_entropy (float): global median Shannon entropy
data (dictionary): dictionary of data keyword: Data objects, keys are:

“structure” (rnav.data.SecondaryStructure): copy of provided MFE structure
“shapemap” (rnav.data.SHAPEMaP): copy of provided SHAPE-MaP data aligned to “structure”
“pairprob” (rnav.data.PairingProbability): copy of pairing probabilities aligned to “structure”
“entropies” (rnav.data.Profile): Profile of Shannon entropies calculated from “pairprob”
“lowSS” (rnav.data.Annotations): annotations defining low SHAPE, low Shannon entropy regions

plot_lowss(region=None, colorbars=True)

Visualize LowSS analysis over the given region.

Parameters

region (integer or list of 2 integers, default=None (entire sequence)): If list: lowSS start and end positions to plot. If integer: region number, +/- 150 nts are shown.
colorbars (bool, default=True): whether to plot colorbars for pairing probability

Returns

rnavigate.plots.AP: LowSS visualization

reset_lowss(maximum_shape=None, maximum_entropy=0.08)

Generates an annotation of lowSS regions. Stored as self.lowSS

Parameters

maximum_shape (float, default=None (median SHAPE reactivity)): maximum normalized SHAPE reactivity to be called lowSS.
maximum_entropy (float, default=0.08): maximum Shannon entropy to be called lowSS.
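How the two thresholds combine with the windowed medians can be sketched as follows. This is a simplified illustration under stated assumptions, not the LowSS implementation: the function names are hypothetical, Shannon entropy is assumed to use log base 10, inputs are assumed to contain no missing values, and any later expansion or merging of called regions is left out.

```python
import math
import statistics


def shannon_entropy(pair_probs):
    """Entropy of one nucleotide from its pairing probabilities (log10)."""
    return -sum(p * math.log10(p) for p in pair_probs if p > 0)


def windowed_median(values, window=55):
    """Median over a centered window (odd size); edges are NaN."""
    n, half = len(values), window // 2
    out = [math.nan] * n
    for i in range(half, n - half):
        out[i] = statistics.median(values[i - half:i + half + 1])
    return out


def call_lowss(shape, entropy, window=55, maximum_shape=None,
               maximum_entropy=0.08):
    """Flag positions whose windowed medians fall below both thresholds."""
    if maximum_shape is None:
        # default threshold: global median SHAPE reactivity
        maximum_shape = statistics.median(shape)
    win_shape = windowed_median(shape, window)
    win_entropy = windowed_median(entropy, window)
    # comparisons with NaN are False, so window edges are never flagged
    return [s < maximum_shape and e < maximum_entropy
            for s, e in zip(win_shape, win_entropy)]
```

With maximum_shape left as None, the cutoff falls back to the global median, mirroring the default described for reset_lowss.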

reset_window(window=None)

Resets the window size, then recalculates the windowed SHAPE reactivities, Shannon entropies, and lowSS region annotations.

Parameters

window (int, default=None (self.window)): window size for calculating median SHAPE and Shannon entropy, must be odd

Module contents

class rnavigate.analysis.DeltaSHAPE(sample1, sample2, profile='shapemap', smoothing_window=3, zf_coeff=1.96, ss_thresh=1, site_window=3, site_nts=2)

class rnavigate.analysis.DeltaSHAPEProfile(input_data, metric='Smooth_diff', metric_defaults=None, sequence=None, name=None, **kwargs)

class rnavigate.analysis.FragMaP(input_data, parameters, metric='Delta_zscore', metric_defaults=None, read_table_kw=None, sequence=None, name=None)

class rnavigate.analysis.Fragmapper(sample1, sample2, parameters=None, profile='shapemap')

class rnavigate.analysis.FragmapperReplicates(samples_1: list, samples_2: list, parameters=None, profile='shapemap')

class rnavigate.analysis.LogCompare(samples1, samples2, name1, name2, profile_kw, sequence=None, inherit=None)

class rnavigate.analysis.LowSS(sample, window=55, shapemap='shapemap', pairprob='pairprob', structure='ss')

class rnavigate.analysis.SequenceChecker(samples)

class rnavigate.analysis.WindowedAUROC(sample, window=81, profile='default_profile', structure='default_structure')
