rnavigate.analysis package

Submodules

rnavigate.analysis.auroc module

Windowed AUROC assesses agreement between reactivities and base-pairing.

class rnavigate.analysis.auroc.WindowedAUROC(sample, window=81, profile='default_profile', structure='default_structure')

Bases: object

Compute and display windowed AUROC analysis.

This analysis computes the ROC curve over a sliding window for the performance of per-nucleotide data (usually SHAPE-MaP or DMS-MaP Normalized reactivity) in predicting the base-pairing status of each nucleotide. The area under this curve (AUROC) is displayed compared to the median across the RNA. Below, an arc plot displays the secondary structure and per-nucleotide profile.

AUROC values should range from 0.5 (no predictive power) to 1.0 (perfect predictive power). A value near 0.5 indicates that the reactivity profile does not fit the structure prediction well. Such regions are good candidates for further investigation with ensemble deconvolution.
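The windowed calculation described above can be sketched with NumPy alone. This is an illustrative sketch, not RNAvigate's implementation: the function names `auroc` and `windowed_auroc` are hypothetical, the AUROC is computed with the rank-based (Mann-Whitney) formulation, and it assumes that unpaired nucleotides are the "positive" class expected to have higher reactivity.

```python
import numpy as np


def auroc(reactivity, is_paired):
    """AUROC for reactivity separating unpaired (high) from paired (low)."""
    reactivity = np.asarray(reactivity, dtype=float)
    is_paired = np.asarray(is_paired, dtype=bool)
    valid = ~np.isnan(reactivity)
    unpaired = reactivity[valid & ~is_paired]
    paired = reactivity[valid & is_paired]
    if len(unpaired) == 0 or len(paired) == 0:
        return np.nan  # undefined when only one class is present
    # Fraction of (unpaired, paired) pairs ranked correctly; ties count half.
    greater = (unpaired[:, None] > paired[None, :]).sum()
    ties = (unpaired[:, None] == paired[None, :]).sum()
    return (greater + 0.5 * ties) / (len(unpaired) * len(paired))


def windowed_auroc(reactivity, is_paired, window=81):
    """AUROC over centered sliding windows, padded with np.nan at the ends."""
    reactivity = np.asarray(reactivity, dtype=float)
    is_paired = np.asarray(is_paired, dtype=bool)
    n = len(reactivity)
    pad = window // 2
    result = np.full(n, np.nan)
    for center in range(pad, n - pad):
        sl = slice(center - pad, center + pad + 1)
        result[center] = auroc(reactivity[sl], is_paired[sl])
    return result
```

The median of the non-NaN values (for example with `np.nanmedian`) would then play the role of the median AUROC against which each window is compared.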

References

Lan, T.C.T., Allan, M.F., Malsick, L.E. et al. Secondary structural ensembles of the SARS-CoV-2 RNA genome in infected cells. Nat Commun 13, 1128 (2022). https://doi.org/10.1038/s41467-022-28603-2

Methods

__init__: Computes the AUROC array and AUROC median.

plot_auroc: Displays the AUROC analysis over the given region. Returns Plot object.

Attributes

sample : rnavigate.Sample

sample to retrieve profile and secondary structure

structure : str

Data keyword of sample pointing to secondary structure, e.g. sample.data[structure]

profile : str

Data keyword of sample pointing to profile, e.g. sample.data[profile]

sequence : the sequence string of sample.data[structure]

window : the size of the windows

nt_length : the length of the sequence string

auroc : the AUROC numpy array, length = nt_length, padded with np.nan

median_auroc : the median of the auroc array

plot_auroc(region=None)

Plot the result of the windowed AUROC analysis, with arc plot of structure and reactivity profile.

Args:
region (list of int: length 2, optional): Start and end nucleotide positions to plot. Defaults to [1, RNA length].

rnavigate.analysis.check_sequence module

SequenceChecker analysis used to inspect sequence differences.

Given a list of samples, we can inspect which data keywords belong to the samples, which sequences match up perfectly, and inspect the differences between sequences.

class rnavigate.analysis.check_sequence.SequenceChecker(samples)

Bases: object

Check the sequences stored in a list of samples.

Attributes

samples : list

samples in which to check sequences

sequences : list

all unique sequence strings stored in the list of samples. These are converted to an all-uppercase RNA alphabet.

keywords : list

all unique data keywords stored in the list of samples.

which_sequences : pandas.DataFrame

each row is a sample, keyword, and index of self.sequences
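To illustrate how unique sequences and integer sequence IDs relate, here is a small sketch. It is not the SequenceChecker code: `to_rna` and `index_unique_sequences` are hypothetical helpers, the input is a plain dictionary rather than rnavigate.Sample objects, and "uppercase RNA alphabet" is assumed to mean uppercasing plus replacing T with U.

```python
def to_rna(sequence):
    """Convert a sequence to an all-uppercase RNA alphabet (assumed: T -> U)."""
    return sequence.upper().replace("T", "U")


def index_unique_sequences(named_sequences):
    """Map each (sample, keyword) pair to the index of its unique sequence.

    named_sequences: dict of (sample name, data keyword) -> sequence string
    Returns the list of unique sequences and a dict of sequence IDs.
    """
    unique = []
    ids = {}
    for key, sequence in named_sequences.items():
        rna = to_rna(sequence)
        if rna not in unique:
            unique.append(rna)
        ids[key] = unique.index(rna)
    return unique, ids


example = {
    ("sample1", "shapemap"): "augc",
    ("sample1", "ss"): "ATGC",
    ("sample2", "shapemap"): "AUGCA",
}
unique_sequences, sequence_ids = index_unique_sequences(example)
```

In this example the first two entries share one sequence ID because they differ only in case and alphabet, while the third gets its own ID.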

get_keywords()

A list of all unique data keywords across samples.

get_sequences()

A list of all unique sequences (uppercase RNA) across samples.

get_which_sequences()

A DataFrame of sequence IDs (integers) for each data keyword.

print_alignments(print_format='long', which='all')

Print alignments in the given format for sequence IDs provided.

Parameters

print_format : string, defaults to “long”

What format to print the alignments in:

“cigar” prints the CIGAR string

“short” prints the numbers of mismatches and indels

“long” prints the location and nucleotide identity of all mismatches, insertions and deletions.

which : tuple of two integers, defaults to “all” (every pairwise comparison)

two sequence IDs to compare.

print_mulitple_sequence_alignment(base_sequence)

Print the multiple sequence alignment with nice formatting.

Parameters

base_sequence : string

a sequence string that represents the longest common sequence. Usually, this is the return value from rnav.data.set_multiple_sequence_alignment()

print_which_sequences()

Print sequence ID (integer) for each data keyword and sample.

reset()

Reset keywords and sequences from sample list in case of changes.

write_fasta(filename, which='all')

Write all unique sequences to a fasta file.

This is very useful for using external multiple sequence aligners such as ClustalOmega:

1) go to https://www.ebi.ac.uk/Tools/msa/clustalo/

2) upload new fasta file

3) under STEP 2 output format, select Pearson/FASTA

4) click ‘Submit’

5) wait for your alignment to finish

6) download the alignment fasta file

7) use rnav.data.set_multiple_sequence_alignment()

Parameters

filename : string

path to a new file to which fasta entries are written

which : list of integers, defaults to “all” (every sequence)

Sequence IDs to write to file.

rnavigate.analysis.deltashape module

DeltaSHAPE for detecting meaningful changes in SHAPE reactivity between two samples.

Parameters are optimized for detecting in-cell vs. cell-free protein protections and enhancements, but are useful for identifying any meaningful differences.

Copyright Matthew J. Smola 2015

Largely rewritten for RNAvigate by Patrick Irving 2023

class rnavigate.analysis.deltashape.DeltaSHAPE(sample1, sample2, profile='shapemap', smoothing_window=3, zf_coeff=1.96, ss_thresh=1, site_window=3, site_nts=2)

Bases: Sample

Detects meaningful differences in chemical probing reactivity

References

doi:10.1021/acs.biochem.5b00977

Algorithm

  1. Extract SHAPE-MaP sequence, normalized profile, and normalized standard error from the given samples.

  2. Calculate smoothed profiles (mean) and propagate standard errors over rolling windows.

  3. Subtract raw and smoothed normalized profiles and propagate errors.

  4. Calculate Z-factors for smoothed data. This is the magnitude of the difference relative to the standard error.

  5. Calculate Z-scores for smoothed data. This is the magnitude of the difference in standard deviations from the mean difference.

  6. Call sites. Called sites must have a minimum number of nucleotides (site_nts) that pass the Z-factor and Z-score thresholds per window (site_window).

Smoothing window size, Z factor threshold, Z score threshold, site-calling window size and minimum nucleotides per site can be specified.
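The steps above can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the DeltaSHAPE implementation: `smooth` and `call_sites` are hypothetical names, errors are assumed to propagate in quadrature, the Z-factor test is expressed as "absolute difference greater than zf_coeff propagated standard errors", and a nucleotide is reported when it passes both tests inside a window containing at least site_nts passing nucleotides.

```python
import numpy as np


def smooth(values, errors, window=3):
    """Rolling mean and propagated standard error over centered windows."""
    values = np.asarray(values, dtype=float)
    errors = np.asarray(errors, dtype=float)
    n = len(values)
    pad = window // 2
    smooth_values = np.full(n, np.nan)
    smooth_errors = np.full(n, np.nan)
    for i in range(pad, n - pad):
        sl = slice(i - pad, i + pad + 1)
        smooth_values[i] = np.mean(values[sl])
        # Error of a mean of independent values (assumed quadrature sum).
        smooth_errors[i] = np.sqrt(np.sum(errors[sl] ** 2)) / window
    return smooth_values, smooth_errors


def call_sites(profile1, err1, profile2, err2, smoothing_window=3,
               zf_coeff=1.96, ss_thresh=1, site_window=3, site_nts=2):
    """Return the smoothed difference and a boolean mask of called sites."""
    s1, e1 = smooth(profile1, err1, smoothing_window)
    s2, e2 = smooth(profile2, err2, smoothing_window)
    diff = s1 - s2
    diff_err = np.sqrt(e1 ** 2 + e2 ** 2)
    # Z-factor test: difference exceeds zf_coeff standard errors.
    passes_zfactor = np.abs(diff) > zf_coeff * diff_err
    # Z-score test: difference is ss_thresh standard deviations from the mean.
    z_score = (diff - np.nanmean(diff)) / np.nanstd(diff)
    passes_zscore = np.abs(z_score) >= ss_thresh
    passing = passes_zfactor & passes_zscore
    sites = np.zeros(len(diff), dtype=bool)
    for start in range(len(diff) - site_window + 1):
        sl = slice(start, start + site_window)
        if passing[sl].sum() >= site_nts:
            sites[sl] |= passing[sl]
    return diff, sites
```

Positive differences in this sketch mean higher reactivity in the first profile; whether that corresponds to a protection or an enhancement depends on which sample is passed first.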

calculate_deltashape(smoothing_window=3, zf_coeff=1.96, ss_thresh=1, site_window=2, site_nts=3)

Calculate or recalculate deltaSHAPE profile and called sites

Parameters

smoothing_window : int, default=3

Size of windows for data smoothing

zf_coeff : float, default=1.96

Sites must have a difference of more than zf_coeff standard errors

ss_thresh : int, default=1

Sites must have a difference that is ss_thresh standard deviations from the mean difference

site_window : int, default=3

Number of nucleotides to include when calling sites

site_nts : int, default=2

Number of nts within site_window that must pass thresholds

plot(region='all')

Plot the deltaSHAPE result

Parameters

region : list of 2 integers, default="all"

start and end positions to plot

Returns

rnav.plots.Profile

The plot object

class rnavigate.analysis.deltashape.DeltaSHAPEProfile(input_data, metric='Smooth_diff', metric_defaults=None, sequence=None, name=None, **kwargs)

Bases: Profile

Profile data class for performing deltaSHAPE analysis

calculate_deltashape(smoothing_window=3, zf_coeff=1.96, ss_thresh=1, site_window=3, site_nts=2)

Calculate the deltaSHAPE profile metrics

Parameters

smoothing_window : int, default=3

Size of windows for data smoothing

zf_coeff : float, default=1.96

Sites must have a difference of more than zf_coeff standard errors

ss_thresh : int, default=1

Sites must have a difference that is ss_thresh standard deviations from the mean difference

site_window : int, default=3

Number of nucleotides to include when calling sites

site_nts : int, default=2

Number of nts within site_window that must pass thresholds

get_enhancements_annotation()

Get an annotations object for the significant enhancements

get_protections_annotation()

Get an annotations object for the significant protections

rnavigate.analysis.fragmapper module

Fragmapper analysis tools.

Description: FragMapper compares reactivity profile differences between SHAPE-MaP profiles. The intended application of Fragmapper is to detect fragment or ligand crosslinking sites in RNA.

class rnavigate.analysis.fragmapper.FragMaP(input_data, parameters, metric='Delta_zscore', metric_defaults=None, read_table_kw=None, sequence=None, name=None)

Bases: Profile

get_annotation()
get_dataframe(profile1, profile2, mutation_rate_threshold, depth_threshold, delta_rate_threshold, zscore_threshold, zscore_min_threshold)
property recreation_kwargs

A dictionary of keyword arguments to pass when recreating the object.

class rnavigate.analysis.fragmapper.Fragmapper(sample1, sample2, parameters=None, profile='shapemap')

Bases: Sample

plot_scatter(column='Modified_rate')

Generates scatter plots useful for fragmapper quality control.

Args:
column (str, optional): Dataframe column containing data to plot (must be available for the sample and control). Defaults to “Modified_rate”.

Returns:
(matplotlib figure, matplotlib axis)

Scatter plot with control values on the x-axis, sample values on the y-axis, and each point representing a nucleotide not filtered out in the fragmapper pipeline.

update_annotation()

class rnavigate.analysis.fragmapper.FragmapperReplicates(samples_1: list, samples_2: list, parameters=None, profile='shapemap')

Bases: Sample

average_columns(df: DataFrame, avg_columns: list[str] = ['Modified_mutations', 'Modified_effective_depth', 'Modified_rate'], sem_column: list[str] = ['Modified_rate'])
merge_samples(samples: list, profile: str = 'shapemap', suffix: str = 'rep', columns: list = ['Nucleotide', 'Sequence', 'Modified_mutations', 'Modified_effective_depth', 'Modified_rate'], exceptions: list = ['Nucleotide', 'Sequence'])
plot_scatter(column: str = 'Modified_rate', error: str = 'Std_err', label_size: int = None, ylabel: str = None, xlabel: str = None)

Generates scatter plots useful for fragmapper quality control.

Args:
column (str, optional): Dataframe column containing data to plot (must be available for the sample and control). Defaults to “Modified_rate”.

Returns:
(matplotlib figure, matplotlib axis)

Scatter plot with control values on the x-axis, sample values on the y-axis, and each point representing a nucleotide not filtered out in the fragmapper pipeline.

update_annotation()

rnavigate.analysis.logcompare module

LogCompare compares reactivity profiles for significant differences.

This analysis requires replicates.

class rnavigate.analysis.logcompare.LogCompare(samples1, samples2, name1, name2, profile_kw, sequence=None, inherit=None)

Bases: Sample

Compares 2 experimental samples, given replicates of each sample.

Algorithm

  1. Calculate the log10(modified/untreated) rate for each replicate.

  2. Scale these values to minimize the median of the absolute difference between samples.

  3. Calculate the standard error in these values for each replicate.

  4. Calculate the difference between samples.

  5. Calculate z-scores between samples.

  6. Plot the results in two panels: (1) the scaled log10(modified/untreated) rate for each sample with error bars, and (2) the difference between samples, colored by z-score.
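The numerical steps of this algorithm can be sketched as below. This is an illustration only and makes several assumptions that the text above does not confirm: all function names are hypothetical, scaling is taken to be an additive offset in log space found by a simple grid search, the standard error is the sample standard deviation across replicates divided by the square root of the replicate count, and the z-score is the difference divided by the combined standard error.

```python
import numpy as np


def log_rate(modified, untreated):
    """log10(modified/untreated); non-positive ratios become nan or -inf."""
    modified = np.asarray(modified, dtype=float)
    untreated = np.asarray(untreated, dtype=float)
    with np.errstate(divide="ignore", invalid="ignore"):
        return np.log10(modified / untreated)


def rescale(profile, target_profile, offsets=None):
    """Shift profile by the offset minimizing the median absolute difference."""
    if offsets is None:
        offsets = np.linspace(-1, 1, 201)  # assumed search range
    costs = [np.nanmedian(np.abs(profile + offset - target_profile))
             for offset in offsets]
    return profile + offsets[int(np.argmin(costs))]


def summarize(replicates):
    """Mean and standard error across replicate log profiles."""
    stacked = np.vstack(replicates)
    mean = np.nanmean(stacked, axis=0)
    stderr = np.nanstd(stacked, axis=0, ddof=1) / np.sqrt(stacked.shape[0])
    return mean, stderr


def compare(group1, group2):
    """Difference between group means and a per-nucleotide z-score."""
    mean1, stderr1 = summarize(group1)
    mean2, stderr2 = summarize(group2)
    diff = mean2 - mean1
    z_score = diff / np.sqrt(stderr1 ** 2 + stderr2 ** 2)
    return diff, z_score
```

Each replicate would be rescaled toward a common target before calling `compare`, so that global offsets between experiments do not register as differences.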

Methods

__init__: computes log10(modified/untreated) rates, rescales the data, then calls make_plot()

get_profile_sequence: gets log10(m/u) rate and sequence from sample

rescale: rescales a profile to minimize difference to another profile

load_replicates: calculates average and standard error of replicates

make_plots: displays the two panels described above.

Attributes

data : str

a key of sample.data to retrieve per-nucleotide data

groups : dict

a dictionary with keys 1 and 2, each containing: self.data (averaged scaled log10(m/u)), “stderr” (standard errors), “stacked” (2d array of scaled log10(m/u) per replicate), “seq” (the sequence string)

class rnavigate.analysis.logcompare.LogProfile(input_data, metric='mean_diff', metric_defaults=None, sequence=None, **kwargs)

Bases: Profile

A class for log10(Modified_rate/Untreated_rate) profiles.

calc_profile(profile)

Calculate log10(Modified_rate/Untreated_rate) for the given sample/profile.

Args:

sample (rnavigate.Sample): an rnavigate sample

Returns:

np.array: log profile

load_replicates(profiles)

Calculates log profiles, average, and standard error for a group of replicates.

Args:

*profiles (list of rnavigate.Sample): replicates to load

rescale(profile, target_profile)

Scales profile to minimize difference to target_profile.

Args:

profile (np.array): log10 profile to scale

target_profile (np.array): 2nd log10 profile

Returns:

np.array: scaled profile

rnavigate.analysis.lowss module

Performs low SHAPE, low Shannon entropy analysis

Citation:

Siegfried, N., Busan, S., Rice, G. et al. RNA motif discovery by SHAPE and mutational profiling (SHAPE-MaP). Nat Methods 11, 959-965 (2014). https://doi.org/10.1038/nmeth.3029

Typical usage example:

import rnavigate as rnav
my_sample = rnav.Sample(
    sample="example sample",
    shapemap="my_shape_profile.txt",
    pairprob="pairing_probabilities.txt",
    ss="MFE_structure.ct"
)
lowss_sample = rnav.analysis.LowSS(my_sample)
plot = lowss_sample.plot_lowss()
plot.save("lowss_figure.svg")

class rnavigate.analysis.lowss.LowSS(sample, window=55, shapemap='shapemap', pairprob='pairprob', structure='ss')

Bases: Sample

Creates a new RNAvigate Sample which computes and displays Low SHAPE, low Shannon entropy regions (LowSS) given a sample containing SHAPE reactivities, pairing probabilities, and MFE structure.

Methods

__init__: performs the analysis

plot_lowss: displays the result and returns plot object

Attributes

sample : str

the new label for this Sample’s data on plots

parent : rnavigate.Sample

the sample from which data is retrieved

window : int

size of the windows, must be odd

median_shape : float

global median SHAPE reactivity

median_entropy : float

global median Shannon entropy

data : dictionary

dictionary of data keyword: Data objects. Keys are:

“structure” (rnav.data.SecondaryStructure): copy of provided MFE structure

“shapemap” (rnav.data.SHAPEMaP): copy of provided SHAPE-MaP data aligned to “structure”

“pairprob” (rnav.data.PairingProbability): copy of pairing probabilities aligned to “structure”

“entropies” (rnav.data.Profile): Profile of Shannon entropies calculated from “pairprob”

“lowSS” (rnav.data.Annotations): annotations defining low SHAPE, low Shannon entropy regions

plot_lowss(region=None, colorbars=True)

Visualize LowSS analysis over the given region.

Parameters

region : integer or list of 2 integers, default=None (entire sequence)

If list: lowSS start and end positions to plot. If integer: region number, +/- 150 nts are shown.

colorbars : bool, default=True

whether to plot colorbars for pairing probability

Returns

rnavigate.plots.AP

LowSS visualization

reset_lowss(maximum_shape=None, maximum_entropy=0.08)

Generates an annotation of lowSS regions. Stored as self.lowSS

Parameters

maximum_shape : float, default=None (median SHAPE reactivity)

maximum normalized SHAPE reactivity to be called lowSS.

maximum_entropy : float, default=0.08

maximum Shannon entropy to be called lowSS.
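How these two thresholds combine with the window size can be sketched as follows. This is not the LowSS implementation: the function names are hypothetical, Shannon entropy is assumed to be computed per nucleotide as the sum of -p*log10(p) over its pairing probabilities, windows are centered, and a position is flagged when both windowed medians fall strictly below their thresholds.

```python
import numpy as np


def shannon_entropy(pair_probs):
    """Per-nucleotide entropy from an (n, n) matrix of pairing probabilities."""
    p = np.asarray(pair_probs, dtype=float)
    with np.errstate(divide="ignore", invalid="ignore"):
        # Zero probabilities contribute nothing to the sum.
        terms = np.where(p > 0, -p * np.log10(p), 0.0)
    return terms.sum(axis=1)


def windowed_median(values, window=55):
    """Median over centered windows, padded with np.nan at the ends."""
    if window % 2 == 0:
        raise ValueError("window must be odd")
    values = np.asarray(values, dtype=float)
    n = len(values)
    pad = window // 2
    result = np.full(n, np.nan)
    for i in range(pad, n - pad):
        result[i] = np.nanmedian(values[i - pad:i + pad + 1])
    return result


def low_ss_mask(shape, entropy, window=55, maximum_shape=None,
                maximum_entropy=0.08):
    """Boolean mask of positions with low windowed SHAPE and low entropy."""
    shape = np.asarray(shape, dtype=float)
    if maximum_shape is None:
        maximum_shape = np.nanmedian(shape)  # global median SHAPE reactivity
    low_shape = windowed_median(shape, window) < maximum_shape
    low_entropy = windowed_median(entropy, window) < maximum_entropy
    return low_shape & low_entropy
```

Runs of consecutive True values in the mask would correspond to candidate lowSS regions; any merging or expansion of those runs is left out of this sketch.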

reset_window(window=None)

Resets the window size and recalculates windowed SHAPE reactivities, Shannon entropies, and lowSS region annotations.

Parameters

window : int, default=None (self.window)

window size for calculating median SHAPE and Shannon entropy, must be odd

Module contents

The classes listed below are also importable directly from rnavigate.analysis. Each is documented under its submodule above.

class rnavigate.analysis.DeltaSHAPE(sample1, sample2, profile='shapemap', smoothing_window=3, zf_coeff=1.96, ss_thresh=1, site_window=3, site_nts=2)

class rnavigate.analysis.DeltaSHAPEProfile(input_data, metric='Smooth_diff', metric_defaults=None, sequence=None, name=None, **kwargs)

class rnavigate.analysis.FragMaP(input_data, parameters, metric='Delta_zscore', metric_defaults=None, read_table_kw=None, sequence=None, name=None)

class rnavigate.analysis.Fragmapper(sample1, sample2, parameters=None, profile='shapemap')

class rnavigate.analysis.FragmapperReplicates(samples_1: list, samples_2: list, parameters=None, profile='shapemap')

class rnavigate.analysis.LogCompare(samples1, samples2, name1, name2, profile_kw, sequence=None, inherit=None)

class rnavigate.analysis.LowSS(sample, window=55, shapemap='shapemap', pairprob='pairprob', structure='ss')

class rnavigate.analysis.SequenceChecker(samples)

class rnavigate.analysis.WindowedAUROC(sample, window=81, profile='default_profile', structure='default_structure')
