Linear regression plots

Linear regression plots are useful for assessing reproducibility between replicates or for quickly quantifying the difference between structural states. RNAvigate creates scatter plots of per-nucleotide values, with one sample on the x-axis and another sample on the y-axis, and displays the slope and R^2 value for each comparison. Nucleotides can be colored by sequence or by base-pairing status. A kernel density estimate (KDE) of the paired and unpaired reactivity distributions may also be plotted for each sample.
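The slope and R^2 reported for a pair of samples can be illustrated with a short NumPy sketch. This is not RNAvigate's implementation: it assumes an ordinary least-squares fit with an intercept and the squared Pearson correlation, drops positions where either sample has no data (NaN), and uses made-up reactivities and a hypothetical helper name, `linreg_stats`.

```python
import numpy as np


def linreg_stats(x, y):
    """Return (slope, intercept, pearson_r2) for paired per-nucleotide values.

    Hypothetical helper for illustration only (not part of RNAvigate).
    Positions where either value is NaN are dropped before fitting.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    keep = ~(np.isnan(x) | np.isnan(y))  # assumption: missing data are skipped
    x, y = x[keep], y[keep]
    # least-squares fit of y = slope * x + intercept
    slope, intercept = np.polyfit(x, y, 1)
    # squared Pearson correlation coefficient
    r2 = np.corrcoef(x, y)[0, 1] ** 2
    return slope, intercept, r2


# made-up reactivities for two replicates; position 3 has no data in rep1
rep1 = [0.10, 0.50, np.nan, 1.20, 0.05, 0.80]
rep2 = [0.12, 0.45, 0.30, 1.10, 0.07, 0.90]
slope, intercept, r2 = linreg_stats(rep1, rep2)
print(f"slope={slope:.2f}, R^2={r2:.2f}")
```

For well-matched replicates both values should approach 1; a high R^2 with a slope far from 1 points to a global scaling difference between samples rather than position-to-position noise.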

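import rnavigate as rnav
# example Samples provided by rnavigate.examples
from rnavigate.examples import rnasep_1, rnasep_2, rnasep_3, rnasep_4

# compare the "shapemap" profiles of every pair of samples,
# performing the regression on log10-transformed values
plot = rnav.plot_linreg(
    samples=[rnasep_1, rnasep_2, rnasep_3, rnasep_4],
    profile="shapemap",
    scale="log",
)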

[Figures: linear regression plots generated by the cell above]
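The example above passes `scale="log"`, which the docstring describes as performing the regression on log10(values). The sketch below shows what that means for two profiles. It assumes (this is not stated in the source) that positions with zero, negative, or missing values are excluded, since log10 is undefined there; `log_linreg` is a hypothetical helper, not an RNAvigate function.

```python
import numpy as np


def log_linreg(x, y):
    """Slope, intercept, and Pearson R^2 computed on log10-transformed values.

    Hypothetical helper for illustration only. Assumption: positions where
    either value is <= 0 or NaN are dropped, because log10 is undefined there.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    keep = (x > 0) & (y > 0)  # comparisons with NaN are False, so NaNs drop out too
    lx, ly = np.log10(x[keep]), np.log10(y[keep])
    slope, intercept = np.polyfit(lx, ly, 1)
    r2 = np.corrcoef(lx, ly)[0, 1] ** 2
    return slope, intercept, r2


# made-up values: sample B is uniformly 2x sample A; the negative value is dropped
a = [0.01, 0.1, 1.0, -0.02]
b = [0.02, 0.2, 2.0, 0.5]
print(log_linreg(a, b))
```

On a log scale a uniform fold-change (here y = 2x) appears as a shifted intercept of log10(2) with a slope of 1, so samples with different overall signal levels can still line up along a slope-1 trend.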
help(rnav.plot_linreg)

Help on function plot_linreg in module rnavigate.plotting_functions:

plot_linreg(samples, profile, sequence=None, structure=None, annotations=None, labels=None, kde=False, scale='linear', regression='pearson', colors='sequence', column=None, region='all', colorbars=True, plot_kwargs=None)
    Performs linear regression analysis and generates scatter plots of all
    sample-to-sample profile vs. profile comparisons. Colors nucleotides by
    identity or base-pairing status.

    Parameters
    ----------
    samples : list of rnavigate Samples
        samples used to retrieve data
    profile : data keyword string or data object
        per-nucleotide data on which to perform linear regression.
        All data are mapped to the sequence of the profile data from the
        first sample before plotting, unless sequence is supplied.
    sequence : data keyword str, data obj, or sequence str, defaults to None
        a sequence to which all profiles are aligned.
        If a data keyword, uses data from the first sample.
    structure : data keyword string or data object, defaults to None
        Structure used for coloring if colors argument is "structure"
    annotations : list of data keyword strings or data objects, defaults to []
        Annotations used for coloring if colors argument is "annotations"
    labels : list of strings, defaults to sample.sample for each sample
        list of labels to be used in plot legends
    kde : bool, defaults to False
        whether to plot kde (density) instead of a scatter plot
    scale : "linear" or "log", defaults to "linear"
        "linear" performs regression on raw values, displays linear units
        "log" performs regression on log10(values), displays log10 units
    regression : "pearson" or "spearman", defaults to "pearson"
        "pearson" calculates Pearson R-squared (standard)
        "spearman" calculates Spearman R-squared (rank-order)
    colors : string or list of colors, defaults to "sequence"
        Values can be: None (don't plot), "sequence" (color by nucleotide identity),
        "position" (position in sequence), "annotations" (sequence annotations),
        "profile" (per-nucleotide data from profile argument),
        "structure" (base-pairing status), a single matplotlib color for all positions,
        or an array of one color per position which matches the structure length.
    column : string, defaults to profile.metric
        column name of values from profile to use in regression
    region : list of 2 integers, defaults to [1, length of sequence]
        start and end nucleotide positions to include. 1-indexed, inclusive
    colorbars : bool, defaults to ``True``
        Whether to plot colorbars for scatter plot colors
    plot_kwargs : dict, defaults to {}
        Keyword-arguments passed to matplotlib.pyplot.subplots

    Returns
    -------
    rnavigate.plots.LinReg
        object containing matplotlib figure and axes with additional plotting and
        file saving methods
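The `regression` and `region` options can be sketched in the same way. Spearman's coefficient is the Pearson coefficient computed on ranks, so it rewards any monotonic relationship rather than only a linear one; squaring it gives the rank-order R-squared the docstring describes. The `region` handling below follows the docstring's 1-indexed, inclusive convention. The helper names `average_ranks` and `regression_r2` are hypothetical, and this is not RNAvigate's code; its internals are not shown here.

```python
import numpy as np


def average_ranks(values):
    """1-based ranks; tied values all receive the mean of their ranks."""
    values = np.asarray(values, dtype=float)
    order = np.argsort(values, kind="stable")
    ranks = np.empty(len(values), dtype=float)
    ranks[order] = np.arange(1, len(values) + 1)
    for v in np.unique(values):
        tied = values == v
        ranks[tied] = ranks[tied].mean()
    return ranks


def regression_r2(x, y, regression="pearson", region=None):
    """Squared Pearson or Spearman correlation over an optional region.

    region is (start, end), 1-indexed and inclusive; None means all positions.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if region is not None:
        start, end = region
        # convert 1-indexed inclusive bounds into a Python slice
        x, y = x[start - 1:end], y[start - 1:end]
    keep = ~(np.isnan(x) | np.isnan(y))  # drop positions missing in either sample
    x, y = x[keep], y[keep]
    if regression == "spearman":
        x, y = average_ranks(x), average_ranks(y)
    elif regression != "pearson":
        raise ValueError('regression must be "pearson" or "spearman"')
    return np.corrcoef(x, y)[0, 1] ** 2


# made-up values with a monotonic but strongly non-linear relationship
x = [1, 2, 3, 4, 5]
y = [1, 4, 9, 16, 1000]
print(regression_r2(x, y, "pearson"))      # well below 1: not a straight line
print(regression_r2(x, y, "spearman"))     # ~1: the ranks agree perfectly
print(regression_r2(x, y, region=(1, 3)))  # Pearson over positions 1-3 only
```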