Generics for DotPlot plotting

A series of generics for controlling how plotting is performed in DotPlot panels. DotPlot subclasses can specialize one or more of them to modify the behavior of .generateOutput.

Generating plotting data

.generateDotPlotData(x, envir) sets up the data to use in the DotPlot plot. The following arguments are required:

x, an instance of a DotPlot subclass.
envir, the evaluation environment in which the data.frame is to be constructed. This can be assumed to have se, the SummarizedExperiment object containing the current dataset; possibly col_selected, if a multiple column selection is being transmitted to x; and possibly row_selected, if a multiple row selection is being transmitted to x.

A method for this generic should add a plot.data variable in envir containing a data.frame with columns named "X" and "Y", denoting the variables to show on the x- and y-axes respectively. It should return a list with commands, a character vector of commands that produce plot.data when evaluated in envir; and labels, a list of strings containing labels for the x-axis (named "X"), y-axis ("Y") and plot ("title").

Each row of the plot.data data.frame should correspond to one row or column in the SummarizedExperiment envir$se for RowDotPlots and ColumnDotPlots respectively. Note that, even if only a subset of rows/columns in the SummarizedExperiment are to be shown, there must always be one row in the data.frame per row/column of the SummarizedExperiment, and in the same order. Rows of the data.frame that correspond to rows/columns not being shown should be filled with NAs rather than omitted entirely. This is necessary for correct interactions with later methods that add other variables to plot.data.

Any internal variables that are generated by the commands in commands should be prefixed with . to avoid potential clashes with reserved variable names in the rest of the application.

This generic is called by .generateDotPlot (see below), which is in turn called by .generateOutput. The idea is that developers can specialize .generateDotPlotData to change the data source for a DotPlot subclass without needing to reimplement the entirety of .generateDotPlot.
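
The contract above can be sketched in base R. This is only an illustration under stated assumptions: sketchDotPlotData is a hypothetical plain function rather than an S4 method, and se is a plain list with a col_data data.frame standing in for a real SummarizedExperiment. The column names depth, detected and batch are likewise made up.

```r
# Hypothetical stand-in for a method body; not the iSEE implementation.
sketchDotPlotData <- function(envir) {
    commands <- c(
        # One row per column of the dataset, in the original order.
        "plot.data <- data.frame(X=se$col_data$depth, Y=se$col_data$detected, row.names=rownames(se$col_data));",
        # Internal helper variables are prefixed with '.'.
        ".keep <- se$col_data$batch == 'A';",
        # Entries that are not shown are set to NA rather than dropped.
        "plot.data[!.keep, c('X', 'Y')] <- NA;"
    )
    eval(parse(text=commands), envir=envir)
    list(
        commands=commands,
        labels=list(X="Depth", Y="Detected", title="Detected vs depth")
    )
}

# Mock dataset (an assumption; a real 'se' is a SummarizedExperiment).
data.env <- new.env()
data.env$se <- list(col_data=data.frame(
    depth=c(10, 20, 30, 40),
    detected=c(1, 2, 3, 4),
    batch=c("A", "B", "A", "B"),
    row.names=paste0("cell", 1:4)
))

data.out <- sketchDotPlotData(data.env)
print(data.env$plot.data)
```

After evaluation, plot.data still has four rows in the original order, with NAs in the rows for the batch "B" cells.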

Generating the ggplot object

.generateDotPlot(x, labels, envir) creates the plot to be shown in the interface. The following arguments are required:

x, an instance of a DotPlot subclass.
labels, a list of labels corresponding to the columns of plot.data. This is typically used to define axis or legend labels in the plot.
envir, the evaluation environment in which the ggplot object is to be constructed. This can be assumed to have plot.data, a data.frame of plotting data.

Note that se, row_selected and col_selected will still be present in envir, but it is simplest to only use information that has already been incorporated into plot.data where possible. This is because the order and number of rows in plot.data may have changed since .generateDotPlotData.

Methods for this generic should return a list with plot, a ggplot object; and commands, a character vector of commands to produce that object when evaluated inside envir. This plot will subsequently be the rendered output in .renderOutput. Note that envir should contain a copy of the plot object in a variable named dot.plot (see below for details).

Methods are expected to respond to the presence of various fields in the plot.data. The data.frame will contain, at the very least, the fields "X" and "Y" from .generateDotPlotData. Depending on the parameters of x, it may also have the following columns:

"ColorBy", the values of the covariate to use to color each point.
"ShapeBy", the values of the covariate to use for shaping each point. This is guaranteed to be categorical.
"SizeBy", the values of the covariate to use for sizing each point. This is guaranteed to be continuous.
"FacetRow", the values of the covariate to use to create row facets. This is guaranteed to be categorical.
"FacetColumn", the values of the covariate to use to create column facets. This is guaranteed to be categorical.
"SelectBy", a logical field indicating whether the point was included in a multiple selection (i.e., transmitted from another plot with x as the receiver). Note that if RowSelectionRestrict=TRUE or ColumnSelectionRestrict=TRUE (for RowDotPlots and ColumnDotPlots, respectively), plot.data will already have been subsetted to only retain TRUE values of this field.

envir may also contain the following variables:

plot.data.all, present when a multiple selection is transmitted to x and RowSelectionRestrict=TRUE or ColumnSelectionRestrict=TRUE (for RowDotPlots and ColumnDotPlots, respectively). This is a data.frame that contains all points prior to subsetting and is useful for defining the boundaries of the plot such that they do not change when the transmitted multiple selection changes.
plot.data.pre, present when downsampling is turned on. This is a data.frame that contains all points prior to downsampling (but after subsetting, if that was performed) and is again mainly used to fix the boundaries of the plot.
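
One way a method might respond to the optional columns and variables listed above is to assemble its ggplot2 command from whatever is present. The sketch below only builds the command string and does not evaluate it, so it runs without ggplot2. sketchPlotCommand is a hypothetical helper, not an iSEE function, and the choice of geom_point, facet_grid and coord_cartesian is an assumption made for illustration (continuous X and Y are assumed for the limits).

```r
# Hypothetical helper, not part of iSEE: assembles a ggplot2 command string
# from whatever optional columns and variables are present in 'envir'.
sketchPlotCommand <- function(envir) {
    fields <- colnames(envir$plot.data)

    # Map each optional column to its aesthetic only if it exists.
    mapping <- c("x=X", "y=Y")
    optional <- c(ColorBy="color", ShapeBy="shape", SizeBy="size")
    for (f in intersect(names(optional), fields)) {
        mapping <- c(mapping, sprintf("%s=%s", optional[[f]], f))
    }
    cmd <- sprintf("dot.plot <- ggplot(plot.data, aes(%s)) + geom_point()",
        paste(mapping, collapse=", "))

    # Facet only on the columns that are available.
    if (any(c("FacetRow", "FacetColumn") %in% fields)) {
        rows <- if ("FacetRow" %in% fields) "FacetRow" else "."
        cols <- if ("FacetColumn" %in% fields) "FacetColumn" else "."
        cmd <- sprintf("%s + facet_grid(%s ~ %s)", cmd, rows, cols)
    }

    # Take the limits from the most complete data available, so that they
    # do not move when the selection or downsampling changes.
    src <- "plot.data"
    for (candidate in c("plot.data.pre", "plot.data.all")) {
        if (exists(candidate, envir=envir, inherits=FALSE)) src <- candidate
    }
    paste0(cmd, " + coord_cartesian(xlim=range(", src, "$X, na.rm=TRUE), ",
        "ylim=range(", src, "$Y, na.rm=TRUE))")
}

plot.env <- new.env()
plot.env$plot.data <- data.frame(X=1:3, Y=c(2, 4, 6), ColorBy=c("a", "b", "a"))
cat(sketchPlotCommand(plot.env), "\n")
```

If plot.data.all is later added to the environment, the same call switches the limits to that data.frame without any other change.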

Developers may wish to use the .addMultiSelectionPlotCommands utility to draw brushes and lassos of x. Note that this refers to the brushes and lassos made on x itself, not those transmitted from another panel to x.

It would be very unwise for methods to alter the x-axis, y-axis or faceting values in plot.data. This will lead to unintuitive discrepancies between apparent visual selections for a brush/lasso and the actual multiple selection that is evaluated by downstream functions like .processMultiSelections.

In certain situations, a DotPlot subclass may be able to build off a ggplot generated by its parent class. This is easily done by exploiting the fact that methods for this generic are expected to store a copy of their ggplot object as a dot.plot variable in envir. A specialized method for the subclass can call callNextMethod() to populate envir with the initial dot.plot, and then construct and execute commands to add more ggplot2 layers as desired.
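
The callNextMethod() pattern can be illustrated with a toy S4 hierarchy. Everything here is a stand-in: ParentPanel, ChildPanel and sketchGenerateDotPlot are hypothetical names, and dot.plot is a character vector of layer names rather than a real ggplot object, so that the sketch runs with only the methods package. In a real method the commands would be ggplot2 calls.

```r
library(methods)

# Toy classes standing in for a DotPlot subclass and its own subclass.
setClass("ParentPanel", representation(title="character"))
setClass("ChildPanel", contains="ParentPanel")

# Hypothetical generic mirroring the (x, labels, envir) signature.
setGeneric("sketchGenerateDotPlot", function(x, labels, envir) {
    standardGeneric("sketchGenerateDotPlot")
})

setMethod("sketchGenerateDotPlot", "ParentPanel", function(x, labels, envir) {
    # Stand-in for something like: dot.plot <- ggplot(...) + geom_point()
    cmds <- "dot.plot <- c('points');"
    eval(parse(text=cmds), envir=envir)
    list(plot=envir$dot.plot, commands=cmds)
})

setMethod("sketchGenerateDotPlot", "ChildPanel", function(x, labels, envir) {
    # The parent method populates 'dot.plot' in 'envir'.
    parent <- callNextMethod()
    # Stand-in for something like: dot.plot <- dot.plot + geom_hline(...)
    extra <- "dot.plot <- c(dot.plot, 'hline');"
    eval(parse(text=extra), envir=envir)
    list(plot=envir$dot.plot, commands=c(parent$commands, extra))
})

child.env <- new.env()
child.out <- sketchGenerateDotPlot(new("ChildPanel", title="demo"),
    labels=list(), envir=child.env)
print(child.out$commands)
```

The returned commands contain the parent's commands followed by the child's, so evaluating them in a fresh environment reproduces the same dot.plot.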

This generic is called by .generateOutput for DotPlot subclasses. Again, the idea here is that developers can specialize .generateDotPlot to change the plot aesthetics without needing to reimplement the entirety of .generateOutput.

Prioritizing points

.prioritizeDotPlotData(x, envir) specifies the “priority” of points to be plotted, where high-priority points are plotted last so that they will not be masked by other points. The following arguments are required:

x, an instance of a DotPlot subclass.
envir, the evaluation environment in which the ggplot object is to be constructed. This can be assumed to have plot.data, a data.frame of plotting data.

Again, note that se, row_selected and col_selected will still be present in envir, but it is simplest to only use information that has already been incorporated into plot.data where possible. This is because the order and number of rows in plot.data may have changed since .generateDotPlotData.

Methods for this generic are expected to generate a .priority variable in envir, an ordered factor of length equal to nrow(plot.data) indicating the priority of each point. They may also generate a .rescaled variable, a named numeric vector containing the scaling factor to apply to the downsampling resolution for each level of .priority.

The method itself should return a list containing commands, a character vector of R commands required to generate these variables; and rescaled, a logical scalar indicating whether a .rescaled variable was produced.

Points assigned the highest level in .priority are regarded as having the highest visual importance. Such points will be shown on top of other points if there are overlaps on the plot, allowing developers to specify that, e.g., DE genes should be shown on top of non-DE genes. Scaling of the resolution enables developers to perform more aggressive downsampling for unimportant points.

Methods for this generic may also return NULL, in which case no special action is taken.
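
As a rough base R sketch of the expected outputs, the function below creates .priority and .rescaled in envir and returns the list described above. sketchPrioritize is a hypothetical plain function rather than an S4 method, the threshold on Y is arbitrary, and the scaling factors are illustrative values only.

```r
# Hypothetical stand-in for a method body; not the iSEE implementation.
sketchPrioritize <- function(envir) {
    commands <- c(
        # Ordered factor with one entry per row; the last level ranks highest.
        ".priority <- factor(ifelse(plot.data$Y > 1, 'high', 'low'), levels=c('low', 'high'), ordered=TRUE);",
        # One scaling factor per level of .priority (illustrative values).
        ".rescaled <- c(low=1, high=2);"
    )
    eval(parse(text=commands), envir=envir)
    list(commands=commands, rescaled=TRUE)
}

prio.env <- new.env()
prio.env$plot.data <- data.frame(X=c(1, 2, 3, 4), Y=c(0.5, 2, 0.1, 3))
prio.out <- sketchPrioritize(prio.env)

# Sorting by priority places the high-priority rows last.
print(prio.env$plot.data[order(prio.env$.priority), ])
```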

This generic is called by .generateDotPlot, which is in turn called by .generateOutput. Thus, developers of DotPlot subclasses can specialize this generic to change the point priority without needing to reimplement the entirety of .generateDotPlot.

Controlling the “None” color scale

In some cases, it is desirable to insert a default scale when ColorBy="None". This is useful for highlighting points in a manner that is integral to the nature of the plot, e.g., up- or down-regulated genes in an MA plot. We provide a few generics to help control which points are highlighted and how they are colored.

.colorByNoneDotPlotField(x) expects x, an instance of a DotPlot subclass, and returns a string containing the name of a column in plot.data to use for coloring in the ggplot mapping. This assumes that the relevant field was added to plot.data by a method for .generateDotPlotData.

.colorByNoneDotPlotScale(x) expects x, an instance of a DotPlot subclass, and returns a string containing a ggplot2 scale_color_* call, e.g., scale_color_manual. This string should end with a "+" operator as additional ggplot2 layers will be added by iSEE.

These generics are called by .generateDotPlot, which is in turn called by .generateOutput. Thus, developers of DotPlot subclasses can specialize them to change the default color scheme without needing to reimplement the entirety of .generateDotPlot.
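
The return values of the two generics might look like the following. The functions are hypothetical plain functions standing in for S4 methods, and the "Direction" column with levels up, down and none is an assumption about what a .generateDotPlotData method could have added to plot.data.

```r
# Hypothetical stand-ins for methods; names and values are illustrative.
sketchColorByNoneField <- function(x) {
    # Name of a column assumed to exist in plot.data.
    "Direction"
}

sketchColorByNoneScale <- function(x) {
    # A ggplot2 scale call as a string, ending with "+" so that
    # further layers can be appended after it.
    "scale_color_manual(values=c(up='red', down='blue', none='grey')) +"
}

color.field <- sketchColorByNoneField(NULL)
color.scale <- sketchColorByNoneScale(NULL)

# How the strings might slot into a longer command (illustration only).
cat(sprintf("ggplot(plot.data, aes(x=X, y=Y, color=%s)) + geom_point() +\n    %s\n    theme_bw()\n",
    color.field, color.scale))
```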

Author

Kevin “K-pop” Rue-Albrecht, Aaron “A-bomb” Lun