Templates for input data sets • WES

To use the data and methods provided in the ‘WES’ R package, input data must match the formatting shown in the template_WES_data and tamplate_WES_standard_curves data objects. These objects are installed along with the ‘WES’ package and provide a starting point to calculate derivative quantities, adding spatial metadata, and launching the visualization application. Both template data sources are described in more detail below:

Template environmental sampling data

The primary data that is required for the methods in the ‘WES’ package is shown in the template_WES_data object. These data have 6 columns and give the time and location of each observation and a Cycle Threshold (Ct) value from a qPCR assay. The code below shows how to view the data template:

library(WES)
head(template_WES_data)
date location_id  lat   lon target_name ct_value
1 2020-03-07           1 23.8 90.37    target_0       NA
2 2020-03-07           1 23.8 90.37    target_0       NA
3 2020-03-07           1 23.8 90.37    target_0       NA
4 2020-03-07           1 23.8 90.37    target_0 29.94516
5 2020-03-07           1 23.8 90.37    target_1 31.61178
6 2020-03-07           1 23.8 90.37    target_1 32.22351

str(template_WES_data)
'data.frame':   5200 obs. of  6 variables:
     $ date       : IDate, format: "2020-03-07" "2020-03-07" ...
$ location_id: int  1 1 1 1 1 1 1 1 1 1 ...
$ lat        : num  23.8 23.8 23.8 23.8 23.8 23.8 23.8 23.8 23.8 23.8 ...
$ lon        : num  90.4 90.4 90.4 90.4 90.4 ...
$ target_name: chr  "target_0" "target_0" "target_0" "target_0" ...
$ ct_value   : num  NA NA NA 29.9 31.6 ...

More detailed descriptions of each variable in the template_WES_data object are shown in the table below:

Variable	Class	Description
date	Date, IDate	The date the environmental sample was collected. Format is YYY-MM-DD.
location_id	Integer, Character	A unique identifier for each sampling location.
lat	Numeric	The lattitude of the sampling location in Decimal Degrees (DD)
lon	Numeric	The longitude of the sampling location in Decimal Degrees (DD)
target_name	Character	The unique name of each gene target in qPCR assays
ct_value	Numeric	The Cycle Threshold (Ct) value returned by qPCR assays

And a the plot below shows the temporal distribution of the simulated data in the template_WES_data object:

ggplot(template_WES_data, aes(x=sample_date, y=ct_value, color=target_name)) +
     geom_point(alpha=0.5) +
     facet_grid(rows=vars(location_id), cols=vars(target_name)) +
     scale_x_date(date_breaks = "3 month", date_labels = "%b %Y") +
     theme_bw() +
     theme(legend.position = 'none') +
     labs(x = element_blank(),
          y = "Ct value",
          title = "Template of simulated environmental sampling data")

Template standard curve data

Standard curve data provide the results from standardized qPCR assays which relate Ct values to the number of gene copies for a particular target. With these data, the calc_n_copies() function will infer the number of gene copies for each of the observed Ct values in a data set formatted according to the template_WES_data object. To view the template standard curve data, see the code below:

library(WES)
head(template_WES_standard_curve)
  target_name n_copies ct_value
1    target_1    1e+01 31.29322
2    target_1    1e+02 27.73392
3    target_1    1e+03 23.48097
4    target_1    1e+04 18.91412
5    target_1    1e+05 16.68971
6    target_2    1e+01 32.34237

str(template_WES_standard_curve)
'data.frame':   15 obs. of  3 variables:
 $ target_name: chr  "target_1" "target_1" "target_1" "target_1" ...
 $ n_copies   : num  1e+01 1e+02 1e+03 1e+04 1e+05 1e+01 1e+02 1e+03 1e+04 1e+05 ...
 $ ct_value   : num  31.3 27.7 23.5 18.9 16.7 ...

More detailed descriptions of each variable in the template_WES_standard_curve object are shown in the table below:

Variable	Class	Description
target_name	Character	The unique name of each gene target in qPCR assays
n_copies	Numeric	The known number of gene copies for the observation in the standardized qPCR assay
ct_value	Numeric	The Cycle Threshold (Ct) value returned by qPCR assays

And the plot below shows the simulated standard curve data in the template_WES_standard_curve object:

ggplot(template_WES_standard_curve, aes(x=ct_value, y=log(n_copies), color=target_name)) +
     geom_point(alpha=0.5) +
     geom_smooth(method = "lm", se = FALSE, size = 0.5, linetype=2) +
     facet_wrap(vars(target_name)) +
     theme_bw() +
     theme(legend.position = 'none') +
     labs(x = "Ct value",
          y = "log(number gene copies)",
          title = "Template standard curve data for each target")