## Time Course Expression Analysis

# Introduction

This tool performs time-course expression analysis of count data from RNA-seq experiments. Based on the maSigPro program, it detects genomic features (e.g. genes) with significant temporal expression changes and significant differences between experimental groups. The software package **maSigPro**, which belongs to the Bioconductor project, implements a two-step regression strategy to find genes with significant expression profile differences in time-course RNA-seq experiments.

Please cite maSigPro as:

Conesa A, Nueda MJ (2018). "maSigPro: Significant Gene Expression Profile Differences in Time Course Gene Expression Data." R package version 1.52.0, http://bioinfo.cipf.es/.

**Figure 1:** Time Course Expression Interface

## Expression Data

The time course expression analysis application expects gene expression levels in the form of a count table. In OmicsBox, count tables can be generated via the **Create Count Table** application.

Count tables can also be imported from a text file. Go to **File → Load → Load Count Table** (Figure 2) and select your .txt file containing the count table.

**Figure 2:** Count Table File

# Run Analysis

Go to **Transcriptomics → Run Differential Expression Analysis** and choose the “Time Course Expression Analysis” option. The parameters are divided into three sections: Preprocessing Data (Figure 3), Experimental Design (Figure 4), and Analysis Options (Figure 5).

## Preprocessing Data Page

**Filter low count genes:**

- **CPM Filter:** Establish a filter to exclude genes with low counts across libraries, as those genes may interfere with the subsequent statistical approximations. Filtering is performed on a count-per-million (CPM) basis to account for differences in library size between samples (e.g. a CPM of 1 corresponds to a count of 6 in a sample with 6 million reads).
- **Samples reaching CPM Filter:** Set the minimum number of samples in which the gene's CPM must be above the filter level for the gene to be considered expressed. For example, if this value is set to 5, the gene's CPM has to be above the given CPM in at least 5 samples. The number of samples in the smallest group is usually taken (e.g. in an experiment with two replicates per condition or group, a gene should be expressed in at least two samples). Set the value to 0 if no filter is desired.
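The two filter settings can be illustrated with a minimal sketch. It assumes that "above the filter level" means strictly greater than the CPM threshold and that library size is the total read count of each sample; the function names and the example data are hypothetical and do not come from the tool.

```python
def cpm(counts, library_sizes):
    """Convert one gene's raw counts to counts per million, sample by sample."""
    return [c * 1_000_000 / n for c, n in zip(counts, library_sizes)]


def passes_cpm_filter(counts, library_sizes, min_cpm=1.0, min_samples=2):
    """True if the gene's CPM is above min_cpm in at least min_samples samples."""
    if min_samples == 0:  # a value of 0 disables the filter
        return True
    expressed = sum(1 for value in cpm(counts, library_sizes) if value > min_cpm)
    return expressed >= min_samples


# Hypothetical data: three genes measured in four libraries.
library_sizes = [6_000_000, 6_000_000, 12_000_000, 3_000_000]
count_table = {
    "geneA": [60, 48, 150, 30],  # above 1 CPM in every library
    "geneB": [3, 0, 5, 1],       # below 1 CPM in every library
    "geneC": [12, 2, 4, 9],      # above 1 CPM in two libraries only
}

kept = [gene for gene, counts in count_table.items()
        if passes_cpm_filter(counts, library_sizes, min_cpm=1.0, min_samples=2)]
```

With a CPM filter of 1 and two required samples, `geneA` and `geneC` are kept; raising the required number of samples to three would also drop `geneC`.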

**Normalization procedure:**

- **Normalization Method**: Normalization is an important step to make the samples comparable and to remove possible biases (such as sequencing depth bias) in count data. You can select the normalization method to be used:
  - **TMM**: Weighted trimmed mean of M-values. In this method, weights are obtained from the delta method on binomial data (this method is recommended).
  - **RPKM**: Reads Per Kilobase per Million mapped reads. This method corrects for gene length and the number of sequencing reads (gene length is required).
  - **Upper-quartile**: The 75% quantile of the counts for each library is used to calculate the scale factors for normalization.
  - **None**: No normalization method is applied.
- **Feature Length File**: For RPKM normalization, load a tab-delimited file (or ID-Value object) with two columns containing the name and length of each gene or genomic feature.
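As an illustration of what RPKM does with the feature lengths, here is a small sketch using the standard RPKM definition. It assumes the length file has no header line and that lengths are given in bases; the helper names and the file contents are made up for the example.

```python
import csv
import io


def read_feature_lengths(handle):
    """Read a two-column, tab-delimited file: feature name, length in bases."""
    return {row[0]: int(row[1]) for row in csv.reader(handle, delimiter="\t") if row}


def rpkm(count, feature_length, library_size):
    """Reads per kilobase of feature per million mapped reads."""
    return count * 1e9 / (feature_length * library_size)


# Hypothetical file contents (no header line assumed).
lengths_file = io.StringIO("geneA\t2000\ngeneB\t500\n")
lengths = read_feature_lengths(lengths_file)

# geneA has four times the reads of geneB, but it is also four times longer.
value_a = rpkm(count=400, feature_length=lengths["geneA"], library_size=10_000_000)
value_b = rpkm(count=100, feature_length=lengths["geneB"], library_size=10_000_000)
```

Both genes end up with the same RPKM, which is the length correction that the raw counts alone would hide.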

**Figure 3: **Preprocessing Data Page

## Experimental Design Page

**Experimental design file:** Select the .txt file containing the experiment descriptors associated with each sample, in tab-delimited format. As shown below, rows correspond to samples and columns to experimental descriptors. One column must contain the time point of each sample, and another column should show the assignment of samples to experimental groups. Make sure that the names in the first column of the experimental design table are exactly the same as the sample names in the count table header. If your experimental design file has fewer samples than the count table, only the samples contained in this file will be analyzed.

Sample Time Group B12_A6_06hpi_1 6 A6 B12_A6_06hpi_2 6 A6 B12_A6_06hpi_3 6 A6 B12_A6_12hpi_1 12 A6 B12_A6_12hpi_2 12 A6 B12_A6_12hpi_3 12 A6 B12_A6_18hpi_1 18 A6 B12_A6_18hpi_2 18 A6 B12_A6_18hpi_3 18 A6 B12_A6_24hpi_1 24 A6 B12_A6_24hpi_2 24 A6 B12_A6_24hpi_3 24 A6 B12_K1_06hpi_1 6 K1 B12_K1_06hpi_2 6 K1 B12_K1_06hpi_3 6 K1 B12_K1_12hpi_1 12 K1 B12_K1_12hpi_2 12 K1 B12_K1_12hpi_3 12 K1 B12_K1_18hpi_1 18 K1 B12_K1_18hpi_2 18 K1 B12_K1_18hpi_3 18 K1 B12_K1_24hpi_1 24 K1 B12_K1_24hpi_2 24 K1 B12_K1_24hpi_3 24 K1 pps_A6_06hpi_1 6 A6 pps_A6_06hpi_2 6 A6 pps_A6_06hpi_3 6 A6 pps_A6_12hpi_1 12 A6 pps_A6_12hpi_2 12 A6 pps_A6_12hpi_3 12 A6 pps_A6_18hpi_1 18 A6 pps_A6_18hpi_2 18 A6 pps_A6_18hpi_3 18 A6 pps_A6_24hpi_1 24 A6 pps_A6_24hpi_2 24 A6 pps_A6_24hpi_3 24 A6 pps_K1_06hpi_1 6 K1 pps_K1_06hpi_2 6 K1 pps_K1_06hpi_3 6 K1 pps_K1_12hpi_1 12 K1 pps_K1_12hpi_2 12 K1 pps_K1_12hpi_3 12 K1 pps_K1_18hpi_1 18 K1 pps_K1_18hpi_2 18 K1 pps_K1_18hpi_3 18 K1 pps_K1_24hpi_1 24 K1 pps_K1_24hpi_2 24 K1 pps_K1_24hpi_3 24 K1
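Because the sample names must match the count table header exactly, it can help to check a design file before loading it. The sketch below is one possible way to do that outside the tool; the function names are hypothetical, and it assumes a header line with the sample names in the first column.

```python
import csv
import io


def read_design(handle):
    """Parse a tab-delimited design file into {sample: {descriptor: value}}."""
    reader = csv.DictReader(handle, delimiter="\t")
    sample_column = reader.fieldnames[0]  # first column holds the sample names
    return {
        row[sample_column]: {k: v for k, v in row.items() if k != sample_column}
        for row in reader
    }


def check_design(design, count_table_samples):
    """Return (samples that can be analyzed, design samples missing from the counts)."""
    in_counts = set(count_table_samples)
    analyzed = [s for s in design if s in in_counts]
    missing = [s for s in design if s not in in_counts]
    return analyzed, missing


# A three-sample excerpt of the design above; the count table has one extra sample.
design_text = (
    "Sample\tTime\tGroup\n"
    "B12_A6_06hpi_1\t6\tA6\n"
    "B12_A6_06hpi_2\t6\tA6\n"
    "B12_K1_06hpi_1\t6\tK1\n"
)
design = read_design(io.StringIO(design_text))
count_header = ["B12_A6_06hpi_1", "B12_A6_06hpi_2", "B12_K1_06hpi_1", "B12_K1_06hpi_2"]
analyzed, missing = check_design(design, count_header)
```

Here `missing` is empty and `analyzed` holds the three design samples; the fourth count table column is simply left out of the analysis. The comparison is case-sensitive, so a name such as `b12_a6_06hpi_1` would be reported as missing.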

**Figure 4:** Experimental Design Page

## Analysis Options

**Design Type:** Choose the design type to adjust the analysis.

- **Single Series Time Course**: Detects genes that show significant expression changes over time. You only have to select the time factor of your experimental design in “Targets”.
- **Multiple Series Time Course**: Finds genes with significant temporal expression changes and significant differences between experimental groups. You have to establish the time and experimental factors, and select the control condition of your experimental design in “Targets”.

**Statistical Settings:**

- **Significance Level (Alfa)**: The level of FDR control used for variable selection in the stepwise regression.
- **R-squared Cutoff**: Cutoff value for the R-squared of the regression model.
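How the two thresholds act together can be sketched as follows. This is only an illustration: it assumes Benjamini-Hochberg as the FDR procedure (the page does not say which correction is applied), and the gene names, p-values and R-squared values are invented.

```python
def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical per-gene results of the regression step.
genes = ["gene1", "gene2", "gene3", "gene4"]
p_values = [0.001, 0.02, 0.03, 0.5]
r_squared = [0.90, 0.75, 0.40, 0.80]

alpha = 0.05        # significance level (assumed value)
r2_cutoff = 0.7     # R-squared cutoff (assumed value)

adjusted = benjamini_hochberg(p_values)
selected = [g for g, q, r2 in zip(genes, adjusted, r_squared)
            if q <= alpha and r2 >= r2_cutoff]
```

In this example `gene3` is significant after correction but its model fit is below the R-squared cutoff, while `gene4` fits well but is not significant, so only `gene1` and `gene2` are selected.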

**Visualization of Results:**

- **Number of Clusters**: Establish the number of clusters used to group genes by similar expression profiles.
- **Clustering Method**: Choose a clustering method for data partitioning.
  - Hierarchical Clustering: Performs a hierarchical cluster analysis using a set of dissimilarities for the features being clustered.
  - K-Means Clustering: Divides the points into K clusters such that the sum of squared distances from the points to their assigned cluster centers is minimized.
  - Model-Based Clustering: Selects the optimal Gaussian mixture model according to BIC, with EM initialized by hierarchical clustering. This method computes an optimal number of clusters. Keep in mind that this method requires more time.
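The K-means option can be illustrated with a compact version of Lloyd's algorithm applied to expression profiles. This is a generic sketch, not the tool's implementation: the initial centers are picked by hand to keep the example deterministic, whereas real implementations usually use random starts, and the profiles are invented.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two expression profiles."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean_profile(profiles):
    """Element-wise mean of a list of profiles."""
    return [sum(values) / len(values) for values in zip(*profiles)]


def k_means(profiles, initial_centers, max_iterations=100):
    """Lloyd's algorithm: returns (cluster index per profile, final centers)."""
    centers = [list(c) for c in initial_centers]
    assignment = None
    for _ in range(max_iterations):
        new_assignment = [
            min(range(len(centers)), key=lambda k: squared_distance(p, centers[k]))
            for p in profiles
        ]
        if new_assignment == assignment:  # no profile changed cluster: converged
            break
        assignment = new_assignment
        for k in range(len(centers)):
            members = [p for p, a in zip(profiles, assignment) if a == k]
            if members:  # keep the old center if a cluster ends up empty
                centers[k] = mean_profile(members)
    return assignment, centers


# Hypothetical profiles over four time points: two rising genes, two falling genes.
profiles = [
    [1.0, 2.0, 3.0, 4.0],   # rising
    [4.0, 3.0, 2.0, 1.0],   # falling
    [1.2, 2.1, 3.2, 3.9],   # rising
    [3.8, 3.1, 1.9, 1.2],   # falling
]
assignment, centers = k_means(profiles, initial_centers=[profiles[0], profiles[1]])
```

The rising profiles end up in cluster 0 and the falling ones in cluster 1, which is the kind of grouping the expression profile plots then display per cluster.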

**Figure 5:** Analysis Options

# Results

Once the input counts have been processed and analyzed with the “Time Course Expression Analysis” tool, a new tab opens containing the statistical results obtained by the stepwise regression (Figure 6):

- P-value of the regression ANOVA.
- R-squared of the model.
- P-value of the regression coefficients of the selected variables.
- Tags: Indicate the list(s) of significant genes in which the feature appears (R-squared ≥ R-squared Cutoff).
  - Red tags: Lists of significant genes for each experimental group (only available in “Multiple Series Time Course”).
  - Blue tags: List of significant genes for each variable of the regression model.

Only the genes that have passed the established Significance Level are shown in the new tab. For further details, please refer to the maSigPro User's Guide.

Results can be saved as a TC Results object. Note that it is not possible to run the analysis on this object; to do so, open the Count Table object.

**Figure 6:** Table Viewer

A result page shows a summary of the time-course expression analysis results, including the clusters of features with similar expression profiles (Figure 7). Go to **Side Panel** → **Result Summary** to view the result summary and to export it as a PDF.

During the Time Course Expression Analysis, raw counts are transformed according to the normalization method selected in the analysis configuration. Go to **Export Normalized Counts** (sidebar) to export the normalized counts to a tabular text file.

**Figure 7:** Summary Report

## Charts and Statistics

Different statistics charts can be generated for a global visualization of the results. These charts can be found under the **Side Panel** of the TimeCourse Results viewer.

### MDS Plot

Generates a two-dimensional scatterplot in which the distances represent the typical log2 fold changes between samples. You can select an experimental factor by which you want to color the MDS graphic (Figure 8).

**Figure 8:** MDS Plot
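To give an idea of what a "typical log2 fold change" distance between two samples can look like, the sketch below uses the leading log-fold-change definition known from edgeR's `plotMDS`: the root-mean-square of the largest log2 differences between two samples. Whether the tool computes its distances exactly this way is an assumption; the function name and the values are illustrative only.

```python
import math


def leading_lfc_distance(log_expr_a, log_expr_b, top=500):
    """Root-mean-square of the `top` largest absolute log2 differences."""
    squared = sorted(((a - b) ** 2 for a, b in zip(log_expr_a, log_expr_b)),
                     reverse=True)
    leading = squared[:top]
    return math.sqrt(sum(leading) / len(leading))


# Hypothetical log2 expression values of four genes in three samples.
samples = {
    "t06_rep1": [5.0, 3.0, 8.0, 1.0],
    "t06_rep2": [5.0, 3.0, 8.0, 2.0],
    "t24_rep1": [7.0, 1.0, 8.0, 4.0],
}
names = list(samples)
distances = {
    (a, b): leading_lfc_distance(samples[a], samples[b], top=2)
    for a in names for b in names if a < b
}
```

The two 6-hour replicates are much closer to each other than either is to the 24-hour sample, which is the pattern an MDS plot would show as two replicates sitting together and the later time point sitting apart.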

### Venn Diagram

Diagram showing all possible logical relations between a finite collection of different feature sets (Figure 9). You can choose between two types of Venn Diagram (“Pairwise” or “Triple”), and select the sets of significant genes to display.

**Figure 9:** Venn Diagram
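The regions of a pairwise Venn diagram correspond to simple set operations on the lists of significant genes. A minimal sketch, with a hypothetical function name and invented gene lists:

```python
def pairwise_venn(set_a, set_b):
    """Split two gene lists into the three regions of a pairwise Venn diagram."""
    a, b = set(set_a), set(set_b)
    return {"only_a": a - b, "both": a & b, "only_b": b - a}


# Hypothetical lists of significant genes for two experimental groups.
significant_a6 = ["gene1", "gene2", "gene3"]
significant_k1 = ["gene2", "gene3", "gene4", "gene5"]

regions = pairwise_venn(significant_a6, significant_k1)
region_sizes = {name: len(genes) for name, genes in regions.items()}
```

The sizes in `region_sizes` are the numbers that would appear in the left, overlapping, and right areas of the diagram.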

### Expression Profile by Gene

Graph of the expression profile over time for a particular gene (Figure 10). To display it, right-click the chosen gene and select the “Show Expression Profile” option.

**Figure 10:** Gene Expression Profile

### Experiment-wide Expression Profiles

Plot showing the expression levels across samples for each cluster of genes (Figure 11).

**Figure 11:** Experiment-wide Expression Profile

### Summary Expression Profiles

Plot showing the median expression level of each cluster of genes across time (Figure 12).

**Figure 12:** Summary Expression Profile
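The values behind such a summary plot can be approximated with a short sketch. It assumes that the median is taken over all genes of a cluster and all replicate samples of a time point pooled together; the tool may aggregate differently, and the function name and data are hypothetical.

```python
from statistics import median


def summary_profiles(expression, sample_times, gene_clusters):
    """Median expression per (cluster, time point).

    expression:    {gene: [value per sample]}
    sample_times:  time point of each sample, in the same order as the values
    gene_clusters: {gene: cluster id}
    """
    pooled = {}
    for gene, values in expression.items():
        cluster = gene_clusters[gene]
        for time, value in zip(sample_times, values):
            pooled.setdefault((cluster, time), []).append(value)
    return {key: median(values) for key, values in pooled.items()}


# Hypothetical normalized expression: two replicates at 6 h and two at 12 h.
sample_times = [6, 6, 12, 12]
expression = {
    "geneA": [1, 2, 5, 6],
    "geneB": [2, 3, 6, 7],
    "geneC": [9, 8, 3, 2],
}
gene_clusters = {"geneA": 1, "geneB": 1, "geneC": 2}

profile = summary_profiles(expression, sample_times, gene_clusters)
```

Cluster 1 rises from a median of 2 at 6 h to 6 at 12 h, while cluster 2 falls from 8.5 to 2.5, giving one line per cluster in the summary plot.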