What is BrainCloud

The BrainCloud application allows the query of genome-wide gene expression data and their genetic control in the human postmortem dorsolateral prefrontal cortex (DLPFC) of normal subjects across the lifespan. It is available at http://www.libd.org/braincloud.

Accessing the BrainCloud Data

The freely available version of BrainCloud contains expression data. To integrate SNP data (along with genetic associations with gene expression), users must apply for “authorized access” with dbGaP and download the PLINK-format SNP data associated with dbGaP accession “phs000417.v1.p1”. These data are then integrated by the BrainCloud application.

Visit this site to apply for ‘authorized access’ with dbGaP: https://dbgap.ncbi.nlm.nih.gov/aa/wga.cgi?page=login. Please choose this study from the alphabetical list for authorization: “BrainCloud: Data from human postmortem brain procurement for the neuropathology section”

After you receive authorization, download the 29 MB file phg000160.v1.p1.NIMH_BrainCloud.genotype-calls-matrixfmt.Illumina.c1.GRU.tar.gz and unzip it. The “SNP268.bed” file is needed for BrainCloud.
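The unpacking step can also be scripted. The following is a minimal Python sketch, not part of BrainCloud: the function name extract_bed_files and the destination folder are hypothetical, and it assumes the downloaded archive is an ordinary gzip-compressed tar file.

```python
import tarfile
from pathlib import Path


def extract_bed_files(archive_path, dest_dir):
    """Unpack a .tar.gz archive and return the paths of any .bed files inside.

    Hypothetical helper for illustration; BrainCloud itself only needs the
    extracted .bed file to be somewhere on the local computer.
    """
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive_path, "r:gz") as archive:
        archive.extractall(dest)
    # Compare the extension case-insensitively, because the file name may be
    # written in upper or lower case.
    return sorted(
        p for p in dest.rglob("*") if p.is_file() and p.suffix.lower() == ".bed"
    )
```

Calling extract_bed_files with the path of the downloaded archive and an output folder returns a list of candidate files to pass to “Load SNP data”.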

Load Genotyping Data

Once you have access to the dbGaP data, download snp268.bed to your local computer. To load this file into BrainCloud, click “Load SNP data”, look for the snp268.bed file in the file dialog, and click “Open”. This step integrates the data into the software.
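Before loading, it can help to confirm that the file really is a PLINK binary genotype file. PLINK .bed files begin with the two bytes 0x6C 0x1B followed by a mode byte (1 for SNP-major, 0 for individual-major). The sketch below checks only this header; the function name is hypothetical, and whether BrainCloud performs a similar check is not documented here.

```python
def looks_like_plink_bed(path):
    """Return True if the file starts with the PLINK binary .bed header."""
    with open(path, "rb") as handle:
        header = handle.read(3)
    # Two magic bytes, then a mode byte: 1 = SNP-major, 0 = individual-major.
    return len(header) == 3 and header[:2] == b"\x6c\x1b" and header[2] in (0, 1)
```

A False result usually means the wrong file was selected or the download was incomplete.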

Update and Error Report

The current application is the beta version of BrainCloud, and we expect bugs in this software. Please report bugs to brain.cloud@libd.org. We will try to fix bugs and improve the software as often as necessary. The application supports automatic updates: it checks for updates at startup and prompts you to update if a newer version of BrainCloud is available.

Note: Users of 64-bit systems need to install SSCERuntime from http://www.microsoft.com/download/en/details.aspx?id=5783 in order to run BrainCloud properly.

Release date of latest version of BrainCloud: November 8, 2011

The BrainCloud Application

Buttons in the upper menu bar allow access to various views of gene expression and multiple modes of analysis.

            Overview

To start a data query, type a gene name or part of a gene name (a keyword, e.g., “SLC12A2” or just “SLC”) in the empty field “Gene name/keyword” at the lower right. After you press “Enter” or click the “Search” button, all the probes corresponding to this gene/keyword appear in the spreadsheet underneath the upper menu bar, one probe per row. There are usually multiple probes designed for variants of a given gene, measuring multiple transcripts or groups of transcripts (more on this in Gene View). To select a particular probe, click the outermost left field (to the left of the CloneIndex); the entire row highlights in blue. The data query is now focused on the selected probe. To select a SNP to include in the analysis, type a SNP name in the field “SNP Name” next to “Gene name/keyword”. SNP selection is not required if the query is not related to SNPs.

The columns in this spreadsheet provide information about the following characteristics of the probe:

Note: narrow or widen the columns by clicking and dragging. To sort the data by any of the factors, click the column header.

·        CloneIndex – ID of the Illumina oligonucleotide probe

·        Gene – official gene symbol corresponding to the probe

·        Type – type of the oligonucleotide probe (see Illumina definitions of the abbreviated types in the Table below)

·        Genomic Match – indicates the number of times the probe sequence appears in the genome

·        mRNA Match – indicates the number of mRNA sequences in which the probe sequence appears

·        EST Match – indicates the number of EST sequences in which the probe sequence appears

·        SNP In Probe – indicates the number of HapMap SNPs in probe sequence (0 - no SNP in the probe sequence; 1 – one SNP in the probe sequence)

·        TM – melting temperature of the probe

·        GC – G/C content of the probe

·        Intensity – the average log2 intensity of the fluorescent signal for this probe across all the subjects/arrays

·        Description – gene description

·        Accession – accession number from GenBank at http://www.ncbi.nlm.nih.gov/genbank/

·        Gene Pos – chromosomal position of the gene

·        Probe Pos – chromosomal position of the probe

 After selecting a particular probe (the entire row will be highlighted in blue), the expression data will appear in the spreadsheet below the probe selection as well as in the lower right graph.

 

Table. Illumina probe type descriptions.

Probes.jpg

The expression spreadsheet contains the following information:

·        ID – subject ID number

·        Age – age of the subject at death (in years). Note: “negative numbers refer to prenatal age in years calculated back from day 0, the day of expected birth after 40 gestational weeks”

·        Sex – sex of the subject

·        RIN – RNA integrity number, a quantitative measure of RNA quality obtained from the Agilent Bioanalyzer 2100 (0-10, with 10 indicating the best RNA quality)

·        PMI – postmortem interval in hours (estimated time from death to freezing the brain tissue)

·        Race – race of the subject (CAUC – Caucasian, AA – African American, AS – Asian, HISP – Hispanic)

·        pH – pH of the cerebellar tissue obtained from the subject

·        Exp – expression of the selected probe expressed as log2 of the ratio of sample signal to the reference signal (reference – pooled RNA from all the subjects)

·        SNP – genotype at a selected SNP. To see the genotyping data, enter the SNP name (rs followed by a number) in the lower right field (“SNP Name”). The genotype at this SNP will appear as 1, 2, or 3 (1 – homozygotes for A or T, 2 – heterozygotes, 3 – homozygotes for C or G)
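The Exp and SNP columns described above can be illustrated with a short Python sketch. It is an approximation written for this guide, not BrainCloud's own code: the function names are hypothetical, and the genotype rule assumes a SNP with one A-or-T allele and one C-or-G allele, since the coding of A/T or C/G SNPs is not described here.

```python
import math


def log2_ratio(sample_signal, reference_signal):
    """Exp value: log2 of the sample signal divided by the reference signal."""
    return math.log2(sample_signal / reference_signal)


def genotype_code(allele1, allele2):
    """Map a pair of alleles to the 1/2/3 coding used in the SNP column.

    Assumption: the SNP has one A-or-T allele and one C-or-G allele.
    """
    a1, a2 = allele1.upper(), allele2.upper()
    for allele in (a1, a2):
        if allele not in "ACGT" or len(allele) != 1:
            raise ValueError(f"unexpected allele: {allele!r}")
    if a1 != a2:
        return 2  # heterozygote
    return 1 if a1 in "AT" else 3  # homozygote for A/T, or for C/G
```

Under these assumptions, a sample signal four times the reference gives an Exp value of 2, and a value of 0 means the sample and the pooled reference are equal.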

 

Note: To sort the data by any of the factors, click on the column header.

 

To export the expression, genotypic and demographic/tissue data to Excel for further analysis, click on the lower right button “Export to Excel”.

 

A small graph underneath the expression spreadsheet shows the distribution of the intensity of the fluorescent signal across all the probes. The vertical lines depict the relative signal intensity (average signal intensity across all the subjects) of the selected probe compared to the intensity of two negative controls (EMP – empty well, NCT – an oligoprobe not expressed) and three housekeeping genes (HMBS (PBGD) – hydroxymethylbilane synthase, ACTB – beta actin, GAPDH – glyceraldehyde-3-phosphate dehydrogenase). The intensity of the selected probe is shown as a red vertical line, and the controls as green vertical lines. X-axis – Log2 signal intensity, Y-axis – density (the probe frequency).

 

A large graph to the right shows the expression data (Y axis) plotted as a function of age (X axis).

Note: The Age scale is expressed differently for prenatal subjects (in gestational weeks) than for postnatal subjects (in years). Each black dot represents an individual subject. The red dots represent a LOESS fit (fitted separately for fetal and postnatal data).
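The relation between the negative ages in the spreadsheet and the gestational weeks on the graph can be sketched as follows. This is an illustration only: the exact conversion used by BrainCloud is not stated, so the sketch assumes 365.25 days per year and a 40-week full term, and the function name is hypothetical.

```python
WEEKS_PER_YEAR = 365.25 / 7  # assumption: average year length in weeks
FULL_TERM_WEEKS = 40  # day 0 is the expected day of birth


def age_for_plot(age_years):
    """Return (value, unit) for the age axis of the expression graph."""
    if age_years < 0:
        # Prenatal: count back from full term and report gestational weeks.
        return FULL_TERM_WEEKS + age_years * WEEKS_PER_YEAR, "gestational weeks"
    return age_years, "years"
```

For example, an age of about -0.38 years corresponds to roughly 20 gestational weeks under these assumptions.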

To save a graph, click the “Save graph” button.

            Gene View

This option allows inspection of p-values for associations of the selected probe expression with SNPs (upper graph) and the location of the probe (lower graph).

 

The upper graph depicts p-values [Y axis: -log10(p)] for associations with SNPs. The chromosomal location of the SNPs is depicted on the X axis in the yellow horizontal bar. Association statistics (SNP name, SNP position and p-value) appear in a box when the cursor is held over a vertical line.
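The Y axis transform is simple to reproduce for exported p-values. A minimal sketch in Python, with a hypothetical function name:

```python
import math


def neg_log10(p_value):
    """Convert a p-value to the -log10(p) scale used on the Y axis."""
    if not 0.0 < p_value <= 1.0:
        raise ValueError("p-value must be in the interval (0, 1]")
    return -math.log10(p_value)
```

On this scale taller lines mean stronger associations: a p-value of 0.0001 plots at a height of about 4, and a p-value of 1 plots at 0.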

 

The lower graph allows the view of the probe(s) location relative to the known expressed sequences from the gene. Probe(s) are shown as CloneIndex numbers and their exact locations are depicted as vertical red lines. Known transcripts and ESTs with the corresponding accession numbers are also depicted across the length of the gene.

Save the graph (the entire page) by clicking the “Save graph” button.

 

Note: Right-clicking on the white area between CloneIndex and Gene highlights the wide horizontal bar in red and allows panning to the right or left of the current genomic position.

 

            SNP View

This option allows inspection of genome wide SNP associations for the selected probe.

The upper graph depicts –log10(p-values) of associations of the selected probe with SNPs in the region of the chromosome highlighted in the lower graph. A left click on a vertical line indicating a probe/SNP association selects that SNP and switches the view to the next window, “Analysis”.

Note: To go back to SNP View, click on the button in the main menu.

To highlight a particular chromosomal region in the lower graph, click on the desired chromosomal location. A red rectangle will appear showing the selected region. The Y axis on the lower graph shows –log10(p-values) for associations with expression of the selected probe across the entire genome.

Analysis

This option allows the display and analysis of the expression data for a selected probe using a general linear model (GLM). The graph shows the adjusted gene expression data.

 

The fields to the left of the graph represent:

·        Gene – selected gene

·        CloneID – a selected probe ID for a gene

·        SNP – type a SNP name (e.g. rs48359370) if the expression data are analyzed as a function of the genotype

·        Factor – select a factor by which to analyze (segregate) the expression data (SNP, sex or race)

Filter allows selecting subgroups of data for analysis. Selection can be done by filtering:

·        Age – select an age range (all subjects will be included in the analysis if no age range is selected)

·        Sex – select sex(es) to be analyzed (All, Males or Females)

·        Race – select a race group to be included in the analysis (empty field will include all subjects)

·        RIN – select a range of RNA quality to be included in the analysis

·        Exp – select an expression level [log2(sample/reference)] range to be included in the analysis. This option is convenient if expression outliers need to be removed.
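The same kind of filtering can be applied to data exported to Excel. The sketch below is a hypothetical illustration in Python, not the application's implementation: it assumes each subject is a dictionary keyed by the spreadsheet column names, that ranges are inclusive, and that sex is coded "M" or "F" (the actual coding in the export may differ).

```python
def in_range(value, bounds):
    """True if no bounds are given or low <= value <= high."""
    if bounds is None:
        return True
    low, high = bounds
    return low <= value <= high


def filter_subjects(subjects, age=None, sex=None, race=None, rin=None, exp=None):
    """Keep subjects that pass every filter; None means no restriction."""
    kept = []
    for subject in subjects:
        if not in_range(subject["Age"], age):
            continue
        if sex is not None and subject["Sex"] != sex:
            continue
        if race is not None and subject["Race"] != race:
            continue
        if not in_range(subject["RIN"], rin):
            continue
        if not in_range(subject["Exp"], exp):
            continue
        kept.append(subject)
    return kept
```

For example, filter_subjects(subjects, age=(0, 80), exp=(-1, 1)) keeps postnatal subjects up to 80 years old and drops expression outliers outside the range -1 to 1.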

 

Click on “Scatter plot” button to see a scatter plot or on “Box plot” to see a box plot of the data.

 

In the scatter plot, groups for the selected factor will appear in different colors. See a displayed legend for details.

 

Note: Remove the LOESS fit from the graph by clicking the small box on the left labeled “Remove LOESS fit”.

 

Click on the “Box plot” button to see a box plot. This plot shows the expression data segregated by the selected factor (None, SNP or Sex) in both races combined, in African Americans only and in Caucasians only. The last box plot shows age differences between the selected groups.

 For instance, if SNP is a selected factor, four separate box plots will appear in the window illustrating:

·        expression differences between genotypes for both races combined (race=Both, African Americans and Caucasians); the p value is for the effect of SNP on expression, obtained from the GLM based on the best fit model procedure

·        expression differences between genotypes in African Americans (AA) only

·        expression differences between genotypes in Caucasians only

·        age differences between the genotypic groups

Numbers underneath the graphs represent the number of subjects per group/genotype (1 – homozygotes for A or T, 2 – heterozygotes, 3 – homozygotes for C or G).
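The per-group counts shown under the box plots can be reproduced from exported data. The following Python sketch is illustrative only: the function name is hypothetical, subjects are assumed to be dictionaries keyed by column name, and only the count and the median are computed (the GLM p value is not reproduced).

```python
from collections import defaultdict
from statistics import median


def summarize_by_group(subjects, factor):
    """Group Exp values by a factor (e.g. "SNP") and report n and median."""
    groups = defaultdict(list)
    for subject in subjects:
        groups[subject[factor]].append(subject["Exp"])
    return {
        key: {"n": len(values), "median": median(values)}
        for key, values in sorted(groups.items())
    }
```

With factor set to "SNP", the keys of the result are the genotype codes 1, 2 and 3, and each "n" matches the number printed beneath the corresponding box.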

 

To save graphs, click on the “Save graph” button.

The “Model” field underneath the graph shows the terms included in the GLM based on a best fit model selection procedure.

            Cis Assoc

This option allows interrogation of the data for associations of expression of a selected probe with Cis SNPs (Cis – 100 Kb upstream or downstream from the gene).

The spreadsheet shows the data for all the Cis SNPs.

·        SNP – rs number for the SNP

·        p – p value for the association obtained from GLM based on the best fit model procedure

·        Dist – chromosomal distance of the SNP from the gene (0 indicates that the SNP is within the gene, a negative number indicates chromosomal position in numbers smaller than the gene position, a positive number indicates chromosomal position in numbers bigger than the gene position)

·        Chr – chromosomal location of the SNP

·        Gene Start – position of the gene start

·        Gene End – position of the gene end

·        SNP Pos – location of the SNP

·        MAF – minor allele frequency for the SNP

·        p HWE AA – p value for the Hardy-Weinberg equilibrium in African American subjects

·        p HWE CAUC – p value for the Hardy-Weinberg equilibrium in Caucasian subjects
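The sign convention of the Dist column and the 100 Kb Cis window can be sketched in Python as follows. This is an assumption-laden illustration: the function names are hypothetical, the distance is measured from the nearest gene boundary (the application may measure it differently), and the SNP and gene are assumed to be on the same chromosome.

```python
CIS_WINDOW_BP = 100_000  # Cis: within 100 Kb upstream or downstream of the gene


def snp_gene_distance(snp_pos, gene_start, gene_end):
    """Signed distance in base pairs; 0 means the SNP lies within the gene."""
    if snp_pos < gene_start:
        return snp_pos - gene_start  # negative: position below the gene
    if snp_pos > gene_end:
        return snp_pos - gene_end  # positive: position above the gene
    return 0


def is_cis(snp_pos, gene_start, gene_end):
    """True if the SNP falls inside the gene or within the Cis window."""
    return abs(snp_gene_distance(snp_pos, gene_start, gene_end)) <= CIS_WINDOW_BP
```

For a gene spanning positions 1,000,000 to 1,050,000, a SNP at 950,000 has a distance of -50,000 and counts as Cis, while a SNP at 1,200,000 has a distance of 150,000 and does not.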

 

Select the SNP by highlighting the entire row with a click on the outermost left field next to the SNP name (the entire row will highlight in blue). Graphs illustrating expression as a function of genotype for this selected SNP will appear on the right (a scatter plot in the upper right, a box plot in the lower right). To enlarge graphs and see legends, click on the graph of interest (scatter plot or box plot). In the box plot the three genotypes are shown as 1, 2, 3 (1 – homozygotes for A or T, blue box; 2 – heterozygotes, emerald box; 3 – homozygotes for C or G, pink box). Numbers underneath the box plot indicate the number of subjects per genotype.

 

Note: Sort by any factor by clicking on the column header.

GWA

This option allows interrogation of associations of the expression of the selected probe with SNPs genome-wide (associations with p values < 10⁻⁴ are displayed).
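The display threshold can be mimicked on any list of association results. A minimal Python sketch, assuming each result is a dictionary with "SNP" and "p" keys (a hypothetical layout, not the application's internal format):

```python
P_THRESHOLD = 1e-4  # associations are displayed only below this p value


def displayed_associations(associations, threshold=P_THRESHOLD):
    """Keep associations with p below the threshold, most significant first."""
    kept = [row for row in associations if row["p"] < threshold]
    return sorted(kept, key=lambda row: row["p"])
```

Sorting by p here corresponds to clicking the “p” column header in the spreadsheet.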

 

The spreadsheet shows the following data for the associated SNPs:

·        SNP – rs number for the SNP

·        p – p value for the association (shown if p < 10⁻⁴)

·        SNP Chr – chromosome location of the SNP

·        SNP Pos – SNP position on the chromosome

·        Dist – distance of the SNP from the gene (0 indicates that the SNP is within the gene, a negative number indicates chromosomal position in numbers lower than the gene position, a positive number indicates chromosomal position in numbers higher than the gene position, empty field indicates that the SNP is on another chromosome than the gene)

·        Gene Of SNP – a gene name close to the SNP

·        Location  – description of the location of the SNP relative to the gene

·        MAF – minor allele frequency for the SNP

·        p HWE AA – p value for the Hardy-Weinberg equilibrium in African American subjects

·        p HWE CAUC – p value for the Hardy-Weinberg equilibrium in Caucasian subjects
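For readers who want to check the MAF and Hardy-Weinberg columns against exported genotypes, the following Python sketch computes both from the counts of genotypes 1, 2 and 3. It uses a chi-square goodness-of-fit test with one degree of freedom, which is an assumption: the test actually used by BrainCloud is not stated and may be an exact test. The function name is hypothetical.

```python
import math


def maf_and_hwe_p(n1, n2, n3):
    """Return (MAF, HWE p value) from counts of genotypes 1, 2 and 3.

    Assumption: chi-square goodness-of-fit test with 1 degree of freedom.
    """
    n = n1 + n2 + n3
    if n == 0:
        raise ValueError("no genotyped subjects")
    p = (2 * n1 + n2) / (2 * n)  # frequency of the allele counted in genotype 1
    q = 1.0 - p
    maf = min(p, q)
    if maf == 0.0:
        return 0.0, 1.0  # monomorphic SNP: no deviation can be tested
    observed = (n1, n2, n3)
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Upper tail of the chi-square distribution with 1 degree of freedom.
    return maf, math.erfc(math.sqrt(chi2 / 2.0))
```

Because Hardy-Weinberg proportions depend on population allele frequencies, the test is applied within one group at a time, which is consistent with the separate AA and CAUC columns.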

 

Select a SNP by highlighting the entire row with a click on the outermost left field next to the SNP name. Graphs illustrating expression as a function of genotype for this selected SNP will appear on the right (a scatter plot in the upper right, a box plot in the lower right).

 

To enlarge graphs and see legends, click on the graph of interest (scatter plot or box plot). In the box plot the three genotypes are shown as 1, 2, 3 (1 – homozygotes for A or T, blue box; 2 – heterozygotes, emerald box; 3 – homozygotes for C or G, pink box). Numbers underneath the box plot indicate the number of subjects per genotype.

Note: Sort by any factor by clicking on the column header.

Top Gene for SNP

This option allows the genome-wide exploration of associations of the selected SNP with the expression of probes. The previously selected SNP name appears at the top of the spreadsheet. The spreadsheet lists all associations of this SNP with probes that fall below the threshold (p values < 10⁻⁴).

Note: The SNP can be selected in multiple windows (see Overview, SNP View, Analysis, Cis Assoc, GWA).

The spreadsheet shows the following data for these associations.

·        CloneIndex – probe ID with which the selected SNP is associated

·        Gene – a gene name of the probe

·        p – p value for the association of expression of the probe with the selected SNP obtained from GLM based on the best fit model procedure

·        SNP – rs number for the selected SNP

·        Gene Chr – chromosomal location of the probe associated with the selected SNP

·        Gene Start – chromosomal position of the start of the gene

·        Gene End – chromosomal position of the end of the gene

·        Dist – distance of the SNP from the gene (0 indicates that the SNP is within the gene, a negative number indicates chromosomal position in numbers lower than the gene position, a positive number indicates chromosomal position in numbers higher than the gene position, empty field indicates that the SNP is on another chromosome than the gene)

·        SNP Chr – chromosomal location of the SNP

·        SNP Pos – chromosomal position of the SNP

 

Select the probe by highlighting the entire row with a click on the outermost left field next to the probe CloneIndex number. Graphs illustrating the expression of the selected probe as a function of genotype for the selected SNP will appear on the right (a scatter plot in the upper right, a box plot in the lower right).

To enlarge graphs and see legends, click on the graph of interest (scatter plot or box plot). In the box plot the three genotypes are shown as 1, 2, 3 (1 – homozygotes for A or T, blue box; 2 – heterozygotes, emerald box; 3 – homozygotes for C or G, pink box). Numbers underneath the box plot indicate the number of subjects per genotype.

 

Note: Sort by any factor by clicking on the column header.

 

Correlation

This option explores correlations of a selected probe’s expression with that of other probes genome-wide. Select the probe to interrogate either by typing the name in the lower left corner and clicking “Enter” or “Search” (the view will change back to Overview window) or by entering the gene name in the Overview section.

Note: If you are in the Overview window, return to Correlation by clicking the “Correlation” button in the main menu.

A list of probes that correlate with the selected probe appears in the spreadsheet to the right. To select a probe pair, highlight the entire row by clicking on the outermost left field next to the ID.

 

The spreadsheet provides the following data:

·        CloneIndex – probe Clone Index ID

·        Gene – gene name for the selected probe

·        r – Pearson’s correlation coefficient for the correlation of the selected pair of probes. Only correlations with an absolute value of r greater than 0.6 are shown.

After selecting a correlation, the graph illustrating this correlation appears to the left of the spreadsheet (X axis represents the expression of the initially selected probe as log2 ratio of probe/reference, Y axis represents the expression of the probe selected in the spreadsheet in the same units).
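The correlation list can be approximated from exported expression values. The sketch below computes Pearson’s r in plain Python and keeps only the probes whose absolute correlation with the selected probe exceeds 0.6. The function names and the dictionary layout (CloneIndex mapped to a list of Exp values in the same subject order) are assumptions made for illustration.

```python
import math


def pearson_r(x, y):
    """Pearson's correlation coefficient of two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length, at least 2 values")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant data")
    return sxy / math.sqrt(sxx * syy)


def correlated_probes(selected, others, cutoff=0.6):
    """Return (CloneIndex, r) pairs with |r| above the cutoff, strongest first."""
    rows = []
    for clone_index, values in others.items():
        r = pearson_r(selected, values)
        if abs(r) > cutoff:
            rows.append((clone_index, r))
    return sorted(rows, key=lambda row: abs(row[1]), reverse=True)
```

Both positive and negative correlations pass the cutoff, so a probe with r near -1 is listed alongside a probe with r near 1.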

 

Note: Sort by any factor by clicking on the column header.

To query for correlation of the selected probe with a particular other probe, type the gene name in the upper right field “gene name”.

 

Haploview

This option provides a graphical summary of associations of the expression of the selected probe with all the Cis SNPs as well as a graphical summary of linkage disequilibrium (LD) according to Haploview [1] (http://www.broadinstitute.org/scientific-community/science/programs/medical-and-population-genetics/haploview/haploview). LD is described by two pair-wise measures: Lewontin’s standardized disequilibrium coefficient D’ and r². Switch between the two by clicking the outermost field at the lower right.
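The two pair-wise LD measures follow standard definitions and can be sketched in Python from haplotype and allele frequencies. The function name and its inputs are hypothetical: p_ab is the frequency of the haplotype carrying allele A at the first SNP and allele B at the second, and p_a and p_b are the frequencies of those alleles. Haploview estimates these frequencies from genotype data, a step not shown here.

```python
def ld_measures(p_ab, p_a, p_b):
    """Return (D', r squared) for two biallelic SNPs.

    D = p_ab - p_a * p_b; D' is |D| scaled by its maximum possible value
    given the allele frequencies.
    """
    d = p_ab - p_a * p_b
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = abs(d) / d_max if d_max > 0 else 0.0
    denominator = p_a * (1 - p_a) * p_b * (1 - p_b)
    r_squared = d * d / denominator if denominator > 0 else 0.0
    return d_prime, r_squared
```

The two measures can differ substantially: D’ reaches 1 whenever at least one of the four possible haplotypes is absent, whereas r² reaches 1 only when the two SNPs are perfect proxies for each other.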

 

Reference: 1. Barrett JC, Fry B, Maller J, Daly MJ. Haploview: analysis and visualization of LD and haplotype maps. Bioinformatics. 2005 Jan 15 [PubMed ID: 15297300]