Data were normalized and processed as previously described [1]. Briefly, background correction and loess normalization were performed on the raw two-channel intensity data, and low-quality probes were removed from subsequent analysis, leaving 30,176 probes measured in 269 samples spanning the lifespan (fetal through aged). A second-degree basis spline in age was fit to the expression values at each probe, with knots at birth and at 1, 10, 20, and 50 years (8 degrees of freedom); that is, a quadratic curve was fit to expression across age within each interval between adjacent knots. Each model also allowed an offset at birth, because there were no samples from the third trimester of fetal life. This spline model was then used to estimate 31 surrogate variables via surrogate variable analysis [2, 3]; the surrogate variables were regressed out of the normalized expression data, and the resulting adjusted data are visualized here.
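
The age model and the adjustment step can be illustrated with a minimal Python sketch. It is not the original analysis code. It assumes a truncated power basis for the second-degree spline, an indicator column for the offset at birth, ages coded in years with fetal samples as negative values, and simulated data. The surrogate variables here are random stand-ins rather than estimates from the sva package [3], and the names spline_design and regress_out are hypothetical. How the 8 degrees of freedom were counted is not specified above, so the number of columns in this sketch (9, including the intercept and the birth offset) should not be read as matching the original parameterization.

```python
import numpy as np

# Knots in years: birth, 1, 10, 20 and 50 (as stated in the text).
KNOTS = (0.0, 1.0, 10.0, 20.0, 50.0)


def spline_design(age, knots=KNOTS, degree=2):
    """Design matrix for a piecewise polynomial in age.

    Assumption: truncated power basis (1, age, ..., age^degree, then one
    (age - knot)_+^degree term per knot) plus a step at birth (age >= 0).
    """
    age = np.asarray(age, dtype=float)
    cols = [age ** d for d in range(degree + 1)]
    cols += [np.clip(age - k, 0.0, None) ** degree for k in knots]
    cols.append((age >= 0.0).astype(float))  # offset at birth
    return np.column_stack(cols)


def regress_out(expr, design, nuisance):
    """Remove the fitted nuisance contribution from expr (samples x probes).

    Fits expr ~ [design, nuisance] by least squares and subtracts only the
    nuisance part, so the age-related signal in `design` is protected.
    """
    full = np.hstack([design, nuisance])
    coef, *_ = np.linalg.lstsq(full, expr, rcond=None)
    return expr - nuisance @ coef[design.shape[1]:]


# Simulated example: 269 samples as in the text, but only 100 probes
# (hypothetical, for speed) and random stand-ins for the 31 surrogate variables.
rng = np.random.default_rng(0)
ages = np.concatenate([rng.uniform(-0.5, -0.25, 40),   # fetal samples
                       rng.uniform(0.0, 80.0, 229)])   # postnatal samples
design = spline_design(ages)
sv = rng.normal(size=(ages.size, 31))
beta = rng.normal(scale=1e-3, size=(design.shape[1], 100))
gamma = rng.normal(size=(31, 100))
expr = design @ beta + sv @ gamma + rng.normal(scale=0.1, size=(ages.size, 100))

cleaned = regress_out(expr, design, sv)
```

Because the spline terms and the surrogate variables are fit jointly and only the surrogate-variable component is subtracted, refitting the same model to the cleaned matrix yields surrogate-variable coefficients that are numerically zero while the age coefficients are unchanged.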


1. Colantuoni, C., et al., Temporal dynamics and genetic control of transcription in the human prefrontal cortex. Nature, 2011. 478(7370): p. 519-23.

2. Leek, J.T. and J.D. Storey, Capturing heterogeneity in gene expression studies by surrogate variable analysis. PLoS Genet, 2007. 3(9): p. 1724-35.

3. Leek, J.T., et al., The sva package for removing batch effects and other unwanted variation in high-throughput experiments. Bioinformatics, 2012. 28(6): p. 882-3.