Introducing the linear mixed model
Introduction
The two main uses of the linear mixed model are
to estimate treatment effects and test for differences, and
to predict genetic merit of individuals.
In the first case, treatment effects are estimated as fixed effects
and reflect what happened in the experiment. Random effects then
reflect sources of variation fitted to get the most efficient hypothesis tests.
In the second case, the random genetic effects are of most interest,
having been adjusted for other fixed or random sources of variation.
If y denotes the n × 1 vector of observations, the
linear mixed model can be written as
y = X τ + Z u + e
where τ is the p × 1 vector of fixed
effects,
X is an n × p design matrix of full column rank (after deleting any redundant singular columns)
which
associates observations with the appropriate combination of fixed
effects, u is the q × 1 vector of random
effects,
Z is the n × q design matrix which associates
observations with the appropriate combination of random effects, and
e is the n × 1 vector of residual errors.
The data may be collected from a controlled experiment, or by
sampling an existing population.
Introducing the linear mixed model
The linear mixed model is also called a linear
mixed effects model. It is assumed that
u ~ N(0, G (σg)
is independent of (uncorrelated with)
e ~ N(0, R(σr)
where the
matrices G and R are functions of parameters
σg and σr, respectively.
The variance matrix for y is then of the form
var ( y) = Z G (σg)Z' + R (σr)
which we will refer to as the sigma parameterization of the G and R variance structures,
and the variance structure parameters in σg and σr
will be referred to as sigmas although they are not necessarily all variances.
The variance models given by G and R are referred to as G structures
and R structures respectively.
We illustrate these concepts using the simplest linear mixed model.
A one-way classification
Consider a one-way classification comprising a single random effect u,
and a residual error term e.
The two random components of this model, namely u and e,
are each assumed to be independent and identically distributed (IID)
and to follow a normal distribution such that
u~ N(0, σ2uIq) and
e~ N(0, σ2eIn).
Hence the variance of u has the form
var (y) = σ2uZZ' + σ2eIn
This model has two variance structure parameters or sigmas: the variance component
σ2u associated with u, and the variance component σ2eassociated with e.
Mapping this equation back to the general expression, we have
σg=σ2u, G(σg) = σ2uIq,
σr =σ2e and R(σr) = σ2eIn .
G structure for the random model terms}
Typically, τ and u are composed of several model terms,
that is, τ can be partitioned as τ=[τ'1: τ't]'
and u can be partitioned as u = [u'1: u'b]'
with X and Z partitioned conformably as X = [X1: Xt]
and Z = [Z1: Zb].
For u thus partitioned,
we impose a direct sum structure on the matrix G, written
G = ⊕i=1:b Gi giving
| | G1 | 0 | : | 0 | 0
|
| | 0 | G2 | : | 0 | 0
|
| | : | : | : | : | :
|
| | 0 | 0 | : | Gb-1 | 0
|
| | 0 | 0 | : | 0 | Gb
|
where ⊕ is the direct sum operator, each
Gi is of size qi and q = ∑iqi.
The default assumption is that each random model term generates one component of this direct sum;
var(ui) = Gi for i = 1 : b.
In this case, the random effects from any two distinct model terms are uncorrelated.
Traditional Variance components models
Extending the one-way example to a linear mixed model with more than one (b>1) random terms
(typically known as a variance components mixed model),
the random effects ui in u, and the residual errors e,
are assumed uncorrelated and to each be
normally distributed with mean zero and variance given by
var(ui)= σ2ui Iqi
and
var(e) = σ2eIn
where Iqi and In
are identity matrices of dimension qi and n, respectively.
In this case
var(y)= ∑i=1:b σ2ui ZiZ'i + σ2eIn
Partitioning the residual error term
As for the fixed and random model terms, it is often appropriate
to partition the vector of residual errors e according to some conditioning factor.
We use the term section to describe this partitioning.
A common reason for using sections in e is because subsets of the residuals have different variances.
For example, in the analysis of multi-environment trials (METs) it is natural to expect
that each trial will have a different (possibly spatial) error structure.
In this case, for s sections we have e = [e'1: e's]'
assuming that the data vector is ordered by section, and where ej represents
the vector of errors for the jth section.
For e partitioned as e = [e1,e2, :,es]'
we allow the matrix R to have a similar direct sum structure, with
R = ⊕j=1:s Rj giving
| | R1 | 0 | : | 0 | 0
|
| | 0 | R2 | : | 0 | 0
|
| | : | : | : | : | :
|
| | 0 | 0 | : | Rs-1 | 0
|
| | 0 | 0 | : | 0 | Rs
|
for s sections and the data ordered by section.
It may be necessary to re-order (sort) the data units to achieve this structure.
While the default variance structure for a section is IID (σ2ej Inj ),
a more complex structure is often appropriate.
ASReml offers a wide range of
variance structures.
Gamma parameterization for the linear mixed model
In the sigma parameterization of model (2.3) the scale parameters in both G(σg and R(σr
are variances. However, in a univariate analysis with one data section, the
one residual variance parameter may be factored out of
G(σg) and R(σr) so that
var (y)= σ2eΣi=1:b γui ZiZ'i + In
for γui being the ratio of the variance component for the random term ui
relative to error variance, that is, γui = σ2ui /σ2e.
We will refer to this as the gamma parameterization, and the individual variance structure parameters
in γg and γr will be referred to as gammas.
ASReml defaults to the gamma parameterizations when there is just one residual variance
as it it easier to set initial values in terms of these ratios.
This parameterization was used in some of the early development of REML for the traditional mixed model.
This leads to an alternative specification of the random effects:
u ~ N(0, σ2eG (σg))
and
e ~ N(0, σ2eR(σr))
where γg and γr are the parameters
for the scaled (by σ2e) G and R matrices, respectively;
R(γr) is actually a correlation (or identity) matrix.
The variance matrix for y is then
var(y)=σ2eZ G(γg)Z' + R(γr).
Parameter types
Each sigma in σg and σr and each gamma in γg
and γr has a parameter type,
for example, variance, variance ratio, autocorrelation parameter, factor loading, distance.
Furthermore, the parameters in σg, σr, γg and γr
can have different types.
For example, the spatial analysis of a simple column trial would involve a variance
or variance ratio and a spatial autocorrelation parameter.
Variance structures for the random model terms
The random model terms ui in u define the random effects and associated design matrices,
Zi∈Z , but a variance structure is required for each term before the model can be fitted.
If nothing is specified, σ2iIqi is assumed.
Other structures are specified
by using functions applied to the model term, or to components of the model term
in an interaction,
to define Gi. This produces a
consolidated model term
that simultaneously defines both the design matrix (Zi) and variance model (Gi).
A random model term may comprise either a single factor or
be an interaction of several component factors to give a compound model term.
Consider a compound model term represented by A.B, where the component factors A and B
have m and n levels respectively and the ˙ . operator
forms a model term with levels corresponding to the combinations of all levels of A with all levels of B.
The effects abij for A.B are generated with the levels of B nested in the levels of A,
i.e. the levels of B cycling fastest:
(ab)= (ab11, ab12, : ab1n, ab21, ab22, : ab2n, : abm1, abm2, : abmn)'.
If we propose variance structures A = [Aij] for A and B = [Bkl] for B,
the variance structure for A.B is given by the direct product A⊗B.
This means that the covariance between two effects abik and abjl in (ab)
is constructed as the product of the covariance between ai and aj ( Aij )
and the covariance between bk and bl (Bkl ) giving
cov(abik , abjl )= Aij Bkl .
Note that for direct product structures, only one of the component 'variance structure' matrices should
contain estimable variances. The other can be an Identity, a correlation matrix or some other
known matrix with no estimable variance parameter; you can't estimate the two elements of a product individually
if you only know the product.
Direct product structures
Mathematically, the
direct product of two matrices A(m × p) and B(n × q) is
A⊗B giving
| | A11 B | : | A1p B
|
| : | : | :
|
| Am1 B | : | Amp B |
Variance structures formed as direct product construction are known as separable variance structures
and we call the assumption that a separable variance structure is plausible the assumption of separability.
A small direct product structure
If A has 3 levels and B has 2 levels, then the term A.B would have the 6 levels:
(ab)= (ab11, ab12, ab21, ab22, ab31, ab32)'.
Writing ΣA and ΣB as
| | A11 | A12 | A13 |
| | | |
|
| | A21 | A22 | A23 | | and | | B11 | B12
|
| | A31 | A32 | A33 | | | | B21 | B22
|
then cov(ab21 , ab32 ) = A23 B12
Direct products in R structures
Consider a vector of common errors associated with an experiment. The
usual least squares assumption (and the default in ASReml) is that these
are independently and identically distributed (IID). However, if e was from a field
experiment laid out in a rectangular array of r rows by c columns,
we could arrange the residuals as a matrix and might consider that they were
autocorrelated within rows and columns. Writing the residuals as a vector in field order,
that is, by sorting the residuals rows within columns (plots within blocks)
the variance of the residuals might then be
σ2eΣr(ρr)⊗ Σc(ρc)
where Σr(ρr) and Σc(ρc) are correlation
matrices for the row model (order r, autocorrelation parameter
ρr) and column model (order c, autocorrelation parameter
ρc), respectively. More specifically, a two-dimensional separable
autoregressive spatial structure (AR1 ⊗ AR1) is
sometimes assumed for the common errors in a field trial analysis (see
Gogel (1997) and Cullis {em et al}. (1998) for examples). In this case
Σr and Σc are like
|
| 1 | | | |
| |
| 1 | | | |
|
| | ρr | 1 | | |
| | | ρc | 1 | | |
|
| | ρr2 | ρr | 1 | |
| | and | ρc2 | ρc | 1 | |
|
| | : | : | : | : |
| | | : | : | : | : |
|
| | ρrr-1 | ρrr-2 | ρrr - 3 | : | 1
| | | ρcc-1 | ρcc-2 | ρcc - 3 | : | 1
|
Alternatively, the residuals might relate to a multivariate analysis
with nt traits and n units and be ordered traits within
units. In this case an appropriate variance structure might be
In ⊗ Σ
where Σ(nt× nt is a general or unstructured
variance matrix.
Direct products in G structures
Likewise, the random terms in u in the model may have a direct
product variance structure. For example, for a field trial with s sites, g
varieties and the effects ordered varieties within sites, the model
term site.variety, assumed random, may have the variance structure
Σ ⊗ Ig
where Σ is the variance matrix for sites.
This would imply that the varieties are independent random effects
within each site, have different variances at each site, and are
correlated across sites.
Whenever a
random term is formed as the interaction of two factors you should
consider whether the IID assumption is
sufficient or if a direct product structure might be more appropriate.
Common consolidated model terms
This section lists the common variance structures used in ASReml. Some common variables used in mixed models
are:
range/row/column: index field plots in field order
site/location/expt: associate plots in the same place
animal/sire/dam/genotype/entry genetically related individuals
ar1(row).ar1v(column)
The ar1() variance function defines an autoregressive correlation matrix with one correlation parameter.
Appending v to the name scales the matrix by a second variance parameter. This model term, typically used
for the residual in the spatial analysis of a field trial, has a row autocorrelation, a column
autocorrelation and a variance parameter. When analysing data from several sites together, the model term becomes
sat(site).ar1(row).ar1v(column) which is expanded to define a separate model term for each site.
units.us(Trait)
units is an internally defined factor indexing the rows of the data file.
Trait is an internally defined factor indexing the variables involved in a multivariate analysis.
The us() variance function has a variance parameter for each trait/variable and a covariance
for each pairwise combination; ten parameters for four traits.
nrm(animal)
The nrm() variance function associates a known (previously defined) Numerator Relationship Matrix (NRM)
with the animal factor. When used alone, a variance parameter, the genetic variance, is estimated.
The grm() variance function associates a known (previously defined) Genomic Relationship Matrix (GRM)
with its argument.
us(Trait).nrm(animal)
estimates a genetic variance for each trait, and a genetic covariance for each pair of traits.
coruh(site).genotype
The coru() variance function defines a uniform correlation matrix. The h estimates a
separate genetic variance for each site, to scale the correlation matrix. In this model term,
genotypes are treated as independent.
xfa1(site).grm(genotype)
The xfa1() variance function defines a factor analytic variance structure
(ΓΛ'+Ψ) with 1 factor (Γ has 1 column, called loadings,
Ψ is diagonal with values called specific variances). This is an intermediate
model having more parameters than coruh() but less than us().
It is also a sparser model to fit than coruh() or us() when there are many sites (more than 4).
The model term also includes genomic relationships among genotypes in the second dimension
of the site.genotype table.
rrd1(site).grm(genotype)
This is expanded to rr1(site).grm(genotype) + diag(site).grm(genotype).
The rr1() variance function is the xfa1() variance function without specific variances; it estimates the covariances.
The diag() variance function has all covariances zero. So the sum is
equivalent to the regular xfa1() variance function. However, the mixed model equations have a sparser form
when interacted with the dense grm(genotype) component and so runs faster, especially for
many sites.
Return to index