Introducing the linear mixed model

Introduction

The two main uses of the linear mixed model are
  • to estimate treatment effects and test for differences, and
  • to predict genetic merit of individuals.

    In the first case, treatment effects are estimated as fixed effects and reflect what happened in the experiment. Random effects then reflect sources of variation fitted to get the most efficient hypothesis tests.

    In the second case, the random genetic effects are of most interest, having been adjusted for other fixed or random sources of variation.

    If y denotes the n × 1 vector of observations, the linear mixed model can be written as

    y = X τ + Z u + e

    where τ is the p × 1 vector of fixed effects, X is an n × p design matrix of full column rank (after deleting any redundant singular columns) which associates observations with the appropriate combination of fixed effects, u is the q × 1 vector of random effects, Z is the n × q design matrix which associates observations with the appropriate combination of random effects, and e is the n × 1 vector of residual errors.

    The data may be collected from a controlled experiment, or by sampling an existing population.

    Introducing the linear mixed model

    The linear mixed model is also called a linear mixed effects model. It is assumed that
    u ~ N(0, G (σg)
    is independent of (uncorrelated with)
    e ~ N(0, R(σr)
    where the matrices G and R are functions of parameters σg and σr, respectively.

    The variance matrix for y is then of the form
    var ( y) = Z G (σg)Z' + R (σr)
    which we will refer to as the sigma parameterization of the G and R variance structures, and the variance structure parameters in σg and σr will be referred to as sigmas although they are not necessarily all variances. The variance models given by G and R are referred to as G structures and R structures respectively.

    We illustrate these concepts using the simplest linear mixed model.

    A one-way classification

    Consider a one-way classification comprising a single random effect u, and a residual error term e. The two random components of this model, namely u and e, are each assumed to be independent and identically distributed (IID) and to follow a normal distribution such that u~ N(0, σ2uIq) and e~ N(0, σ2eIn). Hence the variance of u has the form
    var (y) = σ2uZZ' + σ2eIn

    This model has two variance structure parameters or sigmas: the variance component σ2u associated with u, and the variance component σ2eassociated with e. Mapping this equation back to the general expression, we have σg2u, G(σg) = σ2uIq, σr2e and R(σr) = σ2eIn .

    G structure for the random model terms}

    Typically, τ and u are composed of several model terms, that is, τ can be partitioned as τ=[τ'1: τ't]' and u can be partitioned as u = [u'1: u'b]' with X and Z partitioned conformably as X = [X1: Xt] and Z = [Z1: Zb].

    For u thus partitioned, we impose a direct sum structure on the matrix G, written G = ⊕i=1:b Gi giving
    G1 0 : 0 0
    0 G2 : 0 0
    : : : : :
    0 0 : Gb-1 0
    0 0 : 0 Gb
    where ⊕ is the direct sum operator, each Gi is of size qi and q = ∑iqi.

    The default assumption is that each random model term generates one component of this direct sum; var(ui) = Gi for i = 1 : b. In this case, the random effects from any two distinct model terms are uncorrelated.

    Traditional Variance components models

    Extending the one-way example to a linear mixed model with more than one (b>1) random terms (typically known as a variance components mixed model), the random effects ui in u, and the residual errors e, are assumed uncorrelated and to each be normally distributed with mean zero and variance given by var(ui)= σ2ui Iqi and var(e) = σ2eIn where Iqi and In are identity matrices of dimension qi and n, respectively. In this case
    var(y)= ∑i=1:b σ2ui ZiZ'i + σ2eIn

    Partitioning the residual error term

    As for the fixed and random model terms, it is often appropriate to partition the vector of residual errors e according to some conditioning factor. We use the term section to describe this partitioning. A common reason for using sections in e is because subsets of the residuals have different variances. For example, in the analysis of multi-environment trials (METs) it is natural to expect that each trial will have a different (possibly spatial) error structure. In this case, for s sections we have e = [e'1: e's]' assuming that the data vector is ordered by section, and where ej represents the vector of errors for the jth section.

    For e partitioned as e = [e1,e2, :,es]' we allow the matrix R to have a similar direct sum structure, with R = ⊕j=1:s Rj giving
    R1 0 : 0 0
    0 R2 : 0 0
    : : : : :
    0 0 : Rs-1 0
    0 0 : 0 Rs
    for s sections and the data ordered by section. It may be necessary to re-order (sort) the data units to achieve this structure.

    While the default variance structure for a section is IID (σ2ej Inj ), a more complex structure is often appropriate. ASReml offers a wide range of variance structures.

    Gamma parameterization for the linear mixed model

    In the sigma parameterization of model (2.3) the scale parameters in both G(σg and R(σr are variances. However, in a univariate analysis with one data section, the one residual variance parameter may be factored out of G(σg) and R(σr) so that
    var (y)= σ2eΣi=1:b γui ZiZ'i + In
    for γui being the ratio of the variance component for the random term ui relative to error variance, that is, γui = σ2ui2e. We will refer to this as the gamma parameterization, and the individual variance structure parameters in γg and γr will be referred to as gammas. ASReml defaults to the gamma parameterizations when there is just one residual variance as it it easier to set initial values in terms of these ratios. This parameterization was used in some of the early development of REML for the traditional mixed model.

    This leads to an alternative specification of the random effects:
    u ~ N(0, σ2eG (σg)) and e ~ N(0, σ2eR(σr))
    where γg and γr are the parameters for the scaled (by σ2e) G and R matrices, respectively; R(γr) is actually a correlation (or identity) matrix. The variance matrix for y is then
    var(y)=σ2eZ G(γg)Z' + R(γr).

    Parameter types

    Each sigma in σg and σr and each gamma in γg and γr has a parameter type, for example, variance, variance ratio, autocorrelation parameter, factor loading, distance. Furthermore, the parameters in σg, σr, γg and γr can have different types. For example, the spatial analysis of a simple column trial would involve a variance or variance ratio and a spatial autocorrelation parameter.

    Variance structures for the random model terms

    The random model terms ui in u define the random effects and associated design matrices, ZiZ , but a variance structure is required for each term before the model can be fitted. If nothing is specified, σ2iIqi is assumed. Other structures are specified by using functions applied to the model term, or to components of the model term in an interaction, to define Gi. This produces a consolidated model term that simultaneously defines both the design matrix (Zi) and variance model (Gi).

    A random model term may comprise either a single factor or be an interaction of several component factors to give a compound model term. Consider a compound model term represented by A.B, where the component factors A and B have m and n levels respectively and the ˙ . operator forms a model term with levels corresponding to the combinations of all levels of A with all levels of B. The effects abij for A.B are generated with the levels of B nested in the levels of A, i.e. the levels of B cycling fastest: (ab)= (ab11, ab12, : ab1n, ab21, ab22, : ab2n, : abm1, abm2, : abmn)'.

    If we propose variance structures A = [Aij] for A and B = [Bkl] for B, the variance structure for A.B is given by the direct product AB. This means that the covariance between two effects abik and abjl in (ab) is constructed as the product of the covariance between ai and aj ( Aij ) and the covariance between bk and bl (Bkl ) giving cov(abik , abjl )= Aij Bkl .

    Note that for direct product structures, only one of the component 'variance structure' matrices should contain estimable variances. The other can be an Identity, a correlation matrix or some other known matrix with no estimable variance parameter; you can't estimate the two elements of a product individually if you only know the product.

    Direct product structures

    Mathematically, the direct product of two matrices A(m × p) and B(n × q) is AB giving
    A11 B : A1p B
    : : :
    Am1 B : Amp B
    Variance structures formed as direct product construction are known as separable variance structures and we call the assumption that a separable variance structure is plausible the assumption of separability.

    A small direct product structure

    If A has 3 levels and B has 2 levels, then the term A.B would have the 6 levels: (ab)= (ab11, ab12, ab21, ab22, ab31, ab32)'. Writing ΣA and ΣB as
    A11 A12 A13
    A21 A22 A23 and B11 B12
    A31 A32 A33 B21 B22
    then cov(ab21 , ab32 ) = A23 B12

    Direct products in R structures

    Consider a vector of common errors associated with an experiment. The usual least squares assumption (and the default in ASReml) is that these are independently and identically distributed (IID). However, if e was from a field experiment laid out in a rectangular array of r rows by c columns, we could arrange the residuals as a matrix and might consider that they were autocorrelated within rows and columns. Writing the residuals as a vector in field order, that is, by sorting the residuals rows within columns (plots within blocks) the variance of the residuals might then be
    σ2eΣrr)⊗ Σcc)
    where Σrr) and Σcc) are correlation matrices for the row model (order r, autocorrelation parameter ρr) and column model (order c, autocorrelation parameter ρc), respectively. More specifically, a two-dimensional separable autoregressive spatial structure (AR1 ⊗ AR1) is sometimes assumed for the common errors in a field trial analysis (see Gogel (1997) and Cullis {em et al}. (1998) for examples). In this case Σr and Σc are like
    1 1
    ρr 1 ρc 1
    ρr2 ρr 1 and ρc2 ρc 1
    : : : : : : : :
    ρrr-1 ρrr-2 ρrr - 3 : 1 ρcc-1 ρcc-2 ρcc - 3 : 1

    Alternatively, the residuals might relate to a multivariate analysis with nt traits and n units and be ordered traits within units. In this case an appropriate variance structure might be InΣ where Σ(nt× nt is a general or unstructured variance matrix.

    Direct products in G structures

    Likewise, the random terms in u in the model may have a direct product variance structure. For example, for a field trial with s sites, g varieties and the effects ordered varieties within sites, the model term site.variety, assumed random, may have the variance structure ΣIg where Σ is the variance matrix for sites. This would imply that the varieties are independent random effects within each site, have different variances at each site, and are correlated across sites. Whenever a random term is formed as the interaction of two factors you should consider whether the IID assumption is sufficient or if a direct product structure might be more appropriate.

    Common consolidated model terms

    This section lists the common variance structures used in ASReml. Some common variables used in mixed models are:
    range/row/column: index field plots in field order
    site/location/expt: associate plots in the same place
    animal/sire/dam/genotype/entry genetically related individuals

    ar1(row).ar1v(column) The ar1() variance function defines an autoregressive correlation matrix with one correlation parameter. Appending v to the name scales the matrix by a second variance parameter. This model term, typically used for the residual in the spatial analysis of a field trial, has a row autocorrelation, a column autocorrelation and a variance parameter. When analysing data from several sites together, the model term becomes sat(site).ar1(row).ar1v(column) which is expanded to define a separate model term for each site.

    units.us(Trait) units is an internally defined factor indexing the rows of the data file. Trait is an internally defined factor indexing the variables involved in a multivariate analysis. The us() variance function has a variance parameter for each trait/variable and a covariance for each pairwise combination; ten parameters for four traits.

    nrm(animal) The nrm() variance function associates a known (previously defined) Numerator Relationship Matrix (NRM) with the animal factor. When used alone, a variance parameter, the genetic variance, is estimated. The grm() variance function associates a known (previously defined) Genomic Relationship Matrix (GRM) with its argument.

    us(Trait).nrm(animal) estimates a genetic variance for each trait, and a genetic covariance for each pair of traits.

    coruh(site).genotype The coru() variance function defines a uniform correlation matrix. The h estimates a separate genetic variance for each site, to scale the correlation matrix. In this model term, genotypes are treated as independent.

    xfa1(site).grm(genotype) The xfa1() variance function defines a factor analytic variance structure (ΓΛ'+Ψ) with 1 factor (Γ has 1 column, called loadings, Ψ is diagonal with values called specific variances). This is an intermediate model having more parameters than coruh() but less than us(). It is also a sparser model to fit than coruh() or us() when there are many sites (more than 4). The model term also includes genomic relationships among genotypes in the second dimension of the site.genotype table.

    rrd1(site).grm(genotype) This is expanded to rr1(site).grm(genotype) + diag(site).grm(genotype). The rr1() variance function is the xfa1() variance function without specific variances; it estimates the covariances. The diag() variance function has all covariances zero. So the sum is equivalent to the regular xfa1() variance function. However, the mixed model equations have a sparser form when interacted with the dense grm(genotype) component and so runs faster, especially for many sites.

    Return to index