New/Changed job control qualifiers

Summary of new qualifiers associated with relationship matrices

Some new qualifiers for the GRM/GIV and GRR lines have been added to enable the saving, and keeping of relationship matrices and the formation of other relationship matrices,

!DOMINANCE      on the GRR line invokes the formation of a dominance relationship matrix as well as the usual additive relationship matrix from marker data.

!EIGTRANSFORM      on the GRM line invokes a singular value decomposition (SVD) of the G (G = UDU' with UU' = I and D diagonal), writes them to files, transforms the fitted linear from y = XB + Zu where var(u)=G to U'y = U'X B + Zu* where var(u*) is D.

!EPISTATIC     used on the GRR line invokes the formation of a epistatic matrices by forming Dadamard GA# GA (GA# GD and GD# GD) from marker data.

!HINV file and associated qualifiers !HSKIP h,!OMEGA ω and !TAU τ     on the GRM line invokes a H (or hybrid) matrix. The H matrix is a particular form of a G obtained by merging an A (numerator relationship) matrix with a G (genomic relationship) pertaining to a subset of the genotypes in A (Legarra et al., 2009).

!KEEPGRM     This GIV/GRM qualifier instructs ASReml to keep in memory the GRM matrix as well as its inverse. It is needed when the genetic trimming option (see below) used to fit the model faster by using multiple inverses of reduced dimension, rather than repeated use of the whole G matrix.

!SAVEGIV [f]      used on the pedigree line is used to save the G^(-1) matrix in a binary form so that it can be read back in ASReml, and so save re-computation and have reduced reading time. This facility has been extended to pedigree and GRR lines. If f = 3 the inverse matrix is written as a binary .sgiv file in single precision and f = 4 writes the inverse matrix as a binary .dgiv file in double precision. If used with !KEEPGRM on the GIV/GRM lines, the uninverted G matrix is also written to .sgrm/.dgrm binary files.

Dominance and Epistatic GRM matrices for Marker Data

GRM matrices are typically formed from marker matrices (M) in which SNPs are coded 0, 1 or 2 and arranged with genotypes in rows (lines of the file) and SNPs as data fields. Vitezica et al. (2017) define
GA=MM'/[2Σ1,m pj(1-pj ]
where M has elements (0 - 2pj), (1 - 2pj) and (2 - 2pj) for genotypes AA, AB and BB, respectively, 2pj being the mean for SNP j.
GD=KK'/[4Σ1,m pj2 (1-pj2)]
where K has elements -2pj2, 2pj(1 - 2pj) and -2(1-pj2 ) for genotypes AA, AB and BB, respectively. Also,
GAA =GA#GA/(tr(GA#GA)/n)
GAD =GA#GD/(tr(GA#GA)/n)
GDD =GD#GD/(tr(GD#GD)/n)
where # represents the Hadamard product and tr() is the trace function.
The matrices are available in ASReml with the use of the !DOMINANCE and !EPISTATIC qualifiers on the .grr file specification line, which supplies the file containing the marker (0, 1, 2) matrix, where !DOMINANCE requests GDbe formed (along with GA, and !EPISTATIC requests GAA be formed (also GAD and GDD if GD is available).

Eigen-Model Transformation with !EIGTRANSFORM

In the context of needing to fit a large genomic Animal model to many traits, an eigenvector transformation of the model design matrix can result in faster processing (for more details, see Lee and van der Werf 2016). This idea is relevant when the GRM term dominates the model. It essentially turns the part of the coefficient matrix relating to genomic effects to a diagonal structure at the expense of turning the (much fewer) non-genomic effects into dense equations.

The !EIGTRANSFORM qualifier on the GRM line invokes a singular value decomposition (SVD) of G (G = UDU' with UU' = I and D diagonal), holds the U' and D for use in transforming (U) and fitting (D) the model, and writes them to file (..._D.sgrm, ..., _U.sgrm). If the U' and D files already exist, U' and D are retrieved from them. Then, if this GRM is specified in the model, and the model and data meet the necessary requirements, the model design matrix is transformed (pre-multiplied by U' ) except for the columns specific to the GRM; D is used as the GRM variance matrix for the genetic effects in the analysis of the transformed model so that this block of the C matrix remains diagonal.

Generally, this will run much faster. The eigen-transformation is only possible when there is one data record for each genotype, as occurs with the Animal model. A large amount of workspace is required for the process of transforming the design matrix. The fitted effects between the two models agree except for the genomic BLUPs. The BLUPs from the conventional BLUP (u) are calculated as u = Us, where s are the BLUPs from the transformed analysis. ASReml 4.3 does not calculate them at present. For example, for the common genomic animal model
 !WORK 6
 10K bivariate data
 ID !A !LL20 !L ID_order.txt
 CG * CGs *		    # contemporary group factors
 imf sf5		    # response variates
 A.sgiv !GDENSE		    # A inverse genomic matrix
 data.csv !SKIP 1	    # data file in same order as A22
 imf ~ CG !r grm1(ID)		
the equivalent model and two other similar equivalent models can be fitted with
 !WORK 6 !RENAME 1 !ARG 1 2 3
 10K bivariate data    !DOPART $1
  ID !A !LL 20 !L IDorder.txt
  CG * CGs *			 # contemporary group factors
  imf sf5			 # response variates
  CG  376			 # CG design matrix
 A.grm !EIGENTRANSFORM	 # instruction for eigen value analysis
 data.csv !SKIP 1	         # data file

 !PART 1
 imf ~ CG !r grm1(ID)	 # transformed model
 !PART 2
 sf5 ~ CG !r grm1(ID)	 # transformed model
 !PART 3
 imf sf5 ~ Trait !r us(Trait).grm1(ID) !f Trait.CG # bivariate transf. model
The !EIGTRANSFORM qualifier performs an SVD (Singular Value Decomposition, eigen analysis) of the G matrix and holds the eigen values and vectors. It also flags that when the model is fitted, the design matrix is to be transformed using the eigen vectors.

This approach only works when the data file has the same (or fewer) rows as the GRM matrix and the number of other effects in the model is substantially less than the number of genotypes.

It will save considerable time when there are many response variates with no missing values to analyse, especially for bivariate analyses. Comparing the two models given in the code above, with 9,688 genotypes, the conventional analysis took 9 sec (to invert GRM) + 8 43 sec (for 8 iterations). The overhead of the SVD factorization was 160 sec and 8 sec (for transforming the model); the transformed analysis took 8 2 sec for 8 iterations. A conventional bivariate analysis took 8 313 sec. The transformed bivariate analysis took 8 13 sec, a considerable difference.

The fitted effects between the two models agree except for the genomic BLUPs. As indicated before, the BLUPs from the conventional BLUP (u) are calculated as u = Us, where s are the BLUPs from the transformed analysis.

Genetic Trimming in MET Models with !KEEPGRM, sat()and !VGROUP

The situation where not all genotypes are observed in all environments is becoming more common. That is, when each environment (Env) samples different subsets of the genotypes (Gen). This, when combined with the use of a GRM matrix to specify the genetic relationships, can quickly result in a large and fairly dense set of equations to solve.

The approach presented here is motivated by the work of Mazur (2021) and concerns the fitting of genotype by environment factor analytic models with a dense grm(Gen) matrix term. As mentioned in the previous section,
xfak(Env).grm(Gen)     can be more efficiently fitted as rrk(Env).grm(Gen) + diag(Env).grm(Gen).

Mazur (2021) suggested replacing diag(Env).grm(Gen) by a set of separate model terms, one for each level of Env, using only the subset of genotypes with data for that level (i) of Env and using the associated reduced GRM matrices. He calls this operation trimming, and leads to the same variance model (hence, identical variance parameter estimates in the two models).

The expanded model limits the prediction of genotypes to the environments where they occur, and this can result in a dramatic reduction of computing time, as a result of creating a separate model term for each environment using a reduced environment-specific GRM matrix. In particular, you may just want an average genotype effect (across environments) or relative genotype effects for only those genotypes present at a particular environment. This operation of forming separate variance structures for each level of environment is analogous to the operation at residual level of forming similar variance models for each level of sat(), and the sizes are defined by the data associated with each level of sat(). ASReml 4.3 uses
sat(Env).grm(Gen) to specify this operation. Each of the expanded terms is denoted in the output file as

sat(Env,i).grm(Gen|Env_i)  
where i indexes the environment, grm(Gen|Env_i) denotes the reduced GRM matrix specific for the genotypes present at environment i. With this model, the rrk(Env).grm(Gen) term fits the common (factor) effects and the sat(Env).grm(Gen) terms are the environment-specific 'specific' variances (genetic deviation from the common effect). In order to form the reduced GRM matrices the full GRM matrix needs to be available and the GRM qualifier !KEEPGRM is used for this.

In some analyses, experiments are grouped together, and the same subset of genotypes is used in all experiments in a group. This means that a smaller set of reduced GRM (and GIV) needs to be formed. The qualifier !VGROUP Group has been introduced to allow that, for example
sat(Exp !VGROUP Group).grm(Gen)

The nested association between Exp and Group is deduced from the data, and the generated model terms are denoted in the output file as sat(Exp,i).grm(Gen|Group_j) where Gen|Group_j is the 'subset' of genotypes associated with group j, and is fitted using a reduced GRM pertaining to those genotypes in the data pertaining to experiment i, which is a member of group j.

We can also use !VPGROUP to constrain the variance parameters for a parameter-reduced version of this model. Consider the untrimmed model
Yield ~ mu Exp !r Rep.Exp Block.Rep.Exp grm(Gen) idv(Exp).grm(Gen)

This model uses the same variance parameter for all Exp.Geno effects. A trimmed version of this model is
Yield ~ mu Exp !r Rep.Exp Block.Rep.Exp grm(Gen) sat(Exp !VPGROUP mu).grm(Gen)
with !VPGROUP mu constraining the parameters associated with Exp.Gen to be the same.

There is a more detailed wheatgrass example in Section 16.8.3 of the User Guide. In this example, the trimmed model took 33.3 seconds per iteration compared with 116.7 seconds per iteration for the original model. See here for more detailed timings which are consistent with the findings of Mazur (2021).

Forming the H matrix

The H (hybrid) matrix is a particular form of a G matrix obtained by merging an A (numerator relationship) matrix with a G (genomic relationship) matrix pertaining to a subset of the genotypes in A (Legarra et al., 2009).

Hτ,ω-1 = A-1 + diag( 0, τ G-1 - ω A22 -1 )
where A22-1 is the inverse of the portion of the A matrix that contains the genotyped individuals. Note that the cells present in the H inverse can be tuned by τ and ω, which have default values of 1 (for more details, see Martini et al. 2018).

The qualifiers
 !HINV [!HSKIP h] [!OMEGA ω] [!TAU τ]  GRM_ID_file.txt
on a GRM definition line forms the special G inverse known as an H inverse. A pedigree from which to form an A inverse must have already been specified. The genotype identifiers in the G matrix are to be specified as a list in the ASCII file provided as the argument to the !HINV qualifier. All identifiers must also be present in the pedigree. In order to form the H-1 matrix, the pedigree file from which the A-1 matrix is first formed, then the G (or its inverse) matrix is specified and read, and finally, the !HINV qualifier forms the H-1 matrix. Note that the GRM line with the !HINV qualifier, defines two inverse matrices available for use in the model: the G-1 and the H-1. Therefore, we could fit any of nrm(ID), grm1(GID) or grm2(ID) in the model. This assumes that the levels associated with the !HINV qualifier are included in the pedigree. ASReml saves the H-1 matrix as a binary file (filename given in the output) which can subsequently be used directly (saving the setup time in the subsequent runs).

Dealing with Binary G Matrices

ASReml 4.3 can read in a GRM matrix in various forms defined by content (G or G-1), form (ASCII, REAL_S or REAL_R) and layout (row-wise or cell-wise). Their content and form are indicated by the following filename extensions
content ASCII REAL_S REAL_D REAL_R
G .grm .sgrm, .bgrm .dgrm .rgrm
G-1.giv .sgiv, .bgiv .dgiv .rgiv
The binary forms are preferred because the files are smaller and accuracy is greater. Providing the inverse and log-determinant of the matrix saves processing time calculating them. REAL_S refers to the FORTRAN sequential binary file structure in which each 'record' is enclosed in a 'wrapper' indicating the record size in bytes. When ASReml writes a binary file, it uses the REAL_S file structure. REAL_D is a double-precision binary file, but the extra precision is generally unnecessary. REAL_R refers to the R(C) binary form, which does not include any 'record size' information. Typically, G and G-1) are dense matrices ((nearly) all cells non-zero) and are half-stored. However, some very large matrices have significant sparsity (most cells are zero). Note that
  • .sgiv, .sgrm, .bgrm, and .bgiv files are binary, single precision 'sequential' files containing sparse or dense matrices of various styles.
  • .rgiv and .rgrm files are 'C-style' binary, single precision, written by R as binary versions for the ASCII layouts.
  • and .dgrm files are binary, dense format double precision files.
  • More on binary GRM files.
  • Back to What's New

    Return to index