New/Changed job control qualifiers
Summary of new qualifiers associated with relationship matrices
Some new qualifiers for
the GRM/GIV and GRR lines have been added to enable the saving,
and keeping of relationship matrices and the formation of other relationship matrices,
!DOMINANCE
on the GRR line invokes the formation of a dominance relationship matrix
as well as the usual additive relationship matrix from marker data.
!EIGTRANSFORM
on the GRM line invokes a singular value decomposition (SVD) of the G
(G = UDU' with UU' = I and D diagonal), writes them to files,
transforms the fitted linear from y = XB + Zu where var(u)=G to
U'y = U'X B + Zu* where var(u*) is D.
!EPISTATIC
used on the GRR line invokes the formation of a epistatic matrices by forming Dadamard
GA# GA (GA# GD and GD# GD) from marker data.
!HINV
file and associated qualifiers !HSKIP h,!OMEGA ω and !TAU τ
on the GRM line invokes a H (or hybrid) matrix. The H matrix is a particular form of a G
obtained by merging an A (numerator relationship) matrix with a G (genomic relationship)
pertaining to a subset of the genotypes in A (Legarra et al., 2009).
!KEEPGRM
This GIV/GRM qualifier instructs ASReml to keep in memory the GRM matrix as well as its inverse.
It is needed when the genetic trimming option (see below) used to fit the model faster by using
multiple inverses of reduced dimension, rather than repeated use of the whole G matrix.
!SAVEGIV [f]
used on the pedigree line is used to save the G^(-1) matrix in a binary form so that it
can be read back in ASReml, and so save re-computation and have reduced reading time.
This facility has been extended to pedigree and GRR lines.
If f = 3 the inverse matrix is written as a binary .sgiv file in single precision
and f = 4 writes the inverse matrix as a binary .dgiv file in double precision.
If used with !KEEPGRM on the GIV/GRM lines, the uninverted G matrix is also
written to .sgrm/.dgrm binary files.
Dominance and Epistatic GRM matrices for Marker Data
GRM matrices are typically formed from marker matrices (M) in which SNPs are coded
0, 1 or 2 and arranged with genotypes in rows (lines of the file) and SNPs as data fields.
Vitezica et al. (2017) define
GA=MM'/[2Σ1,m pj(1-pj ]
where M has elements (0 - 2pj), (1 - 2pj) and (2 - 2pj) for genotypes AA, AB and BB,
respectively, 2pj being the mean for SNP j.
GD=KK'/[4Σ1,m pj2 (1-pj2)]
where K has elements -2pj2, 2pj(1 - 2pj) and -2(1-pj2 ) for genotypes AA, AB and BB, respectively. Also,
GAA =GA#GA/(tr(GA#GA)/n)
GAD =GA#GD/(tr(GA#GA)/n)
GDD =GD#GD/(tr(GD#GD)/n)
where # represents the Hadamard product and tr() is the trace function.
The matrices are available in ASReml with the use of the !DOMINANCE and !EPISTATIC
qualifiers on the .grr file specification line, which supplies the file containing the
marker (0, 1, 2) matrix, where !DOMINANCE requests GDbe formed (along with GA,
and !EPISTATIC requests GAA be formed (also GAD and GDD if GD is available).
Eigen-Model Transformation with !EIGTRANSFORM
In the context of needing to fit a large genomic Animal model to many traits, an eigenvector
transformation of the model design matrix can result in faster processing
(for more details, see Lee and van der Werf 2016).
This idea is relevant when the GRM term dominates the model.
It essentially turns the part of the coefficient matrix relating to genomic effects to
a diagonal structure at the expense of turning the (much fewer) non-genomic effects into dense equations.
The !EIGTRANSFORM qualifier on the GRM line invokes a singular value decomposition (SVD) of
G (G = UDU' with UU' = I and D diagonal), holds the U' and D for use in transforming (U)
and fitting (D) the model,
and writes them to
file (..._D.sgrm, ..., _U.sgrm). If the U' and D files already exist, U' and D
are retrieved from them. Then, if this GRM is specified in the model,
and the model and data meet the necessary requirements,
the model design matrix is transformed (pre-multiplied by U' )
except for the columns specific to the GRM; D is used as the GRM variance matrix for the
genetic effects in the analysis of the transformed model so that this block of the
C matrix remains diagonal.
Generally, this will run much faster.
The eigen-transformation is only possible when there is one data record for each genotype,
as occurs with the Animal model. A large amount of workspace is required for the
process of transforming the design matrix.
The fitted effects between the two models agree except for the genomic BLUPs.
The BLUPs from the conventional BLUP (u) are calculated as u = Us, where s are the
BLUPs from the transformed analysis. ASReml 4.3 does not calculate them at present.
For example, for the common genomic animal model
!WORK 6
10K bivariate data
ID !A !LL20 !L ID_order.txt
CG * CGs * # contemporary group factors
imf sf5 # response variates
A.sgiv !GDENSE # A inverse genomic matrix
data.csv !SKIP 1 # data file in same order as A22
imf ~ CG !r grm1(ID)
the equivalent model and two other similar equivalent models can be fitted with
!WORK 6 !RENAME 1 !ARG 1 2 3
10K bivariate data !DOPART $1
ID !A !LL 20 !L IDorder.txt
CG * CGs * # contemporary group factors
imf sf5 # response variates
CG 376 # CG design matrix
A.grm !EIGENTRANSFORM # instruction for eigen value analysis
data.csv !SKIP 1 # data file
!PART 1
imf ~ CG !r grm1(ID) # transformed model
!PART 2
sf5 ~ CG !r grm1(ID) # transformed model
!PART 3
imf sf5 ~ Trait !r us(Trait).grm1(ID) !f Trait.CG # bivariate transf. model
The !EIGTRANSFORM qualifier performs an SVD (Singular Value Decomposition, eigen analysis) of the
G matrix and holds the eigen values and vectors. It also flags that when the model is fitted,
the design matrix is to be transformed using the eigen vectors.
This approach only works when the data file has the same (or fewer) rows as the GRM matrix and
the number of other effects in the model is substantially less than the number of genotypes.
It will save considerable time when there are many response variates with no missing values to
analyse, especially for bivariate analyses.
Comparing the two models given in the code above, with 9,688 genotypes, the conventional
analysis took 9 sec (to invert GRM) + 8 43 sec (for 8 iterations). The overhead of the
SVD factorization was 160 sec and 8 sec (for transforming the model); the transformed
analysis took 8 2 sec for 8 iterations. A conventional bivariate analysis took 8 313 sec.
The transformed bivariate analysis took 8 13 sec, a considerable difference.
The fitted effects between the two models agree except for the genomic BLUPs.
As indicated before, the BLUPs from the conventional BLUP (u) are calculated as u = Us,
where s are the BLUPs from the transformed analysis.
Genetic Trimming in MET Models with !KEEPGRM, sat()and !VGROUP
The situation where not all genotypes are observed in all environments is becoming more common.
That is, when each environment (Env) samples different subsets of the genotypes (Gen).
This, when combined with the use of a GRM matrix to specify the genetic relationships,
can quickly result in a large and fairly dense set of equations to solve.
The approach presented here is motivated by the work of Mazur (2021) and concerns the fitting of
genotype by environment factor analytic models with a dense grm(Gen) matrix term.
As mentioned in the previous section,
xfak(Env).grm(Gen)
can be more efficiently fitted as
rrk(Env).grm(Gen) + diag(Env).grm(Gen).
Mazur (2021) suggested replacing diag(Env).grm(Gen)
by a set of separate model terms, one for each level of Env, using only the subset of genotypes
with data for that level (i) of Env and using the associated reduced GRM matrices.
He calls this operation trimming, and leads to the same variance model (hence,
identical variance parameter estimates in the two models).
The expanded model limits the prediction of genotypes to the environments where they occur,
and this can result in a dramatic reduction of computing time, as a result of creating
a separate model term for each environment using a reduced environment-specific GRM matrix.
In particular, you may just want an average genotype effect (across environments) or
relative genotype effects for only those genotypes present at a particular environment.
This operation of forming separate variance structures for each level of environment is
analogous to the operation at residual level of forming similar variance models for each
level of sat(), and the sizes are defined by the data associated with each level of sat().
ASReml 4.3 uses
sat(Env).grm(Gen)
to specify this operation. Each of the expanded terms is denoted in the output file as
sat(Env,i).grm(Gen|Env_i)
where i indexes the environment, grm(Gen|Env_i) denotes the reduced GRM matrix specific for the
genotypes present at environment i. With this model, the rrk(Env).grm(Gen) term fits the common
(factor) effects and the sat(Env).grm(Gen) terms are the environment-specific 'specific'
variances (genetic deviation from the common effect).
In order to form the reduced GRM matrices the full GRM matrix needs to be available and the
GRM qualifier !KEEPGRM is used for this.
In some analyses, experiments are grouped together, and the same subset of genotypes is used
in all experiments in a group. This means that a smaller set of reduced GRM (and GIV) needs to be formed.
The qualifier !VGROUP Group has been introduced to allow that, for example
sat(Exp !VGROUP Group).grm(Gen)
The nested association between Exp and Group is deduced from the data, and the
generated model terms are denoted in the output file as
sat(Exp,i).grm(Gen|Group_j)
where Gen|Group_j is the 'subset' of genotypes associated with group j,
and is fitted using a reduced GRM pertaining to those genotypes in the data pertaining to experiment i,
which is a member of group j.
We can also use !VPGROUP to constrain the variance parameters for a parameter-reduced version of this model.
Consider the untrimmed model
Yield ~ mu Exp !r Rep.Exp Block.Rep.Exp grm(Gen) idv(Exp).grm(Gen)
This model uses the same variance parameter for all Exp.Geno effects. A trimmed version of this model is
Yield ~ mu Exp !r Rep.Exp Block.Rep.Exp grm(Gen) sat(Exp !VPGROUP mu).grm(Gen)
with !VPGROUP mu constraining the parameters associated with Exp.Gen to be the same.
There is a more detailed wheatgrass example in Section 16.8.3 of the User Guide.
In this example, the trimmed model took 33.3 seconds per iteration compared with 116.7
seconds per iteration for the original model.
See here
for more detailed timings
which are consistent with the findings of Mazur (2021).
Forming the H matrix
The H (hybrid) matrix is a particular form of a G matrix obtained by merging an A
(numerator relationship) matrix with a G (genomic relationship) matrix pertaining
to a subset of the genotypes in A (Legarra et al., 2009).
Hτ,ω-1 =
A-1 + diag( 0, τ G-1 - ω A22 -1 )
where A22-1
is the inverse of the portion of the A matrix that contains the genotyped individuals.
Note that the cells present in the H inverse can be tuned by τ and ω,
which have default values of 1 (for more details, see Martini et al. 2018).
The qualifiers
!HINV [!HSKIP h] [!OMEGA ω] [!TAU τ] GRM_ID_file.txt
on a GRM definition line forms the special G inverse known as an H inverse.
A pedigree from which to form an A inverse must have already been specified.
The genotype identifiers in the G matrix are to be specified as a list in the
ASCII file provided as the argument to the !HINV qualifier.
All identifiers must also be present in the pedigree.
In order to form the H-1 matrix, the pedigree file from which
the A-1 matrix is first formed,
then the G (or its inverse) matrix is specified and read, and finally, the !HINV qualifier forms the H-1
matrix. Note that the GRM line with the !HINV qualifier, defines two inverse matrices available
for use in the model: the G-1 and the H-1.
Therefore, we could fit any of nrm(ID), grm1(GID) or grm2(ID) in the model.
This assumes that the levels associated with the !HINV qualifier are included in the pedigree.
ASReml saves the H-1 matrix as a binary file (filename given in the output) which can subsequently
be used directly (saving the setup time in the subsequent runs).
Dealing with Binary G Matrices
ASReml 4.3 can read in a GRM matrix in various forms defined by content (G or G-1),
form (ASCII, REAL_S or REAL_R) and layout (row-wise or cell-wise).
Their content and form are indicated by the following filename extensions
| content | ASCII | REAL_S | REAL_D | REAL_R |
| G | .grm | .sgrm, .bgrm | .dgrm | .rgrm |
| G-1 | .giv | .sgiv, .bgiv | .dgiv | .rgiv |
The binary forms are preferred because the files are smaller and accuracy is greater.
Providing the inverse and log-determinant of the matrix saves processing time calculating them.
REAL_S refers to the FORTRAN sequential binary file structure in which each 'record'
is enclosed in a 'wrapper' indicating the record size in bytes.
When ASReml writes a binary file, it uses the REAL_S file structure.
REAL_D is a double-precision binary file, but the extra precision is generally unnecessary.
REAL_R refers to the R(C) binary form, which does not include any 'record size' information.
Typically, G and G-1) are dense matrices ((nearly) all cells non-zero) and are half-stored.
However, some very large matrices have significant sparsity (most cells are zero).
Note that
.sgiv, .sgrm, .bgrm, and .bgiv files are binary, single precision 'sequential' files containing sparse or dense matrices of various styles.
.rgiv and .rgrm files are 'C-style' binary, single precision, written by R as binary versions for the ASCII layouts.
and .dgrm files are binary, dense format double precision files.
More on binary GRM files.
Back to What's New
Return to index