Transforming the data file and model so that the large dense GRM is made diagonal.

!EIGTRANSFORM introduction

The !EIGTRANSFORM qualifier on the GRM input file line obtains the eigenvalues (D) and eigenvectors (U) of a dense GRM file (XXX.[s]grm),
writes them to files (XXX_D.sgrm and XXX_U.sgrm respectively)
and, if the GRM is used in the model with one record for each genotype,
transforms the model by premultiplying the design matrix by U.

When a large genomic animal model must be fitted to many traits, an eigenvector transformation of the model design matrix can result in faster processing. By animal model, we mean one data record for each genotype. The idea is relevant when the GRM term dominates the model. It essentially turns the part of the coefficient matrix relating to genomic effects into a diagonal structure, at the expense of turning the equations for the (far fewer) non-genomic effects into dense equations.
For example, the common genomic animal model:
 !WORK 6 !REN 2
 10K bivariate data set              # TITLE Line
 ID !A !LL20 !L data.csv !LSKIP 1    # defining ID
 CG *  CGs *                         # Contemporary group factors
 imf  sf5                            # response variates
 A22.sgiv                            # A22 genomic matrix
 data.csv !skip 1   !GDENSE          # data file in same order as A22
 imf ~ mu CG !r grm1(ID)             # Model for imf
This coding assumes that the genotype is in the first field of the data file and that the A22.grm file is in the same genotype order as the data.
For 10,000 genotypes, every iteration involves inverting a coefficient matrix (C) containing a dense block of order 10,000. !GDENSE is included here because the G matrix is dense, so it is faster to do this as a dense matrix operation than as a sparse operation.

The !EIGTRANSFORM qualifier on the GRM line invokes a singular value decomposition of G (G=UDU'), holds U and D for subsequent use and writes them to file (... _D.bgrm, ... _U.bgrm). If the U and D files already exist, U and D are retrieved from file instead. Then, if this GRM is specified in the model, and the model and data meet the necessary requirements, the model design matrix is transformed (pre-multiplied by U) except for the columns specific to the GRM. D is used as the GRM variance matrix for the genetic effects in the analysis of the transformed model, so that this block of the C matrix is diagonal. Generally this will run much faster. Using !EIGTRANSFORM, the revised job now ends with
 A22.grm  !EIGTRANSFORM      # A22 genomic matrix
 data.csv  !skip 1        # data file in same order as A22
 imf ~ mu CG !r grm1(ID)  # Model for imf
With 9688 genotypes and 376 CG classes, the former model takes 122 seconds (for 8 iterations) while the latter takes 160 seconds for the SVD calculation and 14 seconds (8 iterations) to fit the model. The overhead of the SVD calculation is justified when there are several traits to analyse.

The SVD approach has a bigger advantage for bivariate analysis. On these data, the model
 imf sf5 ~Trait Tr.CG !r us(Tr).grm1(ID)
takes 187 seconds per iteration normally (!GDENSE is not available) but 12 seconds per iteration with the SVD transformation.

The SVD transformation is only possible when there is one data record for each genotype. A large amount of workspace is required for the process of transforming the design matrix.

The fitted effects from the two models agree except for the genomic BLUPs. The conventional BLUPs (u) are calculated as u=U'a, where a are the BLUPs from the transformed analysis. ASReml does not calculate them at present.

GRM_basename is the name (without file extension) of the GRM file. GRM files are discussed separately. The expected (allowed) file extensions are .grm or .sgrm. The .grm file is an ASCII file with one line for each cell of the lower triangle of the matrix, in the form row column value. This form is slow to read and not well suited to a large dense matrix. The .sgrm form is a REAL BINARY form with one record for each row of the lower triangle; record i contains the values of cells 1:i. If both forms exist, the .sgrm file is read. If only the .grm file is present, a .sgrm file is created from it.
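The ASCII layout described above (one "row column value" line per lower-triangle cell) can be illustrated with a short Python sketch. It is not ASReml code: the function names are hypothetical, and it assumes whitespace-separated fields and 1-based row and column numbers. It makes no attempt to read or write the binary .sgrm form, whose precision and record framing are not specified here; the second helper only shows the "record i holds cells 1:i" arrangement as nested lists.

```python
def read_ascii_grm(lines):
    """Build a dense symmetric matrix from 'row column value' lines.

    Assumes whitespace-separated fields and 1-based indices (an assumption,
    not something stated in the text above).
    """
    cells = []
    for line in lines:
        fields = line.split()
        if len(fields) != 3:
            continue  # skip blank or malformed lines
        cells.append((int(fields[0]), int(fields[1]), float(fields[2])))
    order = max(max(r, c) for r, c, _ in cells)
    grm = [[0.0] * order for _ in range(order)]
    for r, c, value in cells:
        grm[r - 1][c - 1] = value  # cell as given (lower triangle)
        grm[c - 1][r - 1] = value  # mirrored cell (upper triangle)
    return grm


def lower_triangle_rows(grm):
    """Return row i as the values of cells 1..i, one list per row."""
    return [row[: i + 1] for i, row in enumerate(grm)]


# Toy 3 x 3 GRM written one lower-triangle cell per line.
example_lines = [
    "1 1 1.0",
    "2 1 0.5",
    "2 2 1.0",
    "3 1 0.25",
    "3 2 0.5",
    "3 3 1.0",
]
G = read_ascii_grm(example_lines)
triangle = lower_triangle_rows(G)
```

For a matrix of order n the ASCII form needs n(n+1)/2 text lines, which is why it is slow to read for a large dense GRM.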

ASReml uses the MKL (OPENBLAS) SVPED routine to factorize the (symmetric) GRM matrix as UDU'. Equivalently, D = U'GU.
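The effect of the factorization can be checked on a toy problem. The Python sketch below is only an illustration under stated assumptions, not ASReml's implementation. It uses two genotypes, one record each, a mean as the only fixed effect, and a variance ratio of 1 (all invented for the example). The eigenvectors are held as the columns of U, so that G = UDU'; under that layout the records and fixed-effect columns are pre-multiplied by U' and the conventional BLUPs are recovered as u = Ua. If the eigenvectors are instead held as rows, the roles of U and U' swap, which matches the "pre-multiplied by U" and u=U'a wording; this page does not say which layout the U file uses.

```python
import math


def matmul(a, b):
    """Dense matrix product of two lists of lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(a):
    return [list(col) for col in zip(*a)]


def matvec(a, v):
    return [sum(x * y for x, y in zip(row, v)) for row in a]


def solve(a, rhs):
    """Solve a x = rhs by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [rhs[i]] for i, row in enumerate(a)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[p] = m[p], m[c]
        for r in range(n):
            if r != c:
                f = m[r][c] / m[c][c]
                m[r] = [x - f * y for x, y in zip(m[r], m[c])]
    return [m[i][n] / m[i][i] for i in range(n)]


def mixed_model_solve(x, y, ginv, lam):
    """Solve [[x'x, x'], [x, I + lam*ginv]] [mu; u] = [x'y; y].

    One fixed-effect column x and an identity design for the genetic
    effects (one record per genotype).
    """
    n = len(y)
    coef = [[sum(v * v for v in x)] + list(x)]
    for i in range(n):
        coef.append([x[i]] + [(1.0 if i == j else 0.0) + lam * ginv[i][j]
                              for j in range(n)])
    rhs = [sum(a * b for a, b in zip(x, y))] + list(y)
    sol = solve(coef, rhs)
    return sol[0], sol[1:]


# Toy GRM with known eigenvalues 0.5 and 1.5; eigenvectors are columns of U.
G = [[1.0, 0.5], [0.5, 1.0]]
s = 1.0 / math.sqrt(2.0)
U = [[s, s], [-s, s]]
D = [0.5, 1.5]
Ut = transpose(U)
Dinv = [[1.0 / D[0], 0.0], [0.0, 1.0 / D[1]]]

UtGU = matmul(matmul(Ut, G), U)        # should equal diag(D)
Ginv = matmul(matmul(U, Dinv), Ut)     # inverse of G from the factors

y = [3.0, 5.0]     # invented records
x = [1.0, 1.0]     # column for the mean
lam = 1.0          # assumed residual/genetic variance ratio

# Conventional analysis: dense G inverse in the genetic block.
mu_conv, u_conv = mixed_model_solve(x, y, Ginv, lam)

# Transformed analysis: data and fixed-effect column rotated by U',
# genetic columns left alone, diagonal D inverse in the genetic block.
mu_trans, a = mixed_model_solve(matvec(Ut, x), matvec(Ut, y), Dinv, lam)
u_back = matvec(U, a)                  # back-transform under this layout
```

In the transformed system the genetic block is diagonal, while the single fixed-effect column has been rotated, which in a real model is what makes the non-genomic equations dense.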

Discussion

This approach only works when the data file has the same rows as the GRM matrix and the number of other effects in the model is substantially less than the number of genotypes.

It will save considerable time when there are many response variates to analyse, especially for bivariate analyses.

Comparing the two models given above, with 9688 genotypes:
  • the conventional analysis took 9 s (to invert the GRM) + 8*43 s (for 8 iterations);
  • the overhead of the SVD factorization was 160 s, plus 8 s for transforming the data;
  • the transformed analysis took 2 s per iteration;
  • a conventional bivariate analysis took 8*313 s;
  • the transformed bivariate analysis took 8*13 s.
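The arithmetic behind these timings can be laid out as a short Python sketch. It uses only the figures quoted above. Two assumptions are made for illustration: the factorization is paid once and then reused from the U and D files for later traits, and the 8 s data transformation is repeated for every analysis. The function names are hypothetical.

```python
ITERATIONS = 8

# Univariate timings quoted above, in seconds.
conventional_per_trait = 9 + ITERATIONS * 43   # invert GRM + 8 iterations
factorization = 160                            # paid once (assumption)
transform_data = 8                             # per analysis (assumption)
transformed_fit = ITERATIONS * 2               # 8 iterations at 2 s each


def total_conventional(n_traits):
    """Total seconds for n_traits separate conventional analyses."""
    return n_traits * conventional_per_trait


def total_transformed(n_traits):
    """Total seconds when the factorization is reused across traits."""
    return factorization + n_traits * (transform_data + transformed_fit)


for n_traits in (1, 2, 5):
    print(n_traits, total_conventional(n_traits), total_transformed(n_traits))

# Bivariate timings quoted above, excluding the factorization overhead.
bivariate_conventional = ITERATIONS * 313
bivariate_transformed = ITERATIONS * 13
```

Under these assumptions one trait costs 353 s conventionally against 184 s transformed, and five traits cost 1765 s against 280 s; the bivariate fits cost 2504 s against 104 s before the factorization overhead is added.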
