Forms of binary GRM matrices

Introduction

Binary files hold information concisely and are read faster than text files; the disadvantage is that they cannot be readily examined. ASReml was first developed to fit 'animal' models using the pedigree-based numerator relationship matrix (NRM). The inverse NRM (A-1) is sparse. Later, the dense genomic relationship matrix (GRM) was devised, and ASReml was extended to fit models using such matrices, either supplied by the user or derived directly from a marker matrix. ASReml can now read a GRM matrix in various forms defined by content (G or G-1), form (ASCII, REAL_S or REAL_R) and layout (row-wise or cell-wise). These options are discussed below.

The content and form are indicated by the filename extension:
 Content   ASCII   REAL_S         REAL_R
 G         .grm    .bgrm, .sgrm   .rgrm
 G-1       .giv    .bgiv, .sgiv   .rgiv
The binary forms are preferred because the files are smaller and accuracy is greater. Supplying the inverse and the log-determinant of the matrix saves ASReml the time needed to calculate them. REAL_S refers to the Fortran sequential binary file structure in which each 'record' is enclosed in a 'wrapper' indicating the record size in bytes. When ASReml writes a binary file, it uses the REAL_S file structure. REAL_R refers to the R (C) binary form, which does not include any 'record size' information.

Typically, G and G-1 are dense matrices (all, or nearly all, cells are non-zero) and are half-stored. However, some very large matrices are substantially sparse (most cells are zero).

Some common layouts are now discussed with respect to R and an NRM matrix of order 10 based on the pedigree:
 ID Sire Dam
 1 0 0
 2 0 0
 3 0 0
 4 1 1
 5 1 1
 6 2 2
 7 4 6
 8 5 6
 9 7 8
 10 9 9
GRM matrices in ASReml are half-stored row-wise in either dense or sparse layouts. In the dense layout, for NR rows, each of the NR*(NR+1)/2 cells is explicitly stored row-wise, so a particular cell I,J can be found at position I*(I-1)/2 + J, where I >= J. The sparse layout collapses the vector of values, dropping off-diagonal values of zero. It uses a second vector, parallel to the values vector, to specify the column (J) of each cell; row (I) values are implicit in the order of the values, with the diagonal cells always retained.
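The two layouts can be illustrated with a short R sketch. The helper name half_index and the 3 x 3 matrix are hypothetical and chosen only for illustration; the sketch shows the position formula and the parallel value/column vectors described above, not ASReml's internal code.

```r
# Position of cell (i, j), with i >= j, in a dense half-stored
# row-wise vector (1-based), as given by I*(I-1)/2 + J.
half_index <- function(i, j) i * (i - 1) / 2 + j

# Small symmetric matrix with illustrative values only.
G <- matrix(c(2, 1, 0,
              1, 3, 0,
              0, 0, 4), nrow = 3, byrow = TRUE)
NR <- nrow(G)

# Dense layout: the lower triangle, row by row, NR*(NR+1)/2 values.
dense <- unlist(lapply(seq_len(NR), function(i) G[i, 1:i]))

# Row and column number of each cell in the dense vector.
rows <- rep(seq_len(NR), times = seq_len(NR))
cols <- sequence(seq_len(NR))

# Sparse layout: drop off-diagonal zeros, always keep the diagonal cells,
# and carry the column numbers in a vector parallel to the values.
keep <- dense != 0 | rows == cols
sparse_val <- dense[keep]
sparse_col <- cols[keep]

print(dense[half_index(3, 2)])   # cell (3,2) looked up by position
print(rbind(sparse_col, sparse_val))
```

Because every diagonal cell is kept, the end of each row can be recognised in the sparse vectors without storing the row numbers.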

ASCII half-stored G-1 matrix cell-wise

Processing the small pedigree shown above, with the !SAVE qualifier to form the inverse relationship matrix, creates the ped_A.giv file shown below. It contains qualifiers for the log determinant and the number of genetic groups. The first field is the row number, the second is the column number and the third is the matrix cell value. ASReml can read this .giv file back in as a G-1 matrix.
  !LDET -6.6130181  !GROUPSDF 0
      1     1   5.000000000
      2     2   3.000000000
      3     3   1.000000000
      4     1  -2.000000000
      4     4   3.000000000
      5     1  -2.000000000
      5     5   3.000000000
      6     2  -2.000000000
      6     4   1.000000000
      6     5   1.000000000
      6     6   4.000000000
      7     4  -2.000000000
      7     6  -2.000000000
      7     7   4.500000000
      8     5  -2.000000000
      8     6  -2.000000000
      8     7  0.5000000000
      8     8   4.500000000
      9     7  -1.000000000
      9     8  -1.000000000
      9     9   4.909090909
     10     9  -2.909090909
     10    10   2.909090909
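As a minimal sketch, the cell-wise triplets above can be expanded into a full symmetric matrix in R. The triplets are embedded as text here so the example is self-contained; the object names (giv_text, cells, ldet) are illustrative. For this pedigree, the negative log-determinant of the inverse should agree with the !LDET value in the file.

```r
# Cell-wise triplets (row, column, value) copied from the .giv listing.
giv_text <- "
 1  1   5.000000000
 2  2   3.000000000
 3  3   1.000000000
 4  1  -2.000000000
 4  4   3.000000000
 5  1  -2.000000000
 5  5   3.000000000
 6  2  -2.000000000
 6  4   1.000000000
 6  5   1.000000000
 6  6   4.000000000
 7  4  -2.000000000
 7  6  -2.000000000
 7  7   4.500000000
 8  5  -2.000000000
 8  6  -2.000000000
 8  7   0.5000000000
 8  8   4.500000000
 9  7  -1.000000000
 9  8  -1.000000000
 9  9   4.909090909
10  9  -2.909090909
10 10   2.909090909
"
cells <- read.table(text = giv_text, col.names = c("row", "col", "value"))

NR <- max(cells$row)
Ainv <- matrix(0, NR, NR)
Ainv[cbind(cells$row, cells$col)] <- cells$value   # lower triangle
Ainv[cbind(cells$col, cells$row)] <- cells$value   # mirror to upper triangle

NRM <- solve(Ainv)                                 # relationship matrix A

# log|A| = -log|A-1|; compare with the !LDET qualifier in the file.
ldet <- -as.numeric(determinant(Ainv, logarithm = TRUE)$modulus)
print(round(ldet, 5))
```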

ASCII half-stored G matrix row-wise

The information in the .giv file was used to create a matrix (Ainv) in R, which was inverted to form an NRM matrix.
 > NRM = solve(Ainv)
 > write.table(round(NRM,5),'NRM.grm')
This produces NRM.grm containing
 "V1" "V2" "V3" "V4" "V5" "V6" "V7" "V8" "V9" "V10"
 "1" 1 0 0 1 1 0 0.5 0.5 0.5 0.5
 "2" 0 1 0 0 0 1 0.5 0.5 0.5 0.5
 "3" 0 0 1 0 0 0 0 0 0 0
 "4" 1 0 0 1.5 1 0 0.75 0.5 0.625 0.625
 "5" 1 0 0 1 1.5 0 0.5 0.75 0.625 0.625
 "6" 0 1 0 0 0 1.5 0.75 0.75 0.75 0.75
 "7" 0.5 0.5 0 0.75 0.5 0.75 1 0.625 0.8125 0.8125
 "8" 0.5 0.5 0 0.5 0.75 0.75 0.625 1 0.8125 0.8125
 "9" 0.5 0.5 0 0.625 0.625 0.75 0.8125 0.8125 1.3125 1.3125
 "10" 0.5 0.5 0 0.625 0.625 0.75 0.8125 0.8125 1.3125 1.65625
ASReml can read this file, and three variations of it, back in as a G matrix: without labels, without the elements above the diagonal, and without both. Both layouts (cell-wise and row-wise) can be used for G and G-1 files, according to the file extension.
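The variations without labels can be produced from R along the following lines. This is a sketch only: the 3 x 3 matrix and the file names are illustrative, and the exact spacing and line structure ASReml expects for the lower-triangle form is assumed here to be one half row per line.

```r
# Small illustrative relationship matrix (two unrelated parents, one offspring).
NRM <- matrix(c(1,   0,   0.5,
                0,   1,   0.5,
                0.5, 0.5, 1), nrow = 3, byrow = TRUE)

full_file  <- file.path(tempdir(), "NRM_full.grm")
lower_file <- file.path(tempdir(), "NRM_lower.grm")

# Full matrix, row-wise, without row or column labels.
write.table(round(NRM, 5), full_file, row.names = FALSE, col.names = FALSE)

# Lower triangle only, row-wise, without labels: row i holds columns 1..i.
lower_rows <- vapply(seq_len(nrow(NRM)),
                     function(i) paste(round(NRM[i, 1:i], 5), collapse = " "),
                     character(1))
writeLines(lower_rows, lower_file)

print(readLines(lower_file))
```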

If the !LDET qualifier is omitted in a .giv file, ASReml will calculate the log-determinant. The !GROUPSDF qualifier is only needed when the G-1 matrix is actually an A-1 formed with genetic groups.

REAL_S BINARY half-stored files (.bgiv, .sgiv, .bgrm, .sgrm)

Usually these files will be formed by ASReml and, given that ASReml can read them back, the details are not important. A Fortran binary sequential file of 4w bytes contains w 4-byte words. A word is either a 32-bit integer value or a 32-bit real value. The 32-bit integer values are either record wrappers specifying the number of bytes in the record, or integers that are part of the record. You can use the R readBin() function to examine a binary file.
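A minimal sketch of examining such a file with readBin() follows. To stay self-contained it first writes a tiny file containing one record by hand (wrapper, two reals, wrapper); the file name is illustrative, and 4-byte wrappers in the machine's native byte order are assumed. Reading every word both as an integer and as a real makes it easier to tell wrappers from values.

```r
tmp <- file.path(tempdir(), "demo_reals.bin")

# Write one Fortran-style sequential record by hand.
con <- file(tmp, "wb")
writeBin(8L, con, size = 4)               # leading wrapper: 8 bytes of payload
writeBin(c(1.5, -2.25), con, size = 4)    # payload: two 32-bit reals
writeBin(8L, con, size = 4)               # trailing wrapper
close(con)

# Examine the file word by word, interpreting each word both ways.
nwords <- file.size(tmp) / 4
con <- file(tmp, "rb")
as_int <- readBin(con, what = "integer", n = nwords, size = 4)
close(con)
con <- file(tmp, "rb")
as_real <- readBin(con, what = "numeric", n = nwords, size = 4)
close(con)

print(data.frame(word = seq_len(nwords), as_int = as_int, as_real = as_real))
```

In the printed table, the wrappers show up as small byte counts in the integer column, while the payload words only look sensible in the real column.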

ASReml looks at the leading 12 or so words in the file and will read the file if it appears to match one of the following patterns. In the header line, [..] is a record wrapper, G11 is the first cell of the matrix, Ldet is the log determinant, NG is the number of degrees of freedom associated with genetic groups, NR is the number of rows in the matrix, and 7/77 specifies a particular sparse layout.

For the '7' layout, the file begins
 [20] G11 Ldet NG NR 7 [20]  [8] 1 2 [8]  [4] 3. [4] ...
and matrix rows 2:NR are each written as two records: NV, COL(ROW(I):ROW(I)+NV-1) and VAL(ROW(I):ROW(I)+NV-1), where I is the half row being written, ROW(I) points to the first cell of that row, NV is the number of non-zero cells in the row ending at the diagonal element, COL(...) is the list of column numbers and VAL(...) is the list of matrix values.

For the '77' layout, the file begins
 [20] G11 Ldet NG NR 77 [20]  [12] 1 2 3. [12] ...
and matrix rows 2:NR are written as one record each: NV, (COL(K),VAL(K),K=ROW(I),ROW(I)+NV-1)

A third 'cell-wise' layout with no header begins
 [12] 1 1 G1,1 [12]  [12] 2 1|2 G2,1|2 [12]  [12] ...
and every non-zero cell is specified in a separate record with its row and column index given.

A 'cell-wise' layout with header begins
 [12] NR NG Ldet [12]  [12] 1 1 G11 [12]  [12] ...
and every non-zero cell is specified in a separate record with its row and column index given.

A 'dense' row-wise layout with header begins
 [12] NR NG Ldet [12]  [4] G11 [4]  [8] G21 G22 [8] ...
or
 [12] NR Ldet NG [12]  [4] G11 [4]  [8] G21 G22 [8] ...

A 'dense' row-wise layout without header begins
 [4] G1,1 [4]  [8] G2,1 G2,2 [8] ...
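The dense row-wise layout without header can be imitated from R as a sketch. The helper write_record and the file name are hypothetical, and the code assumes 4-byte record wrappers in native byte order; whether a file written this way matches what a given ASReml build expects has not been verified here, since these files are normally written by ASReml itself.

```r
# Write one record: byte-count wrapper, 32-bit real values, wrapper again.
write_record <- function(con, values) {
  nbytes <- 4L * length(values)
  writeBin(nbytes, con, size = 4)
  writeBin(as.numeric(values), con, size = 4)
  writeBin(nbytes, con, size = 4)
}

G <- matrix(c(2, 1,
              1, 3), nrow = 2, byrow = TRUE)   # illustrative values only

out <- file.path(tempdir(), "demo.sgrm")
con <- file(out, "wb")
for (i in seq_len(nrow(G))) {
  write_record(con, G[i, 1:i])   # half row i: columns 1..i
}
close(con)

# The file now holds: [4] G11 [4]  [8] G21 G22 [8]
print(file.size(out))
```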
In the example below, the qualifier !SGIV has been added to the pedigree file line so that ASReml writes A-1 as a sparse binary .sgiv file. To read A-1 back as a G-1 matrix in a subsequent run, several changes are required to the command file coding. For example, if the original job (say PED.as) included the lines
 Animal !P
 ...
 Pedigree.csv   !SGIV !AIF
 ...
 Y ~ ... !r nrm(Animal)
copy it as, say, GIV.as and change it to, say,
 Animal !A !L  PED.aif	 # Animal !P
 ...
 Pedigree_A.sgiv  	 #Pedigree.csv   !SGIV !DIAG
 ...
 Y ~ ... !r grm1(Animal) #Y ~ ... !r nrm(Animal)

REAL_R BINARY half-stored files (.rgiv, .rgrm)

ASReml can also read binary files formed using the R writeBin() function (or by a C program), identified as such by the r in the file extension. Unlike the Fortran sequential binary files described above, these have no record markers and all values are 32-bit real values. .rgiv files need the header to specify the log determinant.

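A matrix called GRM held in R can be written to a binary file that ASReml can read with the R code:
 NR <- nrow(GRM)                   # order of the matrix
 Tfile <- file("My.rgrm", "wb")    # open for binary writing
 for (i in seq_len(NR)) {
   # half row i (rows 1..i of column i, equal by symmetry) as 32-bit reals
   writeBin(as.numeric(GRM[1:i, i]), Tfile, size = 4)
 }
 close(Tfile)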
If writing the inverse GRM matrix in R, use the filename extension .rgiv and include a header in the file by inserting the R code line writeBin(c(NR, 0, Ldet), Tfile, size=4) after the Tfile line, where Ldet is the log determinant of the GRM matrix, usually obtained while inverting the GRM matrix.
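One way to obtain the inverse and Ldet together is through a Cholesky factorisation, sketched below. The 2 x 2 matrix and the output path are illustrative, and the approach assumes the GRM is positive definite; it is one option, not necessarily how the matrix was inverted for the examples in this document.

```r
G <- matrix(c(1.0, 0.5,
              0.5, 1.5), nrow = 2, byrow = TRUE)   # illustrative GRM

R <- chol(G)                      # G = t(R) %*% R, R upper triangular
Ldet <- 2 * sum(log(diag(R)))     # log determinant of G
Ginv <- chol2inv(R)               # inverse of G from the same factor

NR <- nrow(Ginv)
out <- file.path(tempdir(), "My.rgiv")
Tfile <- file(out, "wb")
writeBin(c(NR, 0, Ldet), Tfile, size = 4)           # header as 32-bit reals
for (i in seq_len(NR)) {
  writeBin(Ginv[1:i, i], Tfile, size = 4)           # half row i of the inverse
}
close(Tfile)
```

The file then holds 3 header words followed by NR*(NR+1)/2 matrix words, all as 32-bit reals.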

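For a sparse matrix stored cell-wise, such as the ped_A.giv file shown earlier, read it into a data.frame SAI and use the R code:
 SAI <- read.table("ped_A.giv", skip = 1)        # skip the qualifier line
 NV <- nrow(SAI)                                 # number of stored cells
 NR <- SAI[NV, 1]                                # row number of the last cell
 Tfile <- file("SAI.rgiv", "wb")
 writeBin(c(NR, 0, -6.61302), Tfile, size = 4)   # header: NR, 0, Ldet
 # one (column, value) pair per cell; row numbers are not written
 for (i in seq_len(NV)) writeBin(c(SAI[i, 2], SAI[i, 3]), Tfile, size = 4)
 close(Tfile)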
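A file of this shape can be checked by reading it back with readBin(). The sketch below writes a tiny three-row example (illustrative values, hypothetical file name) and then recovers the row numbers on the assumption, taken from the sparse layout described earlier, that each row ends at its diagonal cell.

```r
# Tiny sparse inverse, cell-wise: (row, col, value), diagonals always present.
cells <- data.frame(row   = c(1, 2, 3, 3),
                    col   = c(1, 2, 1, 3),
                    value = c(2, 3, -1, 4))

out <- file.path(tempdir(), "tiny.rgiv")
con <- file(out, "wb")
# Header: NR, 0, Ldet. The stored inverse has determinant 21, so the log
# determinant of the un-inverted matrix is -log(21).
writeBin(c(3, 0, -log(21)), con, size = 4)
# (column, value) pairs, interleaved, as 32-bit reals.
writeBin(as.numeric(t(as.matrix(cells[, c("col", "value")]))), con, size = 4)
close(con)

# Read every word back as a 32-bit real.
nwords <- file.size(out) / 4
con <- file(out, "rb")
w <- readBin(con, what = "numeric", n = nwords, size = 4)
close(con)

header <- w[1:3]
pairs  <- matrix(w[-(1:3)], ncol = 2, byrow = TRUE)
cols   <- pairs[, 1]
vals   <- pairs[, 2]

# Recover the implicit row numbers: a diagonal cell closes its row.
rows <- integer(length(cols))
current <- 1L
for (k in seq_along(cols)) {
  rows[k] <- current
  if (cols[k] == current) current <- current + 1L
}
print(data.frame(row = rows, col = cols, value = vals))
```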
