| Title: | Analysis of Selection Index in Plant Breeding |
|---|---|
| Description: | Provides tools for the simultaneous improvement of multiple traits in plant breeding. Building upon the classical selection index (Smith 1937 <doi:10.1111/j.1469-1809.1936.tb02143.x>) and modern quantitative genetics (Kang 2020 <doi:10.1007/978-3-319-91223-3>), this package calculates classical phenotypic, genomic, marker-assisted, restricted/constrained, and eigen selection indices. It also incorporates multi-stage selection evaluation and stochastic simulations to estimate genetic advance based on economic weights, heritability, and genetic correlations. |
| Authors: | Zankrut Goyani [aut, cre, cph] |
| Maintainer: | Zankrut Goyani <[email protected]> |
| License: | GPL (>= 3) |
| Version: | 2.0.1.9000 |
| Built: | 2026-05-09 08:17:16 UTC |
| Source: | https://github.com/zankrut20/selection.index |
Implements the Base Index where coefficients are set equal to economic weights. This is a simple, non-optimized approach that serves as a baseline comparison.
Unlike the Smith-Hazel index which requires matrix inversion, the Base Index is computationally trivial and robust when covariance estimates are unreliable.
base_index( pmat, gmat, wmat, wcol = 1, selection_intensity = 2.063, compare_to_lpsi = TRUE, GAY = NULL )base_index( pmat, gmat, wmat, wcol = 1, selection_intensity = 2.063, compare_to_lpsi = TRUE, GAY = NULL )
pmat |
Phenotypic variance-covariance matrix (n_traits x n_traits) |
gmat |
Genotypic variance-covariance matrix (n_traits x n_traits) |
wmat |
Economic weights matrix (n_traits x k), or vector |
wcol |
Weight column to use if wmat has multiple columns (default: 1) |
selection_intensity |
Selection intensity constant (default: 2.063) |
compare_to_lpsi |
Logical. If TRUE, compares Base Index efficiency to optimal LPSI (default: TRUE) |
GAY |
Optional. Genetic advance of comparative trait for PRE calculation |
Mathematical Formulation:
Index coefficients:
The Base Index is appropriate when: - Covariance estimates are unreliable - Computational simplicity is required - A baseline for comparison is needed
List with:
summary - Data frame with coefficients and metrics
b - Vector of Base Index coefficients (equal to w)
w - Named vector of economic weights
Delta_G - Named vector of expected genetic gains per trait
lpsi_comparison - Optional comparison with Smith-Hazel LPSI
## Not run: gmat <- gen_varcov(seldata[, 3:9], seldata[, 2], seldata[, 1]) pmat <- phen_varcov(seldata[, 3:9], seldata[, 2], seldata[, 1]) weights <- c(10, 8, 6, 4, 2, 1, 1) result <- base_index(pmat, gmat, weights, compare_to_lpsi = TRUE) print(result) ## End(Not run)## Not run: gmat <- gen_varcov(seldata[, 3:9], seldata[, 2], seldata[, 1]) pmat <- phen_varcov(seldata[, 3:9], seldata[, 2], seldata[, 1]) weights <- c(10, 8, 6, 4, 2, 1, 1) result <- base_index(pmat, gmat, weights, compare_to_lpsi = TRUE) print(result) ## End(Not run)
Implements the Combined Linear Genomic Selection Index where selection combines both phenotypic observations and Genomic Estimated Breeding Values (GEBVs). This is used for selecting candidates with both phenotype and genotype data (e.g., in a training population).
clgsi( phen_mat = NULL, gebv_mat = NULL, pmat, gmat, wmat, wcol = 1, P_y = NULL, P_g = NULL, P_yg = NULL, reliability = NULL, selection_intensity = 2.063, GAY = NULL )clgsi( phen_mat = NULL, gebv_mat = NULL, pmat, gmat, wmat, wcol = 1, P_y = NULL, P_g = NULL, P_yg = NULL, reliability = NULL, selection_intensity = 2.063, GAY = NULL )
phen_mat |
Matrix of phenotypes (n_genotypes x n_traits). Can be NULL if P_y and P_yg are provided. |
gebv_mat |
Matrix of GEBVs (n_genotypes x n_traits). Can be NULL if P_g and P_yg are provided. |
pmat |
Phenotypic variance-covariance matrix (n_traits x n_traits) |
gmat |
Genotypic variance-covariance matrix (n_traits x n_traits) |
wmat |
Economic weights matrix (n_traits x k), or vector |
wcol |
Weight column to use if wmat has multiple columns (default: 1) |
P_y |
Optional. Phenotypic variance-covariance matrix computed from data (n_traits x n_traits). If NULL (default), uses pmat or computes from phen_mat using cov(). |
P_g |
Optional. GEBV variance-covariance matrix (n_traits x n_traits). If NULL (default), computed from gebv_mat using cov(). |
P_yg |
Optional. Covariance matrix between phenotypes and GEBVs (n_traits x n_traits). If NULL (default), computed from phen_mat and gebv_mat using cov(). |
reliability |
Optional. Reliability of GEBVs (r^2, the squared correlation). See lgsi() for details. |
selection_intensity |
Selection intensity i (default: 2.063 for 10% selection) |
GAY |
Optional. Genetic advance of comparative trait for PRE calculation |
Mathematical Formulation:
The CLGSI combines phenotypic and genomic information:
Coefficients are obtained by solving the partitioned system:
Where:
- = Var(phenotypes)
- = Var(GEBVs)
- = Cov(phenotypes, GEBVs)
- = Genotypic variance-covariance
- = Cov(GEBV, true BV)
List with components:
b_y - Coefficients for phenotypes
b_g - Coefficients for GEBVs
b_combined - Full coefficient vector [b_y; b_g]
P_combined - Combined variance matrix
Delta_H - Expected genetic advance per trait
GA - Overall genetic advance
PRE - Percent relative efficiency
hI2 - Index heritability
rHI - Index accuracy
summary - Data frame with all metrics
Cerón-Rojas, J. J., & Crossa, J. (2018). Linear Selection Indices in Modern Plant Breeding. Springer International Publishing.
## Not run: # Generate example data gmat <- gen_varcov(seldata[, 3:9], seldata[, 2], seldata[, 1]) pmat <- phen_varcov(seldata[, 3:9], seldata[, 2], seldata[, 1]) # Simulate phenotypes and GEBVs set.seed(123) n_genotypes <- 100 n_traits <- ncol(gmat) phen_mat <- matrix(rnorm(n_genotypes * n_traits, mean = 15, sd = 3), nrow = n_genotypes, ncol = n_traits ) gebv_mat <- matrix(rnorm(n_genotypes * n_traits, mean = 10, sd = 2), nrow = n_genotypes, ncol = n_traits ) colnames(phen_mat) <- colnames(gmat) colnames(gebv_mat) <- colnames(gmat) # Economic weights weights <- c(10, 5, 3, 3, 5, 8, 4) # Calculate CLGSI result <- clgsi(phen_mat, gebv_mat, pmat, gmat, weights, reliability = 0.7) print(result$summary) ## End(Not run)## Not run: # Generate example data gmat <- gen_varcov(seldata[, 3:9], seldata[, 2], seldata[, 1]) pmat <- phen_varcov(seldata[, 3:9], seldata[, 2], seldata[, 1]) # Simulate phenotypes and GEBVs set.seed(123) n_genotypes <- 100 n_traits <- ncol(gmat) phen_mat <- matrix(rnorm(n_genotypes * n_traits, mean = 15, sd = 3), nrow = n_genotypes, ncol = n_traits ) gebv_mat <- matrix(rnorm(n_genotypes * n_traits, mean = 10, sd = 2), nrow = n_genotypes, ncol = n_traits ) colnames(phen_mat) <- colnames(gmat) colnames(gebv_mat) <- colnames(gmat) # Economic weights weights <- c(10, 5, 3, 3, 5, 8, 4) # Calculate CLGSI result <- clgsi(phen_mat, gebv_mat, pmat, gmat, weights, reliability = 0.7) print(result$summary) ## End(Not run)
Generic mathematical operations optimized with C++/Eigen. No design-specific logic - purely mathematical primitives that can be orchestrated by R code to implement any experimental design.
This architecture allows: - Easy addition of new experimental designs (in R only) - C++ speed for heavy computation - Single source of truth (design_stats.R) - Better maintainability and testability
Implements the CPPG-LGSI which combines phenotypic and genomic information while achieving predetermined proportional gains between traits. This is the most general constrained genomic index.
cppg_lgsi( T_C = NULL, Psi_C = NULL, d, phen_mat = NULL, gebv_mat = NULL, pmat = NULL, gmat = NULL, wmat = NULL, wcol = 1, U = NULL, reliability = NULL, k_I = 2.063, L_I = 1, GAY = NULL )cppg_lgsi( T_C = NULL, Psi_C = NULL, d, phen_mat = NULL, gebv_mat = NULL, pmat = NULL, gmat = NULL, wmat = NULL, wcol = 1, U = NULL, reliability = NULL, k_I = 2.063, L_I = 1, GAY = NULL )
T_C |
Combined variance-covariance matrix (2t x 2t) |
Psi_C |
Combined genetic covariance matrix (2t x t) |
d |
Vector of desired proportional gains (length n_traits or n_constraints) |
phen_mat |
Optional. Matrix of phenotypes for automatic T_C computation |
gebv_mat |
Optional. Matrix of GEBVs for automatic T_C computation |
pmat |
Optional. Phenotypic variance-covariance matrix |
gmat |
Optional. Genotypic variance-covariance matrix |
wmat |
Optional. Economic weights for GA/PRE calculation |
wcol |
Weight column to use if wmat has multiple columns (default: 1) |
U |
Optional. Constraint matrix (n_traits x n_constraints) |
reliability |
Optional. Reliability of GEBVs (r^2) |
k_I |
Selection intensity (default: 2.063) |
L_I |
Standardization constant (default: 1) |
GAY |
Optional. Genetic advance of comparative trait for PRE calculation |
Mathematical Formulation (Chapter 6, Section 6.4):
Coefficient vector:
Where beta_CR is from CRLGSI and:
Selection response:
List with:
summary - Data frame with coefficients and metrics
b - Vector of CPPG-LGSI coefficients ()
b_y - Coefficients for phenotypes
b_g - Coefficients for GEBVs
E - Expected genetic gains per trait
theta_CP - Proportionality constant
gain_ratios - Ratios of achieved to desired gains
## Not run: # Simulate data set.seed(123) n_genotypes <- 100 n_traits <- 5 phen_mat <- matrix(rnorm(n_genotypes * n_traits, 15, 3), n_genotypes, n_traits) gebv_mat <- matrix(rnorm(n_genotypes * n_traits, 10, 2), n_genotypes, n_traits) gmat <- cov(phen_mat) * 0.6 pmat <- cov(phen_mat) # Desired proportional gains d <- c(2, 1, 1, 0.5, 0) w <- c(10, 8, 6, 4, 2) result <- cppg_lgsi( phen_mat = phen_mat, gebv_mat = gebv_mat, pmat = pmat, gmat = gmat, d = d, wmat = w, reliability = 0.7 ) print(result$summary) print(result$gain_ratios) ## End(Not run)## Not run: # Simulate data set.seed(123) n_genotypes <- 100 n_traits <- 5 phen_mat <- matrix(rnorm(n_genotypes * n_traits, 15, 3), n_genotypes, n_traits) gebv_mat <- matrix(rnorm(n_genotypes * n_traits, 10, 2), n_genotypes, n_traits) gmat <- cov(phen_mat) * 0.6 pmat <- cov(phen_mat) # Desired proportional gains d <- c(2, 1, 1, 0.5, 0) w <- c(10, 8, 6, 4, 2) result <- cppg_lgsi( phen_mat = phen_mat, gebv_mat = gebv_mat, pmat = pmat, gmat = gmat, d = d, wmat = w, reliability = 0.7 ) print(result$summary) print(result$gain_ratios) ## End(Not run)
Implements the CRLGSI which combines phenotypic and genomic information with restrictions on genetic gains. This extends CLGSI to include constraints.
crlgsi( T_C = NULL, Psi_C = NULL, phen_mat = NULL, gebv_mat = NULL, pmat = NULL, gmat = NULL, wmat, wcol = 1, restricted_traits = NULL, U = NULL, reliability = NULL, k_I = 2.063, L_I = 1, GAY = NULL )crlgsi( T_C = NULL, Psi_C = NULL, phen_mat = NULL, gebv_mat = NULL, pmat = NULL, gmat = NULL, wmat, wcol = 1, restricted_traits = NULL, U = NULL, reliability = NULL, k_I = 2.063, L_I = 1, GAY = NULL )
T_C |
Combined variance-covariance matrix (2t x 2t) where t = n_traits. Structure: [P, P_yg; P_yg', P_g] where P = phenotypic var, P_g = GEBV var, P_yg = covariance between phenotypes and GEBVs. Can be computed automatically if phen_mat and gebv_mat are provided. |
Psi_C |
Combined genetic covariance matrix (2t x t). Structure: [G; C_gebv_g] where G = genetic var, C_gebv_g = Cov(GEBV, g). Can be computed automatically if gmat and reliability are provided. |
phen_mat |
Optional. Matrix of phenotypes (n_genotypes x n_traits) |
gebv_mat |
Optional. Matrix of GEBVs (n_genotypes x n_traits) |
pmat |
Optional. Phenotypic variance-covariance matrix |
gmat |
Optional. Genotypic variance-covariance matrix |
wmat |
Economic weights matrix (n_traits x k), or vector |
wcol |
Weight column to use if wmat has multiple columns (default: 1) |
restricted_traits |
Vector of trait indices to restrict (default: NULL) |
U |
Constraint matrix (2t x n_constraints for combined traits). Alternative to restricted_traits. Ignored if restricted_traits is provided. |
reliability |
Optional. Reliability of GEBVs (r^2) |
k_I |
Selection intensity (default: 2.063) |
L_I |
Standardization constant (default: 1) |
GAY |
Optional. Genetic advance of comparative trait for PRE calculation |
Mathematical Formulation (Chapter 6, Section 6.3):
The CRLGSI combines phenotypic and genomic data with restrictions.
Coefficient vector:
Where K_C incorporates the restriction matrix.
Selection response:
Expected gains:
List with:
summary - Data frame with coefficients and metrics
b - Vector of CRLGSI coefficients ()
b_y - Coefficients for phenotypes
b_g - Coefficients for GEBVs
E - Expected genetic gains per trait
R - Overall selection response
## Not run: # Simulate data set.seed(123) n_genotypes <- 100 n_traits <- 5 phen_mat <- matrix(rnorm(n_genotypes * n_traits, 15, 3), n_genotypes, n_traits) gebv_mat <- matrix(rnorm(n_genotypes * n_traits, 10, 2), n_genotypes, n_traits) gmat <- cov(phen_mat) * 0.6 # Genotypic component pmat <- cov(phen_mat) w <- c(10, 8, 6, 4, 2) # Restrict traits 2 and 4 result <- crlgsi( phen_mat = phen_mat, gebv_mat = gebv_mat, pmat = pmat, gmat = gmat, wmat = w, restricted_traits = c(2, 4), reliability = 0.7 ) print(result$summary) ## End(Not run)## Not run: # Simulate data set.seed(123) n_genotypes <- 100 n_traits <- 5 phen_mat <- matrix(rnorm(n_genotypes * n_traits, 15, 3), n_genotypes, n_traits) gebv_mat <- matrix(rnorm(n_genotypes * n_traits, 10, 2), n_genotypes, n_traits) gmat <- cov(phen_mat) * 0.6 # Genotypic component pmat <- cov(phen_mat) w <- c(10, 8, 6, 4, 2) # Restrict traits 2 and 4 result <- crlgsi( phen_mat = phen_mat, gebv_mat = gebv_mat, pmat = pmat, gmat = gmat, wmat = w, restricted_traits = c(2, 4), reliability = 0.7 ) print(result$summary) ## End(Not run)
Implements the Pesek & Baker (1969) Desired Gains Index where breeders specify target genetic gains instead of economic weights. This enhanced version includes calculation of implied economic weights and feasibility checking.
dg_lpsi( pmat, gmat, d, return_implied_weights = TRUE, check_feasibility = TRUE, selection_intensity = 2.063 )dg_lpsi( pmat, gmat, d, return_implied_weights = TRUE, check_feasibility = TRUE, selection_intensity = 2.063 )
pmat |
Phenotypic variance-covariance matrix (n_traits x n_traits) |
gmat |
Genotypic variance-covariance matrix (n_traits x n_traits) |
d |
Vector of desired genetic gains (length n_traits). Example: d = c(1.5, 0.8, -0.2) means gain +1.5 in trait 1, +0.8 in trait 2, -0.2 in trait 3. |
return_implied_weights |
Logical - calculate implied economic weights? (default: TRUE) |
check_feasibility |
Logical - warn if desired gains are unrealistic? (default: TRUE) |
selection_intensity |
Selection intensity i (default: 2.063) |
Mathematical Formulation:
1. Index coefficients:
2. Expected response:
CRITICAL: Scale Invariance Property
The achieved gains are determined by selection intensity (i),
genetic variance (G), and phenotypic variance (P), NOT by scaling .
If you multiply by constant c, also scales by c, causing
complete cancellation in .
What DG-LPSI Actually Achieves:
- Proportional gains matching the RATIOS in d (not absolute magnitudes) - Achieved magnitude depends on biological/genetic constraints - Use feasibility checking to verify if desired gains are realistic
3. Implied economic weights (Section 1.4 of Chapter 4):
The implied weights represent the economic values that would have been needed in a Smith-Hazel index to achieve the desired gain PROPORTIONS. Large implied weights indicate traits that are "expensive" to improve (low heritability or unfavorable correlations), while small weights indicate traits that are "cheap" to improve.
Feasibility Checking:
The function estimates maximum possible gains as approximately 3.0 * sqrt(G_ii) (assuming very intense selection with i ~ 3.0) and warns if desired gains exceed 80% of these theoretical maxima.
List with:
summary - Data frame with coefficients, metrics, and implied weights
b - Vector of selection index coefficients
Delta_G - Named vector of achieved genetic gains per trait
desired_gains - Named vector of desired gains (input d)
gain_errors - Difference between desired and achieved gains
implied_weights - Economic weights that would achieve these gains in Smith-Hazel LPSI
implied_weights_normalized - Normalized implied weights (max absolute = 1)
feasibility - Data frame with feasibility analysis per trait
hI2 - Index heritability
rHI - Index accuracy
Pesek, J., & Baker, R. J. (1969). Desired improvement in relation to selection indices. Canadian Journal of Plant Science, 49(6), 803-804.
# Load data gmat <- gen_varcov(seldata[, 3:9], seldata[, 2], seldata[, 1]) pmat <- phen_varcov(seldata[, 3:9], seldata[, 2], seldata[, 1]) # Specify desired gains (e.g., increase each trait by 1 unit) desired_gains <- rep(1, ncol(gmat)) # Calculate Desired Gains Index with all enhancements result <- dg_lpsi(pmat, gmat, d = desired_gains) # View summary print(result$summary) # Extract implied weights to understand relative "cost" of gains print(result$implied_weights_normalized) # Check feasibility print(result$feasibility)# Load data gmat <- gen_varcov(seldata[, 3:9], seldata[, 2], seldata[, 1]) pmat <- phen_varcov(seldata[, 3:9], seldata[, 2], seldata[, 1]) # Specify desired gains (e.g., increase each trait by 1 unit) desired_gains <- rep(1, ncol(gmat)) # Calculate Desired Gains Index with all enhancements result <- dg_lpsi(pmat, gmat, d = desired_gains) # View summary print(result$summary) # Extract implied weights to understand relative "cost" of gains print(result$implied_weights_normalized) # Check feasibility print(result$feasibility)
Implements the ESIM by maximising the squared accuracy
through the generalised eigenproblem of the multi-trait heritability matrix
.
Unlike the Smith-Hazel LPSI, **no economic weights are required**. The net
genetic merit vector is instead implied by the solution.
esim(pmat, gmat, selection_intensity = 2.063, n_indices = 1L)esim(pmat, gmat, selection_intensity = 2.063, n_indices = 1L)
pmat |
Phenotypic variance-covariance matrix (n_traits x n_traits). |
gmat |
Genotypic variance-covariance matrix (n_traits x n_traits). Corresponds to C in the Chapter 7 notation. |
selection_intensity |
Selection intensity constant |
n_indices |
Number of leading ESIM vectors to return (default: 1). Returning >1 provides a ranked set of indices for comparative analysis. |
Eigenproblem (Section 7.1):
The solution (largest eigenvalue) equals the maximum
achievable index heritability .
Key metrics:
Implied economic weights:
Uses cpp_symmetric_solve and cpp_quadratic_form_sym from
math_primitives.cpp for efficient matrix operations, and R's
eigen() for the eigendecomposition.
Object of class "esim", a list with:
summaryData frame with b coefficients, hI2, rHI, sigma_I, Delta_G, and lambda2 for each index requested.
bNamed numeric vector of optimal ESIM coefficients (1st index).
Delta_GNamed numeric vector of expected genetic gains per trait.
sigma_IStandard deviation of the index .
hI2Index heritability (= leading eigenvalue).
rHIAccuracy .
lambda2Leading eigenvalue (maximised index heritability).
implied_wImplied economic weights .
all_eigenvaluesAll eigenvalues of .
selection_intensitySelection intensity used.
Ceron-Rojas, J. J., & Crossa, J. (2018). Linear Selection Indices in Modern Plant Breeding. Springer International Publishing. Section 7.1.
## Not run: gmat <- gen_varcov(seldata[, 3:9], seldata[, 2], seldata[, 1]) pmat <- phen_varcov(seldata[, 3:9], seldata[, 2], seldata[, 1]) result <- esim(pmat, gmat) print(result) summary(result) ## End(Not run)## Not run: gmat <- gen_varcov(seldata[, 3:9], seldata[, 2], seldata[, 1]) pmat <- phen_varcov(seldata[, 3:9], seldata[, 2], seldata[, 1]) result <- esim(pmat, gmat) print(result) summary(result) ## End(Not run)
Estimates and imputes missing values in randomized complete block design (RCBD), Latin square design (LSD), or split plot design (SPD) experimental data.
Uses one of six methods: REML, Yates, Healy, Regression, Mean, or Bartlett.
estimate_missing_values( data, genotypes, replications, columns = NULL, main_plots = NULL, design = c("RCBD", "LSD", "SPD"), method = c("REML", "Yates", "Healy", "Regression", "Mean", "Bartlett"), tolerance = 1e-06 )estimate_missing_values( data, genotypes, replications, columns = NULL, main_plots = NULL, design = c("RCBD", "LSD", "SPD"), method = c("REML", "Yates", "Healy", "Regression", "Mean", "Bartlett"), tolerance = 1e-06 )
data |
Matrix or data.frame with observations (rows) by traits (columns). May contain missing values (NA, NaN, Inf). |
genotypes |
Vector indicating genotype/treatment for each observation (sub-plot treatments in SPD). |
replications |
Vector indicating replication/block (RCBD) or row (LSD) for each observation. |
columns |
Vector indicating column index for each observation (required for Latin Square Design only). |
main_plots |
Vector indicating main plot treatment for each observation (required for Split Plot Design only). |
design |
Character string specifying experimental design:
|
method |
Character string specifying the estimation method:
|
tolerance |
Numeric convergence criterion for iterative methods. Iteration stops when maximum change in estimated values falls below this threshold. Default: 1e-6. |
The function handles missing values by iteratively estimating them based on the experimental design structure:
**RCBD:** 2-way blocking (genotypes × blocks) **LSD:** 3-way blocking (genotypes × rows × columns) **SPD:** Nested structure (blocks > main plots > sub-plots)
Method Availability:
RCBD: All methods (REML, Yates, Healy, Regression, Mean, Bartlett)
LSD: All methods (REML, Yates, Healy, Regression, Mean, Bartlett)
SPD: Mean only (other methods fall back to Mean)
Method Selection Guide:
Use REML for complex missing patterns or when precision is critical
Use Yates for balanced designs with few missing values
Use Healy when multiple values missing from same treatment/block
Use Regression for fast, deterministic estimation
Use Mean for quick estimation when precision is less critical
Use Bartlett when traits are highly correlated
The function uses the centralized design_stats engine for all ANOVA computations, ensuring consistency with gen_varcov(), phen_varcov(), and mean_performance().
Matrix of the same dimensions as data with all missing values
replaced by estimates.
Yates, F. (1933). The analysis of replicated experiments when the field results are incomplete. Empire Journal of Experimental Agriculture, 1, 129-142.
Healy, M. J. R., & Westmacott, M. (1956). Missing values in experiments analysed on automatic computers. Applied Statistics, 5(3), 203-206.
Bartlett, M. S. (1937). Some examples of statistical methods of research in agriculture and applied biology. Supplement to the Journal of the Royal Statistical Society, 4(2), 137-183.
# RCBD example with missing values data(seldata) test_data <- seldata[, 3:5] test_data[c(1, 10, 25), 1] <- NA test_data[c(5, 15), 2] <- NA # Impute using Yates method imputed <- estimate_missing_values(test_data, seldata$treat, seldata$rep, method = "Yates") # Check that no NA remain anyNA(imputed) # Should be FALSE ## Not run: # Latin Square Design example # lsd_data should have genotypes, rows, and columns imputed_lsd <- estimate_missing_values( data = lsd_data[, 3:7], genotypes = lsd_data$treat, replications = lsd_data$row, columns = lsd_data$col, design = "LSD", method = "REML" ) # Split Plot Design example # spd_data should have sub-plots, blocks, and main plots imputed_spd <- estimate_missing_values( data = spd_data[, 3:7], genotypes = spd_data$subplot, replications = spd_data$block, main_plots = spd_data$mainplot, design = "SPD", method = "Mean" ) ## End(Not run)# RCBD example with missing values data(seldata) test_data <- seldata[, 3:5] test_data[c(1, 10, 25), 1] <- NA test_data[c(5, 15), 2] <- NA # Impute using Yates method imputed <- estimate_missing_values(test_data, seldata$treat, seldata$rep, method = "Yates") # Check that no NA remain anyNA(imputed) # Should be FALSE ## Not run: # Latin Square Design example # lsd_data should have genotypes, rows, and columns imputed_lsd <- estimate_missing_values( data = lsd_data[, 3:7], genotypes = lsd_data$treat, replications = lsd_data$row, columns = lsd_data$col, design = "LSD", method = "REML" ) # Split Plot Design example # spd_data should have sub-plots, blocks, and main plots imputed_spd <- estimate_missing_values( data = spd_data[, 3:7], genotypes = spd_data$subplot, replications = spd_data$block, main_plots = spd_data$mainplot, design = "SPD", method = "Mean" ) ## End(Not run)
Genetic Advance for PRE
gen_advance(phen_mat, gen_mat, weight_mat)gen_advance(phen_mat, gen_mat, weight_mat)
phen_mat |
phenotypic matrix value of desired characters |
gen_mat |
genotypic matrix value of desired characters |
weight_mat |
weight matrix value of desired characters |
Genetic advance of character or character combinations
gmat <- gen_varcov(seldata[, 3:9], seldata[, 2], seldata[, 1]) pmat <- phen_varcov(seldata[, 3:9], seldata[, 2], seldata[, 1]) gen_advance(phen_mat = pmat[1, 1], gen_mat = gmat[1, 1], weight_mat = weight[1, 2])gmat <- gen_varcov(seldata[, 3:9], seldata[, 2], seldata[, 1]) pmat <- phen_varcov(seldata[, 3:9], seldata[, 2], seldata[, 1]) gen_advance(phen_mat = pmat[1, 1], gen_mat = gmat[1, 1], weight_mat = weight[1, 2])
Genotypic Variance-Covariance Analysis
gen_varcov( data, genotypes, replication, columns = NULL, main_plots = NULL, design_type = c("RCBD", "LSD", "SPD"), method = c("REML", "Yates", "Healy", "Regression", "Mean", "Bartlett") )gen_varcov( data, genotypes, replication, columns = NULL, main_plots = NULL, design_type = c("RCBD", "LSD", "SPD"), method = c("REML", "Yates", "Healy", "Regression", "Mean", "Bartlett") )
data |
traits to be analyzed |
genotypes |
vector containing genotypes/treatments (sub-plot treatments in SPD) |
replication |
vector containing replication/blocks (RCBD) or rows (LSD) |
columns |
vector containing columns (required for Latin Square Design only) |
main_plots |
vector containing main plot treatments (required for Split Plot Design only) |
design_type |
experimental design type: "RCBD" (default), "LSD" (Latin Square), or "SPD" (Split Plot) |
method |
Method for missing value imputation: "REML" (default), "Yates", "Healy", "Regression", "Mean", or "Bartlett" |
A Genotypic Variance-Covariance Matrix
# RCBD example gen_varcov(data = seldata[, 3:9], genotypes = seldata$treat, replication = seldata$rep) # Latin Square Design example (requires columns parameter) # gen_varcov(data=lsd_data[,3:7], genotypes=lsd_data$treat, # replication=lsd_data$row, columns=lsd_data$col, design_type="LSD") # Split Plot Design example (requires main_plots parameter) # gen_varcov(data=spd_data[,3:7], genotypes=spd_data$subplot, # replication=spd_data$block, main_plots=spd_data$mainplot, design_type="SPD")# RCBD example gen_varcov(data = seldata[, 3:9], genotypes = seldata$treat, replication = seldata$rep) # Latin Square Design example (requires columns parameter) # gen_varcov(data=lsd_data[,3:7], genotypes=lsd_data$treat, # replication=lsd_data$row, columns=lsd_data$col, design_type="LSD") # Split Plot Design example (requires main_plots parameter) # gen_varcov(data=spd_data[,3:7], genotypes=spd_data$subplot, # replication=spd_data$block, main_plots=spd_data$mainplot, design_type="SPD")
Computes the genetic-genomic covariance matrix (A) as defined in Chapter 8 (Equation 8.12) for GESIM and related genomic eigen selection indices.
Structure: A = [[C, C_g-gamma], [C_gamma-g, ]] (2t x 2t, square symmetric)
where:
- C = Var(g) = true genotypic variance-covariance (t x t)
- = Var() = genomic variance-covariance (t x t)
- C_g-gamma = Cov(g, ) = covariance between true BVs and GEBVs (t x t)
- C_gamma-g = Cov(, g) = transpose of C_g-gamma (t x t)
genetic_genomic_varcov( gmat, Gamma = NULL, reliability = NULL, C_gebv_g = NULL, square = TRUE )genetic_genomic_varcov( gmat, Gamma = NULL, reliability = NULL, C_gebv_g = NULL, square = TRUE )
gmat |
Genotypic variance-covariance matrix (n_traits x n_traits) |
Gamma |
Genomic variance-covariance matrix (n_traits x n_traits). If NULL, assumed equal to gmat (perfect prediction). |
reliability |
Optional. Reliability of GEBVs (r² = squared correlation
between GEBV and true BV). Can be:
- Single value (applied to all traits)
- Vector of length n_traits (one per trait)
- NULL (default): assumes C_g, |
C_gebv_g |
Optional. Direct specification of Cov( |
square |
Logical. If TRUE (default), returns (2t × 2t) square matrix as required for GESIM. If FALSE, returns (2t × t) rectangular form for LMSI. |
The genetic-genomic matrix relates selection on phenotypes + GEBVs to expected genetic gains.
**For GESIM (Chapter 8):** Requires the full (2t × 2t) square matrix for
the eigenproblem: (^(-1) A - I)b = 0
**For LMSI/CLGSI (Chapter 4):** Can use the rectangular (2t × t) form in the equation: b = P^(-1) G w, where G is (2t × t).
When reliability is provided:
- = diag()
When reliability is NULL:
- = Gamma (assumes unbiased GEBVs, perfect prediction)
Genetic-genomic covariance matrix: - If square = TRUE: (2t × 2t) symmetric matrix for GESIM/eigen indices - If square = FALSE: (2t × t) rectangular matrix for LMSI where t is the number of traits
Cerón-Rojas, J. J., & Crossa, J. (2018). Linear Selection Indices in Modern Plant Breeding. Springer International Publishing. Chapters 4 & 8.
## Not run: # Generate example data gmat <- gen_varcov(seldata[, 3:9], seldata[, 2], seldata[, 1]) # Simulate genomic covariance Gamma <- gmat * 0.8 # For GESIM: Get square (2t × 2t) matrix A_square <- genetic_genomic_varcov(gmat, Gamma, reliability = 0.7) print(dim(A_square)) # Should be 14 x 14 (2t × 2t) # For LMSI: Get rectangular (2t × t) matrix A_rect <- genetic_genomic_varcov(gmat, Gamma, reliability = 0.7, square = FALSE) print(dim(A_rect)) # Should be 14 x 7 (2t × t) ## End(Not run)## Not run: # Generate example data gmat <- gen_varcov(seldata[, 3:9], seldata[, 2], seldata[, 1]) # Simulate genomic covariance Gamma <- gmat * 0.8 # For GESIM: Get square (2t × 2t) matrix A_square <- genetic_genomic_varcov(gmat, Gamma, reliability = 0.7) print(dim(A_square)) # Should be 14 x 14 (2t × 2t) # For LMSI: Get rectangular (2t × t) matrix A_rect <- genetic_genomic_varcov(gmat, Gamma, reliability = 0.7, square = FALSE) print(dim(A_rect)) # Should be 14 x 7 (2t × t) ## End(Not run)
)Computes genomic variance-covariance matrix ( or Gamma) from a matrix of
Genomic Estimated Breeding Values (GEBVs).
(gamma) represents GEBV vectors obtained from genomic prediction models
(e.g., GBLUP, rrBLUP, Genomic BLUP). This function computes Var() = .
genomic_varcov(gebv_mat, method = "pearson", use = "complete.obs")genomic_varcov(gebv_mat, method = "pearson", use = "complete.obs")
gebv_mat |
Matrix of GEBVs (n_genotypes x n_traits) |
method |
Character string specifying correlation method: "pearson" (default), "kendall", or "spearman" |
use |
Character string specifying how to handle missing values:
"everything" (default), "complete.obs", "pairwise.complete.obs", etc.
See |
The genomic variance-covariance matrix captures genetic variation as
predicted by molecular markers. It is computed as:
where is the GEBV vector for genotype i and is the mean GEBV vector.
**Missing Value Handling:** - "complete.obs": Uses only complete observations (recommended) - "pairwise.complete.obs": Uses pairwise-complete observations (may not be PSD) - "everything": Fails if any NA present
When using pairwise deletion, the resulting matrix may not be positive semi-definite (PSD), which can cause numerical issues in selection indices.
**Applications:**
In selection index theory:
- Used in LGSI (Linear Genomic Selection Index)
- Component of (phenomic-genomic covariance)
- Component of A (genetic-genomic covariance)
Symmetric genomic variance-covariance matrix (n_traits x n_traits)
Cerón-Rojas, J. J., & Crossa, J. (2018). Linear Selection Indices in Modern Plant Breeding. Springer International Publishing. Chapters 4 & 8.
## Not run: # Simulate GEBVs set.seed(123) n_genotypes <- 100 n_traits <- 5 gebv_mat <- matrix(rnorm(n_genotypes * n_traits), nrow = n_genotypes, ncol = n_traits ) colnames(gebv_mat) <- paste0("Trait", 1:n_traits) # Compute genomic variance-covariance Gamma <- genomic_varcov(gebv_mat) print(Gamma) ## End(Not run)## Not run: # Simulate GEBVs set.seed(123) n_genotypes <- 100 n_traits <- 5 gebv_mat <- matrix(rnorm(n_genotypes * n_traits), nrow = n_genotypes, ncol = n_traits ) colnames(gebv_mat) <- paste0("Trait", 1:n_traits) # Compute genomic variance-covariance Gamma <- genomic_varcov(gebv_mat) print(Gamma) ## End(Not run)
Implements the GESIM by maximising the squared accuracy through the generalised eigenproblem combining phenotypic data with GEBVs (Genomic Estimated Breeding Values). No economic weights are required.
gesim(pmat, gmat, Gamma, selection_intensity = 2.063)gesim(pmat, gmat, Gamma, selection_intensity = 2.063)
pmat |
Phenotypic variance-covariance matrix (n_traits x n_traits). |
gmat |
Genotypic variance-covariance matrix (n_traits x n_traits). |
Gamma |
Covariance between phenotypes and GEBVs (n_traits x n_traits). This represents Cov(y, gamma) where gamma are GEBVs. |
selection_intensity |
Selection intensity constant |
Eigenproblem (Section 8.2):
where:
Implied economic weights:
Selection response:
Expected genetic gain per trait:
Object of class "gesim", a list with:
summaryData frame with coefficients and metrics.
b_yCoefficients for phenotypic data.
b_gammaCoefficients for GEBVs.
b_combinedCombined coefficient vector.
E_GExpected genetic gains per trait.
sigma_IStandard deviation of the index.
hI2Index heritability (= leading eigenvalue).
rHIAccuracy .
R_GSelection response.
lambda2Leading eigenvalue.
implied_wImplied economic weights.
selection_intensitySelection intensity used.
Ceron-Rojas, J. J., & Crossa, J. (2018). Linear Selection Indices in Modern Plant Breeding. Springer International Publishing. Section 8.2.
## Not run: gmat <- gen_varcov(seldata[, 3:9], seldata[, 2], seldata[, 1]) pmat <- phen_varcov(seldata[, 3:9], seldata[, 2], seldata[, 1]) # Simulate GEBV covariance (in practice, compute from genomic predictions) Gamma <- gmat * 0.8 # Assume 80% GEBV-phenotype covariance result <- gesim(pmat, gmat, Gamma) print(result) ## End(Not run)## Not run: gmat <- gen_varcov(seldata[, 3:9], seldata[, 2], seldata[, 1]) pmat <- phen_varcov(seldata[, 3:9], seldata[, 2], seldata[, 1]) # Simulate GEBV covariance (in practice, compute from genomic predictions) Gamma <- gmat * 0.8 # Assume 80% GEBV-phenotype covariance result <- gesim(pmat, gmat, Gamma) print(result) ## End(Not run)
Implements the GW-ESIM by incorporating genome-wide marker effects directly into the eigen selection index framework. Uses N marker scores alongside phenotypic data.
gw_esim(pmat, gmat, G_M, M, selection_intensity = 2.063)gw_esim(pmat, gmat, G_M, M, selection_intensity = 2.063)
pmat |
Phenotypic variance-covariance matrix (n_traits x n_traits). |
gmat |
Genotypic variance-covariance matrix (n_traits x n_traits). |
G_M |
Covariance between phenotypes and marker scores (n_traits x N_markers). |
M |
Variance-covariance matrix of marker scores (N_markers x N_markers). |
selection_intensity |
Selection intensity constant |
Eigenproblem (Section 8.3):
where:
Selection response:
Expected genetic gain per trait:
Object of class "gw_esim", a list with:
summaryData frame with key metrics.
b_yCoefficients for phenotypic data.
b_mCoefficients for marker scores.
b_combinedCombined coefficient vector.
E_WExpected genetic gains per trait.
sigma_IStandard deviation of the index.
hI2Index heritability (= leading eigenvalue).
rHIAccuracy.
R_WSelection response.
lambda2Leading eigenvalue.
selection_intensitySelection intensity used.
Ceron-Rojas, J. J., & Crossa, J. (2018). Linear Selection Indices in Modern Plant Breeding. Springer International Publishing. Section 8.3.
## Not run: gmat <- gen_varcov(seldata[, 3:9], seldata[, 2], seldata[, 1]) pmat <- phen_varcov(seldata[, 3:9], seldata[, 2], seldata[, 1]) # Simulate marker data N_markers <- 100 n_traits <- nrow(gmat) G_M <- matrix(rnorm(n_traits * N_markers, sd = 0.5), n_traits, N_markers) M <- diag(N_markers) + matrix(rnorm(N_markers^2, sd = 0.1), N_markers, N_markers) M <- (M + t(M)) / 2 # Make symmetric result <- gw_esim(pmat, gmat, G_M, M) print(result) ## End(Not run)## Not run: gmat <- gen_varcov(seldata[, 3:9], seldata[, 2], seldata[, 1]) pmat <- phen_varcov(seldata[, 3:9], seldata[, 2], seldata[, 1]) # Simulate marker data N_markers <- 100 n_traits <- nrow(gmat) G_M <- matrix(rnorm(n_traits * N_markers, sd = 0.5), n_traits, N_markers) M <- diag(N_markers) + matrix(rnorm(N_markers^2, sd = 0.1), N_markers, N_markers) M <- (M + t(M)) / 2 # Make symmetric result <- gw_esim(pmat, gmat, G_M, M) print(result) ## End(Not run)
Implements the GW-LMSI which uses all available genome-wide markers directly as predictors in the selection index. Unlike LMSI which uses aggregated marker scores per trait, GW-LMSI treats each individual marker as a separate predictor.
gw_lmsi( marker_mat, trait_mat = NULL, gmat, P_GW = NULL, G_GW = NULL, wmat, wcol = 1, lambda = 0, selection_intensity = 2.063, GAY = NULL )gw_lmsi( marker_mat, trait_mat = NULL, gmat, P_GW = NULL, G_GW = NULL, wmat, wcol = 1, lambda = 0, selection_intensity = 2.063, GAY = NULL )
marker_mat |
Matrix of marker genotypes (n_genotypes x n_markers). Typically coded as -1, 0, 1 or 0, 1, 2. |
trait_mat |
Matrix of trait values (n_genotypes x n_traits). Used to compute G_GW if not provided. |
gmat |
Genotypic variance-covariance matrix (n_traits x n_traits). |
P_GW |
Marker covariance matrix (n_markers x n_markers). If NULL, computed as Var(marker_mat). |
G_GW |
Covariance between markers and traits (n_markers x n_traits). If NULL, computed as Cov(marker_mat, trait_mat). |
wmat |
Economic weights matrix (n_traits x k), or vector. |
wcol |
Weight column to use if wmat has multiple columns (default: 1). |
lambda |
Ridge regularization parameter (default: 0). If lambda > 0, uses P_GW + lambda*I for regularization. Automatic warnings issued when n_markers > n_genotypes (high-dimensional case) or when P_GW is ill-conditioned. Recommended values: 0.01-0.1 times mean(diag(P_GW)). |
selection_intensity |
Selection intensity k (default: 2.063 for 10% selection). |
GAY |
Optional. Genetic advance of comparative trait for PRE calculation. |
Mathematical Formulation:
The GW-LMSI maximizes the correlation between the index
and the breeding
objective .
Marker covariance matrix:
Covariance between markers and traits:
Index coefficients (with optional Ridge regularization):
Accuracy:
Selection response:
Expected genetic gain per trait:
Note on Singularity Detection and Regularization: The function automatically detects problematic cases:
1. **High-dimensional case**: When n_markers > n_genotypes, P_GW is mathematically singular (rank-deficient). The function issues a warning and suggests an appropriate lambda value.
2. **Ill-conditioned case**: When P_GW has a high condition number (> 1e10), indicating numerical instability.
3. **Numerical singularity**: When P_GW has eigenvalues near zero.
Ridge regularization adds I to P_GW, ensuring positive definiteness. Recommended
lambda values are 0.01-0.1 times the average diagonal element of P_GW. Users can
also set lambda = 0 to force generalized inverse (less stable but sometimes needed).
List of class "gw_lmsi" with components:
bIndex coefficients for markers (n_markers).
P_GWMarker covariance matrix (n_markers x n_markers).
G_GWCovariance between markers and traits (n_markers x n_traits).
rHIIndex accuracy (correlation between index and breeding objective).
sigma_IStandard deviation of the index.
RSelection response (k * sigma_I).
Delta_HExpected genetic gain per trait (vector of length n_traits).
GAOverall genetic advance in breeding objective.
PREPercent relative efficiency (if GAY provided).
hI2Index heritability.
lambdaRidge regularization parameter used.
n_markersNumber of markers.
high_dimensionalLogical indicating if n_markers > n_genotypes.
condition_numberCondition number of P_GW (if computed).
summaryData frame with metrics.
Lande, R., & Thompson, R. (1990). Efficiency of marker-assisted selection in the improvement of quantitative traits. Genetics, 124(3), 743-756.
Cerón-Rojas, J. J., & Crossa, J. (2018). Linear Selection Indices in Modern Plant Breeding. Springer International Publishing. Chapter 4.
## Not run: # Load data data(seldata) gmat <- gen_varcov(seldata[, 3:9], seldata[, 2], seldata[, 1]) # Simulate marker data set.seed(123) n_genotypes <- 100 n_markers <- 200 n_traits <- ncol(gmat) # Marker matrix (coded as 0, 1, 2) marker_mat <- matrix(sample(0:2, n_genotypes * n_markers, replace = TRUE), nrow = n_genotypes, ncol = n_markers ) # Trait matrix trait_mat <- matrix(rnorm(n_genotypes * n_traits, mean = 15, sd = 3), nrow = n_genotypes, ncol = n_traits ) # Economic weights weights <- c(10, 5, 3, 3, 5, 8, 4) # Calculate GW-LMSI with Ridge regularization result <- gw_lmsi(marker_mat, trait_mat, gmat, wmat = weights, lambda = 0.01 ) print(result$summary) ## End(Not run)## Not run: # Load data data(seldata) gmat <- gen_varcov(seldata[, 3:9], seldata[, 2], seldata[, 1]) # Simulate marker data set.seed(123) n_genotypes <- 100 n_markers <- 200 n_traits <- ncol(gmat) # Marker matrix (coded as 0, 1, 2) marker_mat <- matrix(sample(0:2, n_genotypes * n_markers, replace = TRUE), nrow = n_genotypes, ncol = n_markers ) # Trait matrix trait_mat <- matrix(rnorm(n_genotypes * n_traits, mean = 15, sd = 3), nrow = n_genotypes, ncol = n_traits ) # Economic weights weights <- c(10, 5, 3, 3, 5, 8, 4) # Calculate GW-LMSI with Ridge regularization result <- gw_lmsi(marker_mat, trait_mat, gmat, wmat = weights, lambda = 0.01 ) print(result$summary) ## End(Not run)
Converts genetic distance (in Morgans) to recombination fraction using Haldane's mapping function. This function models the relationship between genetic distance and the probability of recombination between loci.
haldane_mapping(distance)haldane_mapping(distance)
distance |
Genetic distance in Morgans (scalar or vector). One Morgan corresponds to a 50% recombination frequency. |
Mathematical Formula (Chapter 10, Section 10.1):
The relationship between recombination fraction (r) and genetic distance (d):
Where: - d = Genetic distance in Morgans - r = Recombination fraction (probability of recombination per meiosis)
This function assumes no crossover interference beyond that implied by the mapping function itself.
Recombination fraction (r) ranging from 0 to 0.5. - r = 0 indicates complete linkage (no recombination) - r = 0.5 indicates independent assortment (unlinked loci)
# Zero distance means complete linkage (no recombination) haldane_mapping(0) # Returns 0 # 1 Morgan distance haldane_mapping(1) # Returns ~0.43 # Large distance approaches 0.5 (independent assortment) haldane_mapping(10) # Returns ~0.5 # Vector of distances distances <- c(0, 0.1, 0.5, 1.0, 2.0) haldane_mapping(distances)# Zero distance means complete linkage (no recombination) haldane_mapping(0) # Returns 0 # 1 Morgan distance haldane_mapping(1) # Returns ~0.43 # Large distance approaches 0.5 (independent assortment) haldane_mapping(10) # Returns ~0.5 # Vector of distances distances <- c(0, 0.1, 0.5, 1.0, 2.0) haldane_mapping(distances)
Converts recombination fraction back to genetic distance (in Morgans). This is the inverse of Haldane's mapping function.
inverse_haldane_mapping(recombination_fraction)inverse_haldane_mapping(recombination_fraction)
recombination_fraction |
Recombination fraction (r) between 0 and 0.5. |
Mathematical Formula:
Solving Haldane's equation for d:
Genetic distance in Morgans.
# Convert recombination fraction to distance inverse_haldane_mapping(0.25) # Returns ~0.347 Morgans inverse_haldane_mapping(0.5) # Returns Inf (unlinked)# Convert recombination fraction to distance inverse_haldane_mapping(0.25) # Returns ~0.347 Morgans inverse_haldane_mapping(0.5) # Returns Inf (unlinked)
Implements the Linear Genomic Selection Index where selection is based solely on Genomic Estimated Breeding Values (GEBVs). This is used for selecting candidates that have been genotyped but not phenotyped (e.g., in a testing population).
lgsi( gebv_mat, gmat, wmat, wcol = 1, reliability = NULL, selection_intensity = 2.063, GAY = NULL )lgsi( gebv_mat, gmat, wmat, wcol = 1, reliability = NULL, selection_intensity = 2.063, GAY = NULL )
gebv_mat |
Matrix of GEBVs (n_genotypes x n_traits) |
gmat |
Genotypic variance-covariance matrix (n_traits x n_traits) |
wmat |
Economic weights matrix (n_traits x k), or vector |
wcol |
Weight column to use if wmat has multiple columns (default: 1) |
reliability |
Optional. Reliability of GEBVs (correlation between GEBV and true BV). Can be: - Single value (applied to all traits) - Vector of length n_traits (one per trait) - NULL (default): estimated from GEBV variance (assumes reliability = GEBV_var / G_var) |
selection_intensity |
Selection intensity i (default: 2.063 for 10% selection) |
GAY |
Optional. Genetic advance of comparative trait for PRE calculation |
Mathematical Formulation:
The LGSI maximizes the correlation between the index I = b' * gebv and
Index coefficients:
Where:
- = Var(gebv) - variance-covariance of GEBVs
- = Cov(gebv, g) - covariance between GEBVs and true breeding values
If reliability (r) is known:
Expected response:
List with components:
b - Index coefficients
P_gebv - GEBV variance-covariance matrix
reliability - Reliability values used
Delta_H - Expected genetic advance per trait
GA - Overall genetic advance in the index
PRE - Percent relative efficiency (if GAY provided)
hI2 - Index heritability
rHI - Index accuracy
sigma_I - Standard deviation of the index
summary - Data frame with all metrics
Cerón-Rojas, J. J., & Crossa, J. (2018). Linear Selection Indices in Modern Plant Breeding. Springer International Publishing.
## Not run: # Generate example data gmat <- gen_varcov(seldata[, 3:9], seldata[, 2], seldata[, 1]) # Simulate GEBVs (in practice, these come from genomic prediction) set.seed(123) n_genotypes <- 100 n_traits <- ncol(gmat) gebv_mat <- matrix(rnorm(n_genotypes * n_traits, mean = 10, sd = 2), nrow = n_genotypes, ncol = n_traits ) colnames(gebv_mat) <- colnames(gmat) # Economic weights weights <- c(10, 5, 3, 3, 5, 8, 4) # Calculate LGSI result <- lgsi(gebv_mat, gmat, weights, reliability = 0.7) print(result$summary) ## End(Not run)## Not run: # Generate example data gmat <- gen_varcov(seldata[, 3:9], seldata[, 2], seldata[, 1]) # Simulate GEBVs (in practice, these come from genomic prediction) set.seed(123) n_genotypes <- 100 n_traits <- ncol(gmat) gebv_mat <- matrix(rnorm(n_genotypes * n_traits, mean = 10, sd = 2), nrow = n_genotypes, ncol = n_traits ) colnames(gebv_mat) <- colnames(gmat) # Economic weights weights <- c(10, 5, 3, 3, 5, 8, 4) # Calculate LGSI result <- lgsi(gebv_mat, gmat, weights, reliability = 0.7) print(result$summary) ## End(Not run)
Implements the LMSI which combines phenotypic information with molecular marker scores from statistically significant markers (Lande & Thompson, 1990). The index is I = b_y' * y + b_s' * s, where y are phenotypes and s are marker scores.
lmsi( phen_mat = NULL, marker_scores = NULL, pmat, gmat, G_s = NULL, wmat, wcol = 1, selection_intensity = 2.063, GAY = NULL )lmsi( phen_mat = NULL, marker_scores = NULL, pmat, gmat, G_s = NULL, wmat, wcol = 1, selection_intensity = 2.063, GAY = NULL )
phen_mat |
Matrix of phenotypes (n_genotypes x n_traits). Can be NULL if G_s is provided directly (theoretical case where covariance structure is known without needing empirical data). |
marker_scores |
Matrix of marker scores (n_genotypes x n_traits). These are computed as s_j = sum(x_jk * beta_jk) where x_jk is the coded marker value and beta_jk is the estimated marker effect for trait j. Can be NULL if G_s is provided directly. |
pmat |
Phenotypic variance-covariance matrix (n_traits x n_traits). |
gmat |
Genotypic variance-covariance matrix (n_traits x n_traits). |
G_s |
Genomic covariance matrix explained by markers (n_traits x n_traits). This represents Var(s) which approximates Cov(y, s) when markers fully explain genetic variance. If provided, phen_mat and marker_scores become optional as the covariance structure is specified directly. If NULL, computed empirically from marker_scores and phen_mat. |
wmat |
Economic weights matrix (n_traits x k), or vector. |
wcol |
Weight column to use if wmat has multiple columns (default: 1). |
selection_intensity |
Selection intensity k (default: 2.063 for 10% selection). |
GAY |
Optional. Genetic advance of comparative trait for PRE calculation. |
Mathematical Formulation:
The LMSI maximizes the correlation between the index
and the breeding objective .
Combined covariance matrices:
where is the phenotypic variance,
is the covariance between phenotypes and marker scores (computed from data),
is the variance of marker scores, is the
genotypic variance, and represents the genetic covariance
explained by markers.
Index coefficients:
Accuracy:
Selection response:
Expected genetic gain per trait:
List of class "lmsi" with components:
b_yCoefficients for phenotypes (n_traits vector).
b_sCoefficients for marker scores (n_traits vector).
b_combinedCombined coefficient vector [b_y; b_s] (2*n_traits vector).
P_LCombined phenotypic-marker covariance matrix (2*n_traits x 2*n_traits).
G_LCombined genetic-marker covariance matrix (2*n_traits x n_traits).
G_sGenomic covariance matrix explained by markers (n_traits x n_traits).
rHIIndex accuracy (correlation between index and breeding objective).
sigma_IStandard deviation of the index.
RSelection response (k * sigma_I).
Delta_HExpected genetic gain per trait (vector of length n_traits).
GAOverall genetic advance in breeding objective.
PREPercent relative efficiency (if GAY provided).
hI2Index heritability.
summaryData frame with coefficients and metrics (combined view).
phenotype_coeffsData frame with phenotype coefficients only.
marker_coeffsData frame with marker score coefficients only.
coeff_analysisData frame with coefficient distribution analysis.
Lande, R., & Thompson, R. (1990). Efficiency of marker-assisted selection in the improvement of quantitative traits. Genetics, 124(3), 743-756.
Cerón-Rojas, J. J., & Crossa, J. (2018). Linear Selection Indices in Modern Plant Breeding. Springer International Publishing. Chapter 4.
## Not run: # Load data data(seldata) pmat <- phen_varcov(seldata[, 3:9], seldata[, 2], seldata[, 1]) gmat <- gen_varcov(seldata[, 3:9], seldata[, 2], seldata[, 1]) # Simulate marker scores (in practice, computed from QTL mapping) set.seed(123) n_genotypes <- 100 n_traits <- ncol(gmat) marker_scores <- matrix(rnorm(n_genotypes * n_traits, mean = 5, sd = 1.5), nrow = n_genotypes, ncol = n_traits ) colnames(marker_scores) <- colnames(gmat) # Simulate phenotypes phen_mat <- matrix(rnorm(n_genotypes * n_traits, mean = 15, sd = 3), nrow = n_genotypes, ncol = n_traits ) colnames(phen_mat) <- colnames(gmat) # Economic weights weights <- c(10, 5, 3, 3, 5, 8, 4) # Calculate LMSI result <- lmsi(phen_mat, marker_scores, pmat, gmat, G_s = NULL, wmat = weights ) print(result$summary) ## End(Not run)## Not run: # Load data data(seldata) pmat <- phen_varcov(seldata[, 3:9], seldata[, 2], seldata[, 1]) gmat <- gen_varcov(seldata[, 3:9], seldata[, 2], seldata[, 1]) # Simulate marker scores (in practice, computed from QTL mapping) set.seed(123) n_genotypes <- 100 n_traits <- ncol(gmat) marker_scores <- matrix(rnorm(n_genotypes * n_traits, mean = 5, sd = 1.5), nrow = n_genotypes, ncol = n_traits ) colnames(marker_scores) <- colnames(gmat) # Simulate phenotypes phen_mat <- matrix(rnorm(n_genotypes * n_traits, mean = 15, sd = 3), nrow = n_genotypes, ncol = n_traits ) colnames(phen_mat) <- colnames(gmat) # Economic weights weights <- c(10, 5, 3, 3, 5, 8, 4) # Calculate LMSI result <- lmsi(phen_mat, marker_scores, pmat, gmat, G_s = NULL, wmat = weights ) print(result$summary) ## End(Not run)
Build all possible Smith-Hazel selection indices from trait combinations, with optional exclusion of specific traits.
This function systematically evaluates indices for all combinations of ncomb traits, which is useful for identifying the most efficient subset of traits for selection.
lpsi(ncomb, pmat, gmat, wmat, wcol = 1, GAY, excluding_trait = NULL)lpsi(ncomb, pmat, gmat, wmat, wcol = 1, GAY, excluding_trait = NULL)
ncomb |
Number of traits per combination |
pmat |
Phenotypic variance-covariance matrix |
gmat |
Genotypic variance-covariance matrix |
wmat |
Weight matrix |
wcol |
Weight column number if more than one weight set (default: 1) |
GAY |
Genetic advance of comparative trait (optional) |
excluding_trait |
Optional. Traits to exclude from combinations. Can be: (1) numeric vector of trait indices (e.g., c(1, 3)), (2) character vector of trait names (e.g., c("sypp", "dtf")), (3) data frame/matrix columns with trait data (trait names extracted from column names). When specified, only combinations that do NOT contain any of these traits are returned. |
Data frame of all possible selection indices with metrics (GA, PRE, Delta_G, rHI, hI2)
## Not run: gmat <- gen_varcov(seldata[, 3:9], seldata[, 2], seldata[, 1]) pmat <- phen_varcov(seldata[, 3:9], seldata[, 2], seldata[, 1]) wmat <- weight_mat(weight) # Build all 3-trait indices result <- lpsi(ncomb = 3, pmat = pmat, gmat = gmat, wmat = wmat, wcol = 1) # Exclude specific traits result <- lpsi( ncomb = 3, pmat = pmat, gmat = gmat, wmat = wmat, excluding_trait = c(1, 3) ) ## End(Not run)## Not run: gmat <- gen_varcov(seldata[, 3:9], seldata[, 2], seldata[, 1]) pmat <- phen_varcov(seldata[, 3:9], seldata[, 2], seldata[, 1]) wmat <- weight_mat(weight) # Build all 3-trait indices result <- lpsi(ncomb = 3, pmat = pmat, gmat = gmat, wmat = wmat, wcol = 1) # Exclude specific traits result <- lpsi( ncomb = 3, pmat = pmat, gmat = gmat, wmat = wmat, excluding_trait = c(1, 3) ) ## End(Not run)
A synthetic dataset containing Single Nucleotide Polymorphism (SNP) marker data for the 100 maize genotypes found in maize_pheno. Designed for testing genomic selection indices (GESIM/RGESIM) and relationship matrices.
data(maize_geno)data(maize_geno)
A numeric matrix with 100 rows (genotypes) and 500 columns (SNP markers):
100 genotypes, matching the Genotype column in maize_pheno.
500 simulated SNP markers, coded as 0, 1, or 2.
Simulated for the selection.index package to provide reproducible examples.
data(maize_geno) dim(maize_geno)data(maize_geno) dim(maize_geno)
A synthetic dataset containing multi-environment phenotypic records for 100 maize genotypes. Designed to demonstrate phenotypic selection index calculations and variance-covariance matrix estimations.
data(maize_pheno)data(maize_pheno)
A data frame with 600 rows and 6 variables:
Factor representing 100 unique maize genotypes.
Factor representing 2 distinct growing environments.
Factor representing 3 replicate blocks within each environment.
Numeric vector of grain yield in kg/ha.
Numeric vector of plant height in centimeters.
Numeric vector of days to physiological maturity.
Simulated for the selection.index package to provide reproducible examples.
data(maize_pheno) head(maize_pheno)data(maize_pheno) head(maize_pheno)
Mean performance of phenotypic data
mean_performance( data, genotypes, replications, columns = NULL, main_plots = NULL, design_type = c("RCBD", "LSD", "SPD"), method = c("REML", "Yates", "Healy", "Regression", "Mean", "Bartlett") )mean_performance( data, genotypes, replications, columns = NULL, main_plots = NULL, design_type = c("RCBD", "LSD", "SPD"), method = c("REML", "Yates", "Healy", "Regression", "Mean", "Bartlett") )
data |
data for analysis |
genotypes |
genotypes vector (sub-plot treatments in SPD) |
replications |
replication vector |
columns |
vector containing columns (required for Latin Square Design only) |
main_plots |
vector containing main plot treatments (required for Split Plot Design only) |
design_type |
experimental design type: "RCBD" (default), "LSD" (Latin Square), or "SPD" (Split Plot) |
method |
Method for missing value imputation: "REML" (default), "Yates", "Healy", "Regression", "Mean", or "Bartlett" |
Dataframe of mean performance analysis
mean_performance(data = seldata[, 3:9], genotypes = seldata[, 2], replications = seldata[, 1])mean_performance(data = seldata[, 3:9], genotypes = seldata[, 2], replications = seldata[, 1])
Implements the MESIM by maximising the squared accuracy through the generalised eigenproblem combining phenotypic data with molecular marker scores. Unlike Smith-Hazel LPSI, **no economic weights are required**.
mesim(pmat, gmat, S_M, S_Mg = NULL, S_var = NULL, selection_intensity = 2.063)mesim(pmat, gmat, S_M, S_Mg = NULL, S_var = NULL, selection_intensity = 2.063)
pmat |
Phenotypic variance-covariance matrix (n_traits x n_traits). |
gmat |
Genotypic variance-covariance matrix (n_traits x n_traits). |
S_M |
Covariance between phenotypes and marker scores (n_traits x n_traits). This represents Cov(y, s) where s are marker scores. Used in the phenotypic variance matrix T_M. |
S_Mg |
Covariance between genetic values and marker scores (n_traits x n_traits). This represents Cov(g, s). Used in the genetic variance matrix Psi_M. If NULL (default), uses S_M, which is appropriate when assuming Cov(e, s) ~= 0 (errors uncorrelated with markers), so Cov(y,s) ~= Cov(g,s). |
S_var |
Variance-covariance matrix of marker scores (n_traits x n_traits). Represents Var(s). If NULL (default), uses S_M for backward compatibility, but providing the actual Var(s) is more statistically rigorous. |
selection_intensity |
Selection intensity constant |
Eigenproblem (Section 8.1):
where:
Theoretical distinction:
T_M uses phenotypic covariances: Cov(y, s) provided via S_M
Psi_M uses genetic covariances: Cov(g, s) provided via S_Mg
Since y = g + e, if Cov(e, s) ~= 0, then Cov(y, s) ~= Cov(g, s)
Chapter 8.1 assumes marker scores are pure genetic predictors, so S_M can be used for both (default behavior when S_Mg = NULL)
The solution (largest eigenvalue) equals the maximum
achievable index heritability.
Key metrics:
Object of class "mesim", a list with:
summaryData frame with coefficients and metrics.
b_yCoefficients for phenotypic data.
b_sCoefficients for marker scores.
b_combinedCombined coefficient vector.
E_MExpected genetic gains per trait.
sigma_IStandard deviation of the index.
hI2Index heritability (= leading eigenvalue).
rHIAccuracy .
R_MSelection response.
lambda2Leading eigenvalue (maximised index heritability).
selection_intensitySelection intensity used.
Ceron-Rojas, J. J., & Crossa, J. (2018). Linear Selection Indices in Modern Plant Breeding. Springer International Publishing. Section 8.1.
## Not run: gmat <- gen_varcov(seldata[, 3:9], seldata[, 2], seldata[, 1]) pmat <- phen_varcov(seldata[, 3:9], seldata[, 2], seldata[, 1]) # Simulate marker score matrices (in practice, compute from data) S_M <- gmat * 0.7 # Cov(y, s) - phenotype-marker covariance S_Mg <- gmat * 0.65 # Cov(g, s) - genetic-marker covariance S_var <- gmat * 0.8 # Var(s) - marker score variance # Most rigorous: Provide all three covariance matrices result <- mesim(pmat, gmat, S_M, S_Mg = S_Mg, S_var = S_var) print(result) # Standard usage: Cov(g,s) defaults to Cov(y,s) when errors uncorrelated result_standard <- mesim(pmat, gmat, S_M, S_var = S_var) # Backward compatible: Chapter 8.1 simplified notation result_simple <- mesim(pmat, gmat, S_M) ## End(Not run)## Not run: gmat <- gen_varcov(seldata[, 3:9], seldata[, 2], seldata[, 1]) pmat <- phen_varcov(seldata[, 3:9], seldata[, 2], seldata[, 1]) # Simulate marker score matrices (in practice, compute from data) S_M <- gmat * 0.7 # Cov(y, s) - phenotype-marker covariance S_Mg <- gmat * 0.65 # Cov(g, s) - genetic-marker covariance S_var <- gmat * 0.8 # Var(s) - marker score variance # Most rigorous: Provide all three covariance matrices result <- mesim(pmat, gmat, S_M, S_Mg = S_Mg, S_var = S_var) print(result) # Standard usage: Cov(g,s) defaults to Cov(y,s) when errors uncorrelated result_standard <- mesim(pmat, gmat, S_M, S_var = S_var) # Backward compatible: Chapter 8.1 simplified notation result_simple <- mesim(pmat, gmat, S_M) ## End(Not run)
Exported wrapper for missing value estimation in RCBD, Latin Square, and Split Plot designs. Calls the centralized design_stats engine for ANOVA computations.
Implements the two-stage Linear Genomic Selection Index where selection is based on GEBVs at both stages with covariance adjustments due to selection effects.
mlgsi( Gamma1, Gamma, A1, A, C, G1, P1, wmat, wcol = 1, selection_proportion = 0.1, use_young_method = FALSE, k1_manual = 2.063, k2_manual = 2.063, tau = NULL, reliability = NULL )mlgsi( Gamma1, Gamma, A1, A, C, G1, P1, wmat, wcol = 1, selection_proportion = 0.1, use_young_method = FALSE, k1_manual = 2.063, k2_manual = 2.063, tau = NULL, reliability = NULL )
Gamma1 |
GEBV variance-covariance matrix for stage 1 traits (n1 x n1) |
Gamma |
GEBV variance-covariance matrix for all traits at stage 2 (n x n) |
A1 |
Covariance matrix between GEBVs and true breeding values for stage 1 (n1 x n1) |
A |
Covariance matrix between GEBVs and true breeding values for stage 2 (n x n1) |
C |
Genotypic variance-covariance matrix for all traits (n x n) |
G1 |
Genotypic variance-covariance matrix for stage 1 traits (n1 x n1) |
P1 |
Phenotypic variance-covariance matrix for stage 1 traits (n1 x n1) |
wmat |
Economic weights vector or matrix (n x k) |
wcol |
Weight column to use if wmat has multiple columns (default: 1) |
selection_proportion |
Proportion selected at each stage (default: 0.1) |
use_young_method |
Logical. Use Young's method for selection intensities (default: FALSE). Young's method tends to overestimate intensities; manual intensities are recommended. |
k1_manual |
Manual selection intensity for stage 1 |
k2_manual |
Manual selection intensity for stage 2 |
tau |
Standardized truncation point |
reliability |
Optional reliability vector for computing A matrices |
Mathematical Formulation:
Stage 1: The genomic index is
Coefficients:
Stage 2: The index uses economic weights directly:
Adjusted genomic covariance matrix:
Adjusted genotypic covariance matrix:
where
Accuracy at stage 1:
Accuracy at stage 2:
List with components:
beta1 - Stage 1 genomic index coefficients
w - Economic weights (stage 2 coefficients)
stage1_metrics - List with stage 1 metrics (R1, E1, rho_HI1)
stage2_metrics - List with stage 2 metrics (R2, E2, rho_HI2)
Gamma_star - Adjusted genomic covariance matrix at stage 2
C_star - Adjusted genotypic covariance matrix at stage 2
rho_I1I2 - Correlation between stage 1 and stage 2 indices
k1 - Selection intensity at stage 1
k2 - Selection intensity at stage 2
summary_stage1 - Data frame with stage 1 summary
summary_stage2 - Data frame with stage 2 summary
Ceron-Rojas, J. J., & Crossa, J. (2018). Linear Selection Indices in Modern Plant Breeding. Springer International Publishing. Chapter 9, Section 9.4.
## Not run: # Two-stage genomic selection example # Stage 1: Select based on GEBVs for 3 traits # Stage 2: Select based on GEBVs for all 7 traits # Compute covariance matrices gmat <- gen_varcov(seldata[, 3:9], seldata[, 2], seldata[, 1]) pmat <- phen_varcov(seldata[, 3:9], seldata[, 2], seldata[, 1]) # Simulate GEBV covariances (in practice, compute from genomic prediction) set.seed(123) reliability <- 0.7 Gamma1 <- reliability * gmat[1:3, 1:3] Gamma <- reliability * gmat A1 <- reliability * gmat[1:3, 1:3] A <- gmat[, 1:3] # Economic weights weights <- c(10, 8, 6, 4, 3, 2, 1) # Run MLGSI result <- mlgsi( Gamma1 = Gamma1, Gamma = Gamma, A1 = A1, A = A, C = gmat, G1 = gmat[1:3, 1:3], P1 = pmat[1:3, 1:3], wmat = weights, selection_proportion = 0.1 ) print(result$summary_stage1) print(result$summary_stage2) ## End(Not run)## Not run: # Two-stage genomic selection example # Stage 1: Select based on GEBVs for 3 traits # Stage 2: Select based on GEBVs for all 7 traits # Compute covariance matrices gmat <- gen_varcov(seldata[, 3:9], seldata[, 2], seldata[, 1]) pmat <- phen_varcov(seldata[, 3:9], seldata[, 2], seldata[, 1]) # Simulate GEBV covariances (in practice, compute from genomic prediction) set.seed(123) reliability <- 0.7 Gamma1 <- reliability * gmat[1:3, 1:3] Gamma <- reliability * gmat A1 <- reliability * gmat[1:3, 1:3] A <- gmat[, 1:3] # Economic weights weights <- c(10, 8, 6, 4, 3, 2, 1) # Run MLGSI result <- mlgsi( Gamma1 = Gamma1, Gamma = Gamma, A1 = A1, A = A, C = gmat, G1 = gmat[1:3, 1:3], P1 = pmat[1:3, 1:3], wmat = weights, selection_proportion = 0.1 ) print(result$summary_stage1) print(result$summary_stage2) ## End(Not run)
Implements the two-stage Linear Phenotypic Selection Index where selection occurs at two different stages with covariance adjustments due to selection effects using Cochran/Cunningham's method.
mlpsi( P1, P, G1, C, wmat, wcol = 1, stage1_indices = NULL, selection_proportion = 0.1, use_young_method = FALSE, k1_manual = 2.063, k2_manual = 2.063, tau = NULL )mlpsi( P1, P, G1, C, wmat, wcol = 1, stage1_indices = NULL, selection_proportion = 0.1, use_young_method = FALSE, k1_manual = 2.063, k2_manual = 2.063, tau = NULL )
P1 |
Phenotypic variance-covariance matrix for stage 1 traits (n1 x n1) |
P |
Phenotypic variance-covariance matrix for all traits at stage 2 (n x n) |
G1 |
Genotypic variance-covariance matrix for stage 1 traits (n1 x n1) |
C |
Genotypic variance-covariance matrix for all traits (n x n) |
wmat |
Economic weights vector or matrix (n x k) |
wcol |
Weight column to use if wmat has multiple columns (default: 1) |
stage1_indices |
Integer vector specifying which traits (columns of P and C) correspond to stage 1. Default is 1:nrow(P1), assuming stage 1 traits are the first n1 traits. Use this to specify non-contiguous traits, e.g., c(1, 3, 5) for traits 1, 3, and 5. |
selection_proportion |
Proportion selected at each stage (default: 0.1) |
use_young_method |
Logical. Use Young's method for selection intensities (default: FALSE). Young's method tends to overestimate intensities; manual intensities are recommended. |
k1_manual |
Manual selection intensity for stage 1 (used if use_young_method = FALSE) |
k2_manual |
Manual selection intensity for stage 2 (used if use_young_method = FALSE) |
tau |
Standardized truncation point (default: computed from selection_proportion) |
Mathematical Formulation:
Stage 1 index coefficients:
Stage 2 index coefficients:
Adjusted phenotypic covariance matrix (Cochran/Cunningham):
Adjusted genotypic covariance matrix:
where
Selection response: ,
List with components:
b1 - Stage 1 index coefficients
b2 - Stage 2 index coefficients
stage1_metrics - List with stage 1 metrics (R1, E1, rho_H1)
stage2_metrics - List with stage 2 metrics (R2, E2, rho_H2)
P_star - Adjusted phenotypic covariance matrix at stage 2
C_star - Adjusted genotypic covariance matrix at stage 2
rho_12 - Correlation between stage 1 and stage 2 indices
k1 - Selection intensity at stage 1
k2 - Selection intensity at stage 2
summary_stage1 - Data frame with stage 1 summary
summary_stage2 - Data frame with stage 2 summary
Cochran, W. G. (1951). Improvement by means of selection. Proceedings of the Second Berkeley Symposium on Mathematical Statistics and Probability, 449-470.
Cunningham, E. P. (1975). Multi-stage index selection. Theoretical and Applied Genetics, 46(2), 55-61.
Young, S. S. Y. (1964). Multi-stage selection for genetic gain. Heredity, 19(1), 131-144.
## Not run: # Two-stage selection example # Stage 1: Select based on 3 traits # Stage 2: Select based on all 7 traits # Compute variance-covariance matrices pmat <- phen_varcov(seldata[, 3:9], seldata[, 2], seldata[, 1]) gmat <- gen_varcov(seldata[, 3:9], seldata[, 2], seldata[, 1]) # Stage 1 uses first 3 traits P1 <- pmat[1:3, 1:3] G1 <- gmat[1:3, 1:3] # Stage 2 uses all 7 traits P <- pmat C <- gmat # Economic weights weights <- c(10, 8, 6, 4, 3, 2, 1) # Run MLPSI (default: stage1_indices = 1:3) result <- mlpsi( P1 = P1, P = P, G1 = G1, C = C, wmat = weights, selection_proportion = 0.1 ) # Or with non-contiguous traits (e.g., traits 1, 3, 5 at stage 1): # P1 <- pmat[c(1,3,5), c(1,3,5)] # G1 <- gmat[c(1,3,5), c(1,3,5)] # result <- mlpsi(P1 = P1, P = P, G1 = G1, C = C, wmat = weights, # stage1_indices = c(1, 3, 5)) print(result$summary_stage1) print(result$summary_stage2) ## End(Not run)## Not run: # Two-stage selection example # Stage 1: Select based on 3 traits # Stage 2: Select based on all 7 traits # Compute variance-covariance matrices pmat <- phen_varcov(seldata[, 3:9], seldata[, 2], seldata[, 1]) gmat <- gen_varcov(seldata[, 3:9], seldata[, 2], seldata[, 1]) # Stage 1 uses first 3 traits P1 <- pmat[1:3, 1:3] G1 <- gmat[1:3, 1:3] # Stage 2 uses all 7 traits P <- pmat C <- gmat # Economic weights weights <- c(10, 8, 6, 4, 3, 2, 1) # Run MLPSI (default: stage1_indices = 1:3) result <- mlpsi( P1 = P1, P = P, G1 = G1, C = C, wmat = weights, selection_proportion = 0.1 ) # Or with non-contiguous traits (e.g., traits 1, 3, 5 at stage 1): # P1 <- pmat[c(1,3,5), c(1,3,5)] # G1 <- gmat[c(1,3,5), c(1,3,5)] # result <- mlpsi(P1 = P1, P = P, G1 = G1, C = C, wmat = weights, # stage1_indices = c(1, 3, 5)) print(result$summary_stage1) print(result$summary_stage2) ## End(Not run)
Implements the two-stage Predetermined Proportional Gain LGSI where breeders specify desired proportional gains between traits at each stage using GEBVs.
mppg_lgsi( Gamma1, Gamma, A1, A, C, G1, P1, wmat, wcol = 1, d1, d2, U1 = NULL, U2 = NULL, selection_proportion = 0.1, use_young_method = FALSE, k1_manual = 2.063, k2_manual = 2.063, tau = NULL )mppg_lgsi( Gamma1, Gamma, A1, A, C, G1, P1, wmat, wcol = 1, d1, d2, U1 = NULL, U2 = NULL, selection_proportion = 0.1, use_young_method = FALSE, k1_manual = 2.063, k2_manual = 2.063, tau = NULL )
Gamma1 |
GEBV variance-covariance matrix for stage 1 traits (n1 x n1) |
Gamma |
GEBV variance-covariance matrix for all traits at stage 2 (n x n) |
A1 |
Covariance matrix between GEBVs and true breeding values for stage 1 (n1 x n1) |
A |
Covariance matrix between GEBVs and true breeding values for stage 2 (n x n1) |
C |
Genotypic variance-covariance matrix for all traits (n x n) |
G1 |
Genotypic variance-covariance matrix for stage 1 traits (n1 x n1) |
P1 |
Phenotypic variance-covariance matrix for stage 1 traits (n1 x n1) |
wmat |
Economic weights vector or matrix (n x k) |
wcol |
Weight column to use if wmat has multiple columns (default: 1) |
d1 |
Vector of desired proportional gains for stage 1 (length n1) |
d2 |
Vector of desired proportional gains for stage 2 (length n) |
U1 |
Constraint matrix for stage 1 (n1 x r1), optional |
U2 |
Constraint matrix for stage 2 (n x r2), optional |
selection_proportion |
Proportion selected at each stage (default: 0.1) |
use_young_method |
Logical. Use Young's method for selection intensities (default: FALSE). Young's method tends to overestimate intensities; manual intensities are recommended. |
k1_manual |
Manual selection intensity for stage 1 |
k2_manual |
Manual selection intensity for stage 2 |
tau |
Standardized truncation point |
Mathematical Formulation:
The PPG genomic coefficients are:
where proportionality constants are:
Covariance Adjustment:
The genetic covariance matrix is adjusted using phenotypic PPG coefficients
, which reflect
the same proportional gain constraints as the genomic coefficients .
This ensures the adjustment reflects the actual PPG selection occurring at stage 1.
Important: When using custom U1 matrices (subset constraints), the phenotypic
proxy uses the standard Tallis formula (all traits constrained), while
the genomic index respects the U1 subset. This may cause
to be slightly over-adjusted. For exact adjustment, use U1 = NULL
(default, all traits constrained). Calculating the exact restricted phenotypic proxy would
require implementing the full MPPG-LPSI projection matrix method for the phenotypic coefficients.
Note: Input covariance matrices (C, P1, Gamma1, Gamma)
should be positive definite. Non-positive definite matrices may lead to invalid results or warnings.
List with components similar to mlgsi, plus:
beta_P1 - PPG genomic stage 1 coefficients
beta_P2 - PPG genomic stage 2 coefficients
b_P1 - PPG phenotypic stage 1 coefficients (used for C* adjustment)
theta1 - Proportionality constant for stage 1
theta2 - Proportionality constant for stage 2
gain_ratios_1 - Achieved gain ratios at stage 1
gain_ratios_2 - Achieved gain ratios at stage 2
Ceron-Rojas, J. J., & Crossa, J. (2018). Linear Selection Indices in Modern Plant Breeding. Springer International Publishing. Chapter 9, Section 9.6.
## Not run: # Two-stage proportional gain genomic selection gmat <- gen_varcov(seldata[, 3:9], seldata[, 2], seldata[, 1]) pmat <- phen_varcov(seldata[, 3:9], seldata[, 2], seldata[, 1]) reliability <- 0.7 Gamma1 <- reliability * gmat[1:3, 1:3] Gamma <- reliability * gmat A1 <- reliability * gmat[1:3, 1:3] A <- gmat[, 1:3] # Desired proportional gains d1 <- c(2, 1, 1) d2 <- c(3, 2, 1, 1, 1, 0.5, 0.5) weights <- c(10, 8, 6, 4, 3, 2, 1) result <- mppg_lgsi( Gamma1 = Gamma1, Gamma = Gamma, A1 = A1, A = A, C = gmat, G1 = gmat[1:3, 1:3], P1 = pmat[1:3, 1:3], wmat = weights, d1 = d1, d2 = d2 ) ## End(Not run)## Not run: # Two-stage proportional gain genomic selection gmat <- gen_varcov(seldata[, 3:9], seldata[, 2], seldata[, 1]) pmat <- phen_varcov(seldata[, 3:9], seldata[, 2], seldata[, 1]) reliability <- 0.7 Gamma1 <- reliability * gmat[1:3, 1:3] Gamma <- reliability * gmat A1 <- reliability * gmat[1:3, 1:3] A <- gmat[, 1:3] # Desired proportional gains d1 <- c(2, 1, 1) d2 <- c(3, 2, 1, 1, 1, 0.5, 0.5) weights <- c(10, 8, 6, 4, 3, 2, 1) result <- mppg_lgsi( Gamma1 = Gamma1, Gamma = Gamma, A1 = A1, A = A, C = gmat, G1 = gmat[1:3, 1:3], P1 = pmat[1:3, 1:3], wmat = weights, d1 = d1, d2 = d2 ) ## End(Not run)
Implements the two-stage Predetermined Proportional Gain LPSI where breeders specify desired proportional gains between traits at each stage.
mppg_lpsi( P1, P, G1, C, wmat, wcol = 1, d1, d2, stage1_indices = NULL, selection_proportion = 0.1, use_young_method = FALSE, k1_manual = 2.063, k2_manual = 2.063, tau = NULL )mppg_lpsi( P1, P, G1, C, wmat, wcol = 1, d1, d2, stage1_indices = NULL, selection_proportion = 0.1, use_young_method = FALSE, k1_manual = 2.063, k2_manual = 2.063, tau = NULL )
P1 |
Phenotypic variance-covariance matrix for stage 1 traits (n1 x n1) |
P |
Phenotypic variance-covariance matrix for all traits at stage 2 (n x n) |
G1 |
Genotypic variance-covariance matrix for stage 1 traits (n1 x n1) |
C |
Genotypic variance-covariance matrix for all traits (n x n) |
wmat |
Economic weights vector or matrix (n x k) |
wcol |
Weight column to use if wmat has multiple columns (default: 1) |
d1 |
Vector of desired proportional gains for stage 1 (length n1) |
d2 |
Vector of desired proportional gains for stage 2 (length n) |
stage1_indices |
Integer vector specifying which traits correspond to stage 1 (default: 1:nrow(P1)) |
selection_proportion |
Proportion selected at each stage (default: 0.1) |
use_young_method |
Logical. Use Young's method for selection intensities (default: FALSE). Young's method tends to overestimate intensities; manual intensities are recommended. |
k1_manual |
Manual selection intensity for stage 1 |
k2_manual |
Manual selection intensity for stage 2 |
tau |
Standardized truncation point |
Mathematical Formulation (Chapter 9.3.1, Eq 9.17):
The PPG coefficients are computed using the projection matrix method:
where:
are restricted coefficients
is the projection matrix
is the proportionality constant computed from
(all traits constrained)
where is computed to achieve proportional gains specified by
List with components similar to mlpsi, plus:
b_M1 - PPG stage 1 coefficients
b_M2 - PPG stage 2 coefficients
b_R1 - Restricted stage 1 coefficients
b_R2 - Restricted stage 2 coefficients
K_M1 - PPG projection matrix for stage 1
K_M2 - PPG projection matrix for stage 2
theta1 - Proportionality constant for stage 1
theta2 - Proportionality constant for stage 2
gain_ratios_1 - Achieved gain ratios at stage 1
gain_ratios_2 - Achieved gain ratios at stage 2
Tallis, G. M. (1962). A selection index for optimum genotype. Biometrics, 18(1), 120-122.
## Not run: # Two-stage proportional gain selection pmat <- phen_varcov(seldata[, 3:9], seldata[, 2], seldata[, 1]) gmat <- gen_varcov(seldata[, 3:9], seldata[, 2], seldata[, 1]) P1 <- pmat[1:3, 1:3] G1 <- gmat[1:3, 1:3] P <- pmat C <- gmat # Desired proportional gains d1 <- c(2, 1, 1) # Trait 1 gains twice as much at stage 1 d2 <- c(3, 2, 1, 1, 1, 0.5, 0.5) # Different proportions at stage 2 weights <- c(10, 8, 6, 4, 3, 2, 1) result <- mppg_lpsi( P1 = P1, P = P, G1 = G1, C = C, wmat = weights, d1 = d1, d2 = d2 ) ## End(Not run)## Not run: # Two-stage proportional gain selection pmat <- phen_varcov(seldata[, 3:9], seldata[, 2], seldata[, 1]) gmat <- gen_varcov(seldata[, 3:9], seldata[, 2], seldata[, 1]) P1 <- pmat[1:3, 1:3] G1 <- gmat[1:3, 1:3] P <- pmat C <- gmat # Desired proportional gains d1 <- c(2, 1, 1) # Trait 1 gains twice as much at stage 1 d2 <- c(3, 2, 1, 1, 1, 0.5, 0.5) # Different proportions at stage 2 weights <- c(10, 8, 6, 4, 3, 2, 1) result <- mppg_lpsi( P1 = P1, P = P, G1 = G1, C = C, wmat = weights, d1 = d1, d2 = d2 ) ## End(Not run)
Implements the two-stage Restricted Linear Genomic Selection Index where certain traits are constrained to have zero genetic gain at each stage using GEBVs.
mrlgsi( Gamma1, Gamma, A1, A, C, G1, P1, wmat, wcol = 1, C1, C2, selection_proportion = 0.1, use_young_method = FALSE, k1_manual = 2.063, k2_manual = 2.063, tau = NULL )mrlgsi( Gamma1, Gamma, A1, A, C, G1, P1, wmat, wcol = 1, C1, C2, selection_proportion = 0.1, use_young_method = FALSE, k1_manual = 2.063, k2_manual = 2.063, tau = NULL )
Gamma1 |
GEBV variance-covariance matrix for stage 1 traits (n1 x n1) |
Gamma |
GEBV variance-covariance matrix for all traits at stage 2 (n x n) |
A1 |
Covariance matrix between GEBVs and true breeding values for stage 1 (n1 x n1) |
A |
Covariance matrix between GEBVs and true breeding values for stage 2 (n x n1) |
C |
Genotypic variance-covariance matrix for all traits (n x n) |
G1 |
Genotypic variance-covariance matrix for stage 1 traits (n1 x n1) |
P1 |
Phenotypic variance-covariance matrix for stage 1 traits (n1 x n1) |
wmat |
Economic weights vector or matrix (n x k) |
wcol |
Weight column to use if wmat has multiple columns (default: 1) |
C1 |
Constraint matrix for stage 1 (n1 x r1) |
C2 |
Constraint matrix for stage 2 (n x r2) |
selection_proportion |
Proportion selected at each stage (default: 0.1) |
use_young_method |
Logical. Use Young's method for selection intensities (default: FALSE). Young's method tends to overestimate intensities; manual intensities are recommended. |
k1_manual |
Manual selection intensity for stage 1 |
k2_manual |
Manual selection intensity for stage 2 |
tau |
Standardized truncation point |
Mathematical Formulation:
The restricted genomic coefficients are:
where restriction matrices are computed similarly to RLGSI
List with components similar to mlgsi, plus:
beta_R1 - Restricted stage 1 coefficients
beta_R2 - Restricted stage 2 coefficients
K_G1 - Restriction matrix for stage 1
K_G2 - Restriction matrix for stage 2
Ceron-Rojas, J. J., & Crossa, J. (2018). Linear Selection Indices in Modern Plant Breeding. Springer International Publishing. Chapter 9, Section 9.5.
## Not run: # Two-stage restricted genomic selection gmat <- gen_varcov(seldata[, 3:9], seldata[, 2], seldata[, 1]) pmat <- phen_varcov(seldata[, 3:9], seldata[, 2], seldata[, 1]) reliability <- 0.7 Gamma1 <- reliability * gmat[1:3, 1:3] Gamma <- reliability * gmat A1 <- reliability * gmat[1:3, 1:3] A <- gmat[, 1:3] # Constraint matrices C1 <- matrix(0, nrow = 3, ncol = 1) C1[1, 1] <- 1 # Restrict trait 1 at stage 1 C2 <- matrix(0, nrow = 7, ncol = 2) C2[1, 1] <- 1 # Restrict trait 1 at stage 2 C2[3, 2] <- 1 # Restrict trait 3 at stage 2 weights <- c(10, 8, 6, 4, 3, 2, 1) result <- mrlgsi( Gamma1 = Gamma1, Gamma = Gamma, A1 = A1, A = A, C = gmat, G1 = gmat[1:3, 1:3], P1 = pmat[1:3, 1:3], wmat = weights, C1 = C1, C2 = C2 ) ## End(Not run)## Not run: # Two-stage restricted genomic selection gmat <- gen_varcov(seldata[, 3:9], seldata[, 2], seldata[, 1]) pmat <- phen_varcov(seldata[, 3:9], seldata[, 2], seldata[, 1]) reliability <- 0.7 Gamma1 <- reliability * gmat[1:3, 1:3] Gamma <- reliability * gmat A1 <- reliability * gmat[1:3, 1:3] A <- gmat[, 1:3] # Constraint matrices C1 <- matrix(0, nrow = 3, ncol = 1) C1[1, 1] <- 1 # Restrict trait 1 at stage 1 C2 <- matrix(0, nrow = 7, ncol = 2) C2[1, 1] <- 1 # Restrict trait 1 at stage 2 C2[3, 2] <- 1 # Restrict trait 3 at stage 2 weights <- c(10, 8, 6, 4, 3, 2, 1) result <- mrlgsi( Gamma1 = Gamma1, Gamma = Gamma, A1 = A1, A = A, C = gmat, G1 = gmat[1:3, 1:3], P1 = pmat[1:3, 1:3], wmat = weights, C1 = C1, C2 = C2 ) ## End(Not run)
Implements the two-stage Restricted Linear Phenotypic Selection Index where certain traits are constrained to have zero genetic gain at each stage.
mrlpsi( P1, P, G1, C, wmat, wcol = 1, C1, C2, stage1_indices = NULL, selection_proportion = 0.1, use_young_method = FALSE, k1_manual = 2.063, k2_manual = 2.063, tau = NULL )mrlpsi( P1, P, G1, C, wmat, wcol = 1, C1, C2, stage1_indices = NULL, selection_proportion = 0.1, use_young_method = FALSE, k1_manual = 2.063, k2_manual = 2.063, tau = NULL )
P1 |
Phenotypic variance-covariance matrix for stage 1 traits (n1 x n1) |
P |
Phenotypic variance-covariance matrix for all traits at stage 2 (n x n) |
G1 |
Genotypic variance-covariance matrix for stage 1 traits (n1 x n1) |
C |
Genotypic variance-covariance matrix for all traits (n x n) |
wmat |
Economic weights vector or matrix (n x k) |
wcol |
Weight column to use if wmat has multiple columns (default: 1) |
C1 |
Constraint matrix for stage 1 (n1 x r1) |
C2 |
Constraint matrix for stage 2 (n x r2) |
stage1_indices |
Integer vector specifying which traits correspond to stage 1 (default: 1:nrow(P1)) |
selection_proportion |
Proportion selected at each stage (default: 0.1) |
use_young_method |
Logical. Use Young's method for selection intensities (default: FALSE). Young's method tends to overestimate intensities; manual intensities are recommended. |
k1_manual |
Manual selection intensity for stage 1 |
k2_manual |
Manual selection intensity for stage 2 |
tau |
Standardized truncation point |
Mathematical Formulation:
The restricted coefficients are computed as:
where and
and
List with components similar to mlpsi, plus:
b_R1 - Restricted stage 1 coefficients
b_R2 - Restricted stage 2 coefficients
K1 - Restriction matrix for stage 1
K2 - Restriction matrix for stage 2
Kempthorne, O., & Nordskog, A. W. (1959). Restricted selection indices. Biometrics, 15(1), 10-19.
## Not run: # Two-stage restricted selection # Restrict trait 1 at stage 1, traits 1 and 3 at stage 2 pmat <- phen_varcov(seldata[, 3:9], seldata[, 2], seldata[, 1]) gmat <- gen_varcov(seldata[, 3:9], seldata[, 2], seldata[, 1]) P1 <- pmat[1:3, 1:3] G1 <- gmat[1:3, 1:3] P <- pmat C <- gmat # Constraint matrices C1 <- matrix(0, nrow = 3, ncol = 1) C1[1, 1] <- 1 # Restrict trait 1 at stage 1 C2 <- matrix(0, nrow = 7, ncol = 2) C2[1, 1] <- 1 # Restrict trait 1 at stage 2 C2[3, 2] <- 1 # Restrict trait 3 at stage 2 weights <- c(10, 8, 6, 4, 3, 2, 1) result <- mrlpsi( P1 = P1, P = P, G1 = G1, C = C, wmat = weights, C1 = C1, C2 = C2 ) ## End(Not run)## Not run: # Two-stage restricted selection # Restrict trait 1 at stage 1, traits 1 and 3 at stage 2 pmat <- phen_varcov(seldata[, 3:9], seldata[, 2], seldata[, 1]) gmat <- gen_varcov(seldata[, 3:9], seldata[, 2], seldata[, 1]) P1 <- pmat[1:3, 1:3] G1 <- gmat[1:3, 1:3] P <- pmat C <- gmat # Constraint matrices C1 <- matrix(0, nrow = 3, ncol = 1) C1[1, 1] <- 1 # Restrict trait 1 at stage 1 C2 <- matrix(0, nrow = 7, ncol = 2) C2[1, 1] <- 1 # Restrict trait 1 at stage 2 C2[3, 2] <- 1 # Restrict trait 3 at stage 2 weights <- c(10, 8, 6, 4, 3, 2, 1) result <- mrlpsi( P1 = P1, P = P, G1 = G1, C = C, wmat = weights, C1 = C1, C2 = C2 ) ## End(Not run)
Phenotypic Variance-Covariance Analysis
phen_varcov( data, genotypes, replication, columns = NULL, main_plots = NULL, design_type = c("RCBD", "LSD", "SPD"), method = c("REML", "Yates", "Healy", "Regression", "Mean", "Bartlett") )phen_varcov( data, genotypes, replication, columns = NULL, main_plots = NULL, design_type = c("RCBD", "LSD", "SPD"), method = c("REML", "Yates", "Healy", "Regression", "Mean", "Bartlett") )
data |
traits to be analyzed |
genotypes |
vector containing genotypes/treatments (sub-plot treatments in SPD) |
replication |
vector containing replication/blocks (RCBD) or rows (LSD) |
columns |
vector containing columns (required for Latin Square Design only) |
main_plots |
vector containing main plot treatments (required for Split Plot Design only) |
design_type |
experimental design type: "RCBD" (default), "LSD" (Latin Square), or "SPD" (Split Plot) |
method |
Method for missing value imputation: "REML" (default), "Yates", "Healy", "Regression", "Mean", or "Bartlett" |
A Phenotypic Variance-Covariance Matrix
# RCBD example phen_varcov(data = seldata[, 3:9], genotypes = seldata$treat, replication = seldata$rep) # Latin Square Design example (requires columns parameter) # phen_varcov(data=lsd_data[,3:7], genotypes=lsd_data$treat, # replication=lsd_data$row, columns=lsd_data$col, design_type="LSD") # Split Plot Design example (requires main_plots parameter) # phen_varcov(data=spd_data[,3:7], genotypes=spd_data$subplot, # replication=spd_data$block, main_plots=spd_data$mainplot, design_type="SPD")# RCBD example phen_varcov(data = seldata[, 3:9], genotypes = seldata$treat, replication = seldata$rep) # Latin Square Design example (requires columns parameter) # phen_varcov(data=lsd_data[,3:7], genotypes=lsd_data$treat, # replication=lsd_data$row, columns=lsd_data$col, design_type="LSD") # Split Plot Design example (requires main_plots parameter) # phen_varcov(data=spd_data[,3:7], genotypes=spd_data$subplot, # replication=spd_data$block, main_plots=spd_data$mainplot, design_type="SPD")
)Computes the combined phenomic-genomic variance-covariance matrix ( or P_L),
which is the block matrix representing the joint distribution of phenotypes
and GEBVs.
Structure: = [[P, P_y], [P_y', ]]
where:
- P = Var(y) = phenotypic variance-covariance
- = Var() = genomic variance-covariance
- P_y = Cov(y, ) = covariance between phenotypes and GEBVs
phenomic_genomic_varcov( phen_mat = NULL, gebv_mat = NULL, P = NULL, Gamma = NULL, P_yg = NULL, method = "pearson", use = "complete.obs" )phenomic_genomic_varcov( phen_mat = NULL, gebv_mat = NULL, P = NULL, Gamma = NULL, P_yg = NULL, method = "pearson", use = "complete.obs" )
phen_mat |
Matrix of phenotypes (n_genotypes x n_traits). Optional if P and P_yg are provided. |
gebv_mat |
Matrix of GEBVs (n_genotypes x n_traits). Optional if Gamma and P_yg are provided. |
P |
Phenotypic variance-covariance matrix (n_traits x n_traits). Optional if phen_mat is provided. |
Gamma |
Genomic variance-covariance matrix (n_traits x n_traits). Optional if gebv_mat is provided. |
P_yg |
Covariance between phenotypes and GEBVs (n_traits x n_traits). Optional if phen_mat and gebv_mat are provided. |
method |
Character string specifying correlation method: "pearson" (default), "kendall", or "spearman" |
use |
Character string specifying how to handle missing values: "complete.obs" (default), "pairwise.complete.obs", etc. |
The phenomic-genomic covariance matrix is used in: - GESIM (Genomic Eigen Selection Index Method) - Combined phenotypic + genomic selection indices
The matrix is constructed as:
where the off-diagonal blocks are transposes, ensuring symmetry.
You can provide either: 1. Raw data: phen_mat + gebv_mat (matrices computed internally) 2. Pre-computed matrices: P + Gamma + P_yg
Symmetric block matrix (2*n_traits x 2*n_traits)
Cerón-Rojas, J. J., & Crossa, J. (2018). Linear Selection Indices in Modern Plant Breeding. Springer International Publishing. Chapter 8.
## Not run: # Simulate data set.seed(123) n_genotypes <- 100 n_traits <- 7 phen_mat <- matrix(rnorm(n_genotypes * n_traits, mean = 15, sd = 3), nrow = n_genotypes, ncol = n_traits ) gebv_mat <- matrix(rnorm(n_genotypes * n_traits, mean = 10, sd = 2), nrow = n_genotypes, ncol = n_traits ) # Compute phenomic-genomic covariance Phi <- phenomic_genomic_varcov(phen_mat, gebv_mat) print(dim(Phi)) # Should be 14 x 14 (2 * 7 traits) ## End(Not run)## Not run: # Simulate data set.seed(123) n_genotypes <- 100 n_traits <- 7 phen_mat <- matrix(rnorm(n_genotypes * n_traits, mean = 15, sd = 3), nrow = n_genotypes, ncol = n_traits ) gebv_mat <- matrix(rnorm(n_genotypes * n_traits, mean = 10, sd = 2), nrow = n_genotypes, ncol = n_traits ) # Compute phenomic-genomic covariance Phi <- phenomic_genomic_varcov(phen_mat, gebv_mat) print(dim(Phi)) # Should be 14 x 14 (2 * 7 traits) ## End(Not run)
Plot Method for Selection Simulation Results
## S3 method for class 'selection_simulation' plot(x, trait_index = 1, type = c("mean", "gain"), ...)## S3 method for class 'selection_simulation' plot(x, trait_index = 1, type = c("mean", "gain"), ...)
x |
Object of class |
trait_index |
Which trait to plot (default: 1) |
type |
Type of plot: "mean" for mean genetic value, "gain" for genetic gain per cycle |
... |
Additional arguments passed to plot |
Extends ESIM by enforcing that genetic gains are proportional to a
user-specified vector : .
A similarity transformation
aligns the eigenvector with the breeder's desired direction.
ppg_esim(pmat, gmat, d, selection_intensity = 2.063)ppg_esim(pmat, gmat, d, selection_intensity = 2.063)
pmat |
Phenotypic variance-covariance matrix (n_traits x n_traits). |
gmat |
Genotypic variance-covariance matrix (n_traits x n_traits). |
d |
Numeric vector of desired proportional gains (length n_traits). The ratios among elements define target gain proportions. Direction (positive/negative) must reflect desired improvement direction (positive = increase, negative = decrease). |
selection_intensity |
Selection intensity constant (default: 2.063). |
Restriction structure via the Mallard Matrix (Section 7.3):
The PPG-ESIM restricts the directions **orthogonal** to ,
forcing the genetic gain vector to be collinear with .
The Mallard matrix is : its columns span
the orthogonal complement of , obtained via QR decomposition of
:
With (full-trait case, ):
PPG projection matrix ( restrictions):
Because has rank 1 (projects onto the subspace),
has exactly one positive eigenvalue and
its eigenvector produces .
PPG eigenproblem (rank-1 system):
Similarity transform:
where aligns the
eigenvector sign with the breeder's intended improvement direction.
Key response metrics:
Object of class "ppg_esim", a list with:
summaryData frame with beta (transformed b), b (raw), hI2, rHI, sigma_I, Delta_G, and lambda2.
betaNamed numeric vector of post-transformation PPG-ESIM
coefficients .
bRaw eigenvector b_P before similarity transform.
Delta_GNamed vector of expected genetic gains per trait.
sigma_IIndex standard deviation.
hI2Index heritability.
rHIIndex accuracy.
lambda2Leading eigenvalue of the PPG restricted eigenproblem.
F_matDiagonal similarity transform matrix F (diag(sign(d))).
K_PPPG projection matrix (rank 1: projects onto d subspace).
D_MMallard matrix (t x t-1): orthogonal complement of d, used to construct the (t-1) restrictions.
desired_gainsInput proportional gains vector d.
selection_intensitySelection intensity used.
Ceron-Rojas, J. J., & Crossa, J. (2018). Linear Selection Indices in Modern Plant Breeding. Springer International Publishing. Section 7.3.
## Not run: gmat <- gen_varcov(seldata[, 3:9], seldata[, 2], seldata[, 1]) pmat <- phen_varcov(seldata[, 3:9], seldata[, 2], seldata[, 1]) # Desired proportional gains: increase all traits proportionally d <- c(2, 1, 1, 1, 1, 1, 1) result <- ppg_esim(pmat, gmat, d) print(result) summary(result) ## End(Not run)## Not run: gmat <- gen_varcov(seldata[, 3:9], seldata[, 2], seldata[, 1]) pmat <- phen_varcov(seldata[, 3:9], seldata[, 2], seldata[, 1]) # Desired proportional gains: increase all traits proportionally d <- c(2, 1, 1, 1, 1, 1, 1) result <- ppg_esim(pmat, gmat, d) print(result) summary(result) ## End(Not run)
Implements the PPG-GESIM which extends GESIM to enforce that genetic gains are proportional to a user-specified vector d. Combines eigen approach with predetermined gain proportions.
ppg_gesim(pmat, gmat, Gamma, d, selection_intensity = 2.063)ppg_gesim(pmat, gmat, Gamma, d, selection_intensity = 2.063)
pmat |
Phenotypic variance-covariance matrix (n_traits x n_traits). |
gmat |
Genotypic variance-covariance matrix (n_traits x n_traits). |
Gamma |
Covariance between phenotypes and GEBVs (n_traits x n_traits). |
d |
Numeric vector of desired proportional gains (length n_traits). The ratios among elements define target gain proportions. |
selection_intensity |
Selection intensity constant |
Eigenproblem (Section 8.5):
where:
Implied economic weights:
Selection response:
Expected genetic gain per trait:
Object of class "ppg_gesim", a list with:
summaryData frame with coefficients and metrics.
b_yCoefficients for phenotypic data.
b_gammaCoefficients for GEBVs.
b_combinedCombined coefficient vector.
E_PGExpected genetic gains per trait.
gain_ratiosRatios of actual to desired gains (should be constant).
dOriginal desired proportional gains (length t).
d_PGExtended proportional gains (length 2t, includes GEBV targets).
sigma_IStandard deviation of the index.
hI2Index heritability.
rHIAccuracy.
R_PGSelection response.
lambda2Leading eigenvalue.
implied_wImplied economic weights.
U_PGRestriction matrix ((2t-1) x 2t).
selection_intensitySelection intensity used.
Ceron-Rojas, J. J., & Crossa, J. (2018). Linear Selection Indices in Modern Plant Breeding. Springer International Publishing. Section 8.5.
## Not run: gmat <- gen_varcov(seldata[, 3:9], seldata[, 2], seldata[, 1]) pmat <- phen_varcov(seldata[, 3:9], seldata[, 2], seldata[, 1]) # Simulate GEBV covariance Gamma <- gmat * 0.8 # Desired proportional gains (e.g., 2:1:3 ratio for first 3 traits) d <- c(2, 1, 3, 1, 1, 1, 1) result <- ppg_gesim(pmat, gmat, Gamma, d) print(result) print(result$gain_ratios) # Should be approximately constant ## End(Not run)## Not run: gmat <- gen_varcov(seldata[, 3:9], seldata[, 2], seldata[, 1]) pmat <- phen_varcov(seldata[, 3:9], seldata[, 2], seldata[, 1]) # Simulate GEBV covariance Gamma <- gmat * 0.8 # Desired proportional gains (e.g., 2:1:3 ratio for first 3 traits) d <- c(2, 1, 3, 1, 1, 1, 1) result <- ppg_gesim(pmat, gmat, Gamma, d) print(result) print(result$gain_ratios) # Should be approximately constant ## End(Not run)
Implements the PPG-LGSI where breeders specify desired proportional gains between traits rather than restricting specific traits to zero. This is genomic version of PPG-LPSI using GEBVs only.
ppg_lgsi( Gamma, d, wmat = NULL, wcol = 1, U = NULL, k_I = 2.063, L_G = 1, gmat = NULL, GAY = NULL )ppg_lgsi( Gamma, d, wmat = NULL, wcol = 1, U = NULL, k_I = 2.063, L_G = 1, gmat = NULL, GAY = NULL )
Gamma |
GEBV variance-covariance matrix (n_traits x n_traits) |
d |
Vector of desired proportional gains (length n_traits or n_constraints). If length n_traits, constraints are applied to all traits. If length n_constraints, must provide U matrix. |
wmat |
Optional. Economic weights for GA/PRE calculation |
wcol |
Weight column to use if wmat has multiple columns (default: 1) |
U |
Optional. Constraint matrix (n_traits x n_constraints). If NULL, assumes d applies to all traits (U = I). |
k_I |
Selection intensity (default: 2.063) |
L_G |
Standardization constant (default: 1) |
gmat |
Optional. True genetic variance-covariance matrix for exact accuracy calculation. If NULL, uses Gamma as approximation. |
GAY |
Optional. Genetic advance of comparative trait for PRE calculation |
Mathematical Formulation (Chapter 6, Section 6.2):
Alternative form:
Where: - beta_RG = Restricted index coefficients (from RLGSI) - theta_G = Proportionality constant - d = Vector of desired proportional gains
Proportionality constant:
List with:
summary - Data frame with coefficients and metrics
b - Vector of PPG-LGSI coefficients ()
E - Named vector of expected genetic gains per trait
theta_G - Proportionality constant
gain_ratios - Ratios of achieved to desired gains
## Not run: # Simulate GEBV variance-covariance matrix set.seed(123) n_traits <- 5 Gamma <- matrix(rnorm(n_traits^2), n_traits, n_traits) Gamma <- (Gamma + t(Gamma)) / 2 diag(Gamma) <- abs(diag(Gamma)) + 2 # Desired proportional gains (e.g., 2:1:1:0:0 ratio) d <- c(2, 1, 1, 0, 0) # Economic weights w <- c(10, 8, 6, 4, 2) result <- ppg_lgsi(Gamma, d, wmat = w) print(result$summary) print(result$gain_ratios) # Should be approximately proportional to d ## End(Not run)## Not run: # Simulate GEBV variance-covariance matrix set.seed(123) n_traits <- 5 Gamma <- matrix(rnorm(n_traits^2), n_traits, n_traits) Gamma <- (Gamma + t(Gamma)) / 2 diag(Gamma) <- abs(diag(Gamma)) + 2 # Desired proportional gains (e.g., 2:1:1:0:0 ratio) d <- c(2, 1, 1, 0, 0) # Economic weights w <- c(10, 8, 6, 4, 2) result <- ppg_lgsi(Gamma, d, wmat = w) print(result$summary) print(result$gain_ratios) # Should be approximately proportional to d ## End(Not run)
Implements the PPG-LPSI where breeders specify desired proportional gains between traits rather than restricting specific traits to zero. Based on Tallis (1962).
ppg_lpsi(pmat, gmat, k, wmat = NULL, wcol = 1, GAY)ppg_lpsi(pmat, gmat, k, wmat = NULL, wcol = 1, GAY)
pmat |
Phenotypic variance-covariance matrix (n_traits x n_traits) |
gmat |
Genotypic variance-covariance matrix (n_traits x n_traits) |
k |
Vector of desired proportional gains (length n_traits). Example: k = c(2, 1, 1) means trait 1 should gain twice as much as traits 2 and 3. |
wmat |
Optional weight matrix for GA/PRE calculation |
wcol |
Weight column number (default: 1) |
GAY |
Genetic advance of comparative trait (optional) |
Mathematical Formulation (Chapter 3, Section 3.2):
The PPG-LPSI achieves gains in specific proportions: Delta_G = phi*k
Coefficient formula (Tallis, 1962):
Where: - k = Vector of desired proportions - phi = Proportionality constant (determined by selection intensity and variances)
The constraint ensures Delta_G1:Delta_G2:Delta_G3 = k1:k2:k3
List with:
summary - Data frame with coefficients and metrics
b - Vector of PPG-LPSI coefficients
Delta_G - Expected genetic gains per trait
phi - Proportionality constant
## Not run: gmat <- gen_varcov(seldata[, 3:9], seldata[, 2], seldata[, 1]) pmat <- phen_varcov(seldata[, 3:9], seldata[, 2], seldata[, 1]) # Gains in ratio 2:1:1:1:1:1:1 k <- c(2, 1, 1, 1, 1, 1, 1) result <- ppg_lpsi(pmat, gmat, k) ## End(Not run)## Not run: gmat <- gen_varcov(seldata[, 3:9], seldata[, 2], seldata[, 1]) pmat <- phen_varcov(seldata[, 3:9], seldata[, 2], seldata[, 1]) # Gains in ratio 2:1:1:1:1:1:1 k <- c(2, 1, 1, 1, 1, 1, 1) result <- ppg_lpsi(pmat, gmat, k) ## End(Not run)
Predict selection index scores
predict_selection_score(index_df, data, genotypes)predict_selection_score(index_df, data, genotypes)
index_df |
Data frame returned by lpsi() |
data |
Raw phenotypic data matrix/data frame (observations x traits) |
genotypes |
Vector of genotype/treatment labels for each observation |
Data frame of selection index scores by genotype
gmat <- gen_varcov(seldata[, 3:9], seldata[, 2], seldata[, 1]) pmat <- phen_varcov(seldata[, 3:9], seldata[, 2], seldata[, 1]) cindex <- lpsi(ncomb = 1, pmat = pmat, gmat = gmat, wmat = weight[, -1], wcol = 1) predict_selection_score(cindex, data = seldata[, 3:9], genotypes = seldata[, 2])gmat <- gen_varcov(seldata[, 3:9], seldata[, 2], seldata[, 1]) pmat <- phen_varcov(seldata[, 3:9], seldata[, 2], seldata[, 1]) cindex <- lpsi(ncomb = 1, pmat = pmat, gmat = gmat, wmat = weight[, -1], wcol = 1) predict_selection_score(cindex, data = seldata[, 3:9], genotypes = seldata[, 2])
Print method for Base Index
## S3 method for class 'base_index' print(x, ...)## S3 method for class 'base_index' print(x, ...)
x |
Object of class 'base_index' |
... |
Additional arguments (unused) |
Print method for Desired Gains Index
## S3 method for class 'dg_lpsi' print(x, ...)## S3 method for class 'dg_lpsi' print(x, ...)
x |
Object of class 'dg_lpsi' |
... |
Additional arguments (unused) |
Print method for ESIM
## S3 method for class 'esim' print(x, ...)## S3 method for class 'esim' print(x, ...)
x |
Object of class 'esim' |
... |
Additional arguments (unused) |
Print method for GESIM
## S3 method for class 'gesim' print(x, ...)## S3 method for class 'gesim' print(x, ...)
x |
Object of class 'gesim' |
... |
Additional arguments (unused) |
Print method for GW-ESIM
## S3 method for class 'gw_esim' print(x, ...)## S3 method for class 'gw_esim' print(x, ...)
x |
Object of class 'gw_esim' |
... |
Additional arguments (unused) |
Print method for MESIM
## S3 method for class 'mesim' print(x, ...)## S3 method for class 'mesim' print(x, ...)
x |
Object of class 'mesim' |
... |
Additional arguments (unused) |
Print method for PPG-ESIM
## S3 method for class 'ppg_esim' print(x, ...)## S3 method for class 'ppg_esim' print(x, ...)
x |
Object of class 'ppg_esim' |
... |
Additional arguments (unused) |
Print method for PPG-GESIM
## S3 method for class 'ppg_gesim' print(x, ...)## S3 method for class 'ppg_gesim' print(x, ...)
x |
Object of class 'ppg_gesim' |
... |
Additional arguments (unused) |
Print method for RESIM
## S3 method for class 'resim' print(x, ...)## S3 method for class 'resim' print(x, ...)
x |
Object of class 'resim' |
... |
Additional arguments (unused) |
Print method for RGESIM
## S3 method for class 'rgesim' print(x, ...)## S3 method for class 'rgesim' print(x, ...)
x |
Object of class 'rgesim' |
... |
Additional arguments (unused) |
Print Method for Selection Simulation Results
## S3 method for class 'selection_simulation' print(x, ...)## S3 method for class 'selection_simulation' print(x, ...)
x |
Object of class |
... |
Additional arguments (not used) |
Print method for Smith-Hazel Index
## S3 method for class 'smith_hazel' print(x, ...)## S3 method for class 'smith_hazel' print(x, ...)
x |
Object of class 'smith_hazel' |
... |
Additional arguments (unused) |
Extends ESIM by imposing null restrictions: genetic gains for selected
traits are forced to zero while the index heritability for the remaining traits
is maximised.
resim( pmat, gmat, restricted_traits = NULL, U_mat = NULL, selection_intensity = 2.063 )resim( pmat, gmat, restricted_traits = NULL, U_mat = NULL, selection_intensity = 2.063 )
pmat |
Phenotypic variance-covariance matrix (n_traits x n_traits). |
gmat |
Genotypic variance-covariance matrix (n_traits x n_traits). |
restricted_traits |
Integer vector of trait indices to restrict to zero
genetic gain. Example: |
U_mat |
Optional. Restriction matrix (n_traits x r) where each column
defines one null restriction ( |
selection_intensity |
Selection intensity constant (default: 2.063). |
Projection matrix (Section 7.2):
Restricted eigenproblem:
Selection response and genetic gain:
Implied economic weights:
where .
Object of class "resim", a list with:
summaryData frame with b coefficients and key metrics.
bNamed numeric vector of RESIM coefficients.
Delta_GNamed vector of expected genetic gains per trait.
sigma_IIndex standard deviation.
hI2Index heritability (leading eigenvalue of KP^(-1)C).
rHIIndex accuracy.
lambda2Leading eigenvalue of the restricted eigenproblem.
KProjection matrix used.
U_matRestriction matrix used.
restricted_traitsInteger vector of restricted trait indices.
implied_wImplied economic weights.
selection_intensitySelection intensity used.
Ceron-Rojas, J. J., & Crossa, J. (2018). Linear Selection Indices in Modern Plant Breeding. Springer International Publishing. Section 7.2.
## Not run: gmat <- gen_varcov(seldata[, 3:9], seldata[, 2], seldata[, 1]) pmat <- phen_varcov(seldata[, 3:9], seldata[, 2], seldata[, 1]) # Restrict traits 1 and 3 to zero genetic gain result <- resim(pmat, gmat, restricted_traits = c(1, 3)) print(result) summary(result) ## End(Not run)## Not run: gmat <- gen_varcov(seldata[, 3:9], seldata[, 2], seldata[, 1]) pmat <- phen_varcov(seldata[, 3:9], seldata[, 2], seldata[, 1]) # Restrict traits 1 and 3 to zero genetic gain result <- resim(pmat, gmat, restricted_traits = c(1, 3)) print(result) summary(result) ## End(Not run)
Implements the RGESIM which extends GESIM to allow restrictions on genetic gains of certain traits. Uses the eigen approach with Lagrange multipliers.
rgesim(pmat, gmat, Gamma, U_mat, selection_intensity = 2.063)rgesim(pmat, gmat, Gamma, U_mat, selection_intensity = 2.063)
pmat |
Phenotypic variance-covariance matrix (n_traits x n_traits). |
gmat |
Genotypic variance-covariance matrix (n_traits x n_traits). |
Gamma |
Covariance between phenotypes and GEBVs (n_traits x n_traits). |
U_mat |
Restriction matrix (r x n_traits) where r is number of restrictions. Each row specifies a linear combination of traits to be held at zero gain. |
selection_intensity |
Selection intensity constant |
Eigenproblem (Section 8.4):
where:
Implied economic weights:
Selection response:
Expected genetic gain per trait:
Object of class "rgesim", a list with:
summaryData frame with coefficients and metrics.
b_yCoefficients for phenotypic data.
b_gammaCoefficients for GEBVs.
b_combinedCombined coefficient vector.
E_RGExpected genetic gains per trait.
constrained_responseU' * E (should be near zero).
sigma_IStandard deviation of the index.
hI2Index heritability.
rHIAccuracy.
R_RGSelection response.
lambda2Leading eigenvalue.
implied_wImplied economic weights.
K_RGProjection matrix.
Q_RGConstraint projection matrix.
selection_intensitySelection intensity used.
Ceron-Rojas, J. J., & Crossa, J. (2018). Linear Selection Indices in Modern Plant Breeding. Springer International Publishing. Section 8.4.
## Not run: gmat <- gen_varcov(seldata[, 3:9], seldata[, 2], seldata[, 1]) pmat <- phen_varcov(seldata[, 3:9], seldata[, 2], seldata[, 1]) # Simulate GEBV covariance Gamma <- gmat * 0.8 # Restrict first trait to zero gain # U_mat must be (n_traits x n_restrictions) n_traits <- nrow(gmat) U_mat <- matrix(0, n_traits, 1) U_mat[1, 1] <- 1 # Restrict trait 1 result <- rgesim(pmat, gmat, Gamma, U_mat) print(result) print(result$constrained_response) # Should be near zero ## End(Not run)## Not run: gmat <- gen_varcov(seldata[, 3:9], seldata[, 2], seldata[, 1]) pmat <- phen_varcov(seldata[, 3:9], seldata[, 2], seldata[, 1]) # Simulate GEBV covariance Gamma <- gmat * 0.8 # Restrict first trait to zero gain # U_mat must be (n_traits x n_restrictions) n_traits <- nrow(gmat) U_mat <- matrix(0, n_traits, 1) U_mat[1, 1] <- 1 # Restrict trait 1 result <- rgesim(pmat, gmat, Gamma, U_mat) print(result) print(result$constrained_response) # Should be near zero ## End(Not run)
Implements the Restricted Linear Genomic Selection Index where genetic gains are constrained to zero for specific traits while maximizing gains for others. Uses GEBVs only (no phenotypic data required).
rlgsi( Gamma, wmat, wcol = 1, restricted_traits = NULL, U = NULL, k_I = 2.063, L_G = 1, gmat = NULL, GAY = NULL )rlgsi( Gamma, wmat, wcol = 1, restricted_traits = NULL, U = NULL, k_I = 2.063, L_G = 1, gmat = NULL, GAY = NULL )
Gamma |
GEBV variance-covariance matrix (n_traits x n_traits). This represents the variance of GEBVs, typically computed from predicted breeding values. |
wmat |
Economic weights matrix (n_traits x k), or vector |
wcol |
Weight column to use if wmat has multiple columns (default: 1) |
restricted_traits |
Vector of trait indices to restrict (default: NULL). Example: c(1, 3) restricts traits 1 and 3 to zero gain. |
U |
Constraint matrix (n_traits x n_constraints). Each column defines a restriction. Alternative to restricted_traits for custom constraints. Ignored if restricted_traits is provided. |
k_I |
Selection intensity (default: 2.063 for 10 percent selection) |
L_G |
Standardization constant (default: 1). Can be set to sqrt(w'Gw) for standardization. |
gmat |
Optional. True genetic variance-covariance matrix for exact accuracy calculation. If NULL, uses Gamma as approximation. Providing gmat ensures textbook-perfect accuracy metric. |
GAY |
Optional. Genetic advance of comparative trait for PRE calculation |
Mathematical Formulation (Chapter 6, Section 6.1):
The RLGSI minimizes the mean squared difference between the index I = ' and
the breeding objective H = w'g under the restriction: U' = 0
Solution involves solving the augmented system:
Where:
- (Gamma) = Var(GEBVs) - GEBV variance-covariance matrix
- U = Constraint matrix (each column is a restriction vector)
- w = Economic weights
- = RLGSI coefficient vector
- v = Lagrange multipliers
Selection response:
Expected gains:
List with:
summary - Data frame with coefficients, response metrics
b - Vector of RLGSI coefficients ()
E - Named vector of expected genetic gains per trait
R - Overall selection response
U - Constraint matrix used
constrained_response - Realized gains for constrained traits (should be ~0)
## Not run: # Simulate GEBV variance-covariance matrix set.seed(123) n_traits <- 5 Gamma <- matrix(rnorm(n_traits^2), n_traits, n_traits) Gamma <- (Gamma + t(Gamma)) / 2 # Make symmetric diag(Gamma) <- abs(diag(Gamma)) + 2 # Ensure positive definite # Economic weights w <- c(10, 8, 6, 4, 2) # Restrict traits 2 and 4 to zero gain result <- rlgsi(Gamma, w, restricted_traits = c(2, 4)) print(result$summary) print(result$E) # Check that traits 2 and 4 have ~0 gain ## End(Not run)## Not run: # Simulate GEBV variance-covariance matrix set.seed(123) n_traits <- 5 Gamma <- matrix(rnorm(n_traits^2), n_traits, n_traits) Gamma <- (Gamma + t(Gamma)) / 2 # Make symmetric diag(Gamma) <- abs(diag(Gamma)) + 2 # Ensure positive definite # Economic weights w <- c(10, 8, 6, 4, 2) # Restrict traits 2 and 4 to zero gain result <- rlgsi(Gamma, w, restricted_traits = c(2, 4)) print(result$summary) print(result$E) # Check that traits 2 and 4 have ~0 gain ## End(Not run)
Implements the Restricted LPSI where genetic gains are constrained to zero for specific traits while maximizing gains for others. Based on Kempthorne & Nordskog (1959).
rlpsi(pmat, gmat, wmat, wcol = 1, restricted_traits = NULL, C = NULL, GAY)rlpsi(pmat, gmat, wmat, wcol = 1, restricted_traits = NULL, C = NULL, GAY)
pmat |
Phenotypic variance-covariance matrix (n_traits x n_traits) |
gmat |
Genotypic variance-covariance matrix (n_traits x n_traits) |
wmat |
Weight matrix (n_traits x k), or vector |
wcol |
Weight column number (default: 1) |
restricted_traits |
Vector of trait indices to restrict (default: NULL). If provided, a constraint matrix C is auto-generated to enforce zero gain on these traits. Example: c(1, 3) restricts traits 1 and 3 to zero gain. |
C |
Constraint matrix (n_traits x n_constraints). Each column is a restriction. Alternative to restricted_traits for custom constraints. Ignored if restricted_traits is provided. |
GAY |
Genetic advance of comparative trait (optional) |
Mathematical Formulation (Chapter 3, Section 3.1):
The RLPSI minimizes the mean squared difference between I = b'y and H = w'g subject to the restriction: C'Delta_G = 0
Coefficient formula:
Where: - P = Phenotypic variance-covariance matrix - G = Genotypic variance-covariance matrix - C = Constraint matrix (each column enforces one restriction) - w = Economic weights
The constraint C'Delta_G = 0 ensures zero genetic gain for restricted traits.
List with:
summary - Data frame with coefficients (b.*), GA, PRE, Delta_G, rHI, hI2
b - Numeric vector of selection index coefficients
Delta_G - Named vector of realized correlated responses per trait
C - Constraint matrix used
## Not run: gmat <- gen_varcov(seldata[, 3:9], seldata[, 2], seldata[, 1]) pmat <- phen_varcov(seldata[, 3:9], seldata[, 2], seldata[, 1]) wmat <- weight_mat(weight) # Easy way: Restrict traits 1 and 3 to zero gain result <- rlpsi(pmat, gmat, wmat, wcol = 1, restricted_traits = c(1, 3)) # Advanced way: Provide custom constraint matrix C <- diag(ncol(pmat))[, 1, drop = FALSE] result <- rlpsi(pmat, gmat, wmat, wcol = 1, C = C) ## End(Not run)## Not run: gmat <- gen_varcov(seldata[, 3:9], seldata[, 2], seldata[, 1]) pmat <- phen_varcov(seldata[, 3:9], seldata[, 2], seldata[, 1]) wmat <- weight_mat(weight) # Easy way: Restrict traits 1 and 3 to zero gain result <- rlpsi(pmat, gmat, wmat, wcol = 1, restricted_traits = c(1, 3)) # Advanced way: Provide custom constraint matrix C <- diag(ncol(pmat))[, 1, drop = FALSE] result <- rlpsi(pmat, gmat, wmat, wcol = 1, C = C) ## End(Not run)
A dataset containing the data of three replications and 48 progenies with 7 different traits.
data(seldata)data(seldata)
A data frame with 75 rows and 9 columns
rep. Replications
treat. Treatments/Genotypes
sypp. Seed Yield per Plant
dtf. Days to 50
rpp. Racemes per Plant
ppr. Pods per Raceme
ppp. Pods per Plant
spp. Seeds per Pod
pw. Pods Weight
Performs a stochastic simulation comparing the long-term performance of four selection indices (LPSI, ESIM, RLPSI, RESIM) over multiple breeding cycles. Models linkage between loci using Haldane's mapping function.
simulate_selection_cycles( n_cycles = 50, n_individuals = 1000, n_loci = 100, n_traits = 3, heritability = 0.5, selection_proportion = 0.1, economic_weights = NULL, restricted_traits = NULL, genetic_distances = NULL, qtl_effects = NULL, seed = NULL )simulate_selection_cycles( n_cycles = 50, n_individuals = 1000, n_loci = 100, n_traits = 3, heritability = 0.5, selection_proportion = 0.1, economic_weights = NULL, restricted_traits = NULL, genetic_distances = NULL, qtl_effects = NULL, seed = NULL )
n_cycles |
Number of selection cycles to simulate (default: 50) |
n_individuals |
Initial population size (default: 1000) |
n_loci |
Number of QTL per trait (default: 100) |
n_traits |
Number of quantitative traits (default: 3) |
heritability |
Heritability for all traits (scalar or vector, default: 0.5) |
selection_proportion |
Proportion of individuals selected (default: 0.1) |
economic_weights |
Economic weights for LPSI (default: equal weights) |
restricted_traits |
Trait indices to restrict to zero gain for RLPSI/RESIM (default: NULL) |
genetic_distances |
Vector of genetic distances between loci in Morgans (default: 0.1) |
qtl_effects |
Optional matrix of QTL effects (n_loci x n_traits) |
seed |
Random seed for reproducibility (default: NULL) |
Simulation Procedure (Chapter 10):
1. Initialize population with random QTL alleles at linked loci 2. For each cycle: - Compute genetic values from diploid genotypes - Add environmental noise to create phenotypes - Calculate variance-covariance matrices - Apply each selection index (LPSI, ESIM, RLPSI, RESIM) - Select top individuals based on index scores - Generate offspring through recombination (using Haldane's function) 3. Track genetic gain and population mean across cycles
The Four Indices (Chapter 10, Section 10.2):
LPSI: Unrestricted index maximizing genetic gain: b = P^(-1)Gw
ESIM: Unrestricted eigen index maximizing accuracy: (P^(-1)C - lambda^2 I)b = 0
RLPSI: Restricted LPSI with constraints: U'Gb = 0
RESIM: Restricted ESIM with constraints: U'Cb = 0
Important Simulation Assumptions:
Note: Environmental variance is calculated once at generation 0 and held constant across all cycles. Heritability will naturally decline over cycles as genetic variance is depleted (Bulmer effect). This models the biological reality that selection exhausts additive genetic variance while environmental variation remains stable.
List with components:
lpsi_gain - Matrix of genetic gains per cycle for LPSI (n_cycles x n_traits)
esim_gain - Matrix of genetic gains per cycle for ESIM
rlpsi_gain - Matrix of genetic gains per cycle for RLPSI
resim_gain - Matrix of genetic gains per cycle for RESIM
lpsi_mean - Mean genetic value per cycle for LPSI (n_cycles x n_traits)
esim_mean - Mean genetic value per cycle for ESIM
rlpsi_mean - Mean genetic value per cycle for RLPSI
resim_mean - Mean genetic value per cycle for RESIM
parameters - List of simulation parameters used
## Not run: # Basic simulation with 3 traits over 20 cycles results <- simulate_selection_cycles( n_cycles = 20, n_individuals = 500, n_loci = 50, n_traits = 3, heritability = 0.5, selection_proportion = 0.1, economic_weights = c(10, 5, 3), seed = 123 ) # Plot genetic gains plot(1:20, results$lpsi_mean[, 1], type = "l", col = "blue", ylab = "Mean Genetic Value", xlab = "Cycle", main = "Genetic Gain - Trait 1" ) lines(1:20, results$esim_mean[, 1], col = "red") lines(1:20, results$rlpsi_mean[, 1], col = "green") lines(1:20, results$resim_mean[, 1], col = "orange") legend("topleft", c("LPSI", "ESIM", "RLPSI", "RESIM"), col = c("blue", "red", "green", "orange"), lty = 1 ) # Restrict trait 2 to zero gain results_restricted <- simulate_selection_cycles( n_cycles = 20, n_traits = 3, restricted_traits = 2, seed = 456 ) ## End(Not run)## Not run: # Basic simulation with 3 traits over 20 cycles results <- simulate_selection_cycles( n_cycles = 20, n_individuals = 500, n_loci = 50, n_traits = 3, heritability = 0.5, selection_proportion = 0.1, economic_weights = c(10, 5, 3), seed = 123 ) # Plot genetic gains plot(1:20, results$lpsi_mean[, 1], type = "l", col = "blue", ylab = "Mean Genetic Value", xlab = "Cycle", main = "Genetic Gain - Trait 1" ) lines(1:20, results$esim_mean[, 1], col = "red") lines(1:20, results$rlpsi_mean[, 1], col = "green") lines(1:20, results$resim_mean[, 1], col = "orange") legend("topleft", c("LPSI", "ESIM", "RLPSI", "RESIM"), col = c("blue", "red", "green", "orange"), lty = 1 ) # Restrict trait 2 to zero gain results_restricted <- simulate_selection_cycles( n_cycles = 20, n_traits = 3, restricted_traits = 2, seed = 456 ) ## End(Not run)
Implements the optimal Smith-Hazel selection index which maximizes the correlation between the index I = b'y and the breeding objective H = w'g.
This is the foundational selection index method from Chapter 2.
smith_hazel( pmat, gmat, wmat, wcol = 1, selection_intensity = 2.063, GAY = NULL )smith_hazel( pmat, gmat, wmat, wcol = 1, selection_intensity = 2.063, GAY = NULL )
pmat |
Phenotypic variance-covariance matrix (n_traits x n_traits) |
gmat |
Genotypic variance-covariance matrix (n_traits x n_traits) |
wmat |
Economic weights matrix (n_traits x k), or vector |
wcol |
Weight column to use if wmat has multiple columns (default: 1) |
selection_intensity |
Selection intensity constant (default: 2.063 for 10% selection) |
GAY |
Optional. Genetic advance of comparative trait for PRE calculation |
Mathematical Formulation (Chapter 2):
Index coefficients:
Where: - P = Phenotypic variance-covariance matrix - G = Genotypic variance-covariance matrix - w = Economic weights
Key metrics:
- Variance of index:
- Total genetic advance:
- Expected gains per trait:
- Heritability of index:
- Accuracy:
List with:
summary - Data frame with coefficients and metrics
b - Vector of Smith-Hazel index coefficients
w - Named vector of economic weights
Delta_G - Named vector of expected genetic gains per trait
sigma_I - Standard deviation of the index
GA - Total genetic advance
PRE - Percent relative efficiency
hI2 - Heritability of the index
rHI - Accuracy (correlation with breeding objective)
## Not run: # Calculate variance-covariance matrices gmat <- gen_varcov(seldata[, 3:9], seldata[, 2], seldata[, 1]) pmat <- phen_varcov(seldata[, 3:9], seldata[, 2], seldata[, 1]) # Define economic weights weights <- c(10, 8, 6, 4, 2, 1, 1) # Build Smith-Hazel index result <- smith_hazel(pmat, gmat, weights) print(result) summary(result) ## End(Not run)## Not run: # Calculate variance-covariance matrices gmat <- gen_varcov(seldata[, 3:9], seldata[, 2], seldata[, 1]) pmat <- phen_varcov(seldata[, 3:9], seldata[, 2], seldata[, 1]) # Define economic weights weights <- c(10, 8, 6, 4, 2, 1, 1) # Build Smith-Hazel index result <- smith_hazel(pmat, gmat, weights) print(result) summary(result) ## End(Not run)
Summary method for Base Index
## S3 method for class 'base_index' summary(object, ...)## S3 method for class 'base_index' summary(object, ...)
object |
Object of class 'base_index' |
... |
Additional arguments passed to print |
Summary method for Desired Gains Index
## S3 method for class 'dg_lpsi' summary(object, ...)## S3 method for class 'dg_lpsi' summary(object, ...)
object |
Object of class 'dg_lpsi' |
... |
Additional arguments (unused) |
Summary method for ESIM
## S3 method for class 'esim' summary(object, ...)## S3 method for class 'esim' summary(object, ...)
object |
Object of class 'esim' |
... |
Additional arguments (unused) |
Summary method for GESIM
## S3 method for class 'gesim' summary(object, ...)## S3 method for class 'gesim' summary(object, ...)
object |
Object of class 'gesim' |
... |
Additional arguments (unused) |
Summary method for GW-ESIM
## S3 method for class 'gw_esim' summary(object, ...)## S3 method for class 'gw_esim' summary(object, ...)
object |
Object of class 'gw_esim' |
... |
Additional arguments (unused) |
Summary method for MESIM
## S3 method for class 'mesim' summary(object, ...)## S3 method for class 'mesim' summary(object, ...)
object |
Object of class 'mesim' |
... |
Additional arguments (unused) |
Summary method for PPG-ESIM
## S3 method for class 'ppg_esim' summary(object, ...)## S3 method for class 'ppg_esim' summary(object, ...)
object |
Object of class 'ppg_esim' |
... |
Additional arguments (unused) |
Summary method for PPG-GESIM
## S3 method for class 'ppg_gesim' summary(object, ...)## S3 method for class 'ppg_gesim' summary(object, ...)
object |
Object of class 'ppg_gesim' |
... |
Additional arguments (unused) |
Summary method for RESIM
## S3 method for class 'resim' summary(object, ...)## S3 method for class 'resim' summary(object, ...)
object |
Object of class 'resim' |
... |
Additional arguments (unused) |
Summary method for RGESIM
## S3 method for class 'rgesim' summary(object, ...)## S3 method for class 'rgesim' summary(object, ...)
object |
Object of class 'rgesim' |
... |
Additional arguments (unused) |
Summary Method for Selection Simulation Results
## S3 method for class 'selection_simulation' summary(object, ...)## S3 method for class 'selection_simulation' summary(object, ...)
object |
Object of class |
... |
Additional arguments (not used) |
Summary method for Smith-Hazel Index
## S3 method for class 'smith_hazel' summary(object, ...)## S3 method for class 'smith_hazel' summary(object, ...)
object |
Object of class 'smith_hazel' |
... |
Additional arguments passed to print |
A dataset containing the data of 2 different weights namely euqal weight and broad sense heritability
data(weight)data(weight)
A data frame with 7 rows and 3 columns
EW. Equal Weight
h2. Broad Sense Heritability
Convert dataframe to matrix
weight_mat(data)weight_mat(data)
data |
dataframe of weight |
A matrix
weight_mat(data = weight)weight_mat(data = weight)