000 05337nab a22005537a 4500
999 _c30447
_d30447
001 G98583
003 MX-TxCIM
005 20240919020947.0
008 121211b |||p||p||||||| |z||| |
022 _a1365-2540 (Revista en electrónico)
022 0 _a0018-067X
024 8 _ahttps://doi.org/10.1038/hdy.2013.144
040 _aMX-TxCIM
041 0 _aEn
090 _aCIS-7464
100 1 _aOrnella, L.
245 1 0 _aGenomic-enabled prediction with classification algorithms
260 _c2014
500 _aPeer-review: Yes - Open Access: Yes |http://science.thomsonreuters.com/cgi-bin/jrnlst/jlresults.cgi?PC=MASTER&ISSN=0018-067X
500 _aPeer review
500 _aOpen Access
520 _aPearson's correlation coefficient (ρ) is the most commonly reported metric of the success of prediction in genomic selection (GS). However, in real breeding, ρ may not be very useful for assessing the quality of the regression in the tails of the distribution, where individuals are chosen for selection. This research used 14 maize and 16 wheat data sets with different trait–environment combinations. Six different regression models were evaluated by means of a cross-validation scheme (50 random partitions each, with 90% of the individuals in the training set and 10% in the testing set). The predictive accuracy of these algorithms for selecting individuals belonging to the best α=10, 15, 20, 25, 30, 35 and 40% of the distribution was estimated using Cohen's kappa coefficient (κ) and an ad hoc measure we call relative efficiency (RE), which indicates the expected genetic gain due to selection when individuals are selected based on GS exclusively. We put special emphasis on the analysis for α=15%, because it is a percentile commonly used in plant breeding programmes (for example, at CIMMYT). We also used ρ as a criterion for overall success. The regression algorithms used were: Bayesian LASSO (BL), Ridge Regression (RR), Reproducing Kernel Hilbert Spaces (RKHS), Random Forest Regression (RFR), and Support Vector Regression (SVR) with linear (lin) and Gaussian (rbf) kernels. The performance of these regression methods for selecting the best individuals was compared with that of three supervised classification algorithms: Random Forest Classification (RFC) and Support Vector Classification (SVC) with linear (lin) and Gaussian (rbf) kernels. Classification methods were evaluated using the same cross-validation scheme, but with the response vector of the original training sets dichotomised at a given threshold.
For α=15%, SVC-lin presented the highest κ coefficients in 13 of the 14 maize data sets, with best values ranging from 0.131 to 0.722 (statistically significant in 9 data sets), and the best RE in the same 13 data sets, with values ranging from 0.393 to 0.948 (statistically significant in 12 data sets). RR produced the best mean for both κ and RE in the remaining data set (0.148 and 0.381, respectively). For the wheat data sets, SVC-lin presented the best κ in 12 of the 16 data sets, with outcomes ranging from 0.280 to 0.580 (statistically significant in 4 data sets), and the best RE in 9 data sets, ranging from 0.484 to 0.821 (statistically significant in 5 data sets). SVC-rbf (0.235), RR (0.265) and RKHS (0.422) gave the best κ in one data set each, while RKHS and BL tied for the last one (0.234). Finally, BL presented the best RE in two data sets (0.738 and 0.750), RFR (0.636) and SVC-rbf (0.617) in one each, and RKHS in the remaining three (0.502, 0.458 and 0.586). The difference between the performance of SVC-lin and that of the other models was less pronounced at higher percentiles of the distribution. The behaviour of regression and classification algorithms varied markedly when selection was done at different thresholds; that is, κ and RE for each algorithm depended strongly on the selection percentile. Based on these results, we propose classification methods as a promising alternative for GS in plant breeding.
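The κ criterion described in the abstract (agreement between the individuals truly in the best α fraction and those chosen by the model) can be sketched as below. This is a minimal illustration, not the authors' code. It assumes that, within one testing set, the "best" individuals are those with the highest trait values (negate the values first for traits where lower is better) and that round(α·n) individuals are selected; the helper names `top_alpha_labels`, `cohen_kappa` and `selection_kappa` are hypothetical. RE is not sketched because its exact formula is not given here.

```python
from typing import Sequence


def top_alpha_labels(values: Sequence[float], alpha: float) -> list[int]:
    """Label the best alpha fraction (highest values) as 1 and the rest as 0.

    Assumption (hypothetical helper): round(alpha * n) individuals are
    selected, at least one; ties are broken by original order.
    """
    n = len(values)
    k = max(1, round(alpha * n))
    order = sorted(range(n), key=lambda i: values[i], reverse=True)
    labels = [0] * n
    for i in order[:k]:
        labels[i] = 1
    return labels


def cohen_kappa(a: Sequence[int], b: Sequence[int]) -> float:
    """Cohen's kappa for two binary labelings of the same individuals."""
    n = len(a)
    p_o = sum(x == y for x, y in zip(a, b)) / n  # observed agreement
    pa1 = sum(a) / n
    pb1 = sum(b) / n
    p_e = pa1 * pb1 + (1 - pa1) * (1 - pb1)  # agreement expected by chance
    if p_e == 1.0:
        # Both labelings constant and identical: kappa is undefined.
        return float("nan")
    return (p_o - p_e) / (1 - p_e)


def selection_kappa(observed: Sequence[float], predicted: Sequence[float],
                    alpha: float) -> float:
    """Kappa between the true best-alpha set and the set picked by a regression model."""
    return cohen_kappa(top_alpha_labels(observed, alpha),
                       top_alpha_labels(predicted, alpha))


if __name__ == "__main__":
    obs = [float(x) for x in range(20)]
    print(selection_kappa(obs, obs, 0.15))                  # perfect ranking
    print(selection_kappa(obs, [-x for x in obs], 0.15))    # reversed ranking
```

For a regression model the predicted values are ranked and cut at the same α; for a classifier trained on the dichotomised response, its predicted 0/1 labels can be passed to `cohen_kappa` directly together with `top_alpha_labels(observed, alpha)`.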
536 _aGlobal Maize Program|Genetic Resources Program|Global Wheat Program
546 _aEnglish
591 _aCIMMYT Informa No. 1876|Nature Publishing Group
594 _aINT3239|INT3400|INT3098|INT3035|INT2902|INT2692|INT0610|CCJL01
595 _aCSC
650 1 0 _aGenomic selection
_91513
650 7 _aMaize
_gAGROVOC
_2
_91173
650 1 0 _aSupport vector machines
650 7 _aWheat
_gAGROVOC
_2
_91310
650 7 _aArtificial Selection
_98685
_2AGROVOC
650 7 _aStatistical methods
_92624
_2AGROVOC
700 1 _aGonzález-Camacho, J.M.,
_ecoaut.
700 1 _aLong, N.,
_ecoaut.
_9576
700 1 _aPerez, P.,
_ecoaut.
700 1 _aTapia, E.,
_ecoaut.
700 1 _aVicente, F.S.,
_ecoaut.
700 1 _9892
_aSukhwinder-Singh
_gGenetic Resources Program
_8INT3098
_ecoaut.
700 1 _9907
_aBurgueño, J.
_gGenetic Resources Program
_8INT3239
700 0 _aXuecai Zhang
_gGlobal Maize Program
_8INT3400
_9951
700 1 _aSingh, R.P.
_gGlobal Wheat Program
_8INT0610
_9825
700 1 _9851
_aDreisigacker, S.
_gGlobal Wheat Program
_8INT2692
700 1 _9871
_aBonnett, D.G.
_gGlobal Wheat Program
_8INT2902
_ecoaut.
700 1 _aCrossa, J.
_gGenetic Resources Program
_8CCJL01
_959
773 0 _tHeredity
_gv. 112, p. 616-626
856 4 _uhttps://hdl.handle.net/10883/19766
_yOpen Access through DSpace
942 _cJA
_2ddc
_n0