Big Data Moscow 2018

Miel Hostens

UGhent, Belgium

BIO

Post-doctoral assistant in herd health management at the Ambulatory Clinic of the Department of Reproduction, Obstetrics and Herd Health, focusing on the optimization of productive and reproductive performance in small and large herds, with an emphasis on nutrition. Work package leader for three work packages focused on data management in the EU FP7 project GplusE.
Education of master's students in Veterinary Medicine. Statistical training of Ph.D. students in data management for dairy cow research. Post-academic and extension services in herd health management for dairy cows.

TOPIC

A Novel Approach to Data Mining And Prediction Modelling in Dairy Cows

As a major livestock producer, the European Union is directly affected by the global need for more sustainable food production. Climate change will undoubtedly affect farm animal production, and the health and welfare of livestock are also of increasing public concern. The common currency in developing solutions to all of these challenges is improved animal production efficiency (Simmons, 2012).

Diseases at calving and during early lactation (shortly after calving) account for the major health and welfare problems in dairy production (Drackley et al., 2005). These include production diseases such as fatty liver, ketosis, rumen acidosis and lameness, and infectious diseases such as mastitis and reproductive tract infections. Infertility and disease are strongly interlinked through shared metabolic and immune signaling pathways (Moore et al., 2005). The knock-on consequences are suboptimal production and lower reproductive efficiency. These in turn contribute to excess methane emissions and higher nitrogen and phosphorus losses from soil, because more replacement animals must be bred and kept.

Due to the rapid development of precision livestock farming technologies and the availability of high-throughput sensor data, large-scale data have become available on many research farms, and each of these data streams can serve as a candidate phenotype for the aforementioned challenges.

Dealing with such interlinked challenges requires early identification using biomarkers (BM). The preferred matrix in which to measure the biomarkers is milk, as it is more accessible than blood and allows low-cost, automated repeat sampling using recently developed in-line sampling and analytical technologies (Egger-Danner et al., 2015). It is highly probable that certain N-glycan structures (BM-1), metabolites (BM-2) or mid-infrared spectra (BM-3) in bovine milk can serve as biomarkers to predict one of the aforementioned phenotypes, but comparable prediction methodologies are lacking.

Machine learning is often described as a key technology that will unlock insights from such data (Domingos, 2012). Effectively leveraging these technologies, however, requires a thorough understanding of both the dairy domain and data science in order to clean and join multiple data sources, identify the relevant parts of the data, build domain-specific algorithms and visualizations, and validate the predictive models. Each of these steps is essential before such models can be transferred efficiently and effectively to the dairy industry.

Accordingly, this methodology paper describes a semi-automated approach to select potential candidates for future industry-wide prediction models. The primary objective of this study is to rank the three types of BM according to their power to predict the phenotypes of interest in a multi-site study design.
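
The ranking objective can be illustrated with a minimal sketch. The abstract does not specify the model, the validation scheme or the error metric, so everything below is an assumption made for illustration: a k-nearest-neighbour regressor stands in for the prediction model, leave-one-site-out cross-validation stands in for the multi-site validation, and root mean squared error (RMSE) stands in for predictive power. The data are synthetic and the function names are hypothetical; none of this reproduces the study's method or results.

```python
import math
import random


def knn_predict(train_X, train_y, x, k=3):
    """Predict the phenotype for x as the mean of its k nearest training targets."""
    nearest = sorted(zip(train_X, train_y), key=lambda pair: math.dist(pair[0], x))[:k]
    return sum(target for _, target in nearest) / len(nearest)


def leave_one_site_out_rmse(X, y, sites, k=3):
    """Hold out each site in turn, train on the other sites, return the pooled RMSE."""
    squared_errors = []
    for held_out in sorted(set(sites)):
        train = [i for i, s in enumerate(sites) if s != held_out]
        test = [i for i, s in enumerate(sites) if s == held_out]
        train_X = [X[i] for i in train]
        train_y = [y[i] for i in train]
        for i in test:
            prediction = knn_predict(train_X, train_y, X[i], k)
            squared_errors.append((prediction - y[i]) ** 2)
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def rank_biomarker_blocks(blocks, y, sites, k=3):
    """Return (block name, RMSE) pairs sorted from lowest to highest error."""
    scores = {name: leave_one_site_out_rmse(X, y, sites, k) for name, X in blocks.items()}
    return sorted(scores.items(), key=lambda item: item[1])


# Synthetic example only: 60 samples spread over 3 hypothetical sites.
rng = random.Random(0)
n_samples = 60
sites = [f"site_{i % 3}" for i in range(n_samples)]
y = [rng.gauss(0, 1) for _ in range(n_samples)]  # made-up phenotype values

# One single-feature block per biomarker type; the noise levels are arbitrary
# assumptions chosen so the blocks differ in how informative they are.
blocks = {
    "BM-1": [[value + rng.gauss(0, 1.0)] for value in y],  # moderately informative
    "BM-2": [[value + rng.gauss(0, 0.1)] for value in y],  # strongly informative
    "BM-3": [[rng.gauss(0, 1)] for _ in y],                # unrelated noise
}

ranking = rank_biomarker_blocks(blocks, y, sites)
for name, rmse in ranking:
    print(f"{name}: leave-one-site-out RMSE = {rmse:.3f}")
```

Holding out whole sites, rather than random samples, means each block is scored on how well it generalizes to a farm the model has not seen, which is the property that matters if a model is later transferred to industry. Any other learner or metric could be substituted without changing the ranking logic.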

Date: October 11, 2018