This book provides an introduction to computer-based methods for the analysis of genomic data. Breakthroughs in molecular and computational biology have contributed to the emergence of vast data sets, where millions of genetic markers for each individual are coupled with medical records, generating an unparalleled resource for linking human genetic variation to human biology and disease. Similar developments have taken place in animal and plant breeding, where genetic marker information is combined with production traits. An important task for the statistical geneticist is to adapt, construct and implement models that can extract information from these large-scale data. A first step is to understand the methodology that underlies the probability models and to learn the modern computer-intensive methods required for fitting these models. The objective of this book, which is suitable for readers who wish to develop the analytic skills needed for genomic research, is to provide guidance for taking this first step.
This book is addressed to numerate biologists who may lack the formal mathematical background of the professional statistician. For this reason, it offers considerably more detailed explanations and derivations. Examples are used throughout, and a large proportion of them involve programming with the open-source package R. The code needed to solve the exercises is provided and can be downloaded, allowing students to experiment by running the programs on their own computers.
Part I presents methods of inference and computation that are appropriate for likelihood and Bayesian models. Part II discusses prediction for continuous and binary data using both frequentist and Bayesian approaches. Some of the models used for prediction are also used for gene discovery. The challenge is to find promising genes without incurring a large proportion of false positive results. Therefore, Part II includes a detour on the False Discovery Rate from both frequentist and Bayesian perspectives. The last chapter of Part II provides an overview of a selection of non-parametric methods. Part III consists of exercises and their solutions. This second edition has benefited from many clarifications and extensions of themes discussed in the first edition.
Daniel Sorensen