Assess the quality of your prediction and classification models in ways that accurately reflect their real-world performance, and then improve this performance using state-of-the-art algorithms such as committee-based decision making, dataset resampling, and boosting. This book presents many important techniques for building powerful, robust models and quantifying their expected behavior when put to work in your application.
Considerable attention is given to information theory, especially as it relates to discovering and exploiting relationships between the variables employed by your models. This presentation of an often confusing subject avoids advanced mathematics, focusing instead on concepts easily understood by readers with a modest background in mathematics.
All algorithms include an intuitive explanation of operation, essential equations, references to more rigorous theory, and commented C++ source code. Many of these techniques are recent developments, still not in widespread use. Others are standard algorithms given a fresh look. In every case, the emphasis is on practical applicability, with all code written in such a way that it can easily be included in any program.
What You'll Learn
- Compute entropy to detect problematic predictors
- Improve numeric predictions using constrained and unconstrained combinations, variance-weighted interpolation, and kernel-regression smoothing
- Carry out classification decisions using Borda counts, MinMax and MaxMin rules, union and intersection rules, logistic regression, selection by local accuracy, maximization of the fuzzy integral, and pairwise coupling
- Harness information-theoretic techniques to rapidly screen large numbers of candidate predictors, identifying those that are especially promising
- Use Monte Carlo permutation methods to assess the role of good luck in performance results
- Compute confidence and tolerance intervals for predictions, as well as confidence levels for classification decisions
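To give a flavor of the first item in the list above, here is a minimal sketch of an entropy check on a single predictor. It is not the book's code: the function name `relative_entropy`, the equal-width binning, and the choice of bin count are all assumptions made for illustration.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper (not taken from the book): Shannon entropy of a
// predictor after equal-width binning, divided by log2(nbins) so that the
// result lies in [0, 1]. Assumes the values are finite (no NaN or infinity).
double relative_entropy(const std::vector<double>& x, std::size_t nbins) {
    if (x.empty() || nbins < 2) return 0.0;

    // Find the range of the data to define the bin boundaries.
    auto mm = std::minmax_element(x.begin(), x.end());
    const double lo = *mm.first;
    const double hi = *mm.second;
    if (hi <= lo) return 0.0;  // a constant variable carries no information

    // Count how many cases fall into each bin.
    std::vector<std::size_t> counts(nbins, 0);
    for (double v : x) {
        std::size_t k = static_cast<std::size_t>((v - lo) / (hi - lo) * nbins);
        if (k >= nbins) k = nbins - 1;  // the maximum value goes in the last bin
        ++counts[k];
    }

    // Accumulate -sum(p * log2(p)) over the nonempty bins.
    double h = 0.0;
    for (std::size_t c : counts) {
        if (c == 0) continue;
        const double p = static_cast<double>(c) / static_cast<double>(x.size());
        h -= p * std::log2(p);
    }
    return h / std::log2(static_cast<double>(nbins));
}
```

Under these assumptions, a result near 1 means the cases are spread evenly across the bins, while a result near 0 means most cases are crowded into one or two bins, as happens when a few extreme values stretch the range. Such a predictor may deserve a closer look or a transformation before it is used in a model.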
Who This Book Is For
Anyone who creates prediction or classification models will find a wealth of useful algorithms in this book. Although all code examples are written in C++, the algorithms are described in sufficient detail that they can easily be programmed in any language.