Handbook of Statistical Analysis and Data Mining Applications
Book description
The Handbook of Statistical Analysis and Data Mining Applications is a comprehensive professional reference book that guides business analysts, scientists, engineers and researchers (both academic and industrial) through all stages of data analysis, model building and implementation. The Handbook helps one discern the technical and business problem, understand the strengths and weaknesses of modern data mining algorithms, and employ the right statistical methods for practical application. Use this book to address massive and complex datasets with novel statistical approaches and to evaluate analyses and solutions objectively. It has clear, intuitive explanations of the principles and tools for solving problems using modern analytic techniques, and discusses their application to real problems in ways accessible and beneficial to practitioners across industries, from science and engineering to medicine, academia and commerce. This handbook brings together, in a single resource, all the information a beginner will need to understand the tools and issues in data mining and to build successful data mining solutions.
- Written "By Practitioners for Practitioners"
- Non-technical explanations build understanding without jargon and equations
- Tutorials in numerous fields of study provide step-by-step instruction on how to use supplied tools to build models
- Practical advice from successful real-world implementations
- Includes extensive case studies, examples, MS PowerPoint slides and datasets
- CD-DVD with valuable fully-working 90-day software included: "Complete Data Miner - QC-Miner - Text Miner" bound with book
Table of contents
- Cover image
- Title page
- Table of Contents
- Copyright
- Foreword 1
- Foreword 2
- Preface
- Overall Organization of This Book
- References
- SAS
- STATSOFT
- SPSS
- Patterns of Action
- Human Intuition
- Putting it all Together
- References
- Chapter 1. The Background for Data Mining Practice
- Preamble
- A Short History of Statistics and Data Mining
- Modern Statistics: A Duality?
- Two Views of Reality
- The Rise of Modern Statistical Analysis: The Second Generation
- Machine Learning Methods: The Third Generation
- Statistical Learning Theory: The Fourth Generation
- Postscript
- References
- Preamble
- The Scientific Method
- What Is Data Mining?
- A Theoretical Framework for the Data Mining Process
- Strengths of the Data Mining Process
- Customer-Centric Versus Account-Centric: A New Way to Look at Your Data
- The Data Paradigm Shift
- Creation of the CAR
- Major Activities of Data Mining
- Major Challenges of Data Mining
- Examples of Data Mining Applications
- Major Issues in Data Mining
- General Requirements for Success in a Data Mining Project
- Example of a Data Mining Project: Classify a Bat’s Species by Its Sound
- The Importance of Domain Knowledge
- Postscript
- References
- Preamble
- The Science of Data Mining
- The Approach to Understanding and Problem Solving
- Business Understanding (Mostly Art)
- Data Understanding (Mostly Science)
- Data Preparation (A Mixture of Art and Science)
- Modeling (A Mixture of Art and Science)
- Deployment (Mostly Art)
- Closing the Information Loop* (Art)
- The Art of Data Mining
- Postscript
- References
- Preamble
- Activities of Data Understanding and Preparation
- Issues That Should be Resolved
- Data Understanding
- Postscript
- References
- Preamble
- Variables as Features
- Types of Feature Selections
- Feature Ranking Methods
- Subset Selection Methods
- Postscript
- References
- Preamble
- Data Access Tools
- Data Exploration Tools
- Modeling Management Tools
- Modeling Analysis Tools
- In-Place Data Processing (IDP)
- Rapid Deployment of Predictive Models
- Model Monitors
- Postscript
- Bibliography
- Chapter 7. Basic Algorithms for Data Mining: A Brief Overview
- Preamble
- Basic Data Mining Algorithms
- Generalized Additive Models (GAMs)
- Classification and Regression Trees (CART)
- General CHAID Models
- Generalized EM and k-Means Cluster Analysis—An Overview
- Postscript
- References
- Bibliography
- Preamble
- Advanced Data Mining Algorithms
- Image and Object Data Mining: Visualization and 3D-Medical and Other Scanning Imaging
- Postscript
- References
- Preamble
- The Development of Text Mining
- A Practical Example: NTSB
- Text Mining Concepts Used in Conducting Text Mining Studies
- Postscript
- References
- Preamble
- SPSS Clementine Overview
- SAS-Enterprise Miner (SAS-EM) Overview
- STATISTICA Data Miner, QC-Miner, and Text Miner Overview
- Postscript
- References
- Preamble
- What is Classification?
- Initial Operations in Classification
- Major Issues with Classification
- Assumptions of Classification Procedures
- Methods for Classification
- What is the Best Algorithm for Classification?
- Postscript
- References
- Preamble
- Linear Response Analysis and the Assumptions of the Parametric Model
- Parametric Statistical Analysis
- Assumptions of the Parametric Model
- Linear Regression
- Generalized Linear Models (GLMs)
- Methods for Analyzing Nonlinear Relationships
- Nonlinear Regression and Estimation
- Data Mining and Machine Learning Algorithms Used in Numerical Prediction
- Advantages of Classification and Regression Trees (C&RT) Methods
- Application to Mixed Models
- Neural Nets for Prediction
- Support Vector Machines (SVMs) and Other Kernel Learning Algorithms
- Postscript
- References
- Preamble
- Introduction
- Model Evaluation
- Re-Cap of the Most Popular Algorithms
- Enhancement Action Checklist
- Ensembles of Models: The Single Greatest Enhancement Technique
- How to Thrive as a Data Miner
- Postscript
- References
- Preamble
- What Is Medical Informatics?
- How Data Mining and Text Mining Relate to Medical Informatics
- 3D Medical Informatics
- Postscript
- References
- Bibliography
- Preamble
- What Is Bioinformatics?
- Data Analysis Methods in Bioinformatics
- Web Services in Bioinformatics
- How Do We Apply Data Mining Methods to Bioinformatics?
- Postscript
- References
- Bibliography
- Preamble
- Early CRM Issues in Business
- Knowing How Customers Behaved Before They Acted
- CRM in Business Ecosystems
- Conclusions
- Postscript
- References
- Preamble
- Issues with Fraud Detection
- How Do You Detect Fraud?
- Supervised Classification of Fraud
- How Do You Model Fraud?
- How Are Fraud Detection Systems Built?
- Intrusion Detection Modeling
- Comparison of Models with and Without Time-Based Features
- Building Profiles
- Deployment of Fraud Profiles
- Postscript and Prolegomenon
- References
- Guest Authors of the Tutorials
- Tutorial A. How to Use Data Miner Recipe: STATISTICA Data Miner Only
- What Is STATISTICA Data Miner Recipe (DMR)?
- Core Analytic Ingredients
- Airline Safety
- SDR Database
- Preparing the Data for Our Tutorial
- Data Mining Approach
- Data Mining Algorithm Error Rate
- Conclusion
- References
- Introduction
- Data and Variable Definitions
- Getting to Know the Workspace of the Clementine Data Mining Toolkit
- Results
- Publishing and Reuse of Models and Other Outputs
- References
- Introduction
- A Primer of SAS-EM Predictive Modeling
- Scoring Process and the Total Profit
- Oversampling and Rare Event Detection
- Decision Matrix and the Profit Charts
- Micro-Target the Profitable Customers
- Appendix
- Reference
- Introduction: What Is Credit Scoring?
- Credit Scoring: Business Objectives
- Case Study: Consumer Credit Scoring
- Analysis and Results
- Comparative Assessment of the Models (Evaluation)
- Deploying the Model for Prediction
- Conclusion
- Objectives
- Steps
- Introduction
- Text Mining
- Car Review Example
- Interactive Trees (C&RT, CHAID)
- Other Applications of Text Mining
- Conclusion
- Predictive Process Control Using STATISTICA and STATISTICA QC-Miner
- Case Study: Predictive Process Control
- Data Analyses with STATISTICA
- Conclusion
- References
- Introduction
- Modeling Strategy
- SAS-EM 5.3 Interface
- A Primer of SAS-EM Predictive Modeling
- Advanced Techniques of Predictive Modeling
- Micro-Target the Profitable Customers
- Appendix
- References
- Background
- Data
- References
- Chapter 18. Model Complexity (and How Ensembles Help)
- Preamble
- Model Ensembles
- Complexity
- Generalized Degrees of Freedom
- Examples: Decision Tree Surface with Noise
- Summary and Discussion
- Postscript
- References
- Preamble
- More Is Not Necessarily Better: Lessons from Nature and Engineering
- Embrace Change Rather Than Flee from It
- Decision Making Breeds True in the Business Organism
- The 80:20 Rule in Action
- Agile Modeling: An Example of How to Craft Sufficient Solutions
- Postscript
- References
- Preamble
- Introduction
- 0 Lack Data
- 1 Focus on Training
- 2 Rely on One Technique
- 3 Ask the Wrong Question
- 4 Listen (Only) to the Data
- 5 Accept Leaks from the Future
- 6 Discount Pesky Cases
- 7 Extrapolate
- 8 Answer Every Inquiry
- 9 Sample Casually
- 10 Believe the Best Model
- How Shall We Then Succeed?
- Postscript
- References
- Preamble
- RFID
- Social Networking and Data Mining
- Image and Object Data Mining
- Cloud Computing
- Postscript
- References
- Preamble
- Beware of Overtrained Models
- A Diversity of Models and Techniques Is Best
- The Process Is More Important Than the Tool
- Text Mining of Unstructured Data Is Becoming Very Important
- Practice Thinking about Your Organization as Organism Rather Than as Machine
- Good Solutions Evolve Rather Than Just Appear after Initial Efforts
- What You Don’t Do Is Just as Important as What You Do
- Very Intuitive Graphical Interfaces Are Replacing Procedural Programming
- Data Mining Is No Longer a Boutique Operation; It Is Firmly Established in the Mainstream of Our Society
- “Smart” Systems Are the Direction in Which Data Mining Technology Is Going
- Postscript
- References
Product information
- Title: Handbook of Statistical Analysis and Data Mining Applications
- Author(s): Robert Nisbet, John Elder, Gary Miner
- Release date: May 2009
- Publisher(s): Elsevier Science
- ISBN: 9780080912035