Handbook of Statistical Analysis and Data Mining Applications

Book description

The Handbook of Statistical Analysis and Data Mining Applications is a comprehensive professional reference that guides business analysts, scientists, engineers, and researchers (both academic and industrial) through all stages of data analysis, model building, and implementation. The Handbook helps readers identify the technical and business problem, understand the strengths and weaknesses of modern data mining algorithms, and choose the right statistical methods for practical application. Use this book to tackle massive and complex datasets with novel statistical approaches and to evaluate analyses and solutions objectively. It offers clear, intuitive explanations of the principles and tools for solving problems with modern analytic techniques, and discusses their application to real problems in ways that are accessible and useful to practitioners across industries, from science and engineering to medicine, academia, and commerce. This handbook brings together, in a single resource, all the information a beginner needs to understand the tools and issues in data mining and to build successful data mining solutions.

Table of contents

  1. Cover image
  2. Title page
  3. Table of Contents
  4. Copyright
  5. Foreword 1
  6. Foreword 2
  7. Preface
    1. Overall Organization of This Book
    2. References
    3. SAS
    4. STATSOFT
    5. SPSS
    1. Patterns of Action
    2. Human Intuition
    3. Putting It All Together
    4. References
    1. Chapter 1. The Background for Data Mining Practice
      1. Preamble
      2. A Short History of Statistics and Data Mining
      3. Modern Statistics: A Duality?
      4. Two Views of Reality
      5. The Rise of Modern Statistical Analysis: The Second Generation
      6. Machine Learning Methods: The Third Generation
      7. Statistical Learning Theory: The Fourth Generation
      8. Postscript
      9. References
      1. Preamble
      2. The Scientific Method
      3. What Is Data Mining?
      4. A Theoretical Framework for the Data Mining Process
      5. Strengths of the Data Mining Process
      6. Customer-Centric Versus Account-Centric: A New Way to Look at Your Data
      7. The Data Paradigm Shift
      8. Creation of the CAR
      9. Major Activities of Data Mining
      10. Major Challenges of Data Mining
      11. Examples of Data Mining Applications
      12. Major Issues in Data Mining
      13. General Requirements for Success in a Data Mining Project
      14. Example of a Data Mining Project: Classify a Bat’s Species by Its Sound
      15. The Importance of Domain Knowledge
      16. Postscript
      17. References
      1. Preamble
      2. The Science of Data Mining
      3. The Approach to Understanding and Problem Solving
      4. Business Understanding (Mostly Art)
      5. Data Understanding (Mostly Science)
      6. Data Preparation (A Mixture of Art and Science)
      7. Modeling (A Mixture of Art and Science)
      8. Deployment (Mostly Art)
      9. Closing the Information Loop (Art)
      10. The Art of Data Mining
      11. Postscript
      12. References
      1. Preamble
      2. Activities of Data Understanding and Preparation
      3. Issues That Should Be Resolved
      4. Data Understanding
      5. Postscript
      6. References
      1. Preamble
      2. Variables as Features
      3. Types of Feature Selections
      4. Feature Ranking Methods
      5. Subset Selection Methods
      6. Postscript
      7. References
      1. Preamble
      2. Data Access Tools
      3. Data Exploration Tools
      4. Modeling Management Tools
      5. Modeling Analysis Tools
      6. In-Place Data Processing (IDP)
      7. Rapid Deployment of Predictive Models
      8. Model Monitors
      9. Postscript
      10. Bibliography
      1. Chapter 7. Basic Algorithms for Data Mining: A Brief Overview
        1. Preamble
        2. Basic Data Mining Algorithms
        3. Generalized Additive Models (GAMs)
        4. Classification and Regression Trees (CART)
        5. General CHAID Models
        6. Generalized EM and k-Means Cluster Analysis—An Overview
        7. Postscript
        8. References
        9. Bibliography
        1. Preamble
        2. Advanced Data Mining Algorithms
        3. Image and Object Data Mining: Visualization and 3D-Medical and Other Scanning Imaging
        4. Postscript
        5. References
        1. Preamble
        2. The Development of Text Mining
        3. A Practical Example: NTSB
        4. Text Mining Concepts Used in Conducting Text Mining Studies
        5. Postscript
        6. References
        1. Preamble
        2. SPSS Clementine Overview
        3. SAS-Enterprise Miner (SAS-EM) Overview
        4. STATISTICA Data Miner, QC-Miner, and Text Miner Overview
        5. Postscript
        6. References
        1. Preamble
        2. What Is Classification?
        3. Initial Operations in Classification
        4. Major Issues with Classification
        5. Assumptions of Classification Procedures
        6. Methods for Classification
        7. What Is the Best Algorithm for Classification?
        8. Postscript
        9. References
        1. Preamble
        2. Linear Response Analysis and the Assumptions of the Parametric Model
        3. Parametric Statistical Analysis
        4. Assumptions of the Parametric Model
        5. Linear Regression
        6. Generalized Linear Models (GLMs)
        7. Methods for Analyzing Nonlinear Relationships
        8. Nonlinear Regression and Estimation
        9. Data Mining and Machine Learning Algorithms Used in Numerical Prediction
        10. Advantages of Classification and Regression Trees (C&RT) Methods
        11. Application to Mixed Models
        12. Neural Nets for Prediction
        13. Support Vector Machines (SVMs) and Other Kernel Learning Algorithms
        14. Postscript
        15. References
        1. Preamble
        2. Introduction
        3. Model Evaluation
        4. Re-Cap of the Most Popular Algorithms
        5. Enhancement Action Checklist
        6. Ensembles of Models: The Single Greatest Enhancement Technique
        7. How to Thrive as a Data Miner
        8. Postscript
        9. References
        1. Preamble
        2. What Is Medical Informatics?
        3. How Data Mining and Text Mining Relate to Medical Informatics
        4. 3D Medical Informatics
        5. Postscript
        6. References
        7. Bibliography
        1. Preamble
        2. What Is Bioinformatics?
        3. Data Analysis Methods in Bioinformatics
        4. Web Services in Bioinformatics
        5. How Do We Apply Data Mining Methods to Bioinformatics?
        6. Postscript
        7. References
        8. Bibliography
        1. Preamble
        2. Early CRM Issues in Business
        3. Knowing How Customers Behaved Before They Acted
        4. CRM in Business Ecosystems
        5. Conclusions
        6. Postscript
        7. References
        1. Preamble
        2. Issues with Fraud Detection
        3. How Do You Detect Fraud?
        4. Supervised Classification of Fraud
        5. How Do You Model Fraud?
        6. How Are Fraud Detection Systems Built?
        7. Intrusion Detection Modeling
        8. Comparison of Models with and Without Time-Based Features
        9. Building Profiles
        10. Deployment of Fraud Profiles
        11. Postscript and Prolegomenon
        12. References
        1. Guest Authors of the Tutorials
        2. Tutorial A. How to Use Data Miner Recipe: STATISTICA Data Miner Only
          1. What Is STATISTICA Data Miner Recipe (DMR)?
          2. Core Analytic Ingredients
          1. Airline Safety
          2. SDR Database
          3. Preparing the Data for Our Tutorial
          4. Data Mining Approach
          5. Data Mining Algorithm Error Rate
          6. Conclusion
          7. References
          1. Introduction
          2. Data and Variable Definitions
          3. Getting to Know the Workspace of the Clementine Data Mining Toolkit
          4. Results
          5. Publishing and Reuse of Models and Other Outputs
          6. References
          1. Introduction
          2. A Primer of SAS-EM Predictive Modeling
          3. Scoring Process and the Total Profit
          4. Oversampling and Rare Event Detection
          5. Decision Matrix and the Profit Charts
          6. Micro-Target the Profitable Customers
          7. Appendix
          8. Reference
          1. Introduction: What Is Credit Scoring?
          2. Credit Scoring: Business Objectives
          3. Case Study: Consumer Credit Scoring
          4. Analysis and Results
          5. Comparative Assessment of the Models (Evaluation)
          6. Deploying the Model for Prediction
          7. Conclusion
          1. Objectives
          2. Steps
          1. Introduction
          2. Text Mining
          3. Car Review Example
          4. Interactive Trees (C&RT, CHAID)
          5. Other Applications of Text Mining
          6. Conclusion
          1. Predictive Process Control Using STATISTICA and STATISTICA QC-Miner
          2. Case Study: Predictive Process Control
          3. Data Analyses with STATISTICA
          4. Conclusion
          1. References
          1. Introduction
          2. Modeling Strategy
          3. SAS-EM 5.3 Interface
          4. A Primer of SAS-EM Predictive Modeling
          5. Advanced Techniques of Predictive Modeling
          6. Micro-Target the Profitable Customers
          7. Appendix
          8. References
          1. Background
          2. Data
          3. References
          1. Chapter 18. Model Complexity (and How Ensembles Help)
            1. Preamble
            2. Model Ensembles
            3. Complexity
            4. Generalized Degrees of Freedom
            5. Examples: Decision Tree Surface with Noise
            6. Summary and Discussion
            7. Postscript
            8. References
            1. Preamble
            2. More Is Not Necessarily Better: Lessons from Nature and Engineering
            3. Embrace Change Rather Than Flee from It
            4. Decision Making Breeds True in the Business Organism
            5. The 80:20 Rule in Action
            6. Agile Modeling: An Example of How to Craft Sufficient Solutions
            7. Postscript
            8. References
            1. Preamble
            2. Introduction
            3. 0 Lack Data
            4. 1 Focus on Training
            5. 2 Rely on One Technique
            6. 3 Ask the Wrong Question
            7. 4 Listen (Only) to the Data
            8. 5 Accept Leaks from the Future
            9. 6 Discount Pesky Cases
            10. 7 Extrapolate
            11. 8 Answer Every Inquiry
            12. 9 Sample Casually
            13. 10 Believe the Best Model
            14. How Shall We Then Succeed?
            15. Postscript
            16. References
            1. Preamble
            2. RFID
            3. Social Networking and Data Mining
            4. Image and Object Data Mining
            5. Cloud Computing
            6. Postscript
            7. References
            1. Preamble
            2. Beware of Overtrained Models
            3. A Diversity of Models and Techniques Is Best
            4. The Process Is More Important Than the Tool
            5. Text Mining of Unstructured Data Is Becoming Very Important
            6. Practice Thinking about Your Organization as Organism Rather Than as Machine
            7. Good Solutions Evolve Rather Than Just Appear after Initial Efforts
            8. What You Don’t Do Is Just as Important as What You Do
            9. Very Intuitive Graphical Interfaces Are Replacing Procedural Programming
            10. Data Mining Is No Longer a Boutique Operation; It Is Firmly Established in the Mainstream of Our Society
            11. “Smart” Systems Are the Direction in Which Data Mining Technology Is Going
            12. Postscript
            13. References

Product information

• Title: Handbook of Statistical Analysis and Data Mining Applications
• Author(s): Robert Nisbet, John Elder, Gary Miner
• Release date: May 2009
• Publisher(s): Elsevier Science
• ISBN: 9780080912035