Discovering Knowledge in Data
An Introduction to Data Mining
(Language: English)
Unfortunately already sold out
Free shipping
Book (hardcover)
100.10 €
Product details
Product information for "Discovering Knowledge in Data"
Blurb for "Discovering Knowledge in Data"
The field of data mining lies at the confluence of predictive analytics, statistical analysis, and business intelligence. Due to the ever-increasing complexity and size of data sets and the wide range of applications in computer science, business, and health care, the process of discovering knowledge in data is more relevant than ever before. This book provides the tools needed to thrive in today's big data world. The author demonstrates how to leverage a company's existing databases to increase profits and market share, and carefully explains the most current data science methods and techniques. The reader will "learn data mining by doing data mining". By adding chapters on data modelling preparation, imputation of missing data, and multivariate statistical analysis, Discovering Knowledge in Data, Second Edition remains the eminent reference on data mining.
* The second edition of a highly praised, successful reference on data mining, with thorough coverage of big data applications, predictive analytics, and statistical analysis.
* Includes new chapters on Multivariate Statistics, Preparing to Model the Data, and Imputation of Missing Data, and an Appendix on Data Summarization and Visualization
* Offers extensive coverage of the R statistical programming language
* Contains 280 end-of-chapter exercises
* Includes a companion website with further resources for all readers, plus PowerPoint slides, a solutions manual, and suggested projects for instructors who adopt the book
Table of contents for "Discovering Knowledge in Data"
PREFACE
CHAPTER 1 AN INTRODUCTION TO DATA MINING
1.1 What is Data Mining?
1.2 Wanted: Data Miners
1.3 The Need for Human Direction of Data Mining
1.4 The Cross-Industry Standard Practice for Data Mining
1.4.1 Crisp-DM: The Six Phases
1.5 Fallacies of Data Mining
1.6 What Tasks Can Data Mining Accomplish?
1.6.1 Description
1.6.2 Estimation
1.6.3 Prediction
1.6.4 Classification
1.6.5 Clustering
1.6.6 Association
References
Exercises
CHAPTER 2 DATA PREPROCESSING
2.1 Why do We Need to Preprocess the Data?
2.2 Data Cleaning
2.3 Handling Missing Data
2.4 Identifying Misclassifications
2.5 Graphical Methods for Identifying Outliers
2.6 Measures of Center and Spread
2.7 Data Transformation
2.8 Min-Max Normalization
2.9 Z-Score Standardization
2.10 Decimal Scaling
2.11 Transformations to Achieve Normality
2.12 Numerical Methods for Identifying Outliers
2.13 Flag Variables
2.14 Transforming Categorical Variables into Numerical Variables
2.15 Binning Numerical Variables
2.16 Reclassifying Categorical Variables
2.17 Adding an Index Field
2.18 Removing Variables that are Not Useful
2.19 Variables that Should Probably Not Be Removed
2.20 Removal of Duplicate Records
2.21 A Word About ID Fields
The R Zone
References
Exercises
Hands-On Analysis
CHAPTER 3 EXPLORATORY DATA ANALYSIS
3.1 Hypothesis Testing Versus Exploratory Data Analysis
3.2 Getting to Know the Data Set
3.3 Exploring Categorical Variables
3.4 Exploring Numeric Variables
3.5 Exploring Multivariate Relationships
3.6 Selecting Interesting Subsets of the Data for Further Investigation
3.7 Using EDA to Uncover Anomalous Fields
3.8 Binning Based on Predictive Value
3.9 Deriving New Variables: Flag Variables
3.10 Deriving New Variables: Numerical Variables
3.11 Using EDA to Investigate Correlated Predictor Variables
3.12 Summary
The R Zone
Reference
Exercises
Hands-On Analysis
CHAPTER 4 UNIVARIATE STATISTICAL ANALYSIS
4.1 Data Mining Tasks in Discovering Knowledge in Data
4.2 Statistical Approaches to Estimation and Prediction
4.3 Statistical Inference
4.4 How Confident are We in Our Estimates?
4.5 Confidence Interval Estimation of the Mean
4.6 How to Reduce the Margin of Error
4.7 Confidence Interval Estimation of the Proportion
4.8 Hypothesis Testing for the Mean
4.9 Assessing the Strength of Evidence Against the Null Hypothesis
4.10 Using Confidence Intervals to Perform Hypothesis Tests
4.11 Hypothesis Testing for the Proportion
The R Zone
Reference
Exercises
CHAPTER 5 MULTIVARIATE STATISTICS
5.1 Two-Sample t-Test for Difference in Means
5.2 Two-Sample Z-Test for Difference in Proportions
5.3 Test for Homogeneity of Proportions
5.4 Chi-Square Test for Goodness of Fit of Multinomial Data
5.5 Analysis of Variance
Author portrait of Daniel T. Larose
Daniel T. Larose earned his PhD in Statistics at the University of Connecticut. He is Professor of Mathematical Sciences and Director of the Data Mining programs at Central Connecticut State University. His consulting clients have included Microsoft, Forbes Magazine, the CIT Group, KPMG International, Computer Associates, and Deloitte, Inc. This is Larose's fourth book for Wiley.
Chantal D. Larose is a PhD candidate in Statistics at the University of Connecticut. Her research focuses on the imputation of missing data and model-based clustering. She has taught undergraduate statistics since 2011, and has done statistical consulting for DataMiningConsultant.com, LLC.
Bibliographic details
- Author: Daniel T. Larose
- 2014, 2nd ed., 336 pages, dimensions: 16.1 x 24 cm, hardcover, English
- Publisher: Wiley & Sons
- ISBN-10: 0470908742
- ISBN-13: 9780470908747
- Publication date: 1 June 2014
Language: English