Machine Learning for Business Analytics
Concepts, Techniques and Applications with JMP Pro
(Language: English)
Unfortunately sold out
Free shipping
Book (Hardcover)
130.50 €
Product details
Product information on "Machine Learning for Business Analytics"
Blurb for "Machine Learning for Business Analytics"
MACHINE LEARNING FOR BUSINESS ANALYTICS

An up-to-date introduction to a market-leading platform for data analysis and machine learning.

Machine Learning for Business Analytics: Concepts, Techniques, and Applications with JMP Pro®, 2nd ed. offers an accessible and engaging introduction to machine learning. It provides concrete examples and case studies to educate new users and deepen existing users' understanding of their data and their business. Fully updated to incorporate new topics and instructional material, this remains the only comprehensive introduction to this crucial set of analytical tools specifically tailored to the needs of businesses.

Readers of Machine Learning for Business Analytics: Concepts, Techniques, and Applications with JMP Pro®, 2nd ed. will also find:

- Updated material, which improves the book's usefulness as a reference for professionals beyond the classroom
- Four new chapters, covering topics including Text Mining and Responsible Data Science
- An updated companion website with data sets and other instructor resources: www.jmp.com/dataminingbook
- A guide to JMP Pro®'s new features and enhanced functionality

Machine Learning for Business Analytics: Concepts, Techniques, and Applications with JMP Pro®, 2nd ed. is ideal for students and instructors of business analytics and data mining classes, as well as data science practitioners and professionals in data-driven industries.
Table of contents of "Machine Learning for Business Analytics"
Foreword
Preface
Acknowledgments

PART I: PRELIMINARIES

1 Introduction
  1.1 What Is Business Analytics?
  1.2 What Is Machine Learning?
  1.3 Machine Learning, AI, and Related Terms
      Statistical Modeling vs. Machine Learning
  1.4 Big Data
  1.5 Data Science
  1.6 Why Are There So Many Different Methods?
  1.7 Terminology and Notation
  1.8 Road Maps to This Book
      Order of Topics

2 Overview of the Machine Learning Process
  2.1 Introduction
  2.2 Core Ideas in Machine Learning
      Classification; Prediction; Association Rules and Recommendation Systems; Predictive Analytics; Data Reduction and Dimension Reduction; Data Exploration and Visualization; Supervised and Unsupervised Learning
  2.3 The Steps in a Machine Learning Project
  2.4 Preliminary Steps
      Organization of Data; Sampling from a Database; Oversampling Rare Events in Classification Tasks; Preprocessing and Cleaning the Data
  2.5 Predictive Power and Overfitting
      Overfitting; Creation and Use of Data Partitions
  2.6 Building a Predictive Model with JMP Pro
      Predicting Home Values in a Boston Neighborhood; Modeling Process
  2.7 Using JMP Pro for Machine Learning
  2.8 Automating Machine Learning Solutions
      Predicting Power Generator Failure; Uber's Michelangelo
  2.9 Ethical Practice in Machine Learning
      Machine Learning Software: The State of the Market (by Herb Edelstein)
  Problems

PART II: DATA EXPLORATION AND DIMENSION REDUCTION

3 Data Visualization
  3.1 Introduction
  3.2 Data Examples
      Example 1: Boston Housing Data; Example 2: Ridership on Amtrak Trains
  3.3 Basic Charts: Bar Charts, Line Graphs, and Scatter Plots
      Distribution Plots: Boxplots and Histograms; Heatmaps
  3.4 Multidimensional Visualization
      Adding Variables: Color, Hue, Size, Shape, Multiple Panels, Animation; Manipulations: Rescaling, Aggregation and Hierarchies, Zooming, Filtering; Reference: Trend Line and Labels; Scaling Up: Large Datasets; Multivariate Plot: Parallel Coordinates Plot; Interactive Visualization
  3.5 Specialized Visualizations
      Visualizing Networked Data; Visualizing Hierarchical Data: More on Treemaps; Visualizing Geographical Data: Maps
  3.6 Summary: Major Visualizations and Operations, According to Machine Learning Goal
      Prediction; Classification; Time Series Forecasting; Unsupervised Learning
  Problems

4 Dimension Reduction
  4.1 Introduction
  4.2 Curse of Dimensionality
  4.3 Practical Considerations
      Example 1: House Prices in Boston
  4.4 Data Summaries
      Summary Statistics; Tabulating Data
  4.5 Correlation Analysis
  4.6 Reducing the Number of Categories in Categorical Variables
  4.7 Converting a Categorical Variable to a Continuous Variable
  4.8 Principal Component Analysis
      Example 2: Breakfast Cereals; Principal Components; Standardizing the Data; Using Principal Components for Classification and Prediction
  4.9 Dimension Reduction Using Regression Models
  4.10 Dimension Reduction Using Classification and Regression Trees
  Problems

PART III: PERFORMANCE EVALUATION

5 Evaluating Predictive Performance
  5.1 Introduction
  5.2 Evaluating Predictive Performance
      Naive Benchmark: The Average; Prediction Accuracy Measures; Comparing Training and Validation Performance
  5.3 Judging Classifier Performance
      Benchmark: The Naive Rule; Class Separation; The Classification (Confusion) Matrix; Using the Validation Data; Accuracy Measures; Propensities and Threshold for Classification; Performance in Unequal Importance of Classes; Asymmetric Misclassification Costs; Generalization to More Than Two Classes
  5.4 Judging Ranking Performance
      Lift Curves for Binary Data; Beyond Two Classes; Lift Curves Incorporating Costs and Benefits
  5.5 Oversampling
      Creating an Oversampled Training Set; Evaluating Model Performance Using a Non-oversampled Validation Set; Evaluating Model Performance If Only an Oversampled Validation Set Exists
  Problems

PART IV: PREDICTION AND CLASSIFICATION METHODS

6 Multiple Linear Regression
  6.1 Introduction
  6.2 Explanatory vs. Predictive Modeling
  6.3 Estimating the Regression Equation and Prediction
      Example: Predicting the Price of Used Toyota Corolla Automobiles
  6.4 Variable Selection in Linear Regression
      Reducing the Number of Predictors; How to Reduce the Number of Predictors; Manual Variable Selection; Automated Variable Selection; Regularization (Shrinkage Models)
  Problems

7 k-Nearest Neighbors (k-NN)
  7.1 The k-NN Classifier (Categorical Outcome)
      Determining Neighbors; Classification Rule; Example: Riding Mowers; Choosing Parameter k; Setting the Threshold Value; Weighted k-NN; k-NN with More Than Two Classes; Working with Categorical Predictors
  7.2 k-NN for a Numerical Response
  7.3 Advantages and Shortcomings of k-NN Algorithms
  Problems

8 The Naive Bayes Classifier
  8.1 Introduction
      Threshold Probability Method; Conditional Probability; Example 1: Predicting Fraudulent Financial Reporting
  8.2 Applying the Full (Exact) Bayesian Classifier
      Using the "Assign to the Most Probable Class" Method; Using the Threshold Probability Method; Practical Difficulty with the Complete (Exact) Bayes Procedure
  8.3 Solution: Naive Bayes
      The Naive Bayes Assumption of Conditional Independence; Using the Threshold Probability Method; Example 2: Predicting Fraudulent Financial Reports; Example 3: Predicting Delayed Flights; Evaluating the Performance of Naive Bayes Output from JMP; Working with Continuous Predictors
  8.4 Advantages and Shortcomings of the Naive Bayes Classifier
  Problems

9 Classification and Regression Trees
  9.1 Introduction
      Tree Structure; Decision Rules; Classifying a New Record
  9.2 Classification Trees
      Recursive Partitioning; Example 1: Riding Mowers; Categorical Predictors; Standardization
  9.3 Growing a Tree for the Riding Mowers Example
      Choice of First Split; Choice of Second Split; Final Tree; Using a Tree to Classify New Records
  9.4 Evaluating the Performance of a Classification Tree
      Example 2: Acceptance of Personal Loan
  9.5 Avoiding Overfitting
      Stopping Tree Growth: CHAID; Growing a Full Tree and Pruning It Back; How JMP Pro Limits Tree Size
  9.6 Classification Rules from Trees
  9.7 Classification Trees for More Than Two Classes
  9.8 Regression Trees
      Prediction; Evaluating Performance
  9.9 Advantages and Weaknesses of a Single Tree
  9.10 Improving Prediction: Random Forests and Boosted Trees
      Random Forests; Boosted Trees
  Problems

10 Logistic Regression
  10.1 Introduction
  10.2 The Logistic Regression Model
  10.3 Example: Acceptance of Personal Loan
      Model with a Single Predictor; Estimating the Logistic Model from Data: Multiple Predictors; Interpreting Results in Terms of Odds (for a Profiling Goal)
  10.4 Evaluating Classification Performance
  10.5 Variable Selection
  10.6 Logistic Regression for Multi-class Classification
      Logistic Regression for Nominal Classes; Logistic Regression for Ordinal Classes; Example: Accident Data
  10.7 Example of Complete Analysis: Predicting Delayed Flights
      Data Preprocessing; Model Fitting, Estimation, and Interpretation: A Simple Model; Model Fitting, Estimation, and Interpretation: The Full Model; Model Performance
  Problems

11 Neural Nets
  11.1 Introduction
  11.2 Concept and Structure of a Neural Network
  11.3 Fitting a Network to Data
      Example 1: Tiny Dataset; Computing Output of Nodes; Preprocessing the Data; Training the Model; Using the Output for Prediction and Classification; Example 2: Classifying Accident Severity; Avoiding Overfitting
  11.4 User Input in JMP Pro
  11.5 Exploring the Relationship Between Predictors and Outcome
  11.6 Deep Learning
      Convolutional Neural Networks (CNNs); Local Feature Map; A Hierarchy of Features; The Learning Process; Unsupervised Learning; Conclusion
  11.7 Advantages and Weaknesses of Neural Networks
  Problems

12 Discriminant Analysis
  12.1 Introduction
      Example 1: Riding Mowers; Example 2: Personal Loan Acceptance
  12.2 Distance of an Observation from a Class
  12.3 From Distances to Propensities and Classifications
  12.4 Classification Performance of Discriminant Analysis
  12.5 Prior Probabilities
  12.6 Classifying More Than Two Classes
      Example 3: Medical Dispatch to Accident Scenes
  12.7 Advantages and Weaknesses
  Problems

13 Generating, Comparing, and Combining Multiple Models
  13.1 Ensembles
      Why Ensembles Can Improve Predictive Power; Simple Averaging or Voting; Bagging; Boosting; Stacking; Advantages and Weaknesses of Ensembles
  13.2 Automated Machine Learning (AutoML)
      AutoML: Explore and Clean Data; AutoML: Determine Machine Learning Task; AutoML: Choose Features and Machine Learning Methods; AutoML: Evaluate Model Performance; AutoML: Model Deployment; Advantages and Weaknesses of Automated Machine Learning
  13.3 Summary
  Problems

PART V: INTERVENTION AND USER FEEDBACK

14 Interventions: Experiments, Uplift Models, and Reinforcement Learning
  14.1 Introduction
  14.2 A/B Testing
      Example: Testing a New Feature in a Photo Sharing App; The Statistical Test for Comparing Two Groups (t-Test); Multiple Treatment Groups: A/B/n Tests; Multiple A/B Tests and the Danger of Multiple Testing
  14.3 Uplift (Persuasion) Modeling
      Getting the Data; A Simple Model; Modeling Individual Uplift; Creating Uplift Models in JMP Pro; Using the Results of an Uplift Model
  14.4 Reinforcement Learning
      Explore-Exploit: Multi-armed Bandits; Markov Decision Process (MDP)
  14.5 Summary
  Problems

PART VI: MINING RELATIONSHIPS AMONG RECORDS

15 Association Rules and Collaborative Filtering
  15.1 Association Rules
      Discovering Association Rules in Transaction Databases; Example 1: Synthetic Data on Purchases of Phone Faceplates; Data Format; Generating Candidate Rules; The Apriori Algorithm; Selecting Strong Rules; The Process of Rule Selection; Interpreting the Results; Rules and Chance; Example 2: Rules for Similar Book Purchases
  15.2 Collaborative Filtering
      Data Type and Format; Example 3: Netflix Prize Contest; User-Based Collaborative Filtering: "People Like You"; Item-Based Collaborative Filtering; Evaluating Performance; Advantages and Weaknesses of Collaborative Filtering; Collaborative Filtering vs. Association Rules
  15.3 Summary
  Problems

16 Cluster Analysis
  16.1 Introduction
      Example: Public Utilities
  16.2 Measuring Distance Between Two Records
      Euclidean Distance; Standardizing Numerical Measurements; Other Distance Measures for Numerical Data; Distance Measures for Categorical Data; Distance Measures for Mixed Data
  16.3 Measuring Distance Between Two Clusters
      Minimum Distance; Maximum Distance; Average Distance; Centroid Distance
  16.4 Hierarchical (Agglomerative) Clustering
      Single Linkage; Complete Linkage; Average Linkage; Centroid Linkage; Ward's Method; Dendrograms: Displaying the Clustering Process and Results; Validating Clusters; Two-Way Clustering; Limitations of Hierarchical Clustering
  16.5 Nonhierarchical Clustering: The K-Means Algorithm
      Choosing the Number of Clusters (k)
  Problems

PART VII: FORECASTING TIME SERIES

17 Handling Time Series
  17.1 Introduction
  17.2 Descriptive vs. Predictive Modeling
  17.3 Popular Forecasting Methods in Business
      Combining Methods
  17.4 Time Series Components
      Example: Ridership on Amtrak Trains
  17.5 Data Partitioning and Performance Evaluation
      Benchmark Performance: Naive Forecasts; Generating Future Forecasts
  Problems

18 Regression-Based Forecasting
  18.1 A Model with Trend
      Linear Trend; Exponential Trend; Polynomial Trend
  18.2 A Model with Seasonality
      Additive vs. Multiplicative Seasonality
  18.3 A Model with Trend and Seasonality
  18.4 Autocorrelation and ARIMA Models
      Computing Autocorrelation; Improving Forecasts by Integrating Autocorrelation Information; Fitting AR Models to Residuals; Evaluating Predictability
  Problems

19 Smoothing and Deep Learning Methods for Forecasting
  19.1 Introduction
  19.2 Moving Average
      Centered Moving Average for Visualization; Trailing Moving Average for Forecasting; Choosing Window Width (w)
  19.3 Simple Exponential Smoothing
      Choosing Smoothing Parameter alpha; Relation Between Moving Average and Simple Exponential Smoothing
  19.4 Advanced Exponential Smoothing
      Series with a Trend; Series with a Trend and Seasonality
  19.5 Deep Learning for Forecasting
  Problems

PART VIII: DATA ANALYTICS

20 Text Mining
  20.1 Introduction
  20.2 The Tabular Representation of Text: Document-Term Matrix and "Bag-of-Words"
  20.3 Bag-of-Words vs. Meaning Extraction at Document Level
  20.4 Preprocessing the Text
      Tokenization; Text Reduction; Presence/Absence vs. Frequency (Occurrences); Term Frequency-Inverse Document Frequency (TF-IDF); From Terms to Topics: Latent Semantic Analysis and Topic Analysis; Extracting Meaning; From Terms to High-Dimensional Word Vectors: Word2Vec
  20.5 Implementing Machine Learning Methods
  20.6 Example: Online Discussions on Autos and Electronics
      Importing the Records; Text Preprocessing in JMP; Using Latent Semantic Analysis and Topic Analysis; Fitting a Predictive Model; Prediction
  20.7 Example: Sentiment Analysis of Movie Reviews
      Data Preparation; Latent Semantic Analysis and Fitting a Predictive Model
  20.8 Summary
  Problems

21 Responsible Data Science
  21.1 Introduction
      Example: Predicting Recidivism
  21.2 Unintentional Harm
  21.3 Legal Considerations
      The General Data Protection Regulation (GDPR); Protected Groups
  21.4 Principles of Responsible Data Science
      Non-maleficence; Fairness; Transparency; Accountability; Data Privacy and Security
  21.5 A Responsible Data Science Framework
      Justification; Assembly; Data Preparation; Modeling; Auditing
  21.6 Documentation Tools
      Impact Statements; Model Cards; Datasheets; Audit Reports
  21.7 Example: Applying the RDS Framework to the COMPAS Example
      Unanticipated Uses; Ethical Concerns; Protected Groups; Data Issues; Fitting the Model; Auditing the Model; Bias Mitigation
  21.8 Summary
  Problems

PART IX: CASES

22 Cases
  22.1 Charles Book Club
      The Book Industry; Database Marketing at Charles; Machine Learning Techniques; Assignment
  22.2 German Credit
      Background; Data; Assignment
  22.3 Tayko Software Cataloger
      Background; The Mailing Experiment; Data; Assignment
  22.4 Political Persuasion
      Background; Predictive Analytics Arrives in US Politics; Political Targeting; Uplift; Data; Assignment
  22.5 Taxi Cancellations
      Business Situation; Assignment
  22.6 Segmenting Consumers of Bath Soap
      Business Situation; Key Problems; Data; Measuring Brand Loyalty; Assignment
  22.7 Catalog Cross-Selling
      Background; Assignment
  22.8 Direct-Mail Fundraising
      Background; Data; Assignment
  22.9 Time Series Case: Forecasting Public Transportation Demand
      Background; Problem Description; Available Data; Assignment Goal; Assignment; Tips and Suggested Steps
  22.10 Loan Approval
      Background; Regulatory Requirements; Getting Started; Assignment

References
Data Files Used in the Book
Index
About the authors: Galit Shmueli, Peter C. Bruce, Mia L. Stephens, Muralidhara Anandamurthy, Nitin R. Patel
Galit Shmueli, PhD, is Distinguished Professor at National Tsing Hua University's Institute of Service Science. She has designed and taught business analytics courses since 2004 at the University of Maryland, Statistics.com, the Indian School of Business, and National Tsing Hua University, Taiwan. Peter C. Bruce is Founder of the Institute for Statistics Education at Statistics.com and Chief Learning Officer at Elder Research, Inc. Mia L. Stephens, MS, is an Advisory Product Manager with JMP, driving the product vision and roadmaps for JMP® and JMP Pro®. Muralidhara Anandamurthy, PhD, is an Academic Ambassador with JMP, overseeing technical support for academic users of JMP Pro®. Nitin R. Patel, PhD, is cofounder and lead researcher at Cytel Inc. He is also a Fellow of the American Statistical Association and has served as a visiting professor at the Massachusetts Institute of Technology and Harvard University, among others.
Bibliographic details
- Authors: Galit Shmueli, Peter C. Bruce, Mia L. Stephens, Muralidhara Anandamurthy, Nitin R. Patel
- 2023, 2nd edition, 608 pages, dimensions: 15 x 25 cm, hardcover, English
- Publisher: Wiley & Sons
- ISBN-10: 1119903831
- ISBN-13: 9781119903833
- Publication date: April 17, 2023