Introduction to Data Mining, Machine Learning, and Data-Driven Predictions for Materials Science and Engineering

Below is the book’s table of contents,
including abstracts for all chapters:

- Part I:
- Introduction and Foundations

- 1.1
- Where do all the Numbers Come From?
- 1.2
- The Ancient Roots of Data Science
- 1.3
- The First True Data Scientist
- 1.4
- The More Recent Roots of Data Science
- 1.5
- Data Science and Machine Learning in the 20th Century
- 1.6
- Summary and Conclusion
- 1.7
- References

- 2.1
- What is Data Science, and how is it related to Machine Learning and AI?
- 2.2
- Data Science and Machine Learning in Materials Science and Engineering
- 2.3
- From Data and Information to Knowledge
- 2.4
- The Curse of Dimensionality
- 2.5
- Summary and Conclusion
- 2.6
- References

- 3.1
- General Conventions and Notations
- 3.2
- Sets, Tuples, Vectors and Arrays
- 3.3
- Representation of Data in Statistics and Machine Learning
- 3.4
- Summary and Conclusion
- 3.5
- References

- 4.1
- Introduction
- 4.2
- Dataset MDS-1: Tensile Test with Parameter Uncertainties
- 4.3
- Dataset MDS-2: Microstructure Evolution with the Ising Model
- 4.4
- Dataset MDS-3: Cahn-Hilliard Model
- 4.5
- Dataset MDS-4: Properties of Chemical Elements
- 4.6
- Dataset MDS-5: Nanoindentation of a Cu-Cr Composite
- 4.7
- Dataset DS-1: The Iris Flower Dataset
- 4.8
- Dataset DS-2: The Handwritten Digits Dataset
- 4.9
- Online Resource for Obtaining Training Data
- 4.10
- References

- Part II:
- A Primer on Probabilities, Distributions, and Statistics

- 5.1
- Combinatorics
- 5.2
- Probabilities
- 5.3
- Conditional Probabilities, Product Rule, and Bayes’ Theorem
- 5.4
- Summary
- 5.5
- Exercises
- 5.6
- References

- 6.1
- Random Variables
- 6.2
- Introduction of Probability Functions
- 6.3
- Discrete Probability Distributions
- 6.4
- Continuous Probability Distributions
- 6.5
- Multivariate Discrete and Continuous Distributions
- 6.6
- Bivariate Distributions as a Special Case
- 6.7
- Summary

- 7.1
- Expected Values of Discrete Random Variables
- 7.2
- Variance and Standard Deviation
- 7.3
- Raw Moments
- 7.4
- Central Moments
- 7.5
- Standardized Moments
- 7.6
- Exercises

- 8.1
- And What Now Is Statistics?
- 8.2
- The Sample and the Population
- 8.3
- Two Flavors: Descriptive and Inferential Statistics
- 8.4
- Sampling Strategies
- 8.5
- The Law of Large and Truly Large Numbers
- 8.6
- Central Limit Theorem
- 8.7
- Relations between Multivariate Variables: Covariance and Correlation
- 8.8
- Exercises
- 8.9
- References

- 9.1
- The Why, the When, and the How
- 9.2
- Two Preliminary Steps
- 9.3
- Descriptive Statistics
- 9.4
- Data Visualization
- 9.5
- Exercises
- 9.6
- References

- 10.1
- About the Following Discrete and Continuous Distributions
- 10.2
- Discrete Uniform Distribution
- 10.3
- Bernoulli Distribution
- 10.4
- Binomial Distribution
- 10.5
- Geometric Distribution
- 10.6
- Poisson Distribution
- 10.7
- Normal Distribution
- 10.8
- Bivariate Normal Distribution
- 10.9
- Multivariate Normal Distribution
- 10.10
- The Relation between Covariance Matrix and Multivariate Normal Distribution
- 10.11
- Lognormal Distribution
- 10.12
- Exponential Distribution
- 10.13
- Logistic Distribution
- 10.14
- References

- Part III:
- Classical Machine Learning

- 11.1
- The Definition(s) of Machine Learning
- 11.2
- How and what do machines learn?
- 11.3
- Introduction of the General Machine Learning Workflow
- 11.4
- Data Collection
- 11.5
- Data Preprocessing
- 11.6
- A Taxonomy of Machine Learning Models
- 11.7
- Error Measures for Numerical Data
- 11.8
- Similarity Measures for Classification Problems
- 11.9
- Exercises
- 11.10
- References

- 12.1
- The Roots of Regression Analysis
- 12.2
- General Concepts and Important Terminology
- 12.3
- Simple Linear Regression
- 12.4
- Computational Aspects of Vectorization
- 12.5
- A Worked Example of Simple Linear Regression
- 12.6
- Multiple Linear Regression Models
- 12.7
- Exercises
- 12.8
- References

- 13.1
- Non-linear Model Behavior with Linear Regression
- 13.2
- Generalized Formulations and Vectorization for Multiple Linear Regression
- 13.3
- Generalized Formulation of Linear Regression With Basis Functions
- 13.4
- Formulation of Chosen Cases in Terms of Basis Functions
- 13.5
- Semi- and Non-Parametric Regression
- 13.6
- Further Nonlinear Regression Models
- 13.7
- Summary and Conclusion
- 13.8
- Exercises
- 13.9
- References

- 14.1
- Introduction to Supervised Classification
- 14.2
- Rule-Based Classification Methods and Decision Trees
- 14.3
- Notions and Concepts for Classification Problems
- 14.4
- Nearest Neighbor Classifier
- 14.5
- Gaussian Naive Bayes Model
- 14.6
- Support Vector Machines
- 14.7
- Exercises
- 14.8
- References

- 15.1
- Introduction to Dimensionality Reduction
- 15.2
- Principal Component Analysis: Theoretical Background and Derivations
- 15.3
- Application Aspects and Examples of PCA
- 15.4
- Further Methods for Dimensionality Reduction
- 15.5
- Clustering
- 15.6
- Materials Science Examples
- 15.7
- Exercises
- 15.8
- References

- 16.1
- Feature Engineering and Feature Importance
- 16.2
- Data Splitting, Cross Validation, and Statistical Resampling
- 16.3
- Baseline Models
- 16.4
- References

- Part IV:
- Artificial Neural Networks and Deep Learning

- 17.1
- A First Model of a Neuron
- 17.2
- The Rosenblatt Perceptron
- 17.3
- The ADALINE Model
- 17.4
- Increasing the Complexity: Assemblies of Neurons
- 17.5
- Summary and Historical Remarks
- 17.6
- Exercises
- 17.7
- References

- 18.1
- Overview of the Historical Developments
- 18.2
- Activation Functions
- 18.3
- Backpropagation – Introduction and Example
- 18.4
- General Formulation of Backpropagation
- 18.5
- Python Implementation and Example for the Fully Connected Network
- 18.6
- Further Concepts and Techniques
- 18.7
- Less Is More: The Concept of Dropout
- 18.8
- Example: Microstructure Classification and Property Prediction
- 18.9
- Exercises
- 18.10
- References

- 19.1
- Convolutional Neural Networks
- 19.2
- Deep Learning Techniques
- 19.3
- Two Examples for Deep Learning in Microscopy
- 19.4
- Autoencoder – or: How to Learn With Networks Without Supervision
- 19.5
- Generative Adversarial Networks – or: How to Create Data?
- 19.6
- Physics Informed Machine Learning and Beyond
- 19.7
- Summary and Conclusion
- 19.8
- Exercises
- 19.9
- References

- Part V:
- Supplementary Material and Appendix

- A.1
- Vector Calculus
- A.2
- Matrices and Matrix Operations
- A.3
- Derived Properties, Further Theorems, and Advanced Definitions
- A.4
- Matrix-Related Operations That Only Exist in NumPy

- B.1
- Proofs of the Theorems about Expectation Values
- B.2
- Proofs of Some Theorems about Variances
- B.3
- Simplification of Pearson Moment Coefficient of Skewness
- B.4
- Proofs and Additional Derivations for Distributions
- B.5
- References