Select and Train a Model
0. Learning Objectives
Identify the key challenges in model development.
Describe how performance on a small set of disproportionately important examples may be more crucial than performance on the majority of examples.
Explain how rare classes in your training data can affect performance.
Define three ways of establishing a baseline for your performance.
Define structured vs. unstructured data.
Identify when to consider deployment constraints when choosing a model.
List the steps involved in getting started with ML modeling.
Describe the iterative process for error analysis.
Identify the key factors in deciding what to prioritize when working to improve model accuracy.
Describe methods you might use for data augmentation given audio data vs. image data.
Explain the problems you can have training on a highly skewed dataset.
Identify a use case in which adding more data to your training dataset could actually hurt performance.
Describe the key components of experiment tracking.
1. Key challenges
AI system = Code (Algorithm/Model) + Data

1.1 Challenges in model development
Doing well on training set (Usually measured by average training error)
Doing well on dev/test sets
Doing well on business metrics/project goals
2. Why low average error isn't good enough
Performance on disproportionately important examples (search engine)
Informational and transactional queries
"Apple pie recipe"
"Latest movies"
"Wireless data plan"
"Diwali festival"
Navigational queries
"Stanford"
"Reddit"
"Youtube"
Performance on key slices of the dataset
ML for loan approval:
Make sure not to discriminate by ethnicity, gender, location, language or other protected attributes
Product recommendation from retailers
Be careful to treat all major user, retailer, and product categories fairly
Rare classes
Skewed data distribution
Report accuracy on rare classes separately; a low average error can hide failures there (see the sketch below)
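A minimal sketch of why raw accuracy misleads on skewed data (class balance and counts are made up for illustration): with 0.5% positives, a model that always predicts "negative" still scores 99.5% accuracy.

```python
import numpy as np

y_true = np.zeros(10_000, dtype=int)
y_true[:50] = 1                        # 0.5% rare positive class

y_pred = np.zeros(10_000, dtype=int)   # degenerate "always negative" model

accuracy = (y_pred == y_true).mean()
print(f"accuracy = {accuracy:.3f}")    # 0.995, yet recall on the rare class is 0.0
```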
3. Establish a baseline
3.1 Establish a baseline level of performance
Human Level Performance (HLP) is often a good baseline, especially for unstructured data

3.2 Unstructured and structured data
Unstructured data (images, audio, text): humans are good at these tasks, so HLP is a useful baseline
Structured data (database records, transactions): harder for humans to interpret, so HLP is usually a less useful baseline
Ways to establish a baseline
Human level performance (HLP)
Literature search for state-of-the-art/ open source
Quick-and-dirty implementation
Performance of older system
A baseline helps indicate what might be possible. In some cases (such as HLP) it also gives a sense of the irreducible error/Bayes error.
3.3 Tips for getting started
Getting started on modeling
Literature search to see what's possible (courses, blogs, open-source projects)
Find open-source implementation if available
A reasonable algorithm with good data will often outperform a great algorithm with not-so-good data
Deployment constraints when picking a model
Should you take deployment constraints into account when picking a model?
Yes, if baseline is already established and goal is to build and deploy
No, if the purpose is to establish a baseline and determine what is possible and might be worth pursuing
Sanity-check for code and algorithm
Try to overfit a small training dataset before training on a large one
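A quick version of this sanity check, sketched in PyTorch with made-up data: confirm the model can drive training loss to near zero on a handful of examples before launching a full run.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
X = torch.randn(8, 20)                 # 8 examples, 20 features (made-up data)
y = torch.randint(0, 2, (8,)).float()  # random binary labels

model = nn.Sequential(nn.Linear(20, 32), nn.ReLU(), nn.Linear(32, 1))
opt = torch.optim.Adam(model.parameters(), lr=1e-2)
loss_fn = nn.BCEWithLogitsLoss()

for _ in range(500):                   # tiny dataset, so full-batch steps
    opt.zero_grad()
    loss = loss_fn(model(X).squeeze(1), y)
    loss.backward()
    opt.step()

print(f"final training loss: {loss.item():.4f}")
# Expect ~0; if the loss plateaus, suspect a bug in the model, loss, or data pipeline.
```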
4. Error analysis and performance auditing
4.1 Error analysis

Useful metrics for each tag
What fraction of errors has that tag?
Of all data with that tag, what fraction is misclassified?
What fraction of all the data has that tag?
How much room for improvement is there on data with that tag?
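A sketch of computing the first three metrics with pandas, assuming a hand-labeled error-analysis table with one row per example, a boolean `misclassified` column, and one boolean column per tag (all names and values here are illustrative):

```python
import pandas as pd

# One row per dev-set example; `misclassified` and tag columns are booleans
df = pd.DataFrame({
    "misclassified": [True, False, True, True, False, False],
    "cafe_noise":    [True, True, False, True, False, False],
})

tag = "cafe_noise"
errors = df[df["misclassified"]]

frac_of_errors_with_tag   = errors[tag].mean()                       # share of all errors carrying the tag
frac_of_tag_misclassified = df.loc[df[tag], "misclassified"].mean()  # error rate within the tag
frac_of_data_with_tag     = df[tag].mean()                           # how common the tag is overall
print(frac_of_errors_with_tag, frac_of_tag_misclassified, frac_of_data_with_tag)
```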
4.2 Prioritizing what to work on

Decide on most important categories to work on based on:
How much room for improvement there is
How frequently that category appears
How easy it is to improve accuracy in that category
How important it is to improve in that category
4.3 Skewed dataset
Confusion matrix: Precision and Recall
Combining precision and recall - F1 score
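A minimal sketch of the three metrics computed from binary confusion-matrix counts (the counts are made-up numbers):

```python
# Counts from a binary confusion matrix (made-up numbers)
tp, fp, fn = 30, 10, 20

precision = tp / (tp + fp)   # of predicted positives, how many were right
recall    = tp / (tp + fn)   # of actual positives, how many were caught
f1 = 2 * precision * recall / (precision + recall)   # harmonic mean

print(f"P={precision:.2f}  R={recall:.2f}  F1={f1:.2f}")   # P=0.75  R=0.60  F1=0.67
```

The harmonic mean means F1 stays low unless both precision and recall are reasonably high, which is why it is preferred over raw accuracy on skewed datasets.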
4.4 Performance auditing
Auditing framework
Check for accuracy, fairness/bias, and other problems
Brainstorm the ways the system might go wrong
Performance on subsets of data (ethnicity, gender)
How common are certain errors (false positives, false negatives)
Performance on rare class
Establish metrics to assess performance against these issues on appropriate slices of data (see the sketch after this list)
Get business/product owner buy-in
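A sketch of the per-slice metrics step, assuming a DataFrame with a `group` column standing in for the attribute being audited and prediction/label columns (all names and values are illustrative):

```python
import pandas as pd

# One row per example; `group` stands in for the attribute being audited
df = pd.DataFrame({
    "group": ["A", "A", "B", "B", "B", "A"],
    "label": [1, 0, 1, 1, 0, 1],
    "pred":  [1, 0, 0, 1, 1, 1],
})

per_slice_accuracy = (
    df.assign(correct=df["pred"] == df["label"])
      .groupby("group")["correct"]
      .mean()
)
print(per_slice_accuracy)   # flag any slice whose accuracy lags the overall rate
```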
5. Data iteration
5.1 Data-centric AI development
Model-centric view
Take the data you have, and develop a model that does as well as possible on it
Hold the data fixed and iteratively improve the code/model
Data-centric view
The quality of the data is paramount. Use tools to improve the data quality; this will allow multiple models to do well
5.2 A useful picture of data augmentation

Example (speech recognition): initially the model performs poorly on audio with library, cafe, and food-court background noise. Collecting more data with cafe noise then improves performance not only on cafe noise but also on the related noise types nearby.
5.3 Data augmentation
Goal: Create realistic examples that (1) the algorithm does poorly on, but (2) humans do well on
Checklist:
Does it sound realistic?
Is the x → y mapping clear? (i.e., can humans still recognize the speech?)
Is the algorithm currently doing poorly on it?
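A sketch of one common audio augmentation, mixing background noise into clean speech at a controlled signal-to-noise ratio; `speech` and `cafe_noise` stand in for real recordings at the same sample rate (all names and the SNR value are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
speech     = rng.standard_normal(16_000)   # stand-in for 1 s of speech at 16 kHz
cafe_noise = rng.standard_normal(16_000)   # stand-in for background noise

def mix_noise(clean, noise, snr_db=10.0):
    """Overlay background noise on clean speech at a target signal-to-noise ratio (dB)."""
    clean_power = np.mean(clean ** 2)
    noise_power = np.mean(noise ** 2)
    scale = np.sqrt(clean_power / (noise_power * 10 ** (snr_db / 10)))
    return clean + scale * noise

augmented = mix_noise(speech, cafe_noise, snr_db=10.0)   # new (x, y) pair keeps the original transcript
```

For image data, the analogous techniques are geometric and photometric transforms (flips, rotations, brightness/contrast changes), subject to the same checklist.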
5.4 Can adding data hurt?
For unstructured data problems, when:
The model is large (low bias), and
The mapping x → y is clear (humans can make accurate predictions from the input)
==> Adding data RARELY hurts accuracy
Counterexample: with a smaller model or an ambiguous mapping (e.g., "1" vs. "I" in photo OCR), adding lots of one kind of data can skew the model and hurt performance
5.5 Adding features
Error analysis can be harder if there is no good baseline (such as HLP) to compare to
Error analysis, user feedback, and benchmarking against competitors can all provide inspiration for features to add
5.6 Experiment Tracking
What to track
Algorithm/code versioning
Dataset used
Hyperparameters
Results
Tracking tools
Text files (see the sketch below)
Spreadsheets
Experiment tracking system
Desirable features
Information needed to replicate results
Experiment results, ideally with summary metrics/analysis
Perhaps also: Resource monitoring, visualization, model error analysis
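As a starting point before adopting a dedicated tracking system, a sketch of the text-file approach: append one JSON record per run capturing the four items above (the dataset path, hyperparameters, and metrics are made-up examples):

```python
import json
import subprocess
import time

# Must run inside a git repository for the commit lookup to succeed
record = {
    "timestamp": time.strftime("%Y-%m-%dT%H:%M:%S"),
    "code_version": subprocess.check_output(
        ["git", "rev-parse", "HEAD"], text=True
    ).strip(),
    "dataset": "data/train_v3.csv",                     # illustrative path
    "hyperparameters": {"lr": 1e-3, "batch_size": 32, "epochs": 10},
    "results": {"dev_accuracy": 0.91, "dev_f1": 0.87},  # made-up metrics
}

with open("experiments.jsonl", "a") as f:
    f.write(json.dumps(record) + "\n")
```

One append-only line per experiment keeps results replicable and easy to grep or load into a spreadsheet later.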
5.7 From big data to good data
Try to ensure consistently high-quality data in all phases of the ML project lifecycle
Good data:
Covers important cases (Good coverage of input X)
Is defined consistently (Definition of labels y is unambiguous)
Has timely feedback from production data (distribution covers data drift and concept drift)
Is sized appropriately