Select and Train a Model

0. Learning Objectives

  • Identify the key challenges in model development.

  • Describe how performance on a small set of disproportionately important examples may be more crucial than performance on the majority of examples.

  • Explain how rare classes in your training data can affect performance.

  • Define three ways of establishing a baseline for your performance.

  • Define structured vs. unstructured data.

  • Identify when to consider deployment constraints when choosing a model.

  • List the steps involved in getting started with ML modeling.

  • Describe the iterative process for error analysis.

  • Identify the key factors in deciding what to prioritize when working to improve model accuracy.

  • Describe methods you might use for data augmentation given audio data vs. image data.

  • Explain the problems you can have training on a highly skewed dataset.

  • Identify a use case in which adding more data to your training dataset could actually hurt performance.

  • Describe the key components of experiment tracking.

1. Key challenges

AI system = Code (Algorithm/Model) + Data

Model development is an iterative process

1.1 Challenges in model development

  • Doing well on training set (Usually measured by average training error)

  • Doing well on dev/test sets

  • Doing well on business metrics/project goals

2. Why low average error isn't good enough

  • Performance on disproportionately important examples (search engine)

    • Informational and transactional queries

      • "Apple pie recipe"

      • "Latest movies"

      • "Wireless data plan"

      • "Diwali festival"

    • Navigational queries

      • "Stanford"

      • "Reddit"

      • "Youtube"

  • Performance on key slices of the dataset

    • ML for loan approval:

      • Make sure not to discriminate by ethnicity, gender, location, language or other protected attributes

    • Product recommendation from retailers

      • Be careful to treat all major user, retailer, and product categories fairly

  • Rare classes

    • Skewed data distribution

    • Report accuracy on rare classes separately, since high overall accuracy can hide them (see the sketch below)
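
A minimal sketch (hypothetical labels, assuming NumPy) of why average accuracy is not enough on a skewed dataset: a model that never predicts the rare class still scores 95% overall while finding none of the rare examples.

```python
# Hypothetical data: class 1 is rare (5% of examples)
import numpy as np

y_true = np.array([0] * 95 + [1] * 5)
y_pred = np.zeros(100, dtype=int)             # a model that always predicts class 0

overall_acc = (y_true == y_pred).mean()       # 0.95 -- looks good on average
rare_acc = (y_pred[y_true == 1] == 1).mean()  # 0.00 -- the rare class is never found

print(f"overall accuracy: {overall_acc:.2f}, rare-class accuracy: {rare_acc:.2f}")
```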

3. Establish a baseline

3.1 Establish a baseline level of performance

  • Human Level Performance (HLP) is often used as the baseline, especially for unstructured data

3.2 Unstructured and structured data

  • Unstructured data (images, audio, text): humans interpret it well, so HLP is a useful baseline

  • Structured data (database records, spreadsheets): humans are less good at these tasks, so HLP is less useful as a baseline

  • Ways to establish a baseline

    • Human level performance (HLP)

    • Literature search for state-of-the-art/ open source

    • Quick-and-dirty implementation

    • Performance of older system

A baseline helps indicate what might be possible. In some cases (such as HLP) it also gives a sense of the irreducible error/Bayes error.
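
As an illustration of the quick-and-dirty option, a hypothetical majority-class predictor (not from the course material) gives a floor that any real model should beat, while HLP suggests a ceiling:

```python
# Hypothetical quick-and-dirty baseline: always predict the most frequent training label
from collections import Counter

def majority_baseline(train_labels, test_labels):
    most_common = Counter(train_labels).most_common(1)[0][0]
    accuracy = sum(y == most_common for y in test_labels) / len(test_labels)
    return most_common, accuracy

label, acc = majority_baseline(["cat", "dog", "cat"], ["cat", "dog", "cat", "cat"])
print(f"majority baseline predicts '{label}' with accuracy {acc:.2f}")
```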

3.3 Tips for getting started

  • Getting started on modeling

    • Literature search to see what's possible (courses, blogs, open-source projects)

    • Find open-source implementation if available

    • A reasonable algorithm with good data will often outperform a great algorithm with not-so-good data

  • Deployment constraints when picking a model

    • Should you take deployment constraints into account when picking a model?

      • Yes, if a baseline is already established and the goal is to build and deploy

      • No, if the purpose is to establish a baseline and determine what is possible and might be worth pursuing

  • Sanity-check for code and algorithm

    • Try to overfit a small training dataset before training on a large one
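
A minimal sanity-check sketch, assuming PyTorch and a hypothetical two-layer classifier: if the model cannot drive the loss toward zero on a handful of examples, there is likely a bug in the code, model, or data pipeline.

```python
# Sanity check: overfit a tiny batch before launching a full training run
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(10, 32), nn.ReLU(), nn.Linear(32, 2))
optimizer = torch.optim.Adam(model.parameters(), lr=1e-2)
loss_fn = nn.CrossEntropyLoss()

x_small = torch.randn(8, 10)         # tiny set of 8 random examples
y_small = torch.randint(0, 2, (8,))

for step in range(200):
    optimizer.zero_grad()
    loss = loss_fn(model(x_small), y_small)
    loss.backward()
    optimizer.step()

print(f"final loss on tiny set: {loss.item():.4f}")  # should approach 0 if the code is correct
```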

4. Error analysis and performance auditing

4.1 Error analysis

Useful metrics for each tag

  • What fraction of errors has that tag?

  • Of all data with that tag, what fraction is misclassified?

  • What fraction of all the data has that tag?

  • How much room for improvement is there on data with that tag?
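
A small sketch of how these per-tag metrics might be computed from manually tagged examples (the tag names and data structure are hypothetical); the last metric, room for improvement, additionally needs a reference such as HLP on that tag.

```python
# Per-tag error-analysis metrics over a hypothetical list of tagged examples
examples = [
    {"tags": {"cafe_noise"}, "correct": False},
    {"tags": {"cafe_noise"}, "correct": True},
    {"tags": {"low_bandwidth"}, "correct": False},
    {"tags": set(), "correct": True},
]

def tag_metrics(examples, tag):
    with_tag = [e for e in examples if tag in e["tags"]]
    errors = [e for e in examples if not e["correct"]]
    errors_with_tag = [e for e in errors if tag in e["tags"]]
    return {
        "fraction_of_errors_with_tag": len(errors_with_tag) / max(len(errors), 1),
        "fraction_of_tag_misclassified": len(errors_with_tag) / max(len(with_tag), 1),
        "fraction_of_data_with_tag": len(with_tag) / len(examples),
    }

print(tag_metrics(examples, "cafe_noise"))
```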

4.2 Prioritizing what to work on

Decide on most important categories to work on based on:

  • How much room for improvement there is

  • How frequently that category appears

  • How easy it is to improve accuracy in that category

  • How important it is to improve in that category
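
One way to weigh the first two factors (a hypothetical sketch with made-up numbers, in the spirit of the course's speech example): multiply each category's gap to HLP by its frequency to estimate how much overall accuracy could improve if that gap were closed.

```python
# Hypothetical categories: (model accuracy, HLP, fraction of data)
categories = {
    "clean_speech":  (0.94, 0.95, 0.60),
    "car_noise":     (0.89, 0.93, 0.04),
    "people_noise":  (0.87, 0.89, 0.30),
    "low_bandwidth": (0.70, 0.70, 0.06),
}

for name, (acc, hlp, freq) in categories.items():
    potential_gain = (hlp - acc) * freq  # rise in overall accuracy if the gap is closed
    print(f"{name:14s} gap={hlp - acc:.2f} freq={freq:.2f} potential gain={potential_gain:.4f}")
```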

4.3 Skewed dataset

  • Confusion matrix: precision = TP / (TP + FP), recall = TP / (TP + FN)

  • Combining precision and recall: F1 = 2 · (precision · recall) / (precision + recall)
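
A small sketch showing how precision, recall, and F1 follow from confusion-matrix counts (the counts below are hypothetical):

```python
# Precision, recall, and F1 from confusion-matrix counts
def precision_recall_f1(tp, fp, fn):
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    return precision, recall, f1

# Hypothetical counts: 10 true positives, 5 false positives, 20 false negatives
p, r, f1 = precision_recall_f1(tp=10, fp=5, fn=20)
print(f"precision={p:.2f} recall={r:.2f} F1={f1:.2f}")
```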

4.4 Performance auditing

  • Auditing framework

    Check for accuracy, fairness/bias, and other problems

    • Brainstorm the ways the system might go wrong

      • Performance on subsets of data (ethnicity, gender)

      • How common certain errors are (false positives, false negatives)

      • Performance on rare class

    • Establish metrics to assess performance against these issues on appropriate slices of data

    • Get business/product owner buy-in

5. Data iteration

5.1 Data-centric AI development

  • Model-centric view

    • Take the data you have, and develop a model that does as well as possible on it

    • Hold the data fixed and iteratively improve the code/model

  • Data-centric view

    • The quality of the data is paramount. Use tools to improve the data quality; this will allow multiple models to do well

5.2 A useful picture of data augmentation

If the model initially does poorly on data with library noise, cafe noise, and food court noise, collecting more data with cafe noise tends to improve performance not only on cafe noise but also on the related, nearby noise types.

5.3 Data augmentation

Goal: Create realistic examples that (1) the algorithm does poorly on, but (2) humans do well on

  • Checklist:

    • Does it sound realistic?

    • Is the x → y mapping clear? (e.g., can humans recognize the speech?)

    • Is the algorithm currently doing poorly on it?
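
For audio, augmentation often means mixing background noise into clean speech; for images, the analogous steps are flips, crops, rotations, or brightness/contrast changes. A minimal sketch for the audio case, assuming NumPy and using random arrays as stand-ins for real waveforms:

```python
import numpy as np

def add_background_noise(speech, noise, snr_db=10.0):
    """Mix noise into speech so the result has roughly the given signal-to-noise ratio (dB)."""
    noise = np.resize(noise, speech.shape)          # repeat/trim noise to match length
    speech_power = np.mean(speech ** 2)
    noise_power = np.mean(noise ** 2) + 1e-12
    scale = np.sqrt(speech_power / (noise_power * 10 ** (snr_db / 10)))
    return speech + scale * noise

# Hypothetical usage: 1 second of "speech" and "cafe noise" at 16 kHz
speech = np.random.randn(16000)
cafe_noise = np.random.randn(16000)
augmented = add_background_noise(speech, cafe_noise, snr_db=10.0)
```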

5.4 Can adding data hurt?

  • For unstructured data problems

    • The model is large (low bias)

    • The mapping x → y is clear (i.e., humans can make accurate predictions from the input)

    ==> Adding data RARELY hurts accuracy

5.5 Adding features

  • Error analysis can be harder if there is no good baseline (such as HLP) to compare to

  • Error analysis, user feedback, and benchmarking against competitors can all provide inspiration for features to add

5.6 Experiment Tracking

  • What to track

    • Algorithm/code versioning

    • Dataset used

    • Hyperparameters

    • Results

  • Tracking tools

    • Text files

    • Spreadsheets

    • Experiment tracking system

  • Desirable features

    • Information needed to replicate results

    • Experiment results, ideally with summary metrics/analysis

    • Perhaps also: Resource monitoring, visualization, model error analysis
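
A lightweight sketch of tracking experiments in a plain CSV file (the field names and helper are hypothetical); a dedicated tracking system would capture the same information plus resource monitoring, visualization, and error analysis:

```python
import csv
import datetime
import subprocess

def log_experiment(path, dataset, hyperparams, results):
    """Append one experiment record: code version, dataset, hyperparameters, results."""
    try:
        commit = subprocess.run(["git", "rev-parse", "--short", "HEAD"],
                                capture_output=True, text=True).stdout.strip() or "unknown"
    except OSError:
        commit = "unknown"
    row = {
        "timestamp": datetime.datetime.now().isoformat(timespec="seconds"),
        "code_version": commit,                       # algorithm/code versioning
        "dataset": dataset,                           # dataset used
        **{f"hp_{k}": v for k, v in hyperparams.items()},
        **results,                                    # summary metrics
    }
    with open(path, "a", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=row.keys())
        if f.tell() == 0:                             # write a header for a new file
            writer.writeheader()
        writer.writerow(row)

log_experiment("experiments.csv", "speech_commands_v3",
               {"lr": 1e-3, "batch_size": 32}, {"dev_accuracy": 0.91})
```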

5.7 From big data to Good data

Try to ensure consistently high-quality data in all phases of the ML project lifecycle

Good data:

  • Covers important cases (Good coverage of input X)

  • Is defined consistently (Definition of labels y is unambiguous)

  • Has timely feedback from production data (distribution covers data drift and concept drift)

  • Is sized appropriately
