Select and Train a Model

0. Learning Objectives

  • Identify the key challenges in model development.

  • Describe how performance on a small set of disproportionately important examples may be more crucial than performance on the majority of examples.

  • Explain how rare classes in your training data can affect performance.

  • Define three ways of establishing a baseline for your performance.

  • Define structured vs. unstructured data.

  • Identify when to consider deployment constraints when choosing a model.

  • List the steps involved in getting started with ML modeling.

  • Describe the iterative process for error analysis.

  • Identify the key factors in deciding what to prioritize when working to improve model accuracy.

  • Describe methods you might use for data augmentation given audio data vs. image data.

  • Explain the problems you can have training on a highly skewed dataset.

  • Identify a use case in which adding more data to your training dataset could actually hurt performance.

  • Describe the key components of experiment tracking.

1. Key challenges

AI system = Code (Algorithm/Model) + Data

Model development is an iterative process

1.1 Challenges in model development

  • Doing well on training set (Usually measured by average training error)

  • Doing well on dev/test sets

  • Doing well on business metrics/project goals

2. Why low average error isn't good enough

  • Performance on disproportionately important examples (search engine)

    • Informational and transactional queries

      • "Apple pie recipe"

      • "Latest movies"

      • "Wireless data plan"

      • "Diwali festival"

    • Navigational queries

      • "Stanford"

      • "Reddit"

      • "Youtube"

  • Performance on key slices of the dataset

    • ML for loan approval:

      • Make sure not to discriminate by ethnicity, gender, location, language or other protected attributes

    • Product recommendation from retailers

      • Be careful to treat all major user, retailer, and product categories fairly

  • Rare classes

    • Skewed data distribution

    • Report accuracy on rare classes separately, since high overall accuracy can hide them (see the sketch below)
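
A minimal sketch (hypothetical labels, assuming NumPy) of why average accuracy is not enough on a skewed dataset: a model that never predicts the rare class still scores 95% overall while finding none of the rare examples.

```python
# Hypothetical data: class 1 is rare (5% of examples)
import numpy as np

y_true = np.array([0] * 95 + [1] * 5)
y_pred = np.zeros(100, dtype=int)             # a model that always predicts class 0

overall_acc = (y_true == y_pred).mean()       # 0.95 -- looks good on average
rare_acc = (y_pred[y_true == 1] == 1).mean()  # 0.00 -- the rare class is never found

print(f"overall accuracy: {overall_acc:.2f}, rare-class accuracy: {rare_acc:.2f}")
```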

3. Establish a baseline

3.1 Establish a baseline level of performance

  • Human Level Performance (HLP) is often used as the baseline, especially for unstructured data

3.2 Unstructured and structured data

  • Unstructured data (images, audio, text): humans interpret it well, so HLP is a useful baseline

  • Structured data (database records, spreadsheets): humans are less good at these tasks, so HLP is less useful as a baseline

  • Ways to establish a baseline

    • Human level performance (HLP)

    • Literature search for state-of-the-art/ open source

    • Quick-and-dirty implementation

    • Performance of older system

A baseline helps indicate what might be possible. In some cases (such as HLP) it also gives a sense of the irreducible error/Bayes error.
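
As an illustration of the quick-and-dirty option, a hypothetical majority-class predictor (not from the course material) gives a floor that any real model should beat, while HLP suggests a ceiling:

```python
# Hypothetical quick-and-dirty baseline: always predict the most frequent training label
from collections import Counter

def majority_baseline(train_labels, test_labels):
    most_common = Counter(train_labels).most_common(1)[0][0]
    accuracy = sum(y == most_common for y in test_labels) / len(test_labels)
    return most_common, accuracy

label, acc = majority_baseline(["cat", "dog", "cat"], ["cat", "dog", "cat", "cat"])
print(f"majority baseline predicts '{label}' with accuracy {acc:.2f}")
```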

3.3 Tips for getting started

  • Getting started on modeling

    • Literature search to see what's possible (courses, blogs, open-source projects)

    • Find open-source implementation if available

    • A reasonable algorithm with good data will often outperform a great algorithm with not-so-good data

  • Deployment constraints when picking a model

    • Should you take deployment constraints into account when picking a model?

      • Yes, if a baseline is already established and the goal is to build and deploy

      • No, if the purpose is to establish a baseline and determine what is possible and might be worth pursuing

  • Sanity-check for code and algorithm

    • Try to overfit a small training dataset before training on a large one
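
A minimal sanity-check sketch, assuming PyTorch and a hypothetical two-layer classifier: if the model cannot drive the loss toward zero on a handful of examples, there is likely a bug in the code, model, or data pipeline.

```python
# Sanity check: overfit a tiny batch before launching a full training run
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(10, 32), nn.ReLU(), nn.Linear(32, 2))
optimizer = torch.optim.Adam(model.parameters(), lr=1e-2)
loss_fn = nn.CrossEntropyLoss()

x_small = torch.randn(8, 10)         # tiny set of 8 random examples
y_small = torch.randint(0, 2, (8,))

for step in range(200):
    optimizer.zero_grad()
    loss = loss_fn(model(x_small), y_small)
    loss.backward()
    optimizer.step()

print(f"final loss on tiny set: {loss.item():.4f}")  # should approach 0 if the code is correct
```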

4. Error analysis and performance auditing

4.1 Error analysis

Useful metrics for each tag

  • What fraction of errors has that tag?

  • Of all data with that tag, what fraction is misclassified?

  • What fraction of all the data has that tag?

  • How much room for improvement is there on data with that tag?
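
A small sketch of how these per-tag metrics might be computed from manually tagged examples (the tag names and data structure are hypothetical); the last metric, room for improvement, additionally needs a reference such as HLP on that tag.

```python
# Per-tag error-analysis metrics over a hypothetical list of tagged examples
examples = [
    {"tags": {"cafe_noise"}, "correct": False},
    {"tags": {"cafe_noise"}, "correct": True},
    {"tags": {"low_bandwidth"}, "correct": False},
    {"tags": set(), "correct": True},
]

def tag_metrics(examples, tag):
    with_tag = [e for e in examples if tag in e["tags"]]
    errors = [e for e in examples if not e["correct"]]
    errors_with_tag = [e for e in errors if tag in e["tags"]]
    return {
        "fraction_of_errors_with_tag": len(errors_with_tag) / max(len(errors), 1),
        "fraction_of_tag_misclassified": len(errors_with_tag) / max(len(with_tag), 1),
        "fraction_of_data_with_tag": len(with_tag) / len(examples),
    }

print(tag_metrics(examples, "cafe_noise"))
```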

4.2 Prioritizing what to work on

Decide on most important categories to work on based on:

  • How much room for improvement there is

  • How frequently that category appears

  • How easy it is to improve accuracy in that category

  • How important it is to improve in that category
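
One way to weigh the first two factors (a hypothetical sketch with made-up numbers, in the spirit of the course's speech example): multiply each category's gap to HLP by its frequency to estimate how much overall accuracy could improve if that gap were closed.

```python
# Hypothetical categories: (model accuracy, HLP, fraction of data)
categories = {
    "clean_speech":  (0.94, 0.95, 0.60),
    "car_noise":     (0.89, 0.93, 0.04),
    "people_noise":  (0.87, 0.89, 0.30),
    "low_bandwidth": (0.70, 0.70, 0.06),
}

for name, (acc, hlp, freq) in categories.items():
    potential_gain = (hlp - acc) * freq  # rise in overall accuracy if the gap is closed
    print(f"{name:14s} gap={hlp - acc:.2f} freq={freq:.2f} potential gain={potential_gain:.4f}")
```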

4.3 Skewed dataset

  • Confusion matrix: precision = TP / (TP + FP), recall = TP / (TP + FN)

  • Combining precision and recall: F1 = 2 · (precision · recall) / (precision + recall)
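
A small sketch showing how precision, recall, and F1 follow from confusion-matrix counts (the counts below are hypothetical):

```python
# Precision, recall, and F1 from confusion-matrix counts
def precision_recall_f1(tp, fp, fn):
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    return precision, recall, f1

# Hypothetical counts: 10 true positives, 5 false positives, 20 false negatives
p, r, f1 = precision_recall_f1(tp=10, fp=5, fn=20)
print(f"precision={p:.2f} recall={r:.2f} F1={f1:.2f}")
```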

4.4 Performance auditing

  • Auditing framework

    Check for accuracy, fairness/bias, and other problems

    • Brainstorm the ways the system might go wrong

      • Performance on subsets of data (ethnicity, gender)

      • How common certain errors are (false positives, false negatives)

      • Performance on rare class

    • Establish metrics to assess performance against these issues on appropriate slices of data

    • Get business/product owner buy-in

5. Data iteration

5.1 Data-centric AI development

  • Model-centric view

    • Take the data you have, and develop a model that does as well as possible on it

    • Hold the data fixed and iteratively improve the code/model

  • Data-centric view

    • The quality of the data is paramount. Use tools to improve the data quality; this will allow multiple models to do well

5.2 A useful picture of data augmentation

If the model initially does poorly on data with library noise, cafe noise, and food court noise, collecting more data with cafe noise tends to improve performance not only on cafe noise but also on the related, nearby noise types.

5.3 Data augmentation

Goal: Create realistic examples that (1) the algorithm does poorly on, but (2) humans do well on

  • Checklist:

    • Does it sound realistic?

    • Is the x → y mapping clear? (e.g., can humans recognize the speech?)

    • Is the algorithm currently doing poorly on it?
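
For audio, augmentation often means mixing background noise into clean speech; for images, the analogous steps are flips, crops, rotations, or brightness/contrast changes. A minimal sketch for the audio case, assuming NumPy and using random arrays as stand-ins for real waveforms:

```python
import numpy as np

def add_background_noise(speech, noise, snr_db=10.0):
    """Mix noise into speech so the result has roughly the given signal-to-noise ratio (dB)."""
    noise = np.resize(noise, speech.shape)          # repeat/trim noise to match length
    speech_power = np.mean(speech ** 2)
    noise_power = np.mean(noise ** 2) + 1e-12
    scale = np.sqrt(speech_power / (noise_power * 10 ** (snr_db / 10)))
    return speech + scale * noise

# Hypothetical usage: 1 second of "speech" and "cafe noise" at 16 kHz
speech = np.random.randn(16000)
cafe_noise = np.random.randn(16000)
augmented = add_background_noise(speech, cafe_noise, snr_db=10.0)
```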

5.4 Can adding data hurt?

  • For unstructured data problems

    • The model is large (low bias)

    • The mapping x → y is clear (i.e., humans can make accurate predictions from the input)

    ==> Adding data RARELY hurts accuracy

5.5 Adding features

  • Error analysis can be harder if there is no good baseline (such as HLP) to compare to

  • Error analysis, user feedback, and benchmarking against competitors can all provide inspiration for features to add

5.6 Experiment Tracking

  • What to track

    • Algorithm/code versioning

    • Dataset used

    • Hyperparameters

    • Results

  • Tracking tools

    • Text files

    • Spreadsheets

    • Experiment tracking system

  • Desirable features

    • Information needed to replicate results

    • Experiment results, ideally with summary metrics/analysis

    • Perhaps also: Resource monitoring, visualization, model error analysis
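
A lightweight sketch of tracking experiments in a plain CSV file (the field names and helper are hypothetical); a dedicated tracking system would capture the same information plus resource monitoring, visualization, and error analysis:

```python
import csv
import datetime
import subprocess

def log_experiment(path, dataset, hyperparams, results):
    """Append one experiment record: code version, dataset, hyperparameters, results."""
    try:
        commit = subprocess.run(["git", "rev-parse", "--short", "HEAD"],
                                capture_output=True, text=True).stdout.strip() or "unknown"
    except OSError:
        commit = "unknown"
    row = {
        "timestamp": datetime.datetime.now().isoformat(timespec="seconds"),
        "code_version": commit,                       # algorithm/code versioning
        "dataset": dataset,                           # dataset used
        **{f"hp_{k}": v for k, v in hyperparams.items()},
        **results,                                    # summary metrics
    }
    with open(path, "a", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=row.keys())
        if f.tell() == 0:                             # write a header for a new file
            writer.writeheader()
        writer.writerow(row)

log_experiment("experiments.csv", "speech_commands_v3",
               {"lr": 1e-3, "batch_size": 32}, {"dev_accuracy": 0.91})
```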

5.7 From big data to Good data

Try to ensure consistently high-quality data in all phases of the ML project lifecycle

Good data:

  • Covers important cases (Good coverage of input X)

  • Is defined consistently (Definition of labels y is unambiguous)

  • Has timely feedback from production data (distribution covers data drift and concept drift)

  • Is sized appropriately
