Feature Engineering, Transformation and Selection

0. Learning Objectives

  • Define a set of feature engineering techniques, such as scaling and binning

  • Use TensorFlow Transform for a simple preprocessing and data transformation task

  • Describe feature space coverage and implement different feature selection methods

  • Perform feature selection using scikit-learn routines and ensure feature space coverage

1. Feature Engineering

1.1 Introduction to Preprocessing

  • Squeezing the most out of data

    • Making data useful before training a model

    • Representing data in forms that help models learn

    • Increasing predictive quality

    • Reducing dimensionality with feature engineering

  • Art of feature engineering

  • Feature engineering during training must also be applied correctly during serving

1.2 Preprocessing Operations

  • Main preprocessing operations

    • Data cleansing

    • Feature tuning

    • Representation transformation

    • Feature extraction

    • Feature construction

  • Mapping raw data into features

    • Mapping numerical values

    • Mapping categorical values

      • Categorical vocabulary

  • Empirical knowledge of data

    • Text:

      • Stemming, lemmatization, TF-IDF, n-grams, embedding lookup

    • Images:

      • clipping, resizing, cropping, blur, Canny filters, Sobel filters, photometric distortions

Key points

  • Feature Engineering maps:

    • Raw data into feature vectors

    • Integer values to floating-point values

    • Numerical values to normalized values

    • Strings and categorical values to vectors of numeric values (see the sketch after this list)

    • Data from one space into a different space
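
A minimal sketch of the last two mappings, assuming TensorFlow 2.x: a small hand-written vocabulary is fed to tf.keras.layers.StringLookup to turn strings into one-hot numeric vectors (the feature values and vocabulary are only illustrative).

```python
import tensorflow as tf

# Hypothetical categorical feature with a small, known vocabulary.
colors = tf.constant(["red", "green", "blue", "green"])

# Map each string to an integer id via a vocabulary lookup,
# then encode that id as a one-hot vector of numeric values.
lookup = tf.keras.layers.StringLookup(
    vocabulary=["red", "green", "blue"], output_mode="one_hot")

one_hot = lookup(colors)  # shape (4, 4): one extra slot is reserved for OOV tokens
print(one_hot.numpy())
```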

1.3 Feature Engineering Techniques

  • Feature Scaling

    • Converts values from their natural range into a prescribed range

    • Benefits:

      • Helps neural nets converge faster

      • Helps avoid NaN errors during training

      • For each feature, the model learns the right weights

  • Normalization and Standardization (see the sketch at the end of this section)

    • Normalization: $X_{norm} = \frac{X - X_{min}}{X_{max} - X_{min}}$, so that $X_{norm} \in [0, 1]$

    • Standardization (z-score): $X_{std} = \frac{X - \mu}{\sigma}$, which centers values at 0 with unit standard deviation

  • Bucketizing / Binning

  • Other techniques

    • Dimensionality reduction in embeddings

      • Principal Component Analysis (PCA)

      • t-Distributed Stochastic Neighbor Embedding (t-SNE)

      • Uniform Manifold Approximation and Projection (UMAP)

    • Feature crossing

  • TensorFlow embedding projector

    • Intuitive exploration of high-dimensional data

    • Visualize & analyze

    • Techniques

      • PCA

      • t-SNE

      • UMAP

      • Custom linear projection
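
A minimal sketch of the scaling, binning, and dimensionality-reduction techniques above using scikit-learn; the tiny array and the parameter choices (2 bins, 1 principal component) are only illustrative.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler, StandardScaler, KBinsDiscretizer
from sklearn.decomposition import PCA

X = np.array([[1.0, 200.0],
              [2.0, 300.0],
              [3.0, 400.0],
              [4.0, 900.0]])

# Normalization: rescale each feature to the [0, 1] range.
X_norm = MinMaxScaler().fit_transform(X)

# Standardization (z-score): zero mean and unit variance per feature.
X_std = StandardScaler().fit_transform(X)

# Bucketizing / binning: map each feature into 2 quantile-based buckets.
X_binned = KBinsDiscretizer(n_bins=2, encode="ordinal",
                            strategy="quantile").fit_transform(X)

# Dimensionality reduction: project onto the first principal component.
X_pca = PCA(n_components=1).fit_transform(X)
```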

1.4 Feature Crosses

  • Combines multiple features together into a new feature

  • Encodes nonlinearity in the feature space, or encodes the same information in fewer features

  • Methods for feature crosses

    • [AxB]: Multiplying the values of two features

    • [AxBxCxDxE]: Multiplying multiple features

    • [Day of week, Hour] ==> [Hour of week] (see the sketch below)
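
A minimal sketch of the [Day of week, Hour] cross in plain Python; treating Monday as day 0 is an assumption made only for the example.

```python
# Cross two categorical features into a single synthetic feature.
# day_of_week in [0, 6] and hour in [0, 23] cross into hour_of_week in [0, 167].
def hour_of_week(day_of_week: int, hour: int) -> int:
    return day_of_week * 24 + hour

assert hour_of_week(0, 0) == 0      # Monday, 00:00 (assuming Monday = 0)
assert hour_of_week(6, 23) == 167   # last hour of the week
```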

Key points

  • Feature crossing: synthetic feature encoding nonlinearity in feature space

  • Feature coding: transforming a categorical feature into a continuous variable

2. Feature Transformation at Scale

2.1 Preprocessing Data at Scale

  • Inconsistencies in feature engineering

    • Training and serving code path are different

    • Diverse deployment scenarios

      • Mobile (TensorFlow Lite)

      • Server (TensorFlow Serving)

      • Web (TensorFlow.js)

    • Risks of introducing training-serving skews

    • Skews will lower the performance of your serving model

  • Preprocessing granularity

    • Transformations

      • Instance-level

        • clipping

        • Multiplying

        • Expanding features

      • Full-pass

        • Min-max scaling

        • Standard scaling

        • Bucketizing

    • When to transform?

      • Pre-processing training dataset

        • Pros:

          • Run-once

          • Compute on entire dataset

        • Cons:

          • Transformations must be reproduced at serving time

          • Slower iterations

      • Transforming within the model

        • Pros:

          • Easy iterations

          • Transformation guarantees

        • Cons:

          • Expensive transforms

          • Long model latency

          • Transformations per batch: skew

      • Why transform per batch?

        • For example, normalizing features by their average

        • Access to a single batch of data, not the full dataset

        • Ways to normalize per batch

          • Normalize by average within a batch

          • Precompute the average and reuse it during normalization (see the sketch at the end of this section)

  • Optimizing instance-level transformations

    • Indirectly affect training efficiency

    • Accelerators typically sit idle while the CPU transforms

    • Solution:

      • Prefetching transforms for better accelerator efficiency

  • Summarizing the challenges

    • Balancing predictive performance

    • Full-pass transformations on training data

    • Optimizing instance-level transformations for better training efficiency (GPU, TPU)
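
A minimal sketch of both ideas, assuming TensorFlow 2.x: the mean and variance are precomputed once with Normalization.adapt() and baked into the model, so every training batch and the serving path use the same statistics, while an instance-level transform runs in the tf.data pipeline with prefetching so the accelerator is not left waiting on the CPU. The synthetic data and the clipping transform are only illustrative.

```python
import tensorflow as tf

# Synthetic training data (illustrative only).
x_train = tf.random.uniform((1024, 3), maxval=100.0)
y_train = tf.random.uniform((1024, 1))

# Full-pass statistics computed once up front and baked into the model,
# instead of re-estimating the mean/variance per batch (which causes skew).
norm = tf.keras.layers.Normalization()
norm.adapt(x_train)

model = tf.keras.Sequential([
    norm,                      # the transformation travels with the model
    tf.keras.layers.Dense(1),
])
model.compile(optimizer="adam", loss="mse")

# Instance-level transform (clipping) in the input pipeline, with prefetching
# so the CPU prepares the next batch while the accelerator trains on the current one.
ds = (tf.data.Dataset.from_tensor_slices((x_train, y_train))
      .map(lambda x, y: (tf.clip_by_value(x, 0.0, 90.0), y),
           num_parallel_calls=tf.data.AUTOTUNE)
      .batch(32)
      .prefetch(tf.data.AUTOTUNE))

model.fit(ds, epochs=1)
```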

2.2 TensorFlow Transform

  • Enter tf.Transform

  • Inside TensorFlow Extended

  • tf.Transform layout

  • tf.Transform: Going deeper

  • How Transform applies feature transformations

  • Benefits of using tf.Transform

    • Emitted tf.Graph holds all necessary constants and transformations

    • Focus on data preprocessing only at training time

    • Works in-line during both training and serving

    • No need for preprocessing code at serving time

    • Consistently applied transformations irrespective of deployment platform

  • Analyzers framework (tf.Transform Analyzers); see the sketch after this list

    • Scaling

      • tft.scale_to_z_score

      • tft.scale_to_0_1

    • Bucketizing

      • tft.quantiles

      • tft.apply_buckets

      • tft.bucketize

    • Vocabulary

      • tft.bag_of_words

      • tft.tfidf

      • tft.ngrams

    • Dimensionality reduction

      • tft.pca
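
A minimal preprocessing_fn sketch using a few of these analyzers, plus tft.compute_and_apply_vocabulary for a string feature; the feature names are made up and the surrounding Beam/TFX pipeline is omitted.

```python
import tensorflow_transform as tft

def preprocessing_fn(inputs):
    """Run once by tf.Transform; analyzers make full passes over the dataset."""
    return {
        # Full-pass scaling: mean and variance computed over the whole dataset.
        'age_z': tft.scale_to_z_score(inputs['age']),
        'income_01': tft.scale_to_0_1(inputs['income']),
        # Full-pass bucketizing into quantile-based buckets.
        'age_bucket': tft.bucketize(inputs['age'], num_buckets=4),
        # Full-pass vocabulary: strings mapped to integer ids.
        'city_id': tft.compute_and_apply_vocabulary(inputs['city']),
    }
```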
