Feature Engineering, Transformation and Selection
0. Learning Objectives
Define a set of feature engineering techniques, such as scaling and binning
Use TensorFlow Transform for a simple preprocessing and data transformation task
Describe feature space coverage and implement different feature selection methods
Perform feature selection using scikit-learn routines and ensure feature space coverage
1. Feature Engineering
1.1 Introduction to Preprocessing
Squeezing the most out of data
Making data useful before training a model
Representing data in forms that help models learn
Increasing predictive quality
Reducing dimensionality with feature engineering
Art of feature engineering
Feature engineering during training must also be applied correctly during serving
1.2 Preprocessing Operations
Main preprocessing operations
Data cleansing
Feature tuning
Representation transformation
Feature extraction
Feature construction
Mapping raw data into features
Mapping numerical values
Mapping categorical values
Categorical vocabulary
Empirical knowledge of data
Text:
Stemming, lemmatization, TF-IDF, n-grams, embedding lookup
Images:
Clipping, resizing, cropping, blur, Canny filters, Sobel filters, photometric distortions
Key points
Feature Engineering maps:
Raw data into feature vectors
Integer values to floating-point values
Numerical values to normalized values
Strings and categorical values to vectors of numeric values
Data from one space into a different space
1.3 Feature Engineering Techniques

Feature Scaling
Converts values from their natural range into a prescribed range
Benefits:
Helps neural nets converge faster
Helps avoid NaN errors during training
For each feature, the model learns the right weights
Normalization and Standardization
Normalization: X_norm = (X - X_min) / (X_max - X_min), rescaling values into [0, 1]
Standardization (z-score): Z = (X - μ) / σ, centering values on mean 0 with standard deviation 1
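A minimal sketch of both rescalings in NumPy (the feature values are made up for illustration):

```python
import numpy as np

x = np.array([12.0, 45.0, 33.0, 7.0, 60.0])  # illustrative feature values

# Normalization (min-max): rescales values into [0, 1]
x_norm = (x - x.min()) / (x.max() - x.min())

# Standardization (z-score): centers on mean 0 with standard deviation 1
x_std = (x - x.mean()) / x.std()
```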
Bucketizing / Binning
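A quick sketch of binning with NumPy; the bucket boundaries here are arbitrary:

```python
import numpy as np

ages = np.array([4, 17, 25, 38, 62, 80])
boundaries = [18, 35, 65]  # arbitrary bucket edges

# np.digitize maps each numeric value to the index of its bucket,
# turning a continuous feature into a categorical one
age_buckets = np.digitize(ages, boundaries)  # -> [0, 0, 1, 2, 2, 3]
```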
Other techniques
Dimensionality reduction in embeddings (see the PCA sketch after this list)
Principal Component Analysis (PCA)
t-Distributed Stochastic Neighbor Embedding (t-SNE)
Uniform Manifold Approximation and Projection (UMAP)
Feature crossing
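As a concrete instance of the dimensionality-reduction item above, a hedged sketch using scikit-learn's PCA (the 50-dimensional embeddings are random stand-ins):

```python
import numpy as np
from sklearn.decomposition import PCA

embeddings = np.random.rand(1000, 50)  # stand-in for learned embeddings

# Project the 50-dimensional vectors down to 2 components for visualization
pca = PCA(n_components=2)
projected = pca.fit_transform(embeddings)
print(pca.explained_variance_ratio_)   # variance retained by each component
```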
TensorFlow embedding projector
Intuitive exploration of high-dimensional data
Visualize & analyze
Techniques
PCA
t-SNE
UMAP
Custom linear projection
1.4 Feature crosses
Combines multiple features into a new feature
Encodes nonlinearity in the feature space, or encodes the same information in fewer features
Methods for feature crosses
[AxB]: Multiplying the values of two features
[AxBxCxDxE]: Multiplying multiple features
[Day of week, Hour] ==> [Hour of week]
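A sketch of the [Day of week, Hour] ==> [Hour of week] cross; the encodings are illustrative:

```python
import numpy as np

day_of_week = np.array([0, 3, 6])    # 0 = Monday ... 6 = Sunday
hour_of_day = np.array([8, 14, 23])  # 0-23

# The crossed feature has 7 * 24 = 168 possible values, so the model
# can learn a separate weight for each (day, hour) combination
hour_of_week = day_of_week * 24 + hour_of_day  # -> [8, 86, 167]
```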
Key points
Feature crossing: synthetic feature encoding nonlinearity in feature space
Feature coding: transforming a categorical feature into a continuous variable
2. Feature Transformation at Scale

2.1 Preprocessing Data at Scale

Inconsistencies in feature engineering
Training and serving code paths are different
Diverse deployment scenarios
Mobile (TensorFlow Lite)
Server (TensorFlow Serving)
Web (TensorFlow.js)
Risks of introducing training-serving skews
Skews will lower the performance of your serving model
Preprocessing granularity
Transformations
Instance-level
Clipping
Multiplying
Expanding features
Full-pass
Min-max scaling
Standard scaling
Bucketizing
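A hedged sketch of the distinction: clipping is instance-level (each value can be transformed alone), while standard scaling is full-pass (it needs statistics over the entire dataset first). The data and threshold are illustrative:

```python
import numpy as np

feature = np.array([1.0, 5.0, 200.0, 3.0, 7.0])  # illustrative column

# Instance-level: each value is transformed in isolation
clipped = np.clip(feature, 0.0, 100.0)

# Full-pass: mean and standard deviation require a complete pass
# over the dataset before any single value can be transformed
mean, std = feature.mean(), feature.std()
scaled = (feature - mean) / std
```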
When to transform?
Pre-processing training dataset
Pros:
Run-once
Compute on entire dataset
Cons:
Transformations must be reproduced at serving time
Slower iterations
Transforming within the model
Pros:
Easy iterations
Transformation guarantees
Cons:
Expensive transforms
Long model latency
Transformations per batch: skew
Why transform per batch?
For example, normalizing features by their average
Access to a single batch of data, not the full dataset
Ways to normalize per batch
Normalize by average within a batch
Precompute average and reuse it during normalization
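A sketch of the two options; only the second gives every batch (and the serving path) the same scaling. The values are illustrative:

```python
import numpy as np

batch = np.array([10.0, 20.0, 30.0])

# Option 1: normalize by the average within this batch; different
# batches get different scalings, which introduces skew
within_batch = batch / batch.mean()

# Option 2: precompute the average over the full dataset and reuse
# it for every batch, and again at serving time
precomputed_mean = 18.5  # illustrative value, computed ahead of time
consistent = batch / precomputed_mean
```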
Optimizing instance-level transformations
Indirectly affect training efficiency
Typically accelerators sit idle while the CPU transforms
Solution:
Prefetching transforms for better accelerator efficiency
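A minimal tf.data sketch of prefetching: while the accelerator trains on one batch, the CPU transforms the next. The transformation itself is a placeholder:

```python
import tensorflow as tf

def transform(x):
    # placeholder for an instance-level transformation done on the CPU
    return tf.clip_by_value(x, 0.0, 100.0)

dataset = (
    tf.data.Dataset.from_tensor_slices(tf.range(1000, dtype=tf.float32))
    .map(transform, num_parallel_calls=tf.data.AUTOTUNE)
    .batch(32)
    .prefetch(tf.data.AUTOTUNE)  # overlap CPU preprocessing with training
)
```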
Summarizing the challenges
Balancing predictive performance
Full-pass transformations on training data
Optimizing instance-level transformations for better training efficiency (GPU, TPU)
2.2 TensorFlow Transform
Enter tf.Transform

Inside TensorFlow Extended

tf.Transform layout

tf.Transform: Going deeper

How Transform applies feature transformations
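A hedged sketch of a tf.Transform preprocessing_fn; the feature names are invented, but tft.scale_to_0_1 and tft.compute_and_apply_vocabulary are real analyzers:

```python
import tensorflow_transform as tft

def preprocessing_fn(inputs):
    """Transform runs this over the dataset: analyzers make a full pass
    to compute statistics, which are then baked into the emitted graph
    as constants and reused unchanged at serving time."""
    return {
        'income_scaled': tft.scale_to_0_1(inputs['income']),          # full-pass min/max
        'city_id': tft.compute_and_apply_vocabulary(inputs['city']),  # full-pass vocabulary
    }
```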

Benefits of using tf.Transform
Emitted tf.Graph holds all necessary constants and transformations
Focus on data preprocessing only at training time
Works in-line during both training and serving
No need for preprocessing code at serving time
Consistently applied transformations irrespective of deployment platform
Analyzers framework (tf.Transform Analyzers)
Scaling
scale_to_z_score
scale_to_0_1
Bucketizing
quantiles
apply_buckets
bucketize
Vocabulary
bag_of_words
tfidf
ngrams
Dimensionality reduction
pca
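For illustration, two of the analyzers above inside a preprocessing_fn (the feature names are made up):

```python
import tensorflow_transform as tft

def preprocessing_fn(inputs):
    return {
        # scaling: full-pass mean and standard deviation
        'fare_z': tft.scale_to_z_score(inputs['fare']),
        # bucketizing: quantile boundaries computed over the full dataset
        'age_bucket': tft.bucketize(inputs['age'], num_buckets=4),
    }
```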