Feature Engineering, Transformation and Selection

0. Learning Objectives

  • Define a set of feature engineering techniques, such as scaling and binning

  • Use TensorFlow Transform for a simple preprocessing and data transformation task

  • Describe feature space coverage and implement different feature selection methods

  • Perform feature selection using scikit-learn routines and ensure feature space coverage

1. Feature Engineering

1.1 Introduction to Preprocessing

  • Squeezing the most out of data

    • Making data useful before training a model

    • Representing data in forms that help models learn

    • Increasing predictive quality

    • Reducing dimensionality with feature engineering

  • Art of feature engineering

  • Feature engineering during training must also be applied correctly during serving

1.2 Preprocessing Operations

  • Main preprocessing operations

    • Data cleansing

    • Feature tuning

    • Representation transformation

    • Feature extraction

    • Feature construction

  • Mapping raw data into features

    • Mapping numerical values

    • Mapping categorical values

      • Categorical vocabulary

  • Empirical knowledge of data

    • Text:

      • Stemming, lemmatization, TF-IDF, n-grams, embedding lookup

    • Images:

      • clipping, resizing, cropping, blur, Canny filters, Sobel filters, photometric distortions

Key points

  • Feature Engineering maps:

    • Raw data into feature vectors

    • Integer values to floating-point values

    • Numerical values to normalized values

    • Strings and categorical values to vectors of numeric values (see the sketch after this list)

    • Data from one space into a different space
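
A minimal sketch of the last two mappings, assuming TensorFlow 2.x: a small hand-written vocabulary is fed to tf.keras.layers.StringLookup to turn strings into one-hot numeric vectors (the feature values and vocabulary are only illustrative).

```python
import tensorflow as tf

# Hypothetical categorical feature with a small, known vocabulary.
colors = tf.constant(["red", "green", "blue", "green"])

# Map each string to an integer id via a vocabulary lookup,
# then encode that id as a one-hot vector of numeric values.
lookup = tf.keras.layers.StringLookup(
    vocabulary=["red", "green", "blue"], output_mode="one_hot")

one_hot = lookup(colors)  # shape (4, 4): one extra slot is reserved for OOV tokens
print(one_hot.numpy())
```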

1.3 Feature Engineering Techniques

  • Feature Scaling

    • Converts values from their natural range into a prescribed range

    • Benefits:

      • Helps neural nets converge faster

      • Helps avoid NaN errors during training

      • For each feature, the model learns the right weights

  • Normalization and Standardization (see the sketch at the end of this section)

    • Normalization: $X_{norm} = \frac{X - X_{min}}{X_{max} - X_{min}}$, so that $X_{norm} \in [0, 1]$

    • Standardization (z-score): $X_{std} = \frac{X - \mu}{\sigma}$, which centers values at 0 with unit standard deviation

  • Bucketizing / Binning

  • Other techniques

    • Dimensionality reduction in embeddings

      • Principal Component Analysis (PCA)

      • t-Distributed Stochastic Neighbor Embedding (t-SNE)

      • Uniform Manifold Approximation and Projection (UMAP)

    • Feature crossing

  • TensorFlow embedding projector

    • Intuitive exploration of high-dimensional data

    • Visualize & analyze

    • Techniques

      • PCA

      • t-SNE

      • UMAP

      • Custom linear projection
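
A minimal sketch of the scaling, binning, and dimensionality-reduction techniques above using scikit-learn; the tiny array and the parameter choices (2 bins, 1 principal component) are only illustrative.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler, StandardScaler, KBinsDiscretizer
from sklearn.decomposition import PCA

X = np.array([[1.0, 200.0],
              [2.0, 300.0],
              [3.0, 400.0],
              [4.0, 900.0]])

# Normalization: rescale each feature to the [0, 1] range.
X_norm = MinMaxScaler().fit_transform(X)

# Standardization (z-score): zero mean and unit variance per feature.
X_std = StandardScaler().fit_transform(X)

# Bucketizing / binning: map each feature into 2 quantile-based buckets.
X_binned = KBinsDiscretizer(n_bins=2, encode="ordinal",
                            strategy="quantile").fit_transform(X)

# Dimensionality reduction: project onto the first principal component.
X_pca = PCA(n_components=1).fit_transform(X)
```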

1.4 Feature Crosses

  • Combines multiple features together into a new feature

  • Encodes nonlinearity in the feature space, or encodes the same information in fewer features

  • Methods for feature crosses

    • [AxB]: Multiplying the values of two features

    • [AxBxCxDxE]: Multiplying multiple features

    • [Day of week, Hour] ==> [Hour of week] (see the sketch below)
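
A minimal sketch of the [Day of week, Hour] cross in plain Python; treating Monday as day 0 is an assumption made only for the example.

```python
# Cross two categorical features into a single synthetic feature.
# day_of_week in [0, 6] and hour in [0, 23] cross into hour_of_week in [0, 167].
def hour_of_week(day_of_week: int, hour: int) -> int:
    return day_of_week * 24 + hour

assert hour_of_week(0, 0) == 0      # Monday, 00:00 (assuming Monday = 0)
assert hour_of_week(6, 23) == 167   # last hour of the week
```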

Key points

  • Feature crossing: synthetic feature encoding nonlinearity in feature space

  • Feature coding: transforming a categorical feature into a continuous variable

2. Feature Transformation at Scale

2.1 Preprocessing Data at Scale

  • Inconsistencies in feature engineering

    • Training and serving code path are different

    • Diverse deployment scenarios

      • Mobile (TensorFlow Lite)

      • Server (TensorFlow Serving)

      • Web (TensorFlow.js)

    • Risks of introducing training-serving skews

    • Skews will lower the performance of your serving model

  • Preprocessing granularity

    • Transformations

      • Instance-level

        • clipping

        • Multiplying

        • Expanding features

      • Full-pass

        • Min-max scaling

        • Standard scaling

        • Bucketizing

    • When to transform?

      • Pre-processing training dataset

        • Pros:

          • Run-once

          • Compute on entire dataset

        • Cons:

          • Transformations must be reproduced at serving time

          • Slower iterations

      • Transforming within the model

        • Pros:

          • Easy iterations

          • Transformation guarantees

        • Cons:

          • Expensive transforms

          • Long model latency

          • Transformations per batch: skew

      • Why transform per batch?

        • For example, normalizing features by their average

        • Access to a single batch of data, not the full dataset

        • Ways to normalize per batch

          • Normalize by average within a batch

          • Precompute the average and reuse it during normalization (see the sketch at the end of this section)

  • Optimizing instance-level transformations

    • Indirectly affect training efficiency

    • Accelerators typically sit idle while the CPU transforms

    • Solution:

      • Prefetching transforms for better accelerator efficiency

  • Summarizing the challenges

    • Balancing predictive performance

    • Full-pass transformations on training data

    • Optimizing instance-level transformations for better training efficiency (GPU, TPU)
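
A minimal sketch of both ideas, assuming TensorFlow 2.x: the mean and variance are precomputed once with Normalization.adapt() and baked into the model, so every training batch and the serving path use the same statistics, while an instance-level transform runs in the tf.data pipeline with prefetching so the accelerator is not left waiting on the CPU. The synthetic data and the clipping transform are only illustrative.

```python
import tensorflow as tf

# Synthetic training data (illustrative only).
x_train = tf.random.uniform((1024, 3), maxval=100.0)
y_train = tf.random.uniform((1024, 1))

# Full-pass statistics computed once up front and baked into the model,
# instead of re-estimating the mean/variance per batch (which causes skew).
norm = tf.keras.layers.Normalization()
norm.adapt(x_train)

model = tf.keras.Sequential([
    norm,                      # the transformation travels with the model
    tf.keras.layers.Dense(1),
])
model.compile(optimizer="adam", loss="mse")

# Instance-level transform (clipping) in the input pipeline, with prefetching
# so the CPU prepares the next batch while the accelerator trains on the current one.
ds = (tf.data.Dataset.from_tensor_slices((x_train, y_train))
      .map(lambda x, y: (tf.clip_by_value(x, 0.0, 90.0), y),
           num_parallel_calls=tf.data.AUTOTUNE)
      .batch(32)
      .prefetch(tf.data.AUTOTUNE))

model.fit(ds, epochs=1)
```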

2.2 TensorFlow Transform

  • Enter tf.Transform

  • Inside TensorFlow Extended

  • tf.Transform layout

  • tf.Transform: Going deeper

  • How Transform applies feature transformations

  • Benefits of using tf.Transform

    • Emitted tf.Graph holds all necessary constants and transformations

    • Focus on data preprocessing only at training time

    • Works in-line during both training and serving

    • No need for preprocessing code at serving time

    • Consistently applied transformations irrespective of deployment platform

  • Analyzers framework (tf.Transform Analyzers); see the sketch after this list

    • Scaling

      • tft.scale_to_z_score

      • tft.scale_to_0_1

    • Bucketizing

      • tft.quantiles

      • tft.apply_buckets

      • tft.bucketize

    • Vocabulary

      • tft.bag_of_words

      • tft.tfidf

      • tft.ngrams

    • Dimensionality reduction

      • tft.pca
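
A minimal preprocessing_fn sketch using a few of these analyzers, plus tft.compute_and_apply_vocabulary for a string feature; the feature names are made up and the surrounding Beam/TFX pipeline is omitted.

```python
import tensorflow_transform as tft

def preprocessing_fn(inputs):
    """Run once by tf.Transform; analyzers make full passes over the dataset."""
    return {
        # Full-pass scaling: mean and variance computed over the whole dataset.
        'age_z': tft.scale_to_z_score(inputs['age']),
        'income_01': tft.scale_to_0_1(inputs['income']),
        # Full-pass bucketizing into quantile-based buckets.
        'age_bucket': tft.bucketize(inputs['age'], num_buckets=4),
        # Full-pass vocabulary: strings mapped to integer ids.
        'city_id': tft.compute_and_apply_vocabulary(inputs['city']),
    }
```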
