What is Feature Engineering?
Feature engineering is the process of transforming raw data into meaningful input variables (features) that help machine learning models learn patterns more effectively.
It involves selecting relevant data attributes, creating new ones through combinations or transformations, and cleaning issues like missing values or inconsistent formats. The goal is to make the data more informative and suitable for the chosen algorithm.
Techniques often draw on domain knowledge and include scaling numerical values, encoding categories, extracting components from dates or patterns from text, and reducing redundant information. This step is typically iterative: features are built before model training and then revised as evaluation results show which ones help.
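A minimal sketch of three of the techniques named above (scaling, category encoding, and date extraction), using only the Python standard library. The records, field names, and helper functions are hypothetical and chosen for illustration; real projects usually rely on a dataframe or preprocessing library rather than hand-written helpers.

```python
from datetime import date
from statistics import mean, pstdev

# Hypothetical raw records; every field name and value is made up for illustration.
rows = [
    {"sqft": 1400.0, "type": "condo", "listed": date(2023, 3, 14)},
    {"sqft": 2100.0, "type": "house", "listed": date(2023, 7, 2)},
    {"sqft": 1750.0, "type": "house", "listed": date(2023, 11, 25)},
]

def standardize(values):
    """Scale numbers to zero mean and unit variance (z-scores)."""
    mu, sigma = mean(values), pstdev(values)
    # If every value is identical, sigma is 0; fall back to 0.0 to avoid dividing by zero.
    return [(v - mu) / sigma if sigma else 0.0 for v in values]

def one_hot(values):
    """Encode each category as a list of 0/1 indicators; columns follow sorted order."""
    categories = sorted(set(values))
    return categories, [[int(v == c) for c in categories] for v in values]

def date_parts(d):
    """Extract simple calendar features from a date (weekday: Monday is 0)."""
    return {"month": d.month, "weekday": d.weekday(), "is_weekend": int(d.weekday() >= 5)}

def build_features(rows):
    """Turn raw records into flat dictionaries of model-ready features."""
    scaled = standardize([r["sqft"] for r in rows])
    categories, encoded = one_hot([r["type"] for r in rows])
    features = []
    for r, z, flags in zip(rows, scaled, encoded):
        feat = {"sqft_z": z}
        feat.update({f"type_{c}": flag for c, flag in zip(categories, flags)})
        feat.update(date_parts(r["listed"]))
        features.append(feat)
    return features

features = build_features(rows)
```

In practice the scaling statistics (mean and standard deviation) would be computed on the training split only and then reused for validation and test data, so that information from held-out examples does not influence the features.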
Unlike fully automated feature learning, manual feature engineering relies on human insight to surface signals that raw data might hide, which directly affects how well a model generalizes to new examples.
Example
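For a model predicting house prices, the raw data might include an address and square footage. Feature engineering could derive new variables such as 'distance to nearest school' from the address, or 'median price per square foot of recent nearby sales' from other listings, to capture useful relationships. A 'price per square foot' feature computed from the home's own sale price would leak the prediction target into the inputs and should be avoided.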
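As a rough sketch, a 'distance to nearest school' feature could be computed as below. It assumes the address has already been geocoded to latitude and longitude (geocoding itself is not shown), and it uses the haversine great-circle formula as a stand-in for real travel distance. The function names and coordinates are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, an approximation

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    # Clamp to 1.0 to guard against floating-point overshoot before asin.
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))

def distance_to_nearest(home, places):
    """Smallest haversine distance from `home` to any point in `places`."""
    return min(haversine_km(home[0], home[1], lat, lon) for lat, lon in places)

# Hypothetical coordinates, for illustration only.
home = (0.0, 0.0)
schools = [(1.0, 0.0), (0.0, 0.5)]
nearest_school_km = distance_to_nearest(home, schools)
```

Straight-line distance is a simplification; walking or driving distance from a routing service may be a more informative feature where it is available.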
Why it matters
Good features often improve model accuracy more than switching algorithms does. They remain valuable even as automated tools and deep learning advance, because well-chosen features reduce noise and highlight relevant patterns in real-world data.
Frequently asked questions
Is feature engineering still needed with deep learning?
Yes. Although deep learning can learn some features automatically, manual engineering often boosts performance on structured data and helps when datasets are small.
Related terms
Batch size is the number of training examples processed together in a single forward and backward pass during model training.
Data augmentation is a technique that artificially increases the size and diversity of a training dataset by creating modified versions of existing data samples.
A dataset is a structured collection of data points used to train, validate, or test machine learning models.
An epoch is one complete pass of a machine learning model through the entire training dataset during training.
In AI and machine learning, a feature is an individual measurable piece of data that serves as an input variable for a model.
Fine-tuning is the process of taking a pre-trained AI model and continuing its training on a smaller, task-specific dataset to adapt it for a particular use case.