How do I choose which features to create?

Start with domain knowledge, explore correlations in the data, and then test experimentally how each candidate feature affects model results.

What's the difference between feature engineering and feature selection?

Feature engineering creates new features or transforms existing ones, while feature selection picks the most useful subset of the features already available.
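
The distinction can be sketched in a few lines of Python. This is a minimal illustration, not a recommended pipeline: the column names and numbers are made up, and ranking features by absolute Pearson correlation with the target is only one simple selection method among many.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    # A constant column has no defined correlation; treat it as 0.
    return 0.0 if sx == 0 or sy == 0 else cov / (sx * sy)

# Hypothetical toy data (illustrative values only).
columns = {
    "sqft": [800, 1000, 1500, 2000, 2400],
    "rooms": [2, 3, 3, 4, 5],
    "noise": [3, 1, 4, 1, 5],
}
price = [160, 210, 290, 410, 480]  # target, in thousands

# Feature engineering: derive a new column from existing ones.
columns["sqft_per_room"] = [
    s / r for s, r in zip(columns["sqft"], columns["rooms"])
]

# Feature selection: keep the k columns most correlated with the target.
def select_top_k(cols, target, k):
    ranked = sorted(cols, key=lambda name: abs(pearson(cols[name], target)),
                    reverse=True)
    return ranked[:k]

selected = select_top_k(columns, price, k=2)
print(selected)
```

Engineering adds a column (`sqft_per_room`) that did not exist before; selection only chooses among the columns that exist, whether raw or engineered. Correlation captures linear relationships only, so the final decision should still rest on how the model performs with and without each feature.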

What is Feature Engineering?

Feature engineering is the process of transforming raw data into meaningful input variables (features) that help machine learning models learn patterns more effectively.

It involves selecting relevant data attributes, creating new ones through combinations or transformations, and fixing problems such as missing values or inconsistent formats. The goal is to make the data more informative and better suited to the chosen algorithm.

Techniques often draw on domain knowledge and include scaling numerical values, encoding categories, extracting components from dates or patterns from text, and removing redundant information. The work is typically iterative: features are prepared before model training and then revised as results come in.
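
Three of these techniques (scaling, category encoding, and date extraction) can be sketched with the Python standard library alone. The records and field names below are hypothetical, and in practice a library such as pandas or scikit-learn would usually do this work; the sketch only shows what each transformation computes.

```python
from datetime import date

# Hypothetical raw records; field names are illustrative only.
rows = [
    {"sqft": 850.0, "city": "Austin", "listed": date(2023, 1, 14)},
    {"sqft": 1200.0, "city": "Denver", "listed": date(2023, 6, 5)},
    {"sqft": 2000.0, "city": "Austin", "listed": date(2023, 11, 25)},
]

def min_max_scale(values):
    """Rescale numbers linearly so the smallest is 0 and the largest is 1."""
    lo, hi = min(values), max(values)
    span = hi - lo
    # A constant column has zero span; map it to 0.0 to avoid dividing by zero.
    return [0.0 if span == 0 else (v - lo) / span for v in values]

def one_hot(values):
    """Turn each category into a 0/1 indicator per distinct value."""
    categories = sorted(set(values))
    return [{f"is_{c}": int(v == c) for c in categories} for v in values]

scaled = min_max_scale([r["sqft"] for r in rows])
encoded = one_hot([r["city"] for r in rows])

features = []
for row, sqft_scaled, indicators in zip(rows, scaled, encoded):
    features.append({
        "sqft_scaled": sqft_scaled,
        **indicators,
        # Date extraction: pull out parts a model can use directly.
        "listed_month": row["listed"].month,
        "listed_on_weekend": int(row["listed"].weekday() >= 5),
    })

for f in features:
    print(f)
```

In a real project the scaling bounds and the category list would be computed on the training split only and then reused for validation and test data, so that no information from held-out examples leaks into the features.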

Unlike fully automated feature learning, manual feature engineering relies on human insight to surface signals that raw data might hide, which directly affects how well a model generalizes to new examples.

Example

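For a model predicting house prices, raw data might include address and square footage; feature engineering could create new variables like 'distance to nearest school' (derived from the address) or 'average price per square foot of recently sold nearby homes' to capture useful relationships. A feature computed from the house's own sale price, such as its own price per square foot, would leak the target into the inputs and should be avoided.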
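
A 'distance to nearest school' feature could be computed roughly as follows. This sketch assumes each address has already been geocoded to latitude and longitude (a step not shown here), uses the haversine great-circle formula as one reasonable distance measure, and works on made-up coordinates.

```python
import math

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two lat/lon points."""
    r = 6371.0  # mean Earth radius in km
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * r * math.asin(math.sqrt(a))

def distance_to_nearest_school(house, schools):
    """Smallest distance from one house to any school, in km."""
    return min(haversine_km(house[0], house[1], s[0], s[1]) for s in schools)

# Hypothetical coordinates (latitude, longitude), for illustration only.
house = (40.0, -75.0)
schools = [(40.0, -74.9), (41.0, -75.0)]

nearest_km = distance_to_nearest_school(house, schools)
print(round(nearest_km, 2))
```

Straight-line distance ignores roads and travel time; whether that approximation is good enough is a domain judgment that the model's validation results can help settle.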

Why it matters

Good features often improve model accuracy more than switching algorithms, and remain essential even as automated tools and deep learning advance, because they reduce noise and highlight relevant patterns in real-world data.

Frequently asked questions

Feature engineering is still useful alongside deep learning: although deep networks can learn some features automatically, manual engineering often boosts performance on structured data and helps with smaller datasets.

Related terms

Batch Size

Batch size is the number of training examples processed together in a single forward and backward pass during model training.

Data Augmentation

Data augmentation is a technique that artificially increases the size and diversity of a training dataset by creating modified versions of existing data samples.

Dataset

A dataset is a structured collection of data points used to train, validate, or test machine learning models.

Epoch

An epoch is one complete pass of a machine learning model through the entire training dataset during training.

Feature

In AI and machine learning, a feature is an individual measurable piece of data that serves as an input variable for a model.

Fine-Tuning

Fine-tuning is the process of taking a pre-trained AI model and continuing its training on a smaller, task-specific dataset to adapt it for a particular use case.