Aggregate Data

General information

The brick groups rows that share the same values in the selected key columns and applies aggregate functions to each group to produce a summary dataset.

Description

Brick Location

Bricks → Data Manipulation → Transform → Aggregate Data
Bricks → Analytics → Data Insights → Aggregate Data
Bricks → Use Cases → Demand Forecasting → Data Processing → Aggregate Data

Brick Parameters

  • Aggregate key
    • Column from the input data used to group rows that have the same values. To group by several columns, click the '+' button in the brick settings.
  • Data columns
    • Column and the function applied to it over each group of rows, returning one value per group. To add several column-function pairs, click the '+' button in the brick settings.
      Main restrictions:
      • If the selected column is of string data type, the applicable functions are count, first, and last.
      • If the selected column is of numeric data type, the applicable functions are count, min, max, mean, sum, and std.
      • If the selected column is of DateTime data type, the applicable functions are count, min, max, first, and last.
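
The type restrictions above can be sketched as a small lookup in Python with pandas. This is only an illustration, not the brick's actual implementation: the names ALLOWED, column_kind, and is_allowed are hypothetical, and treating boolean columns as numeric is an assumption made for the sketch.

```python
import pandas as pd
from pandas.api import types as ptypes

# Functions permitted for each column type, as listed in the restrictions above.
ALLOWED = {
    "string": {"count", "first", "last"},
    "numeric": {"count", "min", "max", "mean", "sum", "std"},
    "datetime": {"count", "min", "max", "first", "last"},
}


def column_kind(series: pd.Series) -> str:
    """Classify a column as 'datetime', 'numeric', or 'string' (hypothetical helper)."""
    if ptypes.is_datetime64_any_dtype(series):
        return "datetime"
    # Assumption: boolean columns are treated as numeric in this sketch.
    if ptypes.is_numeric_dtype(series):
        return "numeric"
    return "string"


def is_allowed(series: pd.Series, func: str) -> bool:
    """Return True if func may be applied to a column of this type."""
    return func in ALLOWED[column_kind(series)]


df = pd.DataFrame({
    "name": ["a", "b"],
    "age": [1.0, 2.0],
    "when": pd.to_datetime(["2020-01-01", "2020-01-02"]),
})
print(is_allowed(df["name"], "mean"))  # a string column does not support mean
print(is_allowed(df["age"], "std"))    # a numeric column supports std
```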

Brick Inputs/Outputs

  • Inputs
    • The brick takes a dataset.
  • Outputs
    • The brick produces a new dataset that contains the column or columns selected for grouping and one column for each column-function aggregation pair.
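
To make the output shape concrete, here is a minimal pandas sketch of grouping by several key columns. The aggregate helper, the sample data, and the "<column>_<function>" naming are assumptions for illustration only; the brick's internals may differ.

```python
import pandas as pd


def aggregate(df, keys, pairs):
    """Group df by the key columns and apply each (column, function) pair.

    Output columns are named "<column>_<function>" (an assumed convention).
    """
    named = {f"{col}_{func}": (col, func) for col, func in pairs}
    return df.groupby(keys, as_index=False).agg(**named)


# Invented sample data, used only to show the shape of the result.
df = pd.DataFrame({
    "region": ["north", "north", "south", "south", "south"],
    "product": ["a", "a", "a", "b", "b"],
    "units": [3, 5, 2, 7, 1],
})

result = aggregate(df, ["region", "product"], [("units", "sum"), ("units", "max")])
print(result)
```

The result has one row per unique combination of key values, with the key columns first and then one column per column-function pair.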

Example of usage

Let's consider the Titanic dataset from the binary classification problem. The dataset contains the following columns:
  • passengerid (category) - ID of passenger
  • name (category) - Passenger's name
  • pclass (category) - Ticket class
  • sex (category) - Gender
  • age (numeric) - Age in years
  • sibsp (numeric) - Number of siblings / spouses aboard the Titanic
  • parch (category) - Number of parents / children aboard the Titanic
  • ticket (category) - Ticket number
  • fare (numeric) - Passenger fare
  • cabin (category) - Cabin number
  • embarked (category) - Port of Embarkation
  • survived (boolean) - True/False
Let's aggregate by Ticket Class ("Pclass") to check how the data differs between these groups:
[Image: Aggregate Data brick settings]
The resulting dataset with the new columns Age_Min, Age_Max, Fare_Mean, Sex_First, Survived_Mean, and PassengerId_Count is shown below:
[Image: resulting aggregated dataset]
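
For readers who want to reproduce a similar aggregation outside the platform, a rough pandas equivalent might look like the sketch below. The rows are invented sample values shaped like the dataset described above, not real Titanic records, and the code only approximates what the brick does.

```python
import pandas as pd

# Invented sample rows; not real Titanic records.
df = pd.DataFrame({
    "passengerid": [1, 2, 3, 4, 5, 6],
    "pclass": [1, 1, 2, 2, 3, 3],
    "sex": ["female", "male", "male", "female", "male", "male"],
    "age": [38.0, 54.0, 27.0, 14.0, 22.0, 35.0],
    "fare": [70.0, 50.0, 13.0, 30.0, 7.0, 8.0],
    "survived": [True, False, False, True, False, False],
})

# One output column per column-function pair, grouped by ticket class.
summary = df.groupby("pclass", as_index=False).agg(
    Age_Min=("age", "min"),
    Age_Max=("age", "max"),
    Fare_Mean=("fare", "mean"),
    Sex_First=("sex", "first"),
    Survived_Mean=("survived", "mean"),  # share of survivors in each class
    PassengerId_Count=("passengerid", "count"),
)
print(summary)
```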