Categories Mapping

General Information

The brick provides a separate dashboard to perform manual mapping on the specified categorical columns and then convert them to the desired data type.

Description

Brick Location

Bricks Analytics → Data Mining / AI → Categories Mapping

Brick Parameters

  • Filter Columns
    • If you have columns in your data that need to be ignored (but not removed from the data set) and not be shown in the dashboard as an option, you should specify them in this parameter. To select multiple columns, click the '+' button in the brick settings.
      In addition, you can ignore all columns except the ones you specified by enabling the "Remove all except selected" option. This may be useful if you have a large number of columns while needing just several of them to be mapped with new values.

Brick Inputs/Outputs

  • Inputs
    • Brick takes the dataset.
  • Outputs
    • Brick modifies the input dataset, changing the categories to the new values specified in the dashboard, and converts the columns to the selected data type.

Dashboard

The dashboard is designed to give easy-to-use access to columns unique categories and change them to new values when needed. Additionally, you can change categories data types as well.
To access the dashboard, you need to press the "Open dashboard" button on the right-side menu with brick settings. The dashboard can be divided to the next three functional parts:
notion image
  1. Categorical columns - a selectable list of columns that contain categories. You can filter unwanted columns by adding them to the "Filter Columns" option on the right sidebar. Other components change respectively to the chosen column.
  1. Column's data type - specifies the current data type of the selected categorical column with a drop-down section to let the user manually change it to the desired one. If the conversion is not possible, brick automatically converts the column to a higher-level type (e.g., if it is not possible to convert a column to an integer, the brick first tries to convert to afloat and then to a string).
  1. Mapping table - a table with two columns - old values and new ones.

Example of usage

Let's consider the binary classification problem 🛥️Binary classification : Titanic. The inverse target variable takes two values - survived (0) - good or non-event case / not-survived (1) - bad or event case. The general information about predictors is represented below:
  • passengerid (numeric) - ID of passenger
  • name (string) - Passenger's name
  • pclass (category) - Ticket class
  • sex (category) - Gender
  • age (numeric) - Age in years
  • sibsp (numeric) - Number of siblings / spouses aboard the Titanic
  • parch (numeric) - Number of parents / children aboard the Titanic
  • ticket (category) - Ticket number
  • fare (numeric) - Passenger fare
  • cabin (category) - Cabin number
  • embarked (category) - Port of Embarkation
notion image
And let's assume, that we want to manually map the values of some categorical columns for potential future analysis. Among the columns, we choose:
  • pclass - we want to change numeric values to more informative categories, like the "1st class"
  • sex - we, first, rename the column to is_male (using the Rename Columns brick) and then convert the values to a binary variable
  • embarked - that column has NaN values that we want to be changed to "Unknown"

Building and executing pipeline

First of all, we need to drag'n'drop the required bricks onto the stage (and connect them one by one) in the next order:
  1. From the "Data Sets", we need to choose the titanic.csv file
  1. Then connect it to the Rename Columns brick where we specify the sex column to be renamed to is_male
  1. Lastly, drop the Categories Mapping brick and filter out all columns except pclass, is_male and embarked (consider using the "Remove all except selected" checkbox)
Then, just execute the pipeline and the "Open dashboard" option will appear in the right-side menu:
notion image
 
Inside the dashboard, we would need to do some manual mapping in the following steps:
  • For the embarked column, specify "Unknown" as a new value for "NaN"
    • notion image
  • For the is_male column, first, change the data type to a boolean, then set new values to "false" for females and "true" for males. Note, that Categories Mapping supports true/false, True/False and 1/0 as boolean values
    • notion image
  • For the pclass column, change values to "1st class", "2nd class", "3rd class" respectively:
    • notion image
Lastly, save the made changes and execute the pipeline one more time. In order to see the results, you should open the Output data previewer on the right sidebar.
notion image
The results are depicted in the table:
notion image
 
undefined