Typization

General information

This brick changes the data types of columns in the input dataset.

Description

Brick Location

Data Manipulation → Convert / Replace → Typization

Brick Parameters

  • Column
    • The name of the column in the input data frame whose type is to be changed.
  • Type
    • New type for the specified column.
      Supported types are boolean, integer, float, string, category, and datetime.
  • Date format
    • Additional setting for casting to the datetime type.
      This field is enabled only when the new type is datetime.
  • NaN fraction
    • For the float and datetime types, you can set a NaN fraction: the maximum percentage of values that may fail to convert before the whole conversion fails. This is helpful when the amount of possibly corrupted data must not exceed a known threshold. The default is 100, meaning that up to 100% of the values may fail to convert and be replaced with NaN.
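The NaN fraction check can be illustrated with a minimal plain-Python sketch. It is not the brick's actual implementation: the helper name to_float_with_nan_limit and its nan_fraction argument are hypothetical, and the sketch assumes the limit is compared against the percentage of values that fail to convert.

```python
import math


def to_float_with_nan_limit(values, nan_fraction=100.0):
    """Cast values to float, replacing values that cannot be cast with NaN.

    Raises ValueError when the percentage of failed values exceeds
    nan_fraction. Hypothetical helper for illustration only.
    """
    converted = []
    failed = 0
    for value in values:
        try:
            converted.append(float(value))
        except (TypeError, ValueError):
            # Invalid input is kept as NaN and counted as a failure.
            converted.append(math.nan)
            failed += 1

    failed_percent = 100.0 * failed / len(values) if values else 0.0
    if failed_percent > nan_fraction:
        raise ValueError(
            f"{failed_percent:.1f}% of values failed to convert, "
            f"limit is {nan_fraction}%"
        )
    return converted


raw = ["1.5", "2", "oops", "4.25"]

# With the default limit of 100, any number of failures is tolerated.
print(to_float_with_nan_limit(raw))

# One of four values (25%) is invalid, so a limit of 10 rejects the column.
try:
    to_float_with_nan_limit(raw, nan_fraction=10)
except ValueError as error:
    print("conversion rejected:", error)
```

A low limit makes the conversion fail early on unexpectedly dirty data, while the default of 100 never fails and leaves the invalid values as NaN.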

Brick Inputs/Outputs

  • Inputs
    • The brick takes a dataset without any restrictions.
  • Outputs
    • The brick produces a new dataset with the updated column types.

DateTime Formatting

When converting a column to the datetime type, it is recommended to specify an explicit format string so that the values cannot be interpreted in more than one way.
The format codes listed below are supported.
For example, a string such as 2001-01-01 12:34:56 can be safely cast to datetime with the format %Y-%m-%d %H:%M:%S.
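The effect of an explicit format can be tried with Python's standard datetime module, which uses the same style of format codes. This is only an illustration of the idea; it is an assumption that the brick parses values in a comparable way, and the second input string is made up for the example.

```python
from datetime import datetime

# The example from the text: an explicit format leaves no room for guessing.
FORMAT = "%Y-%m-%d %H:%M:%S"
parsed = datetime.strptime("2001-01-01 12:34:56", FORMAT)
print(parsed.isoformat())

# The same text yields different dates depending on the chosen format,
# which is why relying on an implicit format is risky.
text = "01/02/2001"
as_month_first = datetime.strptime(text, "%m/%d/%Y")  # January 2
as_day_first = datetime.strptime(text, "%d/%m/%Y")    # February 1
print(as_month_first.date(), as_day_first.date())
```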
 
DateTime format codes
| Category | Date part | Code | Meaning | Example |
|---|---|---|---|---|
| Date | Year | %Y | Year with century as a decimal number | 1988, 2004, 2021 |
| Date | Year | %y | Year without century as a zero-padded decimal number | 88, 04, 21 |
| Date | Month | %m | Month as a zero-padded decimal number | 01, 02, ..., 12 |
| Date | Month | %B | Month as locale’s full name | January, June, December |
| Date | Month | %b | Month as locale’s abbreviated name | Jan, Jun, Dec |
| Date | Week | %U | Week number of the year (Sunday as the first day of the week) as a zero-padded decimal number. All days in a new year preceding the first Sunday are considered to be in week 0. | 00, 01, ..., 53 |
| Date | Week | %W | Week number of the year (Monday as the first day of the week) as a zero-padded decimal number. All days in a new year preceding the first Monday are considered to be in week 0. | 00, 01, ..., 53 |
| Date | Weekday | %A | Weekday as locale’s full name | Monday, Saturday, Sunday |
| Date | Weekday | %a | Weekday as locale’s abbreviated name | Mon, Sat, Sun |
| Date | Weekday | %w | Weekday as a decimal number, where 0 is Sunday and 6 is Saturday | 0, 1, ..., 6 |
| Date | Day | %d | Day of the month as a zero-padded decimal number | 01, 02, ..., 31 |
| Date | Day | %j | Day of the year as a zero-padded decimal number | 001, 002, ..., 366 |
| Date | Other | %D | Date in the format %m/%d/%y | 08/16/88 |
| Time | Hour | %H | Hour (24-hour clock) as a zero-padded decimal number | 00, 01, ..., 23 |
| Time | Hour | %I | Hour (12-hour clock) as a zero-padded decimal number | 01, 02, ..., 12 |
| Time | Minute | %M | Minute as a zero-padded decimal number | 00, 01, ..., 59 |
| Time | Second | %S | Second as a zero-padded decimal number | 00, 01, ..., 59 |
| Time | Second | %f | Microsecond as a decimal number, zero-padded on the left | 000000, 000001, ..., 999999 |
| Time | Other | %p | Locale’s equivalent of either AM or PM | AM, PM |
| Time | Other | %Z | Time zone name | UTC, GMT |
| Time | Other | %T | Time in the format %H:%M:%S | 21:30:00 |
| Date, Time | Other | %c | DateTime in the format %a %b %d %H:%M:%S %Y | Tue Aug 16 21:30:00 1988 |
| Other | Other | %% | A literal % character | % |
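To see what individual codes produce, the sketch below renders one timestamp with several of the numeric codes from the table, again using Python's standard datetime module as a stand-in. Locale-dependent codes such as %A, %B, and %p are left out because their output varies with the system locale; the timestamp is the one used in the table examples.

```python
from datetime import datetime

# Tuesday, 16 August 1988, 21:30:00.
moment = datetime(1988, 8, 16, 21, 30, 0)

# Numeric codes only, so the output does not depend on the locale.
codes = ["%Y", "%y", "%m", "%d", "%j", "%U", "%W", "%w",
         "%H", "%I", "%M", "%S", "%f", "%%"]

rendered = {code: moment.strftime(code) for code in codes}
for code, text in rendered.items():
    print(f"{code} -> {text}")
```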