The brick provides a possibility to calculate the formula for both string and numerical variables for each row. For column calculation look at Apply Function.
Bricks → Calculation → Math Formula
Brick takes the dataset.
Brick calculates a formula for the input dataset and returns it with a calculated column.
- Target - the name of the column for which we calculate the formula. It is possible to choose existing or write down the name of the new one.
- Arguments - names of columns which can be used in calculation.
- Block with formula which will be doubled on the side panel as well.
- Operations block shows all the possible operations in categories.
- Arithmetic: addition, subtraction, multiplication, division and modular division;
- Trigonometric: functions of sine, cosine, tangent, cotangent;
- Logical: comparing less, more, equal, less equal, more equal, equal, not equal, logical ‘and’, logical ’or’, logical ’not’, is empty. Also support IF-ELSE constructs (see examples below);
- Logarithm operations: ln, log, ln1p;
- Power: finding x power of y, x power of 2, x power of 3, x power of 10, e power of x, 2 power of x, 10 power of x, square root of x, cube root of x, y root of x;
- Other: finding minimum of 2 values, maximum of 2 values, rounding, rounding down, rounding up, sign of x, absolute value;
- String: checking if one string contains another, if one string is contained by another, concatenation of two strings, subtraction of symbols from the string, replacing one string with another, transform rows to upper or lower case;
- Typization: ability to convert cells to string, numeric, boolean or datetime data types;
- Parse Date Time: grants functions that can extract different parts of the date time, like date, month, weekday, week of the year, year, day of the month or the year, time, hours and minutes;
- Compound Date Time: gives an ability to form a date time from different numerical parts/components like year, month, day, hour and minutes. It accepts both numerical values and columns. All parts are optional and can be deleted: in cases when part is not provided - it gets default values. The default values are: year=1970, month=1, day=1, hour=0, minutes=0. You can transfer parts either by using keywords or without them but by specific order. Though, don’t leave “handling” arguments (keywords without the assigned values or empty strings between commas) - it will cause errors;
- Transform Date Time: shift dates by specific amount of units or find a difference between two dates. Supported units are:
For demonstration, let us consider the Titanic dataset, which subset is presented below:
Let suggest we want to set a new fare with a certain rule. We write down the name of the new column in Target field, then choose what operations we need (in this case we choose minimum of two values, one of which is power of two).
As a result we get the table with new_fare column.
For example, we need to know the first letter of cabin.
First, we choose the target column or write down the name of a new.
Then, find in String Operations section substring(str, position, n) and choose it. To choose the column we click at one of the argument and then fill other parameters.
As a result we have a dataset with a new column which shows the first letter of the cabin.
Let try complex logical statement to check if a passenger was female and number of parents/children was equal to 2.
We create a new column, then choose logical operator AND operator and fill it with two conditions. For the first condition we choose contains(str,substr) from String Operations section and fill the parameters with name of the column and word we want to check e.g. ‘female’. For the second condition we need to select column with operator ‘equal’ (==).
As a result we have a dataset with a new column which shows if each row satisfies our conditions.
Overall structure of formula’s IF-ELSE construct is very similar to Excel or Power BI syntaxis:
if(condition, then, else)
Example: let’s create a formula that will divide house’s SalePrice by two when this house is old enough (if it was built before the 2000th) and leave it unchanged otherwise:
If we then run this formula and check the output, we will indeed see that the first statement applies only to those rows, where YearBuilt is smaller than 2000: