
Create your first pipeline

1. Create a new pipeline

To start your analysis, create a new Project and a new Pipeline. Give the pipeline a name so that you can easily identify it later.

2. Upload your data to the pipeline

Every analysis starts with data, so the first step in the pipeline is to upload it.

Create the dataset

You may drag and drop your file onto the scene (1), move an existing dataset from the left panel onto the scene (2), or create a new dataset (3).
 
The example below uses home_prices_sample.csv.

Review and upload the data

After you have added the dataset to the scene, you may run the pipeline (1) to upload the data and then review the data summary.
The data preview and summary are available under “View Data details” (2).
There are two tabs: Data Preview displays a sample of the dataset, and Summary displays statistics for the entire dataset.
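To make the difference between the two tabs concrete, here is a minimal Python sketch of what a preview and a summary compute. It is an illustration only, not the platform's implementation: the rows are made up (they are not the contents of home_prices_sample.csv), the LotArea and Neighborhood column names are assumptions, and the helper functions are hypothetical.

```python
import csv
import io
import statistics

# Made-up rows for illustration; NOT the contents of home_prices_sample.csv.
SAMPLE_CSV = """LotArea,Neighborhood,SalePrice
8450,CollgCr,208500
9600,Veenker,181500
11250,CollgCr,223500
9550,Crawfor,140000
"""

def load_rows(text):
    """Parse CSV text into a list of dicts, one per row."""
    return list(csv.DictReader(io.StringIO(text)))

def preview(rows, n=2):
    """Return only the first n rows, like a data preview."""
    return rows[:n]

def summarize(rows, column):
    """Return basic statistics for one numeric column over ALL rows."""
    values = [float(row[column]) for row in rows]
    return {
        "count": len(values),
        "min": min(values),
        "max": max(values),
        "mean": statistics.mean(values),
    }

rows = load_rows(SAMPLE_CSV)
print(preview(rows))                 # a small sample of the data
print(summarize(rows, "SalePrice"))  # statistics over the whole column
```

The preview looks at a few rows only, while the summary always scans the full column.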
 

3. Build your first pipeline

For the first pipeline, let’s try the Brick recommendation feature (1).
It offers a list of goals to choose from, and we will then guide you through the process of building the pipeline.
 
Let’s choose “Build a predictive model” (2).
 

4. Data Preparation

The next step is to clean and process the data for further analysis. A set of functions is available for data transformation and manipulation. You may browse the list of bricks or search for a specific one.
 
  1. Add the brick to the canvas
To use a brick, drag and drop it from the left panel onto the scene (canvas) and connect it to the output of the dataset.
The platform suggests using the Auto Data Preparation brick. This brick combines multiple transformations and automatically suggests the transformation rules. For more details, see Data transformation.
 
Connect the bricks on the canvas by dragging an arrow from the output of the dataset brick to the input of the Auto Data Preparation brick.
 
  1. Run the pipeline
  1. Open dashboard settings
    1. In the dashboard settings, you may see the transformations suggested for the data.
      For this example, we will remove One-hot encoding for the string fields. To apply the changes, press Save, close the dashboard, and run the pipeline.
      The Summary tab provides insight into the changes applied.
  1. Data preview
    1. Open Data Preview to view a sample of the data or the full data.
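For readers unfamiliar with the One-hot encoding mentioned in the dashboard settings step, the sketch below shows the general technique in plain Python. It is a generic illustration, not the Auto Data Preparation brick's actual code; the `one_hot` helper and the `is_` column prefix are hypothetical.

```python
def one_hot(values):
    """Turn a list of category strings into one 0/1 column per category."""
    categories = sorted(set(values))
    return [
        {f"is_{category}": int(value == category) for category in categories}
        for value in values
    ]

# Made-up neighborhood labels for illustration.
encoded = one_hot(["CollgCr", "Veenker", "CollgCr"])
for row in encoded:
    print(row)
```

Each string field is replaced by as many numeric columns as it has distinct values, which is why you may want to switch the rule off for fields with many categories.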
 
 

5. Split the data for training and testing

Find the Split Data brick and connect it to the output of the Auto Data Preparation brick. This step lets you train the model on one subset of the data and keep the rest for testing.
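Conceptually, a train/test split shuffles the rows and cuts them into two parts. The sketch below is a minimal Python illustration under assumed settings: the 25% test fraction, the fixed seed, and the `split_rows` helper are all hypothetical and do not describe the Split Data brick's defaults.

```python
import random

def split_rows(rows, test_fraction=0.25, seed=42):
    """Shuffle a copy of the rows and split it into (train, test)."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # seeded for reproducibility
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

train, test = split_rows(list(range(8)))
print(len(train), len(test))
```

Every row ends up in exactly one of the two parts, so the model is later evaluated on rows it has not seen during training.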
 

6. Predictive modeling

  1. Drop the AutoML Predictive Model
    1. Drop the AutoML Predictive Model brick onto the canvas and connect it to the Train data output of the Split Data brick.
      Then set up the Target variable (1) and run the pipeline (2).
 
  1. Open model view
The model view reports model performance through metrics and plots.
Classification
  • Supported metrics: accuracy, precision, recall, f1-score, ROC AUC, Gini
Regression
  • Supported metrics: RMSE, MAPE, R2
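As a reference for reading these numbers, the sketch below computes several of the listed metrics from their textbook definitions in plain Python. It is not the platform's code: the function names are hypothetical, MAPE is expressed here as a percentage (the platform may report a fraction), and ROC AUC and Gini are left out for brevity.

```python
import math

def regression_metrics(y_true, y_pred):
    """RMSE, MAPE (in percent) and R2 from their standard definitions."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    ss_res = sum(e * e for e in errors)
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return {
        "rmse": math.sqrt(ss_res / n),
        # MAPE is undefined if any true value is 0.
        "mape": 100.0 * sum(abs(e) / abs(t) for e, t in zip(errors, y_true)) / n,
        "r2": 1.0 - ss_res / ss_tot,
    }

def classification_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall and F1 for a binary target."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    correct = sum(1 for t, p in pairs if t == p)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": correct / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }

# Made-up values for illustration.
print(regression_metrics([100, 200, 300], [110, 190, 300]))
print(classification_metrics([1, 0, 1, 1], [1, 1, 0, 1]))
```

For RMSE and MAPE lower is better, while for R2 and the classification metrics values closer to 1 are better.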
 
 
From the Model view, you may download the model in JSON format or save it to the assets.
Models saved to the assets may be used in other pipelines.
  1. Save the model under a name
  1. The saved model is then available under “Saved models”
 
 
  1. Predicted values
    1. To review the predicted values, open View Data details. A new column, predicted_SalePrice, is added to the dataframe.
 
The output of the AutoML Predictive Model brick is a dataframe with the predicted values and a trained model.
 

7. Use the trained model

Now that the model is trained, let’s test its performance on the test data sample.
  1. Find the Predict brick in the list.
  1. Connect the inputs to the Predict brick
    1. Connect the test data from the Split Data brick and the model from the AutoML Predictive Model brick.
  1. Run the pipeline
  1. Open Data details
    1. In Data details, you may see the predicted values.
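The train-then-predict flow of steps 6 and 7 can be sketched in a few lines of Python. This is only an illustration: a one-feature least-squares line stands in for the AutoML Predictive Model brick, whose internals are not described here, and the LotArea feature, the rows, and the `fit_line` and `predict` helpers are made up. Only the predicted_SalePrice column name comes from the tutorial.

```python
def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def predict(model, rows, feature, target):
    """Return copies of the rows with an added predicted_<target> column."""
    slope, intercept = model
    return [
        {**row, f"predicted_{target}": slope * row[feature] + intercept}
        for row in rows
    ]

# Made-up train and test rows for illustration.
train_rows = [
    {"LotArea": 1000, "SalePrice": 100000},
    {"LotArea": 2000, "SalePrice": 200000},
    {"LotArea": 3000, "SalePrice": 300000},
]
test_rows = [{"LotArea": 2500, "SalePrice": 240000}]

model = fit_line(
    [row["LotArea"] for row in train_rows],
    [row["SalePrice"] for row in train_rows],
)
scored = predict(model, test_rows, feature="LotArea", target="SalePrice")
print(scored)
```

The model is fitted on the training rows only and then applied to the held-out test rows, so comparing SalePrice with predicted_SalePrice there gives a fair view of its performance.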