Define Age Groups using Execute R Script in Azure ML


In this tutorial, I'll show you how to create an age grouping using the Execute R Script module and append the newly created column to your dataset.

Step 1 - Dataset

1. Download the aw_cycles.csv (762.23 KB) file to your local file system, and then upload it to your Azure ML workspace as a dataset.

2. Create a new Experiment. If you're not familiar with how to do this, please see our Azure Machine Learning Tutorial, Step 1 - Create AML Experiment, for details.

3. Name your Experiment "Age Group using R Script", or whatever you desire.

4. After you have uploaded the aw_cycles.csv file, find it under My Datasets and drag it onto the designer.

5. Visualize the dataset to inspect its rows. You will be focusing on the Age column values for this tutorial.

Step 2 - Execute R Script

You will be using the Execute R Script module for this exercise. The Execute R Script module allows you to run R code from within an Azure Machine Learning experiment. To learn more about this module, please visit here.

1. From the left module pane, search for "execute r script".

2. Drag the Execute R Script module just below the aw_cycles.csv dataset. Connect the dataset connector (Dataset) to the left connector (Dataset1) of the Execute R Script module. Your designer should now look similar to below:

3. With the Execute R Script module selected, set the R Script property by clicking the display window icon, as highlighted in the image below. Also, ensure the R Version is set to CRAN R 3.1.0.

4. When the window expands, you can begin to edit the R Script. Highlight the content in the current R Script window and delete it. Replace it with the following R Script code:

# Map 1-based optional input ports to variables
dataset1 <- maml.mapInputPort(1) # class: data.frame

# Use cut() to bin Age at breaks from 0 to 90 and label each interval.
# Ages above 80 (up to 90) get the label "[+80]".
dataset1$AgeGroup <- cut(dataset1$Age, c(0,10,20,35,50,60,76,80,90),
                         labels = c("[0-10]", "[10-20]", "[20-35]", "[35-50]",
                                    "[50-60]", "[60-76]", "[76-80]", "[+80]"))

# Select data.frame to be sent to the output Dataset port
maml.mapOutputPort("dataset1")

The code above does the following:

  1. References our dataset (aw_cycles.csv) through the first input port.
  2. Creates a new column named "AgeGroup".
  3. Uses the cut function to divide the range of Age into intervals and code each value according to the interval into which it falls. The leftmost interval corresponds to level one, the next leftmost to level two, and so on. By default each interval includes its upper bound but not its lower bound, so an Age of 10 falls in "[0-10]" and an Age of 11 falls in "[10-20]".
  4. Defines the labels shown for each interval in the results.
  5. Returns the dataset with the newly created "AgeGroup" column appended.
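
To see how cut treats values that sit exactly on a break, here is a minimal sketch you can run in any local R session, outside Azure ML. The ages vector is made-up sample data rather than rows from aw_cycles.csv, and the variable names are only for illustration; the breaks and labels are copied from the script.

```r
# Hypothetical sample ages (not from aw_cycles.csv), chosen to hit interval edges.
ages <- c(5, 10, 11, 35, 36, 76, 77, 85)

# Same breaks and labels as the tutorial script.
breaks <- c(0, 10, 20, 35, 50, 60, 76, 80, 90)
labels <- c("[0-10]", "[10-20]", "[20-35]", "[35-50]",
            "[50-60]", "[60-76]", "[76-80]", "[+80]")

# cut() uses right-closed intervals by default, so 10 lands in "[0-10]"
# and 11 lands in "[10-20]".
age_group <- cut(ages, breaks = breaks, labels = labels)
print(data.frame(Age = ages, AgeGroup = age_group))

# Values outside the breaks (0 or below, above 90) get no group and become NA.
outside <- cut(c(0, 95), breaks = breaks, labels = labels)
print(outside)
```

If your own data can contain ages of 0 or above 90, you would likely want to adjust the breaks so those rows do not end up with a missing AgeGroup.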

Your R Script editor should look like the following:

5. Click the OK check button.

6. Save and Run your experiment. When the run completes, right-click the Execute R Script module >> Result Dataset >> Visualize to view the AgeGroup column and its values.

You can scroll to the left to confirm that each Age value falls within the range shown in its AgeGroup column.

This is helpful for grouping ages when you work with summarized data in charts, histograms, and similar visuals.
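
As a rough illustration of that kind of summary, the sketch below counts rows per age group in a local R session. The result data frame is a small hypothetical stand-in for the Result Dataset, not real aw_cycles.csv data.

```r
# Hypothetical stand-in for the Result Dataset; only Age and AgeGroup matter here.
result <- data.frame(Age = c(23, 31, 42, 47, 58, 64, 29))
result$AgeGroup <- cut(result$Age, c(0, 10, 20, 35, 50, 60, 76, 80, 90),
                       labels = c("[0-10]", "[10-20]", "[20-35]", "[35-50]",
                                  "[50-60]", "[60-76]", "[76-80]", "[+80]"))

# Count rows per age group. Empty groups still appear with a count of 0
# because AgeGroup is a factor that keeps all of its levels.
counts <- table(result$AgeGroup)
print(counts)

# Mean age within each group that actually occurs in the data.
print(aggregate(Age ~ AgeGroup, data = result, FUN = mean))
```

Passing counts to barplot() would draw the matching bar chart, with one bar per age group.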

This concludes defining your age groups with R script.

Step 3 - Export to CSV

If you want to export the new dataset with appended Age Group values, you can use the Convert to CSV module.

1. From the left module pane, search for "convert to csv".

2. Drag the Convert to CSV module just below the Execute R Script module. Connect the bottom left connector (1) (Result Dataset) of the Execute R Script module to the top connector of the Convert to CSV module. Your designer should now look similar to below:

3. Save and Run your experiment. Right-click the Convert to CSV module >> Results dataset >> Download and save it to your desired location. Open the file and you will see the complete aw_cycles.csv data with the AgeGroup column.
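
If you would rather check the downloaded file in R than open it by hand, something like the following sketch should work. It writes a tiny hypothetical dataset to a temporary file as a stand-in for the real download, so replace csv_path with the location where you saved your file.

```r
# Labels copied from the tutorial script.
labels <- c("[0-10]", "[10-20]", "[20-35]", "[35-50]",
            "[50-60]", "[60-76]", "[76-80]", "[+80]")

# Stand-in for the downloaded file: a tiny hypothetical dataset in a temp path.
sample_data <- data.frame(Age = c(25, 44, 67))
sample_data$AgeGroup <- cut(sample_data$Age, c(0, 10, 20, 35, 50, 60, 76, 80, 90),
                            labels = labels)
csv_path <- tempfile(fileext = ".csv")
write.csv(sample_data, csv_path, row.names = FALSE)

# Read the file back the way you would read the real download.
exported <- read.csv(csv_path, stringsAsFactors = FALSE)
print("AgeGroup" %in% names(exported))

# CSV stores plain text, so rebuild the ordered levels if you need them.
exported$AgeGroup <- factor(exported$AgeGroup, levels = labels)
print(levels(exported$AgeGroup))
```

The last step matters mostly for charts and tables, where the factor levels control the order in which the age groups appear.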

 

Cheers!
