Data integration is a big part of putting predictive models to work. ADAPA on the Amazon cloud allows for easy model deployment, but how can you actually move the data to the cloud for scoring? One simple way is by using SQL Server Integration Services (SSIS).
In our last post on this topic, we showed how to use the provided Web Service Task available in SSIS to score data in ADAPA via web services. A limitation of the Web Service Task, however, is that one needs to provide values for the input parameters through the task UI, which is not very useful if the intent is to score several data records.
If that is the case, the answer is to use a Script Component instead. Below, we demonstrate in 10 simple steps how to set up, write, and execute our own Script Component so that we can successfully pass multiple data records to ADAPA on the cloud and get a result back for each record (without having to type the inputs in the UI every time).
Step 1: Installing the Zementis Security Certificate
First, we need to have the “Zementis Security Certificate” installed as a trusted 3rd party root certification authority. A link to the Zementis Security Certificate is available from the ADAPA Control Center window. Look for the link underneath the table of available instances. To install the certificate, follow the instructions posted here on how to install a certificate on a server with the Microsoft Management Console (MMC). Instead of importing the certificate into the “Personal” folder under “Certificates”, choose “Third-Party Root Certification Authorities”. See figure below.
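If you prefer the command line to MMC, the same import can be done with certutil. This is a sketch assuming the certificate has been downloaded and saved locally under the hypothetical filename ZementisCert.cer:

```shell
:: Optional command-line alternative to MMC (run from an elevated prompt).
:: "AuthRoot" is the internal name of the "Third-Party Root Certification
:: Authorities" store; the certificate filename here is just an example.
certutil -addstore AuthRoot ZementisCert.cer
```

Either way, the certificate ends up in the same store that MMC shows as “Third-Party Root Certification Authorities”.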
After having the certificate in place, it is time to start working on the SSIS project to have our data scored by ADAPA.
Step 2: Uploading PMML models into ADAPA
In the example below, we launched an ADAPA instance by using the ADAPA Control Center. In this instance, we uploaded two models available in the PMML Examples page of the Zementis website: “Iris_NN.xml” and “Iris_SVM.xml”. These are the respective PMML exports of a neural network model and a support vector machine built to solve the Iris classification problem. The figure below shows the ADAPA Console after the two models have been uploaded.
Step 3: Creating the Excel data file
From the same PMML Examples page, we also downloaded the “Iris_NN.csv” file, opened it in Excel, deleted the “Class” column, and saved it as an Excel file named “Iris_NN_input.xlsx”. We will be passing each data record in this file to ADAPA on the cloud via a web service request in the Script Component.
Step 4: Preparing a package in SSIS for the Script Component
First, let’s create a new SSIS package. Then, drag a Data Flow Task from the Control Flow Items tab to the Control Flow design surface. Double-click the Data Flow Task to open the Data Flow design surface. Drag an Excel Source component to the design surface. Double-click this source component and choose to create a new Excel Connection Manager. Select the Excel file created in Step 3 above. Make sure to select First row has column names. Click OK and select the appropriate Excel sheet. Click OK again to return to the Data Flow task editor.
Step 5: Adding the Script Component
Drag a Script Component from the Data Flow Transformations tab to the Data Flow design surface. In the dialog box, select Transformations and click OK. Connect the output of the Excel Source component to the Script Component. See figure below.
Step 6: Setting up the Script Component
Double-click the Script Component to open the Script Transformation Editor. On the Input Columns tab, check the Name box to select all input columns (sepal_length, sepal_width, petal_length, and petal_width). On the Inputs and Outputs tab, expand the Output node and select the Output Columns folder. Click the Add Column button. Edit the Name property of the column to be Class (this is the name of the predicted field in the Iris model). Iris types will be strings, so select the Data Type property of the column and use the drop-down list to select the type String [DT_STR]. Finally, select the Script tab and click the Design Script button to open the script editor. We will be using C# in this example.
Step 7: Creating a Web Reference to ADAPA
The script editor will automatically create a C# frame for our script. We will populate it later. For now, let’s create a web reference to ADAPA. On the Project Explorer panel on the left side of the script editor, right-click on the References folder and select to Add Web Reference. In the Add Web Reference box, enter the URL for the ADAPA WSDL file. See figure below.
A link to the ADAPA WSDL file can be found in the ADAPA Console. See ADAPA Web Console figure posted on Step 3 above. For security reasons, SSIS will ask us to enter username and password multiple times before it concludes generating the client code for calling the ADAPA web service. When done, it should look like the figure below.
The username and password refer to the e-mail address and password we used to access the ADAPA Web Console.
Step 8: Handling HTTP authentication
To see the classes generated by SSIS, go to the Project Explorer panel, open the Web References folder, and double-click on com.amazonaws.compute…. The Object Browser tab will appear on the right side of the script editor. On this panel, find and select the …com.amazonaws.compute… folder. This folder should contain a class for the adapaModelsService. The figure below shows the script editor with the Object Browser tab on the right and the Project Explorer panel on the left.
Now, double-click on the adapaModelsService class to open the file Reference.cs in the script editor. Add a method to the adapaModelsService class that overrides the GetWebRequest() method of the System.Web.Services.Protocols.SoapHttpClientProtocol class from which the generated client code derives. This is so that the SSIS-generated classes are able to handle the basic authentication process required by ADAPA. The new GetWebRequest() method can be obtained from a posting by Mark Michaelis dated March 23, 2005. After copying this method into the adapaModelsService class, the script editor will look like the figure below.
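For reference, the override follows the standard pattern from Mark Michaelis’ post for attaching Basic credentials explicitly; the sketch below captures that pattern (the exact body may differ slightly from his posting):

```csharp
// Sketch of the basic-authentication override; paste it inside the
// adapaModelsService class in Reference.cs.
protected override System.Net.WebRequest GetWebRequest(Uri uri)
{
    HttpWebRequest request = (HttpWebRequest)base.GetWebRequest(uri);
    if (this.PreAuthenticate)
    {
        NetworkCredential cred = this.Credentials.GetCredential(uri, "Basic");
        if (cred != null)
        {
            // Build the Authorization header by hand so the very first
            // request already carries the Basic credentials, instead of
            // waiting for a 401 challenge from the server.
            byte[] userPwd = Encoding.ASCII.GetBytes(
                cred.UserName + ":" + cred.Password);
            request.Headers["Authorization"] =
                "Basic " + Convert.ToBase64String(userPwd);
        }
    }
    return request;
}
```

Note that this method relies on the System.Text and System.Net namespaces, which is why the two extra using directives below are needed.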
Also add the following two lines of code to the original list of includes:
using System.Text;
using System.Net;
and save the file.
Step 9: Writing the script (finally!)
Switch tabs now to “main.cs”. This is the file containing the code for our C# script, and it is the place in which we write the client call to ADAPA. The figure below shows the script editor after editing. Note that the client code requires that the network credentials are set and that the PreAuthenticate property is assigned true (as per Mark Michaelis’ posting).
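As a rough guide, the row-processing code might look like the sketch below. Be aware that the scoring method name (applyModel here) and the buffer column names are illustrative placeholders only; check the generated proxy in the Object Browser for the actual method names and signatures before using them.

```csharp
// Illustrative sketch of main.cs. The applyModel() call and the Row
// column names are placeholders -- inspect the SSIS-generated proxy
// for the real names and signatures.
public class ScriptMain : UserComponent
{
    private adapaModelsService service;

    public override void PreExecute()
    {
        base.PreExecute();
        service = new adapaModelsService();
        // E-mail address and password used to access the ADAPA Web Console.
        service.Credentials =
            new NetworkCredential("yourname@company.com", "yourADAPApassword");
        // Required so that the GetWebRequest() override from Step 8 attaches
        // the Basic Authorization header on the very first request.
        service.PreAuthenticate = true;
    }

    public override void Input0_ProcessInputRow(Input0Buffer Row)
    {
        // Score one record against the "Iris_NN" model and write the
        // predicted class to the output column added in Step 6.
        Row.Class = service.applyModel(
            "Iris_NN",
            Row.sepallength, Row.sepalwidth,
            Row.petallength, Row.petalwidth);
    }
}
```

Creating the service once in PreExecute(), rather than per row, avoids rebuilding the proxy and re-sending credentials for every record.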
As in the previous step, make sure to add the following lines of code to the original list of includes:
using System.Net;
using …com.amazonaws.compute…;
where …com.amazonaws.compute… is the namespace generated by SSIS for the ADAPA WSDL. Copy it from the Object Browser tab and paste it into “main.cs”. All we need to do now is to move to the Project Explorer panel, right-click on the root folder, and select Build. We should see the message Build Succeeded in the bottom left corner of the script editor. We can now close the editor and click OK to close the Script Transformation Editor. Our mission to build the script to access ADAPA on the cloud is finished. All that remains is to run it.
Step 10: Scoring data
To see the results of all our labor, we will use the Row Count component. Let’s first create a variable in the Variables tab and name it “RowCount”. Make sure its scope is global by clicking first on the back panel of the Data Flow design surface. Then, switch to the Toolbox tab and drag a Row Count component from the Data Flow Transformations tab to the Data Flow design surface. Connect the output of the Script Component to the Row Count component. Double-click on the Row Count component. Use the drop-down list associated with VariableName under Custom Properties and select the variable just created, which now reads “User::RowCount”. The best way to see the output of the Script Component is to add a data viewer to the Data Flow on the path between the output of the Script Component and the input of the Row Count component. For that, right-click on the connection between the Script Component and Row Count and select Data Viewers. Select Data Viewers again in the Data Flow Path Editor and click Add. Select type Grid in the Configure Data Viewer window and click OK. Then, click OK again in the Data Flow Path Editor. We are now ready to execute our script and see the results generated by ADAPA.
To execute the package, select Start Debugging from the Debug menu. The data viewer window will then appear, containing the data and results of the web service call to ADAPA and the Iris neural network model. Note that for every single input record, we obtained a result back identifying the class, i.e., the particular Iris type, that the input characteristics belong to. See figure below for results.
If you are interested in learning more about accessing Web Services with SSIS Script, we highly recommend the following list of resources:
- Donald Farmer’s book – The Rational Guide to Extending SSIS 2005 with Script
- Jamie Thompson’s blog – SSIS: Consuming Web Services in SSIS 2008
- Mark Michaelis’ blog – Calling Web Services Using Basic Authentication
Last, but not least, the package we created, named “ADAPAWS.dtsx”, can be downloaded by clicking here (this is a zip file that you will need to unzip). Remember to replace “yourname@company.com” and “yourADAPApassword” in “main.cs” with the e-mail address and password used to access the ADAPA Web Console.