I got an update from the folks at FICO on the new release of Model Builder, their predictive analytics workbench. My last update was for Blaze Advisor 6.9 and Model Builder 7.0 and this release, Model Builder 7.1, is the first point release for Model Builder on the new 7.x platform.
For those of you who have not seen it, Model Builder is FICO’s data mining and predictive analytics workbench. Designed for an analytic modeler or statistician, it applies mathematical techniques to analyze data and make usable predictions. It supports the whole range of data mining/analytic tasks, from data integration and preparation to variable identification, modeling and deployment. With the 7.0 platform it moved to the Eclipse environment and became tightly integrated with FICO Blaze Advisor, their business rules management system, allowing models and variables to be shared between the Model Builder and Blaze Advisor design environments using a common repository. It competes, to some extent, with products like IBM SPSS Modeler, SAS Enterprise Miner or KNIME, though its focus is narrower, being largely (though not exclusively) focused on credit risk modeling. Its strengths have historically been the built-in scorecard engineering capabilities that FICO specializes in and its deployment of models to rules-based execution environments using Blaze Advisor.
Release 7.1 focuses on improving reject inference (explained below) and credit risk modeling more generally, along with continued attention to ease of use and faster deployment: better (risk) models built in less time and deployed more easily.
Reject inference is one of those topics that those who work in credit risk modeling deal with every day. Here’s the problem: when you make credit offers to people you base them (in part) on some analytic model. Some people get offers, some do not. And some of those that do never actually take up the loan. As people use the loans you gather more data about their behavior: who keeps paying, who defaults, who pays off early and so on. If you decide you want to expand your pool of loans, making it easier for a group you previously rejected to get credit, then you have a problem: you don’t have any data about their behavior with a loan (obviously, since they never got one). Reject inference is about using mathematical techniques to infer the likely behavior of those who were rejected or who never took up their loans. By reconstructing the total applicant pool, banks can build unbiased scoring models, expand their approval levels and correct for the influence of “cherry picking” in their historical decisions.
Anyway, traditionally FICO and its customers do single score reject inference. This uses the known performance curve for those who were offered credit to create a KGB or Known Good-Bad score. This score is then applied to those who were rejected to infer their likely behavior. The problem with this approach is that you often have a fair amount of score engineering work to do, as there are overrides and exceptions (people with poor scores who got accepted anyway, for instance). With Model Builder 7.1 you can now also do dual score reject inference. In this case an Accept-Reject score is also created and blended with the Known Good-Bad score to more accurately infer the performance of those previously rejected. The blending helps reduce the work in allowing for overrides and generally improves the accuracy of the inference.
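To make the idea concrete, here’s a minimal sketch of the two approaches in Python (pandas and scikit-learn). To be clear, this is not FICO’s algorithm; the data, column names and the blending rule are all invented purely to illustrate the single score versus dual score distinction.

```python
# Illustrative sketch of single- and dual-score reject inference.
# The data, column names and the blending rule are invented for this example;
# this is not FICO's algorithm.
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(42)
n = 20_000
apps = pd.DataFrame({
    "income_k": rng.normal(50, 15, n),       # income in thousands (synthetic)
    "utilization": rng.uniform(0, 1, n),     # revolving utilization (synthetic)
})

# Historical policy: mostly book lower-risk applicants, plus a few overrides.
risk_index = apps["income_k"] / 20 - 4 * apps["utilization"]
apps["accepted"] = ((risk_index > 0) | (rng.uniform(0, 1, n) < 0.05)).astype(int)

# Good/bad performance is only observed for accepted (booked) applicants.
p_good = 1 / (1 + np.exp(-risk_index))
apps["good"] = np.where(apps["accepted"] == 1,
                        (rng.uniform(0, 1, n) < p_good).astype(int), np.nan)

X = apps[["income_k", "utilization"]]

# Single score: fit a Known Good-Bad (KGB) model on the booked population
# and apply it to the rejects to infer their likely performance.
booked = apps[apps["accepted"] == 1]
kgb = LogisticRegression().fit(X[apps["accepted"] == 1], booked["good"].astype(int))
apps["p_good_kgb"] = kgb.predict_proba(X)[:, 1]

# Dual score: also fit an Accept-Reject (AR) model on the whole applicant pool,
# then blend it with the KGB score when inferring reject performance.
ar = LogisticRegression().fit(X, apps["accepted"])
apps["p_accept"] = ar.predict_proba(X)[:, 1]

# One simplistic, invented blend: shade the KGB estimate down for applicants
# the historical policy was very unlikely to accept.
apps["p_good_inferred"] = apps["p_good_kgb"] * (0.5 + 0.5 * apps["p_accept"])

rejects = apps[apps["accepted"] == 0]
print(rejects[["p_good_kgb", "p_good_inferred"]].describe())
```

The blend here is deliberately naive; the point is simply that the accept-reject information adjusts the inferred performance of people the old policy would rarely have approved, rather than taking the KGB score at face value.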
Model Builder 7.1 implements this in a new Performance Inference (reject inference) wizard whose simple interfaces let a modeler look at a dataset, identify the rejects and those who were accepted but did not proceed, and see what behavior is predicted for them. The wizard walks the modeler through selecting the performance variable, creating test/train data, assigning sample weights and then selecting variables to be auto-binned before generating a scorecard. As you work through this process you see a population reconstruction for those rejected and those offered but not booked.
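Outside the wizard, those steps look roughly like the generic sketch below. The column names, the quantile binning and the weighting rule are assumptions for illustration, not what Model Builder does under the covers.

```python
# Generic sketch of the wizard's steps (performance variable, train/test split,
# sample weights, auto-binning) in plain pandas/scikit-learn; the specifics
# are assumptions, not Model Builder internals.
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(1)
data = pd.DataFrame({"utilization": rng.uniform(0, 1, 5_000)})
data["good"] = (rng.uniform(0, 1, 5_000) < 1 - 0.6 * data["utilization"]).astype(int)

# 1. Pick the performance variable ("good") and create test/train data.
train, test = train_test_split(data, test_size=0.3, random_state=1, stratify=data["good"])

# 2. Assign sample weights (e.g. to reweight an oversampled bad population).
train["weight"] = np.where(train["good"] == 0, 5.0, 1.0)

# 3. Auto-bin a candidate predictor before scorecard fitting (quantile bins here).
train["utilization_bin"] = pd.qcut(train["utilization"], q=5, duplicates="drop")
print(train.groupby("utilization_bin", observed=True)["good"].mean())
```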
This wizard also uses a nice productivity feature of Model Builder. Variables are tagged with metadata when they are created, including roles like target variable or sample weight. The wizards and UI then use this metadata to quickly highlight variables of the correct type. This may not sound like much, but analytic datasets get huge, with thousands of variables, so it really helps.
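A toy illustration of the idea (not Model Builder’s actual metadata model), just to show why role tags matter when a dataset has thousands of columns:

```python
# Toy illustration of role metadata on variables; this is not Model Builder's
# metadata model, just the general idea of filtering many variables by role.
variable_roles = {
    "good_bad_flag": "target",
    "accept_reject_flag": "target",
    "sample_weight": "weight",
    "bureau_score": "predictor",
    "utilization": "predictor",
    # ...thousands more predictors in a real analytic dataset
}

# A wizard can instantly surface only the variables that fit the slot being filled.
targets = [name for name, role in variable_roles.items() if role == "target"]
print(targets)  # ['good_bad_flag', 'accept_reject_flag']
```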
The wizard also uses another new feature that has value beyond reject inference. The new multi-target interactive binner lets you interact with variables and bins and see Weight of Evidence rates by bin for two different analytic outcomes (such as the known good-bad and accept-reject flags). As long as the bins share records (a bin is a set of values for a particular variable that are treated as equivalent for the purposes of the model), the tool lets you compare the Weight of Evidence for each business outcome to see if they move in lock step or contradict each other. This could also be used for risk and response models, directly comparing how the weight of evidence varies between the bins for the two outcomes and engineering a single model that targets both.
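For anyone who hasn’t bumped into it, a bin’s Weight of Evidence for a binary outcome is the log of the ratio of the bin’s share of goods to its share of bads. The sketch below computes WoE per bin for two invented flags so they can be compared side by side, which is the essence of the multi-target view; it illustrates the statistic, not the binner itself.

```python
# Weight of Evidence (WoE) per bin for two different binary outcomes, so their
# patterns can be compared side by side; data and column names are invented.
import numpy as np
import pandas as pd

def woe_by_bin(df, bin_col, flag_col):
    # WoE = ln( share of 1s ("goods") in the bin / share of 0s ("bads") in the bin )
    goods = df[df[flag_col] == 1].groupby(bin_col, observed=True).size()
    bads = df[df[flag_col] == 0].groupby(bin_col, observed=True).size()
    return np.log((goods / goods.sum()) / (bads / bads.sum()))

rng = np.random.default_rng(7)
df = pd.DataFrame({"utilization": rng.uniform(0, 1, 10_000)})
df["bin"] = pd.qcut(df["utilization"], q=5)
p = 1 / (1 + np.exp(-(1 - 3 * df["utilization"])))
df["good_bad"] = (rng.uniform(0, 1, 10_000) < p).astype(int)
df["accept_reject"] = (rng.uniform(0, 1, 10_000) < p * 0.9 + 0.05).astype(int)

comparison = pd.DataFrame({
    "woe_good_bad": woe_by_bin(df, "bin", "good_bad"),
    "woe_accept_reject": woe_by_bin(df, "bin", "accept_reject"),
})
print(comparison)  # do the two outcomes move in lock step across the bins?
```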
There are a number of other new modeling features. A range divergence objective function helps detect and correct for cherry picking. For instance, a historical origination rule might say we will never grant credit to those with bankruptcies but allow a few carefully screened exceptions, leaving a bias in the data where it seems like everyone with a bankruptcy is a really good customer. The tool also provides some new reports, including a new performance inference summary with inference iteration history, key ratios, swapsets, population reconstruction and the like. The summary of this report is nicely business/decision focused and is backed up with a bunch of statistical detail. I wish all analytic reports were structured this way… A nice model performance comparison GUI showing lift curves and bad/accept rate tradeoffs has also been added.
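FICO hasn’t published the details of the range divergence objective in what I’ve seen, but the classic divergence measure used in scorecard development quantifies how far apart the score distributions of goods and bads sit. The sketch below computes that basic statistic only; treat it as background for the terminology, not as the new objective function itself.

```python
# The classic divergence statistic used in scorecard work: how far apart the
# score distributions of goods and bads are. This is the basic measure only,
# not FICO's new "range divergence" objective, whose details aren't public here.
import numpy as np

def divergence(scores_good: np.ndarray, scores_bad: np.ndarray) -> float:
    mean_diff = scores_good.mean() - scores_bad.mean()
    pooled_var = (scores_good.var(ddof=1) + scores_bad.var(ddof=1)) / 2.0
    return mean_diff ** 2 / pooled_var

rng = np.random.default_rng(3)
goods = rng.normal(220, 30, 5_000)   # hypothetical scores for good accounts
bads = rng.normal(180, 30, 1_000)    # hypothetical scores for bad accounts
print(round(divergence(goods, bads), 2))
```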
Architecturally, models are now stored in the same repository as rules (since Model Builder 7). This allows models to be versioned as a set of rules: when a new version of a model is released it can be pushed into the repository as a new version of the implementing ruleset. This integration has also helped some other FICO products, such as Originations Manager 4.0 (released in December 2010), which now ships with a limited-use version of Model Builder, as well as TRIAD Customer Manager and Debt Manager. All of these are built around Blaze Advisor so they too can share models with Model Builder.
Model Builder 7 shares its variables and their definitions with the Blaze Advisor rule repository. This is a great feature, as deploying the variables for a model is even more work than deploying the models themselves. An export, using PMML for instance, that only includes model scores will still need a lot of setup work on the variable side (there’s no reason why PMML models can’t include variable processing logic, but all too many exports don’t). Release 7.1 includes updated applications for managing variables and deployed models that use the Blaze Advisor 6.9 rule maintenance application widgets, all embedded in the modeling tool. Finally, the tool boasts improved project organization, Java object support in deployments and full 64-bit support.
You can get more information on FICO Model Builder here.