Tutorial: Counterfactual explanations for scorecard with continuous target
This tutorial shows how to generate counterfactual explanations for scorecard models with a continuous target. It uses the California housing dataset from scikit-learn: https://scikit-learn.org/stable/modules/generated/sklearn.datasets.fetch_california_housing.html.
[1]:
import pandas as pd
from sklearn.datasets import fetch_california_housing
from sklearn.linear_model import HuberRegressor
from optbinning import BinningProcess
from optbinning import Scorecard
from optbinning.scorecard import Counterfactual
Load the dataset
[2]:
data = fetch_california_housing()
target = "target"
variable_names = data.feature_names
X = pd.DataFrame(data.data, columns=variable_names)
y = data.target
Scorecard model
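First, we build a scorecard following the steps presented in previous tutorials. Here the estimator is a HuberRegressor, a linear regressor that is robust to outliers, and the scores are scaled to the range [0, 100] with min-max scaling.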
[3]:
binning_process = BinningProcess(variable_names)
estimator = HuberRegressor(max_iter=200)
scorecard = Scorecard(binning_process=binning_process,
                      estimator=estimator, scaling_method="min_max",
                      scaling_method_params={"min": 0, "max": 100},
                      reverse_scorecard=True, verbose=True)
scorecard.fit(X, y)
2024-01-15 00:14:37,344 | INFO : Scorecard building process started.
2024-01-15 00:14:37,346 | INFO : Options: check parameters.
2024-01-15 00:14:37,349 | INFO : Dataset: continuous target.
2024-01-15 00:14:37,351 | INFO : Binning process started.
2024-01-15 00:14:39,763 | INFO : Binning process terminated. Time: 2.4119s
2024-01-15 00:14:39,765 | INFO : Fitting estimator.
2024-01-15 00:14:40,146 | INFO : Fitting terminated. Time 0.3814s
2024-01-15 00:14:40,149 | INFO : Scorecard table building started.
2024-01-15 00:14:40,306 | INFO : Scorecard table terminated. Time: 0.1576s
2024-01-15 00:14:40,308 | INFO : Scorecard building process terminated. Time: 2.9628s
[3]:
Scorecard(binning_process=BinningProcess(variable_names=['MedInc', 'HouseAge',
'AveRooms',
'AveBedrms',
'Population',
'AveOccup', 'Latitude',
'Longitude']),
estimator=HuberRegressor(max_iter=200), reverse_scorecard=True,
scaling_method='min_max',
scaling_method_params={'max': 100, 'min': 0}, verbose=True)
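To make the pieces concrete, here is a minimal sketch of an additive binned model under simplifying assumptions: each feature is split into half-open bins, each bin carries a numeric value, the estimator is linear, and "min_max" scaling linearly maps the summed points so that the lowest possible total scores 0 and the highest scores 100. All numbers and helper names (`BINS`, `points`, `min_max_score`) are hypothetical; this illustrates the idea, not optbinning's internal implementation. In this sketch the prediction stays in target units while the score lives in [0, 100].

```python
from bisect import bisect_right

# Hypothetical scorecard with two binned features. Edges, bin values,
# coefficients and intercept are illustrative, NOT fitted on the housing data.
BINS = {
    "MedInc": {"edges": [2.5, 4.5, 6.8], "values": [1.2, 1.9, 2.7, 3.9]},
    "HouseAge": {"edges": [20.0, 35.0], "values": [2.0, 2.1, 2.3]},
}
COEFS = {"MedInc": 0.9, "HouseAge": 0.4}
INTERCEPT = -1.0


def bin_index(feature, x):
    """Index of the half-open bin [a, b) that contains x."""
    return bisect_right(BINS[feature]["edges"], x)


def points(feature, x):
    """Points for one feature: coefficient times the value of its bin."""
    return COEFS[feature] * BINS[feature]["values"][bin_index(feature, x)]


def predict(row):
    """Outcome in target units: intercept plus the points of every feature."""
    return INTERCEPT + sum(points(f, x) for f, x in row.items())


def min_max_score(row, lo=0.0, hi=100.0):
    """Linearly map the summed points so the worst row scores lo and the best hi."""
    raw_min = sum(min(COEFS[f] * v for v in BINS[f]["values"]) for f in BINS)
    raw_max = sum(max(COEFS[f] * v for v in BINS[f]["values"]) for f in BINS)
    raw = sum(points(f, x) for f, x in row.items())
    return lo + (raw - raw_min) / (raw_max - raw_min) * (hi - lo)


row = {"MedInc": 8.3252, "HouseAge": 41.0}
print(round(predict(row), 4), round(min_max_score(row), 4))
```

The example row falls in the highest-valued bin of both features, so it receives the maximum score of 100.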
Generating counterfactual explanations
As the input data point, or query, we select the first sample. Note that a query must be either a dictionary or a pandas DataFrame.
[4]:
query = X.iloc[0, :].to_frame().T
[5]:
query
[5]:
| | MedInc | HouseAge | AveRooms | AveBedrms | Population | AveOccup | Latitude | Longitude |
|---|---|---|---|---|---|---|---|---|
| 0 | 8.3252 | 41.0 | 6.984127 | 1.02381 | 322.0 | 2.555556 | 37.88 | -122.23 |
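Since a query may also be a dictionary, here is a sketch of a dictionary equivalent of the one-row DataFrame, typed in from the displayed (rounded) values; with pandas, `X.iloc[0].to_dict()` would give the unrounded version. The tutorial does not say whether `generate` expects scalar values or one-element lists per feature, so the scalar layout below is an assumption, and `missing_features` is a hypothetical sanity check, not an optbinning function.

```python
# Feature names and the first sample's values as displayed above (rounded).
FEATURES = ["MedInc", "HouseAge", "AveRooms", "AveBedrms",
            "Population", "AveOccup", "Latitude", "Longitude"]
VALUES = [8.3252, 41.0, 6.984127, 1.02381, 322.0, 2.555556, 37.88, -122.23]

# Dictionary form of the query: one scalar value per feature (assumed layout).
query_dict = dict(zip(FEATURES, VALUES))


def missing_features(query, features):
    """Hypothetical helper: model features that the query does not provide."""
    return [f for f in features if f not in query]


print(missing_features(query_dict, FEATURES))
```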
[6]:
scorecard.predict(query)
[6]:
array([4.29854244])
The predicted outcome for this query, the median house value of the census block group, is about 4.3 (the target is expressed in units of $100,000). We want to generate counterfactual explanations that show how to increase the house value to at least 4.5.
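A quick worked conversion, relying on the scikit-learn documentation's description of the target as the median house value in hundreds of thousands of dollars: the requested improvement from about 4.30 to 4.5 corresponds to roughly $20,000.

```python
UNIT_USD = 100_000  # target unit: hundreds of thousands of US dollars

predicted = 4.29854244  # output of scorecard.predict(query) above
desired = 4.5           # minimum outcome requested from the counterfactual

gap = desired - predicted
print(f"predicted value:   ${predicted * UNIT_USD:,.0f}")
print(f"required increase: {gap:.4f} units (about ${gap * UNIT_USD:,.0f})")
```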
[7]:
cf = Counterfactual(scorecard=scorecard, verbose=True)
[8]:
cf.fit(X)
2024-01-15 00:14:40,397 | INFO : Counterfactual fit started.
2024-01-15 00:14:40,399 | INFO : Options: check parameters.
2024-01-15 00:14:40,401 | INFO : Compute optimization problem data.
2024-01-15 00:14:40,444 | INFO : Counterfactual fit terminated. Time: 0.0476s
[8]:
Counterfactual(scorecard=Scorecard(binning_process=BinningProcess(variable_names=['MedInc',
'HouseAge',
'AveRooms',
'AveBedrms',
'Population',
'AveOccup',
'Latitude',
'Longitude']),
estimator=HuberRegressor(max_iter=200),
reverse_scorecard=True,
scaling_method='min_max',
scaling_method_params={'max': 100, 'min': 0},
verbose=True),
verbose=True)
[9]:
cf.generate(query=query, y=4.5, outcome_type="continuous", n_cf=1,
            max_changes=3, hard_constraints=["min_outcome"])
2024-01-15 00:14:40,477 | INFO : Counterfactual generation started.
2024-01-15 00:14:40,480 | INFO : Options: check parameters.
2024-01-15 00:14:40,492 | INFO : Options: check objectives and constraints.
2024-01-15 00:14:40,494 | INFO : Optimizer started.
2024-01-15 00:14:40,496 | INFO : Optimizer: build model...
2024-01-15 00:14:40,535 | INFO : Optimizer: solve...
2024-01-15 00:14:40,678 | INFO : Optimizer terminated. Time: 0.1819s
2024-01-15 00:14:40,680 | INFO : Post-processing started.
2024-01-15 00:14:40,691 | INFO : Post-processing terminated. Time: 0.0102s
2024-01-15 00:14:40,692 | INFO : Counterfactual generation terminated. Status: OPTIMAL. Time: 0.2154s
[9]:
Counterfactual(scorecard=Scorecard(binning_process=BinningProcess(variable_names=['MedInc',
'HouseAge',
'AveRooms',
'AveBedrms',
'Population',
'AveOccup',
'Latitude',
'Longitude']),
estimator=HuberRegressor(max_iter=200),
reverse_scorecard=True,
scaling_method='min_max',
scaling_method_params={'max': 100, 'min': 0},
verbose=True),
verbose=True)
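The call above asks for one counterfactual whose outcome is at least 4.5 ("min_outcome" as a hard constraint) while changing at most three features. optbinning formulates this as a mixed-integer program (the solver statistics printed by `cf.information()` report type mip). As an illustration only, here is a brute-force stand-in on a hypothetical four-feature additive model: it enumerates bin changes for up to `max_changes` features, discards candidates below the minimum outcome, and keeps the one closest to the query. The `distance` function is a simple assumption, not optbinning's proximity or closeness objectives, and exhaustive search is only viable for tiny models.

```python
from itertools import combinations, product

# Hypothetical additive model: each bin of a feature adds a fixed amount to the
# predicted outcome. Bins are referred to by index; all numbers are illustrative.
CONTRIB = {
    "MedInc": [0.5, 1.0, 1.6, 2.2],
    "AveBedrms": [0.30, 0.35, 0.40],
    "Population": [0.20, 0.28, 0.35],
    "Latitude": [0.60, 0.45, 0.30],
}
INTERCEPT = 0.3
QUERY_BINS = {"MedInc": 3, "AveBedrms": 0, "Population": 0, "Latitude": 1}


def outcome(bins):
    """Predicted outcome for a full assignment of bins."""
    return INTERCEPT + sum(CONTRIB[f][b] for f, b in bins.items())


def distance(bins):
    """Stand-in proximity: total number of bin steps away from the query."""
    return sum(abs(b - QUERY_BINS[f]) for f, b in bins.items())


def counterfactual(y_min, max_changes):
    """Closest assignment with outcome >= y_min that changes <= max_changes features.

    Exhaustive search; returns None when no assignment satisfies the constraints.
    """
    best_key, best = None, None
    for k in range(1, max_changes + 1):
        for feats in combinations(CONTRIB, k):
            # all bins other than the query's current bin, for each chosen feature
            options = [[b for b in range(len(CONTRIB[f])) if b != QUERY_BINS[f]]
                       for f in feats]
            for choice in product(*options):
                cand = {**QUERY_BINS, **dict(zip(feats, choice))}
                if outcome(cand) < y_min:
                    continue  # hard constraint "minimum outcome" violated
                key = (distance(cand), -outcome(cand))  # closest, then highest
                if best_key is None or key < best_key:
                    best_key, best = key, cand
    return best


print(outcome(QUERY_BINS))
print(counterfactual(3.55, max_changes=3))
```

With this toy model the query scores about 3.45; `counterfactual(3.55, max_changes=3)` moves only Latitude by one bin, whereas `counterfactual(3.66, max_changes=1)` returns None because no single change raises the outcome enough.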
[10]:
cf.information()
optbinning (Version 0.19.0)
Copyright (c) 2019-2024 Guillermo Navas-Palencia, Apache License 2.0
Status : OPTIMAL
Solver statistics
Type mip
Number of variables 42
Number of constraints 120
Objective value 7.7965
Best objective bound 7.7965
Objectives
proximity 0.9862
closeness 6.8103
Timing
Total time 0.24 sec
Fit 0.05 sec ( 19.87%)
Solver 0.18 sec ( 75.89%)
Post-processing 0.01 sec ( 5.59%)
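The reported objective value equals the sum of the two objective terms (0.9862 + 6.8103 = 7.7965), which is consistent with a weighted sum using unit weights; that weighting is an assumption here, not something the output states. The best objective bound equals the objective value, so the relative gap is zero, matching the OPTIMAL status. A small check of both observations:

```python
# Values printed by cf.information() above.
proximity, closeness = 0.9862, 6.8103
objective, best_bound = 7.7965, 7.7965

# Assumption: the objective is a weighted sum of the two terms with unit weights.
w_proximity, w_closeness = 1.0, 1.0
combined = w_proximity * proximity + w_closeness * closeness

# Relative gap between the incumbent and the best bound (0 means proven optimal).
gap = abs(objective - best_bound) / abs(objective)
print(f"combined objective = {combined:.4f}, gap = {gap:.2%}")
```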
The generated counterfactual suggests increasing the average number of bedrooms, increasing the population of the block group and moving the block group south (a lower latitude). None of these changes seems feasible in practice.
[11]:
cf.display(show_only_changes=True, show_outcome=True)
[11]:
| | MedInc | HouseAge | AveRooms | AveBedrms | Population | AveOccup | Latitude | Longitude | outcome |
|---|---|---|---|---|---|---|---|---|---|
| 0 | - | - | - | [1.05, 1.07) | [986.50, 1426.50) | - | [34.10, 34.18) | - | 4.522533 |
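Each changed cell is a bin printed as a half-open interval [lo, hi), and `-` means the feature keeps its value. A small sketch, using the query values shown earlier, that turns each proposed bin into a direction of change; the `direction` helper is hypothetical and not part of optbinning.

```python
# Query values (first sample) for the features changed by the counterfactual above.
query_values = {"AveBedrms": 1.02381, "Population": 322.0, "Latitude": 37.88}

# Proposed bins, read as half-open intervals [lo, hi).
proposed_bins = {
    "AveBedrms": (1.05, 1.07),
    "Population": (986.50, 1426.50),
    "Latitude": (34.10, 34.18),
}


def direction(value, interval):
    """Hypothetical helper: how value must move to fall inside [lo, hi)."""
    lo, hi = interval
    if value < lo:
        return "increase"
    if value >= hi:
        return "decrease"
    return "unchanged"


for feature, interval in proposed_bins.items():
    print(f"{feature}: {direction(query_values[feature], interval)}")
```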
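Now, let’s generate three counterfactuals that keep the predicted house value at or below 4.0 (hard constraint "max_outcome"), each changing at most three features, and ask for diversity in the features they change ("diversity_features"). Finding several diverse counterfactuals is a harder problem, so we set a time limit of 30 seconds; the solver stops at the limit and returns its best solution, reported with status FEASIBLE instead of OPTIMAL.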
[12]:
cf.generate(query=query, y=4.0, outcome_type="continuous", n_cf=3,
            max_changes=3,
            hard_constraints=["diversity_features", "max_outcome"],
            time_limit=30
            ).display(show_only_changes=True, show_outcome=True)
2024-01-15 00:14:40,752 | INFO : Counterfactual generation started.
2024-01-15 00:14:40,754 | INFO : Options: check parameters.
2024-01-15 00:14:40,761 | INFO : Options: check objectives and constraints.
2024-01-15 00:14:40,765 | INFO : Optimizer started.
2024-01-15 00:14:40,767 | INFO : Optimizer: build model...
2024-01-15 00:14:40,890 | INFO : Optimizer: solve...
2024-01-15 00:15:11,023 | INFO : Optimizer terminated. Time: 30.2561s
2024-01-15 00:15:11,024 | INFO : Post-processing started.
2024-01-15 00:15:11,054 | INFO : Post-processing terminated. Time: 0.0284s
2024-01-15 00:15:11,057 | INFO : Counterfactual generation terminated. Status: FEASIBLE. Time: 30.3048s
[12]:
| | MedInc | HouseAge | AveRooms | AveBedrms | Population | AveOccup | Latitude | Longitude | outcome |
|---|---|---|---|---|---|---|---|---|---|
| 0 | [5.79, 6.82) | - | - | - | [986.50, 1426.50) | [2.90, 3.01) | - | - | 3.233206 |
| 0 | [5.79, 6.82) | - | - | [1.07, 1.10) | [986.50, 1426.50) | - | - | - | 3.350462 |
| 0 | - | - | - | - | [986.50, 1426.50) | [3.11, 3.24) | - | [-118.91, inf) | 3.948499 |
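A sketch that re-checks the three solutions against the requested constraints, with the table transcribed by hand: every outcome is at most 4.0, each counterfactual changes at most three features, and no two change the same set of features. The last check encodes only one plausible reading of "diversity_features"; optbinning's exact formulation may differ. Note also that all three solutions reuse the same Population bin and two share the same MedInc bin.

```python
# Changed features (with proposed bins) and outcomes of the three counterfactuals above.
counterfactuals = [
    ({"MedInc": "[5.79, 6.82)", "Population": "[986.50, 1426.50)",
      "AveOccup": "[2.90, 3.01)"}, 3.233206),
    ({"MedInc": "[5.79, 6.82)", "AveBedrms": "[1.07, 1.10)",
      "Population": "[986.50, 1426.50)"}, 3.350462),
    ({"Population": "[986.50, 1426.50)", "AveOccup": "[3.11, 3.24)",
      "Longitude": "[-118.91, inf)"}, 3.948499),
]


def check_constraints(cfs, y_max, max_changes):
    """Check the displayed solutions against the requested constraints."""
    feature_sets = [frozenset(changes) for changes, _ in cfs]
    return {
        "max_outcome": all(outcome <= y_max for _, outcome in cfs),
        "max_changes": all(len(s) <= max_changes for s in feature_sets),
        # one possible reading of "diversity_features": no two solutions
        # change exactly the same set of features
        "distinct_feature_sets": len(set(feature_sets)) == len(feature_sets),
    }


print(check_constraints(counterfactuals, y_max=4.0, max_changes=3))
```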
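Finally, we lower the target to a house value of at most 3.0 and, besides feature diversity, also enforce diversity on the feature values ("diversity_values"). Again, the 30-second time limit is reached and the status is FEASIBLE.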
[13]:
cf.generate(query=query, y=3.0, outcome_type="continuous", n_cf=3,
            max_changes=3,
            hard_constraints=["diversity_features", "diversity_values", "max_outcome"],
            time_limit=30
            ).display(show_only_changes=True, show_outcome=True)
2024-01-15 00:15:11,109 | INFO : Counterfactual generation started.
2024-01-15 00:15:11,114 | INFO : Options: check parameters.
2024-01-15 00:15:11,125 | INFO : Options: check objectives and constraints.
2024-01-15 00:15:11,127 | INFO : Optimizer started.
2024-01-15 00:15:11,129 | INFO : Optimizer: build model...
2024-01-15 00:15:11,294 | INFO : Optimizer: solve...
2024-01-15 00:15:41,694 | INFO : Optimizer terminated. Time: 30.5651s
2024-01-15 00:15:41,696 | INFO : Post-processing started.
2024-01-15 00:15:41,716 | INFO : Post-processing terminated. Time: 0.0192s
2024-01-15 00:15:41,717 | INFO : Counterfactual generation terminated. Status: FEASIBLE. Time: 30.6084s
[13]:
| | MedInc | HouseAge | AveRooms | AveBedrms | Population | AveOccup | Latitude | Longitude | outcome |
|---|---|---|---|---|---|---|---|---|---|
| 0 | [4.53, 5.04) | - | - | - | [1426.50, 1911.50) | - | - | [-118.91, inf) | 2.481344 |
| 0 | [5.04, 5.79) | - | - | [1.07, 1.10) | [1911.50, 2720.50) | - | - | - | 2.882739 |
| 0 | [5.79, 6.82) | - | - | - | [986.50, 1426.50) | [3.52, 3.82) | - | - | 2.918294 |
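Comparing this run with the previous one (which targeted 4.0 rather than 3.0) shows what value diversity adds. A short sketch, with both tables transcribed by hand, that lists the features for which several counterfactuals propose the same bin: without "diversity_values" MedInc and Population repeat, with it no bin is repeated. Reading "diversity_values" as "no shared bin per feature" is an inference from these outputs, not a statement of optbinning's formulation.

```python
from collections import Counter

# Proposed bins of the counterfactuals for y=4.0 (feature diversity only)
# and for y=3.0 (feature and value diversity), transcribed from the tables above.
features_only = [
    {"MedInc": "[5.79, 6.82)", "Population": "[986.50, 1426.50)", "AveOccup": "[2.90, 3.01)"},
    {"MedInc": "[5.79, 6.82)", "AveBedrms": "[1.07, 1.10)", "Population": "[986.50, 1426.50)"},
    {"Population": "[986.50, 1426.50)", "AveOccup": "[3.11, 3.24)", "Longitude": "[-118.91, inf)"},
]
features_and_values = [
    {"MedInc": "[4.53, 5.04)", "Population": "[1426.50, 1911.50)", "Longitude": "[-118.91, inf)"},
    {"MedInc": "[5.04, 5.79)", "AveBedrms": "[1.07, 1.10)", "Population": "[1911.50, 2720.50)"},
    {"MedInc": "[5.79, 6.82)", "Population": "[986.50, 1426.50)", "AveOccup": "[3.52, 3.82)"},
]


def shared_bins(solutions):
    """Features for which two or more solutions propose the same bin."""
    counts = Counter((f, b) for changes in solutions for f, b in changes.items())
    return sorted({f for (f, _), n in counts.items() if n > 1})


print(shared_bins(features_only))        # ['MedInc', 'Population']
print(shared_bins(features_and_values))  # []
```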