Got error running shap_elimination.fit_compute() #268

Open
jeong2624 opened this issue Aug 14, 2024 · 3 comments
Labels
bug Something isn't working

Comments

@jeong2624

I ran shap_elimination.fit_compute(X, y), but it raises the following error at line 493 of feature_elimination.py:
ValueError: all the input array dimensions except for the concatenation axis must match exactly.

I pass X as a pandas DataFrame and y as a pandas Series, yet the call still fails. What should I do?

Environment (please complete the following information):

  • probatus version : 3.1.2
  • python version : 3.9.0
  • OS: macOS

To Reproduce


import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import warnings
warnings.filterwarnings('ignore')

feature = pd.read_csv("Feature.csv", sep = "\t")
feature.index = feature.loc[:, "Unnamed: 0"].tolist()
feature = feature.drop(columns = ["Unnamed: 0"]).transpose()
feature.shape

target = pd.read_csv("target.csv", sep = "\t")
target.index = target.loc[:, "geo_accession"].tolist()
target = target.drop(columns = ["Unnamed: 0", "geo_accession"])
target.shape

from sklearn.model_selection import train_test_split, RepeatedStratifiedKFold

X_train, X_test, y_train, y_test = train_test_split(feature, target, test_size = 0.3,
                                                    stratify = target, random_state = 42)

train_median = np.median(X_train, axis = 0)
train_std = np.std(X_train, axis = 0)

X_train_scale = (X_train - train_median) / train_std
X_test_scale = (X_test - train_median) / train_std

X_train = pd.DataFrame(X_train_scale, index = X_train.index, columns = X_train.columns)
X_test = pd.DataFrame(X_test_scale, index = X_test.index, columns = X_test.columns)

from sklearn.preprocessing import LabelEncoder

encoder = LabelEncoder()
encoder.fit(y_train)
y_train_encoder = encoder.transform(y_train)
y_test_encoder = encoder.transform(y_test)

y_train = pd.Series(y_train_encoder, index = y_train.index)
y_test = pd.Series(y_test_encoder, index = y_test.index)

from sklearn.svm import SVC
from probatus.feature_elimination import ShapRFECV
plt.rcdefaults()
model = SVC(random_state = 42, probability = True, kernel = "linear")
cv = RepeatedStratifiedKFold(n_splits = 10, n_repeats = 10, random_state = 123)

shap_elimination = ShapRFECV(model, step = 0.1, cv = cv,
                             scoring = 'accuracy', n_jobs = -1, random_state = 42)
report = shap_elimination.fit_compute(X_train, y_train, check_additivity = False)
performance_plot = shap_elimination.plot()

Error traceback

ValueError Traceback (most recent call last)
Cell In[10], line 11
7 cv = RepeatedStratifiedKFold(n_splits = 10, n_repeats = 10, random_state = 123)
9 shap_elimination = ShapRFECV(model, step = 0.1, cv = cv,
10 scoring = 'accuracy', n_jobs = -1, random_state = 42)
---> 11 report = shap_elimination.fit_compute(X_train, y_train, check_additivity = False)
12 performance_plot = shap_elimination.plot()

File ~/miniforge3/envs/Project/lib/python3.9/site-packages/probatus/feature_elimination/feature_elimination.py:311, in ShapRFECV.fit_compute(self, X, y, sample_weight, columns_to_keep, column_names, shap_variance_penalty_factor, **shap_kwargs)
248 def fit_compute(
249 self,
250 X,
(...)
256 **shap_kwargs,
257 ):
258 """
259 Fits the object with the provided data.
260
(...)
308 DataFrame containing results of feature elimination from each iteration.
309 """
--> 311 self.fit(
312 X,
313 y,
314 sample_weight=sample_weight,
315 columns_to_keep=columns_to_keep,
316 column_names=column_names,
317 shap_variance_penalty_factor=shap_variance_penalty_factor,
318 **shap_kwargs,
319 )
320 return self.compute()

File ~/miniforge3/envs/Project/lib/python3.9/site-packages/probatus/feature_elimination/feature_elimination.py:493, in ShapRFECV.fit(self, X, y, sample_weight, columns_to_keep, column_names, groups, shap_variance_penalty_factor, **shap_kwargs)
491 shap_values = np.concatenate([current_result[0] for current_result in results_per_fold], axis=0)
492 else: # multi-class case
--> 493 shap_values = np.concatenate([current_result[0] for current_result in results_per_fold], axis=1)
495 scores_train = [current_result[1] for current_result in results_per_fold]
496 scores_val = [current_result[2] for current_result in results_per_fold]

File <array_function internals>:180, in concatenate(*args, **kwargs)

ValueError: all the input array dimensions for the concatenation axis must match exactly, but along dimension 0, the array at index 0 has size 194 and the array at index 9 has size 193
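
The shape mismatch in the traceback can be reproduced with NumPy alone. The sketch below is an illustration under assumptions, not a confirmed diagnosis: it supposes that the per-fold SHAP arrays reach `np.concatenate(..., axis=1)` with the sample dimension on axis 0 instead of axis 1. The fold sizes 194 and 193 come from the error message; the feature and class counts, and both array layouts, are hypothetical.

```python
import numpy as np

# Hypothetical sizes; only the fold sizes (194, 193) come from the traceback.
n_features, n_classes = 5, 3
fold_sizes = [194, 193]

# Assumed layout A: each fold is (n_classes, n_samples, n_features).
# Samples sit on axis 1, so concatenating along axis=1 stacks the folds.
layout_a = [np.zeros((n_classes, n, n_features)) for n in fold_sizes]
stacked = np.concatenate(layout_a, axis=1)
print(stacked.shape)

# Assumed layout B: each fold is (n_samples, n_features, n_classes).
# Now axis 0 holds the samples, so axis=1 requires equal fold sizes and fails.
layout_b = [np.zeros((n, n_features, n_classes)) for n in fold_sizes]
try:
    np.concatenate(layout_b, axis=1)
    error_message = None
except ValueError as exc:
    error_message = str(exc)
print(error_message)

# With layout B, the folds would have to be stacked along axis 0 instead.
fixed = np.concatenate(layout_b, axis=0)
print(fixed.shape)
```

If the second layout is what the installed shap version returns for this model, unequal validation fold sizes would be enough to trigger the error above.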

jeong2624 added the bug label on Aug 14, 2024

ReinierKoops commented Aug 14, 2024

Hi @jeong2624, my initial thought is that Probatus is (for now) only meant to work with tree-based and linear models, so supporting this case might require an adaptation of Probatus. I'll take a closer look tomorrow. Thanks for your report and interest!


jeong2624 commented Aug 15, 2024

I solved this problem by downgrading shap and probatus. The versions that work for me are:

  • shap version : 0.43.0
  • probatus version : 3.1.0
  • numpy version : 1.26.1 (additional information)

The error is likely related to the latest version of shap.
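
To check which versions are active in a given environment before or after downgrading, a small script like the following may help. It is a minimal sketch that relies only on the standard-library `importlib.metadata`; the helper name `installed_versions` is made up for illustration.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages):
    """Map each package name to its installed version, or None if absent."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


# Packages mentioned in this thread.
for name, ver in installed_versions(["shap", "probatus", "numpy"]).items():
    print(f"{name}: {ver or 'not installed'}")
```

The versions reported as working above can then be pinned with `pip install shap==0.43.0 probatus==3.1.0`.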


ReinierKoops commented Aug 15, 2024

Thank you, I will investigate how to patch this.
