Description
When using the CUDA version of LightGBM, calling Booster(...).predict(...) with pred_contrib=True on a sparse matrix (scipy.sparse.csr_matrix) with int64 indices causes a segmentation fault instead of a proper error message.
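Until this is resolved, a possible workaround is to downcast the CSR index arrays to int32 before calling predict. The sketch below assumes the crash is tied to the index dtype, as the reproducer suggests; `downcast_csr_indices` is a hypothetical helper, not part of LightGBM, and it has not been verified against a CUDA build.

```python
import numpy as np
from scipy import sparse


def downcast_csr_indices(X):
    """Return a CSR copy of X whose indptr/indices are int32.

    Hypothetical helper (not a LightGBM API). Raises ValueError if the
    index values cannot be represented as int32.
    """
    X = sparse.csr_matrix(X, copy=True)  # work on a copy, leave the input alone
    int32_max = np.iinfo(np.int32).max
    # indptr holds offsets up to nnz; indices holds column ids below n_cols.
    if X.nnz > int32_max or X.shape[1] > int32_max:
        raise ValueError("matrix is too large for int32 index arrays")
    X.indptr = X.indptr.astype(np.int32, copy=False)
    X.indices = X.indices.astype(np.int32, copy=False)
    return X


# Example: a small CSR matrix whose indptr was widened to int64.
X_wide = sparse.csr_matrix(np.array([[0.0, 1.0], [2.0, 0.0], [0.0, 3.0]]))
X_wide.indptr = X_wide.indptr.astype(np.int64)
X_narrow = downcast_csr_indices(X_wide)
```

With a helper like this, the failing call would become `booster.predict(downcast_csr_indices(X_test), pred_contrib=True)`.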
Reproducible example
import numpy as np
import lightgbm as lgb
from sklearn.datasets import make_multilabel_classification
from sklearn.model_selection import train_test_split
X, y = make_multilabel_classification(n_samples=100, sparse=True, n_features=5, n_classes=1, n_labels=2)
y = y.flatten()
X_train, X_test, y_train, _ = train_test_split(X, y, test_size=0.1, random_state=42)
X_test.indptr = X_test.indptr.astype(np.int64)
train_data = lgb.Dataset(X_train, label=y_train)
params = {
    "objective": "binary",
    "num_leaves": 7,
    "min_data_in_bin": 1,
    "min_data_in_leaf": 1,
    "seed": 708,
    "verbose": -1,
}
booster = lgb.train(params, train_set=train_data, num_boost_round=5)
preds = booster.predict(X_test, pred_contrib=True)
Environment info
LightGBM version or commit hash: v4.6.0
Command(s) you used to install LightGBM
pip install lightgbm==4.6.0 \
--no-binary lightgbm \
--config-settings=cmake.define.USE_CUDA=ON
pip install scikit-learn==1.7.2
Expected behavior
One of the following:
1. Successful execution with int64 sparse matrix indices
2. A clear exception/message indicating that int64 sparse matrix indices are not supported in CUDA builds
Tests should also be updated accordingly. For option 1, the test_predict_contrib_int64 test added in #7071 should no longer be skipped during CUDA jobs. For option 2, a test should check that the clear message/exception is raised.
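For the second option, the check and its test could look roughly like the sketch below. The function name `reject_int64_csr_indices`, the choice of `TypeError`, and the message text are all assumptions for illustration; none of them are existing LightGBM behavior, and the real check would presumably only apply to CUDA builds.

```python
import numpy as np
from scipy import sparse


def reject_int64_csr_indices(X):
    """Hypothetical guard: fail with a clear error instead of crashing."""
    for name in ("indptr", "indices"):
        if getattr(X, name).dtype == np.int64:
            raise TypeError(
                f"CSR {name} has dtype int64, which is not supported for "
                "pred_contrib=True in CUDA builds; cast it to int32 first."
            )


def test_int64_indptr_is_rejected():
    # Same trick as the reproducer: widen indptr after construction.
    X = sparse.csr_matrix(np.eye(3))
    X.indptr = X.indptr.astype(np.int64)
    try:
        reject_int64_csr_indices(X)
    except TypeError as err:
        assert "int64" in str(err)
    else:
        raise AssertionError("expected TypeError for int64 indptr")


test_int64_indptr_is_rejected()
```

In the actual test suite this would more likely be written with `pytest.raises` around `booster.predict(X_test, pred_contrib=True)`.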
Additional Comments
Discovered while adding a unit test for .predict() in #7071.
jameslamb