
XGBoost predict_proba inference is slow

Updated: 2023-12-02 14:38:58

"Why is xgboost so slow?": XGBClassifier() is the scikit-learn API for XGBoost (see e.g. https://xgboost.readthedocs.io/en/latest/python/python_api.html#xgboost.XGBClassifier for more details). If you call the function directly (not through an API) it will be faster. To compare the performance of the two functions it makes sense to call each function directly, instead of calling one function directly and one function through an API. Here is an example:

# benchmark_xgboost_vs_sklearn.py
# Adapted from `xgboost_test.py` by Jacob Schreiber 
# (https://gist.github.com/jmschrei/6b447aada61d631544cd)

"""
Benchmarking scripts for XGBoost versus sklearn (time and accuracy)
"""

import time
import random
import numpy as np
import xgboost as xgb
from sklearn.ensemble import GradientBoostingClassifier

random.seed(0)
np.random.seed(0)

def make_dataset(n=500, d=10, c=2, z=2):
    """
    Make a dataset with n samples per class, d dimensions and c classes,
    with the class means separated by z in each dimension, making each
    feature equally informative.
    """

    # Generate our data and our labels
    X = np.concatenate([np.random.randn(n, d) + z*i for i in range(c)])
    y = np.concatenate([np.ones(n) * i for i in range(c)])

    # Generate a random indexing
    idx = np.arange(n*c)
    np.random.shuffle(idx)

    # Randomize the dataset, preserving data-label pairing
    X = X[idx]
    y = y[idx]

    # Return x_train, x_test, y_train, y_test
    return X[::2], X[1::2], y[::2], y[1::2]

def main():
    """
    Run sklearn's GradientBoostingClassifier, then XGBoost through the
    sklearn XGBClassifier wrapper, then XGBoost's native xgb.train interface.
    """

    # Generate the dataset (10 samples per class, well-separated classes)
    X_train, X_test, y_train, y_test = make_dataset(10, z=100)
    n_estimators = 5
    max_depth = 5
    learning_rate = 0.17

    # sklearn first
    tic = time.time()
    clf = GradientBoostingClassifier(n_estimators=n_estimators,
        max_depth=max_depth, learning_rate=learning_rate)
    clf.fit(X_train, y_train)
    print("SKLearn GBClassifier: {}s".format(time.time() - tic))
    print("Acc: {}".format(clf.score(X_test, y_test)))
    print(y_test.sum())          # debug: number of positive test labels
    print(clf.predict(X_test))   # debug: predicted labels

    # Convert the data to DMatrix for xgboost
    dtrain = xgb.DMatrix(X_train, label=y_train)
    dtest  = xgb.DMatrix(X_test, label=y_test)
    # Loop over multiple thread counts for xgboost
    for threads in (1, 2, 4):
        print("{} threads:".format(threads))

        # XGBoost through the sklearn wrapper
        tic = time.time()
        clf = xgb.XGBClassifier(n_estimators=n_estimators, max_depth=max_depth,
            learning_rate=learning_rate, n_jobs=threads)
        clf.fit(X_train, y_train)
        print("SKLearn XGBoost API Time: {}s".format(time.time() - tic))
        preds = clf.predict(X_test)
        acc = 1. - (np.abs(preds - y_test).sum() / y_test.shape[0])
        print("Acc: {}".format(acc))

        # XGBoost's native interface, on the pre-built DMatrix
        tic = time.time()
        param = {
            'max_depth': max_depth,
            'eta': learning_rate,       # match the wrapper's learning rate
            'verbosity': 0,             # 'silent' was removed in recent XGBoost
            'objective': 'binary:logistic',
            'nthread': threads,
        }
        bst = xgb.train(param, dtrain, n_estimators,
            evals=[(dtest, 'eval'), (dtrain, 'train')])
        print("XGBoost (no wrapper) Time: {}s".format(time.time() - tic))
        preds = np.round(bst.predict(dtest))
        acc = 1. - (np.abs(preds - y_test).sum() / y_test.shape[0])
        print("Acc: {}".format(acc))

if __name__ == '__main__':
    main()

Summary of results:

sklearn.ensemble.GradientBoostingClassifier()

  • Time: 0.003237009048461914s
  • Accuracy: 1.0

sklearn xgboost API wrapper XGBClassifier()

  • Time: 0.3436141014099121s
  • Accuracy: 1.0

XGBoost (no wrapper) xgb.train()

  • Time: 0.0028612613677978516s
  • Accuracy: 1.0