Different results after converting to treelite shared library #411

Closed
winstonzhao opened this issue Oct 29, 2022 · 1 comment

winstonzhao commented Oct 29, 2022

My XGBoost model is giving different results after converting to a treelite model.

Minimal example:

import numpy as np
import pandas as pd
import pickle
import treelite
import treelite_runtime
import xgboost as xgb

# Load the sample data and the trained booster, then predict with XGBoost directly
sample = pd.read_parquet("sample.parquet")
features = list(sample.columns)[1:-3]
xgb1 = xgb.Booster(model_file="model_1s_1week.booster")
print(xgb1.predict(xgb.DMatrix(sample[features])))

# Convert to a treelite model, compile it to a shared library, and predict again
model = treelite.Model.from_xgboost(xgb1)
model.export_lib(toolchain='gcc', libpath='./model_1s_1week.so', params={'parallel_comp': 32}, verbose=False)
predictor = treelite_runtime.Predictor('./model_1s_1week.so', verbose=False)
print(predictor.predict(treelite_runtime.DMatrix(sample[features], dtype='float32')))
>>>
[-0.00013971  0.00022221 -0.00022852 -0.00027931 -0.00024623 -0.00030208
 -0.00035697 -0.00042081 -0.00032556]
[21:50:54] ../src/compiler/ast/split.cc:29: Parallel compilation enabled; member trees will be divided into 32 translation units.
[-0.01349026 -0.01085109 -0.00022864 -0.00027949 -0.00024635 -0.0003022
  0.00029168 -0.00042087 -0.0003258 ]

Referenced files model_1s_1week.booster and sample.parquet have been attached.
files.zip

I feel like I'm probably doing something extremely basic incorrectly.
Model was trained with the following:

xgb1 = xgb.XGBRegressor(learning_rate=0.1,
                        n_estimators=1000,
                        max_depth=3,
                        min_child_weight=1,
                        gamma=0.1,
                        subsample=0.75,
                        colsample_bytree=0.8,
                        reg_alpha=0.01,
                        reg_lambda=1.0,
                        predictor='gpu_predictor',
                        # tree_method="exact",
                        grow_policy="lossguide",
                        tree_method="hist",
                        objective='reg:squarederror',
                        feature_selector="greedy",
                        top_k=4,
                        seed=27)

xgb1.fit(train_df[features], train_df[target], eval_set=[(forward_df[features], forward_df[target])], early_stopping_rounds=50)

Versions:

xgboost = 0.90
treelite = 3.0.0

winstonzhao commented Oct 29, 2022

Duplicate of dmlc/tl2cgen#11

This can be fixed by changing:
print(predictor.predict(treelite_runtime.DMatrix(sample[features], dtype='float32')))
to
print(predictor.predict(treelite_runtime.DMatrix(sample[features].to_numpy(), dtype='float32')))
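For anyone hitting the same problem, the fix can be verified by comparing the two outputs directly. A minimal sketch, assuming the objects from the reproduction script above are still in scope (xgb_pred and tl_pred are just illustrative local names):

xgb_pred = xgb1.predict(xgb.DMatrix(sample[features]))
# Passing a NumPy array instead of a pandas DataFrame lets treelite read the feature values correctly
tl_pred = predictor.predict(treelite_runtime.DMatrix(sample[features].to_numpy(), dtype='float32'))
# The two outputs should now agree up to float32 rounding
print(np.allclose(xgb_pred, tl_pred, atol=1e-6))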
