Index out of bounds using lance extension in duckdb v0.7.0 #826

Open
jalateras opened this issue May 5, 2023 · 3 comments
Labels: bug, duckdb, good first issue, python, rust

Comments

@jalateras

I built the Lance DuckDB extension from commit 2972ae209fd159b6ff15266d0a457f144029aa60, and I can load it in DuckDB v0.7.0:

RUST_BACKTRACE=1 duckdb --unsigned
v0.7.0 f7827396d7
Enter ".help" for usage hints.
D load lance;

I then downloaded the vec_data.lance dataset from s3://eto-public/datasets/sift/vec_data.lance/, but when I execute the following SELECT:

D select count(*) from lance_scan('vec_data.lance');

I get this panic:

thread '<unnamed>' panicked at 'index out of bounds: the len is 2 but the index is 18446744073709551615', src/scan.rs:110:24
stack backtrace:
   0: rust_begin_unwind
             at /rustc/90743e7298aca107ddaa0c202a4d3604e29bfeb6/library/std/src/panicking.rs:575:5
   1: core::panicking::panic_fmt
             at /rustc/90743e7298aca107ddaa0c202a4d3604e29bfeb6/library/core/src/panicking.rs:65:14
   2: core::panicking::panic_bounds_check
             at /rustc/90743e7298aca107ddaa0c202a4d3604e29bfeb6/library/core/src/panicking.rs:151:5
   3: <alloc::vec::Vec<T> as alloc::vec::spec_from_iter::SpecFromIter<T,I>>::from_iter
   4: _read_lance_init
   5: __ZN6duckdb18CTableFunctionInitERNS_13ClientContextERNS_22TableFunctionInitInputE
   6: __ZN6duckdb26TableScanGlobalSourceStateC2ERNS_13ClientContextERKNS_17PhysicalTableScanE
   7: __ZNK6duckdb17PhysicalTableScan20GetGlobalSourceStateERNS_13ClientContextE
   8: __ZN6duckdb8Executor16SchedulePipelineERKNSt3__110shared_ptrINS_12MetaPipelineEEERNS_17ScheduleEventDataE
   9: __ZN6duckdb8Executor22ScheduleEventsInternalERNS_17ScheduleEventDataE
  10: __ZN6duckdb8Executor14ScheduleEventsERKNSt3__16vectorINS1_10shared_ptrINS_12MetaPipelineEEENS1_9allocatorIS5_EEEE
  11: __ZN6duckdb8Executor18InitializeInternalEPNS_16PhysicalOperatorE
  12: 
  13: __ZN6duckdb13ClientContext35PendingStatementOrPreparedStatementERNS_17ClientContextLockERKNSt3__112basic_stringIcNS3_11char_traitsIcEENS3_9allocatorIcEEEENS3_10unique_ptrINS_12SQLStatementENS3_14default_deleteISD_EEEERNS3_10shared_ptrINS_21PreparedStatementDataEEENS_22PendingQueryParametersE
  14: __ZN6duckdb13ClientContext43PendingStatementOrPreparedStatementInternalERNS_17ClientContextLockERKNSt3__112basic_stringIcNS3_11char_traitsIcEENS3_9allocatorIcEEEENS3_10unique_ptrINS_12SQLStatementENS3_14default_deleteISD_EEEERNS3_10shared_ptrINS_21PreparedStatementDataEEENS_22PendingQueryParametersE
  15: __ZN6duckdb13ClientContext28PendingQueryPreparedInternalERNS_17ClientContextLockERKNSt3__112basic_stringIcNS3_11char_traitsIcEENS3_9allocatorIcEEEERNS3_10shared_ptrINS_21PreparedStatementDataEEENS_22PendingQueryParametersE
  16: __ZN6duckdb13ClientContext12PendingQueryERKNSt3__112basic_stringIcNS1_11char_traitsIcEENS1_9allocatorIcEEEERNS1_10shared_ptrINS_21PreparedStatementDataEEENS_22PendingQueryParametersE
  17: __ZN6duckdb17PreparedStatement12PendingQueryERNSt3__16vectorINS_5ValueENS1_9allocatorIS3_EEEEb
  18: __ZN6duckdb17PreparedStatement7ExecuteERNSt3__16vectorINS_5ValueENS1_9allocatorIS3_EEEEb
  19: _duckdb_shell_sqlite3_print_duckbox
  20: _exec_prepared_stmt
  21: _shell_exec
  22: _runOneSqlLine
  23: _process_input
  24: _main
note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.
fatal runtime error: failed to initiate panic, error 5
Abort trap: 6
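The out-of-bounds index in the panic, 18446744073709551615, is not an arbitrary value: it is 2^64 - 1 (u64::MAX in Rust), which is also the bit pattern of -1 stored in an unsigned 64-bit integer. That suggests a sentinel value being used as a column index rather than corrupt data in the dataset. A quick Python check of the arithmetic:

```python
import ctypes

PANIC_INDEX = 18446744073709551615  # index reported in the panic message

# The index is 2**64 - 1, the largest value a u64 can hold (u64::MAX in Rust).
assert PANIC_INDEX == 2**64 - 1

# It is also exactly what -1 becomes when reinterpreted as an unsigned 64-bit
# integer, i.e. the typical shape of a "-1 means special" sentinel.
assert ctypes.c_uint64(-1).value == PANIC_INDEX
assert (-1) & 0xFFFF_FFFF_FFFF_FFFF == PANIC_INDEX
```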
@jalateras (author)

I tried the same thing with a smaller dataset created with pandas:

import pandas as pd
import lance

df = pd.DataFrame([['Ajitesh', 84, 183, 'no'],
                   ['Shailesh', 79, 186, 'yes'],
                   ['Seema', 67, 158, 'yes'],
                   ['Nidhi', 52, 155, 'no']])
df.columns = ['name', 'weight', 'height', 'smoker']
lance.write_dataset(df, '/tmp/small.lance')

When I then run the following:

duckdb --unsigned
v0.7.0 f7827396d7
Enter ".help" for usage hints.
D load lance;
D select count(*) from lance_scan('small.lance');

I get the following panic:

thread '<unnamed>' panicked at 'index out of bounds: the len is 4 but the index is 18446744073709551615', src/scan.rs:110:24
stack backtrace:
   0: rust_begin_unwind
             at /rustc/90743e7298aca107ddaa0c202a4d3604e29bfeb6/library/std/src/panicking.rs:575:5
   1: core::panicking::panic_fmt
             at /rustc/90743e7298aca107ddaa0c202a4d3604e29bfeb6/library/core/src/panicking.rs:65:14
   2: core::panicking::panic_bounds_check
             at /rustc/90743e7298aca107ddaa0c202a4d3604e29bfeb6/library/core/src/panicking.rs:151:5
   3: <alloc::vec::Vec<T> as alloc::vec::spec_from_iter::SpecFromIter<T,I>>::from_iter
   4: _read_lance_init
   5: __ZN6duckdb18CTableFunctionInitERNS_13ClientContextERNS_22TableFunctionInitInputE
   6: __ZN6duckdb26TableScanGlobalSourceStateC2ERNS_13ClientContextERKNS_17PhysicalTableScanE
   7: __ZNK6duckdb17PhysicalTableScan20GetGlobalSourceStateERNS_13ClientContextE
   8: __ZN6duckdb8Executor16SchedulePipelineERKNSt3__110shared_ptrINS_12MetaPipelineEEERNS_17ScheduleEventDataE
   9: __ZN6duckdb8Executor22ScheduleEventsInternalERNS_17ScheduleEventDataE
  10: __ZN6duckdb8Executor14ScheduleEventsERKNSt3__16vectorINS1_10shared_ptrINS_12MetaPipelineEEENS1_9allocatorIS5_EEEE
  11: __ZN6duckdb8Executor18InitializeInternalEPNS_16PhysicalOperatorE
  12: __ZN6duckdb13ClientContext24PendingPreparedStatementERNS_17ClientContextLockENSt3__110shared_ptrINS_21PreparedStatementDataEEENS_22PendingQueryParametersE
  13: __ZN6duckdb13ClientContext35PendingStatementOrPreparedStatementERNS_17ClientContextLockERKNSt3__112basic_stringIcNS3_11char_traitsIcEENS3_9allocatorIcEEEENS3_10unique_ptrINS_12SQLStatementENS3_14default_deleteISD_EEEERNS3_10shared_ptrINS_21PreparedStatementDataEEENS_22PendingQueryParametersE
  14: __ZN6duckdb13ClientContext43PendingStatementOrPreparedStatementInternalERNS_17ClientContextLockERKNSt3__112basic_stringIcNS3_11char_traitsIcEENS3_9allocatorIcEEEENS3_10unique_ptrINS_12SQLStatementENS3_14default_deleteISD_EEEERNS3_10shared_ptrINS_21PreparedStatementDataEEENS_22PendingQueryParametersE
  15: __ZN6duckdb13ClientContext28PendingQueryPreparedInternalERNS_17ClientContextLockERKNSt3__112basic_stringIcNS3_11char_traitsIcEENS3_9allocatorIcEEEERNS3_10shared_ptrINS_21PreparedStatementDataEEENS_22PendingQueryParametersE
  16: __ZN6duckdb13ClientContext12PendingQueryERKNSt3__112basic_stringIcNS1_11char_traitsIcEENS1_9allocatorIcEEEERNS1_10shared_ptrINS_21PreparedStatementDataEEENS_22PendingQueryParametersE
  17: __ZN6duckdb17PreparedStatement12PendingQueryERNSt3__16vectorINS_5ValueENS1_9allocatorIS3_EEEEb
  18: __ZN6duckdb17PreparedStatement7ExecuteERNSt3__16vectorINS_5ValueENS1_9allocatorIS3_EEEEb
  19: _duckdb_shell_sqlite3_print_duckbox
  20: _exec_prepared_stmt
  21: _shell_exec
  22: _runOneSqlLine
  23: _process_input
  24: _main
note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.
fatal runtime error: failed to initiate panic, error 5
Abort trap: 6

And here is the full backtrace:

thread '<unnamed>' panicked at 'index out of bounds: the len is 4 but the index is 18446744073709551615', src/scan.rs:110:24
stack backtrace:
   0:        0x118744912 - std::backtrace_rs::backtrace::libunwind::trace::hf6d6e64f9b264809
                               at /rustc/90743e7298aca107ddaa0c202a4d3604e29bfeb6/library/std/src/../../backtrace/src/backtrace/libunwind.rs:93:5
   1:        0x118744912 - std::backtrace_rs::backtrace::trace_unsynchronized::h83629c2e54dbbc12
                               at /rustc/90743e7298aca107ddaa0c202a4d3604e29bfeb6/library/std/src/../../backtrace/src/backtrace/mod.rs:66:5
   2:        0x118744912 - std::sys_common::backtrace::_print_fmt::h40995e5769fa5524
                               at /rustc/90743e7298aca107ddaa0c202a4d3604e29bfeb6/library/std/src/sys_common/backtrace.rs:65:5
   3:        0x118744912 - <std::sys_common::backtrace::_print::DisplayBacktrace as core::fmt::Display>::fmt::h8d94e552d95b28cc
                               at /rustc/90743e7298aca107ddaa0c202a4d3604e29bfeb6/library/std/src/sys_common/backtrace.rs:44:22
   4:        0x118767f9a - core::fmt::write::h421d4212716e9716
                               at /rustc/90743e7298aca107ddaa0c202a4d3604e29bfeb6/library/core/src/fmt/mod.rs:1209:17
   5:        0x11873e4bc - std::io::Write::write_fmt::hdc28b71c2d62dad8
                               at /rustc/90743e7298aca107ddaa0c202a4d3604e29bfeb6/library/std/src/io/mod.rs:1682:15
   6:        0x1187446da - std::sys_common::backtrace::_print::habfe2bb38db219c3
                               at /rustc/90743e7298aca107ddaa0c202a4d3604e29bfeb6/library/std/src/sys_common/backtrace.rs:47:5
   7:        0x1187446da - std::sys_common::backtrace::print::he11eab6b959c3b5b
                               at /rustc/90743e7298aca107ddaa0c202a4d3604e29bfeb6/library/std/src/sys_common/backtrace.rs:34:9
   8:        0x118746446 - std::panicking::default_hook::{{closure}}::ha68ba8cbe26bbbe3
                               at /rustc/90743e7298aca107ddaa0c202a4d3604e29bfeb6/library/std/src/panicking.rs:267:22
   9:        0x118746197 - std::panicking::default_hook::h5cf85224a4df5bc6
                               at /rustc/90743e7298aca107ddaa0c202a4d3604e29bfeb6/library/std/src/panicking.rs:286:9
  10:        0x118746b8d - std::panicking::rust_panic_with_hook::hed342721bf9addfa
                               at /rustc/90743e7298aca107ddaa0c202a4d3604e29bfeb6/library/std/src/panicking.rs:688:13
  11:        0x118746943 - std::panicking::begin_panic_handler::{{closure}}::h3d9af89e51f2fba9
                               at /rustc/90743e7298aca107ddaa0c202a4d3604e29bfeb6/library/std/src/panicking.rs:579:13
  12:        0x118744da8 - std::sys_common::backtrace::__rust_end_short_backtrace::hfb9719355016e93f
                               at /rustc/90743e7298aca107ddaa0c202a4d3604e29bfeb6/library/std/src/sys_common/backtrace.rs:137:18
  13:        0x11874660d - rust_begin_unwind
                               at /rustc/90743e7298aca107ddaa0c202a4d3604e29bfeb6/library/std/src/panicking.rs:575:5
  14:        0x1188d6103 - core::panicking::panic_fmt::h1965fc2159be50bb
                               at /rustc/90743e7298aca107ddaa0c202a4d3604e29bfeb6/library/core/src/panicking.rs:65:14
  15:        0x1188d6246 - core::panicking::panic_bounds_check::h503aa148bf97089f
                               at /rustc/90743e7298aca107ddaa0c202a4d3604e29bfeb6/library/core/src/panicking.rs:151:5
  16:        0x11688216a - <alloc::vec::Vec<T> as alloc::vec::spec_from_iter::SpecFromIter<T,I>>::from_iter::h3d52f90a316a19bb
  17:        0x1168702d0 - _read_lance_init
  18:        0x11671565e - __ZN6duckdb18CTableFunctionInitERNS_13ClientContextERNS_22TableFunctionInitInputE
  19:        0x104054aa9 - __ZN6duckdb26TableScanGlobalSourceStateC2ERNS_13ClientContextERKNS_17PhysicalTableScanE
  20:        0x104052fcf - __ZNK6duckdb17PhysicalTableScan20GetGlobalSourceStateERNS_13ClientContextE
  21:        0x104176f4f - __ZN6duckdb8Executor16SchedulePipelineERKNSt3__110shared_ptrINS_12MetaPipelineEEERNS_17ScheduleEventDataE
  22:        0x10417828b - __ZN6duckdb8Executor22ScheduleEventsInternalERNS_17ScheduleEventDataE
  23:        0x1041783f9 - __ZN6duckdb8Executor14ScheduleEventsERKNSt3__16vectorINS1_10shared_ptrINS_12MetaPipelineEEENS1_9allocatorIS5_EEEE
  24:        0x1041793ea - __ZN6duckdb8Executor18InitializeInternalEPNS_16PhysicalOperatorE
  25:        0x1040e5fe7 - __ZN6duckdb13ClientContext24PendingPreparedStatementERNS_17ClientContextLockENSt3__110shared_ptrINS_21PreparedStatementDataEEENS_22PendingQueryParametersE
  26:        0x1040eb651 - __ZN6duckdb13ClientContext35PendingStatementOrPreparedStatementERNS_17ClientContextLockERKNSt3__112basic_stringIcNS3_11char_traitsIcEENS3_9allocatorIcEEEENS3_10unique_ptrINS_12SQLStatementENS3_14default_deleteISD_EEEERNS3_10shared_ptrINS_21PreparedStatementDataEEENS_22PendingQueryParametersE
  27:        0x1040e9084 - __ZN6duckdb13ClientContext43PendingStatementOrPreparedStatementInternalERNS_17ClientContextLockERKNSt3__112basic_stringIcNS3_11char_traitsIcEENS3_9allocatorIcEEEENS3_10unique_ptrINS_12SQLStatementENS3_14default_deleteISD_EEEERNS3_10shared_ptrINS_21PreparedStatementDataEEENS_22PendingQueryParametersE
  28:        0x1040e8bed - __ZN6duckdb13ClientContext28PendingQueryPreparedInternalERNS_17ClientContextLockERKNSt3__112basic_stringIcNS3_11char_traitsIcEENS3_9allocatorIcEEEERNS3_10shared_ptrINS_21PreparedStatementDataEEENS_22PendingQueryParametersE
  29:        0x1040e979d - __ZN6duckdb13ClientContext12PendingQueryERKNSt3__112basic_stringIcNS1_11char_traitsIcEENS1_9allocatorIcEEEERNS1_10shared_ptrINS_21PreparedStatementDataEEENS_22PendingQueryParametersE
  30:        0x1040ff998 - __ZN6duckdb17PreparedStatement12PendingQueryERNSt3__16vectorINS_5ValueENS1_9allocatorIS3_EEEEb
  31:        0x1040f5499 - __ZN6duckdb17PreparedStatement7ExecuteERNSt3__16vectorINS_5ValueENS1_9allocatorIS3_EEEEb
  32:        0x10302cbe9 - _duckdb_shell_sqlite3_print_duckbox
  33:        0x10301ab81 - _exec_prepared_stmt
  34:        0x10300d2bd - _shell_exec
  35:        0x10301c5bd - _runOneSqlLine
  36:        0x10300e2b1 - _process_input
  37:        0x1030015bf - _main
fatal runtime error: failed to initiate panic, error 5
Abort trap: 6

@jalateras (author)

@changhiskhan any progress on this?

@eddyxu added the bug, good first issue, and duckdb labels on May 12, 2023
@eddyxu (contributor)

eddyxu commented May 18, 2023

OK, I can reproduce this. It seems to happen only with count(*), not with SELECT * FROM lance_scan(). Let me look into it.

Update:

So this query works: SELECT COUNT(name) FROM lance_scan('/tmp/small.lance'), but SELECT COUNT(*) FROM lance_scan('/tmp/small.lance') panics.

It seems that for COUNT(*), projected_column_id() returns u64::MAX (18446744073709551615) here:

https://github.com/eto-ai/lance/blob/5c370e9220b8b97e7b873497397ff7412adf7d98/integration/duckdb_lance/src/scan.rs#L107
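A minimal Python model of the failure, under stated assumptions: DuckDB passes the table function a list of projected column ids, and the extension appears to look each one up in the dataset schema while collecting into a Vec (the from_iter frame inside _read_lance_init in the backtrace). A COUNT(*) query needs rows but no column values, so DuckDB presumably requests its special row-id column, which I believe DuckDB encodes as (column_t)-1, i.e. u64::MAX on the Rust side. The names SCHEMA, ROW_ID_SENTINEL and project are hypothetical and only mirror the shape of the Rust code:

```python
# Assumed value of DuckDB's row-id column id as seen by the extension (u64::MAX).
ROW_ID_SENTINEL = 2**64 - 1

# Column names of /tmp/small.lance from the pandas example in this issue.
SCHEMA = ["name", "weight", "height", "smoker"]


def project(column_ids):
    """Hypothetical model: map projected column ids to schema fields by direct indexing."""
    return [SCHEMA[i] for i in column_ids]


# COUNT(name) projects a real column, so the lookup succeeds.
print(project([0]))  # ['name']

# COUNT(*) projects only the row-id sentinel; indexing with it fails, which is
# the Python analogue of "the len is 4 but the index is 18446744073709551615".
try:
    project([ROW_ID_SENTINEL])
except IndexError as err:
    print("IndexError:", err)
```

Python raises IndexError where the Rust code panics; in both cases no schema has 2^64 - 1 columns, so the unchecked lookup cannot succeed.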

Fix:

When the projected column id is u64::MAX, pick any column instead (preferably the smallest one).
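A sketch of the proposed fix, in Python and with hypothetical names: before building the projection, replace the row-id sentinel with the index of the cheapest column to read. "Smallest" is modeled here as a per-column width estimate (for example, bytes per value), which is an assumption; a real fix in scan.rs might use Arrow data types or on-disk sizes instead. Any column works for COUNT(*) because only the number of rows matters.

```python
from typing import List, Sequence

# Assumed value of DuckDB's row-id column id as seen by the Rust extension (u64::MAX).
ROW_ID_SENTINEL = 2**64 - 1


def resolve_column_ids(column_ids: Sequence[int], widths: Sequence[int]) -> List[int]:
    """Hypothetical helper: swap the row-id sentinel for the narrowest real column.

    `widths` is an assumed per-column cost estimate (e.g. bytes per value);
    ties go to the lowest column index.
    """
    if not widths:
        raise ValueError("dataset has no columns to scan")
    # Index of the cheapest column; the (width, index) key makes ties deterministic.
    cheapest = min(range(len(widths)), key=lambda i: (widths[i], i))
    # Real column ids pass through unchanged; only the sentinel is rewritten.
    return [cheapest if cid == ROW_ID_SENTINEL else cid for cid in column_ids]


if __name__ == "__main__":
    widths = [16, 8, 8, 16]  # made-up costs for name, weight, height, smoker
    print(resolve_column_ids([ROW_ID_SENTINEL], widths))  # [1], i.e. "weight"
    print(resolve_column_ids([0, 3], widths))  # [0, 3], unchanged
```

Rewriting the id up front, rather than special-casing COUNT(*) later, would leave the rest of the scan path untouched: it simply reads one cheap column and reports its row count.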
