WIP: Add LogicalScalar #14617
base: main
@@ -1831,15 +1625,15 @@ mod tests {
 ),
 (
 Expr::Literal(ScalarValue::Date64(Some(0))),
-r#"CAST('1970-01-01 00:00:00' AS DATETIME)"#,
+r#"CAST('1970-01-01 00:00:00' AS TIMESTAMP)"#,
This is still an open issue, and we need to check whether this is equivalent. Maybe I also misunderstood Date64 (see LogicalScalar::from).
DATETIME and TIMESTAMP are not equivalent.
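To make the Date64 question concrete: Arrow's Date64 stores milliseconds since the UNIX epoch, so a value can in principle carry a time-of-day component. That is presumably why the unparser renders it as a date-time literal rather than a plain DATE. The sketch below formats a Date64 millisecond value as `YYYY-MM-DD HH:MM:SS` using only std. The function names are hypothetical and not part of the PR. The day-to-calendar step follows Howard Hinnant's `civil_from_days` algorithm (proleptic Gregorian calendar).

```rust
/// Converts days since 1970-01-01 into (year, month, day).
/// Port of Howard Hinnant's `civil_from_days` algorithm (hypothetical helper).
fn civil_from_days(days: i64) -> (i64, u32, u32) {
    let z = days + 719_468; // shift the epoch to 0000-03-01
    let era = z.div_euclid(146_097); // 400-year eras
    let doe = z.rem_euclid(146_097); // day of era, [0, 146096]
    let yoe = (doe - doe / 1_460 + doe / 36_524 - doe / 146_096) / 365; // year of era, [0, 399]
    let doy = doe - (365 * yoe + yoe / 4 - yoe / 100); // day of year (March-based), [0, 365]
    let mp = (5 * doy + 2) / 153; // month index starting at March, [0, 11]
    let day = (doy - (153 * mp + 2) / 5 + 1) as u32;
    let month = (if mp < 10 { mp + 3 } else { mp - 9 }) as u32;
    // January and February belong to the following civil year.
    let year = yoe + era * 400 + i64::from(month <= 2);
    (year, month, day)
}

/// Formats an Arrow-style Date64 value (milliseconds since the UNIX epoch)
/// as `YYYY-MM-DD HH:MM:SS`, dropping sub-second precision (hypothetical helper).
fn format_date64(millis: i64) -> String {
    const MILLIS_PER_DAY: i64 = 86_400_000;
    // Euclidean division keeps the time of day non-negative for pre-1970 values.
    let days = millis.div_euclid(MILLIS_PER_DAY);
    let secs_of_day = millis.rem_euclid(MILLIS_PER_DAY) / 1_000;
    let (y, m, d) = civil_from_days(days);
    format!(
        "{:04}-{:02}-{:02} {:02}:{:02}:{:02}",
        y,
        m,
        d,
        secs_of_day / 3_600,
        (secs_of_day % 3_600) / 60,
        secs_of_day % 60
    )
}
```

For example, `format_date64(0)` yields the same `1970-01-01 00:00:00` string that appears in the test above. Whether the surrounding SQL type should be DATETIME or TIMESTAMP depends on the target dialect, which this sketch does not decide.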
@@ -47,3 +50,53 @@ singleton!(LOGICAL_FLOAT64, logical_float64, Float64);
singleton!(LOGICAL_DATE, logical_date, Date);
singleton!(LOGICAL_BINARY, logical_binary, Binary);
singleton!(LOGICAL_STRING, logical_string, String);

pub fn logical_timestamp(
Some interning or a similar mechanism could be beneficial here to share Arcs between invocations.
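One way to share Arcs between invocations is a small global interner keyed by the constructor arguments. The sketch below is a hypothetical, std-only illustration: `LogicalTimestampType` and `TimeUnit` are stand-ins for whatever `logical_timestamp` actually takes and returns, and a `Mutex<HashMap>` cache is only one possible strategy (a concurrent map, or dedicated statics for the common units, would also work).

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex, OnceLock};

/// Hypothetical stand-in for the time units a logical timestamp can carry.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub enum TimeUnit {
    Second,
    Millisecond,
    Microsecond,
    Nanosecond,
}

/// Hypothetical stand-in for the logical type returned by `logical_timestamp`.
#[derive(Debug, PartialEq, Eq, Hash)]
pub struct LogicalTimestampType {
    pub unit: TimeUnit,
    pub tz: Option<Arc<str>>,
}

type Key = (TimeUnit, Option<Arc<str>>);

/// Returns one shared `Arc` per distinct (unit, tz) pair, creating it on first use.
pub fn logical_timestamp(unit: TimeUnit, tz: Option<Arc<str>>) -> Arc<LogicalTimestampType> {
    // Process-wide cache, lazily initialised on the first call.
    static CACHE: OnceLock<Mutex<HashMap<Key, Arc<LogicalTimestampType>>>> = OnceLock::new();
    let cache = CACHE.get_or_init(|| Mutex::new(HashMap::new()));
    let mut map = cache.lock().expect("interner mutex poisoned");
    let shared = map
        .entry((unit, tz.clone()))
        .or_insert_with(|| Arc::new(LogicalTimestampType { unit, tz }))
        .clone();
    shared
}
```

Repeated calls with equal arguments then return pointer-equal `Arc`s, so comparisons and clones stay cheap. The trade-off is a lock on every call, which may or may not matter on the hot paths where these types are created.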
I hope we have a complete PR before merging LogicalScalar; maybe we can merge it into a branch.
For decimals, I think it should be the same as in ScalarValue, given that we need to differentiate their precision.
It would be great if we didn't have Result or
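On decimals: keeping precision and scale next to the unscaled value, as ScalarValue's decimal variants do, would let a logical decimal stay independent of the physical Decimal128/Decimal256 choice while still rejecting values that do not fit. The sketch below is hypothetical (the struct layout and `try_new` are not from the PR) and uses `i128` for brevity, so it only covers precisions up to 38 digits. Wider values, or the dedicated decimal library mentioned in the open issues, would be needed beyond that.

```rust
use std::fmt;

/// Hypothetical logical decimal: an unscaled integer plus precision and scale.
/// Restricted to `i128`, i.e. at most 38 significant digits.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub struct LogicalDecimal {
    unscaled: i128,
    precision: u8,
    scale: i8,
}

impl LogicalDecimal {
    pub const MAX_PRECISION: u8 = 38;

    /// Rejects precisions outside 1..=38 and values with more than `precision` digits.
    pub fn try_new(unscaled: i128, precision: u8, scale: i8) -> Result<Self, String> {
        if precision == 0 || precision > Self::MAX_PRECISION {
            return Err(format!("precision {} out of range 1..={}", precision, Self::MAX_PRECISION));
        }
        // 10^38 fits into u128, so the bound never overflows.
        let bound = 10_u128.pow(u32::from(precision));
        if unscaled.unsigned_abs() >= bound {
            return Err(format!("{} does not fit into precision {}", unscaled, precision));
        }
        Ok(Self { unscaled, precision, scale })
    }

    pub fn precision(&self) -> u8 {
        self.precision
    }

    pub fn scale(&self) -> i8 {
        self.scale
    }
}

impl fmt::Display for LogicalDecimal {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        let sign = if self.unscaled < 0 { "-" } else { "" };
        let digits = self.unscaled.unsigned_abs().to_string();
        if self.scale <= 0 {
            // A non-positive scale multiplies the unscaled value by 10^(-scale).
            let zeros = if self.unscaled == 0 { 0 } else { usize::from(self.scale.unsigned_abs()) };
            return write!(f, "{}{}{}", sign, digits, "0".repeat(zeros));
        }
        let scale = usize::from(self.scale.unsigned_abs());
        // Left-pad with zeros so there is at least one digit before the point.
        let padded = format!("{:0>width$}", digits, width = scale + 1);
        let (int_part, frac_part) = padded.split_at(padded.len() - scale);
        write!(f, "{}{}.{}", sign, int_part, frac_part)
    }
}
```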
/// TODO logical-types
#[derive(Debug, Clone, Eq, Hash, PartialEq, PartialOrd, Ord)]
pub struct LogicalDate {
    value: i32,
we probably need LogicalScalar::Date32(i32) and LogicalScalar::Date64(i64) 🤔
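A hypothetical sketch of how separate Date32(i32) and Date64(i64) variants could coexist while still being comparable at the logical level. The enum name and methods are illustrative only; the arithmetic assumes Arrow's encodings (Date32 = days since the UNIX epoch, Date64 = milliseconds since the UNIX epoch).

```rust
/// Hypothetical subset of `LogicalScalar` with both Arrow date encodings.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum LogicalScalarDate {
    /// Days since 1970-01-01 (Arrow `Date32`).
    Date32(i32),
    /// Milliseconds since 1970-01-01 (Arrow `Date64`).
    Date64(i64),
}

const MILLIS_PER_DAY: i64 = 86_400_000;

impl LogicalScalarDate {
    /// Milliseconds since the epoch; lossless for both variants
    /// (i32::MAX days in milliseconds still fits into i64).
    pub fn epoch_millis(&self) -> i64 {
        match *self {
            Self::Date32(days) => i64::from(days) * MILLIS_PER_DAY,
            Self::Date64(millis) => millis,
        }
    }

    /// Whole days since the epoch, rounding towards negative infinity,
    /// so any time-of-day part of a `Date64` is dropped.
    pub fn epoch_days(&self) -> i64 {
        self.epoch_millis().div_euclid(MILLIS_PER_DAY)
    }

    /// True if the value carries a time-of-day component, which a plain
    /// SQL `DATE` literal could not represent.
    pub fn has_time_component(&self) -> bool {
        self.epoch_millis().rem_euclid(MILLIS_PER_DAY) != 0
    }
}
```

One possible use is letting the unparser choose between a DATE and a date-time literal based on `has_time_component`; whether that matches the intended semantics is part of the open DATETIME/TIMESTAMP question.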
    /// A null value
    Null,
    /// Stores a scalar for [`NativeType::Boolean`].
    Boolean(bool),
We need Option for LogicalScalar too 🤔
Are we 100% sure about that? I think this is one of the things I don't like when working with ScalarValue. Can't we just use LogicalScalar::Null and use the schema information when converting to ScalarValue?
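A sketch of the alternative raised here: a single untyped `Null`, with the target type supplied from the schema only when converting back to a physical scalar. All types below are simplified, hypothetical stand-ins (not DataFusion's `ScalarValue` or arrow's `DataType`), so this illustrates only the shape of such a conversion, not its real signature.

```rust
/// Simplified stand-in for arrow's `DataType`.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum PhysicalType {
    Boolean,
    Int64,
    Utf8,
}

/// Simplified stand-in for `ScalarValue`: every variant wraps an `Option`,
/// so even a null knows its physical type.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum PhysicalScalar {
    Boolean(Option<bool>),
    Int64(Option<i64>),
    Utf8(Option<String>),
}

/// Simplified stand-in for `LogicalScalar`: one untyped `Null` instead of `Option`s.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum LogicalValue {
    Null,
    Boolean(bool),
    Int64(i64),
    String(String),
}

/// Converts using the type taken from the schema; only `Null` needs it to pick a variant.
pub fn to_physical(value: &LogicalValue, ty: PhysicalType) -> Result<PhysicalScalar, String> {
    match (value, ty) {
        (LogicalValue::Null, PhysicalType::Boolean) => Ok(PhysicalScalar::Boolean(None)),
        (LogicalValue::Null, PhysicalType::Int64) => Ok(PhysicalScalar::Int64(None)),
        (LogicalValue::Null, PhysicalType::Utf8) => Ok(PhysicalScalar::Utf8(None)),
        (LogicalValue::Boolean(b), PhysicalType::Boolean) => Ok(PhysicalScalar::Boolean(Some(*b))),
        (LogicalValue::Int64(i), PhysicalType::Int64) => Ok(PhysicalScalar::Int64(Some(*i))),
        (LogicalValue::String(s), PhysicalType::Utf8) => Ok(PhysicalScalar::Utf8(Some(s.clone()))),
        // A non-null value whose type disagrees with the schema is an error.
        (v, t) => Err(format!("cannot convert {:?} to {:?}", v, t)),
    }
}
```

The trade-off: typed nulls are self-describing, while an untyped `Null` keeps the logical enum simpler but makes every conversion depend on having the type at hand.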
#[derive(Clone, PartialEq, Eq, Hash, Debug, PartialOrd)]
pub struct LogicalFixedSizeList {
    /// The inner list with a fixed size.
    inner: LogicalList,
we might need len: i32
impl FixedSizeListArray {
/// Create a new [`FixedSizeListArray`] with `size` element size, panicking on failure
///
/// # Panics
///
/// Panics if [`Self::try_new`] returns an error
pub fn new(field: FieldRef, size: i32, values: ArrayRef, nulls: Option<NullBuffer>) -> Self {
Self::try_new(field, size, values, nulls).unwrap()
}
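Storing `len` alongside the inner list, as suggested, would let the logical value enforce the same invariant that `FixedSizeListArray::try_new` checks for the physical array: every list has exactly `size` elements. This is a hypothetical, simplified sketch; `LogicalList` here is just a `Vec<i64>` wrapper rather than the PR's actual struct, and `i32` for `len` mirrors the `size: i32` parameter quoted above.

```rust
/// Simplified stand-in for the PR's `LogicalList`.
#[derive(Debug, Clone, PartialEq, Eq, Hash, PartialOrd)]
pub struct LogicalList {
    pub values: Vec<i64>,
}

/// Hypothetical fixed-size list that records its declared length.
#[derive(Debug, Clone, PartialEq, Eq, Hash, PartialOrd)]
pub struct LogicalFixedSizeList {
    inner: LogicalList,
    len: i32,
}

impl LogicalFixedSizeList {
    /// Fails if `len` is negative or does not match the number of inner values.
    pub fn try_new(inner: LogicalList, len: i32) -> Result<Self, String> {
        let actual = inner.values.len();
        // The `len < 0` check short-circuits before the cast to usize.
        if len < 0 || actual != len as usize {
            return Err(format!("expected {} elements, found {}", len, actual));
        }
        Ok(Self { inner, len })
    }

    pub fn len(&self) -> i32 {
        self.len
    }

    pub fn inner(&self) -> &LogicalList {
        &self.inner
    }
}
```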
    /// Returns the value of this timestamp as [DateTime] or [NaiveDateTime] depending on whether
    /// there is a time zone.
    pub fn value(&self) -> Result<LogicalTimestampValue> {
Do we need value for LogicalScalar? We don't have similar functionality for ScalarValue.
The idea was to obtain a value that you would also use in non-Arrow Rust code to represent this concept (e.g., types from chrono), while still storing the "arrow-like" values.
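To illustrate the idea behind `value()`: keep the Arrow-like representation (an integer plus a time unit) and derive a plain Rust value on demand. The PR uses chrono types; to keep this sketch self-contained it uses `std::time::SystemTime` as a stand-in, and the `LogicalTimestamp` layout shown is hypothetical. `checked_add`/`checked_sub` return `None` when the platform cannot represent the instant, and fallible conversions like this are one plausible reason such a method returns a `Result`.

```rust
use std::time::{Duration, SystemTime, UNIX_EPOCH};

/// Hypothetical stand-in for Arrow's time units.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum TimeUnit {
    Second,
    Millisecond,
    Microsecond,
    Nanosecond,
}

/// Hypothetical layout: the raw Arrow-style value plus its unit.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct LogicalTimestamp {
    value: i64,
    unit: TimeUnit,
}

impl LogicalTimestamp {
    pub fn new(value: i64, unit: TimeUnit) -> Self {
        Self { value, unit }
    }

    /// Nanoseconds since the UNIX epoch. Cannot overflow: |i64| * 10^9 is far below i128::MAX.
    fn epoch_nanos(&self) -> i128 {
        let nanos_per_unit: i128 = match self.unit {
            TimeUnit::Second => 1_000_000_000,
            TimeUnit::Millisecond => 1_000_000,
            TimeUnit::Microsecond => 1_000,
            TimeUnit::Nanosecond => 1,
        };
        i128::from(self.value) * nanos_per_unit
    }

    /// Converts to a `SystemTime`; `None` if the platform cannot represent the instant.
    pub fn to_system_time(&self) -> Option<SystemTime> {
        let nanos = self.epoch_nanos();
        let abs = nanos.unsigned_abs();
        // Whole seconds fit into u64 because the input was an i64.
        let offset = Duration::new((abs / 1_000_000_000) as u64, (abs % 1_000_000_000) as u32);
        if nanos >= 0 {
            UNIX_EPOCH.checked_add(offset)
        } else {
            UNIX_EPOCH.checked_sub(offset)
        }
    }
}
```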
Which issue does this PR close?
This change is related to #12622.
While the PR is not yet complete, I won't be able to work on it for at least a few days. Therefore, I'd like to gather some feedback on the current direction and spark some further discussions in #12622.
Rationale for this change
In #12622, we discussed how we can decouple ScalarValue from the physical DataType. While there is already work on the logical-types branch, this PR explores a different approach that may be easier to integrate into the current main branch.
This PR introduces the enum LogicalScalar, a scalar value that is decoupled from the physical Arrow DataType. By doing this, ScalarValue can remain tightly integrated with DataType and no existing code breaks, while still allowing us to use LogicalScalar in situations where a coupling to DataType is unwanted.
Some design considerations of LogicalScalar:
- … NativeType. There is currently no support for extension types. However, adding a LogicalScalar::Extension variant should be possible.
- … Timestamp have a corresponding LogicalTimestamp concept). For some variants, this prevents creating invalid values. Furthermore, it allows us to attach logic to these values (e.g., transforming LogicalTimestamp to chrono::NaiveDateTime).
- … ScalarValue. However, some fixes may still be necessary.
LogicalScalar is supposed to be used in at least two ways:
- … ScalarValue (LogicalScalar::from(scalar_value)). The resulting LogicalScalar can be more ergonomic to work with. To name a few "benefits": no working with Arrow arrays, no dictionaries, no multiple encodings of Utf8 strings, and hopefully more. I have adapted the code in datafusion/sql/src/unparser/expr.rs to demonstrate such a use case.
- … ScalarValue: As we want to decouple logical from physical types, the new LogicalScalar can provide a vehicle for replacing ScalarValue in LogicalPlan, etc.
The goal of this approach in the foreseeable future is to support the migration of LogicalPlan to logical types. To do this, we must replace the variant Expr::Literal(ScalarValue) with Expr::Literal(LogicalScalar). However, as this breaks quite a few things, it will be done in other PRs (I think #14609 is exploring this impact).
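The ergonomic benefit described for LogicalScalar::from(scalar_value) (no dictionaries, a single string encoding) can be illustrated with simplified stand-ins. `PhysicalScalar` below mimics a handful of ScalarValue variants and is hypothetical and far smaller than the real enum (the real dictionary variant also carries the key type, omitted here). The conversion shows the idea: every string encoding maps to one `String` variant, dictionary values are unwrapped, and all nulls collapse into one `Null`.

```rust
/// Simplified stand-in for a few `ScalarValue` variants.
#[derive(Debug, Clone, PartialEq)]
pub enum PhysicalScalar {
    Int32(Option<i32>),
    Utf8(Option<String>),
    LargeUtf8(Option<String>),
    Utf8View(Option<String>),
    /// Dictionary-encoded value; only the inner value matters logically.
    Dictionary(Box<PhysicalScalar>),
}

/// Simplified stand-in for `LogicalScalar`.
#[derive(Debug, Clone, PartialEq)]
pub enum LogicalScalar {
    Null,
    Int32(i32),
    String(String),
}

impl From<PhysicalScalar> for LogicalScalar {
    fn from(value: PhysicalScalar) -> Self {
        match value {
            PhysicalScalar::Int32(Some(v)) => LogicalScalar::Int32(v),
            // All physical string encodings become the same logical string.
            PhysicalScalar::Utf8(Some(s))
            | PhysicalScalar::LargeUtf8(Some(s))
            | PhysicalScalar::Utf8View(Some(s)) => LogicalScalar::String(s),
            // Unwrap the dictionary encoding and convert the inner value.
            PhysicalScalar::Dictionary(inner) => LogicalScalar::from(*inner),
            // Every typed null becomes the single untyped null.
            PhysicalScalar::Int32(None)
            | PhysicalScalar::Utf8(None)
            | PhysicalScalar::LargeUtf8(None)
            | PhysicalScalar::Utf8View(None) => LogicalScalar::Null,
        }
    }
}
```

Code such as the unparser can then match on a single `String` case instead of three string encodings plus dictionaries.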
What changes are included in this PR?
- … LogicalScalar enum and the contained structs and enums.
- … datafusion/sql/src/unparser/expr.rs to demonstrate the use of LogicalScalar.
Are these changes tested?
Some tests via the changes in datafusion/sql/src/unparser/expr.rs. I would like to improve upon that.
Are there any user-facing changes?
Yes. Public APIs (LogicalScalar) have been added. Documentation on LogicalScalar is still incomplete (see Open Issues).
Open Issues
- … Result<LogicalScalar>. I think that, in most situations, extracting a logical value should succeed. However, there may be some special cases that I did not consider, and there are quite a few expect calls in scalar/logical/from_scalar_value.rs. Basically, this boils down to ensuring that we can always extract a ScalarValue from the arrays contained in, for example, ScalarValue::List. What is your opinion on that?
- Is DATETIME equivalent to TIMESTAMP in this context? See changes in the test suite.
- … ScalarValue to LogicalScalar
- … LogicalDecimal. I think it makes sense to use a dedicated decimal library here for working with decimal values and only convert to the Arrow decimals once we need them. Any opinions?
cc @jayzhan211
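On the first open issue (expect calls versus returning Result<LogicalScalar>): a `TryFrom` implementation lets the rare failing cases surface as errors instead of panics, while the common cases stay simple. The sketch uses hypothetical, simplified types: `ArrayStandIn` plays the role of the Arrow array inside ScalarValue::List, one variant models an element type the logical layer cannot (yet) handle, and the error type is a plain `String`.

```rust
use std::convert::TryFrom;

/// Simplified stand-in for an Arrow array stored inside `ScalarValue::List`.
#[derive(Debug, Clone, PartialEq)]
pub enum ArrayStandIn {
    Int64(Vec<Option<i64>>),
    /// An element type the logical layer does not (yet) support.
    Unsupported(String),
}

/// Simplified stand-in for `ScalarValue`.
#[derive(Debug, Clone, PartialEq)]
pub enum PhysicalScalar {
    Int64(Option<i64>),
    List(ArrayStandIn),
}

/// Simplified stand-in for `LogicalScalar`.
#[derive(Debug, Clone, PartialEq)]
pub enum LogicalScalar {
    Null,
    Int64(i64),
    List(Vec<LogicalScalar>),
}

impl TryFrom<PhysicalScalar> for LogicalScalar {
    type Error = String;

    fn try_from(value: PhysicalScalar) -> Result<Self, Self::Error> {
        match value {
            PhysicalScalar::Int64(None) => Ok(LogicalScalar::Null),
            PhysicalScalar::Int64(Some(v)) => Ok(LogicalScalar::Int64(v)),
            // Element extraction is where an `expect` would otherwise sit;
            // here any element error is propagated with `?`.
            PhysicalScalar::List(ArrayStandIn::Int64(values)) => {
                let elements = values
                    .into_iter()
                    .map(|v| LogicalScalar::try_from(PhysicalScalar::Int64(v)))
                    .collect::<Result<Vec<_>, _>>()?;
                Ok(LogicalScalar::List(elements))
            }
            PhysicalScalar::List(ArrayStandIn::Unsupported(ty)) => {
                Err(format!("cannot extract list elements of type {}", ty))
            }
        }
    }
}
```

If the failing cases turn out to be truly unreachable, a `From` impl with documented panics remains an option; `TryFrom` keeps that decision open for callers.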