Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fixing ordering issue in subquery for time column in GAPFILL based queries #15096

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

shauryachats
Copy link
Contributor

The current GAPFILL implementation expects the time column series to always be the first column in any subquery executed before the GAPFILL operation, which is not an ideal behaviour since it breaks perfectly valid SQLs like:

SELECT
  occupied,
  GapFill(time_col, '1:MILLISECONDS:SIMPLE_DATE_FORMAT:yyyy-MM-dd HH:mm:ss.SSS',
    '2021-11-07 4:00:00.000',
    '2021-11-07 12:00:00.000', '1:HOURS',
    FILL(occupied, 'FILL_PREVIOUS_VALUE'),
    TIMESERIESON(alias_levelId, alias_lotId)),
  alias_lotId,
  alias_levelId
FROM
  (
    SELECT
      lastWithTime(isOccupied, eventTime, 'INT') as occupied,
      cast(lotId as varchar) as alias_lotId,
      cast(levelId as varchar) as alias_levelId,
      DATETIMECONVERT(eventTime,'1:MILLISECONDS:EPOCH',
        '1:MILLISECONDS:SIMPLE_DATE_FORMAT:yyyy-MM-dd HH:mm:ss.SSS',
        '1:HOURS') AS time_col
    FROM
      parkingData
    WHERE
      eventTime >= 1636257600000 AND eventTime <= 1636286400000
    GROUP BY
      alias_levelId, time_col,alias_lotId
    LIMIT 200
  )
LIMIT 200

The current implementation of findTimeBucketColumnIndex simply looks for the GAPFILL expression in the relevant subquery and operates on the assumption that the time bucket column is positioned in the same index as GAPFILL when GAPFILL is stripped while execution. This assumption only works for queries where GAPFILL is present in the lowest subquery and is the first to be executed. It fails for queries where a subquery is executed before GAPFILL (AGGREGATE_GAPFILL and AGGREGATE_GAPFILL_AGGREGATE).

This PR fixes this by recomputing the timeBucketColumnIndex for the aforementioned GAPFILL types using the broker response to identify the index of the time bucket column. It also patches the rest of the GAPFILL computation logic to remove the assumption that GAPFILL is the first column in the subquery.

Testing

Successfully tested that the fix works in TIMESTAMP quickstart. Also added multiple unit tests to make sure each possible flow is working correctly with the fix.
(Note: current unit tests did not have queries that trigger CountGapfillProcessor and SumAvgGapfillProcessor and this PR adds tests for those query shapes as well).

@codecov-commenter
Copy link

codecov-commenter commented Feb 20, 2025

Codecov Report

Attention: Patch coverage is 87.50000% with 2 lines in your changes missing coverage. Please review.

Project coverage is 63.41%. Comparing base (59551e4) to head (042f9d1).
Report is 1741 commits behind head on master.

Files with missing lines Patch % Lines
.../pinot/core/query/reduce/BaseGapfillProcessor.java 75.00% 1 Missing and 1 partial ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##             master   #15096      +/-   ##
============================================
+ Coverage     61.75%   63.41%   +1.66%     
- Complexity      207     1480    +1273     
============================================
  Files          2436     2745     +309     
  Lines        133233   154282   +21049     
  Branches      20636    23806    +3170     
============================================
+ Hits          82274    97843   +15569     
- Misses        44911    49055    +4144     
- Partials       6048     7384    +1336     
Flag Coverage Δ
custom-integration1 100.00% <ø> (+99.99%) ⬆️
integration 100.00% <ø> (+99.99%) ⬆️
integration1 100.00% <ø> (+99.99%) ⬆️
integration2 0.00% <ø> (ø)
java-11 63.38% <87.50%> (+1.68%) ⬆️
java-21 63.31% <87.50%> (+1.68%) ⬆️
skip-bytebuffers-false 63.41% <87.50%> (+1.66%) ⬆️
skip-bytebuffers-true 63.28% <87.50%> (+35.55%) ⬆️
temurin 63.41% <87.50%> (+1.66%) ⬆️
unittests 63.41% <87.50%> (+1.66%) ⬆️
unittests1 56.05% <87.50%> (+9.15%) ⬆️
unittests2 33.94% <0.00%> (+6.21%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants