Generated on 2023-10-16
#9220 | [FEA] Add GPU support for converting binary data to a hex string in REPL |
#9171 | [FEA] Add GPU version of ToPrettyString |
#5314 | [FEA] Support window.rowsBetween(Window.unboundedPreceding, -1) |
#9057 | [FEA] Add unbounded to unbounded fixers for min and max |
#8121 | [FEA] Add Spark 3.5.0 shim layer |
#9224 | [FEA] Allow } and }} to be transpiled to static strings |
#8596 | [FEA] Support spark.sql.legacy.parquet.datetimeRebaseModeInWrite=LEGACY |
#8767 | [AUDIT][SPARK-43302][SQL] Make Python UDAF an AggregateFunction |
#9055 | [FEA] Support Spark 3.3.3 official release |
#8672 | [FEA] Make GPU readers easier to debug on failure (any failure including OOM) |
#8965 | [FEA] Enable Bloom filter join acceleration by default |
#8625 | [FEA] Support outputTimestampType being INT96 |
#7803 | [FEA] Accelerate Bloom filtered joins |
#9060 | [BUG] OOM error in split and retry with multifile coalesce reader with parquet data |
#8916 | [BUG] Databricks - move init scripts off DBFS |
#9416 | [BUG] CDH build failed due to missing dependencies |
#9357 | [BUG] json_test failed on "NameError: name 'TimestampNTZType' is not defined" |
#9271 | [BUG] ThreadPool size is deduced incorrectly in MultiFileReaderThreadPool on YARN clusters |
#9309 | [BUG] bround and round do not return the correct result for some decimal values. |
#9153 | [BUG] netty OOM with MULTITHREADED shuffle |
#9311 | [BUG] test_hash_groupby_collect_list fails |
#9180 | [FEA][AUDIT][SPARK-44641] Incorrect result in certain scenarios when SPJ is not triggered |
#9290 | [BUG] delta_lake_test FAILED on "column mapping mode id is not supported for this Delta version" |
#9255 | [BUG] Unable to read DeltaTable with columnMapping.mode = name |
#9261 | [BUG] Leaks and Double Frees in Unit Tests |
#9246 | [BUG] test_predefined_character_classes failed with seed 4 |
#9208 | [BUG] SplitAndRetryOOM query14_part1 at 100TB with spark.executor.cores=64 |
#9106 | [BUG] Configuring GDS breaks new host spillable buffers and batches |
#9131 | [BUG] ConcurrentModificationException in ScalableTaskCompletion |
#9263 | [BUG] Unit test logging is not captured when running against Spark 3.5.0 |
#9168 | [BUG] Calling RmmSpark.getAndResetNumRetryThrow from tests is not working |
#8776 | [BUG] FileCacheIntegrationSuite intermittent failure |
#9223 | [BUG] Failed to create memory map on query14_part1 at 100TB with spark.executor.cores=64 |
#9116 | [BUG] spark350 shim build failed in mvn-verify github checks and nightly due to dependencies not released |
#8984 | [BUG] Check that keys are not null when creating a map |
#9233 | [BUG] test_parquet_testing_error_files - Failed: DID NOT RAISE <class 'Exception'> in databricks runtime 12.2 |
#9142 | [BUG] AWS EMR 6.12 NDS SF3k query9 Failure on g4dn.4xlarge |
#9214 | [BUG] mvn resolve dependencies failed missing rapids-4-spark-sql-plugin-api_2.12 of 311 shim |
#9204 | [BUG] SplitAndRetryOOM query78 at 100TB with spark.executor.cores=64 |
#9213 | [BUG] Missing revision info in databricks shims failed nightly build |
#9206 | [BUG] test_datetime_roundtrip_with_legacy_rebase failed in databricks runtimes |
#9165 | [BUG] Data gen for key groups produces type-mismatch columns |
#9129 | [BUG] Writing Parquet map(map) column can not set the outer key as non-null. |
#9194 | [BUG] missing sql-plugin-api databricks artifacts in the nightly CI |
#9167 | [BUG] Ensure no udf-compiler internal nodes escape |
#9092 | [BUG] NDS query 64 falls back to CPU only for a shuffle |
#9071 | [BUG] test_numeric_running_sum_window_no_part_unbounded failed in MT tests |
#9154 | [BUG] Spark 3.5.0 nightly build failures (test_parquet_testing_error_files) |
#9149 | [BUG] compile failed in databricks runtimes due to new added TestReport |
#9041 | [BUG] Fix regression in Python UDAF support when running against Spark 3.5.0 |
#9064 | [BUG][Spark 3.5.0] Re-enable test_hive_empty_simple_udf when 3.5.0-rc2 is available |
#9065 | [BUG][Spark 3.5.0] Reinstate cast map/array to string tests when 3.5.0-rc2 is available |
#9119 | [BUG] Predicate pushdown doesn't work for parquet files written by GPU |
#9103 | [BUG] test_select_complex_field fails in MT tests |
#9086 | [BUG] GpuBroadcastNestedLoopJoinExec can assert in doUnconditionalJoin |
#8939 | [BUG] q95 odd task failure in query95 at 30TB |
#9082 | [BUG] Race condition while spilling and aliasing a RapidsBuffer (regression) |
#9069 | [BUG] ParquetFormatScanSuite does not pass locally |
#8980 | [BUG] invalid escape sequences in pytests |
#7807 | [BUG] Round robin partitioning sort check falls back to CPU for cases that can be supported |
#8482 | [BUG] Potential leak on SplitAndRetry when iterator not fully drained |
#8942 | [BUG] NDS query 14 parts 1 and 2 both fail at SF100K |
#8778 | [BUG] GPU Parquet output for TIMESTAMP_MICROS is misinteterpreted by fastparquet as nanos |
#9445 | Only run test_csv_infer_schema_timestamp_ntz tests with PySpark >= 3.4.1 |
#9420 | Update private and jni dep version to released 23.10.0 |
#9415 | [BUG] fix docker modified check in premerge [skip ci] |
#9392 | Only run test_json_ts_formats_round_trip_ntz tests with PySpark >= 3.4.1 |
#9401 | Remove using mamba before they fix the incompatibility issue [skip ci] |
#9381 | Change the executor core calculation to take into account the cluster manager |
#9351 | Put back in full decimal support for format_number |
#9374 | GpuCoalesceBatches should throw SplitAndRetyOOM on GPU OOM error |
#9238 | Simplified handling of GPU core dumps |
#9362 | [DOC] Removing User Guide pages that will be source of truth on docs.nvidia… |
#9365 | Update DataWriteCommandExec docs to reflect ORC support for nested types |
#9277 | [Doc]Remove CUDA related requirement from download page.[Skip CI] |
#9352 | Refine rules for skipping test_csv_infer_schema_timestamp_ntz_* tests |
#9334 | Add NaNs to Data Generators In Floating-Point Testing |
#9344 | Update MULTITHREADED shuffle maxBytesInFlight default to 128MB |
#9330 | Add Hao to blossom-ci whitelist |
#9328 | Building different Cuda versions section profile does not take effect [skip ci] |
#9329 | Add kuhushukla to blossom ci yml |
#9281 | Support format_number |
#9335 | Temporarily skip failing tests test_csv_infer_schema_timestamp_ntz* |
#9318 | Update authorized user in blossom-ci whitelist [skip ci] |
#9221 | Add GPU version of ToPrettyString |
#9321 | [DOC] Fix some incorrect config links in doc [skip ci] |
#9314 | Fix RMM crash in FileCacheIntegrationSuite with ARENA memory allocator |
#9287 | Allow checkpoint and restore on non-deterministic expressions in GpuFilter and GpuProject |
#9146 | Improve some CSV integration tests |
#9159 | Update tests and documentation for spark.sql.timestampType when reading CSV/JSON |
#9313 | Sort results of collect_list test before comparing since it is not guaranteed |
#9286 | [FEA][AUDIT][SPARK-44641] Incorrect result in certain scenarios when SPJ is not triggered |
#9229 | Support negative preceding/following for ROW-based window functions |
#9297 | Append new authorized user to blossom-ci whitelist [skip ci] |
#9294 | Fix test_delta_read_column_mapping test failures on Spark 3.2.x and 3.3.x |
#9285 | Add CastOptions to make GpuCast extendible to handle more options |
#9279 | Fix file format checks to be exact and handle Delta Lake column mapping |
#9283 | Refactor ExternalSource to move some APIs to converted GPU format or scan |
#9264 | Fix leak in test and double free in corner case |
#9280 | Fix some issues found with different seeds in integration tests |
#9257 | Have host spill use the new HostAlloc API |
#9253 | Enforce Scala method syntax over deprecated procedure syntax |
#9273 | Add arm64 profile to build arm artifacts |
#9270 | Remove GDS spilling |
#9267 | Roll our own BufferedIterator so we can close cleanly |
#9266 | Specify correct dependency versions for 350 build |
#9262 | Add Delta Lake support for Spark 3.4.1 and Delta Lake tests on Spark 3.4.x |
#9256 | Test Parquet double column stat without NaN |
#9254 | [Doc]update the emr getting started doc for emr-6130 release[skip ci] |
#9228 | Add in unbounded to unbounded optimization for min/max |
#9252 | Add Spark 3.5.0 to list of supported Spark versions [skip ci] |
#9251 | Enable a couple of retry asserts in internal row to cudf row iterator suite |
#9239 | Handle escaping the dangling right ] and right } in the regexp transpiler |
#9090 | Add test cases for Parquet statistics |
#9240 | Fix flaky ORC filecache test |
#9053 | [DOC] update the turning guide document issues [skip ci] |
#9211 | Allow skipping host spill for a direct device->disk spill |
#9234 | Enable Spark 350 builds |
#9237 | Check for null keys when creating map |
#9235 | xfail fixed_length_byte_array.parquet test due to rapidsai/cudf#14104 |
#9231 | Use conda libmamba solver to resolve intermittent libarchive issue [skip ci] |
#8404 | Add in support for FIXED_LEN_BYTE_ARRAY as binary |
#9225 | Add in a HostAlloc API for high priority and add in spilling |
#9207 | Support SplitAndRetry for GpuRangeExec |
#9217 | Fix leak in aggregate when there are retries |
#9200 | Fix a few minor things with scale test |
#9222 | Deploy classified aggregator for Databricks [skip ci] |
#9209 | Fix tests for datetime rebase in Databricks |
#9181 | [DOC] address document issues [skip ci] |
#9132 | Support spark.sql.parquet.datetimeRebaseModeInWrite=LEGACY |
#9196 | Fix host memory leak for R2C |
#9192 | Throw overflow exception when interval seconds are outside of range [0, 59] |
#9150 | add error section in report and the rest queries |
#9189 | Expose host store spill |
#9147 | Make map column non-nullable when it's a key in another map. |
#9193 | Support Retry for GpuLocalLimitExec and GpuGlobalLimitExec |
#9183 | Add test to verify UDT fallback for parquet |
#9195 | Deploy sql-plugin-api artifact in DBR CI pipelines [skip ci] |
#9170 | Add in new HostAlloc API |
#9182 | Consolidate Spark vendor shim dependency management |
#9190 | Prevent returning internal compiler expressions when compiling UDFs |
#9164 | Support Retry for GpuTopN and GpuSortEachBatchIterator |
#9134 | Fix shuffle fallback due to AQE on AWS EMR |
#9188 | Fix flaky tests in FileCacheIntegrationSuite |
#9148 | Add minimum Maven module eventually containing all non-shimmable source code |
#9169 | Add retry-without-split in InternalRowToColumnarBatchIterator |
#9172 | Remove doSetSpillable in favor of setSpillable |
#9152 | Add test cases for testing Parquet compression types |
#9157 | XFAIL parquet lz4_raw tests for Spark 3.5.0 or later |
#9128 | Test parquet predicate pushdown for basic types and fields having dots in names |
#9158 | Add json4s dependencies for Databricks integration_tests build |
#9102 | Add retry support to GpuOutOfCoreSortIterator.mergeSortEnoughToOutput |
#9089 | Add application to run Scale Test |
#9143 | [DOC] update spark.rapids.sql.concurrentGpuTasks default value in tuning guide [skip ci] |
#9141 | Removed resultDecimalType in GpuIntegralDecimalDivide |
#9099 | Spark 3.5.0 follow-on work (rc2 support + Python UDAF) |
#9140 | Bump Jython to 2.7.3 |
#9136 | Moving row column conversion code from cudf to jni |
#9133 | Add 350 tag to InSubqueryShims |
#9124 | Import scala.collection intead of collection |
#9122 | Fall back to CPU if spark.sql.execution.arrow.useLargeVarTypes is true |
#9115 | [DOC] updates documentation related to java compatibility [skip ci] |
#9098 | Add SpillableHostColumnarBatch |
#9091 | GPU support for DynamicPruningExpression and InSubqueryExec |
#9117 | Temply disable spark 350 shim build in nightly [skip ci] |
#9113 | Instantiate execution plan capture callback via shim loader |
#8969 | Initial support for Spark 3.5.0-rc1 |
#9100 | Support broadcast nested loop existence joins with no condition |
#8925 | Add GpuConv operator for the conv 10<->16 expression |
#9109 | [DOC] adding java 11 to download docs [skip ci] |
#9085 | Retry with smaller split on CudfColumnSizeOverflowException |
#8961 | Save Databricks init scripts in the workspace |
#9088 | Add retry and SplitAndRetry support to AcceleratedColumnarToRowIterator |
#9095 | Support released spark 3.3.3 |
#9084 | Fix race when a rapids buffer is aliased while it is spilled |
#9093 | Update ParquetFormatScanSuite to not call CUDF directly |
#9068 | Test ORC predicate pushdown (PPD) with timestamps decimals booleans |
#9054 | Initial entry point to data generation for scale test |
#9070 | Spillable host buffer |
#9066 | Add retry support to RowToColumnarIterator |
#9073 | Stop using invalid escape sequences |
#9018 | Add test for selecting a single complex field array and its parent struct array |
#9067 | Add array support for round robin partition; Refactor pluginSupportedOrderableSig |
#9072 | Revert "Implement SumUnboundedToUnboundedFixer (#8934)" |
#9056 | Add in configs for host memory limits |
#9061 | Fix import order |
#8934 | Implement SumUnboundedToUnboundedFixer |
#9051 | Use number of threads on executor instead of driver to set core count |
#9040 | Fix issues from 23.08 merge in join_test |
#9045 | Fix auto merge conflict 9043 [skip ci] |
#9009 | Add in a layer of indirection for task completion callbacks |
#9013 | Create a two-shim jar by default on Databricks |
#8995 | Add test case for ORC statistics test |
#8970 | Add ability to debug dump input data only on errors |
#9003 | Fix auto merge conflict 9002 [skip ci] |
#8989 | Mark lazy spillables as allowSpillable in during gatherer construction |
#8988 | Move big data generator to a separate module |
#8987 | Fix host memory buffer leaks in SerializationSuite |
#8968 | Enable GPU acceleration of Bloom filter join expressions by default |
#8947 | Add ArrowUtilsShims in preparation for Spark 3.5.0 |
#8946 | [Spark 3.5.0] Shim access to StructType.fromAttributes |
#8824 | Drop the in-range check at INT96 output path |
#8924 | Deprecate and delegate GpuCV.debug to cudf TableDebug |
#8915 | Move LegacyBehaviorPolicy references to shim layer |
#8918 | Output unified diff when GPU output deviates |
#8857 | Remove the pageable pool |
#8854 | Fix auto merge conflict 8853 [skip ci] |
#8805 | Bump up dep versions to 23.10.0-SNAPSHOT |
#8796 | Init version 23.10.0-SNAPSHOT |
#5509 | [FEA] Support order-by on Array |
#7876 | [FEA] Add initial support for Databricks 12.2 ML LTS |
#8547 | [FEA] Add support for Delta Lake 2.4 with Spark 3.4 |
#8633 | [FEA] Add support for xxHash64 function |
#4929 | [FEA] Support min/max aggregation/reduction for arrays of structs and arrays of strings |
#8668 | [FEA] Support min and max for arrays |
#4887 | [FEA] Hash partitioning on ArrayType |
#6680 | [FEA] Support hashaggregate for Array[Any] |
#8085 | [FEA] Add support for MillisToTimestamp |
#7801 | [FEA] Window Expression orderBy column is not supported in a window range function, found DoubleType |
#8556 | [FEA] [Delta Lake] Add support for new metrics in MERGE |
#308 | [FEA] Spark 3.1 adding support for TIMESTAMP_SECONDS, TIMESTAMP_MILLIS and TIMESTAMP_MICROS functions |
#8122 | [FEA] Add spark 3.4.1 snapshot shim |
#8525 | [FEA] Add support for org.apache.spark.sql.functions.flatten |
#8202 | [FEA] List supported Spark builds when the Shim is not found |
#8231 | [FEA] Add filecache support to ORC scans |
#8141 | [FEA] Explore how to best deal with large numbers of aggregations in the short term |
#9034 | [BUG] java.lang.ClassCastException: com.nvidia.spark.rapids.RuleNotFoundExprMeta cannot be cast to com.nvidia.spark.rapids.GeneratorExprMeta |
#9032 | [BUG] Multiple NDS queries fail with Spark-3.4.1 with bloom filter exception |
#8962 | [BUG] Nightly build failed: ExecutionPlanCaptureCallback$.class is not bitwise-identical across shims |
#9021 | [BUG] test_map_scalars_supported_key_types failed in dataproc 2.1 |
#9020 | [BUG] auto-disable snapshot shims test in github action for pre-release branch |
#9010 | [BUG] Customer failure 23.08: Cannot compute hash of a table with a LIST of STRUCT columns. |
#8922 | [BUG] integration map_test:test_map_scalars_supported_key_types failures |
#8982 | [BUG] Nightly prerelease failures - OrcSuite |
#8978 | [BUG] compiling error due to OrcSuite&OrcStatisticShim in databricks runtimes |
#8610 | [BUG] query 95 @ SF30K fails with OOM exception |
#8955 | [BUG] Bloom filter join tests can fail with multiple join columns |
#45 | [BUG] very large shuffles can fail |
#8779 | [BUG] Put shared Databricks test script together for ease of maintenance |
#8930 | [BUG] checkoutSCM plugin is unstable for pre-merge CI, it is often unable to clone submodules |
#8923 | [BUG] Mortgage test failing with 'JavaPackage' error on AWS Databricks |
#8303 | [BUG] GpuExpression columnarEval can return scalars from subqueries that may be unhandled |
#8318 | [BUG][Databricks 12.2] GpuRowBasedHiveGenericUDF ClassCastException |
#8822 | [BUG] Early terminate CI if submodule init failed |
#8847 | [BUG] github actions CI messed up w/ JDK versions intermittently |
#8716 | [BUG] test_hash_groupby_collect_set_on_nested_type and test_hash_reduction_collect_set_on_nested_type failed |
#8827 | [BUG] databricks cudf_udf night build failing with pool size exceeded errors |
#8630 | [BUG] Parquet with RLE encoded booleans loads corrupted data |
#8735 | [BUG] test_orc_column_name_with_dots fails in nightly EGX tests |
#6980 | [BUG] Partitioned writes release GPU semaphore with unspillable GPU memory |
#8784 | [BUG] hash_aggregate_test.py::test_min_max_in_groupby_and_reduction failed on "TypeError: object of type 'NoneType' has no len()" |
#8756 | [BUG] [Databricks 12.2] RapidsDeltaWrite queries that reference internal metadata fail to run |
#8636 | [BUG] AWS Databricks 12.2 integration tests failed due to Iceberg check |
#8754 | [BUG] databricks build broke after adding bigDataGen |
#8726 | [BUG] Test "parquet_write_test.py::test_hive_timestamp_value[INJECT_OOM]" failed on Databricks |
#8690 | [BUG buildall script does not support JDK11 profile |
#8702 | [BUG] test_min_max_for_single_level_struct failed |
#8727 | [BUG] test_column_add_after_partition failed in databricks 10.4 runtime |
#8669 | [BUG] SpillableColumnarBatch doesn't always take ownership |
#8655 | [BUG] There are some potential device memory leaks in AbstractGpuCoalesceIterator |
#8685 | [BUG] install build fails with Maven 3.9.3 |
#8156 | [BUG] Install phase for modules with Spark build classifier fails for install plugin versions 3.0.0+ |
#1130 | [BUG] TIMESTAMP_MILLIS not handled in isDateTimeRebaseNeeded |
#7676 | [BUG] SparkShimsImpl class initialization in SparkShimsSuite for 340 too eager |
#8278 | [BUG] NDS query 16 hangs at SF30K |
#8665 | [BUG] EGX nightly tests fail to detect Spark version on startup |
#8647 | [BUG] array_test.py::test_array_min_max[Float][INJECT_OOM] failed mismatched CPU and GPU output in nightly |
#8640 | [BUG] Optimize Databricks pre-merge scripts, move it out into a new CI file |
#8308 | [BUG] Device Memory leak seen in integration_tests when AssertEmptyNulls are enabled |
#8602 | [BUG] AutoCloseable Broadcast results are getting closed by Spark |
#8603 | [BUG] SerializeConcatHostBuffersDeserializeBatch.writeObject fails with ArrayIndexOutOfBoundsException on rows-only table |
#8615 | [BUG] RapidsShuffleThreadedWriterSuite temp shuffle file test failure |
#6872 | [BUG] awk: cmd. line:1: warning: regexp escape sequence `\ ' is not a known regexp operator |
#8588 | [BUG] Spark 3.3.x integration tests failed due to missing jars |
#7775 | [BUG] scala version hardcoded irrespective of Spark dependency |
#8548 | [BUG] cache_test:test_batch_no_cols test FAILED on spark-3.3.0+ |
#8579 | [BUG] build failed on Databricks clusters "GpuDeleteCommand.scala:104: type mismatch" |
#8187 | [BUG] Integration test test_window_running_no_part can produce non-empty nulls (cudf scan) |
#8493 | [BUG] branch-23.08 fails to build on Databricks 12.2 |
#9407 | [Doc]Update docs for 23.08.2 version[skip ci] |
#9382 | Bump up project version to 23.08.2 |
#8476 | Use retry with split in GpuCachedDoublePassWindowIterator |
#9048 | Update 23.08 changelog 23/08/15 [skip ci] |
#9044 | [DOC] update release version from v2308.0 to 2308.1 [skip ci] |
#9036 | Fix meta class cast exception when generator not supported |
#9042 | Bump up project version to 23.08.1-SNAPSHOT |
#9035 | Handle null values when merging Bloom filters |
#9029 | Update 23.08 changelog to latest [skip ci] |
#9023 | Allow WindowLocalExec to run on CPU for a map test. |
#9024 | Do not trigger snapshot spark version test in pre-release maven-verify checks [skip ci] |
#8975 | Init 23.08 changelog [skip ci] |
#9016 | Fix issue where murmur3 tried to work on array of structs |
#9014 | Updating link to download jar [skip ci] |
#9006 | Revert test changes to fix binary dedup error |
#9001 | [Doc]update the emr getting started doc for emr-6120 release[skip ci] |
#8949 | Update JNI and private version to released 23.08.0 |
#8977 | Create an anonymous subclass of AdaptiveSparkPlanHelper in ExecutionPlanCaptureCallback.scala |
#8972 | [Doc]Add best practice doc[skip ci] |
#8948 | [Doc]update download docs for 2308 version[skip ci] |
#8971 | Fix test_map_scalars_supported_key_types |
#8990 | Remove doc references to 312db [skip ci] |
#8960 | [Doc] address profiling tool formatted issue [skip ci] |
#8983 | Revert OrcSuite to fix deployment build |
#8979 | Fix Databricks build error for new added ORC test cases |
#8920 | Add test case to test orc dictionary encoding with lots of rows for nested types |
#8940 | Add test case for ORC statistics test |
#8909 | Match Spark's NaN handling in collect_set |
#8892 | Experimental support for BloomFilterAggregate expression in a reduction context |
#8957 | Fix building dockerfile.cuda hanging at tzdata installation [skip ci] |
#8944 | Fix issues around bloom filter with multple columns |
#8744 | Add test for selecting a single complex field array and its parent struct array |
#8936 | Device synchronize prior to freeing a set of RapidsBuffer |
#8935 | Don't go over shuffle limits on CPU |
#8927 | Skipping test_map_scalars_supported_key_types because of distributed … |
#8931 | Clone submodule using git command instead of checkoutSCM plugin |
#8917 | Databricks shim version for integration test |
#8775 | Support BloomFilterMightContain expression |
#8833 | Binary and ternary handling of scalar audit and some fixes |
#7233 | [FEA] Support order by on single-level array |
#8893 | Fix regression in Hive Generic UDF support on Databricks 12.2 |
#8828 | Put shared part together for Databricks test scripts |
#8872 | Terminate CI if fail to clone submodule |
#8787 | Add in support for ExponentialDistribution |
#8868 | Add a test case for testing ORC version V_0_11 and V_0_12 |
#8795 | Add ORC writing test cases for not implicitly lowercase columns |
#8871 | Adjust parallelism in spark-tests script to reduce memory footprint [skip ci] |
#8869 | Specify expected JAVA_HOME and bin for mvn-verify-check [skip ci] |
#8785 | Add test cases for ORC writing according to options orc.compress and compression |
#8810 | Fall back to CPU for deletion vectors writes on Databricks |
#8830 | Update documentation to add Databricks 12.2 as a supported platform [skip ci] |
#8799 | Add tests to cover some odd corner cases with nulls and empty arrays |
#8783 | Fix collect_set_on_nested_type tests failed |
#8855 | Fix bug: Check GPU file instead of CPU file [skip ci] |
#8852 | Update test scripts and dockerfiles to match cudf conda pkg change [skip ci] |
#8848 | Try mitigate mismatched JDK versions in mvn-verify checks [skip ci] |
#8825 | Add a case to test ORC writing/reading with lots of nulls |
#8802 | Treat unbounded windows as truly non-finite. |
#8798 | Add ORC writing test cases for dictionary compression |
#8829 | Enable rle_boolean_encoding.parquet test |
#8667 | Make state spillable in partitioned writer |
#8801 | Fix shuffling an empty Struct() column with UCX |
#8748 | Add driver log warning when GPU is limiting scheduling resource |
#8786 | Add support for row-based execution in RapidsDeltaWrite |
#8791 | Auto merge to branch-23.10 from branch-23.08[skip ci] |
#8790 | Update ubuntu dockerfiles default to 20.04 and deprecating centos one [skip ci] |
#8777 | Install python packages with shared scripts on Databricks |
#8772 | Test concurrent writer update file metrics |
#8646 | Add testing of Parquet files from apache/parquet-testing |
#8684 | Add 'submodule update --init' when build spark-rapids |
#8769 | Remove iceberg scripts from Databricks test scripts |
#8773 | Add a test case for reading/write null to ORC |
#8749 | Add test cases for read/write User Defined Type (UDT) to ORC |
#8768 | Add support for xxhash64 |
#8751 | Ensure columnarEval always returns a GpuColumnVector |
#8765 | Add in support for maps to big data gen |
#8758 | Normal and Multi Distributions for BigDataGen |
#8755 | Add in dependency for databricks on integration tests |
#8737 | Fix parquet_write_test.py::test_hive_timestamp_value failure for Databricks |
#8745 | Conventional jar layout is not required for JDK9+ |
#8706 | Add a tool to support generating large amounts of data |
#8747 | xfail hash_groupby_collect_set and hash_reduction_collect_set on nested type cases |
#8689 | Support nested arrays for min /max aggregations in groupby and reduction |
#8699 | Regression test for array of struct with a single field name "element" in Parquet |
#8733 | Avoid generating numeric null partition values on Databricks 10.4 |
#8728 | Use specific mamba version and install libarchive explictly [skip ci] |
#8594 | String generation from complex regex in integration tests |
#8700 | Add regression test to ensure Parquet doesn't interpret timestamp values differently from Hive 0.14.0+ |
#8711 | Factor out modules shared among shim profiles |
#8697 | Spillable columnar batch takes ownership and improve code coverage |
#8705 | Add schema evolution integration tests for partitioned data |
#8673 | Fix some potential memory leaks |
#8707 | Update config docs for new filecache configs [skip ci] |
#8695 | Always create the main artifact along with a shim-classifier artifact |
#8704 | Add tests for column names with dots |
#8703 | Comment out min/max agg test for nested structs to unblock CI |
#8698 | Cache last ORC stripe footer to avoid redundant remote reads |
#8687 | Handle TIMESTAMP_MILLIS for rebase check |
#8688 | Enable the 340 shim test |
#8656 | Return result from filecache message instead of null |
#8659 | Filter out nulls for build batches when needed in hash joins |
#8682 | [DOC] Update CUDA requirements in documentation and Dockerfiles[skip ci] |
#8637 | Support Float order-by columns for RANGE window functions |
#8681 | changed container name to adapt to blossom-lib refactor [skip ci] |
#8573 | Add support for Delta Lake 2.4.0 |
#8671 | Fix use-after-freed bug in GpuFloatArrayMin |
#8650 | Support TIMESTAMP_SECONDS, TIMESTAMP_MILLIS and TIMESTAMP_MICROS |
#8495 | Speed up PCBS CPU read path by not recalculating as much |
#8389 | Add filecache support for ORC |
#8658 | Check if need to run Databricks pre-merge |
#8649 | Add Spark 3.4.1 shim |
#8624 | Rename numBytesAdded/Removed metrics and add deletion vector metrics in Databricks 12.2 shims |
#8645 | Fix "PytestUnknownMarkWarning: Unknown pytest.mark.inject_oom" warning |
#8608 | Matrix stages to dynamically build Databricks shims |
#8517 | Revert "Disable asserts for non-empty nulls (#8183)" |
#8628 | Enable Delta Write fallback tests on Databricks 12.2 |
#8632 | Fix GCP examples and getting started guide [skip ci] |
#8638 | Support nested structs for min /max aggregations in groupby and reduction |
#8639 | Add iceberg test for nightly DB12.2 IT pipeline[skip ci] |
#8618 | Heuristic to speed up partial aggregates that get larger |
#8605 | [Doc] Fix demo link in index.md [skip ci] |
#8619 | Enable output batches metric for GpuShuffleCoalesceExec by default |
#8617 | Fixes broadcast spill serialization/deserialization |
#8531 | filecache: Modify FileCacheLocalityManager.init to pass in Spark context |
#8613 | Try print JVM core dump files if any test failures in CI |
#8616 | Wait for futures in multi-threaded writers even on exception |
#8578 | Add in metric to see how much computation time is lost due to retry |
#8590 | Drop ".dev0" suffix from Spark SNASHOT distro builds |
#8604 | Upgrade scalatest version to 3.2.16 |
#8555 | Support flatten SQL function |
#8599 | Fix broken links in advanced_configs.md |
#8589 | Revert to the JVM-based Spark version extraction in pytests |
#8582 | Fix databricks shims build errors caused by DB updates |
#8564 | Fold verify-all-modules-with-headSparkVersion into verify-all-modules [skip ci] |
#8553 | Handle empty batch in ParquetCachedBatchSerializer |
#8575 | Corrected typos in CONTRIBUTING.md [skip ci] |
#8574 | Remove maxTaskFailures=4 for pre-3.1.1 Spark |
#8503 | Remove hard-coded version numbers for dependencies when building on |
#8544 | Fix auto merge conflict 8543 [skip ci] |
#8521 | List supported Spark versions when no shim found |
#8520 | Add support for first, last, nth, and collect_list aggregations for BinaryType |
#8509 | Remove legacy spark version check |
#8494 | Fix 23.08 build on Databricks 12.2 |
#8487 | Move MockTaskContext to tests project |
#8426 | Pre-merge CI to support Databricks 12.2 |
#8282 | Databricks 12.2 Support |
#8407 | Bump up dep version to 23.08.0-SNAPSHOT |
#8359 | Init version 23.08.0-SNAPSHOT |
Changelog of older releases can be found at docs/archives