[BUG] No automated tests for REPLs (pyspark, spark-shell, notebooks) #5704
Comments
We have some notebook tests running on Databricks almost every day.
Thanks @GaryShen2008! This is great! We need to do some forensics with @tgravescs to dig out the notebook that was responsible for #3760 to include this test case in our daily notebook testing.
I don't know what notebook showed this, so we would have to go back and try to reproduce it.
Contributes to NVIDIA#5704

Signed-off-by: Gera Shegalov <[email protected]>
Contributes to #5704. This PR aims to catch issues like #9500. It modifies `run_pyspark_from_build`, mostly to avoid recreating the logic of figuring out the jar location, etc. Currently it may not catch this because we do not have Spark 3.5.0 CI yet, but this is how it could reproduce #9500:

```Bash
$ SPARK_HOME=~/dist/spark-3.1.1-bin-hadoop3.2 SPARK_SHELL_SMOKE_TEST=1 ./integration_tests/run_pyspark_from_build.sh
...
+ grep -F 'res0: Array[org.apache.spark.sql.Row] = Array([4950])'
res0: Array[org.apache.spark.sql.Row] = Array([4950])
+ echo 'SUCCESS spark-shell smoke test...'
SUCCESS spark-shell smoke test
$ echo $?
0
$ SPARK_HOME=~/dist/spark-3.5.0-bin-hadoop3 SPARK_SHELL_SMOKE_TEST=1 ./integration_tests/run_pyspark_from_build.sh
$ echo $?
1
```

With an explicit RapidsShuffleManager:

```Bash
SPARK_SHELL_SMOKE_TEST=1 \
PYSP_TEST_spark_shuffle_manager=com.nvidia.spark.rapids.spark311.RapidsShuffleManager \
SPARK_HOME=~/dist/spark-3.1.1-bin-hadoop3.2 \
./integration_tests/run_pyspark_from_build.sh
+ echo 'SUCCESS spark-shell smoke test'
SUCCESS spark-shell smoke test
$ echo $?
0

SPARK_SHELL_SMOKE_TEST=1 \
PYSP_TEST_spark_shuffle_manager=com.nvidia.spark.rapids.spark350.RapidsShuffleManager \
SPARK_HOME=~/dist/spark-3.5.0-bin-hadoop3 \
./integration_tests/run_pyspark_from_build.sh
$ echo $?
1
```

Signed-off-by: Gera Shegalov <[email protected]>
Describe the bug
Our codebase contains classloading-sensitive code. The classloader architecture in REPLs is different from, and much more complicated than, that of batch `spark-submit`-ted Spark apps. REPLs such as Jupyter and Databricks notebooks are tested manually, late in the dev cycle, so bugs are detected too late into the release (#3760).
We need to shift-left detection of breaking changes by automating manual notebook/REPL tests.
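As a hedged sketch of what such automation could look like (not the project's actual harness): pipe a scripted session into the REPL binary and grep the transcript for an expected marker, the same pattern the spark-shell smoke test in the PR above uses. Here `python3` stands in for `${SPARK_HOME}/bin/spark-shell` so the sketch is self-contained; `REPL_CMD` and the `sum=4950` marker are illustrative assumptions.

```shell
#!/usr/bin/env bash
# Sketch of a REPL smoke test: feed a scripted session to the REPL on stdin,
# capture the transcript, and grep for an expected marker, failing otherwise.
# Assumption: python3 stands in for spark-shell to keep this self-contained.
set -euo pipefail

REPL_CMD=${REPL_CMD:-"python3 -i -q"}  # in CI this would be spark-shell
EXPECTED='sum=4950'                    # marker the scripted session must print

# Run the scripted session; capture prompts and output in one transcript.
transcript=$($REPL_CMD 2>&1 <<'EOF'
print("sum=%d" % sum(range(100)))
EOF
)

if printf '%s\n' "$transcript" | grep -qF "$EXPECTED"; then
  echo "SUCCESS repl smoke test"
else
  echo "FAILURE repl smoke test" >&2
  exit 1
fi
```

The key design point, as in the PR above, is that the test exercises the REPL's own entry point and classloading path rather than re-running the logic as a batch job.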
Steps/Code to reproduce bug
Various
Expected behavior
Catch bugs in REPLs and Notebooks no later than nightly tests
Environment details (please complete the following information)
Databricks, local REPL
Additional context
#5646