Here we spell out the conventions for getting the data needed to test H2O in the different test environments. All data is retrieved using the gradlew wrapper that comes as part of this repository. Please keep in mind that anywhere below that says

$ ./gradlew

you may need to run

C:\> gradlew.bat

on Windows machines instead. All sync commands use the file size and "last modified" time stamp of each local file to determine whether that file needs to be updated.
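As a rough illustration of that rule, the sketch below (Python, and not the gradle tasks' actual implementation; the remote_size and remote_mtime values are assumed to come from whatever metadata the task records for each S3 object) decides whether a local file still needs to be fetched:

    import os

    # Sketch only: mirrors the documented rule that sync compares file size and
    # "last modified" time stamps to decide whether a local file is up to date.
    def needs_update(local_path, remote_size, remote_mtime):
        if not os.path.exists(local_path):
            return True                      # never downloaded
        st = os.stat(local_path)
        return st.st_size != remote_size or st.st_mtime < remote_mtime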
Pre-push tests are intended to be fast and run often. Running time for all tests should be a few minutes.
Characteristics:
- 8 GB or better RAM.
- Five 1 GB JVMs get started.
- All test data is either generated or exists in the s3://h2o-public-test-data/smalldata/ directory.

Driven by:
- Java unit test runners
How to get the data:

$ ./gradlew syncSmalldata

How to run the tests:

$ ./gradlew test
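For reference, a pre-push test normally just resolves files underneath the synced smalldata directory rather than reaching out to S3 at run time. The helper below is hypothetical (it is not part of H2O's test utilities) and assumes syncSmalldata populates a smalldata/ directory in the git workspace, analogous to the bigdata/laptop directory described below:

    import os

    # Hypothetical helper, not part of H2O's test harness: resolve a file under
    # the smalldata/ directory that "./gradlew syncSmalldata" is assumed to fill.
    def smalldata(relative_path, workspace="."):
        path = os.path.join(workspace, "smalldata", relative_path)
        if not os.path.isfile(path):
            raise FileNotFoundError("%s not found; run ./gradlew syncSmalldata first" % path)
        return path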
Laptop tests are meant to run stressful workloads on a powerful laptop. Running time for all tests should be less than an hour.
Characteristics:
- At least 12 GB of RAM. At least 2 CPUs. (Most H2O developers use a 16 GB MacBook Pro with 4 CPUs and 8 hardware threads.)
- Max 5 GB of data to download.
- Search for data in the following order (see the sketch below):
  - path specified by environment variable H2O_BIGDATA + "/laptop"
  - ./bigdata/laptop (a "magic" directory in your git workspace)
  - /home/h2opublictestdata/bigdata/laptop
  - /mnt/h2o-public-test-data/bigdata/laptop

Driven by:
- RUnit tests
- Python tests
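The lookup described in the characteristics above can be pictured like this (a sketch of the documented search order, not the actual harness code):

    import os

    # Sketch: walk the documented locations in order and use the first one found.
    def locate_bigdata_laptop():
        candidates = []
        if "H2O_BIGDATA" in os.environ:
            candidates.append(os.environ["H2O_BIGDATA"] + "/laptop")
        candidates += [
            "./bigdata/laptop",                        # "magic" workspace directory
            "/home/h2opublictestdata/bigdata/laptop",
            "/mnt/h2o-public-test-data/bigdata/laptop",
        ]
        for directory in candidates:
            if os.path.isdir(directory):
                return directory
        raise FileNotFoundError("no bigdata/laptop directory found; run ./gradlew syncBigdataLaptop")

The big server tests below use the same lookup with "/bigserver" in place of "/laptop".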
How to get the data:

$ ./gradlew syncBigdataLaptop

How to run the tests:

$ ./gradlew testLaptop
Big server tests are meant to run stressful workloads on modern server hardware with lots of resources. Many servers may be used at once to reduce running time.
Characteristics:
- At least 256 GB of RAM. Lots of CPUs.
- Max 50 GB of data to download.
- Take advantage of soft and hard links to make bigger datasets (see the sketch at the end of this section).
- Search for data in the following order:
  - path specified by environment variable H2O_BIGDATA + "/bigserver"
  - ./bigdata/bigserver (a "magic" directory in your git workspace)
  - /home/h2opublictestdata/bigdata/bigserver
  - /mnt/h2o-public-test-data/bigdata/bigserver

Driven by:
- RUnit tests
- Python tests
CAUTION: Don't do this at home.
How to get the data:

$ ./gradlew syncBigdataBigserver

How to run the tests:

$ ./gradlew testBigserver
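One way to "take advantage of soft and hard links to make bigger datasets", as suggested in the characteristics above, is to link an already-synced file many times into a single directory so that a directory import sees several times the data without using extra disk space. This is only an illustration; the source file name is a placeholder:

    import os

    # Illustration: replicate one synced file via hard links to fake a larger dataset.
    src = "./bigdata/bigserver/some_large_file.csv"      # placeholder file name
    dst_dir = "./bigdata/bigserver/some_large_file_x10"
    os.makedirs(dst_dir, exist_ok=True)
    for i in range(10):
        dst = os.path.join(dst_dir, "part_%02d.csv" % i)
        if not os.path.exists(dst):
            os.link(src, dst)                # use os.symlink(src, dst) for a soft link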
EC2 tests are meant to run the largest workloads of all, directly against huge datasets that live in S3.

Characteristics:
- Infinite RAM. Lots of cores.
- Data lives in S3. Huge.
- Search for data in the following order:
  - s3://h2o-public-test-data/bigdata

Driven by:
- JVMs running directly in EC2 instances
How to get the data:

You don't. Just point your test directly to s3://h2o-public-test-data/bigdata. Definitely do not copy the data to EBS disks.
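For example, an H2O Python test might parse straight from the public bucket. The file name below is a placeholder, and this assumes the cluster running on the EC2 instances is reachable at the given URL and is configured with credentials/connectivity for S3:

    import h2o

    # Connect to the already-running cluster and import directly from S3 --
    # no local copy and no EBS volume involved.
    h2o.connect(url="http://localhost:54321")        # cluster endpoint is an assumption
    frame = h2o.import_file("s3://h2o-public-test-data/bigdata/example.csv")  # placeholder
    print(frame.dim)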
How to run the tests:

Jenkins launches them nightly or on demand. (Need instructions for how to do this.)