Improve speed for test #188

Open
tikkss opened this issue Dec 26, 2023 · 3 comments

Comments

@tikkss
Contributor

tikkss commented Dec 26, 2023

Running all tests takes about 70 seconds in my environment, even with cached files. I ran the tests frequently during implementation, and waiting 70 seconds each time was not practical.

As a workaround, I've been running only the tests for the dataset I'm implementing, which finish in about 2.5 seconds with the following command:

ruby -I lib test/run-test.rb -t "HouseOfRepresentativeTest" -v

However, I still need to run all tests when submitting a pull request, and that takes about 70 seconds.

red-datasets uses test-unit for its tests, but I couldn't find a way to speed them up, for example by running them in parallel.

Is there an effective solution? Thanks!
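One possible workaround, sketched under assumptions rather than as a test-unit feature: run independent test cases in separate processes. `run_in_parallel` is a hypothetical helper; it assumes each test case can be selected with `-t` as in the command above, and that the dataset cache is already populated, since concurrent first-time downloads into the same cache might race.

```ruby
require "rbconfig"

# Hypothetical helper (not part of test-unit or red-datasets): run each
# command in its own process, keeping at most `jobs` (>= 1) processes running,
# and return a Hash mapping each command to whether it exited with status 0.
def run_in_parallel(commands, jobs: 4)
  pending = commands.dup
  running = {} # pid => command
  results = {}
  until pending.empty? && running.empty?
    # Start processes until the concurrency limit is reached.
    while running.size < jobs && !pending.empty?
      command = pending.shift
      running[Process.spawn(*command)] = command
    end
    # Block until any child exits, then record its result.
    pid, status = Process.wait2
    results[running.delete(pid)] = status.success?
  end
  results
end

# Example (run from the repository root; test-case names come from the
# timings in this issue):
#   commands = %w[QuoraDuplicateQuestionPairTest GeoloniaTest DiamondsTest].map do |name|
#     [RbConfig.ruby, "-I", "lib", "test/run-test.rb", "-t", name]
#   end
#   run_in_parallel(commands, jobs: 3)
```

Process-level parallelism keeps test cases isolated from each other, at the cost of loading the library once per process.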

My environment:

  • OS: macOS Sonoma 14.2.1
  • CPU: Intel Core i5, 2.3 GHz, 4 cores
  • Memory: 16 GB

Output of running all tests with cached files in my environment, at commit 723bdf2:

$ test/run-test.rb -v
Loaded suite test
Started
AFINNTest:
  test: #each:                                                  .: (0.080627)
  #metadata:
    test: #description:                                         .: (0.000977)
AdultTest:
  #metadata:
    test: #description:                                         .: (0.005996)
  test:
    test: #each:                                                .: (1.315731)
  train:
    test: #each:                                                .: (2.806436)
AozoraBunkoTest:
  test: #new:                                                   .: (0.113628)
  Book:
    test: #clear_cache! removes all cache files:                .: (0.034559)
    #html:
      test: not readable:                                       .: (0.000677)
      readable:
        test: encoding is ShiftJIS:                             .: (0.000690)
        test: encoding is UTF-8:                                .: (0.000354)
    #text:
      test: not readable:                                       .: (0.000187)
      test: readable:                                           .: (0.000918)
    converting boolean:
      test: #copyrighted?:                                      .: (0.098200)
      test: #person_copyrighted?:                               .: (0.099811)
CIFARTest:
  cifar-10:
    test:
      test: #each:                                              .: (0.002878)
    train:
      test: #each:                                              .: (0.004444)
  cifar-100:
    test:
      test: #each:                                              .: (0.002528)
      test: #to_table:                                          .: (0.002000)
    train:
      test: #each:                                              .: (0.002576)
      test: #to_table:                                          .: (0.001986)
  invalid:
    test: type:                                                 .: (0.000348)
CLDRPluralsTest:
  test: #each:                                                  .: (0.048240)
  #metadata:
    test: #description:                                         .: (0.018892)
CaliforniaHousingTest:
  test: #each:                                                  .: (1.359899)
  #metadata:
    test: #description:                                         .: (0.000205)
CommunitiesTest:
  test: #each:                                                  .: (0.186278)
  #metadata:
    test: #description:                                         .: (0.004938)
DiamondsTest:
  test: #each:                                                  .: (3.823073)
  #metadata:
    test: #description:                                         .: (0.000963)
DictionaryTest:
  test: #each:                                                  .: (0.102608)
  test: #id:                                                    .: (0.093453)
  test: #ids:                                                   .: (0.090334)
  test: #length:                                                .: (0.083336)
  test: #size:                                                  .: (0.089598)
  test: #value:                                                 .: (0.084973)
  test: #values:                                                .: (0.093050)
DownloaderTest:
  #download:
    test: too many redirection:                                 .: (0.001245)
EStatJapanTest:
  anomaly responses:
    test: forbidden access with invalid app_id:                 .: (0.001666)
  app_id:
    test: configure:                                            .: (0.000370)
    test: constructor:                                          .: (0.000199)
    test: env:                                                  .: (0.000195)
    test: env & configure:                                      .: (0.000194)
    test: env & configure & constructor:                        .: (0.000322)
    test: nothing:                                              .: (0.000164)
  parsing records:
    test: parsing records with default option:                  .: (0.001693)
    test: parsing records with hierarchy_selection:             .: (0.002348)
    test: parsing records with skip_nil_(column|row):           .: (0.001604)
  url generation:
    test: generates url correctly:                              .: (0.000370)
FashionMNISTTest:
  Abnormal:
    test: invalid type:                                         .: (0.000285)
  Normal:
    test:
      test: #each:                                              .: (0.041159)
      test: #to_table:                                          .: (0.399451)
      #metadata:
        test: #id:                                              .: (0.000313)
        test: #name:                                            .: (0.000291)
    train:
      test: #each:                                              .: (0.310068)
      test: #to_table:                                          .: (2.539012)
      #metadata:
        test: #id:                                              .: (0.000267)
        test: #name:                                            .: (0.000087)
FuelEconomyTest:
  test: #each:                                                  .: (0.016510)
  #metadata:
    test: #description:                                         .: (0.001305)
GeoloniaTest:
  test: #each:                                                  .: (7.531156)
  #metadata:
    test: #description:                                         .: (0.001492)
HepatitisTest:
  test: #each:                                                  .: (0.004290)
  #metadata:
    test: #description:                                         .: (0.048317)
HouseOfCouncillorTest:
  :bill:
    test: #each:                                                .: (2.304549)
  :in_house_group:
    test: #each:                                                .: (0.002019)
  :member:
    test: #each:                                                .: (0.047492)
  :question:
    test: #each:                                                .: (0.869110)
HouseOfRepresentativeTest:
  test: #each:                                                  .: (2.884005)
ITACorpusTest:
  #metadata:
    test: #description:                                         .: (0.001511)
  type:
    test: emotion:                                              .: (0.001433)
    test: invalid:                                              .: (0.000172)
    test: recitation:                                           .: (0.001565)
IrisTest:
  test: #each:                                                  .: (0.004085)
  #metadata:
    test: #description:                                         .: (0.009697)
JapaneseDateParserTest:
  test: #parse[month and day with leading a space in Heisei]:   .: (0.000289)
  test: #parse[month         with leading a space in Heisei]:   .: (0.000104)
  test: #parse[          day with leading a space in Heisei]:   .: (0.000108)
  test: #parse[           without leading a space in Heisei]:   .: (0.000161)
  test: #parse[year, month and day with leading a space in Reiwa]:      .: (0.000109)
  test: #parse[year, month         with leading a space in Reiwa]:      .: (0.000073)
  test: #parse[year,           day with leading a space in Reiwa]:      .: (0.000068)
  test: #parse[year,            without leading a space in Reiwa]:      .: (0.000066)
  test: #parse[boundary within Heisei]:                         .: (0.000093)
  test: #parse[boundary within Reiwa]:                          .: (0.000067)
  test: unsupported era initial range:                          .: (0.000203)
KuzushijiMNISTTest:
  Abnormal:
    test: invalid type:                                         .: (0.000206)
  Normal:
    test:
      test: #each:                                              .: (0.040145)
      test: #to_table:                                          .: (0.424113)
      #metadata:
        test: #id:                                              .: (0.000201)
        test: #name:                                            .: (0.000079)
    train:
      test: #each:                                              .: (0.294359)
      test: #to_table:                                          .: (2.597233)
      #metadata:
        test: #id:                                              .: (0.000624)
        test: #name:                                            .: (0.000089)
LIBSVMDatasetListTest:
  test: #each:                                                  .: (0.003166)
  #metadata:
    test: #description:                                         .: (0.005273)
LIBSVMDatasetTest:
  test: :default_feature_value:                                 .: (0.003167)
  test: :note:                                                  .: (0.004178)
  test: classification:                                         .: (0.003153)
  test: multi-label:                                            .: (1.426006)
  test: regression:                                             .: (1.012187)
  test: string:                                                 .: (0.000055)
LicenseTest:
  .try_convert:
    test: String:                                               .: (0.000224)
    test: {name:, url:}:                                        .: (0.000095)
    test: {spdx_id:}:                                           .: (0.000256)
LivedoorNewsTest:
  #metadata:
    test: #description:                                         .: (0.000859)
  type:
    test: dokujo_tsushin:                                       .: (0.311255)
    test: invalid:                                              .: (0.000263)
    test: it_life_hack:                                         .: (0.297497)
    test: kaden_channel:                                        .: (0.295787)
    test: livedoor_homme:                                       .: (0.285076)
    test: movie_enter:                                          .: (0.297599)
    test: peachy:                                               .: (0.294360)
    test: smax:                                                 .: (0.298369)
    test: sports_watch:                                         .: (0.292065)
    test: topic_news:                                           .: (0.283887)
MNISTTest:
  Abnormal:
    test: invalid type:                                         .: (0.000362)
  Normal:
    test:
      test: #each:                                              .: (0.028889)
      test: #to_table:                                          .: (0.350796)
      #metadata:
        test: #id:                                              .: (0.000482)
        test: #name:                                            .: (0.000120)
    train:
      test: #each:                                              .: (0.184738)
      test: #to_table:                                          .: (2.296102)
      #metadata:
        test: #id:                                              .: (0.000323)
        test: #name:                                            .: (0.000116)
MetadataTest:
  #licenses:
    test: String:                                               .: (0.000483)
    test: Symbol:                                               .: (0.000232)
    test: [String]:                                             .: (0.000120)
    test: {name:, url:}:                                        .: (0.000114)
MushroomTest:
  test: #each:                                                  .: (0.206307)
  #metadata:
    test: #description:                                         .: (0.006712)
NagoyaUniversityConversationCorpusTest:
  #metadata:
    test: #description:                                         .: (0.000188)
  each:
    test: #participants:                                        .: (1.405967)
    test: #sentences:                                           .: (1.211507)
    test: others:                                               .: (1.210386)
PMJTDatasetListTest:
  test: #each:                                                  .: (0.134741)
PenguinsTest:
  Adelie:
    test: #each:                                                .: (0.021625)
  Chinstrap:
    test: #each:                                                .: (0.047549)
  Gentoo:
    test: #each:                                                .: (0.018874)
  Penguins:
    test: #each:                                                .: (0.052839)
    test: data cleansing:                                       .: (0.050738)
    test: order of species:                                     .: (0.051520)
  PenguinsRawData::SpeciesBase:
    test: #data_path:                                           .: (0.000588)
PennTreebankTest:
  type:
    test: invalid:                                              .: (0.000426)
    test: test:                                                 .: (0.051926)
    test: train:                                                .: (0.621158)
    test: valid:                                                .: (0.039625)
PostalCodeJapanTest:
  :reading:
    test: :lowercase:                                           .: (0.227016)
    test: :romaji:                                              .: (0.233065)
    test: :uppercase:                                           .: (0.175865)
QuoraDuplicateQuestionPairTest:
  test: #each:                                                  .: (22.387830)
RdatasetTest:
  Rdataset:
    test: invalid package name:                                 .: (0.205192)
    datasets:
      test: invalid dataset name:                               .: (0.214120)
      AirPassengers:
        test: #each:                                            .: (0.048332)
        test: #metadata.description:                            .: (0.100297)
        test: #metadata.id:                                     .: (0.041614)
      airquality:
        test: #each:                                            .: (0.045113)
      attenu:
        test: #each:                                            .: (0.070130)
    drc:
      germination:
        test: #each:                                            .: (0.088386)
    validate:
      nace_rev2:
        test: #each:                                            .: (0.308739)
  RdatasetList:
    #each:
      test: with package_name:                                  .: (0.177569)
      test: without package_name:                               .: (0.208933)
SeabornTest:
  attention:
    test_each:                                                  .: (0.003711)
  flights:
    test_each:                                                  .: (0.004510)
  fmri:
    test_each:                                                  .: (0.056336)
  list:
    test_each:                                                  .: (0.001493)
  penguins:
    test_each:                                                  .: (0.019277)
SudachiSynonymDictionaryTest:
  test: #each:                                                  .: (1.022899)
  #metadata:
    test: #description:                                         .: (0.045505)
TableTest:
  test: #column_names:                                          .: (0.005359)
  test: #dictionary_encode:                                     .: (0.005118)
  test: #each:                                                  .: (0.004869)
  test: #each_column:                                           .: (0.004619)
  test: #each_record:                                           .: (0.004896)
  test: #label_encode:                                          .: (0.004576)
  test: #n_columns:                                             .: (0.004048)
  test: #n_rows:                                                .: (0.004077)
  test: #to_h:                                                  .: (0.004782)
  #[]:
    test: index:                                                .: (0.004264)
    test: name:                                                 .: (0.004362)
  #fetch_values:
    test: found:                                                .: (0.004786)
    not found:
      test: with block:                                         .: (0.004534)
      test: without block:                                      .: (0.004533)
  #find_record:
    test: negative:                                             .: (0.004053)
    test: negative - over:                                      .: (0.004574)
    test: positive:                                             .: (0.004088)
    test: positive - over:                                      .: (0.004790)
TestDataset:
  #clear_cache!:
    test: when the dataset is downloaded:                       .: (0.006043)
    test: when the dataset is not downloaded:                   .: (0.004488)
WikipediaKyotoJapaneseEnglishTest:
  test: description:                                            .: (0.000219)
  test: invalid:                                                .: (0.000179)
  article:
    test: #each:                                                /Users/zzz/src/github.com/red-data-tools/red-datasets/lib/datasets/tar-gz-readable.rb:7: warning: attempt to close unfinished zstream; reset forced.
.: (0.004294)
  lexicon:
    test: #each:                                                .: (1.681613)
WikipediaTest:
  en:
    articles:
      test: #each:                                              Failed to read bzcat input: Errno::EPIPE: Broken pipe
.: (1.693823)
      #metadata:
        test: #description:                                     .: (0.000463)
        test: #id:                                              .: (0.000331)
        test: #name:                                            .: (0.000198)
WineTest:
  test: #each:                                                  .: (0.014964)
  #metadata:
    test: #description:                                         .: (0.062812)

Finished in 73.414974 seconds.
-------------------------------------------------------------------------------------
199 tests, 240 assertions, 0 failures, 0 errors, 0 pendings, 0 omissions, 0 notifications
100% passed
-------------------------------------------------------------------------------------
2.71 tests/s, 3.27 assertions/s
@kou
Member

kou commented Dec 28, 2023

Thanks. It seems that the slowest test is the following one:

QuoraDuplicateQuestionPairTest:
  test: #each:                                                  .: (22.387830)

Let's look into this one first.
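Finding the slowest tests can be automated. A rough sketch, assuming the `-v` output format shown above (an unindented `SomeTest:` line followed by indented `...: .: (seconds)` lines); `slowest_tests` is a hypothetical helper, and timings whose line was interrupted by a warning (as with WikipediaKyotoJapaneseEnglishTest above) are skipped:

```ruby
# Hypothetical helper: collect "(seconds)" timings from the output of
# `test/run-test.rb -v` and return the `limit` slowest as [name, seconds].
def slowest_tests(output, limit: 5)
  current_case = nil
  timings = []
  output.each_line do |line|
    if (header = line.match(/\A(\S[^:]*):\s*\z/))
      # Unindented "SomeTest:" line: remember the enclosing test case.
      current_case = header[1]
    elsif (timing = line.match(/\A\s+(.+?):\s+\.: \(([\d.]+)\)\s*\z/))
      # Indented "test: #each:   .: (1.23)" line: record name and seconds.
      timings << ["#{current_case} #{timing[1]}", Float(timing[2])]
    end
  end
  # Enumerable#max_by(n) returns the n largest, in descending order.
  timings.max_by(limit) { |_name, seconds| seconds }
end
```

Saving the full log above to a file and passing its contents with `limit: 3` should list `QuoraDuplicateQuestionPairTest test: #each` (22.39 s) first, then GeoloniaTest (7.53 s) and DiamondsTest (3.82 s).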

@tikkss
Contributor Author

tikkss commented Jan 4, 2024

I'll try to improve the speed of QuoraDuplicateQuestionPairTest. Thanks!

tikkss added a commit to tikkss/red-datasets that referenced this issue Jan 15, 2024
Because the TSV has no date_time or float columns.

Before this change:

```bash
$ time ruby -I lib test/run-test.rb -t "/QuoraDuplicateQuestionPairTest/" -v
Loaded suite test
Started
QuoraDuplicateQuestionPairTest:
  test: #each:                                                                                                                  .: (24.666772)

Finished in 24.667106 seconds.
-------------------------------------------------------------------------------------------------------------------------------------------------
1 tests, 1 assertions, 0 failures, 0 errors, 0 pendings, 0 omissions, 0 notifications
100% passed
-------------------------------------------------------------------------------------------------------------------------------------------------
0.04 tests/s, 0.04 assertions/s

real    0m25.270s
user    0m24.633s
sys     0m0.451s
```

After this change:

```bash
$ time ruby -I lib test/run-test.rb -t "/QuoraDuplicateQuestionPairTest/" -v
Loaded suite test
Started
QuoraDuplicateQuestionPairTest:
  test: #each:                                                                                                                  .: (17.476536)

Finished in 17.476912 seconds.
-------------------------------------------------------------------------------------------------------------------------------------------------
1 tests, 1 assertions, 0 failures, 0 errors, 0 pendings, 0 omissions, 0 notifications
100% passed
-------------------------------------------------------------------------------------------------------------------------------------------------
0.06 tests/s, 0.06 assertions/s

real    0m18.117s
user    0m17.554s
sys     0m0.383s
```

Refs red-data-tools#188.
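The commit message only states that the TSV has no date_time or float columns. A minimal sketch of why that matters, assuming the dataset parses the file with Ruby's csv gem and that the change narrowed its converters (the sample data is made up): each field that stays a String, such as free text, is passed through every converter in the list, so unneeded converters cost time on every row.

```ruby
require "csv"

# Made-up sample: integer ids and free text, no date/time values.
tsv = "id\tqid1\tquestion\n" \
      "0\t1\tIs 1.5 a float?\n" \
      "1\t3\t2.5\n"

# :integer only: a field becomes an Integer when Integer() accepts it and
# otherwise stays a String; no date_time or float parsing is attempted.
integer_only = CSV.parse(tsv, col_sep: "\t", headers: true, converters: :integer)

# :all also tries date_time and float conversion on every remaining String.
all_converters = CSV.parse(tsv, col_sep: "\t", headers: true, converters: :all)

p integer_only[0]["id"]          # => 0
p integer_only[1]["question"]    # => "2.5" (kept as text)
p all_converters[1]["question"]  # => 2.5 (question text turned into a Float)
```

Besides speed, the narrower converter list also keeps text columns that happen to look numeric as text.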
kou pushed a commit that referenced this issue Jan 16, 2024
tikkss added a commit to tikkss/red-datasets that referenced this issue Jan 19, 2024
Because the TSV file is too big (404,290 rows).

Before this change:

```bash
$ time ruby -I lib test/run-test.rb -t "/QuoraDuplicateQuestionPairTest/" -v
Loaded suite test
Started
QuoraDuplicateQuestionPairTest:
  test: #each:                                                         .: (17.476536)

Finished in 17.476912 seconds.
-------------------------------------------------------------------------------------
1 tests, 1 assertions, 0 failures, 0 errors, 0 pendings, 0 omissions, 0 notifications
100% passed
-------------------------------------------------------------------------------------
0.06 tests/s, 0.06 assertions/s

real    0m18.117s
user    0m17.554s
sys     0m0.383s
```

After this change:

```bash
$ time ruby -I lib test/run-test.rb -t "/QuoraDuplicateQuestionPairTest/" -v
Loaded suite test
Started
QuoraDuplicateQuestionPairTest:
  test: #each:                                                          .: (0.001124)

Finished in 0.001461 seconds.
-------------------------------------------------------------------------------------
1 tests, 1 assertions, 0 failures, 0 errors, 0 pendings, 0 omissions, 0 notifications
100% passed
-------------------------------------------------------------------------------------
684.46 tests/s, 684.46 assertions/s

real    0m0.595s
user    0m0.376s
sys     0m0.126s
```

Refs red-data-tools#188.
kou added a commit that referenced this issue Jan 20, 2024
Co-authored-by: Sutou Kouhei <[email protected]>
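The commit message gives only the reason (404,290 rows), not the new test. One common approach, shown as a hypothetical sketch rather than the actual change, is to check just the first few records and let the enumerator stop early instead of iterating over every row. `CountingDataset` is a made-up stand-in that counts how many rows it has produced:

```ruby
# Hypothetical stand-in for a dataset whose #each parses one row per record
# and counts how many rows it has produced.
class CountingDataset
  attr_reader :parsed

  def initialize(n_rows)
    @n_rows = n_rows
    @parsed = 0
  end

  def each
    return to_enum(__method__) unless block_given?

    @n_rows.times do |i|
      @parsed += 1 # stands in for the cost of parsing one row
      yield({id: i})
    end
  end
end

dataset = CountingDataset.new(404_290)
first_two = dataset.each.first(2) # Enumerable#first stops iterating early
p first_two       # the records with id 0 and 1
p dataset.parsed  # => 2
```

In a test-unit test this would look something like `assert_equal(expected_records, dataset.each.first(2))`, where `expected_records` is a hypothetical fixture.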
tikkss added a commit to tikkss/red-datasets that referenced this issue Aug 30, 2024
GitHub: red-data-toolsGH-188

Because the CSV file is too big (277,656 rows).

Before this change:

```console
$ time ruby test/run-test.rb -t GeoloniaTest --verbose=important-only
Finished in 5.998956 seconds.
2 tests, 2 assertions, 0 failures, 0 errors, 0 pendings, 0 omissions, 0
notifications

real    0m6.603s
user    0m6.146s
sys     0m0.346s
```

After this change:

```console
$ time ruby test/run-test.rb -t GeoloniaTest --verbose=important-only
Finished in 0.001634 seconds.
2 tests, 2 assertions, 0 failures, 0 errors, 0 pendings, 0 omissions, 0
notifications

real    0m0.564s
user    0m0.388s
sys     0m0.087s
```
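When the records come straight from a large CSV file, reading can also stop early. A minimal sketch, assuming the dataset reads its file with the csv gem: `CSV.foreach` without a block returns an Enumerator, so taking the first few rows leaves the remaining rows (277,656 in this case) unparsed. `csv_head` is a hypothetical helper and the demo file is made up:

```ruby
require "csv"
require "tempfile"

# Hypothetical helper: return the first `n` data rows of a CSV file.
# CSV.foreach without a block returns an Enumerator; Enumerable#first stops
# after n rows, and the file opened by CSV.foreach is closed when it stops.
def csv_head(path, n)
  CSV.foreach(path, headers: true).first(n)
end

# Demo with a throwaway 1,000-row file.
Tempfile.create(["sample", ".csv"]) do |file|
  file.puts("code,name")
  1_000.times { |i| file.puts("#{i},town-#{i}") }
  file.flush
  p csv_head(file.path, 3).map { |row| row["code"] } # => ["0", "1", "2"]
end
```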
kou pushed a commit that referenced this issue Aug 31, 2024
tikkss added a commit to tikkss/red-datasets that referenced this issue Aug 31, 2024
GitHub: red-data-toolsGH-188

Because the following CSVs contain no date_time columns:

* diamonds.csv (diamonds dataset inherits from `Ggplot2Dataset`)
* mpg.csv (fuel economy dataset inherits from `Ggplot2Dataset`)

Before this change:

```console
$ time ruby test/run-test.rb -t DiamondsTest --verbose=important-only
Finished in 3.702616 seconds.
2 tests, 2 assertions, 0 failures, 0 errors, 0 pendings, 0 omissions, 0
notifications

real    0m4.665s
user    0m4.136s
sys     0m0.226s
```

After this change:

```console
$ time ruby test/run-test.rb -t DiamondsTest --verbose=important-only
Finished in 3.367821 seconds.
2 tests, 2 assertions, 0 failures, 0 errors, 0 pendings, 0 omissions, 0
notifications

real    0m4.179s
user    0m3.738s
sys     0m0.202s
```

After this change, the fuel economy dataset test also passed:

```console
$ ruby test/run-test.rb -t FuelEconomyTest --verbose=important-only
Finished in 0.017332 seconds.
2 tests, 2 assertions, 0 failures, 0 errors, 0 pendings, 0 omissions, 0 notifications
```
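A minimal sketch of the idea behind this change. It assumes that the shared parsing code previously passed a converter list that includes `:date_time`, such as Ruby CSV's built-in `:all` (defined in the csv gem as `:date_time` plus `:numeric`). The sample rows and `parse_numeric` are hypothetical, and the timing loop only illustrates the comparison; its numbers will vary by machine.

```ruby
require "csv"

# Illustrative rows in the shape of a ggplot2 CSV such as diamonds.csv
# (values are examples, not copied from the real file).
SAMPLE = <<~TEXT
  carat,cut,price
  0.23,Ideal,326
  0.21,Premium,326
TEXT

# Assumed fix: request only numeric conversion. The csv gem's :all list is
# :date_time plus :numeric, so it also tries date-time parsing on every field.
def parse_numeric(text)
  CSV.parse(text, headers: true, converters: :numeric)
end

# Rough timing comparison on a generated input; numbers vary by machine.
big = "carat,cut,price\n" + ("0.23,Ideal,326\n" * 20_000)
[:all, :numeric].each do |converters|
  started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  CSV.parse(big, headers: true, converters: converters)
  elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC) - started
  puts format("%-8s %.3f s", converters, elapsed)
end

row = parse_numeric(SAMPLE).first
p [row["carat"], row["cut"], row["price"]] # => [0.23, "Ideal", 326]
```

Numeric columns still come back as `Integer`/`Float`, and text columns stay as `String`. Only the date-time attempt on each field is skipped.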
kou pushed a commit that referenced this issue Sep 1, 2024
GitHub: GH-188

tikkss added a commit to tikkss/red-datasets that referenced this issue Sep 2, 2024
GitHub: red-data-toolsGH-188

Because the CSV file is too big (53,940 rows).

Before this change:

```console
$ time ruby test/run-test.rb -t DiamondsTest --verbose=important-only
Finished in 3.367821 seconds.
2 tests, 2 assertions, 0 failures, 0 errors, 0 pendings, 0 omissions, 0
notifications

real    0m4.179s
user    0m3.738s
sys     0m0.202s
```

After this change:

```console
$ time ruby test/run-test.rb -t DiamondsTest --verbose=important-only
Finished in 0.002048 seconds.
2 tests, 2 assertions, 0 failures, 0 errors, 0 pendings, 0 omissions, 0
notifications

real    0m0.551s
user    0m0.379s
sys     0m0.085s
```
kou pushed a commit that referenced this issue Sep 2, 2024
GitHub: GH-188

tikkss added a commit to tikkss/red-datasets that referenced this issue Sep 2, 2024
GitHub: red-data-toolsGH-188

Because the CSV file is too big (10,779 rows).

Before this change:

```console
$ time ruby test/run-test.rb -t HouseOfRepresentativeTest --verbose=important-only
Finished in 2.679117 seconds.
1 tests, 1 assertions, 0 failures, 0 errors, 0 pendings, 0 omissions, 0
notifications

real    0m3.295s
user    0m3.031s
sys     0m0.152s
```

After this change:

```console
$ time ruby test/run-test.rb -t HouseOfRepresentativeTest --verbose=important-only
Finished in 0.095506 seconds.
1 tests, 1 assertions, 0 failures, 0 errors, 0 pendings, 0 omissions, 0
notifications

real    0m0.659s
user    0m0.465s
sys     0m0.103s
```
tikkss added a commit to tikkss/red-datasets that referenced this issue Sep 2, 2024
GitHub: red-data-toolsGH-188

Because the CSV file is too big (10,779 rows and 41 columns).

Before this change:

```console
$ time ruby test/run-test.rb -t HouseOfRepresentativeTest --verbose=important-only
Finished in 2.679117 seconds.
1 tests, 1 assertions, 0 failures, 0 errors, 0 pendings, 0 omissions, 0
notifications

real    0m3.295s
user    0m3.031s
sys     0m0.152s
```

After this change:

```console
$ time ruby test/run-test.rb -t HouseOfRepresentativeTest --verbose=important-only
Finished in 0.095506 seconds.
1 tests, 1 assertions, 0 failures, 0 errors, 0 pendings, 0 omissions, 0
notifications

real    0m0.659s
user    0m0.465s
sys     0m0.103s
```
kou pushed a commit that referenced this issue Sep 3, 2024
GitHub: GH-188

tikkss added a commit to tikkss/red-datasets that referenced this issue Sep 3, 2024
GitHub: red-data-toolsGH-188

Because the CSV file is too big (32,561 rows).

Before this change:

```console
$ time ruby test/run-test.rb -t AdultTest::train --verbose=important-only
Finished in 2.742848 seconds.
1 tests, 1 assertions, 0 failures, 0 errors, 0 pendings, 0 omissions, 0
notifications

real    0m3.366s
user    0m3.064s
sys     0m0.183s
```

After this change:

```console
$ time ruby test/run-test.rb -t AdultTest::train --verbose=important-only
Finished in 0.001817 seconds.
1 tests, 1 assertions, 0 failures, 0 errors, 0 pendings, 0 omissions, 0
notifications

real    0m0.587s
user    0m0.395s
sys     0m0.093s
```
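One way to keep a test like `AdultTest::train` cheap is to assert on the first record only. The sketch below is hypothetical. `FakeAdultDataset` and `FakeAdultTest` are invented stand-ins, not the real red-datasets classes, and the record fields are made up. It assumes test-unit's `sub_test_case` naming, under which `-t FakeAdultTest::train` should select the sub test case in the same way that `-t AdultTest::train` does above.

```ruby
require "test/unit"

# Hypothetical stand-in for a large dataset such as Datasets::Adult:
# records are generated lazily, one per #each iteration.
class FakeAdultDataset
  include Enumerable

  ROWS = 32_561 # row count quoted in the commit message above

  def each
    return to_enum(__method__) unless block_given?
    ROWS.times do |i|
      yield({age: 20 + (i % 50), label: i.even? ? "<=50K" : ">50K"})
    end
  end
end

# Hypothetical test shaped like the AdultTest::train sub test case.
class FakeAdultTest < Test::Unit::TestCase
  sub_test_case("train") do
    def setup
      @dataset = FakeAdultDataset.new
    end

    test("#each") do
      # Enumerator#next produces only the first record, so the remaining
      # 32,560 records are never built.
      assert_equal({age: 20, label: "<=50K"}, @dataset.each.next)
    end
  end
end
```

Saved as a standalone file (the name is arbitrary), test-unit runs it automatically at exit. Adding `-t FakeAdultTest::train` to the command line should restrict the run to this sub test case.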
kou pushed a commit that referenced this issue Sep 4, 2024
GitHub: GH-188

tikkss added a commit to tikkss/red-datasets that referenced this issue Sep 4, 2024
GitHub: red-data-toolsGH-188

Because the CSV file is too big (9,462 rows and 39 columns).

Before this change:

```console
$ time ruby test/run-test.rb -t HouseOfCouncillorTest:::bill --verbose=important-only
Finished in 2.352206 seconds.
1 tests, 1 assertions, 0 failures, 0 errors, 0 pendings, 0 omissions, 0
notifications

real    0m2.953s
user    0m2.701s
sys     0m0.153s
```

After this change:

```console
$ time ruby test/run-test.rb -t HouseOfCouncillorTest:::bill --verbose=important-only
Finished in 0.002077 seconds.
1 tests, 1 assertions, 0 failures, 0 errors, 0 pendings, 0 omissions, 0
notifications

real    0m0.566s
user    0m0.382s
sys     0m0.092s
```
kou pushed a commit that referenced this issue Sep 5, 2024
GitHub: GH-188

tikkss added a commit to tikkss/red-datasets that referenced this issue Sep 5, 2024
GitHub: red-data-toolsGH-188

Because the text data is too big (134,555 rows in total).

Before this change:

```console
$ time ruby test/run-test.rb -t NagoyaUniversityConversationCorpusTest --verbose=important-only
Finished in 3.65658 seconds.
4 tests, 4 assertions, 0 failures, 0 errors, 0 pendings, 0 omissions, 0
notifications

real    0m4.550s
user    0m3.723s
sys     0m0.568s
```

After this change:

```console
$ time ruby test/run-test.rb -t NagoyaUniversityConversationCorpusTest --verbose=important-only
Finished in 0.060458 seconds.
4 tests, 4 assertions, 0 failures, 0 errors, 0 pendings, 0 omissions, 0
notifications

real    0m0.613s
user    0m0.428s
sys     0m0.096s
```
kou added a commit that referenced this issue Sep 6, 2024
GitHub: GH-188

Co-authored-by: Sutou Kouhei <[email protected]>
tikkss added a commit to tikkss/red-datasets that referenced this issue Sep 7, 2024
GitHub: red-data-toolsGH-188

Because the CSV file is too big (20,640 rows).

Before this change:

```console
$ time ruby test/run-test.rb -t CaliforniaHousingTest --verbose=important-only
Finished in 1.453959 seconds.
2 tests, 2 assertions, 0 failures, 0 errors, 0 pendings, 0 omissions, 0
notifications

real    0m3.029s
user    0m1.745s
sys     0m0.161s
```

After this change:

```console
$ time ruby test/run-test.rb -t CaliforniaHousingTest --verbose=important-only
Finished in 0.331599 seconds.
2 tests, 2 assertions, 0 failures, 0 errors, 0 pendings, 0 omissions, 0
notifications

real    0m1.126s
user    0m0.738s
sys     0m0.143s
```
kou pushed a commit that referenced this issue Sep 8, 2024
GitHub: GH-188

tikkss added a commit to tikkss/red-datasets that referenced this issue Sep 8, 2024
GitHub: red-data-toolsGH-188

Because the CSV file is too big (67,753 rows).

Before this change:

```console
$ time ruby test/run-test.rb -t SudachiSynonymDictionaryTest --verbose=important-only
Finished in 1.296801 seconds.
2 tests, 2 assertions, 0 failures, 0 errors, 0 pendings, 0 omissions, 0
notifications

real    0m1.964s
user    0m1.740s
sys     0m0.118s
```

After this change:

```console
$ time ruby test/run-test.rb -t SudachiSynonymDictionaryTest --verbose=important-only
Finished in 0.010658 seconds.
2 tests, 2 assertions, 0 failures, 0 errors, 0 pendings, 0 omissions, 0
notifications

real    0m0.634s
user    0m0.455s
sys     0m0.092s
```
kou pushed a commit that referenced this issue Sep 8, 2024
GitHub: GH-188

tikkss added a commit to tikkss/red-datasets that referenced this issue Sep 9, 2024
GitHub: red-data-toolsGH-188

Because the CSV file is too big (16,281 rows).

Before this change:

```console
$ time ruby test/run-test.rb -t AdultTest::test --verbose=important-only
Finished in 1.355024 seconds.
1 tests, 1 assertions, 0 failures, 0 errors, 0 pendings, 0 omissions, 0
notifications

real    0m1.980s
user    0m1.777s
sys     0m0.113s
```

After this change:

```console
$ time ruby test/run-test.rb -t AdultTest::test --verbose=important-only
Finished in 0.002213 seconds.
1 tests, 1 assertions, 0 failures, 0 errors, 0 pendings, 0 omissions, 0
notifications

real    0m0.619s
user    0m0.442s
sys     0m0.089s
```
kou pushed a commit that referenced this issue Sep 9, 2024
GitHub: GH-188

@tikkss
Contributor Author

tikkss commented Sep 13, 2024

I tried to reduce the execution time as follows:

  • First, speed up the slowest test.
  • Next, speed up the next slowest test.
  • Repeat until no slow tests remain.
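The loop in the list above can be sketched as a small Ruby helper. This is a hypothetical sketch. `parse_slow_tests`, `next_target`, and the 1.0-second `threshold_seconds` are names and values chosen for illustration. The input is an abridged excerpt of the `--report-slow-tests` output in this comment, with the paths shortened. Speeding up the chosen test is still manual work; the helper only answers "which test next?".

```ruby
# Hypothetical helper: parse lines in the "test: NAME: SECONDS" form printed
# under --report-slow-tests, skipping the "--location ..." lines.
SLOW_TEST_LINE = /\Atest: (?<name>.+?):\s+(?<seconds>\d+\.\d+)\z/

def parse_slow_tests(report)
  report.each_line.filter_map do |line|
    match = SLOW_TEST_LINE.match(line.strip)
    [match[:name], Float(match[:seconds])] if match
  end
end

# Returns [name, seconds] for the slowest test at or above the cut-off,
# or nil when nothing is slow enough to be worth another pass.
# threshold_seconds is an arbitrary value chosen for illustration.
def next_target(timings, threshold_seconds: 1.0)
  timings.select { |_, seconds| seconds >= threshold_seconds }
         .max_by { |_, seconds| seconds }
end

report = <<~REPORT
  test: #each(WikipediaTest::en::articles):                       1.690996
  --location test/test-wikipedia.rb:9
  test: #to_table(FashionMNISTTest::Normal::train):               2.519765
  --location test/test-fashion-mnist.rb:42
REPORT

timings = parse_slow_tests(report)
p next_target(timings)
```

After each fix you would re-run the suite, feed the new report back in, and stop once `next_target` returns `nil`.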

We reduced the test execution time from about 73 seconds to about 24 seconds (3 times faster!).

It was difficult to reduce the execution time of the following top 5 slowest tests any further, so this is as far as we can go within red-datasets.

```console
$ ruby -I ../../test-unit/test-unit/lib test/run-test.rb --report-slow-tests --progress-row-max=72
Loaded suite test
Started
|/Users/zzz/src/github.com/red-data-tools/red-datasets/lib/datasets/tar-gz-readable.rb:7: warning: attempt to close unfinished zstream; reset forced.
-Failed to read bzcat input: Errno::EPIPE: Broken pipe
Finished in 23.527626 seconds.
------------------------------------------------------------------------
Top 5 slow tests
test: #to_table(FashionMNISTTest::Normal::train):               2.519765
--location /Users/zzz/src/github.com/red-data-tools/red-datasets/test/test-fashion-mnist.rb:42
test: #to_table(MNISTTest::Normal::train):                      2.305772
--location /Users/zzz/src/github.com/red-data-tools/red-datasets/test/test-mnist.rb:41
test: #to_table(KuzushijiMNISTTest::Normal::train):             2.266870
--location /Users/zzz/src/github.com/red-data-tools/red-datasets/test/test-kuzushiji-mnist.rb:42
test: #each(WikipediaTest::en::articles):                       1.690996
--location /Users/zzz/src/github.com/red-data-tools/red-datasets/test/test-wikipedia.rb:9
test: #each(WikipediaKyotoJapaneseEnglishTest::lexicon):        1.494162
--location /Users/zzz/src/github.com/red-data-tools/red-datasets/test/test-wikipedia-kyoto-japanese-english.rb:137
------------------------------------------------------------------------
201 tests, 242 assertions, 0 failures, 0 errors, 0 pendings, 0 omissions, 0 notifications
100% passed
------------------------------------------------------------------------
8.54 tests/s, 10.29 assertions/s
```
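Here is a quick back-of-the-envelope check of the numbers in this thread. The before/after values are the `Finished in` times copied from the commit messages above. Each was measured by running one test case on its own, so their sum only approximates the effect on a full run. `TIMINGS` is just a name chosen for this sketch.

```ruby
# "Finished in" seconds [before, after] copied from the commit messages above.
TIMINGS = {
  "GeoloniaTest"                           => [5.998956, 0.001634],
  "DiamondsTest (1st commit)"              => [3.702616, 3.367821],
  "DiamondsTest (2nd commit)"              => [3.367821, 0.002048],
  "HouseOfRepresentativeTest"              => [2.679117, 0.095506],
  "AdultTest::train"                       => [2.742848, 0.001817],
  "HouseOfCouncillorTest:::bill"           => [2.352206, 0.002077],
  "NagoyaUniversityConversationCorpusTest" => [3.65658, 0.060458],
  "CaliforniaHousingTest"                  => [1.453959, 0.331599],
  "SudachiSynonymDictionaryTest"           => [1.296801, 0.010658],
  "AdultTest::test"                        => [1.355024, 0.002213],
}.freeze

# Total time removed by these commits, per the single-test measurements.
saved = TIMINGS.sum { |_, (before, after)| before - after }
puts format("saved by these commits: %.2f s", saved) # => saved by these commits: 24.73 s

# Whole-suite speedup: about 73 s before, 23.527626 s in the run above.
puts format("whole suite: %.1fx faster", 73.0 / 23.527626) # => whole suite: 3.1x faster
```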

Further speedups should come from the parallelization support being implemented in test-unit/test-unit#235.
We'll focus on that next.

Thanks!

Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants