[Feature] TensorStack (2) #505

Draft: wants to merge 6 commits into base: main

@vmoens vmoens (Contributor) commented Jul 31, 2023

Description

@facebook-github-bot facebook-github-bot added the CLA Signed label Jul 31, 2023
@vmoens vmoens added the enhancement label Jul 31, 2023
@github-actions github-actions bot commented Jul 31, 2023

$\color{#D29922}\textsf{\Large⚠\kern{0.2cm}\normalsize Warning}$ Result of CPU Benchmark Tests

Total Benchmarks: 109. Improved: $\large\color{#35bf28}3$. Worsened: $\large\color{#d91a1a}4$.

| Name | Max | Mean | Ops | Ops on Repo HEAD | Change |
| --- | --- | --- | --- | --- | --- |
| test_plain_set_nested | 47.0010μs | 20.1499μs | 49.6279 KOps/s | 49.2698 KOps/s | $\color{#35bf28}+0.73\%$ |
| test_plain_set_stack_nested | 0.2260ms | 0.1862ms | 5.3710 KOps/s | 5.2885 KOps/s | $\color{#35bf28}+1.56\%$ |
| test_plain_set_nested_inplace | 47.9000μs | 23.7944μs | 42.0267 KOps/s | 41.8756 KOps/s | $\color{#35bf28}+0.36\%$ |
| test_plain_set_stack_nested_inplace | 0.3222ms | 0.2221ms | 4.5031 KOps/s | 4.4930 KOps/s | $\color{#35bf28}+0.23\%$ |
| test_items | 70.8000μs | 3.3839μs | 295.5206 KOps/s | 272.6459 KOps/s | $\textbf{\color{#35bf28}+8.39\%}$ |
| test_items_nested | 2.7128ms | 0.3915ms | 2.5542 KOps/s | 2.7356 KOps/s | $\textbf{\color{#d91a1a}-6.63\%}$ |
| test_items_nested_locked | 0.4459ms | 0.3686ms | 2.7129 KOps/s | 2.7541 KOps/s | $\color{#d91a1a}-1.49\%$ |
| test_items_nested_leaf | 0.9316ms | 0.2273ms | 4.4001 KOps/s | 4.5189 KOps/s | $\color{#d91a1a}-2.63\%$ |
| test_items_stack_nested | 2.1763ms | 2.0213ms | 494.7428 Ops/s | 501.2848 Ops/s | $\color{#d91a1a}-1.31\%$ |
| test_items_stack_nested_leaf | 1.9422ms | 1.8370ms | 544.3582 Ops/s | 547.4333 Ops/s | $\color{#d91a1a}-0.56\%$ |
| test_items_stack_nested_locked | 6.0711ms | 1.0334ms | 967.6805 Ops/s | 1.0122 KOps/s | $\color{#d91a1a}-4.40\%$ |
| test_keys | 29.1000μs | 5.1275μs | 195.0276 KOps/s | 197.0544 KOps/s | $\color{#d91a1a}-1.03\%$ |
| test_keys_nested | 2.4120ms | 0.1848ms | 5.4104 KOps/s | 5.4576 KOps/s | $\color{#d91a1a}-0.86\%$ |
| test_keys_nested_locked | 0.2670ms | 0.1816ms | 5.5069 KOps/s | 5.5044 KOps/s | $\color{#35bf28}+0.05\%$ |
| test_keys_nested_leaf | 0.3484ms | 0.1754ms | 5.7010 KOps/s | 5.2578 KOps/s | $\textbf{\color{#35bf28}+8.43\%}$ |
| test_keys_stack_nested | 1.9296ms | 1.7778ms | 562.4817 Ops/s | 553.3976 Ops/s | $\color{#35bf28}+1.64\%$ |
| test_keys_stack_nested_leaf | 3.4001ms | 1.7857ms | 560.0195 Ops/s | 554.9385 Ops/s | $\color{#35bf28}+0.92\%$ |
| test_keys_stack_nested_locked | 3.6881ms | 0.7736ms | 1.2926 KOps/s | 1.3297 KOps/s | $\color{#d91a1a}-2.79\%$ |
| test_values | 78.8010μs | 1.5106μs | 661.9741 KOps/s | 639.0567 KOps/s | $\color{#35bf28}+3.59\%$ |
| test_values_nested | 94.1010μs | 66.4592μs | 15.0468 KOps/s | 14.9219 KOps/s | $\color{#35bf28}+0.84\%$ |
| test_values_nested_locked | 0.1445ms | 66.2500μs | 15.0943 KOps/s | 14.8134 KOps/s | $\color{#35bf28}+1.90\%$ |
| test_values_nested_leaf | 0.1285ms | 58.3131μs | 17.1488 KOps/s | 16.7929 KOps/s | $\color{#35bf28}+2.12\%$ |
| test_values_stack_nested | 1.7348ms | 1.6116ms | 620.4991 Ops/s | 620.7697 Ops/s | $\color{#d91a1a}-0.04\%$ |
| test_values_stack_nested_leaf | 1.7421ms | 1.6107ms | 620.8547 Ops/s | 624.2363 Ops/s | $\color{#d91a1a}-0.54\%$ |
| test_values_stack_nested_locked | 0.8085ms | 0.6562ms | 1.5240 KOps/s | 1.5455 KOps/s | $\color{#d91a1a}-1.40\%$ |
| test_membership | 20.9000μs | 1.8068μs | 553.4715 KOps/s | 537.0488 KOps/s | $\color{#35bf28}+3.06\%$ |
| test_membership_nested | 72.9010μs | 3.5793μs | 279.3868 KOps/s | 277.7211 KOps/s | $\color{#35bf28}+0.60\%$ |
| test_membership_nested_leaf | 33.1000μs | 3.5518μs | 281.5439 KOps/s | 278.7462 KOps/s | $\color{#35bf28}+1.00\%$ |
| test_membership_stacked_nested | 83.8010μs | 14.4579μs | 69.1664 KOps/s | 69.1699 KOps/s | $-0.01\%$ |
| test_membership_stacked_nested_leaf | 40.5000μs | 14.1983μs | 70.4310 KOps/s | 69.2323 KOps/s | $\color{#35bf28}+1.73\%$ |
| test_membership_nested_last | 24.4000μs | 7.5054μs | 133.2371 KOps/s | 132.8851 KOps/s | $\color{#35bf28}+0.26\%$ |
| test_membership_nested_leaf_last | 89.8010μs | 7.6331μs | 131.0085 KOps/s | 132.6013 KOps/s | $\color{#d91a1a}-1.20\%$ |
| test_membership_stacked_nested_last | 0.3393ms | 0.2281ms | 4.3844 KOps/s | 4.3391 KOps/s | $\color{#35bf28}+1.05\%$ |
| test_membership_stacked_nested_leaf_last | 47.5000μs | 16.8307μs | 59.4152 KOps/s | 58.4912 KOps/s | $\color{#35bf28}+1.58\%$ |
| test_nested_getleaf | 86.5010μs | 15.9019μs | 62.8855 KOps/s | 62.2017 KOps/s | $\color{#35bf28}+1.10\%$ |
| test_nested_get | 98.4010μs | 15.0458μs | 66.4638 KOps/s | 65.7156 KOps/s | $\color{#35bf28}+1.14\%$ |
| test_stacked_getleaf | 1.0312ms | 0.8897ms | 1.1240 KOps/s | 1.1262 KOps/s | $\color{#d91a1a}-0.19\%$ |
| test_stacked_get | 0.9900ms | 0.8482ms | 1.1790 KOps/s | 1.1846 KOps/s | $\color{#d91a1a}-0.47\%$ |
| test_nested_getitemleaf | 74.5010μs | 15.9200μs | 62.8141 KOps/s | 62.8558 KOps/s | $\color{#d91a1a}-0.07\%$ |
| test_nested_getitem | 85.5010μs | 15.0824μs | 66.3023 KOps/s | 65.7780 KOps/s | $\color{#35bf28}+0.80\%$ |
| test_stacked_getitemleaf | 1.0197ms | 0.8868ms | 1.1276 KOps/s | 1.1258 KOps/s | $\color{#35bf28}+0.16\%$ |
| test_stacked_getitem | 0.9921ms | 0.8484ms | 1.1787 KOps/s | 1.1811 KOps/s | $\color{#d91a1a}-0.20\%$ |
| test_lock_nested | 88.7786ms | 1.5340ms | 651.8705 Ops/s | 699.5552 Ops/s | $\textbf{\color{#d91a1a}-6.82\%}$ |
| test_lock_stack_nested | 0.1125s | 21.2738ms | 47.0062 Ops/s | 51.4122 Ops/s | $\textbf{\color{#d91a1a}-8.57\%}$ |
| test_unlock_nested | 87.7361ms | 1.5401ms | 649.3264 Ops/s | 652.6771 Ops/s | $\color{#d91a1a}-0.51\%$ |
| test_unlock_stack_nested | 0.1148s | 21.5142ms | 46.4810 Ops/s | 50.4989 Ops/s | $\textbf{\color{#d91a1a}-7.96\%}$ |
| test_flatten_speed | 1.1016ms | 1.0105ms | 989.5929 Ops/s | 988.8134 Ops/s | $\color{#35bf28}+0.08\%$ |
| test_unflatten_speed | 2.1890ms | 1.8398ms | 543.5456 Ops/s | 543.3914 Ops/s | $\color{#35bf28}+0.03\%$ |
| test_common_ops | 4.4474ms | 1.1169ms | 895.3239 Ops/s | 908.5515 Ops/s | $\color{#d91a1a}-1.46\%$ |
| test_creation | 39.0000μs | 6.3003μs | 158.7215 KOps/s | 160.6572 KOps/s | $\color{#d91a1a}-1.20\%$ |
| test_creation_empty | 30.4000μs | 13.6338μs | 73.3473 KOps/s | 73.9411 KOps/s | $\color{#d91a1a}-0.80\%$ |
| test_creation_nested_1 | 0.1055ms | 25.0740μs | 39.8819 KOps/s | 40.0982 KOps/s | $\color{#d91a1a}-0.54\%$ |
| test_creation_nested_2 | 78.9010μs | 27.9101μs | 35.8293 KOps/s | 37.0691 KOps/s | $\color{#d91a1a}-3.34\%$ |
| test_clone | 0.1757ms | 25.2121μs | 39.6636 KOps/s | 40.7457 KOps/s | $\color{#d91a1a}-2.66\%$ |
| test_getitem[int] | 0.1088ms | 27.8847μs | 35.8619 KOps/s | 36.4643 KOps/s | $\color{#d91a1a}-1.65\%$ |
| test_getitem[slice_int] | 88.8010μs | 54.7542μs | 18.2634 KOps/s | 18.5799 KOps/s | $\color{#d91a1a}-1.70\%$ |
| test_getitem[range] | 0.1625ms | 82.2345μs | 12.1603 KOps/s | 12.3329 KOps/s | $\color{#d91a1a}-1.40\%$ |
| test_getitem[tuple] | 0.1603ms | 45.3085μs | 22.0709 KOps/s | 22.2390 KOps/s | $\color{#d91a1a}-0.76\%$ |
| test_getitem[list] | 0.3230ms | 77.8568μs | 12.8441 KOps/s | 13.1102 KOps/s | $\color{#d91a1a}-2.03\%$ |
| test_setitem_dim[int] | 57.4010μs | 33.2274μs | 30.0957 KOps/s | 30.7041 KOps/s | $\color{#d91a1a}-1.98\%$ |
| test_setitem_dim[slice_int] | 79.8010μs | 58.9551μs | 16.9621 KOps/s | 17.1940 KOps/s | $\color{#d91a1a}-1.35\%$ |
| test_setitem_dim[range] | 0.1060ms | 80.2272μs | 12.4646 KOps/s | 12.6353 KOps/s | $\color{#d91a1a}-1.35\%$ |
| test_setitem_dim[tuple] | 70.5010μs | 49.0442μs | 20.3898 KOps/s | 20.8206 KOps/s | $\color{#d91a1a}-2.07\%$ |
| test_setitem | 0.2230ms | 33.2175μs | 30.1046 KOps/s | 31.0265 KOps/s | $\color{#d91a1a}-2.97\%$ |
| test_set | 2.1052ms | 32.1825μs | 31.0728 KOps/s | 32.2712 KOps/s | $\color{#d91a1a}-3.71\%$ |
| test_set_shared | 3.8627ms | 0.1797ms | 5.5646 KOps/s | 5.5394 KOps/s | $\color{#35bf28}+0.45\%$ |
| test_update | 0.2208ms | 35.9863μs | 27.7884 KOps/s | 28.0488 KOps/s | $\color{#d91a1a}-0.93\%$ |
| test_update_nested | 0.2586ms | 53.2287μs | 18.7869 KOps/s | 19.0306 KOps/s | $\color{#d91a1a}-1.28\%$ |
| test_set_nested | 0.2262ms | 35.3385μs | 28.2978 KOps/s | 29.3735 KOps/s | $\color{#d91a1a}-3.66\%$ |
| test_set_nested_new | 0.2629ms | 53.6117μs | 18.6527 KOps/s | 18.9710 KOps/s | $\color{#d91a1a}-1.68\%$ |
| test_select | 0.2983ms | 97.8248μs | 10.2224 KOps/s | 10.2923 KOps/s | $\color{#d91a1a}-0.68\%$ |
| test_unbind_speed | 0.7005ms | 0.6584ms | 1.5189 KOps/s | 1.5221 KOps/s | $\color{#d91a1a}-0.22\%$ |
| test_unbind_speed_stack0 | 97.4733ms | 9.1368ms | 109.4477 Ops/s | 107.9531 Ops/s | $\color{#35bf28}+1.38\%$ |
| test_unbind_speed_stack1 | 77.4010μs | 1.1514μs | 868.5078 KOps/s | 864.9958 KOps/s | $\color{#35bf28}+0.41\%$ |
| test_creation[device0] | 4.0811ms | 0.4600ms | 2.1738 KOps/s | 2.2005 KOps/s | $\color{#d91a1a}-1.21\%$ |
| test_creation_from_tensor | 3.6747ms | 0.5065ms | 1.9745 KOps/s | 1.9377 KOps/s | $\color{#35bf28}+1.90\%$ |
| test_add_one[memmap_tensor0] | 2.5653ms | 33.6430μs | 29.7238 KOps/s | 30.7856 KOps/s | $\color{#d91a1a}-3.45\%$ |
| test_contiguous[memmap_tensor0] | 45.3000μs | 8.6699μs | 115.3422 KOps/s | 113.6177 KOps/s | $\color{#35bf28}+1.52\%$ |
| test_stack[memmap_tensor0] | 81.3010μs | 27.1546μs | 36.8262 KOps/s | 37.5415 KOps/s | $\color{#d91a1a}-1.91\%$ |
| test_memmaptd_index | 0.4378ms | 0.3181ms | 3.1437 KOps/s | 3.0939 KOps/s | $\color{#35bf28}+1.61\%$ |
| test_memmaptd_index_astensor | 1.4929ms | 1.3727ms | 728.5009 Ops/s | 731.5983 Ops/s | $\color{#d91a1a}-0.42\%$ |
| test_memmaptd_index_op | 2.8805ms | 2.6322ms | 379.9095 Ops/s | 380.9467 Ops/s | $\color{#d91a1a}-0.27\%$ |
| test_reshape_pytree | 0.1119ms | 37.9807μs | 26.3292 KOps/s | 26.3145 KOps/s | $\color{#35bf28}+0.06\%$ |
| test_reshape_td | 77.9010μs | 46.5345μs | 21.4894 KOps/s | 22.2109 KOps/s | $\color{#d91a1a}-3.25\%$ |
| test_view_pytree | 0.1622ms | 35.5732μs | 28.1111 KOps/s | 28.7159 KOps/s | $\color{#d91a1a}-2.11\%$ |
| test_view_td | 40.2010μs | 8.7574μs | 114.1897 KOps/s | 113.3673 KOps/s | $\color{#35bf28}+0.73\%$ |
| test_unbind_pytree | 81.6010μs | 38.7667μs | 25.7953 KOps/s | 25.9015 KOps/s | $\color{#d91a1a}-0.41\%$ |
| test_unbind_td | 0.2777ms | 96.7721μs | 10.3336 KOps/s | 10.0404 KOps/s | $\color{#35bf28}+2.92\%$ |
| test_split_pytree | 94.1010μs | 45.1784μs | 22.1345 KOps/s | 22.0312 KOps/s | $\color{#35bf28}+0.47\%$ |
| test_split_td | 1.0535ms | 0.1165ms | 8.5830 KOps/s | 8.6373 KOps/s | $\color{#d91a1a}-0.63\%$ |
| test_add_pytree | 91.2010μs | 47.7415μs | 20.9461 KOps/s | 21.3212 KOps/s | $\color{#d91a1a}-1.76\%$ |
| test_add_td | 0.1391ms | 76.5055μs | 13.0710 KOps/s | 13.4475 KOps/s | $\color{#d91a1a}-2.80\%$ |
| test_distributed | 24.8000μs | 8.8987μs | 112.3757 KOps/s | 113.7444 KOps/s | $\color{#d91a1a}-1.20\%$ |
| test_tdmodule | 0.1935ms | 28.7011μs | 34.8418 KOps/s | 33.7680 KOps/s | $\color{#35bf28}+3.18\%$ |
| test_tdmodule_dispatch | 0.2951ms | 55.5519μs | 18.0012 KOps/s | 17.7050 KOps/s | $\color{#35bf28}+1.67\%$ |
| test_tdseq | 0.5754ms | 33.2287μs | 30.0945 KOps/s | 30.7940 KOps/s | $\color{#d91a1a}-2.27\%$ |
| test_tdseq_dispatch | 0.5423ms | 67.3866μs | 14.8397 KOps/s | 14.8234 KOps/s | $\color{#35bf28}+0.11\%$ |
| test_instantiation_functorch | 1.8086ms | 1.6440ms | 608.2630 Ops/s | 610.6758 Ops/s | $\color{#d91a1a}-0.40\%$ |
| test_instantiation_td | 2.0577ms | 1.3657ms | 732.2513 Ops/s | 728.1060 Ops/s | $\color{#35bf28}+0.57\%$ |
| test_exec_functorch | 0.2481ms | 0.1878ms | 5.3246 KOps/s | 5.3302 KOps/s | $\color{#d91a1a}-0.10\%$ |
| test_exec_td | 0.2611ms | 0.1783ms | 5.6074 KOps/s | 5.5642 KOps/s | $\color{#35bf28}+0.78\%$ |
| test_vmap_mlp_speed[True-True] | 7.5885ms | 1.2071ms | 828.4323 Ops/s | 834.7225 Ops/s | $\color{#d91a1a}-0.75\%$ |
| test_vmap_mlp_speed[True-False] | 1.0031ms | 0.5947ms | 1.6814 KOps/s | 1.6406 KOps/s | $\color{#35bf28}+2.49\%$ |
| test_vmap_mlp_speed[False-True] | 14.5648ms | 1.0449ms | 957.0605 Ops/s | 991.8399 Ops/s | $\color{#d91a1a}-3.51\%$ |
| test_vmap_mlp_speed[False-False] | 3.2552ms | 0.4534ms | 2.2055 KOps/s | 2.1998 KOps/s | $\color{#35bf28}+0.26\%$ |
| test_vmap_transformer_speed[True-True] | 26.6194ms | 14.3297ms | 69.7852 Ops/s | 66.0110 Ops/s | $\textbf{\color{#35bf28}+5.72\%}$ |
| test_vmap_transformer_speed[True-False] | 17.7368ms | 9.2223ms | 108.4331 Ops/s | 107.3142 Ops/s | $\color{#35bf28}+1.04\%$ |
| test_vmap_transformer_speed[False-True] | 17.4460ms | 13.7574ms | 72.6880 Ops/s | 73.1794 Ops/s | $\color{#d91a1a}-0.67\%$ |
| test_vmap_transformer_speed[False-False] | 21.6909ms | 9.0531ms | 110.4589 Ops/s | 115.5304 Ops/s | $\color{#d91a1a}-4.39\%$ |
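
For readers checking the numbers: the Change column appears consistent with the relative difference between Ops and Ops on Repo HEAD. The sketch below recomputes it for three rows copied from the table. The function names and the 5% cutoff for counting a benchmark as improved or worsened are assumptions made for illustration (the bot's actual threshold is not stated here; the bolded rows only show that it lies somewhere between 4.40% and 5.72%).

```python
def relative_change(ops_pr: float, ops_head: float) -> float:
    """Percentage change in throughput of the PR relative to repo HEAD."""
    return (ops_pr - ops_head) / ops_head * 100.0


def classify(change_pct: float, threshold_pct: float = 5.0) -> str:
    """Label a change; threshold_pct is an assumed cutoff, not the bot's documented one."""
    if change_pct >= threshold_pct:
        return "improved"
    if change_pct <= -threshold_pct:
        return "worsened"
    return "unchanged"


# (Ops, Ops on Repo HEAD) pairs copied from the table; each pair shares a unit.
rows = {
    "test_items": (295.5206, 272.6459),         # KOps/s
    "test_items_nested": (2.5542, 2.7356),      # KOps/s
    "test_keys": (195.0276, 197.0544),          # KOps/s
}

for name, (ops_pr, ops_head) in rows.items():
    change = relative_change(ops_pr, ops_head)
    print(f"{name}: {change:+.2f}% ({classify(change)})")
```

Both values in a pair must be in the same unit before comparing; a row such as test_items_stack_nested_locked mixes Ops/s and KOps/s and would need converting first.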
