Vector library cleanup #473
@@ -1307,7 +1307,7 @@ unsigned int compute_ideal_endpoint_formats(
 vmask lanes_min_error = vbest_ep_error == hmin(vbest_ep_error);
 vbest_error_index = select(vint(0x7FFFFFFF), vbest_error_index, lanes_min_error);
 vbest_error_index = hmin(vbest_error_index);
-int best_error_index = vbest_error_index.lane<0>();
+int best_error_index = vbest_error_index.lane0();
You squished out the other lane0 accesses into hmax_s etc... any reason not to wrap this one?
Nevermind you fixed it later
Source/UnitTest/test_simd.cpp
Outdated
@@ -169,8 +203,8 @@ TEST(vfloat, ChangeSign)
 /** @brief Test VLA atan. */
 TEST(vfloat, Atan)
 {
-vfloat a(-0.15f, 0.0f, 0.9f, 2.1f);
-vfloat r = atan(a);
+vfloa4 a(-0.15f, 0.0f, 0.9f, 2.1f);
vfloa4
Nevermind you fixed it later
LGTM
The astcenc vector library effectively implements two different class APIs:

- Explicit-width classes (e.g. `vfloat4`) in the codec.
- Vector-length-agnostic (VLA) classes (e.g. `vfloat`) in the codec, where the width is resolved at compile time.

For historical reasons the classes that are only used as VLA classes (e.g. `vfloat8` for AVX2) implement a lot of functionality that was inherited from the original 4-wide implementation and is not actually used in the VLA parts of the codec. This makes adding a new VLA implementation (e.g. Arm SVE) more expensive than it needs to be.

This PR doesn't add SVE support, but does some cleanup to minimize the vector library API as a precursor to doing so. The main changes are:

- Replace `.lane<0>()` with dedicated scalar function returns, e.g. use `hmax_s()` rather than `hmax().lane<0>()`. This was being done in places before, but was not done consistently. Now this pattern is used everywhere.
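
To illustrate the `hmax_s()` versus `.lane<0>()` pattern described above, here is a minimal scalar sketch. The `vec4f` type and its members are hypothetical stand-ins written for this example and are not the astcenc classes; only the names `hmax`, `hmax_s`, and `lane<0>()` are taken from the description. The point is that a scalar-returning reduction keeps lane extraction inside the vector library, so call sites no longer depend on a lane accessor that every backend would otherwise need to provide.

```cpp
#include <algorithm>
#include <array>
#include <cassert>

// Hypothetical 4-wide float vector (scalar emulation, NOT the astcenc vfloat4).
struct vec4f
{
    std::array<float, 4> m;

    // Compile-time lane accessor, similar in spirit to lane<N>().
    template <int N>
    float lane() const
    {
        static_assert(N >= 0 && N < 4, "lane index out of range");
        return m[N];
    }
};

// Horizontal max, broadcast to every lane (vector result).
inline vec4f hmax(const vec4f& a)
{
    float v = std::max(std::max(a.m[0], a.m[1]), std::max(a.m[2], a.m[3]));
    return vec4f{{v, v, v, v}};
}

// Horizontal max returned directly as a scalar; the lane extraction is
// hidden here instead of being repeated at every call site.
inline float hmax_s(const vec4f& a)
{
    return hmax(a).lane<0>();
}

// Small self-check: both call styles must give the same scalar.
inline float demo_hmax()
{
    vec4f a{{-0.15f, 0.0f, 0.9f, 2.1f}};
    float old_style = hmax(a).lane<0>(); // pattern being removed
    float new_style = hmax_s(a);         // pattern used everywhere after cleanup
    assert(old_style == new_style);
    return new_style;
}
```

With this shape, a wider backend only has to supply `hmax_s()` itself; how it gets the scalar out of its native register type stays an implementation detail.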