Alongside `VFMA.F16`/`VFMS.F16`, AArch32 offers the `VMLA.F16`/`VMLS.F16` instructions, which perform a multiply-add operation with intermediate rounding. Importantly, the vector-by-vector lane form (e.g. `VMLA.F16 Qd, Qn, Dm[x]`) on AArch32 is supported only for the `VMLA`/`VMLS` instructions, and not for the `VFMA`/`VFMS` instructions.
The NEON intrinsics specification lacks intrinsics for the `VMLA`/`VMLS` instructions. In particular, this makes it impossible to achieve peak performance on half-precision matrix-matrix multiplication in AArch32 using NEON intrinsics, because the optimal implementation would use the `VMLA.F16 Qd, Qn, Dm[x]` instructions.
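As a rough illustration of why the by-lane form matters for matrix multiplication, here is a scalar model in portable C. It is only a sketch under assumptions: `float` stands in for half precision, the 8x4 tile shape is one plausible choice rather than a tuned kernel, and the names `mla_lane_model` and `gemm_8x4_model` are hypothetical. Each call to `mla_lane_model` corresponds to one `VMLA.F16 Qd, Qn, Dm[x]` instruction.

```c
#include <assert.h>
#include <stddef.h>

/* A Q register holds 8 half-precision values, a D register holds 4. */
enum { VEC_LANES = 8, B_LANES = 4 };

/* Scalar model of VMLA.F16 Qd, Qn, Dm[x]:
   acc[i] += n[i] * m[lane], with the product rounded before the add. */
void mla_lane_model(float acc[VEC_LANES], const float n[VEC_LANES],
                    const float m[B_LANES], int lane) {
    assert(lane >= 0 && lane < B_LANES);
    for (int i = 0; i < VEC_LANES; i++) {
        volatile float product = n[i] * m[lane]; /* intermediate rounding */
        acc[i] += product;
    }
}

/* 8x4 micro-kernel model: C += A * B, where A is 8 x k (stored one
   8-element column per step), B is k x 4 (stored one 4-element row
   per step), and c[j] holds column j of the 8x4 output tile. */
void gemm_8x4_model(size_t k, const float *a, const float *b,
                    float c[B_LANES][VEC_LANES]) {
    for (size_t p = 0; p < k; p++) {
        const float *a_col = a + p * VEC_LANES; /* one Q-register load */
        const float *b_row = b + p * B_LANES;   /* one D-register load */
        for (int j = 0; j < B_LANES; j++) {
            /* One by-lane multiply-accumulate per output column. */
            mla_lane_model(c[j], a_col, b_row, j);
        }
    }
}
```

In this model each step of the inner loop needs only two loads and four by-lane multiply-accumulates. Without the by-lane form, every element of `b_row` would first have to be broadcast into its own vector register, which costs extra instructions and registers.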
I request that the NEON intrinsics specification be updated to include the following intrinsics for AArch32:
- `vmla_f16` (`VMLA.F16 Dd, Dn, Dm`)
- `vmls_f16` (`VMLS.F16 Dd, Dn, Dm`)
- `vmlaq_f16` (`VMLA.F16 Qd, Qn, Qm`)
- `vmlsq_f16` (`VMLS.F16 Qd, Qn, Qm`)
- `vmla_lane_f16` (`VMLA.F16 Dd, Dn, Dm[x]`)
- `vmls_lane_f16` (`VMLS.F16 Dd, Dn, Dm[x]`)
- `vmlaq_lane_f16`/`vmlaq_laneq_f16` (`VMLA.F16 Qd, Qn, Dm[x]`)
- `vmlsq_lane_f16`/`vmlsq_laneq_f16` (`VMLS.F16 Qd, Qn, Dm[x]`)