Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SSE dectetion made wrong assumptions on SSSE3-less processor #140

Open
wants to merge 2 commits into
base: frame-parallel
Choose a base branch
from

Conversation

YggdrasiI
Copy link

@YggdrasiI YggdrasiI commented Nov 3, 2016

Hello,

it seems that on some processors invalid SSE instructions will be used.
If I run the decoder on an AMD Turion it crashes immediately with:

Error: SIGILL – Ungültiger Maschinenbefehl (Speicherabzug geschrieben)

This pull request try to fix the issues due avoiding of non-available instructions.

Regards,
Olaf Schulz

========================================================================

Reason are the usage of SSSE3 or SSE4.1 instruction in functions which
are assumed as SSE2 compatible.
Following list contains the instruction for each of the failing functions.

It could be useful to restore older variants of the functions, if available.
(I had not checked if such functions are available.)

  1. The detection for have_SSSE3 seems to be wrong. It returns true
    but the processor does not support it.

Detected flags:
MMX:1 SSE:1 SSE2:1 SSE3:1 SSSE3:1 SSE4a:0 SSE4_1:0 SSE4_2:0 AVX:0 AVX2:0
^^^^^^^ wrong detection for AMD Turion, see Appendix A

Solution: Use the correct register, ecx, not edx.
int have_SSSE3 = !!(ecx & (1<< 9));

  1. if (have_SSE2) { accel->sao_band_8 = sao_band_8bit_sse2; } (End of init_acceleration_functions_sse )

Nemiver debugger stopped at
x00007ffff7bb108e <_Z18sao_band_8bit_sse2PhiPKhiiiiiiii+110>: pshufb %xmm0,%xmm1

pshufb is a SSSE3 command, which is not supported by the processor.

Solution: Change line to (and rename function).
if (have_SSSE3) { accel->sao_band_8 = sao_band_8bit_sse2; }

  1. if (have_SSE2) { accel->put_hevc_qpel_8[0][0] = put_hevc_luma_direct_8_sse2; }

Nemiver debugger stopped at
0x00007ffff7ba89df <Z27put_hevc_luma_direct_8_sse2PslPKhliiS+207>: pinsrq $0x0,(%r11),%xmm0

pinsrq is a SSE4.1 command
Solution: Shift function into if(have_SSE4_1)-wrapper (and restore
older SSE2 variant if available.)

  1. if (have_SSE2) { accel->put_hevc_epel_8 = put_hevc_chroma_direct_8_sse2; }

Nemiver debugger stopped at
0x00007ffff7ba8b3f <Z29put_hevc_chroma_direct_8_sse2PslPKhliiiiS+207>: pinsrq $0x0,(%r11),%xmm0

Solution: Like 2.

  1. 98 #if HAVE_SSE4_1
    99 if (have_SSE2) { [...]

Switch for put_pred_8_sse2 and put_bipred_8_sse2. Both function contain the
instruction pextrd (SSE4.1):

Nemiver debugger stopped at
0x00007ffff7ba7f6f <_Z15put_pred_8_sse2PhlPKslii+447>: pextrd $0x0,%xmm0,%r8d
0x00007ffff7ba8107 <_Z17put_bipred_8_sse2PhlPKsS1_lii+375>: pextrd $0x0,%xmm0,%r10d

Solution: Like 2.

Appendix A)
cat /proc/cpuinfo

processor : 0
vendor_id : AuthenticAMD
cpu family : 15
model : 36
model name : AMD Turion(tm) 64 Mobile Technology MT-32
stepping : 2
cpu MHz : 800.000
cache size : 512 KB
physical id : 0
siblings : 1
core id : 0
cpu cores : 1
apicid : 0
initial apicid : 0
fpu : yes
fpu_exception : yes
cpuid level : 1
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 syscall nx mmxext fxsr_opt lm 3dnowext 3dnow rep_good nopl pni lahf_lm vmmcall
bogomips : 1600.02
TLB size : 1024 4K pages
clflush size : 64
cache_alignment : 64
address sizes : 40 bits physical, 48 bits virtual
power management: ts fid vid ttp tm stc

• Change detection of available SSE instructions.
It changes the selected SSE functions in the case
 1=have_SSE2=have_SSE3, 0=have_SSSE3=have_SSSE4_1.
@farindk
Copy link
Contributor

farindk commented Nov 23, 2016

Hello Olaf,

thanks for this pull request. I've integrated the CMakefile fix, but solved the SSE problem in a different way. The real problem here was that the compiler generates SSE4 instructions even when no explicit SSE4 intrinsics have been used.

I changed the configuration script such that there are now two options (--enable-sse4 and --enable-sse2) controlling what kind of SSE instruction set is used. Since both options are on by default, you actually have to set --disable-sse4 to get SSE2 compatible code. The configure script will now also tell at the end which SSE instruction set it compile for (it will always fall back to scalar code at runtime when the CPU does not support it).

Regards,
Dirk Farin

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants