Bug 11619 - illegal instruction in libatlas from scamp
Summary: illegal instruction in libatlas from scamp
Status: RESOLVED FIXED
Alias: None
Product: Mageia
Classification: Unclassified
Component: RPM Packages
Version: Cauldron
Hardware: x86_64 Linux
Priority: Normal
Severity: critical
Target Milestone: ---
Assignee: Dimitri Jakov
QA Contact:
URL:
Whiteboard:
Keywords: Triaged
Depends on:
Blocks:
 
Reported: 2013-11-07 22:48 CET by Chris Denice
Modified: 2013-11-18 00:25 CET

See Also:
Source RPM: atlas-3.8.4-3.mga4.src.rpm
CVE:
Status comment:


Description Chris Denice 2013-11-07 22:48:12 CET
Description of problem:

Running the "scamp" program (package scamp) on a heavy task crashes with an illegal instruction inside libatlas.

Within gdb, I got this output:
Program received signal SIGILL, Illegal instruction.
0x00002aaaad01722d in ATL_dJIK52x52x52TN52x52x0_a1_b0 () from /usr/lib64/atlas/libatlas.so.3

That kind of error typically means the binary contains instructions that this CPU does not support, which suggests the libatlas package was built with optimizations specific to another machine.

Similar bugs affected Debian as well, and they upgraded to 3.10:
http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=682591

It does not look like a bug in Scamp either;
see the last post there (from the developer):
http://www.astromatic.net/forum/showthread.php?tid=270

My cpu is:

Architecture:          x86_64
CPU op-mode(s):        32-bit, 64-bit
Byte Order:            Little Endian
CPU(s):                2
On-line CPU(s) list:   0,1
Thread(s) per core:    1
Core(s) per socket:    2
Socket(s):             1
NUMA node(s):          1
Vendor ID:             GenuineIntel
CPU family:            6
Model:                 15
Model name:            Intel(R) Core(TM)2 CPU          6600  @ 2.40GHz
Stepping:              6
CPU MHz:               2394.000
CPU max MHz:           2394.0000
CPU min MHz:           1596.0000
BogoMIPS:              4808.15
Virtualization:        VT-x
L1d cache:             32K
L1i cache:             32K
L2 cache:              4096K
NUMA node0 CPU(s):     0,1
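
The lscpu output above does not list the instruction-set flags. A minimal sketch for checking which SIMD extensions a machine lacks, assuming a Linux /proc/cpuinfo with a "flags" line; the list of extensions to look for is an assumption, not something established in this report. A Core 2 of this generation predates AVX, so "avx" would be expected among the missing ones here.

```python
# Sketch: report which SIMD extensions a CPU lacks, based on the "flags"
# line of /proc/cpuinfo. Flag names are the ones the Linux kernel prints.
from pathlib import Path

# Extensions a machine-tuned build might use; this list is an assumption.
EXTENSIONS = ("ssse3", "sse4_1", "sse4_2", "avx", "avx2")


def missing_extensions(cpuinfo_text, wanted=EXTENSIONS):
    """Return the entries of `wanted` absent from the first flags line."""
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "flags":
            present = set(value.split())
            return [ext for ext in wanted if ext not in present]
    # No flags line found: nothing can be confirmed as present.
    return list(wanted)


if __name__ == "__main__":
    path = Path("/proc/cpuinfo")
    if path.exists():
        print("not supported here:", missing_extensions(path.read_text()))
```

Any extension reported as missing is a candidate for the instruction that raised SIGILL, but this alone does not prove which one libatlas used.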



It would be nice to fix this before release!

Thanks,
cheers,
chris.


Reproducible: 

Steps to Reproduce:
Chris Denice 2013-11-07 22:48:42 CET

CC: (none) => joequant

Comment 1 Chris Denice 2013-11-07 23:41:15 CET
Hi there,
I was trying to recompile our atlas package locally, and I see optimization flags specific to my machine appearing in the logs:

-DATL_ARCH_Core2 -DATL_CPUMHZ=2394

so I suspect we have the same problem on the build server: libatlas may have been optimized for the build machine, and so crashes on other hardware?
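
A rough way to spot such flags in a build log, as a sketch only: the two patterns come from the flags quoted above, and treating every -DATL_ARCH_* or -DATL_CPUMHZ=* define as machine-specific is an assumption. The sample log line and the file names in it are made up for illustration.

```python
import re

# Patterns taken from the two flags quoted above; treating any -DATL_ARCH_*
# or -DATL_CPUMHZ=* define as "machine-specific" is an assumption.
MACHINE_DEFINE = re.compile(r"-DATL_(?:ARCH_\w+|CPUMHZ=\d+)")


def machine_specific_defines(log_text):
    """Return the distinct machine-specific defines, in first-seen order."""
    seen = []
    for match in MACHINE_DEFINE.finditer(log_text):
        if match.group(0) not in seen:
            seen.append(match.group(0))
    return seen


# Hypothetical log excerpt; foo.c and bar.c are placeholder names.
sample_log = (
    "gcc -O2 -DATL_ARCH_Core2 -DATL_CPUMHZ=2394 -c foo.c\n"
    "gcc -O2 -DATL_ARCH_Core2 -c bar.c\n"
)
print(machine_specific_defines(sample_log))
```

Running the same check on the build server's log would show which architecture the distributed package was tuned for.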
Manuel Hiebel 2013-11-08 19:53:22 CET

Keywords: (none) => Triaged
Assignee: bugsquad => mitya

Comment 2 Chris Denice 2013-11-14 01:10:13 CET
Hi Dimitri (maintainer),
If you're busy I can start debugging this. I suspect that some AVX instructions are switched on by default; adding a few sed commands to strip them from Make.inc should do the job.
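
The idea of stripping flags from Make.inc could look roughly like this, written in Python rather than sed. This is a sketch under assumptions: the report does not say which flags were removed, so the flag list and the sample Make.inc line below are hypothetical.

```python
import re

# Hypothetical flag list: the report does not say which flags were removed.
FLAGS_TO_DROP = ("-mavx", "-mavx2")


def strip_flags(make_inc_text, flags=FLAGS_TO_DROP):
    """Remove each flag where it appears as a whole whitespace-delimited word."""
    for flag in flags:
        # Lookarounds keep "-mavx" from matching inside "-mavx2".
        pattern = r"(?<!\S)" + re.escape(flag) + r"(?!\S)[ \t]*"
        make_inc_text = re.sub(pattern, "", make_inc_text)
    return make_inc_text


# Illustrative line only, not copied from a real ATLAS Make.inc.
sample = "KCFLAGS = -O2 -mavx -fomit-frame-pointer\n"
print(strip_flags(sample), end="")
```

Matching whole words avoids damaging unrelated options that merely start with the same prefix.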

This must surely be fixed before release, as it threatens all these packages:
lib64gsl0 (<--- not good, that one is used everywhere)
ncl
ocaml-gsl
psfex
scamp
sextractor

cheers.
Comment 3 Dimitri Jakov 2013-11-14 08:16:48 CET
(In reply to Chris Denice from comment #2)
> Hi Dimitri (maintainer),
> If you're busy I can start debugging this.

Hi Chris, I would appreciate that as I don't have much time to maintain ATLAS now. Ideally, we should update to ATLAS 3.10 for Mageia 4, but I'm not sure if we have enough time for that.
Comment 4 Chris Denice 2013-11-14 10:52:55 CET
No problem, I'll have a look at what is doable in the remaining time.
cheers,
chris.
Comment 5 Chris Denice 2013-11-18 00:25:35 CET
Works for me now. I patched atlas 3.8.4-4mga and added some sed commands to remove the machine-specific optimization on the build server.

Please reopen if there are any issues. I suggest waiting for mga5 to upgrade to 3.10, as otherwise we would need to recompile many programs now.

Cheers,
chris.

Status: NEW => RESOLVED
Resolution: (none) => FIXED

