Bug 6285 - System freezes with general protection fault
Summary: System freezes with general protection fault
Status: RESOLVED WORKSFORME
Alias: None
Product: Mageia
Classification: Unclassified
Component: RPM Packages
Version: 2
Hardware: x86_64 Linux
Priority: Normal normal
Target Milestone: ---
Assignee: Mageia Bug Squad
QA Contact:
URL: https://bugs.mageia.org/show_bug.cgi?...
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2012-06-02 09:45 CEST by Dimitrios Glentadakis
Modified: 2014-03-04 07:13 CET
CC List: 2 users

See Also:
Source RPM:
CVE:
Status comment:


Attachments
Output of lspcidrake (3.14 KB, text/plain)
2012-06-15 06:38 CEST, Dimitrios Glentadakis
Memtest86+ screenshot (484.29 KB, image/jpeg)
2012-06-30 08:55 CEST, Dimitrios Glentadakis

Description Dimitrios Glentadakis 2012-06-02 09:45:36 CEST
Sometimes the system freezes: I cannot move anything, and even the mouse pointer is frozen.
I restart with CTRL+ALT+SysRq+REISUB.


I have two syslog files, one from June 1:
http://glenbox.free.fr/files/Mageia/gpf_bug/syslog1

and June 2:
http://glenbox.free.fr/files/Mageia/gpf_bug/syslog2
Comment 1 Dave Hodgins 2012-06-02 19:11:13 CEST
Based on the line ...
Jun  2 06:45:04 localhost kernel: [    1.135649] ata1.01: limited to UDMA/33 due to 40-wire cable
I'd strongly suggest replacing the IDE cable with an 80-wire cable.

A Google search on "wiki 80-wire ide cable general protection fault"
does indicate that such faults can be a result of using a 40-wire cable.

That may be unrelated to the cause here, I don't know much about
troubleshooting gpfs, but that's the only obvious thing I see
in the log.

CC: (none) => davidwhodgins

Comment 2 Dimitrios Glentadakis 2012-06-02 20:32:01 CEST
I remember that the problems began a month ago, when I put in two screws to fix the second hard disk in the case because it was sometimes making a noise. After I fixed the hard disk with a screw, the noise went away, but the freezes started.

Now I opened the computer and pushed in the connectors of the cable to be sure that it is properly plugged into the disk and the motherboard. After that I was not able to boot the computer: I did not hear the beep at startup and the monitor did not turn on.

I unplugged the cable from the second hard drive, and the computer could boot, but it stopped in emergency mode and asked me for the root password or to press CTRL+D. I think it could just ignore the second drive without stopping the boot.

I logged in as root and commented out the line for the second drive in fstab.
I saved the file and by mistake typed "exit". Then the system started beeping continuously, and I turned it off with the switch at the back of the machine.

After that I could boot the computer normally.

So you are absolutely right in your diagnosis: there is in fact a problem with the cable... I will buy a new one (80-wire) :)
Comment 3 Dave Hodgins 2012-06-02 22:47:27 CEST
I would strongly advise checking the output of "smartctl -a /dev/sda" (replace
sda with the appropriate device), after replacing the cable, to ensure the
drive itself is ok.  If the screw used to reduce vibration of the drive is too
long, it can damage the drive.

I'll close this bug as invalid, as it seems to be a hardware issue.  Feel free
to reopen, if the problem returns after replacing the cable, and ensuring
smartctl shows no errors for the drive.

Status: NEW => RESOLVED
Resolution: (none) => INVALID

Comment 4 Dimitrios Glentadakis 2012-06-14 07:08:22 CEST
Yesterday I again had the 'general protection fault' with a total freeze. I have not replaced the cable, but I have it unplugged.

Here is the syslog:
http://glenbox.free.fr/files/Mageia/gpf_bug/syslog3  16.6MB

I don't know where the warning about the cable appeared before; could you check whether it is still present now?

* I bought my computer in November 2010

Status: RESOLVED => UNCONFIRMED
Resolution: INVALID => (none)
Ever confirmed: 1 => 0

Comment 5 Dave Hodgins 2012-06-15 03:47:54 CEST
No sign of the warning about the 40 wire cable.

You should enable the non-free repository and install the kernel-firmware-nonfree
package. That should get rid of the error message
Jun 13 05:13:13 localhost kernel: [   33.737616] r8169 0000:02:00.0: eth0: unable to load firmware patch rtl_nic/rtl8168d-2.fw (-2)

That may be the cause of the gpf, as it's happening during mgapplet checking for
updates.
Comment 6 Dimitrios Glentadakis 2012-06-15 06:37:51 CEST
OK, I installed it and will test it.
I am also uploading the output of lspcidrake, in case you need any information from it.

Thanks a million :)
Comment 7 Dimitrios Glentadakis 2012-06-15 06:38:23 CEST
Created attachment 2461
Output of lspcidrake
Comment 8 Dimitrios Glentadakis 2012-06-16 07:29:33 CEST
I had a freeze this morning and could only restart the computer with the reset button.
In the syslog I did not find any 'general protection fault' entry:
http://glenbox.free.fr/files/Mageia/gpf_bug/syslog4
Comment 9 Dave Hodgins 2012-06-16 23:42:41 CEST
(In reply to comment #8)
> I had a freeze this morning and could only restart the computer with the
> reset button.
> In the syslog I did not find any 'general protection fault' entry:
> http://glenbox.free.fr/files/Mageia/gpf_bug/syslog4

That appears to be the syslog from the restart after
the freeze, as it shows the cleanup of orphan inodes
on sda1.

For debugging, the syslog from just before this, to the
prior "/proc/kmsg started" has to be looked at.
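
The selection described above (everything from the previous "/proc/kmsg started" line up to the restart) could be sketched in Python roughly as follows. This is only an illustration: the function names are made up for this example, and it assumes each boot writes exactly one line containing "/proc/kmsg started" to the syslog.

```python
def boot_sessions(lines, marker="/proc/kmsg started"):
    """Split syslog lines into per-boot lists; a new list starts at each marker line."""
    sessions = []
    current = []
    for line in lines:
        # A marker line closes the previous boot (if any) and opens a new one.
        if marker in line and current:
            sessions.append(current)
            current = []
        current.append(line)
    if current:
        sessions.append(current)
    return sessions


def session_before_last_boot(lines):
    """Return the lines of the boot preceding the most recent one, or [] if there is none."""
    sessions = boot_sessions(lines)
    return sessions[-2] if len(sessions) >= 2 else []
```

Passing the lines of /var/log/syslog to session_before_last_boot() would then give the part of the log that ends with the freeze, rather than the part written after the restart.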
Comment 10 Dimitrios Glentadakis 2012-06-17 06:36:25 CEST
I got the log for June 16 with this:
cat /var/log/syslog | grep 'Jun 16'

I tried again and I don't have any other entries.
This one (syslog4 above) starts with the "/proc/kmsg started" line.
Comment 11 Dimitrios Glentadakis 2012-06-19 07:25:45 CEST
I have a new general protection fault with a system freeze, just when I click on "Shutdown"

http://glenbox.free.fr/files/Mageia/gpf_bug/syslog5
Comment 12 Dimitrios Glentadakis 2012-06-19 07:26:55 CEST
(In reply to comment #11)
> when i click on

...when i "clicked" on.
Comment 13 Dave Hodgins 2012-06-20 21:31:14 CEST
Looks like the problems started about 10 minutes after the start.

Jun 19 06:39:29 localhost kernel: [  616.473774] nepomukindexer[6732] trap divide error ip:7f0b354e6281 sp:7fff6d3662e0 error:0 in strigiea_riff.so[7f0b354e4000+5000]
followed by nepomukservices[3652]: segfault at 06:54:58
followed by general protection fault at 07:09:31

$ grep "general protection fault" syslog5|wc -l
73

Adding Thomas to the cc list.  Any suggestions?

I suspect a hardware problem, but don't know how to narrow
down what is actually causing the problem.

CC: (none) => tmb
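
The timeline and the grep count in this comment could also be produced in one pass with a small Python sketch like the one below. The function name and the list of event strings are assumptions chosen for this example (they are the three messages mentioned above); like grep piped to wc -l, it counts matching lines.

```python
EVENTS = ("trap divide error", "segfault", "general protection fault")


def summarize_events(lines, events=EVENTS):
    """Map each event string to (number of matching lines, first matching line)."""
    summary = {event: [0, None] for event in events}
    for line in lines:
        for event in events:
            if event in line:
                entry = summary[event]
                entry[0] += 1
                # Remember only the first occurrence, to show when the trouble started.
                if entry[1] is None:
                    entry[1] = line.rstrip("\n")
    return {event: tuple(entry) for event, entry in summary.items()}
```

Run over the lines of a file such as syslog5, the first matching line of each event gives the order in which the problems appeared.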

Comment 14 Dimitrios Glentadakis 2012-06-22 20:39:11 CEST
A new log: I had a freeze between 20:30 and 20:35. I rebooted with the reset button.
http://glenbox.free.fr/files/Mageia/gpf_bug/syslog6 15.0M
Comment 15 Dimitrios Glentadakis 2012-06-25 21:05:08 CEST
Yesterday I plugged the second disk back in, and today (about 20 minutes ago) I had a freeze:
http://glenbox.free.fr/files/Mageia/gpf_bug/syslog7

The cable is not 40-wire but 80-wire.
Comment 16 Dimitrios Glentadakis 2012-06-25 21:30:32 CEST
I think that I have a warranty until October or November, but if it is a hardware problem, which part should I ask them to replace? I bought the computer brand new, without an operating system.
Comment 17 Dave Hodgins 2012-06-26 02:33:46 CEST
I'm pretty sure it's a hardware problem, but confirming which piece(s) is/are the
cause is not going to be easy.

As it seems to mostly happen during network access, it could be a bad network
interface, bad ram, or other motherboard problems.

I'd start by running memtest86+ overnight.  If that doesn't show any problems,
it's a matter of disabling hardware one piece at a time, till the problem goes
away, and then adding it back to confirm it is causing it.

You could try disabling the onboard nic in the bios setup, and using the wireless
nic (assuming you have a wireless router).

One of the most frustrating hardware issues I've encountered in the past was with
an underpowered power supply. The system had a 500W ps with 24 amps on the +12v
rail.  The video card (nvidia 9600gt) requires 26 amps on +12v.  The result was a
couple of months of gpfs, spontaneous reboots, etc, till I tracked down the problem
by really overloading it, by adding some additional usb devices.  With any two of
three devices it was fine.  With all three, the system wouldn't even boot.
Replacing the ps with a 700W ps with 52 amps on the +12v rail fixed the problems
in that system. It was only after I realized it was a power problem that I found
the power requirements for the video card.

It doesn't have to be bad hardware.  It could be as simple as a connector that's
not fully plugged in.  It could also be a bad driver, but then I'd expect a
search on the driver name and gpf to show many results.
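
The power supply arithmetic from the example above can be written out as a short Python sketch. The function name is made up for this illustration, and it treats the 26 amp figure as the whole load on the +12v rail, which is a simplification: other components draw from that rail too, so the real margin would be smaller.

```python
RAIL_VOLTS = 12.0  # nominal voltage of the +12v rail


def rail_margin(supply_amps, load_amps, volts=RAIL_VOLTS):
    """Return (margin in amps, margin in watts); negative values mean the rail is overloaded."""
    margin_amps = supply_amps - load_amps
    return margin_amps, margin_amps * volts


old_supply = rail_margin(24, 26)  # the 500W unit: 24 amps available, 26 amps required
new_supply = rail_margin(52, 26)  # the 700W replacement: 52 amps available
```

With these figures the old supply is 2 amps (24 watts) short on the +12v rail, while the replacement has 26 amps (312 watts) to spare, even though the total wattage only went from 500W to 700W.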
Comment 18 Dimitrios Glentadakis 2012-06-26 09:13:32 CEST
I don't understand the part about the network. Do I have to disable it in the BIOS?
As for the memory, I installed memtest86+ but I don't know how to run it. Also, its description says it is for the 386 architecture, but I have x86_64; maybe I need another tool?
Comment 19 Dimitrios Glentadakis 2012-06-28 18:33:48 CEST
I ran memtest86+ for about 10 hours. No errors detected.
Comment 20 Dimitrios Glentadakis 2012-06-30 08:55:29 CEST
Created attachment 2508
Memtest86+ screenshot
Comment 21 Dave Hodgins 2012-07-01 22:21:31 CEST
Usually the bios setup will have an option to disable the onboard network interface.

Try disabling that, assuming you can use the wlan for internet access, to see if
that will provide a stable system.
Comment 22 Dimitrios Glentadakis 2012-07-01 22:28:05 CEST
Yesterday I brought the computer to the shop for a hardware check. I told the technician that if the tests do not find anything, he should simply replace the power supply with a stronger one.
Comment 23 Dimitrios Glentadakis 2012-08-05 09:51:00 CEST
I got the computer back yesterday morning, and so far I have not had any problem with it.
The technician told me that the second hard drive is defective, because sometimes it hangs the boot (before grub) and sometimes it is not recognized in the BIOS. He tried another cable with the same results, so I removed the drive completely, and he replaced the power supply with a stronger one. I think the power supply was the cause of the general protection faults, as I had them with the second drive unplugged too.

I am closing the bug, and if something appears I will reply. It is important to have your support here, even for hardware problems, at least to examine and interpret the system logs.

Dave, thanks a lot for your support, I could not have done it without it!

Status: UNCONFIRMED => RESOLVED
Resolution: (none) => INVALID

Comment 24 Dimitrios Glentadakis 2012-12-28 07:34:35 CET
For the past week I have had freezes after a cold boot.
I see in the syslog that there are many "general protection faults":
http://glenbox.free.fr/files/Mageia/gpf_bug/syslog8

A week ago I reported a kwin crash problem:
https://bugs.kde.org/show_bug.cgi?id=311863

Status: RESOLVED => REOPENED
Resolution: INVALID => (none)
Ever confirmed: 0 => 1

Comment 25 Dave Hodgins 2012-12-28 22:06:11 CET
One thing I noticed in the log that may help ...
microcode: failed to load file amd-ucode/microcode_amd.bin

Try installing the microcode package from the nonfree repository.

Also, ensure cronie-anacron is installed, so that the logs
get rotated (weekly for syslog).
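
Both firmware hints in this report (the r8169 message in comment 5 and the microcode message above) came from spotting a failed firmware load in the syslog. A rough Python sketch of that search follows; the function name is made up for this example, and the two patterns are taken only from the messages quoted in this report, so other drivers may word their failures differently.

```python
import re

# Patterns based on the two kernel messages quoted in this bug report (assumption:
# other drivers may use different wording and would not be matched).
FIRMWARE_FAILURE = re.compile(
    r"unable to load firmware patch (?P<patch>\S+)"
    r"|failed to load file (?P<file>\S+)"
)


def missing_firmware(lines):
    """Return the sorted list of firmware files the kernel reported it could not load."""
    missing = set()
    for line in lines:
        match = FIRMWARE_FAILURE.search(line)
        if match:
            missing.add(match.group("patch") or match.group("file"))
    return sorted(missing)
```

Each file name it returns points at a firmware file the kernel could not find; which package provides it still has to be looked up separately.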
Comment 26 Dimitrios Glentadakis 2013-01-02 19:06:26 CET
So far the system is very stable and I have not had a single freeze!

I installed the packages that you suggested:
http://glenbox.free.fr/files/Mageia/gpf_bug/syslog9




Thanks very very much for your support Dave !

Happy new year 

:)

Status: REOPENED => RESOLVED
Resolution: (none) => WORKSFORME

Comment 27 Dimitrios Glentadakis 2014-03-04 07:13:09 CET
Finally, this issue was solved by applying a cleaning product to the memory module. I had removed the memory module and cleaned it by blowing on it, but that didn't help. I brought the PC to a workshop, and the technician cleaned it with a cleaning spray.
The repair took place in August 2013, and since then I have had no more problems.
