Bug 21280

Summary: copying multiple files terminates while incomplete
Product: Mageia Reporter: Tony Blackwell <tablackwell>
Component: RPM Packages Assignee: Mageia Bug Squad <bugsquad>
Status: RESOLVED INVALID QA Contact:
Severity: minor    
Priority: Low CC: ftg, jani.valimaa, mageiatools, marja11
Version: 6   
Target Milestone: ---   
Hardware: All   
OS: Linux   
Whiteboard: NEEDINFO
Source RPM: CVE:
Status comment:
Attachments: journalctl -b output

Description Tony Blackwell 2017-07-19 00:59:29 CEST
Description of problem: copying multiple files terminates while incomplete.

Context: new M6 install on SSD, combined with having just replaced another (spinning) hard disk which had totally failed.  Was rebuilding stuff onto a new disk from backups.  Had about 500 GB of stuff copying, spread across 11 different copy processes started from the Xfce desktop (it wouldn't accept a 12th).  Was telling myself that this probably wasn't a good idea, as it would presumably fragment the resulting files, but lots else to do...

Went out to chat for 5 min, came back to find the copying had aborted prematurely.  Not only had the copy progress window closed, but so had the source and destination directory windows.

This is all on a modern UEFI motherboard with 32 GB RAM.

Scary stuff!

Looking at journalctl, there are some martian reports.  I've since unplugged a second network card on the host machine, which may help this.  Journalctl is however totally swamped by mandi messages, of form:

 "mandi[3668]: handling method call 'GetMode' on interface 'org.mageia.monitoring.ifw'"

The messages arrive in bursts: in each of the one-second windows at 08:26:17, 08:27:34, 08:29:25 and 08:30:19 there are of the order of 30-50 identical messages, with no entries at all for the times in between.  Doesn't look right!
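
The burst pattern described above can be checked mechanically rather than by eye. A minimal sketch, assuming the default "short" journalctl line layout (month, day, time, host, then `mandi[pid]:`); the function name `count_per_second` is made up for illustration and is not part of any Mageia tool.

```python
import re
from collections import Counter

# Assumed line shape (journalctl default "short" output), e.g.:
#   Jul 19 08:26:17 host mandi[3668]: handling method call 'GetMode' ...
LINE_RE = re.compile(r"^(\w{3}\s+\d+\s+\d{2}:\d{2}:\d{2})\s+\S+\s+mandi\[\d+\]:")


def count_per_second(lines):
    """Count mandi log lines per one-second timestamp.

    Returns a Counter keyed by the timestamp string; keys appear in the
    order they were first seen in the log.
    """
    counts = Counter()
    for line in lines:
        match = LINE_RE.match(line)
        if match:
            counts[match.group(1)] += 1
    return counts
```

With the journal saved to a text file (say `journalctl -b > journal.txt`), `count_per_second(open("journal.txt"))` gives one count per second, so windows holding 30-50 entries stand out immediately.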

There was almost nothing else shown with journalctl -b


Version-Release number of selected component (if applicable):


How reproducible:


Steps to Reproduce:  Don't know...


I've provisionally called this 'critical' as there were loads of data which didn't get copied.  The disappearance of opened source and destination windows is also a worrying part of this.
Comment 1 Marja Van Waes 2017-07-21 20:58:18 CEST
Could you please attach your "journalctl -b" output?

If you've rebooted X times since this happened, then attach the output of "journalctl -b -X"

Please compress the file with xz before attaching.
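
For anyone who prefers to script the request above, here is a minimal sketch. It assumes journalctl is on the PATH and that the journal from the relevant boot is still kept on disk; the helper names `journal_command` and `write_xz` are made up for illustration.

```python
import lzma


def journal_command(boots_ago=0):
    """Build the journalctl argument list.

    boots_ago=0 selects the current boot ("journalctl -b");
    boots_ago=2 selects the boot before the previous one ("journalctl -b -2").
    """
    if boots_ago < 0:
        raise ValueError("boots_ago must be >= 0")
    cmd = ["journalctl", "-b"]
    if boots_ago:
        cmd.append(f"-{boots_ago}")
    return cmd


def write_xz(data: bytes, path: str) -> None:
    """Write data to path as an .xz file (lzma.open defaults to the xz format)."""
    with lzma.open(path, "wb") as fh:
        fh.write(data)


# Typical use (not run here, since it needs a systemd journal):
#   import subprocess
#   out = subprocess.run(journal_command(2), capture_output=True, check=True).stdout
#   write_xz(out, "journal.txt.xz")
```

The shell equivalent is simply to redirect the journalctl output to a file and run xz on it.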

Also, please tell how exactly you started the copying processes from XFCE desktop

Thanks :-)

CC'ing the mageiatools maintainers because of the avalanche of "mandi[3668]: handling method call 'GetMode' on interface 'org.mageia.monitoring.ifw'" messages.

Also CC'ing wally in case there's a problem with an XFCE tool you used.

CC: sysadmin-bugs => jani.valimaa, mageiatools, marja11
Component: Release (media or process) => RPM Packages
Whiteboard: (none) => NEEDINFO

Comment 2 Tony Blackwell 2017-07-25 21:25:36 CEST
Created attachment 9519 [details]
journalctl -b output

Good and bad:
1.  Sorry, but I've torn down and rebuilt the system since reporting this, so no journalctl output from the original incident is available.  It may be that we have to abandon this bug report pending recurrence; however, for what it's worth:

2.  At this instant I've got further copy problems on the same hardware.  Not exactly the same, in that the copy hasn't ended incomplete; rather, I went to bed leaving it to copy half a terabyte of data, and now at 5am it's done almost nothing, with the disk access light still in rapid flicker.  Attached is the output of journalctl -b, which mostly seems to be reporting DMA write errors.  At first glance I wondered if this was hardware-based (modern UEFI system, but some old disks), although all disks reported themselves healthy to gsmartcontrol a few days ago.

Hmm, the target disk on this occasion is the only one on the system which is reported by gsmartcontrol as "unknown model" with SMART status unsupported.

I'm going to flag this as a hardware issue pending any more data.
Comment 3 Tony Blackwell 2017-07-25 21:29:14 CEST
Have set to resolved and invalid for now.  Appreciate your interest.  Sorry if what I reported was in fact not a software issue.
Thanks,
Tony

Severity: critical => minor
Status: NEW => RESOLVED
Resolution: (none) => INVALID
Priority: Normal => Low

Comment 4 Frank Griffin 2017-07-25 22:36:26 CEST
What are you copying from and to, and which (if either) was getting the DMA errors?  Also, are you getting any SMART errors from the BIOS when you boot?

CC: (none) => ftg

Comment 5 Tony Blackwell 2017-07-26 00:50:37 CEST
I was copying from an external 2 GB 2.5 inch drive, USB3, to an internal 3 1/2 inch SATA3 drive.  No SMART errors on boot.  Curiously, the same internal drive seems to accept files written to it normally if it's just a few GB.  It didn't get on with a useful copy when I selected a parent directory containing about half a terabyte of files.

In terms of the attached output of journalctl -b, the target drive for the write was /dev/sdg, the only drive which is 'unknown' to SMART.  Beyond seeing 'write DMA ext' errors, I can't otherwise tell which drive was responsible - I assume the target drive, as write errors were reported?
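
One way to narrow down which drive is responsible is to tally the failed commands per ATA port in the saved journal. A minimal sketch, assuming the attached log contains the usual kernel libata lines of the form `ataN.00: failed command: WRITE DMA EXT` (the attachment itself is not reproduced here, so that format is an assumption); the function name `failed_commands_by_port` is made up for illustration.

```python
import re
from collections import Counter

# Assumed kernel (libata) message shape, e.g.:
#   Jul 25 05:01:02 host kernel: ata7.00: failed command: WRITE DMA EXT
FAILED_RE = re.compile(r"\b(ata\d+)\.\d+: failed command: (.+?)\s*$")


def failed_commands_by_port(lines):
    """Count failed ATA commands, keyed by (port, command name)."""
    counts = Counter()
    for line in lines:
        match = FAILED_RE.search(line)
        if match:
            counts[(match.group(1), match.group(2))] += 1
    return counts
```

If every failure lands on a single port, the remaining step is to match that port to a /dev/sdX name. The failure lines do not state that mapping, so it has to be read from the drive detection messages earlier in the same boot log.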