Bug 5613 - sata renumbering; wrong disk install and data loss risk, lilo reinstall failure
Summary: sata renumbering; wrong disk install and data loss risk, lilo reinstall failure
Status: RESOLVED FIXED
Alias: None
Product: Mageia
Classification: Unclassified
Component: Installer
Version: Cauldron
Hardware: x86_64 Linux
Priority: Normal, Severity: critical
Target Milestone: ---
Assignee: Mageia Bug Squad
QA Contact:
URL:
Whiteboard:
Keywords: NEEDINFO
Depends on:
Blocks:
 
Reported: 2012-04-26 00:13 CEST by Tony Blackwell
Modified: 2014-03-15 20:28 CET

See Also:
Source RPM: drakx-installer-stage2
CVE:
Status comment:


Attachments
lspci output (2.86 KB, text/plain)
2012-04-26 01:35 CEST, Tony Blackwell
attchment is M2 fdisk (1.91 KB, text/plain)
2013-05-24 20:07 CEST, dennis drown

Description Tony Blackwell 2012-04-26 00:13:28 CEST
Description of problem:
With more than one disk, there is a real risk of installing on the wrong one.  You know your old system was on sda, so you install there, and on reboot you find it was the wrong disk and all your data from another disk is gone.  I suspect the problem is still that the installer and the running system see the disk order differently.

This old bug was present in the ancestral distribution, fully discussed but not resolved in 
https://qa.mandriva.com/show_bug.cgi?id=53443
It nearly bit me again, i.e. it is still present.  The only thing that saved me was the different disk partitioning.

Herton Ronaldo Krzesinski tracked down the issue back then.  Quoting from him in that thread:

"Looking at dmesg and report.bug it was possible to see what's going on, it's
the order of loading of modules that's causing this. Inside the installer,
pata_jmicron, ahci, ata_piix. In the installed system, the modules are loaded
in inverse order: ata_piix, ahci, pata_jmicron. Thus devices connected can
appear as different block devices like this case. We have to fix installer or
mkinitrd."

A side effect back then was that the lilo.conf generated during install referred to the wrong hard disk, so re-running lilo from the newly booted system failed unless lilo.conf was first fixed manually; I guess this will still be the same.
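
Since the sdX names depend on module load order, one way to check which physical disk sits behind a given name is to look at the persistent names instead. The following is only a rough sketch, assuming a Linux system where udev populates /dev/disk/by-id; the function name `stable_disk_names` is made up for illustration and is not part of the installer.

```python
import os


def stable_disk_names(by_id_dir="/dev/disk/by-id"):
    """Map each kernel device name (e.g. 'sda') to its persistent by-id names."""
    mapping = {}
    for name in sorted(os.listdir(by_id_dir)):
        link = os.path.join(by_id_dir, name)
        if not os.path.islink(link):
            continue  # only symlinks point at real block devices
        # Resolve the symlink and keep just the kernel name, e.g. 'sda' or 'sda1'.
        target = os.path.basename(os.path.realpath(link))
        mapping.setdefault(target, []).append(name)
    return mapping


if __name__ == "__main__":
    # Assumption: the directory exists only where udev provides it.
    if os.path.isdir("/dev/disk/by-id"):
        for dev, ids in sorted(stable_disk_names().items()):
            print(dev, "->", ", ".join(ids))
```

Running this once from the installer console and once on the installed system, then comparing the two listings, should show whether the same physical disk gets a different sdX name in each environment.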

Version-Release number of selected component (if applicable):


How reproducible:


Steps to Reproduce:
1.
2.
3.
Comment 1 Tony Blackwell 2012-04-26 00:25:48 CEST
Sorry to find this so late in the release cycle.  M2B3 seemed mature enough that I installed it on my main multi-HD machine...
Comment 2 Tony Blackwell 2012-04-26 01:35:40 CEST
Created attachment 2108
lspci output
Comment 3 Tony Blackwell 2012-04-26 01:50:05 CEST
(drat, wrote a lot more, added an attachment without saving first...)

This is not exactly the same issue as the quoted old bug, where the underlying cause related to having both IDE and SATA drives.  Now, my system has 7 SATA drives and no IDE.  I strongly suspect the same issues apply, however.

Looking in detail at my disks, drives sda to sdd are all Seagate 200641AS 2TB, the next 2 are ATA ST2000DL 003-9VT1 disks and the last drive is the same type as the first 4.  The last drive, sdg, differs from all the others in 2 ways.  It uses the ahci module; all the others use ata_piix.  It is the only drive which does not have SMART enabled and for which SMART will not enable.  (I tried to enable it through gsmartcontrol but the SMART status would not reset; I then tried adding the following line to rc.d/rc.local: 'smartctl -s on -o /dev/sdg' and rebooted, but this didn't help either.)

Found via MCC and hardware that sdg is the only disk on controller 4.  (The others are: 2 disks on controller 0, 2 on controller 1, 1 on controller 2, 1 on controller 3.)
Comment 4 Tony Blackwell 2012-04-26 02:39:33 CEST
Motherboard is ASUS P6T
Storage on this is the Intel ICH10R Southbridge with 6 SATA ports, a JMicron JMB363 SATA & PATA controller with the SATA going to the rear eSATA port, and a JMicron JMB322 supporting 2 SATA ports with additional functions ("Drive Xpert technology", "supports EZ Backup and Super Speed functions").  The BIOS has all SATA in IDE mode rather than AHCI.
Manuel Hiebel 2012-04-26 11:12:31 CEST

Attachment 2108 mime type: application/octet-stream => text/plain

Comment 5 Manuel Hiebel 2012-04-26 11:20:45 CEST
for lilo see bug 5044

Maybe the report.bug can help us. You can switch to console 2 (by pressing 'Ctrl-Alt-F2') during installation, plug in a USB key/stick and type 'bug', then press Enter. It will put report.bug on the key.

Source RPM: (none) => drakx-installer-stage2

Comment 6 Tony Blackwell 2012-04-26 12:32:12 CEST
My impression is there is no bug with lilo - it generates a perfectly good lilo.conf from the (dud) info the installer program gives it.  The only trouble is that the installer has told it the "wrong" disk, as seen from the different point of view of the rebooted system.  

During install, the system is installed to what the installer thinks is sdb, and it even boots (with my M2B3 boot installed to the sdb(x) subdirectory and the primary boot from the old Mageia 1 on what is really sda).  The problem comes when, on the rebooted system, which now sees that disk as sda, you try to run lilo.  The lilo.conf entries all point to sdb, which is now the wrong disk, and the lilo reinstall crashes.  Manually editing lilo.conf and changing sdb to sda where appropriate allows it to run.  fstab also needed fixing.  I'm not sure if other loose ends are left in the system.
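
The manual fix described above (changing sdb to sda in lilo.conf and fstab) can be sketched as a text substitution. This is only an illustration under stated assumptions: the helper `remap_devices` and the sample lines are hypothetical, it works on a string rather than on the real files, and the result should be reviewed by hand before anything is written back.

```python
import re


def remap_devices(text, mapping):
    """Rename /dev/sdX disks (and their partitions, e.g. /dev/sdb5) per `mapping`."""
    # Capture the disk name and an optional partition number separately.
    pattern = re.compile(r"/dev/(sd[a-z]+)(\d*)\b")

    def swap(match):
        disk, part = match.group(1), match.group(2)
        # Disks not listed in the mapping are left unchanged.
        return "/dev/" + mapping.get(disk, disk) + part

    # A single pass means swapping two names (sda <-> sdb) does not collide.
    return pattern.sub(swap, text)


if __name__ == "__main__":
    sample = "boot=/dev/sdb\nroot=/dev/sdb5\n"  # made-up lilo.conf-style lines
    print(remap_devices(sample, {"sdb": "sda"}))
```

Mounting by UUID or label in fstab would avoid the need for this kind of rewrite, since those identifiers do not change with the disk order.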
Comment 7 Marja Van Waes 2012-05-26 13:06:55 CEST
Hi,

This bug was filed against cauldron, but we do not have cauldron at the moment.

Please report whether this bug is still valid for Mageia 2.

Thanks :)

Cheers,
marja

Keywords: (none) => NEEDINFO

Comment 8 Marja Van Waes 2012-08-03 21:10:53 CEST
@ Tony

I suspect there is something special about your system. Please reproduce this bug with Mageia 2 final, and attach the file /root/drakx/report.bug.gz

CC: (none) => marja11, pterjan

Comment 9 Marja Van Waes 2012-08-26 15:34:39 CEST
@ Tony,

Sorry, we need to close this bug as OLD because our maintainer can't do anything without the report.bug or /root/drakx/report.bug.gz of such an install.

Please reopen if you can provide that file

Status: NEW => RESOLVED
Resolution: (none) => OLD

Comment 10 Tony Blackwell 2013-01-17 12:44:01 CET
Still a valid bug report for Mageia 3 beta 2 x86_64; same problem.

Looking in /root/drakx, there is a file report.bug.xz which appears to be compiled code.  There is no corresponding .gz file.

Status: RESOLVED => REOPENED
Resolution: OLD => (none)

Comment 11 Tony Blackwell 2013-01-19 13:20:58 CET
Further info:
While it is true that the disk sequencing differs from M1 and everything before it, on closer inspection there has been one major fix.  The installer and the subsequently installed system now both enumerate the disks in the same way, so the previous install problem (where the /etc/lilo.conf created at install was incorrect and pointed to the wrong disk because the installed system saw the disks differently from its own installer) no longer applies.  Running lilo no longer crashes, as lilo.conf is now true to the way the system sees the disk order with a mix of ATA and SATA disks.

I think this could be left as it is, with a more specific warning in the install notes that systems with a mix of ATA and SATA disks may have changed disk order, and particular care needs to be taken that the installation is to proceed on the intended disk.
Manuel Hiebel 2013-01-20 12:35:48 CET

CC: (none) => thierry.vignaud

Comment 12 Tony Blackwell 2013-05-05 09:07:49 CEST
M3 RC x86_64

I was almost bitten by this again.
While the last comment above is correct, the end result is that on my main system, which has 1 ATA disk and half-a-dozen SATA disks, the 'main' installation disk which both other operating systems and the original Mageia 1 see as disk 1 (i.e. sda) is seen by M3 as disk 2 (i.e. sdb).

As noted above, M3 is internally consistent now so the installation disk and the newly-installed system both name the disks the same.  Lilo on the new system works.  Perhaps I just have to live with this.

Question: Did we get a more specific warning in the install notes as requested above?  I've not come across it...
Comment 13 dennis drown 2013-05-24 20:07:17 CEST
Created attachment 4044
attchment is M2 fdisk

CC: (none) => dadrown1

Comment 14 dennis drown 2013-05-24 20:10:06 CEST
(In reply to dennis drown from comment #13)
> Created attachment 4044 [details]
> attchment is M2 fdisk

[root@localhost ~]# fdisk -l

Disk /dev/sda: 2000.4 GB, 2000398934016 bytes, 3907029168 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
Disk identifier: 0x0f73c3c8

   Device Boot      Start         End      Blocks   Id  System
/dev/sda1   *        2048  1902899249   951448601    7  HPFS/NTFS/exFAT
/dev/sda2      1902903282  3907024064  1002060391+   5  Extended
Partition 2 does not start on physical sector boundary.
/dev/sda5      1902903296  3907024064  1002060384+  83  Linux

Disk /dev/sdb: 500.1 GB, 500107862016 bytes, 976773168 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x0a30c408

   Device Boot      Start         End      Blocks   Id  System
/dev/sdb1   *          63   807185924   403592931    7  HPFS/NTFS/exFAT
/dev/sdb2       807185925   976768064    84791070    f  W95 Ext'd (LBA)
/dev/sdb5       807185988   976768064    84791038+   7  HPFS/NTFS/exFAT

Disk /dev/sdc: 1000.2 GB, 1000204886016 bytes, 1953525168 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
Disk identifier: 0xe7ca5714

   Device Boot      Start         End      Blocks   Id  System
/dev/sdc1   *        2048   981812474   490905213+   7  HPFS/NTFS/exFAT
/dev/sdc2       981815247  1953520064   485852409    5  Extended
Partition 2 does not start on physical sector boundary.
/dev/sdc5       981815296  1022987069    20585887   83  Linux
/dev/sdc6      1022990336  1031164154     4086909+  82  Linux swap / Solaris
/dev/sdc7      1031168000  1953520064   461176032+  83  Linux
[root@localhost ~]#                                                     

This is the M3 fdisk.

All I did was a clean install. If there is anything I can do to help,
I will reinstall M2 then M3 from DVD if you need install logs etc.
Comment 15 dennis drown 2013-05-24 20:11:08 CEST
Comment on attachment 4044
attchment is M2 fdisk

>[root@localhost ~]# fdisk -l
>
>Disk /dev/sda: 500.1 GB, 500107862016 bytes
>255 heads, 63 sectors/track, 60801 cylinders, total 976773168 sectors
>Units = sectors of 1 * 512 = 512 bytes
>Sector size (logical/physical): 512 bytes / 512 bytes
>I/O size (minimum/optimal): 512 bytes / 512 bytes
>Disk identifier: 0x0a30c408
>
>   Device Boot      Start         End      Blocks   Id  System
>/dev/sda1   *          63   807185924   403592931    7  HPFS/NTFS/exFAT
>/dev/sda2       807185925   976768064    84791070    f  W95 Ext'd (LBA)
>/dev/sda5       807185988   976768064    84791038+   7  HPFS/NTFS/exFAT
>
>Disk /dev/sdb: 1000.2 GB, 1000204886016 bytes
>255 heads, 63 sectors/track, 121601 cylinders, total 1953525168 sectors
>Units = sectors of 1 * 512 = 512 bytes
>Sector size (logical/physical): 512 bytes / 4096 bytes
>I/O size (minimum/optimal): 4096 bytes / 4096 bytes
>Disk identifier: 0xe7ca5714
>
>   Device Boot      Start         End      Blocks   Id  System
>/dev/sdb1   *        2048   981812474   490905213+   7  HPFS/NTFS/exFAT
>/dev/sdb2       981815247  1953520064   485852409    5  Extended
>Partition 2 does not start on physical sector boundary.
>/dev/sdb5       981815296  1022987069    20585887   83  Linux
>/dev/sdb6      1022990336  1031164154     4086909+  82  Linux swap / Solaris
>/dev/sdb7      1031168000  1953520064   461176032+  83  Linux
>
>Disk /dev/sdc: 2000.4 GB, 2000398934016 bytes
>255 heads, 63 sectors/track, 243201 cylinders, total 3907029168 sectors
>Units = sectors of 1 * 512 = 512 bytes
>Sector size (logical/physical): 512 bytes / 4096 bytes
>I/O size (minimum/optimal): 4096 bytes / 4096 bytes
>Disk identifier: 0x0f73c3c8
>
>   Device Boot      Start         End      Blocks   Id  System
>/dev/sdc1   *        2048  1902899249   951448601    7  HPFS/NTFS/exFAT
>/dev/sdc2      1902903282  3907024064  1002060391+   5  Extended
>Partition 2 does not start on physical sector boundary.
>/dev/sdc5      1902903296  3907024064  1002060384+  83  Linux

this is my M2 fdisk
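
The two listings can be compared by matching the "Disk identifier" lines, which stay with the physical disk while the sdX names move. Below is a minimal sketch, assuming the `fdisk -l` layout shown in this report; the function names are made up for illustration, and the sample text is trimmed from the M2 and M3 listings above. Note that the identifier is only the MBR disk signature, so it is not guaranteed to be unique on every system.

```python
import re


def disk_ids(fdisk_output):
    """Return {disk identifier: device} parsed from `fdisk -l` text."""
    ids = {}
    device = None
    for line in fdisk_output.splitlines():
        line = line.lstrip("> ")  # tolerate quoted ('>') listings
        m = re.match(r"Disk (/dev/\w+):", line)
        if m:
            device = m.group(1)
            continue
        m = re.match(r"Disk identifier: (0x[0-9a-fA-F]+)", line)
        if m and device:
            ids[m.group(1)] = device
    return ids


def renumbering(old_output, new_output):
    """Map each device name in the old listing to its name in the new one."""
    old, new = disk_ids(old_output), disk_ids(new_output)
    return {old[i]: new[i] for i in old if i in new}


M2 = """\
Disk /dev/sda: 500.1 GB, 500107862016 bytes
Disk identifier: 0x0a30c408
Disk /dev/sdb: 1000.2 GB, 1000204886016 bytes
Disk identifier: 0xe7ca5714
Disk /dev/sdc: 2000.4 GB, 2000398934016 bytes
Disk identifier: 0x0f73c3c8
"""

M3 = """\
Disk /dev/sda: 2000.4 GB, 2000398934016 bytes, 3907029168 sectors
Disk identifier: 0x0f73c3c8
Disk /dev/sdb: 500.1 GB, 500107862016 bytes, 976773168 sectors
Disk identifier: 0x0a30c408
Disk /dev/sdc: 1000.2 GB, 1000204886016 bytes, 1953525168 sectors
Disk identifier: 0xe7ca5714
"""

for old_name, new_name in renumbering(M2, M3).items():
    print(old_name, "->", new_name)
```

For the listings in this report, the 500.1 GB disk moves from sda (M2) to sdb (M3), the 1000.2 GB disk from sdb to sdc, and the 2000.4 GB disk from sdc to sda.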
Comment 16 Tony Blackwell 2014-03-15 20:28:38 CET
This is my (very old) bug, although Dennis contributed more recently.  It is still true, but only applies to hardware with a mixture of an IDE boot disk and other SATA disks.  The final current state (as of Mageia 4 version 2, in QA) is that at least the install disk and the rebooted system see the disks the same way, so reboot is OK (they both think the 'primary' disk is sdb, but that doesn't really matter).

Closing this - not an active issue anymore.

Status: REOPENED => RESOLVED
Resolution: (none) => FIXED

