Bug 19434 - trying to switch to another network profile during boot makes the system hang forever (typing the number of the last used profile works ok)
Summary: trying to switch to another network profile during boot makes the system hang...
Status: NEW
Alias: None
Product: Mageia
Classification: Unclassified
Component: RPM Packages
Version: Cauldron
Hardware: x86_64 Linux
Priority: Normal normal
Target Milestone: ---
Assignee: Mageia tools maintainers
QA Contact:
URL:
Whiteboard: MGA5TOO
Keywords:
Depends on:
Blocks:
 
Reported: 2016-09-21 22:29 CEST by shelandy ting
Modified: 2019-03-24 09:46 CET
CC List: 4 users

See Also:
Source RPM: drakx-net
CVE:
Status comment:


Attachments
hang after selecting a profile, continue with magic keys (235.44 KB, text/plain)
2016-09-29 19:59 CEST, Marja Van Waes
the log of the 2nd boot (cannot get the log of the 1st one because it hung) (281.31 KB, text/plain)
2016-09-29 23:22 CEST, shelandy ting
the journal file containing failing and successful boot info (351.91 KB, text/plain)
2016-10-08 01:26 CEST, shelandy ting

Description shelandy ting 2016-09-21 22:29:30 CEST
*Description of problem:

I cannot select the network profile with the keyboard during boot.

* When prompted to specify the network profile, I type either the number (e.g., "1", "2", "3"...) or the name (e.g., "default") of the profile. The boot process stops as soon as I hit a key. The only way to finish booting is not to touch the keyboard and let the system use the last-used network profile.

* I tried rebuilding initrd.img as mentioned in https://bugs.mageia.org/show_bug.cgi?id=6170, but this did not help. See the log below.

* However, the keyboard works before that point: I boot with the kernel option "splash", yet I can press the "Esc" key at the beginning to switch to text mode and monitor the messages during the boot process.

* Version-Release number of selected component (if applicable):
Mageia 5.0, kernel 4.4.16, 64-bit

How reproducible:

Steps to Reproduce:
1. Use draknetprofile to create at least a second profile
2. Reboot the system
3. When prompted to specify the network profile, type either the number (1, 2, 3...) or the name of the profile.

# dracut  initrd-4.4.16-desktop-1-recut.mga5.img
Executing: /usr/bin/dracut  initrd-4.4.16-desktop-1-recut.mga5.img
dracut module 'bootchart' will not be installed, because command '/sbin/bootchartd' could not be found!
dracut module 'caps' will not be installed, because command 'capsh' could not be found!
dracut module 'busybox' will not be installed, because command 'busybox' could not be found!
dracut module 'network' will not be installed, because it's in the list to be omitted!
dracut module 'network' will not be installed, because it's in the list to be omitted!
dracut module 'ifcfg' depends on 'network', which can't be installed
dracut module 'dmraid' will not be installed, because command 'dmraid' could not be found!
dracut module 'lvm' will not be installed, because command 'lvm' could not be found!
dracut module 'mdraid' will not be installed, because command 'mdadm' could not be found!
dracut module 'multipath' will not be installed, because command 'multipath' could not be found!
dracut module 'fcoe' will not be installed, because command 'dcbtool' could not be found!
dracut module 'fcoe' will not be installed, because command 'fipvlan' could not be found!
dracut module 'fcoe' will not be installed, because command 'lldpad' could not be found!
dracut module 'fcoe-uefi' will not be installed, because command 'dcbtool' could not be found!
dracut module 'fcoe-uefi' will not be installed, because command 'fipvlan' could not be found!
dracut module 'fcoe-uefi' will not be installed, because command 'lldpad' could not be found!
dracut module 'iscsi' will not be installed, because command 'iscsistart' could not be found!
dracut module 'iscsi' will not be installed, because command 'iscsi-iname' could not be found!
dracut module 'nbd' will not be installed, because command 'nbd-client' could not be found!
95nfs: Could not find any command of 'rpcbind portmap'!
dracut module 'biosdevname' will not be installed, because command 'biosdevname' could not be found!
dracut module 'systemd' will not be installed, because it's in the list to be omitted!
dracut module 'caps' will not be installed, because command 'capsh' could not be found!
dracut module 'busybox' will not be installed, because command 'busybox' could not be found!
dracut module 'ifcfg' depends on 'network', which can't be installed
dracut module 'dmraid' will not be installed, because command 'dmraid' could not be found!
dracut module 'lvm' will not be installed, because command 'lvm' could not be found!
dracut module 'mdraid' will not be installed, because command 'mdadm' could not be found!
dracut module 'multipath' will not be installed, because command 'multipath' could not be found!
dracut module 'fcoe' will not be installed, because command 'dcbtool' could not be found!
dracut module 'fcoe' will not be installed, because command 'fipvlan' could not be found!
dracut module 'fcoe' will not be installed, because command 'lldpad' could not be found!
dracut module 'fcoe-uefi' will not be installed, because command 'dcbtool' could not be found!
dracut module 'fcoe-uefi' will not be installed, because command 'fipvlan' could not be found!
dracut module 'fcoe-uefi' will not be installed, because command 'lldpad' could not be found!
dracut module 'iscsi' will not be installed, because command 'iscsistart' could not be found!
dracut module 'iscsi' will not be installed, because command 'iscsi-iname' could not be found!
dracut module 'nbd' will not be installed, because command 'nbd-client' could not be found!
95nfs: Could not find any command of 'rpcbind portmap'!
*** Including module: bash ***
*** Including module: dash ***
*** Including module: i18n ***
*** Including module: drm ***
*** Including module: plymouth ***
*** Including module: kernel-modules ***
*** Including module: resume ***
*** Including module: rootfs-block ***
*** Including module: terminfo ***
*** Including module: udev-rules ***
Skipping udev rule: 91-permissions.rules
Skipping udev rule: 80-drivers-modprobe.rules
*** Including module: usrmount ***
*** Including module: base ***
*** Including module: fs-lib ***
*** Including module: shutdown ***
*** Including modules done ***
*** Installing kernel module dependencies and firmware ***
*** Installing kernel module dependencies and firmware done ***
*** Resolving executable dependencies ***
*** Resolving executable dependencies done***
*** Stripping files ***
*** Stripping files done ***
*** Generating early-microcode cpio image ***
*** Constructing GenuineIntel.bin ****
*** Store current command line parameters ***
*** Creating image file ***
*** Creating image file done ***
Comment 1 Marja Van Waes 2016-09-22 22:52:42 CEST
Not sure this is a kernel issue, but the kernel maintainers will know how to find out :-)

@ Shelandy

Does it work when you boot with an older kernel?

CC: sysadmin-bugs => mageiatools, marja11
Component: Release (media or process) => RPM Packages
Assignee: bugsquad => kernel

Comment 2 shelandy ting 2016-09-25 18:57:49 CEST
I did not know that the network profile could be switched until recently, when I could not get a WPA2-Enterprise connection to work. I never tested this with the older kernels I used in the past, and I cannot test it now, since I only keep the latest kernel on my space-constrained SSD.
Comment 3 Marja Van Waes 2016-09-25 19:25:07 CEST
Please, after shutting your system down: 
* note the time
* boot your system two times
** where during the first boot you try to type the profile number when asked
** and the 2nd boot you don't.

After having successfully booted up the second time:
* please start a terminal/konsole
* become root.

Then type:

   journalctl --since="YYYY-MM-DD hh:mm" > journal.txt

where YYYY-MM-DD is today's date and hh:mm is the time you noted, e.g.: journalctl --since="2016-09-25 19:25" > journal.txt

Please attach journal.txt to this bug report
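The procedure above can be sketched as a small shell script. This is only an illustration under assumptions: the function names note_time and dump_journal_since and the file /var/tmp/noted-time.txt are hypothetical (/var/tmp is chosen because it normally survives a reboot); date and journalctl --since are the same tools used above.

```shell
#!/bin/sh
# Sketch of the procedure above; the helper names are hypothetical.

# Print the current time as "YYYY-MM-DD hh:mm", a format
# that journalctl --since accepts.
note_time() {
    date '+%Y-%m-%d %H:%M'
}

# Write every journal entry logged since the given time to journal.txt
# (run as root after the second, successful boot).
dump_journal_since() {
    journalctl --since="$1" > journal.txt
}

# Usage:
#   note_time > /var/tmp/noted-time.txt                  # before shutting down
#   dump_journal_since "$(cat /var/tmp/noted-time.txt)"  # after the 2nd boot
```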
Comment 4 Marja Van Waes 2016-09-29 19:59:23 CEST
Created attachment 8468
hang after selecting a profile, continue with magic keys

I could reproduce this in cauldron

After creating a test profile with draknetprofile

[marja@cldrn_64 ~]$ ls -al /etc/netprofile/profiles/
totaal 16
drwxr-xr-x 4 root root 4096 sep 29 13:51 ./
drwxr-xr-x 4 root root 4096 feb  8  2016 ../
drwxr-xr-x 9 root root 4096 sep 29 13:51 default/
drwxrwxr-x 9 root root 4096 sep 29 13:51 test/
[marja@cldrn_64 ~]$ cat /etc/netprofile/current
test
[marja@cldrn_64 ~]$

I tried to switch from the activated test profile to the default one during boot up.

It prevented my system from finishing the boot process; it just hung.
However, I did not wait 10 minutes (I intend to do that later).

I failed to see anything useful in the logs, but I'll attach the logs of the apparent freeze anyway. When switching between Plymouth and text mode, both kept showing the request to make a choice, even though I had already chosen:

     Select Network Profile:
     (1)default (2)test*

In the attached logs, you can see that around 19:00 I tried to continue with Alt+SysRq+E, which succeeded.
Despite having typed "1", the test profile was used.
Comment 5 Marja Van Waes 2016-09-29 20:00:35 CEST
I strongly doubt this is a kernel issue, so I'm changing the assignee.

Version: 5 => Cauldron
Assignee: kernel => mageiatools
Whiteboard: (none) => MGA5TOO

Marja Van Waes 2016-09-29 20:01:10 CEST

Source RPM: (none) => drakx-net

Comment 6 shelandy ting 2016-09-29 23:05:53 CEST
I followed the suggestion and ran a test. I booted the 4.4.16 kernel at 3:54. I have three profiles, the first one being the default, so I pressed "3" when the network profile selection appeared. I waited 6 minutes with no progress at all (I have benchmarked my system before: this SSD boots in 16 seconds). Since the boot hangs after pressing a number key, there is no way for me to get journal.txt from that boot. I cannot even use Ctrl+Alt+Del to shut down; I had to use the power button to force a reboot at 4:03.
Comment 7 shelandy ting 2016-09-29 23:22:02 CEST
Created attachment 8470
the log of the 2nd boot (cannot get the log of the 1st one because it hung)

This is the log from rebooting the 4.4.16 kernel at 4:03 pm.
I cannot get the log of the previous boot at 3:54, where I pressed "3" when the network profile selection appeared. It hung there for over 6 minutes, and I had to use the power button to force a reboot.
Comment 8 Marja Van Waes 2016-09-30 14:51:57 CEST
(In reply to shelandy ting from comment #7)

> I can not get the previous on at 3:54 where I press "3"  when the network
> profile selection shown up.  It hung there over 6 mins. I have to use power
> button to force it reboot

Those logs should exist and be available either with "--since=", as explained in comment 3, or with, e.g.:

    journalctl -ab -1 > output.txt


-1 is the journal from 1 boot ago
-2 is the journal from 2 boots ago
-3 is the one from 3 boots ago
etc.
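Saving the journals of several earlier boots in one go could look like the sketch below. The function name save_previous_boots, the output file names and the JOURNALCTL override (set it to echo for a dry run) are hypothetical; journalctl -a -b -N is the same invocation as above. Earlier boots are only available when the journal is stored persistently, and journalctl --list-boots shows which offsets exist.

```shell
#!/bin/sh
# Hypothetical helper: save the journals of the last N boots (default 3),
# one file per boot: boot-minus-1.txt, boot-minus-2.txt, ...
# Set JOURNALCTL=echo to print the arguments instead of querying the journal.
save_previous_boots() {
    n=${1:-3}
    k=1
    while [ "$k" -le "$n" ]; do
        # -b -K selects the boot K boots ago; -a shows all fields in full.
        "${JOURNALCTL:-journalctl}" -a -b "-$k" > "boot-minus-$k.txt"
        k=$((k + 1))
    done
}

# Usage (as root):
#   journalctl --list-boots   # which boot offsets are available
#   save_previous_boots 2     # writes boot-minus-1.txt and boot-minus-2.txt
```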

I hope there's more in your log than in mine.


By the way, typing the password for an encrypted partition during boot (in another Cauldron installation on the same machine) causes no problems at all here.

Typing the network profile selection causes the problem both with Plymouth shown and with only log messages shown ("splash=verbose").
The hang happens before any attempt is made to start a display manager (and, of course, it also happens when booting into runlevel 3).
Comment 9 Marja Van Waes 2016-10-07 23:40:41 CEST
When selecting the last activated network profile by pressing the corresponding number as soon as you're prompted to choose, the message that this network profile will be used appears _immediately_ (much faster than when not choosing).

The system only hangs forever when typing the number of a network profile _different_ from the last activated one.

I've tried this both with network profile 1 as last used one, and with network profile 2 as last used one. (I have not tried typing a network profile name.)

Summary: network profile can not be selected by keyboard during the boot time => trying to switch to another network profile during boot makes the system hang forever (typing the number of the last used profile works ok)

Comment 10 shelandy ting 2016-10-08 01:26:52 CEST
Created attachment 8507
the journel file containing failing and sucessful boot info

Lines 1115 and 1116 seem to be the turning point for the failing boot:

Oct 07 17:28:45 pipa kernel: input: SynPS/2 Synaptics TouchPad as /devices/platform/i8042/serio4/input/input17
Oct 07 17:30:07 pipa alsactl[814]: alsactl daemon stopped

whereas at line 2236, in the boot that succeeded, it goes on to load OSS Proxy:

Oct 07 17:30:59 pipa kernel: input: SynPS/2 Synaptics TouchPad as /devices/platform/i8042/serio4/input/input17
Oct 07 17:31:00 pipa osspd[880]: OSS Proxy v1.3.2 (C) 2008-2010 by Tejun Heo <teheo@suse.de>

It looks like something happens after line 1115 (kernel: input: SynPS/2 Synaptics TouchPad) that prevents the boot from going through; eventually I had to hit the power button to shut down the machine.
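A long silence in the journal, like the 82 seconds between 17:28:45 and 17:30:07 in the failing boot above, can be located automatically. The sketch below assumes journalctl's default "short" output ("Oct 07 17:28:45 host ..."), where the third field is the hh:mm:ss time; the function name find_gaps and the 30-second default threshold are hypothetical, and gaps spanning midnight are not handled.

```shell
#!/bin/sh
# Hypothetical helper: print each journal line that follows a silence of
# more than MAX seconds (second argument, default 30), prefixed with its
# line number and the length of the gap.
# Assumes journalctl's default "short" format ("Oct 07 17:28:45 host ...");
# lines without a time in field 3 (e.g. "-- Reboot --") are skipped, and a
# gap that crosses midnight is not detected.
find_gaps() {
    awk -v max="${2:-30}" '
        $3 !~ /^[0-9][0-9]:[0-9][0-9]:[0-9][0-9]$/ { next }
        {
            split($3, t, ":")
            now = t[1] * 3600 + t[2] * 60 + t[3]
            if (seen && now - prev > max)
                printf "%d: +%ds: %s\n", NR, now - prev, $0
            prev = now
            seen = 1
        }' "$1"
}

# Usage:
#   find_gaps journal.txt 30
```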

CC: (none) => shelandy

Comment 11 Marja Van Waes 2016-10-08 08:56:18 CEST
@ shelandy ting

Can you confirm that typing the number of the last used profile works OK during boot?

(if not, then I'll revert the summary of this report and open a separate report for my findings)
Comment 12 shelandy ting 2016-10-09 06:04:20 CEST
Yes, I tried typing the number of the last used profile, and the boot goes through.

So I can confirm that what Marja van Waes described above about selecting the last used profile is reproducible on my machine.
Comment 13 Marja Van Waes 2016-10-13 14:45:46 CEST
This may be a red herring, but there is something very odd. I created a second profile and switched to it:

sep 29 13:51:09 cldrn_64 draknetprofile[2772]: ### Program is starting ###
sep 29 13:51:18 cldrn_64 draknetprofile[2772]: launched command: /sbin/netprofile switch test
sep 29 13:51:20 cldrn_64 draknetprofile[2772]: Switching to "test" profile
sep 29 13:51:38 cldrn_64 draknetprofile[2772]: ### Program is exiting ###

Since then I've often seen many such messages:
sep 29 18:46:04 cldrn_64 pam_timestamp_check[5586]: PAM `/' permissions are lax


[marja@cldrn_64 ~]$ ll -d /
drwxrwxr-x 25 root root 4096 sep 29 13:51 //
[marja@cldrn_64 ~]$ stat /
  File: '/'
  Size: 4096         Blocks: 8           IO Block: 4096   directory
Device: 806h/2054d   Inode: 2            Links: 25
Access: (0775/drwxrwxr-x)   Uid: (    0/    root)   Gid: (    0/    root)
Access: 2016-01-18 23:49:24.000000000 +0100
Modify: 2016-09-29 13:51:19.000000000 +0200
Change: 2016-10-12 09:59:40.883815568 +0200
Birth:  -
[marja@cldrn_64 ~]$ 

I've now changed the permissions of / to 0755, which immediately stopped those "PAM `/' permissions are lax" messages, but I doubt that will survive a reboot. If it does, I'm curious whether it will somehow help with switching to a different profile during boot.
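Checking whether / is in the state PAM complained about can be sketched as below. The assumption, not confirmed in this report, is that the "permissions are lax" warning means the directory is group- or world-writable (0775 instead of 0755 here); the function name is_lax is hypothetical, and stat -c '%a' is the GNU coreutils way to print the octal mode.

```shell
#!/bin/sh
# Hypothetical helper: succeed (exit status 0) if the given directory is
# group- or world-writable, e.g. mode 775 as seen on "/" above.
is_lax() {
    mode=$(stat -c '%a' "$1") || return 2
    # Treat the octal mode as an arithmetic constant (leading 0 = octal)
    # and test the group-write (020) and other-write (002) bits.
    [ $(( 0$mode & 022 )) -ne 0 ]
}

# Usage (as root):
#   is_lax / && chmod 0755 /   # tighten "/" back to 0755
```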
Comment 14 Marja Van Waes 2016-10-13 15:17:18 CEST
(In reply to Marja van Waes from comment #13)

> 
> I've now changed the / permissions to 0755, which immediately stopped those
> "PAM `/' permissions are lax" messages, but doubt that'll survive a reboot.
> If it does, then I'm curious whether that'll somehow magically help to allow
> switching to a different profile during boot.

It does not survive a reboot; it's 0775 again.

Sorry if this is totally unrelated. The timestamp may come from a very different package, but I don't know how to query which package set the permissions on / :-(
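rpm cannot tell which package's scriptlet changed a mode, but it can show which package owns / and whether the current mode differs from what that package specifies. The sketch below filters rpm -V (verify) output, where the second character of the nine-character flag column is "M" when the mode differs. The function name mode_changed is hypothetical, it assumes / is owned by a single package, and paths containing spaces are not handled.

```shell
#!/bin/sh
# Hypothetical helper: read `rpm -V` output on stdin and print the paths
# whose mode differs from the rpm database. The second character of the
# flag column is "M" in that case; the path is taken as the last field.
mode_changed() {
    awk 'substr($0, 2, 1) == "M" { print $NF }'
}

# Usage (as root on the affected system):
#   rpm -qf /                                   # package that owns "/"
#   rpm -qlv "$(rpm -qf /)" | awk '$NF == "/"'  # mode that package ships
#   rpm -V "$(rpm -qf /)" | mode_changed        # prints "/" if it differs
```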
Comment 15 Marja Van Waes 2016-10-15 13:50:06 CEST
The lax permissions I saw on / are not related to this bug.
I could reproduce (in a different Cauldron installation) the / directory getting a timestamp that matches the time when a second profile was created.
However, this time the permissions on the root directory did not change, and the bug occurred anyway.
Comment 16 Juergen Harms 2019-03-24 09:46:28 CET
Booting Cauldron (a fully updated Mageia-7 Beta-1 system; kernel 5.0.3-desktop-2-mga7), I hit what looks very much like this problem:

- I have a choice between 3 network profiles;
- if, when the network selection question appears, I let the system use the (default) previously used profile, everything goes well;
- if I type anything specific (digit, letter...), the system hangs; the only way out of the hang is a complete restart (power cycle);
- if the first key I hit is ESC, I get a console that shows the hang.

Note: I almost never use netprofile during boot; normally, if I need to change profiles, I do it with drakconf once the system is running. Therefore I have no idea whether the problem is new in Mageia-7 or is the old bug still lingering.

Drakconf is a perfectly valid workaround, but the nasty aspect of this bug is that the boot hangs if the keyboard is touched during the critical period. How about simply disabling the option to select a network profile during boot until this problem is solved?

CC: (none) => juergen.harms

