Bug 27450

Summary: Apache hangs on systemctl stop/restart httpd.service
Product: Mageia
Reporter: Marc Krämer <mageia>
Component: RPM Packages
Assignee: Shlomi Fish <shlomif>
Status: RESOLVED OLD
QA Contact:
Severity: normal
Priority: Normal
CC: davidwhodgins
Version: 8
Keywords: Triaged
Target Milestone: ---
Hardware: All
OS: Linux
Whiteboard:
Source RPM: apache-2.4.41-1.2.mga7.x86_64
CVE:
Status comment:

Description Marc Krämer 2020-10-19 09:38:40 CEST
After Apache has been running for some time,
systemctl stop httpd

hangs forever.
Maybe it is not a good idea to use SIGWINCH here, which should gracefully stop httpd. A hanging httpd (which is in fact what we have) is worse than losing a few requests and serving quickly again.


Btw., can we add
apachectl -t
before startup/restart? If you change the config and do a restart, it fails when the config is not correct, so you have downtime. It would be nice to check the config before killing httpd and trying to restart it.
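
The check-before-restart idea above can be sketched as a small shell wrapper. This is only an illustration, not what the package does: the function name safe_restart and the CHECK_CMD/RESTART_CMD variables are hypothetical, and the defaults assume that apachectl and the httpd.service unit are present.

```shell
#!/bin/sh
# Hypothetical helper: run a config check and restart only if it passes.
# CHECK_CMD and RESTART_CMD are assumptions made for this sketch; they can be
# overridden so the logic can be tried on a machine without Apache.
CHECK_CMD="${CHECK_CMD:-apachectl -t}"
RESTART_CMD="${RESTART_CMD:-systemctl restart httpd.service}"

safe_restart() {
    # The variables are left unquoted on purpose so that "apachectl -t"
    # is split into a command and its argument.
    if $CHECK_CMD; then
        $RESTART_CMD
    else
        echo "config check failed; httpd was not restarted" >&2
        return 1
    fi
}
```

Calling safe_restart leaves the running server untouched when the check fails, which is the behaviour requested above.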
Comment 1 Aurelien Oudelet 2020-10-19 20:21:51 CEST
Hi, thanks for reporting this.

Does doing this prevent Apache from running properly again?
If you have not modified Apache's configuration while it has been running for a long time, it should run again.

The WINCH signal can be sent with:
# apachectl -k graceful-stop
This command automatically checks the configuration files as in configtest before initiating the restart to make sure Apache doesn't die.

Meanwhile, /usr/lib/systemd/system/httpd.service on M7 is:
[Unit]
Description=The Apache HTTP Server
After=network.target remote-fs.target nss-lookup.target

[Service]
Type=notify
Environment=LANG=C
EnvironmentFile=-/etc/sysconfig/httpd
ExecStart=/usr/sbin/httpd $OPTIONS -DFOREGROUND
ExecReload=/usr/sbin/httpd $OPTIONS -k graceful
# Send SIGWINCH for graceful stop
KillSignal=SIGWINCH
KillMode=mixed

[Install]
WantedBy=multi-user.target

I don't see an error here, as it must send SIGWINCH, and "ExecReload=/usr/sbin/httpd $OPTIONS -k graceful".
And this is like running apachectl -t, according to the apachectl man page:

apachectl graceful
Gracefully restarts the Apache httpd daemon. [...] This command automatically checks the configuration files as in configtest before initiating the restart to make sure Apache doesn't die. This is equivalent to apachectl -k graceful.

Assigning to the registered maintainer for advice.
(Please set the status to 'assigned' if you are working on it)

Keywords: (none) => Triaged
Summary: apache hangs on stop/restart => Apache hangs on systemctl stop/restart httpd.service
Assignee: bugsquad => shlomif

Comment 2 David Walser 2020-10-21 22:24:04 CEST
Our upstream is here:
https://src.fedoraproject.org/rpms/httpd/blob/master/f/httpd.service

We don't have the httpd-init because we do it in package scriptlets instead.  I definitely think the graceful/winch stuff is correct; the default assumed behavior shouldn't be that it hangs.  If you experience that on your own system, kill it manually.

I don't think forcing it to check the config every time is necessary either.  If you change it, you can do that yourself.
Comment 3 Marc Krämer 2020-10-22 17:57:46 CEST
@David: most services have disabled graceful features and there are good reasons for it.
1. while the server may still serve one request during restart, it is completely unresponsive to new requests
2. having h2 or keep-alive enabled will put apache in a state where it waits for a long time compared to the time needed for the restart
3. initiating a reboot (e.g. remotely) will hang the server until apache finally gets killed

For the config check (which is not very expensive), I'm quite sure we had this feature before we migrated to systemd, but I'm too lazy to check that in svn - and I think it does not matter anyway. For a reload it is not nice to have the server "crash" just by reloading the config. It would be more polite to say "reload is not possible since there is an error in your config".
The usual workflow is to change a setting, think (let's be honest) that you've done it correctly because it was just a small change, and try to apply it with systemctl reload httpd. Then the server has crashed (because you forgot to check the syntax first) and now you are in a hurry, as the server should be running and you don't have the time to really search for the error. Software should prevent us from making mistakes.
Comment 4 Marc Krämer 2020-11-06 20:48:05 CET
FYI: apache reload always works as expected, but restart or stop always hangs, so I assume KillSignal or KillMode is the problem. I'm not sure sending SIGWINCH to all apache processes is correct...


If I read the systemd.kill man page correctly:
"If set to mixed, the SIGTERM signal (see below *=KillSignal *) is sent to the main process while the subsequent SIGKILL signal (see below *=FinalKillSignal*) is sent to all remaining processes of the unit's control group."

This means all workers receive FinalKillSignal (SIGKILL if not set), and the main process receives KillSignal, which in our case is set to SIGWINCH.
According to https://httpd.apache.org/docs/2.4/en/stopping.html, SIGWINCH should only be sent to the parent process and no signal should be passed to the children.

"The graceful-stop signal allows you to run multiple identically configured instances of httpd at the same time. This is a powerful feature when performing graceful upgrades of httpd, however it can also cause deadlocks and race conditions with some configurations."

At the very least, each service using graceful stop should always set
TimeoutStopSec=
This will prevent services from hanging "forever" on stop. I think this should not exceed 30s!
This will e.g. help prevent a shutdown from ending in a deadlock or waiting forever for httpd to stop.
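
A stop timeout like this could be set locally with a systemd drop-in rather than by editing the packaged unit. The following is a minimal sketch under assumptions: the function name write_stop_timeout_dropin and the file name stop-timeout.conf are hypothetical, and the directory and the value of 30 seconds are only the examples discussed above.

```shell
#!/bin/sh
# Hypothetical helper: write a drop-in that caps how long systemd waits
# for the service to stop before it escalates to the final kill signal.
write_stop_timeout_dropin() {
    dir="$1"      # e.g. /etc/systemd/system/httpd.service.d
    seconds="$2"  # e.g. 30
    mkdir -p "$dir" || return 1
    # stop-timeout.conf is an arbitrary name chosen for this sketch.
    printf '[Service]\nTimeoutStopSec=%s\n' "$seconds" > "$dir/stop-timeout.conf"
}
```

For example, write_stop_timeout_dropin /etc/systemd/system/httpd.service.d 30 followed by systemctl daemon-reload would apply the limit on the next stop.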
Comment 5 Dave Hodgins 2020-11-06 21:35:05 CET
Just fyi, I cannot recreate the problem on my Mageia 7 x86_64 install ...
[root@x3 ~]# systemctl status httpd.service 
● httpd.service - The Apache HTTP Server
   Loaded: loaded (/usr/lib/systemd/system/httpd.service; disabled; vendor preset: disabled)
   Active: active (running) since Mon 2020-11-02 15:21:56 EST; 4 days ago
 Main PID: 1533 (/usr/sbin/httpd)
   Status: "Total requests: 149; Idle/Busy workers 100/0;Requests/sec: 0.00043; Bytes served/sec:   0 B/sec"
    Tasks: 9 (limit: 4915)
   Memory: 30.3M
   CGroup: /system.slice/httpd.service
           ├─ 1533 /usr/sbin/httpd -DFOREGROUND
           ├─ 1567 /usr/sbin/httpd -DFOREGROUND
           ├─ 1568 /usr/sbin/httpd -DFOREGROUND
           ├─ 1569 /usr/sbin/httpd -DFOREGROUND
           ├─ 1570 /usr/sbin/httpd -DFOREGROUND
           ├─ 1571 /usr/sbin/httpd -DFOREGROUND
           ├─ 1572 /usr/sbin/httpd -DFOREGROUND
           ├─ 3278 /usr/sbin/httpd -DFOREGROUND
           └─31295 /usr/sbin/httpd -DFOREGROUND

Nov 02 15:21:55 x3.hodgins.homeip.net systemd[1]: Starting The Apache HTTP Server...
Nov 02 15:21:56 x3.hodgins.homeip.net systemd[1]: Started The Apache HTTP Server.
[root@x3 ~]# systemctl stop httpd.service 
[root@x3 ~]# systemctl status httpd.service 
● httpd.service - The Apache HTTP Server
   Loaded: loaded (/usr/lib/systemd/system/httpd.service; disabled; vendor preset: disabled)
   Active: inactive (dead)

Nov 02 15:21:55 x3.hodgins.homeip.net systemd[1]: Starting The Apache HTTP Server...
Nov 02 15:21:56 x3.hodgins.homeip.net systemd[1]: Started The Apache HTTP Server.
Nov 06 15:32:47 x3.hodgins.homeip.net systemd[1]: Stopping The Apache HTTP Server...
Nov 06 15:32:48 x3.hodgins.homeip.net systemd[1]: httpd.service: Succeeded.
Nov 06 15:32:48 x3.hodgins.homeip.net systemd[1]: Stopped The Apache HTTP Server.

CC: (none) => davidwhodgins

Comment 6 Marc Krämer 2020-11-07 11:29:32 CET
@Dave: thanks for your reply. I think you are running a test environment. I run a real-world server, configured with php-fpm (using mod-proxy).
Yesterday, while reading up for this bug, I stumbled across https://httpd.apache.org/docs/2.4/en/mod/mpm_common.html#gracefulshutdowntimeout

Setting this to e.g. 2 in apache config seems to solve the issue. The default is to wait forever (which is what I was seeing).
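
As a rough sketch of how the directive could be added without duplicating it, assuming a config file that httpd actually includes (the path is up to the admin, and the function name set_graceful_shutdown_timeout is hypothetical):

```shell
#!/bin/sh
# Hypothetical helper: append GracefulShutdownTimeout to a config file
# unless the directive is already set in that file.
set_graceful_shutdown_timeout() {
    conf="$1"     # path to an httpd config file (location is an assumption)
    seconds="$2"  # e.g. 2, as tried above
    if grep -Eiq '^[[:space:]]*GracefulShutdownTimeout[[:space:]]' "$conf" 2>/dev/null; then
        echo "GracefulShutdownTimeout already set in $conf" >&2
        return 0
    fi
    printf 'GracefulShutdownTimeout %s\n' "$seconds" >> "$conf"
}
```

The helper only looks at the one file it is given, so a value set elsewhere in the configuration would still need to be checked by hand.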

You can still see the effect:
[root@borachio ~]# systemctl status httpd.service 
● httpd.service - The Apache HTTP Server
   Loaded: loaded (/usr/lib/systemd/system/httpd.service; enabled; vendor preset: disabled)
  Drop-In: /etc/systemd/system/httpd.service.d
           └─mount.conf
   Active: active (running) since Fri 2020-10-16 18:54:18 CEST; 3 weeks 0 days ago
  Process: 25437 ExecReload=/usr/sbin/httpd $OPTIONS -k graceful (code=exited, status=0/SUCCESS)
 Main PID: 3768 (httpd)
   Status: "Total requests: 2763257; Idle/Busy workers 98/2;Requests/sec: 1.47; Bytes served/sec:  21KB/sec"
    Tasks: 257 (limit: 4915)
   Memory: 1.9G
   CGroup: /system.slice/httpd.service
           ├─ 3768 /usr/sbin/httpd -DFOREGROUND
           ├─25439 /usr/sbin/httpd -DFOREGROUND
           ├─25440 /usr/sbin/httpd -DFOREGROUND
           ├─25447 /usr/sbin/httpd -DFOREGROUND
           └─25635 /usr/sbin/httpd -DFOREGROUND

Nov 05 05:37:27 borachio.domain.de systemd[1]: Reloading The Apache HTTP Server.
Nov 05 05:37:27 borachio.domain.de systemd[1]: Reloaded The Apache HTTP Server.
Nov 05 15:33:40 borachio.domain.de systemd[1]: Reloading The Apache HTTP Server.
Nov 05 15:33:40 borachio.domain.de systemd[1]: Reloaded The Apache HTTP Server.

[root@borachio ~]# ps aux|grep httpd
root      3768  0.0  0.0  55584 50276 ?        Ss   Okt16   1:17 /usr/sbin/httpd -DFOREGROUND
root     22010  0.0  0.0  22640   760 pts/0    S+   11:13   0:00 grep --color httpd
apache   25439  0.0  0.1 4709000 186992 ?      Sl   Nov06   0:34 /usr/sbin/httpd -DFOREGROUND
apache   25440  0.0  0.1 4709000 173872 ?      Sl   Nov06   0:33 /usr/sbin/httpd -DFOREGROUND
apache   25447  0.0  0.1 4446856 176656 ?      Sl   Nov06   0:41 /usr/sbin/httpd -DFOREGROUND
apache   25635  0.1  0.2 4709000 272796 ?      Sl   Nov06   1:33 /usr/sbin/httpd -DFOREGROUND

[root@borachio ~]# systemctl stop httpd.service  <------- hangs for some time

other console shows this during stop:
[root@borachio ~]# ps aux|grep httpd
root      3768  0.0  0.0  55584 50276 ?        Ss   Okt16   1:17 /usr/sbin/httpd -DFOREGROUND
root     22043  0.0  0.0  34168  5848 pts/0    S+   11:14   0:00 systemctl stop httpd.service
root     22049  0.0  0.0  22640   760 pts/1    S+   11:14   0:00 grep --color httpd
apache   25440  0.0  0.0      0     0 ?        Z    Nov06   0:33 [httpd] <defunct>
apache   25447  0.0  0.0      0     0 ?        Z    Nov06   0:41 [httpd] <defunct>
apache   25635  0.1  0.0      0     0 ?        Z    Nov06   1:33 [httpd] <defunct>
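
The <defunct> entries above are workers in zombie state (Z): they have exited but have not yet been reaped by the parent. A small sketch for picking them out of ps output, assuming input in the format of ps -eo pid,stat,comm; the function name zombie_httpd_pids is hypothetical:

```shell
#!/bin/sh
# Hypothetical helper: read "pid stat comm" lines on stdin and print the
# PIDs of httpd processes whose state starts with Z (zombie).
zombie_httpd_pids() {
    awk '$2 ~ /^Z/ && $3 ~ /^httpd/ { print $1 }'
}
```

It could be used as: ps -eo pid,stat,comm | zombie_httpd_pids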




From my perspective I still suggest:
a) adding GracefulShutdownTimeout to the apache config (documented, and with some default)
b) adding a realistic hard timeout to systemd services that try to do a graceful stop/restart, as these services may get stuck and the result is not what is desired
Comment 7 Dave Hodgins 2020-11-07 22:34:47 CET
I'm somewhat against choosing a default other than 0 (wait forever) as there
will be very slow systems where whatever we choose is not enough while on
fast systems it will be seen as too much.

My preference would be to add or alter a urpmi.readme file in the appropriate
package that explains how to set the value.

On those systems that are affected, the sysadmin can choose a proper timeout.
Comment 8 Aurelien Oudelet 2021-07-06 13:16:49 CEST
Mageia 7 has been EOL since July 1st, 2021.
There will be no further bugfixes for this release.

You are encouraged to upgrade to Mageia 8 as soon as possible.

@reporter, if this bug still applies to Mageia 8, please let us know.

@packager, if you work on the Mageia 7 version of your package, please check whether the issue is also present in the Mageia 8 package. In that case, please fix the Mageia 8 version instead.

This bug report will be closed as OLD if there is no further notice by September 1st, 2021.
Marc Krämer 2021-07-06 13:33:35 CEST

Version: 7 => 8

Comment 9 Marc Krämer 2023-04-25 12:50:51 CEST
already fixed

Resolution: (none) => OLD
Status: NEW => RESOLVED