| Summary: | httpd dies at 4am when logrotate reloads it | ||
|---|---|---|---|
| Product: | Mageia | Reporter: | Pascal Terjan <pterjan> |
| Component: | RPM Packages | Assignee: | QA Team <qa-bugs> |
| Status: | RESOLVED FIXED | QA Contact: | |
| Severity: | major | ||
| Priority: | Normal | CC: | brtians1, dan, davidwhodgins, ftg, herman.viaene, luigiwalser, sysadmin-bugs, thierry.vignaud, tmb, zen25000 |
| Version: | 8 | Keywords: | advisory, validated_update |
| Target Milestone: | --- | ||
| Hardware: | All | ||
| OS: | Linux | ||
| Whiteboard: | MGA8-64-OK | ||
| Source RPM: | apache-2.4.51-1.mga8.src.rpm | CVE: | |
| Status comment: | |||
| Bug Depends on: | |||
| Bug Blocks: | 30543 | ||
|
Barry Jackson
2021-11-02 21:42:21 CET
CC:
(none) =>
zen25000 Thank you for the report. Assigning it globally because of the nature of the problem hoping that somebody might have a clue (CC'ing tmb to this end); in any case, Apache is updated by various people. Source RPM:
apache-2.4.51-1.mga8.src.rpm =>
apache-2.4.51-1.mga8.src.rpm, systemd-246.15-1.mga8.src.rpm If there's some kind of race condition on startup that's triggered by multiple restarts in quick succession, adding something like "RestartSec=3" to httpd.service might help. CC:
(none) =>
dan I added an untested workaround: # cat /etc/systemd/system/httpd.service.d/restart.conf [Service] Restart=on-failure RestartSec=5 It's broken today. Maybe it was overwritten by an update and we need a rule to set it? CC:
(none) =>
thierry.vignaud AL13N found an article with a solution: https://access.redhat.com/solutions/3557661 The "sharedscripts" is missing in the config. In this case, /bin/systemctl reload httpd.service will be run repeatedly to the number of log files in /var/log/httpd. For example, if you have access_log, error_log, ssl_access_log, and ssl_error_log in /var/log/httpd, it will be reloaded 4 times at one time, and it will sometimes cause segfault. Please add "sharedscripts" as follows: # cat /etc/logrotate.d/httpd /var/log/httpd/*log { missingok notifempty sharedscripts delaycompress postrotate /bin/systemctl reload httpd.service > /dev/null 2>/dev/null || true endscript } I have submitted the change in Cauldron. Hopefully someone can test and let us know how it fares. CC:
(none) =>
luigiwalser also added on infra on 2022-07-05: http://gitweb.mageia.org/infrastructure/puppet/commit/?id=9548ce7d0c189776b03cb6197fcceb15d14dfaa7 Nice, that should give us a good test case, as our web server has actually exhibited the issue. Nice, thanks you Other than the data center being down, I assume our web server was OK this morning? Considering all servers got shutdown / rebooted ... no idea... but generally it seems to have behaved so far since I committed the change... Change implemented in apache-2.4.54-1.1.mga8.src.rpm RPMS: apache-2.4.54-1.1.mga8 apache-mod_proxy-2.4.54-1.1.mga8 apache-devel-2.4.54-1.1.mga8 apache-mod_http2-2.4.54-1.1.mga8 apache-mod_ssl-2.4.54-1.1.mga8 apache-mod_dav-2.4.54-1.1.mga8 apache-mod_cache-2.4.54-1.1.mga8 apache-mod_ldap-2.4.54-1.1.mga8 apache-mod_session-2.4.54-1.1.mga8 apache-mod_proxy_html-2.4.54-1.1.mga8 apache-mod_dbd-2.4.54-1.1.mga8 apache-htcacheclean-2.4.54-1.1.mga8 apache-mod_suexec-2.4.54-1.1.mga8 apache-mod_userdir-2.4.54-1.1.mga8 apache-mod_brotli-2.4.54-1.1.mga8 apache-doc-2.4.54-1.1.mga8 It also has a change to hopefully address Bug 30543. We may need to wait for September 1st and the next PHP update to validate the changes, but at least we have something to test. Blocks:
(none) =>
30543 MGA8-64 Plasma on Acer Aspire 5253
No installation issues
# systemctl start httpd
[root@mach7 ~]# systemctl -l status httpd
● httpd.service - The Apache HTTP Server
Loaded: loaded (/usr/lib/systemd/system/httpd.service; disabled; vendor preset: disabled)
Active: active (running) since Thu 2022-08-04 15:09:17 CEST; 13s ago
Main PID: 9930 (httpd)
Status: "Processing requests..."
Tasks: 11 (limit: 4364)
Memory: 41.2M
CPU: 1.047s
CGroup: /system.slice/httpd.service
├─9930 /usr/sbin/httpd -DFOREGROUND
├─9942 /usr/sbin/httpd -DFOREGROUND
├─9943 /usr/sbin/httpd -DFOREGROUND
├─9946 /usr/sbin/httpd -DFOREGROUND
├─9948 /usr/sbin/httpd -DFOREGROUND
└─9949 /usr/sbin/httpd -DFOREGROUND
Aug 04 15:09:16 mach7.hviaene.thuis systemd[1]: Starting The Apache HTTP Server...
Aug 04 15:09:17 mach7.hviaene.thuis systemd[1]: Started The Apache HTTP Server.
Then point the browser at localhost and got the responde "It works"
Started mysqld and phpmyadmin, opens up correctly and I can browse around.
So this is OK, but now what??? Am I supposed to leave this running overnight and chack in the morning whether httpd still runs?CC:
(none) =>
herman.viaene Might have something to do with "disabled; vendor preset: disabled)". If it comes down for any reason, systemd may not restart it automatically ? CC:
(none) =>
ftg The issues we're testing for only present in the morning on the 1st of the month, and when separately packaged Apache modules (such as PHP) are updated, so testing this will take time. There should be a PHP update soon in Bug 30704. well, you can simply change apache logrotate config to daily (or even hourly, but then you need to add logrotate to cron.hourly) to speed up trying to provoke it failing... but it did not happend every on daily rotate anyway (we rotate duvel logs on dayly basis) but we have not hit the issue on duvel since I committed the change 2022-07-05 Yes I feel pretty confident about that change. The modules one is the one I really want tested. I'm ok with testing to confirm no regressions are noticed, and rely on the testing on duvel to show that the change fixes the actual problem. If we later find the fix doesn't work, then a new bug can be opened. CC:
(none) =>
davidwhodgins Remember this update is also for Bug 30543. MGA8-32bit The following 4 packages are going to be installed: - apache-2.4.54-1.1.mga8.i586 - apache-htcacheclean-2.4.54-1.1.mga8.i586 - apache-mod_cache-2.4.54-1.1.mga8.i586 - apache-mod_ssl-2.4.54-1.1.mga8.i586 ---this is an upgrade --- -- rebooted server Nextcloud server working as expected. CC:
(none) =>
brtians1
Brian Rockwell
2022-08-28 04:37:08 CEST
Whiteboard:
(none) =>
MGA8-32-OK Has this been tested with a PHP update? Just production version OK, keep in mind this update is for this bug and Bug 30543, so we have to test for both issues. We may have to wait until the next PHP update is in updates_testing. Whiteboard:
MGA8-32-OK =>
(none) There's a PHP update in updates_testing now. Test procedure: (install apache-mod_php if it's not already installed) systemctl status -l httpd (note the starting time) (install PHP updates from updates_testing) systemctl status -l httpd (should note that httpd has been restarted during the PHP update) sure I can test it. I don't see it in the list at the moment, will keep an eye out for it. I'll do a fresh install and an upgrade on the nextcloud box Validating the update.
Before installing the php update
* httpd.service - The Apache HTTP Server
Loaded: loaded (/usr/lib/systemd/system/httpd.service; disabled; vendor preset: disabled)
Active: active (running) since Mon 2022-08-29 20:35:03 EDT; 1 day 18h ago
Main PID: 2135 (/usr/sbin/httpd)
Status: "Total requests: 376; Idle/Busy workers 100/0;Requests/sec: 0.00249; Bytes served/sec: 8 B/sec"
Tasks: 11 (limit: 19118)
Memory: 37.6M
CPU: 20.717s
CGroup: /system.slice/httpd.service
|-2135 /usr/sbin/httpd -DFOREGROUND
|-2171 /usr/sbin/httpd -DFOREGROUND
|-2172 /usr/sbin/httpd -DFOREGROUND
|-2173 /usr/sbin/httpd -DFOREGROUND
|-2174 /usr/sbin/httpd -DFOREGROUND
|-2175 /usr/sbin/httpd -DFOREGROUND
|-2176 /usr/sbin/httpd -DFOREGROUND
|-4771 /usr/sbin/httpd -DFOREGROUND
|-8870 /usr/sbin/httpd -DFOREGROUND
|-8888 /usr/sbin/httpd -DFOREGROUND
`-8889 /usr/sbin/httpd -DFOREGROUND
Aug 29 20:35:03 x3.hodgins.homeip.net systemd[1]: Starting The Apache HTTP Server...
Aug 29 20:35:03 x3.hodgins.homeip.net systemd[1]: Started The Apache HTTP Server.
After installing the update ...
[root@x3 ~]# systemctl status httpd.service
* httpd.service - The Apache HTTP Server
Loaded: loaded (/usr/lib/systemd/system/httpd.service; disabled; vendor preset: disabled)
Active: active (running) since Wed 2022-08-31 14:36:45 EDT; 15s ago
# journalctl -b --no-h|grep httpd|grep -v msec
Aug 31 14:36:45 systemd[1]: httpd.service: Succeeded.
Aug 31 14:36:45 systemd[1]: httpd.service: Consumed 20.813s CPU time.
From /var/log/httpd/error_log ...
[Wed Aug 31 14:36:44.125365 2022] [mpm_prefork:notice] [pid 2135] AH00170: caught SIGWINCH, shutting down gracefully
[Wed Aug 31 14:36:45.299830 2022] [mpm_prefork:notice] [pid 90010] AH00163: Apache/2.4.54 (Unix) OpenSSL/1.1.1q mod_fcgid/2.3.9 PHP/8.0.23 mod_perl/2.0.10 Perl/v5.32.1 configured -- resuming normal operations
[Wed Aug 31 14:36:45.299905 2022] [core:notice] [pid 90010] AH00094: Command line: '/usr/sbin/httpd -D FOREGROUND'Whiteboard:
(none) =>
MGA8-64-OK
Dave Hodgins
2022-09-01 20:03:09 CEST
Keywords:
(none) =>
advisory An update for this issue has been pushed to the Mageia Updates repository. https://advisories.mageia.org/MGAA-2022-0119.html Status:
NEW =>
RESOLVED Sorry folks: Sep 05 04:02:02 zmhost.mtf.net systemd[1]: Reloading The Apache HTTP Server. Sep 05 04:02:02 zmhost.mtf.net systemd[1]: Reloaded The Apache HTTP Server. Sep 05 04:02:03 zmhost.mtf.net systemd[1]: Reloading The Apache HTTP Server. Sep 05 04:02:03 zmhost.mtf.net systemd[1]: Reloaded The Apache HTTP Server. Sep 05 04:02:03 zmhost.mtf.net systemd[1]: Reloading The Apache HTTP Server. Sep 05 04:02:03 zmhost.mtf.net systemd[1]: Reloaded The Apache HTTP Server. Sep 05 04:02:05 zmhost.mtf.net systemd[1]: httpd.service: Main process exited, code=exited, status=1/FAILURE Sep 05 04:02:05 zmhost.mtf.net systemd[1]: httpd.service: Failed with result 'exit-code'. This server has: [root@zmhost /]# rpm -qa|grep apache apache-2.4.54-1.1.mga8 apache-mod_php-8.0.22-1.mga8 apache-mod_perl-2.0.10-9.mga8 apache-mod_ssl-2.4.54-1.1.mga8 apache-doc-2.4.54-1.1.mga8 Until recently I have been running this server with a root crontab that stopped httpd before the dreaded hour and restarted it after, actually using zmrestart script which stops zoneminder, mysql and httpd in sequence and the re-starts them in the right order. # 59 3 * * 0-6 /usr/bin/zmstop >/dev/null 2>&1 # 4 4 * * 0-6 /usr/bin/zmrestart >/dev/null 2>&1 I commented the crontab (as above) on seeing this fix had been pushed the other day. Resolution:
FIXED =>
(none) did you check that your apache logrotate now has "sharedscripts" it's marked %config(noreplace), so a package update does not change it by itself It seems I missed the change in the httpd logrotate, (thanks Thomas). So closing again. Resolution:
(none) =>
FIXED I am still hitting this issue:
Sep 09 04:02:02 zmhost.mtf.net systemd[1]: Reloading The Apache HTTP Server.
Sep 09 04:02:03 zmhost.mtf.net systemd[1]: Reloaded The Apache HTTP Server.
Sep 09 04:02:04 zmhost.mtf.net systemd[1]: httpd.service: Main process exited, code=exited, status=1/FAILURE
Sep 09 04:02:04 zmhost.mtf.net systemd[1]: httpd.service: Killing process 1279527 (/usr/sbin/httpd) with signal SIGKILL.
Sep 09 04:02:04 zmhost.mtf.net systemd[1]: httpd.service: Killing process 1279989 (zmu) with signal SIGKILL.
Sep 09 04:02:04 zmhost.mtf.net systemd[1]: httpd.service: Killing process 1279990 (zmu) with signal SIGKILL.
Sep 09 04:02:04 zmhost.mtf.net systemd[1]: httpd.service: Failed with result 'exit-code'.
Sep 09 04:02:04 zmhost.mtf.net systemd[1]: httpd.service: Consumed 6h 54min 3.449s CPU time.
This morning the service re-started without issue using:
[root@zmhost baz]# systemctl restart httpd
[root@zmhost baz]# systemctl status httpd
● httpd.service - The Apache HTTP Server
Loaded: loaded (/usr/lib/systemd/system/httpd.service; enabled; vendor preset: disabled)
Active: active (running) since Fri 2022-09-09 10:59:01 BST; 4s ago
Main PID: 3135595 (/usr/sbin/httpd)
Status: "Processing requests..."
Tasks: 6 (limit: 4540)
Memory: 27.3M
CPU: 174ms
CGroup: /system.slice/httpd.service
├─3135595 /usr/sbin/httpd -DFOREGROUND
├─3135621 /usr/sbin/httpd -DFOREGROUND
├─3135622 /usr/sbin/httpd -DFOREGROUND
├─3135623 /usr/sbin/httpd -DFOREGROUND
├─3135624 /usr/sbin/httpd -DFOREGROUND
└─3135625 /usr/sbin/httpd -DFOREGROUND
Sep 09 10:59:00 zmhost.mtf.net systemd[1]: Starting The Apache HTTP Server...
Sep 09 10:59:01 zmhost.mtf.net systemd[1]: Started The Apache HTTP Server.
zoneminder then re-started immediately httpd came up without my intervention:
[root@zmhost baz]# systemctl status zoneminder
● zoneminder.service - ZoneMinder CCTV recording and security system
Loaded: loaded (/usr/lib/systemd/system/zoneminder.service; enabled; vendor preset: disabled)
Active: active (running) since Fri 2022-09-09 10:59:07 BST; 15s ago
Process: 3135596 ExecStart=/usr/bin/zmpkg.pl start (code=exited, status=0/SUCCESS)
Main PID: 3135608 (zmdc.pl)
Tasks: 57 (limit: 4540)
Memory: 483.4M
CPU: 8.274s
CGroup: /system.slice/zoneminder.service
├─3135608 /usr/bin/perl -wT /usr/bin/zmdc.pl startup
├─3135640 /usr/bin/zmc -d /dev/video1
├─3135646 /usr/bin/zmc -d /dev/video2
├─3135652 /usr/bin/zmc -d /dev/video3
├─3135658 /usr/bin/zmc -d /dev/video4
├─3135664 /usr/bin/zmc -d /dev/video5
├─3135670 /usr/bin/zmc -d /dev/video6
├─3135676 /usr/bin/zmc -d /dev/video7
├─3135682 /usr/bin/zmc -m 8
├─3135688 /usr/bin/zmc -d /dev/video8
├─3135694 /usr/bin/zmc -d /dev/video10
├─3135700 /usr/bin/zmc -d /dev/video15
├─3135706 /usr/bin/zmc -d /dev/video9
├─3135712 /usr/bin/zmc -m 13
├─3135716 /usr/bin/perl -wT /usr/bin/zmfilter.pl --filter_id=1 --daemon
├─3135722 /usr/bin/perl -wT /usr/bin/zmfilter.pl --filter_id=2 --daemon
├─3135726 /usr/bin/perl -wT /usr/bin/zmaudit.pl -c
├─3135730 /usr/bin/perl -wT /usr/bin/zmwatch.pl
├─3135738 /usr/bin/perl -wT /usr/bin/zmtelemetry.pl
└─3135742 /usr/bin/perl -wT /usr/bin/zmstats.pl
/etc/logrotate.d/httpd content is:
/var/log/httpd/*_log /var/log/httpd/apache_runtime_status /var/log/httpd/ssl_mutex {
rotate 3
daily
missingok
notifempty
sharedscripts
nocompress
postrotate
/bin/systemctl reload httpd.service > /dev/null 2>/dev/null || true
endscript
}
[root@zmhost logrotate.d]# rpm -qa|grep apache
apache-2.4.54-1.1.mga8
apache-mod_php-8.0.22-1.mga8
apache-mod_perl-2.0.10-9.mga8
apache-mod_ssl-2.4.54-1.1.mga8
apache-doc-2.4.54-1.1.mga8
[root@zmhost logrotate.d]#
Any ideas?Status:
RESOLVED =>
REOPENED This particular bug is fixed. The issue on your system unrelated. Resolution:
(none) =>
FIXED |
This has been happening on two Mageia servers and barjac mentioned on IRC he has the same problem since updating to Mageia 8. It seems systemctl reload happens several times in a very short succession and after a few successes one fails with the process dying, then systemd never tries to start it again. ● httpd.service - The Apache HTTP Server Loaded: loaded (/usr/lib/systemd/system/httpd.service; enabled; vendor preset: disabled) Active: failed (Result: exit-code) since Mon 2021-11-01 04:02:03 CET; 12h ago Process: 332705 ExecStart=/usr/sbin/httpd $OPTIONS -DFOREGROUND (code=exited, status=1/FAILURE) Process: 405999 ExecReload=/usr/sbin/httpd $OPTIONS -k graceful (code=exited, status=0/SUCCESS) Main PID: 332705 (code=exited, status=1/FAILURE) Status: "Configuration loaded." CPU: 3h 19min 2.135s Nov 01 04:02:02 neru.mageia.org systemd[1]: Reloading The Apache HTTP Server. Nov 01 04:02:02 neru.mageia.org systemd[1]: Reloaded The Apache HTTP Server. Nov 01 04:02:02 neru.mageia.org systemd[1]: Reloading The Apache HTTP Server. Nov 01 04:02:02 neru.mageia.org systemd[1]: Reloaded The Apache HTTP Server. Nov 01 04:02:02 neru.mageia.org systemd[1]: Reloading The Apache HTTP Server. Nov 01 04:02:02 neru.mageia.org systemd[1]: Reloaded The Apache HTTP Server. Nov 01 04:02:03 neru.mageia.org systemd[1]: Reloading The Apache HTTP Server. Nov 01 04:02:03 neru.mageia.org systemd[1]: Reloaded The Apache HTTP Server. Nov 01 04:02:03 neru.mageia.org systemd[1]: httpd.service: Main process exited, code=exited, status=1/FAILURE Nov 01 04:02:03 neru.mageia.org systemd[1]: httpd.service: Killing process 406002 (httpd) with signal SIGKILL. Nov 01 04:02:03 neru.mageia.org systemd[1]: httpd.service: Killing process 406003 (httpd) with signal SIGKILL. Nov 01 04:02:03 neru.mageia.org systemd[1]: httpd.service: Killing process 406002 (httpd) with signal SIGKILL. Nov 01 04:02:03 neru.mageia.org systemd[1]: httpd.service: Killing process 406003 (httpd) with signal SIGKILL. Nov 01 04:02:03 neru.mageia.org systemd[1]: httpd.service: Failed with result 'exit-code'. Nov 01 04:02:03 neru.mageia.org systemd[1]: httpd.service: Unit process 406002 (httpd) remains running after unit stopped. Nov 01 04:02:03 neru.mageia.org systemd[1]: httpd.service: Unit process 406003 (httpd) remains running after unit stopped. Nov 01 04:02:03 neru.mageia.org systemd[1]: httpd.service: Consumed 3h 19min 2.135s CPU time. Nov 01 04:02:03 neru.mageia.org audit[1]: SERVICE_STOP pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=httpd comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=failed' Nov 01 04:02:03 neru.mageia.org kernel: audit: type=1131 audit(1635735723.120:77651): pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=httpd comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=failed' Nov 01 04:02:09 neru.mageia.org systemd[1]: httpd.service: Unit cannot be reloaded because it is inactive. Nov 01 04:02:09 neru.mageia.org systemd[1]: httpd.service: Unit cannot be reloaded because it is inactive. Nov 01 04:02:09 neru.mageia.org systemd[1]: httpd.service: Unit cannot be reloaded because it is inactive. Nov 01 04:02:09 neru.mageia.org systemd[1]: httpd.service: Unit cannot be reloaded because it is inactive. Nov 01 04:02:10 neru.mageia.org systemd[1]: httpd.service: Unit cannot be reloaded because it is inactive. Nov 01 04:02:10 neru.mageia.org systemd[1]: httpd.service: Unit cannot be reloaded because it is inactive. Nov 01 04:02:10 neru.mageia.org systemd[1]: httpd.service: Unit cannot be reloaded because it is inactive. Nov 01 04:02:10 neru.mageia.org systemd[1]: httpd.service: Unit cannot be reloaded because it is inactive. Nov 01 04:02:11 neru.mageia.org systemd[1]: httpd.service: Unit cannot be reloaded because it is inactive. Nov 01 04:02:11 neru.mageia.org systemd[1]: httpd.service: Unit cannot be reloaded because it is inactive. Nov 01 04:02:11 neru.mageia.org systemd[1]: httpd.service: Unit cannot be reloaded because it is inactive. Nov 01 04:02:11 neru.mageia.org systemd[1]: httpd.service: Unit cannot be reloaded because it is inactive. Nov 01 04:02:11 neru.mageia.org systemd[1]: httpd.service: Unit cannot be reloaded because it is inactive. Nov 01 04:02:11 neru.mageia.org systemd[1]: httpd.service: Unit cannot be reloaded because it is inactive. Nov 01 04:02:11 neru.mageia.org systemd[1]: httpd.service: Unit cannot be reloaded because it is inactive. Nov 01 04:02:11 neru.mageia.org systemd[1]: httpd.service: Unit cannot be reloaded because it is inactive. Nov 01 04:02:12 neru.mageia.org systemd[1]: httpd.service: Unit cannot be reloaded because it is inactive. Nov 01 04:02:12 neru.mageia.org systemd[1]: httpd.service: Unit cannot be reloaded because it is inactive. Nov 01 04:02:12 neru.mageia.org systemd[1]: httpd.service: Unit cannot be reloaded because it is inactive. Nov 01 04:02:12 neru.mageia.org systemd[1]: httpd.service: Unit cannot be reloaded because it is inactive. Nov 01 04:02:12 neru.mageia.org systemd[1]: httpd.service: Unit cannot be reloaded because it is inactive. Nov 01 04:02:12 neru.mageia.org systemd[1]: httpd.service: Unit cannot be reloaded because it is inactive. Nov 01 04:02:13 neru.mageia.org systemd[1]: httpd.service: Unit cannot be reloaded because it is inactive. Nov 01 04:02:13 neru.mageia.org systemd[1]: httpd.service: Unit cannot be reloaded because it is inactive.