Bug 12994 - transparent_hugepages causes huge performance degradation on some job (x20!)
Status: RESOLVED FIXED
Product: Mageia
Classification: Unclassified
Component: RPM Packages
Version: 4
Hardware: x86_64 Linux
Priority: Normal  Severity: minor
Assigned To: Thomas Backlund
Reported: 2014-03-11 14:33 CET by Chris Denice
Modified: 2014-06-19 15:14 CEST (History)

Source RPM: kernel-3.12.13-2.mga4.src.rpm


Description Chris Denice 2014-03-11 14:33:00 CET
Hi there,

On the Mageia server kernel (and maybe on the desktop flavour as well), transparent hugepages are enabled by default for all applications:

cat /sys/kernel/mm/transparent_hugepage/enabled
[always] madvise never

This causes catastrophic memory-bottleneck issues for some scientific software. The one I tested slowed down by a factor of more than 20.

The "tmb" kernels (tested on 3.12.9-tmb-desktop-1) do not suffer from this problem, as they are built with:
cat /sys/kernel/mm/transparent_hugepage/enabled 
always [madvise] never

I suggest that our default kernels ship with the same setting as kernel-tmb.
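For reference, the mode can be inspected and changed at runtime without a rebuild (writing requires root, and the change does not survive a reboot). A small sketch; `thp_mode` is just an illustrative helper, not part of the kernel interface:

```shell
# Illustrative helper: extract the active THP mode, i.e. the bracketed
# word, from the format used by the 'enabled' file.
thp_mode() {
  printf '%s\n' "$1" | sed -n 's/.*\[\([a-z]*\)\].*/\1/p'
}

# Read the current setting (the file holds e.g. "[always] madvise never"):
# thp_mode "$(cat /sys/kernel/mm/transparent_hugepage/enabled)"

# Switch to madvise at runtime (as root); takes effect immediately:
# echo madvise > /sys/kernel/mm/transparent_hugepage/enabled

thp_mode "[always] madvise never"   # prints: always
```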


Here is some output from the perf command for the job mentioned above:

-----------

With [always]:

Performance counter stats for process id '2873':

     502807.884816 task-clock                #    0.986 CPUs utilized           [100.00%]
            50,862 context-switches          #    0.101 K/sec                   [100.00%]
             1,007 cpu-migrations            #    0.002 K/sec                   [100.00%]
         1,942,898 page-faults               #    0.004 M/sec                  
 1,253,195,887,064 cycles                    #    2.492 GHz                     [40.01%]
    26,064,489,984 stalled-cycles-frontend   #    2.08% frontend cycles idle    [40.01%]
 1,077,143,421,532 stalled-cycles-backend    #   85.95% backend  cycles idle    [39.99%]
    92,480,197,254 instructions              #    0.07  insns per cycle        
                                             #   11.65  stalled cycles per insn [40.00%]
    14,743,277,283 branches                  #   29.322 M/sec                   [39.99%]
       473,693,401 branch-misses             #    3.21% of all branches         [39.99%]
   226,801,709,703 L1-dcache-loads           #  451.070 M/sec                   [40.00%]
     4,165,665,190 L1-dcache-load-misses     #    1.84% of all L1-dcache hits   [40.01%]
    27,180,082,792 LLC-loads                 #   54.057 M/sec                   [40.01%]
     4,113,015,180 LLC-load-misses           #   15.13% of all LL-cache hits    [40.01%]

509.983990738 seconds time elapsed

------------------------------

with [never]

Performance counter stats for process id '6674':

      27239.596662 task-clock                #    0.952 CPUs utilized           [100.00%]
             2,950 context-switches          #    0.108 K/sec                   [100.00%]
                68 cpu-migrations            #    0.002 K/sec                   [100.00%]
         2,599,749 page-faults               #    0.095 M/sec                  
    66,409,117,724 cycles                    #    2.438 GHz                     [40.03%]
     5,952,852,737 stalled-cycles-frontend   #    8.96% frontend cycles idle    [40.01%]
    31,017,329,357 stalled-cycles-backend    #   46.71% backend  cycles idle    [40.00%]
    62,820,347,576 instructions              #    0.95  insns per cycle        
                                             #    0.49  stalled cycles per insn [40.02%]
    10,138,845,015 branches                  #  372.210 M/sec                   [39.97%]
       141,091,064 branch-misses             #    1.39% of all branches         [40.00%]
    23,659,469,728 L1-dcache-loads           #  868.569 M/sec                   [40.00%]
       397,775,061 L1-dcache-load-misses     #    1.68% of all L1-dcache hits   [40.01%]
       969,632,553 LLC-loads                 #   35.596 M/sec                   [40.02%]
       103,381,444 LLC-load-misses           #   10.66% of all LL-cache hits    [39.99%]

      28.607893859 seconds time elapsed


-------

Have a look at the elapsed time, 509 s against 28 s, and at the L1-dcache-loads rate, which nearly halves with [always] (451 M/sec against 868 M/sec).


Cheers,
Chris.


Comment 1 Bit Twister 2014-03-16 20:52:03 CET
Setting madvise on my 3.12.13-desktop-2.mga4 kernel gave me a noticeable KDE start-time improvement.
It also improved my MythTV frontend's responsiveness when recording 4 shows at a time.

Tried creating /etc/sysctl.d/my_sysctl.conf with
sys.kernel.mm.transparent_hugepage.enabled = madvise
and rebooting, but it did not change /sys/kernel/mm/transparent_hugepage/enabled :(

"journalctl | grep sysctl" showed no errors with my_sysctl.conf.

Had to create /etc/rc.d/rc.local, chmod +x /etc/rc.d/rc.local and add
echo madvise  > /sys/kernel/mm/transparent_hugepage/enabled
to get it to work.
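(Editor's note on why the sysctl attempt above cannot work: sysctl only manages keys under /proc/sys, so paths under /sys/kernel/mm are out of its reach, which is why the sysctl.d fragment was silently ineffective. Besides rc.local, a tmpfiles.d fragment is a common alternative on systemd systems; a sketch using the `w` action of tmpfiles.d, with an illustrative filename:)

```
# /etc/tmpfiles.d/thp.conf  (filename is illustrative)
# 'w' writes the given argument to the file at boot.
w /sys/kernel/mm/transparent_hugepage/enabled - - - - madvise
```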
Comment 2 Chris Denice 2014-06-19 15:14:17 CEST
Thanks Thomas :)

Fixed with latest kernel updates.
