Have you ever seen a message like the one below on your KVM (Kernel-based Virtual Machine) guest's console?
" BUG: soft lockup - CPU#0 stuck for 10s! [swapper:0] "
I did, and I find it a bit annoying. If you're inside a graphical desktop environment like KDE or GNOME, you probably won't notice it directly. But you will likely suffer the same symptom: the guest OS becomes unresponsive for a few moments. In my case, it manifested as stalled CD/DVD access and it "ruined" the console display. I had to press Enter a few times before I got back to a normal shell prompt. Before I go further, FYI I use Fedora 9, kernel version 2.6.27.23-xx.x.xx.fc9.i686, on a Core Duo powered laptop.
First, why does the kernel show such a message? I use the default CentOS 5.3 kernel, so I checked the kernel config inside the /boot directory, and here is the relevant configuration item:
CONFIG_DETECT_SOFTLOCKUP=y
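If you want to run the same check on your own system, here is a minimal sketch. The function name has_softlockup_detector is my own invention, and the usage line assumes your distro installs its kernel config as /boot/config-<release>, which not every distro does.

```shell
# Succeed if the given kernel config file enables soft lockup detection.
# The function name is hypothetical; the option name is the one shown above.
has_softlockup_detector() {
    config_file="$1"
    grep -q '^CONFIG_DETECT_SOFTLOCKUP=y' "$config_file"
}

# Usage (assumes the config lives at /boot/config-<release>):
#   has_softlockup_detector "/boot/config-$(uname -r)" && echo "enabled"
```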
What does it do? Ingo Molnar, the author of this lockup detection patch, describes it as:
"From: Ingo Molnar
This patch adds a new kernel debug feature: CONFIG_DETECT_SOFTLOCKUP.
When enabled then per-CPU watchdog threads are started, which try to run once per second. If they get delayed for more than 10 seconds then a callback from the timer interrupt detects this condition and prints out a warning message and a stack dump (once per lockup incident). The feature is otherwise non-intrusive, it doesnt try to unlock the box in any way, it only gets the debug info out, automatically, and on all CPUs affected by the lockup.
Tested this on x86, both with the feature enabled (in which case a provoked lockup was correctly detected) and with the feature disabled. It is CPU-hotplug aware. Should work on every architecture. "
Note that the watchdog mentioned in the above description has nothing to do with the NMI (Non-Maskable Interrupt) watchdog. The NMI watchdog deals with hard CPU lockups, which the soft lockup watchdog can't detect. The latter is just a kernel thread, and it gets stuck itself if the CPU hangs.
I suspected it might be a bug in the KVM driver (or specifically, KVM for Intel VT in my case). I came to this hypothesis because the help text of the soft lockup option says:
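To make the rule in the patch description concrete, here is a toy model of the check. It is only an illustration of "delayed for more than 10 seconds", not the kernel's actual code, and the function name and its arguments are my own.

```shell
# Toy model of the soft lockup check, NOT the real kernel implementation.
# last_touch: time in seconds when the watchdog thread last got to run
# now:        current time in seconds, as seen from the timer interrupt
# threshold:  tolerated delay in seconds (10 in the patch description)
is_soft_lockup() {
    last_touch="$1"
    now="$2"
    threshold="${3:-10}"
    # Report a lockup only when the delay exceeds the threshold.
    [ $(( now - last_touch )) -gt "$threshold" ]
}

# Usage:
#   is_soft_lockup 100 111 && echo "soft lockup: watchdog starved for 11s"
```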
"Say Y here to enable the kernel to detect "soft lockups", which are bugs that cause the kernel to loop in kernel mode for more than 10 seconds, without giving other tasks a chance to run."
Great... so IMO KVM is too busy with something, or something is delaying the KVM guest from running. I almost rushed to compile my own kernel image with full preemption, hoping that it would squash the problem. But I was tempted to Google a bit more. An interesting result: a post in a mailing list (I forgot which one) suggested setting the CPU frequency to a static value. Let's try. I edited /etc/sysconfig/cpuspeed so the related lines became:
MAX_SPEED=1333000
MIN_SPEED=1333000
I picked that frequency because it's the middle one of the three available frequencies: 1833000, 1333000 and 1000000 kHz. So theoretically I still get adequate computing power for most jobs without draining the battery too soon.
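Picking the middle value can also be scripted. This is just a sketch: pick_middle_freq is a name I made up, and the usage line assumes your cpufreq driver exposes a scaling_available_frequencies file, which not all drivers do. With an even number of values it returns the lower of the two middle ones.

```shell
# Print the middle value of a space-separated list of frequencies.
# Hypothetical helper; pass the list as a single quoted argument.
pick_middle_freq() {
    # Split the list into one value per line and sort numerically.
    sorted=$(printf '%s\n' $1 | sort -n)
    count=$(printf '%s\n' "$sorted" | wc -l)
    # For an even count this selects the lower of the two middle values.
    middle=$(( (count + 1) / 2 ))
    printf '%s\n' "$sorted" | sed -n "${middle}p"
}

# Usage (assumes the driver provides this sysfs file):
#   pick_middle_freq "$(cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_available_frequencies)"
```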
Execute:
# service cpuspeed restart
Make sure it's applied correctly:
# grep '1333000' -r /sys/devices/system/cpu/
/sys/devices/system/cpu/cpu0/cpufreq/scaling_min_freq:1333000
/sys/devices/system/cpu/cpu0/cpufreq/scaling_max_freq:1333000
...
/sys/devices/system/cpu/cpu1/cpufreq/scaling_min_freq:1333000
/sys/devices/system/cpu/cpu1/cpufreq/scaling_max_freq:1333000
...
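Instead of eyeballing the grep output, the same check can be wrapped in a small function. This is a sketch under my own naming (freq_is_pinned is hypothetical); it takes the sysfs CPU directory as an argument so it can be tried against a scratch directory too.

```shell
# Succeed only if every cpuN under cpu_root has
# scaling_min_freq == scaling_max_freq == target.
# Hypothetical helper; fails if no CPU with cpufreq files is found.
freq_is_pinned() {
    cpu_root="$1"
    target="$2"
    found=0
    for dir in "$cpu_root"/cpu[0-9]*/cpufreq; do
        # Skip CPUs without cpufreq files (or an unmatched glob).
        [ -r "$dir/scaling_min_freq" ] || continue
        found=1
        [ "$(cat "$dir/scaling_min_freq")" = "$target" ] || return 1
        [ "$(cat "$dir/scaling_max_freq")" = "$target" ] || return 1
    done
    [ "$found" -eq 1 ]
}

# Usage:
#   freq_is_pinned /sys/devices/system/cpu 1333000 && echo "pinned"
```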
Then I ran my KVM guest again. I did a few tasks in it, let it go idle, repeated, and so on. During about an hour of testing, the result was promising! It became stable. Well, a few lockups still happened, but far fewer than before. One thing I noticed is that lockups also happen when I switch to another virtual desktop or when fairly heavy swapping is under way. So, to further reduce lockups, I avoid switching to another virtual desktop (OK, that sucks, but I can live with that) and close any unnecessary applications to conserve virtual memory as much as I can.
Why does it work? All I can say is that by making the frequency static, you also keep the timer interrupt delivery rate from changing. That stabilizes the kernel timing and, indirectly, the KVM guest timing. Previously, I was using the ondemand governor and, as you might be aware, it adapts the CPU frequency to the load quite aggressively. So the frequency was juggling between all three available values. The conservative governor didn't help here. Pity. I assume KVM still pushes the conservative governor up to the highest frequency and then back down most of the time, because most of the guest code can run natively instead of being translated.
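If you want to see the juggling for yourself, one rough way is to sample the current frequency for a while and count how often it changes. The helper below is only a sketch with a made-up name (count_freq_changes), and the usage lines assume your driver exposes scaling_cur_freq.

```shell
# Count how many times consecutive frequency samples differ.
# Hypothetical helper; pass the samples as separate arguments.
count_freq_changes() {
    changes=0
    prev=""
    for f in "$@"; do
        if [ -n "$prev" ] && [ "$f" != "$prev" ]; then
            changes=$(( changes + 1 ))
        fi
        prev="$f"
    done
    echo "$changes"
}

# Usage (ten samples, one per second, from cpu0):
#   samples=""
#   for i in 1 2 3 4 5 6 7 8 9 10; do
#       samples="$samples $(cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_cur_freq)"
#       sleep 1
#   done
#   count_freq_changes $samples
```

With the frequency pinned, I would expect the count to stay at 0, while a dynamic governor under a varying load should give a higher number.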
Regards,
Mulyadi.