Hi all,
We have an 8-core Dell PowerEdge running Gentoo, which is used almost
exclusively to run VMware Server (not ESX).
Kernel/CPU details are as follows:
Linux hydra 2.6.19-gentoo-r5 #2 SMP Tue Oct 9 16:25:09 GMT 2007
x86_64 Intel(R) Xeon(R) CPU E5335 @ 2.00GHz GenuineIntel GNU/Linux
(Later kernel versions cause problems for VMware.)
Occasionally the machine hangs for a couple of minutes, during which
we can't access it in any way. Eventually it comes back as if nothing
had happened.
According to dmesg, we get a "soft lockup detected on CPU#0" at around
the time the hangs occur. There are also a large number of hda/IDE
errors in dmesg, even though the only device on hda is the CD-ROM
drive, which isn't in use (and the tray isn't open).
There are some articles on LKML describing a similar error, but we
haven't been able to apply anything in them to our problem.
Has anyone seen anything like this before? Suggestions?
Some of the dmesg output is included below.
end_request: I/O error, dev hda, sector 0
hda: tray open
end_request: I/O error, dev hda, sector 0
hda: irq timeout: status=0xd0 { Busy }
ide: failed opcode was: unknown
hda: status timeout: status=0xd0 { Busy }
ide: failed opcode was: unknown
hda: drive not ready for command
BUG: soft lockup detected on CPU#0!
Call Trace:
 <IRQ>  [<ffffffff80252a3f>] softlockup_tick+0xdb/0xed
 [<ffffffff80239bf5>] update_process_times+0x42/0x68
 [<ffffffff802181d0>] smp_local_timer_interrupt+0x34/0x55
 [<ffffffff80218874>] smp_apic_timer_interrupt+0x52/0x6a
 [<ffffffff8020a146>] apic_timer_interrupt+0x66/0x70
 [<ffffffff80439e1d>] ide_outb+0x0/0x9
 [<ffffffff80438b60>] ide_inb+0x4/0x8
 [<ffffffff80439d37>] ide_wait_stat+0xaa/0x110
 [<ffffffff80437b6c>] ide_do_request+0x437/0x983
 [<ffffffff80219573>] __unmask_IO_APIC_irq+0x4f/0x6f
 [<ffffffff802195b4>] unmask_IO_APIC_irq+0x21/0x35
 [<ffffffff80253990>] default_enable+0x18/0x21
 [<ffffffff80253936>] check_irq_resend+0x16/0x58
 [<ffffffff80438b40>] ide_timer_expiry+0x2bf/0x2db
 [<ffffffff8022ad6d>] rebalance_tick+0x170/0x369
 [<ffffffff80438881>] ide_timer_expiry+0x0/0x2db
 [<ffffffff802394a2>] run_timer_softirq+0x130/0x1a5
 [<ffffffff80236096>] __do_softirq+0x55/0xc3
 [<ffffffff8020a69c>] call_softirq+0x1c/0x28
 [<ffffffff8020bae3>] do_softirq+0x2c/0x7d
 [<ffffffff80218879>] smp_apic_timer_interrupt+0x57/0x6a
 [<ffffffff80208071>] mwait_idle+0x0/0x20
 [<ffffffff8020a146>] apic_timer_interrupt+0x66/0x70
 <EOI>  [<ffffffff881c27f8>] :vmnet:VNetHub_AllocVnet+0x29c/0x2dc
 [<ffffffff80208070>] mwait_idle_with_hints+0x44/0x45
 [<ffffffff8020807d>] mwait_idle+0xc/0x20
 [<ffffffff80208988>] cpu_idle+0x8a/0xae
 [<ffffffff807876e0>] start_kernel+0x218/0x21d
 [<ffffffff8078715a>] _sinittext+0x15a/0x15e
hda: status timeout: status=0xd0 { Busy }
ide: failed opcode was: unknown
hda: drive not ready for command
hda: status timeout: status=0xd0 { Busy }
ide: failed opcode was: unknown
hda: drive not ready for command
hda: status timeout: status=0xd0 { Busy }
ide: failed opcode was: unknown
-- 
Be seeing you,            http://www.glendale.org.uk
Sam.                      xmpp:sam@???