Hostname: abcdmesxi843
ESXi Version: VMware ESXi 7.0.2 build-17867351
VM Name: xyz761
Time of Issue: 10.48 AM SGT
Time in GMT: 2:48 am GMT
Datastore Details: naa.60060e8012456900504045690000025d ||
G800-RCG-INT-APP-NPRD-RH-01-N-007-25D
Vmware Logs:
- From the Vmware Logs we can see the VM was running on Host: abcdmesxi843
2021-11-17T01:13:25.001Z| vmx| | I005: Hostname=abcdmesxi843
2021-11-17T02:40:01.321Z| vcpu-1| | I005:
Msg_Question:
- Here we can see that the VM Lost the Access to the VMDK: xyz761_1.vmdk
and we got the below warning for the same:
2021-11-17T02:40:01.321Z| vcpu-1| | I005:
[msg.hbacommon.locklost] The lock protecting
‘/vmfs/volumes/6174c00b-20f7956e-08e3-0031b51502cd/xyz761/xyz761_1.vmdk’
has been lost, possibly due to underlying storage issues.
2021-11-17T02:40:01.321Z| vcpu-1| | I005+ If this
virtual machine is configured to be highly available, ensure that the virtual
machine is running on some other host before clicking OK.
2021-11-17T02:40:01.321Z| vcpu-1| | I005:
—————————————-
2021-11-17T02:40:22.422Z| vcpu-0| | I005: Tools: Tools
heartbeat timeout.
2021-11-17T02:44:01.878Z| vcpu-1| | I005: Timing out
dialog 247634252021-11-17T02:44:01.879Z| vcpu-1| | I005: MsgQuestion:
msg.hbacommon.locklost reply=0
2021-11-17T02:44:01.879Z| vcpu-1| | E001: PANIC: Exiting because of
failed disk operation.
2021-11-17T02:44:02.496Z| vcpu-1| | W003: A core file
is available in “/vmfs/volumes/61659acc-89cee08e-924f-0031b515054f/xyz761_1636708105855/vmx-zdump.003″
- Post this we can see that the VM shutdown:
2021-11-17T02:44:02.496Z| mks| | W003: Panic in
progress… ungrabbing2021-11-17T02:44:02.496Z| mks| | I005: MKS: Release
starting (Panic)
2021-11-17T02:44:02.496Z| mks| | I005: MKS: Release
finished (Panic)
2021-11-17T02:44:02.496Z| vcpu-1| | I005: Backtrace:
2021-11-17T02:44:02.496Z| vcpu-1| | I005: Backtrace[0]
00000029efbe2d80 rip=00000029a7209779 rbx=00000029efbe2da0 rbp=00000029efbe3280
r12=00000029a8422ac0 r13=0000000000000001 r14=00000029a8196440
r15=00000029edc612e0
2021-11-17T02:44:02.496Z| vcpu-1| | I005: Backtrace[1]
00000029efbe3290 rip=00000029a736ff61 rbx=00000029a8c71678 rbp=00000029efbe32b0
r12=00000029a8bbce70 r13=0000000000000002 r14=00000029a8196440
r15=00000029edc612e0
2021-11-17T02:44:02.496Z| vcpu-1| | I005: Backtrace[2]
00000029efbe32c0 rip=00000029a7365e7a rbx=00000029a8c6ae00 rbp=00000029efbe3470
r12=00000029a8422ac0 r13=000000000000014f r14=00000029a8196440
r15=00000029edc612e0
2021-11-17T02:44:02.496Z| vcpu-1| | I005: Backtrace[3]
00000029efbe3480 rip=00000029a767e767 rbx=0000000000000230 rbp=00000029efbe34c0
r12=00000029a86cc598 r13=000000000000014f r14=00000029a8196440
r15=00000029edc612e0
2021-11-17T02:44:02.496Z| vcpu-1| | I005: Backtrace[4]
00000029efbe34d0 rip=00000029a768769d rbx=00000029edc62368 rbp=00000029efbe3510
r12=00000029edc612d0 r13=000000000000012d r14=00000029edc612e0
r15=0000000000000001
2021-11-17T02:44:02.496Z| vcpu-1| | I005: Backtrace[5]
00000029efbe3520 rip=00000029a767e922 rbx=00000029a86930e0 rbp=00000029efbe3530
r12=0000000000000001 r13=00000029efbe79c0 r14=00000029a88f3040
r15=0000000000000003
2021-11-17T02:44:02.497Z| vcpu-1| | I005: Backtrace[6]
00000029efbe3540 rip=00000029a77b4037 rbx=00000029efbe3540 rbp=00000029efbe3660
r12=00000029a8ca6c40 r13=00000029efbe79c0 r14=00000029a88f3040
r15=0000000000000003
2021-11-17T02:44:02.497Z| vcpu-1| | I005: Backtrace[7]
00000029efbe3670 rip=00000029ea428d3b rbx=0000000000000000 rbp=0000000000000000
r12=00000029ef9e2360 r13=00000029efbe79c0 r14=00000029a88f3040
r15=0000000000000003
2021-11-17T02:44:02.497Z| vcpu-1| | I005: Backtrace[8]
00000029efbe3780 rip=00000029ea7288ad rbx=0000000000000000 rbp=0000000000000000
r12=00000029ef9e2360 r13=00000029efbe79c0 r14=00000029a88f3040
r15=0000000000000003
2021-11-17T02:44:02.497Z| vcpu-1| | I005: Backtrace[9]
00000029efbe3788 rip=0000000000000000 rbx=0000000000000000 rbp=0000000000000000
r12=00000029ef9e2360 r13=00000029efbe79c0 r14=00000029a88f3040
r15=0000000000000003
2021-11-17T02:44:02.497Z| vcpu-1| | I005:
SymBacktrace[0] 00000029efbe2d80 rip=00000029a7209779 in function Panic in
object /bin/vmx loaded at 00000029a6faf000
2021-11-17T02:44:02.497Z| vcpu-1| | I005:
SymBacktrace[1] 00000029efbe3290 rip=00000029a736ff61 in function (null) in
object /bin/vmx loaded at 00000029a6faf000
2021-11-17T02:44:02.497Z| vcpu-1| | I005:
SymBacktrace[2] 00000029efbe32c0 rip=00000029a7365e7a in function (null) in
object /bin/vmx loaded at 00000029a6faf000
2021-11-17T02:44:02.497Z| vcpu-1| | I005:
SymBacktrace[3] 00000029efbe3480 rip=00000029a767e767 in function (null) in
object /bin/vmx loaded at 00000029a6faf000
2021-11-17T02:44:02.497Z| vcpu-1| | I005:
SymBacktrace[4] 00000029efbe34d0 rip=00000029a768769d in function (null) in
object /bin/vmx loaded at 00000029a6faf000
2021-11-17T02:44:02.497Z| vcpu-1| | I005:
SymBacktrace[5] 00000029efbe3520 rip=00000029a767e922 in function (null) in
object /bin/vmx loaded at 00000029a6faf000
2021-11-17T02:44:02.497Z| vcpu-1| | I005:
SymBacktrace[6] 00000029efbe3540 rip=00000029a77b4037 in function (null) in
object /bin/vmx loaded at 00000029a6faf000
2021-11-17T02:44:02.497Z| vcpu-1| | I005:
SymBacktrace[7] 00000029efbe3670 rip=00000029ea428d3b in function (null) in
object /lib64/libpthread.so.0 loaded at 00000029ea421000
2021-11-17T02:44:02.497Z| vcpu-1| | I005:
SymBacktrace[8] 00000029efbe3780 rip=00000029ea7288ad in function clone in
object /lib64/libc.so.6 loaded at 00000029ea63e000
2021-11-17T02:44:02.497Z| vcpu-1| | I005:
SymBacktrace[9] 00000029efbe3788 rip=0000000000000000
VOBD Logs:
- From the VOBD Logs we can see issues with the Datastore
G800-RCG-INT-APP-NPRD-RH-01-R-003-603.
2021-11-17T01:38:46.070Z: [vmfsCorrelator]
2477863646095us: [vob.vmfs.heartbeat.unrecoverable] This host lost connectivity
to volume 6174c00b-20f7956e-08e3-0031b51502cd
(“G800-RCG-INT-APP-NPRD-RH-01-R-003-603”) and subsequent recovery
attempts have failed
2021-11-17T01:38:46.070Z: [vmfsCorrelator]
2477863645774us: [esx.problem.vmfs.heartbeat.unrecoverable]
6174c00b-20f7956e-08e3-0031b51502cd G800-RCG-INT-APP-NPRD-RH-01-R-003-603
2021-11-17T02:44:02.494Z: [UserWorldCorrelator]
2481780070364us: [vob.uw.core.dumped] /bin/vmx(2389436)
/vmfs/volumes/61659acc-89cee08e-924f-0031b515054f/xyz761_1636708105855/vmx-zdump.003
2021-11-17T02:44:02.494Z: [UserWorldCorrelator]
2481780070004us: [esx.problem.application.core.dumped] An application
(/bin/vmx) running on ESXi host has crashed (8 time(s) so far). A core file may
have been created at /vmfs/volumes/61659acc-89cee08e-924f-0031b515054f/xyz761_1636708105855/vmx-zdump.003.
VMKernel Logs:
- From the Vmkernel Logs we can see the same set of events for the
Datastore: 60060e80124569005040456900000603.
2021-11-17T01:38:46.069Z cpu2:2097202)NMP:
nmp_ThrottleLogForDevice:3861: Cmd 0xc0 (0x45b920dac300, 2097193) to dev
“naa.60060e80124569005040456900000603” on path
“vmhba2:C0:T3:L8” Failed:
2021-11-17T01:38:46.069Z cpu2:2097202)NMP:
nmp_ThrottleLogForDevice:3869: H:0x0 D:0x2 P:0x0 Valid sense data: 0xe 0x1d
0x0. Act:NONE. cmdId.initiator=0x4306a623b140 CmdSN 0x2a6
2021-11-17T01:38:46.069Z cpu2:2097202)ScsiDeviceIO:
4277: Cmd(0x45b920d07d00) 0xfe, CmdSN 0x2a6 from world 2097193 to dev
“naa.60060e80124569005040456900000603” failed H:0x0 D:0x2 P:0x5
2021-11-17T01:38:46.069Z cpu2:2097202)HBX: 1947: ATS Miscompare detected
between test and set HB images at offset 3153920 on vol
‘6174c00b-20f7956e-08e3-0031b51502cd’.
2021-11-17T01:38:46.069Z cpu2:2097202)HBX: 1948:
‘G800-RCG-INT-APP-NPRD-RH-01-R-003-603’: HB at offset 3153920 – Test
version:2021-11-17T01:38:46.070Z cpu4:2097837)HBX: 5822:
‘G800-RCG-INT-APP-NPRD-RH-01-R-003-603’: HB at offset 3153920 – Cancelling all
threads waiting for reclaim of HB:
2021-11-17T01:38:46.070Z
cpu4:2097837)WARNING: HBX: 318: This host lost connectivity to volume
6174c00b-20f7956e-08e3-0031b51502cd
(“G800-RCG-INT-APP-NPRD-RH-01-R-003-603”) and subsequent recovery
attempts have failed
2021-11-17T01:38:46.070Z cpu4:2097837)HBX: 6106:
‘G800-RCG-INT-APP-NPRD-RH-01-R-003-603’: HB at offset 3153920 – Lost heartbeat.
On-disk:
2021-11-17T02:44:05.843Z
cpu22:2097244)WARNING: IOSpaceVMK: 264: IOSpaceVMK: Couldn’t find port VMCI
resultHigh.
2021-11-17T02:44:05.843Z
cpu22:2097244)WARNING: IOSpaceVMK: 264: IOSpaceVMK: Couldn’t find port VMCI
resultLow.
2021-11-17T02:44:10.957Z
cpu19:2097400)WARNING: Vol3: 2953: ‘G800-RCG-INT-APP-NPRD-RH-01-R-003-603’:
Failed to clear journal address in on-disk HB. This could result in leak of
journal block at <type 1 addr 1128496>
2021-11-17T06:55:47.913Z
cpu9:2392011)World: vm 2392012: 6892: Starting world vmm0:xyz761 of type
8
Hostname: abcdmesxi842
ESXi Version: VMware ESXi 7.0.2 build-17867351
VM Name: xyz760
Time of Issue: 5:22 AM SGT
Time in GMT: 9:22 pm GMT
- From the
Logs of the other VM we can see the same set of Issue being Raised for the
VM:
2021-11-16T02:09:28.195Z| vmx| | I005: Hostname=abcdmesxi842
2021-11-16T21:13:37.644Z| vcpu-1| | I005:
Msg_Question:
2021-11-16T21:13:37.644Z| vcpu-1| | I005:
[msg.hbacommon.locklost] The lock protecting
‘/vmfs/volumes/6174c00b-20f7956e-08e3-0031b51502cd/xyz804/xyz804.vmdk’
has been lost, possibly due to underlying storage issues.
2021-11-16T21:13:37.644Z| vcpu-1| | I005+ If this
virtual machine is configured to be highly available, ensure that the virtual
machine is running on some other host before clicking OK.
2021-11-16T21:13:37.644Z| vcpu-1| | I005:
—————————————-
2021-11-16T21:17:37.712Z| vcpu-1| | I005: Timing out
dialog 239331462021-11-16T21:17:37.713Z| vcpu-1| | I005: MsgQuestion:
msg.hbacommon.locklost reply=0
2021-11-16T21:17:37.713Z| vcpu-1| | E001: PANIC: Exiting because of
failed disk operation.
2021-11-16T21:17:38.377Z| vcpu-1| | W003: A core file
is available in “/vmfs/volumes/61659acc-89cee08e-924f-0031b515054f/xyz760_1636708106786/vmx-zdump.002″
2021-11-16T21:17:38.377Z| mks| | W003: Panic in
progress… ungrabbing2021-11-16T21:17:38.377Z| mks| | I005: MKS: Release
starting (Panic)2021-11-16T21:17:38.377Z| mks| | I005: MKS: Release finished
(Panic)
2021-11-16T21:17:38.377Z| vcpu-1| | I005: Backtrace:
2021-11-16T21:17:38.377Z| vcpu-1| | I005: Backtrace[0]
0000000644838d80 rip=00000005fbddf779 rbx=0000000644838da0 rbp=0000000644839280
r12=00000005fcff8ac0 r13=0000000000000001 r14=00000005fcd6c440
r15=00000006428372e0
2021-11-16T21:17:38.377Z| vcpu-1| | I005: Backtrace[1]
0000000644839290 rip=00000005fbf45f61 rbx=00000005fd847238 rbp=00000006448392b0
r12=00000005fd793e20 r13=0000000000000002 r14=00000005fcd6c440
r15=00000006428372e0
2021-11-16T21:17:38.377Z| vcpu-1| | I005: Backtrace[2]
00000006448392c0 rip=00000005fbf3be7a rbx=00000005fd8409c0 rbp=0000000644839470
r12=00000005fcff8ac0 r13=000000000000014f r14=00000005fcd6c440
r15=00000006428372e0
2021-11-16T21:17:38.377Z| vcpu-1| | I005: Backtrace[3]
0000000644839480 rip=00000005fc254767 rbx=0000000000000230 rbp=00000006448394c0
r12=00000005fd2a2598 r13=000000000000014f r14=00000005fcd6c440
r15=00000006428372e0
2021-11-16T21:17:38.377Z| vcpu-1| | I005: Backtrace[4]
00000006448394d0 rip=00000005fc25d69d rbx=0000000642838368 rbp=0000000644839510
r12=00000006428372d0 r13=000000000000012d r14=00000006428372e0
r15=0000000000000001
2021-11-16T21:17:38.377Z| vcpu-1| | I005: Backtrace[5]
0000000644839520 rip=00000005fc254922 rbx=00000005fd2690e0 rbp=0000000644839530
r12=0000000000000001 r13=000000064483d9c0 r14=00000005fd4c9040
r15=0000000000000003
2021-11-16T21:17:38.377Z| vcpu-1| | I005: Backtrace[6]
0000000644839540 rip=00000005fc38a037 rbx=0000000644839540 rbp=0000000644839660
r12=00000005fd87ca80 r13=000000064483d9c0 r14=00000005fd4c9040
r15=0000000000000003
2021-11-16T21:17:38.377Z| vcpu-1| | I005: Backtrace[7]
0000000644839670 rip=000000063effed3b rbx=0000000000000000 rbp=0000000000000000
r12=0000000644638360 r13=000000064483d9c0 r14=00000005fd4c9040
r15=0000000000000003
2021-11-16T21:17:38.377Z| vcpu-1| | I005: Backtrace[8]
0000000644839780 rip=000000063f2fe8ad rbx=0000000000000000 rbp=0000000000000000
r12=0000000644638360 r13=000000064483d9c0 r14=00000005fd4c9040
r15=0000000000000003
2021-11-16T21:17:38.377Z| vcpu-1| | I005: Backtrace[9]
0000000644839788 rip=0000000000000000 rbx=0000000000000000 rbp=0000000000000000
r12=0000000644638360 r13=000000064483d9c0 r14=00000005fd4c9040
r15=0000000000000003
2021-11-16T21:17:38.377Z| vcpu-1| | I005:
SymBacktrace[0] 0000000644838d80 rip=00000005fbddf779 in function Panic in
object /bin/vmx loaded at 00000005fbb85000
2021-11-16T21:17:38.377Z| vcpu-1| | I005:
SymBacktrace[1] 0000000644839290 rip=00000005fbf45f61 in function (null) in
object /bin/vmx loaded at 00000005fbb85000
2021-11-16T21:17:38.377Z| vcpu-1| | I005:
SymBacktrace[2] 00000006448392c0 rip=00000005fbf3be7a in function (null) in
object /bin/vmx loaded at 00000005fbb85000
2021-11-16T21:17:38.377Z| vcpu-1| | I005:
SymBacktrace[3] 0000000644839480 rip=00000005fc254767 in function (null) in
object /bin/vmx loaded at 00000005fbb85000
2021-11-16T21:17:38.378Z| vcpu-1| | I005:
SymBacktrace[4] 00000006448394d0 rip=00000005fc25d69d in function (null) in
object /bin/vmx loaded at 00000005fbb85000
2021-11-16T21:17:38.378Z| vcpu-1| | I005:
SymBacktrace[5] 0000000644839520 rip=00000005fc254922 in function (null) in
object /bin/vmx loaded at 00000005fbb85000
2021-11-16T21:17:38.378Z| vcpu-1| | I005:
SymBacktrace[6] 0000000644839540 rip=00000005fc38a037 in function (null) in
object /bin/vmx loaded at 00000005fbb85000
2021-11-16T21:17:38.378Z| vcpu-1| | I005:
SymBacktrace[7] 0000000644839670 rip=000000063effed3b in function (null) in
object /lib64/libpthread.so.0 loaded at 000000063eff7000
2021-11-16T21:17:38.378Z| vcpu-1| | I005:
SymBacktrace[8] 0000000644839780 rip=000000063f2fe8ad in function clone in
object /lib64/libc.so.6 loaded at 000000063f214000
2021-11-16T21:17:38.378Z| vcpu-1| | I005:
SymBacktrace[9] 0000000644839788 rip=0000000000000000
Conclusion:
- As per the article: https://kb.vmware.com/s/article/1022119
If an ESX/ESXi host loses
access to a datastore, I/O from running virtual machines on the datastore will
time out and fail. The virtual machine pauses and an event message appears
stating that the virtual machine lost access to its disk.
- This
issue seems to be triggered due to the disconnection of the Datastore:
G800-RCG-INT-APP-NPRD-RH-01-N-007-25D
Action Plan:
- Please check with your Storage Vendor to understand the Reason for
the Storage Disconnect.
- As per the Details I can see that the Driver which you are using
for the HBA is nfnic version 4.0.0.65 and the recommended firmware version
with the driver is 4.2(3), However we can see that the firmware version
running on the Host is 4.4(3). Please check with your Hardware Vendor and
get try to get to Same Driver/Firmware Version.