RCA 27: ESXi host crashed with vmkernel-zdump file

Product: VMware ESXi Version: 6.7.0 Build: 16075168 Release:  ()

Hostname: abcdtmp022h07.abcde.com

 

Host Hardware Information:

 

Hardware Vendor: Dell Inc.

Hardware Model: PowerEdge FC830

BIOS Version: 2.10.5

BIOS Release Date: 2019-08-20T00:00:00

 

Host CPU Information:

 

Version: Intel(R) Xeon(R) CPU E5-4669 v3 @ 2.10GHz

Packages: 4 Cores: 72 Threads: 144

 

 

Host Memory Information:

Physical: 1048476 MB

VMKernel: 1048476 MB

Free: 1039777 MB

 

 

VMKernel Logs for the Dump:

 

  • The VMkernel logs show repeated admission failures for the openwsmand process.

 

2020-07-05T10:29:14.814Z cpu78:2849989)MemSchedAdmit: 478: UserWorld 'openwsmand' with cmdline '/sbin/openwsmand --syslog=3 --foreground-process'
2020-07-05T10:29:14.814Z cpu78:2849989)MemSchedAdmit: 489: uw.2849986 (3904145) extraMin/extraFromParent: 256/256, wsman (836) childEmin/eMinLimit: 5704/5888
2020-07-05T10:29:14.814Z cpu78:2849989)MemSchedAdmit: 471: Admission failure in path: wsman/openwsmand.2849986/uw.2849986
2020-07-05T10:29:14.814Z cpu78:2849989)MemSchedAdmit: 478: UserWorld 'openwsmand' with cmdline '/sbin/openwsmand --syslog=3 --foreground-process'
2020-07-05T10:29:14.814Z cpu78:2849989)MemSchedAdmit: 489: uw.2849986 (3904145) extraMin/extraFromParent: 256/256, wsman (836) childEmin/eMinLimit: 5704/5888
2020-07-05T10:29:14.814Z cpu78:2849989)MemSchedAdmit: 471: Admission failure in path: wsman/openwsmand.2849986/uw.2849986
2020-07-05T10:29:14.814Z cpu78:2849989)MemSchedAdmit: 478: UserWorld 'openwsmand' with cmdline '/sbin/openwsmand --syslog=3 --foreground-process'
2020-07-05T10:29:14.814Z cpu78:2849989)MemSchedAdmit: 489: uw.2849986 (3904145) extraMin/extraFromParent: 256/256, wsman (836) childEmin/eMinLimit: 5704/5888
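
The admission-failure lines above can be summarised with a short script. This is only a sketch: the function name `summarize` and the regular expressions are illustrative, and the reading that the request fails because childEmin plus extraMin exceeds eMinLimit is an assumption drawn from the field names, not something the log states.

```python
import re

# Two of the MemSchedAdmit lines from the vmkernel log above.
LOG = """\
2020-07-05T10:29:14.814Z cpu78:2849989)MemSchedAdmit: 489: uw.2849986 (3904145) extraMin/extraFromParent: 256/256, wsman (836) childEmin/eMinLimit: 5704/5888
2020-07-05T10:29:14.814Z cpu78:2849989)MemSchedAdmit: 471: Admission failure in path: wsman/openwsmand.2849986/uw.2849986
"""

# Hypothetical patterns written for this log format only.
DETAIL = re.compile(
    r"extraMin/extraFromParent: (\d+)/(\d+), (\S+) \(\d+\) "
    r"childEmin/eMinLimit: (\d+)/(\d+)"
)
FAILURE = re.compile(r"Admission failure in path: (\S+)")


def summarize(log_text):
    """Return (failed_paths, details) extracted from MemSchedAdmit lines."""
    failed_paths = FAILURE.findall(log_text)
    details = []
    for extra_min, _from_parent, group, child_emin, emin_limit in DETAIL.findall(log_text):
        # Assumption: the request is rejected when the group's current
        # reservation plus the extra request exceeds the group's limit.
        needed = int(child_emin) + int(extra_min)
        details.append({
            "group": group,
            "needed": needed,
            "limit": int(emin_limit),
            "over_limit": needed > int(emin_limit),
        })
    return failed_paths, details


failed_paths, details = summarize(LOG)
print(failed_paths)
print(details)
```

Under that assumption, the wsman group would need 5704 + 256 = 5960 against a limit of 5888, which is consistent with the admission failure being a resource-group limit rather than the host itself running out of memory.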

 

  • About 19 minutes later, the host crashed with #PF Exception 14 (a page fault).

 

2020-07-05T10:48:13.827Z cpu96:2566985)VMware ESXi 6.7.0 [Releasebuild-16075168 x86_64]
#PF Exception 14 in world 2566985:NetWorld-VM- IP 0x41803262aecd addr 0x41
2020-07-05T10:48:13.829Z cpu96:2566985)@BlueScreen: #PF Exception 14 in world 2566985:NetWorld-VM- IP 0x41803262aecd addr 0x41
2020-07-05T10:48:13.829Z cpu96:2566985)Code start: 0x418032400000 VMK uptime: 22:05:49:05.559
2020-07-05T10:48:13.829Z cpu96:2566985)0x451b5b99bce0:[0x41803262aecd]PktMemFree@vmkernel#nover+0xd stack: 0x4306b80ce4c0
2020-07-05T10:48:13.830Z cpu96:2566985)0x451b5b99bcf0:[0x41803268db4b]Vmxnet3VMKDevTxComplete@vmkernel#nover+0xe4 stack: 0x451b5c9a3000
2020-07-05T10:48:13.830Z cpu96:2566985)0x451b5b99bd60:[0x41803268e1fe]Vmxnet3VMKDevTxCompleteCB@vmkernel#nover+0x12b stack: 0x0
2020-07-05T10:48:13.830Z cpu96:2566985)0x451b5b99bdb0:[0x418032634b3f]Port_IOCompleteList@vmkernel#nover+0x1ac stack: 0x1
2020-07-05T10:48:13.831Z cpu96:2566985)0x451b5b99be30:[0x41803262dd92]PktList_DoIOComplete@vmkernel#nover+0x73 stack: 0x300003100000000
2020-07-05T10:48:13.831Z cpu96:2566985)0x451b5b99beb0:[0x41803262e620]PktListIOCompleteInt@vmkernel#nover+0x165 stack: 0x0
2020-07-05T10:48:13.831Z cpu96:2566985)0x451b5b99bf30:[0x41803263b04f]Portset_ProcessAllDeferred@vmkernel#nover+0x28 stack: 0x430152e20120
2020-07-05T10:48:13.832Z cpu96:2566985)0x451b5b99bf50:[0x41803263c6ae]Portset_ReleasePort@vmkernel#nover+0x4b stack: 0x0
2020-07-05T10:48:13.832Z cpu96:2566985)0x451b5b99bf80:[0x418032622631]NetWorldPerVMCB@vmkernel#nover+0x122 stack: 0x0
2020-07-05T10:48:13.832Z cpu96:2566985)0x451b5b99bfe0:[0x4180327107da]CpuSched_StartWorld@vmkernel#nover+0x77 stack: 0x0
2020-07-05T10:48:13.840Z cpu96:2566985)heartbeat            0x4180333d3000 .data 0x417fcde00000 .bss 0x417fcde16180
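
To read the call path out of a backtrace like the one above, the frames can be pulled apart with a small parser. This is a minimal sketch; the `FRAME` pattern and the `parse_frames` helper are illustrative names written for this log format, not part of any VMware tooling.

```python
import re

# First three frames of the backtrace from the vmkernel log above.
BACKTRACE = """\
2020-07-05T10:48:13.829Z cpu96:2566985)0x451b5b99bce0:[0x41803262aecd]PktMemFree@vmkernel#nover+0xd stack: 0x4306b80ce4c0
2020-07-05T10:48:13.830Z cpu96:2566985)0x451b5b99bcf0:[0x41803268db4b]Vmxnet3VMKDevTxComplete@vmkernel#nover+0xe4 stack: 0x451b5c9a3000
2020-07-05T10:48:13.830Z cpu96:2566985)0x451b5b99bd60:[0x41803268e1fe]Vmxnet3VMKDevTxCompleteCB@vmkernel#nover+0x12b stack: 0x0
"""

# [instruction address]Function@module#version+offset
FRAME = re.compile(r"\[(0x[0-9a-f]+)\](\w+)@(\w+)#\w+\+(0x[0-9a-f]+)")


def parse_frames(text):
    """Return one dict per stack frame, innermost frame first."""
    return [
        {"address": addr, "function": func, "module": module, "offset": int(off, 16)}
        for addr, func, module, off in FRAME.findall(text)
    ]


frames = parse_frames(BACKTRACE)
for depth, frame in enumerate(frames):
    print(f"#{depth} {frame['function']}+{frame['offset']:#x} ({frame['module']})")
```

The innermost frame is PktMemFree, reached from the Vmxnet3 transmit-completion functions, and its instruction address 0x41803262aecd matches the IP reported in the panic message.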

 

 

========================================================================================

 

 

Dump Review:

 

  • The dumps show that not only the VMkernel crashed at the time of the issue; sfcbd and openwsmand crashed as well. Multiple processes crashing together generally points to memory exhaustion.

 

 

 

vmkernel-zdump.1

 

Initializing debug related workflows
Processing VMkernel crash dump  |
Information gathered from Dump file: /scripts/data/srdata25/020137923007/esx-abcdtmp022h07.abcde.com-2020-07-06--12.16-2114294/var/core/vmkernel-zdump.1
Panic Details: Crash at 2020-07-05T10:48:13.829Z on CPU 96 running world 2566985. VMK Uptime:22:05:49:05.559
Panic Message: @BlueScreen: #PF Exception 14 in world 2566985:NetWorld-VM- IP 0x41803262aecd addr 0x41
Backtrace (top 5 lines):
  0x451b5b99bce0:[0x41803262aecd]PktMemFree@vmkernel#nover+0xd stack: 0x4306b80ce4c0, 0x4307298e57c0, 0x451b5c9a3000, 0x430730d61928, 0x430788110af0
  0x451b5b99bcf0:[0x41803268db4b]Vmxnet3VMKDevTxComplete@vmkernel#nover+0xe4 stack: 0x451b5c9a3000, 0x430730d61928, 0x430788110af0, 0x12100000000, 0x42
  0x451b5b99bd60:[0x41803268e1fe]Vmxnet3VMKDevTxCompleteCB@vmkernel#nover+0x12b stack: 0x0, 0x45bb00000000, 0x45bb498a2ec0, 0x0, 0x451b5b99bdc0
  0x451b5b99bdb0:[0x418032634b3f]Port_IOCompleteList@vmkernel#nover+0x1ac stack: 0x1, 0x0, 0x0, 0x451b00000000, 0x0
  0x451b5b99be30:[0x41803262dd92]PktList_DoIOComplete@vmkernel#nover+0x73 stack: 0x300003100000000, 0x4306b80ce4c0, 0xb1742ba37c7a, 0x0, 0x0

 

 

openwsmand-zdump.000

 

Initializing debug related workflows
Starting gdb session using /build/apps/bin/debugzilla.py with /scripts/data/srdata25/020137923007/esx-abcdtmp022h07.abcde.com-2020-07-06--12.16-2114294/var/core/openwsmand-zdump.000
GDB Session is ready
Information gathered from Dump file: /scripts/data/srdata25/020137923007/esx-abcdtmp022h07.abcde.com-2020-07-06--12.16-2114294/var/core/openwsmand-zdump.000
Dump is from: openwsmand
Build Number: 16075168 (6.7.0 – vsphere67p02)
Crashing World ID: 2106649
Backtrace (top 5 lines):
  #0  0x046aab54 in addProperty (parm=0x4725c4c, ps=0x4725b50, p=0x4725920) at /build/mts/release/bora-12296573/compcache/cayman_esx_glibc/ob-4858559/linux64/sysroot/usr/include/bits/string3.h:52
  #1  0x046a76cf in instance (parm=0x4725c4c, stateUnion=0x4725b4c) at /build/mts/release/bora-12296573/cayman_sfcc/sfcc/src/backend/cimxml/grammar.c:1175
  #2  0x046a89d6 in valueNamedInstance (stateUnion=0x4725b40, parm=0x4725c4c) at /build/mts/release/bora-12296573/cayman_sfcc/sfcc/src/backend/cimxml/grammar.c:1141
  #3  iReturnValueContent (parm=0x4725c4c, stateUnion=<optimized out>) at /build/mts/release/bora-12296573/cayman_sfcc/sfcc/src/backend/cimxml/grammar.c:566
  #4  iReturnValue (stateUnion=0x47259c4, parm=0x4725c4c) at /build/mts/release/bora-12296573/cayman_sfcc/sfcc/src/backend/cimxml/grammar.c:460
Backtrace hints this dump resulted from a Crash
Found signs of Memory Exhaustion: No
Found signs of FD Exhaustion: No

 

 

 

sfcb-vmware_bas-zdump.003

 

Dump is from: sfcbd

Backtrace (top 5 lines):

  #0  0x0804d352 in _dl_sysinfo_int80 () from /tmp/debug-uw.8zG6d5ys/lib/ld-linux.so.2

  #1  0x0a1bc840 in __GI_raise (sig=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56

  #2  0x0a1be043 in __GI_abort () at abort.c:90

  #3  0x09cadf15 in tool_mm_realloc_or_die (new_size=1600000, orig_size=800000, ptr=<optimized out>) at /build/mts/release/bora-15963473/cayman_sfcb/sfcb/src/support.c:918

  #4  tool_mm_realloc_or_die (ptr=0xf0ff008, orig_size=800000, new_size=1600000) at /build/mts/release/bora-15963473/cayman_sfcb/sfcb/src/support.c:912

Backtrace hints this dump resulted from a Crash

Found signs of Memory Exhaustion: Yes

Will run memory checker on the dump.

Found signs of FD Exhaustion: No

 

Memory Details

Virtual: 107667456 bytes

In Dump: 107667456 bytes

Free: 934512 bytes

Allocated: 58785160 bytes

Leaked: 42562440 bytes
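
The memory checker figures can be set against the failed reallocation in the sfcbd backtrace with a few lines of arithmetic. This is a rough sketch that assumes `orig_size` and `new_size` in `tool_mm_realloc_or_die` are byte counts and that the "Free" figure is what was available to satisfy the request; neither is confirmed by the dump output.

```python
# Figures copied from the memory checker output above (all in bytes).
virtual = 107_667_456
free = 934_512
allocated = 58_785_160
leaked = 42_562_440

# Sizes from frame #3 of the sfcbd backtrace (tool_mm_realloc_or_die).
# Assumption: these are byte counts.
orig_size = 800_000
new_size = 1_600_000

# Share of the process's virtual size that the checker reports as leaked.
leaked_share = leaked / virtual

# Could the reported free memory have satisfied the grown buffer?
realloc_fits = free >= new_size

print(f"Leaked share of virtual size: {leaked_share:.1%}")
print(f"Free memory covers the {new_size}-byte request: {realloc_fits}")
```

With these figures the leaked memory is roughly 39.5% of the virtual size, and the 1,600,000-byte request exceeds the 934,512 bytes reported free, which fits the abort inside `tool_mm_realloc_or_die`.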

 

Conclusion:

 

  • Components such as sfcbd and openwsman generally work with third-party CIM providers; if a third-party CIM provider is installed, sfcbd and openwsman start automatically.
  • The sfcbd dump shows a memory leak in the components associated with the CIM providers.

Reference: https://kb.vmware.com/s/article/1025757

 

Action Plan:

 

  • The logs show that Dell EMC OpenManage is installed on the host.

 

Vendor  Name        Summary                                                     Version             Acceptance Level
Dell    OpenManage  Dell EMC OpenManage Server Administrator 9.3.1 for ESXi6.7  9.3.1.ESXi670-3684  VMwareAccepted
DEL     racadm      Dell EMC Racadm 9.4.0 for ESXi6.7                           9.4.0.ESXi670-3749  PartnerSupported

 

Ashutosh Dixit

I am currently working as a Senior Technical Support Engineer with VMware Premier Services for Telco. Before this, I worked as a Technical Lead with Microsoft Enterprise Platform Support for Production and Premier Support. I am an expert in High-Availability, Deployments, and VMware Core technology along with Tanzu and Horizon.
