RCA 32: vCenter Down due to Memory Error

Hostname: abc30vcntrpa027.abc.com

vCenter Version:  vCenter Appliance 6.7 Update 3g

 

Time of Issue: 13th 5:30 GMT

 

 

Endpoint. Properties:

 

  • From the Endpoint Properties we can see the below changes:

 

# vapi.interfaceLoader.timeoutMs – timeout used for scanning vapi providers in millis. In case of frequent CPU starvation this value may need to be increased. Higher values mean that rescanning configuration may take very long time
# Default value is 10000
#******************************************************************************
vapi.http.soTimeout=0
vapi.http.timeout=15000
# vapi.http.maxConnections=2000
# vapi.http.maxConnectionsPerRoute=600
# vapi.http.ioThreads=2
# vapi.http.keepAliveDuration=30000
vapi.interfaceLoader.timeoutMs=15000
#******************************************************************************

 

vSphere Client Virgo Service:

 

  • In the Client Virgo logs we can see error 443 while communicating with the VAPI Service, probably the time when the services were intentionally stopped.

 

[2020-10-13T03:15:29.885Z] [ERROR] data-service-pool-792        70274151 112207 200427 com.vmware.vise.vim.vapi.SessionBasedVapiConnectionAuthenticator  Error while logging in to service ‘https://ABC30VCNTRPA027.abc.com:443/site/api‘ – invoking ‘create’ on type ‘com.vmware.cis.Session’ failed. com.vmware.vapi.client.exception.TransportProtocolException: HTTP response with status code 503 (enable debug logging for details): <HTML><BODY><H1>503 Service Unavailable (Failed to connect to endpoint: [N7Vmacore4Http16LocalServiceSpecE:0x00007faf4c4d3fe0] _serverNamespace = /site/api action = Allow _port = 12346)</H1></BODY></HTML>
Caused by: com.vmware.vapi.client.exception.TransportProtocolException: HTTP response with status code 503 (enable debug logging for details): <HTML><BODY><H1>503 Service Unavailable (Failed to connect to endpoint: [N7Vmacore4Http16LocalServiceSpecE:0x00007faf4c4d3fe0] _serverNamespace = /site/api action = Allow _port = 12346)</H1></BODY></HTML>
[2020-10-13T03:15:29.908Z] [ERROR] extfw-filterer-pool-2733267  70274151 112207 200427 com.vmware.vise.util.concurrent.ExecutorUtil                      A task crashed: com.vmware.vise.extensionfw.impl.Filterer$1@5cf014c0 java.lang.RuntimeException: com.vmware.vapi.client.exception.TransportProtocolException: HTTP response with status code 503 (enable debug logging for details): <HTML><BODY><H1>503 Service Unavailable (Failed to connect to endpoint: [N7Vmacore4Http16LocalServiceSpecE:0x00007faf4c4d3fe0] _serverNamespace = /site/api action = Allow _port = 12346)</H1></BODY></HTML>

 

 

  • From the logs we can see the issues related to the Deployment getting Failed for the NetBackup Service:

 

[2020-10-13T04:58:57.738Z] [INFO ] -extension-mngr-pool-2733301 70274213 112223 200428 com.vmware.vise.vim.extension.PluginStatusTaskManager             DEPLOYMENT_FAILED: Error deploying plugin package com.vmware.fallback.mixed:6.5.0. Reason: Incompatible. This is a vSphere Web Client (Flash) only plugin indicated by the missing type=”html” plugin manifest attribute.
[2020-10-13T04:58:57.745Z] [ERROR] catalog-manager-pool-2733293 70274213 112223 200428 com.vmware.vise.vim.lookup.LsCatalogManager                       Unable to extract service version for: NetBackup_8.1 java.lang.NumberFormatException: For input string: “NetBackup_8”
[2020-10-13T04:58:57.745Z] [ERROR] catalog-manager-pool-2733293 70274213 112223 200428 com.vmware.vise.vim.lookup.LsCatalogManager                       Unable to extract service version for: NetBackup_8.1 java.lang.NumberFormatException: For input string: “NetBackup_8”
[2020-10-13T04:58:57.745Z] [ERROR] catalog-manager-pool-2733293 70274213 112223 200428 com.vmware.vise.vim.lookup.LsCatalogManager                       Unable to extract service version for: NetBackup_8.1 java.lang.NumberFormatException: For input string: “NetBackup_8”
[2020-10-13T04:58:57.745Z] [ERROR] catalog-manager-pool-2733293 70274213 112223 200428 com.vmware.vise.vim.lookup.LsCatalogManager                       Unable to extract service version for: NetBackup_8.1 java.lang.NumberFormatException: For input string: “NetBackup_8”
[2020-10-13T04:58:57.745Z] [ERROR] catalog-manager-pool-2733293 70274213 112223 200428 com.vmware.vise.vim.lookup.LsCatalogManager                       Unable to extract service version for: NetBackup_7.7.1 java.lang.NumberFormatException: For input string: “NetBackup_7”

[2020-10-13T04:58:57.763Z] [ERROR] catalog-manager-pool-2733311 70274221 112223 200428 com.vmware.vise.vim.lookup.LsCatalogManager                       Unable to extract service version for: NetBackup_7.7.3 java.lang.NumberFormatException: For input string: “NetBackup_7”
[2020-10-13T04:58:57.763Z] [ERROR] catalog-manager-pool-2733311 70274221 112223 200428 com.vmware.vise.vim.lookup.LsCatalogManager                       Unable to extract service version for: NetBackup_7.7.3 java.lang.NumberFormatException: For input string: “NetBackup_7”
[2020-10-13T04:58:57.763Z] [ERROR] catalog-manager-pool-2733311 70274221 112223 200428 com.vmware.vise.vim.lookup.LsCatalogManager                       Unable to extract service version for: NetBackup_7.7.3 java.lang.NumberFormatException: For input string: “NetBackup_7”
[2020-10-13T04:58:57.763Z] [ERROR] catalog-manager-pool-2733311 70274221 112223 200428 com.vmware.vise.vim.lookup.LsCatalogManager                       Unable to extract service version for: NetBackup_7.7.3 java.lang.NumberFormatException: For input string: “NetBackup_7”
[2020-10-13T04:58:57.763Z] [ERROR] catalog-manager-pool-2733311 70274221 112223 200428 com.vmware.vise.vim.lookup.LsCatalogManager                       Unable to extract service version for: NetBackup_7.7.3 java.lang.NumberFormatException: For input string: “NetBackup_7”

 

[2020-10-13T04:58:57.766Z] [ERROR] catalog-manager-pool-2733305 70274216 112223 200428 com.vmware.vise.vim.lookup.LsCatalogManager                       Invoking ‘downloadResourceBundle(com.vmware.vws.vws.health from https://ABC30VCNTRPA027.abc.com:443/hwh/resourcebundle)’ unsuccessful, retry time left = None. Returning default result org.apache.http.HttpException: HttpStatus is not SC_OK. statusCode = 404

 

[2020-10-13T04:58:57.764Z] [INFO ] catalog-manager-pool-2733314 70274224 112223 200428 com.vmware.vise.vim.lookup.LsCatalogManager                       Loading properties for /etc/vmware/vsphere-ui/cmCatalog/com.vmware.cis.com.vmware.vsan.mgmt.zip:US
[2020-10-13T04:58:57.765Z] [INFO ] catalog-manager-pool-2733315 70274225 112223 200428 com.vmware.vise.vim.lookup.LsCatalogManager                       Loading properties for /etc/vmware/vsphere-ui/cmCatalog/com.vmware.cis.com.vmware.vcops.zip:US
[2020-10-13T04:58:57.765Z] [INFO ] catalog-manager-pool-2733307 70274226 112223 200428 com.vmware.vise.vim.lookup.LsCatalogManager                       downloadResourceBundle: Trying https://ABC30VCNTRPA019.abc.com:443/catalog/com.vmware.vsan.mgmt_catalog.zip
[2020-10-13T04:58:57.765Z] [INFO ] catalog-manager-pool-2733317 70274227 112223 200428 com.vmware.vise.vim.lookup.LsCatalogManager                       Loading properties for /etc/vmware/vsphere-ui/cmCatalog/com.vmware.cis.com.vmware.vcIntegrity.zip:US

 

  • Post which we started seeing error on Port 443:

 

[2020-10-13T04:58:55.839Z] [WARN ] -authentication-pool-2733279 70274213 112222 200428 com.vmware.vise.vim.commons.vcservice.impl.VcServiceImpl          VC https://ABC30VCNTRPA027.abc.com:443/sdk doesn’t support the requested locale “en_US”. Will use “en” instead.
[2020-10-13T04:58:55.878Z] [WARN ] linkedVcGroup-pool-2733280   70274213 112222 200428 com.vmware.vise.vim.commons.vcservice.impl.VcServiceImpl          VC https://ABC30VCNTRPA019.abc.com:443/sdk doesn’t support the requested locale “en_US”. Will use “en” instead.
[2020-10-13T04:58:55.905Z] [WARN ] linkedVcGroup-pool-2733282   70274213 112222 200428 com.vmware.vise.vim.commons.vcservice.impl.VcServiceImpl          VC https://ABC30VCNTRPA03.abc.com:443/sdk doesn’t support the requested locale “en_US”. Will use “en” instead.
[2020-10-13T04:58:55.907Z] [WARN ] linkedVcGroup-pool-2733281   70274213 112222 200428 com.vmware.vise.vim.commons.vcservice.impl.VcServiceImpl          VC https://ABC30VCNTRPA13.abc.com:443/sdk doesn’t support the requested locale “en_US”. Will use “en” instead.

[2020-10-13T04:58:57.766Z] [ERROR] catalog-manager-pool-2733305 70274216 112223 200428 com.vmware.vise.vim.lookup.LsCatalogManager                       Invoking ‘downloadResourceBundle(com.vmware.vws.vws.health from https://ABC30VCNTRPA027.abc.com:443/hwh/resourcebundle)’ unsuccessful, retry time left = None. Returning default result org.apache.http.HttpException: HttpStatus is not SC_OK. statusCode = 404

[2020-10-13T04:58:57.766Z] [ERROR] catalog-manager-pool-2733305 70274216 112223 200428 com.vmware.vise.vim.lookup.LsCatalogManager                       Invoking ‘downloadResourceBundle(com.vmware.vws.vws.health from https://ABC30VCNTRPA027.abc.com:443/hwh/resourcebundle)’ unsuccessful, retry time left = None. Returning default result org.apache.http.HttpException: HttpStatus is not SC_OK. statusCode = 404
    at com.vmware.vise.vim.lookup.LsCatalogManager.executeDownload(LsCatalogManager.java:1568)
    at com.vmware.vise.vim.lookup.LsCatalogManager.downloadResourceBundle(LsCatalogManager.java:1509)
    at com.vmware.vise.vim.lookup.LsCatalogManager.access$1300(LsCatalogManager.java:98)
    at com.vmware.vise.vim.lookup.LsCatalogManager$7.call(LsCatalogManager.java:1437)
    at com.vmware.vise.vim.lookup.LsCatalogManager$7.call(LsCatalogManager.java:1435)
    at com.vmware.vise.vim.lookup.LsCatalogManager.retryUntil(LsCatalogManager.java:1708)
    at com.vmware.vise.vim.lookup.LsCatalogManager.initializeCatalogServiceProviderInfo(LsCatalogManager.java:1442)
    at com.vmware.vise.vim.lookup.LsCatalogManager.access$1200(LsCatalogManager.java:98)
    at com.vmware.vise.vim.lookup.LsCatalogManager$6.call(LsCatalogManager.java:1251)
    at com.vmware.vise.vim.lookup.LsCatalogManager$6.call(LsCatalogManager.java:1239)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at com.vmware.vise.util.concurrent.QueuingCachedThreadPool$2.call(QueuingCachedThreadPool.java:831)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at com.vmware.vise.util.concurrent.QueuingCachedThreadPool$QueueProcessor.run(QueuingCachedThreadPool.java:1229)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at com.vmware.vise.util.concurrent.WorkerThreadFactory$1.run(WorkerThreadFactory.java:64)
    at java.lang.Thread.run(Thread.java:748)
[2020-10-13T04:58:57.767Z] [WARN ] catalog-manager-pool-2733305 70274216 112223 200428 com.vmware.vise.vim.lookup.LsCatalogManager                       Failed to download /etc/vmware/vsphere-ui/cmCatalog/com.vmware.vws.vws.health.zip from https://ABC30VCNTRPA027.abc.com:443/hwh/resourcebundle.
[2020-10-13T04:58:57.789Z] [WARN ] catalog-manager-pool-2733320 70274230 112223 200428 com.vmware.vise.vim.lookup.LocalizedMessageDictionary             119 Resource Bundle entry conflicts found for: /etc/vmware/vsphere-ui/cmCatalog/com.vmware.vim.sms.sms-6.7.0.0.zip/cis.sms.ResourceBundle:US

 

 

[2020-10-13T06:59:08.170Z] [INFO ] session-init-pool-2733348    70274389 112223 200428 com.vmware.vise.vim.vapi.StaticEndpointVapiConnectionManager Logging out of vAPI service: https://ABC30VCNTRPA027.abc.com:443/site/api
[2020-10-13T06:59:08.170Z] [INFO ] session-init-pool-2733351    70274389 112223 200428 com.vmware.vise.vim.vapi.StaticEndpointVapiConnectionManager Logging out of vAPI service:
https://ABC30VCNTRPA027.abc.com:443/site/api
[2020-10-13T06:59:08.171Z] [INFO ] session-init-pool-2733353    70274389 112223 200428 com.vmware.vise.vim.vapi.StaticEndpointVapiConnectionManager Logging out of vAPI service:
https://ABC30VCNTRPA027.abc.com:443/vmonapi

 

 

 

Endpoint Logs:

 

  • From the endpoint logs we can see issues regarding the health status green till 3:21.

 

2020-10-13T03:21:54.654Z | INFO  | state-manager1            | HealthConfigurationEventListener | Computed health status is = GREEN

 

  • Post which we started seeing errors related to the slow Operations, post which the vAPI Service failed with the OutOfMemoryError Errors.

 

2020-10-13T04:00:41.875Z | INFO  | sso8                      | MasterSessionManagerImpl       | Created session svcABCDMINBUVCB@ABC.COM (internal id 32599edd-ef0d-4490-b6a2-203e877c892a, token 82a1b…)
2020-10-13T04:00:41.876Z | WARN  | sso8                      | RequestDurationLogProvider     | Slow operation. Request to service: com.vmware.cis.session, operation: create took 6168 ms
2020-10-13T04:00:44.110Z | INFO  | sso18                     | MasterSessionManagerImpl       | Created session svcABCDMINBUVCB@ABC.COM (internal id 56bf4dd2-f23e-447a-abad-dee32f159d28, token 5c2e8…)
2020-10-13T04:00:44.112Z | WARN  | sso18                     | RequestDurationLogProvider     | Slow operation. Request to service: com.vmware.cis.session, operation: create took 8403 ms
2020-10-13T04:00:46.972Z | INFO  | sso18                     | MasterSessionManagerImpl       | Created session svcABCDMINBUVCB@ABC.COM (internal id c7d51f21-0e91-450f-9bf3-d2cc41c63188, token 3c159…)
2020-10-13T04:00:46.973Z | ERROR | sso18                     | RequestDurationLogProvider     | Slow operation. Request to service: com.vmware.cis.session, operation: create took 11271 ms
2020-10-13T04:00:51.727Z | INFO  | sso8                      | MasterSessionManagerImpl       | Created session svcABCDMINBUVCB@ABC.COM (internal id d505d496-4bf4-4879-a62a-96fb6702cbc9, token 19035…)
2020-10-13T04:00:51.728Z | ERROR | sso8                      | RequestDurationLogProvider     | Slow operation. Request to service: com.vmware.cis.session, operation: create took 15992 ms

2020-10-13T04:01:21.691Z | ERROR | Keep-Alive-Timer          | ApiEndpointServer              | Unexpected error on thread Thread[Keep-Alive-Timer,8,system]
java.lang.OutOfMemoryError: GC overhead limit exceeded
    at java.util.Arrays.copyOfRange(Arrays.java:3664)
    at java.lang.String.<init>(String.java:207)
    at java.lang.StringBuilder.toString(StringBuilder.java:407)
    at sun.net.www.http.KeepAliveKey.hashCode(KeepAliveCache.java:333)
    at java.util.HashMap.hash(HashMap.java:339)
    at java.util.HashMap.get(HashMap.java:557)
    at sun.net.www.http.KeepAliveCache.run(KeepAliveCache.java:191)
    at java.lang.Thread.run(Thread.java:748)

 

2020-10-13T04:01:22.783Z | ERROR | vAPI-I/O dispatcher-1     | ApiEndpointServer              | Unexpected error on thread Thread[vAPI-I/O dispatcher-1,5,main]
2020-10-13T04:02:50.738Z | ERROR | sso4                      | ApiEndpointServer              | Unexpected error on thread Thread[sso4,5,main]
java.lang.OutOfMemoryError: GC overhead limit exceeded
2020-10-13T04:02:49.173Z | ERROR | vAPI-I/O reactor-0        | ApiEndpointServer              | Unexpected error on thread Thread[vAPI-I/O reactor-0,5,main]
java.lang.OutOfMemoryError: GC overhead limit exceeded
2020-10-13T04:02:47.140Z | ERROR | sso21                     | ApiEndpointServer              | Unexpected error on thread Thread[sso21,5,main]
java.lang.OutOfMemoryError: Java heap space
2020-10-13T04:02:47.137Z | ERROR | vAPI-I/O dispatcher-14    | ApiEndpointServer              | Unexpected error on thread Thread[vAPI-I/O dispatcher-14,5,main]
java.lang.OutOfMemoryError: GC overhead limit exceeded
2020-10-13T04:02:42.759Z | ERROR | session-monitor1          | MasterSessionManagerImpl       | Unexpected exception while trying to monitor sessions.
2020-10-13T04:02:41.538Z | ERROR | vAPI-I/O dispatcher-18    | ApiEndpointServer              | Unexpected error on thread Thread[vAPI-I/O dispatcher-18,5,main]
java.lang.OutOfMemoryError: GC overhead limit exceeded
2020-10-13T04:02:37.387Z | ERROR | sso9                      | ApiEndpointServer              | Unexpected error on thread Thread[sso9,5,main]
java.lang.OutOfMemoryError: GC overhead limit exceeded
2020-10-13T04:02:29.055Z | ERROR | vAPI-I/O dispatcher-16    | ApiEndpointServer              | Unexpected error on thread Thread[vAPI-I/O dispatcher-16,5,main]
java.lang.OutOfMemoryError: GC overhead limit exceeded
2020-10-13T04:01:25.707Z | ERROR | sso22                     | ApiEndpointServer              | Unexpected error on thread Thread[sso22,5,main]
java.lang.OutOfMemoryError: GC overhead limit exceeded
2020-10-13T04:05:27.564Z | ERROR | sso53                     | ApiEndpointServer              | Unexpected error on thread Thread[sso53,5,main]

 

 

 

Conclusion:

 

  • Based on the logs and information the issue seems to be with the vAPI Endpoint service getting failed with the Memory Error:

 

Action Plan:

 

  • Increase the threshold of the vAPI service Endpoint and add 512 MB to the Service Threshold, so that incase if it requires more space to cater the request without failures :

 

  • SSH to the vCenter Server and run the below command:

cloudvm-ram-size -C 512 vmware-vapi-endpoint

 

  • To check what is the memory for vapi-endpoint run:

$ cloudvm-ram-size -l vmware-vapi-endpoint

 

  • For the changed memory setting to take effect run:

$ service-control –restart vmware-vapi-endpoint

 

Ashutosh Dixit

I am currently working as a Senior Technical Support Engineer with VMware Premier Services for Telco. Before this, I worked as a Technical Lead with Microsoft Enterprise Platform Support for Production and Premier Support. I am an expert in High-Availability, Deployments, and VMware Core technology along with Tanzu and Horizon.

Leave a Reply