RCA 36: vCenter Down due to vAPI Memory issue

vCenter Name: abcdesvc01.r1-cabc.r1.abc.net

vCenter Version: vCenter Appliance 6.7 Update 3l

 

Time of Issue:  2 February at 16:17 CST

Time in GMT: 2 February at 10:17 PM GMT

 

Account Information: r1-cabc\abcdsxpasmgr

 

 

Endpoint Logs:

 

  • Reviewed the Endpoint logs to check the response on the VAPI Service:

 

  • We can see Multiple Token generated request for the Master and Child Sessions:

 

2021-02-02T22:25:13.973Z | INFO  | sso2                      | MasterSessionManagerImpl       | Created session abcdsxpasmgr@R1-CABC.R1.ABC.NET (internal id c4c17e91-cc44-449d-af72-1a8fb797f227, token f26f7…)
2021-02-02T22:25:14.042Z | WARN  | vAPI-I/O dispatcher-0     | SessionApiSecurityUtil         | Failed to create child session with session manager com.vmware.cis.session for session svcvmwLogInsight@R1-CABC.R1.ABC.NET (internal id 31d68f61-38ac-413e-80cb-ca16a8168a1a, token f2ba6…).
com.vmware.vapi.std.errors.internal_server_error => {data=<unset>, messages=[com.vmware.vapi.std.localizable_message => {args=[Field messages missing from Structure com.vmware.vapi.std.errors.service_unavailable], default_message=Error in method invocation Field messages missing from Structure com.vmware.vapi.std.errors.service_unavailable, id=vapi.method.invoke.exception}]}
    at com.vmware.vapi.endpoint.auth.impl.SessionApiSecurityUtil.onLoginResult(SessionApiSecurityUtil.java:248)
    at com.vmware.vapi.endpoint.auth.impl.SessionApiSecurityUtil.access$500(SessionApiSecurityUtil.java:43)
    at com.vmware.vapi.endpoint.auth.impl.SessionApiSecurityUtil$2.setResult(SessionApiSecurityUtil.java:230)
    at com.vmware.vapi.endpoint.auth.impl.SessionApiSecurityUtil$2.setResult(SessionApiSecurityUtil.java:220)
    at com.vmware.vapi.endpoint.session.SessionFacade$1.setResult(SessionFacade.java:108)
    at com.vmware.vapi.endpoint.session.SessionFacade$1.setResult(SessionFacade.java:90)
    at com.vmware.vapi.internal.protocol.client.msg.json.JsonApiProvider$ResponseCallbackImpl.received(JsonApiProvider.java:221)
    at com.vmware.vapi.internal.protocol.client.msg.json.JsonApiProvider$1.received(JsonApiProvider.java:249)
    at com.vmware.vapi.internal.protocol.client.rpc.http.ApacheHttpAsyncClientTransport$SingleResponseConsumer.responseCompleted(ApacheHttpAsyncClientTransport.java:470)
    at com.vmware.vapi.internal.protocol.client.rpc.http.ApacheHttpAsyncClientTransport$DecoratorConsumer.responseCompleted(ApacheHttpAsyncClientTransport.java:334)
    at org.apache.http.impl.nio.client.MainClientExec.responseCompleted(MainClientExec.java:418)
    at org.apache.http.impl.nio.client.DefaultClientExchangeHandlerImpl.responseCompleted(DefaultClientExchangeHandlerImpl.java:164)
    at org.apache.http.nio.protocol.HttpAsyncRequestExecutor.processResponse(HttpAsyncRequestExecutor.java:355)
    at org.apache.http.nio.protocol.HttpAsyncRequestExecutor.inputReady(HttpAsyncRequestExecutor.java:242)
    at org.apache.http.impl.nio.client.LoggingAsyncRequestExecutor.inputReady(LoggingAsyncRequestExecutor.java:87)
    at org.apache.http.impl.nio.DefaultNHttpClientConnection.consumeInput(DefaultNHttpClientConnection.java:264)
    at org.apache.http.impl.nio.client.InternalIODispatch.onInputReady(InternalIODispatch.java:73)
    at org.apache.http.impl.nio.client.InternalIODispatch.onInputReady(InternalIODispatch.java:37)
    at org.apache.http.impl.nio.reactor.AbstractIODispatch.inputReady(AbstractIODispatch.java:113)
    at org.apache.http.impl.nio.reactor.BaseIOReactor.readable(BaseIOReactor.java:159)
    at org.apache.http.impl.nio.reactor.AbstractIOReactor.processEvent(AbstractIOReactor.java:338)
    at org.apache.http.impl.nio.reactor.AbstractIOReactor.processEvents(AbstractIOReactor.java:316)
    at org.apache.http.impl.nio.reactor.AbstractIOReactor.execute(AbstractIOReactor.java:277)
    at org.apache.http.impl.nio.reactor.BaseIOReactor.execute(BaseIOReactor.java:105)
    at org.apache.http.impl.nio.reactor.AbstractMultiworkerIOReactor$Worker.run(AbstractMultiworkerIOReactor.java:584)
    at java.lang.Thread.run(Thread.java:748)

 

2021-02-02T22:27:33.136Z | INFO  | sso4                      | MasterSessionManagerImpl       | Created session abcdsxpasmgr@R1-CABC.R1.ABC.NET (internal id 2c2433d8-5850-4e12-9c97-fd3a48bdfd9a, token 31fb1…)
2021-02-02T22:27:41.680Z | INFO  | sso1                      | MasterSessionManagerImpl       | Created session abcdsxpasmgr@R1-CABC.R1.ABC.NET (internal id fb7fc9f6-e8d6-4efe-84ff-1a1368e67b33, token 15162…)

 

  • We can see events related to the Slow Operation for the Request to service operation.

 

2021-02-02T22:28:42.578Z | ERROR | jetty-default-63          | RequestDurationLogProvider     | Slow operation. Request to service: com.vmware.vapi.rest.navigation.options, operation: get took 12214 ms
2021-02-02T22:28:43.763Z | ERROR | jetty-default-25          | RequestDurationLogProvider     | Slow operation. Request to service: com.vmware.vapi.rest.navigation.options, operation: get took 13939 ms
2021-02-02T22:28:44.256Z | ERROR | jetty-default-31          | RequestDurationLogProvider     | Slow operation. Request to service: com.vmware.vapi.rest.navigation.options, operation: get took 14447 ms
2021-02-02T22:29:07.542Z | ERROR | jetty-default-27          | RequestDurationLogProvider     | Slow operation. Request to service: com.vmware.vapi.rest.navigation.options, operation: get took 12863 ms
2021-02-02T22:29:08.269Z | ERROR | jetty-default-1136        | RequestDurationLogProvider     | Slow operation. Request to service: com.vmware.vapi.rest.navigation.options, operation: get took 13146 ms
2021-02-02T22:29:08.270Z | ERROR | jetty-default-26          | RequestDurationLogProvider     | Slow operation. Request to service: com.vmware.vapi.rest.navigation.options, operation: get took 14343 ms

2021-02-02T22:30:11.017Z | INFO  | state-manager1            | HealthStatusCollectorImpl      | HEALTH GREEN Current vApi Endpoint health status is created between 2021-02-02T22:27:17UTC and 2021-02-02T22:30:11UTC.
2021-02-02T22:30:11.017Z | INFO  | state-manager1            | HealthConfigurationEventListener | Computed health status is = GREEN

 

 

2021-02-02T22:31:54.783Z | INFO  | sso5                      | MasterSessionManagerImpl       | Created session abcdsxpasmgr@R1-CABC.R1.ABC.NET (internal id 5326cf2c-ca70-4b31-95ba-c5e6590935cd, token 885a7…)
2021-02-02T22:32:01.995Z | INFO  | sso3                      | MasterSessionManagerImpl       | Created session abcdsxpasmgr@R1-CABC.R1.ABC.NET (internal id 21ed257d-1079-4344-bc26-83e962f2fa0c, token e2630…)
2021-02-02T22:32:07.314Z | INFO  | vAPI-I/O dispatcher-0     | SessionApiSecurityUtil         | Created child session with session manager com.vmware.cis.session for session abcdsxpasmgr@R1-CABC.R1.ABC.NET (internal id 21ed257d-1079-4344-bc26-83e962f2fa0c, token e2630…).
2021-02-02T22:32:11.808Z | INFO  | sso4                      | MasterSessionManagerImpl       | Created session abcdsxpasmgr@R1-CABC.R1.ABC.NET (internal id 19d59b8a-08d2-4372-a003-b6e82060d267, token c9c4e…)
2021-02-02T22:32:19.096Z | INFO  | sso1                      | MasterSessionManagerImpl       | Created session abcdsxpasmgr@R1-CABC.R1.ABC.NET (internal id 193ad050-f9b1-4d5d-bf58-69535740517e, token e8101…)
2021-02-02T22:32:23.624Z | INFO  | vAPI-I/O dispatcher-1     | SessionApiSecurityUtil         | Created child session with session manager com.vmware.cis.session for session abcdsxpasmgr@R1-CABC.R1.ABC.NET (internal id 193ad050-f9b1-4d5d-bf58-69535740517e, token e8101…).

 

2021-02-02T22:32:27.101Z | WARN  | sso2                      | RequestDurationLogProvider     | Slow operation. Request to service: com.vmware.cis.session, operation: create took 5078 ms
2021-02-02T22:34:48.282Z | INFO  | state-manager1            | HealthStatusCollectorImpl      | HEALTH ORANGE Application error has occurred. Please check log files for more information.
2021-02-02T22:34:48.289Z | INFO  | state-manager1            | HealthStatusCollectorImpl      | HEALTH GREEN Current vApi Endpoint health status is created between 2021-02-02T22:34:11UTC and 2021-02-02T22:34:48UTC.

 

  • Here we can see that the Health Status of the service went to ORANGE Post which the Reconfiguration Operation started to fail with the Error java.lang.OutOfMemoryError

 

2021-02-02T22:34:48.289Z | INFO  | state-manager1            | HealthConfigurationEventListener | Computed health status is = ORANGE
2021-02-02T22:34:48.289Z | INFO  | state-manager1            | HealthConfigurationEventListener | HEALTH Application error has occurred. Please check log files for more information.
2021-02-02T22:34:48.289Z | ERROR | state-manager1            | DefaultStateManager            | Reconfiguration attempt failed with exception

2021-02-02T22:34:48.289Z | ERROR | state-manager1            | DefaultStateManager            | Reconfiguration attempt failed with exception
java.lang.OutOfMemoryError: Java heap space
    at java.util.Arrays.copyOf(Arrays.java:3332)
    at java.lang.AbstractStringBuilder.ensureCapacityInternal(AbstractStringBuilder.java:124)
    at java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:448)
    at java.lang.StringBuilder.append(StringBuilder.java:136)
    at java.lang.StringBuilder.append(StringBuilder.java:131)
    at java.util.AbstractMap.toString(AbstractMap.java:559)
    at com.vmware.vapi.internal.bindings.BindingsUtil.appendObject(BindingsUtil.java:417)
    at com.vmware.vapi.internal.bindings.BindingsUtil.convertToString(BindingsUtil.java:337)
    at com.vmware.vapi.internal.bindings.BindingsUtil.convertToString(BindingsUtil.java:306)
    at com.vmware.vapi.metadata.metamodel.ComponentInfo.toString(ComponentInfo.java:200)
    at com.vmware.vapi.internal.metadata.service.metamodel.FilteredElements.getMergedComponentData(FilteredElements.java:448)
    at com.vmware.vapi.internal.metadata.service.metamodel.JsonMetamodelSource.getComponentData(JsonMetamodelSource.java:317)
    at com.vmware.vapi.internal.metadata.service.metamodel.DataSourceManagerImpl.getComponentData(DataSourceManagerImpl.java:107)
    at com.vmware.vapi.internal.metadata.service.metamodel.ComponentProviderImpl.fingerprint(ComponentProviderImpl.java:55)
    at com.vmware.vapi.metadata.metamodel.ComponentApiInterface$FingerprintApiMethod.doInvoke(ComponentApiInterface.java:86)
    at com.vmware.vapi.internal.bindings.ApiMethodSkeleton.invoke(ApiMethodSkeleton.java:234)
    at com.vmware.vapi.provider.ApiMethodBasedApiInterface.invoke(ApiMethodBasedApiInterface.java:86)
    at com.vmware.vapi.provider.local.LocalProvider.invokeMethodInt(LocalProvider.java:450)
    at com.vmware.vapi.provider.local.LocalProvider.invoke(LocalProvider.java:287)
    at com.vmware.vapi.core.DecoratorApiProvider.invoke(DecoratorApiProvider.java:37)
    at com.vmware.vapi.metadata.impl.ShowUnreleasedDecorator.invoke(ShowUnreleasedDecorator.java:52)
    at com.vmware.vapi.internal.bindings.Stub.invoke(Stub.java:228)
    at com.vmware.vapi.internal.bindings.Stub.invoke(Stub.java:209)
    at com.vmware.vapi.internal.bindings.Stub.invokeMethodAsync(Stub.java:173)
    at com.vmware.vapi.internal.bindings.Stub.invokeMethod(Stub.java:141)
    at com.vmware.vapi.metadata.metamodel.ComponentStub.fingerprint(ComponentStub.java:119)
    at com.vmware.vapi.metadata.metamodel.ComponentStub.fingerprint(ComponentStub.java:111)
    at com.vmware.vapi.metadata.impl.IdlComponentInfoService.getComponentInfoFingerprint(IdlComponentInfoService.java:48)
    at com.vmware.vapi.metadata.impl.ApiMetadataImpl.calculateChecksum(ApiMetadataImpl.java:120)
    at com.vmware.vapi.metadata.impl.ApiMetadataImpl.getFingerprint(ApiMetadataImpl.java:91)
    at com.vmware.vapi.endpoint.api.ApiMetadataBuilder.reloadProducts(ApiMetadataBuilder.java:98)
    at com.vmware.vapi.endpoint.api.ApiMetadataBuilder.rebuild(ApiMetadataBuilder.java:56)

2021-02-02T22:46:29.655Z | ERROR | sso2                      | ApiEndpointServer              | Unexpected error on thread Thread[sso2,5,main]
java.lang.OutOfMemoryError: GC overhead limit exceeded
2021-02-02T22:46:30.824Z | ERROR | sso6                      | StsFacadeImpl                  | Failed to acquire Act-As token for Subject svcvmwLogInsight@R1-CABC.R1.ABC.NET, token StartTime 2/2/21 10:46 PM / EndTime 2/2/21 10:49 PM.
com.vmware.vim.sso.client.exception.InternalError: Cannot sign request message

    at com.vmware.vim.sso.client.impl.SecurityTokenServiceImpl$RequestResponseProcessor.executeRoundtrip(SecurityTokenServiceImpl.java:851)
    at com.vmware.vim.sso.client.impl.SecurityTokenServiceImpl.acquireTokenByCertificate(SecurityTokenServiceImpl.java:477)
    at com.vmware.vapi.endpoint.sso.context.StsFacadeImpl.acquireActAsToken(StsFacadeImpl.java:37)
    at com.vmware.vapi.endpoint.sso.context.SecurityContextUtil.acquireAndRenewActAsToken(SecurityContextUtil.java:303)
    at com.vmware.vapi.endpoint.sso.context.SecurityContextUtil.access$000(SecurityContextUtil.java:38)
    at com.vmware.vapi.endpoint.sso.context.SecurityContextUtil$2.run(SecurityContextUtil.java:170)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
Caused by: com.vmware.vim.sso.client.impl.exception.SignatureException: Error while creating SOAP request signature
    at com.vmware.vim.sso.client.impl.signature.WsSecuritySignatureImpl.sign(WsSecuritySignatureImpl.java:162)
    at com.vmware.vim.sso.client.impl.SecurityTokenServiceImpl$RequestResponseProcessor.executeRoundtrip(SecurityTokenServiceImpl.java:842)
    … 8 more
Caused by: javax.xml.crypto.dsig.XMLSignatureException: java.security.SignatureException: Could not sign data
    at org.jcp.xml.dsig.internal.dom.DOMSignatureMethod.sign(DOMSignatureMethod.java:345)
    at org.jcp.xml.dsig.internal.dom.DOMXMLSignature.sign(DOMXMLSignature.java:405)
    at com.vmware.vim.sso.client.impl.signature.WsSecuritySignatureImpl.sign(WsSecuritySignatureImpl.java:148)
    … 9 more
Caused by: java.security.SignatureException: Could not sign data
    at sun.security.rsa.RSASignature.engineSign(RSASignature.java:195)
    at java.security.Signature$Delegate.engineSign(Signature.java:1382)
    at java.security.Signature.sign(Signature.java:698)
    at org.jcp.xml.dsig.internal.dom.DOMSignatureMethod.sign(DOMSignatureMethod.java:343)
    … 11 more
Caused by: javax.crypto.BadPaddingException: RSA private key operation failed
    at sun.security.rsa.RSACore.crtCrypt(RSACore.java:201)
    at sun.security.rsa.RSACore.rsa(RSACore.java:122)
    at sun.security.rsa.RSASignature.engineSign(RSASignature.java:192)
    … 14 more

 

  • Same set of events can be seen on 9th as well.

 

2021-02-09T08:25:57.685Z | ERROR | state-manager1            | DefaultStateManager            | Reconfiguration attempt failed with exception
java.lang.OutOfMemoryError: Java heap space
2021-02-09T10:15:39.612Z | ERROR | state-manager1            | DefaultStateManager            | Reconfiguration attempt failed with exception
java.lang.OutOfMemoryError: Java heap space

 

 

VPXD Logs:

 

  • Have been rolled over:

 

2021-02-05T12:21:07.481Z – time the service was last started, Section for VMware VirtualCenter, pid=64235, version=6.7.0, build=17137327, option=Release
2021-02-05T12:21:07.481Z warning vpxd[64311] [Originator@6876 sub=Default opID=29a24a75] [VdbStatement] SQL execution took too long:  SELECT EVENT_ID, CHAIN_ID, EVENT_TYPE, EXTENDED_CLASS, CREATE_TIME, USERNAME, CATEGORY, VM_ID,        VM_NAME, HOST_ID, HOST_NAME, COMPUTERESOURCE_ID, COMPUTERESOURCE_TYPE,        COMPUTERESOURCE_NAME, DATACENTER_ID, DATACENTER_NAME, DATASTORE_ID, DATASTORE_NAME, NETWORK_ID,        NETWORK_NAME, NETWORK_TYPE, DVS_ID, DVS_NAME, STORAGEPOD_ID, STORAGEPOD_NAME, CHANGE_TAG_ID FROM VPXV_EVENT_ALL  WHERE CREATE_TIME>=? AND CREATE_TIME<=? AND (EVENT_TYPE IN (?)) AND EVENT_ID>? ORDER BY EVENT_ID ASC LIMIT 1000;
2021-02-05T12:21:07.481Z warning vpxd[64311] [Originator@6876 sub=Default opID=29a24a75] [VdbStatement] Execution elapsed time: 3219 ms
2021-02-05T12:21:07.521Z info vpxd[64311] [Originator@6876 sub=vpxLro opID=29a24a75] [VpxLRO] — FINISH lro-15326952
2021-02-05T12:21:07.524Z info vpxd[65483] [Originator@6876 sub=vpxLro opID=4bd873bf] [VpxLRO] — BEGIN lro-15326958 — session[521b352f-48c5-e415-4f6d-8fb0dcef40d5]52262e75-ead1-d8ba-8442-9124f079900d — vim.event.EventHistoryCollector.readNext — 521b352f-48c5-e415-4f6d-8fb0dcef40d5(52856373-b41d-4409-917b-f30c615472a9)
2021-02-05T12:21:07.777Z info vpxd[65483] [Originator@6876 sub=vpxLro opID=4bd873bf] [VpxLRO] — FINISH lro-15326958
2021-02-05T12:21:07.779Z info vpxd[64389] [Originator@6876 sub=vpxLro opID=4a9fc314] [VpxLRO] — BEGIN lro-15326963 — session[521b352f-48c5-e415-4f6d-8fb0dcef40d5]52262e75-ead1-d8ba-8442-9124f079900d — vim.HistoryCollector.remove — 521b352f-48c5-e415-4f6d-8fb0dcef40d5(52856373-b41d-4409-917b-f30c615472a9)
2021-02-05T12:21:07.779Z info vpxd[64389] [Originator@6876 sub=vpxLro opID=4a9fc314] [VpxLRO] — FINISH lro-15326963

 

 

 

Conclusion:

 

  • From the logs it looks like that the issue is with the Memory Exhaustion for the VAPI Service.
  • vAPI Endpoint – provides a single point of access to vAPI services
  • Due to this reason the restart of services resolves the issue.
  • The same symptoms tend to cause issues with other applications as well which is documents in the article: https://community.pivotal.io/s/article/5000e00001iQGX41587493234100?language=en_US

 

Action Plan:

 

  • Take an offline Snapshot of the vCenter server below making any changes.
  • Use SSH to connect to the VCSA and enter the BASH shell. 
  • Run the following command to output memory assigned to service:

 

cloudvm-ram-size -l | grep -i vmware-vapi-endpoint

 

  • Increase the Memory Size of the VAPI Endpoint service using the below command:

 

cloudvm-ram-size -C 2048 vmware-vapi-endpoint

 

  • This would increase the memory of the vmware-vapi-endpoint service from 768 MB to 2048 MB.
  • Restart the service that has been altered, in this example, vmware-vapi-endpoint:

service-control –stop vmware-vapi-endpoint

service-control –start vmware-vapi-endpoint

 

  • Confirm that the service has now been allocated the increased memory using the command:

cloudvm-ram-size -l | grep -i vmware-vapi-endpoint

 

Ashutosh Dixit

I am currently working as a Senior Technical Support Engineer with VMware Premier Services for Telco. Before this, I worked as a Technical Lead with Microsoft Enterprise Platform Support for Production and Premier Support. I am an expert in High-Availability, Deployments, and VMware Core technology along with Tanzu and Horizon.

Leave a Reply