
Experienced Technology Product Manager adept at steering success throughout the entire product lifecycle, from conceptualization to market delivery. Proficient in market analysis, strategic planning, and effective team leadership, utilizing data-driven approaches for ongoing enhancements.


Timed out while waiting for Event Broker response

I recently came across a problem where provisioning and reconfiguration of virtual machines through vRealize Automation were failing.


The following exceptions were seen in catalina.out:

2019-06-02 09:53:25,643 vcac: [component="cafe:event-broker" priority="INFO" thread="event-broker-service-taskExecutor8" tenant="" context="" parent="" token=""] com.vmware.vcac.core.event.broker.integration.PublishReplyEventServiceActivator.onApplicationEvent:96 - Message Broker unavailable[internal stop]

2019-04-02 09:53:25,711 vcac: [component="cafe:console-proxy" priority="WARN" thread="Grizzly(1)" tenant="" context="" parent="" token=""] com.vmware.vcac.platform.event.broker.client.stomp.StompEventSubscribeHandler.handleException:493- Error during message processing: session:[f4329ebb-4e2e-7690-8b6a-3f420c8bd226], command[null], headers[{message=[Connection to broker closed.], content-length=[0]}], payload [{}]. Reason : [Connection to broker closed.]

2019-04-02 09:53:25,712 vcac: [component="cafe:console-proxy" priority="ERROR" thread="Grizzly(1)" tenant="" context="" parent="" token=""] com.vmware.vcac.core.service.event.ServerEventBrokerServiceFacade.handleError:337 - Error for command 'null', headers: '{message=[Connection to broker closed.], content-length=[0]}'java.lang.Exception: Connection to broker closed.


The exceptions above point to the message broker, which is RabbitMQ.


Resetting RabbitMQ and then rejoining the second or third node (if available) to the master would eventually resolve the problem.
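The reset-and-rejoin step can be sketched with the generic rabbitmqctl sequence below. This is only a sketch under assumptions: rejoin_node is a hypothetical helper name, it is meant to run on the secondary node, and the vRealize Automation appliance may provide its own tooling for this that should be preferred where available. Note that rabbitmqctl reset wipes the local node's broker state, so it should never be run on the node you intend to keep as master.

```shell
#!/bin/sh
# Hypothetical helper: reset the local RabbitMQ node and rejoin it to the master.
# Override RABBITMQCTL (for example RABBITMQCTL="echo rabbitmqctl") for a dry run.
RABBITMQCTL="${RABBITMQCTL:-rabbitmqctl}"

rejoin_node() {
    master_node="$1"   # for example rabbit@psvra01.nukescloud.com
    if [ -z "$master_node" ]; then
        echo "usage: rejoin_node rabbit@<master-host>" >&2
        return 1
    fi
    # Stop the RabbitMQ application (the Erlang VM keeps running),
    # wipe local cluster state, join the master, then start again.
    $RABBITMQCTL stop_app &&
    $RABBITMQCTL reset &&
    $RABBITMQCTL join_cluster "$master_node" &&
    $RABBITMQCTL start_app
}

# Usage on a secondary node:
#   rejoin_node rabbit@psvra01.nukescloud.com
```

Running it first with RABBITMQCTL="echo rabbitmqctl" prints the four commands without touching the broker.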


After a fair bit of digging through the RabbitMQ logs, we see the following.


On node psvra01.nukescloud.com:


=INFO REPORT==== 11-Jun-2019::16:31:41 ===

rabbit on node 'rabbit@psvra03.nukescloud.com' down


=INFO REPORT==== 11-Jun-2019::16:31:41 ===

Keep rabbit@psvra03.nukescloud.com listeners: the node is already back



On node psvra03.nukescloud.com:


=INFO REPORT==== 11-Jun-2019::16:54:12 ===

rabbit on node 'rabbit@psvra01.nukescloud.com' down


=INFO REPORT==== 11-Jun-2019::16:54:12 ===

Keep rabbit@psvra01.nukescloud.com listeners: the node is already back


...


=INFO REPORT==== 11-Jun-2019::18:55:09 ===

rabbit on node 'rabbit@psvra01.nukescloud.com' down


=INFO REPORT==== 11-Jun-2019::18:55:09 ===

Keep rabbit@psvra01.nukescloud.com listeners: the node is already back


The snippets above show the nodes repeatedly losing and regaining sight of each other, which points to network partitions or other connectivity issues between them.
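To get a feel for how often each peer drops out, the "down" reports can be counted per node. The sketch below reads a RabbitMQ log on standard input and only matches lines in the exact format shown in the snippets above. count_node_down is a hypothetical helper name, and the log path in the usage comment is an assumption that may differ on your appliance.

```shell
#!/bin/sh
# Hypothetical helper: count "rabbit on node '<name>' down" reports per peer.
# Reads a RabbitMQ log on stdin and prints "<count> <node>" lines sorted by node.
count_node_down() {
    awk -F"'" '/^rabbit on node .* down$/ { c[$2]++ }
               END { for (n in c) print c[n], n }' | sort -k2
}

# Usage (log location is an assumption):
#   count_node_down < /var/log/rabbitmq/rabbit@psvra01.log
```

A steadily growing count for the same peer suggests a recurring network problem rather than a one-off restart.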


RabbitMQ clustering does not tolerate network partitions well and does not always recover from them properly on its own.
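One way to confirm that a partition has actually occurred is to look at the partitions entry in the output of rabbitmqctl cluster_status. The sketch below assumes the Erlang-term output format of older RabbitMQ releases, where a healthy cluster reports an empty list; newer releases print this differently, so treat the pattern as an assumption to verify against your version. partitions_reported is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: reads `rabbitmqctl cluster_status` output on stdin.
# Returns 0 if a non-empty partitions list is present, and 1 otherwise
# (including when no partitions entry is found at all).
partitions_reported() {
    # Strip spaces and newlines so the term can be matched on one line.
    tr -d ' \n' | grep -q '{partitions,\[[^]]'
}

# Usage:
#   if rabbitmqctl cluster_status | partitions_reported; then
#       echo "partition detected"
#   fi
```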


Executing the command below makes the cluster somewhat more resilient to these network partitions:


rabbitmqctl set_policy ha-all "" '{"ha-mode":"all","ha-sync-mode":"automatic","ha-promote-on-failure":"always","ha-promote-on-shutdown":"always"}'
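For context, the policy mirrors every queue to all nodes ("ha-mode":"all"), synchronises new mirrors automatically ("ha-sync-mode":"automatic"), and always promotes a mirror when the master fails or shuts down, even an unsynchronised one, which favours availability over the risk of losing some messages. The empty pattern "" matches every queue name. A small sketch that applies the same policy and then lists policies so the result can be checked; apply_ha_policy is a hypothetical helper name, and the default vhost is assumed.

```shell
#!/bin/sh
# Override RABBITMQCTL (for example RABBITMQCTL="echo rabbitmqctl") for a dry run.
RABBITMQCTL="${RABBITMQCTL:-rabbitmqctl}"

# Same policy definition as the command above.
HA_POLICY='{"ha-mode":"all","ha-sync-mode":"automatic","ha-promote-on-failure":"always","ha-promote-on-shutdown":"always"}'

# Hypothetical helper: apply the ha-all policy, then list policies to verify it.
apply_ha_policy() {
    $RABBITMQCTL set_policy ha-all "" "$HA_POLICY" &&
    $RABBITMQCTL list_policies
}

# Usage:
#   apply_ha_policy
```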

Newer versions of vRealize Automation have mechanisms in place to detect this sort of issue and attempt an automated recovery.


The command mentioned above helps to a certain extent, but it is no substitute for a highly available, redundant network between the vRealize Automation nodes.

