Are you seeing similar exceptions in horizon.log?
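The first step in troubleshooting an ehcache replication problem is to check that the runtime-config.properties file is properly configured on every node. Each node must list the other two nodes, and never itself, as its replication peers.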
From the first vRA appliance in a 3-node cluster:

# ehcache configuration properties
ehcache.replication.rmi.registry.port=40002
ehcache.replication.rmi.remoteObject.port=40003
# Overrides the list of ehcache replication peers. FQDNs separated by ":", e.g. server1.example.com:server2.example.com
ehcache.replication.rmi.servers=node2fqdn:node3fqdn

From the second vRA appliance in a 3-node cluster:

# ehcache configuration properties
ehcache.replication.rmi.registry.port=40002
ehcache.replication.rmi.remoteObject.port=40003
# Overrides the list of ehcache replication peers. FQDNs separated by ":", e.g. server1.example.com:server2.example.com
ehcache.replication.rmi.servers=node1fqdn:node3fqdn

From the third vRA appliance in a 3-node cluster:

# ehcache configuration properties
ehcache.replication.rmi.registry.port=40002
ehcache.replication.rmi.remoteObject.port=40003
# Overrides the list of ehcache replication peers. FQDNs separated by ":", e.g. server1.example.com:server2.example.com
ehcache.replication.rmi.servers=node1fqdn:node2fqdn
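The peer-list rule shown above (every cluster member except the node itself, joined with ":") is easy to get wrong by hand, so here is a minimal sketch for checking it. The function names expected_peers and configured_peers are made up for this illustration, and the location of runtime-config.properties is passed in as an argument rather than assumed.

```shell
#!/bin/sh
# Print the value ehcache.replication.rmi.servers should have on one node:
# all cluster members except the node itself, joined with ":".
# usage: expected_peers SELF NODE...
expected_peers() {
    self=$1
    shift
    out=""
    for node in "$@"; do
        # Skip the node we are generating the list for.
        [ "$node" = "$self" ] && continue
        # Append with ":" only when the list is already non-empty.
        out="${out:+$out:}$node"
    done
    printf '%s\n' "$out"
}

# Print the peer list currently configured in a properties file.
# usage: configured_peers FILE
configured_peers() {
    sed -n 's/^ehcache\.replication\.rmi\.servers=//p' "$1"
}
```

On node1, for example, expected_peers node1fqdn node1fqdn node2fqdn node3fqdn prints node2fqdn:node3fqdn, which can then be compared with what configured_peers reports for that node's file.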
Next, we checked that the replication ports are open and that connectivity is established:
node1:~ # curl -v telnet://node1fqdn:40003
* Rebuilt URL to: telnet://node1fqdn:40003/
*   Trying 10.37.79.15...
* TCP_NODELAY set
* Connected to node1fqdn (XX.XX.XX.XX) port 40003 (#0)
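Both RMI ports (40002 and 40003) need to be reachable on every peer, so the single check above has to be repeated several times per node. A rough sketch of looping over them follows. The function names rmi_targets and check_targets are hypothetical, and the connection test swaps curl for bash's /dev/tcp redirection plus the coreutils timeout command, both of which are assumed to be available on the appliance.

```shell
#!/bin/bash
# Expand a colon-separated peer list (the value of
# ehcache.replication.rmi.servers) into "host port" lines,
# one per RMI port configured in runtime-config.properties.
rmi_targets() {
    printf '%s\n' "$1" | tr ':' '\n' | while read -r host; do
        [ -n "$host" ] || continue
        for port in 40002 40003; do
            printf '%s %s\n' "$host" "$port"
        done
    done
}

# Read "host port" lines from stdin and attempt a TCP connection to each.
# Assumption: bash with /dev/tcp support and coreutils `timeout` exist.
check_targets() {
    while read -r host port; do
        if timeout 5 bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$host" "$port" </dev/null 2>/dev/null; then
            echo "open   $host:$port"
        else
            echo "CLOSED $host:$port"
        fi
    done
}

# Usage on node1:
#   rmi_targets node2fqdn:node3fqdn | check_targets
```

A successful TCP connection only shows that the port is reachable. As the rest of this post shows, replication can still fail when the RMI layer advertises the wrong address.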
We also ran the Elasticsearch health check, and its output looked healthy as well:
https://hostname/SAAS/API/1.0/REST/system/health/
We then approached Engineering, who suggested the following steps:
1) Back up the existing file:
cp /opt/vmware/horizon/workspace/bin/setenv.sh /opt/vmware/horizon/workspace/bin/setenv_bak.sh
2) Open the file for editing:
vi /opt/vmware/horizon/workspace/bin/setenv.sh
3) Import the utils.inc file by adding this line to setenv.sh:
. /usr/local/horizon/scripts/utils.inc
4) Search for JVM_OPTS in setenv.sh and make sure this property is set exactly like this:
-Djava.rmi.server.hostname=$(myip)
5) Repeat the steps above on all appliances.
6) Restart the vIDM service on all appliances:
service horizon-workspace restart
By default, JVM_OPTS in setenv.sh looks like this:
JVM_OPTS="-server -Djdk.tls.ephemeralDHKeySize=1024 -XX:+AggressiveOpts \
-XX:MaxMetaspaceSize=768m -XX:MetaspaceSize=768m \
-Xss1m -Xmx3419m -Xms2564m \
-XX:+UseParallelGC -XX:+UseParallelOldGC \
-XX:NewRatio=3 -XX:SurvivorRatio=12 \
-XX:+DisableExplicitGC \
-XX:+UseBiasedLocking -XX:-LoopUnswitching"
and we need to change it to
JVM_OPTS="-server -Djdk.tls.ephemeralDHKeySize=1024 -Djava.rmi.server.hostname=$(myip) -XX:+AggressiveOpts \
-XX:MaxMetaspaceSize=768m -XX:MetaspaceSize=768m \
-Xss1m -Xmx3419m -Xms2564m \
-XX:+UseParallelGC -XX:+UseParallelOldGC \
-XX:NewRatio=3 -XX:SurvivorRatio=12 \
-XX:+DisableExplicitGC \
-XX:+UseBiasedLocking -XX:-LoopUnswitching"
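Since the same edit has to be made by hand on every appliance, it could be scripted. The sketch below is only an illustration of steps 1 to 4 and is not something Engineering supplied. The function name patch_setenv is made up, and it assumes that myip is a shell function defined by utils.inc, so the import line has to sit above the JVM_OPTS assignment. It places the new property at the start of the JVM_OPTS string rather than after the ephemeralDHKeySize option. Try it on a copy of the file first.

```shell
#!/bin/sh
# Apply the setenv.sh change described in the steps above.
# usage: patch_setenv /opt/vmware/horizon/workspace/bin/setenv.sh
patch_setenv() {
    file=$1
    bak="${file%.sh}_bak.sh"

    # Do nothing if the property is already present, so that the
    # backup is never overwritten with an already patched file.
    if grep -qF 'java.rmi.server.hostname' "$file"; then
        echo "already patched: $file"
        return 0
    fi

    # Step 1: back up the existing file (setenv.sh -> setenv_bak.sh).
    cp "$file" "$bak" || return 1

    # Steps 3 and 4: source utils.inc just above the first JVM_OPTS
    # assignment (assumption: that is where myip comes from) and add
    # the RMI hostname property inside the JVM_OPTS string.
    awk '
        /^JVM_OPTS="/ && !patched {
            print ". /usr/local/horizon/scripts/utils.inc"
            sub(/^JVM_OPTS="/, "&-Djava.rmi.server.hostname=$(myip) ")
            patched = 1
        }
        { print }
    ' "$bak" > "$file"
}
```

After running it on each appliance, the vIDM service still has to be restarted as in step 6.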
After we made the change on all vRA appliances and restarted them, no more exceptions appeared in horizon.log.
With this change, setenv.sh sets the correct hostname and IP address in the Java environment, so the application can form a cluster correctly.
This has already been fixed in IDM, and the fix should be included in vRA 7.4.
Finally, the root cause: when the /etc/hosts file contains an IPv6 address, the hostname and IP address are not set correctly for the application.
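To see whether an appliance is exposed to this, it may help to list the IPv6 entries in its hosts file. The following is a small sketch under one assumption: any non-comment line whose address field contains ":" is treated as an IPv6 entry. The function name ipv6_host_entries is hypothetical. It only reports entries; it does not decide which of them is the problematic one.

```shell
#!/bin/sh
# Print the non-comment lines of a hosts file whose address field
# (the first field) contains ":", i.e. looks like an IPv6 address.
# usage: ipv6_host_entries [FILE]   (defaults to /etc/hosts)
ipv6_host_entries() {
    awk '!/^[[:space:]]*#/ && $1 ~ /:/ { print }' "${1:-/etc/hosts}"
}
```

Running ipv6_host_entries on each node and comparing the hostnames on the reported lines with the node's own FQDN gives a quick indication of whether that name can resolve to an IPv6 address.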