

Arun Nukula

vRLI cluster unresponsive as / partition is full on 1 node due to multiple .hints files

Recently we saw a situation where the root partition was full on a vRLI appliance.

The appliance was one node of a 3-node vRLI cluster.


When this issue occurs, the Cassandra service hangs on the affected node, and the problem then starts impacting the other nodes in the cluster as well.


cassandra.log shows the service becoming unresponsive because of the space issue on the root partition:



INFO  [HANDSHAKE-XXXXXXX] 2020-03-04 10:47:57,384 OutboundTcpConnection.java:560 - Handshaking version with XXXXXXX
INFO  [RequestResponseStage-3] 2020-03-04 10:47:57,400 Gossiper.java:1019 - InetAddress /ZZZZZZZ is now UP
INFO  [GossipStage:1] 2020-03-04 10:47:58,379 StorageService.java:2292 - Node /ZZZZZZZ state jump to NORMAL
ERROR [HintsWriteExecutor:1] 2020-03-04 10:48:24,194 CassandraDaemon.java:228 - Exception in thread Thread[HintsWriteExecutor:1,5,main]
org.apache.cassandra.io.FSWriteError: java.io.IOException: No space left on device
        at org.apache.cassandra.hints.HintsWriteExecutor.flushInternal(HintsWriteExecutor.java:232) ~[apache-cassandra-3.11.2.jar:3.11.2]
        at org.apache.cassandra.hints.HintsWriteExecutor.flush(HintsWriteExecutor.java:203) ~[apache-cassandra-3.11.2.jar:3.11.2]
        at org.apache.cassandra.hints.HintsWriteExecutor.lambda$flush$1(HintsWriteExecutor.java:195) ~[apache-cassandra-3.11.2.jar:3.11.2]

The root partition was filled by a .hprof file along with multiple .hints and .crc32 files being created in the /usr/lib/loginsight/application/lib/apache-cassandra-*/data/hints directory.
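To confirm that hint files are what is filling the partition, a small script can total them up. The following is a minimal sketch in Python, assuming Python 3 is available on the appliance (the post does not say so); the function name `summarize_hints` is hypothetical, and the directory pattern is the one quoted above.

```python
import glob
import os
import shutil

# Directory pattern quoted in the post; the wildcard stands for the Cassandra version.
HINTS_GLOB = "/usr/lib/loginsight/application/lib/apache-cassandra-*/data/hints"


def summarize_hints(hints_dir):
    """Return {extension: (file_count, total_bytes)} for .hints and .crc32 files."""
    summary = {".hints": (0, 0), ".crc32": (0, 0)}
    for entry in os.scandir(hints_dir):
        ext = os.path.splitext(entry.name)[1]
        if entry.is_file() and ext in summary:
            count, size = summary[ext]
            summary[ext] = (count + 1, size + entry.stat().st_size)
    return summary


if __name__ == "__main__":
    # Overall usage of the root partition.
    usage = shutil.disk_usage("/")
    print(f"/ used: {usage.used / usage.total:.0%}")
    # Per-directory totals for hint and checksum files.
    for hints_dir in glob.glob(HINTS_GLOB):
        print(hints_dir, summarize_hints(hints_dir))
```

The script only reads file metadata, so it should be safe to run while diagnosing. It does not look for the .hprof file, which lives outside the hints directory.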



Background on hints


Hints are one of three ways to support consistency in the system. When a replica node is not available, the coordinator stores the mutations destined for it in temporary hint files so that they can be applied once the replica is available again.


Ideally, every vRLI deployment is configured so that hint files are deleted after the default 3 hours. In some environments, however, this does not work and the hint files seem to stay there forever.
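One way to tell whether an environment is affected is to look for hint files that have outlived the 3-hour window. This is a rough sketch, assuming that a file's last-modification time is a reasonable proxy for its age; the function name `stale_hint_files` and the `THREE_HOURS` constant are hypothetical helpers, not part of vRLI or Cassandra.

```python
import os
import time

# The 3-hour default mentioned above, in seconds.
THREE_HOURS = 3 * 60 * 60


def stale_hint_files(hints_dir, max_age_seconds=THREE_HOURS, now=None):
    """Return sorted paths of .hints files last modified more than max_age_seconds ago."""
    now = time.time() if now is None else now
    stale = []
    for entry in os.scandir(hints_dir):
        if entry.is_file() and entry.name.endswith(".hints"):
            age = now - entry.stat().st_mtime
            if age > max_age_seconds:
                stale.append(entry.path)
    return sorted(stale)
```

Calling `stale_hint_files` on the hints directory of a node and getting a long list back would suggest that the expected cleanup is not happening there.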


Repair also runs automatically and is an additional way to support consistency in the system.

Manual deletion is the solution in this situation.
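The manual cleanup could be scripted along these lines. This is a sketch only, not a vendor-supplied procedure: the function name `delete_hint_files` is hypothetical, it defaults to a dry run, and whether the Cassandra service should be stopped first is not covered by this post, so treat that as an open question for your environment.

```python
import os

# File types found in the hints directory, per the post.
HINT_SUFFIXES = (".hints", ".crc32")


def delete_hint_files(hints_dir, dry_run=True):
    """Delete .hints and .crc32 files in hints_dir.

    With dry_run=True (the default) nothing is removed.
    Returns the sorted list of affected paths either way.
    """
    # Collect the paths first so the directory is not modified while scanning it.
    affected = sorted(
        entry.path
        for entry in os.scandir(hints_dir)
        if entry.is_file() and entry.name.endswith(HINT_SUFFIXES)
    )
    if not dry_run:
        for path in affected:
            os.remove(path)
    return affected
```

Running it once with the default `dry_run=True` and reviewing the returned list before repeating with `dry_run=False` keeps the deletion deliberate. The .hprof file is left untouched and would need to be handled separately.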


This is a bug and will be addressed in upcoming releases of vRLI.
