GlusterFS availability/healing of a Volume

I investigated the availability and healing of GlusterFS volumes.


In a "lab" environment I tested two volume types:



  1. Distributed Volume

  2. Replicated Volume


With these tests I wanted to answer two questions:



  1. What is the availability of files if one of the nodes dies/goes offline?

  2. How will GlusterFS recover once the node comes back online?

Please note that in these tests the nodes were taken offline manually, and the client was not connected to the node that went offline. To fail a client over to a working node, you can use a tool like ucarp.

Replicated Volume


I created a simple replicated volume across 3 nodes:
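A volume like this is normally created with the gluster CLI. Here is a minimal sketch that only builds the command line, so it can be checked without a cluster. The volume name `gv0` and the brick paths are hypothetical, not the ones from my lab.

```python
def replicated_create_cmd(volume, bricks):
    """Build the argv for a replicated volume with one replica per brick."""
    if len(bricks) < 2:
        raise ValueError("a replicated volume needs at least two bricks")
    return ["gluster", "volume", "create", volume,
            "replica", str(len(bricks)), *bricks]

# Hypothetical volume name and brick paths (assumptions for illustration).
bricks = [f"node0{i}:/data/brick1/gv0" for i in (1, 2, 3)]
cmd = replicated_create_cmd("gv0", bricks)
print(" ".join(cmd))
```

After creation the volume still has to be started with `gluster volume start gv0` before clients can mount it.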



Then node01 dies/goes offline:




All the files remain available for reading, since every node holds a copy of each file. You can also write data to the volume while node01 is offline.



Once node01 comes back online, GlusterFS will "resync" (self-heal) the node:
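The resync can be watched, or triggered by hand, with the `gluster volume heal` subcommands. A small sketch that just assembles those command lines; the volume name `gv0` is again a hypothetical placeholder.

```python
def heal_commands(volume):
    """Return gluster CLI argv lists for inspecting and triggering self-heal."""
    base = ["gluster", "volume", "heal", volume]
    return {
        "info": base + ["info"],  # list entries that still need healing
        "heal": list(base),       # heal the entries that need it
        "full": base + ["full"],  # crawl the whole volume and heal it
    }

# "gv0" is a hypothetical volume name.
for name, argv in heal_commands("gv0").items():
    print(f"{name}: {' '.join(argv)}")
```

Running the `info` variant repeatedly after the node returns should show the list of pending entries shrinking as the heal progresses.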






Distributed Volume


We also created a simple distributed volume using 3 nodes:




And (again) node01 dies:



Files A and D are not available, since they live on node01. The other files are still available. You can also write to the volume, but new files will only be written to the nodes that are online (so not balanced across the 3 nodes):
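The read side of this can be written down as a tiny model: on a distributed volume each file lives on exactly one node, so it is readable only while that node is up. In the sketch below only "A and D on node01" comes from the test; the rest of the placement table is an assumption for illustration. GlusterFS itself decides placement by hashing file names, which this model does not try to reproduce.

```python
# Hypothetical placement: A and D on node01 as in the test above,
# the remaining entries are assumed for illustration.
placement = {"A": "node01", "B": "node02", "C": "node03",
             "D": "node01", "E": "node02", "F": "node03"}

def split_by_availability(placement, offline):
    """Split file names into (available, unavailable) for a set of offline nodes."""
    available = sorted(f for f, node in placement.items() if node not in offline)
    unavailable = sorted(f for f, node in placement.items() if node in offline)
    return available, unavailable

available, unavailable = split_by_availability(placement, {"node01"})
print("available:  ", available)
print("unavailable:", unavailable)
```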



Once node01 is back online, files A and D become available again.



But the volume is now out of balance, so a rebalance should be initiated to get the distributed volume back into good shape:
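A rebalance is started and monitored through `gluster volume rebalance`. As before, this sketch only builds the command lines, and `gv0` is a hypothetical volume name.

```python
def rebalance_commands(volume):
    """Return the argv lists to start a rebalance and to check its progress."""
    base = ["gluster", "volume", "rebalance", volume]
    return [base + ["start"], base + ["status"]]

# "gv0" is a hypothetical volume name.
for argv in rebalance_commands("gv0"):
    print(" ".join(argv))
```

The `status` command can be repeated until every node reports that its part of the rebalance has completed.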



As you can see, replicated and distributed volumes each have pros and cons, but you can also combine the two:



I haven't tested this setup in the lab, but I can guess what will happen when a node dies. :-D
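That guess can be sketched as a small model. In a distributed-replicated volume the brick count must be a multiple of the replica count, and consecutive bricks in the create command form one replica set. Files in a set should stay readable as long as at least one brick of that set is online. The four nodes and brick paths below are hypothetical, and this is a model of the expected behaviour, not a lab result.

```python
def replica_sets(bricks, replica):
    """Group bricks into replica sets in the order they were listed."""
    if replica < 2 or len(bricks) % replica != 0:
        raise ValueError("brick count must be a multiple of the replica count")
    return [bricks[i:i + replica] for i in range(0, len(bricks), replica)]

def lost_sets(sets, offline_nodes):
    """Return the replica sets whose bricks are all on offline nodes."""
    return [s for s in sets
            if all(brick.split(":")[0] in offline_nodes for brick in s)]

# Hypothetical 2 x 2 layout: two replica sets of two bricks each.
bricks = [f"node0{i}:/data/brick1/gv0" for i in (1, 2, 3, 4)]
sets = replica_sets(bricks, 2)

print(lost_sets(sets, {"node01"}))            # one node down
print(lost_sets(sets, {"node01", "node02"}))  # a whole replica set down
```

In this model a single failed node loses nothing, while losing both nodes of the same replica set makes that set's files unavailable, just like the A and D case on the plain distributed volume.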