GlusterFS Setup

Lightning fast data synchronization across multiple server nodes

    Many websites and projects experiencing exponential growth run into distributed data storage issues. GlusterFS is a unified, poly-protocol, scale-out filesystem capable of serving petabytes of data at lightning speed, turning commodity hardware into a high-performance, scalable storage solution.

    In this tutorial, we will review a basic replication setup between two nodes, which allows instant synchronization of a specific directory, its content, permission changes, etc. If any of the terms used are unfamiliar, please consult the GlusterFS documentation.

    Disk Partitioning
    For a replication setup, GlusterFS requires an identical disk partition present on each node. We will use apollo and chronos as nodes, with one GlusterFS volume and brick replicated across the nodes.

    From my experience, the most important part of a GlusterFS setup is planning your disk partitioning ahead of time. With a proper layout, it is very easy to create a designated GlusterFS volume group and logical volume on each node.

    As an example, we will use the following partitioning, present on both nodes:
    # df -ah /mnt/gvol0
    Filesystem            Size  Used Avail Use% Mounted on
    /dev/mapper/vg_gluster-lv_gvol0
                           10G  151M  9.2G   2% /mnt/gvol0
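    As a minimal sketch, assuming a dedicated spare disk (/dev/sdb and the ext4 filesystem below are only examples, adjust the device, size and filesystem to your hardware), the layout above could be created with LVM on each node:
    # pvcreate /dev/sdb
    # vgcreate vg_gluster /dev/sdb
    # lvcreate -L 10G -n lv_gvol0 vg_gluster
    # mkfs.ext4 /dev/vg_gluster/lv_gvol0
    # install -d /mnt/gvol0
    # mount /dev/vg_gluster/lv_gvol0 /mnt/gvol0
    Remember to also add a matching /etc/fstab entry, so the logical volume is mounted at boot.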
    The GlusterFS volume naming conventions are:
    • /<base directory>/<volume name>/<brick name>/brick
    For SELinux compatibility, we will use /mnt as the base directory, gvol0 as the volume name and brick0 as the brick name. Create the volume path on each node:
    # ls -lah /mnt/gvol0
    total 28K
    drwxr-xr-x. 4 root root 4.0K Sep  9 18:52 .
    drwxr-xr-x. 3 root root 4.0K Sep  9 18:52 ..
    drwx------. 2 root root  16K Sep  9 17:33 lost+found
    # install -d -m 0755 /mnt/gvol0/brick0/brick
    We are done with the initial setup; let's proceed to the GlusterFS configuration.

    GlusterFS Setup
    Install GlusterFS on each node by running the following commands:
    # yum --enablerepo=axivo install glusterfs-server
    # service rpcbind start
    Enterprise Linux 6:
    # chkconfig glusterd on
    # service glusterd start
    Enterprise Linux 7:
    # systemctl enable glusterd.service
    # systemctl start glusterd.service
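    If you want to confirm the daemon is running on a node before continuing, check its status (Enterprise Linux 7 shown; on Enterprise Linux 6, use service glusterd status):
    # systemctl status glusterd.service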
    Before probing the nodes, open the required firewall ports. GlusterFS uses the following ports:
    • 111 (tcp and udp) - rpcbind
    • 2049 (tcp) - nfs
    • 24007 (tcp) - server daemon
    • 38465:38469 (tcp) - nfs related services
    • 49152 (tcp) - brick
    We used the following iptables configuration file on each node:
    # cat /etc/sysconfig/iptables-glusterfs
    -A INPUT -m state --state NEW -m tcp -p tcp -s 192.168.1.0/24 --dport 111         -j ACCEPT
    -A INPUT -m state --state NEW -m udp -p udp -s 192.168.1.0/24 --dport 111         -j ACCEPT
    -A INPUT -m state --state NEW -m tcp -p tcp -s 192.168.1.0/24 --dport 2049        -j ACCEPT
    -A INPUT -m state --state NEW -m tcp -p tcp -s 192.168.1.0/24 --dport 24007       -j ACCEPT
    -A INPUT -m state --state NEW -m tcp -p tcp -s 192.168.1.0/24 --dport 38465:38469 -j ACCEPT
    -A INPUT -m state --state NEW -m tcp -p tcp -s 192.168.1.0/24 --dport 49152       -j ACCEPT
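    Keep in mind that a standalone file like iptables-glusterfs is not loaded automatically; assuming your main ruleset lives in /etc/sysconfig/iptables, merge these rules into it (before the final COMMIT line) and restart the service on each node:
    # service iptables restart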
    If you use Firewalld, add the following rules for each node:
    # firewall-cmd --permanent --zone=public --add-rich-rule='rule family="ipv4" source address="192.168.1.0/24" port port="111"         protocol="tcp" accept'
    # firewall-cmd --permanent --zone=public --add-rich-rule='rule family="ipv4" source address="192.168.1.0/24" port port="111"         protocol="udp" accept'
    # firewall-cmd --permanent --zone=public --add-rich-rule='rule family="ipv4" source address="192.168.1.0/24" port port="2049"        protocol="tcp" accept'
    # firewall-cmd --permanent --zone=public --add-rich-rule='rule family="ipv4" source address="192.168.1.0/24" port port="24007"       protocol="tcp" accept'
    # firewall-cmd --permanent --zone=public --add-rich-rule='rule family="ipv4" source address="192.168.1.0/24" port port="38465-38469" protocol="tcp" accept'
    # firewall-cmd --permanent --zone=public --add-rich-rule='rule family="ipv4" source address="192.168.1.0/24" port port="49152"       protocol="tcp" accept'
    You will need to add a port for each additional brick. Since we use only one brick, 49152 is sufficient.
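    Also note that the permanent Firewalld rules above only become active after a reload, so run one once the rules are in place on each node. As an example, opening a port for a hypothetical second brick (49153) would look like this:
    # firewall-cmd --permanent --zone=public --add-rich-rule='rule family="ipv4" source address="192.168.1.0/24" port port="49153"       protocol="tcp" accept'
    # firewall-cmd --reload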
    Make sure each node name resolves properly in your DNS setup and probe each node:
    [root@apollo ~]# gluster peer probe chronos
    peer probe: success.
    [root@chronos ~]# gluster peer probe apollo
    peer probe: success.
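    If you want to double check the cluster membership before creating the volume, run the peer status command on either node; each peer should be reported in the "Peer in Cluster (Connected)" state:
    [root@apollo ~]# gluster peer status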
    On apollo only, run the following command to create the replication volume:
    [root@apollo ~]# gluster volume create gvol0 replica 2 {apollo,chronos}:/mnt/gvol0/brick0/brick
    volume create: gvol0: success: please start the volume to access data
    Breaking down the above command, we told GlusterFS to create a replica volume and keep a copy of the data on at least 2 bricks at any given time. Since we only have two bricks, this means each server will house a copy of the data. Lastly, we specify which nodes and bricks to use.

    Verify the volume information and start the volume:
    [root@apollo ~]# gluster volume info
    Volume Name: gvol0
    Type: Replicate
    Volume ID: 2b9c2607-9569-48c3-9138-08fb5d8a213f
    Status: Created
    Number of Bricks: 1 x 2 = 2
    Transport-type: tcp
    Bricks:
    Brick1: apollo:/mnt/gvol0/brick0/brick
    Brick2: chronos:/mnt/gvol0/brick0/brick
    
    [root@apollo ~]# gluster volume start gvol0
    volume start: gvol0: success
    We are done with the server setup; let's proceed to the GlusterFS replication.

    Replication Setup
    We will use /var/www/html as the replicated directory across the two nodes. Please make sure the directory does not contain any files: once the directory is mounted as GlusterFS type, any files previously present in it will no longer be available.
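    If the directory already contains data you want to keep, a simple approach (the backup path below is only an example) is to move the files aside before mounting, then copy them back into the volume once the mount steps below are completed:
    [root@apollo ~]# mv /var/www/html /var/www/html.bak
    [root@apollo ~]# cp -a /var/www/html.bak/. /var/www/html/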

    Execute the following commands to mount /var/www/html as GlusterFS type:
    [root@apollo ~]# install -d -m 0755 /var/www/html
    [root@apollo ~]# cat >> /etc/fstab << EOF
    apollo:/gvol0     /var/www/html    glusterfs    defaults    0 0
    EOF
    [root@apollo ~]# mount -a
    [root@apollo ~]# mount -l -t fuse.glusterfs
    apollo:/gvol0 on /var/www/html type fuse.glusterfs (rw,default_permissions,allow_other,max_read=131072)
    
    [root@chronos ~]# install -d -m 0755 /var/www/html
    [root@chronos ~]# cat >> /etc/fstab << EOF
    chronos:/gvol0    /var/www/html    glusterfs    defaults    0 0
    EOF
    [root@chronos ~]# mount -a
    [root@chronos ~]# mount -l -t fuse.glusterfs
    chronos:/gvol0 on /var/www/html type fuse.glusterfs (rw,default_permissions,allow_other,max_read=131072)
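    Since the volume is mounted over the network, you may also want to append the _netdev option to the fstab entries above, so the mount is delayed until networking is available at boot; for example on apollo:
    apollo:/gvol0     /var/www/html    glusterfs    defaults,_netdev    0 0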
    List the GlusterFS pool to check for any node connectivity issues:
    [root@apollo ~]# gluster pool list
    UUID                                    Hostname        State
    5cb470ae-3c88-46fa-8cb9-2dd29d26e104    chronos         Connected
    188975c7-6d69-472a-b421-641286551d28    localhost       Connected
    To test the GlusterFS replication, we installed Nginx on both nodes, then created a file on apollo and changed its ownership from chronos:
    [root@apollo ~]# ls -lah /var/www/html
    total 8K
    drwxr-xr-x. 3 root  root 4.0K Sep  9 19:38 .
    drwxr-xr-x. 3 root  root 4.0K Sep  9 19:38 ..
    [root@apollo ~]# yum --enablerepo=axivo install nginx
    [root@chronos ~]# yum --enablerepo=axivo install nginx
    [root@apollo ~]# ls -lah /var/www/html
    total 12K
    drwxr-xr-x. 3 root root 4.0K Sep  9 20:01 .
    drwxr-xr-x. 3 root root 4.0K Sep  9 19:38 ..
    -rw-r--r--. 1 root root  535 Oct 11  2009 404.html
    -rw-r--r--. 1 root root  543 Oct 11  2009 50x.html
    -rw-r--r--. 1 root root  198 May  6  2006 favicon.ico
    -rw-r--r--. 1 root root  528 Oct 11  2009 index.html
    -rw-r--r--. 1 root root  377 May  6  2006 nginx.gif
    [root@apollo ~]# vi /var/www/html/info.php
    [root@chronos ~]# chown nginx /var/www/html/info.php
    [root@apollo ~]# cat /var/www/html/info.php
    <?php
    phpinfo();
    The file was instantly replicated on chronos, with identical content and permissions on both nodes:
    [root@apollo ~]# ls -lah /var/www/html
    total 13K
    drwxr-xr-x. 3 root  root 4.0K Sep  9 20:06 .
    drwxr-xr-x. 3 root  root 4.0K Sep  9 19:38 ..
    -rw-r--r--. 1 root  root  535 Oct 11  2009 404.html
    -rw-r--r--. 1 root  root  543 Oct 11  2009 50x.html
    -rw-r--r--. 1 root  root  198 May  6  2006 favicon.ico
    -rw-r--r--. 1 root  root  528 Oct 11  2009 index.html
    -rw-r--r--. 1 nginx root   18 Sep  9 20:06 info.php
    -rw-r--r--. 1 root  root  377 May  6  2006 nginx.gif
    [root@chronos ~]# cat /var/www/html/info.php
    <?php
    phpinfo();
    You are currently running a high-performance, scalable replication system.

    Troubleshooting
    The logs are the best place to start your troubleshooting; examine any encountered errors in the /var/log/glusterfs/var-www-html.log file. Please don't turn off your firewall just because you think it is blocking your setup: the firewall adds an important security layer and should never be disabled. Instead, study the logs and find the source of the problems.

    An easy way to determine which ports GlusterFS uses is to run a volume status check:
    # gluster volume status
    Status of volume: gvol0
    Gluster process                                         Port    Online  Pid
    ------------------------------------------------------------------------------
    Brick apollo:/mnt/gvol0/brick0/brick                    49152   Y       1220
    Brick chronos:/mnt/gvol0/brick0/brick                   49152   Y       1363
    NFS Server on localhost                                 2049    Y       1230
    Self-heal Daemon on localhost                           N/A     Y       1235
    NFS Server on chronos                                   2049    Y       1371
    Self-heal Daemon on chronos                             N/A     Y       1375
    
    Task Status of Volume gvol0
    ------------------------------------------------------------------------------
    There are no active volume tasks
    You can also run netstat -tulpn | grep gluster to further examine the ports in use.
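    If replication itself appears out of sync, the self-heal status of the volume lists any entries pending healing on each brick:
    # gluster volume heal gvol0 info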

    This tutorial covered only an infinitesimal part of GlusterFS's capabilities. Please understand that rushing through the tutorial without proper understanding or reading the documentation will result in failure. Once you understand how GlusterFS works, you are welcome to ask any related questions in our support forums.