Setting Up A High Availability NFS Server

In this tutorial I will describe how to set up a highly available NFS server that can be used as a storage solution for other high-availability services, for example a cluster of web servers that are being load-balanced. If you have a web server cluster with two or more nodes that serve the same web site(s), then these nodes must access the same pool of data so that every node serves the same data, no matter whether the load balancer directs the user to node 1 or node n. This can be achieved with an NFS share on an NFS server that all web server nodes (the NFS clients) can access.

As we do not want the NFS server to become another “Single Point of Failure”, we have to make it highly available. In fact, in this tutorial I will create two NFS servers that mirror their data to each other in real time using DRBD and that monitor each other using heartbeat; if one NFS server fails, the other takes over silently. To the outside (e.g. the web server nodes) these two NFS servers will appear as a single NFS server.

In this setup I will use Debian Sarge (3.1) for the two NFS servers as well as for the NFS client (which represents a node of the web server cluster).

I want to say first that this is not the only way of setting up such a system. There are many ways of achieving this goal, but this is the way I have chosen. I do not issue any guarantee that this will work for you!

1 My Setup

In this document I use the following systems:

  • NFS server 1: server1.example.com, IP address: 192.168.0.172; I will refer to this one as server1.
  • NFS server 2: server2.example.com, IP address: 192.168.0.173; I will refer to this one as server2.
  • Virtual IP address: I use 192.168.0.174 as the virtual IP address that represents the NFS cluster to the outside.
  • NFS client (e.g. a node from the web server cluster): client.example.com, IP address: 192.168.0.100; I will refer to the NFS client as client.
  • The /data directory will be mirrored by DRBD between server1 and server2. It will contain the NFS share /data/export.

2 Basic Installation Of server1 and server2

First we set up two basic Debian systems for server1 and server2. You can do it as outlined on the first two pages of this tutorial: http://www.howtoforge.com/perfect_setup_debian_etch. As hostname, you enter server1 and server2 respectively, and as domain you enter example.com.

Regarding the partitioning, I use the following partition scheme:

/dev/sda1 -- 100 MB /boot (primary, ext3, Bootable flag: on)
/dev/sda5 -- 5000 MB / (logical, ext3)
/dev/sda6 -- 1000 MB swap (logical)
/dev/sda7 -- 150 MB unmounted (logical, ext3) (will contain DRBD's meta data)
/dev/sda8 -- 26 GB unmounted (logical, ext3) (will contain the /data directory)

You can vary the sizes of the partitions depending on your hard disk size, and the names of your partitions might also vary, depending on your hardware (e.g. you might have /dev/hda1 instead of /dev/sda1 and so on). However, it is important that /dev/sda7 is a little larger than 128 MB because we will use this partition for DRBD's meta data, which uses 128 MB. Also, make sure /dev/sda7 as well as /dev/sda8 are identical in size on server1 and server2, and please do not mount them (when the installer asks you:

No mount point is assigned for the ext3 file system in partition #7 of SCSI1 (0,0,0) (sda). Do you want to return to the partitioning menu?

please answer No)! /dev/sda8 is going to be our data partition (i.e., our NFS share).

After the basic installation make sure that you give server1 and server2 static IP addresses (server1: 192.168.0.172, server2: 192.168.0.173), as described at the beginning of http://www.howtoforge.com/perfect_setup_debian_etch_p3).
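For reference, here is a minimal sketch of what the eth0 stanza in /etc/network/interfaces could look like on server1 (the gateway address 192.168.0.1 is only an assumption, use your own router's address; on server2 use 192.168.0.173 instead):

server1:

auto eth0
iface eth0 inet static
        address 192.168.0.172
        netmask 255.255.255.0
        network 192.168.0.0
        broadcast 192.168.0.255
        gateway 192.168.0.1

Don't forget to restart networking (/etc/init.d/networking restart) after changing the file.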

Afterwards, you should check /etc/fstab on both systems. Mine looks like this on both systems:

# /etc/fstab: static file system information.
#
# <file system> <mount point>   <type>  <options>       <dump>  <pass>
proc            /proc           proc    defaults        0       0
/dev/sda5       /               ext3    defaults,errors=remount-ro 0       1
/dev/sda1       /boot           ext3    defaults        0       2
/dev/sda6       none            swap    sw              0       0
/dev/hdc        /media/cdrom0   iso9660 ro,user,noauto  0       0
/dev/fd0        /media/floppy0  auto    rw,user,noauto  0       0

If you find that yours looks like this, for example:

# /etc/fstab: static file system information.
#
# <file system> <mount point>   <type>  <options>       <dump>  <pass>
proc            /proc           proc    defaults        0       0
/dev/hda5       /               ext3    defaults,errors=remount-ro 0       1
/dev/hda1       /boot           ext3    defaults        0       2
/dev/hda6       none            swap    sw              0       0
/dev/hdc        /media/cdrom0   iso9660 ro,user,noauto  0       0
/dev/fd0        /media/floppy0  auto    rw,user,noauto  0       0

then please make sure you use /dev/hda instead of /dev/sda in the following configuration files. Also make sure that /dev/sda7 (or /dev/hda7) and /dev/sda8 (or /dev/hda8…) are not listed in /etc/fstab!

3 Synchronize System Time

It's important that both server1 and server2 have the same system time. Therefore we install an NTP client on both:

server1/server2:

apt-get install ntp ntpdate

Afterwards you can check that both have the same time by running

server1/server2:

date
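If you want to verify that ntpd is actually synchronizing against its upstream servers (optional; the peer list you see depends on your NTP configuration), you can run:

server1/server2:

ntpq -p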

4 Install NFS Server

Next we install the NFS server on both server1 and server2:

server1/server2:

apt-get install nfs-kernel-server

Then we remove the system bootup links for NFS because NFS will be started and controlled by heartbeat in our setup:

server1/server2:

update-rc.d -f nfs-kernel-server remove
update-rc.d -f nfs-common remove
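If you want to double-check that the bootup links are really gone, a quick optional check is the following (it should print nothing; runlevel 2 is the Debian default):

server1/server2:

ls /etc/rc2.d/ | grep nfs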

We want to export the directory /data/export (i.e., this will be our NFS share that our web server cluster nodes will use to serve web content), so we edit /etc/exports on server1 and server2. It should contain only the following line:

server1/server2:

nano /etc/exports

/data/export/ 192.168.0.0/255.255.255.0(rw,no_root_squash,no_all_squash,sync)

This means that /data/export will be accessible by all systems from the 192.168.0.x subnet. You can limit access to a single system by using 192.168.0.100/255.255.255.255 instead of 192.168.0.0/255.255.255.0, for example. See

man 5 exports

to learn more about this.
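For example, if you wanted to allow only our NFS client, the line in /etc/exports would look like this (shown only as an illustration; in this tutorial we keep the subnet-wide line from above):

/data/export/ 192.168.0.100/255.255.255.255(rw,no_root_squash,no_all_squash,sync)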

Later in this tutorial we will create /data/export on our empty (and still unmounted!) partition /dev/sda8.

5 Install DRBD

Next we install DRBD on both server1 and server2:

server1/server2:

apt-get install kernel-headers-2.6.8-2-386 drbd0.7-module-source drbd0.7-utils
cd /usr/src/
tar xvfz drbd0.7.tar.gz
cd modules/drbd/drbd
make
make install

Then edit /etc/drbd.conf on server1 and server2. It must be identical on both systems and looks like this:

server1/server2:

nano /etc/drbd.conf

resource r0 {

  protocol C;
  incon-degr-cmd "halt -f";

  startup {
    degr-wfc-timeout 120;    # 2 minutes.
  }

  disk {
    on-io-error   detach;
  }

  net {

  }

  syncer {
    rate 10M;
    group 1;
    al-extents 257;
  }

  on server1 {                          # ** EDIT ** the hostname of server 1 (uname -n)
    device    /dev/drbd0;
    disk      /dev/sda8;                # ** EDIT ** data partition on server 1
    address   192.168.0.172:7788;       # ** EDIT ** IP address on server 1
    meta-disk /dev/sda7[0];             # ** EDIT ** 128MB partition for DRBD on server 1
  }

  on server2 {                          # ** EDIT ** the hostname of server 2 (uname -n)
    device    /dev/drbd0;
    disk      /dev/sda8;                # ** EDIT ** data partition on server 2
    address   192.168.0.173:7788;       # ** EDIT ** IP address on server 2
    meta-disk /dev/sda7[0];             # ** EDIT ** 128MB partition for DRBD on server 2
  }

}

As resource name you can use whatever you like. Here it's r0. Please make sure you put the correct hostnames of server1 and server2 into /etc/drbd.conf. DRBD expects the hostnames as they are shown by the command

uname -n

If you have set server1 and server2 respectively as hostnames during the basic Debian installation, then the output of uname -n should be server1 and server2.

Also make sure you replace the IP addresses and the disks appropriately. If you use /dev/hda instead of /dev/sda, please put /dev/hda8 instead of /dev/sda8 into /etc/drbd.conf (the same goes for the meta-disk where DRBD stores its meta data). /dev/sda8 (or /dev/hda8…) will be used as our NFS share later on.

6 Configure DRBD

Now we load the DRBD kernel module on both server1 and server2. We have to do this manually only this one time; afterwards it will be loaded by the DRBD init script.

server1/server2:

modprobe drbd

Let's configure DRBD:

server1/server2:

drbdadm up all
cat /proc/drbd

The last command should show something like this (on both server1 and server2):

version: 0.7.10 (api:77/proto:74)
SVN Revision: 1743 build by phil@mescal, 2005-01-31 12:22:07
 0: cs:Connected st:Secondary/Secondary ld:Inconsistent
    ns:0 nr:0 dw:0 dr:0 al:0 bm:1548 lo:0 pe:0 ua:0 ap:0
 1: cs:Unconfigured

You see that both NFS servers say that they are secondary and that the data is inconsistent. This is because no initial sync has been made yet.

I want to make server1 the primary NFS server and server2 the “hot-standby”. If server1 fails, server2 takes over, and if server1 comes back, all data that has changed in the meantime is mirrored back from server2 to server1 so that data is always consistent.

This next step has to be done only on server1!

server1:

drbdadm -- --do-what-I-say primary all

Now we start the initial sync between server1 and server2 so that the data on both servers becomes consistent. On server1, we do this:

server1:

drbdadm -- connect all

The initial sync is going to take a few hours (depending on the size of /dev/sda8 or /dev/hda8…), so please be patient.

You can see the progress of the initial sync like this on server1 or server2:

server1/server2:

cat /proc/drbd

The output should look like this:

version: 0.7.10 (api:77/proto:74)
SVN Revision: 1743 build by phil@mescal, 2005-01-31 12:22:07
 0: cs:SyncSource st:Primary/Secondary ld:Consistent
    ns:13441632 nr:0 dw:0 dr:13467108 al:0 bm:2369 lo:0 pe:23 ua:226 ap:0
        [==========>.........] sync'ed: 53.1% (11606/24733)M
        finish: 1:14:16 speed: 2,644 (2,204) K/sec
 1: cs:Unconfigured
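
If you prefer a view that refreshes itself instead of re-running the command by hand, you can use watch (the two-second interval is arbitrary):

server1/server2:

watch -n 2 cat /proc/drbd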

When the initial sync is finished, the output should look like this:

version: 0.7.10 (api:77/proto:74)
SVN Revision: 1743 build by phil@mescal, 2005-01-31 12:22:07
 0: cs:Connected st:Primary/Secondary ld:Consistent
    ns:37139 nr:0 dw:0 dr:49035 al:0 bm:6 lo:0 pe:0 ua:0 ap:0
 1: cs:Unconfigured

7 Some Further NFS Configuration

NFS stores some important information (e.g. information about file locks) in /var/lib/nfs. Now what happens if server1 goes down? server2 takes over, but its information in /var/lib/nfs will differ from the information in server1's /var/lib/nfs directory. Therefore we do some tweaking so that these details are stored on our /data partition (/dev/sda8 or /dev/hda8…), which is mirrored by DRBD between server1 and server2. So if server1 goes down, server2 can use the NFS details of server1.

server1/server2:

mkdir /data

server1:

mount -t ext3 /dev/drbd0 /data
mv /var/lib/nfs/ /data/
ln -s /data/nfs/ /var/lib/nfs
mkdir /data/export
umount /data

server2:

rm -fr /var/lib/nfs/
ln -s /data/nfs/ /var/lib/nfs
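
As a quick optional sanity check, you can verify the symlink on both servers:

server1/server2:

ls -ld /var/lib/nfs

It should show /var/lib/nfs pointing to /data/nfs/. The target will look broken as long as /data is not mounted; that is expected, because heartbeat will mount /data only on the active node.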

8 Install And Configure heartbeat

heartbeat is the control instance of this whole setup. It is going to be installed on server1 and server2, and each instance monitors the other server. For example, if server1 goes down, heartbeat on server2 detects this and makes server2 take over. heartbeat also starts and stops the NFS server on both server1 and server2. It also provides NFS as a virtual service via the IP address 192.168.0.174 so that the web server cluster nodes see only one NFS server.

First we install heartbeat:

server1/server2:

apt-get install heartbeat

Now we have to create three configuration files for heartbeat. They must be identical on server1 and server2!

server1/server2:

  • /etc/heartbeat/ha.cf:
logfacility     local0
keepalive 2
#deadtime 30 # USE THIS!!!
deadtime 10
bcast   eth0
node server1 server2

As node names we must use the output of uname -n on server1 and server2.

server1/server2:

  • /etc/heartbeat/haresources:
server1  IPaddr::192.168.0.174/24/eth0 drbddisk::r0 Filesystem::/dev/drbd0::/data::ext3 nfs-kernel-server

The first word is the output of uname -n on server1, no matter whether you create the file on server1 or server2! After IPaddr we put our virtual IP address 192.168.0.174, and after drbddisk we use the resource name of our DRBD resource, which is r0 here (remember, that is the resource name we use in /etc/drbd.conf - if you use another one, you must use it here, too).

server1/server2:

  • /etc/heartbeat/authkeys:
auth 3
3 md5 somerandomstring

somerandomstring is a password which the two heartbeat daemons on server1 and server2 use to authenticate against each other. Use your own string here. You have the choice between three authentication mechanisms. I use md5 as it is the most secure one.

/etc/heartbeat/authkeys should be readable by root only, therefore we do this:

server1/server2:

chmod 600 /etc/heartbeat/authkeys

Finally we start DRBD and heartbeat on server1 and server2:

server1/server2:

/etc/init.d/drbd start
/etc/init.d/heartbeat start
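
Before moving on to the tests, you can confirm that heartbeat has promoted server1 to primary (it mounts /data and starts NFS there via the drbddisk and Filesystem resources from /etc/heartbeat/haresources):

server1:

cat /proc/drbd

The status line for device 0 should now show st:Primary/Secondary on server1 (and st:Secondary/Primary on server2).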

9 First Tests

Now we can do our first tests. On server1, run

server1:

ifconfig

In the output, the virtual IP address 192.168.0.174 should show up:

eth0      Link encap:Ethernet  HWaddr 00:0C:29:A1:C5:9B
          inet addr:192.168.0.172  Bcast:192.168.0.255  Mask:255.255.255.0
          inet6 addr: fe80::20c:29ff:fea1:c59b/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:18992 errors:0 dropped:0 overruns:0 frame:0
          TX packets:24816 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:2735887 (2.6 MiB)  TX bytes:28119087 (26.8 MiB)
          Interrupt:177 Base address:0x1400

eth0:0    Link encap:Ethernet  HWaddr 00:0C:29:A1:C5:9B
          inet addr:192.168.0.174  Bcast:192.168.0.255  Mask:255.255.255.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          Interrupt:177 Base address:0x1400

lo        Link encap:Local Loopback
          inet addr:127.0.0.1  Mask:255.0.0.0
          inet6 addr: ::1/128 Scope:Host
          UP LOOPBACK RUNNING  MTU:16436  Metric:1
          RX packets:71 errors:0 dropped:0 overruns:0 frame:0
          TX packets:71 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:5178 (5.0 KiB)  TX bytes:5178 (5.0 KiB)

Also, run

server1:

df -h

on server1. You should see /data listed there now:

Filesystem            Size  Used Avail Use% Mounted on
/dev/sda5             4.6G  430M  4.0G  10% /
tmpfs                 126M     0  126M   0% /dev/shm
/dev/sda1              89M   11M   74M  13% /boot
/dev/drbd0             24G   33M   23G   1% /data

If you do the same

server2:

ifconfig
df -h

on server2, you shouldn't see 192.168.0.174 and /data.

Now we create a test file in /data/export on server1 and then simulate a server failure of server1 (by stopping heartbeat):

server1:

touch /data/export/test1
/etc/init.d/heartbeat stop

If you run ifconfig and df -h on server2 now, you should see the IP address 192.168.0.174 and the /data partition, and

server2:

ls -l /data/export

should list the file test1 which you created on server1 before. So it has been mirrored to server2!

Now we create another test file on server2 and see if it gets mirrored to server1 when it comes up again:

server2:

touch /data/export/test2

server1:

/etc/init.d/heartbeat start

(Wait a few seconds.)

ifconfig
df -h
ls -l /data/export

You should see 192.168.0.174 and /data again on server1 which means it has taken over again (because we defined it as primary), and you should also see the file /data/export/test2!

10 Configure The NFS Client

Now we install NFS on our client (192.168.0.100):

apt-get install nfs-common

Next we create the /data directory and mount our NFS share into it:

mkdir /data
mount 192.168.0.174:/data/export /data

192.168.0.174 is the virtual IP address we configured before. You must make sure that the forward and the reverse DNS records for client.example.com match each other, otherwise you will get a “Permission denied” error on the client, and on the server you'll find this in /var/log/syslog:

Mar  2 04:19:09 localhost rpc.mountd: Fake hostname localhost for 192.168.0.100 - forward lookup doesn't match reverse

If you do not have proper DNS records (or do not have a DNS server for your local network) you must change this now, otherwise you cannot mount the NFS share!
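
If you run this in a small test network without its own DNS server, one possible workaround (this is my assumption, not part of the original setup, and it presumes that hosts lookups in /etc/nsswitch.conf consult files before dns) is to give server1 and server2 a matching entry for the client in /etc/hosts:

server1/server2:

nano /etc/hosts

192.168.0.100   client.example.com   client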

If it works you can now create further test files in /data on the client and then simulate failures of server1 and server2 (but not both at a time!) and check if the test files are replicated. On the client you shouldn't notice at all if server1 or server2 fails - the data in the /data directory should always be available (unless server1 and server2 fail at the same time…).
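
A simple way to watch the failover from the client's point of view is a small write loop (just an illustration; the file name and the one-second interval are arbitrary):

client:

while true; do date > /data/failover-test.txt; sleep 1; done

While this loop is running, stop heartbeat on server1. The loop should stall for a short while (roughly the configured deadtime plus takeover time) and then continue writing as soon as server2 has taken over.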

To unmount the /data directory, run

umount /data

If you want to automatically mount the NFS share at boot time, put the following line into /etc/fstab:

192.168.0.174:/data/export  /data    nfs          rw            0    0
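
Depending on your needs you might also want to add NFS mount options here; for example bg makes the mount retry in the background if the NFS server is not reachable at boot time (check man nfs on your system before relying on specific options):

192.168.0.174:/data/export  /data    nfs          rw,bg         0    0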