In this tutorial I will describe how to set up a highly available NFS server that can be used as a storage solution for other high-availability services like, for example, a cluster of web servers that are being load balanced. If you have a web server cluster with two or more nodes that serve the same web site(s), then these nodes must access the same pool of data so that every node serves the same data, no matter if the load balancer directs the user to node 1 or node n. This can be achieved with an NFS share on an NFS server that all web server nodes (the NFS clients) can access.
As we do not want the NFS server to become another "Single Point of Failure", we have to make it highly available. In fact, in this tutorial I will create two NFS servers that mirror their data to each other in real time using DRBD and that monitor each other using heartbeat; if one NFS server fails, the other takes over silently. To the outside (e.g. the web server nodes) these two NFS servers will appear as a single NFS server.
In this setup I will use Debian Sarge (3.1) for the two NFS servers as well as for the NFS client (which represents a node of the web server cluster).
I want to say first that this is not the only way of setting up such a system. There are many ways of achieving this goal, but this is the way I have chosen. I do not issue any guarantee that this will work for you!
In this document I use the following systems:

server1.example.com: NFS server 1, IP address 192.168.0.172
server2.example.com: NFS server 2, IP address 192.168.0.173
192.168.0.174: the virtual IP address under which the NFS service will be reachable
client.example.com: the NFS client (representing a web server cluster node), IP address 192.168.0.100
First we set up two basic Debian systems for server1 and server2. You can do it as outlined on the first two pages of this tutorial: http://www.howtoforge.com/perfect_setup_debian_etch. As hostname, you enter server1 and server2 respectively, and as domain you enter example.com.
Regarding the partitioning, I use the following partition scheme:
/dev/sda1 -- 100 MB /boot (primary, ext3, Bootable flag: on)
/dev/sda5 -- 5000 MB / (logical, ext3)
/dev/sda6 -- 1000 MB swap (logical)
/dev/sda7 -- 150 MB unmounted (logical, ext3) (will contain DRBD's meta data)
/dev/sda8 -- 26 GB unmounted (logical, ext3) (will contain the /data directory)
You can vary the sizes of the partitions depending on your hard disk size, and the names of your partitions might also vary, depending on your hardware (e.g. you might have /dev/hda1 instead of /dev/sda1 and so on). However, it is important that /dev/sda7 is a little larger than 128 MB because we will use this partition for DRBD's meta data, which needs 128 MB. Also, make sure /dev/sda7 as well as /dev/sda8 are identical in size on server1 and server2, and please do not mount them (when the installer asks you:
No mount point is assigned for the ext3 file system in partition #7 of SCSI1 (0,0,0) (sda). Do you want to return to the partitioning menu?
please answer No)! /dev/sda8 is going to be our data partition (i.e., our NFS share).
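If you want to double-check the partition sizes after the installation, a simple sanity check looks like this (the device names are the ones from this setup; adjust them if your disk is /dev/hda):

server1/server2:

# print the partition size in 1K blocks; /dev/sda7 must be larger than 131072 (128 MB),
# and /dev/sda8 must show the same value on server1 and server2
sfdisk -s /dev/sda7
sfdisk -s /dev/sda8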
After the basic installation make sure that you give server1 and server2 static IP addresses (server1: 192.168.0.172, server2: 192.168.0.173), as described at the beginning of http://www.howtoforge.com/perfect_setup_debian_etch_p3.
Afterwards, you should check /etc/fstab on both systems. Mine looks like this on both systems:
# /etc/fstab: static file system information.
#
#
proc            /proc           proc    defaults                   0  0
/dev/sda5       /               ext3    defaults,errors=remount-ro 0  1
/dev/sda1       /boot           ext3    defaults                   0  2
/dev/sda6       none            swap    sw                         0  0
/dev/hdc        /media/cdrom0   iso9660 ro,user,noauto             0  0
/dev/fd0        /media/floppy0  auto    rw,user,noauto             0  0
If you find that yours looks like this, for example:
# /etc/fstab: static file system information.
#
#
proc            /proc           proc    defaults                   0  0
/dev/hda5       /               ext3    defaults,errors=remount-ro 0  1
/dev/hda1       /boot           ext3    defaults                   0  2
/dev/hda6       none            swap    sw                         0  0
/dev/hdc        /media/cdrom0   iso9660 ro,user,noauto             0  0
/dev/fd0        /media/floppy0  auto    rw,user,noauto             0  0
then please make sure you use /dev/hda instead of /dev/sda in the following configuration files. Also make sure that /dev/sda7 (or /dev/hda7) and /dev/sda8 (or /dev/hda8…) are not listed in /etc/fstab!
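A quick way to check this is a simple grep (adjust the pattern if your devices are named differently):

server1/server2:

# should print nothing; if it prints a line, remove that entry from /etc/fstab
grep -E '(sda|hda)[78]' /etc/fstab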
It's important that both server1 and server2 have the same system time. Therefore we install an NTP client on both:
server1/server2:
apt-get install ntp ntpdate
Afterwards you can check that both have the same time by running
server1/server2:
date
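If you want a more precise comparison than eyeballing the output of date, you can query an NTP server directly (pool.ntp.org is just an example server here):

server1/server2:

# query only, do not set the clock; the offset should be close to zero on both servers
ntpdate -q pool.ntp.org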
Next we install the NFS server on both server1 and server2:
server1/server2:
apt-get install nfs-kernel-server
Then we remove the system bootup links for NFS because NFS will be started and controlled by heartbeat in our setup:
server1/server2:
update-rc.d -f nfs-kernel-server remove
update-rc.d -f nfs-common remove
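To verify that the bootup links are really gone, you can check a runlevel directory (just a sanity check):

server1/server2:

# should print nothing if the links were removed successfully
ls /etc/rc2.d/ | grep nfs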
We want to export the directory /data/export (i.e., this will be our NFS share that our web server cluster nodes will use to serve web content), so we edit /etc/exports on server1 and server2. It should contain only the following line:
server1/server2:
nano /etc/exports

/data/export/ 192.168.0.0/255.255.255.0(rw,no_root_squash,no_all_squash,sync)
This means that /data/export will be accessible by all systems from the 192.168.0.x subnet. You can limit access to a single system by using 192.168.0.100/255.255.255.255 instead of 192.168.0.0/255.255.255.0, for example. See
man 5 exports
to learn more about this.
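To make the example from above concrete: if you wanted to allow only the NFS client from this tutorial (192.168.0.100) to access the share, the complete line in /etc/exports would look like this instead:

/data/export/ 192.168.0.100/255.255.255.255(rw,no_root_squash,no_all_squash,sync)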
Later in this tutorial we will create /data/export on our empty (and still unmounted!) partition /dev/sda8.
Next we install DRBD on both server1 and server2:
server1/server2:
apt-get install kernel-headers-2.6.8-2-386 drbd0.7-module-source drbd0.7-utils
cd /usr/src/
tar xvfz drbd0.7.tar.gz
cd modules/drbd/drbd
make
make install
Then edit /etc/drbd.conf on server1 and server2. It must be identical on both systems and looks like this:
server1/server2:
nano /etc/drbd.conf

resource r0 {

 protocol C;
 incon-degr-cmd "halt -f";

 startup {
    degr-wfc-timeout 120;    # 2 minutes.
 }

 disk {
    on-io-error detach;
 }

 net {
 }

 syncer {
    rate 10M;
    group 1;
    al-extents 257;
 }

 on server1 {                   # ** EDIT ** the hostname of server 1 (uname -n)
    device     /dev/drbd0;      #
    disk       /dev/sda8;       # ** EDIT ** data partition on server 1
    address    192.168.0.172:7788; # ** EDIT ** IP address on server 1
    meta-disk  /dev/sda7[0];    # ** EDIT ** 128MB partition for DRBD on server 1
 }

 on server2 {                   # ** EDIT ** the hostname of server 2 (uname -n)
    device     /dev/drbd0;      #
    disk       /dev/sda8;       # ** EDIT ** data partition on server 2
    address    192.168.0.173:7788; # ** EDIT ** IP address on server 2
    meta-disk  /dev/sda7[0];    # ** EDIT ** 128MB partition for DRBD on server 2
 }
}
As resource name you can use whatever you like. Here it's r0. Please make sure you put the correct hostnames of server1 and server2 into /etc/drbd.conf. DRBD expects the hostnames as they are shown by the command
uname -n
If you have set server1 and server2 respectively as hostnames during the basic Debian installation, then the output of uname -n should be server1 and server2.
Also make sure you replace the IP addresses and the disks appropriately. If you use /dev/hda instead of /dev/sda, please put /dev/hda8 instead of /dev/sda8 into /etc/drbd.conf (the same goes for the meta-disk where DRBD stores its meta data). /dev/sda8 (or /dev/hda8…) will be used as our NFS share later on.
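A quick way to double-check that the hostnames in /etc/drbd.conf really match the system hostnames is to compare them directly (this assumes the on <hostname> { lines are formatted as in the example above):

server1/server2:

# the hostname printed by the first command must appear in one of the lines
# printed by the second command
uname -n
grep "^ *on " /etc/drbd.conf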
Now we load the DRBD kernel module on both server1 and server2. We only need to do this manually this one time; afterwards it will be loaded automatically by the DRBD init script.
server1/server2:
modprobe drbd
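You can verify that the module was loaded:

server1/server2:

# a line starting with "drbd" should show up in the output
lsmod | grep drbd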
Let's configure DRBD:
server1/server2:
drbdadm up all
cat /proc/drbd
The last command should show something like this (on both server1 and server2):
version: 0.7.10 (api:77/proto:74)
SVN Revision: 1743 build by phil@mescal, 2005-01-31 12:22:07
 0: cs:Connected st:Secondary/Secondary ld:Inconsistent
    ns:0 nr:0 dw:0 dr:0 al:0 bm:1548 lo:0 pe:0 ua:0 ap:0
 1: cs:Unconfigured
You can see that both NFS servers say that they are secondary and that the data is inconsistent. This is because no initial sync has been made yet.
I want to make server1 the primary NFS server and server2 the "hot-standby". If server1 fails, server2 takes over, and if server1 comes back, all data that has changed in the meantime is mirrored back from server2 to server1 so that the data is always consistent.
This next step has to be done only on server1!
server1:
drbdadm -- --do-what-I-say primary all
Now we start the initial sync between server1 and server2 so that the data on both servers becomes consistent. On server1, we do this:
server1:
drbdadm -- connect all
The initial sync is going to take a few hours (depending on the size of /dev/sda8 (/dev/hda8…)) so please be patient.
You can see the progress of the initial sync like this on server1 or server2:
server1/server2:
cat /proc/drbd
The output should look like this:
version: 0.7.10 (api:77/proto:74)
SVN Revision: 1743 build by phil@mescal, 2005-01-31 12:22:07
 0: cs:SyncSource st:Primary/Secondary ld:Consistent
    ns:13441632 nr:0 dw:0 dr:13467108 al:0 bm:2369 lo:0 pe:23 ua:226 ap:0
        [==========>.........] sync'ed: 53.1% (11606/24733)M
        finish: 1:14:16 speed: 2,644 (2,204) K/sec
 1: cs:Unconfigured
When the initial sync is finished, the output should look like this:
SVN Revision: 1743 build by phil@mescal, 2005-01-31 12:22:07
 0: cs:Connected st:Primary/Secondary ld:Consistent
    ns:37139 nr:0 dw:0 dr:49035 al:0 bm:6 lo:0 pe:0 ua:0 ap:0
 1: cs:Unconfigured
NFS stores some important information (e.g. information about file locks, etc.) in /var/lib/nfs. Now what happens if server1 goes down? server2 takes over, but its information in /var/lib/nfs will be different from the information in server1's /var/lib/nfs directory. Therefore we do some tweaking so that these details will be stored on our /data partition (/dev/sda8 or /dev/hda8…) which is mirrored by DRBD between server1 and server2. So if server1 goes down server2 can use the NFS details of server1.
server1/server2:
mkdir /data
server1:
mount -t ext3 /dev/drbd0 /data
mv /var/lib/nfs/ /data/
ln -s /data/nfs/ /var/lib/nfs
mkdir /data/export
umount /data
server2:
rm -fr /var/lib/nfs/
ln -s /data/nfs/ /var/lib/nfs
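Afterwards you can check on both servers that /var/lib/nfs is now a symlink (just a sanity check):

server1/server2:

# should show something like: /var/lib/nfs -> /data/nfs/
ls -l /var/lib/nfs

As long as /data is not mounted (heartbeat will mount it on the active node later), the link target does not exist yet; that is expected.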
heartbeat is the control instance of this whole setup. It is going to be installed on server1 and server2, and each instance monitors the other server. If, for example, server1 goes down, heartbeat on server2 detects this and makes server2 take over. heartbeat also starts and stops the NFS server on both server1 and server2, and it provides NFS as a virtual service via the IP address 192.168.0.174 so that the web server cluster nodes see only one NFS server.
First we install heartbeat:
server1/server2:
apt-get install heartbeat
Now we have to create three configuration files for heartbeat (ha.cf, haresources, and authkeys, all in /etc/heartbeat/). They must be identical on server1 and server2!
server1/server2:
nano /etc/heartbeat/ha.cf

logfacility     local0
keepalive 2
#deadtime 30 # USE THIS!!!
deadtime 10
bcast   eth0
node    server1 server2
As nodenames we must use the output of uname -n on server1 and server2.
server1/server2:
nano /etc/heartbeat/haresources

server1  IPaddr::192.168.0.174/24/eth0 drbddisk::r0 Filesystem::/dev/drbd0::/data::ext3 nfs-kernel-server
The first word is the output of uname -n on server1, no matter if you create the file on server1 or server2! After IPaddr we put our virtual IP address 192.168.0.174, and after drbddisk we use the resource name of our DRBD resource which is r0 here (remember, that is the resource name we use in /etc/drbd.conf - if you use another one, you must use it here, too).
server1/server2:
nano /etc/heartbeat/authkeys

auth 3
3 md5 somerandomstring
somerandomstring is a password which the two heartbeat daemons on server1 and server2 use to authenticate against each other. Use your own string here. You have the choice between three authentication mechanisms. I use md5 as it is the most secure one.
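To generate a reasonably random string for this, you can, for example, hash some data from /dev/urandom (this is just one way to do it); run it once and put the same resulting string into the authkeys file on both servers:

# print a random 32-character string; use it in place of somerandomstring
dd if=/dev/urandom bs=512 count=1 2>/dev/null | md5sum | cut -d" " -f1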
/etc/heartbeat/authkeys should be readable by root only, therefore we do this:
server1/server2:
chmod 600 /etc/heartbeat/authkeys
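Because the three files must be identical on server1 and server2, it is a good idea to compare them before starting heartbeat. One simple way is to compare checksums (diff over ssh would work just as well):

server1/server2:

# run this on both servers; the three checksums must match between server1 and server2
md5sum /etc/heartbeat/ha.cf /etc/heartbeat/haresources /etc/heartbeat/authkeys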
Finally we start DRBD and heartbeat on server1 and server2:
server1/server2:
/etc/init.d/drbd start
/etc/init.d/heartbeat start
Now we can do our first tests. On server1, run
server1:
ifconfig
In the output, the virtual IP address 192.168.0.174 should show up:
eth0      Link encap:Ethernet  HWaddr 00:0C:29:A1:C5:9B
          inet addr:192.168.0.172  Bcast:192.168.0.255  Mask:255.255.255.0
          inet6 addr: fe80::20c:29ff:fea1:c59b/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:18992 errors:0 dropped:0 overruns:0 frame:0
          TX packets:24816 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:2735887 (2.6 MiB)  TX bytes:28119087 (26.8 MiB)
          Interrupt:177 Base address:0x1400

eth0:0    Link encap:Ethernet  HWaddr 00:0C:29:A1:C5:9B
          inet addr:192.168.0.174  Bcast:192.168.0.255  Mask:255.255.255.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          Interrupt:177 Base address:0x1400

lo        Link encap:Local Loopback
          inet addr:127.0.0.1  Mask:255.0.0.0
          inet6 addr: ::1/128 Scope:Host
          UP LOOPBACK RUNNING  MTU:16436  Metric:1
          RX packets:71 errors:0 dropped:0 overruns:0 frame:0
          TX packets:71 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:5178 (5.0 KiB)  TX bytes:5178 (5.0 KiB)
Also, run
server1:
df -h
on server1. You should see /data listed there now:
Filesystem            Size  Used Avail Use% Mounted on
/dev/sda5             4.6G  430M  4.0G  10% /
tmpfs                 126M     0  126M   0% /dev/shm
/dev/sda1              89M   11M   74M  13% /boot
/dev/drbd0             24G   33M   23G   1% /data
If you do the same
server2:
ifconfig
df -h
on server2, you shouldn't see 192.168.0.174 and /data.
Now we create a test file in /data/export on server1 and then simulate a server failure of server1 (by stopping heartbeat):
server1:
touch /data/export/test1
/etc/init.d/heartbeat stop
If you run ifconfig and df -h on server2 now, you should see the IP address 192.168.0.174 and the /data partition, and
server2:
ls -l /data/export
should list the file test1 which you created on server1 before. So it has been mirrored to server2!
Now we create another test file on server2 and see if it gets mirrored to server1 when it comes up again:
server2:
touch /data/export/test2
server1:
/etc/init.d/heartbeat start
(Wait a few seconds.)
ifconfig
df -h
ls -l /data/export
You should see 192.168.0.174 and /data again on server1 which means it has taken over again (because we defined it as primary), and you should also see the file /data/export/test2!
Now we install NFS on our client (192.168.0.100):
apt-get install nfs-common
Next we create the /data directory and mount our NFS share into it:
mkdir /data
mount 192.168.0.174:/data/export /data
192.168.0.174 is the virtual IP address we configured before. You must make sure that the forward and the reverse DNS record for client.example.com match each other, otherwise you get a “Permission denied” error on the client, and on the server you'll find this in /var/log/syslog:
Mar 2 04:19:09 localhost rpc.mountd: Fake hostname localhost for 192.168.0.100 - forward lookup doesn't match reverse
If you do not have proper DNS records (or do not have a DNS server for your local network) you must change this now, otherwise you cannot mount the NFS share!
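If you do not run a DNS server for your local network, one common workaround is to add a matching entry to /etc/hosts on server1 and server2 so that the forward and reverse lookups for the client agree (client.example.com and 192.168.0.100 are the values used in this tutorial; adjust them to your setup):

server1/server2:

nano /etc/hosts

192.168.0.100   client.example.com   client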
If it works you can now create further test files in /data on the client and then simulate failures of server1 and server2 (but not both at the same time!) and check if the test files are replicated. On the client you shouldn't notice at all if server1 or server2 fails - the data in the /data directory should always be available (unless server1 and server2 fail at the same time…).
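One simple way to watch the behaviour during a failover is to write a timestamp into the share every second from the client (just a test loop; the file name failover_test.log is arbitrary, and you stop the loop with CTRL+C):

# run this on the client while you stop heartbeat on the active server
while true; do date >> /data/failover_test.log; sleep 1; done

The writes may pause for a few seconds while the standby server takes over, but they should continue afterwards without any action on the client.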
To unmount the /data directory, run
umount /data
If you want to automatically mount the NFS share at boot time, put the following line into /etc/fstab:
192.168.0.174:/data/export /data nfs rw 0 0
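If you like, you can also specify explicit NFS mount options; the following line is just an example with commonly used options, not part of the original setup (hard makes the client wait for the server to come back instead of returning errors, intr lets you interrupt a hung mount, and rsize/wsize set the transfer block size):

192.168.0.174:/data/export /data nfs rw,hard,intr,rsize=8192,wsize=8192 0 0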