To live-migrate a Xen guest from one cluster member to another, some sort of shared storage is required. As the Xen guest won’t run on more than one cluster member at a time, a cluster filesystem is not required, as long as you configure Xen to access the guest’s disk as a physical device rather than a file.
As an FCAL-based SAN is not always available, we looked for other possibilities. Inspired by some Oracle RAC documentation (see: http://www.oracle.com/technology/pub/articles/hunter_rac10gr2.html), a shared firewire disk or disk array appeared to be an option.
A second possibility is mentioned in the Xen live migration documentation from Novell (see below). In this proof of concept iSCSI has been used as a shared storage solution.
By default Linux logs on to a firewire device in exclusive mode. This prevents you from accidentally accessing the same device from another node and corrupting your data. Fortunately you can bypass the exclusive login mechanism by passing an option to the serial bus protocol kernel module (sbp2 exclusive_login=0).
For this to work, the chipset of your firewire device(s) should support multiple logins. For example the Oxford-chipset is known to support multiple logins. Check the afore-mentioned Oracle RAC documentation for more information on shared firewire hardware.
Adjust /etc/modules for the necessary modules
sd_mod
ieee1394
ohci1394
sbp2 exclusive_login=0
Adjust /etc/modprobe.d/sbp2
options sbp2 exclusive_login=0 serialize_io=1
For immediate effect:
rmmod sbp2
modprobe sbp2
For permanent effect, rebuild your initial ramdisk so that it also uses these options (the sbp2 module is loaded at boot time).
mkinitramfs -o foo.version.img <kernel-version>
or, for example:
mkinitrd -o /boot/initrd.img-2.6.12.6-xen-fw 2.6.12.6-xen
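After reloading the module it may be worth confirming that the option actually took effect. The sketch below assumes that the running kernel exposes the sbp2 parameters under /sys/module/sbp2/parameters (an assumption; not every kernel does), and the function name is made up for illustration.

```shell
# Hypothetical helper: succeed if sbp2 allows shared (non-exclusive) logins.
# Assumption: the parameter is visible in sysfs; pass another path to override.
sbp2_shared_login_enabled() {
    param_file=${1:-/sys/module/sbp2/parameters/exclusive_login}
    # Module not loaded, or parameter not exposed by this kernel.
    [ -r "$param_file" ] || return 2
    value=$(cat "$param_file")
    # Integer parameters read back as 0, boolean ones as N.
    [ "$value" = "0" ] || [ "$value" = "N" ]
}

# Usage:
#   sbp2_shared_login_enabled && echo "shared logins allowed"
```

A return code of 2 only means the check could not be performed; it says nothing about the module setting itself.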
First we need an iSCSI target, that is, a device or server that provides shared storage on your network. We use the iSCSI Enterprise Target software to build a Linux-based iSCSI target server.
=from source=
You can download the software at http://iscsitarget.sourceforge.net/
After building and installing the software, you’ll have a kernel module named iscsi_trgt, a daemon called ‘ietd’ and a tool called ‘ietadm’.
=package=
Although binary packages are not yet available for Debian Etch, Philipp Hug created unofficial packages for Debian Sid and Ubuntu Dapper. They are available at http://iscsitarget.sourceforge.net/wiki/index.php/Unoffical_DEBs
We installed the binary package named ‘iscsitarget’, which contains the userland binaries, on Debian Etch with no problems. The package named ‘iscsitarget-source’ contains the kernel module sources. This package allows you to build a binary kernel module for your kernel. The build on Debian Etch went flawlessly.
Add this line to /etc/apt/sources.list
deb http://debian.hug.cx/debian/ unstable/
Then proceed to install the software and build the kernel module
apt-get install module-assistant debhelper linux-source-2.6.18 dpkg-dev \
kernel-package libncurses-dev libssl-dev linux-headers-2.6.18-4-xen-amd64
cd /usr/src/
tar -jxvf linux-source-2.6.18.tar.bz2
ln -s linux-source-2.6.18 linux
apt-get install iscsitarget iscsitarget-source
tar -zxvf iscsitarget.tar.gz    # this unpacks into the sub-directory iscsitarget
m-a a-i iscsitarget
Let’s configure the daemon. We have to tell it which device(s) to export and which clients should be able to access them. In the next example we will export the logical volumes named ‘vault1’ and ‘vault2’ to everybody. The configuration file is /etc/ietd.conf
Target iqn.2006-07.com.example.intra:storage.disk1.vault
    Lun 0 Path=/dev/mapper/vg00-vault1,Type=fileio
    Alias vault1
Target iqn.2006-07.com.example.intra:storage.disk2.vault
    Lun 1 Path=/dev/mapper/vg00-vault2,Type=fileio
    Alias vault2
Remember that every node on your network that uses iSCSI needs a unique ‘iqn’. Check the iSCSI documentation on the web for the applicable syntax. You can add lines to /etc/ietd.conf that require a username/password for iSCSI logons to succeed, but these are omitted by default.
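The names used in this document follow the pattern iqn.<yyyy-mm>.<reversed domain>:<identifier>. As an illustration only, the following sketch builds such a name from its parts; the function name make_iqn and its argument order are made up for this example.

```shell
# Hypothetical helper: build an iqn from a date, a domain name and an
# optional identifier.
#   $1 = yyyy-mm, $2 = domain (e.g. intra.example.com), $3 = identifier
make_iqn() {
    date_part=$1
    domain=$2
    ident=$3
    # Reverse the dot-separated labels: intra.example.com -> com.example.intra
    reversed=$(printf '%s\n' "$domain" | awk -F. '{
        out = $NF
        for (i = NF - 1; i >= 1; i--) out = out "." $i
        print out
    }')
    if [ -n "$ident" ]; then
        printf 'iqn.%s.%s:%s\n' "$date_part" "$reversed" "$ident"
    else
        printf 'iqn.%s.%s\n' "$date_part" "$reversed"
    fi
}

# Usage:
#   make_iqn 2006-07 intra.example.com storage.disk1.vault
```

Generating the names from one place makes it harder to hand out the same iqn to two nodes by accident.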
Start the iSCSI target daemon to enable the shared storage provider. This will open TCP port 3260 by default.
/etc/init.d/iscsi-target start
On the clients that will have to access the iSCSI based shared storage we need to install and configure iSCSI initiator software. We’ll use the Open iSCSI package.
=from source=
The source is available at http://www.open-iscsi.org/
=package=
Debian Etch has a binary package for iSCSI clients.
apt-get install open-iscsi
After building and installing the software, you’ll have two kernel modules named iscsi_tcp and scsi_transport_iscsi, a daemon called ‘iscsid’ and a tool called ‘iscsiadm’.
The install procedure of Open iSCSI will create a configuration file /etc/iscsid.conf that enables some defaults:
node.active_cnx = 1
node.startup = manual
node.session.timeo.replacement_timeout = 120
node.session.err_timeo.abort_timeout = 10
node.session.err_timeo.reset_timeout = 30
node.session.iscsi.InitialR2T = No
node.session.iscsi.ImmediateData = Yes
node.session.iscsi.FirstBurstLength = 262144
node.session.iscsi.MaxBurstLength = 16776192
node.session.iscsi.DefaultTime2Wait = 0
node.session.iscsi.DefaultTime2Retain = 0
node.session.iscsi.MaxConnections = 0
node.cnx[0].iscsi.HeaderDigest = None
node.cnx[0].iscsi.DataDigest = None
node.cnx[0].iscsi.MaxRecvDataSegmentLength = 65536
Now it’s imperative to create a unique ‘iqn’ for our client and store it in /etc/initiatorname.iscsi
InitiatorName=iqn.2006-07.com.example.intra:hannibal.clientnode1
Afterwards, start the Open iSCSI daemon on the client
/etc/init.d/open-iscsi start
Let’s discover which targets our iSCSI target server offers, for instance at 192.168.1.16
iscsiadm -m discovery -t sendtargets -p 192.168.1.16:3260
Now log on to the iSCSI target; afterwards a new SCSI device should have been added to the client
iscsiadm -m node -T iqn.2006-07.com.example.intra:storage.disk1.vault -p 192.168.1.16:3260 -l
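With more than one target, typing each login command by hand gets tedious. The sketch below turns discovery output into the matching login commands and only prints them, so they can be reviewed before being run. It assumes discovery records of the form <ip>:<port>,<tpgt> <target-iqn>, one per line; the function name is made up for illustration.

```shell
# Hypothetical helper: read "iscsiadm -m discovery" output on stdin and print
# (not execute) one login command per discovered target.
# Assumed record format:  <ip>:<port>,<tpgt> <target-iqn>
print_login_commands() {
    while read -r portal target; do
        # Skip blank or malformed lines.
        [ -n "$target" ] || continue
        # Strip the ",<tpgt>" suffix from the portal.
        portal=${portal%%,*}
        printf 'iscsiadm -m node -T %s -p %s -l\n' "$target" "$portal"
    done
}

# Usage:
#   iscsiadm -m discovery -t sendtargets -p 192.168.1.16:3260 | print_login_commands
```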
Have fun!
iSCSI supports more than one connection to the same iSCSI LUN. This allows for highly available setups. In our setup we have at least two NICs in the iSCSI server as well as in the iSCSI clients. We configure them on two ethernet segments, 192.168.1.x and 192.168.2.x. Now we can initiate two iSCSI sessions per iSCSI client, one per segment. As a result the iSCSI client will end up with two new SCSI devices that point to the same LUN (but via two different paths!).
On Linux the multipath-tools can map our two newly obtained SCSI devices into one multipath block device that offers load balancing and failover. In addition to using multipath, one could also consider setting up a host-based mirror for the shared storage. This could be accomplished by setting up two or more iSCSI servers (targets) and joining them in a software mirror (RAID-1) MD device. This is left as an exercise for the reader.
We installed the multipath-tools on a Debian Etch Xen-host.
apt-get install multipath-tools
Create the /etc/multipath.conf file (some examples are available in /usr/share/doc/multipath-tools/examples). The SCSI ID that must be entered on the ‘wwid’ line can be obtained with the scsi_id tool. In our example we’ll get the same ID for sdd and sde; remember, they’re two paths to the same LUN!
/sbin/scsi_id -g -u -s /block/sdd
defaults {
    user_friendly_names yes
    udev_dir /dev
    polling_interval 5
    default_selector "round-robin 0"
    default_getuid_callout "/sbin/scsi_id -g -u -s /block/%n"
    failback immediate
}
blacklist {
    wwid 200d04b651805e38e
    devnode "^(ram|raw|loop|fd|md|dm-|sr|scd|st)[0-9]*"
    devnode "^hd[a-z][[0-9]*]"
    devnode "^cciss!c[0-9]d[0-9]*[p[0-9]*]"
}
multipaths {
    multipath {
        wwid 149455400000000000000000001000000691000000d000000
        alias vault1
        path_grouping_policy failover
        path_checker readsector0
    }
}
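To double-check the wwid entered above, one can compare the IDs that scsi_id reports for both paths. This is a minimal sketch that reuses the scsi_id invocation from this document; the function name same_lun and the SCSI_ID override variable are made up for illustration.

```shell
# Path to the scsi_id tool; can be overridden, e.g. for testing.
SCSI_ID=${SCSI_ID:-/sbin/scsi_id}

# Hypothetical helper: succeed if two block devices report the same SCSI ID,
# i.e. they are two paths to the same LUN.
#   $1, $2 = device names without /dev/, e.g. sdd sde
same_lun() {
    id_a=$("$SCSI_ID" -g -u -s "/block/$1") || return 2
    id_b=$("$SCSI_ID" -g -u -s "/block/$2") || return 2
    [ -n "$id_a" ] && [ "$id_a" = "$id_b" ]
}

# Usage:
#   same_lun sdd sde && echo "sdd and sde are two paths to the same LUN"
```

If the two IDs differ, the devices are separate LUNs and must not share one multipath entry.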
Now, after a reload of the multipath-tools and logging on to the iSCSI targets, your multipath block device will be ready for use.
/etc/init.d/multipath-tools reload
/usr/bin/iscsiadm -m node -T iqn.2006-07.com.example.intra:storage.disk1.vault -p 192.168.1.16:3260 -l
/usr/bin/iscsiadm -m node -T iqn.2006-07.com.example.intra:storage.disk1.vault -p 192.168.2.16:3260 -l
Let’s check our new device:
multipath -ll
The output is something like
vault1 (149455400000000000000000001000000691000000d000000) dm-7 IET,VIRTUAL-DISK
[size=50G][features=0][hwhandler=0]
\_ round-robin 0 [prio=1][enabled]
 \_ 4:0:0:0 sdd 8:48 [active][ready]
\_ round-robin 0 [prio=1][enabled]
 \_ 5:0:0:0 sde 8:64 [active][ready]
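For monitoring it can be handy to count the healthy paths automatically. The sketch below assumes the output format shown above, where each usable path line carries [active][ready]; other multipath-tools versions may print the path state differently. The function names are made up for illustration.

```shell
# Hypothetical helper: count the path lines marked [active][ready] in
# "multipath -ll" output read from stdin.
count_ready_paths() {
    # grep -c exits non-zero when it counts 0 matches, hence the "|| true".
    grep -cF '[active][ready]' || true
}

# Hypothetical helper: succeed if at least $1 paths are ready.
enough_ready_paths() {
    expected=$1
    found=$(count_ready_paths)
    [ "$found" -ge "$expected" ]
}

# Usage:
#   multipath -ll vault1 | enough_ready_paths 2 || echo "vault1 has lost a path"
```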
A binary package is available in Debian unstable.
apt-get install drbd8-utils
To make the module (from source):
apt-get install module-assistant debhelper dpkg-dev kernel-package libncurses-dev libssl-dev
m-a a-i drbd8
Edit /etc/drbd.conf
work-in-progress.....
Configure the Xen relocation service in the xend configuration file (/etc/xen/xend-config.sxp)
...
(xend-relocation-server yes)
(xend-relocation-port 8002)
(xend-relocation-address '')
(xend-relocation-hosts-allow '')
...
Restart xend on both nodes and make sure that port 8002 accepts connections from everywhere. Check for a LISTEN line for that port with netstat
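The netstat check can be scripted. The following sketch assumes the usual column layout of "netstat -tln" (local address in the fourth column, state in the last one); the function name is made up for illustration.

```shell
# Hypothetical helper: read "netstat -tln" output on stdin and succeed if a
# TCP socket is listening on the given port.
#   $1 = port number
has_listener() {
    port=$1
    awk -v port="$port" '
        # Match tcp/tcp6 lines in state LISTEN whose local address ends in :<port>
        $1 ~ /^tcp/ && $NF == "LISTEN" && $4 ~ (":" port "$") { found = 1 }
        END { exit !found }
    '
}

# Usage:
#   netstat -tln | has_listener 8002 && echo "xend relocation port is listening"
```

This only shows that xend is listening locally; a firewall between the nodes can still block the port.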
Time has to be synced between both nodes (See Time server).
When two Xen guests run on the same dom0 and that dom0 provides a shared disk to both of them (in order to simulate a shared-storage cluster environment), add a bang (!) after the w in the Xen guest’s configuration file to force Xen to let you attach the disk anyway (instead of giving an error message).
Remember to use a cluster-aware filesystem like OCFS2 or GFS so the two VMs won’t mess up each other’s data.
The config line would then be:
disk=['phy:vgxen01/lv_linva06_hda,hda,w','phy:vgxen01/lv_linva04_06_hdb,hdb,w!']
instead of:
disk=['phy:vgxen01/lv_linva06_hda,hda,w','phy:vgxen01/lv_linva04_06_hdb,hdb,w']
Start a domU on node1 and relocate it to node2 with the command:
xm migrate --live name_xen_guest node2