Xen-clustering/live-migration

shared storage

To live-migrate a Xen guest from one cluster member to another, some form of shared storage is required. Because the Xen guest never runs on more than one cluster member at a time, a cluster filesystem is not needed, provided you configure Xen to access the guest's disk as a physical device rather than as a file.
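As a rough sketch of how the physical-device requirement could be checked, the helper below scans a guest configuration file and succeeds only if its disk entries use the phy: backend. The function name and the list of file-backed prefixes (file:, tap:aio:) are assumptions made for illustration; adapt them to your own setup.

```shell
#!/bin/sh
# Hypothetical helper: succeed only if the guest configuration file uses
# physical devices (phy:) and no file-backed disks (file: or tap:aio:).
uses_only_phy_disks() {
    cfg="$1"
    # There must be at least one quoted entry that starts with phy:
    grep -Eq "['\"]phy:" "$cfg" || return 1
    # Any quoted entry that starts with file: or tap:aio: disqualifies it.
    if grep -Eq "['\"](file|tap:aio):" "$cfg"; then
        return 1
    fi
    return 0
}

# Example (the path is a placeholder):
# uses_only_phy_disks /etc/xen/name_xen_guest.cfg && echo "ok for live migration"
```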

As an FCAL-based SAN is not always available, we looked for other possibilities. Inspired by some Oracle RAC documentation (see: http://www.oracle.com/technology/pub/articles/hunter_rac10gr2.html), a shared firewire disk or array appeared to be an option.

A second possibility is mentioned in the Xen live migration documentation from Novell (see below). In this proof of concept, iSCSI has been used as the shared storage solution.

Firewire

By default, Linux logs in to a firewire device in exclusive mode. This prevents you from accidentally accessing the same device from another node and corrupting your data. Fortunately, you can bypass the exclusive login mechanism with an option to the serial bus protocol kernel module (sbp2 exclusive_login=0).

For this to work, the chipset of your firewire device(s) must support multiple logins. The Oxford chipset, for example, is known to do so. Check the aforementioned Oracle RAC documentation for more information on shared firewire hardware.

Adjust /etc/modules for the necessary modules

sd_mod
ieee1394
ohci1394
sbp2 exclusive_login=0

Adjust /etc/modprobe.d/sbp2

options sbp2 exclusive_login=0 serialize_io=1
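Before reloading the module or rebuilding the ramdisk, it can help to confirm that the options line is really in place. The following is a minimal sketch; the function name is made up for this example, and the default path is simply the file used above.

```shell
#!/bin/sh
# Hypothetical helper: succeed only if the given modprobe configuration
# file contains an "options sbp2" line that sets exclusive_login=0.
sbp2_shared_login_configured() {
    conf="${1:-/etc/modprobe.d/sbp2}"
    grep -Eq '^[[:space:]]*options[[:space:]]+sbp2[[:space:]](.*[[:space:]])?exclusive_login=0([[:space:]]|$)' "$conf"
}

# Example: only reload the module when the option is configured.
# sbp2_shared_login_configured && rmmod sbp2 && modprobe sbp2
```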

For immediate effect:

rmmod sbp2
modprobe sbp2

For permanent effect, rebuild your initial ramdisk so that it also picks up these options (the sbp2 module is loaded at boot time).

mkinitramfs -o foo.version.img <kernel-version>
or
mkinitrd -o /boot/initrd.img-2.6.12.6-xen-fw 2.6.12.6-xen   # example

iSCSI

target

First we need an iSCSI target, that is, a device or server that provides shared storage on your network. We use the iSCSI Enterprise Target software to build a Linux-based iSCSI target server.

install

=from source=

You can download the software at http://iscsitarget.sourceforge.net/

After building and installing the software, you’ll have a kernel module named iscsi_trgt, a daemon called ‘ietd’ and a tool called ‘ietadm’.

=package=

Although binary packages are not yet available for Debian Etch, Philipp Hug created unofficial packages for Debian Sid and Ubuntu Dapper. They are available at http://iscsitarget.sourceforge.net/wiki/index.php/Unoffical_DEBs

We installed the binary package named ‘iscsitarget’, which contains the userland binaries, on Debian Etch with no problems. The package named ‘iscsitarget-source’ contains the kernel module sources. This package allows you to build a binary kernel module for your kernel. The build on Debian Etch went flawlessly.

Add this line to /etc/apt/sources.list

deb http://debian.hug.cx/debian/ unstable/

Then proceed to install the software and build the kernel module

apt-get install module-assistant debhelper linux-source-2.6.18 dpkg-dev \
                kernel-package libncurses-dev libssl-dev linux-headers-2.6.18-4-xen-amd64
cd /usr/src/
tar -jxvf linux-source-2.6.18.tar.bz2
ln -s linux-source-2.6.18 linux

apt-get install  iscsitarget iscsitarget-source
tar -zxvf iscsitarget.tar.gz    # unpacks into the sub-directory iscsitarget

m-a a-i iscsitarget

config

Let’s configure the daemon. We have to tell it which device(s) to export and which clients may access them. The next example exports the logical volumes named ‘vault1’ and ‘vault2’ to everybody. The configuration file is /etc/ietd.conf

Target iqn.2006-07.com.example.intra:storage.disk1.vault
       Lun 0 Path=/dev/mapper/vg00-vault1,Type=fileio
       Alias vault1
Target iqn.2006-07.com.example.intra:storage.disk2.vault
       Lun 1 Path=/dev/mapper/vg00-vault2,Type=fileio
       Alias vault2

Remember that every node on your network that uses iSCSI needs a unique ‘iqn’. Check the iSCSI documentation on the web for the applicable syntax. You can add lines to /etc/ietd.conf that require a username and password for iSCSI logins to succeed, but authentication is omitted by default.
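The iSCSI naming rules (RFC 3720) give an ‘iqn’ the general shape iqn.yyyy-mm.reversed-domain, optionally followed by a colon and a free-form identifier. The sketch below checks a name against that shape. The function name is hypothetical and the regular expression is a simplification that only accepts lower-case names, so treat it as a sanity check rather than a full validator.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the argument looks like an iSCSI
# qualified name: iqn.<year>-<month>.<reversed domain>[:<identifier>]
is_valid_iqn() {
    printf '%s\n' "$1" | grep -Eq \
        '^iqn\.[0-9]{4}-(0[1-9]|1[0-2])\.[a-z0-9]([a-z0-9.-]*[a-z0-9])?(:[^[:space:]]+)?$'
}

# Example:
# is_valid_iqn "iqn.2006-07.com.example.intra:storage.disk1.vault" && echo valid
```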

Start the iSCSI target daemon to enable the shared storage provider. This will open TCP port 3260 by default.

/etc/init.d/iscsi-target start

initiator

On the clients that need to access the iSCSI-based shared storage, we have to install and configure iSCSI initiator software. We’ll use the Open iSCSI package.

install

=from source=

The source is available at http://www.open-iscsi.org/

=package=

Debian Etch has a binary package for iSCSI clients.

apt-get install open-iscsi

config

After building and installing the software, you’ll have two kernel modules named iscsi_tcp and scsi_transport_iscsi, a daemon called ‘iscsid’ and a tool called ‘iscsiadm’.

The install procedure of Open iSCSI will create a configuration file /etc/iscsid.conf that enables some defaults:

node.active_cnx = 1
node.startup = manual
node.session.timeo.replacement_timeout = 120
node.session.err_timeo.abort_timeout = 10
node.session.err_timeo.reset_timeout = 30
node.session.iscsi.InitialR2T = No
node.session.iscsi.ImmediateData = Yes
node.session.iscsi.FirstBurstLength = 262144
node.session.iscsi.MaxBurstLength = 16776192
node.session.iscsi.DefaultTime2Wait = 0
node.session.iscsi.DefaultTime2Retain = 0
node.session.iscsi.MaxConnections = 0
node.cnx[0].iscsi.HeaderDigest = None
node.cnx[0].iscsi.DataDigest = None
node.cnx[0].iscsi.MaxRecvDataSegmentLength = 65536

Now it’s imperative to create a unique ‘iqn’ for our client and store it in /etc/initiatorname.iscsi

InitiatorName=iqn.2006-07.com.example.intra:hannibal.clientnode1

Afterwards, start the Open iSCSI daemon on the client

/etc/init.d/open-iscsi start

Let’s discover which targets our iSCSI target server, for instance 192.168.1.16, offers

iscsiadm -m discovery -t sendtargets -p 192.168.1.16:3260

Now log on to the iSCSI target; afterwards, a new SCSI device should appear on the client

iscsiadm -m node -T iqn.2006-07.com.example.intra:storage.disk1.vault -p 192.168.1.16:3260 -l
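The discovery and login steps above can be chained. The sketch below assumes that sendtargets discovery prints one record per line in the form "portal,tpgt target"; it turns each record into the matching login command and only prints it, so you can review the result before running anything. The function name is made up for this example.

```shell
#!/bin/sh
# Hypothetical helper: read discovery records ("portal,tpgt target") on
# stdin and print one iscsiadm login command per record.
print_login_commands() {
    while read -r portal target; do
        [ -n "$target" ] || continue      # skip blank or malformed lines
        portal="${portal%%,*}"            # drop the ",tpgt" suffix
        printf 'iscsiadm -m node -T %s -p %s -l\n' "$target" "$portal"
    done
}

# Example:
# iscsiadm -m discovery -t sendtargets -p 192.168.1.16:3260 | print_login_commands
```

Once the printed commands look right, they can be piped to sh.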

Have fun!

multipath

iSCSI supports more than one connection to the same iSCSI LUN, which allows for highly available setups. In our setup we have at least two NICs in the iSCSI server as well as in the iSCSI clients. We configure them on two ethernet segments, 192.168.1.x and 192.168.2.x. Now we can initiate two iSCSI sessions per iSCSI client, one per segment. As a result the iSCSI client ends up with two new SCSI devices that point to the same LUN (but via two different paths!).

On Linux, the multipath-tools can map our two newly obtained SCSI devices into one multipath block device that offers load balancing and failover. In addition to multipath, one could also consider setting up a host-based mirror for the shared storage. This could be accomplished by setting up two or more iSCSI servers (targets) and joining them in a software mirror (RAID-1) MD device. This is left as an exercise for the reader :-)

We installed the multipath-tools on a Debian Etch Xen-host.

apt-get install multipath-tools

Create the /etc/multipath.conf file (some examples are available in /usr/share/doc/multipath-tools/examples). The SCSI ID that must be entered on the ‘wwid’ line can be obtained with the scsi_id tool. In our example we’ll get the same ID for sdd and sde; remember, they’re two paths to the same LUN!

/sbin/scsi_id -g -u -s /block/sdd

The resulting /etc/multipath.conf:

defaults {
       user_friendly_names yes
       udev_dir        /dev
       polling_interval 5
       default_selector        "round-robin 0"
       default_getuid_callout  "/sbin/scsi_id -g -u -s /block/%n"
       failback        immediate
}
blacklist {
       wwid    200d04b651805e38e
       devnode "^(ram|raw|loop|fd|md|dm-|sr|scd|st)[0-9]*"
       devnode "^hd[a-z][[0-9]*]"
       devnode "^cciss!c[0-9]d[0-9]*[p[0-9]*]"
}
multipaths {
       multipath {
               wwid                    149455400000000000000000001000000691000000d000000
               alias                   vault1
               path_grouping_policy    failover
               path_checker            readsector0
       }
}

Now, after reloading the multipath-tools and logging on to the iSCSI targets, your multipath block device will be ready for use.

/etc/init.d/multipath-tools reload
/usr/bin/iscsiadm -m node -T iqn.2006-07.com.example.intra:storage.disk1.vault -p 192.168.1.16:3260 -l
/usr/bin/iscsiadm -m node -T iqn.2006-07.com.example.intra:storage.disk1.vault -p 192.168.2.16:3260 -l

Let’s check our new device:

multipath -ll

The output is something like

vault1 (149455400000000000000000001000000691000000d000000) dm-7 IET,VIRTUAL-DISK
[size=50G][features=0][hwhandler=0]
\_ round-robin 0 [prio=1][enabled]
 \_ 4:0:0:0 sdd 8:48  [active][ready] 
\_ round-robin 0 [prio=1][enabled]
 \_ 5:0:0:0 sde 8:64  [active][ready]
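To check path redundancy from a script, one option is to count the paths reported as [active][ready]. This is only a sketch: it assumes the output format shown above, which differs between multipath-tools versions, and the function names are hypothetical.

```shell
#!/bin/sh
# Hypothetical helpers: read "multipath -ll" output on stdin.

# Print the number of paths that are marked [active][ready].
count_ready_paths() {
    grep -c '\[active\]\[ready\]'
}

# Succeed only if at least two paths are ready.
has_redundant_paths() {
    n=$(count_ready_paths || true)   # grep exits non-zero when the count is 0
    [ "$n" -ge 2 ]
}

# Example:
# multipath -ll vault1 | has_redundant_paths || echo "vault1 has lost a path"
```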

Distributed replicated block device (drbd8)

Version 8 supports active/active operation.

A binary package is available in Debian unstable.

apt-get install drbd8-utils

To make the module (from source):

apt-get install module-assistant debhelper dpkg-dev kernel-package libncurses-dev libssl-dev
m-a a-i drbd8

Edit /etc/drbd.conf

work-in-progress.....

Config

Configure the xen relocation service in /etc/xen/xend-config.sxp

...
(xend-relocation-server yes)
(xend-relocation-port 8002)
(xend-relocation-address '')
(xend-relocation-hosts-allow '')
...

Restart xend on both nodes and make sure that port 8002 accepts connections from everywhere. Check for a LISTEN line with netstat :-)
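The netstat check can be scripted. The sketch below assumes the usual column layout of "netstat -ltn" on Linux (local address in the fourth column, state in the sixth); the function name is made up for this example.

```shell
#!/bin/sh
# Hypothetical helper: read "netstat -ltn" output on stdin and succeed if
# some socket is in state LISTEN on the given TCP port.
port_is_listening() {
    port="$1"
    awk -v p="$port" '
        $6 == "LISTEN" && $4 ~ (":" p "$") { found = 1 }
        END { exit !found }
    '
}

# Example:
# netstat -ltn | port_is_listening 8002 && echo "xend relocation server is up"
```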

The clocks of both nodes have to be synchronized (see Time server).
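A quick way to see whether the clocks agree is to compare the epoch seconds of both nodes. The sketch below only does the arithmetic; the function name is hypothetical, and the usage line assumes that ssh access from node1 to node2 is already set up.

```shell
#!/bin/sh
# Hypothetical helper: print the absolute difference, in seconds, between
# two timestamps given as seconds since the epoch.
clock_skew() {
    a="$1"
    b="$2"
    if [ "$a" -ge "$b" ]; then
        echo $((a - b))
    else
        echo $((b - a))
    fi
}

# Example (run on node1; "node2" is the other cluster member):
# clock_skew "$(date +%s)" "$(ssh node2 date +%s)"
```

The ssh round trip adds a little delay of its own, so a result of a second or so does not necessarily mean the clocks have drifted.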

Cluster filesystems

Simulate a cluster with two domU's within a dom0

When two Xen guests run on the same dom0 and that dom0 provides a shared disk to both of them (to simulate a shared-storage cluster environment), Xen refuses to attach the disk writable to the second guest. Add a bang (!) after the w in the guest’s configuration file to force Xen to attach it anyway (instead of giving an error message).

Remember to use a cluster-aware filesystem like OCFS2 or GFS so the two VMs won’t corrupt each other’s data.

The config-line would then be:

disk=['phy:vgxen01/lv_linva06_hda,hda,w','phy:vgxen01/lv_linva04_06_hdb,hdb,w!']
instead of:
disk=['phy:vgxen01/lv_linva06_hda,hda,w','phy:vgxen01/lv_linva04_06_hdb,hdb,w']

Action

Start a domU on node1 and relocate it to node2 with the command:

xm migrate --live name_xen_guest node2
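After the migration, the guest should show up in the domain list on node2 and no longer on node1. The sketch below checks a domain list for a guest name. It assumes the usual "xm list" layout, with a header line followed by one domain per line and the name in the first column; the function name is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: read "xm list" output on stdin and succeed if a
# domain with the given name is present (the header line is skipped).
guest_is_listed() {
    name="$1"
    awk -v n="$name" '
        NR > 1 && $1 == n { found = 1 }
        END { exit !found }
    '
}

# Example (run on node1 after the migration, assuming ssh access to node2):
# ssh node2 xm list | guest_is_listed name_xen_guest && echo "guest runs on node2"
```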

Literature

 
xen/live-migration_infrastructure.txt · Last modified: 2008/02/04 11:54 by olivier
 