=======Xen-clustering/live-migration=======

=====shared storage=====

In order to do a live migration of a Xen guest from one cluster member to another, some sort of shared storage is required. As the Xen guest won't run on more than one cluster member at a time, a cluster filesystem is not required, as long as you configure Xen to access the Xen guest's storage as a physical device, not a file.

As an FCAL-based SAN is not always available, we looked for other possibilities. Inspired by some Oracle RAC documentation (see: http://www.oracle.com/technology/pub/articles/hunter_rac10gr2.html), a shared FireWire disk or disk array appeared to be an option. A second possibility is mentioned in the Xen live migration documentation from Novell (see below). In this proof of concept iSCSI has been used as a shared storage solution.

====Firewire====

By default Linux logs in to a FireWire device in exclusive mode. This prevents you from accidentally accessing the same device from another node and corrupting your data. Fortunately you can bypass the exclusive login mechanism with an option to the serial bus protocol kernel module (sbp2 exclusive_login=0). For this to work, the chipset of your FireWire device(s) must support multiple logins. For example, the Oxford chipset is known to support multiple logins. Check the aforementioned Oracle RAC documentation for more information on shared FireWire hardware.

Adjust /etc/modules for the necessary modules

  sd_mod
  ieee1394
  ohci1394
  sbp2 exclusive_login=0

Adjust /etc/modprobe.d/sbp2

  options sbp2 exclusive_login=0 serialize_io=1

For immediate effect:

  rmmod sbp2
  modprobe sbp2

For permanent effect, rebuild your initial ramdisk so that these options are applied there as well (the sbp2 module is loaded at boot time).

  mkinitramfs -o foo.version.img

or, for example:

  mkinitrd -o /boot/initrd.img-2.6.12.6-xen-fw 2.6.12.6-xen

====iSCSI====

===target===

First we need an iSCSI target.
That is, a device or server that provides shared storage on your network. We use the iSCSI Enterprise Target software to build a Linux-based iSCSI target server.

==install==

=from source=

You can download the software at http://iscsitarget.sourceforge.net/

After building and installing the software, you'll have a kernel module named iscsi_trgt, a daemon called 'ietd' and a tool called 'ietadm'.

=package=

Although binary packages are not yet available for Debian Etch, Philipp Hug created unofficial packages for Debian Sid and Ubuntu Dapper. They are available at http://iscsitarget.sourceforge.net/wiki/index.php/Unoffical_DEBs

We installed the binary package named 'iscsitarget', which contains the userland binaries, on Debian Etch with no problems. The package named 'iscsitarget-source' contains the kernel module sources. This package allows you to build a binary kernel module for your kernel. The build on Debian Etch went flawlessly.

Add this line to /etc/apt/sources.list

  deb http://debian.hug.cx/debian/ unstable/

Then proceed to install the software and build the kernel module

  apt-get install module-assistant debhelper linux-source-2.6.18 dpkg-dev \
    kernel-package libncurses-dev libssl-dev linux-headers-2.6.18-4-xen-amd64
  cd /usr/src/
  tar -jxvf linux-source-2.6.18.tar.bz2
  ln -s linux-source-2.6.18 linux
  apt-get install iscsitarget iscsitarget-source
  tar -zxvf iscsitarget.tar.gz     # this unpacks into the sub-directory iscsitarget
  m-a a-i iscsitarget

==config==

Let's configure the daemon. We have to tell it which device(s) to export and which clients should be able to access them. In the next example we will make the logical volumes named 'vault1' and 'vault2' available to everybody.
The configuration file is /etc/ietd.conf

  Target iqn.2006-07.com.example.intra:storage.disk1.vault
      Lun 0 Path=/dev/mapper/vg00-vault1,Type=fileio
      Alias vault1
  Target iqn.2006-07.com.example.intra:storage.disk2.vault
      Lun 1 Path=/dev/mapper/vg00-vault2,Type=fileio
      Alias vault2

Remember that every node on your network that uses iSCSI will need a unique 'iqn'. Check the iSCSI documentation on the web for the applicable syntax.

You can add some lines to /etc/ietd.conf that require a username/password for iSCSI logins to succeed, but this is omitted by default.

Start the iSCSI target daemon to enable the shared storage provider. This will open TCP port 3260 by default.

  /etc/init.d/iscsi-target start

===initiator===

On the clients that have to access the iSCSI-based shared storage we need to install and configure iSCSI initiator software. We'll use the Open iSCSI package.

==install==

=from source=

The source is available at http://www.open-iscsi.org/

=package=

Debian Etch has a binary package for iSCSI clients.

  apt-get install open-iscsi

==config==

After building and installing the software, you'll have two kernel modules named iscsi_tcp and scsi_transport_iscsi, a daemon called 'iscsid' and a tool called 'iscsiadm'.
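
The 'iqn' names used for the targets above, and for the initiators configured next, follow the pattern iqn.<yyyy-mm>.<reversed domain>:<identifier>. As a rough illustration only, the following Python sketch builds such a name and performs a loose format check. The helper names build_iqn and is_valid_iqn are made up for this example, and the regular expression is an approximation, not a complete validation against the iSCSI specification.

```python
import re

# Loose pattern for the "iqn." format: iqn.<yyyy-mm>.<reversed domain>[:<identifier>]
# This is an approximation for illustration, not a full check of the iSCSI naming rules.
IQN_PATTERN = re.compile(
    r"^iqn\.\d{4}-(0[1-9]|1[0-2])\.[a-z0-9]([a-z0-9.-]*[a-z0-9])?(:.+)?$"
)


def build_iqn(year_month, domain, identifier):
    """Build an IQN from a date (yyyy-mm), a DNS domain and a free-form identifier.

    Hypothetical helper: the domain is lower-cased and its labels are reversed,
    e.g. intra.example.com becomes com.example.intra.
    """
    reversed_domain = ".".join(reversed(domain.lower().split(".")))
    return "iqn.%s.%s:%s" % (year_month, reversed_domain, identifier)


def is_valid_iqn(name):
    """Return True if the name loosely matches the iqn. naming pattern."""
    return IQN_PATTERN.match(name) is not None


if __name__ == "__main__":
    name = build_iqn("2006-07", "intra.example.com", "storage.disk1.vault")
    print(name, is_valid_iqn(name))
```

A check like this only catches obvious typos; making sure that each node's name is unique on the network remains a manual task.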
The install procedure of Open iSCSI will create a configuration file /etc/iscsid.conf that enables some defaults:

  node.active_cnx = 1
  node.startup = manual
  node.session.timeo.replacement_timeout = 120
  node.session.err_timeo.abort_timeout = 10
  node.session.err_timeo.reset_timeout = 30
  node.session.iscsi.InitialR2T = No
  node.session.iscsi.ImmediateData = Yes
  node.session.iscsi.FirstBurstLength = 262144
  node.session.iscsi.MaxBurstLength = 16776192
  node.session.iscsi.DefaultTime2Wait = 0
  node.session.iscsi.DefaultTime2Retain = 0
  node.session.iscsi.MaxConnections = 0
  node.cnx[0].iscsi.HeaderDigest = None
  node.cnx[0].iscsi.DataDigest = None
  node.cnx[0].iscsi.MaxRecvDataSegmentLength = 65536

Now it's imperative to create a unique 'iqn' for our client and store it in /etc/initiatorname.iscsi

  InitiatorName=iqn.2006-07.com.example.intra:hannibal.clientnode1

Afterwards, start the Open iSCSI daemon on the client

  /etc/init.d/open-iscsi start

Let's query our iSCSI target server, for instance 192.168.1.16, for the targets it offers

  iscsiadm -m discovery -t sendtargets -p 192.168.1.16:3260

Log in to the iSCSI target; afterwards a new SCSI device should have been added to the client

  iscsiadm -m node -T iqn.2006-07.com.example.intra:storage.disk1.vault -p 192.168.1.16:3260 -l

Have fun!

====multipath====

iSCSI supports more than one connection to the same iSCSI LUN. This allows for highly available setups. In our setup we have at least two NICs in the iSCSI server as well as in the iSCSI clients. We configure them on two ethernet segments, 192.168.1.x and 192.168.2.x. Now we can initiate two iSCSI sessions per iSCSI client, one per segment. As a result the iSCSI client will end up with two new SCSI devices that have the same LUN as a target (but via two different paths!). On Linux the multipath-tools can map our two newly obtained SCSI devices into one multipath block device that has load balancing and failover as features.
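
The "one session per segment" idea can be sketched as a small Python helper that generates one login command per portal, using only the iscsiadm options already shown in this document. The function name login_commands is hypothetical, and the second portal address 192.168.2.16 is assumed to be the target server's address on the second segment. The script only prints the commands and does not execute them; depending on your setup a discovery against each portal may be needed first.

```python
import shlex

TARGET = "iqn.2006-07.com.example.intra:storage.disk1.vault"
# One portal per ethernet segment; the second address is an assumption for this sketch.
PORTALS = ["192.168.1.16:3260", "192.168.2.16:3260"]


def login_commands(target, portals):
    """Return one 'iscsiadm ... -l' argument list per portal (one session per path)."""
    return [
        ["iscsiadm", "-m", "node", "-T", target, "-p", portal, "-l"]
        for portal in portals
    ]


if __name__ == "__main__":
    for argv in login_commands(TARGET, PORTALS):
        # Print instead of executing, so the commands can be reviewed first.
        print(" ".join(shlex.quote(arg) for arg in argv))
```

Each generated command creates its own session, so after running both the client should see two SCSI devices for the same LUN, which is what the multipath-tools then combine.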
In addition to using multipath, one could also consider setting up a host-based mirror for the shared storage. This could be accomplished by setting up two or more iSCSI servers (targets) and joining them in a software mirror (RAID-1) MD device. This is left as an exercise for the reader :-)

We installed the multipath-tools on a Debian Etch Xen host.

  apt-get install multipath-tools

Create the /etc/multipath.conf file (some examples are available in /usr/share/doc/multipath-tools/examples). The SCSI ID that must be entered on the 'wwid' line can be obtained with the scsi_id tool. In our example we'll get the same ID for sdd and sde; remember, they're two paths to the same LUN!

  /sbin/scsi_id -g -u -s /block/sdd

The /etc/multipath.conf file:

  defaults {
      user_friendly_names yes
  }
  defaults {
      udev_dir                /dev
      polling_interval        5
      default_selector        "round-robin 0"
      default_getuid_callout  "/sbin/scsi_id -g -u -s /block/%n"
      failback                immediate
  }
  blacklist {
      wwid 200d04b651805e38e
      devnode "^(ram|raw|loop|fd|md|dm-|sr|scd|st)[0-9]*"
      devnode "^hd[a-z][[0-9]*]"
      devnode "^cciss!c[0-9]d[0-9]*[p[0-9]*]"
  }
  multipaths {
      multipath {
          wwid                  149455400000000000000000001000000691000000d000000
          alias                 vault1
          path_grouping_policy  failover
          path_checker          readsector0
      }
  }

Now, after a reload of the multipath-tools and logging in to the iSCSI targets, your multipath block device will be ready for use.

  /etc/init.d/multipath-tools reload
  /usr/bin/iscsiadm -m node -T iqn.2006-07.com.example.intra:storage.disk1.vault -p 192.168.1.16:3260 -l
  /usr/bin/iscsiadm -m node -T iqn.2006-07.com.example.intra:storage.disk1.vault -p 192.168.2.16:3260 -l

Let's check our new device:

  multipath -ll

The output is something like

  vault1 (149455400000000000000000001000000691000000d000000) dm-7 IET,VIRTUAL-DISK
  [size=50G][features=0][hwhandler=0]
  \_ round-robin 0 [prio=1][enabled]
   \_ 4:0:0:0 sdd 8:48  [active][ready]
  \_ round-robin 0 [prio=1][enabled]
   \_ 5:0:0:0 sde 8:64  [active][ready]

====Distributed replicated block device (drbd8)====

===active/active with version 8===

A binary package is available in Debian unstable.

  apt-get install drbd8-utils

To make the module (from source):

  apt-get install module-assistant debhelper dpkg-dev kernel-package libncurses-dev libssl-dev
  m-a a-i drbd8

Edit /etc/drbd.conf

work in progress ...

=====Config=====

Configure the Xen relocation service (in /etc/xen/xend-config.sxp)

  ...
  (xend-relocation-server yes)
  (xend-relocation-port 8002)
  (xend-relocation-address '')
  (xend-relocation-hosts-allow '')
  ...

Restart xend on both nodes and make sure that port 8002 accepts connections from everywhere. Check for a LISTEN line with netstat :-)

Time has to be synced between both nodes (see [[hannibal:Time server]]).

====Clusterfilesystems====

===Simulate a cluster with two domU's within a dom0===

When two Xen guests run on the same dom0, which provides a shared disk to both of them (in order to simulate a shared-storage cluster environment), add an exclamation mark ('!') after the w in the Xen guest's configuration file to force Xen to let you use the disk anyway (instead of giving an error message). Remember to use a cluster-aware filesystem like OCFS2 or GFS so the two VMs won't corrupt each other's data.
The config line would then be:

  disk=['phy:vgxen01/lv_linva06_hda,hda,w','phy:vgxen01/lv_linva04_06_hdb,hdb,w!']

instead of:

  disk=['phy:vgxen01/lv_linva06_hda,hda,w','phy:vgxen01/lv_linva04_06_hdb,hdb,w']

=====Action=====

Start a domU on node1 and relocate it to node2 with the command:

  xm migrate --live name_xen_guest node2

=====Literature=====

  * a proof of concept regarding Xen migration on Suse Linux by Novell Presales, available at [[http://forge.novell.com/modules/xfcontent/private.php?reference_id=2736&content=/library/Xen%20live%20migration%20demo/XEN_migration_demo_1.1.pdf|http://forge.novell.com/.../XEN_migration_demo_1.1.pdf]]
  * http://www.linux1394.org
  * documentation by Jeffrey Hunter on Oracle Technology Network regarding Oracle RAC on Linux and Firewire, available at http://www.oracle.com/technology/pub/articles/hunter_rac10gr2.html
  * a thesis by Espen Braastad (University of Oslo) named 'Management of high availability services using virtualization', May 22 2006, available at http://www.linpro.no/content/download/519/3617/file/espen_HAxen.pdf