Debian FAI with iSCSI and iPXE
How we moved our diskless cluster from an NFS root to iSCSI disks.
The goal: iSCSI Boot via iPXE
The goal was to move our diskless cluster from “nfsroot + aufs” to one iSCSI disk per node and to boot from iSCSI via iPXE. The nodes are bootstrapped with FAI, and the final configuration is applied with salt on first boot. Once the initial configuration with FAI is stable, changes to the nodes' configuration will only be made via salt.
Documentation for the individual pieces is available, but it is thinly spread and none of it combines all of the tools used here. Getting to a booting system was quite a bumpy road with numerous small roadblocks.
But in the end: It works!
Making FAI iSCSI “aware”
The first thing needed is a working PXE boot environment. We use dnsmasq; the setup can be found here.
On boot, the NIC will send a DHCP request. The DHCP Server will answer with the IP address, gateway, etc. and a boot image.
FAI needs to be set up too, obviously. Make sure to include the package open-iscsi in /etc/fai/NFSROOT, or chroot to your nfsroot and install it there. It is not necessary to make it available in the initramdisk (it will be crucial for the nodes, though).
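A quick way to confirm the package is actually listed before rebuilding the nfsroot is sketched below. It is only a sketch: the function name `nfsroot_has_package` is made up for illustration, and it assumes package names appear as whitespace-separated words on non-comment lines of the file.

```shell
#!/bin/sh
# Sketch: check whether a package is listed in an FAI NFSROOT package file.
# Assumption: package names are whitespace-separated words on non-comment
# lines; nfsroot_has_package is a hypothetical helper, not part of FAI.

nfsroot_has_package() {
    local file="$1"
    local pkg="$2"
    # Strip comments, put every word on its own line, then match the
    # package name exactly (so "iscsi" does not match "open-iscsi").
    sed 's/#.*//' "$file" | tr -s '[:space:]' '\n' | grep -Fxq -- "$pkg"
}
```

For example, `nfsroot_has_package /etc/fai/NFSROOT open-iscsi || echo "open-iscsi is missing"` would print a warning when the package still needs to be added.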
FAI will boot into its nfsroot and then perform the installation on the target.
Making disks available via iSCSI requires some modifications to the startup.
By itself, FAI will ignore the iSCSI disks even when you manage to
connect them. Even if you somehow make the disk known to FAI, the iSCSI disks
will always be added later in the device tree. If other, physical disks are
installed in the system, the iSCSI disk will always be the last disk. I did not find a
way to tell FAI (setup-storage) to use the last disk, but I found a way to
use an iSCSI disk.
On startup, FAI executes the files in the class folder of the FAI config space. I modified 20-hwdetect.sh to include the iSCSI disk in the disklist variable, and then used the device-name globbing feature in the disk_config partition setup. The file 20-hwdetect.sh had to be renamed to 55-hwdetect.sh; otherwise it had no access to the classes of the host, which are defined later in 50-host-classes. Here is the 55-hwdetect.sh:
#! /bin/bash
# (c) Thomas Lange, 2002-2013, lange@informatik.uni-koeln.de
# NOTE: Files named *.sh will be evaluated, but their output ignored.
[ $do_init_tasks -eq 1 ] || return 0 # Do only execute when doing install
echo 0 > /proc/sys/kernel/printk
kernelmodules=iscsi_tcp
# here, you can load modules depending on the kernel version
case $(uname -r) in
    2.6*) kernelmodules="$kernelmodules mptspi dm-mod md-mod aes dm-crypt"
        ;;
    3*) kernelmodules="$kernelmodules mptspi dm-mod md-mod aes dm-crypt"
        ;;
    4*) kernelmodules="$kernelmodules mptspi dm-mod md-mod aes dm-crypt"
        ;;
esac
for mod in $kernelmodules; do
    [ "$verbose" ] && echo Loading kernel module $mod
    modprobe -a $mod 1>/dev/null 2>&1
done
ip ad show up | egrep -iv 'loopback|127.0.0.1|::1/128|_lft'
echo $printk > /proc/sys/kernel/printk
# here comes the iSCSI part
# this starts iSCSI and makes the disk visible to the system
if ifclass ISCSI; then
    echo "ISCSI class defined: logging in to target"
    echo "Initiatorname: iqn.2010-04.org.ipxe:$HOSTNAME "
    iscsistart -i iqn.2010-04.org.ipxe:$HOSTNAME \
        -t iqn.2003-01.cluster.nas1:root \
        -g 1 \
        -a $SERVER \
        -p 3260
    sleep 5
    lsscsi | grep LIO-ORG
    # match the last 4 characters of the LIO-ORG device line, i.e. sdb<SPACE>
    # this selects the iSCSI disk, may be better to use /dev/disk/by-path/* ?
    disklist=$(lsscsi | grep LIO-ORG | grep -o "....$")
    #disklist=${iscsi_disk: -4}
    echo "ISCSI disk found: $disklist"
fi
odisklist=$disklist
set_disk_info # recalculate list of available disks
if [ "$disklist" != "$odisklist" ]; then
    tmp_disklist="$disklist $odisklist"
    echo New disklist: $tmp_disklist
    echo disklist=\"$tmp_disklist\" >> $LOGDIR/additional.var
fi
save_dmesg # save new boot messages (from loading modules)
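The `grep -o "....$"` in the script above keeps the last four characters of the lsscsi line, so it only works for three-letter device names followed by a space (a name like sdaa would be cut off). A possible alternative, sketched here under the assumption that lsscsi prints the vendor in the third column and the device node in the last one, is to pick the fields with awk. The function name `lio_disks` is made up for this example.

```shell
#!/bin/sh
# Sketch: extract the device names of LIO-ORG disks from lsscsi-style output.
# Assumption: columns are [H:C:T:L] type vendor model rev device.
# lio_disks is a hypothetical helper name.

lio_disks() {
    # Reads lsscsi output on stdin and prints names such as "sdb",
    # one per line. Entries without a /dev node are skipped.
    awk '$2 == "disk" && $3 == "LIO-ORG" && $NF ~ /^\/dev\// {
             n = split($NF, parts, "/")
             print parts[n]
         }'
}
```

In the hook this could be used as `disklist=$(lsscsi | lio_disks)`, which also drops the trailing space the original expression carries along.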
For this to work, an iSCSI target must be set up with the proper ACLs for the current initiator. Here is a small script which creates a small disk image in the fileio backstore and makes it the first LUN for the initiator:
#!/bin/bash
set -e
TARGET_HOSTNAME=$1
ISCSI_TARGET="iqn.2003-01.cluster.nas1:root"
ISCSI_INITIATOR="iqn.2010-04.org.ipxe:$TARGET_HOSTNAME"
# Create file image
targetcli backstores/fileio create $TARGET_HOSTNAME /srv/iscsi-images/$TARGET_HOSTNAME.img 24G true
# Create ACL for initiator
targetcli iscsi/$ISCSI_TARGET/tpg1/acls create $ISCSI_INITIATOR add_mapped_luns=false
# create lun on ACL
targetcli iscsi/$ISCSI_TARGET/tpg1/acls/$ISCSI_INITIATOR create mapped_lun=0 tpg_lun_or_backstore=/backstores/fileio/$TARGET_HOSTNAME
targetcli / saveconfig
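The script above takes the host name straight from `$1`; if it is called without an argument, the image path and the initiator name end up with an empty host part. A small guard could look like the sketch below. The helper name `initiator_iqn` and the allowed character set (letters, digits, hyphens) are assumptions for illustration; the IQN prefix is the one used throughout this article.

```shell
#!/bin/sh
# Sketch: build the initiator IQN for a node and reject unusable host names.
# initiator_iqn is a hypothetical helper; the accepted character set
# (letters, digits, hyphen, no leading/trailing hyphen) is an assumption.

initiator_iqn() {
    local host="$1"
    case $host in
        ''|*[!A-Za-z0-9-]*|-*|*-)
            echo "invalid host name: '$host'" >&2
            return 1
            ;;
    esac
    printf 'iqn.2010-04.org.ipxe:%s\n' "$host"
}
```

With `set -e` active, a line such as `ISCSI_INITIATOR=$(initiator_iqn "$1")` would then stop the script before any targetcli command runs with an empty name.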
And finally the disk config for FAI’s setup-storage (/srv/fai/config/disk_config/ISCSI). The glob expression matches the first iSCSI disk (LUN 0); the config then creates a root and a swap partition on it.
disk_config /dev/disk/by-path/*iscsi*-lun-0 fstabkey:uuid align-at:2M disklabel:gpt-bios
primary / 4G- xfs defaults,relatime
primary swap 8G swap sw
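To see what the glob in the disk_config resolves to on a running system, something like the following sketch can help. It assumes the usual udev layout in which /dev/disk/by-path contains symlinks to the block devices; the function name `iscsi_lun0_devices` is made up for this example.

```shell
#!/bin/sh
# Sketch: list the block devices that the by-path glob for iSCSI LUN 0
# resolves to. iscsi_lun0_devices is a hypothetical helper name.

iscsi_lun0_devices() {
    local dir="${1:-/dev/disk/by-path}"
    local link
    for link in "$dir"/*iscsi*-lun-0; do
        # Skip the literal pattern (no match) and dangling links.
        [ -e "$link" ] || continue
        readlink -f "$link"
    done
}
```

Because the pattern ends in `-lun-0`, partition links such as `...-lun-0-part1` are not matched. If the function prints more than one device, more than one target is logged in and the glob in the disk_config is ambiguous.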
Client Configuration
For the client configuration I mostly followed this configuration, specifically the part on iSCSI boot configuration.
In the FAI config space:
- make sure open-iscsi is marked for installation in the package_config
- create an empty file files/etc/iscsi/iscsi.initramfs/ISCSI
- add DEVICE=eth0 in files/etc/initramfs-tools/initramfs.conf/ISCSI (otherwise there were problems with booting on multi-NIC hosts)
- set GRUB_CMDLINE_LINUX="ip=dhcp iscsi_auto=true" in files/etc/default/grub/ISCSI
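The file-related items in the list above can be checked mechanically before an install run. The following is only a sketch: `check_iscsi_config_space` is a hypothetical helper, and the default path /srv/fai/config is taken from the disk_config location mentioned earlier. The package_config entry is not checked because its file name depends on the local class layout.

```shell
#!/bin/sh
# Sketch: verify that the ISCSI class files exist in the FAI config space.
# check_iscsi_config_space is a hypothetical helper name.

check_iscsi_config_space() {
    local cfg="${1:-/srv/fai/config}"
    local rc=0

    # The marker file only needs to exist; it may be empty.
    [ -e "$cfg/files/etc/iscsi/iscsi.initramfs/ISCSI" ] ||
        { echo "missing: files/etc/iscsi/iscsi.initramfs/ISCSI" >&2; rc=1; }

    # DEVICE= must name a NIC for the initramfs.
    grep -q '^DEVICE=.' "$cfg/files/etc/initramfs-tools/initramfs.conf/ISCSI" 2>/dev/null ||
        { echo "DEVICE= not set in initramfs.conf/ISCSI" >&2; rc=1; }

    # The kernel command line must enable automatic iSCSI login.
    grep -q '^GRUB_CMDLINE_LINUX=.*iscsi_auto=true' "$cfg/files/etc/default/grub/ISCSI" 2>/dev/null ||
        { echo "iscsi_auto=true missing in default/grub/ISCSI" >&2; rc=1; }

    return $rc
}
```

Running `check_iscsi_config_space` without arguments checks /srv/fai/config and returns non-zero if any of the three files is missing or incomplete.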
Now comes a hook which creates the initiator login information (hooks/chboot.ISCSI):
#!/bin/bash
# this script is executable
# we skip the normal chboot task
skiptask chboot
# this will create the ipxe boot script
ssh -l $LOGUSER $SERVER /usr/local/bin/enable-iscsi $HOSTNAME $SERVER
# configure initiator name
echo "InitiatorName=iqn.2010-04.org.ipxe:$HOSTNAME" > $target/etc/iscsi/initiatorname.iscsi
# iscsi configuration
cat << EOF > $target/etc/iscsi/iscsi.initramfs
ISCSI_INITIATOR="iqn.2010-04.org.ipxe:$HOSTNAME"
ISCSI_TARGET_NAME=iqn.2003-01.cluster.nas1:root
ISCSI_TARGET_IP=$SERVER
ISCSI_TARGET_PORT=3260
ISCSI_TARGET_GROUP=1
EOF
chroot $target update-initramfs -u
exit 0
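If `$SERVER` or `$HOSTNAME` happens to be empty when the hook runs, the generated iscsi.initramfs still gets written, just with empty values, and the node will fail only at boot time. A check along the following lines could catch that earlier. This is a sketch: `validate_iscsi_initramfs` is a hypothetical helper, and the list of required variables is simply the set written by the hook above.

```shell
#!/bin/sh
# Sketch: verify that a generated iscsi.initramfs defines all values the
# hook above writes. validate_iscsi_initramfs is a hypothetical helper.

validate_iscsi_initramfs() {
    local file="$1"
    # Work in a subshell so the sourced variables do not leak to the caller.
    (
        unset ISCSI_INITIATOR ISCSI_TARGET_NAME ISCSI_TARGET_IP ISCSI_TARGET_PORT
        . "$file" || exit 1
        for var in ISCSI_INITIATOR ISCSI_TARGET_NAME ISCSI_TARGET_IP ISCSI_TARGET_PORT; do
            eval "val=\${$var:-}"
            [ -n "$val" ] || { echo "$var is empty in $file" >&2; exit 1; }
        done
    )
}
```

In the hook it could be called as `validate_iscsi_initramfs $target/etc/iscsi/iscsi.initramfs || exit 1` just before `update-initramfs`. Note that the file is sourced, so this is only appropriate for files the hook generated itself.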
In this script we run a command on the server (/usr/local/bin/enable-iscsi) which writes a PXELINUX config entry for the host; that entry loads iPXE and boots from the iSCSI target:
#!/bin/bash
HOST=$1
SERVER=$2
FAI_TFTP_CONFIG=/srv/tftp/fai/pxelinux.cfg
ISCSI_TARGET="iqn.2003-01.cluster.nas1:root"
HEX_HOST=$(sipcalc -d $HOST |grep "Host address (hex)" | awk '{print $5}')
cat << EOF > $FAI_TFTP_CONFIG/$HEX_HOST
default iscsi
label iscsi
kernel ipxe.lkrn
append dhcp && sanboot iscsi:$SERVER::::$ISCSI_TARGET
EOF
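The config file is named after the host's IPv4 address in uppercase hexadecimal, which is one of the names PXELINUX looks up. If sipcalc is not available, the conversion can be done in plain shell, as in this sketch. The function name `ip_to_hex` is made up, and it expects a dotted-quad address rather than a host name, so the name would have to be resolved first.

```shell
#!/bin/sh
# Sketch: convert a dotted-quad IPv4 address to the uppercase hex form
# used for PXELINUX config file names. ip_to_hex is a hypothetical helper.

ip_to_hex() {
    local ip="$1"
    local o
    local IFS=.
    # Split the address on dots into the positional parameters.
    set -- $ip
    [ $# -eq 4 ] || return 1
    for o in "$@"; do
        case $o in
            # Reject empty, non-numeric and zero-padded octets.
            ''|*[!0-9]*|0?*) return 1 ;;
        esac
        [ "$o" -le 255 ] || return 1
    done
    printf '%02X%02X%02X%02X\n' "$1" "$2" "$3" "$4"
}
```

For example, `ip_to_hex 192.168.1.10` prints `C0A8010A`. One way to feed it a host name, assuming the name resolves through the system resolver, is `ip_to_hex "$(getent hosts "$HOST" | awk '{print $1; exit}')"`.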
Now, upon reboot, the host should boot via iSCSI. On the first boot the host will start salt-call to configure and install the rest of the system (slurm, ganglia, ssh keys etc.).