Static network config with Debian Cloud images

I self-host some services on virtual machines (VMs), and I’m currently using Debian 11.x as the host machine relying on the libvirt infrastructure to manage QEMU/KVM machines. While everything has worked fine for years (including on Debian 10.x), there has always been one issue causing a one-minute delay every time I install a new VM: the default images run a DHCP client that never succeeds in my environment. I never found out a way to disable DHCP in the image, and none of the documented ways through cloud-init that I have tried worked. A couple of days ago, after reading the AlmaLinux wiki I found a solution that works with Debian.

The following commands creates a Debian VM with static network configuration without the annoying one-minute DHCP delay. The three essential cloud-init keywords are the NoCloud meta-data parameters dsmode:local, static network-interfaces setting combined with the user-data bootcmd keyword. I’m using a Raptor CS Talos II ppc64el machine, so replace the image link with a genericcloud amd64 image if you are using x86.

wget https://cloud.debian.org/images/cloud/bullseye/latest/debian-11-generic-ppc64el.qcow2
cp debian-11-generic-ppc64el.qcow2 foo.qcow2

cat>meta-data
dsmode: local
network-interfaces: |
 iface enp0s1 inet static
 address 192.168.98.14/24
 gateway 192.168.98.12
^D

cat>user-data
#cloud-config
fqdn: foo.mydomain
manage_etc_hosts: true
disable_root: false
ssh_pwauth: false
ssh_authorized_keys:
- ssh-ed25519 AAAA...
timezone: Europe/Stockholm
bootcmd:
- rm -f /run/network/interfaces.d/enp0s1
- ifup enp0s1
^D

virt-install --name foo --import --os-variant debian10 --disk foo.qcow2 --cloud-init meta-data=meta-data,user-data=user-data

Unfortunately virt-install from Debian 11 does not support the –cloud-init network-config parameter, so if you want to use a version 2 network configuration with cloud-init (to specify IPv6 addresses, for example) you need to replace the final virt-install command with the following.

cat>network_config_static.cfg
version: 2
 ethernets:
  enp0s1:
   dhcp4: false
   addresses: [ 192.168.98.14/24, fc00::14/7 ]
   gateway4: 192.168.98.12
   gateway6: fc00::12
   nameservers:
    addresses: [ 192.168.98.12, fc00::12 ]
^D

cloud-localds -v -m local --network-config=network_config_static.cfg seed.iso user-data

virt-install --name foo --import --os-variant debian10 --disk foo.qcow2 --disk seed.iso,readonly=on --noreboot

virsh start foo
virsh detach-disk foo vdb --config
virsh console foo

There are still some warnings like the following, but it does not seem to cause any problem:

[FAILED] Failed to start Initial cloud-init job (pre-networking).

Finally, if you do not want the cloud-init tools installed in your VMs, I found the following set of additional user-data commands helpful. Cloud-init will not be enabled on first boot and a cron job will be added that purges some unwanted packages.

runcmd:
- touch /etc/cloud/cloud-init.disabled
- apt-get update && apt-get dist-upgrade -uy && apt-get autoremove --yes --purge && printf '#!/bin/sh\n{ rm /etc/cloud/cloud-init.disabled /etc/cloud/cloud.cfg.d/01_debian_cloud.cfg && apt-get purge --yes cloud-init cloud-guest-utils cloud-initramfs-growroot genisoimage isc-dhcp-client && apt-get autoremove --yes --purge && rm -f /etc/cron.hourly/cloud-cleanup && shutdown --reboot +1; } 2>&1 | logger -t cloud-cleanup\n' > /etc/cron.hourly/cloud-cleanup && chmod +x /etc/cron.hourly/cloud-cleanup && reboot &

The production script I’m using is a bit more complicated, but can be downloaded as vello-vm. Happy hacking!

Combining Dnsmasq and Unbound

For my home office network I have been using Dnsmasq for some time. Dnsmasq provides me with DNS, DHCP, DHCPv6, and IPv6 Router Advertisement. I run dnsmasq on a Debian Jessie server, but it works similar with OpenWRT if you want to use a smaller device. My entire /etc/dnsmasq.d/local configuration used to look like this:

dhcp-authoritative
interface=eth1
read-ethers
dhcp-range=192.168.1.100,192.168.1.150,12h
dhcp-range=2001:9b0:104:42::100,2001:9b0:104:42::1500
dhcp-option=option6:dns-server,[::]
enable-ra

Here dhcp-authoritative enable DHCP. interface=eth1 says to listen on eth1 only, which is my internal (IPv4 NAT) network. I try to keep track of the MAC address of all my devices in a /etc/ethers file, so I use read-ethers to have dnsmasq give stable IP addresses for them. The dhcp-range is used to enable DHCP and DHCPv6 on my internal network. The dhcp-option=option6:dns-server,[::] statement is needed to inform the DHCP clients of the DNS resolver’s IPv6 address, otherwise they would only get the IPv4 DNS server address. The enable-ra parameter enables IPv6 router advertisement on the internal network, thereby removing the need to run radvd too — useful since I prefer to use copyleft software.

Recently I had a desire to use DNSSEC, and enabled it in Dnsmasq using the following statements:

dnssec
trust-anchor=.,19036,8,2,49AAC11D7B6F6446702E54A1607371607A1A41855200FD2CE1CDDE32F24E8FB5
dnssec-check-unsigned

The dnssec keyword enable DNSSEC validation in dnsmasq, using the indicated trust-anchor (get the root-anchors from IANA). The dnssec-check-unsigned deserves some more discussion. The dnsmasq manpage describes it as follows:

As a default, dnsmasq does not check that unsigned DNS replies are legitimate: they are assumed to be valid and passed on (without the “authentic data” bit set, of course). This does not protect against an attacker forging unsigned replies for signed DNS zones, but it is fast. If this flag is set, dnsmasq will check the zones of unsigned replies, to ensure that unsigned replies are allowed in those zones. The cost of this is more upstream queries and slower performance.

For example, this means that dnsmasq’s DNSSEC functionality is not secure against active man-in-the-middle attacks between dnsmasq and the DNS server it is using. Even if example.org used DNSSEC properly, an attacker could fake that it was unsigned to dnsmasq, and I would get potentially incorrect values in return. We all know that the Internet is not a secure place, and your threat model should include active attackers. I believe this mode should be the default in dnsmasq, and users should have to configure dnsmasq to not be in that mode if they really want to (with the obvious security warning).

Running with this enabled for a couple of days resulted in frustration about not being able to reach a couple of domains. The behaviour was that my clients would hang indefinitely or get a SERVFAIL, both resulting in lack of service. You can enable query logging in dnsmasq with log-queries and enabling this I noticed three distinct form of error patterns:

jow13gw dnsmasq 460 - -  forwarded www.fritidsresor.se to 213.80.101.3
jow13gw dnsmasq 460 - -  validation result is BOGUS
jow13gw dnsmasq 547 - -  reply cloudflare-dnssec.net is BOGUS DNSKEY
jow13gw dnsmasq 547 - -  validation result is BOGUS
jow13gw dnsmasq 547 - -  reply linux.conf.au is BOGUS DS
jow13gw dnsmasq 547 - -  validation result is BOGUS

The first only happened intermittently, the second did not cause any noticeable problem, and the final one was reproducible. To be fair, I only found the last example after starting to search for problem reports (see post confirming bug).

At this point, I had a confirmed bug in dnsmasq that affect my use-case. I want to use official packages from Debian on this machine, so installing newer versions manually is not an option. So I started to look into alternatives for DNS resolving, and quickly found Unbound. Installing it was easy:

apt-get install unbound
unbound-control-setup 

I created a local configuration file in /etc/unbound/unbound.conf.d/local.conf as follows:

server:
	interface: 127.0.0.1
	interface: ::1
	interface: 192.168.1.2
	interface: 2001:9b0:104:42::2
	access-control: 127.0.0.1 allow
	access-control: ::1 allow
	access-control: 192.168.1.2/24 allow
	access-control: 2001:9b0:104:42::2/64 allow
	outgoing-interface: 155.4.17.2
	outgoing-interface: 2001:9b0:1:1a04::2
#	log-queries: yes
#	verbosity: 2

The interface keyword determine which IP addresses to listen on, here I used the loopback interface and the local address of the physical network interface for my internal network. The access-control allows recursive DNS resolving from those networks. And outgoing-interface specify my external Internet-connected interface. log-queries and/or verbosity are useful for debugging.

To make things work, dnsmasq has to stop providing DNS services. This can be achieved with the port=0 keyword, however that will also disable informing DHCP clients about the DNS server to use. So this has to be added in manually. I ended up adding the two following lines to /etc/dnsmasq.d/local:

port=0
dhcp-option=option:dns-server,192.168.1.2

Restarting unbound and dnsmasq now leads to working (and secure) internal DNSSEC-aware name resolution over both IPv4 and IPv6. I can verify that resolution works, and that Unbound verify signatures and reject bad domains properly with dig as below, or use online DNSSEC resolver test page although I’m not sure how confident you can be in the result from that page.

$ host linux.conf.au
linux.conf.au has address 192.55.98.190
linux.conf.au mail is handled by 1 linux.org.au.
$ host sigfail.verteiltesysteme.net
;; connection timed out; no servers could be reached
$ 

I use Munin to monitor my services, and I was happy to find a nice Unbound Munin plugin. I installed the file in /usr/share/munin/plugins/ and created a Munin plugin configuration file /etc/munin/plugin-conf.d/unbound as follows:

[unbound*]
user root
env.statefile /var/lib/munin-node/plugin-state/root/unbound.state
env.unbound_conf /etc/unbound/unbound.conf
env.unbound_control /usr/sbin/unbound-control
env.spoof_warn 1000
env.spoof_crit 100000

I run munin-node-configure --shell|sh to enable it. To work unbound has to be configured as well, so I create a /etc/unbound/unbound.conf.d/munin.conf as follows.

server:
	extended-statistics: yes
	statistics-cumulative: no
	statistics-interval: 0
remote-control:
	control-enable: yes

The graphs may be viewed at my munin instance.