Sunday, October 30, 2011

Using runlevels to demote a network king to a mere baron

I have put enough services onto my server that it has become a single point of failure. It's a router, a DSL modem, backup storage, and a network-shared drive, plus a few other cool things.

The key point of failure is the combination of Modem + Router. The PCI DSL card requires a custom driver, and sometimes after a system upgrade it needs to be reinstalled. The details of the wanrouter driver are here.

What I need is a failover mode: If the DSL fails to work, I want to use my old external modem and router. So I still want the system to run, but as a dhcp-client server instead of a router.

Let's use some startup logic and runlevels to define two roles for the server: Network King routing over DSL, and faithful Baron merely connecting over wi-fi to the Linksys. We also need a bit of connective tissue so the machine automagically boots into the correct role and can still be switched manually.

Internet <------+ Network King +-----> LAN

Internet <------+ Other router +------> LAN <------+ Faithful Baron 

Runlevel 1 - startup (Don't touch this)
Runlevel 2 - network testing
Runlevel 3 - server (Network King)
Runlevel 4 - client (Faithful Baron)
Runlevel 5 - unchanged from stock install
Runlevel 6 - reboot (Don't touch this)

Decision logic:
If the server can connect to the internet over the dsl interface, then it is a King
Otherwise, it is a Baron.
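
Here is a minimal sketch of that decision as a shell function. The name choose_runlevel and its yes/no arguments are my own placeholders for this illustration; the real script further down gets its answers from wanrouter and the routing table.

```shell
#!/bin/bash
# Sketch of the King/Baron decision. The two arguments are hypothetical
# "yes"/"no" flags standing in for the real wanrouter and ppp checks.
choose_runlevel () {
   local dsl_link_up="$1"     # did the dsl0 link come up?
   local ppp_route_up="$2"    # did PPPoE give us a default route?
   if [ "$dsl_link_up" = "yes" ] && [ "$ppp_route_up" = "yes" ]; then
      echo 3                  # Network King (router)
   else
      echo 4                  # Faithful Baron (dhcp client)
   fi
}

choose_runlevel yes yes
choose_runlevel yes no
```

Any failure along the way lands in runlevel 4, so the Baron is the safe default.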

Issues:
1) I need to change the LAN IP address of the server so it doesn't conflict with the router anymore!
2) All changes to the server must be tracked and undoable
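
One simple way to keep changes undoable is to save a pristine copy of each file before touching it. This is only a sketch: the helper names and the .orig suffix are my assumptions, not something the system provides.

```shell
#!/bin/bash
# keep_original FILE: save a pristine copy as FILE.orig, but only once,
# so later edits never overwrite the saved original.
keep_original () {
   local file="$1"
   [ -e "$file.orig" ] || cp -p "$file" "$file.orig"
}

# undo_changes FILE: put the pristine copy back.
undo_changes () {
   local file="$1"
   cp -p "$file.orig" "$file"
}

# Example (run as root before editing):
#   keep_original /etc/network/interfaces
```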

Changes to the external router
Port-forward from the internet to the Baron (easy, one DMZ setting or separate port-forward settings for each service)

Server setup changes
1) Create a new directory to hold the three small scripts we are going to make, so you can keep track!
mkdir /root/startup-scripts

2) It's possible to create one really ugly /etc/network/interfaces file, but let's not do that. Instead, we'll create separate interface files for runlevels 3 and 4. For convenience, let's put links to them next to the original interfaces file.
cp /etc/network/interfaces /root/startup-scripts/runlevel-3-interfaces
cp /etc/network/interfaces /root/startup-scripts/runlevel-4-interfaces
ln /root/startup-scripts/runlevel-3-interfaces /etc/network/runlevel-3-interfaces
ln /root/startup-scripts/runlevel-4-interfaces /etc/network/runlevel-4-interfaces
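
Since the setup depends on the two names pointing at the same file, it's worth checking after editing. Some editors save by writing a new file and renaming it, which quietly breaks a hard link. A small sketch (the function names are mine):

```shell
#!/bin/bash
# same_file A B: succeed only if A and B are the same file on disk.
# The -ef test is true for hard links (and for symlinks to the same file).
same_file () {
   [ "$1" -ef "$2" ]
}

# check_link A B: report whether B is still linked to A.
check_link () {
   if same_file "$1" "$2"; then
      echo "linked: $2"
   else
      echo "NOT linked: $2"
   fi
}

# Example:
#   check_link /root/startup-scripts/runlevel-3-interfaces /etc/network/runlevel-3-interfaces
```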

3) Edit the runlevel 3 file (/root/startup-scripts/runlevel-3-interfaces)
# This file describes the network interfaces available on your system
# and how to activate them. For more information, see interfaces(5).
# This file is ONLY for runlevel 3 (Network King [router] mode)

# The loopback network interface
auto lo
iface lo inet loopback

# The ethernet jack and wi-fi antenna in bridged server mode
iface eth0 inet manual
iface wlan0 inet manual
     up hostapd -B /etc/hostapd/hostapd.conf
     down ifconfig mon.wlan0 down
     down pkill hostapd
auto br0                       
iface br0 inet static
     # Adding and removing the slave eth0 and wlan0 interfaces
     # is handled by /etc/init.d/kingbaron
     address 192.168.1.1
     broadcast 192.168.1.255
     netmask 255.255.255.0
     network 192.168.1.0
     up hostapd -B /etc/hostapd/hostapd.conf
     up route add -net 239.0.0.0 netmask 255.0.0.0 br0
     down ifconfig mon.wlan0 down
     down pkill hostapd
     down route del -net 239.0.0.0 netmask 255.0.0.0 br0

iface dsl-provider inet ppp
     pre-up /sbin/ifconfig dsl0 up # line maintained by pppoeconf
     provider dsl-provider

auto dsl0
iface dsl0 inet manual

4) Edit the runlevel 4 file (/root/startup-scripts/runlevel-4-interfaces)
# This file describes the network interfaces available on your system
# and how to activate them. For more information, see interfaces(5).
# This file is ONLY for runlevel 4 (Network dhcp client mode)

# The loopback network interface
auto lo
iface lo inet loopback

# The ethernet jack in client mode
auto eth0
allow-hotplug eth0
iface eth0 inet dhcp

# The wi-fi antenna in client mode
auto wlan0
iface wlan0 inet dhcp
     pre-up ifconfig wlan0 down
     pre-up iwconfig wlan0 mode Managed
     pre-up iwconfig wlan0 essid MY_LAN_NAME

5) Let's edit the original /etc/network/interfaces file to reduce startup time by not automatically raising the eth0 and wlan0 interfaces. We don't need those in runlevel 2, since only the DSL line needs to be brought up.
# This file describes the network interfaces available on your system
# and how to activate them. For more information, see interfaces(5).

# The loopback network interface
auto lo
iface lo inet loopback

# The ethernet jack in client mode
# (Disabled during initial boot)
allow-hotplug eth0
iface eth0 inet manual

# The wi-fi antenna in client mode
# (Disabled during initial boot)
auto wlan0
iface wlan0 inet manual
     pre-up iwconfig wlan0 essid Klein-Weisser

# The following lines are auto-generated for the dsl connection

auto dsl-provider
iface dsl-provider inet ppp
     pre-up /sbin/ifconfig dsl0 up # line maintained by pppoeconf
     provider dsl-provider

auto dsl0
iface dsl0 inet manual

6) Create a new file for the startup testing and decision logic: /root/startup-scripts/kingbaron
#!/bin/bash

### BEGIN INIT INFO
# Provides:             runlevel_chooser
# Required-Start:       $network $remote_fs $syslog wanrouter
# Required-Stop:        $network
# Default-Start:        2
# Default-Stop:
# Short-Description:    Choose runlevels based on testing network connection to an interface
### END INIT INFO

# Functions

start_king_mode () {
   # If coming from runlevel N or 4, need to change from dhcp to static/Master
   [ "$(runlevel)" = "4 2" ] && ifdown -a --interfaces=/etc/network/runlevel-4-interfaces
   [ "$(runlevel)" = "N 2" ] && ifdown -a

   # If the br0 interface does not exist (coming from runlevel N), create it.
   [ $(brctl show | grep -c br0) -eq 0 ] && brctl addbr br0

   # Add the slave interfaces to br0
   [ $(brctl show | grep -c eth0) -eq 0 ] && brctl addif br0 eth0
   [ $(brctl show | grep -c wlan0) -eq 0 ] && brctl addif br0 wlan0

   # Bring up the King mode interfaces (except dsl0 and ppp0, which are already up)
   ifup -a -v --interfaces=/etc/network/runlevel-3-interfaces

   # Test that ifup worked
   [ $(ifconfig | grep -c mon.wlan0) -eq 0 ] && logger -i -s -t kingbaron "Failed to bring up wlan0...Sorry"
   [ $(ifconfig | grep -c br0) -eq 0 ] && logger -i -s -t kingbaron "Failed to bring up br0...Sorry"

   logger -i -s -t kingbaron "Network should be up now. If not, try 'ifconfig' for LAN interfaces and 'wanrouter' for the DSL interface"
   telinit 3
   exit 0
}

start_baron_mode () {
   # Shut down the dsl0 and ppp0 interfaces (not used in Runlevel 4).
   # Sometimes the signal needs to be sent twice
   [ $(wanrouter status | grep -c stopped) -gt 0 ] || wanrouter stop
   [ $(wanrouter status | grep -c stopped) -gt 0 ] || wanrouter stop

   # If coming from runlevel 3, need to change from static/Master to dhcp
   # If coming from runlevel N or 4, we can keep the same interfaces
   [ "$(runlevel)" = "3 2" ] && ifdown -a --interfaces=/etc/network/runlevel-3-interfaces

   # If br0 is up, bring it down. If it still has slaves from runlevel 3, unslave them
   [ $(ifconfig | grep -c br0) -gt 0 ] && ifconfig br0 down
   [ $(brctl show | grep -c wlan0) -gt 0 ] && brctl delif br0 wlan0 && ifconfig wlan0 down
   [ $(brctl show | grep -c eth0) -gt 0 ] && brctl delif br0 eth0 && ifconfig eth0 down

   # If wlan0 is stuck in Master mode from runlevel 3, unstick it
   [ $(ifconfig | grep -c mon.wlan0) -gt 0 ] && pkill hostapd && ifconfig wlan0 down && iwconfig wlan0 mode Managed

   # Bring up the Baron Mode interfaces, ignoring anything already up
   ifup -a -v --interfaces=/etc/network/runlevel-4-interfaces

   # Sometimes the network fails to come up, especially if it didn't go down properly
   # Check for the most common errors (like WiFi not going up) and try to autofix
   if [ $(ifconfig | grep -A2 wlan0 | grep -c inet) -eq 0 ]; then
      logger -i -s -t kingbaron "Wireless failed to come up. Resetting and trying again..."
      ifconfig wlan0 down
      iwconfig wlan0 mode Managed
      iwconfig wlan0 essid MY_ESSID
      ifconfig wlan0 up
      dhclient -v wlan0
   fi
   logger -i -s -t kingbaron "Network should be up now. If not, try 'ifconfig' and 'iwconfig'"
   telinit 4
   exit 0
}

logger -i -s -t kingbaron "Testing for DSL connectivity"

# Check for the existence of the DSL interface. If it exists, try to get a connection
# If the internet is reachable, goto runlevel 3 (King mode). Else goto runlevel 4 (Baron mode)

# If wanrouter is not already running (neither Connecting nor Connected), then start it
flag="wanrouter off"
[ $(wanrouter status | grep -c Connecting) -gt 0 ] && flag="wanrouter on"
[ $(wanrouter status | grep -c Connected) -gt 0 ] && flag="wanrouter on"
[ "$flag" = "wanrouter off" ] && wanrouter start

# If the test interface does not exist, then start client mode
if [ $(ifconfig | grep -c dsl0) -eq "0" ]; then
   logger -i -s -t kingbaron "The DSL interface (dsl0) does not exist. Entering dhcp client mode"
   start_baron_mode
fi

# If the test interface exists, then wait for wanrouter to start up
# Average start time is about 20 seconds
logger -i -s -t kingbaron "Found the DSL interface. Waiting up to 40 seconds for the DSL link (dsl0) to come up"
i="0"
while [ $i -lt 40 ]; do
   sleep 1
   i=$((i+1))
   [ $(wanrouter status | grep -c Connecting) -gt 0 ] || i=100
done

# If time expires without the wanrouter starting, print the error and start client mode
if [ $i -lt 100 ]; then
   logger -i -s -t kingbaron "dsl0 interface failed to come up. Entering dhcp client mode"
   start_baron_mode
fi

# If the wanrouter comes up properly, then wait for a ppp connection
logger -i -s -t kingbaron "dsl0 up. Waiting up to 40 seconds for a PPPoE connection (ppp0)"
pon dsl-provider
i="0"
while [ $i -lt 40 ]; do
   sleep 1
   i=$((i+1))
   [ $(route | grep -c default) -gt 0 ] && i=100
done

# If time expires without the ppp connection starting, print the error and start client mode
if [ $i -lt 100 ]; then
   logger -i -s -t kingbaron "ppp0 failed to come up. Entering dhcp client mode"
   start_baron_mode
fi

# If all has gone well, and the ppp connection comes up
logger -i -s -t kingbaron "ppp0 came up. This system has DSL connectivity. Starting Router services"
start_king_mode
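
The two 40-second waits in the script follow the same pattern: poll once a second until a check passes or time runs out. Here is a sketch of that pattern as a reusable function. The names wait_for and have_default_route are my own, and unlike the script above this version checks first and sleeps afterwards.

```shell
#!/bin/bash
# wait_for LIMIT COMMAND...: run COMMAND once per second until it succeeds
# or LIMIT attempts have been used. Returns 0 on success, 1 on timeout.
wait_for () {
   local limit="$1"
   shift
   local i=0
   while [ "$i" -lt "$limit" ]; do
      "$@" && return 0
      sleep 1
      i=$((i+1))
   done
   return 1
}

# Hypothetical check, mirroring the script's test for a default route
have_default_route () {
   [ "$(route 2>/dev/null | grep -c default)" -gt 0 ]
}

# Harmless demonstration: 'true' succeeds on the first attempt
wait_for 2 true && echo "came up"
```

In the script this could be used as 'wait_for 40 have_default_route || start_baron_mode', which avoids the i=100 trick for leaving the loop.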

7) Install the script:
chmod +x /root/startup-scripts/kingbaron           # Make executable
ln /root/startup-scripts/kingbaron /etc/init.d/    # Hardlink to init.d
update-rc.d kingbaron defaults                     # Symlink to runlevel 2

8) Take a look at /etc/rc2.d/. See all those router and server services that shouldn't operate in runlevel 2? Or shouldn't operate in runlevels 2 and 4? Use the command "update-rc.d $Name disable 2" to disable the appropriate services in runlevel 2 (or 4).
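
If there are many services to disable, a small loop saves typing. This is a sketch with a dry-run switch so you can read the commands before running them. DRY_RUN and disable_services are my own names, and the service names are hypothetical examples; substitute whatever you actually find in /etc/rc2.d/.

```shell
#!/bin/bash
# DRY_RUN=yes (the default here) only prints the commands instead of running them.
DRY_RUN="${DRY_RUN:-yes}"

# disable_services RUNLEVEL NAME...: one update-rc.d call per service.
disable_services () {
   local level="$1"
   shift
   local name
   for name in "$@"; do
      if [ "$DRY_RUN" = "yes" ]; then
         echo "update-rc.d $name disable $level"
      else
         update-rc.d "$name" disable "$level"
      fi
   done
}

# Hypothetical service names, for illustration only
disable_services 2 dnsmasq hostapd
```

Once the printed commands look right, set DRY_RUN=no and run it again as root.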

...And do a whole lot of tweaking and testing, and voila! An automated detection and failover system. If the DSL line is active, router services start and the interfaces come up in static/master mode. If the DSL line isn't active, router services don't start and the interfaces come up in dhcp mode. Runlevel 2 is the decision mode, runlevel 3 is the King (router) mode, and runlevel 4 is the Baron (dhcp) mode.

All the server services (samba, cups, etc) still operate, regardless. And you can manually reset with the command 'telinit 2' to make the system reset the interfaces and router services.
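
To see which role the machine ended up in, look at the output of the 'runlevel' command, which prints the previous and current runlevel (for example "2 3"). A sketch that turns that into a role name; role_for is a name I made up for this example.

```shell
#!/bin/bash
# role_for "PREV CURRENT": name the role for one line of 'runlevel' output.
role_for () {
   local current="${1##* }"     # keep only the text after the last space
   case "$current" in
      2) echo "deciding" ;;
      3) echo "king" ;;
      4) echo "baron" ;;
      *) echo "other" ;;
   esac
}

role_for "2 3"
role_for "N 2"

# On the server itself:
#   role_for "$(runlevel)"
```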

TIP: Look for related posts using the kingbaron tag.
