Automated Remote Host Shutdown with apcupsd

Home, Bangkok, Thailand, 2020-03-21 18:02 +0700

#infrastructure #automation

My control node, which was already running apcupsd for monitoring, is now configured to automatically power off my NAS, backup server and KVM compute node when the estimated remaining runtime of the UPS drops to 10 minutes or less. Here’s how I did it.

This post assumes you already have apcupsd running - for a guide on setting it up see my earlier post on apcupsd and apcupsd_exporter.

doshutdown Script

The main script in /etc/apcupsd is apccontrol, which the apcupsd daemon invokes when it detects a power state change. You should not edit this file directly, but reading it shows the case branches that dispatch each of the power events apcupsd handles:

case "$1" in
    killpower)
        echo "Apccontrol doing: ${APCUPSD} --killpower on UPS ${2}" | (${WALL} 2>/dev/null || cat)
        sleep 10
        ${APCUPSD} --killpower
        echo "Apccontrol has done: ${APCUPSD} --killpower on UPS ${2}" | (${WALL} 2>/dev/null || cat)
    ;;
    commfailure)
        echo "Warning communications lost with UPS ${2}" | ${WALL}
    ;;
    commok)
        echo "Communications restored with UPS ${2}" | ${WALL}
    ;;
    
    ...

To handle one of these events you just need to create a script with the name of that event in the /etc/apcupsd directory. The event we want to handle is doshutdown which is triggered when the time threshold or one of the other thresholds configured in apcupsd.conf is reached.

In my case I just wanted to issue shutdown to the three hosts mentioned above - here is my doshutdown script:

$ cat /etc/apcupsd/doshutdown

#!/bin/bash

WALL=wall

echo "Shutdown initiated by apcupsd" | ${WALL}

echo "Issuing shutdown command to nas" | ${WALL}
ssh shutdownbot@nas "sudo /sbin/shutdown -h now" &

echo "Issuing shutdown command to backup" | ${WALL}
ssh shutdownbot@backup "sudo /sbin/shutdown -h now" &

echo "Issuing shutdown command to kvmcompute" | ${WALL}
ssh shutdownbot@kvmcompute "sudo /sbin/shutdown -h now" &

See the documentation for a full explanation of customizing event handlers. Note that the script needs to be executable - i.e. chmod 755 doshutdown.

As the script shows, each of the hosts needs a user called shutdownbot that is allowed to run shutdown via sudo without a password.
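If one of the hosts is unreachable, a plain ssh call can wait a long time or stop to ask for a password. Below is a rough sketch of a more defensive variant, not the script I actually run. BatchMode and ConnectTimeout are standard OpenSSH client options; the DRY_RUN variable and the shutdown_host and shutdown_all function names are made up for this illustration. DRY_RUN defaults to 1 so that pasting the snippet prints the commands instead of powering anything off.

```shell
#!/bin/bash
# Sketch only. DRY_RUN, shutdown_host and shutdown_all are illustrative names.
DRY_RUN="${DRY_RUN:-1}"          # 1 = print the command, anything else = run it
REMOTE_USER="shutdownbot"
HOSTS="nas backup kvmcompute"

shutdown_host() {
    target="$1"
    # BatchMode=yes makes ssh fail instead of prompting for a password;
    # ConnectTimeout bounds how long an unreachable host can hold things up.
    set -- -o BatchMode=yes -o ConnectTimeout=10 \
        "${REMOTE_USER}@${target}" "sudo /sbin/shutdown -h now"
    if [ "${DRY_RUN}" = "1" ]; then
        echo "would run: ssh $*"
    else
        ssh "$@"
    fi
}

shutdown_all() {
    for h in ${HOSTS}; do
        shutdown_host "${h}" &
    done
    wait    # let every background ssh finish (or time out) before returning
}
```

Calling shutdown_all with DRY_RUN=0 would issue the real commands, so try the dry run first.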

Remaining Minutes Threshold

I set the threshold in /etc/apcupsd/apcupsd.conf to 10 minutes - meaning that when apcupsd estimates that the UPS is down to 10 minutes of runtime it will trigger the doshutdown action:

MINUTES 10
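To see how close the UPS currently is to that threshold, the TIMELEFT field reported by apcaccess (the status client that ships with apcupsd) can be compared against it. This is a minimal sketch that assumes the usual "KEY : value" layout of apcaccess status output; the two function names are invented for the example.

```shell
#!/bin/bash
# Print the TIMELEFT value (in minutes) from apcaccess-style text on stdin.
timeleft_minutes() {
    awk -F': *' '$1 ~ /^TIMELEFT/ { print $2 + 0 }'
}

# Exit 0 when the runtime read from stdin is at or below the threshold ($1).
# Exits non-zero if no TIMELEFT line was found at all.
at_or_below_threshold() {
    timeleft_minutes | awk -v t="$1" '
        NR == 1 { found = 1; ok = ($1 <= t) }
        END     { exit !(found && ok) }'
}

# Typical use on the control node (not executed here):
#   apcaccess status | at_or_below_threshold 10 && echo "shutdown would trigger"
```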

Key Pair

apcupsd is running as root on my control node (not totally happy with that but that is a backlog item for another day). Since the root user will be ssh’ing to each box I created an SSH key pair as root:

sudo ssh-keygen -t rsa -b 2048

Then I grabbed the public key, as this will need to be authorized on each of the target nodes:

cat /root/.ssh/id_rsa.pub

Linux Hosts

On my KVM compute node and backup node, which both run Linux, provisioning the shutdownbot user followed the same process:

sudo useradd -u 2004 -m -s /bin/bash shutdownbot

I have a little table of user IDs for these service accounts so that I can keep them consistent across hosts - in this case shutdownbot got the ID 2004.

After creating the user I edited the sudoers file and granted shutdownbot the ability to run the shutdown command:

sudo visudo

The config rule to allow this is:

shutdownbot ALL=(ALL) NOPASSWD:/sbin/shutdown -h now

Importantly, we’re only allowing shutdownbot to run one specific command via sudo, meaning the blast radius of the account being compromised is limited. The NOPASSWD directive is required so that the command can be invoked by the apcupsd daemon without human interaction.
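One way to reduce the risk of a typo in this rule is to write it to a separate candidate file and syntax-check that file before installing it. The sketch below assumes the visudo on your distribution supports the -c (check only) and -f (file) options; the function names and the /etc/sudoers.d/shutdownbot destination are assumptions, and the drop-in directory only takes effect if the main sudoers file includes it.

```shell
#!/bin/bash
# Sketch only. write_candidate and check_candidate are illustrative names.
RULE='shutdownbot ALL=(ALL) NOPASSWD:/sbin/shutdown -h now'

# Write the rule to a candidate file outside /etc and tighten its mode.
write_candidate() {
    printf '%s\n' "${RULE}" > "$1"
    chmod 440 "$1"
}

# Syntax-check the candidate: -c checks without editing, -f names the file.
check_candidate() {
    visudo -c -f "$1"
}

# Possible flow on a Linux host (not executed here):
#   write_candidate /tmp/shutdownbot.sudoers
#   sudo visudo -c -f /tmp/shutdownbot.sudoers &&
#       sudo install -m 440 -o root -g root /tmp/shutdownbot.sudoers /etc/sudoers.d/shutdownbot
```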

Finally we can register the previously created public key as an authorized key for shutdownbot:

mkdir /home/shutdownbot/.ssh
chmod 700 /home/shutdownbot/.ssh
echo "<paste id_rsa.pub contents here>" > /home/shutdownbot/.ssh/authorized_keys
chmod 600 /home/shutdownbot/.ssh/authorized_keys
chown -R shutdownbot:shutdownbot /home/shutdownbot/.ssh
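The blast radius can be narrowed further by restricting what the key itself may do. OpenSSH lets each authorized_keys entry carry options such as command="..." (run this command regardless of what the client asks for) and no-pty. This is a hedged sketch that builds such a line; restricted_key_line is a hypothetical helper and the key text passed to it is a placeholder. Note the side effect: with a forced command in place, any login with this key, including an interactive test login, runs the shutdown.

```shell
#!/bin/bash
# Sketch only. restricted_key_line is an illustrative helper name.
# Prefix a public key with options that force the shutdown command and
# disable terminal allocation and forwarding for this key.
restricted_key_line() {
    printf 'command="sudo /sbin/shutdown -h now",no-pty,no-port-forwarding,no-agent-forwarding,no-X11-forwarding %s\n' "$1"
}

# Possible use on a target host (not executed here):
#   restricted_key_line "$(cat id_rsa.pub)" > /home/shutdownbot/.ssh/authorized_keys
```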

Synology NAS

On my Synology NAS the process was a bit different. To create the user I logged into the DSM console and used the Users applet from the Control Panel to create the new shutdownbot user.

Unfortunately, due to a change that was made in DSM around version 6.2.2, it is necessary to make this user a member of the administrators group for it to be able to access SSH.

However, you can lock down access further by overriding the permissions and application access to deny everything except access to the homes share.

Next SSH into the NAS as admin and register the public key:

mkdir /var/services/homes/shutdownbot/.ssh
chmod 700 /var/services/homes/shutdownbot/.ssh
echo "<paste id_rsa.pub contents here>" > /var/services/homes/shutdownbot/.ssh/authorized_keys
chmod 600 /var/services/homes/shutdownbot/.ssh/authorized_keys
chown -R shutdownbot:users /var/services/homes/shutdownbot/.ssh

Then I also changed the permissions on the user directory - as noted in my earlier post this step is not well documented, but it’s necessary for SSH to work with a key pair:

chmod 700 /var/services/homes/shutdownbot

The Synology OS doesn’t include visudo so you need to directly edit the sudoers file.

vi /etc/sudoers

The advantage of visudo is that it validates your changes before applying them. When editing the file directly you don’t have that validation, so be very careful not to lock yourself out of sudo by committing a bad config. The line to be added is the same as on the Linux nodes:

# Allow the shutdownbot user to run shutdown commands
shutdownbot ALL=(ALL) NOPASSWD:/sbin/shutdown -h now
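Without visudo, one modest safeguard is to keep a backup copy and append the rule with a command rather than by hand, so the rest of the file is never touched. A minimal sketch, assuming cp and grep on the NAS accept these common options; append_rule_once is a made-up helper name and it takes the file path as an argument so it can be tried on a scratch copy first.

```shell
#!/bin/bash
# Sketch only. append_rule_once is an illustrative helper name.
RULE='shutdownbot ALL=(ALL) NOPASSWD:/sbin/shutdown -h now'

# Back up the file, then append the rule only if that exact line is missing.
append_rule_once() {
    file="$1"
    rule="$2"
    cp -p "${file}" "${file}.bak" || return 1    # copy to restore from
    grep -qxF -- "${rule}" "${file}" || printf '%s\n' "${rule}" >> "${file}"
}

# Possible use as root on the NAS (not executed here):
#   append_rule_once /etc/sudoers "${RULE}"
```

Keeping a second root session open while testing sudo means the backup can still be restored if something goes wrong.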

Testing

Back on the control node, check that SSH works:

sudo ssh shutdownbot@nas
sudo ssh shutdownbot@kvmcompute
sudo ssh shutdownbot@backup
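The same check can be made non-interactive, and extended to confirm the sudo rule, without shutting anything down. This sketch assumes the remote sudo supports -n (never prompt) and -l followed by a command (report whether that command is permitted); check_host and the SSH_BIN override are hypothetical names, the latter existing only so the function can be exercised with a stub.

```shell
#!/bin/bash
# Sketch only. check_host and SSH_BIN are illustrative names.
SSH_BIN="${SSH_BIN:-ssh}"

# Report whether key-based login works and the shutdown command is permitted.
check_host() {
    host="$1"
    if "${SSH_BIN}" -o BatchMode=yes -o ConnectTimeout=10 "shutdownbot@${host}" \
        "sudo -n -l /sbin/shutdown -h now" >/dev/null 2>&1; then
        echo "${host}: ok"
    else
        echo "${host}: FAILED"
        return 1
    fi
}

# Possible use as root on the control node (not executed here):
#   for h in nas backup kvmcompute; do check_host "$h"; done
```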

Next, test that the doshutdown script works by invoking it directly from /etc/apcupsd as root (the SSH key pair belongs to root) - this should shut down the remote hosts:

./doshutdown

Finally, test the end-to-end integration by pulling the mains power from the UPS and observing it shut down the remote hosts when the 10 minute remaining runtime threshold is reached.

It’s nice to have a Grafana dashboard to monitor the state of your UPS while testing, in particular so you can see the remaining charge and remaining runtime. For example, just after I pulled mains power, the Grafana dashboard running on my control node showed a remaining runtime of 15.6 minutes.

The remaining runtime got down to 8 minutes before the script ran to shut down the nodes - the battery discharged a bit faster than apcupsd’s estimate, so it overshot the 10 minute threshold before kicking in.

With the load reduced after auto-shutting down the remote hosts, the UPS had sufficient charge to run the remaining devices (control node, UniFi network switch, pfSense firewall, and UniFi AP) for an estimated 34 minutes (not sure how accurate that is).

If you want to set up a dashboard like this check out my earlier post on deploying apcupsd_exporter with Prometheus and Grafana.

Disable Shutdown of Control Node

One final thing - I don’t want the control node itself, which is running apcupsd, to be shut down when the 10 minute threshold is reached, because it will eventually have its own local battery backup in the form of two 18650 cells which will keep it running for around 12 hours. The control node needs to stay online until the bitter end so that it can be used to issue wake-on-lan packets to the other nodes when mains power is restored.

As noted at the beginning of this post, you should not directly edit the /etc/apcupsd/apccontrol script because it will be replaced on upgrade (although the apcupsd package hasn’t been updated for years). However, the behaviour is to invoke your doshutdown script AND then execute the default lines in apccontrol, which are:

    doshutdown)
        echo "UPS ${2} initiated Shutdown Sequence" | ${WALL}
        #${SHUTDOWN} -h now "apcupsd UPS ${2} initiated shutdown"
    ;;

So I had to go in and comment out that shutdown command.
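The apcupsd manual also describes an alternative that avoids touching apccontrol: if an event script exits with status 99, apccontrol skips its default action for that event. A minimal sketch of that approach follows; handle_doshutdown is a hypothetical function name, and it is worth confirming the behaviour against the apccontrol shipped with your version before relying on it.

```shell
#!/bin/bash
# Sketch only. handle_doshutdown is an illustrative name.
# Exit status that tells apccontrol not to run its default action.
SKIP_DEFAULT_ACTION=99

handle_doshutdown() {
    echo "Shutdown initiated by apcupsd for UPS ${1:-unknown}"
    # ... issue the remote shutdown commands here ...
    return "${SKIP_DEFAULT_ACTION}"
}

# In the real /etc/apcupsd/doshutdown the last line would be:
#   handle_doshutdown "$@"; exit $?
```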

Conclusion

It feels really good to have this mechanism in place to provide some level of assurance that devices in my home lab running sensitive file systems will be shut down cleanly in case of a power outage.