OCF Cron Symlink Resource Agent

Symlinks are easy to manage in Corosync/HA. Coupled with Cron, however, they get to be a bit of a pain. The problems that need to be addressed:

1. During normal server startup Cron starts before Corosync, which means that if /var/spool/cron is missing, Cron will create it. This is not a bad thing in itself, just not optimal if we want to link to an NFS share holding all the cron jobs that need to be available on all the HA nodes. The OCF Symlink resource will error out until you delete the directory, and once the symlink is made you have to restart Cron to get everything working. Kind of defeats the purpose of headache-free HA. ;)

2. Cron really does need to be running on all nodes in the cluster if only for log rotation.

3. One could do the standard symlink RA with a crond RA and set the order of start and grouping. However, you would then need a second clone group to ensure cron is running. Unfortunately this does not work due to race conditions.
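
The clash from point 1 can be reproduced safely in a throwaway directory. This is only a minimal sketch: link_state is a hypothetical helper (it is not part of any resource agent), and the paths under the temporary directory merely stand in for the real /var/spool/cron and the NFS share.

```shell
#!/bin/bash
# Everything happens below a temp directory; the real /var/spool/cron is untouched.
demo=$(mktemp -d)
mkdir -p "$demo/nfs/var/spool/cron"   # stands in for the NFS share
mkdir -p "$demo/var/spool/cron"       # stands in for the directory Cron created

# link_state (hypothetical helper): prints "symlink", "blocked" or "missing"
link_state() {
    if [ -L "$1" ]; then
        echo symlink
    elif [ -e "$1" ]; then
        echo blocked      # something other than a symlink is in the way
    else
        echo missing
    fi
}

link_state "$demo/var/spool/cron"     # prints "blocked"

# Only after the directory is removed can the symlink take its place
rm -rf "$demo/var/spool/cron"
ln -s "$demo/nfs/var/spool/cron" "$demo/var/spool/cron"
link_state "$demo/var/spool/cron"     # prints "symlink"

rm -rf "$demo"
```

On a real node Cron would still have to be restarted after the last step, which is exactly the part the stock symlink agent does not do.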

By modifying the original Symlink RA script I was able to get a very nice Cron Symlink RA that works perfectly for me. A standard symlink would look like this:

primitive cronlinked ocf:heartbeat:symlink \
	params link="/var/spool/cron" target="/mnt/imports/nvwh2.bluedotmedia.de/var/spool/cron" \
	op monitor interval="15" timeout="15" on-fail="ignore" \
	meta target-role="Started"

As you can see, you have to set on-fail to “ignore”; otherwise the resource will just fail over because of the directory that Cron created when it started. Here, however, is the same link using the new cronlink RA:

primitive cronlinked ocf:itadmins:cronlink \
	params link="/etc/cron.d/cronscript /var/spool/cron" target="/mnt/nfsshare/etc/cron.d/cronscript /mnt/nfsshare/var/spool/cron" croninit="/usr/sbin/service cron restart" \
	meta target-role="Started" \
	op monitor interval="15" timeout="15" on-fail="restart"

It is exactly as before, except that you can now tell Corosync how to restart Cron once the link has been created, and you no longer have to set the resource to “ignore” on failure. You can also set multiple link pairs (space separated): the first entry in link is paired with the first entry in target, and so on. This works perfectly if you want certain jobs running on certain nodes in the cluster unless, of course, a fail-over ensues.
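
To make the pairing of the space separated values concrete, here is a small sketch. pair_links is a hypothetical helper written only for this illustration; the agent itself does the equivalent with two Bash arrays indexed in step.

```shell
#!/bin/bash
# pair_links (hypothetical helper): prints one "link -> target" line per pair.
# $1 is the space separated link list, $2 the space separated target list.
pair_links() {
    local links=($1)      # unquoted on purpose: split on whitespace
    local targets=($2)
    local i
    if [ "${#links[@]}" -ne "${#targets[@]}" ]; then
        echo "link/target count mismatch" >&2
        return 1
    fi
    for i in "${!links[@]}"; do
        echo "${links[$i]} -> ${targets[$i]}"
    done
}

pair_links "/etc/cron.d/cronscript /var/spool/cron" \
           "/mnt/nfsshare/etc/cron.d/cronscript /mnt/nfsshare/var/spool/cron"
# prints:
# /etc/cron.d/cronscript -> /mnt/nfsshare/etc/cron.d/cronscript
# /var/spool/cron -> /mnt/nfsshare/var/spool/cron
```

Because the values are split on whitespace, paths containing spaces cannot be used with this scheme.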

In the newest version I have had to move to Bash due to the use of arrays and it will now fail on the first error it encounters during monitoring. I have had no issues with this but if you have any let me know.
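
The fail-on-first-error behaviour during monitoring can be sketched as follows. This is a simplified illustration, not the agent's code: check_pair and monitor_all are hypothetical names, and the two return codes are set locally to the values OCF defines (0 for success, 7 for not running) so the snippet runs without the OCF shell functions.

```shell
#!/bin/bash
# Local stand-ins for the OCF return codes
OCF_SUCCESS=0
OCF_NOT_RUNNING=7

# check_pair (hypothetical, simplified): succeeds only when $1 is a symlink
# whose stored target is exactly $2
check_pair() {
    [ -L "$1" ] && [ "$(readlink "$1")" = "$2" ] && return $OCF_SUCCESS
    return $OCF_NOT_RUNNING
}

# monitor_all (hypothetical): walks the pairs in order and stops at the
# first one that fails, returning that pair's code
monitor_all() {
    local links=($1)
    local targets=($2)
    local i rc
    for i in "${!links[@]}"; do
        check_pair "${links[$i]}" "${targets[$i]}"
        rc=$?
        [ $rc -eq $OCF_SUCCESS ] || return $rc
    done
    return $OCF_SUCCESS
}

demo=$(mktemp -d)
ln -s /etc "$demo/good"
monitor_all "$demo/good $demo/missing" "/etc /etc"
echo "rc=$?"      # prints "rc=7": the second pair fails
rm -rf "$demo"
```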

#!/bin/bash
#
#
#   An OCF RA that manages symlinks for Cron
#
# Copyright (c) 2011 Dominik Klein
# Modified by Charles Williams 2012 - 2015
#
# This program is free software; you can redistribute it and/or modify
# it under the terms of version 2 of the GNU General Public License as
# published by the Free Software Foundation.
#
# This program is distributed in the hope that it would be useful, but
# WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
#
# Further, this software is distributed without any warranty that it is
# free of the rightful claim of any third person regarding infringement
# or the like.  Any license provided herein, whether implied or
# otherwise, applies only to this software file.  Patent licenses, if
# any, provided herein do not apply to combinations of this program with
# other software, or any other product whatsoever.
#
# You should have received a copy of the GNU General Public License
# along with this program; if not, write the Free Software Foundation,
# Inc., 59 Temple Place - Suite 330, Boston MA 02111-1307, USA.
#

#######################################################################
# Initialization:

: ${OCF_FUNCTIONS_DIR=${OCF_ROOT}/lib/heartbeat}
. ${OCF_FUNCTIONS_DIR}/ocf-shellfuncs

#######################################################################

meta_data() {
        # Quoted delimiter so the backslashes in the example below are printed as-is
        cat <<'END'
<?xml version="1.0"?>
<!DOCTYPE resource-agent SYSTEM "ra-api-1.dtd">
<resource-agent name="cronlink">
<version>1.8</version>

<longdesc lang="en">
This resource agent manages a symbolic link (symlink) for Cron.

It is primarily intended to manage /var/spool/cron which is automatically
created by Cron when it starts. This resource removes that directory (or another one)
before creating the symlink and restarting Cron.

It will also create symlinks to cronjobs in the /etc/cron* directories. This
means no longer needing to "clusterafy" your cronjobs.

Link to target pairs can also be used for multiple symlinks as follows:

primitive cronlinked ocf:itadmins:cronlink \
	params link="/etc/cron.d/cronscript /var/spool/cron" target="/mnt/nfsshare/etc/cron.d/cronscript /mnt/nfsshare/var/spool/cron" croninit="/usr/sbin/service cron restart" \
	meta target-role="Started" \
	op monitor interval="15" timeout="15" on-fail="restart"
</longdesc>
<shortdesc lang="en">Manages a symbolic link for Cron</shortdesc>
<parameters>
<parameter name="link" required="1">
<longdesc lang="en">
Full path of the symbolic link to be managed. This must obviously be
in a filesystem that supports symbolic links.
</longdesc>
<shortdesc lang="en">Full path of the symlink</shortdesc>
<content type="string"/>
</parameter>
<parameter name="target" required="1">
<longdesc lang="en">
Full path to the link target (the file or directory which the symlink points to).
</longdesc>
<shortdesc lang="en">Full path to the link target</shortdesc>
<content type="string" />
</parameter>
<parameter name="croninit" required="1">
<longdesc lang="en">
Full command to restart Cron.
</longdesc>
<shortdesc lang="en">Cron restart command</shortdesc>
<content type="string"/>
</parameter>
<parameter name="backup_suffix">
<longdesc lang="en">
A suffix to append to any files that the resource agent moves out of
the way because they clash with "link".

If this is unset (the default), then the resource agent will simply
refuse to create a symlink if it clashes with an existing file.
</longdesc>
<shortdesc lang="en">Suffix to append to backup files</shortdesc>
<content type="string" />
</parameter>
</parameters>
<actions>
<action name="start"   timeout="15" />
<action name="stop"    timeout="15" />
<action name="monitor" depth="0"  timeout="15" interval="60"/>
<action name="meta-data"  timeout="5" />
<action name="validate-all"  timeout="10" />
</actions>
</resource-agent>
END
}

symlink_monitor() {
    # This applies the following logic:
    #
    # * If $OCF_RESKEY_link does not exist, then the resource is
    #   definitely stopped.
    #
    # * If $OCF_RESKEY_link exists and is a symlink that points to
    #   ${OCF_RESKEY_target}, then the resource is definitely started.
    #
    # * If $OCF_RESKEY_link exists, but is anything other than a
    #   symlink to ${OCF_RESKEY_target}, then the status depends on whether
    #   ${OCF_RESKEY_backup_suffix} is set:
    #
    #   - if ${OCF_RESKEY_backup_suffix} is set, then the resource is
    #     simply not running. The existing file will be moved out of
    #     the way, to ${OCF_RESKEY_link}${OCF_RESKEY_backup_suffix},
    #     when the resource starts.
    #
    #   - if ${OCF_RESKEY_backup_suffix} is not set, then an existing
    #     file ${OCF_RESKEY_link} is an error condition, and the
    #     resource can't start here.
    rc=$OCF_ERR_GENERIC

    # Using ls here instead of "test -e", as "test -e" returns false
    # if the file does exist, but is a symlink to a file that doesn't
    ocf_log info "Checking if $1 is symlinked to $2"
    if ! ls "$1" >/dev/null 2>&1; then
        ocf_log debug "$1 does not exist"
        rc=$OCF_NOT_RUNNING
    elif [ ! -L  "$1" ]; then
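        if [ -d "$1" ]; then
            # A real directory (e.g. the one Cron creates at startup) is in
            # the way: remove it so the symlink can be created on start
            ocf_run rm -rf "$1"
            rc=$OCF_NOT_RUNNING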
        elif [ -z "$OCF_RESKEY_backup_suffix" ]; then
            ocf_log err "$1 exists but is not a symbolic link!"
            exit $OCF_ERR_INSTALLED
        else
            ocf_log debug "$1 exists but is not a symbolic link, will be moved to ${1}${OCF_RESKEY_backup_suffix} on start"
            rc=$OCF_NOT_RUNNING
        fi
    elif [ "$(readlink -f "$1")" = "$2" ]; then
        ocf_log debug "$1 exists and is a symbolic link to ${2}."
        rc=$OCF_SUCCESS
    else
        if [ -z "$OCF_RESKEY_backup_suffix" ]; then
            ocf_log err "$1 does not point to ${2}!"
            exit $OCF_ERR_INSTALLED
        else
            ocf_log debug "$1 does not point to ${2}, will be moved to ${1}${OCF_RESKEY_backup_suffix} on start"
            rc=$OCF_NOT_RUNNING
        fi
    fi
    return $rc
}

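symlink_monitor_links() {
    links=($OCF_RESKEY_link)
    targets=($OCF_RESKEY_target)
    # Every link needs exactly one target
    if [ "${#links[@]}" -ne "${#targets[@]}" ]; then
        ocf_log err "link has ${#links[@]} entries but target has ${#targets[@]}"
        return $OCF_ERR_CONFIGURED
    fi
    rc=$OCF_SUCCESS
    i=0
    while [ $i -lt ${#links[@]} ]; do
        # Fail on the first pair that is not a correct symlink
        symlink_monitor "${links[$i]}" "${targets[$i]}"
        rc=$?
        if [ $rc -ne $OCF_SUCCESS ]; then
            return $rc
        fi
        i=$(( i + 1 ))
    done
    return $rc
}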

symlink_start() {
        links=($OCF_RESKEY_link)
        targets=($OCF_RESKEY_target)
        success=0
        # Every link needs exactly one target
        if [ "${#links[@]}" -ne "${#targets[@]}" ]; then
                ocf_log err "link has ${#links[@]} entries but target has ${#targets[@]}"
                return $OCF_ERR_CONFIGURED
        fi
        i=0
        while [ $i -lt ${#links[@]} ]; do
            if ! symlink_monitor "${links[$i]}" "${targets[$i]}"; then
                if [ -e "${links[$i]}" ]; then
                    if [ -z "$OCF_RESKEY_backup_suffix" ]; then
                        # Shouldn't happen, because symlink_monitor should
                        # have errored out. But there is a chance that
                        # something else put that file there after
                        # symlink_monitor ran.
                        ocf_log err "${links[$i]} exists and no backup_suffix is set, won't overwrite."
                        success=1
                    else
                        ocf_log debug "Found ${links[$i]}, moving to ${links[$i]}${OCF_RESKEY_backup_suffix}"
                        ocf_run mv -v "${links[$i]}" "${links[$i]}${OCF_RESKEY_backup_suffix}" || success=1
                    fi
                fi
                ocf_log info "Linking ${links[$i]} to ${targets[$i]}"
                ocf_run ln -sv "${targets[$i]}" "${links[$i]}"
                # Record a failure if the link is still not in place
                symlink_monitor "${links[$i]}" "${targets[$i]}" || success=1
            fi
            i=$(( i + 1 ))
        done
        # Restart Cron once so it picks up the linked files
        ocf_run $OCF_RESKEY_croninit
        rc=$?
        if [ $success -ne 0 ]; then
                return $OCF_ERR_GENERIC
        fi
        return $rc
}

symlink_stop() {
    links=($OCF_RESKEY_link)
    targets=($OCF_RESKEY_target)
    success=0
    if [ "${#links[@]}" -eq "${#targets[@]}" ]; then
        i=0
        while [ $i -lt ${#links[@]} ]; do
            if symlink_monitor "${links[$i]}" "${targets[$i]}"; then
                ocf_run rm -vf "${links[$i]}" || exit $OCF_ERR_GENERIC
                if ! symlink_monitor "${links[$i]}" "${targets[$i]}"; then
                    if [ -e "${links[$i]}${OCF_RESKEY_backup_suffix}" ]; then
                        ocf_log debug "Found backup ${links[$i]}${OCF_RESKEY_backup_suffix}, moving to ${links[$i]}"
                        # if restoring the backup fails then still return with
                        # $OCF_SUCCESS, but log a warning
                        ocf_run -warn mv "${links[$i]}${OCF_RESKEY_backup_suffix}" "${links[$i]}"
                    fi
                    ocf_run $OCF_RESKEY_croninit
                else
                    ocf_log err "Removing ${links[$i]} failed."
                    success=1
                fi
            else
                ocf_run $OCF_RESKEY_croninit
            fi
            i=$(( i + 1 ))
        done
    fi
    if [ $success -eq 0 ]; then
        return $OCF_SUCCESS
    else
        return $OCF_ERR_GENERIC
    fi
}

symlink_validate_all() {
    if [ "x${OCF_RESKEY_link}" = "x" ]; then
        ocf_log err "Mandatory parameter link is unset"
        exit $OCF_ERR_CONFIGURED
    fi
    if [ "x${OCF_RESKEY_target}" = "x" ]; then
        ocf_log err "Mandatory parameter target is unset"
        exit $OCF_ERR_CONFIGURED
    fi
    if [ "x${OCF_RESKEY_croninit}" = "x" ]; then
        ocf_log err "Mandatory parameter croninit is unset"
        exit $OCF_ERR_CONFIGURED
    fi

    links=($OCF_RESKEY_link)
    targets=($OCF_RESKEY_target)
    # Every link needs exactly one target
    if [ "${#links[@]}" -ne "${#targets[@]}" ]; then
        ocf_log err "link has ${#links[@]} entries but target has ${#targets[@]}"
        exit $OCF_ERR_CONFIGURED
    fi

    # Having a non-existent target is technically not an error, as
    # symlinks are allowed to point to non-existent paths. But it
    # still doesn't hurt to warn people if the target does not exist
    # (but only during non-probes).
    i=0
    while [ $i -lt ${#links[@]} ]; do
        if ! ocf_is_probe && [ ! -e "${targets[$i]}" ]; then
            ocf_log warn "${targets[$i]} does not exist!"
        fi
        i=$(( i + 1 ))
    done
}

symlink_usage() {
        cat <<EOF
usage: $0 {start|stop|monitor|validate-all|meta-data}
Expects to have a fully populated OCF RA-compliant environment set.
EOF
}

if [ $# -ne 1 ]; then
        symlink_usage
        exit $OCF_ERR_ARGS
fi

case $__OCF_ACTION in
meta-data)
        meta_data
        exit $OCF_SUCCESS
        ;;
usage)
        symlink_usage
        exit $OCF_SUCCESS
esac

# Everything except usage and meta-data must pass the validate test
symlink_validate_all || exit

case $__OCF_ACTION in
start)
        echo "Starting ..."
        symlink_start
        ;;
stop)
        symlink_stop
        ;;
status|monitor)
        symlink_monitor_links
        ;;
validate-all)
        ;;
*)
        symlink_usage
        exit $OCF_ERR_UNIMPLEMENTED
esac
# exit code is the exit code (return code) of the last command (shell function)