Shell scripts for various web-related automation tasks

For example, suppose your task is to automate checking the status of a list of websites and to report any that are down or return an error.

Prerequisites:

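  • bash: Available by default on most Linux distributions and on macOS (where the default login shell is now zsh, but bash is still installed), or via WSL on Windows.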
  • curl: Usually pre-installed.
  • grep (optional but useful): For filtering text.
  • A text file with URLs: Let’s call it websites.txt.
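Before running anything, it can help to confirm that the tools listed above are actually on your PATH. The following is a minimal sketch; the function name missing_tools is an illustrative choice, not part of the script in this post.

```shell
#!/bin/bash
# missing_tools (hypothetical helper): print every tool name passed as an
# argument that cannot be found on PATH.
missing_tools() {
    local tool
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
}

missing=$(missing_tools bash curl grep)
if [ -n "$missing" ]; then
    echo "Missing tools:" $missing
else
    echo "All prerequisites found."
fi
```

If anything is reported as missing, install it with your system's package manager before continuing.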

Here is the shell script that automates this task:

#!/bin/bash

# --- Configuration ---
WEBSITE_LIST="websites.txt"
TIMEOUT_SECONDS=5
USER_AGENT="WebsiteStatusChecker/1.0 (curl)" # Identify our requests
LOG_FILE="website_check_log_$(date +%Y%m%d_%H%M%S).txt"
DOWN_SITES_FILE="down_websites_$(date +%Y%m%d_%H%M%S).txt"

# --- Functions ---

# Function to log messages to console and file
log() {
    local message="$1"
    echo "$(date +%Y-%m-%d\ %H:%M:%S) - $message" | tee -a "$LOG_FILE"
}

# Function to check a single URL
check_url() {
    local url="$1"
    local status_code=""
    local error_message=""

    log "Checking $url..."

    # Use curl to get only the HTTP status code (-s silent, -o /dev/null no output, -w write-out)
    # --head: only fetch HTTP headers, not the entire page content - much faster.
    # --connect-timeout: max time for connection to happen.
    # --max-time: max time for the entire operation.
    status_code=$(curl -s -o /dev/null --head -w "%{http_code}" \
                       --connect-timeout "$TIMEOUT_SECONDS" --max-time "$TIMEOUT_SECONDS" \
                       -A "$USER_AGENT" "$url")

    # Capture curl's exit status right away (0 means success). Any later
    # command, including the [ ... ] test itself, overwrites $?.
    local curl_exit=$?
    if [ "$curl_exit" -ne 0 ]; then
        error_message="Curl failed for $url. Error code: $curl_exit."
        log "  ERROR: $error_message"
        echo "$url - ERROR: $error_message" >> "$DOWN_SITES_FILE"
    elif [[ "$status_code" =~ ^[23][0-9]{2}$ ]]; then # Check for 2xx (Success) or 3xx (Redirection)
        log "  SUCCESS: $url returned HTTP $status_code"
    else
        log "  WARNING: $url returned HTTP $status_code (Possible issue)"
        echo "$url - HTTP $status_code" >> "$DOWN_SITES_FILE"
    fi
}

# --- Main Script Logic ---

log "--- Starting Website Status Check ---"
log "Reading URLs from: $WEBSITE_LIST"
log "Log file: $LOG_FILE"
log "Down sites file: $DOWN_SITES_FILE"
: > "$DOWN_SITES_FILE" # Start with a truly empty file (echo "" would write a newline and defeat the -s test below)

# Check if the website list file exists
if [ ! -f "$WEBSITE_LIST" ]; then
    log "Error: Website list file '$WEBSITE_LIST' not found."
    exit 1
fi

# Loop through each URL in the file
while IFS= read -r url || [[ -n "$url" ]]; do
    # Skip empty lines and lines starting with # (comments)
    if [[ -z "$url" || "$url" =~ ^\# ]]; then
        continue
    fi
    check_url "$url"
done < "$WEBSITE_LIST"

log "--- Website Status Check Complete ---"

# --- Final Report ---
if [ -s "$DOWN_SITES_FILE" ]; then # -s checks that the file exists and is not empty
    echo | tee -a "$LOG_FILE" # Blank separator line (bash's echo does not expand "\n" by default)
    log "--- Summary of Websites with Issues ---"
    tee -a "$LOG_FILE" < "$DOWN_SITES_FILE" # Display and append to main log
    log "Please review '$DOWN_SITES_FILE' for details."
else
    echo | tee -a "$LOG_FILE"
    log "All checked websites returned successful status codes."
fi
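
The pass/fail rule inside check_url treats 2xx and 3xx codes as healthy and everything else as a possible issue. As a rough sketch, that rule can be pulled out into a standalone function and tried against sample codes without touching the network. The name classify_status and the separate "no-response" verdict are assumptions added for illustration; the case pattern is meant to be equivalent to the script's ^[23][0-9]{2}$ test.

```shell
#!/bin/bash
# classify_status (hypothetical helper): map an HTTP status code to a verdict.
classify_status() {
    local code="$1"
    case "$code" in
        [23][0-9][0-9]) echo "up" ;;        # 2xx success or 3xx redirection
        000)            echo "no-response" ;; # curl writes 000 when no HTTP response arrived
        *)              echo "problem" ;;   # 4xx, 5xx, or anything unexpected
    esac
}

# Try the rule on a few sample codes.
for code in 200 301 404 503 000; do
    echo "$code -> $(classify_status "$code")"
done
```

Keeping the rule in one small function like this makes it easier to adjust later, for instance if you decide that a particular 4xx code should not count as down.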

How to Run:

  1. Save the script: Copy the code above and save it as check_websites.sh (or any .sh name).
  2. Make it executable: Open your terminal and run: chmod +x check_websites.sh
  3. Create the URL list: Create a file named websites.txt in the same directory as your script and add one URL per line. Blank lines and lines starting with # are skipped.
  4. Run the script: ./check_websites.sh
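
As an illustration of step 3, the sketch below writes a small websites.txt and counts the lines the checker would actually process. The two URLs are placeholders on reserved example domains, and count_urls is a hypothetical helper, not part of check_websites.sh.

```shell
#!/bin/bash
# Create a sample URL list (placeholder URLs; replace them with your own).
cat > websites.txt <<'EOF'
# Sites to monitor, one URL per line
https://example.com

https://example.org
EOF

# count_urls (hypothetical helper): count the lines that are neither blank
# nor comments, i.e. the lines the checker will not skip.
count_urls() {
    grep -c -v -e '^$' -e '^#' "$1"
}

echo "URLs to check: $(count_urls websites.txt)"   # prints "URLs to check: 2"
```

With this file in place, ./check_websites.sh would check the two listed URLs and ignore the comment and the blank line.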

This example raises a practical question: I have a list of Python files to run at designated times. Should I write a shell script like this one to automate it, or use APScheduler?

Here’s a quick decision guide:

Feature | Shell Script + Cron/Systemd Timer | APScheduler (or similar Python library)
Control | System-level | Application-level
Dynamic Jobs | Hard to change at runtime | Easy to add/remove/modify at runtime
App Downtime | Jobs still run if app is down | Jobs stop if app is down (unless app is managed by systemd)
Complexity | Simpler for basic schedules | More complex setup, but powerful for complex logic
Python Env | Must manage explicitly in script | Inherits app’s environment
Primary Use | Independent, fixed, external tasks | Dependent, dynamic, internal application tasks
Concurrency | Each job is a separate process (easy parallelism) | Configurable (threads, processes, async)

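So for scheduling Python files, APScheduler is the more suitable solution: the jobs inherit the application’s Python environment and can be added, removed, or modified at runtime. If the jobs must keep running even when no Python process is up, cron or a systemd timer remains the simpler choice.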
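
For comparison, here is a rough sketch of what the shell script + cron route could look like for a list of Python files. Everything named here is hypothetical: the script name run_jobs.sh, the list file jobs.txt, the PYTHON_BIN variable, and the paths and schedule in the crontab comment. It also shows the "must manage explicitly" point from the guide, since cron starts jobs with a minimal environment and the interpreter has to be chosen in the script.

```shell
#!/bin/bash
# run_jobs.sh (hypothetical name): run each Python file listed in a text file.
# A possible crontab entry to run it every day at 02:30 (paths are placeholders):
#   30 2 * * * /path/to/run_jobs.sh /path/to/jobs.txt >> /path/to/run_jobs.log 2>&1

# Choose the interpreter explicitly; point this at a virtualenv's python if needed.
PYTHON_BIN="${PYTHON_BIN:-python3}"

run_jobs() {
    local job_list="$1" job failures=0
    while IFS= read -r job || [ -n "$job" ]; do
        # Skip blank lines and comment lines, as the website checker does.
        case "$job" in ''|'#'*) continue ;; esac
        # Redirect stdin so a job cannot swallow the rest of the job list.
        if "$PYTHON_BIN" "$job" < /dev/null; then
            echo "OK: $job"
        else
            echo "FAILED: $job"
            failures=$((failures + 1))
        fi
    done < "$job_list"
    [ "$failures" -eq 0 ]   # non-zero exit status if any job failed
}

# Run only when a job list is given on the command line.
if [ "$#" -ge 1 ]; then
    run_jobs "$1"
fi
```

Each job runs as its own process, one after another, and a failure in one file does not stop the rest. Changing the schedule means editing the crontab, which is the kind of runtime flexibility the guide credits to APScheduler instead.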
