Shell scripts for various web-related automation tasks

For example, suppose you need to automate checking the status of a list of websites and reporting any that are down or return an error.

Prerequisites:

bash: Your default shell on Linux/macOS (or WSL on Windows).
curl: Usually pre-installed.
grep (optional but useful): For filtering text.
A text file with URLs: Let’s call it websites.txt.
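
The list file has no example in the text, so here is a minimal sketch of one. The URLs are placeholders (example.com, example.org, and a host under the reserved .invalid TLD), and count_urls is a hypothetical helper that uses grep to show how many entries would actually be checked once empty lines and # comments are skipped.

```shell
#!/bin/bash

# Write a small sample list. These URLs are placeholders, not part of the original post.
cat > websites.txt <<'EOF'
# Sites to monitor (lines starting with # are ignored)
https://example.com
https://example.org

# Reserved .invalid TLD: expected to fail DNS resolution
https://no-such-host.invalid
EOF

# Count the entries a checker would process:
# every line that is neither empty nor a # comment.
count_urls() {
    grep -cEv '^(#|$)' "$1"
}

echo "URLs to check: $(count_urls websites.txt)"
```

Including one entry that is expected to fail is a cheap way to confirm that failures really end up in the report.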

Here is the shell script that automates this task:

#!/bin/bash

# --- Configuration ---
WEBSITE_LIST="websites.txt"
TIMEOUT_SECONDS=5
USER_AGENT="WebsiteStatusChecker/1.0 (curl)" # Identify our requests
LOG_FILE="website_check_log_$(date +%Y%m%d_%H%M%S).txt"
DOWN_SITES_FILE="down_websites_$(date +%Y%m%d_%H%M%S).txt"

# --- Functions ---

# Function to log messages to console and file
log() {
    local message="$1"
    echo "$(date +%Y-%m-%d\ %H:%M:%S) - $message" | tee -a "$LOG_FILE"
}

# Function to check a single URL
check_url() {
    local url="$1"
    local status_code=""
    local error_message=""

    log "Checking $url..."

    # Use curl to get only the HTTP status code (-s silent, -o /dev/null no output, -w write-out)
    # --head: only fetch HTTP headers, not the entire page content - much faster.
    # --connect-timeout: max time for connection to happen.
    # --max-time: max time for the entire operation.
    status_code=$(curl -s -o /dev/null --head -w "%{http_code}" \
                       --connect-timeout "$TIMEOUT_SECONDS" --max-time "$TIMEOUT_SECONDS" \
                       -A "$USER_AGENT" "$url")

    # Capture curl's exit status immediately. 0 means success.
    # Any later command (including the [ test itself) overwrites $?.
    local curl_exit=$?
    if [ "$curl_exit" -ne 0 ]; then
        error_message="Curl failed for $url. Exit code: $curl_exit."
        log "  ERROR: $error_message"
        echo "$url - ERROR: $error_message" >> "$DOWN_SITES_FILE"
    elif [[ "$status_code" =~ ^[23][0-9]{2}$ ]]; then # Check for 2xx (Success) or 3xx (Redirection)
        log "  SUCCESS: $url returned HTTP $status_code"
    else
        log "  WARNING: $url returned HTTP $status_code (Possible issue)"
        echo "$url - HTTP $status_code" >> "$DOWN_SITES_FILE"
    fi
}

# --- Main Script Logic ---

log "--- Starting Website Status Check ---"
log "Reading URLs from: $WEBSITE_LIST"
log "Log file: $LOG_FILE"
log "Down sites file: $DOWN_SITES_FILE"
: > "$DOWN_SITES_FILE" # Truncate the down sites file; echo "" would write a newline, so the -s test in the final report would always pass

# Check if the website list file exists
if [ ! -f "$WEBSITE_LIST" ]; then
    log "Error: Website list file '$WEBSITE_LIST' not found."
    exit 1
fi

# Loop through each URL in the file
while IFS= read -r url || [[ -n "$url" ]]; do
    # Skip empty lines and lines starting with # (comments)
    if [[ -z "$url" || "$url" =~ ^\# ]]; then
        continue
    fi
    check_url "$url"
done < "$WEBSITE_LIST"

log "--- Website Status Check Complete ---"

# --- Final Report ---
if [ -s "$DOWN_SITES_FILE" ]; then # -s is true when the file exists and is not empty
    log "--- Summary of Websites with Issues ---"
    tee -a "$LOG_FILE" < "$DOWN_SITES_FILE" # Display and append to main log
    log "Please review '$DOWN_SITES_FILE' for details."
else
    log "All checked websites returned successful status codes."
fi
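
The decision rule inside check_url (2xx and 3xx count as up, anything else is flagged) can be tried without touching the network. The sketch below is illustrative only: classify_status is a hypothetical helper that is not part of the script, and it expresses the same rule with a case pattern instead of the regex. The code 000 is what curl's %{http_code} prints when no HTTP response was received.

```shell
#!/bin/bash

# Hypothetical helper: apply the script's rule to a status code.
# Three digits starting with 2 or 3 -> "up"; everything else -> "issue".
classify_status() {
    local code="$1"
    case "$code" in
        [23][0-9][0-9]) echo "up" ;;
        *)              echo "issue" ;;
    esac
}

# Try a few representative codes.
for code in 200 301 404 503 000; do
    printf '%s -> %s\n' "$code" "$(classify_status "$code")"
done
```

Keeping the rule in one small function like this would also make it easy to change, for example to treat 3xx as an issue.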

How to Run:

Save the script: Copy the code above and save it as check_websites.sh (or any .sh name).
Make it executable: Open your terminal and run: chmod +x check_websites.sh
Create the URL list: Create a file named websites.txt in the same directory as your script and add your URLs, one per line (empty lines and lines starting with # are skipped).
Run the script: ./check_websites.sh

This example raises a practical question: I have a list of Python files to run at designated times. Should I write a shell script like this to automate it, or use APScheduler?

Here’s a quick decision guide:

Feature	Shell Script + Cron/Systemd Timer	APScheduler (or similar Python library)
Control	System-level	Application-level
Dynamic Jobs	Hard to change at runtime	Easy to add/remove/modify at runtime
App Downtime	Jobs still run if app is down	Jobs stop if app is down (unless app is managed by systemd)
Complexity	Simpler for basic schedules	More complex setup, but powerful for complex logic
Python Env	Must manage explicitly in script	Inherits app’s environment
Primary Use	Independent, fixed, external tasks	Dependent, dynamic, internal application tasks
Concurrency	Each job is a separate process (easy parallelism)	Configurable (threads, processes, async)
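
To make the shell-script column concrete, here is a rough sketch of a wrapper that cron could call. Everything in it is an assumption for illustration: the file name run_python_jobs.sh, the list file python_jobs.txt, the PYTHON_BIN variable, and the paths in the crontab comment. It also shows the "Python Env" row in practice, since the interpreter has to be chosen explicitly.

```shell
#!/bin/bash
# run_python_jobs.sh (hypothetical name): run every Python file listed in a text file.

# Assumption: python3 is on PATH; override with e.g. PYTHON_BIN=/path/to/venv/bin/python.
PYTHON_BIN="${PYTHON_BIN:-python3}"

run_jobs() {
    local list_file="$1"
    local failures=0
    local script

    while IFS= read -r script || [ -n "$script" ]; do
        # Skip empty lines and # comments, as the website checker does.
        case "$script" in
            ''|'#'*) continue ;;
        esac

        # </dev/null stops a job from reading the rest of the list via stdin.
        if "$PYTHON_BIN" "$script" < /dev/null; then
            echo "OK: $script"
        else
            echo "FAILED: $script"
            failures=$((failures + 1))
        fi
    done < "$list_file"

    # Succeed only if every job succeeded.
    [ "$failures" -eq 0 ]
}

# Run only when a list file is given, e.g. ./run_python_jobs.sh python_jobs.txt
if [ "$#" -ge 1 ]; then
    run_jobs "$1"
fi

# Example crontab entry (assumed paths), every day at 02:30:
# 30 2 * * * /path/to/run_python_jobs.sh /path/to/python_jobs.txt >> /path/to/jobs.log 2>&1
```

The schedule lives in the crontab rather than in the script, which is why changing it at runtime is harder than with an in-application scheduler.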

So for this use case, APScheduler is the better fit.

Naixian Zhang
