Initial commit: Backup Restore Orchestrator

Windmill flow + restore.sh for the automated daily backup verification system.
Can be synced directly to Windmill via `wmill sync push`.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Sebastian Serfling
2026-04-29 21:15:42 +02:00
commit 467eb35225
28 changed files with 2632 additions and 0 deletions
@@ -0,0 +1,236 @@
# Backup Restore Orchestrator
Automated daily backup restore test system built on [Windmill](https://windmill.dev).
Every day at **00:11**, all PBS backups are restored on several restore servers, checked for bootability, stored as encrypted 7z archives, and transferred to a central backup server. Results are stored in a MySQL database and reported via Nextcloud Talk.
---
## Repository structure
```
├── f/Backup/                                          ← Windmill workspace (syncable)
│   ├── backup_restore_orchestrator__flow/             ← Main flow (orchestrator)
│   │   ├── flow.yaml
│   │   ├── aktive_datastores_aus_db_holen.my.sql      ─ Step F
│   │   ├── job_initialisieren_&_backup-queue_...py    ─ Step A
│   │   ├── alle_freien_restore-server_holen.py        ─ Step B
│   │   ├── ssh-credentials_fuer_alle_...py            ─ Step G
│   │   ├── script_deployen_&_pbs-datastores_...py     ─ Step C
│   │   ├── alte_restore-ordner_...py                  ─ Step H
│   │   ├── ersten_restore_pro_server_starten.py       ─ Step D
│   │   └── webhook_verarbeiten_&_...py                ─ Step E
│   ├── backup_restore_report___nextcloud_talk__flow/  ← Daily report (08:00)
│   ├── folder.meta.yaml
│   ├── nextcloud_talk_room.variable.yaml
│   ├── nextcloud_talk_url.variable.yaml
│   └── restore_version.variable.yaml
└── restore-worker/
    └── restore.sh                                     ← Restore script (on the restore servers)
```
---
## Windmill Sync
```bash
# One-time: install the Windmill CLI
npm install -g windmill-cli
# Configure the workspace
wmill workspace add <workspace-id> https://windmill.stines.de
# Push from the repo into Windmill
wmill sync push --workspace <workspace-id>
# Pull from Windmill into the repo
wmill sync pull --workspace <workspace-id>
```
> **Note:** Secrets (`skipSecrets: true`) are not synced. Variables holding sensitive values must be set manually in Windmill after the push.
---
## System architecture
```
Windmill (schedule 00:11)
├─► PBS backup.stines.de:8007                          ← backup source
├─► STI-PROX01 (max 200 GB) ── restore.sh ──► 7z ──► Rsync ──► backup server
├─► ITD-PROX01 (max 100 GB) ── restore.sh ──► 7z ──► Rsync ──► backup server
└─► STI-BAC01 (min 250 GB) ── restore.sh ──► 7z (local, no rsync)
        Webhook → Windmill Step E
                → start next backup
```
### Restore servers
| Hostname | max_backup_size_gb | min_backup_size_gb | Rsync |
|------------|--------------------|--------------------|-------|
| STI-PROX01 | 200 | NULL | yes |
| ITD-PROX01 | 100 | NULL | yes |
| STI-BAC01 | NULL | 250 | no (mounted locally) |
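
The size limits in the table act as a filter when a backup is assigned to a server. The following is a minimal sketch of that filter, assuming sizes are compared in bytes and `NULL` means "no limit" (mapped to `None`). The names `fits_server`, `eligible_servers`, and `SERVERS` are illustrative and do not appear in the flow.

```python
GIB = 1024 ** 3

# Limits copied from the table above; None stands for NULL (no limit).
SERVERS = {
    "STI-PROX01": {"max_gb": 200, "min_gb": None},
    "ITD-PROX01": {"max_gb": 100, "min_gb": None},
    "STI-BAC01": {"max_gb": None, "min_gb": 250},
}


def fits_server(size_bytes, max_gb=None, min_gb=None):
    """Return True if a backup of size_bytes lies inside the server's size class."""
    if max_gb is not None and size_bytes > max_gb * GIB:
        return False
    if min_gb is not None and size_bytes < min_gb * GIB:
        return False
    return True


def eligible_servers(size_bytes):
    """List the hostnames whose size class accepts the backup."""
    return [
        name for name, lim in SERVERS.items()
        if fits_server(size_bytes, lim["max_gb"], lim["min_gb"])
    ]
```

With these limits, a backup between 200 GB and 250 GB matches no server, so such a backup would stay in the queue.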
---
## Flow sequence: F → A → B → G → C → H → D → E
| Step | ID | Language | Function |
|------|----|----------|----------|
| F | `f` | MySQL | Fetch active datastores from the DB |
| A | `a` | Python | Create job, fetch PBS snapshots, build queue (largest first) |
| B | `b` | Python | Fetch free restore servers |
| G | `g` | Python | SSH credentials from Bitwarden |
| C | `c` | Python | Deploy script, register PBS storages, store session |
| H | `h` | Python | Delete old backup folders on the backup server |
| D | `d` | Python | Start the first restore per server |
| E | `e` | Python | Process webhook, start next restore |
### Two modes
**Schedule path** (daily at 00:11):
Steps F → A → B → G → C → H → D run sequentially. Step D starts the first restore per server via SSH (non-blocking) and returns `waiting_for_webhook`.
**Webhook path** (after every `restore.sh` run):
The flow input contains `job_uuid`, so Step A detects the webhook call. Steps B–H are skipped. Step E processes the result, writes it to the DB, and immediately starts the next matching backup on the same server.
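
The mode switch comes down to a single check on the flow input. The sketch below mirrors the `trigger_type` expression from `flow.yaml` in Python; `detect_mode` is a hypothetical helper name used only for illustration.

```python
def detect_mode(flow_input: dict) -> str:
    """Return 'webhook' when the input carries a job_uuid, else 'schedule'."""
    return "webhook" if flow_input.get("job_uuid") else "schedule"


def passthrough_step(prev: dict) -> dict:
    """Shape of a schedule-only step: in webhook mode it returns its input unchanged."""
    if prev.get("mode") == "webhook":
        return prev
    return {**prev, "did_work": True}
```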
---
## restore.sh: sequence
The script runs on the restore servers at `/opt/windmill-restore/restore.sh`.
Steps D and E start it via SSH + `nohup` (non-blocking).
```
[0]   Fetch the 7z password from the PBS (password_7z.txt via rsync)
[1]   Space check: free_space >= backup_size * 1.5
[2]   Determine IDs: original from the backup path, restore ID from 1000 upward
[2.5] Check whether the ZIP already exists → on a hit: success webhook + exit
[3]   qmrestore (VM) / pct restore (CT)
[4]   Determine IMAGE_DIR dynamically from the PVE storage path
[5]   Check images (empty → failed)
[6]   Prepare: remove network cards, enable the QEMU agent
[7]   Start VM/CT, check bootability (agent ping 120s / pct exec)
[8]   VM: qm shutdown --timeout 120, CT: pct stop
[9]   Save config (qemu-server.conf / lxc.conf)
[10]  Create encrypted 7z archive (mx=0, mhe=on)
[11]  Rsync to the backup server (3 attempts + size check)
      OR store locally (STI-BAC01: SKIP_RSYNC=1)
[12]  Destroy VM/CT, delete ZIP (except on STI-BAC01)
[13]  Webhook → Windmill Step E
```
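
Step [1] is a headroom check before any data is restored. The real check lives in `restore.sh`, which is not shown here, so the following is only a Python sketch of the same rule. The names `free_bytes` and `has_enough_space` are hypothetical, and reading free space via `shutil.disk_usage` is an assumption.

```python
import shutil


def free_bytes(mount_point: str) -> int:
    """Free space in bytes on the filesystem that holds mount_point."""
    return shutil.disk_usage(mount_point).free


def has_enough_space(free: int, backup_size: int, factor: float = 1.5) -> bool:
    """Apply the rule free_space >= backup_size * 1.5 from step [1]."""
    return free >= backup_size * factor
```

A call such as `has_enough_space(free_bytes("/mnt/BTRFS"), backup_size)` would express the check for a restore mount like the one listed in the server table.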
### Script deployment
Step C deploys the script to all restore servers automatically. To roll out a new version:
1. Update `restore.sh` in this repo (folder `restore-worker/`)
2. Push to Gitea: `http://172.17.1.251:8080/sebastian.serfling/BackupScript.git`
3. Increase the Windmill variable `f/Backup/restore_version` (e.g. `1.0.27`)
4. On the next flow run, Step C detects the version difference and deploys automatically
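
The version check in Step C can be summarised as follows. This is a sketch modelled on the `needs_deploy` expression in the Step C script; the standalone function and its parameter names are illustrative.

```python
def needs_deploy(installed_version: str, wanted_version: str,
                 script_deployed: bool) -> bool:
    """Deploy when the versions differ or the server was never marked as deployed."""
    return installed_version.strip() != wanted_version.strip() or not script_deployed
```

The comparison is a plain inequality, not an ordering, so lowering `restore_version` triggers a redeploy just as raising it does.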
---
## Windmill variables
| Variable | Content |
|----------|---------|
| `f/Backup/pbs_variable` | JSON: host, port, user, password, fingerprint |
| `f/Backup/mysql_config` | JSON: MySQL connection details |
| `f/Backup/bitwarden_api_login` | JSON: bw_clientid, bw_clientsecret, bw_masterpassword |
| `f/Backup/gitea_token` | Gitea access token |
| `f/Backup/restore_version` | Current script version, e.g. `1.0.26` |
| `f/Backup/backup_server_host` | Hostname/IP of the backup server |
| `f/Backup/backup_server_ssh_password` | SSH password of the backup server |
| `f/Backup/windmill_webhook_url` | Webhook URL for restore.sh callbacks |
| `f/Backup/windmill_webhook_token` | Bearer token |
| `f/Backup/nextcloud_talk_url` | https://nextcloud.stines.de |
| `f/Backup/nextcloud_talk_room` | Room token |
| `f/Backup/nextcloud_talk_user` | User name |
| `f/Backup/nextcloud_talk_password` | App password |
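
As an illustration of how a JSON variable such as `f/Backup/pbs_variable` is consumed, the sketch below parses it and builds a repository string for `proxmox-backup-client`. It assumes the documented PBS repository syntax `[[username@]server[:port]:]datastore`. The helper name `pbs_repository` and the sample values in the usage note are hypothetical.

```python
import json

DEFAULT_PBS_PORT = 8007


def pbs_repository(pbs_variable_json: str, datastore: str) -> str:
    """Build a PBS repository string from the JSON stored in the Windmill variable."""
    pbs = json.loads(pbs_variable_json)
    port = int(pbs.get("port", DEFAULT_PBS_PORT))
    # The port is only spelled out when it differs from the PBS default.
    server = pbs["host"] if port == DEFAULT_PBS_PORT else f"{pbs['host']}:{port}"
    return f"{pbs['user']}@{server}:{datastore}"
```

For example, a variable holding `{"host": "backup.stines.de", "port": 8007, "user": "root@pam", ...}` would yield `root@pam@backup.stines.de:<datastore>`.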
---
## Database schema (MySQL: `Kunden`)
### `bronze.restore.jobs`
| Column | Type | Description |
|--------|------|-------------|
| job_uuid | VARCHAR(64) PK | Unique job ID |
| started_at | DATETIME | Start time |
| finished_at | DATETIME | End time |
| status | VARCHAR(20) | running / completed / failed |
| total_backups | INT | Number of backups in the queue |
| restored_count | INT | Completed successfully |
| failed_count | INT | Failed |
### `bronze.restore.result`
| Column | Type | Description |
|--------|------|-------------|
| job_uuid | VARCHAR(64) | Reference to the job |
| client_name | VARCHAR(128) | e.g. `tnp-Invest-GmbH:vm/100` |
| backup_path | VARCHAR(256) | Full path with timestamp |
| vm_name | VARCHAR(128) | Hostname of the VM/CT |
| restore_server | VARCHAR(128) | Hostname of the restore server |
| status | VARCHAR(20) | restoring / done / failed |
| restore_duration_sec | INT | Restore duration in seconds |
| zip_size_bytes | BIGINT | Size of the 7z archive |
| rsync_ok | TINYINT | Rsync succeeded |
| qm_agent_ok | TINYINT | Boot check succeeded |
| error_message | TEXT | Error message if failed |
### `bronze.backup.queue`
| Column | Type | Description |
|--------|------|-------------|
| job_uuid | VARCHAR(64) | Reference to the job |
| client_name | VARCHAR(128) | Backup identifier |
| backup_path | VARCHAR(256) | Full path with timestamp |
| backup_size_bytes | BIGINT | Compressed PBS size |
| priority | INT | 0 = largest (highest priority) |
| rsync_target | VARCHAR(256) | Target path on the backup server |
| pbs_storage_id | VARCHAR(128) | e.g. `pbs-firma-gmbh` |
| status | VARCHAR(20) | queued / assigned / done / failed / obsolete |
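
The `priority` column follows from how Step A builds the queue: it keeps the newest snapshot per backup group, sorts by size in descending order, and uses the list index as the priority. The sketch below reproduces that logic on plain dictionaries. The function name `build_queue` and the flattened `datastore` key are illustrative; the keys `backup-type`, `backup-id`, `backup-time`, and `size` are the ones the Step A script reads from the `proxmox-backup-client` JSON output.

```python
def build_queue(snapshots):
    """Keep the newest snapshot per group and rank the groups largest-first."""
    latest = {}
    for snap in snapshots:
        key = (snap["datastore"], snap["backup-type"], snap["backup-id"])
        if key not in latest or snap["backup-time"] > latest[key]["backup-time"]:
            latest[key] = snap
    ordered = sorted(latest.values(), key=lambda s: s.get("size", 0), reverse=True)
    return [
        {
            "client_name": f"{s['datastore']}:{s['backup-type']}/{s['backup-id']}",
            "backup_size_bytes": s.get("size", 0),
            "priority": idx,  # 0 = largest backup
        }
        for idx, s in enumerate(ordered)
    ]
```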
### `bronze.restore.server`
| Column | Type | Description |
|--------|------|-------------|
| hostname | VARCHAR(128) PK | Server hostname |
| ip | VARCHAR(45) | IP address |
| is_active | TINYINT | 1 = active |
| free_space_gb | INT | Free space (updated regularly) |
| restore_mount | VARCHAR(128) | e.g. `/mnt/BTRFS` |
| restore_path | VARCHAR(128) | PVE storage name |
| current_job_uuid | VARCHAR(64) | NULL = free |
| max_backup_size_gb | INT | NULL = no limit |
| min_backup_size_gb | INT | NULL = no limit |
| script_deployed | TINYINT | Script present |
| script_version | VARCHAR(20) | Current script version |
---
## SQL reset when problems occur
```sql
-- Full reset for a new test run
UPDATE Kunden.`bronze.restore.jobs` SET status='failed', finished_at=NOW() WHERE status='running';
UPDATE Kunden.`bronze.restore.server` SET current_job_uuid=NULL;
DELETE FROM Kunden.`bronze.restore.session`;
UPDATE Kunden.`bronze.backup.queue` SET status='obsolete' WHERE status IN ('queued','assigned');
-- Release a single server
UPDATE Kunden.`bronze.restore.server` SET current_job_uuid=NULL WHERE hostname='ITD-PROX01';
```
---
## Known issues
| Problem | Cause | Fix |
|---------|-------|-----|
| Wrong date in logs | Server set to UTC instead of CET | `timedatectl set-timezone Europe/Berlin` |
| `/var/tmp` full (ITD-PROX01) | Proxmox writes tmp to the root partition | `mount --bind /mnt/BTRFS/tmp /var/tmp` |
| Server stays occupied after the last backup | `current_job_uuid` not reset | ``UPDATE Kunden.`bronze.restore.server` SET current_job_uuid=NULL WHERE hostname=...`` |
@@ -0,0 +1 @@
SELECT datastore FROM Kunden.`bronze.backup.server.datastore` AS kbbsd WHERE kbbsd.restore = 1
@@ -0,0 +1,17 @@
# py: 3.12
anyio==4.12.1
bcrypt==5.0.0
certifi==2026.2.25
cffi==2.0.0
cryptography==46.0.5
h11==0.16.0
httpcore==1.0.9
httpx==0.28.1
idna==3.11
invoke==2.2.1
mysql-connector-python==9.6.0
paramiko==4.0.0
pycparser==3.0
pynacl==1.6.2
typing-extensions==4.15.0
wmill==1.657.2
@@ -0,0 +1,40 @@
import wmill, mysql.connector, json


def main(prev: dict):
    # Webhook runs skip this step and pass the input through unchanged.
    if prev.get("mode") == "webhook":
        return prev
    db_cfg = json.loads(wmill.get_variable("f/Backup/mysql_config"))
    conn = mysql.connector.connect(**db_cfg)
    cur = conn.cursor(dictionary=True)
    # FIX: added max_backup_size_gb
    cur.execute("""
        SELECT hostname, ip, free_space_gb,
               script_deployed, script_version,
               restore_mount, restore_path,
               max_backup_size_gb,
               min_backup_size_gb
        FROM Kunden.`bronze.restore.server`
        WHERE is_active = 1 AND current_job_uuid IS NULL
        ORDER BY free_space_gb DESC
    """)
    servers = cur.fetchall()
    cur.close(); conn.close()
    if not servers:
        raise Exception("No free restore server available!")
    # Every server needs both a mount point and a PVE storage name.
    for s in servers:
        if not s.get("restore_mount"):
            raise Exception(
                f"restore_mount for '{s['hostname']}' is not configured!"
            )
        if not s.get("restore_path"):
            raise Exception(
                f"restore_path for '{s['hostname']}' is not configured!"
            )
    print(f"{len(servers)} free restore servers: "
          f"{[s['hostname'] for s in servers]}")
    return {**prev, "target_servers": servers}
@@ -0,0 +1,17 @@
# py: 3.12
anyio==4.13.0
bcrypt==5.0.0
certifi==2026.2.25
cffi==2.0.0
cryptography==46.0.6
h11==0.16.0
httpcore==1.0.9
httpx==0.28.1
idna==3.11
invoke==2.2.1
mysql-connector-python==9.6.0
paramiko==4.0.0
pycparser==3.0
pynacl==1.6.2
typing-extensions==4.15.0
wmill==1.666.0
@@ -0,0 +1,83 @@
import wmill, json, paramiko, mysql.connector, re, io, time


def main(prev: dict):
    if prev.get("mode") == "webhook":
        return prev
    db_cfg = json.loads(wmill.get_variable("f/Backup/mysql_config"))
    conn = mysql.connector.connect(**db_cfg)
    cur = conn.cursor(dictionary=True)
    cur.execute("""
        SELECT datastore, rsync_target, retention_days
        FROM Kunden.`bronze.backup.datastore.config`
        WHERE rsync_target IS NOT NULL AND rsync_target != ''
    """)
    configs = cur.fetchall()
    cur.close(); conn.close()
    if not configs:
        print("No datastore configs - cleanup skipped.")
        return prev
    backup_server = wmill.get_variable("f/Backup/backup_server_host")
    # The variable may hold a bare host/IP or a URL; extract the IP from a URL.
    ip_match = re.search(r'https?://([0-9.]+)', backup_server)
    ip = ip_match.group(1) if ip_match else backup_server
    bw_pass = wmill.get_variable("f/Backup/backup_server_ssh_password")
    ssh = paramiko.SSHClient()
    ssh.set_missing_host_key_policy(paramiko.AutoAddPolicy())
    ssh.connect(ip, username="root", password=bw_pass)
    # Build the cleanup script.
    # Folder names (YYYY-MM-DD) are compared against a cutoff date instead of using -mtime.
    script_lines = ["#!/bin/bash", "set -euo pipefail", ""]
    for cfg in configs:
        datastore = cfg["datastore"]
        target = cfg["rsync_target"]
        retention_days = cfg.get("retention_days") or 7
        script_lines.append(
            f"echo '[{datastore}] Cleanup {target} (folders older than {retention_days} days)'"
        )
        script_lines.append(f"if [ -d '{target}' ]; then")
        script_lines.append(f"  cutoff=$(date -d '{retention_days} days ago' +%Y-%m-%d)")
        script_lines.append(f"  find '{target}' -maxdepth 1 -type d -name '????-??-??' | while read -r dir; do")
        script_lines.append("    folder_date=$(basename \"$dir\")")
        script_lines.append("    if [[ \"$folder_date\" < \"$cutoff\" ]]; then")
        script_lines.append("      echo \"  Deleting: $dir (date: $folder_date < cutoff: $cutoff)\"")
        script_lines.append("      rm -rf \"$dir\"")
        script_lines.append("    else")
        script_lines.append("      echo \"  Keeping: $dir\"")
        script_lines.append("    fi")
        script_lines.append("  done")
        script_lines.append("else")
        script_lines.append(f"  echo '  WARNING: directory not found: {target}'")
        script_lines.append("fi")
        script_lines.append("")
    script_lines.append("echo 'Cleanup finished.'")
    cleanup_script = "\n".join(script_lines)
    # Upload the script via SFTP instead of a heredoc.
    sftp = ssh.open_sftp()
    sftp.putfo(
        io.BytesIO(cleanup_script.encode()),
        "/tmp/cleanup_restore.sh"
    )
    sftp.close()
    ssh.exec_command("chmod +x /tmp/cleanup_restore.sh")
    time.sleep(1)
    # Run in the background so the flow does not wait for the cleanup.
    ssh.exec_command(
        "nohup /tmp/cleanup_restore.sh > /tmp/cleanup_restore.log 2>&1 &"
    )
    ssh.close()
    print(f"Cleanup started in the background on {ip}.")
    print("Log: /tmp/cleanup_restore.log")
    return {**prev, "cleanup": "started_background"}
@@ -0,0 +1,17 @@
# py: 3.12
anyio==4.13.0
bcrypt==5.0.0
certifi==2026.2.25
cffi==2.0.0
cryptography==46.0.5
h11==0.16.0
httpcore==1.0.9
httpx==0.28.1
idna==3.11
invoke==2.2.1
mysql-connector-python==9.6.0
paramiko==4.0.0
pycparser==3.0
pynacl==1.6.2
typing-extensions==4.15.0
wmill==1.664.0
@@ -0,0 +1,158 @@
import wmill, json, paramiko, mysql.connector, shlex


def start_restore(server, backup, job_uuid, webhook_url, webhook_tok):
    """Start restore.sh on a server without blocking."""
    ssh = paramiko.SSHClient()
    ssh.set_missing_host_key_policy(paramiko.AutoAddPolicy())
    ssh.connect(
        server["ip"],
        username=server["ssh_creds"]["user"],
        password=server["ssh_creds"]["password"]
    )
    safe_client = backup["client_name"].replace("/", "_").replace(":", "_")
    srv_hostname = server["hostname"]
    bk_size = backup.get("backup_size_bytes") or 0
    log_path = f"/opt/windmill-restore/logs/{safe_client}.log"
    # Quote every value so it reaches the remote shell as exactly one argument.
    q = shlex.quote
    cmd = (
        f"nohup /opt/windmill-restore/restore.sh"
        f" --job-uuid {q(job_uuid)}"
        f" --backup-path {q(backup['backup_path'])}"
        f" --client {q(backup['client_name'])}"
        f" --restore-mount {q(server['restore_mount'])}"
        f" --restore-path {q(server['restore_path'])}"
        f" --rsync-target {q(backup['rsync_target'])}"
        f" --pbs-storage {q(backup['pbs_storage_id'])}"
        f" --webhook-url {q(webhook_url)}"
        f" --webhook-token {q(webhook_tok)}"
        f" --server-hostname {q(srv_hostname)}"
        f" --backup-size {q(str(bk_size))}"
        f" > {q(log_path)} 2>&1 &"
    )
    ssh.exec_command(cmd)
    ssh.close()


def find_backup_for_server(server, backups):
    """
    Find the first backup in the list that fits the server's size class.
    max_backup_size_gb = NULL -> no upper limit
    min_backup_size_gb = NULL -> no lower limit
    Returns (index, backup), or (None, None) if nothing fits.
    """
    max_gb = server.get("max_backup_size_gb")
    min_gb = server.get("min_backup_size_gb")
    max_bytes = max_gb * 1024 * 1024 * 1024 if max_gb is not None else None
    min_bytes = min_gb * 1024 * 1024 * 1024 if min_gb is not None else None
    for i, backup in enumerate(backups):
        size = backup.get("backup_size_bytes") or 0
        if max_bytes is not None and size > max_bytes:
            continue
        if min_bytes is not None and size < min_bytes:
            continue
        return i, backup
    return None, None


def main(prev: dict):
    if prev.get("mode") == "webhook":
        return prev
    servers = prev.get("target_servers", [])
    backups = list(prev.get("backups", []))
    job_uuid = prev["job_uuid"]
    if not backups:
        return {**prev, "status": "no_backups"}
    db_cfg = json.loads(wmill.get_variable("f/Backup/mysql_config"))
    conn = mysql.connector.connect(**db_cfg)
    cur = conn.cursor()
    webhook_url = wmill.get_variable("f/Backup/windmill_webhook_url")
    webhook_tok = wmill.get_variable("f/Backup/windmill_webhook_token")
    # Servers with the largest (or no) upper limit pick first.
    servers_sorted = sorted(
        servers,
        key=lambda s: s.get("max_backup_size_gb") or 999999,
        reverse=True
    )
    started = []
    for server in servers_sorted:
        if not backups:
            break
        idx, backup = find_backup_for_server(server, backups)
        if backup is None:
            print(f"No matching backup for '{server['hostname']}' "
                  f"(max: {server.get('max_backup_size_gb')} GB)")
            continue
        backups.pop(idx)

        def fail_backup(msg, bk=backup, sv=server):
            """Record a failed result row and bump the job's failed counter."""
            cur.execute("""
                INSERT INTO Kunden.`bronze.restore.result`
                    (job_uuid, client_name, backup_path,
                     backup_size_bytes, restore_server, status, error_message)
                VALUES (%s, %s, %s, %s, %s, 'failed', %s)
            """, (job_uuid, bk["client_name"], bk["backup_path"],
                  bk.get("backup_size_bytes", 0), sv["hostname"], msg))
            cur.execute("""
                UPDATE Kunden.`bronze.restore.jobs`
                SET failed_count=failed_count+1 WHERE job_uuid=%s
            """, (job_uuid,))
            conn.commit()

        if not backup.get("rsync_target"):
            fail_backup("rsync_target missing"); continue
        if not backup.get("pbs_storage_id"):
            fail_backup("pbs_storage_id missing"); continue
        if not server.get("restore_mount"):
            fail_backup("restore_mount missing"); continue
        if not server.get("restore_path"):
            fail_backup("restore_path missing"); continue
        cur.execute("""
            INSERT INTO Kunden.`bronze.restore.result`
                (job_uuid, client_name, backup_path,
                 backup_size_bytes, restore_server, status, started_at)
            VALUES (%s, %s, %s, %s, %s, 'restoring', NOW())
        """, (job_uuid, backup["client_name"], backup["backup_path"],
              backup.get("backup_size_bytes", 0), server["hostname"]))
        cur.execute("""
            UPDATE Kunden.`bronze.restore.server`
            SET current_job_uuid=%s WHERE hostname=%s
        """, (job_uuid, server["hostname"]))
        # Match the exact backup_path: a LIKE on client_name + '%' would also
        # hit e.g. vm/100 when vm/10 is being assigned.
        cur.execute("""
            UPDATE Kunden.`bronze.backup.queue`
            SET status='assigned'
            WHERE job_uuid=%s AND backup_path=%s
        """, (job_uuid, backup["backup_path"]))
        conn.commit()
        start_restore(server, backup, job_uuid, webhook_url, webhook_tok)
        size_gb = (backup.get("backup_size_bytes") or 0) / 1024 / 1024 / 1024
        print(f"Restore started: {backup['client_name']} "
              f"({size_gb:.1f} GB) on {server['hostname']} "
              f"(max: {server.get('max_backup_size_gb')} GB)")
        started.append({
            "client": backup["client_name"],
            "server": server["hostname"],
        })
    cur.close(); conn.close()
    return {
        **prev,
        "status": "restore_started",
        "started": started,
        "backups": backups,
    }
@@ -0,0 +1,121 @@
summary: Backup Restore Orchestrator
description: |
  Starts daily at 00:11. Fetches the backup list directly via
  proxmox-backup-client, writes the queue to the DB sorted by size,
  registers PBS datastores on all free restore servers, and
  starts restores in parallel on several servers via webhook chaining.
value:
  modules:
    - id: f
      summary: Aktive Datastores aus DB holen
      value:
        type: rawscript
        content: '!inline aktive_datastores_aus_db_holen.my.sql'
        input_transforms:
          database:
            type: static
            value: $res:u/sebastianserfling/fascinating_mysql
        lock: ''
        language: mysql
    - id: a
      summary: Job initialisieren & Backup-Queue aus PBS aufbauen
      value:
        type: rawscript
        content: '!inline job_initialisieren_&_backup-queue_aus_pbs_aufbauen.py'
        input_transforms:
          datastores:
            type: javascript
            expr: results.f
          trigger_type:
            type: javascript
            expr: "flow_input.job_uuid ? 'webhook' : 'schedule'"
          webhook_data:
            type: javascript
            expr: 'flow_input.job_uuid ? flow_input : {}'
        lock: '!inline job_initialisieren_&_backup-queue_aus_pbs_aufbauen.lock'
        language: python3
    - id: b
      summary: Alle freien Restore-Server holen
      value:
        type: rawscript
        content: '!inline alle_freien_restore-server_holen.py'
        input_transforms:
          prev:
            type: javascript
            expr: results.a
        lock: '!inline alle_freien_restore-server_holen.lock'
        language: python3
    - id: g
      summary: SSH-Credentials fuer alle Restore-Server aus Bitwarden
      value:
        type: rawscript
        content: '!inline ssh-credentials_fuer_alle_restore-server_aus_bitwarden.py'
        input_transforms:
          bw_url:
            type: static
            value: https://bitwarden.stines.de
          prev:
            type: javascript
            expr: results.b
        lock: '!inline ssh-credentials_fuer_alle_restore-server_aus_bitwarden.lock'
        language: python3
    - id: c
      summary: Script deployen & PBS-Datastores auf allen Servern registrieren
      value:
        type: rawscript
        content: '!inline
          script_deployen_&_pbs-datastores_auf_allen_servern_registrieren.py'
        input_transforms:
          bw_result:
            type: javascript
            expr: results.g
          datastores:
            type: javascript
            expr: results.f
          prev:
            type: javascript
            expr: results.g
        lock: '!inline
          script_deployen_&_pbs-datastores_auf_allen_servern_registrieren.lock'
        language: python3
    - id: h
      summary: Alte Restore-Ordner auf Backup-Server loeschen
      value:
        type: rawscript
        content: '!inline alte_restore-ordner_auf_backup-server_loeschen.py'
        input_transforms:
          prev:
            type: javascript
            expr: results.c
        lock: '!inline alte_restore-ordner_auf_backup-server_loeschen.lock'
        language: python3
    - id: d
      summary: Ersten Restore pro Server starten
      value:
        type: rawscript
        content: '!inline ersten_restore_pro_server_starten.py'
        input_transforms:
          prev:
            type: javascript
            expr: results.h
        lock: '!inline ersten_restore_pro_server_starten.lock'
        language: python3
      continue_on_error: false
    - id: e
      summary: Webhook verarbeiten & naechsten Restore auf demselben Server starten
      value:
        type: rawscript
        content: '!inline
          webhook_verarbeiten_&_naechsten_restore_auf_demselben_server_starten.py'
        input_transforms:
          from_init:
            type: javascript
            expr: results.a
        lock: '!inline
          webhook_verarbeiten_&_naechsten_restore_auf_demselben_server_starten.lock'
        language: python3
schema:
  $schema: https://json-schema.org/draft/2020-12/schema
  type: object
  order: []
  properties: {}
@@ -0,0 +1,10 @@
# py: 3.12
anyio==4.12.1
certifi==2026.2.25
h11==0.16.0
httpcore==1.0.9
httpx==0.28.1
idna==3.11
mysql-connector-python==9.6.0
typing-extensions==4.15.0
wmill==1.662.0
@@ -0,0 +1,161 @@
import wmill, mysql.connector, json, uuid, subprocess, sys, os
from datetime import datetime, timezone


def main(
    trigger_type: str = "schedule",
    webhook_data: dict = {},
    datastores: list = [],
):
    # Detect a webhook call: trigger_type is "webhook" OR job_uuid is in the payload
    if trigger_type == "webhook" or webhook_data.get("job_uuid"):
        return {"mode": "webhook", "data": webhook_data}
    pbs = json.loads(wmill.get_variable("f/Backup/pbs_variable"))
    port = int(pbs.get("port", 8007))
    env = os.environ.copy()
    env["PBS_PASSWORD"] = pbs["password"]
    if pbs.get("fingerprint"):
        env["PBS_FINGERPRINT"] = pbs["fingerprint"]
    # Install the PBS client on the worker if it is missing.
    if subprocess.run(["which", "proxmox-backup-client"],
                      capture_output=True).returncode != 0:
        print("Installing proxmox-backup-client...", file=sys.stderr)
        subprocess.run(["bash", "-c", (
            "echo 'deb http://download.proxmox.com/debian/pbs bookworm pbs-no-subscription'"
            " > /etc/apt/sources.list.d/pbs-client.list && "
            "wget -qO /etc/apt/trusted.gpg.d/proxmox-release-bookworm.gpg "
            " https://enterprise.proxmox.com/debian/proxmox-release-bookworm.gpg && "
            "apt-get update -qq && apt-get install -y proxmox-backup-client"
        )], check=True)
    db_cfg = json.loads(wmill.get_variable("f/Backup/mysql_config"))
    conn = mysql.connector.connect(**db_cfg)
    cur = conn.cursor(dictionary=True)
    cur.execute("""
        SELECT datastore, rsync_target, pbs_storage_id
        FROM Kunden.`bronze.backup.datastore.config`
    """)
    ds_config = {row["datastore"]: row for row in cur.fetchall()}
    all_snaps = []
    for row in datastores:
        datastore = row["datastore"]
        # PBS repository syntax: [[user@]server[:port]:]datastore
        # ("!" is reserved for API token names, not for the port).
        if port != 8007:
            repo = f"{pbs['user']}@{pbs['host']}:{port}:{datastore}"
        else:
            repo = f"{pbs['user']}@{pbs['host']}:{datastore}"
        env["PBS_REPOSITORY"] = repo
        print(f"Fetching snapshots: {datastore}...", file=sys.stderr)
        result = subprocess.run(
            ["proxmox-backup-client", "snapshots", "--output-format", "json"],
            capture_output=True, text=True, env=env,
        )
        if result.returncode != 0:
            print(f"WARNING: {datastore} failed:\n{result.stderr}",
                  file=sys.stderr)
            continue
        snaps = json.loads(result.stdout)
        snaps = [s for s in snaps if s.get("backup-type") in ("vm", "ct")]
        for s in snaps:
            s["_datastore"] = datastore
        all_snaps.extend(snaps)
        print(f"  -> {len(snaps)} snapshots.", file=sys.stderr)
    if not all_snaps:
        raise Exception("No snapshots found!")
    # Keep only the newest snapshot per backup group.
    latest: dict = {}
    for snap in all_snaps:
        key = f"{snap['_datastore']}/{snap['backup-type']}/{snap['backup-id']}"
        if key not in latest or snap["backup-time"] > latest[key]["backup-time"]:
            latest[key] = snap
    # Largest backups first.
    sorted_snaps = sorted(latest.values(), key=lambda s: s.get("size", 0), reverse=True)
    print(f"{len(sorted_snaps)} groups -> queue.", file=sys.stderr)
    job_uuid = str(uuid.uuid4())
    # Refuse to start while another job is still running.
    cur.execute("""
        SELECT job_uuid FROM Kunden.`bronze.restore.jobs`
        WHERE status = 'running'
        LIMIT 1
    """)
    existing = cur.fetchone()
    if existing:
        cur.close(); conn.close()
        raise Exception(
            f"Job already active: {existing['job_uuid']}; "
            f"no new job started."
        )
    cur.execute("SET time_zone = 'Europe/Berlin'")
    cur.execute("""
        INSERT INTO Kunden.`bronze.restore.jobs`
            (job_uuid, started_at, status, restore_server)
        VALUES (%s, NOW(), 'running', '')
    """, (job_uuid,))
    cur.execute("""
        UPDATE Kunden.`bronze.backup.queue`
        SET status='obsolete' WHERE status='queued'
    """)
    for idx, snap in enumerate(sorted_snaps):
        backup_type = snap["backup-type"]
        backup_id = str(snap["backup-id"])
        datastore = snap["_datastore"]
        size_bytes = snap.get("size", 0)
        ts = snap["backup-time"]
        client_name = f"{datastore}:{backup_type}/{backup_id}"
        # datetime.utcfromtimestamp() is deprecated since Python 3.12.
        backup_time_str = datetime.fromtimestamp(ts, tz=timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
        backup_path = f"{datastore}:{backup_type}/{backup_id}/{backup_time_str}"
        cfg = ds_config.get(datastore, {})
        rsync_target = cfg.get("rsync_target")
        pbs_storage_id = cfg.get("pbs_storage_id")
        cur.execute("""
            INSERT INTO Kunden.`bronze.backup.queue`
                (job_uuid, client_name, backup_path, backup_size_bytes,
                 encrypt_key_ref, priority, rsync_target,
                 pbs_storage_id, status, last_seen_at)
            VALUES (%s, %s, %s, %s, %s, %s, %s, %s, 'queued', NOW())
        """, (
            job_uuid, client_name, backup_path, size_bytes,
            client_name, idx, rsync_target, pbs_storage_id,
        ))
        cur.execute("""
            INSERT INTO Kunden.`bronze.restore.plan`
                (job_uuid, client_name, vm_name)
            VALUES (%s, %s, %s)
        """, (job_uuid, client_name, f"{backup_type}/{backup_id}"))
    cur.execute("""
        UPDATE Kunden.`bronze.restore.jobs`
        SET total_backups=%s WHERE job_uuid=%s
    """, (len(sorted_snaps), job_uuid))
    conn.commit()
    cur.execute("""
        SELECT client_name, backup_path, backup_size_bytes,
               encrypt_key_ref, priority,
               rsync_target, pbs_storage_id
        FROM Kunden.`bronze.backup.queue`
        WHERE job_uuid = %s AND status = 'queued'
        ORDER BY priority ASC
    """, (job_uuid,))
    queued = cur.fetchall()
    cur.close(); conn.close()
    print(f"Queue: {len(queued)} backups.", file=sys.stderr)
    return {
        "mode": "schedule",
        "job_uuid": job_uuid,
        "backups": queued,
    }
@@ -0,0 +1,17 @@
# py: 3.12
anyio==4.12.1
bcrypt==5.0.0
certifi==2026.2.25
cffi==2.0.0
cryptography==46.0.5
h11==0.16.0
httpcore==1.0.9
httpx==0.28.1
idna==3.11
invoke==2.2.1
mysql-connector-python==9.6.0
paramiko==4.0.0
pycparser==3.0
pynacl==1.6.2
typing-extensions==4.15.0
wmill==1.659.1
@@ -0,0 +1,245 @@
import wmill, json, paramiko, io, mysql.connector, re

GITEA_REPO = "http://172.17.1.251:8080/sebastian.serfling/BackupScript.git"


def deploy_to_server(ssh, server, pbs, pbs_host, pbs_user, pbs_pass,
                     pbs_port, ds_config, datastores, gitea_token,
                     backup_server_host, job_uuid, ssh_creds, db_cfg,
                     script_version):
    hostname = server["hostname"]
    # Compare the version installed on the server with the wanted version.
    _, out, _ = ssh.exec_command(
        "cat /opt/windmill-restore/version.txt 2>/dev/null || echo none"
    )
    current_version = out.read().decode().strip()
    needs_deploy = current_version != script_version \
        or not server.get("script_deployed")
    if needs_deploy:
        print(f"[{hostname}] Deploying script v{script_version}...")
        ssh.exec_command("mkdir -p /opt/windmill-restore/logs")
        ssh.exec_command("which git || apt-get install -y git 2>/dev/null")
        repo_url = GITEA_REPO.replace("http://", f"http://{gitea_token}@")
        _, out, err = ssh.exec_command(
            "cd /opt/windmill-restore && "
            "if [ -d 'BackupScript/.git' ]; then "
            "  cd BackupScript && git pull; "
            "else "
            "  rm -rf BackupScript && "
            "  git clone '" + repo_url + "' BackupScript; "
            "fi"
        )
        exit_code = out.channel.recv_exit_status()
        stderr_out = err.read().decode().strip()
        if exit_code != 0:
            raise Exception(f"[{hostname}] Git failed: {stderr_out}")
        _, out, err = ssh.exec_command(
            "cp /opt/windmill-restore/BackupScript/restore.sh "
            "   /opt/windmill-restore/restore.sh && "
            "chmod +x /opt/windmill-restore/restore.sh && "
            "echo '" + script_version + "' > /opt/windmill-restore/version.txt"
        )
        exit_code = out.channel.recv_exit_status()
        stderr_out = err.read().decode().strip()
        if exit_code != 0:
            raise Exception(f"[{hostname}] Copying the script failed: {stderr_out}")
        print(f"[{hostname}] Script v{script_version} deployed.")
        pbs_conf = "\n".join([
            f"PBS_HOST={pbs_host}",
            f"PBS_PORT={pbs_port}",
            f"PBS_USER={pbs_user}",
            f"PBS_PASSWORD={pbs_pass}",
        ]) + "\n"
        sftp = ssh.open_sftp()
        sftp.putfo(io.BytesIO(pbs_conf.encode()),
                   "/opt/windmill-restore/pbs.conf")
        sftp.putfo(io.BytesIO(backup_server_host.encode()),
                   "/opt/windmill-restore/backup_server_host")
        sftp.close()
        ssh.exec_command("chmod 600 /opt/windmill-restore/pbs.conf")
        print(f"[{hostname}] backup_server_host: {backup_server_host}")
    else:
        print(f"[{hostname}] Script up to date (v{current_version}).")
    ssh.exec_command(
        "mkdir -p /opt/windmill-restore/keys && "
        "chmod 700 /opt/windmill-restore/keys"
    )
    # List the storages already known to PVE (first column of pvesm status).
    _, out, _ = ssh.exec_command(
        "pvesm status 2>/dev/null | awk 'NR>1{print $1}'"
    )
    existing_storages = out.read().decode().splitlines()
    pbs_storages = []
    for row in datastores:
        datastore = row["datastore"]
        storage_id = "pbs-" + datastore.lower() \
            .replace(" ", "-") \
            .replace("_", "-")
        ds_cfg = ds_config.get(datastore, {})
        fingerprint = ds_cfg.get("fingerprint", "") or ""
        keyfile_path = f"/opt/windmill-restore/keys/{datastore}.keyfile"
        rsync_src = f"root@{pbs_host}:/root/Scripte/{datastore}.keyfile"
        # Fetch the encryption keyfile from the PBS unless it is already present.
        _, out, err = ssh.exec_command(
            "if [ -s '" + keyfile_path + "' ]; then "
            "  echo 'present'; "
            "else "
            "  rsync -az -e 'ssh -o StrictHostKeyChecking=no' "
            "    '" + rsync_src + "' '" + keyfile_path + "' "
            "  && chmod 600 '" + keyfile_path + "' "
            "  && echo 'fetched'; "
            "fi"
        )
        exit_code = out.channel.recv_exit_status()
        stderr_out = err.read().decode().strip()
        if exit_code != 0:
            raise Exception(
                f"[{hostname}] Keyfile failed for '{datastore}': {stderr_out}"
            )
        if storage_id in existing_storages:
            print(f"[{hostname}] Storage '{storage_id}' already present.")
        else:
            fp_part = f"--fingerprint '{fingerprint}'" if fingerprint else ""
            # fp_part was built but never passed to pvesm; it is included here.
            cmd = (
                f"pvesm add pbs {storage_id} "
                f"--server '{pbs_host}' "
                f"--datastore '{datastore}' "
                f"--username '{pbs_user}' "
                f"--password '{pbs_pass}' "
                f"--port {pbs_port} "
                f"--encryption-key '{keyfile_path}' "
                f"{fp_part} "
                f"--content backup"
            )
            _, out, err = ssh.exec_command(cmd)
            exit_code = out.channel.recv_exit_status()
            stderr = err.read().decode().strip()
            if exit_code != 0:
                raise Exception(
                    f"[{hostname}] pvesm add '{datastore}' failed: {stderr}"
                )
            print(f"[{hostname}] -> '{storage_id}' registered")
        pbs_storages.append({"datastore": datastore, "storage_id": storage_id})
    conn = mysql.connector.connect(**db_cfg)
    cur = conn.cursor()
    if needs_deploy:
        cur.execute("""
            UPDATE Kunden.`bronze.restore.server`
            SET script_deployed=1, script_version=%s WHERE hostname=%s
        """, (script_version, hostname))
    for entry in pbs_storages:
        cur.execute("""
            UPDATE Kunden.`bronze.backup.datastore.config`
            SET pbs_storage_id=%s WHERE datastore=%s
        """, (entry["storage_id"], entry["datastore"]))
    # Store the SSH credentials for this job so Step E can reconnect later.
    cur.execute("""
        INSERT INTO Kunden.`bronze.restore.session`
            (job_uuid, hostname, ip, ssh_user, ssh_password)
        VALUES (%s, %s, %s, %s, %s)
        ON DUPLICATE KEY UPDATE
            ip=VALUES(ip), ssh_user=VALUES(ssh_user),
            ssh_password=VALUES(ssh_password)
    """, (
        job_uuid, hostname,
        ssh_creds["ip"], ssh_creds["user"], ssh_creds["password"],
    ))
    conn.commit(); cur.close(); conn.close()
    print(f"[{hostname}] Session credentials stored.")
    return pbs_storages
def main(prev: dict, bw_result: dict = {}, datastores: list = []):
if prev.get("mode") == "webhook":
return prev
servers = prev["target_servers"]
server_creds = prev.get("server_creds", {})
job_uuid = prev["job_uuid"]
script_version = wmill.get_variable("f/Backup/restore_version").strip()
print(f"Script-Version aus Variable: {script_version}")
pbs = json.loads(wmill.get_variable("f/Backup/pbs_variable"))
pbs_host = pbs["host"]
pbs_user = pbs["user"]
pbs_pass = pbs["password"]
pbs_port = str(pbs.get("port", 8007))
db_cfg = json.loads(wmill.get_variable("f/Backup/mysql_config"))
conn = mysql.connector.connect(**db_cfg)
cur = conn.cursor(dictionary=True)
cur.execute("""
SELECT datastore, rsync_target, pbs_storage_id, fingerprint
FROM Kunden.`bronze.backup.datastore.config`
""")
ds_config = {row["datastore"]: row for row in cur.fetchall()}
cur.close(); conn.close()
gitea_token = wmill.get_variable("f/Backup/gitea_token")
backup_server_host = wmill.get_variable("f/Backup/backup_server_host")
target_servers_out = []
for server in servers:
hostname = server["hostname"]
creds = server_creds.get(hostname, {})
url = creds.get("url", "")
ip_match = re.search(r'https?://([0-9.]+)', url)
ip = ip_match.group(1) if ip_match else server.get("ip", "")
if not ip:
print(f"WARNUNG: Keine IP fuer '{hostname}', uebersprungen.")
continue
ssh_creds_dict = {
"ip": ip,
"user": creds.get("username", "root"),
"password": creds.get("password", ""),
}
ssh = paramiko.SSHClient()
ssh.set_missing_host_key_policy(paramiko.AutoAddPolicy())
ssh.connect(ip, username=ssh_creds_dict["user"],
password=ssh_creds_dict["password"])
try:
pbs_storages = deploy_to_server(
ssh, server, pbs, pbs_host, pbs_user, pbs_pass,
pbs_port, ds_config, datastores, gitea_token,
backup_server_host, job_uuid, ssh_creds_dict, db_cfg,
script_version
)
finally:
ssh.close()
target_servers_out.append({
**server,
"ip": ip,
"ssh_creds": ssh_creds_dict,
"pbs_storages": pbs_storages,
})
if not target_servers_out:
raise Exception("Kein Server konnte vorbereitet werden!")
return {
**prev,
"target_servers": target_servers_out,
"script_version": script_version,
}
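The `pvesm add pbs` command in this script is assembled by wrapping each value in single quotes, which breaks as soon as a value (for example the password) itself contains a single quote. A minimal sketch of the same command built with `shlex.quote`, with the fingerprint appended only when one is configured. `build_pvesm_add_cmd` and all sample values are hypothetical and not part of the repository:

```python
import shlex


def build_pvesm_add_cmd(storage_id, host, datastore, user, password,
                        port, keyfile_path, fingerprint=""):
    """Return a shell-safe `pvesm add pbs` command line (hypothetical helper)."""
    args = [
        "pvesm", "add", "pbs", storage_id,
        "--server", host,
        "--datastore", datastore,
        "--username", user,
        "--password", password,
        "--port", str(port),
        "--encryption-key", keyfile_path,
        "--content", "backup",
    ]
    # Only pass a fingerprint when the datastore config provides one.
    if fingerprint:
        args += ["--fingerprint", fingerprint]
    # shlex.quote escapes every argument for a POSIX shell.
    return " ".join(shlex.quote(a) for a in args)


# Made-up sample values; the password deliberately contains a single quote.
cmd = build_pvesm_add_cmd(
    "pbs-kunde-a", "pbs.example.invalid", "Kunde_A", "restore@pbs",
    "pa'ss", 8007, "/opt/windmill-restore/keys/Kunde_A.keyfile",
)
print(cmd)
```

The resulting string could be passed to `ssh.exec_command` unchanged; splitting it again with `shlex.split` yields the original argument list.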
@@ -0,0 +1,9 @@
# py: 3.12
anyio==4.12.1
certifi==2026.2.25
h11==0.16.0
httpcore==1.0.9
httpx==0.28.1
idna==3.11
typing-extensions==4.15.0
wmill==1.657.2
@@ -0,0 +1,82 @@
import subprocess, sys, json, os, wmill
def bw_lookup(search_term, env, run):
    import time

    # Refresh the local vault cache; a failed sync is tolerated.
    run(["bw", "sync", "--session", env["BW_SESSION"]], check=False)
    items = []
    # The search can come back empty right after a sync, so retry up to 3 times.
    for attempt in range(3):
        result = run(
            ["bw", "list", "items", "--search", search_term,
             "--session", env["BW_SESSION"]]
        )
        items = json.loads(result.stdout)
        if items:
            break
        time.sleep(3)
    if not items:
        raise Exception(f"Kein Bitwarden-Eintrag fuer: '{search_term}'")
    # Prefer an exact (case-insensitive) name match, else take the first hit.
    exact = next(
        (i for i in items
         if i.get("name", "").strip().lower() == search_term.strip().lower()),
        None,
    )
    item = exact if exact else items[0]
    login = item.get("login") or {}
    url = (login.get("uris") or [{}])[0].get("uri", "")
    return {
        "username": login.get("username", ""),
        "password": login.get("password", ""),
        "url": url,
    }
def main(
prev: dict,
bw_url: str = "https://bitwarden.stines.de",
):
if prev.get("mode") == "webhook":
return prev
servers = prev.get("target_servers", [])
bw_creds = json.loads(wmill.get_variable("f/Backup/bitwarden_api_login"))
env = os.environ.copy()
env["BW_CLIENTID"] = bw_creds["bw_clientid"]
env["BW_CLIENTSECRET"] = bw_creds["bw_clientsecret"]
env["BW_PASSWORD"] = bw_creds["bw_masterpassword"]
def run(cmd, check=True):
return subprocess.run(cmd, env=env, text=True, capture_output=True, check=check)
if subprocess.run(["which", "bw"], capture_output=True).returncode != 0:
run(["wget",
"https://github.com/bitwarden/cli/releases/download/v1.22.1/bw-linux-1.22.1.zip",
"-O", "bw.zip"])
run(["unzip", "bw.zip"])
run(["chmod", "+x", "bw"])
run(["mv", "bw", "/usr/local/bin/bw"])
with open("/etc/hosts", "a") as f:
f.write("172.17.1.3 bitwarden.stines.de\n")
run(["bw", "config", "server", bw_url])
run(["bw", "logout"], check=False)
result = run(["bw", "login", "--apikey"], check=False)
if result.returncode != 0:
raise Exception(f"Bitwarden Login fehlgeschlagen: {result.stderr}")
unlock = run(["bw", "unlock", bw_creds["bw_masterpassword"], "--raw"])
bw_session = unlock.stdout.strip()
if not bw_session:
raise Exception("Vault konnte nicht entsperrt werden")
env["BW_SESSION"] = bw_session
server_creds = {}
for server in servers:
hostname = server["hostname"]
print(f"Hole Creds fuer: {hostname}")
creds = bw_lookup(hostname, env, run)
server_creds[hostname] = creds
print(f" -> OK: {creds['username']}@{hostname}")
run(["bw", "logout"], check=False)
return {**prev, "server_creds": server_creds}
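The credential lookup in this script prefers a Bitwarden entry whose name matches the hostname exactly and otherwise falls back to the first search hit. A minimal offline sketch of that selection logic, operating on already-parsed `bw list items` output. The function name `pick_login` and the sample items are made up for illustration:

```python
def pick_login(items, search_term):
    """Pick the exact-name match (case-insensitive), else the first item."""
    if not items:
        raise LookupError(f"no Bitwarden entry for {search_term!r}")
    wanted = search_term.strip().lower()
    exact = next(
        (i for i in items if (i.get("name") or "").strip().lower() == wanted),
        None,
    )
    item = exact or items[0]
    # Entries without a login section (e.g. secure notes) yield empty fields.
    login = item.get("login") or {}
    uris = login.get("uris") or [{}]
    return {
        "username": login.get("username") or "",
        "password": login.get("password") or "",
        "url": uris[0].get("uri") or "",
    }


# Hypothetical sample data shaped like `bw list items` output.
sample = [
    {"name": "STI-PROX01-old", "login": {"username": "admin", "password": "x"}},
    {"name": "sti-prox01", "login": {
        "username": "root", "password": "secret",
        "uris": [{"uri": "https://10.0.0.5:8006"}],
    }},
]
creds = pick_login(sample, "STI-PROX01")
print(creds["username"], creds["url"])
```

Keeping the selection separate from the `bw` subprocess calls makes it testable without a vault.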
@@ -0,0 +1,17 @@
# py: 3.12
anyio==4.13.0
bcrypt==5.0.0
certifi==2026.2.25
cffi==2.0.0
cryptography==46.0.7
h11==0.16.0
httpcore==1.0.9
httpx==0.28.1
idna==3.11
invoke==3.0.3
mysql-connector-python==9.6.0
paramiko==4.0.0
pycparser==3.0
pynacl==1.6.2
typing-extensions==4.15.0
wmill==1.680.0
@@ -0,0 +1,404 @@
import wmill, json, mysql.connector, paramiko, re, base64
from datetime import datetime
import httpx
def send_nextcloud_message(message: str):
"""Send a message to Nextcloud Talk."""
try:
nc_url = wmill.get_variable("f/Backup/nextcloud_talk_url").rstrip("/")
nc_room = wmill.get_variable("f/Backup/nextcloud_talk_room")
nc_user = wmill.get_variable("f/Backup/nextcloud_talk_user")
nc_password = wmill.get_variable("f/Backup/nextcloud_talk_password")
credentials = base64.b64encode(
f"{nc_user}:{nc_password}".encode()
).decode()
url = f"{nc_url}/ocs/v2.php/apps/spreed/api/v1/chat/{nc_room}"
headers = {
"Authorization": f"Basic {credentials}",
"OCS-APIREQUEST": "true",
"Content-Type": "application/json",
"Accept": "application/json",
}
response = httpx.post(
url,
headers=headers,
json={"message": message},
timeout=15,
verify=False,
)
print(f"Nextcloud Talk: HTTP {response.status_code}")
except Exception as e:
print(f"Nextcloud Talk Fehler (nicht kritisch): {e}")
def find_next_backup_for_server(cur, job_uuid, server_hostname, max_backup_size_gb):
if max_backup_size_gb is not None:
max_bytes = max_backup_size_gb * 1024 * 1024 * 1024
cur.execute("""
SELECT q.client_name, q.backup_path, q.backup_size_bytes,
q.rsync_target, q.pbs_storage_id,
r.hostname AS server_hostname,
r.ip AS server_ip,
r.restore_mount,
r.restore_path AS restore_path,
r.free_space_gb,
r.max_backup_size_gb,
r.min_backup_size_gb
FROM Kunden.`bronze.backup.queue` q
JOIN Kunden.`bronze.restore.server` r
ON r.hostname = %s
WHERE q.job_uuid = %s AND q.status = 'queued'
AND r.current_job_uuid = %s
AND (q.backup_size_bytes IS NULL OR q.backup_size_bytes <= %s)
AND (r.min_backup_size_gb IS NULL OR q.backup_size_bytes >= r.min_backup_size_gb * 1024 * 1024 * 1024)
ORDER BY q.priority ASC
LIMIT 1
""", (server_hostname, job_uuid, job_uuid, max_bytes))
else:
cur.execute("""
SELECT q.client_name, q.backup_path, q.backup_size_bytes,
q.rsync_target, q.pbs_storage_id,
r.hostname AS server_hostname,
r.ip AS server_ip,
r.restore_mount,
r.restore_path AS restore_path,
r.free_space_gb,
r.max_backup_size_gb,
r.min_backup_size_gb
FROM Kunden.`bronze.backup.queue` q
JOIN Kunden.`bronze.restore.server` r
ON r.hostname = %s
WHERE q.job_uuid = %s AND q.status = 'queued'
AND r.current_job_uuid = %s
AND (r.min_backup_size_gb IS NULL OR q.backup_size_bytes >= r.min_backup_size_gb * 1024 * 1024 * 1024)
ORDER BY q.priority ASC
LIMIT 1
""", (server_hostname, job_uuid, job_uuid))
return cur.fetchone()
def release_server_and_check_done(cur, conn, job_uuid, server_hostname):
"""Release the server and check whether all jobs are finished."""
cur.execute("""
UPDATE Kunden.`bronze.restore.server`
SET current_job_uuid=NULL
WHERE hostname=%s
""", (server_hostname,))
conn.commit()
print(f"Server '{server_hostname}' fertig.")
cur.execute("""
SELECT COUNT(*) AS cnt FROM Kunden.`bronze.restore.server`
WHERE current_job_uuid = %s
""", (job_uuid,))
still_active = cur.fetchone()["cnt"]
if still_active == 0:
cur.execute("""
UPDATE Kunden.`bronze.restore.jobs`
SET status='completed', finished_at=NOW()
WHERE job_uuid=%s
""", (job_uuid,))
cur.execute("""
DELETE FROM Kunden.`bronze.restore.session`
WHERE job_uuid=%s
""", (job_uuid,))
# Statistics for the completion message
cur.execute("""
SELECT total_backups, restored_count, failed_count,
TIMESTAMPDIFF(MINUTE, started_at, NOW()) AS duration_min
FROM Kunden.`bronze.restore.jobs`
WHERE job_uuid=%s
""", (job_uuid,))
job_stats = cur.fetchone()
conn.commit()
print(f"Job {job_uuid} vollstaendig abgeschlossen.")
if job_stats:
total = job_stats["total_backups"] or 0
restored = job_stats["restored_count"] or 0
failed = job_stats["failed_count"] or 0
dur_min = job_stats["duration_min"] or 0
dur_str = f"{dur_min//60}h {dur_min%60}m" if dur_min >= 60 else f"{dur_min}m"
if failed == 0:
msg = (
f"✅ **Backup-Job abgeschlossen**\n"
f"Alle {total} Backups erfolgreich | Dauer: {dur_str}"
)
else:
msg = (
f"⚠️ **Backup-Job abgeschlossen mit Fehlern**\n"
f"{restored}/{total} erfolgreich | ❌ {failed} fehlgeschlagen | Dauer: {dur_str}"
)
send_nextcloud_message(msg)
return "all_done"
else:
conn.commit()
print(f"Noch {still_active} Server aktiv.")
return "server_done"
def main(from_init: dict):
if from_init.get("mode") == "schedule":
print("Restores gestartet - Flow wartet auf Webhooks.")
return {"status": "waiting_for_webhook",
"job_uuid": from_init.get("job_uuid")}
data = from_init.get("data", {})
job_uuid = data.get("job_uuid")
client = data.get("client_name")
status = data.get("status", "unknown")
server_hostname = data.get("server_hostname", "")
if not job_uuid or not client:
raise Exception(f"Ungueltiger Webhook-Payload: {data}")
print(f"Webhook: {client} -> {status} (Server: {server_hostname})")
client_like = f"{client}%"
db_cfg = json.loads(wmill.get_variable("f/Backup/mysql_config"))
conn = mysql.connector.connect(**db_cfg)
cur = conn.cursor(dictionary=True)
cur.execute("""
UPDATE Kunden.`bronze.restore.result` SET
vm_name = %s,
vm_id_original = %s,
vm_id_restored = %s,
restore_duration_sec = %s,
actual_disk_used_bytes = %s,
zip_size_bytes = %s,
zip_duration_sec = %s,
rsync_size_bytes = %s,
rsync_ok = %s,
rsync_retries = %s,
qm_agent_ok = %s,
status = %s,
error_message = %s,
webhook_received_at = %s
WHERE job_uuid = %s AND backup_path LIKE %s
""", (
data.get("vm_name", ""),
data.get("vm_id_original"),
data.get("vm_id_restored"),
data.get("restore_duration_sec"),
data.get("actual_disk_used_bytes"),
data.get("zip_size_bytes"),
data.get("zip_duration_sec"),
data.get("rsync_size_bytes"),
1 if data.get("rsync_ok") else 0,
data.get("rsync_retries", 0),
1 if data.get("qm_agent_ok") in (True, "true", "skipped") else 0,
"done" if status == "success" else "failed",
data.get("error_message", ""),
datetime.now(),
job_uuid, client_like,
))
cur.execute("""
UPDATE Kunden.`bronze.backup.queue` SET status=%s
WHERE job_uuid=%s AND backup_path LIKE %s
""", ("done" if status == "success" else "failed", job_uuid, client_like))
field = "restored_count" if status == "success" else "failed_count"
cur.execute(
f"UPDATE Kunden.`bronze.restore.jobs` "
f"SET {field}={field}+1 WHERE job_uuid=%s",
(job_uuid,)
)
if data.get("free_space_gb") is not None and server_hostname:
cur.execute("""
UPDATE Kunden.`bronze.restore.server`
SET free_space_gb = %s WHERE hostname = %s
""", (data.get("free_space_gb"), server_hostname))
conn.commit()
vm_name = data.get("vm_name") or client
dur_sec = data.get("restore_duration_sec") or 0
dur_str = f"{dur_sec//60}m {dur_sec%60}s" if dur_sec >= 60 else f"{dur_sec}s"
zip_mb = (data.get("zip_size_bytes") or 0) // 1024 // 1024
icon = "✅" if status == "success" else "❌"
if status != "success":
err = (data.get("error_message") or "")[:100]
nc_msg = (
f"{icon} **{vm_name}** ({client})\n"
f"Server: {server_hostname} | Fehler: {err}"
)
send_nextcloud_message(nc_msg)
cur.execute("""
SELECT max_backup_size_gb, min_backup_size_gb, free_space_gb
FROM Kunden.`bronze.restore.server`
WHERE hostname = %s
""", (server_hostname,))
srv_cfg = cur.fetchone()
max_backup_size_gb = srv_cfg["max_backup_size_gb"] if srv_cfg else None
min_backup_size_gb = srv_cfg["min_backup_size_gb"] if srv_cfg else None
max_bytes = max_backup_size_gb * 1024 * 1024 * 1024 if max_backup_size_gb is not None else None
min_bytes = min_backup_size_gb * 1024 * 1024 * 1024 if min_backup_size_gb is not None else None
nxt = find_next_backup_for_server(cur, job_uuid, server_hostname, max_backup_size_gb)
conn.commit()
if not nxt:
if max_backup_size_gb is not None and min_backup_size_gb is not None:
cur.execute("""
SELECT client_name, backup_path, backup_size_bytes,
rsync_target, pbs_storage_id
FROM Kunden.`bronze.backup.queue`
WHERE job_uuid = %s AND status = 'queued'
AND (backup_size_bytes IS NULL OR backup_size_bytes <= %s)
AND backup_size_bytes >= %s
ORDER BY priority ASC
LIMIT 1
""", (job_uuid, max_bytes, min_bytes))
elif max_backup_size_gb is not None:
cur.execute("""
SELECT client_name, backup_path, backup_size_bytes,
rsync_target, pbs_storage_id
FROM Kunden.`bronze.backup.queue`
WHERE job_uuid = %s AND status = 'queued'
AND (backup_size_bytes IS NULL OR backup_size_bytes <= %s)
ORDER BY priority ASC
LIMIT 1
""", (job_uuid, max_bytes))
elif min_backup_size_gb is not None:
cur.execute("""
SELECT client_name, backup_path, backup_size_bytes,
rsync_target, pbs_storage_id
FROM Kunden.`bronze.backup.queue`
WHERE job_uuid = %s AND status = 'queued'
AND backup_size_bytes >= %s
ORDER BY priority ASC
LIMIT 1
""", (job_uuid, min_bytes))
else:
cur.execute("""
SELECT client_name, backup_path, backup_size_bytes,
rsync_target, pbs_storage_id
FROM Kunden.`bronze.backup.queue`
WHERE job_uuid = %s AND status = 'queued'
ORDER BY priority ASC
LIMIT 1
""", (job_uuid,))
next_queued = cur.fetchone()
if next_queued:
print(f"Server '{server_hostname}' nimmt naechstes passendes Backup: "
f"{next_queued['client_name']}")
cur.execute("""
SELECT hostname, ip, restore_mount, restore_path,
free_space_gb, max_backup_size_gb, min_backup_size_gb
FROM Kunden.`bronze.restore.server`
WHERE hostname = %s
""", (server_hostname,))
srv = cur.fetchone()
nxt = {
**next_queued,
"server_hostname": server_hostname,
"server_ip": srv["ip"] if srv else "",
"restore_mount": srv["restore_mount"] if srv else "",
"restore_path": srv["restore_path"] if srv else "",
"free_space_gb": srv["free_space_gb"] if srv else 0,
"max_backup_size_gb": srv["max_backup_size_gb"] if srv else None,
"min_backup_size_gb": srv["min_backup_size_gb"] if srv else None,
}
else:
result = release_server_and_check_done(cur, conn, job_uuid, server_hostname)
cur.close(); conn.close()
if result == "all_done":
return {"status": "all_done", "job_uuid": job_uuid}
else:
return {"status": "server_done",
"server": server_hostname,
"job_uuid": job_uuid}
cur.close(); conn.close()
# FIX: started_at added
conn2 = mysql.connector.connect(**db_cfg)
cur2 = conn2.cursor()
cur2.execute("""
INSERT INTO Kunden.`bronze.restore.result`
(job_uuid, client_name, backup_path,
backup_size_bytes, restore_server, status, started_at)
VALUES (%s, %s, %s, %s, %s, 'restoring', NOW())
""", (
job_uuid, nxt["client_name"], nxt["backup_path"],
nxt.get("backup_size_bytes", 0), nxt["server_hostname"],
))
cur2.execute("""
UPDATE Kunden.`bronze.backup.queue` SET status='assigned'
WHERE job_uuid=%s AND backup_path LIKE %s
""", (job_uuid, f"{nxt['client_name']}%"))
conn2.commit(); cur2.close(); conn2.close()
conn3 = mysql.connector.connect(**db_cfg)
cur3 = conn3.cursor(dictionary=True)
cur3.execute("""
SELECT ip, ssh_user, ssh_password
FROM Kunden.`bronze.restore.session`
WHERE job_uuid = %s AND hostname = %s
LIMIT 1
""", (job_uuid, server_hostname))
session = cur3.fetchone()
cur3.close(); conn3.close()
if not session:
raise Exception(
f"Keine Session-Creds fuer '{server_hostname}'. "
f"Job-UUID: {job_uuid}"
)
webhook_url = wmill.get_variable("f/Backup/windmill_webhook_url")
webhook_tok = wmill.get_variable("f/Backup/windmill_webhook_token")
ssh = paramiko.SSHClient()
ssh.set_missing_host_key_policy(paramiko.AutoAddPolicy())
ssh.connect(
session["ip"],
username=session["ssh_user"],
password=session["ssh_password"]
)
safe_client = nxt["client_name"].replace("/", "_").replace(":", "_")
nxt_size = nxt.get("backup_size_bytes") or 0
cmd = (
f"nohup /opt/windmill-restore/restore.sh"
f" --job-uuid '{job_uuid}'"
f" --backup-path '{nxt['backup_path']}'"
f" --client '{nxt['client_name']}'"
f" --restore-mount '{nxt['restore_mount']}'"
f" --restore-path '{nxt['restore_path']}'"
f" --rsync-target '{nxt['rsync_target']}'"
f" --pbs-storage '{nxt['pbs_storage_id']}'"
f" --webhook-url '{webhook_url}'"
f" --webhook-token '{webhook_tok}'"
f" --server-hostname '{server_hostname}'"
f" --backup-size '{nxt_size}'"
f" > /opt/windmill-restore/logs/{safe_client}.log 2>&1 &"
)
ssh.exec_command(cmd)
ssh.close()
size_gb = nxt_size / 1024 / 1024 / 1024
print(f"Naechster Restore: {nxt['client_name']} ({size_gb:.1f} GB) auf {server_hostname}")
return {
"status": "next_restore_started",
"client": nxt["client_name"],
"server": server_hostname,
"job_uuid": job_uuid,
}
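The four fallback queue queries in this script differ only in which size bounds they apply. A minimal sketch of one query builder that covers all four cases; `build_queue_query` is a hypothetical name, and the SQL mirrors the column and table names used in the script:

```python
def build_queue_query(job_uuid, max_bytes=None, min_bytes=None):
    """Build the fallback queue lookup with optional size bounds."""
    where = ["job_uuid = %s", "status = 'queued'"]
    params = [job_uuid]
    if max_bytes is not None:
        # Backups of unknown size still pass the upper bound, as in the script.
        where.append("(backup_size_bytes IS NULL OR backup_size_bytes <= %s)")
        params.append(max_bytes)
    if min_bytes is not None:
        where.append("backup_size_bytes >= %s")
        params.append(min_bytes)
    sql = (
        "SELECT client_name, backup_path, backup_size_bytes,\n"
        "       rsync_target, pbs_storage_id\n"
        "FROM Kunden.`bronze.backup.queue`\n"
        "WHERE " + "\n  AND ".join(where) + "\n"
        "ORDER BY priority ASC\n"
        "LIMIT 1"
    )
    return sql, tuple(params)


# Example: only an upper bound of 500 GiB is configured for the server.
sql, params = build_queue_query("job-1", max_bytes=500 * 1024**3)
print(sql)
print(params)
```

The returned pair can be handed to `cur.execute(sql, params)`; values still travel as bound parameters, and only the fixed condition strings are concatenated.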
@@ -0,0 +1,31 @@
summary: Backup Restore Report - Nextcloud Talk
description: |
  Runs daily at 08:00. Fetches the result of the most recent
  backup restore job from the DB and sends a summary
  via Nextcloud Talk (webhook/bot).
value:
  modules:
    - id: a
      summary: Letzten Job aus DB holen & Report zusammenbauen
      value:
        type: rawscript
        content: '!inline letzten_job_aus_db_holen_&_report_zusammenbauen.py'
        input_transforms: {}
        lock: '!inline letzten_job_aus_db_holen_&_report_zusammenbauen.lock'
        language: python3
    - id: b
      summary: Nachricht an Nextcloud Talk senden
      value:
        type: rawscript
        content: '!inline nachricht_an_nextcloud_talk_senden.py'
        input_transforms:
          report:
            type: javascript
            expr: results.a
        lock: '!inline nachricht_an_nextcloud_talk_senden.lock'
        language: python3
schema:
  $schema: https://json-schema.org/draft/2020-12/schema
  type: object
  order: []
  properties: {}
@@ -0,0 +1,10 @@
# py: 3.12
anyio==4.12.1
certifi==2026.2.25
h11==0.16.0
httpcore==1.0.9
httpx==0.28.1
idna==3.11
mysql-connector-python==9.6.0
typing-extensions==4.15.0
wmill==1.659.1
@@ -0,0 +1,129 @@
import wmill, mysql.connector, json
from datetime import datetime
def fmt_bytes(b):
    if not b:
        return ""
    b = int(b)
    for unit in ["B", "KB", "MB", "GB", "TB"]:
        if b < 1024:
            return f"{b:.1f} {unit}"
        b /= 1024
    return f"{b:.1f} PB"
def fmt_dur(sec):
    if not sec:
        return ""
    sec = int(sec)
    if sec < 60:
        return f"{sec}s"
    if sec < 3600:
        return f"{sec//60}m {sec%60}s"
    return f"{sec//3600}h {(sec%3600)//60}m"
def main():
db_cfg = json.loads(wmill.get_variable("f/Backup/mysql_config"))
conn = mysql.connector.connect(**db_cfg)
cur = conn.cursor(dictionary=True)
cur.execute("""
SELECT job_uuid, started_at, finished_at, status,
total_backups, restored_count, failed_count
FROM Kunden.`bronze.restore.jobs`
WHERE status IN ('completed', 'failed')
ORDER BY started_at DESC
LIMIT 1
""")
job = cur.fetchone()
if not job:
cur.close(); conn.close()
return {"message": "⚠️ Kein abgeschlossener Backup-Job gefunden."}
job_uuid = job["job_uuid"]
duration = ""
if job["started_at"] and job["finished_at"]:
# total_seconds() covers the full span; .seconds would drop whole days
secs = int((job["finished_at"] - job["started_at"]).total_seconds())
duration = fmt_dur(secs)
cur.execute("""
SELECT client_name, vm_name, restore_server, status,
restore_duration_sec, zip_size_bytes,
rsync_ok, qm_agent_ok, error_message
FROM Kunden.`bronze.restore.result`
WHERE job_uuid = %s
ORDER BY status ASC, client_name ASC
""", (job_uuid,))
results = cur.fetchall()
# VMs that were planned but never ran
cur.execute("""
SELECT p.client_name, p.vm_name
FROM Kunden.`bronze.restore.plan` p
LEFT JOIN Kunden.`bronze.restore.result` r
ON r.job_uuid = p.job_uuid
AND r.client_name = p.client_name
WHERE p.job_uuid = %s
AND r.id IS NULL
ORDER BY p.client_name ASC
""", (job_uuid,))
not_run = cur.fetchall()
cur.close(); conn.close()
total = job["total_backups"] or 0
done = job["restored_count"] or 0
failed = job["failed_count"] or 0
skipped = len(not_run)
date_str = job["started_at"].strftime("%d.%m.%Y") if job["started_at"] else "?"
time_str = job["started_at"].strftime("%H:%M") if job["started_at"] else "?"
if failed == 0 and skipped == 0:
status_icon = "✅"
elif done == 0:
status_icon = "❌"
else:
status_icon = "⚠️"
lines = [
f"{status_icon} **Backup Restore Report {date_str}**",
f"",
f"🕐 Start: {time_str} | Dauer: {duration}",
f"📊 Gesamt: {total} | ✅ OK: {done} | ❌ Fehler: {failed} | ⏭️ Nicht gestartet: {skipped}",
f"",
]
# Failed
failed_results = [r for r in results if r["status"] == "failed"]
if failed_results:
lines.append("**❌ Fehlgeschlagen:**")
for r in failed_results:
name = r["vm_name"] or r["client_name"]
err = r["error_message"] or "unbekannt"
if len(err) > 80:
err = err[:80] + "..."
lines.append(f"{name} ({r['restore_server']}): {err}")
lines.append("")
# Not started
if not_run:
lines.append("**⏭️ Nicht gestartet:**")
for r in not_run:
name = r["vm_name"] or r["client_name"]
lines.append(f"{name}")
lines.append("")
# Successful
done_results = [r for r in results if r["status"] == "done"]
if done_results:
lines.append("**✅ Erfolgreich:**")
for r in done_results:
name = r["vm_name"] or r["client_name"]
dauer = fmt_dur(r["restore_duration_sec"])
zipsize = fmt_bytes(r["zip_size_bytes"])
agent = "✅" if r["qm_agent_ok"] else "❌"
rsync = "✅" if r["rsync_ok"] else "❌"
lines.append(
f"{name} | {r['restore_server']} | "
f"{dauer} | 📦 {zipsize} | Agent: {agent} | Rsync: {rsync}"
)
message = "\n".join(lines)
print(message)
return {"message": message, "job_uuid": job_uuid, "failed": failed, "skipped": skipped}
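When a job duration is computed from two `datetime` values, `timedelta.seconds` only holds the remainder within the last day, whereas `total_seconds()` covers the whole span. A small illustration with made-up timestamps for a job that runs slightly longer than 24 hours:

```python
from datetime import datetime, timedelta

# Hypothetical job that starts at 00:11 and runs for 1 day and 5 minutes.
started = datetime(2026, 4, 29, 0, 11)
finished = started + timedelta(days=1, minutes=5)

delta = finished - started
print(delta.seconds)               # 300: only the part beyond whole days
print(int(delta.total_seconds()))  # 86700: the full elapsed time
```

For jobs that finish within a day both values agree, which is why the difference is easy to overlook.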
@@ -0,0 +1,9 @@
# py: 3.12
anyio==4.12.1
certifi==2026.2.25
h11==0.16.0
httpcore==1.0.9
httpx==0.28.1
idna==3.11
typing-extensions==4.15.0
wmill==1.657.2
@@ -0,0 +1,44 @@
import base64

import httpx
import wmill


def main(report: dict):
    message = report.get("message", "Kein Bericht verfuegbar.")
    nc_url = wmill.get_variable("f/Backup/nextcloud_talk_url").rstrip("/")
    nc_room = wmill.get_variable("f/Backup/nextcloud_talk_room")
    nc_user = wmill.get_variable("f/Backup/nextcloud_talk_user")
    nc_password = wmill.get_variable("f/Backup/nextcloud_talk_password")
    # HTTP Basic auth header for the Nextcloud OCS API
    credentials = base64.b64encode(
        f"{nc_user}:{nc_password}".encode()
    ).decode()
    url = f"{nc_url}/ocs/v2.php/apps/spreed/api/v1/chat/{nc_room}"
    headers = {
        "Authorization": f"Basic {credentials}",
        "OCS-APIREQUEST": "true",
        "Content-Type": "application/json",
        "Accept": "application/json",
    }
    payload = {"message": message}
    response = httpx.post(
        url,
        headers=headers,
        json=payload,
        timeout=30,
        verify=False,  # in case of a self-signed certificate
    )
    print(f"HTTP: {response.status_code}")
    print(f"URL: {url}")
    if response.status_code not in (200, 201):
        raise Exception(
            f"Nextcloud Talk Fehler: {response.status_code} {response.text}"
        )
    print("Nachricht gesendet ✓")
    return {"status": "sent", "http": response.status_code}
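Both the orchestrator and the report flow build the same Nextcloud Talk URL and Basic auth header by hand. A minimal sketch of a shared helper that returns both, using only the standard library. `talk_request` is a hypothetical name, and the host, room, and credentials below are placeholders:

```python
import base64


def talk_request(nc_url, room, user, password):
    """Return (url, headers) for posting a chat message to a Talk room."""
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    url = f"{nc_url.rstrip('/')}/ocs/v2.php/apps/spreed/api/v1/chat/{room}"
    headers = {
        "Authorization": f"Basic {token}",
        "OCS-APIREQUEST": "true",
        "Content-Type": "application/json",
        "Accept": "application/json",
    }
    return url, headers


# Placeholder values, not the real instance or credentials.
url, headers = talk_request("https://cloud.example.invalid/", "room123", "bot", "pw")
print(url)
```

The pair would then be passed to `httpx.post(url, headers=headers, json={"message": ...})` exactly as the scripts do today.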
@@ -0,0 +1,6 @@
summary: ''
display_name: Backup
extra_perms:
  sebastianserfling@stines.de: true
owners:
  - sebastianserfling@stines.de
@@ -0,0 +1,3 @@
description: ''
value: btrv2jb9
is_secret: false
@@ -0,0 +1,3 @@
description: ''
value: https://cloudstorage.stines.de
is_secret: false
@@ -0,0 +1,3 @@
description: ''
value: 1.0.28
is_secret: false
@@ -0,0 +1,734 @@
#!/usr/bin/env bash
# =============================================================================
# /opt/windmill-restore/restore.sh
# Windmill Backup Restore Worker
# Version: 1.0.26
#
# Supports both VM (qm) and CT (pct) backups.
# The backup type is detected automatically from the backup path (vm/ or ct/).
#
# WORKFLOW:
#   [0]  Fetch 7z password   password_7z.txt via rsync from the PBS server
#   [1]  Space check         Check free space on the restore mount
#   [2]  Determine IDs       Original ID from the backup path, restore ID from 1000 upward
#   [3]  Restore             qmrestore (VM) or pct restore (CT)
#   [4]  IMAGE_DIR           Derived dynamically from the PVE storage path
#   [5]  Check images        Abort if empty or missing
#   [6]  Prepare             VM: unlock/cdrom/remove net/agent
#                            CT: unlock/remove net
#   [7]  Start & check       VM: qm agent 120s | CT: pct exec ping
#   [8]  Stop                VM: qm shutdown | CT: pct stop
#   [9]  Save config         Original config into the ZIP directory
#   [10] 7z archive          Zip images with encryption
#   [11] Rsync               ZIP to the backup server
#   [12] Clean up            destroy, delete ZIP
#   [13] Webhook             JSON → Windmill
# =============================================================================
set -euo pipefail
# ── Load config file ──────────────────────────────────────────────────────────
CONF_FILE="/opt/windmill-restore/pbs.conf"
[[ ! -f "$CONF_FILE" ]] && { echo "FEHLER: $CONF_FILE fehlt!" >&2; exit 1; }
source "$CONF_FILE"
# ── Argument-Parser ───────────────────────────────────────────────────────────
JOB_UUID=""
BACKUP_PATH=""
CLIENT_NAME=""
RESTORE_MOUNT=""
RESTORE_PATH=""
RSYNC_TARGET=""
PBS_STORAGE=""
WEBHOOK_URL=""
WEBHOOK_TOKEN=""
SERVER_HOSTNAME=""
BACKUP_SIZE_BYTES=0
while [[ $# -gt 0 ]]; do
case $1 in
--job-uuid) JOB_UUID="$2"; shift 2 ;;
--backup-path) BACKUP_PATH="$2"; shift 2 ;;
--client) CLIENT_NAME="$2"; shift 2 ;;
--restore-mount) RESTORE_MOUNT="$2"; shift 2 ;;
--restore-path) RESTORE_PATH="$2"; shift 2 ;;
--rsync-target) RSYNC_TARGET="$2"; shift 2 ;;
--pbs-storage) PBS_STORAGE="$2"; shift 2 ;;
--webhook-url) WEBHOOK_URL="$2"; shift 2 ;;
--webhook-token) WEBHOOK_TOKEN="$2"; shift 2 ;;
--server-hostname) SERVER_HOSTNAME="$2"; shift 2 ;;
--backup-size) BACKUP_SIZE_BYTES="$2"; shift 2 ;;
*) echo "Unbekannter Parameter: $1" >&2; exit 1 ;;
esac
done
for var in JOB_UUID BACKUP_PATH CLIENT_NAME \
RESTORE_MOUNT RESTORE_PATH RSYNC_TARGET PBS_STORAGE WEBHOOK_URL; do
[[ -z "${!var}" ]] && { echo "FEHLER: --${var//_/-} fehlt" >&2; exit 1; }
done
[[ ! -d "$RESTORE_MOUNT" ]] && {
echo "FEHLER: Restore-Mount '$RESTORE_MOUNT' existiert nicht!" >&2; exit 1
}
# Fallback SERVER_HOSTNAME
SERVER_HOSTNAME="${SERVER_HOSTNAME:-$(hostname -f 2>/dev/null || hostname)}"
# ── Logging ───────────────────────────────────────────────────────────────────
LOG_DIR="/opt/windmill-restore/logs"
mkdir -p "$LOG_DIR"
SAFE_CLIENT="${CLIENT_NAME//\//_}"
SAFE_CLIENT="${SAFE_CLIENT//:/_}"
LOG_FILE="$LOG_DIR/${SAFE_CLIENT}.log"
exec >> "$LOG_FILE" 2>&1
# ── Split backup path ─────────────────────────────────────────────────────────
DATASTORE=$(echo "$BACKUP_PATH" | cut -d: -f1)
SNAPSHOT_PATH=$(echo "$BACKUP_PATH" | cut -d: -f2-)
BACKUP_TYPE=$(echo "$SNAPSHOT_PATH" | cut -d/ -f1)
PVE_BACKUP_REF="${PBS_STORAGE}:backup/${SNAPSHOT_PATH}"
# ── Set compression level ─────────────────────────────────────────────────────
# Currently fixed at mx=0 (store mode, no compression) for every backup.
# Background: mx=1 (fastest compression) was the intended default, with mx=0
# only for tnp-Invest-GmbH vm/108, a very large VM that would need ~10h at mx=1.
COMPRESS_LEVEL=0
# ── Set 7z thread count per host ──────────────────────────────────────────────
# STI-BAC01   → Ryzen 9 5950X   (16 cores / 32 threads) → mmt=30
# ITD-PROX01  → Ryzen 7 3700X   ( 8 cores / 16 threads) → mmt=8
# STI-PROX01  → Xeon E5-1650v3  ( 6 cores / 12 threads) → mmt=16
# Fallback    → mmt=4
case "$SERVER_HOSTNAME" in
STI-BAC01) MMT_THREADS=30 ;;
ITD-PROX01) MMT_THREADS=8 ;;
STI-PROX01) MMT_THREADS=16 ;;
*) MMT_THREADS=4 ;;
esac
echo "INFO: Server '$SERVER_HOSTNAME' → 7z mmt=${MMT_THREADS}"
# ── Measurement variables ─────────────────────────────────────────────────────
LAST_DATE=$(TZ="Europe/Berlin" date +"%Y-%m-%d" -d "1 day ago")
# STI-BAC01: rsync_target is mounted locally → write the ZIP there directly, no rsync
if [[ "$SERVER_HOSTNAME" == "STI-BAC01" ]]; then
ZIP_DIR="${RSYNC_TARGET}/${LAST_DATE}"
SKIP_RSYNC=1
else
ZIP_DIR="${RESTORE_MOUNT}/zips/${LAST_DATE}"
SKIP_RSYNC=0
fi
BACKUP_SERVER_HOST=$(cat /opt/windmill-restore/backup_server_host 2>/dev/null \
|| echo "backup-server")
KEY_DIR="/opt/windmill-restore/keys"
RESTORE_START=$(date +%s)
STATUS="success"
ERROR_MSG=""
VM_ID_ORIGINAL=0
VM_ID_RESTORED=0
VM_NAME=""
IMAGE_DIR=""
ACTUAL_DISK_BYTES=0
ZIP_SIZE_BYTES=0
ZIP_DURATION=0
RSYNC_SIZE_BYTES=0
RSYNC_OK="true"
RSYNC_RETRIES=0
QM_AGENT_OK="false"
ZIP_FILE=""
ZIP_PASSWORD=""
FREE_GB=0
echo "============================================================"
echo " Windmill Restore Worker"
echo " Client: $CLIENT_NAME"
echo " Typ: $BACKUP_TYPE"
echo " Datastore: $DATASTORE"
echo " Backup: $BACKUP_PATH"
echo " PBS-Storage: $PBS_STORAGE"
echo " Restore-Mount: $RESTORE_MOUNT"
echo " Restore-Path: $RESTORE_PATH"
echo " Rsync-Target: $RSYNC_TARGET"
echo " Server: $SERVER_HOSTNAME"
echo " Skip-Rsync: $SKIP_RSYNC"
echo " Job-UUID: $JOB_UUID"
echo " 7z-Level: mx=${COMPRESS_LEVEL} mmt=${MMT_THREADS}"
echo " Start: $(date '+%Y-%m-%d %H:%M:%S')"
echo "============================================================"
# ── JSON escape function ──────────────────────────────────────────────────────
escape_json() {
local input="$1"
input="${input//\\/\\\\}"
input="${input//\"/\\\"}"
input="${input//$'\n'/\\n}"
input="${input//$'\r'/\\r}"
input="${input//$'\t'/\\t}"
echo "$input"
}
# ── Webhook function ──────────────────────────────────────────────────────────
send_webhook() {
local wh_status="$1"
local wh_error
wh_error=$(escape_json "${2:-}")
local wh_vm_name
wh_vm_name=$(escape_json "${VM_NAME:-$SAFE_CLIENT}")
local duration=$(( $(date +%s) - RESTORE_START ))
local payload
payload=$(printf '{
"job_uuid": "%s",
"client_name": "%s",
"status": "%s",
"error_message": "%s",
"server_hostname": "%s",
"free_space_gb": %d,
"vm_name": "%s",
"vm_id_original": %d,
"vm_id_restored": %d,
"restore_duration_sec": %d,
"actual_disk_used_bytes": %d,
"zip_size_bytes": %d,
"zip_duration_sec": %d,
"rsync_size_bytes": %d,
"rsync_ok": %s,
"rsync_retries": %d,
"qm_agent_ok": "%s",
"log_file": "%s"
}' \
"$JOB_UUID" "$CLIENT_NAME" "$wh_status" "$wh_error" \
"$SERVER_HOSTNAME" "$FREE_GB" "$wh_vm_name" \
"$VM_ID_ORIGINAL" "$VM_ID_RESTORED" \
"$duration" "$ACTUAL_DISK_BYTES" \
"$ZIP_SIZE_BYTES" "$ZIP_DURATION" \
"$RSYNC_SIZE_BYTES" "$RSYNC_OK" "$RSYNC_RETRIES" \
"$QM_AGENT_OK" "$LOG_FILE")
echo ""
echo "$(date '+%Y-%m-%d %H:%M:%S') ==> Sende Webhook..."
echo " Payload: $payload"
local http_response
http_response=$(curl -s -w "\n%{http_code}" \
-X POST "$WEBHOOK_URL" \
-H "Content-Type: application/json" \
${WEBHOOK_TOKEN:+-H "Authorization: Bearer ${WEBHOOK_TOKEN}"} \
-d "$payload")
local http_code
http_code=$(echo "$http_response" | tail -1)
local http_body
http_body=$(echo "$http_response" | head -n -1)
echo " HTTP: $http_code"
echo " Response: $http_body"
[[ "$http_code" =~ ^2 ]] && echo " Webhook OK." \
|| echo " WARNUNG: HTTP $http_code"
}
# ── ERR-Trap ──────────────────────────────────────────────────────────────────
trap 'STATUS="failed"
ERROR_LINE=$LINENO
echo ""
echo "FEHLER in Zeile ${ERROR_LINE} - räume auf..."
if [[ ${VM_ID_RESTORED:-0} -gt 0 ]]; then
if [[ "$BACKUP_TYPE" == "ct" ]]; then
pct stop "$VM_ID_RESTORED" 2>/dev/null || true
sleep 3
pct destroy "$VM_ID_RESTORED" --purge 1 2>/dev/null || true
else
qm stop "$VM_ID_RESTORED" --skiplock 1 2>/dev/null || true
sleep 5
qm destroy "$VM_ID_RESTORED" \
--destroy-unreferenced-disks 1 --purge 1 2>/dev/null || true
fi
echo " ${BACKUP_TYPE^^} ${VM_ID_RESTORED} entfernt."
fi
[[ -n "${ZIP_FILE:-}" && -f "$ZIP_FILE" ]] && rm -f "$ZIP_FILE"
[[ -n "${IMAGE_DIR:-}" && -d "$IMAGE_DIR" ]] && rm -rf "$IMAGE_DIR"
send_webhook "failed" "Abgebrochen in Zeile ${ERROR_LINE} - $LOG_FILE"' ERR
# ═════════════════════════════════════════════════════════════════════════════
# [0/13] FETCH 7Z PASSWORD FROM THE PBS SERVER
# ═════════════════════════════════════════════════════════════════════════════
echo ""
echo "$(date '+%Y-%m-%d %H:%M:%S') ==> [0/13] 7z-Passwort vom PBS-Server holen ($PBS_HOST)..."
mkdir -p "$KEY_DIR"
chmod 700 "$KEY_DIR"
PW_FILE_LOCAL="${KEY_DIR}/password_7z.txt"
if [[ ! -f "$PW_FILE_LOCAL" || ! -s "$PW_FILE_LOCAL" ]]; then
echo " Hole password_7z.txt..."
rsync -az \
-e "ssh -o StrictHostKeyChecking=no" \
"root@${PBS_HOST}:/root/Scripte/password_7z.txt" \
"$PW_FILE_LOCAL" \
2>&1
chmod 600 "$PW_FILE_LOCAL"
echo " password_7z.txt gespeichert ✓"
else
echo " password_7z.txt bereits vorhanden."
fi
ZIP_PASSWORD=$(grep -m1 "^${DATASTORE}:" "$PW_FILE_LOCAL" \
| awk -F': ' '{print $2}' | tr -d '[:space:]')
[[ -z "$ZIP_PASSWORD" ]] && {
echo "FEHLER: Kein 7z-Passwort für '$DATASTORE' in password_7z.txt" >&2; exit 1
}
echo " 7z-Passwort geladen ✓"
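# ═════════════════════════════════════════════════════════════════════════════
# [1/13] SPACE CHECK
# ═════════════════════════════════════════════════════════════════════════════
echo ""
echo "$(date '+%Y-%m-%d %H:%M:%S') ==> [1/13] Checking free disk space on $RESTORE_MOUNT..."
mkdir -p "$ZIP_DIR"
# -P keeps every filesystem on a single line, -k forces 1 KiB units.
# The default is applied after the pipeline because awk exits 0 even when df
# fails, so an "|| echo 0" inside the substitution would leave FREE_KB empty.
FREE_KB=$(df -Pk "$RESTORE_MOUNT" 2>/dev/null | awk 'NR==2{print $4}' || true)
FREE_KB=${FREE_KB:-0}
FREE_GB=$(( FREE_KB / 1024 / 1024 ))
FREE_BYTES=$(( FREE_KB * 1024 ))
echo " Free: ${FREE_GB} GB"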
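# Sketch (assumption, not in the original script): step 1 only reports the free
# space. If a hard limit were wanted, a hypothetical helper such as
# has_enough_space could compare the free bytes with a required size.
# Both arguments are byte counts; MIN_FREE_BYTES in the usage line is a made-up name.
has_enough_space() {
    local free_bytes=${1:-0} required_bytes=${2:-0}
    (( free_bytes >= required_bytes ))
}
# Possible use: has_enough_space "$FREE_BYTES" "$MIN_FREE_BYTES" || echo " WARNING: low disk space"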
# ═════════════════════════════════════════════════════════════════════════════
# [2/13] DETERMINE IDS
# ═════════════════════════════════════════════════════════════════════════════
echo ""
echo "$(date '+%Y-%m-%d %H:%M:%S') ==> [2/13] Determining IDs..."
VM_ID_ORIGINAL=$(echo "$SNAPSHOT_PATH" | grep -oP '\d+' | head -1 || echo "0")
echo " Original ID: $VM_ID_ORIGINAL (type: $BACKUP_TYPE)"
# Pick the first ID in 1000..1999 that is used by neither a VM nor a container.
VM_ID_RESTORED=$(
    {
        pvesh get /nodes/localhost/qemu --output-format json 2>/dev/null || echo "[]"
        pvesh get /nodes/localhost/lxc --output-format json 2>/dev/null || echo "[]"
    } | python3 -c "
import json, sys
data = []
for line in sys.stdin:
    line = line.strip()
    if line:
        try:
            data.extend(json.loads(line))
        except Exception:
            pass
existing = {int(v.get('vmid', 0)) for v in data}
for i in range(1000, 2000):
    if i not in existing:
        print(i)
        break
" 2>/dev/null || echo "1000"
)
echo " Restore ID: $VM_ID_RESTORED"
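# Sketch (assumption, not in the original script): the inline Python above picks
# the first unused ID in 1000..1999. The same selection in plain bash could look
# like this. next_free_id is a hypothetical name; the IDs already in use are
# passed as arguments, and nothing is printed when the whole range is taken.
next_free_id() {
    local -A used=()
    local id
    for id in "$@"; do
        used[$id]=1
    done
    for (( id = 1000; id < 2000; id++ )); do
        if [[ -z ${used[$id]:-} ]]; then
            printf '%s\n' "$id"
            return 0
        fi
    done
    return 1
}
# Possible use: next_free_id 1000 1001 1003   (prints the first gap in the range)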
# ═════════════════════════════════════════════════════════════════════════════
# [2.5/13] CONFIG CHECK
# Read the config directly from the PBS backup to determine the VM name and to
# check whether the ZIP already exists on the backup server → skip the restore
# ═════════════════════════════════════════════════════════════════════════════
echo ""
echo "$(date '+%Y-%m-%d %H:%M:%S') ==> [2.5/13] Reading config from PBS backup..."
CONFIG_VM_NAME=""
CONFIG_TMP="/tmp/pbs_config_${VM_ID_ORIGINAL}_$$.conf"
if [[ "$BACKUP_TYPE" == "ct" ]]; then
    CONF_FILE_IN_BACKUP="pct.conf"
    NAME_KEY="^hostname:"
else
    CONF_FILE_IN_BACKUP="qemu-server.conf"
    NAME_KEY="^name:"
fi
export PBS_PASSWORD
export PBS_REPOSITORY="${PBS_USER}@${PBS_HOST}:${DATASTORE}"
SNAP_ID=$(echo "$SNAPSHOT_PATH" | cut -d/ -f3)
echo " Repository: $PBS_REPOSITORY"
echo " Snapshot: ${BACKUP_TYPE}/${VM_ID_ORIGINAL}/${SNAP_ID}"
echo " Config: $CONF_FILE_IN_BACKUP"
echo " Keyfile: ${KEY_DIR}/${DATASTORE}.keyfile"
proxmox-backup-client restore \
    --keyfile "${KEY_DIR}/${DATASTORE}.keyfile" \
    "${BACKUP_TYPE}/${VM_ID_ORIGINAL}/${SNAP_ID}" \
    "$CONF_FILE_IN_BACKUP" \
    "$CONFIG_TMP" \
    2>&1 || echo " WARNING: proxmox-backup-client restore failed (exit $?)"
if [[ -f "$CONFIG_TMP" ]]; then
    CONFIG_VM_NAME=$(grep -m1 "$NAME_KEY" "$CONFIG_TMP" 2>/dev/null \
        | awk -F': ' '{print $2}' | tr -d '[:space:]' || echo "")
    rm -f "$CONFIG_TMP"
    echo " VM name: ${CONFIG_VM_NAME:-unknown}"
else
    echo " Config not readable, skipping ZIP check."
fi
# Check whether the ZIP already exists
if [[ -n "$CONFIG_VM_NAME" ]]; then
    ZIP_CHECK="${RSYNC_TARGET}/${LAST_DATE}/${CONFIG_VM_NAME}-${VM_ID_ORIGINAL}.7z"
    if [[ "$SKIP_RSYNC" == "1" ]]; then
        if [[ -f "$ZIP_CHECK" ]]; then
            echo " ZIP already present (local): $ZIP_CHECK"
            VM_NAME="$CONFIG_VM_NAME"
            ZIP_SIZE_BYTES=$(stat -c%s "$ZIP_CHECK" 2>/dev/null || echo "0")
            RSYNC_OK="true"
            RSYNC_SIZE_BYTES=$ZIP_SIZE_BYTES
            QM_AGENT_OK="skipped"
            trap - ERR
            send_webhook "success" ""
            exit 0
        fi
    else
        if ssh "$BACKUP_SERVER_HOST" "test -f '$ZIP_CHECK'" 2>/dev/null; then
            echo " ZIP already present (remote): $ZIP_CHECK"
            VM_NAME="$CONFIG_VM_NAME"
            ZIP_SIZE_BYTES=$(ssh "$BACKUP_SERVER_HOST" \
                "stat -c%s '$ZIP_CHECK'" 2>/dev/null || echo "0")
            RSYNC_OK="true"
            RSYNC_SIZE_BYTES=$ZIP_SIZE_BYTES
            QM_AGENT_OK="skipped"
            trap - ERR
            send_webhook "success" ""
            exit 0
        fi
    fi
    echo " No existing ZIP, starting full restore."
fi
# ═════════════════════════════════════════════════════════════════════════════
# [3/13] RESTORE
# ═════════════════════════════════════════════════════════════════════════════
echo ""
echo "$(date '+%Y-%m-%d %H:%M:%S') ==> [3/13] Restoring from PBS storage ($BACKUP_TYPE)..."
echo " Backup ref: $PVE_BACKUP_REF"
echo " Storage: $RESTORE_PATH"
echo " ID: $VM_ID_RESTORED"
RESTORE_START_INNER=$(date +%s)
if [[ "$BACKUP_TYPE" == "ct" ]]; then
    pct restore "$VM_ID_RESTORED" "$PVE_BACKUP_REF" \
        --storage "$RESTORE_PATH" \
        --unique 1 \
        2>&1
else
    qmrestore "$PVE_BACKUP_REF" "$VM_ID_RESTORED" \
        --storage "$RESTORE_PATH" \
        --unique 1 \
        2>&1
fi
RESTORE_DURATION=$(( $(date +%s) - RESTORE_START_INNER ))
echo " Restore finished in ${RESTORE_DURATION}s"
# ═════════════════════════════════════════════════════════════════════════════
# [4/13] DETERMINE IMAGE_DIR DYNAMICALLY
# ═════════════════════════════════════════════════════════════════════════════
echo ""
echo "$(date '+%Y-%m-%d %H:%M:%S') ==> [4/13] Determining image directory..."
STORAGE_BASE=$(pvesh get "/storage/${RESTORE_PATH}" --output-format json \
    2>/dev/null | python3 -c "
import json, sys
cfg = json.load(sys.stdin)
print(cfg.get('path', ''))
" 2>/dev/null || echo "")
if [[ -n "$STORAGE_BASE" ]]; then
    if [[ "$BACKUP_TYPE" == "ct" ]]; then
        IMAGE_DIR=""
        for candidate in \
            "${STORAGE_BASE}/images/${VM_ID_RESTORED}" \
            "${STORAGE_BASE}/private/${VM_ID_RESTORED}" \
            "${STORAGE_BASE}/rootdir/${VM_ID_RESTORED}"; do
            if [[ -d "$candidate" ]] && [[ -n "$(ls -A "$candidate" 2>/dev/null)" ]]; then
                IMAGE_DIR="$candidate"
                echo " CT image found: $IMAGE_DIR"
                break
            else
                echo " Not found: $candidate"
            fi
        done
        if [[ -z "$IMAGE_DIR" ]]; then
            IMAGE_DIR=$(find "$STORAGE_BASE" -maxdepth 2 -type d \
                -name "$VM_ID_RESTORED" 2>/dev/null | head -1 || echo "")
            [[ -n "$IMAGE_DIR" ]] && echo " CT image via find: $IMAGE_DIR"
        fi
    else
        IMAGE_DIR="${STORAGE_BASE}/images/${VM_ID_RESTORED}"
    fi
else
    if [[ "$BACKUP_TYPE" == "ct" ]]; then
        IMAGE_DIR="/var/lib/vz/private/${VM_ID_RESTORED}"
    else
        IMAGE_DIR="/var/lib/vz/images/${VM_ID_RESTORED}"
    fi
    echo " WARNING: storage path not determined, fallback: $IMAGE_DIR"
fi
# Still empty: the storage path is known but no CT directory was found.
if [[ -z "$IMAGE_DIR" ]]; then
    if [[ "$BACKUP_TYPE" == "ct" ]]; then
        IMAGE_DIR="/var/lib/vz/private/${VM_ID_RESTORED}"
    else
        IMAGE_DIR="/var/lib/vz/images/${VM_ID_RESTORED}"
    fi
    echo " WARNING: fallback: $IMAGE_DIR"
fi
echo " Image dir: $IMAGE_DIR"
# Default to 0 after the pipeline: awk exits 0 even when du fails, and an empty
# value would break the arithmetic below.
ACTUAL_DISK_BYTES=$(du -sb "$IMAGE_DIR" 2>/dev/null | awk '{print $1}' || true)
ACTUAL_DISK_BYTES=${ACTUAL_DISK_BYTES:-0}
echo " Image size: $(( ACTUAL_DISK_BYTES / 1024 / 1024 / 1024 )) GB"
# ═════════════════════════════════════════════════════════════════════════════
# [5/13] CHECK IMAGES
# ═════════════════════════════════════════════════════════════════════════════
echo ""
echo "$(date '+%Y-%m-%d %H:%M:%S') ==> [5/13] Checking images..."
if [[ ! -d "$IMAGE_DIR" ]] || [[ -z "$(ls -A "$IMAGE_DIR" 2>/dev/null)" ]]; then
    ERROR_MSG="IMAGE_DIR empty or missing: $IMAGE_DIR"
    echo " ERROR: $ERROR_MSG"
    if [[ "$BACKUP_TYPE" == "ct" ]]; then
        pct destroy "$VM_ID_RESTORED" --purge 1 2>/dev/null || true
    else
        qm destroy "$VM_ID_RESTORED" \
            --destroy-unreferenced-disks 1 --purge 1 2>/dev/null || true
    fi
    trap - ERR
    send_webhook "failed" "$ERROR_MSG"
    exit 0
fi
echo " Images present ✓"
# ═════════════════════════════════════════════════════════════════════════════
# [6/13] PREPARE
# ═════════════════════════════════════════════════════════════════════════════
echo ""
echo "$(date '+%Y-%m-%d %H:%M:%S') ==> [6/13] Preparing ($BACKUP_TYPE)..."
if [[ "$BACKUP_TYPE" == "ct" ]]; then
    pct unlock "$VM_ID_RESTORED" 2>/dev/null || true
    pct stop "$VM_ID_RESTORED" 2>/dev/null || true
    sleep 3
    for ((net=0; net<=10; net++)); do
        pct set "$VM_ID_RESTORED" --delete "net${net}" 2>/dev/null || true
    done
    echo " CT prepared (network interfaces removed)."
else
    qm unlock "$VM_ID_RESTORED" 2>/dev/null || true
    qm stop "$VM_ID_RESTORED" 2>/dev/null || true
    sleep 3
    qm set "$VM_ID_RESTORED" --delete cdrom 2>/dev/null || true
    qm set "$VM_ID_RESTORED" --delete ide0 2>/dev/null || true
    for ((net=0; net<=10; net++)); do
        qm set "$VM_ID_RESTORED" --delete "net${net}" 2>/dev/null || true
    done
    qm set "$VM_ID_RESTORED" --agent "enabled=1,type=virtio" 2>/dev/null || true
    echo " VM prepared (network interfaces removed, agent enabled)."
fi
# ═════════════════════════════════════════════════════════════════════════════
# [7/13] START & CHECK
# ═════════════════════════════════════════════════════════════════════════════
echo ""
echo "$(date '+%Y-%m-%d %H:%M:%S') ==> [7/13] Starting & checking ($BACKUP_TYPE)..."
if [[ "$BACKUP_TYPE" == "ct" ]]; then
    pct start "$VM_ID_RESTORED" 2>/dev/null || true
    sleep 10
    if pct status "$VM_ID_RESTORED" 2>/dev/null | grep -q "running"; then
        QM_AGENT_OK="true"
        echo " CT is running ✓"
        CT_HOSTNAME=$(pct exec "$VM_ID_RESTORED" -- hostname 2>/dev/null || echo "unknown")
        echo " Hostname: $CT_HOSTNAME"
    else
        QM_AGENT_OK="false"
        echo " CT did not start."
    fi
else
    qm start "$VM_ID_RESTORED" 2>/dev/null || true
    AGENT_WAIT=0
    AGENT_MAX=120
    AGENT_INTERVAL=10
    while [[ $AGENT_WAIT -lt $AGENT_MAX ]]; do
        sleep $AGENT_INTERVAL
        AGENT_WAIT=$(( AGENT_WAIT + AGENT_INTERVAL ))
        echo -n " [${AGENT_WAIT}s/${AGENT_MAX}s] qm agent... "
        # Use the exit status instead of grepping the output: a successful
        # "qm agent <vmid> ping" may print nothing at all.
        if qm agent "$VM_ID_RESTORED" ping >/dev/null 2>&1; then
            QM_AGENT_OK="true"
            echo "ONLINE ✓"
            hostname_info=$(qm agent "$VM_ID_RESTORED" get-host-name 2>/dev/null \
                | grep host-name | tr -d '"' || true)
            echo " Hostname: ${hostname_info:-unknown}"
            break
        else
            echo "not reachable."
        fi
    done
    [[ "$QM_AGENT_OK" == "false" ]] && \
        echo " qm agent not reachable, qm_agent_ok=false in DB."
fi
# ═════════════════════════════════════════════════════════════════════════════
# [8/13] STOP
# ═════════════════════════════════════════════════════════════════════════════
echo ""
echo "$(date '+%Y-%m-%d %H:%M:%S') ==> [8/13] Stopping $BACKUP_TYPE..."
if [[ "$BACKUP_TYPE" == "ct" ]]; then
    pct stop "$VM_ID_RESTORED" 2>/dev/null || true
    sleep 10
else
    # Graceful shutdown with a 2-minute timeout, then force stop
    qm shutdown "$VM_ID_RESTORED" --timeout 120 2>/dev/null || true
    # Check whether the VM is still running → force stop
    if qm status "$VM_ID_RESTORED" 2>/dev/null | grep -q "running"; then
        echo " VM still running after 120s, forcing stop..."
        qm stop "$VM_ID_RESTORED" --skiplock 1 2>/dev/null || true
        sleep 5
    fi
fi
echo " Stopped."
# ═════════════════════════════════════════════════════════════════════════════
# [9/13] SAVE CONFIG
# ═════════════════════════════════════════════════════════════════════════════
echo ""
echo "$(date '+%Y-%m-%d %H:%M:%S') ==> [9/13] Saving config..."
if [[ "$BACKUP_TYPE" == "ct" ]]; then
    PVE_CONF="/etc/pve/lxc/${VM_ID_RESTORED}.conf"
    CONF_FILENAME="lxc.conf"
    VM_NAME=$(grep -m1 "^hostname:" "$PVE_CONF" 2>/dev/null \
        | awk -F': ' '{print $2}' | tr -d '[:space:]' || true)
else
    PVE_CONF="/etc/pve/qemu-server/${VM_ID_RESTORED}.conf"
    CONF_FILENAME="qemu-server.conf"
    VM_NAME=$(grep -m1 "^name:" "$PVE_CONF" 2>/dev/null \
        | awk -F': ' '{print $2}' | tr -d '[:space:]' || true)
fi
# Apply the fallback outside the pipeline: without pipefail the pipeline status
# is that of tr, so an "|| echo ..." inside the substitution would never run
# and VM_NAME could end up empty.
VM_NAME=${VM_NAME:-${CONFIG_VM_NAME:-$SAFE_CLIENT}}
if [[ -f "$PVE_CONF" ]]; then
    cp "$PVE_CONF" "${IMAGE_DIR}/${CONF_FILENAME}"
    echo " Config saved: ${IMAGE_DIR}/${CONF_FILENAME}"
else
    echo " WARNING: config not found: $PVE_CONF"
fi
echo " Name: $VM_NAME"
# ═════════════════════════════════════════════════════════════════════════════
# [10/13] 7Z ARCHIVE
# ═════════════════════════════════════════════════════════════════════════════
echo ""
echo "$(date '+%Y-%m-%d %H:%M:%S') ==> [10/13] Creating encrypted 7z archive (mx=${COMPRESS_LEVEL})..."
ZIP_FILE="${ZIP_DIR}/${VM_NAME}-${VM_ID_ORIGINAL}.7z"
ZIP_START=$(date +%s)
# -mhe=on also encrypts the archive headers (file names).
7z a -t7z \
    -mmt="${MMT_THREADS}" \
    -mx="${COMPRESS_LEVEL}" \
    -md=16M \
    -p"${ZIP_PASSWORD}" \
    -mhe=on \
    "$ZIP_FILE" \
    "${IMAGE_DIR}/"* \
    2>&1 | tail -5
ZIP_DURATION=$(( $(date +%s) - ZIP_START ))
ZIP_SIZE_BYTES=$(stat -c%s "$ZIP_FILE" 2>/dev/null || echo "0")
echo " ZIP: $(( ZIP_SIZE_BYTES / 1024 / 1024 )) MB in ${ZIP_DURATION}s"
# ═════════════════════════════════════════════════════════════════════════════
# [11/13] RSYNC TO BACKUP SERVER
# ═════════════════════════════════════════════════════════════════════════════
echo ""
RSYNC_TARGET_DATE="${RSYNC_TARGET}/${LAST_DATE}"
echo "$(date '+%Y-%m-%d %H:%M:%S') ==> [11/13] Rsync / file transfer..."
if [[ "$SKIP_RSYNC" == "1" ]]; then
    echo " Local mode: ZIP already in ${RSYNC_TARGET_DATE}, no rsync."
    RSYNC_OK="true"
    RSYNC_SIZE_BYTES=$ZIP_SIZE_BYTES
    echo " Size: $(( RSYNC_SIZE_BYTES / 1024 / 1024 )) MB"
else
    MAX_RETRIES=3
    rsync_transfer() {
        rsync -avz --progress --timeout=300 \
            "$ZIP_FILE" \
            "${BACKUP_SERVER_HOST}:${RSYNC_TARGET_DATE}/" \
            2>&1
    }
    ssh "$BACKUP_SERVER_HOST" "mkdir -p '${RSYNC_TARGET_DATE}'" 2>/dev/null || true
    while [[ $RSYNC_RETRIES -lt $MAX_RETRIES ]]; do
        if rsync_transfer; then
            RSYNC_OK="true"
            RSYNC_SIZE_BYTES=$ZIP_SIZE_BYTES
            echo " Rsync OK: $(( RSYNC_SIZE_BYTES / 1024 / 1024 )) MB"
            break
        else
            RSYNC_RETRIES=$(( RSYNC_RETRIES + 1 ))
            if [[ $RSYNC_RETRIES -lt $MAX_RETRIES ]]; then
                echo " Failed ($RSYNC_RETRIES/$MAX_RETRIES). Waiting 60s..."
                sleep 60
            else
                RSYNC_OK="false"
                STATUS="failed"
                ERROR_MSG="Rsync failed after ${RSYNC_RETRIES} attempts"
                echo " ERROR: $ERROR_MSG"
            fi
        fi
    done
    # Verify the transfer by comparing the remote file size with the local one.
    if [[ "$RSYNC_OK" == "true" ]]; then
        REMOTE_SIZE=$(ssh "$BACKUP_SERVER_HOST" \
            "stat -c%s '${RSYNC_TARGET_DATE}/$(basename "$ZIP_FILE")'" \
            2>/dev/null || echo "0")
        if [[ "$REMOTE_SIZE" != "$ZIP_SIZE_BYTES" ]]; then
            echo " WARNING: remote ${REMOTE_SIZE}B != local ${ZIP_SIZE_BYTES}B"
            RSYNC_OK="false"
            STATUS="failed"
            ERROR_MSG="Size mismatch: local=${ZIP_SIZE_BYTES} remote=${REMOTE_SIZE}"
        else
            echo " Size check OK: ${REMOTE_SIZE} bytes."
        fi
    fi
fi
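# Sketch (assumption, not in the original script): the retry loop of step 11
# could be factored into a generic helper. retry_cmd is a hypothetical name; it
# runs a command up to max_tries times, sleeps delay seconds between attempts
# and returns the status of the last attempt.
retry_cmd() {
    local max_tries=$1 delay=$2 attempt rc=1
    shift 2
    for (( attempt = 1; attempt <= max_tries; attempt++ )); do
        # "cmd && return" keeps a failing attempt from firing the ERR trap.
        "$@" && return 0
        rc=$?
        (( attempt < max_tries )) && sleep "$delay"
    done
    return "$rc"
}
# Possible use: retry_cmd "$MAX_RETRIES" 60 rsync_transfer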
# ═════════════════════════════════════════════════════════════════════════════
# [12/13] CLEAN UP
# ═════════════════════════════════════════════════════════════════════════════
echo ""
echo "$(date '+%Y-%m-%d %H:%M:%S') ==> [12/13] Cleaning up..."
if [[ "$BACKUP_TYPE" == "ct" ]]; then
    pct destroy "$VM_ID_RESTORED" --purge 1 \
        2>/dev/null || echo " CT $VM_ID_RESTORED no longer present."
else
    qm destroy "$VM_ID_RESTORED" \
        --destroy-unreferenced-disks 1 \
        --purge 1 \
        2>/dev/null || echo " VM $VM_ID_RESTORED no longer present."
fi
if [[ "$SKIP_RSYNC" == "1" ]]; then
    echo " ${BACKUP_TYPE^^} ${VM_ID_RESTORED} removed. ZIP stays at the destination."
else
    rm -f "$ZIP_FILE"
    echo " ${BACKUP_TYPE^^} ${VM_ID_RESTORED} removed, ZIP deleted."
fi
# ── Summary & webhook ─────────────────────────────────────────────────────────
TOTAL=$(( $(date +%s) - RESTORE_START ))
echo ""
echo "============================================================"
echo " Status: $STATUS"
echo " Type: $BACKUP_TYPE"
echo " Total time: ${TOTAL}s"
echo " Name: ${VM_NAME:-$SAFE_CLIENT}"
echo " Image dir: $IMAGE_DIR"
echo " qm agent/CT: $QM_AGENT_OK"
echo " Rsync: $RSYNC_OK (attempts: $RSYNC_RETRIES)"
echo " ZIP: $(( ZIP_SIZE_BYTES / 1024 / 1024 )) MB"
echo " 7z level: mx=${COMPRESS_LEVEL} mmt=${MMT_THREADS}"
[[ -n "$ERROR_MSG" ]] && echo " Error: $ERROR_MSG"
echo "============================================================"
trap - ERR
send_webhook "$STATUS" "$ERROR_MSG"
+25
@@ -0,0 +1,25 @@
# yaml-language-server: $schema=wmill.schema.json
defaultTs: bun
includes:
  - "f/**"
excludes: []
skipVariables: false
skipResources: false
skipResourceTypes: false
skipSecrets: true
skipScripts: false
skipFlows: false
skipApps: false
skipFolders: false
skipWorkspaceDependencies: false
nonDottedPaths: true
codebases: []
gitBranches:
  main:
    overrides: {}