1.0.0-alpha.15

Prerelease

Fixed

  • Health status remained “unknown” after compose drift redeploy, auto-start, and force redeploy

  • Only the image update path called verifyAfterDeploy() — three other compose paths skipped it

  • Added verifyHealthAfterCompose() that polls container state after compose up/restart

  • Respects the Docker HEALTHCHECK when one is defined; otherwise falls back to the container's running state

  • Applied to every compose path: image update, compose drift, auto-start (no containers), auto-start (stuck), and force redeploy
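For illustration, the per-poll check that verifyHealthAfterCompose() presumably applies could look like this minimal Go sketch (healthyState is a hypothetical helper; only the behavior — prefer HEALTHCHECK, fall back to running state — comes from these notes):

```go
package main

// healthyState reports whether a container counts as healthy:
// when a Docker HEALTHCHECK is defined its status wins; otherwise
// the running state is used as a fallback.
func healthyState(running bool, healthStatus string) bool {
	if healthStatus != "" { // HEALTHCHECK present: "starting", "healthy", or "unhealthy"
		return healthStatus == "healthy"
	}
	return running
}
```

The real implementation polls the Docker API after compose up/restart until this check passes or a deadline expires.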

Changed

  • api.address is now []string instead of string

  • api.port field removed — port is part of each address string

  • Supports multiple listen addresses (one HTTP server per address, shared mux)

  • All servers shut down gracefully on SIGTERM/SIGINT

  • Default: ["127.0.0.1:9090"]
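A hypothetical config fragment under the new scheme (the first address is the documented default; the second listener is a made-up example):

```json
{
  "api": {
    "address": ["127.0.0.1:9090", "[::1]:9090"]
  }
}
```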

1.0.0-alpha.13

Prerelease

Fixed

  • CRITICAL: Restored type="module" attribute on data-star script tag

  • Previous commit (fcd1235) incorrectly removed the module type

  • This caused browser error: SyntaxError: Unexpected keyword 'export'

  • Data-star requires ES6 module loading to work correctly

    Technical Details

    The Problem

    When the data-star library is loaded without type="module":

    <!-- BROKEN (alpha.12 deployed version) -->
    <script src="https://cdn.jsdelivr.net/gh/starfederation/datastar@1.0.0-RC.8/bundles/datastar.js"></script>
    

    The browser tries to execute the ES6 module code as a regular script, causing:

    [Error] SyntaxError: Unexpected keyword 'export'
        (anonymous function) (datastar.js:8)
    

    The Fix

    Correctly load data-star as an ES6 module:

    <!-- FIXED (alpha.13) -->
    <script type="module" src="https://cdn.jsdelivr.net/gh/starfederation/datastar@1.0.0-RC.8/bundles/datastar.js"></script>
    

    All three script tags in the UI now properly use type="module":

    1. Data-star library load
    2. SSE initialization script
    3. Helper functions script

    Migration Notes

    Critical fix - If you deployed alpha.12, you must upgrade to alpha.13 immediately. The UI is completely broken in alpha.12.

1.0.0-alpha.12

Prerelease

Fixed

  • Fully functional data-star UI with SSE integration

  • Initial status fetch from /status endpoint on page load

  • Real-time service updates via SSE at /ui/stream

  • Periodic updates every 30 seconds

  • Live audit events stream

    Service Actions

  • All action buttons now work correctly:

  • Trigger: Manual update check for a service

  • Redeploy: Force redeploy with command preview

  • Unblock: Unblock failed digest

    Visual Features

  • Connection status indicator (connected/error/reconnecting)

  • Resource usage bars with color coding:

  • Green: Normal usage

  • Orange: Warning (CPU >60%, MEM >70%)

  • Red: Danger (CPU >80%, MEM >85%)

  • Config flags visualization (U/H/S badges)

  • Container details display per service

  • Events table with real-time audit log

Added

  • data-store - Reactive state container

  • data-text - Reactive text content

  • data-show - Conditional visibility

  • data-each - Loop over arrays

  • data-on-click - Event handlers

  • data-class - Dynamic CSS classes

  • data-style - Dynamic inline styles
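As a rough illustration of how these attributes combine (only the attribute names come from the list above; the $-prefixed expression syntax and the values are assumptions, not taken from the actual UI code):

```html
<!-- Hypothetical sketch; attribute names from this release, expressions assumed. -->
<div data-store='{"open": false, "status": "unknown"}'>
  <span data-text="$status"></span>                 <!-- reactive text -->
  <button data-on-click="$open = !$open">Details</button>
  <div data-show="$open">container details…</div>   <!-- conditional visibility -->
</div>
```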

    Migration Notes

    No breaking changes - This release completes the UI implementation started in alpha.11.

1.0.0-alpha.11

Prerelease

Changed

  • Replaced old template-based UI with modern data-star reactive UI

  • Now uses data-star framework (1.0.0-RC.8) for reactive updates

  • Real-time updates via Server-Sent Events at /ui/stream

  • Cleaner, more maintainable codebase (~400 lines removed)

    Endpoint Changes

  • /ui now serves data-star UI (previously served template-based UI)

  • Removed /ui/v2 endpoint (data-star UI is now the default)

  • SSE endpoint changed from /ui/events to /ui/stream for consistency

Removed

  • Old template-based UI code (~400 lines of template HTML)

  • uiData struct and uiTemplate variable (no longer needed with data-star)

  • handleUIDataStar function (merged into handleUI)

  • Unused imports: html/template, os

    Technical Details

    Data-Star Integration

    The new UI uses the data-star reactive framework:

    <script type="module" src="https://cdn.jsdelivr.net/gh/starfederation/datastar@1.0.0-RC.8/bundles/datastar.js"></script>
    

    Features:

  • Reactive state management

  • Real-time SSE updates from /ui/stream

  • Automatic DOM updates without page refresh

  • Modern, declarative templating

    Migration Notes

    For API consumers:

  • ✅ No breaking changes - all JSON endpoints remain unchanged

  • /status, /metrics, /health work exactly as before

    For Web UI users:

  • Navigate to /ui for the new reactive interface

  • Old bookmarks to /ui will automatically show the new UI

  • /ui/v2 endpoint removed (no longer needed)

    Files Modified

    internal/watcher/api.go     -432 lines, +14 lines (net: -418 lines)
    internal/watcher/static/    +1 file (datastar.js, for future vendoring)
    

    Testing

    # Build and test
    go build -o dockward ./cmd/dockward
    ./dockward -config config.json
    # Access new UI
    open http://localhost:8080/ui
    # Verify SSE updates
    curl http://localhost:8080/ui/stream
    

    This release completes the Web UI modernization effort started in alpha.8, providing a more reactive and maintainable interface for monitoring dockward services. The data-star framework enables a better user experience with automatic updates and cleaner code architecture.

1.0.0-alpha.10

Prerelease

Added

  • Per-container CPU and memory stats - ContainerInfo now includes individual resource usage for each container

  • Prometheus metric watcher_invalid_services_total - tracks count of services that failed config validation

  • Container ID field added to ContainerInfo struct for resource stats lookup

Fixed

  • CRITICAL: Web UI template error - added missing Now field to uiData struct (template rendering was completely broken)

  • CPU/memory stats now displayed per-container instead of aggregated at service level

  • Monitor now stores per-container resource usage keyed by container ID

Changed

  • Resource stats collection enhanced to track individual container metrics

  • API enriches container list with CPU/memory stats from monitor

  • Container details in web UI now show accurate per-container resource usage

    Technical Details

    Per-Container Stats Implementation

    The monitor now maintains two maps:

  • latest map[string]ServiceStats - Service-level aggregated stats (unchanged)

  • containerStats map[string]ContainerStats - Per-container stats keyed by container ID

    API flow:

    1. Monitor collects stats for each container and stores by ID
    2. Updater lists containers and includes container ID in ContainerInfo
    3. API enriches containers with stats during snapshot creation
    4. Web UI displays CPU/memory per container in expandable details
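    Steps 2–3 of the flow above could be sketched as follows (the function name and field set are illustrative; only ContainerInfo and the ID-keyed stats map are named in these notes):

```go
package main

// ContainerStats holds per-container resource usage collected by the monitor.
type ContainerStats struct{ CPUPercent, MemPercent float64 }

// ContainerInfo is the per-container entry included in a /status snapshot.
type ContainerInfo struct {
	ID, Name               string
	CPUPercent, MemPercent float64
}

// enrich copies per-container stats (keyed by container ID) onto the
// container list during snapshot creation.
func enrich(containers []ContainerInfo, stats map[string]ContainerStats) {
	for i := range containers {
		if s, ok := stats[containers[i].ID]; ok {
			containers[i].CPUPercent = s.CPUPercent
			containers[i].MemPercent = s.MemPercent
		}
	}
}
```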

    Invalid Services Metric

    New Prometheus metric exposed via /metrics:

    # HELP watcher_invalid_services_total Number of services that failed config validation
    # TYPE watcher_invalid_services_total gauge
    watcher_invalid_services_total 1
    

    Enables alerting on misconfigured services that were skipped during startup.

1.0.0-alpha.9

Prerelease

Fixed

  • CRITICAL: Config validation no longer kills entire process when one service has invalid config (missing compose file, etc.)

  • Invalid services are now logged as warnings and skipped, allowing valid services to continue monitoring

  • /health endpoint now exposes config_warnings array listing any services that failed validation

Changed

  • Config validation behavior: service-level errors (missing files, invalid paths) are non-fatal and collected in InvalidServices

  • Only global validation errors (invalid runtime) remain fatal

  • Startup logs now show clear warnings for each skipped service with reason
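The new behavior might be sketched like this (the types and the missing-file check are simplified stand-ins; the real validator also checks paths and more):

```go
package main

// Service is a pared-down stand-in for the real service config.
type Service struct{ Name, ComposeFile string }

// InvalidService records a skipped service and the reason, as surfaced
// in the /health config_warnings array.
type InvalidService struct{ Name, Reason string }

// validateServices collects per-service errors instead of aborting the
// whole process; only valid services continue to monitoring.
func validateServices(svcs []Service) (valid []Service, invalid []InvalidService) {
	for _, s := range svcs {
		if s.ComposeFile == "" { // simplified stand-in for "missing compose file"
			invalid = append(invalid, InvalidService{Name: s.Name, Reason: "missing compose file"})
			continue
		}
		valid = append(valid, s)
	}
	return valid, invalid
}
```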

1.0.0-alpha.8

Prerelease

Added

  • Docker daemon health checks with configurable intervals (docker_health.check_interval and docker_health.timeout in config)

  • /health endpoint now includes Docker daemon connectivity status and returns 503 Service Unavailable when Docker is down

  • Three new Prometheus metrics: docker_daemon_healthy, docker_daemon_consecutive_failures, docker_daemon_checks_total

  • Graceful shutdown coordinator ensures in-flight deployments complete before process exit (30-second timeout)

  • HTTP request body size limit (1 MB max) with 413 response on exceeded limit

  • HTTP request timeout enforcement (30s default for most endpoints, no timeout for SSE)

  • SSE connection resource limits (max 100 concurrent connections, 24-hour timeout)

  • SSE connection tracking and graceful disconnection on shutdown

Changed

  • /health endpoint response now includes status and components structure with detailed health information

  • Shutdown behavior: SIGTERM/SIGINT now trigger graceful shutdown instead of immediate termination

  • API handlers wrapped with timeout contexts and body size limiters for DoS protection

Security

  • CRITICAL: Fixed command injection vulnerability in compose execution by validating project names and file paths

  • Compose project names restricted to [a-zA-Z0-9_-]{1,64} pattern

  • Compose file paths must be absolute with no path traversal attempts (.. forbidden)

  • Env file paths validated with same security constraints

  • All file paths verified to exist and be regular files at config load time
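A sketch of the validation described above (the function name is illustrative; the name pattern and path rules are the ones stated in this release):

```go
package main

import (
	"fmt"
	"path/filepath"
	"regexp"
	"strings"
)

// projectNameRe enforces the documented [a-zA-Z0-9_-]{1,64} restriction.
var projectNameRe = regexp.MustCompile(`^[a-zA-Z0-9_-]{1,64}$`)

// validateComposeArgs rejects project names and compose file paths that
// could be abused for command injection or path traversal.
func validateComposeArgs(project, file string) error {
	if !projectNameRe.MatchString(project) {
		return fmt.Errorf("invalid compose project name %q", project)
	}
	if !filepath.IsAbs(file) {
		return fmt.Errorf("compose file path must be absolute: %q", file)
	}
	if strings.Contains(file, "..") {
		return fmt.Errorf("path traversal forbidden in compose file path: %q", file)
	}
	return nil
}
```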

Fixed

  • Shutdown no longer interrupts in-flight deployments mid-operation

  • Audit logs are flushed to disk before shutdown completes

  • HTTP and SSE connections close gracefully during shutdown

  • SSE connections no longer leak when clients disconnect unexpectedly

  • Audit logger double-unlock mutex bug (deferred unlock removed)

1.0.0-alpha.7

Prerelease

Fixed

  • Status refresh (every 15s) no longer collapses expanded container <details> rows; open state is preserved across innerHTML replacement via data-svc keying

  • Toggling a container list no longer causes horizontal layout shift in the services table; replaced nested <table> inside <details> with flex <div> rows to eliminate outer table column reflow

1.0.0-alpha.6

Prerelease

Added

  • images []string field on Service — replaces the removed image string field. Supports multiple registry images per compose project; one deploy cycle is triggered when any image changes

  • silent bool field on Service — excludes the service from validation and monitoring entirely. Healer and monitor skip silent services

  • Per-container stats aggregation in the monitor: CPU is summed across all containers; memory is derived from summed usage/limit bytes for accuracy

  • Web UI: service Name cell now contains a collapsible <details> block listing each container’s name, state, and status

Changed

  • blocked and not_found suppression map keys are now "service/image" (e.g. "myapp/api:latest") instead of "service", allowing independent state per image within a service

  • deployed map keys are now "service/image"; /status and web UI show the first matched image’s reference and digest

  • Monitor resource alerts are now per-container with independent cooldown keys, preventing one container from suppressing alerts for another

Fixed

  • Rollback image tagging failed when a compose service uses a short image reference (e.g. firegen:latest without a registry prefix); updater now resolves the full registry-prefixed reference before tagging the rollback image

Removed

  • image string field on Service — use images []string (breaking change)

  • compose_file string field on Service — use compose_files []string (breaking change)

  • container_uptime field from /status JSON and web UI; replaced by the live containers array

1.0.0-alpha.5

Prerelease

Added

  • image, image_digest, and container_uptime fields per service in GET /status JSON and the web UI; populated from the Docker list API on each poll cycle and from the inspect API after a successful deploy

1.0.0-alpha.4

Prerelease

Added

  • registry.stats_interval config field: decouples container resource stat collection from the registry poll interval. Defaults to poll_interval when unset. Set lower (e.g. 30) to get fresher CPU/memory data in the web UI and /status endpoint without increasing registry poll frequency
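For example (a hypothetical fragment; the nesting of poll_interval under registry and the seconds unit are assumptions based on the "e.g. 30" hint above):

```json
{
  "registry": {
    "poll_interval": 300,
    "stats_interval": 30
  }
}
```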

1.0.0-alpha.3

Prerelease

Fixed

  • Prometheus /metrics counters (updates_total, rollbacks_total, restarts_total, failures_total, service_blocked) only appeared for services with at least one event; all configured services are now pre-seeded to 0 at startup so every label combination is present from the first scrape

1.0.0-alpha.2

Prerelease

Added

  • Dark/light theme toggle in the agent web UI; preference persisted to localStorage; respects prefers-color-scheme as default

  • --verbose flag: noisy healer skip, cooldown, and deploy-guard log lines are suppressed by default and only emitted when --verbose is set

  • Quick start guide

Fixed

  • Health status showed “unknown” on boot until a Docker event fired; the healer now inspects all configured containers at startup to seed health gauges immediately

  • CPU and memory always showed “–” when no alert thresholds were configured; stats are now collected for all running containers on every poll cycle regardless of threshold config

  • Monitor stats showed “–” for the entire first poll interval after startup; the monitor now polls immediately on start before entering the ticker loop

  • Trigger and Unblock buttons caused an infinite browser spinner by triggering a page reload that reconnected the SSE stream; replaced form submissions with fetch calls

  • Table headers and “unknown” status were near-invisible in dark mode; fixed with CSS custom properties

Removed

  • Stale upgrade guide

1.0.0-alpha.1

Prerelease

Added

  • GET /audit endpoint returning recent audit entries as JSON (?limit=N, default 100, max 500); returns empty array when audit is disabled

  • Agent web UI SSE stream (GET /ui/events): live audit entries pushed to the browser via Server-Sent Events; replays the last 50 entries on connect

  • Agent web UI: replaced <meta http-equiv="refresh"> full-page reload with SSE live event feed and a 15-second fetch-based status table refresh

  • Shared internal/hub package: SSE publish-subscribe hub extracted from internal/warden; imported by both watcher and warden

  • audit.Broadcaster interface and Logger.WithBroadcast to fan out new entries to the local SSE hub without an import cycle

  • dockward-warden.service: systemd unit for running dockward in warden mode

  • linux/arm64 binary added to release pipeline (OVH Ampere, Raspberry Pi)

  • Watcher test coverage: api_test.go, updater_test.go, healer_test.go

View changes

Added

  • Central warden mode: aggregates audit entries from multiple dockward agents via HTTP push

  • --mode agent|warden flag; agent mode is the default (backward compatible)

  • Agent push config block (push.warden_url, push.token, push.machine_id): when warden_url is set, every audit entry is forwarded to the warden asynchronously

  • internal/push package: HTTP client that POSTs audit entries to warden /ingest

  • audit.Pusher interface and Logger.WithPush to decouple push client from audit package

  • Warden HTTP server with four endpoints: POST /ingest, GET /events (SSE), GET / (dashboard), GET /health

  • SSE hub: fan-out broadcaster; replays last 50 events on new connection

  • In-memory ring buffer (200 events) with per-agent connectivity state

  • Heartbeat poller: polls each agent GET /health every 30s; emits agent_online / agent_offline synthetic entries on state transitions

  • Multi-machine dashboard: per-agent status cards, real-time SSE event feed, machine and level filters

  • warden.sample.json: sample warden config
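The online/offline transition logic of the heartbeat poller could be sketched as follows (type and method names are illustrative; only the agent_online/agent_offline entries are from these notes):

```go
package main

// heartbeats tracks the last-known health of each agent and reports
// synthetic transition events on state changes.
type heartbeats struct {
	online map[string]bool
}

func newHeartbeats() *heartbeats { return &heartbeats{online: map[string]bool{}} }

// mark records a health-check result and returns "agent_online",
// "agent_offline", or "" when the state did not change.
func (h *heartbeats) mark(agent string, ok bool) string {
	prev, seen := h.online[agent]
	h.online[agent] = ok
	if seen && prev == ok {
		return "" // no transition, nothing to emit
	}
	if ok {
		return "agent_online"
	}
	return "agent_offline"
}
```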

View changes

Added

  • Audit log: structured JSON Lines file (audit.path config field) recording deploy, rollback, heal, and resource alert events

  • GET /audit API endpoint returning the last 100 audit entries as JSON

  • Compose file watcher: re-deploys a service when its compose file content changes without pulling a new image (compose_watch: true)

  • Resource alerts: configurable CPU and memory thresholds per service (cpu_threshold, memory_threshold); sends notifications when exceeded

  • Web UI dashboard: served at GET /ui on the API port; shows service status, last event, and audit log; live updates over SSE

  • Interactive config wizard: dockward config [--config <path>] subcommand to create or edit config files interactively

  • GET /status endpoint: unified status response with per-service health, deploy state, and resource metrics

Fixed

  • Audit entries now written on rollback failure paths (previously missing)

View changes

Added

  • compose_files config field accepts an ordered list of compose files merged left to right (e.g. base + override). compose_file (singular) remains supported for backward compatibility

Fixed

  • Healer hallucination: repeated “recovered” alerts fired on every Docker health event during the cooldown window. The cooldown entry is now consumed on first use

  • Healer missing recovery alert: degraded state was not cleared when a healthy event arrived during an active deploy cycle, leaving the flag stuck. State is now always cleaned up on healthy, regardless of deploy status

  • Healer double notification: when the healer restarted a container, both verifyAfterRestart and the healthy event handler could send a recovery alert. verifyAfterRestart now skips the notification if the healthy event already handled it

View changes

Added

  • GET /not-found API endpoint to list services with unresolvable local digests

Fixed

  • Updater poll spam: when local image digest cannot be resolved, the updater no longer calls deploy every poll cycle. A notFound suppression map tracks the remote digest at time of failure and silences retries until the registry digest changes

  • Healer noise during deploys: handleHealthy now checks IsDeploying() to suppress spurious “Container recovered and is healthy” notifications during legitimate deploys
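The notFound suppression map described above could be sketched like this (type and method names are illustrative; the digest-keyed suppression behavior is the one stated in these notes):

```go
package main

// notFoundSuppressor remembers the remote digest seen when local digest
// resolution failed, and silences redeploy attempts until the registry
// digest changes.
type notFoundSuppressor struct {
	failedAt map[string]string // service -> remote digest at time of failure
}

func newNotFoundSuppressor() *notFoundSuppressor {
	return &notFoundSuppressor{failedAt: map[string]string{}}
}

func (n *notFoundSuppressor) recordFailure(service, remoteDigest string) {
	n.failedAt[service] = remoteDigest
}

// shouldDeploy reports whether a deploy should be attempted this poll cycle.
func (n *notFoundSuppressor) shouldDeploy(service, remoteDigest string) bool {
	prev, failed := n.failedAt[service]
	return !failed || prev != remoteDigest
}
```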

View changes

Fixed

  • Updater redeploying every poll cycle due to url.PathEscape encoding / in image names, causing InspectImage to fail on Docker API lookups

  • Removed url.PathEscape from image path parameters (InspectImage, TagImage, RemoveImage) to match Docker SDK behavior

  • Added container-based fallback for local digest resolution: if image inspect by reference fails, resolves via running container’s image ID

  • Swallowed errors from InspectImage now logged for debuggability

View changes

Added

  • Heal-only mode: monitor and auto-restart containers by name without compose or registry

  • container_name service config field for matching standalone containers

  • heal_max_restarts config field (default 3) to cap consecutive failed restart attempts

  • Healer sends a “gave up” notification after the max restart count is exceeded, and resets the count on healthy recovery

Changed

  • compose_file and compose_project only required when auto_update is true

  • findServiceByEvent matches both compose project label and container name

Fixed

  • Healer restart loop: previously restarted unhealthy containers indefinitely after each cooldown expiry

Added

  • Registry polling with digest comparison (remote vs local)

  • Auto-deploy via docker compose pull/up on image change

  • Rollback on unhealthy or non-running container after grace period

  • Blocked digest tracking to prevent infinite rollback loops

  • Auto-clear blocked digest when new registry digest appears

  • Atomic deploy guard preventing poll/API race conditions

  • Label-based container matching via com.docker.compose.project

  • Health polling every 5s during grace period (fail fast on unhealthy)

  • Auto-heal: Docker event listener restarts unhealthy containers with cooldown

  • Discord webhook notifications

  • SMTP email notifications

  • Custom webhook notifications with Go text/template body

  • Prometheus metrics endpoint (/metrics)

  • Trigger API: POST /trigger and POST /trigger/<service>

  • Blocked digest API: GET /blocked and DELETE /blocked/<service>

  • Health check endpoint (GET /health)

  • Systemd service unit

  • Version flag (-version) with build-time injection via ldflags