Spaces Docker Build Pauses and 503 Error on Restart

I am encountering an issue with my Docker-based Hugging Face Space, Fara-BrowserUse-Demo (https://huggingface.co/spaces/HyperCluster/Fara-BrowserUse-Demo).

The Docker image builds successfully on my local machine, and the container runs within 2–3 GB of RAM, which should be compatible with the ‘CPU basic’ Spaces tier.

However, the Space gets paused immediately upon creation, and when I try to restart it, I receive a 503 error. Nothing has been written to the logs either, other than “Build queued”, for the last 17 hours.

Based on what I read online, this could be related to either a build error or a content issue. But since the Docker image builds without issues locally and it’s just an agentic browser-use demo, I am not sure why this is occurring.

I tried creating another space from a Docker template in case it was some issue with my account, but that one builds and works perfectly.

Can anyone help me resolve this issue?


Especially in the case of a 503 error on a Docker Space, it may be an intentional block.


At a high level, what you are seeing is a mix of:

  • Spaces build-scheduler / infra bugs, especially for Docker Spaces.
  • Normal “app crashed / never became healthy” failure modes that surface as 503.
  • In a minority of cases, explicit HF-side blocking (“space is suspicious”).

Below is a breakdown focused on 2024–2025 patterns, plus concrete mitigations.


0. Your specific symptom pattern

  • Docker Space: HyperCluster/Fara-BrowserUse-Demo.

  • Locally: Docker image builds and runs in 2–3 GB RAM.

  • On Spaces:

    • On creation, the Space is immediately “Paused”.
    • Restart gives a 503 “Something went wrong”.
    • Build logs show only a “Build queued …” line for many hours; no real logs.
    • Other Docker Spaces under the same account build and run fine.

This combination is significant:

  • “Build queued” with no logs at all → build process never really starts.
  • 503 on restart while status is stuck on Building / Paused → router cannot attach to a healthy container, so HF front-end returns 503.
  • Another Docker Space works → it is not a generic “Docker is banned for this account” situation.

That puts you squarely into the same cluster as the 2024–2025 “stuck Building / no logs / Docker Space not triggering build” issues.


1. Main HF-side causes seen in 2024–2025

1.1 Build scheduler / stack bugs (especially for Docker Spaces)

Symptoms that match your case:

  • Build status stuck at “Building” or “Build queued” for a long time.
  • No build logs at all, or only the header line.
  • Same code builds fine in a clone / mirror Space with a different slug. (Hugging Face Forums)
  • In 2025, a GitHub issue describes a Docker Space with a valid Dockerfile that never triggers a build; build logs stay empty. (GitHub)

Forum examples:

  • “Space is permanently ‘Building’” – many Spaces stuck for weeks; workaround is to recreate the Space, same code, new slug. (Hugging Face Forums)
  • “Streamlit Docker space permanently in ‘Building’ state” – original Space never builds, mirror Space works fine; workaround is toggling SDK (docker → static → docker) or cloning. (Hugging Face Forums)
  • “Spaces and ‘Building’ stuck, infra side issue” – container image pushes successfully, then Space remains in Building; cloning resolves it, pointing to corrupted persistent storage or an internal flag. (Hugging Face Forums)
  • “Empty log and not building space” – multiple users with “Failed to load logs: Not Found” during an infra migration; HF later confirms it was an infra issue. (Hugging Face Forums)

Your own notes summarize this as “build pipeline behavior” issues: stuck builds, mis-reported status, and Spaces that only work when recreated.

In these cases the underlying bug is on HF’s side (builder queue / flags / cache / infra migration), not your Dockerfile.


1.2 Capacity or infra incidents

Separate from long-running code bugs, there are transient incidents:

  • “Space in permanent building state, no logs” – at that time, most Spaces could not be built; repeated Factory Reboot eventually revealed “not enough hardware capacity”, then things recovered once infra was fixed. (Hugging Face Forums)

Characteristic signs:

  • Many unrelated Spaces show the same behavior at the same time.
  • HF status page or forum shows broader incident reports.
  • Problem disappears without code changes.

For your Space:

  • You’ve confirmed other Spaces build and run normally.
  • The behavior is persistent over many hours.

So this looks more like a Space-specific scheduler bug than a broad incident.


1.3 Intentional content / abuse blocking (“space is suspicious”)

There is a separate, newer pattern where HF explicitly blocks a Space:

  • Example forum thread (Oct 2025): user repeatedly gets 503s and cannot restart or rebuild a Space (“CMW Copilot”). Build logs show a normal Docker build finishing, but the Space still cannot start. HF support replies that the Space is classified as “suspicious.” (Hugging Face Forums)
  • A community member then notes that in this case the errors are intentional, not misconfiguration, so there is nothing the user can do technically; only HF can lift or explain the block. (Hugging Face Forums)

Typical signals for this case:

  • Support explicitly mentions “space is suspicious”, “policy violation”, or similar.
  • Restart, Factory Rebuild, hardware changes, and code simplifications do nothing, even for a trivial “hello world” commit.
  • Sometimes build logs look fine, but the front-end still returns 503 for any restart.

For your Space:

  • Nobody has told you it is suspicious.
  • It’s an “agentic browser use” demo using Microsoft Fara-7B, which is a legitimate, documented “computer use agent” model. (Hugging Face)

So:

  • It could in principle be flagged if HF is cautious around fully automated browser agents, but that’s speculative.
  • You need explicit confirmation from HF support before you can conclude that this is a deliberate block.

2. App-side causes that can also produce 503 / stuck Building

Once the build actually runs (with logs), the usual failure modes are app-level. Even though your current symptom is “no logs”, you may still hit these later once the builder bug is worked around.

2.1 App never binds to the expected port / host

For Spaces to consider a Docker container healthy, the app must listen on the expected port, on 0.0.0.0. For Gradio / similar UIs this often means setting GRADIO_SERVER_NAME=0.0.0.0 (or server_name="0.0.0.0" in launch()) and serving on the port the Space declares (app_port in the README, 7860 by default); a minimal sketch follows this list.

If the app binds only to 127.0.0.1, listens on the wrong port, or dies before serving, the health check fails and HF will:

  • Keep the Space in “Building” / “Error”.
  • Show 503 when you try to open or restart it.
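
As a concrete reference point, here is a minimal sketch of a Spaces-friendly entrypoint. The echo interface is a placeholder for your actual UI, and the PORT fallback is an assumption; match whatever your Dockerfile and README actually declare.

```python
# Minimal sketch of an entrypoint intended to pass the Spaces health check.
# The echo() interface is a placeholder; the PORT env fallback is an assumption.
import os
import gradio as gr

def echo(text: str) -> str:
    return text

demo = gr.Interface(fn=echo, inputs="text", outputs="text")

if __name__ == "__main__":
    demo.launch(
        server_name="0.0.0.0",                          # not 127.0.0.1, or the health check fails
        server_port=int(os.environ.get("PORT", 7860)),  # match app_port in the Space config
    )
```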

2.2 Container crashes during startup

Other common startup failures (especially with complex agentic setups) include:

  • Missing or conflicting Python packages → ModuleNotFoundError, ImportError.
  • Heavy downloads or model loads at startup hitting memory limits / timeouts.
  • Repository > 1 GB or corrupt LFS history causing clone/build failures.

Your debug cheat-sheet notes that these show up as build log entries:

  • Pip resolution errors.
  • LFS “repository storage limit reached”.
  • “Job failed with exit code 1” with missing Debian packages if your Dockerfile assumes old base images.

In your case, we don’t see such logs at all yet, which again points back to the builder never starting.


2.3 Server-side spec changes that make previously-working code fail

Especially relevant for GPU / ZeroGPU Spaces but conceptually similar for Docker:

  • ZeroGPU now enforces “no CUDA at import time / outside @spaces.GPU”; old patterns like model.to("cuda") at the top level now raise ZeroGPU errors.
  • HF updated base OS images (e.g., Debian Trixie) and library versions, which broke older Dockerfiles or Hub APIs (cached_download removal in huggingface_hub) until users updated their code.

When you restart a Space days or weeks later, you are sometimes implicitly opting into a new base image / runtime behavior, so:

  • Code that was fine before can now fail on import.
  • Dockerfiles relying on removed packages or old APIs can break the build.

Again, you will see this only once you get real build logs; for now your builds are not starting.


3. Mapping to concrete causes for your Space

Given all the above, the most plausible cause ordering for your exact symptoms is:

  1. HF build-scheduler / stack bug for this specific Space/slug

    • Strong match: “Build queued, no logs” for 17+ hours, while other Docker Spaces work.
    • Strong external precedent: Docker and non-Docker Spaces where the original slug is “poisoned” but a cloned/mirror Space with identical code works. (Hugging Face Forums)
  2. A related possibility: persistent storage or internal flag corruption

    • Documented case: cloning a stuck Space immediately works, suggesting HF_HOME or some internal storage/cache is corrupted. (Hugging Face Forums)
  3. Once the build actually runs, ordinary app-level startup issues

    • Wrong host/port (e.g. missing GRADIO_SERVER_NAME=0.0.0.0).
    • Startup crashes due to dependencies, model downloads, or memory.
  4. Least likely, but important: explicit “suspicious” classification

    • Only if HF support explicitly tells you so (like in the CMW Copilot thread). (Hugging Face Forums)

4. Practical “solutions” and workarounds

I’ll separate these into “what you can do yourself” vs “what only HF can fix.”

4.1 What you can do yourself (today)

4.1.1 Prove it’s a Space/slug bug by cloning

These are the same tricks community members use for permanently-building Spaces:

  1. Clone / mirror the Space (you already know this pattern, but worth making it explicit):

    • Create a new Space with a different slug, SDK docker, same hardware.
    • Push exactly the same Dockerfile and code (or use the web “Duplicate this Space” option if available).
    • If the clone builds and runs, the original slug is almost certainly affected by a build stack / flag bug, not your Dockerfile. (Hugging Face Forums)
  2. If cloning works:

    • Treat the new Space as canonical.

    • Optionally:

      • Rename the old Space, set it to Private and Free CPU to “park” it, as suggested in the “permanently Building” thread. (Hugging Face Forums)
      • Update links in your README or docs to point to the new Space.

This is the cleanest workaround, and the one many users have settled on in 2025; a small script-based sketch of the same idea follows.
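
If you prefer to do the duplication from a script rather than the web UI, here is a hedged sketch using huggingface_hub (it assumes a recent release exposing HfApi.duplicate_space; the target slug is hypothetical, pick your own):

```python
# Sketch: duplicate a stuck Space to a new slug via the Hub API.
# Assumes a recent huggingface_hub with HfApi.duplicate_space();
# the target slug below is hypothetical.
from huggingface_hub import HfApi

api = HfApi()  # relies on your cached HF token or the HF_TOKEN env var

new_repo = api.duplicate_space(
    from_id="HyperCluster/Fara-BrowserUse-Demo",   # the stuck Space
    to_id="HyperCluster/fara-browseruse-demo-v2",  # new slug (hypothetical)
    private=True,           # keep it private until it builds cleanly
    hardware="cpu-basic",   # same free CPU tier as the original
)
print(new_repo)  # URL of the freshly created Space
```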


4.1.2 Trigger a rebuild via SDK toggling (Docker ↔ static ↔ Docker)

From the Streamlit Docker thread: (Hugging Face Forums)

  • Changing sdk: docker to sdk: static in the Space configuration, pushing, then changing back to sdk: docker sometimes forces the builder to re-schedule the Space and clears a stuck state.
  • A similar trick is used for Gradio Spaces (sdk: gradio → static → gradio) when Factory Rebuild and Restart do nothing. (Hugging Face Forums)

For your Space:

  • This is worth trying once, especially if cloning/mirroring is inconvenient.
  • If it doesn’t fix the stuck build, don’t keep flipping SDK endlessly; move on to cloning and contacting support.
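
For completeness, here is what the one-time toggle looks like as a script rather than a manual README.md edit. This is a hedged sketch, assuming huggingface_hub’s metadata_update can overwrite the existing sdk field; each call is simply a commit to the Space repo, so it is equivalent to editing README.md twice by hand.

```python
# Sketch: flip the Space SDK to static and back to docker from a script.
# Each metadata_update() call commits a YAML change to the Space's README.md.
import time
from huggingface_hub import metadata_update

SPACE = "HyperCluster/Fara-BrowserUse-Demo"

metadata_update(SPACE, {"sdk": "static"}, repo_type="space", overwrite=True)
time.sleep(30)  # give the scheduler a moment to register the change (arbitrary delay)
metadata_update(SPACE, {"sdk": "docker"}, repo_type="space", overwrite=True)
```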

4.1.3 Audit the Dockerfile for “once logs exist” issues

Even though the current failure is likely infra, it is still valuable to ensure the Dockerfile and app will pass health checks once HF fixes the builder:

  1. Ensure your app:

    • Listens on 0.0.0.0 on the port HF exposes (typically via PORT env).
    • For Gradio: set ENV GRADIO_SERVER_NAME=0.0.0.0 and do not override port incorrectly. (Hugging Face Forums)
  2. Check that:

    • No extremely heavy work happens before the server starts (e.g., multi-GB downloads in __init__, synchronous calls to remote inference before binding the port).
    • All required Python dependencies are installed either in requirements.txt or via Dockerfile.
  3. Test locally in a clean environment:

    • Build and run the container with only the minimum environment variables HF will provide.
    • Confirm it comes up quickly and stays within 2–3 GB RAM, as you already observed.

Doing this now prevents you from hitting a second wave of issues after the builder bug is resolved; a sketch of the lazy-loading pattern from point 2 follows.
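
Here is a hedged sketch of that lazy-loading pattern: the server binds its port immediately, and the heavy download only happens on the first request. load_model() and the repo ID are placeholders for whatever your app actually uses.

```python
# Sketch: defer heavy downloads/model setup until the first request so the
# container binds its port quickly and passes the Spaces health check.
# load_model() and the repo ID are placeholders.
import os
import gradio as gr

_model = None

def load_model():
    from huggingface_hub import snapshot_download
    # Hypothetical heavy setup; substitute your real model/assets.
    return snapshot_download("microsoft/Fara-7B")

def run(prompt: str) -> str:
    global _model
    if _model is None:        # the first call pays the download cost, not container startup
        _model = load_model()
    return f"loaded assets at {_model}; prompt was: {prompt}"  # placeholder "inference"

demo = gr.Interface(fn=run, inputs="text", outputs="text")
demo.launch(server_name="0.0.0.0", server_port=int(os.environ.get("PORT", 7860)))
```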


4.2 What requires HF intervention

4.2.1 Stuck builder / empty logs that persist across clones

If after:

  • Cloning to a new Space,
  • SDK toggling,
  • Factory Rebuild,

you still see no build logs and constant “Build queued” for the new Space as well, you’re likely at a point where only HF can fix it.

Your own debugging notes already outline what to send to support:

  • Space URL(s) and slugs (original + clone).

  • Timestamps and screenshots of the stuck “Build queued” and 503 states.

  • Confirmation that:

    • Dockerfile is present.
    • Same image runs locally.
    • Another Docker Space under your account works.

This is essentially the same data HF engineers used in the documented Docker-Space issue (#3452) and “permanently building” threads. (GitHub)
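
If you want hard data for the ticket rather than screenshots alone, a small sketch like the following can capture what the Hub API reports for both Spaces (it assumes huggingface_hub’s HfApi.get_space_runtime; the clone slug is hypothetical):

```python
# Sketch: record the stage/hardware the Hub reports for each Space,
# to attach to a support report. The second slug is hypothetical.
from huggingface_hub import HfApi

api = HfApi()
for space in ("HyperCluster/Fara-BrowserUse-Demo",
              "HyperCluster/fara-browseruse-demo-v2"):
    runtime = api.get_space_runtime(space)
    print(space, runtime.stage, runtime.hardware)  # e.g. BUILDING / PAUSED, cpu-basic
```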


4.2.2 “Space is suspicious” or other policy-driven blocks

If HF replies with wording like “space is suspicious” (as in the CMW Copilot thread), that means: (Hugging Face Forums)

  • The 503 / build failures are intentional, not bugs.

  • There is no purely technical “fix” you can apply in code or Dockerfile.

  • You must either:

    • Work with HF to understand and address the policy concern, or
    • Host the demo somewhere else and keep the HF Space minimal / non-interactive.

From a practical perspective:

  • Ask support explicitly whether your Space is blocked for policy reasons.

  • If yes, decide whether to:

    • Simplify the Space (e.g., remove fully automated browsing, add clear safety gating), or
    • Use Spaces only as a front-end that calls a separately hosted backend.

Do not try to “work around” a policy block with obfuscation; that will just cause more issues.


5. How 503 fits into this, technically

From your HTTP status notes, for Spaces:

  • 5xx (including 503) generally mean:

    • The app crashed or never became healthy, or
    • The platform had an internal problem.

Best practices:

  • For API callers, implement a small number of retries and log the status code and body (a short sketch follows this list).

  • For Spaces developers:

    • Treat repeated 503 on page load as a signal to check build/runtime logs and container health.
    • When no logs exist at all with a 503, suspect builder / infra issues.
    • For persistent 503s affecting a single Space even after trivial commits and clones, ask whether the Space is being blocked.
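
For the API-caller side of that advice, a minimal retry sketch might look like this; the Space URL is illustrative, so adjust the host and path to whatever your Space actually exposes.

```python
# Sketch: call a Space endpoint with a few retries, logging status and body
# so 503s can be distinguished from app-level errors. The URL is illustrative.
import time
import requests

URL = "https://hypercluster-fara-browseruse-demo.hf.space/"  # hypothetical Space host

def fetch_with_retries(url: str, attempts: int = 3, backoff: float = 5.0):
    resp = None
    for i in range(attempts):
        resp = requests.get(url, timeout=30)
        if resp.status_code < 500:
            return resp                        # success, or a 4xx worth inspecting
        print(f"attempt {i + 1}: {resp.status_code} {resp.text[:200]!r}")
        time.sleep(backoff * (i + 1))          # simple linear backoff
    return resp                                # still 5xx after retries: check logs, then ask HF

if __name__ == "__main__":
    print(fetch_with_retries(URL).status_code)
```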

6. If you want a systematic debugging loop

Once HF has either fixed the builder or confirmed it’s not a policy block, your own Space-debugging loop applies as usual:

  1. Pick one minimal input / workflow that should work.
  2. Collect full build and runtime logs for one failing run.
  3. Reproduce locally in a clean env.
  4. Reduce to a minimal script (Docker + app) if needed.
  5. Only then iterate on code changes.

That loop is what you’ll use if/when the error evolves from “no logs” to “visible stack-trace in logs”.


Short summary

  • Your exact symptoms (Docker Space, “Build queued” with empty logs for many hours, 503 on restart, other Docker Spaces OK) most closely match HF build-scheduler / stack bugs on a specific Space/slug, not a problem in your Dockerfile.

  • There is a distinct 2024–2025 pattern of Spaces that only work when cloned or mirrored; same code, new slug, new Space → works; old Space → permanently Building. (Hugging Face Forums)

  • Separate pattern: HF-side “space is suspicious” blocking, which also surfaces as 503s and stuck restarts but can only be confirmed and resolved by HF staff. (Hugging Face Forums)

  • App-level causes (wrong host/port, runtime crashes, base-image changes) will matter after the builder actually runs; you should still harden your Dockerfile and app against these.

  • Practical next steps:

    • Clone the Space to a new slug and confirm whether the clone builds.
    • Optionally toggle SDK (docker → static → docker) once to jog the builder.
    • If both original and clone are stuck with no logs, contact HF with URLs, timestamps, and confirmation that the same image runs locally and other Docker Spaces are fine.
    • Ask explicitly if the Space is being blocked for policy reasons before spending time on app-side changes.

Curated references

  • “Space is permanently ‘Building’” – long-running build/queue issues and clone-as-workaround pattern. (Hugging Face Forums)
  • “Streamlit Docker space permanently in ‘Building’ state” – Docker-specific stuck-build behavior, SDK toggling workaround, and confirmation that cloning works. (Hugging Face Forums)
  • GitHub: “Docker Space not triggering build despite valid Dockerfile and correct SDK” – 2025 bug report for empty build logs and no build trigger. (GitHub)
  • “Can’t restart or rebuild the space” – example of HF explicitly classifying a Space as “suspicious” and intentionally erroring out. (Hugging Face Forums)

Uploading the files from the UI, rather than cloning the HF repo and pushing to it, worked!
