Hugging Face dataset card not working correctly

For now, a quick diagnosis:


What I think is happening in your case

Your new dataset is not failing in any way that would break usability. Right now, its page shows a working Dataset Viewer, an “Expand in Data Studio” link, preview rows with the columns audio, text, source, sample_rate, and speaker, and a visible “Downloads last month” count of 10. That means the important backend pieces are already working: the Hub can read the repo, understand the schema, and power the viewer/Data Studio layer. (Hugging Face)

What is missing on the new page is the top header metadata surface: no Modalities, no Formats, no Libraries, and no separate top tab labeled Data Studio. Your older dataset page does show all of those: Modalities (Audio, Text), Formats (parquet), and Libraries (Datasets, Dask, Polars), plus a top-level Data Studio tab. So the two repos are being rendered differently at the page-header level even though both are dataset-viewer-compatible. (Hugging Face)

Why I do not think the dataset itself is broken

Both repositories are built around the same basic storage pattern: a data/ directory containing Parquet shards named train-...parquet. Your new repo has six shards (train-00000-of-00006.parquet through train-00005-of-00006.parquet), and the older repo has forty-one shards. Since both use Parquet shards under data/, the missing badges on the new page are not well explained by using the wrong file format or the wrong top-level layout. (Hugging Face)

Hugging Face’s own docs say that a dataset with a supported structure and supported file formats automatically gets a Dataset Viewer, and that the viewer backend auto-converts Hub datasets to Parquet for exploration. Your new page already passes that bar. In other words, the fundamental data ingestion pipeline appears healthy. (Hugging Face)

The real split: viewer/backend vs. metadata/header

Hugging Face separates two things:

  1. Dataset Viewer / Data Studio / schema preview, which come from the viewer backend and recognizable dataset structure.
  2. Dataset card metadata and header badges, which come from the README.md card and especially its YAML metadata block at the top. (Hugging Face)

That distinction matters a lot here. Your new repo already has the backend-driven features that matter most to users: preview, rows, columns, Parquet, and Data Studio access through the viewer. The missing pieces are mostly the card/header decorations and tags. That points much more strongly to a metadata inference / indexing / rendering inconsistency than to a damaged dataset. (Hugging Face)

The most important clue in your README

Your new README.md already contains structured metadata such as dataset_info, feature types, split info, configs, task_categories, language, and a license field. But the license field in the YAML is currently just cc, while the human-readable body below says the dataset is released under CC BY 4.0. On the page, the header also shows only the generic cc badge. Hugging Face’s license list distinguishes the broad family identifier cc from the specific identifier cc-by-4.0. So your repo is currently telling the Hub “generic Creative Commons” in metadata while telling readers “CC BY 4.0” in prose. That mismatch is a concrete sign that the metadata layer is not as specific as it should be. (Hugging Face)
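A mismatch like this is easy to catch before committing. The following is a minimal sketch using only the Python standard library. The helper names (split_card, front_matter_value, license_mismatch) and the small alias table are hypothetical, and the parser only understands simple top-level `key: value` lines, not full YAML.

```python
import re

# Hypothetical alias table: prose spellings -> Hub license identifiers.
PROSE_TO_ID = {
    "cc by 4.0": "cc-by-4.0",
    "cc-by-4.0": "cc-by-4.0",
    "cc by-sa 4.0": "cc-by-sa-4.0",
}


def split_card(readme: str):
    """Split a README into (front_matter, body); ("", readme) if there is no YAML block."""
    match = re.match(r"^---\s*\n(.*?)\n---\s*\n?(.*)$", readme, flags=re.DOTALL)
    if not match:
        return "", readme
    return match.group(1), match.group(2)


def front_matter_value(front_matter: str, key: str):
    """Return the scalar value of a top-level `key: value` line, or None."""
    for line in front_matter.splitlines():
        if line.startswith(f"{key}:"):
            value = line.split(":", 1)[1].strip()
            return value or None
    return None


def license_mismatch(readme: str):
    """Return (yaml_license, prose_license) if they disagree, else None."""
    front_matter, body = split_card(readme)
    yaml_license = front_matter_value(front_matter, "license")
    body_lower = body.lower()
    for prose, identifier in PROSE_TO_ID.items():
        if prose in body_lower and identifier != yaml_license:
            return yaml_license, identifier
    return None


readme = "---\nlicense: cc\n---\nReleased under CC BY 4.0.\n"
print(license_mismatch(readme))
```

For a card in the state described above, the check reports the generic YAML value alongside the more specific identifier implied by the prose. Once the YAML says cc-by-4.0 it returns None.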

By contrast, your old dataset’s page shows rich badges even though its README preview is effectively empty apart from auto-generated metadata. That tells me the older page likely benefited from auto-inference or prior indexing behavior that the new page has not reproduced in the same way. In plain terms: the older repo got lucky or got indexed more favorably; the newer one did not. (Hugging Face)

About the missing Modalities and Libraries

Hugging Face documents that:

  • modality is auto-detected from the files, but you can force it by adding tags such as audio and text to the YAML metadata;
  • the dataset page automatically shows compatible libraries, but you can also manually associate libraries by tagging the dataset with values such as datasets, dask, pandas, or webdataset. (Hugging Face)

That is almost exactly your situation. The page clearly understands your columns well enough to show audio and text in the viewer, but it is not surfacing those as top-level header badges. Since HF explicitly documents manual tags as the fallback, the safest conclusion is: automatic inference did not fully populate the page header for this repo. (Hugging Face)

About Data Studio

This part is easy to misread. Hugging Face says Data Studio is enabled by default for all public datasets. Your new page does not show a separate top Data Studio tab the way the older page does, but it does show Expand in Data Studio inside the viewer area. So for your repo, the correct reading is not “Data Studio is unavailable”; it is “Data Studio is available, but the page is presenting it differently.” (Hugging Face)

About download counts

This part is partly normal, partly confusing.

Hugging Face’s current rule is that all files downloaded by the same user/IP within a 5-minute window in one repository count as a single dataset download. That is done specifically so one user downloading many files or splits does not inflate the counter. So if you uploaded six Parquet shards, a person fetching several of them in one session still may count as only one dataset download. (Hugging Face)
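The effect of that rule can be illustrated with a small sketch. This is not Hugging Face's implementation. It assumes a window opens at a user's first request and that later requests from the same user within five minutes fold into the same download; the exact windowing on the Hub may differ. The function name count_downloads and the event format are made up for illustration.

```python
from datetime import datetime, timedelta

WINDOW = timedelta(minutes=5)


def count_downloads(events):
    """Count downloads from (user_or_ip, timestamp) file-request events.

    Assumption: a window opens at a user's first request; later requests by
    the same user inside that window are folded into the same download.
    """
    window_start = {}  # user -> start of that user's current window
    total = 0
    for user, ts in sorted(events, key=lambda e: e[1]):
        start = window_start.get(user)
        if start is None or ts - start >= WINDOW:
            window_start[user] = ts
            total += 1
    return total


t0 = datetime(2025, 1, 1, 12, 0, 0)
# One user fetches all six shards within a minute, then returns ten minutes later.
events = [("alice", t0 + timedelta(seconds=10 * i)) for i in range(6)]
events.append(("alice", t0 + timedelta(minutes=10)))
# A second user fetches a single shard.
events.append(("bob", t0 + timedelta(minutes=1)))
print(count_downloads(events))  # prints 3
```

Eight file requests collapse into three downloads here, which is why the counter on a sharded dataset can look much lower than the raw number of file fetches.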

That means a low number does not automatically mean tracking is broken. And in your repo’s current state, the count is not missing anyway: the page shows 10 downloads last month. So the strongest answer is: download tracking appears to be active right now. (Hugging Face)

At the same time, public forum posts do show that Hugging Face has had periods where dataset download statistics appeared stuck or delayed for some users. There are reports from June 2024, June–July 2025, and March 2026 describing counters not updating for days. Those reports do not prove your repo has a bug, but they do show that stats glitches on the Hub are a real thing, so it is reasonable that this made you suspicious. (Hugging Face Forums)

My actual diagnosis

For your repository, I would describe it this way:

  • Core dataset publishing: working. Viewer works, Parquet is recognized, schema is recognized, Data Studio is reachable, and downloads are being counted. (Hugging Face)
  • Header metadata / discoverability badges: inconsistent. The page is not surfacing modalities, format, libraries, or the dedicated Data Studio tab the way the older repo does. (Hugging Face)
  • Most likely cause: incomplete or inconsistent Hub-side metadata inference / indexing, made worse by your metadata being partially generic (license: cc) and not explicitly tagging modality/library. (Hugging Face)

So: not a broken dataset, but an incomplete card/header rendering state.

What I would do

1. Make the metadata explicit instead of relying on auto-detection

Add or revise the YAML block at the top of README.md so it explicitly tells the Hub what you want shown.

Use something close to this:

---
pretty_name: Synthetic Turkish TTS Data
language:
  - tr
license: cc-by-4.0
task_categories:
  - text-to-speech
  - automatic-speech-recognition
tags:
  - audio
  - text
  - datasets
  - dask
  - polars
  - pandas

dataset_info:
  features:
    - name: audio
      dtype: audio
    - name: text
      dtype: string
    - name: source
      dtype: string
    - name: sample_rate
      dtype: int64
    - name: speaker
      dtype: string
  splits:
    - name: train
      num_bytes: 2942869578
      num_examples: 13000
  download_size: 3303296950
  dataset_size: 2942869578

configs:
  - config_name: default
    data_files:
      - split: train
        path: data/train-*
---

Why this helps:

  • license: cc-by-4.0 matches what your card text already says, and it uses the specific official Hugging Face identifier instead of the generic cc. (Hugging Face)
  • tags: [audio, text] is the documented way to force the modality. (Hugging Face)
  • tags: [datasets, dask, polars, pandas] is the documented way to force library associations when the page does not show them automatically. (Hugging Face)
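If you would rather apply the license and tag changes from a script than from the web editor, the following is a minimal sketch. It assumes the huggingface_hub package is installed and that you are logged in with write access to the repo. build_metadata and push_metadata are hypothetical helper names, and the repo id in the usage comment is a placeholder. metadata_update merges the given fields into the README's YAML block and leaves the other keys alone; overwrite=True is needed because license already has a value.

```python
def build_metadata():
    """Return only the card fields to change; other YAML keys are left untouched."""
    return {
        "license": "cc-by-4.0",
        "tags": ["audio", "text", "datasets", "dask", "polars", "pandas"],
    }


def push_metadata(repo_id: str) -> str:
    """Merge the fields into the dataset card on the Hub and return the commit URL."""
    # Imported lazily so build_metadata() can be inspected without the package installed.
    from huggingface_hub import metadata_update

    return metadata_update(
        repo_id,
        build_metadata(),
        repo_type="dataset",
        overwrite=True,  # replace the existing `license: cc` value
        commit_message="Make card metadata explicit",
    )


# Usage (needs network access and a write token; placeholder repo id):
# print(push_metadata("your-username/your-dataset"))
```

Editing README.md by hand does the same thing; the script is only convenient if you maintain several datasets with the same metadata.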

2. Commit the README change and let the page re-index

Because the header badges are metadata-driven, a README metadata commit is the right trigger for the Hub to recalculate what should be shown. Your current repo history already shows recent README updates, so another small metadata cleanup commit is reasonable. (Hugging Face)

3. Judge success by the right signals

After the commit, the signals that matter most are:

  • header shows Modalities: Audio, Text;
  • header shows one or more Libraries;
  • license badge becomes cc-by-4.0 rather than generic cc;
  • viewer still works;
  • Expand in Data Studio still works. (Hugging Face)
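The header-related signals can also be checked from a script. The sketch below assumes the Hub exposes them as prefixed tags on the dataset (for example modality:audio, library:datasets, license:cc-by-4.0); that tag format is an assumption you should confirm against your own repo's API output. missing_signals and fetch_tags are hypothetical helper names. The viewer and Data Studio checks are not covered and still need a look at the page itself.

```python
def missing_signals(tags, expected_license="cc-by-4.0"):
    """Return a list of header signals that are absent from a dataset's Hub tags."""
    tags = set(tags)
    problems = []
    for modality in ("audio", "text"):
        if f"modality:{modality}" not in tags:
            problems.append(f"modality {modality} not detected")
    if not any(t.startswith("library:") for t in tags):
        problems.append("no library associated")
    if f"license:{expected_license}" not in tags:
        problems.append(f"license is not {expected_license}")
    return problems


def fetch_tags(repo_id):
    """Fetch the dataset's current tags from the Hub (needs network access)."""
    # Imported lazily so missing_signals() works without the package installed.
    from huggingface_hub import HfApi

    return HfApi().dataset_info(repo_id).tags or []


# Usage (placeholder repo id):
# print(missing_signals(fetch_tags("your-username/your-dataset")))
```

An empty list means the three header signals are present; anything else names what is still missing after re-indexing.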

4. If the header still does not update, treat it as a Hub bug, not your fault

There are enough public examples of viewer/stat inconsistencies on the Hub that I would not keep rewriting your dataset structure forever if the explicit tags do not fix it. Once your metadata is explicit and the backend is still healthy, a persistent missing-badges problem is best treated as a Hub-side UI/indexing issue. Public reports about row counts, Data Studio visibility, and stats glitches show that this class of issue does happen. (Hugging Face Forums)

Bottom line

For your case, my answer is:

  • Is this normal? Partly. It is not unusual for Hugging Face’s metadata/display layer to be less consistent than the actual dataset backend, and the docs explicitly allow manual YAML tags because auto-detection is not always enough. (Hugging Face)
  • Is your dataset broken? No. The important backend features are already working. (Hugging Face)
  • Are downloads tracked? Yes, at least currently. The page shows 10, and HF’s counting rules also make counts look lower than many authors expect. (Hugging Face)
  • What should you do? Make the README YAML fully explicit: set license: cc-by-4.0, add audio and text tags, add the library tags you want, then let the page re-index. If the header still stays incomplete after that, treat it as a Hugging Face page/indexing issue rather than a dataset-format issue on your side. (Hugging Face)