I Curated Over 2,000 Seedance 2 Prompts into a Free Website and Open-Source Dataset for You to Use

#ai #opensource #showdev #sideprojects

When I was making AI videos myself, finding good Seedance 2 prompts was a huge pain.

I scoured X (formerly Twitter), TikTok, and Discord, only to find endless screenshots that I couldn't even copy and paste. Most of the so-called "prompt collections" online were either hidden behind paywalls or just dry walls of text with no original videos, no categorization, and no structure. They were practically useless if you wanted to dig deeper.

So, I decided to organize a list myself. But the more I gathered, the bigger the project became. Eventually, I decided to just build a website and an open-source dataset.

The URL is prompthub.gokuscraper.com. It's ready to use right out of the box—no registration or login required.

Currently, the supported models include Seedance 2, Midjourney V6, Flux, GPT Image 2, and Nano Banana Pro, which covers most of the mainstream AI image and video generation tools.

The prompts are categorized by use case: Trending, Today's Updates, Entertainment/Memes, Business/Productivity, and Content Creation. There are also source-based categories like "From X (Twitter)" and "From TikTok" to help people with different needs filter quickly.

Every prompt comes with a video preview—it's not just a plain text list, so you can see the results at a glance. It also supports searching by title, tags, and content, so you can easily find specific styles. Just click the "Copy" button, and you can grab the entire prompt without the hassle of manual highlighting.

There's also a lightning-fast "Generate Image" ⚡ button that takes you straight to the corresponding platform. Scroll down, and the page automatically loads more prompts. It feels like scrolling through an endless feed, and before you know it, you've gathered a ton of inspiration.

But the website is just a shell. The real effort went into the dataset behind it.

If I just wanted to build a site to display prompts, I wouldn't have gone to all this trouble. From the very beginning, I believed this data shouldn't just sit on a webpage—it needed to be truly open data.

The dataset is called seedance-2-prompts-datasets and is hosted on Hugging Face. It totals 12GB and contains 2,110 videos generated with Seedance 2.0 (mp4) along with their cover images (jpg).

The core of it is a metadata.jsonl file, in which every prompt is stored as a structured record. Titles, tags, English/Chinese translations, video file mappings, resolutions, durations, and safety ratings are all labeled and standardized. Here’s an abridged example of a data entry:

{
  "id": "SD2_00133",
  "category": "Entertainment",
  "raw_p": "Environment: A colossal glacial canyon under pale blue twilight...",
  "media": {
    "v": "seedance-2/videos/SD2_00133.mp4",
    "c": "seedance-2/covers/SD2_00133.jpg"
  },
  "spec": { "width": 1280, "height": 720, "ratio": 1.78, "duration": 15.12 },
  "i18n": {
    "zh": { "t": "冰谷虎蛇战", "p": "环境：一座巨大的冰川峡谷...", "tags": ["冰川峡谷", "冰虎", "霜蛇"] },
    "en": { "t": "Glacial Tiger vs Frost Serpent", "p": "Environment: A colossal...", "tags": ["ice canyon", "cinematic"] }
  }
}
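To show how the nested fields map onto something easier to work with, here is a minimal sketch that parses one line of metadata.jsonl into a flat record. The `PromptRecord` type, `parse_record`, and `media_url` are hypothetical helpers, not part of the dataset. The sketch also assumes that the media paths are relative to the repository root and that files are served through Hugging Face's `/resolve/main/` route.

```python
import json
from dataclasses import dataclass
from typing import List

# Assumption: media paths are relative to the dataset repo root and files are
# downloadable through Hugging Face's /resolve/main/ route.
REPO_BASE = (
    "https://huggingface.co/datasets/GokuScraper/"
    "seedance-2-prompts-datasets/resolve/main/"
)


@dataclass
class PromptRecord:  # hypothetical helper type, not shipped with the dataset
    id: str
    category: str
    prompt: str
    video_path: str
    cover_path: str
    width: int
    height: int
    duration: float
    title_en: str
    tags_en: List[str]


def parse_record(line: str) -> PromptRecord:
    """Turn one JSON Lines row into a flat PromptRecord (English fields only)."""
    raw = json.loads(line)
    en = raw["i18n"]["en"]
    return PromptRecord(
        id=raw["id"],
        category=raw["category"],
        prompt=raw["raw_p"],
        video_path=raw["media"]["v"],
        cover_path=raw["media"]["c"],
        width=raw["spec"]["width"],
        height=raw["spec"]["height"],
        duration=raw["spec"]["duration"],
        title_en=en["t"],
        tags_en=en["tags"],
    )


def media_url(path: str) -> str:
    """Build a download URL for a media path (relies on the assumption above)."""
    return REPO_BASE + path


# The example entry from the article, trimmed to the fields used here.
SAMPLE_LINE = json.dumps({
    "id": "SD2_00133",
    "category": "Entertainment",
    "raw_p": "Environment: A colossal glacial canyon under pale blue twilight...",
    "media": {
        "v": "seedance-2/videos/SD2_00133.mp4",
        "c": "seedance-2/covers/SD2_00133.jpg",
    },
    "spec": {"width": 1280, "height": 720, "ratio": 1.78, "duration": 15.12},
    "i18n": {
        "en": {
            "t": "Glacial Tiger vs Frost Serpent",
            "p": "Environment: A colossal...",
            "tags": ["ice canyon", "cinematic"],
        }
    },
})

record = parse_record(SAMPLE_LINE)
print(record.id, record.title_en, f"{record.width}x{record.height}")
print(media_url(record.video_path))
```

Flattening the short keys (`t`, `p`, `v`, `c`) into descriptive names up front keeps the rest of a script readable. If a real row lacks one of these keys, `parse_record` will raise a `KeyError`, which is probably what you want while exploring the data.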

For developers, the metadata for the entire dataset loads with a single pandas call:

import pandas as pd

# metadata.jsonl is JSON Lines (one record per line), hence lines=True
df = pd.read_json("https://huggingface.co/datasets/GokuScraper/seedance-2-prompts-datasets/raw/main/metadata.jsonl", lines=True)
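If you would rather not pull in pandas, the same file can be read with the standard library alone, since JSON Lines is just one JSON object per line. The sketch below is one way to do that and to filter by category and clip length. The `iter_records` and `select` helpers are hypothetical, and the second sample row (`DEMO_0001`, category "Business") is made up for illustration. Only `SD2_00133` comes from the article.

```python
import io
import json
from typing import Iterable, Iterator, List, Optional


def iter_records(lines: Iterable) -> Iterator[dict]:
    """Yield one dict per non-empty JSON Lines row (str or bytes lines)."""
    for line in lines:
        if line.strip():  # skip blank lines
            yield json.loads(line)


def select(
    records: Iterable[dict],
    category: Optional[str] = None,
    max_duration: Optional[float] = None,
) -> List[dict]:
    """Keep records matching an optional category and a duration ceiling in seconds."""
    kept = []
    for rec in records:
        if category is not None and rec.get("category") != category:
            continue
        duration = rec.get("spec", {}).get("duration", 0.0)
        if max_duration is not None and duration > max_duration:
            continue
        kept.append(rec)
    return kept


# Stand-in for the real file: the second row is invented for this demo.
SAMPLE = io.StringIO(
    '{"id": "SD2_00133", "category": "Entertainment", "spec": {"duration": 15.12}}\n'
    "\n"
    '{"id": "DEMO_0001", "category": "Business", "spec": {"duration": 5.0}}\n'
)

records = list(iter_records(SAMPLE))
short_clips = select(records, max_duration=10)
entertainment = select(records, category="Entertainment")
print([r["id"] for r in short_clips])
print([r["id"] for r in entertainment])
```

Because `iter_records` accepts any iterable of lines, it should also work on an open local copy of metadata.jsonl, which avoids downloading the file again on every run.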

It’s perfect for secondary uses like research, tool development, or model training. The entire dataset is under the CC BY 4.0 license, meaning commercial use is totally fine—just give attribution.

Why bother making it structured data?

In the AI era, prompts are essentially a new "productivity language." But the current reality is that good prompts are scattered everywhere—in screenshots, tweets, and video comment sections. They are fragmented; you can find them, but you can't easily use them.

What I want to do is simple: collect those scattered, high-quality prompts and turn them into data that machines can read, humans can search, and developers can use directly. It’s not just a "display"—it’s a computable, redistributable data asset.
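As a small illustration of what "humans can search" can look like once the prompts are structured, here is a rough sketch of a case-insensitive search over titles, tags, and prompt text. The `search` function is hypothetical and is not how the website implements its search. It only assumes the field names shown in the example entry (`i18n`, `t`, `p`, `tags`, `raw_p`).

```python
from typing import Iterable, List


def search(records: Iterable[dict], query: str, lang: str = "en") -> List[str]:
    """Return ids of records whose title, tags, or prompt contain the query."""
    needle = query.casefold()
    hits = []
    for rec in records:
        localized = rec.get("i18n", {}).get(lang, {})
        # Collect every text field we want to match against.
        haystack = [
            localized.get("t", ""),
            localized.get("p", ""),
            rec.get("raw_p", ""),
        ]
        haystack.extend(localized.get("tags", []))
        if any(needle in text.casefold() for text in haystack):
            hits.append(rec["id"])
    return hits


# The example entry from the article, used as a one-record corpus.
RECORDS = [
    {
        "id": "SD2_00133",
        "raw_p": "Environment: A colossal glacial canyon under pale blue twilight...",
        "i18n": {
            "zh": {"t": "冰谷虎蛇战", "p": "环境：一座巨大的冰川峡谷...", "tags": ["冰川峡谷", "冰虎", "霜蛇"]},
            "en": {"t": "Glacial Tiger vs Frost Serpent", "p": "Environment: A colossal...", "tags": ["ice canyon", "cinematic"]},
        },
    }
]

print(search(RECORDS, "tiger"))
print(search(RECORDS, "冰虎", lang="zh"))
print(search(RECORDS, "neon"))
```

A plain substring scan like this is fine for a couple of thousand records. A larger collection would likely want a proper index, but that is beyond what this sketch tries to show.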

This project and website are just the first step.

Of course, it's far from perfect right now.

To be honest, building it is one thing, but making it great is another. There are still many things about this project and website that I’m not entirely satisfied with. I'll list them out frankly:

Regarding the website:

A total of 2,110 prompts is far from enough for something meant to be a "Hub".
Model coverage is still incomplete. Right now, Seedance 2 is the main focus, and the volume for other models is visibly lacking.
Categorization could be much more granular. Some tags are a bit too broad right now.
The mobile experience hasn’t been specifically optimized, so it’s not the most comfortable to browse on a phone.
There’s no user system yet. Features like favoriting, liking, and personalized recommendations haven't been built.

Regarding the dataset:

Structured organization currently only covers Seedance 2. High-quality prompts from other models haven’t been integrated yet.
Data sources lean heavily on X (Twitter) and TikTok; content from other platforms is sparse.
Updates currently rely mostly on manual work. I'm still slowly building pipelines for automated scraping and cleaning.
The quality of the Chinese translation is mixed, and some parts need proofreading and rework.
The tagging system isn't detailed enough. Ideally, you should be able to filter by dimensions like camera shot types, lighting styles, and motion types, but that’s not possible yet.

These are the tough nuts I need to crack moving forward. There’s no shame in listing them—hiding the flaws misses the point.

But the direction is clear.

Right now, this data is just a starting point.

In the short term, I want to expand model coverage. Prompts for Midjourney, GPT Image 2, and other models need the same kind of structured organization. I’m also building automated update pipelines so that I don't have to scrape data manually every time and the dataset can grow sustainably.

In the medium term, I hope to see more creators join in and contribute the great prompts they’ve refined. I want this Hub to be more than just me dumping stuff in. The ideal scenario is that people find it useful and naturally decide to share their own hidden gem prompts, growing the data pool for everyone.

If I'm lucky, this project might go even further—becoming a genuine public infrastructure for prompt data. Not a private asset, no paywalls to unlock things, just a clean, continuously updated, open-source data resource that anyone can use. It’s an ambitious thought, but it's a direction worth pursuing.

How to Access and Download

🌐 Try it online: https://prompthub.gokuscraper.com/

🤗 Download the full dataset: https://huggingface.co/datasets/GokuScraper/seedance-2-prompts-datasets

⭐ Updates are synced on GitHub; stars and issues are welcome!

Wrapping Up

I spent a lot of time on this project and website, but it’s still far from perfect.

If you use it and have any thoughts, complaints, or suggestions, please let me know. I built this for people to use, and your feedback will directly guide the improvements in the next version.

Thanks for checking this out! If you found this interesting, please don't hesitate to toss a like, share it, or spread the word!

If you want to see my future articles as soon as they drop, don't forget to star ⭐ my page, so you don't lose track of it later.

Alright, that's all for today.