If you've ever tried to scrape Chinese-platform data at scale, you know the landscape: TLS fingerprinting walls, residential-proxy bills, integration headaches that take weeks before you even see one row of data.
Bilibili is the outlier. No API key. No browser. No proxy. Pure HTTP. Runs in 256 MB of RAM. Genuinely the easiest Chinese platform to scrape in 2026.
If you're tracking a gaming brand launch, doing creator research, building a market-trends dataset, or feeding a recommender system, Bilibili is where you should start. This post walks through how to scrape it from scratch in Python, what data you actually get, and when it makes sense to switch from DIY to a hosted scraper.
Why Bilibili is unusually scrape-friendly
Bilibili (哔哩哔哩) is China's YouTube — 300M+ monthly active users, skewed Gen Z and millennials, dominant in anime, gaming, tech, and educational content. From a scraping perspective, three things make it different from RedNote and Weibo:
1. Public JSON endpoints. Bilibili exposes JSON for video metadata, popular/trending, user info, and comments. Most don't require auth for public content.
2. Stable structure. The endpoint shapes change rarely — months between meaningful updates. You can ship a scraper and it'll keep working.
3. Globally accessible. No geo-fencing on most public content. You can hit it from a US datacenter without proxies for moderate volumes.
The trade-off: comments throttle aggressively from cloud IPs (only top ~3 comments come back). Search has its own constraints (more on that below). For everything else, Bilibili is a friendly target.
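To make point 1 concrete, here is a sketch that pulls one video's metadata by BVID. The `x/web-interface/view` endpoint name and the `{"code": 0, "data": {...}}` envelope shape are assumptions based on Bilibili's widely used web API; verify both against a live response before depending on them.

```python
def parse_video(payload: dict) -> dict:
    """Flatten Bilibili's assumed JSON envelope ({"code": 0, "data": {...}})
    into a small flat row. Missing keys degrade to None rather than raising."""
    data = payload.get("data") or {}
    stat = data.get("stat") or {}
    owner = data.get("owner") or {}
    return {
        "bvid": data.get("bvid"),
        "title": data.get("title"),
        "author": owner.get("name"),
        "viewCount": stat.get("view"),
        "coinCount": stat.get("coin"),
    }

def fetch_video(bvid: str) -> dict:
    import httpx  # lazy import so parse_video stays dependency-free
    r = httpx.get(
        "https://api.bilibili.com/x/web-interface/view",  # assumed public endpoint
        params={"bvid": bvid},
        headers={"User-Agent": "Mozilla/5.0", "Referer": "https://www.bilibili.com/"},
        timeout=10,
    )
    return parse_video(r.json())
```

Splitting the HTTP call from the parsing keeps the parsing testable offline, which matters once field names start drifting.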
What you can actually extract
Per video:
- BVID, title, description, URL, duration
- View count, like count, danmaku (live scrolling comments) count, coin (tip) count, favorite count, share count, reply count
- Author MID, name, publish date, category, tags
Per user/creator:
- MID, name, signature/bio, level, follower count
- Total archive (video) count
- Profile URL
Per comment:
- Comment ID, text, like count, reply count, created timestamp
- Author MID, name, avatar, level
Bilibili-specific signal worth highlighting: coin-per-view ratio. When users "throw coins" at a video, they spend a finite daily allowance — that's a stronger quality signal than likes (which are free). Useful for creator vetting.
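A minimal helper for that signal, assuming the `stat` dict shape described in the field list above (`view` and `coin` counts per video):

```python
def coin_per_view(stat: dict) -> float:
    """Coin-to-view ratio: coins spend a finite daily allowance,
    so a high ratio signals audience quality better than free likes."""
    views = stat.get("view") or 0
    coins = stat.get("coin") or 0
    return coins / views if views else 0.0

def rank_by_coin_ratio(videos: list) -> list:
    """Sort candidate videos (each carrying a 'stat' dict) by the ratio,
    best first. Useful as a first pass when vetting creators."""
    return sorted(videos, key=lambda v: coin_per_view(v.get("stat") or {}), reverse=True)
```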
A minimal Python scraper (≈30 lines)
This pulls Bilibili's trending/popular videos as clean structured rows. No external dependencies beyond httpx. No request signing. Works on the first run.
import httpx
import json

HEADERS = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36",
    "Accept": "application/json",
    "Referer": "https://www.bilibili.com/",
}

def fetch_popular_bilibili(max_results: int = 30):
    url = "https://api.bilibili.com/x/web-interface/popular"
    results = []
    page = 1
    while len(results) < max_results:
        # ps = page size, pn = page number
        r = httpx.get(url, headers=HEADERS, params={"ps": 20, "pn": page}, timeout=10)
        data = r.json().get("data") or {}
        items = data.get("list") or []
        if not items:
            break
        for it in items:
            owner = it.get("owner") or {}
            stat = it.get("stat") or {}
            results.append({
                "bvid": it.get("bvid"),
                "title": it.get("title"),
                "author": owner.get("name"),
                "viewCount": stat.get("view"),
                "likeCount": stat.get("like"),
                "danmakuCount": stat.get("danmaku"),
                "duration": it.get("duration"),
                "url": f"https://www.bilibili.com/video/{it.get('bvid')}",
            })
        page += 1
    return results[:max_results]

if __name__ == "__main__":
    rows = fetch_popular_bilibili(max_results=20)
    print(json.dumps(rows, indent=2, ensure_ascii=False))
That's the whole scraper. Plain HTTP. No browser. No proxy. Returns Bilibili's currently-trending videos with the engagement metrics that matter.
Sample output
{
  "bvid": "BV1YXDfBUETP",
  "title": "AI教程入门:从零开始",
  "author": "技术老师",
  "viewCount": 1570113,
  "likeCount": 182455,
  "danmakuCount": 7466,
  "duration": 767,
  "url": "https://www.bilibili.com/video/BV1YXDfBUETP"
}
When DIY breaks down
The 30-line popular fetcher is great for prototyping and tracking trends. It falls over when you need:
- Search by keyword. Bilibili's search endpoints have additional request requirements that change over time. Reimplementing them yourself means signing logic that you'll need to chase as Bilibili updates it.
- Comments at depth. Cloud IPs get throttled to ~3 comments per video. You need residential IPs or session continuity to paginate full threads.
- Multi-mode pipelines. Search + video detail + user-videos + comments + popular all wired together with retries, rate-limit handling, and consistent output schemas — that's a weekend's work, not 30 lines.
- Long-running jobs. Cookie refresh, exponential backoff on rate-limit responses, image URL normalization, edge cases on user pages with paid memberships.
- Output stability. Bilibili sometimes changes a single field name silently. Your downstream breaks.
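The retry loop, at least, is generic enough to sketch here. This version wraps any response-returning callable with exponential backoff; the status codes listed are assumptions about what a throttled request returns (429 is the generic rate-limit code), so adjust them after observing real responses:

```python
import time

def with_backoff(fetch, max_tries: int = 5, base_delay: float = 1.0,
                 retry_statuses: tuple = (412, 429, 503)):
    """Call fetch() until it returns a non-throttled response, doubling
    the sleep between attempts. fetch is any zero-arg callable returning
    an object with a .status_code attribute (httpx and requests both fit)."""
    delay = base_delay
    resp = None
    for attempt in range(max_tries):
        resp = fetch()
        if resp.status_code not in retry_statuses:
            return resp
        if attempt < max_tries - 1:
            time.sleep(delay)
            delay *= 2  # 1s, 2s, 4s, ...
    return resp  # still throttled after max_tries: hand it back to the caller
```

Usage would look like `with_backoff(lambda: httpx.get(url, headers=HEADERS))`; keeping the HTTP call outside the helper means the same loop serves every endpoint.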
When that's where you are, point a hosted Actor at it.
Hosted alternative
I maintain the Bilibili Scraper on Apify — five modes (search, video_detail, video_comments, user_videos, popular) in one Actor. Pure HTTP under the hood, same as above, but with all the rate-limiting + retry + schema-stability work already done — including search, which the DIY example skips.
from apify_client import ApifyClient

client = ApifyClient("YOUR_APIFY_TOKEN")
run = client.actor("zhorex/bilibili-scraper").call(run_input={
    "mode": "search",
    "searchQuery": "人工智能教程",
    "maxResults": 100,
})
for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(item["title"], item["danmakuCount"])
Pay-per-result: $0.005 per item. First 1,000 items are on Apify's free tier ($5/month credit).
Common questions
Will Bilibili add anti-bot eventually? Some endpoints already have additional request requirements; popular/trending and basic video metadata have stayed open for years. The hosted Actor tracks any changes.
What about geo-restricted content? Some licensed videos (anime, copyrighted music) are mainland-only. Public metadata for those still works; the streams don't.
Is this legal? This Actor only accesses Bilibili's public HTTP endpoints — the same data any browser visitor sees without logging in. No authentication is bypassed. Always consult your local laws.
Building Chinese-platform intelligence at scale?
If you're combining Bilibili with other Chinese platforms (Weibo, RedNote), I maintain the full suite under the same code style and schema conventions:
- Bilibili Scraper — (this one)
- Weibo Scraper — microblogging, 580M MAU
- RedNote (Xiaohongshu) Scraper — lifestyle social
- RedNote Shop Scraper — Xiaohongshu e-commerce
Running 1,000+ items per week? I offer custom output schemas, dedicated proxy pools, SLA support, and volume discounts above 50K items/month. DM me on Apify or open an Issue with subject "Enterprise inquiry".
Found a bug? Open an Issue on the Actor page — I usually ship fixes within 48 hours.
If this saved you time, a 30-second review on the Apify Store helps a lot. ⭐