35 ChatGPT Prompts for Data Engineers: Pipeline Docs, Stakeholder Reports, and Code Reviews Done Faster
You built the pipeline. It runs. Data flows. Stakeholders still have no idea what it does or why it matters.
That's not a technical problem. It's a communication problem, and it can eat 30–40% of a data engineer's work week. Pipeline documentation written three months ago that nobody reads. Incident post-mortems that describe what broke but not what changed. Stakeholder reports that say "data landed in the warehouse" instead of translating it into business language that actually lands.
The prompts in this article won't build your pipelines. They'll document them, explain them, and communicate about them — faster than you can write the Confluence page.
35 prompts across five categories: pipeline documentation, SQL and code review, stakeholder reporting, incident post-mortems, and onboarding runbooks. All production-ready, all copy-paste ready.
Why Data Engineers Lose Hours to Writing Tasks
The 2024 DORA State of DevOps Report found that high-performing engineering teams spend 44% less time on rework and 30% more time on value-added work. The bottleneck isn't usually the build — it's the communication overhead that surrounds it.
Data engineers face a specific version of this: they work at the intersection of engineering, data science, and business stakeholders. Every team speaks a different language. Documentation written for a senior engineer confuses an analyst. A report written for an analyst underwhelms a CTO.
ChatGPT handles the translation layer. You provide the technical facts; the prompts produce the right format for the right audience.
Category 1: Pipeline Documentation
The most-skipped task in data engineering. These prompts turn your pipeline logic into documentation someone will actually read.
Prompt 1 — Pipeline Overview (for non-technical stakeholders)
Write a pipeline overview document for a non-technical stakeholder audience.
Pipeline name: [name]
What it does in plain language: [describe the business purpose]
Source systems: [list 2-3 input sources]
Destination: [where data lands — e.g., BigQuery, Redshift, S3]
Update frequency: [e.g., hourly, daily at 3am UTC]
Business impact if it fails: [e.g., "dashboard data is stale by X hours"]
Format: 150-200 words. No SQL or code. Use the word "reports" not "tables." End with one sentence describing how failures are detected and communicated.
Prompt 2 — Technical Architecture Summary
Write a technical architecture summary for a new data engineer joining the team.
Pipeline name: [name]
Tech stack: [e.g., Airflow, dbt, Spark, Snowflake]
Ingestion method: [e.g., batch via API, streaming via Kafka]
Transformation logic summary: [describe key transformations in plain terms]
Key dependencies: [upstream tables or services this pipeline depends on]
Known gotchas: [list 1-3 edge cases or quirks engineers should know]
Format: 250-300 words, use headers for each section. Technical language is fine. End with a "first day troubleshooting" tip.
Prompt 3 — Data Dictionary Entry
Write a data dictionary entry for the following table.
Table name: [name]
Database/schema: [location]
Owner: [team or engineer]
Purpose: [what question does this table answer?]
Update cadence: [how often it refreshes]
Key columns (list each with type and business meaning): [list columns]
Known data quality issues: [list any caveats]
Format: structured markdown table for columns, prose paragraphs for purpose and caveats. Suitable for a Confluence page or README.
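For reference, the column section the prompt asks for might look like this (a hypothetical orders table, not a real schema):

```markdown
| Column      | Type      | Business meaning                                       |
|-------------|-----------|--------------------------------------------------------|
| order_id    | STRING    | Unique identifier for each order                       |
| customer_id | STRING    | Links to the customer table; NULL for guest checkouts  |
| order_total | NUMERIC   | Order value in USD after discounts, before tax         |
| created_at  | TIMESTAMP | When the order was placed (UTC)                        |
```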
Prompt 4 — DAG / Pipeline README
Write a README for a data pipeline repository.
Pipeline name: [name]
What business problem it solves: [one sentence]
Owner: [team name]
Tech stack: [tools used]
Local setup instructions: [high-level steps — you'll fill in code blocks]
Environment variables required: [list them without values]
How to run tests: [brief description]
Deployment process: [brief description]
Common failure modes: [list 2-3]
Contact: [team channel or owner name]
Format: standard GitHub README structure. Use code blocks for commands. Keep prose sections under 3 sentences each.
Prompt 5 — SLA Documentation
Write an SLA document for a data pipeline.
Pipeline name: [name]
Business purpose: [brief]
Data freshness SLA: [e.g., data available by 9am UTC]
Uptime SLA: [e.g., 99.5% over rolling 30 days]
Recovery time objective (RTO): [e.g., < 4 hours]
Recovery point objective (RPO): [e.g., < 24 hours of data loss]
Who monitors it: [team or on-call rotation]
Escalation path: [list escalation steps]
Format: formal SLA document suitable for a service agreement or team agreement. 200-250 words. Include a signature block placeholder at the bottom.
Prompt 6 — Lineage Summary
Write a data lineage summary for internal documentation.
Source: [upstream system and table/endpoint]
Transformations applied (in order): [list key steps — e.g., filter, join, aggregate]
Output: [destination table or dataset]
Downstream consumers: [who or what uses this data?]
Data retention: [how long is data kept?]
PII or sensitive data handling: [describe any masking or anonymization]
Format: 150-200 words. Include a simple ASCII or text-based lineage diagram at the end showing the flow: Source → Transform → Output → Consumer.
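The diagram can stay this simple; a sketch with hypothetical names:

```text
Salesforce API -> stg_accounts -> fct_account_health -> Exec KPI dashboard
   (source)     (filter, dedupe)  (join, aggregate)      (consumer)
```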
Prompt 7 — Deprecation Notice
Write a deprecation notice for a data pipeline or table being retired.
Asset being deprecated: [pipeline or table name]
Reason for deprecation: [e.g., replaced by new pipeline, data no longer accurate]
Deprecation date: [date]
Replacement (if any): [name of replacement asset + migration guidance]
Impact: [who uses it and what they need to do]
Contact for questions: [team or individual]
Format: email-style announcement suitable for a team channel or Confluence page. Tone: professional, clear, not alarming. 150 words.
Category 2: SQL and Code Review
Reviewing code often takes longer than writing it. These prompts accelerate the review cycle.
Prompt 8 — SQL Review Checklist
Review the following SQL query against a standard engineering checklist and provide structured feedback.
[Paste query here]
Checklist to evaluate:
- Performance: missing indexes, full table scans, unnecessary subqueries
- Correctness: potential NULL handling issues, incorrect JOINs, off-by-one date logic
- Readability: naming conventions, comment coverage, query structure
- Security: SQL injection risks (if dynamic), data exposure concerns
- Scalability: how this query will behave at 10x data volume
Format: bullet points per category. Flag severity (Critical / Warning / Suggestion). End with a one-sentence overall assessment.
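To make the checklist concrete, here is the kind of correctness issue a review like this should catch: a WHERE filter on a LEFT JOINed table that silently turns it into an INNER JOIN (table names are hypothetical):

```sql
-- Before: the WHERE clause drops customers with no completed orders,
-- silently turning the LEFT JOIN into an INNER JOIN.
SELECT c.customer_id, o.order_total
FROM customers c
LEFT JOIN orders o ON o.customer_id = c.customer_id
WHERE o.status = 'completed';

-- After: move the filter into the join condition so customers
-- without completed orders are kept (with NULL order_total).
SELECT c.customer_id, o.order_total
FROM customers c
LEFT JOIN orders o
  ON o.customer_id = c.customer_id
 AND o.status = 'completed';
```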
Prompt 9 — dbt Model Review
Review this dbt model for engineering best practices.
[Paste dbt model SQL here]
Review for:
- Model naming convention adherence
- Source and model reference patterns (ref() and source() vs hardcoded table names)
- Test coverage — what tests should exist but may be missing
- Documentation completeness (schema.yml)
- Performance considerations (materialization type, incremental logic if present)
- Idempotency — does this model produce the same result on reruns?
Format: structured feedback by category. Include specific recommendations, not just observations.
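If you want a sense of what "good" looks like before pasting your own model, here is a minimal incremental dbt model sketch in line with the criteria above (stg_orders is a hypothetical staging model):

```sql
-- models/marts/fct_daily_orders.sql (illustrative only)
{{ config(
    materialized='incremental',
    unique_key='order_date'
) }}

SELECT
    order_date,
    COUNT(*)         AS order_count,
    SUM(order_total) AS revenue
FROM {{ ref('stg_orders') }}
{% if is_incremental() %}
  -- On incremental runs, only reprocess days at or after the latest loaded date.
  WHERE order_date >= (SELECT MAX(order_date) FROM {{ this }})
{% endif %}
GROUP BY order_date
```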
Prompt 10 — Code Review Comment (constructive)
Write a constructive code review comment for the following issue.
Context: I am reviewing a pull request from a colleague. I want to flag a problem in a way that's helpful, not critical.
Issue I've identified: [describe the technical problem — e.g., "this JOIN on a large table will cause a full scan"]
Suggested fix: [describe what I think should be done]
Severity: [Minor suggestion / Should fix before merge / Must fix — blocks merge]
Format: 3-5 sentences. Professional and collaborative tone. Include a code snippet showing the suggested improvement if applicable. Do not use "you should" — phrase as "one approach would be" or "we could consider."
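An output in the spirit of this prompt might read (the query and table names are invented; adjust date syntax to your SQL dialect): "The logic here reads clearly. One concern: joining directly on raw_events will scan the full table on every run. One approach would be to filter the event window first, then join. Flagging as 'should fix before merge' since it affects warehouse cost."

```sql
-- Prune the large table before joining to avoid a full scan.
WITH recent_events AS (
    SELECT user_id, event_type, event_ts
    FROM raw_events
    WHERE event_ts >= CURRENT_DATE - INTERVAL '7 days'
)
SELECT u.user_id, e.event_type
FROM users u
JOIN recent_events e ON e.user_id = u.user_id;
```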
Prompt 11 — PR Description Template
Write a pull request description for a data engineering change.
What this PR does: [describe the change]
Why: [business reason or technical motivation]
What changed: [list key files or logic changes]
How to test: [steps a reviewer should follow to validate]
Known limitations or tradeoffs: [any caveats]
Related issues or tickets: [list IDs]
Format: standard GitHub PR description with sections. Keep each section to 3-5 bullets or 2-3 sentences. Include a checklist at the bottom for reviewer sign-off.
Prompt 12 — Refactoring Justification Memo
Write a short technical memo justifying a refactoring decision.
Current state: [describe what exists and why it's a problem]
Proposed change: [describe the refactored approach]
Technical benefits: [list 3 specific improvements — performance, maintainability, cost]
Business benefits: [translate technical gains into business impact]
Estimated effort: [rough story points or days]
Risks: [what could go wrong]
Format: 200 words, suitable for sharing with an engineering manager or architect. End with a clear recommendation.
Prompt 13 — Anti-pattern Explanation
Write a clear explanation of why the following code pattern is problematic and what to use instead.
Anti-pattern: [describe the pattern — e.g., SELECT * in production queries, using LIMIT without ORDER BY]
Why it's a problem: [explain the technical and operational risks]
Better alternative: [describe the preferred pattern]
Example: [show before/after code, ideally]
Format: 150 words. Technical audience — no need to simplify. Include a one-line rule of thumb at the end that the team can use as a quick heuristic.
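Filling the brackets with the SELECT * anti-pattern, for example, could yield a before/after like this sketch:

```sql
-- Before: SELECT * couples the pipeline to upstream schema changes;
-- a new upstream column can silently break downstream loads.
SELECT * FROM raw_payments;

-- After: explicit columns make schema drift fail loudly at parse
-- time instead of corrupting downstream tables.
SELECT payment_id, order_id, amount, paid_at
FROM raw_payments;
```

Rule of thumb: production queries name their columns.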
Category 3: Stakeholder Reporting
Translating pipeline output into language leadership actually cares about.
Prompt 14 — Executive Data Health Summary
Write a weekly data health summary for a VP or C-level audience.
Reporting period: [week of X]
Pipelines monitored: [number and names]
Overall status: [green/yellow/red with brief explanation]
Notable incidents this week: [list any with resolution]
Data quality metrics: [e.g., "98.7% of records passed validation checks"]
Upcoming changes or risks: [list 1-2]
Format: 150 words, executive email style. No technical jargon. Use traffic light emoji (🟢🟡🔴) for quick visual scan. End with one sentence on what the team is focused on next week.
Prompt 15 — Pipeline Cost Report
Write a pipeline cost analysis summary for engineering leadership.
Pipeline or workload: [name]
Current monthly cost: $[X]
Cost breakdown by component: [list — e.g., compute, storage, egress]
Cost trend: [up/down/flat] over last [period]
Optimization opportunities identified: [list 1-3 with estimated savings]
Recommended action: [what should be approved or prioritized]
Format: 200 words. Include a simple table showing current vs projected cost after optimization. Tone: analytical, not alarming.
Prompt 16 — Data Availability Status Update
Write a data availability status update for a business stakeholder who relies on a specific dataset.
Dataset/report affected: [name]
Current status: [available / delayed / unavailable]
Root cause (in plain language): [explain without technical jargon]
Expected resolution: [time estimate]
Business impact: [what decisions or reports are affected]
Next update: [when you'll communicate again]
Format: 100-150 words. Stakeholder is not technical. Tone: calm, informative, action-oriented. Do not use words like "pipeline," "DAG," or "orchestration."
Prompt 17 — Quarterly Data Engineering Wins Report
Write a quarterly wins report for the data engineering team.
Team: [team name]
Period: [Q and year]
Key projects shipped: [list 3-5 with one-line impact summary each]
Reliability improvement: [e.g., "reduced incident count from 12 to 4"]
Performance improvement: [e.g., "cut average query time from 4.2s to 0.8s"]
Cost optimization: [savings achieved]
Next quarter priorities: [list 2-3]
Format: 300 words, suitable for an all-hands or leadership review. Quantify everything possible. Tone: confident and forward-looking.
Prompt 18 — Ad Hoc Data Request Response
Write a response to an ad hoc data request from a business stakeholder.
Request received: [describe what they asked for]
What I can provide: [describe the data available — and its limitations]
What I cannot provide (and why): [describe any gaps — e.g., "we don't track X at that granularity"]
Estimated delivery time: [when they'll have it]
Clarifying questions needed (if any): [list what you need from them before proceeding]
Format: professional email, 100-150 words. Set accurate expectations without over-promising. Offer a follow-up call if the request is complex.
Prompt 19 — Data Platform Roadmap Update
Write a data platform roadmap update for a quarterly business review.
Team: [team name]
Quarter: [current]
Completed items: [list with one-line summaries]
In progress: [list with % complete and expected ship date]
Blocked: [list with blocker description]
Next quarter plan: [top 3 priorities]
Long-term investments (6-12 months): [1-2 items]
Format: slide-deck summary style — short bullets, no paragraphs. Suitable for a 5-minute verbal walkthrough. 200 words total.
Category 4: Incident Post-Mortems
Post-mortems no one reads don't prevent future incidents. These prompts produce ones that do.
Prompt 20 — Post-Mortem Draft
Write a post-mortem document for a data pipeline incident.
Incident title: [descriptive name]
Date and duration: [when it happened and how long it lasted]
Impact: [what broke, who was affected, what decisions couldn't be made]
Timeline:
- [time]: [what happened]
- [time]: [what happened]
- [time]: [resolved]
Root cause: [technical root cause — be specific]
Contributing factors: [list 2-3]
What went well: [list 1-2]
Action items: [list each with owner and due date]
Format: standard post-mortem structure, 400-500 words. Blameless tone — focus on systems and processes, not individuals. Include a "lessons learned" section with 2-3 actionable insights.
Prompt 21 — Incident Timeline Reconstruction
Help me reconstruct a clear incident timeline from these raw notes.
Raw notes (paste your Slack thread, PagerDuty alerts, or notes here):
[paste raw notes]
Format the timeline as:
- Exact timestamps (or best estimates if unavailable)
- What event occurred
- Who took action and what they did
- What the system state was at each point
Output: clean markdown table with columns: Time | Event | Actor | System State.
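Two example rows of the kind of table it should return (timestamps and events invented for illustration):

```markdown
| Time (UTC) | Event                         | Actor       | System State            |
|------------|-------------------------------|-------------|-------------------------|
| 03:12      | Freshness alert fired         | PagerDuty   | Nightly load 2h overdue |
| 03:25      | Upstream API outage confirmed | On-call eng | Ingestion paused        |
```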
Prompt 22 — Blameless RCA Statement
Write a root cause analysis (RCA) statement using the 5 Whys technique.
Incident description: [what went wrong]
Why 1: [first-level cause]
Why 2: [cause of Why 1]
Why 3: [cause of Why 2]
Why 4: [cause of Why 3]
Why 5 (root cause): [deepest root cause]
Format: present each "Why" as a clear statement leading to the next. End with a root cause summary paragraph (50 words) and two systemic fix recommendations. Blameless framing throughout.
Prompt 23 — Remediation Plan
Write a remediation plan following a data pipeline incident.
Incident summary: [brief description of what happened]
Immediate fixes applied: [list what was done during the incident]
Short-term actions (< 2 weeks): [list with owners and due dates]
Medium-term improvements (1-3 months): [list with owners and due dates]
Long-term systemic changes (3-6 months): [list]
Success criteria: [how will we know the system is more resilient?]
Format: structured action plan, 250 words. Include a priority column (Critical/High/Medium) for each item.
Prompt 24 — Stakeholder Incident Communication
Write a stakeholder communication for a resolved data pipeline incident.
Incident: [brief description]
Duration: [start to resolution time]
Business impact: [what decisions, reports, or processes were affected]
Root cause (in plain language): [no technical jargon]
What we fixed: [plain language explanation]
What we're doing to prevent recurrence: [2-3 bullets]
Format: email announcement to business stakeholders, 150-200 words. Tone: transparent and confident. Do not use the words "pipeline," "DAG," or "Airflow." Use "data systems" instead.
Category 5: Onboarding Runbooks
New engineers shouldn't spend their first week in Slack asking basic questions. These prompts write the docs that prevent that.
Prompt 25 — First Week Onboarding Guide
Write a first-week onboarding guide for a new data engineer joining the team.
Team name: [team name]
Tech stack: [list tools — e.g., Airflow, dbt, Snowflake, Spark]
Key repos: [list with brief descriptions]
Access to request (in order): [list systems they'll need]
First day tasks: [list 3-5 concrete actions]
Key people to meet: [roles, not names — e.g., "platform lead," "data scientist liaison"]
Common gotchas: [list 3 things new engineers always get wrong in the first month]
Where to get help: [channels, wikis, escalation paths]
Format: structured guide, 300-350 words. Conversational but actionable. Include a "Day 1 checklist" at the top.
Prompt 26 — Runbook for Common Failure Mode
Write an operational runbook for a recurring pipeline failure mode.
Failure mode name: [e.g., "Snowflake query timeout," "S3 source file missing"]
Symptoms: [what alerts fire, what the logs show]
Immediate impact: [what breaks downstream]
Diagnosis steps (in order): [list each step with expected output]
Resolution steps (in order): [list each step]
When to escalate: [criteria for escalating to senior engineer or on-call]
Prevention: [what configuration or monitoring change would catch this earlier]
Format: numbered steps for diagnosis and resolution. 250-300 words. Suitable for a junior engineer to execute without additional help.
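Diagnosis steps land best when each action pairs with a concrete check. A first step is often a freshness query like this sketch (the table, column, and Snowflake-style DATEDIFF are assumptions; adjust for your warehouse):

```sql
-- Step 1: confirm how stale the target table actually is.
SELECT
    MAX(loaded_at) AS last_load,
    DATEDIFF('hour', MAX(loaded_at), CURRENT_TIMESTAMP) AS hours_stale
FROM analytics.fct_orders;
-- Expected: hours_stale <= 1 for an hourly pipeline.
-- If higher, move to step 2 (check the orchestrator's task logs).
```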
Prompt 27 — Environment Setup Guide
Write an environment setup guide for a new data engineer.
Operating system assumption: [macOS / Linux / either]
Tools to install (in order): [list with versions]
Config files to create: [list with what goes in them — no actual secrets]
Environment variables to set: [list names and what they represent, not values]
Test to verify setup works: [describe the verification step]
Common setup errors: [list 2-3 with solutions]
Format: step-by-step guide with numbered steps. Include code blocks for all commands. 300-400 words. Assume the reader knows how to use a terminal but is new to this specific stack.
Prompt 28 — On-Call Handoff Template
Write an on-call handoff document for end of shift.
Shift period: [dates and times]
Incidents this shift: [list with status — resolved/ongoing]
Active monitoring concerns: [what's in a degraded or unusual state right now]
Upcoming scheduled maintenance or changes: [list with times]
Known risks for next shift: [list 1-3 items the next engineer should watch]
Open action items: [list with who owns each]
Format: structured markdown, 200 words. Scannable — the next on-call should be up to speed in 3 minutes.
Prompt 29 — Pipeline Ownership Transfer Document
Write a pipeline ownership transfer document.
Pipeline being transferred: [name]
Current owner: [team or engineer]
New owner: [team or engineer]
Transfer reason: [e.g., team reorg, engineer offboarding]
What the new owner needs to know: [top 5 things — critical knowledge transfer]
Known technical debt: [list outstanding issues the new owner inherits]
SLAs and expectations: [what stakeholders expect]
Support contacts: [who to call if something breaks and the new owner is stuck]
Format: knowledge transfer document, 300 words. Tone: helpful and complete, not rushed.
Prompt 30 — Glossary Entry for Internal Documentation
Write a data engineering glossary entry for our internal documentation.
Term: [technical term or internal system name]
Plain English definition: [what it means in the simplest possible language]
Technical definition: [accurate technical description]
How we use it: [how this term applies specifically to our systems]
Related terms: [list 2-3 related concepts]
Example: [a concrete example of this term in context]
Format: structured glossary entry, 150 words. Audience includes both technical engineers and business analysts.
Prompt 31 — FAQ for Internal Data Team Wiki
Write a FAQ section for our internal data engineering wiki.
Target audience: [e.g., analysts, product managers, new engineers]
Common questions (list 5-7): [list the questions your team gets repeatedly]
For each question: a clear, direct answer in 2-4 sentences. Avoid jargon unless the audience is technical.
Format: numbered Q&A format. 300-400 words total. Include a "Still have questions?" footer pointing to the team channel.
Prompt 32 — Incident Severity Level Guide
Write an incident severity level guide for a data engineering team.
Severity levels to define: P0, P1, P2, P3 (or SEV1-SEV4)
For each level, define:
- Business impact criteria (what makes it this severity)
- Response time expectation
- Escalation path
- Communication requirements
Context: our team runs pipelines for [internal BI / customer-facing data products / regulatory reporting].
Format: reference table (markdown), 200 words total. One row per severity level. Include a column for example incidents at that level.
Prompt 33 — Architecture Decision Record (ADR)
Write an Architecture Decision Record (ADR) for the following technical decision.
Decision title: [e.g., "Migrate batch pipeline from Spark to dbt"]
Context: [why was this decision needed?]
Decision: [what was decided]
Alternatives considered: [list 2-3 alternatives you evaluated]
Rationale: [why this option over the alternatives?]
Consequences: [what changes, what gets better, what gets harder]
Status: [Proposed / Accepted / Deprecated]
Format: standard ADR markdown structure. 300-350 words. Future engineers should be able to understand this decision without additional context.
Prompt 34 — Capacity Planning Document
Write a data platform capacity planning document for an annual planning cycle.
Current infrastructure: [describe current compute, storage, scale]
Current growth rate: [% growth in data volume / query load per month]
Projected growth: [12-month forecast]
Capacity constraints approaching: [what will hit limits first and when]
Recommended investments: [list 2-3 scaling decisions with cost estimates]
Risk of not acting: [what happens if capacity isn't added]
Format: 300 words, suitable for an engineering budget review. Quantify wherever possible. Include a simple table showing current vs. 12-month projected resource usage.
Prompt 35 — Vendor Evaluation Summary
Write a vendor evaluation summary for a data tooling decision.
Tool category: [e.g., orchestration, data quality, observability]
Vendors evaluated: [list 3-4]
Evaluation criteria: [list your scoring dimensions — e.g., cost, scalability, OSS vs proprietary, team familiarity]
Winner: [which vendor was selected]
Rationale: [why this vendor over the others]
Trade-offs accepted: [what you're giving up]
Next steps: [what needs to happen to implement]
Format: 250 words. Decision-ready summary suitable for an architecture review board. Include a simple scoring matrix at the end.
The Bottom Line
Data engineers are expensive. Having them spend a third of their time on administrative writing tasks is an organizational problem with a cheap solution.
These 35 prompts don't replace your judgment. They replace your blank page. The analysis, the architectural decisions, the troubleshooting — that stays with you. The formatting, the translation for different audiences, the consistency across hundreds of documents — that's where AI actually saves time.
Use them as starting points. Fill in the brackets. Adjust the tone to match your team's style.
Go Deeper: The Full Data Engineer AI Toolkit
These 35 prompts cover documentation and communication. The Data Engineer AI Toolkit extends further — with prompt packs for data modeling reviews, cost optimization analysis, interview prep, and stakeholder alignment frameworks.
Built for data engineers who'd rather be building than explaining what they built.
Use code LAUNCH30 for 30% off — limited uses remaining.
→ Get the Data Engineer AI Toolkit
Prompts are for administrative and communication use only. Always verify technical accuracy of AI-generated documentation before publishing internally or externally.