TL;DR
This is Phase 7 of the FSx for ONTAP S3 Access Points serverless pattern library. Building on Phase 6, Phase 7 delivers:
- Three new Public Sector use cases (UC15 defense/satellite, UC16 government archives/FOIA, UC17 smart-city geospatial) — 21 new Lambdas, 110 new unit tests, 3 production CloudFormation templates
- `OutputDestination` API unification across 13 of 17 UCs (UC1-5, UC9-12, UC14-17) — a single `STANDARD_S3 | FSXN_S3AP` switch (the latter targeting an Amazon FSx for NetApp ONTAP S3 Access Point output) decides where AI/ML artifacts land
- End-to-end AWS verification in FSXN_S3AP mode for all three Public Sector UCs — Bedrock-generated Japanese urban planning reports land directly on FSx for ONTAP, readable by city officials via SMB/NFS without a separate output-copy step or a per-UC output bucket
- Extended Work completion: v7 OCR-based screenshot redaction (PR #2, 101 PNGs), 97-file localization batch (Chinese backfill for UC1-14 plus full 7-language translation for the three new Public Sector UCs), dual-Kiro parallel session protocol, 9-stack AWS cleanup, and a 17-UC cross-validation sweep that caught 10 silent production-affecting issues, including IAM policy gaps and a missing import, and introduced 3 permanent validation scripts
All deployable AWS runtime features remain opt-in via CloudFormation Conditions; the default deploy mode keeps legacy behavior bit-for-bit identical. The Extended Work items are repository-level tooling, validation, documentation, and process improvements — not runtime features.
In short: AWS serverless and AI/ML services can process files through an S3-compatible access path while both source data and generated artifacts are stored on FSx for ONTAP as the system of record. Existing SMB/NFS users can access the results under the same enterprise file governance model, without an additional AWS console workflow.
Repository: github.com/Yoshiki0705/FSx-for-ONTAP-S3AccessPoints-Serverless-Patterns
Why Public Sector needs sovereign storage + serverless
Three regulatory themes drive the design:
- Data sovereignty: Defense, intelligence, and space agencies commonly cannot move data across regions. FSx for ONTAP provides enterprise storage with NTFS ACL / Active Directory integration inside a single AWS region — the patterns in this library keep AI/ML artifacts in the same region as the source data.
- Access control via existing directory: Public records and geospatial data are already governed by NTFS ACLs in ONTAP. S3 Access Points provide an object-API access path to that same ONTAP-governed data set, with IAM policies and S3 Access Point policies controlling how serverless services (Lambda, Step Functions, Rekognition, Textract, Bedrock) reach it.
- Regulatory alignment: DoD CC SRG Impact Level 4/5, FedRAMP High, NARA GRS retention, FOIA 20-business-day deadlines, and OGC / INSPIRE geospatial standards influence the control design. For Japanese public sector deployments, ISMAP (Information System Security Management and Assessment Program), ガバメントクラウド (Government Cloud), and 個人情報保護法 (Act on the Protection of Personal Information) are relevant governance and regulatory contexts that should be evaluated alongside the technical pattern. Phase 7 models selected guardrails as CloudFormation Guard Hooks and selected workflow checks as Lambda business logic. Final compliance validation remains the responsibility of the deploying organization — this library provides building blocks, not an accreditation package.
Each UC is deployable to ap-northeast-1 today. For AWS GovCloud (US), the same pattern can be adapted after validating service availability (Bedrock, Rekognition, Textract, SageMaker, FSx for ONTAP S3 AP, Guard Hooks all have region-specific feature support), region-specific behaviour, and agency compliance requirements. This article does not claim GovCloud-certified patterns.
One pattern library, 17 industries, 8 languages
The FSx for ONTAP S3 Access Points library began with cross-industry ambition: the same "serverless AI/ML on top of enterprise file storage" pattern applies to legal, financial, healthcare, manufacturing, genomics, VFX, automotive, retail, logistics, and now Public Sector. To make that claim credible, each use case ships with its own <uc>/docs/architecture.md and <uc>/docs/demo-guide.md authored in the industry's domain vocabulary — not a single generic template reskinned 17 times.
Phase 7 is also where the library completed its full 8-language coverage for every per-UC doc, not just the top-level README:
- 8 target languages: Japanese (original), English, Korean, Simplified Chinese (zh-CN), Traditional Chinese (zh-TW), French, German, Spanish
- Per-UC docs: `architecture.md` + `demo-guide.md` in each of the 17 UC directories
- Coverage reached by the end of Phase 7: 17 UCs × 2 docs × 8 languages = 272 localized documentation files (only UC6's four one-off documents — meeting notes, audience-specific talk scripts — stay single-language on purpose)
Why per-UC docs matter alongside the top-level READMEs:
- Industry-specific vocabulary: "FOIA" (UC16), "DICOM" (UC5), "FASTQ" (UC7), "SEG-Y" (UC8), "DRC" (UC6), "IFC" (UC10), "LAS" (UC17) — these terms are not interchangeable, and a reviewer evaluating the pattern for their industry needs to see the correct terms in their own language
- Regional rollout: A Korean government agency (UC16), a French smart-city consortium (UC17), a German automotive OEM (UC9), and a Chinese retail chain (UC11) can each read the pattern in their primary working language and decide independently
- Translation-as-validation: Forcing the pattern description through 8 languages exposes ambiguous or industry-specific handwaving — if a sentence can't be translated cleanly, it usually isn't a precise sentence in the original
This multi-industry × multi-language stance is what made Phase 7's Theme R (below) a first-class deliverable, not an afterthought. The 97-file localization batch we describe later isn't just "fix a bug" — it's the completion of the same localization standard we've held for the top-level README, now applied down to every per-industry demo script.
UC15: Defense / Space — Satellite Imagery Analytics
Architecture
```mermaid
graph LR
    FSx[FSx for ONTAP] --> S3AP[S3 Access Point]
    S3AP --> SFN[Step Functions]
    SFN --> D[Discovery]
    D --> T["Tiling<br/>rasterio Layer"]
    T --> R{Size?}
    R -->|"< 5MB"| Rek[Rekognition]
    R -->|">= 5MB"| SM[SageMaker Batch]
    Rek --> CD["Change Detection<br/>DynamoDB geohash"]
    SM --> CD
    CD --> GE[Geo Enrichment]
    GE --> A["Alert Generation<br/>SNS"]
```
Six Lambda functions for discovery, COG tiling, object detection, time-series change detection, geo-metadata enrichment, and SNS alerts. The tiling stage runs with rasterio via Lambda Layer (fallback to pure-Python header parsing when the Layer is absent), and the object detection stage routes via the Phase 6B determine_inference_path() helper — Rekognition for < 5 MB images, SageMaker Batch Transform for larger images.
Key design decisions
- geohash-based tile indexing: DynamoDB partition key is a precision-5 geohash (~5 km square) rather than a file path. This makes the time-series change detection join by spatial locality, not by filename.
- Change threshold in km²: The alert threshold (`CHANGE_AREA_THRESHOLD_KM2`) defaults to 1.0 km². The `_compute_diff_area_km2` helper treats Rekognition's normalized `BoundingBox` as degrees and converts to km² (1° ≈ 111 km).
- Float → Decimal conversion: DynamoDB does not accept Python float, so `_to_decimal` recursively converts all floats to `Decimal`. Discovered during AWS deployment verification and committed as a production-ready fix.
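The article does not reproduce `_to_decimal` itself. A minimal sketch of the same idea — the function name comes from the text above, the exact structure is assumed — could look like this:

```python
from decimal import Decimal

def to_decimal(obj):
    """Recursively convert floats to Decimal so boto3's DynamoDB
    serializer accepts the structure (raw Python floats raise TypeError).
    Round-tripping through str() avoids binary-float artifacts such as
    Decimal(0.1) carrying dozens of spurious digits."""
    if isinstance(obj, float):
        return Decimal(str(obj))
    if isinstance(obj, dict):
        return {k: to_decimal(v) for k, v in obj.items()}
    if isinstance(obj, list):
        return [to_decimal(v) for v in obj]
    return obj
```

Ints, strings, and other leaf types pass through unchanged, so the same helper can be applied to an entire DynamoDB item just before `put_item`.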
UC16: Government — Public Records / FOIA
Architecture
```mermaid
graph LR
    FSx[FSx for ONTAP] --> S3AP[S3 Access Point]
    S3AP --> SFN[Step Functions]
    SFN --> D[Discovery]
    D --> O["OCR<br/>Textract sync/async"]
    O --> C["Classification<br/>Comprehend"]
    C --> E["Entity Extraction<br/>PII Detection"]
    E --> Red["Redaction<br/>sidecar JSON"]
    Red --> Ch{OpenSearch?}
    Ch -->|"enabled"| I[Index Generation]
    Ch -->|"disabled"| Cmp["Compliance Check<br/>NARA GRS"]
    I --> Cmp
    FL["EventBridge<br/>1/day"] --> FLL[FOIA Deadline Lambda]
    FLL --> SNS[SNS Reminder]
```
Eight Lambda functions: discovery, OCR (Textract sync/async), classification (Comprehend custom classifier with keyword fallback), PII entity extraction, redaction with sidecar metadata, OpenSearch index generation, NARA compliance check, and FOIA deadline reminder.
Sovereignty note: UC16 uses a Textract cross-region fallback to `us-east-1` because Textract is not available in `ap-northeast-1` at the time of writing. This is a functional verification path, not a sovereign deployment recommendation. For strict data-residency requirements, the Textract cross-region call should be disabled or replaced with an in-region OCR alternative (e.g., a Bedrock vision model in the local region, or a SageMaker-hosted document OCR model). The pattern is in the library to show the graceful-fallback wiring; production deployments under strict sovereignty constraints should select a different OCR backend.
Three OpenSearch deployment modes
Declared via the `OpenSearchMode` parameter:

| Mode | Use case | Monthly cost (est.) |
|---|---|---|
| `none` | Development / cost-optimized | $0 |
| `serverless` | Variable search workloads | $350 – $700 (min 2 OCU) |
| `managed` | Fixed small workload | $35 – $100 (t3.small.search × 1) |
The Step Functions `IndexOrSkip` Choice state bypasses `IndexGeneration` when OpenSearch is disabled. OpenSearch CloudFormation resources are guarded by `CreateOpenSearchServerless` / `CreateOpenSearchManaged` conditions so the stack deploys without OpenSearch at all when `OpenSearchMode=none`.
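In Amazon States Language terms, the bypass is a plain Choice state. A hedged sketch — the state and variable names here are assumed for illustration, not copied from the template:

```json
"IndexOrSkip": {
  "Type": "Choice",
  "Choices": [
    {
      "Variable": "$.opensearch_enabled",
      "BooleanEquals": true,
      "Next": "IndexGeneration"
    }
  ],
  "Default": "ComplianceCheck"
}
```

When the flag is false (OpenSearchMode=none), execution falls through to the compliance check without ever invoking the index Lambda.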
PII hashing — zero original retention
The redaction sidecar stores only SHA-256 hashes of the original PII (never the cleartext), with offsets preserved for audit:
```json
{
  "entity_type": "NAME",
  "original_offset": [8, 16],
  "original_text_hash": "sha256:a3b5...",
  "confidence": 0.99
}
```
This satisfies NARA / FOIA Section 552 audit requirements while preventing accidental PII leakage through log aggregation or search indexing.
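Producing such a record is a one-liner around `hashlib`. A minimal sketch — `sidecar_entry` is an illustrative helper name, not the library's API:

```python
import hashlib

def sidecar_entry(entity_type, text, start, end, confidence):
    """Build one redaction-sidecar record: offsets are kept for audit,
    but only a SHA-256 digest of the PII text is stored, never the
    cleartext itself."""
    digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
    return {
        "entity_type": entity_type,
        "original_offset": [start, end],
        "original_text_hash": f"sha256:{digest}",
        "confidence": confidence,
    }
```

Because the digest is one-way, an auditor can verify that a candidate string matches the redacted span (hash it and compare) without the sidecar ever being able to leak the PII into logs or a search index.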
FOIA 20-business-day calculation
US federal holidays are hardcoded in `foia_deadline_reminder/handler.py`. The `add_business_days` helper skips weekends and federal holidays when computing deadlines. `days_until_deadline` returns 0 for past dates — upstream code interprets 0 as OVERDUE and publishes an SNS alert with `severity=HIGH`.
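The two helpers are simple enough to sketch in full. This is a hedged reconstruction from the description above — the holiday set shown is an assumed illustrative subset, not the handler's actual list:

```python
from datetime import date, timedelta

# Illustrative subset only; the real handler hardcodes the full
# US federal holiday calendar.
FEDERAL_HOLIDAYS = {date(2026, 1, 1), date(2026, 1, 19), date(2026, 7, 3)}

def add_business_days(start: date, n: int) -> date:
    """Advance n business days, skipping weekends and federal holidays."""
    d = start
    while n > 0:
        d += timedelta(days=1)
        if d.weekday() < 5 and d not in FEDERAL_HOLIDAYS:
            n -= 1
    return d

def days_until_deadline(deadline: date, today: date) -> int:
    """Calendar days remaining; past-due dates clamp to 0,
    which upstream code interprets as OVERDUE."""
    return max((deadline - today).days, 0)
```

With this shape, the FOIA deadline for a request received on a Friday correctly lands past the following Monday when that Monday is a holiday.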
Chain structure: the feature that made OutputDestination non-trivial
UC16's processing chain is the deepest in the library:
```text
OCR               put_text(ocr-results/{key}.txt)
→ Classification    get_text(ocr-results/{key}.txt) + put_json(classifications/...)
→ EntityExtraction  get_text(ocr-results/{key}.txt) + put_json(pii-entities/...)
→ Redaction         get_text(ocr-results/{key}.txt) + put_text(redacted/...) + put_json(redaction-metadata/...)
→ IndexGeneration   get_text(redacted/...) + OpenSearch index call
```
When we flipped this to FSXN_S3AP mode, every downstream handler needed to read the previous stage's output from wherever the producer wrote it. The naïve implementation — each handler hardcodes `s3://<output-bucket>/...` — breaks immediately. We solved this by adding symmetric `get_*` helpers to `shared/output_writer.py` (more on this below).
UC17: Smart City — Geospatial Analytics & Urban Planning
Architecture
```mermaid
graph LR
    FSx[FSx for ONTAP] --> S3AP[S3 Access Point]
    S3AP --> SFN[Step Functions]
    SFN --> D[Discovery]
    D --> P["Preprocessing<br/>CRS → EPSG:4326"]
    P --> L["Land Use<br/>Rekognition / SageMaker"]
    L --> CD["Change Detection<br/>DynamoDB L1 delta"]
    CD --> IA["Infra Assessment<br/>laspy LAS"]
    IA --> RM["Risk Mapping<br/>flood/quake/slide"]
    RM --> RG["Report Generation<br/>Bedrock Nova Lite"]
```
Seven Lambda functions. The headline feature is Bedrock Nova Lite-generated planning commentary in Japanese, produced from the combination of land-use distribution, change magnitude, and three risk scores.
Disaster risk model
Three independent risk scores, each normalized 0.0–1.0:
- Flood: `0.4 × elevation_score + 0.3 × water_proximity_score + 0.3 × impervious_rate`
- Earthquake: `0.6 × soil_score + 0.4 × building_density`
- Landslide: `0.5 × slope_score + 0.3 × precipitation_score + 0.2 × vegetation_score`
Each composite score is classified into CRITICAL / HIGH / MEDIUM / LOW bands. The risk model is intentionally simple — it works as a first-pass indicator in production, but agencies are expected to replace it with domain-specific models in SageMaker.
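The three weighted sums above translate directly into code. A sketch using the article's formulas — the band thresholds shown are illustrative assumptions, since the article does not publish them:

```python
def flood_risk(elevation_score, water_proximity_score, impervious_rate):
    # Weights from the UC17 disaster risk model.
    return 0.4 * elevation_score + 0.3 * water_proximity_score + 0.3 * impervious_rate

def earthquake_risk(soil_score, building_density):
    return 0.6 * soil_score + 0.4 * building_density

def landslide_risk(slope_score, precipitation_score, vegetation_score):
    return 0.5 * slope_score + 0.3 * precipitation_score + 0.2 * vegetation_score

def risk_band(score):
    """Map a normalized 0.0-1.0 score to a severity band.
    Thresholds are illustrative, not taken from the handler."""
    if score >= 0.8:
        return "CRITICAL"
    if score >= 0.6:
        return "HIGH"
    if score >= 0.3:
        return "MEDIUM"
    return "LOW"
```

Because every input is pre-normalized to 0.0–1.0 and the weights sum to 1.0, each composite score stays in the same range, which keeps the band thresholds comparable across the three hazard types.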
Sample Bedrock report (Japanese)
```markdown
### 自治体担当者向け所見レポート

#### 都市計画上の注目点
GISデータによると、市内の土地利用分布は安定しており、変化は検出されていません。
しかし、洪水、地震、斜面崩壊のリスクが中程度であることに注意が必要です。

#### 優先すべき対策案
1. 洪水対策の強化: 中程度の洪水リスクに対応するため、排水システムの改善と
   洪水予測モデルの更新を実施。
2. 地震対策の強化: 地震リスクに対応するため、建物の耐震基準の見直しと
   緊急避難経路の整備を推進。
3. 斜面崩壊対策の強化: 斜面崩壊リスクに対応するため、斜面の安定性調査と
   防護工事の実施を検討。
```

(In English: "Findings report for municipal officials — urban planning highlights: land-use distribution is stable with no detected changes, but moderate flood, earthquake, and landslide risks warrant attention. Priority measures: improve drainage and update flood-forecasting models; review seismic building standards and develop emergency evacuation routes; conduct slope-stability surveys and consider protective works.")
Generated by a real Bedrock amazon.nova-lite-v1:0 invocation during AWS verification. The content is contextually appropriate though the input was a minimal test raster.
This report, at ~1.1 KiB saved as text/markdown, is the most visceral demonstration of the "no additional output-copy" pattern: a municipal planner browsing /fsxn-volume/gis/2026/05/ via SMB/NFS finds the raw GeoTIFF inputs alongside the Bedrock-generated Markdown report, openable in any text editor. No per-UC output bucket, no region transfer, and no additional AWS console workflow required for the end consumer beyond their existing SMB/NFS access.
The OutputDestination API: one switch for STANDARD_S3 vs FSXN_S3AP outputs
Phase 7 introduced Pattern B — an `OutputDestination` parameter with two modes:
| Mode | AI / ML output destination | Per-UC output bucket created? |
|---|---|---|
| `STANDARD_S3` (default) | Output bucket (legacy behavior) | Yes |
| `FSXN_S3AP` | FSx for ONTAP S3 Access Point | No (skipped via Condition) |
Before Phase 7, the library had three patterns coexisting:
- Pattern A: FSx for ONTAP S3AP only (UC1-5, UC15-17 original form) — configured via `S3AccessPointAlias` + `S3AccessPointOutputAlias` parameters
- Pattern B (new): Switchable via `OutputDestination` (initially UC11 and UC14, then rolled out to UC9/10/12, then UC15-17, now also UC1-5)
- Pattern C: Standard S3 only (UC6/7/8/13, because Athena's OutputLocation does not support S3 Access Points — tracked as FR-2)
After Phase 7 Extended Work completion on 2026-05-11, 13 of 17 UCs support the unified OutputDestination parameter. Only UC6/7/8/13 remain on Pattern C; moving them to a hybrid Pattern B+C is scheduled for Phase 8.
CloudFormation shape
```yaml
Parameters:
  OutputDestination:
    Type: String
    Default: "STANDARD_S3"
    AllowedValues: ["STANDARD_S3", "FSXN_S3AP"]
  S3AccessPointOutputAlias:
    Type: String
    Default: ""
    Description: Required when OutputDestination=FSXN_S3AP

Conditions:
  UseStandardS3: !Equals [!Ref OutputDestination, "STANDARD_S3"]

Resources:
  OutputBucket:
    Type: AWS::S3::Bucket
    Condition: UseStandardS3   # Skipped when FSXN_S3AP
    Properties: ...
```
When `OutputDestination=FSXN_S3AP`, the output bucket resource is never created. The Lambda environment variables point to the S3 Access Point alias directly, and the `shared/output_writer.py` module routes writes through the S3AP regardless of whether a bucket exists.
The OutputWriter module
`shared/output_writer.py` centralizes the mode-aware logic:
```python
class OutputWriter:
    def __init__(self, mode: str, s3_ap_alias: str = "", bucket_name: str = ""):
        self.mode = mode
        if mode == "FSXN_S3AP":
            self.s3_destination = f"arn:aws:s3:{region}:{account}:accesspoint/{s3_ap_alias}"
        else:
            self.s3_destination = bucket_name

    @classmethod
    def from_env(cls) -> "OutputWriter":
        return cls(
            mode=os.environ["OUTPUT_DESTINATION"],
            s3_ap_alias=os.environ.get("S3_ACCESS_POINT_OUTPUT_ALIAS", ""),
            bucket_name=os.environ.get("OUTPUT_BUCKET_NAME", ""),
        )

    def put_json(self, key: str, data: dict): ...
    def put_text(self, key: str, text: str): ...
    def put_bytes(self, key: str, data: bytes, content_type: str): ...

    # Added during UC16 chain migration:
    def get_json(self, key: str) -> dict: ...
    def get_text(self, key: str) -> str: ...
    def get_bytes(self, key: str) -> bytes: ...
```
The symmetric `get_*` helpers arose directly from UC16's chain structure. Without them, downstream handlers would have tried to read from a (non-existent) output bucket when running in FSXN_S3AP mode. The discovery came during the migration itself, and the fix was trivial once the problem was named — seven new unit tests in `shared/tests/test_output_writer.py`, total 28 PASS.
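The essential property is that producer and consumer resolve the same destination from the same mode. A hedged sketch of just that routing decision, runnable without AWS — `MiniOutputWriter` and `MemoryStore` are illustrative stand-ins, not the library's API, and the real module wraps boto3 S3 calls:

```python
class MemoryStore:
    """In-memory stand-in for S3, so the routing logic runs without AWS."""
    def __init__(self):
        self.objects = {}

    def put(self, dest, key, body):
        self.objects[(dest, key)] = body

    def get(self, dest, key):
        return self.objects[(dest, key)]


class MiniOutputWriter:
    """Mode routing only: one destination string is chosen at construction,
    and every put/get pair resolves the same destination — the symmetry
    that makes UC16's chain read-back work in both modes."""
    def __init__(self, mode, s3_ap_alias="", bucket_name="", store=None):
        self.store = store if store is not None else MemoryStore()
        # FSXN_S3AP -> write through the access point alias; else the bucket
        self.dest = s3_ap_alias if mode == "FSXN_S3AP" else bucket_name

    def put_text(self, key, text):
        self.store.put(self.dest, key, text)

    def get_text(self, key):
        return self.store.get(self.dest, key)
```

Two handlers constructed from the same environment read and write the same location, whichever mode is active — no handler ever spells out a bucket name.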
Phase 6B Guard Hooks compliance
All three templates deploy successfully with the active Guard Hooks stack fsxn-s3ap-guard-hooks:
| Rule | UC15 | UC16 | UC17 |
|---|---|---|---|
| `encryption-required` | ✅ S3 SSE-KMS, DynamoDB SSE, SNS KMS | ✅ | ✅ |
| `iam-least-privilege` | ✅ API-bound wildcards only | ✅ | ✅ Bedrock foundation-model ARN |
| `logging-required` | ✅ all 6 Lambdas | ✅ all 8 | ✅ all 7 |
| `point-in-time-recovery` | ✅ ChangeHistoryTable | ✅ RetentionTable + FoiaRequestTable | ✅ LandUseHistoryTable |
AWS verification in FSXN_S3AP mode
Theme E verification results (2026-05-11)
| UC | Stack | SFN duration | Artifacts | OutputBucket created? |
|---|---|---|---|---|
| UC15 defense-satellite | `fsxn-uc15-demo` | ~12s | 5 files on S3AP | No |
| UC16 government-archives | `fsxn-uc16-demo` | ~90s | 6 files on S3AP | No |
| UC17 smart-city-geospatial | `fsxn-uc17-demo` | ~14s | 4 files on S3AP | No |
Each stack's resource inventory confirmed via:
```bash
aws cloudformation describe-stack-resources \
  --stack-name fsxn-uc15-demo \
  --query 'StackResources[?ResourceType==`AWS::S3::Bucket`]' \
  --output json
# Returns [] — no bucket resources created
```
The most instructive single verification: UC16 chain read-back
After the Step Functions execution completed, we inspected the S3AP:
```bash
aws s3 ls s3://eda-demo-s3ap-<alias>/ai-outputs/uc16/
#                            PRE classifications/
#                            PRE pii-entities/
#                            PRE redacted/
#                            PRE redaction-metadata/
```
Each subdirectory has the expected artifacts. Critically, the Classification handler's input was a read-back from the OCR stage:
```python
# classification/handler.py
output_writer = OutputWriter.from_env()
ocr_text = output_writer.get_text(f"ai-outputs/uc16/ocr-results/{key}.txt")
classification = bedrock_or_keyword_fallback(ocr_text)
output_writer.put_json(f"ai-outputs/uc16/classifications/{key}.json", classification)
```
In FSXN_S3AP mode, `get_text()` reads through the S3 Access Point → FSx for ONTAP volume. In STANDARD_S3 mode, the same call reads from the output bucket. The handler code is identical in both modes; the routing decision is environment-variable-driven inside `OutputWriter`.
UC17 Bedrock report: SMB/NFS accessibility
The Bedrock Nova Lite invocation took ~5 seconds and returned a markdown document. Saved at ai-outputs/uc17/reports/city1.tif.md, the file is:
- Immediately visible via SMB to anyone mounted on the same FSx for ONTAP volume
- Openable in any text editor — no AWS SDK or console workflow required for the end consumer
- Co-located with source rasters — the GIS team sees the AI-generated commentary alongside the raw data
This is the pattern that most clearly demonstrates "serverless AI artifacts land in your existing file-sharing infrastructure without a data-movement step."
Extended Work I: Screenshot Redaction — From Heavy Mask to OCR Precision
Phase 7 was originally delivered with a safe-by-default screenshot masking strategy (v6): grey rectangles covering entire console content areas, exposing only narrow strips that contained the Step Functions Graph view. This over-masked the UI/UX that the PR was meant to showcase.
PR #2 replaced v6 with v7 OCR-based precision masking across all 116 tracked screenshots.
How v7 works
- Run `pytesseract.image_to_data(lang="eng+jpn")` on each screenshot to get word-level bounding boxes
- For each detected word, check if it contains any substring from `SENSITIVE_STRINGS` (a gitignored local file listing account IDs, resource IDs, private IPs, emails, etc.)
- Draw a small black rectangle only over the matched word
- Re-run OCR on the partially-masked image, up to 4 passes (tesseract tokenization is non-deterministic at word boundaries — long URIs like `s3://bucket-<account-id>/obj` sometimes match on the second pass but not the first)
- Always mask the top-right account widget (fixed position, often missed by OCR due to styling)
- Apply to both AWS console screenshots and HTML preview mocks — the latter had been copied as-is in v6 and were leaking account IDs embedded in rendered S3 URIs
Why lang="eng+jpn"
AWS console screenshots in our environment render in Japanese (sidebar labels, breadcrumbs, action buttons). Running tesseract with lang="eng" silently misses sensitive words that appear adjacent to Japanese text — the tokenizer treats the adjacent CJK characters as word terminators but doesn't always segment cleanly. Two leaks discovered post-v7 surfaced only when we switched to lang="eng+jpn":
- `phase1-fsx-filesystem-detail.png`: `fs-<filesystem-id>` next to a Japanese label
- `phase7-uc15-s3-satellite-uploaded.png`: S3 Access Point alias next to Japanese column headers

The fix was a single string change. We now use `eng+jpn` across all masking and leak-verification tooling.
Content preservation
v7 preserves approximately 99% of screenshot content. A sample before/after comparison on uc11-product-tags.png:
- v6: Almost the entire HTML preview was grey-washed, only a hint of "AUTO-TAGGED" visible
- v7: Product image, tag chips (Oval 99.93%, Food 60.67%, ...), description text, and panel layout all fully visible. Only the `s3://fsxn-retail-catalog-demo-output-<account-id>/tags/2026/05/10/product-001.json` URI has a small black rectangle over the bucket name.
Verification workflow
We added scripts/_check_sensitive_leaks.py (gitignored) as a ground-truth detector:
```python
def scan_image(path: Path) -> list[tuple[str, str]]:
    img = Image.open(path).convert("RGB")
    text = pytesseract.image_to_data(
        img, lang="eng+jpn", output_type=pytesseract.Output.DICT
    )
    hits = []
    for word in text["text"]:
        if not word:
            continue
        for s in SENSITIVE_STRINGS:
            if s in word:
                hits.append((s, word))
    return hits
```
After v7 re-masking, _check_sensitive_leaks.py reports:
```text
Scanned: 116 masked images
Images with detectable sensitive substrings: 0
```
This zero-leak state has been maintained across every subsequent commit that adds or modifies screenshots — it is part of our pre-commit checklist.
Rule E (mandatory leak check)
Formalized in docs/dual-kiro-coordination.md: any session adding screenshots must run _check_sensitive_leaks.py and confirm 0 leaks before committing. The tooling is distributed via .example templates (scripts/_check_sensitive_leaks.py.example, scripts/_inplace_ocr_mask.py.example, scripts/_sensitive_strings.py.example) so each contributor can bootstrap their local copy without committing the actual sensitive-strings file.
Extended Work II: Completing 8-Language Coverage for All 17 UCs (97-File Localization Batch)
The library had committed to 8-language docs for every UC since Phase 5 (<uc>/README.md + 7 translated variants, plus per-UC architecture.md and demo-guide.md in the same languages). What we discovered mid-Phase-7 was that two of those eight languages had never actually been translated for the per-UC docs, despite having the language-switcher markup in place.
User-surfaced symptom: clicking [简体中文] in the language switcher at the top of smart-city-geospatial/docs/demo-guide.md navigated to demo-guide.zh-CN.md, but the body was still Japanese. Same behaviour for [繁體中文], and for all 14 non-Public-Sector UCs. For the three Public Sector UCs (UC15-17), all seven target-language variants were partial — the language switcher and the "this is an auto-generated draft" note had been translated at file creation, but the body remained Japanese.
A survey script (scripts/_survey_translation_status.py, gitignored) produced this matrix:
| Language | Translated | Partial | Stub (JP body) |
|---|---|---|---|
| en | 28 | 6 | 0 |
| ko | 28 | 6 | 0 |
| zh-CN | 0 | 17 | 17 |
| zh-TW | 0 | 17 | 17 |
| fr | 28 | 6 | 0 |
| de | 28 | 6 | 0 |
| es | 28 | 6 | 0 |
Two root causes:
- zh-CN and zh-TW had never been translated for per-UC docs — all 34 files across 14 UCs (UC1-14 × 2 docs × 2 langs) kept Japanese bodies. This was an honest omission from an earlier batch translation that prioritized top-level READMEs over per-UC docs; it was meant to be followed up and never was.
- Phase 7 UC15-17 (new) were partial across all 7 target languages — auto-generated drafts shipped with the language switcher and boilerplate translated, but the body remained Japanese pending a follow-up translation pass.
Total scope for this phase's localization batch: 97 files — 55 files backfilling the two Chinese variants for UC1-14, plus 42 files providing the full 7-language initial translation for UC15-17. This brings the per-UC language variant count to 272 files total (17 UCs × 2 docs × 8 languages, with 4 intentionally single-language one-off files under UC6 excluded).
Why 55 and not 56? The expected R-1 matrix has 14 UCs × 2 docs × 2 languages = 56 slots. One of those slots — `semiconductor-eda/docs/demo-guide.zh-CN.md` — had been hand-translated during an earlier session on 2026-05-09 and was already complete, so the Bedrock batch regenerated only the remaining 55 files. The git commit preserves this detail: "UC6 demo-guide.zh-CN was previously hand-translated on 2026-05-09 and left unchanged."
The framing matters: this wasn't a bug-fix, it was the completion of our existing 8-language standard for the per-UC docs. Legal compliance, financial IDP, manufacturing analytics, media/VFX, healthcare DICOM, semiconductor EDA, genomics, energy/seismic, autonomous driving, construction BIM, retail catalog, logistics OCR, education research, and insurance claims — all now have their demo-guide and architecture docs readable by a zh-CN or zh-TW speaker without Google Translate. And the three new Public Sector UCs (defense/satellite, government archives, smart city) reach the same parity at launch.
Using Amazon Bedrock Claude Sonnet 4.5
We built a per-file translator script (scripts/_translate_uc_docs.py, gitignored) around the Claude Sonnet 4.5 model via the JP regional inference profile jp.anthropic.claude-sonnet-4-5-20250929-v1:0:
```python
def build_language_switcher(doc_base: str, target_lang: str) -> str:
    """Build the language switcher line with target language unlinked."""
    ...

TRANSLATION_PROMPT = """You are a professional technical translator.
Translate the following Japanese Markdown document into {target_lang_name}.

## Rules
1. Preserve code blocks verbatim.
2. Preserve technical identifiers: parameter names, ARN placeholders,
   URLs, file paths, command names.
3. Preserve markdown structure exactly.
4. Translate Mermaid diagram labels but keep node IDs intact.
5. Translate table cells except proper nouns.
6. Industry terminology consistency (per uc-industry-mapping.md).
7. Output ONLY the translated Markdown body.
"""
```
And a batch driver (scripts/_batch_translate_r1.sh, gitignored) that iterates 14 UCs × 2 docs × 2 langs:
```bash
for uc in legal-compliance financial-idp ... insurance-claims; do
  for doc in demo-guide architecture; do
    for lang in zh-CN zh-TW; do
      python3 scripts/_translate_uc_docs.py "$uc" "$doc" "$lang"
    done
  done
done
```
Results
- R-1 (UC1-14, zh-CN + zh-TW): 55 files translated in 45 minutes, ~$5–7 Bedrock cost (the expected 14 × 2 × 2 = 56-slot matrix had one slot already complete from a prior hand-translation, so the batch regenerated only the remaining 55)
- R-2 (UC15-17 × 7 langs × 2 docs): 42 files translated in 30 minutes, ~$3–4 Bedrock cost
- Total: 97 files, 75 minutes wall time, ~$10 Bedrock cost
Spot-check example: UC5 healthcare-dicom zh-CN
Before (stub):
```markdown
# DICOM 匿名化工作流程 — 演示指南
🌐 Language / ... : 日本語 | English | ...
> Note: This translation is an auto-generated draft based on the Japanese original.

## 前提
- AWS アカウント、ap-northeast-1
- FSx for ONTAP + S3 Access Point
- Bedrock モデル利用可能化
```
After (translated):
```markdown
# DICOM 匿名化工作流程 — Demo Guide
🌐 Language / ... : 日本語 | English | ... | 简体中文 | ...
> 注意:此翻译由 Amazon Bedrock Claude 生成。欢迎对翻译质量提出改进建议。

## 前提
- AWS 账户,ap-northeast-1
- FSx for ONTAP + S3 Access Point
- Bedrock 模型可用
```
Technical identifiers preserved (FSx for ONTAP, S3 Access Point, Bedrock, AWS, ap-northeast-1). Non-identifier Japanese text translated to Chinese.
Validation: distinguishing translated Chinese from untranslated Japanese
The first pass of our translation-status survey script used CJK Unicode ratio as the heuristic for "is this translated?" — high CJK meant "still Japanese". This failed for zh-CN/zh-TW output because translated Chinese also has high CJK ratio. We fixed the heuristic to check Hiragana + Katakana specifically (which are Japanese-exclusive scripts):
```python
if target_lang in ("zh-CN", "zh-TW"):
    # Japanese-specific scripts only
    jp_chars = sum(
        1 for c in body
        if "\u3040" <= c <= "\u309f" or "\u30a0" <= c <= "\u30ff"
    )
    ratio = jp_chars / max(len(body), 1)
    if ratio < 0.005:
        return "translated"
```
After this fix plus the Bedrock batch, all 17 UCs × 7 target languages × 2 doc types show as translated in the survey.
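The heuristic is worth seeing as a complete function. A self-contained sketch — `translation_status` is an illustrative name, and the 5% threshold for Latin-script targets is an assumption (the article only states the 0.005 kana threshold for Chinese):

```python
def translation_status(body: str, target_lang: str) -> str:
    """Classify a doc body as 'translated' or 'stub' (still Japanese).

    Total-CJK ratio cannot distinguish Chinese output from untranslated
    Japanese (both are CJK-heavy). Only Hiragana (U+3040-309F) and
    Katakana (U+30A0-30FF) are Japanese-exclusive scripts, so we count
    those instead.
    """
    kana = sum(
        1 for c in body
        if "\u3040" <= c <= "\u309f" or "\u30a0" <= c <= "\u30ff"
    )
    kana_ratio = kana / max(len(body), 1)
    if target_lang in ("zh-CN", "zh-TW"):
        # Chinese output is CJK-heavy by design; only kana betrays Japanese.
        return "translated" if kana_ratio < 0.005 else "stub"
    # Latin-script targets: substantial kana means the body is untranslated.
    # (5% threshold assumed for illustration.)
    return "translated" if kana_ratio < 0.05 else "stub"
```

Kanji alone cannot be used as a signal because Chinese prose is written almost entirely in characters that fall in the same Unicode blocks as Japanese kanji; kana is the unambiguous marker.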
Commit strategy: per-UC granularity for R-2
Per my parallel session partner's preference, R-2 was split into 3 commits (one per Phase 7 UC) rather than a single 42-file commit:
```text
764d8c6 feat(i18n): R-2 UC15 defense-satellite — translate docs to 7 languages
4a29a1a feat(i18n): R-2 UC16 government-archives — translate docs to 7 languages
3fba028 feat(i18n): R-2 UC17 smart-city-geospatial — translate docs to 7 languages
```
This granularity makes git log -- defense-satellite/docs/ useful for UC-scoped review and permits UC-level revert if translation quality issues are caught post-merge.
Extended Work III: AWS Resource Cleanup — Three Failure Modes That Weren't in the Script
Post-verification cleanup of nine UC stacks (UC1/2/3/5/7/8/10/12/13) surfaced three blocking conditions that our scripts/cleanup_generic_ucs.sh did not handle:
Failure 1: Athena WorkGroup is non-empty
AthenaWorkgroup failed to delete:
```text
"Invalid request provided: WorkGroup fsxn-manufacturing-analytics-demo-workgroup
is not empty"
```
Fix: `aws athena delete-work-group --work-group <name> --recursive-delete-option`. Affected UC3, UC7, UC8.
Failure 2: S3 bucket has object versions
AthenaResultsBucket failed to delete:
```text
"The bucket you tried to delete is not empty. You must delete all versions."
```
The bucket had versioning enabled (correctly — for recovery). A plain `aws s3 rb --force` only removes current-version objects; delete markers and historical versions must be deleted via `aws s3api delete-objects --delete file://markers.json`. Fix: added a helper script `scripts/_empty_versioned_bucket.sh` (gitignored during Phase 7, planned for integration into the main cleanup script in Phase 8).
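The fiddly part of emptying a versioned bucket is assembling the delete payloads: both object versions and delete markers must be listed, and `delete-objects` accepts at most 1,000 keys per call. A hedged sketch of that batching step — `delete_batches` is an illustrative helper, not the shape of `_empty_versioned_bucket.sh`; in practice each payload would be fed to `aws s3api delete-objects` or boto3's `delete_objects`:

```python
def delete_batches(versions, markers, batch_size=1000):
    """Merge object versions and delete markers (as returned by
    list-object-versions' Versions / DeleteMarkers arrays) into
    Delete payloads of at most batch_size keys each."""
    entries = [
        {"Key": v["Key"], "VersionId": v["VersionId"]}
        for v in list(versions) + list(markers)
    ]
    return [
        {"Objects": entries[i:i + batch_size], "Quiet": True}
        for i in range(0, len(entries), batch_size)
    ]
```

`Quiet: True` suppresses per-key success entries in the response, which matters when a long-lived demo bucket has accumulated tens of thousands of versions.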
Failure 3: Lambda Security Group has a dependent object
LambdaSecurityGroup failed to delete:
```text
"resource sg-<lambda-sg> has a dependent object"
```
Root cause: a Phase 7 manual workaround (tracked as O-2 in the Phase 7 tasks) added inbound rules to the VPC Endpoint SG referencing per-UC Lambda SGs. Deleting the UC stack tried to delete the Lambda SG, but the VPC Endpoint SG still had a rule referencing it.
Fix:
```bash
aws ec2 revoke-security-group-ingress \
  --group-id sg-<vpc-endpoint-sg> \
  --ip-permissions 'IpProtocol=tcp,FromPort=443,ToPort=443,
    UserIdGroupPairs=[{GroupId=sg-<lambda-sg>}]'
```
Affected UC1, UC2. We revoke only the rules for the UCs being deleted; UC6's reference is kept because UC6 remains deployed.
Secondary bug in the cleanup script itself
The cleanup script had `<ACCOUNT_ID>` as a literal placeholder string (a leftover from a global redaction pass). When executed, it constructed bucket names like `fsxn-legal-compliance-demo-output-<ACCOUNT_ID>` — a bucket that doesn't exist, so `aws s3 rb` silently no-oped. Stacks that needed their output buckets emptied before deletion then failed.
Fix (commit 770f713):
```bash
ACCOUNT_ID="${ACCOUNT_ID:-$(aws sts get-caller-identity --query 'Account' --output text 2>/dev/null)}"
if [ -z "$ACCOUNT_ID" ] || [ "$ACCOUNT_ID" = "<ACCOUNT_ID>" ]; then
  echo "ERROR: could not resolve AWS account ID." >&2
  exit 1
fi
```
Final cleanup outcome
After resolving all three failure modes:
- 9 / 9 stacks: `DELETE_COMPLETE`
- 0 DynamoDB tables with `fsxn-` prefix remaining
- Total cleanup wall time: approximately 60 minutes (including CloudFormation poll intervals for VPC Lambda ENI release, which typically takes 15–30 minutes per stack)
The three failure modes are tracked in the Phase 8 spec as Theme A items to be integrated into a rewritten `cleanup_generic_ucs.py` (Python, with `--dry-run` and per-failure-mode recovery).
Extended Work IV: A Dual-Kiro Coordination Protocol
Phase 7 was built with two AI sessions working in parallel on the same repository, labeled A and B in chat notifications. Early in the sprint, this parallelism nearly cost us a working tree of uncommitted v7 mask work — twice, when a `git checkout` from the other session wiped in-progress edits.
The recovery relied on `git stash`, which did save the work, but the near-miss triggered a codification effort.
docs/dual-kiro-coordination.md
A 506-line protocol document now lives at docs/dual-kiro-coordination.md. Its core rules:
Rule C-1: Check current branch before every commit.
Before every `git commit`, run `git branch --show-current` and
verify you are on the branch you intend to commit to.
Rule C-2: Never checkout with a dirty working tree you don't own.
If `git status --short` shows modifications that are not yours:
- Do NOT run `git checkout`, `git switch`, `git stash`, `git reset --hard`
- Send a chat notification first:
"[X] I need to checkout <target>. Your working tree has N dirty files.
Is your working tree safe to stash?"
- Wait for the other session's "yes" before proceeding.
Rule C-3: Stash is a safety net, not a workflow.
Rule C-4: Recognize reflog anomalies immediately.
Rule C-5: Force-push requires --force-with-lease, never bare --force.
Exclusive regions
Each session declares an "exclusive region" — the set of paths and files that only it edits during the sprint:
- A's region: `scripts/mask_uc_demos.py`, `docs/screenshots/masked/**`, `docs/screenshots/MASK_GUIDE.md`, cleanup helpers, `.kiro/specs/.../phase7/tasks.md`, `docs/dual-kiro-coordination.md`, article files
- B's region: `shared/output_writer.py`, per-UC trees, `docs/output-destination-patterns.md`, `docs/aws-feature-requests/fsxn-s3ap-improvements.md`, README sections, per-UC demo-guide localizations, Phase 7 summary / troubleshooting docs
Cross-region edits require explicit chat lock acquisition ([A] LOCK REQUEST: README.md "AWS 仕様" section) with duration bounded to 15–30 minutes.
Rule E: Mandatory leak check
Any session adding screenshots must run _check_sensitive_leaks.py and report 0 leaks before committing. Tooling is distributed via .example templates so contributors bootstrap locally without ever committing the actual _sensitive_strings.py.
Rule F: Industry mapping consistency
docs/screenshots/uc-industry-mapping.md (maintained in the partner session's exclusive region) maps each UC number to an industry label in 8 languages. New UCs or renaming events require agreement before the mapping file is updated.
Emergency pause
If either session detects:
- Main CI in a broken state
- Unmergeable conflict after 2 rebase attempts
- Suspected destruction of the other's work
…the session posts [X] EMERGENCY PAUSE with a one-line summary and both halt non-trivial operations until the human driver decides recovery.
The full protocol is committed as docs/dual-kiro-coordination.md — it will evolve as we gather more experience in Phase 8.
Extended Work V: The 17-UC Cross-Validation Sweep
During the final screenshot round, running aws cloudformation deploy + Step Functions execution across all 17 UCs surfaced a class of bugs that had been silent under pure unit-test + cfn-lint verification. These were shipped-but-broken states: the templates passed static checks, the handlers passed unit tests, and yet AWS-level runtime behavior failed.
Bug class: IAM silent failure on S3 Access Point ARN form
CloudFormation templates that granted IAM access to an FSxN S3 Access Point only in alias form (arn:aws:s3:::<alias>) passed cfn-lint and deployed successfully. But at runtime, AWS-level S3 API calls performed against the ARN form (arn:aws:s3:<region>:<account>:accesspoint/<name>) hit AccessDenied because neither form implies the other in IAM policy evaluation.
The fix, already applied to UC6 earlier and now propagated across all affected UCs:
Conditions:
HasS3AccessPointName:
!Not [!Equals [!Ref S3AccessPointName, ""]]
Resources:
DiscoveryLambdaRole:
Type: AWS::IAM::Role
Properties:
Policies:
- PolicyName: S3APAccess
PolicyDocument:
Statement:
- Effect: Allow
Action: s3:GetObject
Resource:
- !Sub "arn:aws:s3:::${S3AccessPointAlias}/*"
- !If
- HasS3AccessPointName
- !Sub "arn:aws:s3:${AWS::Region}:${AWS::AccountId}:accesspoint/${S3AccessPointName}/object/*"
- !Ref "AWS::NoValue"
UCs affected by this silent bug: UC1 (legal-compliance), UC2 (financial-idp), UC3 (manufacturing-analytics), UC5 (healthcare-dicom), UC6 (semiconductor-eda), UC7 (genomics-pipeline), UC8 (energy-seismic), UC10 (construction-bim), UC14 (insurance-claims). UC4 and UC9 received the same fix during Theme Q. That's 11 of 17 UCs that had a production-affecting silent bug that only surfaced on full AWS deploy + run.
Bug class: Lambda handler NameError (missing imports)
One Lambda handler in financial-idp/functions/entity_extraction/handler.py called os.environ.get(...) without import os. Unit tests passed because the test fixtures had mocked the environment lookup path entirely, so the code path that needed os was never actually executed. CloudFormation accepted the zip. The handler ran on AWS and hit NameError: name 'os' is not defined at first invocation.
Validation tooling added
To prevent these classes of bugs from reaching production on future UC additions, three reusable validation scripts were added and are now part of the recommended pre-merge checklist:
| Script | Detection target | Runtime |
|---|---|---|
| `scripts/lint_all_templates.sh` | cfn-lint schema errors across all 17 UCs in parallel | 5–7 min |
| `scripts/check_handler_names.py` | pyflakes NameError / undefined-name sweep of all Lambda handlers | ~30 sec |
| `scripts/check_conditional_refs.py` | UC9-class bug — `!Sub ${Resource.Arn}` referencing a Condition-guarded resource | ~5 sec |
Verification at the Extended Round's final commit: 0 real cfn-lint errors across 17 templates, 0 pyflakes errors across 87 handlers, 0 conditional-ref issues across 17 templates.
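The real `check_handler_names.py` drives pyflakes, which handles scoping properly. The bug class it catches can be illustrated with a stripped-down AST check (a toy sketch assuming a short list of suspect module names; `missing_imports` is an illustrative name, not the script's API):

```python
import ast

SUSPECTS = ("os", "json", "boto3", "logging")


def missing_imports(source, suspects=SUSPECTS):
    """Report suspect module names that are used but never bound.

    Toy approximation of an undefined-name sweep: collect everything the
    module binds (imports, defs, classes, top-level assignments), then
    flag suspect names that are read without ever being bound.
    """
    tree = ast.parse(source)
    bound = set()
    for node in ast.walk(tree):
        if isinstance(node, (ast.Import, ast.ImportFrom)):
            for alias in node.names:
                bound.add((alias.asname or alias.name).split(".")[0])
        elif isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            bound.add(node.name)
        elif isinstance(node, ast.Assign):
            for target in node.targets:
                if isinstance(target, ast.Name):
                    bound.add(target.id)
    used = {n.id for n in ast.walk(tree)
            if isinstance(n, ast.Name) and isinstance(n.ctx, ast.Load)}
    return sorted(name for name in used if name in suspects and name not in bound)
```

Run against the UC2 bug shape, this flags the handler that calls `os.environ.get(...)` without `import os` in well under a second per file.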
Why static checks weren't enough
This experience reinforces a pattern that showed up repeatedly in Phase 7:
- cfn-lint validates CloudFormation schema but doesn't simulate IAM evaluation
- Unit tests validate handler logic given mocked clients, but the mock setup can accidentally bypass the broken code path
- `aws cloudformation deploy` succeeds even with silently-broken IAM configurations — AWS only complains at the first real API call
The remedy isn't to add more types of static checks (they'd pile up endlessly) — it's to establish a "deploy and actually run it" verification lane for every UC on every non-trivial change. The check_conditional_refs.py script is an exception: it's narrow, fast, and catches a specific class of bug that is otherwise only detectable by attempting a parameterized deploy.
Verification results (ap-northeast-1)
Full details in the Phase 7 verification results and the OutputDestination Theme E verification.
| UC | Step Functions execution | Output artifacts | Cost |
|---|---|---|---|
| UC15 | SUCCEEDED (~30s initial, ~12s with FSXN_S3AP) | enriched/ + detections/ + tiles/ on S3AP | ~$0.01 |
| UC16 | SUCCEEDED (~35s initial, ~90s with FSXN_S3AP + Textract cross-region) | classifications/ + pii-entities/ + redacted/ + redaction-metadata/ on S3AP | ~$0.02 |
| UC17 | SUCCEEDED (~45s initial, ~14s with FSXN_S3AP + Bedrock) | preprocessed/ + landuse/ + risk-maps/ + reports/*.md on S3AP | ~$0.02 |
Total verification cost across both rounds: ~$0.10 (excluding OpenSearch / SageMaker which stayed disabled).
Sample-data verification (initial run, STANDARD_S3 mode)
A first run with synthetic JPEG / PDF samples produced the following measurable outputs:
- UC15: Rekognition DetectLabels against a 1024×1024 synthetic aerial image (PIL-generated, buildings + roads) detected 15+ labels
- UC16: Textract cross-region call to us-east-1 returned 43 KB Blocks JSON from a 1.6 KB PDF (FORMS + TABLES features). Comprehend DetectPiiEntities found 5 PII entities (NAME, EMAIL, PHONE, SSN, DATE_TIME) with 99.63–99.99% confidence
- UC17: Bedrock Nova Lite recognized 仙台 (Sendai) from the filename `sendai_area.jpg` and produced a Japanese urban planning report with 3 countermeasures + 1 monitoring indicator
FSXN_S3AP mode verification (second round, Theme E)
Three stacks deployed to ap-northeast-1 with OutputDestination=FSXN_S3AP. All Step Functions executions SUCCEEDED. Resource inventory confirmed zero AWS::S3::Bucket resources in each stack. CLI evidence is committed to docs/verification-evidence/uc{15,16,17}-demo/ for public reference.
Lessons learned during AWS deployment
From the UC15-17 implementation round
- Rekognition rejects malformed images with `InvalidImageFormatException`. Production code must catch this explicitly and return empty detections rather than letting the exception propagate through `lambda_error_handler` (which would abort the workflow).
- DynamoDB's `float → Decimal` requirement. Silent unit-test passing doesn't catch this; only real PutItem calls do. Added a `_to_decimal` helper in UC15 change_detection.
- Textract is unavailable in ap-northeast-1. UC16 OCR needed a graceful fallback (log the `EndpointConnectionError`, return `api_used="unavailable"`). Production deploys a cross-region client to us-east-1 (the pattern used by UC2/UC10/UC12/UC13/UC14).
- Map state input shape. Step Functions' Map `ItemsPath` iterations only expose per-item fields; root-level state machine input (e.g., `opensearch_enabled`) is not propagated. Solution: an `IsPresent: true` guard in Choice, or explicitly pass fields via Map Parameters.
- S3 Access Point ARN format. FSx for ONTAP S3 APs require policies to grant both the alias ARN (`arn:aws:s3:::<alias>`) AND the full Access Point ARN (`arn:aws:s3:<region>:<account>:accesspoint/<name>`). The conditional `!If HasS3AccessPointName` pattern mirrors UC6's approach.
From the OutputDestination unification round
- Lambda logs can go missing even when the handler succeeds. The Python 3.13 runtime defaults the root logger to WARNING. Calls to `logger.info(...)` are silently dropped unless the handler explicitly sets `logging.getLogger().setLevel(logging.INFO)`. EMF metrics, Step Functions output, and `aws s3api list-objects-v2` are the reliable verification signals. `aws logs tail` often shows an "empty" log stream even when the handler ran successfully.
- `aws s3 ls s3://<alias>` behaves differently from the ARN form. Use the alias form (`aws s3 ls s3://<alias>/path/`) for interactive `ls` / download. The ARN form (`aws s3 ls s3://arn:aws:s3:<region>:<account>:accesspoint/<alias>/`) is accepted in IAM policies but rejected by the `aws s3 ls` CLI command. This surfaced during evidence gathering for Theme E.
- Downstream handlers in a chain structure must use read-back. UC16's OCR → Classification → EntityExtraction → Redaction chain was discovered to need `OutputWriter.get_text()` during migration, not via a failing unit test (the unit tests used a mock `s3_client` directly, missing the abstraction entry point). Added 7 new unit tests covering the read helpers to prevent regression.
- Lambda zip + CloudFormation deploy is not enough to refresh code. If a code-only change is pushed and `cloudformation deploy` finds no template changes, the Lambda function retains its previous code. The workaround is `aws lambda update-function-code --function-name <name> --zip-file fileb://build/...zip`, or touching a template parameter so the stack update triggers function updates.
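The logging fix is two lines at module import time (a minimal sketch; the handler body is illustrative, not a real UC handler):

```python
import logging

# The Python 3.13 Lambda runtime leaves the root logger at WARNING, so
# logger.info(...) output is dropped unless the level is raised explicitly.
logger = logging.getLogger()
logger.setLevel(logging.INFO)


def handler(event, context):
    # Illustrative body: with the level set above, this line actually
    # reaches CloudWatch Logs instead of vanishing.
    logger.info("writing artifact for key=%s", event.get("key"))
    return {"status": "ok"}
```

Setting the level on the root logger (rather than a named child) is what makes `logger.info` from shared modules visible too.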
From the UC9 template bug fix (Theme Q-1)
- `!Sub` cannot resolve `${Resource.Arn}` when the resource has a `Condition` that evaluates to false. UC9's State Machine `DefinitionString` referenced `${RealtimeInvokeFunction.Arn}`, `${SageMakerInvokeFunction.Arn}`, and `${ComponentsInvokeFunction.Arn}` — all guarded by `Condition: CreateSageMakerResources` / `CreateRealtimeEndpoint` / `CreateInferenceComponents`. When those conditions were false, CloudFormation failed with "Unresolved resource dependencies." The fix: use `DefinitionSubstitutions` with `!If` to inject either the real ARN (when the resource exists) or a placeholder ARN (when it doesn't). Additionally, a `SkipInference` Pass state was added as the `InferenceRouting` Default target when all inference backends are disabled — it injects `{"status": "SKIPPED"}` into `$.inference_result` so that `AnnotationManager` can resolve its input parameters without error. The result: UC9 reaches end-to-end SUCCEEDED (2:45 including VPC cold start + Bedrock annotation) even with all SageMaker features disabled.
- Step Functions `NumericGreaterThanOrEqualToPath` does not exist. The correct field name is `NumericGreaterThanEqualsPath`. This typo had been present since Phase 2 but never surfaced because the code path was only reached when SageMaker was enabled — and SageMaker was always disabled in prior verification runs. The lesson: Step Functions schema validation only runs on states that are actually reachable during execution, not on the full definition at deploy time.
- Discovery Lambda must emit all fields referenced by downstream Choice states. UC9's `InferenceRouting` Choice state referenced `$.discovery.inference_type`, `$.discovery.file_count`, and `$.discovery.batch_threshold`. The Discovery handler returned none of these. Step Functions raises a runtime error ("Invalid path") when a Choice Variable path doesn't exist in the state input — even if that Choice branch would never be taken. The fix: Discovery always returns these fields with safe defaults (`inference_type: "none"`, `file_count: 0`, `batch_threshold: 100`).
From the 17-UC cross-validation sweep
- IAM policies on FSxN S3 Access Points must grant both the alias ARN and the full AP ARN. `arn:aws:s3:::<alias>/*` and `arn:aws:s3:<region>:<account>:accesspoint/<name>/object/*` are not interchangeable in IAM evaluation. cfn-lint accepts templates that grant only the alias form; runtime calls made against the ARN form hit `AccessDenied`. This bug was silent across 9 of 17 UCs until full-UC AWS verification surfaced it. Fix: wrap the AP-ARN clause in `!If [HasS3AccessPointName, ..., !Ref "AWS::NoValue"]` so it's optional but present when a full AP name is provided.
- Python handlers can ship without required imports and still pass unit tests. UC2's `entity_extraction/handler.py` called `os.environ.get(...)` without `import os`. The unit tests' environment-mock setup never exercised the real code path. CloudFormation accepted the zip. The handler hit `NameError` on first real invocation. The remedy: a `pyflakes` sweep across all Lambda handlers (`scripts/check_handler_names.py`) now runs in ~30 seconds and catches this entire class of bug.
- The right response to "unit tests passed but deploy is broken" is not more static checks. It's a "deploy and actually run it" verification lane per UC per non-trivial change. The only exception: narrow, specific static checks like `check_conditional_refs.py` that detect a single well-understood bug class in milliseconds.
From the parallel-session experience
- A `checkout` done by the other session can silently clobber your working tree. The recovery path is via `git stash list` — the offending session's `checkout` wraps any dirty tracked changes into an auto-stash, which can be popped to recover. The protocol above (Rule C-2) codifies avoiding this by default.
- Translation output and source-layout verification benefit from the same tooling. The `_survey_translation_status.py` script was useful not only for the 97-file localization batch but also as a regression guard — it produces a matrix that shows at a glance whether a later edit has broken a language variant.
Full stats
Code
- Lambda functions: 21 new (6 + 8 + 7 across UC15/16/17) + `OutputWriter.get_*` helpers in `shared/`
- Unit tests: 110 new (31 + 47 + 32 for UC15/16/17) + 7 new for `OutputWriter` read helpers — all PASS, 958 total project-wide
- CloudFormation templates: 3 new for UC15/16/17, and templates updated across UC1-5 + UC9-12 + UC14 for `OutputDestination` unification in Phase 7 Extended Work. UC6/7/8/13 remain Pattern C because Athena `OutputLocation` does not support S3 Access Points (tracked as FR-2); a hybrid Pattern B+C migration for those four UCs is scheduled for Phase 8 (see `docs/design-pattern-c-to-b-hybrid.md`).
- cfn-lint: 0 real errors across all templates (E2530 SnapStart warnings filtered per UC convention)
- Validation scripts: 3 new reusable checks — `lint_all_templates.sh`, `check_handler_names.py`, and `check_conditional_refs.py`
Screenshots
- Tracked masked screenshots: Step Functions Graph view (SUCCEEDED) for all 17 UCs, plus common infrastructure views and per-UC output previews across phase1–7 directories
- v7 re-masked in PR #2: 101 PNGs (v6 heavy-mask replaced)
- Leak verification: 0 OCR-detectable sensitive substrings (lang=`eng+jpn`)
Documentation
- Per-UC architecture docs: 17 UCs × 8 languages = 136 files, all present; target-language variants marked `translated` by the survey script
- Per-UC demo-guides: 17 UCs × 8 languages = 136 files, all present; target-language variants marked `translated`
- Total per-UC language variants: 272 files across all 17 industries and 8 languages (only UC6 has 4 additional JP-only / EN-only one-off docs that are intentionally single-language — meeting notes and a specific-audience talk script)
- Localization batch (Theme R) this phase: 97 files newly translated or backfilled in Phase 7 Extended Work
  - R-1: UC1-14 zh-CN + zh-TW backfill (55 files — the two languages that had lagged behind en/ko/fr/de/es; one of the expected 56 slots was already complete from an earlier hand-translation and did not need regeneration)
  - R-2: UC15-17 × 7 target languages × 2 doc types (42 files — the new Public Sector UCs' full 7-language initial translation)
  - The remaining ~175 per-UC translated files were authored in Phase 5 / Phase 6 for UC1-14 en/ko/fr/de/es and stayed stable through Phase 7.
- Top-level READMEs: all 17 UCs with 8-language coverage (JP + 7 variants, translation batch 217a509 before Phase 7)
- Cross-cutting docs: `dual-kiro-coordination.md`, `MASK_GUIDE.md` (v7), `phase7-summary.md`, `phase7-troubleshooting.md`, `output-destination-patterns.md`, `design-pattern-c-to-b-hybrid.md`, `design-output-writer-multipart.md`
AWS verification
- AWS verification: Initial 12-stack verification across two rounds, followed by a final 17-UC cross-validation sweep
- Step Functions executions: all SUCCEEDED
- Total verification spend: ~$0.10 Phase 7 proper + ~$0.05 Phase 7 Extended cleanup + ~$10 Bedrock translation → ~$10.15 session total
Looking Forward to Phase 8
Phase 7 Extended Work surfaced several structural improvements that warrant a dedicated Phase 8:
- Cleanup script rewrite: Python version with `--dry-run`, integrated Athena WorkGroup + versioned-bucket + VPC Endpoint SG handling (Phase 8 Theme A)
- VPC Endpoint SG automation: Custom Resource replacing the current manual inbound-rule workaround (Phase 8 Theme B)
- Sample data generator expansion: DICOM for UC5, IFC for UC10, SHP + LAS for UC17 (Phase 8 Theme C)
- Phase 7 UC15-17 UI/UX screenshots: Manual capture + v7 mask + integration into demo-guides (Phase 8 Theme D)
- Event-driven trigger pattern: S3 PutObject → EventBridge → Step Functions with DynamoDB-based idempotency, starting with UC1 as the reference implementation (Phase 8 Theme E)
- Dual-Kiro protocol v2: `git worktree`-based physical checkout separation, exclusive-region declaration files (Phase 8 Theme F)
- Phase 8 article and 8-language docs update (Phase 8 Theme G)
- Pattern C → Pattern B hybrid migration for UC6/7/8/13 (Phase 8 Theme H — see `docs/design-pattern-c-to-b-hybrid.md` for the full design)
- `OutputWriter.put_stream` API for >5 GB artifacts using S3 MultipartUpload (Phase 8 Theme I — see `docs/design-output-writer-multipart.md`)
- DevSecOps CI/CD pipeline: cfn-lint + pyflakes + Guard Hooks + IAM Access Analyzer integrated into GitHub Actions / CodeBuild, replacing the current manual validation script execution (Phase 8 Theme M — Themes J/K/L cover OutputWriter multipart, template management, and code cleanup tracked in the Phase 8 spec)
- Observability and operational readiness: CloudWatch Alarms, Step Functions DLQ, EventBridge failure notifications, and per-UC operational runbooks for production-grade monitoring (Phase 8 Theme N)
And two residual items from Phase 7 Extended Work that are not Phase 8 scope but tracked as Phase 7 Theme Q:
- UC9 autonomous-driving: template bug fixed in Phase 7 Theme Q-1 (commits `92d7ce6` + `b1e3021`). Four issues resolved: conditional Lambda ARN resolution via `DefinitionSubstitutions`, the `NumericGreaterThanEqualsPath` typo, the Discovery handler's missing `inference_type` field, and a `SkipInference` Pass state for a proper end-to-end SUCCEEDED when inference is disabled. AWS verified: full workflow SUCCEEDED in 2:45 (VPC Discovery + Bedrock annotation + COCO JSON output). Remaining: OutputWriter migration (handler-side `s3_client.put_object` → `OutputWriter.from_env()`) deferred to Phase 8 Theme H.
- UC4 media-vfx: Deadline Cloud farm/queue setup, sample VFX render, OutputDestination Pattern B migration, UI screenshot.
Conclusion
Phase 7 added three Public Sector use cases, unified OutputDestination across 13 of 17 UCs, brought all 17 industry-specific docs (architecture + demo-guide) to true 8-language parity, and — in its Extended Round — ran a 17-UC cross-validation sweep that caught and fixed 10 silent production-affecting issues before publication. The Extended Work as a whole — OCR-based screenshot redaction, 97-file localization batch, 9-stack AWS cleanup, dual-Kiro coordination protocol, and three new validation scripts — closed the gaps between "deployable code" and "publishable, repeatable, cross-team-usable patterns."
The most impactful decisions were:
- Symmetric `OutputWriter.get_*` helpers: without them, UC16's chain structure would have silently failed in FSXN_S3AP mode
- OCR masking with `lang="eng+jpn"`: a single string change that eliminated the entire class of language-adjacency leaks
- Hiragana/Katakana ratio as the translated-state heuristic for Chinese targets: lets the validation tooling do its job without false negatives
- 17 industries × 8 languages as a first-class commitment: not just READMEs, but the per-UC architecture docs and demo-guides that industry practitioners actually read when evaluating a pattern for their own stack
- `docs/dual-kiro-coordination.md`: codifying the near-miss protocol made subsequent parallel sessions reliably non-destructive
- Deploy-and-run verification per UC: unit tests plus cfn-lint passed on templates that had silent `AccessDenied` bugs in 9 of 17 UCs. Only full-UC AWS deploy + Step Functions execution surfaced them. The sweep is now standard practice.
The AWS verification artifact we keep coming back to is the UC17 Bedrock Markdown report: ~1.1 KiB of Japanese urban planning commentary, rendered on a GPU-less serverless Lambda, landing on an FSx for ONTAP volume via a single S3 API call, and immediately readable by a city planner through SMB/NFS next to the source GeoTIFF. No additional output-copy step. No per-UC output bucket created. No separate AWS console or SDK workflow required for file consumers beyond their existing SMB/NFS access path (which still carries NTFS ACLs and AD group membership — the access control surface is unchanged). And now, the same story is readable in seven other languages by engineers evaluating the pattern for their own public-sector, healthcare, manufacturing, or retail workloads — with the underlying templates, handlers, and permissions verified by a sweep that would have caught most of what almost shipped broken.
Repository: github.com/Yoshiki0705/FSx-for-ONTAP-S3AccessPoints-Serverless-Patterns
Previous phases: Phase 1 · Phase 6A/6B
Phase 7 artifacts (all in the GitHub repo):