DEV Community

Cover image for Public Sector Use Cases, Unified Output Destination, and a Localization Batch — FSx for ONTAP S3 Access Points, Phase 7

Public Sector Use Cases, Unified Output Destination, and a Localization Batch — FSx for ONTAP S3 Access Points, Phase 7

TL;DR

This is Phase 7 of the FSx for ONTAP S3 Access Points serverless pattern library. Building on Phase 6, Phase 7 delivers:

  • Three new Public Sector use cases (UC15 defense/satellite, UC16 government archives/FOIA, UC17 smart-city geospatial) — 21 new Lambdas, 110 new unit tests, 3 production CloudFormation templates
  • OutputDestination API unification across 13 of 17 UCs (UC1-5, UC9-12, UC14-17) — a single STANDARD_S3 | FSXN_S3AP (Amazon FSx for NetApp ONTAP S3 Access Point output) switch decides where AI/ML artifacts land
  • End-to-end AWS verification in FSXN_S3AP mode for all three Public Sector UCs — Bedrock-generated Japanese urban planning reports land directly on FSx for ONTAP, readable by city officials via SMB/NFS without a separate output-copy step or a per-UC output bucket
  • Extended Work completion: v7 OCR-based screenshot redaction (PR #2, 101 PNGs), 97-file localization batch (Chinese backfill for UC1-14 plus full 7-language translation for the three new Public Sector UCs), dual-Kiro parallel session protocol, 9-stack AWS cleanup, and a 17-UC cross-validation sweep that caught 10 silent production-affecting issues, including IAM policy gaps and a missing import, and introduced 3 permanent validation scripts

All deployable AWS runtime features remain opt-in via CloudFormation Conditions; the default deploy mode keeps legacy behavior bit-for-bit identical. The Extended Work items are repository-level tooling, validation, documentation, and process improvements — not runtime features.

In short: AWS serverless and AI/ML services can process files through an S3-compatible access path while both source data and generated artifacts are stored on FSx for ONTAP as the system of record. Existing SMB/NFS users can access the results under the same enterprise file governance model, without an additional AWS console workflow.

Repository: github.com/Yoshiki0705/FSx-for-ONTAP-S3AccessPoints-Serverless-Patterns


Why Public Sector needs sovereign storage + serverless

Three regulatory themes drive the design:

  1. Data sovereignty: Defense, intelligence, and space agencies commonly cannot move data across regions. FSx for ONTAP provides enterprise storage with NTFS ACL / Active Directory integration inside a single AWS region — the patterns in this library keep AI/ML artifacts in the same region as the source data.
  2. Access control via existing directory: Public records and geospatial data are already governed by NTFS ACLs in ONTAP. S3 Access Points provide an object-API access path to that same ONTAP-governed data set, with IAM policies and S3 Access Point policies controlling how serverless services (Lambda, Step Functions, Rekognition, Textract, Bedrock) reach it.
  3. Regulatory alignment: DoD CC SRG Impact Level 4/5, FedRAMP High, NARA GRS retention, FOIA 20-business-day deadlines, and OGC / INSPIRE geospatial standards influence the control design. For Japanese public sector deployments, ISMAP (Information System Security Management and Assessment Program), ガバメントクラウド (Government Cloud), and 個人情報保護法 (Act on the Protection of Personal Information) are relevant governance and regulatory contexts that should be evaluated alongside the technical pattern. Phase 7 models selected guardrails as CloudFormation Guard Hooks and selected workflow checks as Lambda business logic. Final compliance validation remains the responsibility of the deploying organization — this library provides building blocks, not an accreditation package.

Each UC is deployable to ap-northeast-1 today. For AWS GovCloud (US), the same pattern can be adapted after validating service availability (Bedrock, Rekognition, Textract, SageMaker, FSx for ONTAP S3 AP, Guard Hooks all have region-specific feature support), region-specific behaviour, and agency compliance requirements. This article does not claim GovCloud-certified patterns.


One pattern library, 17 industries, 8 languages

The FSx for ONTAP S3 Access Points library began with cross-industry ambition: the same "serverless AI/ML on top of enterprise file storage" pattern applies to legal, financial, healthcare, manufacturing, genomics, VFX, automotive, retail, logistics, and now Public Sector. To make that claim credible, each use case ships with its own <uc>/docs/architecture.md and <uc>/docs/demo-guide.md authored in the industry's domain vocabulary — not a single generic template reskinned 17 times.

Phase 7 is also where the library completed its full 8-language coverage for every per-UC doc, not just the top-level README:

  • 8 target languages: Japanese (original), English, Korean, Simplified Chinese (zh-CN), Traditional Chinese (zh-TW), French, German, Spanish
  • Per-UC docs: architecture.md + demo-guide.md in each of the 17 UC directories
  • Coverage reached by the end of Phase 7: 17 UCs × 2 docs × 8 languages = 272 localized documentation files (only UC6's four one-off documents — meeting notes, audience-specific talk scripts — stay single-language on purpose)

Why per-UC docs matter alongside the top-level READMEs:

  1. Industry-specific vocabulary: "FOIA" (UC16), "DICOM" (UC5), "FASTQ" (UC7), "SEG-Y" (UC8), "DRC" (UC6), "IFC" (UC10), "LAS" (UC17) — these terms are not interchangeable, and a reviewer evaluating the pattern for their industry needs to see the correct terms in their own language
  2. Regional rollout: A Korean government agency (UC16), a French smart-city consortium (UC17), a German automotive OEM (UC9), and a Chinese retail chain (UC11) can each read the pattern in their primary working language and decide independently
  3. Translation-as-validation: Forcing the pattern description through 8 languages exposes ambiguous or industry-specific handwaving — if a sentence can't be translated cleanly, it usually isn't a precise sentence in the original

This multi-industry × multi-language stance is what made Phase 7's Theme R (below) a first-class deliverable, not an afterthought. The 97-file localization batch we describe later isn't just "fix a bug" — it's the completion of the same localization standard we've held for the top-level README, now applied down to every per-industry demo script.


UC15: Defense / Space — Satellite Imagery Analytics

Architecture

graph LR
    FSx[FSx for ONTAP] --> S3AP[S3 Access Point]
    S3AP --> SFN[Step Functions]
    SFN --> D[Discovery]
    D --> T["Tiling<br/>rasterio Layer"]
    T --> R{Size?}
    R -->|"< 5MB"| Rek[Rekognition]
    R -->|">= 5MB"| SM[SageMaker Batch]
    Rek --> CD["Change Detection<br/>DynamoDB geohash"]
    SM --> CD
    CD --> GE[Geo Enrichment]
    GE --> A["Alert Generation<br/>SNS"]
Enter fullscreen mode Exit fullscreen mode

Six Lambda functions for discovery, COG tiling, object detection, time-series change detection, geo-metadata enrichment, and SNS alerts. The tiling stage runs with rasterio via Lambda Layer (fallback to pure-Python header parsing when the Layer is absent), and the object detection stage routes via the Phase 6B determine_inference_path() helper — Rekognition for < 5 MB images, SageMaker Batch Transform for larger images.

Key design decisions

  • geohash-based tile indexing: DynamoDB partition key is a precision-5 geohash (~5 km square) rather than a file path. This makes the time-series change detection join by spatial locality, not by filename.
  • Change threshold in km²: The alert threshold (CHANGE_AREA_THRESHOLD_KM2) defaults to 1.0 km². The _compute_diff_area_km2 helper treats Rekognition's normalized BoundingBox as degrees and converts to km² (1° ≒ 111 km).
  • Float → Decimal conversion: DynamoDB does not accept Python float, so _to_decimal recursively converts all floats to Decimal. Discovered during AWS deployment verification and committed as a production-ready fix.

UC16: Government — Public Records / FOIA

Architecture

graph LR
    FSx[FSx for ONTAP] --> S3AP[S3 Access Point]
    S3AP --> SFN[Step Functions]
    SFN --> D[Discovery]
    D --> O["OCR<br/>Textract sync/async"]
    O --> C["Classification<br/>Comprehend"]
    C --> E["Entity Extraction<br/>PII Detection"]
    E --> Red["Redaction<br/>sidecar JSON"]
    Red --> Ch{OpenSearch?}
    Ch -->|"enabled"| I[Index Generation]
    Ch -->|"disabled"| Cmp["Compliance Check<br/>NARA GRS"]
    I --> Cmp

    FL["EventBridge<br/>1/day"] --> FLL[FOIA Deadline Lambda]
    FLL --> SNS[SNS Reminder]
Enter fullscreen mode Exit fullscreen mode

Eight Lambda functions: discovery, OCR (Textract sync/async), classification (Comprehend custom classifier with keyword fallback), PII entity extraction, redaction with sidecar metadata, OpenSearch index generation, NARA compliance check, and FOIA deadline reminder.

Sovereignty note: UC16 uses a Textract cross-region fallback to us-east-1 because Textract is not available in ap-northeast-1 at the time of writing. This is a functional verification path, not a sovereign deployment recommendation. For strict data-residency requirements, the Textract cross-region call should be disabled or replaced with an in-region OCR alternative (e.g., a Bedrock vision model in the local region, or a SageMaker-hosted document OCR model). The pattern is in the library to show the graceful-fallback wiring; production deployments under strict sovereignty constraints should select a different OCR backend.

Three OpenSearch deployment modes

Declared via the OpenSearchMode parameter:

Mode Use case Monthly cost (est.)
none Development / cost-optimized $0
serverless Variable search workloads $350 – $700 (min 2 OCU)
managed Fixed small workload $35 – $100 (t3.small.search × 1)

The Step Functions IndexOrSkip Choice state bypasses IndexGeneration when OpenSearch is disabled. OpenSearch CloudFormation resources are guarded by CreateOpenSearchServerless / CreateOpenSearchManaged conditions so the stack deploys without OpenSearch at all when OpenSearchMode=none.

PII hashing — zero original retention

The redaction sidecar stores only SHA-256 hashes of the original PII (never the cleartext), with offsets preserved for audit:

{
  "entity_type": "NAME",
  "original_offset": [8, 16],
  "original_text_hash": "sha256:a3b5...",
  "confidence": 0.99
}
Enter fullscreen mode Exit fullscreen mode

This satisfies NARA / FOIA Section 552 audit requirements while preventing accidental PII leakage through log aggregation or search indexing.

FOIA 20-business-day calculation

US federal holidays are hardcoded in foia_deadline_reminder/handler.py. The add_business_days helper skips weekends and federal holidays when computing deadlines. days_until_deadline returns 0 for past dates — upstream code interprets 0 as OVERDUE and publishes an SNS alert with severity=HIGH.

Chain structure: the feature that made OutputDestination non-trivial

UC16's processing chain is the deepest in the library:

OCR            put_text(ocr-results/{key}.txt)
  →
Classification get_text(ocr-results/{key}.txt) + put_json(classifications/...)
  →
EntityExtraction get_text(ocr-results/{key}.txt) + put_json(pii-entities/...)
  →
Redaction      get_text(ocr-results/{key}.txt) + put_text(redacted/...) + put_json(redaction-metadata/...)
  →
IndexGeneration get_text(redacted/...) + OpenSearch index call
Enter fullscreen mode Exit fullscreen mode

When we flipped this to FSXN_S3AP mode, every downstream handler needed to read the previous stage's output from wherever the producer wrote it. The naïve implementation — each handler hardcodes s3://<output-bucket>/... — breaks immediately. We solved this by adding symmetric get_* helpers to shared/output_writer.py (more on this below).


UC17: Smart City — Geospatial Analytics & Urban Planning

Architecture

graph LR
    FSx[FSx for ONTAP] --> S3AP[S3 Access Point]
    S3AP --> SFN[Step Functions]
    SFN --> D[Discovery]
    D --> P["Preprocessing<br/>CRS → EPSG:4326"]
    P --> L["Land Use<br/>Rekognition / SageMaker"]
    L --> CD["Change Detection<br/>DynamoDB L1 delta"]
    CD --> IA["Infra Assessment<br/>laspy LAS"]
    IA --> RM["Risk Mapping<br/>flood/quake/slide"]
    RM --> RG["Report Generation<br/>Bedrock Nova Lite"]
Enter fullscreen mode Exit fullscreen mode

Seven Lambda functions. The headline feature is Bedrock Nova Lite-generated planning commentary in Japanese, produced from the combination of land-use distribution, change magnitude, and three risk scores.

Disaster risk model

Three independent risk scores, each normalized 0.0–1.0:

  • Flood: 0.4 × elevation_score + 0.3 × water_proximity_score + 0.3 × impervious_rate
  • Earthquake: 0.6 × soil_score + 0.4 × building_density
  • Landslide: 0.5 × slope_score + 0.3 × precipitation_score + 0.2 × vegetation_score

Classified into CRITICAL / HIGH / MEDIUM / LOW bands. The risk model is intentionally simple — it is production-ready as a first-pass indicator, but agencies are expected to replace it with domain-specific models in SageMaker.

Sample Bedrock report (Japanese)

### 自治体担当者向け所見レポート

#### 都市計画上の注目点
GISデータによると、市内の土地利用分布は安定しており、変化は検出されていません。
しかし、洪水、地震、斜面崩壊のリスクが中程度であることに注意が必要です。

#### 優先すべき対策案
1. 洪水対策の強化: 中程度の洪水リスクに対応するため、排水システムの改善と
   洪水予測モデルの更新を実施。
2. 地震対策の強化: 地震リスクに対応するため、建物の耐震基準の見直しと
   緊急避難経路の整備を推進。
3. 斜面崩壊対策の強化: 斜面崩壊リスクに対応するため、斜面の安定性調査と
   防護工事の実施を検討。
Enter fullscreen mode Exit fullscreen mode

Generated by a real Bedrock amazon.nova-lite-v1:0 invocation during AWS verification. The content is contextually appropriate though the input was a minimal test raster.

This report, at ~1.1 KiB saved as text/markdown, is the most visceral demonstration of the "no additional output-copy" pattern: a municipal planner browsing /fsxn-volume/gis/2026/05/ via SMB/NFS finds the raw GeoTIFF inputs alongside the Bedrock-generated Markdown report, openable in any text editor. No per-UC output bucket, no region transfer, and no additional AWS console workflow required for the end consumer beyond their existing SMB/NFS access.


The OutputDestination API: one switch for STANDARD_S3 vs FSXN_S3AP outputs

Phase 7 introduced Pattern B — a OutputDestination parameter with two modes:

Mode AI / ML output destination Per-UC output bucket created?
STANDARD_S3 (default) Output bucket (legacy behavior) Yes
FSXN_S3AP FSx ONTAP S3 Access Point No (skipped via Condition)

Before Phase 7, the library had three patterns coexisting:

  • Pattern A: FSx for ONTAP S3AP only (UC1-5, UC15-17 original form) — configured via S3AccessPointAlias + S3AccessPointOutputAlias parameters
  • Pattern B (new): Switchable via OutputDestination (initially UC11 and UC14, then rolled out to UC9/10/12/14, then UC15-17, now also UC1-5)
  • Pattern C: Standard S3 only (UC6/7/8/13 due to Athena OutputLocation not supporting S3AP — tracked as FR-2)

After Phase 7 Extended Work completion on 2026-05-11, 13 of 17 UCs support the unified OutputDestination parameter. Only UC6/7/8/13 remain on Pattern C; moving them to a hybrid Pattern B+C is scheduled for Phase 8.

CloudFormation shape

Parameters:
  OutputDestination:
    Type: String
    Default: "STANDARD_S3"
    AllowedValues: ["STANDARD_S3", "FSXN_S3AP"]

  S3AccessPointOutputAlias:
    Type: String
    Default: ""
    Description: Required when OutputDestination=FSXN_S3AP

Conditions:
  UseStandardS3:
    !Equals [!Ref OutputDestination, "STANDARD_S3"]

Resources:
  OutputBucket:
    Type: AWS::S3::Bucket
    Condition: UseStandardS3    # Skipped when FSXN_S3AP
    Properties: ...
Enter fullscreen mode Exit fullscreen mode

When OutputDestination=FSXN_S3AP, the output bucket resource is never created. The Lambda environment variables point to the S3 Access Point alias directly, and the shared/output_writer.py module routes writes through the S3AP regardless of whether a bucket exists.

The OutputWriter module

shared/output_writer.py centralizes the mode-aware logic:

class OutputWriter:
    def __init__(self, mode: str, s3_ap_alias: str = "", bucket_name: str = ""):
        self.mode = mode
        if mode == "FSXN_S3AP":
            self.s3_destination = f"arn:aws:s3:{region}:{account}:accesspoint/{s3_ap_alias}"
        else:
            self.s3_destination = bucket_name

    @classmethod
    def from_env(cls) -> "OutputWriter":
        return cls(
            mode=os.environ["OUTPUT_DESTINATION"],
            s3_ap_alias=os.environ.get("S3_ACCESS_POINT_OUTPUT_ALIAS", ""),
            bucket_name=os.environ.get("OUTPUT_BUCKET_NAME", ""),
        )

    def put_json(self, key: str, data: dict): ...
    def put_text(self, key: str, text: str): ...
    def put_bytes(self, key: str, data: bytes, content_type: str): ...

    # Added during UC16 chain migration:
    def get_json(self, key: str) -> dict: ...
    def get_text(self, key: str) -> str: ...
    def get_bytes(self, key: str) -> bytes: ...
Enter fullscreen mode Exit fullscreen mode

The symmetric get_* helpers arose directly from UC16's chain structure. Without them, downstream handlers would have tried to read from a (non-existent) output bucket when running in FSXN_S3AP mode. The discovery came during the migration itself, and the fix was trivial once the problem was named — seven new unit tests in shared/tests/test_output_writer.py, total 28 PASS.


Phase 6B Guard Hooks compliance

All three templates deploy successfully with the active Guard Hooks stack fsxn-s3ap-guard-hooks:

Rule UC15 UC16 UC17
encryption-required ✅ S3 SSE-KMS, DynamoDB SSE, SNS KMS
iam-least-privilege ✅ API-bound wildcards only ✅ Bedrock foundation-model ARN
logging-required ✅ all 6 Lambdas ✅ all 8 ✅ all 7
point-in-time-recovery ✅ ChangeHistoryTable ✅ RetentionTable + FoiaRequestTable ✅ LandUseHistoryTable

AWS verification in FSXN_S3AP mode

Theme E verification results (2026-05-11)

UC Stack SFN duration Artifacts OutputBucket created?
UC15 defense-satellite fsxn-uc15-demo ~12s 5 files on S3AP No
UC16 government-archives fsxn-uc16-demo ~90s 6 files on S3AP No
UC17 smart-city-geospatial fsxn-uc17-demo ~14s 4 files on S3AP No

Each stack's resource inventory confirmed via:

aws cloudformation describe-stack-resources \
    --stack-name fsxn-uc15-demo \
    --query '[?ResourceType==`AWS::S3::Bucket`]' \
    --output json
# Returns [] — no bucket resources created
Enter fullscreen mode Exit fullscreen mode

The most instructive single verification: UC16 chain read-back

After the Step Functions execution completed, we inspected the S3AP:

aws s3 ls s3://eda-demo-s3ap-<alias>/ai-outputs/uc16/
# PRE classifications/
# PRE pii-entities/
# PRE redacted/
# PRE redaction-metadata/
Enter fullscreen mode Exit fullscreen mode

Each subdirectory has the expected artifacts. Critically, the Classification handler's input was a read-back from the OCR stage:

# classification/handler.py
output_writer = OutputWriter.from_env()
ocr_text = output_writer.get_text(f"ai-outputs/uc16/ocr-results/{key}.txt")
classification = bedrock_or_keyword_fallback(ocr_text)
output_writer.put_json(f"ai-outputs/uc16/classifications/{key}.json", classification)
Enter fullscreen mode Exit fullscreen mode

In FSXN_S3AP mode, get_text() reads through the S3 Access Point → FSx for ONTAP volume. In STANDARD_S3 mode, the same call reads from the output bucket. The handler code is identical in both modes; the routing decision is environment-variable-driven inside OutputWriter.

UC17 Bedrock report: SMB/NFS accessibility

The Bedrock Nova Lite invocation took ~5 seconds and returned a markdown document. Saved at ai-outputs/uc17/reports/city1.tif.md, the file is:

  • Immediately visible via SMB to anyone mounted on the same FSx for ONTAP volume
  • Openable in any text editor — no AWS SDK or console workflow required for the end consumer
  • Co-located with source rasters — the GIS team sees the AI-generated commentary alongside the raw data

This is the pattern that most clearly demonstrates "serverless AI artifacts land in your existing file-sharing infrastructure without a data-movement step."


Extended Work I: Screenshot Redaction — From Heavy Mask to OCR Precision

Phase 7 was originally delivered with a safe-by-default screenshot masking strategy (v6): grey rectangles covering entire console content areas, exposing only narrow strips that contained the Step Functions Graph view. This over-masked the UI/UX that the PR was meant to showcase.

PR #2 replaced v6 with v7 OCR-based precision masking across all 116 tracked screenshots.

How v7 works

  1. Run pytesseract.image_to_data(lang="eng+jpn") on each screenshot to get word-level bounding boxes
  2. For each detected word, check if it contains any substring from SENSITIVE_STRINGS (a gitignored local file listing account IDs, resource IDs, private IPs, emails, etc.)
  3. Draw a small black rectangle only over the matched word
  4. Re-run OCR on the partially-masked image, up to 4 passes (tesseract tokenization is non-deterministic at word boundaries — long URIs like s3://bucket-<account-id>/obj sometimes match on the second pass but not the first)
  5. Always mask the top-right account widget (fixed position, often missed by OCR due to styling)
  6. Apply to both AWS console screenshots and HTML preview mocks — the latter had been copied as-is in v6 and were leaking account IDs embedded in rendered S3 URIs

Why lang="eng+jpn"

AWS console screenshots in our environment render in Japanese (sidebar labels, breadcrumbs, action buttons). Running tesseract with lang="eng" silently misses sensitive words that appear adjacent to Japanese text — the tokenizer treats the adjacent CJK characters as word terminators but doesn't always segment cleanly. Two leaks discovered post-v7 surfaced only when we switched to lang="eng+jpn":

  • phase1-fsx-filesystem-detail.png: fs-<filesystem-id> next to a Japanese label
  • phase7-uc15-s3-satellite-uploaded.png: S3 Access Point alias next to Japanese column headers

The fix was a single string change. We now use eng+jpn across all masking and leak-verification tooling.

Content preservation

v7 preserves approximately 99% of screenshot content. A sample before/after comparison on uc11-product-tags.png:

  • v6: Almost the entire HTML preview was grey-washed, only a hint of "AUTO-TAGGED" visible
  • v7: Product image, tag chips (Oval 99.93%, Food 60.67%, ...), description text, and panel layout all fully visible. Only the s3://fsxn-retail-catalog-demo-output-<account-id>/tags/2026/05/10/product-001.json URI has a small black rectangle over the bucket name.

Verification workflow

We added scripts/_check_sensitive_leaks.py (gitignored) as a ground-truth detector:

def scan_image(path: Path) -> list[tuple[str, str]]:
    img = Image.open(path).convert("RGB")
    text = pytesseract.image_to_data(
        img, lang="eng+jpn", output_type=pytesseract.Output.DICT
    )
    hits = []
    for word in text["text"]:
        if not word:
            continue
        for s in SENSITIVE_STRINGS:
            if s in word:
                hits.append((s, word))
    return hits
Enter fullscreen mode Exit fullscreen mode

After v7 re-masking, _check_sensitive_leaks.py reports:

Scanned: 116 masked images
Images with detectable sensitive substrings: 0
Enter fullscreen mode Exit fullscreen mode

This zero-leak state has been maintained across every subsequent commit that adds or modifies screenshots — it is part of our pre-commit checklist.

Rule E (mandatory leak check)

Formalized in docs/dual-kiro-coordination.md: any session adding screenshots must run _check_sensitive_leaks.py and confirm 0 leaks before committing. The tooling is distributed via .example templates (scripts/_check_sensitive_leaks.py.example, scripts/_inplace_ocr_mask.py.example, scripts/_sensitive_strings.py.example) so each contributor can bootstrap their local copy without committing the actual sensitive-strings file.


Extended Work II: Completing 8-Language Coverage for All 17 UCs (97-File Localization Batch)

The library had committed to 8-language docs for every UC since Phase 5 (<uc>/README.md + 7 translated variants, plus per-UC architecture.md and demo-guide.md in the same languages). What we discovered mid-Phase-7 was that two of those eight languages had never actually been translated for the per-UC docs, despite having the language-switcher markup in place.

User-surfaced symptom: clicking [简体中文] in the language switcher at the top of smart-city-geospatial/docs/demo-guide.md navigated to demo-guide.zh-CN.md, but the body was still Japanese. Same behaviour for [繁體中文], and for all 14 non-Public-Sector UCs. For the three Public Sector UCs (UC15-17), all seven target-language variants were partial — the language switcher and the "this is an auto-generated draft" note had been translated at file creation, but the body remained Japanese.

A survey script (scripts/_survey_translation_status.py, gitignored) produced this matrix:

Language Translated Partial Stub (JP body)
en 28 6 0
ko 28 6 0
zh-CN 0 17 17
zh-TW 0 17 17
fr 28 6 0
de 28 6 0
es 28 6 0

Two root causes:

  1. zh-CN and zh-TW had never been translated for per-UC docs — all 34 files across 14 UCs (UC1-14 × 2 docs × 2 langs) kept Japanese bodies. This was an honest omission from an earlier batch translation that prioritized top-level READMEs over per-UC docs; it was meant to be followed up and never was.
  2. Phase 7 UC15-17 (new) were partial across all 7 target languages — auto-generated drafts shipped with the language-switcher and boilerplate translated, but the body remained Japanese pending a follow-up translation pass.

Total scope for this phase's localization batch: 97 files — 55 files backfilling the two Chinese variants for UC1-14, plus 42 files providing the full 7-language initial translation for UC15-17. This brings the per-UC language variant count to 272 files total (17 UCs × 2 docs × 8 languages, with 4 intentionally single-language one-off files under UC6 excluded).

Why 55 and not 56? The expected R-1 matrix has 14 UCs × 2 docs × 2 languages = 56 slots. One of those slots — semiconductor-eda/docs/demo-guide.zh-CN.md — had been hand-translated during an earlier session on 2026-05-09 and was already complete, so the Bedrock batch regenerated only the remaining 55 files. The git commit preserves this detail: "UC6 demo-guide.zh-CN was previously hand-translated on 2026-05-09 and left unchanged."

The framing matters: this wasn't a bug-fix, it was the completion of our existing 8-language standard for the per-UC docs. Legal compliance, financial IDP, manufacturing analytics, media/VFX, healthcare DICOM, semiconductor EDA, genomics, energy/seismic, autonomous driving, construction BIM, retail catalog, logistics OCR, education research, and insurance claims — all now have their demo-guide and architecture docs readable by a zh-CN or zh-TW speaker without Google Translate. And the three new Public Sector UCs (defense/satellite, government archives, smart city) reach the same parity at launch.

Using Amazon Bedrock Claude Sonnet 4.5

We built a per-file translator script (scripts/_translate_uc_docs.py, gitignored) around the Claude Sonnet 4.5 model via the JP regional inference profile jp.anthropic.claude-sonnet-4-5-20250929-v1:0:

def build_language_switcher(doc_base: str, target_lang: str) -> str:
    """Build the language switcher line with target language unlinked."""
    ...

TRANSLATION_PROMPT = """You are a professional technical translator.

Translate the following Japanese Markdown document into {target_lang_name}.

## Rules
1. Preserve code blocks verbatim.
2. Preserve technical identifiers: parameter names, ARN placeholders,
   URLs, file paths, command names.
3. Preserve markdown structure exactly.
4. Translate Mermaid diagram labels but keep node IDs intact.
5. Translate table cells except proper nouns.
6. Industry terminology consistency (per uc-industry-mapping.md).
7. Output ONLY the translated Markdown body.
"""
Enter fullscreen mode Exit fullscreen mode

And a batch driver (scripts/_batch_translate_r1.sh, gitignored) that iterates 14 UCs × 2 docs × 2 langs:

for uc in legal-compliance financial-idp ... insurance-claims; do
  for doc in demo-guide architecture; do
    for lang in zh-CN zh-TW; do
      python3 scripts/_translate_uc_docs.py "$uc" "$doc" "$lang"
    done
  done
done
Enter fullscreen mode Exit fullscreen mode

Results

  • R-1 (UC1-14, zh-CN + zh-TW): 55 files translated in 45 minutes, ~$5–7 Bedrock cost (the expected 14 × 2 × 2 = 56-slot matrix had one slot already complete from a prior hand-translation, so the batch regenerated only the remaining 55)
  • R-2 (UC15-17 × 7 langs × 2 docs): 42 files translated in 30 minutes, ~$3–4 Bedrock cost
  • Total: 97 files, 75 minutes wall time, ~$10 Bedrock cost

Spot-check example: UC5 healthcare-dicom zh-CN

Before (stub):

# DICOM 匿名化工作流程 — 演示指南

🌐 Language / ... :  日本語 | English | ...

> Note: This translation is an auto-generated draft based on the Japanese original.

## 前提
- AWS アカウント、ap-northeast-1
- FSx for ONTAP + S3 Access Point
- Bedrock モデル利用可能化
Enter fullscreen mode Exit fullscreen mode

After (translated):

# DICOM 匿名化工作流程 — Demo Guide

🌐 Language / ... :  日本語 | English | ... | 简体中文 | ...

> 注意:此翻译由 Amazon Bedrock Claude 生成。欢迎对翻译质量提出改进建议。

## 前提
- AWS 账户,ap-northeast-1
- FSx for ONTAP + S3 Access Point
- Bedrock 模型可用
Enter fullscreen mode Exit fullscreen mode

Technical identifiers preserved (FSx for ONTAP, S3 Access Point, Bedrock, AWS, ap-northeast-1). Non-identifier Japanese text translated to Chinese.

Validation: distinguishing translated Chinese from untranslated Japanese

The first pass of our translation-status survey script used CJK Unicode ratio as the heuristic for "is this translated?" — high CJK meant "still Japanese". This failed for zh-CN/zh-TW output because translated Chinese also has high CJK ratio. We fixed the heuristic to check Hiragana + Katakana specifically (which are Japanese-exclusive scripts):

if target_lang in ("zh-CN", "zh-TW"):
    # Japanese-specific scripts only
    jp_chars = sum(
        1 for c in body
        if "\u3040" <= c <= "\u309f" or "\u30a0" <= c <= "\u30ff"
    )
    ratio = jp_chars / max(len(body), 1)
    if ratio < 0.005:
        return "translated"
Enter fullscreen mode Exit fullscreen mode

After this fix plus the Bedrock batch, all 17 UCs × 7 target languages × 2 doc types show as translated in the survey.

Commit strategy: per-UC granularity for R-2

Per my parallel session partner's preference, R-2 was split into 3 commits (one per Phase 7 UC) rather than a single 42-file commit:

  • 764d8c6 feat(i18n): R-2 UC15 defense-satellite — translate docs to 7 languages
  • 4a29a1a feat(i18n): R-2 UC16 government-archives — translate docs to 7 languages
  • 3fba028 feat(i18n): R-2 UC17 smart-city-geospatial — translate docs to 7 languages

This granularity makes git log -- defense-satellite/docs/ useful for UC-scoped review and permits UC-level revert if translation quality issues are caught post-merge.


Extended Work III: AWS Resource Cleanup — Three Failure Modes That Weren't in the Script

Post-verification cleanup of nine UC stacks (UC1/2/3/5/7/8/10/12/13) surfaced three blocking conditions that our scripts/cleanup_generic_ucs.sh did not handle:

Failure 1: Athena WorkGroup is non-empty

AthenaWorkgroup failed to delete:
"Invalid request provided: WorkGroup fsxn-manufacturing-analytics-demo-workgroup
 is not empty"
Enter fullscreen mode Exit fullscreen mode

Fix: aws athena delete-work-group --work-group <name> --recursive-delete-option. Affected UC3, UC7, UC8.

Failure 2: S3 bucket has object versions

AthenaResultsBucket failed to delete:
"The bucket you tried to delete is not empty. You must delete all versions."
Enter fullscreen mode Exit fullscreen mode

The bucket had versioning enabled (correctly — for recovery). Normal aws s3 rb only removes current-version objects; delete markers and historical versions must be deleted via aws s3api delete-objects --delete file://markers.json. Fix: added a helper script scripts/_empty_versioned_bucket.sh (gitignored during Phase 7, planned for integration into the main cleanup script in Phase 8).

Failure 3: Lambda Security Group has a dependent object

LambdaSecurityGroup failed to delete:
"resource sg-<lambda-sg> has a dependent object"
Enter fullscreen mode Exit fullscreen mode

Root cause: a Phase 7 manual workaround (tracked as O-2 in the Phase 7 tasks) added inbound rules to the VPC Endpoint SG referencing per-UC Lambda SGs. Deleting the UC stack tried to delete the Lambda SG, but the VPC Endpoint SG still had a rule referencing it.

Fix:

aws ec2 revoke-security-group-ingress \
    --group-id sg-<vpc-endpoint-sg> \
    --ip-permissions 'IpProtocol=tcp,FromPort=443,ToPort=443,
                      UserIdGroupPairs=[{GroupId=sg-<lambda-sg>}]'
Enter fullscreen mode Exit fullscreen mode

Affected UC1, UC2. We revoke only the rules for the UCs being deleted; UC6's reference is kept because UC6 remains deployed.

Secondary bug in the cleanup script itself

The cleanup script had <ACCOUNT_ID> as a literal placeholder string (a leftover from a global redaction pass). When executed, it constructed bucket names like fsxn-legal-compliance-demo-output-<ACCOUNT_ID> — a bucket that doesn't exist, so aws s3 rb silently no-oped. Stacks that needed their output buckets emptied before deletion then failed.

Fix (commit 770f713):

ACCOUNT_ID="${ACCOUNT_ID:-$(aws sts get-caller-identity --query 'Account' --output text 2>/dev/null)}"
if [ -z "$ACCOUNT_ID" ] || [ "$ACCOUNT_ID" = "<ACCOUNT_ID>" ]; then
    echo "ERROR: could not resolve AWS account ID." >&2
    exit 1
fi
Enter fullscreen mode Exit fullscreen mode

Final cleanup outcome

After resolving all three failure modes:

  • 9 / 9 stacks: DELETE_COMPLETE
  • 0 DynamoDB tables with fsxn- prefix remaining
  • Total cleanup wall time: approximately 60 minutes (including CloudFormation poll intervals for VPC Lambda ENI release, which typically takes 15–30 minutes per stack)

The three failure modes are tracked in the Phase 8 spec as Theme A items to be integrated into a rewritten cleanup_generic_ucs.py (Python, with --dry-run and per-failure-mode recovery).


Extended Work IV: A Dual-Kiro Coordination Protocol

Phase 7 was built with two AI sessions working in parallel on the same repository. The sessions are labeled A and B in chat notifications. Early in the sprint, this parallelism nearly cost us a working tree of uncommitted v7 mask work — twice, via git checkout from the other session wiping my in-progress edits.

The recovery relied on git stash, which did save the work, but the near-miss triggered a codification effort.

docs/dual-kiro-coordination.md

A 506-line protocol document now lives at docs/dual-kiro-coordination.md. Its core rules:

Rule C-1: Check current branch before every commit.

Before every `git commit`, run `git branch --show-current` and
verify you are on the branch you intend to commit to.
Enter fullscreen mode Exit fullscreen mode

Rule C-2: Never checkout with a dirty working tree you don't own.

If `git status --short` shows modifications that are not yours:
- Do NOT run `git checkout`, `git switch`, `git stash`, `git reset --hard`
- Send a chat notification first:
  "[X] I need to checkout <target>. Your working tree has N dirty files.
   Is your working tree safe to stash?"
- Wait for the other session's "yes" before proceeding.
Enter fullscreen mode Exit fullscreen mode

Rule C-3: Stash is a safety net, not a workflow.

Rule C-4: Recognize reflog anomalies immediately.

Rule C-5: Force-push requires --force-with-lease, never bare --force.

Exclusive regions

Each session declares an "exclusive region" — the set of paths and files that only it edits during the sprint:

  • A's region: scripts/mask_uc_demos.py, docs/screenshots/masked/**, docs/screenshots/MASK_GUIDE.md, cleanup helpers, .kiro/specs/.../phase7/tasks.md, docs/dual-kiro-coordination.md, article files
  • B's region: shared/output_writer.py, per-UC trees, docs/output-destination-patterns.md, docs/aws-feature-requests/fsxn-s3ap-improvements.md, README sections, per-UC demo-guide localizations, Phase 7 summary / troubleshooting docs

Cross-region edits require explicit chat lock acquisition ([A] LOCK REQUEST: README.md "AWS 仕様" section) with duration bounded to 15–30 minutes.

Rule E: Mandatory leak check

Any session adding screenshots must run _check_sensitive_leaks.py and report 0 leaks before committing. Tooling is distributed via .example templates so contributors bootstrap locally without ever committing the actual _sensitive_strings.py.

Rule F: Industry mapping consistency

docs/screenshots/uc-industry-mapping.md (maintained in the partner session's exclusive region) maps each UC number to an industry label in 8 languages. New UCs or renaming events require agreement before the mapping file is updated.

Emergency pause

If either session detects:

  • Main CI in a broken state
  • Unmergeable conflict after 2 rebase attempts
  • Suspected destruction of the other's work

…the session posts [X] EMERGENCY PAUSE with a one-line summary and both halt non-trivial operations until the human driver decides recovery.

The full protocol is committed as docs/dual-kiro-coordination.md — it will evolve as we gather more experience in Phase 8.


Extended Work V: The 17-UC Cross-Validation Sweep

During the final screenshot round, running aws cloudformation deploy + Step Functions execution across all 17 UCs surfaced a class of bugs that had been silent under pure unit-test + cfn-lint verification. These were shipped-but-broken states: the templates passed static checks, the handlers passed unit tests, and yet AWS-level runtime behavior failed.

Bug class: IAM silent failure on S3 Access Point ARN form

CloudFormation templates that granted IAM access to an FSxN S3 Access Point only in alias form (arn:aws:s3:::<alias>) passed cfn-lint and deployed successfully. But at runtime, AWS-level S3 API calls performed against the ARN form (arn:aws:s3:<region>:<account>:accesspoint/<name>) hit AccessDenied because neither form implies the other in IAM policy evaluation.

The fix, already applied to UC6 earlier and now propagated across all affected UCs:

Conditions:
  HasS3AccessPointName:
    !Not [!Equals [!Ref S3AccessPointName, ""]]

Resources:
  DiscoveryLambdaRole:
    Type: AWS::IAM::Role
    Properties:
      Policies:
        - PolicyName: S3APAccess
          PolicyDocument:
            Statement:
              - Effect: Allow
                Action: s3:GetObject
                Resource:
                  - !Sub "arn:aws:s3:::${S3AccessPointAlias}/*"
                  - !If
                    - HasS3AccessPointName
                    - !Sub "arn:aws:s3:${AWS::Region}:${AWS::AccountId}:accesspoint/${S3AccessPointName}/object/*"
                    - !Ref "AWS::NoValue"
Enter fullscreen mode Exit fullscreen mode

UCs affected by this silent bug: UC1 (legal-compliance), UC2 (financial-idp), UC3 (manufacturing-analytics), UC5 (healthcare-dicom), UC6 (semiconductor-eda), UC7 (genomics-pipeline), UC8 (energy-seismic), UC10 (construction-bim), UC14 (insurance-claims). UC4 and UC9 received the same fix during Theme Q. That's 11 of 17 UCs that had a production-affecting silent bug that only surfaced on full AWS deploy + run.

Bug class: Lambda handler NameError (missing imports)

One Lambda handler in financial-idp/functions/entity_extraction/handler.py called os.environ.get(...) without import os. Unit tests passed because the test fixtures had mocked the environment lookup path entirely, so the code path that needed os was never actually executed. CloudFormation accepted the zip. The handler ran on AWS and hit NameError: name 'os' is not defined at first invocation.

Validation tooling added

To prevent these classes of bugs from reaching production on future UC additions, three reusable validation scripts were added and are now part of the recommended pre-merge checklist:

Script Detection target Runtime
scripts/lint_all_templates.sh cfn-lint schema errors across all 17 UCs in parallel 5-7 min
scripts/check_handler_names.py pyflakes NameError / undefined name sweep of all Lambda handlers ~30 sec
scripts/check_conditional_refs.py UC9-class bug — !Sub ${Resource.Arn} referencing a Condition-guarded resource ~5 sec

Verification at the Extended Round's final commit: 0 real cfn-lint errors across 17 templates, 0 pyflakes errors across 87 handlers, 0 conditional-ref issues across 17 templates.

Why static checks weren't enough

This experience reinforces a pattern that showed up repeatedly in Phase 7:

  • cfn-lint validates CloudFormation schema but doesn't simulate IAM evaluation
  • Unit tests validate handler logic given mocked clients, but the mock setup can accidentally bypass the broken code path
  • aws cloudformation deploy succeeds even with silently-broken IAM configurations — AWS only complains at the first real API call

The remedy isn't to add more types of static checks (they'd pile up endlessly) — it's to establish a "deploy and actually run it" verification lane for every UC on every non-trivial change. The check_conditional_refs.py script is an exception: it's narrow, fast, and catches a specific class of bug that is otherwise only detectable by attempting a parameterized deploy.


Verification results (ap-northeast-1)

Full details in the Phase 7 verification results and the OutputDestination Theme E verification.

UC Step Functions execution Output artifacts Cost
UC15 SUCCEEDED (~30s initial, ~12s with FSXN_S3AP) enriched/ + detections/ + tiles/ on S3AP ~$0.01
UC16 SUCCEEDED (~35s initial, ~90s with FSXN_S3AP + Textract cross-region) classifications/ + pii-entities/ + redacted/ + redaction-metadata/ on S3AP ~$0.02
UC17 SUCCEEDED (~45s initial, ~14s with FSXN_S3AP + Bedrock) preprocessed/ + landuse/ + risk-maps/ + reports/*.md on S3AP ~$0.02

Total verification cost across both rounds: ~$0.10 (excluding OpenSearch / SageMaker which stayed disabled).

Sample-data verification (initial run, STANDARD_S3 mode)

A first run with synthetic JPEG / PDF samples produced the following measurable outputs:

  • UC15: Rekognition DetectLabels against a 1024×1024 synthetic aerial image (PIL-generated, buildings + roads) detected 15+ labels
  • UC16: Textract cross-region call to us-east-1 returned 43 KB Blocks JSON from a 1.6 KB PDF (FORMS + TABLES features). Comprehend DetectPiiEntities found 5 PII entities (NAME, EMAIL, PHONE, SSN, DATE_TIME) with 99.63–99.99% confidence
  • UC17: Bedrock Nova Lite recognized 仙台 (Sendai) from the filename sendai_area.jpg and produced a Japanese urban planning report with 3 countermeasures + 1 monitoring indicator

FSXN_S3AP mode verification (second round, Theme E)

Three stacks deployed to ap-northeast-1 with OutputDestination=FSXN_S3AP. All Step Functions executions SUCCEEDED. Resource inventory confirmed zero AWS::S3::Bucket resources in each stack. CLI evidence is committed to docs/verification-evidence/uc{15,16,17}-demo/ for public reference.


Lessons learned during AWS deployment

From the UC15-17 implementation round

  1. Rekognition rejects malformed images with InvalidImageFormatException. Production code must catch this explicitly and return empty detections rather than letting the exception propagate through lambda_error_handler (which would abort the workflow).
  2. DynamoDB's float → Decimal requirement. Silent unit test passing doesn't catch this; only real PutItem calls do. Added _to_decimal helper in UC15 change_detection.
  3. Textract is unavailable in ap-northeast-1. UC16 OCR needed a graceful fallback (log the EndpointConnectionError, return api_used="unavailable"). Production deploys Cross-Region Client to us-east-1 (the pattern used by UC2/UC10/UC12/UC13/UC14).
  4. Map state input shape. Step Functions' Map ItemsPath iterations only expose per-item fields; root-level state machine input (e.g., opensearch_enabled) is not propagated. Solution: IsPresent: true guard in Choice, or explicitly pass fields via Map Parameters.
  5. S3 Access Point ARN format. FSx for ONTAP S3 APs require policies to grant both the alias ARN (arn:aws:s3:::<alias>) AND the full Access Point ARN (arn:aws:s3:<region>:<account>:accesspoint/<name>). The conditional !If HasS3AccessPointName pattern mirrors UC6's approach.

From the OutputDestination unification round

  1. Lambda logs can go missing even when the handler succeeds. Python 3.13 runtime defaults the root logger to WARNING. Calls to logger.info(...) silently drop unless the handler explicitly sets logging.getLogger().setLevel(logging.INFO). EMF metrics, Step Functions output, and aws s3api list-objects-v2 are the reliable verification signals. aws logs tail often shows an "empty" log stream even when the handler ran successfully.
  2. aws s3 ls s3://<alias> behaves differently from ARN form. Use the alias form (aws s3 ls s3://<alias>/path/) for interactive ls / download. The ARN form (aws s3 ls s3://arn:aws:s3:<region>:<account>:accesspoint/<alias>/) is accepted in IAM policies but is rejected by the aws s3 ls CLI command. This surfaced during evidence gathering for Theme E.
  3. Downstream handlers in a chain structure must use read-back. UC16's OCR → Classification → EntityExtraction → Redaction chain was discovered to need OutputWriter.get_text() during migration, not via a failing unit test (the unit tests used mock s3_client directly, missing the abstraction entry point). Added 7 new unit tests covering the read helpers to prevent regression.
  4. Lambda zip + CloudFormation deploy is not enough to refresh code. If a code-only change is pushed and the cloudformation deploy finds no template changes, the Lambda function retains its previous code. The workaround is aws lambda update-function-code --function-name <name> --zip-file fileb://build/...zip, or touch a template parameter so the stack update triggers function updates.

From the UC9 template bug fix (Theme Q-1)

  1. !Sub cannot resolve ${Resource.Arn} when the resource has a Condition that evaluates to false. UC9's State Machine DefinitionString referenced ${RealtimeInvokeFunction.Arn}, ${SageMakerInvokeFunction.Arn}, and ${ComponentsInvokeFunction.Arn} — all guarded by Condition: CreateSageMakerResources / CreateRealtimeEndpoint / CreateInferenceComponents. When those conditions were false, CloudFormation failed with "Unresolved resource dependencies." The fix: use DefinitionSubstitutions with !If to inject either the real ARN (when the resource exists) or a placeholder ARN (when it doesn't). Additionally, a SkipInference Pass state was added as the InferenceRouting Default target when all inference backends are disabled — it injects {"status": "SKIPPED"} into $.inference_result so that AnnotationManager can resolve its input parameters without error. The result: UC9 reaches end-to-end SUCCEEDED (2:45 including VPC cold start + Bedrock annotation) even with all SageMaker features disabled.
  2. Step Functions NumericGreaterThanOrEqualToPath does not exist. The correct field name is NumericGreaterThanEqualsPath. This typo had been present since Phase 2 but never surfaced because the code path was only reached when SageMaker was enabled — and SageMaker was always disabled in prior verification runs. The lesson: Step Functions schema validation only runs on states that are actually reachable during execution, not on the full definition at deploy time.
  3. Discovery Lambda must emit all fields referenced by downstream Choice states. UC9's InferenceRouting Choice state referenced $.discovery.inference_type, $.discovery.file_count, and $.discovery.batch_threshold. The Discovery handler returned none of these. Step Functions raises a runtime error ("Invalid path") when a Choice Variable path doesn't exist in the state input — even if that Choice branch would never be taken. The fix: Discovery always returns these fields with safe defaults (inference_type: "none", file_count: 0, batch_threshold: 100).

From the 17-UC cross-validation sweep

  1. IAM policies on FSxN S3 Access Points must grant both alias ARN and full AP ARN. arn:aws:s3:::<alias>/* and arn:aws:s3:<region>:<account>:accesspoint/<name>/object/* are not interchangeable in IAM evaluation. cfn-lint accepts templates that grant only the alias form; runtime calls made against the ARN form hit AccessDenied. This bug was silent across 9 of 17 UCs until full-UC AWS verification surfaced it. Fix: wrap the AP-ARN clause in !If [HasS3AccessPointName, ..., !Ref "AWS::NoValue"] so it's optional but present when a full AP name is provided.
  2. Python handlers can ship without required imports and still pass unit tests. UC2's entity_extraction/handler.py called os.environ.get(...) without import os. The unit tests' environment-mock setup never exercised the real code path. CloudFormation accepted the zip. The handler hit NameError on first real invocation. The remedy: a pyflakes sweep across all Lambda handlers (scripts/check_handler_names.py) now runs in ~30 seconds and catches this entire class of bug.
  3. The right response to "unit tests passed but deploy is broken" is not more static checks. It's a "deploy and actually run it" verification lane per UC per non-trivial change. The only exception: narrow, specific static checks like check_conditional_refs.py that detect a single well-understood bug class in milliseconds.

From the parallel-session experience

  1. A checkout done by the other session can silently clobber your working tree. The recovery path is via git stash list — the offending session's checkout wraps any dirty tracked changes into an auto-stash, which can be popped to recover. The protocol above (Rule C-2) codifies avoiding this by default.
  2. Translation output and source-layout verification benefit from the same tooling. The _survey_translation_status.py script was useful not only for the 97-file localization batch but also as a regression guard — it produces a matrix that shows at a glance whether a later edit has broken a language variant.

Full stats

Code

  • Lambda functions: 21 new (6 + 8 + 7 across UC15/16/17) + OutputWriter.get_* helpers in shared/
  • Unit tests: 110 new (31 + 47 + 32 for UC15/16/17) + 7 new for OutputWriter read helpers — all PASS, 958 total project-wide
  • CloudFormation templates: 3 new for UC15/16/17, and templates updated across UC1-5 + UC9-12 + UC14 for OutputDestination unification in Phase 7 Extended Work. UC6/7/8/13 remain Pattern C due to Athena OutputLocation not supporting S3 Access Points (tracked as FR-2); a hybrid Pattern B+C migration for those four UCs is scheduled for Phase 8 (see docs/design-pattern-c-to-b-hybrid.md).
  • cfn-lint: 0 real errors across all templates (E2530 SnapStart warnings filtered per UC convention)
  • Validation scripts: 3 new reusable checks — lint_all_templates.sh, check_handler_names.py, and check_conditional_refs.py

Screenshots

  • Tracked masked screenshots: Step Functions Graph view (SUCCEEDED) for all 17 UCs, plus common infrastructure views and per-UC output previews across phase1–7 directories
  • v7 re-masked in PR #2: 101 PNGs (v6 heavy-mask replaced)
  • Leak verification: 0 OCR-detectable sensitive substrings (lang=eng+jpn)

Documentation

  • Per-UC architecture docs: 17 UCs × 8 languages = 136 files, all present; target-language variants marked translated by the survey script
  • Per-UC demo-guides: 17 UCs × 8 languages = 136 files, all present; target-language variants marked translated
  • Total per-UC language variants: 272 files across all 17 industries and 8 languages (only UC6 has 4 additional JP-only / EN-only one-off docs that are intentionally single-language — meeting notes and a specific-audience talk script)
  • Localization batch (Theme R) this phase: 97 files newly translated or backfilled in Phase 7 Extended Work
    • R-1: UC1-14 zh-CN + zh-TW backfill (55 files — the two languages that had been lagging behind en/ko/fr/de/es; one of the expected 56 slots was already complete from an earlier hand-translation and did not need regeneration)
    • R-2: UC15-17 × 7 target languages × 2 doc types (42 files — new Public Sector UCs' full 7-language initial translation)
    • The remaining ~175 per-UC translated files were authored in Phase 5 / Phase 6 for UC1-14 en/ko/fr/de/es and stayed stable through Phase 7.
  • Top-level READMEs: all 17 UCs with 8-language coverage (JP + 7 variants, translation batch 217a509 before Phase 7)
  • Cross-cutting docs: dual-kiro-coordination.md, MASK_GUIDE.md (v7), phase7-summary.md, phase7-troubleshooting.md, output-destination-patterns.md, design-pattern-c-to-b-hybrid.md, design-output-writer-multipart.md

AWS verification

  • AWS verification: Initial 12-stack verification across two rounds, followed by a final 17-UC cross-validation sweep
  • Step Functions executions: all SUCCEEDED
  • Total verification spend: ~$0.10 Phase 7 proper + ~$0.05 Phase 7 Extended cleanup + ~$10 Bedrock translation → ~$10.15 session total

Looking Forward to Phase 8

Phase 7 Extended Work surfaced several structural improvements that warrant a dedicated Phase 8:

  1. Cleanup script rewrite: Python version with --dry-run, integrated Athena WorkGroup + versioned-bucket + VPC Endpoint SG handling (Phase 8 Theme A)
  2. VPC Endpoint SG automation: Custom Resource replacing the current manual inbound-rule workaround (Phase 8 Theme B)
  3. Sample data generator expansion: DICOM for UC5, IFC for UC10, SHP + LAS for UC17 (Phase 8 Theme C)
  4. Phase 7 UC15-17 UI/UX screenshots: Manual capture + v7 mask + integration into demo-guides (Phase 8 Theme D)
  5. Event-driven trigger pattern: S3 PutObject → EventBridge → Step Functions with DynamoDB-based idempotency, starting with UC1 as reference implementation (Phase 8 Theme E)
  6. Dual-Kiro protocol v2: git worktree-based physical checkout separation, exclusive region declaration files (Phase 8 Theme F)
  7. Phase 8 article and 8-language docs update (Phase 8 Theme G)
  8. Pattern C → Pattern B hybrid migration for UC6/7/8/13 (Phase 8 Theme H — see docs/design-pattern-c-to-b-hybrid.md for the full design)
  9. OutputWriter.put_stream API for > 5 GB artifacts using S3 MultipartUpload (Phase 8 Theme I — see docs/design-output-writer-multipart.md)
  10. DevSecOps CI/CD pipeline: cfn-lint + pyflakes + Guard Hooks + IAM Access Analyzer integrated into GitHub Actions / CodeBuild, replacing the current manual validation script execution (Phase 8 Theme M — Themes J/K/L cover OutputWriter multipart, template management, and code cleanup tracked in the Phase 8 spec)
  11. Observability and operational readiness: CloudWatch Alarms, Step Functions DLQ, EventBridge failure notifications, and per-UC operational runbooks for production-grade monitoring (Phase 8 Theme N)

And two residual items from Phase 7 Extended Work that are not Phase 8 scope but tracked as Phase 7 Theme Q:

  • UC9 autonomous-driving: template bug fixed in Phase 7 Theme Q-1 (commits 92d7ce6 + b1e3021). Four issues resolved: conditional Lambda ARN resolution via DefinitionSubstitutions, NumericGreaterThanEqualsPath typo, Discovery handler missing inference_type field, and SkipInference Pass state for proper end-to-end SUCCEEDED when inference is disabled. AWS verified: full workflow SUCCEEDED in 2:45 (VPC Discovery + Bedrock annotation + COCO JSON output). Remaining: OutputWriter migration (handler-side s3_client.put_objectOutputWriter.from_env()) deferred to Phase 8 Theme H.
  • UC4 media-vfx: Deadline Cloud farm/queue setup, sample VFX render, OutputDestination Pattern B migration, UI screenshot.

Conclusion

Phase 7 added three Public Sector use cases, unified OutputDestination across 13 of 17 UCs, brought all 17 industry-specific docs (architecture + demo-guide) to true 8-language parity, and — in its Extended Round — ran a 17-UC cross-validation sweep that caught and fixed 10 silent production-affecting issues before publication. The Extended Work as a whole — OCR-based screenshot redaction, 97-file localization batch, 9-stack AWS cleanup, dual-Kiro coordination protocol, and three new validation scripts — closed the gaps between "deployable code" and "publishable, repeatable, cross-team-usable patterns."

The most impactful decisions were:

  1. Symmetric OutputWriter.get_* helpers: without them, UC16's chain structure would have silently failed in FSXN_S3AP mode
  2. OCR masking with lang="eng+jpn": single string change that eliminated the entire class of language-adjacency leaks
  3. Hiragana/Katakana ratio as the translated-state heuristic for Chinese targets: let the validation tooling do its job without false negatives
  4. 17 industries × 8 languages as a first-class commitment: not just READMEs, but the per-UC architecture docs and demo-guides that industry practitioners actually read when evaluating a pattern for their own stack
  5. docs/dual-kiro-coordination.md: codifying the near-miss protocol made subsequent parallel sessions reliably non-destructive
  6. Deploy-and-run verification per UC: unit tests plus cfn-lint passed on templates that had silent AccessDenied bugs for 9 of 17 UCs. Only full-UC AWS deploy + Step Functions execution surfaced them. The sweep is now standard practice.

The AWS verification artifact we keep coming back to is the UC17 Bedrock Markdown report: ~1.1 KiB of Japanese urban planning commentary, rendered on a GPU-less serverless Lambda, landing on an FSx for ONTAP volume via a single S3 API call, and immediately readable by a city planner through SMB/NFS next to the source GeoTIFF. No additional output-copy step. No per-UC output bucket created. No separate AWS console or SDK workflow required for file consumers beyond their existing SMB/NFS access path (which still carries NTFS ACLs and AD group membership — the access control surface is unchanged). And now, the same story is readable in seven other languages by engineers evaluating the pattern for their own public-sector, healthcare, manufacturing, or retail workloads — with the underlying templates, handlers, and permissions verified by a sweep that would have caught most of what almost shipped broken.


Repository: github.com/Yoshiki0705/FSx-for-ONTAP-S3AccessPoints-Serverless-Patterns
Previous phases: Phase 1 · Phase 6A/6B
Phase 7 artifacts (all in the GitHub repo):

Top comments (0)