This is Part 4 of a series documenting a non-engineer CEO's attempts to connect Copilot Studio and Power Automate to LDX hub's StructFlow API.
Part 1 — It didn't work yet. Part 2 — REST API via Power Automate, finally working. Part 3 — MCP direct connection, 2 hours.
In Part 3, I connected LDX hub directly to Copilot Studio via MCP. One record at a time, in a chat interface. It worked great.
But then I asked the obvious question: what about 20 files? Batch processing 20 Word documents from SharePoint, extracting structured data from each, and synthesizing them into a single company-wide dashboard?
That's not a job for MCP. That's a job for Power Automate.
This is the story of building that pipeline — every error, every detour, and the moment it finally worked.
What I built:
- Microsoft Power Automate flow
- 20 Word files in SharePoint
- LDX hub ExtractDoc + StructFlow (REST API, not MCP)
- Output: HTML management dashboard saved to SharePoint
Time required: ~2 days
Architecture
SharePoint (20 Word files)
↓ Get files (properties only)
↓ Initialize array variable: results[]
↓ Apply to each file:
├─ Get file content (by path)
├─ POST /uploads → file_id (upload session)
├─ PUT /uploads/{file_id} → upload binary (base64)
├─ POST /extractdoc/jobs → job_id
├─ Do until status = completed (poll GET /extractdoc/jobs/{job_id})
├─ GET /files/{output_file_id}/content → extracted text
├─ POST /structflow/jobs → job_id
└─ Do until status = completed (poll GET /structflow/jobs/{job_id})
→ append body to results[]
↓ POST /structflow/jobs (cross-dept analysis)
↓ Do until status = completed
↓ Compose HTML dashboard
↓ Create file in SharePoint
8 HTTP actions per file. 20 files. Sequential processing.
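A note on "Sequential processing": the Apply to each loop has a concurrency control setting that can run iterations in parallel, but because every iteration appends to the one shared results[] variable, parallel runs risk interleaving their writes. Keeping the loop sequential trades speed for predictability.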
The errors, in order
Error 1: Wrong upload endpoint
I started with POST /api/v1/uploads. Got 404.
The correct endpoint (without the /api/v1 prefix) is:
POST https://gw.ldxhub.io/uploads
Lesson: check the API docs directly. The base URL doesn't always include a version prefix.
Error 2: File content — multipart/form-data nightmare
POST /files requires multipart/form-data. Power Automate's HTTP connector doesn't handle this cleanly.
The workaround: use the chunk upload flow instead.
- POST /uploads creates an upload session and returns file_id
- PUT /uploads/{file_id} sends the file content as base64 JSON:
{
"data": "@{base64(body('パスによるファイル_コンテンツの取得'))}"
}
This is the JSON-based chunk upload designed for MCP clients, but it works perfectly from Power Automate too. (The Japanese token in the expression, パスによるファイル_コンテンツの取得, is just the display name of the flow's "Get file content by path" action.)
Error 3: File not found (SharePoint path)
Getting file content by ID didn't work. The fix: use "Get file content by path" instead of "Get file content".
The correct path format:
concat('/Shared Documents/General/LDXhubtest/', items('それぞれに適用する')?['{FilenameWithExtension}'])
items('それぞれに適用する') refers to the Apply to each loop; the field name is {FilenameWithExtension} (with curly braces), found by inspecting the raw output of the "Get files" action.
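For orientation, this is roughly where that expression lives in the SharePoint action (the site address is a placeholder of mine; the rest follows the flow above):
Get file content by path (SharePoint)
Site Address: https://<your-tenant>.sharepoint.com/sites/<your-site>
File Path: @{concat('/Shared Documents/General/LDXhubtest/', items('それぞれに適用する')?['{FilenameWithExtension}'])}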
Error 4: ExtractDoc engine name
"engine": "docx" returned an error. The correct engine ID:
{
"engine": "ki/extract"
}
Check available engines with GET /extractdoc/engines first.
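That check is a plain GET with the same auth header as every other call, assuming it sits on the same gateway host as the jobs endpoint:
URI: https://gw.ldxhub.io/extractdoc/engines
Method: GET
Headers:
Authorization: Bearer {API_KEY}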
Error 5: Do until condition syntax
Power Automate's new designer is strict about condition expressions. This fails:
@{body('HTTP_3')?['status']} equals completed
This works (in advanced mode):
@equals(body('HTTP_3')?['status'],'completed')
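Put together, the ExtractDoc polling loop looks roughly like this. The 10-second delay and the loop limits are my choices, not anything the API mandates:
Do until: @equals(body('HTTP_3')?['status'],'completed')
Limits: Count 60, Timeout PT30M
├─ Delay: 10 seconds
└─ HTTP_3: GET https://gw.ldxhub.io/extractdoc/jobs/@{body('HTTP_2')?['job_id']}
     Headers: Authorization: Bearer {API_KEY}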
Error 6: ExtractDoc doesn't return text directly
I assumed ExtractDoc would return the extracted text in the response body. It doesn't.
The response contains output_file_id. You then need:
GET /files/{output_file_id}/content
to download the actual text. This requires an extra HTTP action between ExtractDoc polling and StructFlow job creation.
Error 7: Array variable append — null value
AppendToArrayVariable with body('HTTP_5')?['results'] returned a null error.
Fix: append body('HTTP_5') (the entire response), not just the results field.
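In action terms, the working setting is simply:
Append to array variable
Name: results
Value: @{body('HTTP_5')}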
Error 8: Cross-scope reference error
When I tried to reference loop-scoped actions from outside the loop (for the cross-department analysis step), Power Automate threw:
The action 'HTTP_5' is nested in a foreach scope of multiple levels.
Referencing repetition actions from outside the scope is not supported.
The solution: accumulate everything into the results array variable inside the loop, then pass variables('results') to the final analysis step outside the loop.
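I won't reproduce my exact cross-department prompt here, but the shape of that final StructFlow request is the important part: its only input is the accumulated variable, never a loop-scoped action. The prompt wording and the "departments" key below are illustrative:
{
"model": "anthropic/claude-sonnet-4-6",
"system_prompt": "Synthesize the following per-department results into a company-wide summary...",
"example_output": { ... },
"inputs": [{"id": "0", "data": {"departments": "@{string(variables('results'))}"}}]
}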
The working flow — key settings
File upload (HTTP)
URI: https://gw.ldxhub.io/uploads
Method: POST
Headers:
Content-Type: application/json
Authorization: Bearer {API_KEY}
Body:
{
"filename": "@{items('それぞれに適用する')?['{FilenameWithExtension}']}"
}
File content upload (HTTP 1)
URI: https://gw.ldxhub.io/uploads/@{body('HTTP')?['file_id']}
Method: PUT
Body:
{
"data": "@{base64(body('パスによるファイル_コンテンツの取得'))}"
}
ExtractDoc job (HTTP 2)
URI: https://gw.ldxhub.io/extractdoc/jobs
Method: POST
Body:
{
"engine": "ki/extract",
"file_id": "@{body('HTTP')?['file_id']}",
"output_format": "text"
}
Download extracted text (HTTP 8, after polling)
URI: https://gw.ldxhub.io/files/@{body('HTTP_3')?['output_file_id']}/content
Method: GET
StructFlow job (HTTP 4)
{
"model": "anthropic/claude-sonnet-4-6",
"system_prompt": "以下の会議議事録から構造化データを抽出してください...",
"example_output": { ... },
"inputs": [{"id": "0", "data": {"minutes": "@{body('HTTP_8')}"}}]
}
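The last two steps are standard actions rather than LDX hub calls, so I'll just sketch them; the Compose action name and the file-name pattern are mine:
Compose_HTML: the HTML template string, with @{...} tokens pulling values from the final StructFlow response
Create file (SharePoint)
Site Address: https://<your-tenant>.sharepoint.com/sites/<your-site>
Folder Path: /Shared Documents/General/LDXhubtest
File Name: dashboard_@{formatDateTime(utcNow(),'yyyyMMdd_HHmm')}.html
File Content: @{outputs('Compose_HTML')}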
The result
After 2 days of iteration:
| Metric | Result |
|---|---|
| Departments processed | 20 / 20 |
| StructFlow jobs completed | 20 / 20 |
| Total tasks extracted | 100 |
| High-severity risks identified | 21 |
| Cross-department dependency entries | 60+ |
The HTML dashboard shows:
- Company-wide task list (all 100, with assignee, deadline, related dept)
- Risk cards by severity (color-coded)
- Cross-department dependency map
- Per-department summary cards
Key insight on architecture: LDX hub handles all the intelligence — text extraction (ExtractDoc) and structured data generation (StructFlow). The HTML template I wrote just renders the JSON. The processing engine and presentation layer are fully separated.
MCP vs REST API — the actual comparison
Now that I've done both, here's the honest breakdown:
| | MCP (Part 3) | REST API — Power Automate (Part 4) |
|---|---|---|
| Setup time | ~2 hours | ~2 days |
| Errors | 2 | 8+ |
| Best for | Single record, interactive | Batch processing |
| 20-file batch | ❌ Not practical | ✅ Right tool |
| Polling complexity | Handled by agent | Manual Do until loops |
| File upload | Via MCP chunk API | Via REST chunk upload |
MCP wins on simplicity for conversational use cases. REST API wins for scheduled batch jobs.
What I'd do differently
- Test with 1 file before 20. I wasted hours debugging a flow that was running on all 20 files.
- Check the API docs before assuming endpoint paths. The /api/v1/ prefix doesn't exist on all endpoints.
- Verify Do until conditions in advanced mode. The GUI condition builder generates subtly wrong expressions.
- Add error handling. The current flow times out silently if an API call fails mid-loop.
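On that last point, the standard Power Automate pattern would be to wrap the work in a Try scope and add a Catch scope whose "Configure run after" setting fires on failure or timeout, something like:
Scope: Try
└─ all of the per-file HTTP actions and loops
Scope: Catch (Configure run after: "has failed" / "has timed out")
└─ send a notification (Teams post or email) so a silent failure becomes visible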
What's Next
Phase 2: A quality comparison between two approaches to dashboard generation:
- Structured data route: StructFlow extracts JSON → HTML renders JSON (what we built)
- Unstructured data route: raw meeting text passed directly to an LLM → HTML rendered from prose output
The hypothesis: structured data produces more consistent, queryable, and accurate dashboards. But how much better, exactly? And at what cost difference? That's the next experiment.
Kawamura International is a translation and localization company documenting its AI process experiments in public. StructFlow, RefineLoop, RenderOCR — and whatever comes next.