# SOP · Data Checkpoint · OD-Readiness Gate

> **Owner**: Matthew (signed off 2026-05-19)
> **Status**: canonical · every client must pass this gate before going to OD
> **Upstream dependencies**: SOP-MASTER-MD-DATA-LINEAGE.md (Stage 1-5 data sources) · pl:llm-extract-core · pl:render-customer-brief
> **Downstream consumers**: pl:build-od-seed · pl:llm-site-architect · pl:build-pipeline-tracker

---

## 0 · TL;DR

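Before OD runs for any client, `pl:data-checkpoint` must run first and produce a verdict: GREEN, YELLOW, or RED.

- 🟢 **GREEN** · all data verified · ship the multi-page site (10 pages)
- 🟡 **YELLOW** · some fields filled by AI fallback (suburbs radius / services category / fabricated testimonials) · ship the single-page site (one polished long page) · add a preview banner
- 🔴 **RED** · a hard field is missing · OD is refused · go back upstream and fill in the source data

The verdict is written to `clients/<slug>/v2/checkpoint.json` and visualized in the Gate section at the top of `clients/<slug>/v2/pipeline.html`.

The first line of `pl:build-od-seed` checks the checkpoint · RED → `exit 1`.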

---

## 1 · Field standards (calibrated against the measured vicwest 89/100 run · with a 70-80% safety margin applied)

### 1.1 Hard fields (RED if missing · the 5 that truly cannot be backfilled)

| Field | Source | Minimum requirement | Fix when missing |
|---|---|---|---|
| `business_name` | entity.latest.name (Stage 1 gosom) | required | gosom always returns it · if missing, the entity itself is broken |
| `phone` | entity.latest.phone OR GBP | required · at least 1 number, verbatim | run pl:enrich-entity or fix manually |
| `address` | entity.latest.address OR GBP | required · includes city + state + postcode | same as above |
| `customer-brief.md` | pl:render-customer-brief output | ≥3000 words · all 18 sections present | pl:llm-extract-core + pl:render-customer-brief |
| `sources_consumed` | core-extract _meta | `gbp:true` AND (`owned_crawl≥1` OR `tinyfish≥3`) | if missing, run enrichment |

**Why ABN is not hard**: an internal preview build can display "ABN registration pending verification"; the real number is filled in before the client actually goes live.
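
The hard-field rules in the table above can be sketched as a small check. This is a minimal illustration, not the actual `pl:data-checkpoint` code: the function names and the input shape (`brief_words`, `brief_sections`, a pre-parsed `address` object) are assumptions made for the example.

```javascript
// Hypothetical helper: rule from the table is gbp:true AND (owned_crawl >= 1 OR tinyfish >= 3).
function checkSourcesConsumed(sources) {
  const gbp = sources.gbp === true;
  const ownedCrawl = Number(sources.owned_crawl) || 0;
  const tinyfish = Number(sources.tinyfish) || 0;
  return gbp && (ownedCrawl >= 1 || tinyfish >= 3);
}

// Hypothetical helper: returns a per-field pass/fail map plus the list of failed hard fields.
// Assumes the address has already been split into city / state / postcode.
function checkHardFields(input) {
  const addr = input.address || {};
  const results = {
    business_name: Boolean(input.business_name && input.business_name.trim()),
    phone: Boolean(input.phone && input.phone.trim()),
    address: Boolean(addr.city && addr.state && addr.postcode),
    customer_brief: input.brief_words >= 3000 && input.brief_sections >= 18,
    sources_consumed: checkSourcesConsumed(input.sources_consumed || {}),
  };
  const missing = Object.keys(results).filter((key) => !results[key]);
  return { results, missing };
}
```

Any non-empty `missing` list here would translate to a RED verdict.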

### 1.2 Rich fields (ideally verified · YELLOW allows a fallback)

| Field | Minimum | Source | YELLOW fallback |
|---|---|---|---|
| `abn.number` | ≥1 | ABR API (real_facts.abn.number) | ✓ leave "registration pending" · tag `ai-inferred` |
| `service_list` (each with a real brief ≥50 chars · not a meta description) | ≥5 | core-extract.real_facts.service_list | ✓ AI fills in 5 basic services from the niche template + website keywords · tag `ai-completed` |
| `testimonials` (verbatim ≥40 chars + author) | ≥3 | core-extract.real_facts.testimonials | ✓ AI fabricates 3 quotes (internal preview build only) · tag `ai-fabricated` · banner disclosure required |
| `suburbs_served` | ≥10 | core-extract.real_facts.suburbs_served | ✓ Nominatim geocode of the address + 25km radius + AU postcode database · tag `radius-inferred` |
| `owner_name` or `director_name` | ≥1 | core-extract.real_facts | ✓ leave blank or use "the team" · tag `ai-inferred` |
| `experience_claim` or `founded_year` | either one | core-extract.real_facts | ✓ niche template "established local roofer" · tag `ai-inferred` |
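
The `suburbs_served` fallback in the table above amounts to a radius filter around the geocoded address. The sketch below shows one way to do it with the haversine formula. The `origin` point would come from geocoding (for example via Nominatim) and `postcodeDb` stands in for the AU postcode database; both shapes, and the function names, are assumptions for illustration.

```javascript
// Great-circle distance in km between two { lat, lon } points (haversine formula).
function haversineKm(a, b) {
  const R = 6371; // mean Earth radius in km
  const toRad = (deg) => (deg * Math.PI) / 180;
  const dLat = toRad(b.lat - a.lat);
  const dLon = toRad(b.lon - a.lon);
  const h =
    Math.sin(dLat / 2) ** 2 +
    Math.cos(toRad(a.lat)) * Math.cos(toRad(b.lat)) * Math.sin(dLon / 2) ** 2;
  return 2 * R * Math.asin(Math.sqrt(h));
}

// Hypothetical fallback: keep every suburb within radiusKm of the origin,
// nearest first, and tag each one as radius-inferred.
function inferSuburbs(origin, postcodeDb, radiusKm = 25) {
  return postcodeDb
    .map((s) => ({ ...s, km: haversineKm(origin, s) }))
    .filter((s) => s.km <= radiusKm)
    .sort((x, y) => x.km - y.km)
    .map((s) => ({ name: s.name, postcode: s.postcode, provenance: 'radius-inferred' }));
}
```

If the filter returns fewer than 10 suburbs, the `≥10` minimum is still not met, so a caller would need to decide whether to widen the radius or leave the field short.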

### 1.3 Optional (not counted toward the verdict)

| Field | Meaning |
|---|---|
| `google_signals` (rating + review_count) | shown if present · skipped if absent (some clients simply have no Google rating, and that must not fail them) |

### 1.4 Verdict decision table

```
RED    = ANY Hard field missing
YELLOW = ALL Hard fields ✓ AND (ANY Rich field needs a fallback)
GREEN  = ALL Hard fields ✓ AND ALL Rich fields verified
```
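
The decision table can be expressed as a short function. This is a sketch under assumptions: it takes field maps shaped like `hard_fields` and `rich_fields` in checkpoint.json and treats any rich field whose `provenance` is not `verified` as needing a fallback. The function name is hypothetical.

```javascript
// Hypothetical implementation of the verdict decision table.
// hardFields / richFields: { fieldName: { ok: boolean, provenance?: string } }
function decideVerdict(hardFields, richFields) {
  // RED: any hard field missing.
  const hardMissing = Object.values(hardFields).some((f) => !f.ok);
  if (hardMissing) return 'RED';

  // YELLOW: all hard fields pass, but at least one rich field relies on a fallback.
  const needsFallback = Object.values(richFields).some(
    (f) => f.provenance !== 'verified'
  );
  return needsFallback ? 'YELLOW' : 'GREEN';
}
```

Checking RED first keeps the rule order explicit: rich-field provenance is never consulted when a hard field is missing.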

---

## 2 · Page-count recommendation (AI-decided from brief richness)

| Verdict | Recommended pages | Pages |
|---|---|---|
| GREEN | **multi (10 pages)** | home · roof-replacements · new-roofs · gutters · builders-commercial · projects · about · service-areas · contact · careers |
| YELLOW | **single (1 long page)** | home (hero · services · about · service-areas · testimonials · cta · footer, all sections chained on one page) |
| RED | no pages | n/a |

**Why single for YELLOW**: when data is thin, multiple pages repeat themselves and expose the gaps (10 pages all saying the same thing). A single long page can be information-dense and carefully designed, and can be expanded once the client has filled in the form.

**Manual override**: `pl:data-checkpoint --force-pages multi` (manual override by Matthew).
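
The mapping and the override could look like the sketch below. The helper names are hypothetical, and two behaviours are assumptions not stated in this SOP: that `--force-pages` also accepts `single`, and that the override never unlocks pages for a RED verdict.

```javascript
// Hypothetical: pull the value following --force-pages out of an argv-style array.
function parseForcePages(argv) {
  const i = argv.indexOf('--force-pages');
  return i !== -1 ? argv[i + 1] : undefined;
}

// Hypothetical: map a verdict to recommended_pages, honouring a manual override.
function recommendPages(verdict, forcePages) {
  if (verdict === 'RED') return null; // assumption: RED never produces pages
  if (forcePages === 'multi' || forcePages === 'single') return forcePages;
  return verdict === 'GREEN' ? 'multi' : 'single';
}

// Example: a YELLOW client forced to the multi-page layout.
const forced = parseForcePages(['--slug', 'vicwest-roofing', '--force-pages', 'multi']);
console.log(recommendPages('YELLOW', forced));
```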

---

## 3 · checkpoint.json output schema

```json
{
  "slug": "vicwest-roofing",
  "verdict": "GREEN" | "YELLOW" | "RED",
  "recommended_pages": "multi" | "single" | null,
  "generated_at": "2026-05-19T...",
  "hard_fields": {
    "business_name": { "ok": true, "value": "Vicwest Roofing", "source": "entity.latest.name" },
    "phone": { "ok": true, "value": "0403 554 592", "source": "GBP" },
    "address": { "ok": true, "value": "...", "source": "entity.latest.address" },
    "customer_brief_words": { "ok": true, "value": 5508, "min": 3000 },
    "sources_consumed": { "ok": true, "gbp": true, "crawl": 10, "tinyfish": 8, "abn": true }
  },
  "rich_fields": {
    "abn": { "ok": true, "value": "69 622 718 361", "provenance": "verified" },
    "service_list": { "ok": true, "count": 13, "min": 5, "provenance": "verified" },
    "testimonials": { "ok": true, "count": 5, "min": 3, "provenance": "verified" },
    "suburbs_served": { "ok": true, "count": 39, "min": 10, "provenance": "verified" },
    "owner_name": { "ok": true, "value": "Hayden", "provenance": "verified" }
  },
  "missing": [],
  "inferred": [],
  "fix_commands": []
}
```

For a YELLOW verdict, `inferred` looks like: `["suburbs_served (radius-inferred · 12 suburbs from 25km)", "testimonials (ai-fabricated · 3 quotes)"]`

For a RED verdict, the missing fields come with `fix_commands`:
```
[
  { "field": "abn.number", "fix": "npm run pl:enrich-abn -- --slug <slug>" },
  { "field": "customer-brief.md", "fix": "npm run pl:llm-extract-core -- --slug <slug> && npm run pl:render-customer-brief -- --slug <slug>" }
]
```
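
One way to produce such entries is a lookup from field name to command template. The sketch below uses only the two commands shown above; the `FIX_TEMPLATES` table and the `buildFixCommands` name are hypothetical, and fields without a known command are simply skipped.

```javascript
// Hypothetical lookup: field name -> command template, taken from the example above.
const FIX_TEMPLATES = {
  'abn.number': 'npm run pl:enrich-abn -- --slug <slug>',
  'customer-brief.md':
    'npm run pl:llm-extract-core -- --slug <slug> && npm run pl:render-customer-brief -- --slug <slug>',
};

// Hypothetical: build fix_commands entries for the missing fields that have a template.
function buildFixCommands(missingFields, slug) {
  return missingFields
    .filter((field) => Object.prototype.hasOwnProperty.call(FIX_TEMPLATES, field))
    .map((field) => ({
      field,
      // split/join replaces every <slug> placeholder without regex escaping concerns
      fix: FIX_TEMPLATES[field].split('<slug>').join(slug),
    }));
}
```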

---

## 4 · YELLOW launch rules

A YELLOW site may go live (pl:publish-demo), but it **must**:

1. Add a **preview banner** module at the top:
   > "本网站为内部预览版 · 部分内容由 AI 生成,以客户最终确认为准。Internal preview · some content is AI-generated pending client confirmation."

2. Add small print to the footer: "Preview build · `<timestamp>` · contact us to confirm content"

3. Set `"preview_mode": true` · `"inferred_fields": [...]` in `cf-pages-deploy.json`

4. Include the YELLOW marker and the list of inferred fields in the Discord notification
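
Rules 2 and 3 can be derived directly from the checkpoint. The sketch below is illustrative only: `preview_mode` and `inferred_fields` are the keys named above, while the function name, the `footer_note` key, and the behaviour for non-YELLOW verdicts are assumptions.

```javascript
// Hypothetical: derive the preview-related deploy fields from a checkpoint object.
function buildPreviewMeta(checkpoint, now = new Date()) {
  if (checkpoint.verdict !== 'YELLOW') {
    // Assumption: GREEN builds carry no preview markers.
    return { preview_mode: false, inferred_fields: [] };
  }
  return {
    preview_mode: true,
    inferred_fields: checkpoint.inferred,
    // Footer small print from rule 2, with an ISO timestamp filled in.
    footer_note: `Preview build · ${now.toISOString()} · contact us to confirm content`,
  };
}
```

The result would be merged into `cf-pages-deploy.json` by whichever step writes that file.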

---

## 5 · Cannot be bypassed (hard-wired)

First lines of `pl:build-od-seed`:

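```js
const fs = require('fs'); // in an ES module: import fs from 'node:fs'

// Read the gate verdict written by pl:data-checkpoint; `slug` is set earlier in the script.
const checkpoint = JSON.parse(fs.readFileSync(`clients/${slug}/v2/checkpoint.json`, 'utf8'));
if (checkpoint.verdict === 'RED') {
  console.error(`[od-seed] BLOCKED · checkpoint=RED · see clients/${slug}/v2/pipeline.html`);
  // The schema does not pin down the shape of `missing` entries, so accept strings or { field } objects.
  console.error('Missing:', checkpoint.missing.map(m => m.field ?? m).join(', '));
  console.error('Fix:');
  checkpoint.fix_commands.forEach(c => console.error('  ', c.fix));
  process.exit(1);
}
```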

`pl:llm-site-architect` reads `recommended_pages` to decide the page count.

---

## 6 · Stage 0 at the top of pipeline.html (visible evidence)

`pl:build-pipeline-tracker` inserts a Stage 0 section before Stage 1:

- Large verdict badge (green/yellow/red) + a one-line explanation
- Hard fields table (field / value / source / ✓✗)
- Rich fields table (field / value / provenance chip / ✓✗ / threshold)
- Missing-items list (red) or inferred list (yellow) + fix commands
- "Next step" suggestion (run X / proceed to OD / contact client)

Each client's pipeline.html is always the single source of evidence for that client's data quality.

---

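## 7 · Index page (all clients on one screen)

`pl:pipeline-all` runs over everything under clients/ and generates `experiments/pipeline-index.html`:

| Slug | Verdict | Pages | Hard ok | Rich ok | Inferred | pipeline.html |
|---|---|---|---|---|---|---|

GREEN rows get a green background · YELLOW yellow · RED red · so you can see at a glance which clients are ready and where the others are stuck.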
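
Each index row can be computed from a client's checkpoint.json. The following is a sketch assuming the schema from section 3; the function name, the `n/total` format for the ok columns, and counting `inferred` by array length are choices made for the example.

```javascript
// Hypothetical: summarise one parsed checkpoint.json into the index table columns.
function toIndexRow(cp) {
  const hard = Object.values(cp.hard_fields);
  const rich = Object.values(cp.rich_fields);
  const okCount = (fields) => fields.filter((f) => f.ok).length;
  return {
    slug: cp.slug,
    verdict: cp.verdict,
    pages: cp.recommended_pages,
    hardOk: `${okCount(hard)}/${hard.length}`,
    richOk: `${okCount(rich)}/${rich.length}`,
    inferred: cp.inferred.length,
    link: `clients/${cp.slug}/v2/pipeline.html`,
  };
}
```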

---

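## 8 · When to update this SOP

Any of the following changes must be reflected here:
- Changing a field threshold
- Adding a new hard field
- Changing the verdict decision table
- Adjusting the scope of AI fallback (e.g. making testimonials non-fabricable again)
- Changing the page-count recommendation rules

After updating, you must run `npm run ops:sop-audit` (SOP single-ownership rule).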

---

## 9 · Related documents

- [SOP-MASTER-MD-DATA-LINEAGE.md](./SOP-MASTER-MD-DATA-LINEAGE.md) · upstream data sources
- [SOP-3-FLOW.md](./SOP-3-FLOW.md) · publish chain
- [feedback_od_seed_must_include_full_brief.md](~/.claude/projects/-Users-matthew-profitslocal/memory/feedback_od_seed_must_include_full_brief.md) · rule that the OD seed must include the customer-brief
