e2e codex skill w/ chrome-devtools-mcp

A Codex Skill that generates e2e tests and QA guides with chrome-devtools-mcp


These days, after finishing an implementation, I run e2e tests with chrome-devtools-mcp.

/clarify
Let's build an e2e test plan using chrome-devtools-mcp.
The target is every commit/change on the current branch compared to refs/origin/develop.

First, let's construct the happy path: what (UI) should be tested, in which direction, and what results it should produce.

During the run, intercept requests and use mock data, but never change the existing server req/res contracts.
You can consult the server API docs here: https://.../
Also, never guess or make arbitrary judgments; proceed strictly from real data, code, docs, and investigation results.

Starting from this claude-code prompt, I have now built the skill in codex as shown below. I chose codex because its context window is 1M and the 5h/1w usage limits are generous.

diff-aware-web-e2e
---
name: diff-aware-web-e2e
description: Plan all impacted web E2E paths for current branch changes against a user-provided base ref, then execute only the user-selected paths with chrome-devtools-mcp. By default, use code-derived page-level request and response interception plus mock data unless the user asks for real API behavior. Use when the user wants evidence-based UI test planning and execution without changing server request or response contracts.
---

# Diff-Aware Web E2E

Use this skill when the user wants to turn current branch changes into concrete, evidence-based web E2E checks.

## What This Skill Does

- Reads `<base_ref>...HEAD` changes to find impacted UI areas.
- Produces a plan-level user product intent summary before scenario planning output.
- Builds a full scenario inventory covering directly impacted user paths plus diff-backed edge-case and regression-focus checks.
- Derives request and response shapes from code and docs before planning default interception and mock data.
- States the planned mock target request, mock target response, and mock verification approach in plan output when API behavior matters.
- Writes scenarios with step-by-step actions and step-by-step expected UI states.
- Shows the full planned inventory, marks a recommended set with reasons, then asks the user which scenario IDs or recommended set to execute.
- Optionally executes only the selected scenarios with `chrome-devtools-mcp`.
- Reports only evidence-backed results.

## Default Mode

Start in **plan-only** mode and do not execute until the user chooses which planned scenarios to run.

## Inputs

Collect only what is missing:

- `mode`: `plan` or `execute`
- `base_ref`: required; compare `<base_ref>...HEAD`
- `change_focus`: optional user concern or priority area
- `target_area`: optional specific feature or page
- `api_docs_url`: optional; use when provided
- `mock_mode`: `auto` (default; use interception plus mock data unless the user says otherwise), `required`, or `off`

## Core Rules

- Never invent affected pages, API behavior, or expected results.
- If `base_ref` is missing, ask the user for it and stop until it is provided.
- Before the scenario inventory, provide a plan-level user product intent summary.
- Build the user product intent summary from user input first, then supplement with defensible diff or related code evidence when needed.
- Structure the user product intent summary using `Confirmed` and `Inferred`.
- If product intent remains partially unclear, include `Open Questions` or `Unclear Intent` and continue planning when the scenarios are still defensible.
- Keep the user product intent summary informative only; do not change scenario priority or recommended-set logic just because of it.
- Derive every scenario from at least one concrete source:
  - changed code
  - tests or stories
  - API docs
  - observed browser or network evidence
- Unless the user explicitly asks for real API behavior or `mock_mode: off`, treat chrome-devtools-mcp-based page-level request and response interception plus mock data as the default strategy.
- Derive mock request and response shapes from code and docs before planning or executing page-level interception and mocking.
- Plan the full impacted scenario inventory before suggesting execution.
- Cover directly impacted user paths plus same-screen or same-flow paths, and include diff-backed edge-case and regression-focus scenarios when they are defensibly tied to the change.
- Expand each planned path until a clear completion state is reached.
- Include step-by-step user actions and step-by-step expected UI states.
- When API behavior matters, say which request and response will be intercepted or mocked and how mock verification will be checked for each scenario.
- Provide scenario IDs, priority, and a recommended set with reasons for planned paths.
- Never auto-select or auto-execute the recommended set.
- Do not change server request or response contracts.
- Never terminate `chrome-devtools-mcp`, Chrome, or remote-debugging processes that this run did not start.
- If evidence is insufficient, ask or stop.

## Evidence Order

1. Changed code and tests
2. Provided API docs
3. Runtime DOM or network evidence
4. User confirmation

See [evidence-rules.md](references/evidence-rules.md).

## Planning Workflow

1. If `base_ref` is missing, ask the user for it and stop.
2. Read the diff against `<base_ref>...HEAD`.
3. Find directly impacted UI entry points and related routes.
4. Read only the minimal supporting code, tests, and docs needed to map the full impacted scenario inventory.
5. Derive request and response shapes from code first by tracing the changed UI trigger, API caller, request builder, shared client, and response consumer.
6. If `api_docs_url` is provided, inspect it to confirm endpoint purpose and response shapes.
7. Before the scenario inventory, produce a plan-level user product intent summary that includes:
   - `Confirmed`: user-stated intent or intent made explicit in provided product context
   - `Inferred`: defensible intent inferred from the diff or related code
   - `Open Questions` or `Unclear Intent` when intent remains partially unresolved
   - brief evidence notes showing why each item is defensible
8. For each scenario, produce:
   - scenario ID
   - coverage relation: `direct impact`, `same-screen branch`, or `same-flow regression`
   - scenario objective: `primary`, `edge-case`, or `regression-focus`
   - target UI or flow
   - why it is tied to the diff
   - ordered user actions
   - step-by-step expected UI states
   - request and response derivation evidence when API behavior matters
   - default execution strategy: mocked or real API fallback
   - mock target request and response when API behavior matters
   - mock verification plan
   - injection approach only when it is needed to explain feasibility
   - priority
   - recommended-set status and reason
9. Show the user product intent summary, show the full scenario inventory, show the recommended set, and ask the user which scenario IDs or recommended set should move to execution.

See [planning-rules.md](references/planning-rules.md).

## Execution Workflow

Use this only after the user chooses which planned scenario IDs or recommended set to execute.

1. Start a run-owned isolated Chrome and `chrome-devtools-mcp` context.
2. Check that the current MCP runtime supports isolated execution, timeout tuning, and log capture. If any of these are clearly missing, report `block` with the missing preconditions.
3. Do not attach cleanup behavior to any Chrome or MCP process not started by this run.
4. Load the planned request and response derivation evidence together with the planned mock targets and verification checks.
5. Unless the user opted out or the selected scenario was explicitly planned as a real API fallback, apply the planned page-level request and response interception and mock data with `initScript` or `evaluate_script` using only code-derived or doc-derived payload shapes.
6. If needed, navigate to the selected target path.
7. Wait for stable UI evidence before judging results.
8. Use snapshots for structure and screenshots for reporting.
9. Inspect console and network activity for corroborating evidence.
10. On connection failure, retry in-place, then restart the isolated run-owned instance, then collect logs and report `block`.
11. Report `pass`, `fail`, or `block`.

Use these `chrome-devtools-mcp` capabilities when relevant:

- `list_pages`
- `select_page`
- `new_page`
- `navigate_page`
- `wait_for`
- `take_snapshot`
- `take_screenshot`
- `list_network_requests`
- `get_network_request`
- `list_console_messages`
- `evaluate_script`

See [execution-rules.md](references/execution-rules.md).

## Mocking Policy

- Unless the user explicitly asks for real API behavior or `mock_mode: off`, treat chrome-devtools-mcp-based page-level request and response interception plus mock data as the default strategy.
- Planning should assume mocked execution first and describe the target request, target response, and mock verification approach for each scenario when API behavior matters.
- Use code-first request and response derivation before deciding whether page-level mocking is safe.
- First check whether page-level mocking is sufficient.
- Page-level mocking may use `initScript` or `evaluate_script` to patch `fetch` or `XMLHttpRequest`.
- For mocked flows, do not require a real network request as mandatory evidence.
- Verify that mocking actually took effect using observable DOM evidence, explicit mock-hit evidence, or both.
- Do not claim this is full browser-level interception.
- If page-level mocking is unreliable or the scenario needs broader interception than it can safely cover, fall back to real API behavior or report `block`.

## Output Format

### Plan Mode

- User product intent summary shown before the scenario inventory
- `Confirmed`, `Inferred`, and `Open Questions` or `Unclear Intent` when needed
- Evidence notes for the user product intent summary
- Full impacted scenario inventory
- Scenario ID, coverage relation, scenario objective, priority, and recommended-set status for each scenario
- Recommended set with a short reason for each included scenario
- Evidence for each scenario
- Step-by-step actions
- Step-by-step expected UI states
- Request and response derivation evidence when API behavior matters
- Default execution strategy
- Mock target request and response when API behavior matters
- Mock verification plan
- Injection approach only when it is needed to explain feasibility
- A final clarify step asking which scenario IDs or recommended set should move to execution

### Execute Mode

- Selected path result: `pass`, `fail`, or `block`
- Screenshot
- Brief evidence summary
- Relevant network notes, or mock-hit notes for mocked flows
- Mock verification notes
- Recovery notes if connection handling was needed
- A concise summary of the process used to get to the result

## Completion Wrap-Up

- After all planned or selected execution work is complete, summarize the process used:
  - diff basis
  - path selection logic
  - request and response derivation basis
  - execution setup
  - mocking approach
  - evidence collected
  - blockers or recoveries
- If execution was performed, always ask whether to create a concise QA handoff document in Korean aimed at QA or planners.
- If the user says yes, do not draft the QA handoff document yet.
- First produce a concise QA handoff plan and ask the user to approve that plan before writing any file.
- The QA handoff plan must include:
  - proposed `.md` save path with a single recommended location for user confirmation
  - intended audience
  - document section outline
  - scenario coverage to include
  - existing screenshots that are good enough to reuse
  - additional screenshots that must be captured or recaptured
  - device context for each screenshot: desktop or mobile
  - why each screenshot is needed for fast QA understanding
- Do not draft the QA handoff document unless the user approves that QA handoff plan.
- After the user approves the QA handoff plan, write the QA handoff as a `.md` file.
- If the user asks for the QA handoff document, include:
  - write it in Korean for QA or planning audiences
  - scenario ID and title
  - goal
  - scope or covered user path in product terms
  - setup and mock strategy in audience-friendly terms
  - steps
  - expected UI
  - actual evidence
  - screenshot list with captions, using element-focused captures with minimal surrounding noise when possible
  - network or mock verification notes
  - blockers or open risks
- Exclude development implementation details, code-level explanations, internal reasoning, and backend contract discussion unless the audience explicitly asks for them.
- Use full-page screenshots only when the element-focused capture would hide necessary product context.
- Prefer screenshots that center the changed UI with only the surrounding context needed to understand the state.
- Match screenshot device context to the product path being documented and say which captures are desktop or mobile.
- If the existing screenshots are too broad, show the wrong device context, or do not make the changed UI easy to understand, take additional screenshots before drafting the QA handoff.

## When To Stop And Ask

- `base_ref` is missing
- No reliable UI candidate can be tied to the diff
- A completion path cannot be justified from code, docs, or observed behavior
- Request or response shapes cannot be derived from code or docs without guessing
- Code-derived request or response shapes conflict with observed runtime evidence
- The planned scenario inventory is ready and user path selection is required
- Mocking is required but safe scope is unclear
- Authentication or setup prevents reliable execution
- The user approved QA handoff creation and the QA handoff plan still needs approval
- The proposed QA handoff save path needs user confirmation
- Existing screenshots are not sufficient and additional capture scope still needs confirmation

## Non-Goals

- Branch-to-branch visual diff systems
- App-wide route crawling unrelated to directly impacted paths
- Full browser-level request interception guarantees
- Mobile WebView-specific flows
- Changing backend contracts to make tests easier
- Killing externally managed Chrome or MCP processes

The flow is: provide the diff base - E2E test planning - scenario construction - test execution - QA guide proposal - QA guide writing (optional). It varies with diff size, but once scenarios are set up and execution starts, the work takes roughly 15-40 minutes with opus-4.6(thinking) or gpt-5.4(xhigh+fast).

That said, unless told explicitly, it surfaces somewhat fewer findings (scenarios) than claude-code… so I tuned the prompt to have it produce scenarios in more detail.

chrome-devtools-mcp drives a Chrome binary. Perhaps because of that, it fails fairly often (especially with `transport closed`), so on failure I have it handle the situation in this order: retry - recreate - capture logs, then block & notify. For reference, `transport closed` simply means the MCP-browser connection was closed.
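The retry - recreate - log-and-block ladder can be sketched roughly as below. This is a minimal sketch, not the skill's actual implementation; `runScenario`, `recreateContext`, and `captureLogs` are hypothetical stand-ins for the real MCP interactions:

```javascript
// Minimal sketch of the retry -> recreate -> log-and-block recovery ladder.
// runScenario, recreateContext, and captureLogs are hypothetical stand-ins
// for the real chrome-devtools-mcp calls.
async function runWithRecovery(runScenario, recreateContext, captureLogs, retries = 2) {
  for (let attempt = 0; attempt <= retries; attempt++) {
    try {
      return { status: "pass", result: await runScenario() };
    } catch (err) {
      // Only treat MCP connection loss as recoverable; rethrow everything else.
      if (!/transport closed/i.test(String(err?.message ?? err))) throw err;
      // Early failures: retry in place inside the current run-owned context.
      if (attempt < retries) continue;
    }
  }
  // Retries exhausted: recreate the isolated run-owned Chrome + MCP context once.
  try {
    await recreateContext();
    return { status: "pass", result: await runScenario() };
  } catch (err) {
    // Still failing: preserve evidence and report block.
    await captureLogs(err);
    return { status: "block", reason: "transport closed after retry and recreate" };
  }
}
```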

Parallel execution also needed some tuning. Unless explicitly specified, it often appears to use a shared browser. As noted in the Skill, the `--isolated` option should be passed when launching the Chrome browser (since I expect it to be used only by this skill, I pass it from the skill rather than via the MCP config). Beyond that, I configured the MCP to run in headless mode.
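For reference, the headless setting lives in the MCP server configuration. A sketch of what that entry might look like; the exact file location, key names, and flag support depend on your MCP client and chrome-devtools-mcp version, so treat this as an assumption rather than the canonical config:

```json
{
  "mcpServers": {
    "chrome-devtools": {
      "command": "npx",
      "args": ["chrome-devtools-mcp@latest", "--headless"]
    }
  }
}
```

`--isolated` is deliberately not set here, matching the choice described above of passing it per run from the skill instead of baking it into the MCP config.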

I added `wait_for` because content may not be visible immediately after entering a page (an SPA, for instance). Puppeteer has a similar API.
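Conceptually, `wait_for` behaves like a poll-until-visible helper. A rough standalone sketch of the idea (this is not the MCP tool's actual implementation, just the semantics it provides):

```javascript
// Poll a predicate until it returns a truthy value or the timeout expires.
// Conceptual stand-in for wait_for / Puppeteer's page.waitForSelector:
// judge UI state only after the content has actually appeared.
async function waitFor(predicate, { timeoutMs = 5000, intervalMs = 100 } = {}) {
  const deadline = Date.now() + timeoutMs;
  while (Date.now() < deadline) {
    const value = await predicate();
    if (value) return value; // condition met: hand back the observed value
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
  throw new Error(`waitFor: condition not met within ${timeoutMs}ms`);
}
```

In page context this might be used as `await waitFor(() => document.querySelector("[data-testid='result']"))`, where the selector is of course illustrative.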

Requests/responses are intercepted and mocked by default. Many features require login, so runs were often handed off to me for manual sign-in… and passing an account (token) via .env or directly felt a bit off anyway. Of course, this too is confirmed with the user during planning.
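The page-level interception the skill injects via `initScript`/`evaluate_script` amounts to patching `fetch` (and, when needed, `XMLHttpRequest`). A minimal sketch of the `fetch` side, including an explicit mock-hit marker for verification; the `/api/me` endpoint, payload, and `__mockHits` name are illustrative assumptions here, whereas real mocks must use code- or doc-derived shapes:

```javascript
// Sketch of page-level request/response interception with a mock-hit marker.
// Endpoint and payload are illustrative; derive real shapes from code/docs.
(function installPageLevelMock(globalObj) {
  const realFetch = globalObj.fetch.bind(globalObj);
  globalObj.__mockHits = []; // explicit mock-hit markers, checkable from the page or console

  globalObj.fetch = async (input, init) => {
    const url = typeof input === "string" ? input : input.url;
    if (url.includes("/api/me")) {
      globalObj.__mockHits.push(url); // record that the mock actually fired
      return new Response(JSON.stringify({ id: 1, name: "mock-user" }), {
        status: 200,
        headers: { "Content-Type": "application/json" },
      });
    }
    return realFetch(input, init); // everything else passes through untouched
  };
})(globalThis);
```

This matches the skill's verification rule: confirm the mock took effect via observable DOM state or the `__mockHits`-style marker, rather than treating a real network request as mandatory evidence.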

A fun surprise was how good the generated QA guides are. They made it very easy to hand relevant context not only to QA but also to planners and other non-developers. This, too, lives in the prompt: when execution finishes it asks whether to create a QA guide, and on Yes it draws up a detailed plan, centered on purpose, expected scope, and expected screenshots.

Using the references directory, I also documented how to approach planning+evidence and execution.

Planning Rules
# Planning Rules

## Scope

Plan the full impacted scenario inventory for changes in the current branch against `<base_ref>...HEAD`.
Before the scenario inventory, provide a plan-level user product intent summary.
Unless the user explicitly asks for real API behavior or `mock_mode: off`, planning should assume page-level request and response interception plus mock data as the default execution strategy.

## User Product Intent Summary

Create a single plan-level summary before the scenario list.

Use these fields:

- `Confirmed`: intent explicitly stated by the user or made explicit in provided product context
- `Inferred`: defensible intent inferred from the diff or closely related code
- `Open Questions` or `Unclear Intent`: unresolved gaps that do not block a defensible scenario plan

Intent evidence should prefer user input first and use diff or related code only as supporting evidence.
Do not change scenario priority or recommended-set logic based on this summary alone.

## Candidate UI Signals

Use these signals to infer impacted UI:

- route or page files
- router config
- changed components imported by pages
- button, heading, link, or `data-testid` strings
- tests, stories, or Playwright specs

## Scenario Taxonomy

Every scenario must include both labels:

- `coverage relation`: `direct impact`, `same-screen branch`, or `same-flow regression`
- `scenario objective`: `primary`, `edge-case`, or `regression-focus`

Use defensible pairings:

- `primary` usually covers the main changed path and commonly pairs with `direct impact`
- `edge-case` covers alternate, boundary, empty, error, or validation states that are defensibly tied to the diff and commonly pairs with `direct impact` or `same-screen branch`
- `regression-focus` covers behavior that should remain stable around the changed path and commonly pairs with `same-screen branch` or `same-flow regression`

If a scenario needs an unusual pairing, explain why that pairing is justified by the diff or supporting code.

## Code-Based Request And Response Derivation

When API behavior matters, derive request and response shapes from code first:

1. identify the changed UI trigger
2. find the action handler or event path
3. trace the API caller or shared client
4. inspect request builders, params, payload keys, and headers
5. inspect response consumers, branch conditions, and rendered UI states
6. use API docs only to confirm or supplement what code already supports
7. use runtime network evidence only to verify or compare against the code-derived understanding

If code and docs do not support a request or response shape, do not invent one.

## Scenario Construction

For each scenario, include:

- scenario ID
- coverage relation
- scenario objective
- target page or flow
- changed code evidence proving why the scenario is tied to the diff
- start point
- completion condition
- step-by-step user actions
- step-by-step expected UI states
- request and response derivation evidence when API behavior matters
- default execution strategy: mocked or real API fallback
- mock target request and response when API behavior matters
- mock verification plan
- injection approach only when it is needed to explain feasibility
- priority
- recommended-set status and reason
- confidence and any assumptions that still need user confirmation

## Path Expansion

- Expand each directly impacted path until the user reaches a clear completion state.
- Do not stop at the first changed screen if the affected flow continues.
- Include same-screen branches and same-flow regression checks when they are defensibly tied to the changed path.
- Include separate diff-backed `edge-case` and `regression-focus` scenarios when the changed path exposes them.
- Do not broaden into unrelated feature-wide regression coverage.

## Documentation Use

- If `api_docs_url` is available, use it to confirm endpoint purpose and response shape after tracing the code path.
- If no docs are available, rely on code first and runtime evidence only as supporting verification.
- If neither code nor docs support an expected API outcome, do not invent one.

## Question Policy

Ask only when a missing fact blocks a defensible plan.

If `base_ref` is missing, ask for it before reading the diff.

If request or response shapes cannot be derived from code or docs without guessing, ask before planning mocks.

If code-derived request or response shapes conflict with runtime evidence in a way that changes the scenario expectation, ask before continuing.

If mocked execution looks unsafe or unsupported for the scenario and no defensible real API fallback is available, ask or plan the scenario as `block`.

If user product intent is only partially clear but the impacted scenarios are still defensible, continue planning and record the gaps under `Open Questions` or `Unclear Intent`.

After presenting the full planned scenario inventory and recommended set, ask the user which scenario IDs or recommended set should move to execution.

Evidence Rules
# Evidence Rules

## Allowed Evidence

- changed source files
- tests and stories
- official API docs when provided
- observed DOM state
- observed network requests and responses
- observed console output
- explicit mock-hit markers exposed by the patched page layer

## Path Coverage Standard

- The planning result should account for all directly impacted user paths that can be justified from the diff and supporting code.
- The planning result should also account for same-screen branch and same-flow regression scenarios when they are defensibly tied to those directly impacted paths.
- The planning result should include separate diff-backed `edge-case` and `regression-focus` scenarios when the changed path exposes them.
- Each planned path should explain why it is tied to the change and why its coverage relation and scenario objective are justified.
- When API behavior matters, each planned path should also say which request and response will be mocked by default and how mock confirmation will be checked.

## Disallowed Behavior

- inventing routes or user flows
- inventing response fields not supported by code or docs
- inventing mock payloads not supported by code or docs
- claiming pass or fail without a visible or observable signal
- changing backend request or response contracts
- claiming mocking worked without observable confirmation
- requiring a real network request as mandatory proof for a page-level mocked flow

## Reporting Standard

Each conclusion should be traceable to one or more concrete observations.

Each step-level expected UI state should be traceable to code, docs, tests, or observed behavior.

When planning a mocked flow, the report should name the intended mock target request, intended mock target response, and mock verification approach in concise scenario-level terms.

For page-level mocked flows, acceptable confirmation may come from DOM state, explicit mock-hit markers, console markers, or a real network request when one still occurs.

For QA handoff screenshots, prefer relevant element-focused captures with minimal surrounding noise. Use broader page captures only when the focused capture would hide necessary product context.

For QA handoff screenshots, the chosen capture should make the changed UI easy to understand quickly for QA readers, not just prove that the page existed.

For QA handoff screenshots, match the capture to the documented device context, including desktop versus mobile.

If an existing screenshot is too broad, lacks the changed UI focus, or shows the wrong device context, treat it as insufficient and capture a better one before drafting the QA handoff.

If code-derived request or response shapes conflict with runtime evidence, say so and stop or report `block` instead of guessing.

The final report should also include a short process summary describing how the result was reached.

If confidence is low, say so and explain what evidence is missing.

Execution Rules
# Execution Rules

## Browser Strategy

- Each execution owns its own isolated Chrome and `chrome-devtools-mcp` context.
- Never terminate Chrome, remote-debugging Chrome, or `chrome-devtools-mcp` processes that this run did not start.
- If the user needs to log in manually inside the isolated run-owned browser, pause and resume after the handoff.

## Preflight

- Before execution, verify that the active MCP runtime is configured for isolated launches or another equally safe ownership model.
- If the runtime clearly lacks isolation, timeout tuning, or log capture, report `block` and name the missing preconditions.

## MCP Tool Usage

- Use `take_snapshot` to inspect page structure before acting.
- Use `wait_for` for stable UI evidence instead of fixed sleeps when possible.
- Use `take_screenshot` for final reporting artifacts.
- Use `list_network_requests` and `get_network_request` to confirm real API activity or compare against the code-derived understanding when real requests occur.
- Use `list_console_messages` to spot frontend regressions or collect explicit mock-hit markers when available.
- For QA handoff screenshots, prefer element-focused captures with minimal surrounding noise and use full-page captures only when broader product context is necessary.
- Before drafting a QA handoff document, review whether the existing screenshots clearly show the changed UI state for the intended audience.
- If an existing screenshot is too broad, hides the changed UI, or uses the wrong device context, recapture it.
- Match the screenshot viewport and framing to the documented product context, including desktop versus mobile.

## Connection Recovery

- On `transport closed` or similar MCP connection failures, retry once or twice in the current run-owned context.
- If retry fails, recreate the isolated run-owned Chrome and MCP context and continue.
- If recovery still fails, enable log capture, preserve the failure evidence, and report `block`.
- Prefer explicit logging and timeout tuning over silent retries.
- When the client configuration allows it, raise `startup_timeout_ms` and capture `--log-file` output for failed runs.
- Prefer isolated run-owned launches over auto-connecting to external browsers for parallel execution.

## Result Labels

- `pass`: expected UI and supporting evidence match
- `fail`: expected UI or supporting evidence clearly diverge
- `block`: missing auth, missing data, unsupported mocking, insufficient evidence, or unresolved code versus runtime conflicts

## Mocking

- Unless the user explicitly asks for real API behavior or `mock_mode: off`, request and response interception with mock data is the default strategy.
- Follow the planned mock targets and mock verification checks from plan mode unless runtime evidence forces a safer fallback.
- Derive request and response shapes from code and docs before writing any mock payload.
- Page-level mocking may patch `fetch` and `XMLHttpRequest`.
- For mocked flows, a real network request is optional evidence, not mandatory evidence.
- Confirm that mocking took effect with observable DOM evidence, explicit mock-hit evidence, or both.
- Treat explicit mock-hit evidence as something the patched layer exposes and can be checked with the page or console state.
- Do not present page-level mocking as complete interception coverage.
- If the flow depends on navigation requests, service workers, or subresource control, treat that as unsupported unless the project already provides a safe mechanism.
- If the planned mock target cannot be intercepted safely at the page level, fall back to real API behavior only when the selected scenario still stays evidence-based; otherwise report `block`.
- If request or response shapes cannot be derived without guessing, stop and ask instead of inventing payloads.
- If code-derived request or response shapes conflict with observed runtime behavior in a way that changes the selected scenario, stop and ask or report `block`.

Using this alone is no fun and leads nowhere, so I also posted it to the team's AI use-case repository, together with a guide and install/verification scripts.

Installation
#!/usr/bin/env bash
set -euo pipefail

SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
REPO_ROOT="$(cd "${SCRIPT_DIR}/.." && pwd)"
SOURCE_DIR="${REPO_ROOT}/skills/diff-aware-web-e2e"
TARGET_DIR="${HOME}/.codex/skills/diff-aware-web-e2e"

if [[ ! -f "${SOURCE_DIR}/SKILL.md" ]]
then
  echo "Error: source skill file not found: ${SOURCE_DIR}/SKILL.md" >&2
  exit 1
fi

mkdir -p "$(dirname "${TARGET_DIR}")"

if [[ -d "${TARGET_DIR}" ]]
then
  BACKUP_DIR="$(mktemp -d "${TARGET_DIR}.bak.XXXXXX")"
  rmdir "${BACKUP_DIR}"
  mv "${TARGET_DIR}" "${BACKUP_DIR}"
  echo "Backed up existing skill directory: ${BACKUP_DIR}"
fi

cp -R "${SOURCE_DIR}" "${TARGET_DIR}"

echo "Installed diff-aware-web-e2e skill: ${TARGET_DIR}"
echo "Next: run ./scripts/verify-diff-aware-web-e2e-skill.sh"

Verification
#!/usr/bin/env bash
set -euo pipefail

SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
REPO_ROOT="$(cd "${SCRIPT_DIR}/.." && pwd)"
SOURCE_DIR="${REPO_ROOT}/skills/diff-aware-web-e2e"
TARGET_DIR="${HOME}/.codex/skills/diff-aware-web-e2e"

hash_file() {
  local file_path
  file_path="$1"

  if command -v shasum >/dev/null 2>&1
  then
    shasum -a 256 "${file_path}" | awk '{print $1}'
    return
  fi

  if command -v sha256sum >/dev/null 2>&1
  then
    sha256sum "${file_path}" | awk '{print $1}'
    return
  fi

  echo "Error: neither shasum nor sha256sum is available." >&2
  exit 1
}

hash_stdin() {
  if command -v shasum >/dev/null 2>&1
  then
    shasum -a 256 | awk '{print $1}'
    return
  fi

  if command -v sha256sum >/dev/null 2>&1
  then
    sha256sum | awk '{print $1}'
    return
  fi

  echo "Error: neither shasum nor sha256sum is available." >&2
  exit 1
}

dir_manifest_hash() {
  local dir_path
  dir_path="$1"

  (
    cd "${dir_path}"
    find . -type f | LC_ALL=C sort | while read -r relative_path
    do
      local_hash="$(hash_file "${dir_path}/${relative_path#./}")"
      printf '%s  %s\n' "${local_hash}" "${relative_path}"
    done
  ) | hash_stdin
}

if [[ ! -f "${SOURCE_DIR}/SKILL.md" ]]
then
  echo "Error: source skill file not found: ${SOURCE_DIR}/SKILL.md" >&2
  exit 1
fi

if [[ ! -d "${TARGET_DIR}" ]]
then
  echo "Error: target skill directory not found: ${TARGET_DIR}" >&2
  echo "Run: ./scripts/install-diff-aware-web-e2e-skill.sh"
  exit 1
fi

SOURCE_HASH="$(dir_manifest_hash "${SOURCE_DIR}")"
TARGET_HASH="$(dir_manifest_hash "${TARGET_DIR}")"

echo "Source: ${SOURCE_DIR}"
echo "Target: ${TARGET_DIR}"
echo "Source manifest SHA-256: ${SOURCE_HASH}"
echo "Target manifest SHA-256: ${TARGET_HASH}"

if [[ "${SOURCE_HASH}" == "${TARGET_HASH}" ]]
then
  echo "Status: synchronized"
  echo "Standard usage: \$diff-aware-web-e2e <your request>"
  exit 0
fi

echo "Status: mismatch"
echo "Run: ./scripts/install-diff-aware-web-e2e-skill.sh"
exit 2