Trust Infrastructure

We built a system to prove ourselves wrong.

Before anything gets published on FitChef, it has to survive our verification system. 5 gates. 3 pipelines. 28 kill switches. Zero exceptions. Here's exactly how it works.

Most nutrition content online is broken. You've seen it. Monday: "Eating 6 meals a day boosts your metabolism!" Tuesday: "Actually, meal frequency doesn't matter!" Same research. Different headlines. Neither tells you what the study actually found.

Here's how it happens. Someone reads a study. They pull out what sounds interesting. They write a headline. But somewhere between the paper and the publish button, the facts shift. "Suggests" becomes "proves." A finding about trained athletes becomes advice for everyone. A modest effect becomes a miracle.

We decided we couldn't trust ourselves not to make the same mistakes. So we built something most content platforms would never build: a system designed to catch our own errors before you ever see them.

Three Types of Content. Three Dedicated Pipelines.

FitChef produces three types of content, each with its own dedicated production pipeline and its own verification gates. A study page is different from a claim page, which is different from a flagship guide. Each type has different failure modes — so each type gets different checks.

Pipeline 1

Study Analysis

We read a peer-reviewed paper and write an editorial analysis of what the researchers found. Three skeptic gates verify every number, every sentence, and every import.

3 verification gates

Pipeline 2

Claim Synthesis

We combine findings from multiple studies into one comprehensive, evidence-verified answer. A dedicated skeptic gate verifies that our synthesis is honest and our writing matches the evidence.

1 verification gate

Pipeline 3

Flagship Guides

We weave verified answers into flagship guides. One skeptic gate verifies that every claim is accurately represented — no overstatement, no creative drift.

1 verification gate

That's 5 verification gates between the original research paper and the flagship guide you read. Each gate catches a different type of error. If any gate fails — the content doesn't publish.

Evidence-Connected Content

Beyond our three evidence pipelines, FitChef also publishes recipes and shorts: content that doesn't create new evidence claims but inherits trust from the studies and claims it links to. Every macro in a recipe traces back to a verified study. Every claim in a short links to our evidence chain.

These 2 connected pipelines run their own verification: 22 kill switches and 8 verification gates ensure that recipes don't invent nutrition numbers and shorts don't overstate what the research found.

Study Gate 1

Did We Read the Paper Correctly?

Here's how FitChef works. We read a research paper. We extract every finding from it — the numbers, the methodology, the results, the limitations. These findings become the foundation for everything we publish. If this foundation is wrong, everything built on top of it is wrong too.

Study Gate 1 takes every finding we extracted and checks it against the original paper. Not a summary. Not an abstract. The actual paper.

Do our numbers match the paper?

Every number we extracted — sample sizes, p-values, effect sizes — gets checked against the paper. If the paper says 47 studies and we wrote "about 50," killed.

Did we quote them correctly?

Every quote gets checked word-for-word. If we changed even one word — even to make it "sound better" — caught.

Did we frame it the way the researchers did?

This is where most sites go wrong. A study about trained athletes becomes advice for everyone. Our Skeptic compares our framing to the paper's actual conclusions.

Did we miss any warnings?

If a study found benefits only in young males and we didn't mention that — flagged. Every limitation the researchers noted, we must note too.

Study Gate 2

Does Our Writing Stay Faithful?

Having accurate findings is step one. But then we write editorial content from those findings. And this is exactly where facts drift. A "modest benefit in trained athletes" quietly becomes "everyone should do this."

Before any study page goes live, the Skeptic runs again. Every sentence — and every word in the audio script — gets checked against the verified findings. Does our content still say what the paper says?

Did our writing change the meaning?

Every sentence compared against the original finding. If the meaning shifted — even slightly — caught.

Did we add hype that wasn't there?

"Breakthrough," "miracle," "game-changer" — flagged instantly. If the researchers didn't use that language, we don't get to either.

The researcher nod test

If the person who ran the study read our page and thought "yep, that's what we found" — it passes. If they'd cringe — it doesn't.

Study Gate 3

Is Everything Import-Ready?

Gates 1 and 2 verify what was extracted and what was written. Gate 3 verifies that the finished package — every field, every number, every link — is consistent, complete, and infrastructure-ready before it enters WordPress.

This is the final checkpoint. If data was verified correctly but the import JSON has a mismatch, a broken citation link, or a missing field — it doesn't ship.

Does the import match the verified data?

Every field in the import JSON is checked against the verified source data. If the study found a p-value of 0.03 and the import says 0.3 — killed.

Is the infrastructure consistent?

Citation links, schema markup, trust bar integration, audio files — everything is validated against the WordPress infrastructure before import.

Does the trust layer hold together?

The final trust audit checks that the Verification Ledger, all cross-references, and the complete data chain are intact. One broken link in the chain = no publish.
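The import-match check in Study Gate 3 can be pictured as a field-by-field comparison. The following is a minimal Python sketch, not FitChef's actual pipeline code: the function name and field names are hypothetical, and only the 0.03 versus 0.3 p-value comes from the example above.

```python
import json


def find_import_mismatches(verified: dict, imported: dict) -> list[str]:
    """List every field where the import disagrees with the verified data."""
    problems = []
    for field, expected in verified.items():
        if field not in imported:
            problems.append(f"{field}: missing from import")
        elif imported[field] != expected:
            problems.append(
                f"{field}: verified {expected!r}, import has {imported[field]!r}"
            )
    return problems


# Hypothetical verified record and import payload.
verified = {"p_value": 0.03, "study_type": "meta-analysis"}
imported = json.loads('{"p_value": 0.3}')

mismatches = find_import_mismatches(verified, imported)
can_ship = not mismatches  # any mismatch or missing field blocks the import
```

Comparing exact values rather than "close enough" values is the design choice here: a shifted decimal point is exactly the kind of error a tolerance would hide.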

81 studies verified through all three gates — 2,268 checks run

Claim Gate

Is Our Multi-Study Synthesis Honest?

Here's where it gets interesting. A study page analyzes one paper. But a claim page — a comprehensive, evidence-based answer to a question like "how much protein per day?" — synthesizes findings from multiple studies. That's a fundamentally different task with fundamentally different failure modes.

When you combine 12 studies into one answer, the temptation is to cherry-pick. To give more weight to the study that tells a better story. To ignore the one study that diverges. Every nutrition site on the internet does this.

The Claim Gate checks the synthesis itself. Did we include all relevant studies — even the ones that diverge? Is our consistency index justified? Does our certainty tier reflect the actual evidence quality?

Did we cherry-pick evidence?

If 9 studies support our answer and 1 diverges, we must include all 10. The diverging study doesn't weaken our answer — it makes it honest.

Is our consistency index justified?

Every claim gets an Evidence Consistency Index (0-100) and a Certainty Tier (High, Moderate, or Low). The Skeptic checks whether these are mathematically justified. Inflated consistency = instant fail.

Does our writing match the synthesis?

The synthesis is verified — but did our writing, our audio, our kitchen translation stay faithful? Every sentence gets checked against the evidence base.

Would a researcher agree?

If an independent researcher looked at the same studies and said "yes, that's a fair representation" — it passes. "Well, that's one way to read it" — it doesn't.
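The cherry-picking check reduces to a set comparison: every study that was analyzed must appear in the claim. A minimal Python sketch, assuming each study is identified by a slug (the slugs and the function name are hypothetical, not FitChef's actual pipeline code):

```python
def omitted_studies(analyzed: set[str], cited: set[str]) -> set[str]:
    """Return analyzed studies that never made it into the claim."""
    return analyzed - cited


# Hypothetical slugs: 12 studies analyzed, only 9 cited in the claim.
analyzed = {f"study-{n}" for n in range(1, 13)}
cited = {f"study-{n}" for n in range(1, 10)}

missing = omitted_studies(analyzed, cited)
claim_passes = not missing  # any omitted study fails the gate
```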

Library Gate

Does Our Flagship Accurately Represent the Answers?

Our flagship guides weave multiple verified answers into one flowing editorial. This is the most creatively demanding content FitChef produces — and creative writing is exactly where facts drift.

The Library Gate checks every claim reference. If we say "9 out of 12 studies agree" — do they? If we label something High Certainty — does the evidence actually support that? The flagship must represent the evidence, not embellish it.

Five Layers Deep

When you read a fact in one of our flagship guides, that fact has been verified five separate times — each catching a different type of error:

Library Gate: Does the guide accurately represent the answers?

Claim Gate: Is the multi-study synthesis honest and consistent?

Study Gate 3: Is the import package complete and consistent?

Study Gate 2: Does our writing match the verified findings?

Study Gate 1: Do our numbers match the original paper?

Most nutrition platforms don't have even one verification gate. We have 5 — because each layer of content introduces a new way for facts to drift, and each gate is designed to catch that specific drift.

I don't care how good the writing is. If the paper says "small benefit in trained athletes eating 2g/kg" and our page says "everyone needs more protein" — I kill it. If 12 studies were analyzed and only 9 made it into the claim — I kill it. That's the whole point.

— The Skeptic

28 Things That Kill Content Instantly

No appeals. No editorial override. No second chances.

Gate 1: 11 Data Kill Switches

Main finding doesn't match the paper

Study has been retracted

Fabricated or misattributed quotes

Wrong study type reported

Sample size off by more than 10%

Evidence cherry-picked in synthesis

Consistency index inflated beyond evidence

Claim broader than evidence supports

Editorial language stronger than the paper's own words

Data or statistics not found in the original paper

Finding claimed for a wider scope than the study tested

Gate 2: 17 Content Kill Switches

Narrative claims not sourced to paper or marked as context

Info gain presented as study finding

Persona actions not grounded or marked as extrapolation

Answer capsules distort meaning through compression

Audio script exaggerates findings

Inference errors detected

Precision attribution incorrect

Hype language in human-facing fields

Prescriptive health language not attributed to research

Hedging doesn't match evidence strength

Unsourced hard claims in human-facing content

Real-world translation factually incorrect

External citations not properly hedged

Supplementary blocks fail magazine quality gate

Health-condition study disclaimer missing

Source URLs not live or resolvable

Medical verdicts stated as FitChef's own claims

How We Verify Data Integrity

Surviving the gates is step one. After that, every piece of content carries a Verification Ledger — a transparent record of exactly what was checked and whether it passed. Studies and claims use different verification systems because they measure different things.

Study Verification Ledger (6-Point Check)

Every study page carries a Verification Ledger — a complete record of every check the pipeline ran. Each check is pass or fail. No scores. No ratings. No editorial opinion about the research itself. FitChef verifies our own data extraction fidelity, not the quality of the science.

Number Accuracy: Every number matches the original paper

Quote Fidelity: Every quote is word-for-word accurate

Framing Integrity: Our conclusions match the researchers' conclusions

Limitation Coverage: Every limitation the researchers noted, we note too

Citation Chain: Every claim traces back to the source

Import Integrity: The published page matches the verified data exactly

Claim Certainty Tiers

When multiple studies are synthesized into a claim, the claim gets an Evidence Consistency Index and a Certainty Tier, both visible on every claim page. These are algorithmically determined from the underlying evidence, not editorially assigned. They show how consistent the evidence is across studies and how certain the conclusion is.

High Certainty (85-100): Strong consistency across multiple high-quality studies. Meta-analyses and large RCTs converge.

Moderate Certainty (50-84): Good consistency with some limitations. Limited RCTs or mixed effect sizes across populations.

Low Certainty (0-49): Preliminary or limited evidence. Early-stage research, small samples, or divergent findings.

These tiers are algorithmically determined — not editorially assigned. FitChef does not rate or judge the quality of research. We verify our own data extraction and report the consistency of evidence across studies. The Certainty Tier tells you how much the studies agree with each other, not how good we think the science is.
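The tier boundaries listed above map directly onto a lookup. This is a minimal Python sketch of that mapping only; how the Evidence Consistency Index itself is calculated is not described here, and the function name is hypothetical.

```python
def certainty_tier(consistency_index: int) -> str:
    """Map an Evidence Consistency Index (0-100) to its Certainty Tier."""
    if not 0 <= consistency_index <= 100:
        raise ValueError(f"index out of range: {consistency_index}")
    if consistency_index >= 85:
        return "High"
    if consistency_index >= 50:
        return "Moderate"
    return "Low"


tiers = [certainty_tier(score) for score in (92, 84, 49)]
```

Rejecting out-of-range values instead of clamping them means a malformed index fails loudly rather than silently landing in a tier.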

For a focused explanation of what each tier means and how it's calculated, see How Certainty Works.

When the Science Changes, Everything Updates

Science isn't static. Studies get retracted. New evidence surfaces. When that happens, most sites quietly update one page. We do something different.

When a study is corrected on FitChef, that correction automatically cascades through every piece of content that references it. The study page gets updated. Every claim citing that study gets flagged. Every flagship guide citing those claims gets flagged. Everything is re-verified and logged publicly.
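One way to picture the cascade is as a walk over "is cited by" links, starting at the corrected study. The sketch below is a hedged Python illustration under that assumption; the identifiers and the dictionary layout are hypothetical and do not describe FitChef's actual data model.

```python
from collections import deque


def content_to_reverify(corrected: str, cited_by: dict[str, list[str]]) -> list[str]:
    """Breadth-first walk over 'is cited by' links from the corrected study."""
    flagged: list[str] = []
    seen = {corrected}
    queue = deque([corrected])
    while queue:
        current = queue.popleft()
        for dependent in cited_by.get(current, []):
            if dependent not in seen:
                seen.add(dependent)
                flagged.append(dependent)
                queue.append(dependent)
    return flagged


# Hypothetical graph: one study feeds two claims, both feed one guide.
cited_by = {
    "study:example-a": ["claim:example-1", "claim:example-2"],
    "claim:example-1": ["guide:example"],
    "claim:example-2": ["guide:example"],
}
flagged = content_to_reverify("study:example-a", cited_by)
```

The `seen` set ensures a guide that cites two affected claims is flagged once, not twice.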

Nothing stays stale. Every change is timestamped and visible on our corrections log.

Is This Efficient? No. That's the Point.

Running 5 verification gates across 3 pipelines takes time. It would be faster to just check once. It would be faster to not check at all — that's what most sites do.

But why would you trust us if we aren't 100% sure? You wouldn't. So we pay the cost. Every study, every claim, every flagship guide. No shortcuts.

Are we perfect? No. Science changes. What we can commit to is that everything on FitChef accurately represents what the research says — and when it changes, we update everything.

When We Catch Something, You See It

We could hide our Skeptic's findings. We could quietly fix things. We don't.

Every correction is permanent, timestamped, and public. The old version stays visible. Nothing gets quietly edited.

And our Skeptic isn't static. Every time it catches a pattern, it upgrades itself. The changelog below updates automatically.

On an internet drowning in AI-generated noise and confident-sounding nonsense, this is the only way to earn trust. Not by asking for it. By proving it.

The Numbers Right Now

81 Studies Verified

91 Claims Grounded

2,268 Verification Checks Run

0 Corrections Published

The Skeptic Changelog

Our verification system evolves. Every time the pipeline catches a pattern, the system upgrades. Here's every change, documented automatically.

Pull Quote Isolation Test — standalone readability enforcement

Added a mandatory Isolation Test to the narrative writer (R1) and field verification (R7) agents. Every pull quote must now pass a three-question self-containedness check: can a reader identify the subject, the claim, and at least one specific data point without any article context? A 60-character minimum floor catches compressed fragments mechanically. Pull quotes that work as narrative pacing devices but fail as standalone cards are now blocked before they reach the site.

Quality review found pull quotes rendering as vague fragments ('A 73% crash.', 'More going in. Less showing up.') — dramatic as narrative beats but meaningless as standalone cards. Data analysis across ~75 studies confirmed only 2 outliers, both from recent runs. Dual execution failure: writer produced staccato fragments, verification agent passed them despite existing standalone readability check.
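The 60-character floor is the mechanical part of the Isolation Test. A minimal Python sketch of that one check, assuming length is measured on the trimmed quote (the function name is hypothetical, and the three-question self-containedness check is not modeled here):

```python
MIN_PULL_QUOTE_CHARS = 60  # the floor named in the changelog entry


def passes_length_floor(pull_quote: str) -> bool:
    """True if the trimmed quote meets the minimum character count."""
    return len(pull_quote.strip()) >= MIN_PULL_QUOTE_CHARS


# The two fragments quoted in the quality review.
fragments = ["A 73% crash.", "More going in. Less showing up."]
results = [passes_length_floor(quote) for quote in fragments]
```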

Visual-vs-prose consistency gate added

When a visual element displays a data value alongside a threshold or benchmark, the pipeline now verifies that the body text's characterization of the relationship between those numbers is mathematically defensible. A visual that shows readers one thing while the prose claims another undermines trust — this gate catches the mismatch before publication. Added to both the Visual Team (VT) and Short Verifier (SV1) specs.

External AI review flagged a live Short where a visual showed a value just below a threshold while the body text claimed the value 'cleared' it. The underlying science was correct, but the prose overclaimed relative to what the reader could see.

Pull quote attribution now distinguishes editorial synthesis from verbatim quotes

Pull quotes on study and Short pages were rendered with curly quotation marks and researcher-name attribution (e.g., '— Morton et al. 2018'), which visually implied the text was a direct quote from the paper. In reality, pull quotes are always FitChef's editorial synthesis — rewritten in plain language, never raw academic text. The theme now renders pull quotes without quotation marks and with 'Based on {source}' attribution, making it clear the insight is our interpretation of the research, not the researcher's own words. Pipeline specs (R1, R7, Shorts rendering model, Component Library) updated to reflect and enforce the new format. 142 study pull quotes and 116 Short screenshot sentences are affected — all corrected automatically by the theme change with no import JSON edits needed.

External review (Gemini) flagged that a pull quote on the Collagen Short attributed editorial prose to Ravindran et al. (2026) in a format indistinguishable from a verbatim quote. Audit confirmed the pattern was systemic across all studies and Shorts.

Statistical notation gate now catches correlation coefficients

R9's Zero PubMed mechanical code gate previously caught p-values, confidence intervals, heterogeneity notation, effect sizes, and odds/risk ratios — but not correlation coefficients (r-values). A live Short shipped with raw r=0.07 and r=0.60 in reader-facing text because the gate's regex didn't include the pattern. The gate now scans for correlation coefficient notation and translates it to plain language, closing the gap.

Live Short ('why-regain-weight-after-every-diet') found with raw correlation coefficients in takeaway box, body text, SEO description, and speakable text — all Magazine layer fields that should contain zero statistical notation.
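A regex-based scan of this kind might look like the sketch below. It is a hedged Python illustration, not R9's actual gate: the pattern set is an assumption built from the notation types named in this entry, and it only detects notation, it does not translate it to plain language.

```python
import re

NUM = r"[-−]?\d*\.?\d+"  # plain or Unicode minus, optional leading digits

# Hypothetical patterns, one per notation type named in the entry.
STAT_PATTERNS = {
    "p-value": re.compile(rf"\bp\s*[<>=≤≥]\s*{NUM}", re.IGNORECASE),
    "confidence interval": re.compile(r"\b\d{2}\s*%\s*CI\b", re.IGNORECASE),
    "heterogeneity": re.compile(rf"\bI(?:²|2|\^2)\s*=\s*{NUM}"),
    "effect size": re.compile(rf"\b(?:SMD|[dg])\s*=\s*{NUM}"),
    "odds/risk ratio": re.compile(rf"\b(?:OR|RR|HR)\s*=?\s*{NUM}"),
    "correlation coefficient": re.compile(rf"\br\s*=\s*{NUM}"),
}


def find_stat_notation(text: str) -> list[tuple[str, str]]:
    """Return (label, matched text) for every statistical notation hit."""
    hits = []
    for label, pattern in STAT_PATTERNS.items():
        for match in pattern.finditer(text):
            hits.append((label, match.group(0)))
    return hits


sample = "The link was weak (r=0.07) in one group and stronger (r=0.60) in another."
hits = find_stat_notation(sample)
```

The gap described above corresponds to the last dictionary entry: without it, the same sample text produces no hits at all.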

Short ↔ Recipe Bridge — cross-content grounding through shared evidence sources

Shorts and Recipes are now connected bidirectionally when they share a grounded evidence source (study DOI, claim slug, or internal synthesis register entry). SV1 Gate 7D scans recipe imports when verifying a Short; RR2 Step 7.7 scans Short imports when building a recipe. Both directions enforce the HELL YES threshold: the recipe must physically deploy the mechanism the Short explains. Categorical similarity ("both about protein") is never sufficient. Three new kill switches enforce slug verification, mechanistic-only grounding, and a per-page cap. The bridge creates a knowledge graph where readers can move between "why this works" (Short) and "put this into practice" (Recipe) — grounded by the same underlying evidence.

Architecture decision to connect the two newest content types through shared evidence rather than topic similarity.

P4 Source Hunter now cross-checks bomb derivations against verified extraction data

Added a mandatory derivation cross-check (Step 4B) to the Source Hunter agent. Every number in a bomb derivation that claims to come from the study must now match a verified point estimate in the extraction — not a confidence interval bound, subgroup footnote, or table annotation. All derived values (ratios, percentages) are recalculated from verified point estimates. This catches the most common derivation error: using a CI bound instead of an effect size.

During the Dupuy 2018 recovery study run, P4 used a confidence interval bound (g = −0.20) instead of the point estimate (g = −0.47) for cold water immersion. This produced an incorrect 11x ratio instead of the correct ~5x ratio. The error propagated through the grounded file and was not caught until the Number Skeptic (R6) — 8 agents later. Creative content used the correct value from extraction, but the discrepancy persisted in the grounded info-gain file.
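The cross-check can be sketched as a membership test: every study-sourced input to a derivation must match a verified point estimate. This is a minimal Python illustration under that assumption, not the Source Hunter's actual Step 4B; the function name is hypothetical, and only the two g values come from the incident described above.

```python
import math


def unverified_inputs(inputs: list[float], point_estimates: list[float]) -> list[float]:
    """Return derivation inputs that match no verified point estimate."""
    return [
        value
        for value in inputs
        if not any(math.isclose(value, estimate, abs_tol=1e-9) for estimate in point_estimates)
    ]


# g = -0.47 is the verified point estimate; g = -0.20 is a CI bound.
verified_point_estimates = [-0.47]

rejected = unverified_inputs([-0.20], verified_point_estimates)
accepted = unverified_inputs([-0.47], verified_point_estimates)
```

Because confidence interval bounds are never added to the list of point estimates, a bound used by mistake shows up in `rejected` before any ratio is calculated from it.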

Medical Topic Exclusion Gate added to Shorts pipeline

New two-layer gate systematically excludes keywords that target medical conditions from the Shorts pipeline. Keywords about blood sugar management, blood pressure, depression treatment, cardiovascular disease, or any clinical condition are now automatically killed — even when the evidence is strong and actionable. FitChef is a fitness editorial platform for people building their bodies, not a health information site for people managing medical conditions. SK0 (Keyword Scout) enforces this at mining time; SF1 (Short Fueler) enforces it again as a safety net at production time. Seven keywords from the June 2026 bank refill were retroactively killed under this gate.

SK0 v2.9 bank refill admitted 7 keywords targeting blood sugar, blood pressure, depression treatment, and heart health. All passed the existing Body-Composition Filter (their answers DO change behavior) but their audiences are patients managing conditions, not gym-goers optimizing performance.

External source claims now require human verification

When a claim page names a specific external source with a specific number (e.g., 'Healthline says 44 to 78 grams'), the pipeline now requires that source+number pair to be human-verified — the same standard applied to scientific evidence. Previously, web-fetched competitive data from SEO research could enter narrative prose without human verification. The pipeline already distinguished cultural fuel from grounded evidence at the research stage, but the distinction was not enforced at the writing stage or caught by the content skeptic. Two new gates close this gap: a writer-level rule preventing unverified named-source claims, and a skeptic-level gate catching any that slip through.

Mark's review of CL-004 creative output identified that web-fetched competitive data was being used as grounded facts in prose.

Macro attribution accuracy gate added to recipe pipeline

Added a new verification rule (SR9 + KS-RR2-19) ensuring that nutritional values stated in recipe descriptions, audio scripts, and FAQ answers are attributed to the whole meal rather than a single ingredient. Recipe macros describe the combined contribution of every ingredient on the plate. An audit of 201 recipes found cases where editorial text credited the full meal protein to one ingredient while grains, vegetables, and other sources contributed meaningfully. The pipeline now requires meal-level language for meal-level numbers.

Manual audit discovered recipe audio scripts attributing total meal protein to a single protein source, ignoring contributions from grains, legumes, and vegetables.

Cross-Content Consistency Gate added — catches contradictions between pages

Added a new verification gate (Gate 1C, KS-SV1-46) that checks every claim in new content against all existing live content for contradictions. Each piece of content was already verified against its own sources individually, but two pages could each be grounded in real research and still contradict each other if the interpretation drifted. This gate catches that drift before it goes live. It was added after we discovered two live Shorts making conflicting claims about the same tissue: one called muscle "the strongest predictor of calorie burn" (an interpretation drift from the source, which said "fat-free mass"), while the other correctly noted muscle is "metabolically quiet at rest" at 6 calories per pound per day. Both were individually grounded in real studies. Neither was wrong about its own source. But the reader saw two pages on the same site saying opposite things. The contradiction has been corrected, and the new gate prevents any future content from shipping with cross-page inconsistencies.

Manual audit discovered two live Shorts (KW-035 muscle-does-not-turn-to-fat and KW-015 muscle-calories-at-rest) making contradictory claims about muscle's metabolic contribution. Each was individually grounded, but they conflicted because the interpretation drifted from the source's verbatim wording.

Body-Composition Filter now enforced across all content tiers including Shorts

The Body-Composition Filter — which ensures every FitChef page changes how you eat, train, or build your body rather than providing medical reassurance — was already enforced at study tier (C0 Cluster Architect) and claim tier (Claim Rule 8). It was NOT enforced in the Shorts pipeline. Two high-demand keywords passed every existing gate (separation, evidence mapping, competition assessment) but their honest answers were medical reassurance: 'your kidneys are fine' and 'it's safe long-term.' Both killed. SK0 (Keyword Scout) now has a mandatory Body-Composition Filter gate that catches these keywords before they enter the production queue. SF1 (Short Fueler) has a safety-net gate that catches any that slipped through older bank versions. Editorial Foundation Rule 9 — the filter applies at every tier identically — is now mechanically enforced end-to-end.

Keyword bank audit found KW-021 ('is creatine safe long term') and KW-024 ('will too much protein damage kidneys') in the production queue. Both had massive search demand and zero competition — but both fail the Body-Composition Filter because their answers are safety reassurance, not behavioral change. The protein cluster build plan had already excluded the kidney topic at study tier with cut_reason 'medical_reassurance.'

Guide headline stats now provably match the studies on the page

A guide's headline numbers — how many studies and how many participants — must now be computed from that guide's own analyzed studies, not borrowed from any single underlying question. During an internal audit we found one guide whose 'total participants' figure had been pulled from a single claim's evidence set, most of it from one large study, which made the headline unrepresentative of the guide as a whole. We corrected the figure to reflect the guide's actual study set, and added a verification step that blocks any headline stat that can't be reconstructed from the studies shown on the page.

Internal audit found the Carbs guide's hero stat reported a participant total lifted from one claim (dominated by a single 68,128-participant study) instead of the guide's own studies.
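A reconstruction check of this kind could be as simple as recomputing both headline numbers from the guide's own study list. The Python sketch below is an assumption-laden illustration: the record layout and slugs are hypothetical, and skipping studies with no reported participant total is a choice made for this example, not a documented FitChef rule. Only the 68,128 figure comes from the audit note above.

```python
def headline_matches_studies(
    claimed_studies: int, claimed_participants: int, studies: list[dict]
) -> bool:
    """True only if both headline numbers can be rebuilt from the page's studies."""
    reported = [s["participants"] for s in studies if s["participants"] is not None]
    return claimed_studies == len(studies) and claimed_participants == sum(reported)


# Hypothetical guide with three studies; one reports no participant total.
guide_studies = [
    {"slug": "study-a", "participants": 120},
    {"slug": "study-b", "participants": 48},
    {"slug": "study-c", "participants": None},
]

reconstructable = headline_matches_studies(3, 168, guide_studies)
borrowed = headline_matches_studies(3, 68128, guide_studies)
```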

Removed a misleading “extra” from an ultra-processed-food hero stat

The study's headline read '340 extra calories.' The 340 figure is grounded — it is how many calories the measured eating rate delivers in a 20-minute meal — but the word 'extra' implied it was the between-diet difference, which is actually 508 calories per day (stated correctly in the key finding). We removed 'extra' so the headline number is unambiguous.

Internal audit found the hero stat labeled a 20-minute intake figure as 'extra' calories (Hall 2019).

Corrected a meta-analysis's “participants” figure that was actually its study count

On the sugar-and-body-weight study page and a claim that cited it, the 'Participants' figure read 68 — but 68 is the number of studies the meta-analysis pooled (30 randomised trials plus 38 prospective cohort studies), not participants. The source paper reports per-analysis effect sizes, not a single participant total, so we removed the incorrect figure rather than substitute an invented one.

Internal audit traced the study page's 'Participants: 68' to the meta-analysis's pooled study count (Te Morenga 2013, BMJ).

Removed unverifiable sample size from Schoenfeld 2017 load meta-analysis

The Schoenfeld 2017 study page stated '684 participants' as the pooled sample size across 21 studies. During the deep pre-scale audit, we verified this number against the full paper (J Strength Cond Res 31(12):3508-3523): the paper does not report a pooled participant total. Table 1 lists per-study sample sizes summing to 630, but this includes control groups excluded from the meta-analysis — the actual analyzed N is lower and unreported. 684 appeared nowhere in the paper text, abstract, or tables. Rather than replace it with a derived estimate, we removed the participant count entirely and retained '21 studies' as the scope descriptor, consistent with what the paper itself reports. The training library flagship participant total was updated accordingly (9,350 → 8,666). A cross-reference in the Zhang 2025 exercise-deficit study was also corrected.

Deep pre-scale quality audit (Check 1: study number traceability). Verified 17,599 numbers across 224 published pieces; Schoenfeld 684 was one of two medium findings.

Corrected protein badge to match EU nutrition claim threshold

The Stuffed Portobello Mushrooms recipe displayed a 'High Protein' badge, but protein provided only 16.7% of total energy — below the EU EFSA threshold of 20% required for 'high protein' claims (Regulation 1924/2006). The badge was changed to '35g Protein' (a factual statement, not a regulated claim). The recipe pipeline spec (RR2) was updated with mandatory EFSA threshold checks so future recipes are automatically verified before a nutrition badge is assigned.

Deep pre-scale quality audit (Check 17: EFSA health-claim compliance).
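The badge correction comes down to one ratio: protein energy as a share of total energy. Here is a minimal Python sketch, assuming 4 kcal per gram of protein and the 20% threshold cited above. The 840 kcal meal total is a hypothetical figure chosen so the share lands near the reported 16.7%, since the recipe's actual calorie total is not stated here, and the function names are illustrative rather than taken from the RR2 spec.

```python
KCAL_PER_G_PROTEIN = 4        # standard energy conversion factor for protein
HIGH_PROTEIN_THRESHOLD = 0.20  # share of total energy required for the claim


def protein_energy_share(protein_g: float, total_kcal: float) -> float:
    """Fraction of the meal's energy that comes from protein."""
    return protein_g * KCAL_PER_G_PROTEIN / total_kcal


def protein_badge(protein_g: float, total_kcal: float) -> str:
    """Use the regulated claim only above the threshold, else state the grams."""
    if protein_energy_share(protein_g, total_kcal) >= HIGH_PROTEIN_THRESHOLD:
        return "High Protein"
    return f"{protein_g:g}g Protein"


share = protein_energy_share(35, 840)  # 840 kcal is an assumed total
badge = protein_badge(35, 840)
```

Falling back to a plain gram count mirrors the correction described above: the number is a factual statement, while "High Protein" is a regulated claim.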

New Rule: Zero Fabrication on Import Fix

Added Rule 24 to the pipeline master instructions. When fixing import validation errors (type mismatches, missing fields), the system must never populate fields with inferred or derived content. Only three actions are permitted: type conversion with empty values, direct verbatim copy from a grounded source file with explicit tracing, or mechanical transforms specified by the pipeline spec. Every non-empty value must trace to an exact source file and field path — otherwise the field stays empty. This closes a gap where reasonable-sounding inferences could bypass the grounding requirement.

During a post-pipeline type-mismatch fix, the system fabricated inclusion criteria by inferring from sample characteristics instead of using empty containers. Caught during import review.

Pull-quote attribution now matches the actual source

The pull-quote blockquote (the shareable sentence shown mid-article in Shorts) attributed the finding to the first study in the fuel list regardless of which study the sentence actually described. For multi-source Shorts this could display the wrong author. A new field now explicitly maps each pull-quote to its correct source, and a kill switch (KS-SV1-34) verifies the match before any Short ships.

Shorts Scale Audit discovered nighttime-carbs Short attributed Sofer et al. 2011's finding (28% more weight loss with dinner carbs) to Gardner et al. 2018 (a different study that happened to be first in the fuel array). Second mismatch found in processed-food-speed-trap Short (Forde finding attributed to Hall).

LLM citation hint now shows actual answer text

The 'AI systems — cite as:' line in the Cite This Short block was showing a CSS selector (.fc-short-takeaway) instead of the actual citation-ready answer for 5 Shorts, and was missing entirely for 18 Shorts. All 27 Shorts now display the correct answer capsule text that AI systems can use for accurate citation.

Shorts Scale Audit discovered 5 Shorts rendering '.fc-short-takeaway' as the LLM citation hint and 18 Shorts with no hint at all. Root cause: SC1 spec described the field ambiguously (CSS selector vs text), and SV1 never included fcc_short_speakable in its field template.

Data Register Verification added to Shorts pipeline

Added five register-level thinking rules (Paste-Back Test, Feelability Gate, Researcher-as-Subject, Authority-Label, Compound-Noun) to SW1 (writer) and a new Gate 6C to SV1 (verifier). These catch sentences that pass vocabulary checks but read like study-results prose rather than FitChef story voice. The existing Gate 6B caught jargon words; Gate 6C catches the register — the frame in which data is presented.

Protein-before-bed Short (KW-001) passed all existing gates but contained '+8.4 cm² vs +4.8 cm²', 'Trommelen and colleagues tracked', and 'Type II muscle fibers' — structurally sound vocabulary but study-report register.

Statistical detail rendering — full numbers now visible on all claim pages

Fixed a rendering gap where 34 of 45 claim pages showed study names in the Statistical Detail accordion but not the actual numbers (effect sizes, p-values, heterogeneity). The theme template now renders both the canonical format and the legacy flat-key format. Pipeline specs (CR7, CR12) updated to mandate the correct structure for all future claims.

CL-002 production run revealed statistical numbers missing from the accordion. Deep audit confirmed 34 of 45 claims affected.

Sidebar participant counts corrected after cross-system audit

An independent AI review flagged discrepancies between article body numbers and sidebar Evidence Base widgets. Investigation traced the root cause: the agent building sidebar data had no mandated source for participant counts, leading to 6 values across 3 clusters that could not be traced to verified study extractions. Every article body number was confirmed grounded — the issue was isolated to the sidebar rendering layer. All 6 values have been corrected to match verified extraction data, and a new kill switch (KS-LIB-11) now requires every sidebar participant count to be sourced from the same grounded extraction files that feed the article body.

Independent AI cross-check flagged body-sidebar number mismatches

New blocking gate prevents sidebar-body number mismatches

Added KS-LIB-12: a mandatory cross-check that compares sidebar participant totals against the verified fuel file breakdown before any library page can be imported. If the sidebar sum does not match the article body total, the import is blocked until the discrepancy is resolved. This gate makes it structurally impossible for sidebar and body numbers to diverge — the same grounded data must feed both.

Number consistency audit revealed no existing gate caught the mismatch
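
The cross-check described above comes down to one comparison: the per-study sidebar counts must sum to the total stated in the article body. A minimal sketch follows, assuming hypothetical field names (`sidebar_participants`, `body_participant_total`) that do not come from the actual KS-LIB-12 spec.

```python
def sidebar_matches_body(sidebar_counts, body_total):
    """True when the per-study sidebar counts sum to the article body total."""
    return sum(sidebar_counts) == body_total


def import_allowed(page):
    """Block the import (return False) when sidebar and body numbers diverge.

    Both keys are hypothetical names used only for this illustration.
    """
    return sidebar_matches_body(
        page["sidebar_participants"], page["body_participant_total"]
    )
```

For example, a page with illustrative sidebar counts of 10, 20, and 30 passes only if the body total is 60.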

Study participant counts standardized to prevent silent parsing errors

Audit discovered that some study pages stored participant counts as freeform text (e.g., '49 studies, 1863 participants' or 'n=20 (10 per group)'). The rendering system extracted the first number it found — sometimes reading a study count as a participant count, or reading zero from text starting with a letter. Seven study imports and three claim imports have been corrected to clean integers, each verified against the original study's grounded extraction data.

Number consistency audit found PHP integer casting silently misreading freeform sample size values
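
The parsing failure above is easy to reproduce. The sketch below is written in Python rather than PHP, so `leading_int` is only an imitation of leading-digits integer casting, and `strict_count` is one possible shape for a "clean integers only" rule. Both function names are hypothetical.

```python
import re


def leading_int(text):
    """Imitate leading-digits casting: read digits at the start of the text, else 0."""
    match = re.match(r"\s*(\d+)", text)
    return int(match.group(1)) if match else 0


def strict_count(value):
    """Accept only a clean positive integer (an int, or an ASCII digits-only string)."""
    if isinstance(value, int) and not isinstance(value, bool) and value > 0:
        return value
    if isinstance(value, str) and re.fullmatch(r"[0-9]+", value) and int(value) > 0:
        return int(value)
    raise ValueError(f"participant count is not a clean integer: {value!r}")


# The two freeform examples quoted in the entry above:
print(leading_int("49 studies, 1863 participants"))  # 49: a study count read as participants
print(leading_int("n=20 (10 per group)"))            # 0: the text starts with a letter
```

With `strict_count`, both freeform strings raise an error instead of silently producing a wrong number.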

Four-category jargon sweep added to Shorts verification

The Short Verifier now scans every Short and its audio script for four categories of technical language: statistical notation, biomedical terms, exercise-science jargon, and paper-apparatus vocabulary. Any sentence a smart 8-year-old wouldn't understand on first read is caught and sent back for rewriting before it can publish. FAQ answers get the same sweep — no p-values or confidence intervals in reader-facing dropdowns.

Quality audit of first three Shorts found jargon leaks that passed verification: statistical notation in prose, untranslated clinical terms in audio.

Audio scripts now grounding-verified against source research

Every number, claim, and finding in a Short's audio script must now trace back to the grounded fuel files — the same verified research that powers the written content. Audio translates verified content into spoken-word format; it can never add new claims, regardless of whether the claim is true. The verifier checks audio against fuel files before the Short can publish.

Architecture review found Shorts audio had no grounding constraint — the study pipeline's audio agent (R4) had this rule but the Shorts pipeline did not.

Citation links now verified against actual published page URLs

The Citability Engineer and Short Verifier now verify every source citation link against the actual published page URL from the import system. Previously, citation URLs could use internal folder names instead of WordPress page slugs, producing broken links. The verifier also ensures external sources with DOIs always have clickable links — readers can click through to verify every cited study.

Review of first three Shorts found four broken citation URLs (wrong slugs) and two external sources with DOIs that had no clickable link.

Universal Citation Verification Gate — every study now verified before entering the pipeline

The citation verification gate (KS-C0-CITATION) now requires every study — whether discovered by Claude or suggested by Gemini — to be verified via WebSearch or WebFetch before it can enter the cluster plan. Previously, only Gemini-suggested studies required verification. This upgrade was triggered when a hallucinated citation ('Ravelli 2019') was detected in the fat-loss cluster plan — the paper did not exist in any indexed source. The real paper had a different author, year, and journal. The hallucination never reached any published page (caught before the study entered the pipeline), but the gap that allowed it to enter the plan at all has now been closed. Every citation in every cluster plan is now tool-call verified.

Hallucinated citation detected in fat-loss cluster architecture during production queue review

Answer capsule standalone gate now catches title-dependent openers

The answer capsule appears in featured snippets, AI citations, and the answer hero card — all contexts where the page title may be absent. The standalone gate previously verified that the answer named its subject and embedded its purpose, but did not check whether the opening word was a direct response to the headline question. An answer starting with 'No —' or 'Yes —' only makes sense if the reader just read a yes/no question above it. In standalone surfaces, there is no question. The gate now includes a fifth test: the first word must make sense without a question preceding it.

CL-005 test-boosters-mostly-scam-one-exception — answer_short 'No — three out of four ingredients...' passed all four existing tests but opened with a title-dependent 'No —' that has no referent in standalone contexts
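
The fifth test can be approximated mechanically. The sketch below flags answers that open with a bare "Yes" or "No" followed by punctuation. The opener list is an assumption: the entries in this changelog name only those two openers, and the real gate may check more.

```python
import re

# Assumed opener list: "yes" or "no" followed by a dash, comma, period, or colon.
# \u2014 and \u2013 are the em dash and en dash.
TITLE_DEPENDENT_OPENER = re.compile(
    r"^\s*(yes|no)\b\s*[\u2014\u2013\-,.:]", re.IGNORECASE
)


def opens_title_dependent(answer_short):
    """True when the answer's first word only makes sense after a yes/no question."""
    return bool(TITLE_DEPENDENT_OPENER.match(answer_short))
```

Both openers quoted in this changelog (the CL-005 "No" opener and the CL-003 "Yes" opener) would be flagged, while an answer that merely begins with a word such as "Nothing" would not.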

New blocking gate: Answer Capsule Standalone Verification

Added Step 3E to the Content Skeptic (CR10) — a blocking gate that verifies the answer capsule works as a standalone statement without the page title. The answer capsule appears in Google Featured Snippets, AI citations (ChatGPT, Perplexity), and the answer hero card — contexts where the title may be absent. The gate checks that the text names its subject and embeds its purpose. Previously, if the answer capsule writer (CR4) falsely logged a standalone PASS, no downstream agent caught it. On CL-003, the answer started with 'Yes — but the boost is smaller than it feels' — which doesn't tell the reader WHAT is being discussed without the title above it. The gate now catches this pattern before publication.

CL-003 preworkout-caffeine-small-real-edge — answer_short standalone failure propagated through entire pipeline undetected

Doctor’s-letter test expanded to body prose — 4 prescriptive sentences caught and fixed

A full legal audit of all 60 import JSONs across 4 clusters found 4 sentences in claim body prose that read like dietary prescriptions rather than evidence reporting — all using 'aim for' with specific gram or frequency targets. The verification gate (CR10 Step 3A2) already scanned persona actions, FAQs, and skeptic notes for this pattern, but body prose was not included in the scan. All 4 sentences have been rewritten to describe what the evidence found rather than tell the reader what to do, and body prose is now included in the gate so the pattern cannot recur.

Pre-scaling legal audit of all import JSONs (2026-05-12)
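
A scan of this kind can be sketched as a small pattern match over the text fields of an import JSON. The phrase list below uses only imperatives quoted in this changelog, and the field names are hypothetical; the actual CR10 Step 3A2 gate may cover more phrasing and different fields.

```python
import re

# Imperatives quoted in the audit entries; a real gate would likely need a longer list.
PRESCRIPTIVE = re.compile(r"\b(aim for|push toward|you should)\b", re.IGNORECASE)


def prescriptive_hits(import_json, fields=("body", "faq", "persona_actions", "skeptic_note")):
    """Return (field, phrase) pairs for every prescriptive phrase found.

    The default field names are assumptions made for this illustration.
    """
    hits = []
    for field in fields:
        text = import_json.get(field, "")
        for match in PRESCRIPTIVE.finditer(text):
            hits.append((field, match.group(0).lower()))
    return hits
```

An empty result means the scanned fields pass; any hit would block the claim until the sentence is rewritten as evidence reporting.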

New mechanical gate catches raw statistical notation in narratives

A cross-cluster audit found that 7 of 23 study narratives contained raw statistical notation (p-values, confidence intervals, heterogeneity scores) that should have been translated to plain language. The existing Zero PubMed rule relied on agent judgment, which interpreted 'translate and keep' as compliant. Three spec upgrades close the gap: R1 now requires notation to be translated AWAY (not accompanied), R3 tightens the story-point exception to at most one instance per narrative, and R9 adds a mandatory regex code gate that mechanically catches any remaining notation before publication. Six study narratives were corrected.

Mark’s review of Schwingshackl 2013 live page revealed p-value density. Cross-cluster audit confirmed pattern across carbs and meal-timing clusters.
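
A mechanical notation gate of the kind R9 describes might look like the sketch below. The three patterns are assumptions chosen to match common ways of writing p-values, confidence intervals, and heterogeneity scores; they are not the patterns in the actual spec.

```python
import re

# Assumed patterns for three kinds of raw statistical notation.
NOTATION_PATTERNS = {
    "p_value": re.compile(r"\bp\s*[<>=\u2264]\s*0?\.\d+", re.IGNORECASE),
    "confidence_interval": re.compile(r"\b\d{2}\s*%\s*CI\b", re.IGNORECASE),
    "heterogeneity": re.compile(r"\bI(?:\u00b2|\^2|2)\s*=\s*\d+(?:\.\d+)?\s*%"),
}


def find_notation(text):
    """Return (kind, matched_text) pairs for every notation match in the text."""
    found = []
    for kind, pattern in NOTATION_PATTERNS.items():
        found.extend((kind, m.group(0)) for m in pattern.finditer(text))
    return found


def notation_gate(text, allowed=0):
    """Pass only when the narrative has at most `allowed` notation instances."""
    return len(find_notation(text)) <= allowed
```

Setting `allowed=1` is one way to model the single story-point exception, though how the real gate handles that exception is not stated here.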

Audio scripts now pass the doctor’s-letter test

A pre-scale legal audit of all 31 audio scripts found 5 scripts containing prescriptive imperatives ('aim for X grams', 'push toward X') — the same pattern found and fixed in written persona actions. All 5 scripts were surgically edited to replace imperatives with evidence-reporting language. Two audio scriptwriter specs were upgraded: CR9 (v2.4.0) for claim audio and R4 (v2.6) for study audio both now include an explicit doctor’s-letter test gate (CR9 Step 2C, R4 SR7B). The disclaimer check confirmed all 31 scripts already end with a spoken 'not medical advice' disclaimer — that system was already working correctly.

Pre-scale legal audit — Check 2 (audio scripts) following Check 1 (persona actions)

FAQ answers and skeptic notes now pass the doctor’s-letter test

A comprehensive legal audit of all 164 FAQ answers and all skeptic notes across both clusters found 5 FAQ answers and 1 skeptic note containing prescriptive imperatives ('aim for X grams', 'you should aim for', 'eat more protein'). All 6 were surgically edited to replace imperatives with evidence-reporting language while preserving the same practical information. A final sweep of all 30 import JSONs confirmed zero prescriptive patterns remaining in any field.

Pre-scale legal audit — Checks 6 and 12 (FAQ fields and skeptic notes) following Checks 1-2 (persona actions and audio scripts)

Persona actions and translations now pass the doctor’s-letter test

A pre-scale legal audit found that 14 persona action fields across 13 claims contained direct imperatives ('Aim for X grams', 'Push toward X', 'Move X from Y to Z') that could be read as medical advice rather than evidence reporting. All 22 affected fields across both clusters were rewritten to describe what the research tested and found, letting readers infer the action themselves. Three pipeline specs were upgraded with a new blocking gate: CR7 (v2.11.0) now bans prescriptive imperatives in persona actions and real-world translations, CR10 (v1.7.0) adds a mandatory doctor’s-letter verification scan before any claim can pass Gate 2, and R2 (v2.24) extends the same rule to study so_what fields. The reframe preserves the same practical information in the same plain language — the only change is whether FitChef tells readers what to do (old) or reports what was studied (new).

Comprehensive legal audit of all live content across protein and meal-timing clusters before scaling

Claim Pipeline goes live — first multi-study synthesis verified

The Claim Pipeline (Phase 2) produced its first verified claim: 'Does intermittent fasting actually give you a better body than regular dieting?' This claim synthesizes 4 studies (529 participants) through a 15-agent pipeline with 18 verification steps, including evidence consistency scoring, content verification against all source papers, reader simulation, and Gemini cross-check. The claim pipeline adds a new layer of verification on top of individual study checks: every factual statement in the synthesis traces back to a verified study extraction, and a quality audit scores the final content across 5 dimensions (evidence integrity, content quality, dwell time design, SEO readiness, trust layer). This first claim scored 91.3/100 composite — above the 85 ship threshold.

First claim reaching production readiness through the complete claim pipeline

Legal safety audit v2: numeric scores removed from public pages + claim audio disclaimer added

A comprehensive 15-point legal safety re-audit of all FitChef systems found and fixed 4 places where internal numeric scores (consistency_index/100, trust_score/5) were rendered on public pages: claim cards, cluster hub rows, and OG meta descriptions. These numeric scores could imply medical authority or study validation (a Legal Audit §1.3 violation), so all were replaced with human-readable certainty tier labels. The audit also found that claim audio scripts were missing the legal disclaimer that study audio scripts have had since day one. A mandatory disclaimer specification was added to CR9 (Audio Scriptwriter), with a matching verification gate in CR10 (Content Skeptic). Prescriptive heading language on study pages was also fixed.

Full legal re-audit requested after many system iterations since original April 2026 audit

Internal numeric scores removed from all public outputs

Internal quality metrics (consistency index, data fidelity rate) were previously exposed in REST API responses, JSON-LD structured data, HTML meta tags, and AI-facing llms.txt. These numbers are used internally for sorting and quality gates but could imply FitChef is rating or evaluating science — which contradicts our identity as a content platform. All numeric scores have been removed from every public-facing output. The public now sees only human-readable certainty tiers (High/Moderate/Low) and factual counts (studies verified, claims grounded). The numbers continue to work behind the scenes for quality control.

Legal safety audit v2 — making all public outputs rock-solid for FitChef's identity as a content platform, not a research evaluator

Sidebar fixes + Legal text sweep + Mobile horizontal scroll fix

Multiple theme fixes shipped in v12.15.6: (1) Study sidebar 'What kind of study is this?' disclosure styled with custom CSS chevron. (2) Claim audio playlist CTA is now sticky at bottom-right. (3) Full legal text sweep — updated footer disclaimer, methodology page, skeptic protocol, and cite block headings to remove any implication FitChef conducts original research. (4) Claim card text overflow fixed with CSS line-clamp. (5) Studies archive mobile horizontal scroll fixed — Pattern A center columns now use split overflow (overflow-x: hidden, overflow-y: visible) instead of overflow: visible which broke horizontal containment.

Visual QA audit + Legal safety review

Claim pipeline: 3 automated code gates + voice-register enforcement added

Deep audit of claim pipeline vs. study pipeline revealed 12 quality gaps. Added three mandatory Python code gates to the claim editorial polish agent: bold distribution gate (ensures every content section has visual anchoring), paragraph length gate (max 400 characters — prevents wall-of-text rendering on mobile), and dense-outcome gate (catches results-section-style data dumps). Also added Pillar B voice-register enforcement matching the study pipeline's proven system — four-category jargon sweep, one-voice consistency check across all page fields, and gender-neutral language enforcement. These gates run automatically on every claim before it can pass editorial review.

Comprehensive claim vs. study pipeline comparison revealed claim pipeline lacked the mechanical quality enforcement the study pipeline has had since v3.5
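
Of the three code gates, the paragraph length gate is the simplest to picture. The sketch below assumes paragraphs are separated by blank lines and that length is counted in characters after trimming whitespace; the function name and those details are assumptions, and only the 400-character limit comes from the entry above.

```python
import re

MAX_PARAGRAPH_CHARS = 400  # limit stated in the entry above


def overlong_paragraphs(text, limit=MAX_PARAGRAPH_CHARS):
    """Return (index, length) for every paragraph longer than the limit."""
    paragraphs = [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]
    return [(i, len(p)) for i, p in enumerate(paragraphs) if len(p) > limit]
```

An empty list means the claim passes this gate; any entry points the editor at the paragraph to split.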

First Claim Pipeline Execution Complete (CL-001)

The claim pipeline (Phase 2) processed its first claim: 'How much protein do you actually need per day?' (CL-001). 18 agents synthesized evidence from 4 studies (Morton 2018, Schoenfeld 2013, Nunes 2022, Jäger 2017 — Consistency Index 87, High Certainty). Quality audit score: 91.5 (SHIP). Gemini external review completed with one accepted fix. Infrastructure validation (CR13): PASS_WITH_GAPS — 0 blocking, 1 non-blocking (bridge metadata field not yet in plugin registry; bridge content preserved in body prose). Trust audit (CR14): PASS — 0 kill switches triggered, evidence chain fully traceable, trust page consistency confirmed (ClaimReview schema, LLM meta tags, Skeptic Protocol references all present).

Claim pipeline (Phase 2) completed first full production run for protein cluster CL-001

Cluster Architect gains meta-analysis mandatory check and tension splitting gate

Post-launch review of the protein cluster revealed a critical gap: the 'how much protein when losing weight' question had no covering study. Wycherley 2012 (meta-analysis, 24 RCTs, 1,063 participants) was never evaluated as a candidate because the tension was framed as 'body recomposition' (Longland 2016), collapsing two different mass-audience questions into one. Two new rules now prevent this: A-S9 forces the Cluster Architect to search for meta-analyses for every tension with an RCT flagship, and A-S10 forces a two-archetype test to detect collapsed tensions. The protein cluster has been updated: 9 flagships (was 8), with Wycherley 2012 covering deficit populations that Morton's 1.62 g/kg breakpoint explicitly excludes.

Post-launch protein cluster integrity review found Wycherley 2012 was never evaluated

Independent Dwell-Time Verification Added to Claim Pages

Every claim page now undergoes an independent reading experience check by a second AI model (Gemini) that has never seen the pipeline. Gemini reads the page as a naive reader who Googled the question, identifies where engagement drops, and flags structural similarities between sibling claims. This model has zero authority over the evidence — all factual claims remain locked by the existing verification gates. The check catches dwell-time weak spots that the writing model cannot detect in its own work.

Claim pipeline design — claim pages need independent reading experience verification like study pages get from R10

Cluster Planning Rebuilt: Tension-First Architecture + Flagship/Satellite Studies + Body-Composition Filter

Every cluster is now built around the real debates fitness audiences argue about — not around which papers exist in the literature. Studies that answer the same question are grouped into flagship pages with convergent evidence sections, so you see one definitive page backed by multiple studies instead of redundant overlapping pages. Medical-reassurance topics (organ safety, mortality statistics) no longer earn standalone study pages — only findings that change how you eat, train, or build your body get published. Layer 3 of each cluster now includes four content types: a comprehensive Master Guide, a shareable Myths Piece, a methodology transparency Skeptic Note, and an interactive Tool (where applicable). The result is tighter clusters, zero redundancy, and every page serves a unique purpose.

Protein cluster C0 Phase A execution revealed five structural issues: redundant meta-analysis pages, medical-reassurance content passing the viral filter, Layer 3 underspecification, bottom-up planning producing academic completeness, and no upfront claim mapping. Cross-AI verification with Gemini confirmed and strengthened proposed solutions.

Gate 3b independent verification now phased for satellite studies

The independent skeptic verification (Gate 3b) now uses a structured two-phase process when satellite studies exist. Phase 1 completes a full forensic audit of the flagship extraction with its own verdict. Phase 2 then verifies each satellite individually with a targeted 6-point check against its own paper. This prevents quality degradation from information overload when the fresh verifier receives multiple papers at once. A scaling rule splits verification into separate sessions if satellite papers exceed 200KB combined.

First flagship-with-satellites execution (Morton 2018 protein cluster) — Mark flagged that the original single-prompt design could cause rushing when verifying 3+ papers simultaneously
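
The entry above states the 200KB threshold but not how papers are divided once it is exceeded. One possible approach is a greedy grouping that starts a new session whenever adding the next paper would push the running total over the limit. The sketch below is an assumption on both counts: the grouping strategy is illustrative, and "200KB" is read as 200 x 1024 bytes.

```python
SESSION_LIMIT_BYTES = 200 * 1024  # "200KB" read as 200 KiB; an assumption


def plan_sessions(paper_sizes, limit=SESSION_LIMIT_BYTES):
    """Group (name, size_bytes) pairs into sessions without exceeding the limit.

    A single paper larger than the limit still gets a session of its own.
    """
    sessions, current, current_size = [], [], 0
    for name, size in paper_sizes:
        if current and current_size + size > limit:
            sessions.append(current)
            current, current_size = [], 0
        current.append(name)
        current_size += size
    if current:
        sessions.append(current)
    return sessions
```

When all satellites fit under the limit together, the result is a single session, which matches the unsplit Phase 2 case.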

Satellite Studies Now Fully Cited

Weight of Evidence satellite studies (independent research confirming or nuancing our flagship study) now receive the same citation treatment as every other source: inline [N] markers in the article text linking to the original paper, plus clickable source links on the evidence cards. Previously, satellite studies were mentioned by name but without verifiable links.

Morton 2018 first pipeline run: Nunes 2022 and Jäger 2017 appeared without source URLs (2026-04-16)

Full Picture trust block now runs a mandatory cross-sibling swap test

The Full Picture trust block on every study page now runs a mandatory cross-sibling swap test before publishing. An audit found byte-identical Section 2 prose across four protein-cluster studies. Writing agents must now read at least two sibling blocks and pass a four-point comparison: different headers, different opening words, no shared six-word phrases. Any collision blocks publish until the block is regenerated fresh from source.

Audit found identical transparency prose across four sibling study pages in the protein cluster.
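
The "no shared six-word phrases" point lends itself to a mechanical check. A minimal sketch, assuming phrases are compared case-insensitively with punctuation ignored (the real swap test may normalize text differently):

```python
import re


def word_ngrams(text, n=6):
    """Return the set of n-word phrases in the text, lowercased, punctuation ignored."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    return {" ".join(words[i:i + n]) for i in range(len(words) - n + 1)}


def shared_phrases(block_a, block_b, n=6):
    """Return every n-word phrase that appears in both blocks."""
    return word_ngrams(block_a, n) & word_ngrams(block_b, n)
```

Any non-empty result between a new block and a sibling block would count as a collision. Byte-identical prose, as found in the audit, shares every phrase.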

Kill-switch count reconciled to 28 — KS-26 retired

The active kill-switch count drops from 29 to 28. KS-26 (Platform Number Fabrication) is retired because the data path it guarded — platform statistics appearing in study articles — was structurally removed when study pages were decoupled from the FitChef product layer. The error class is eliminated at the architecture level rather than caught by a verification gate. KS-26's ID slot is not reused; remaining IDs stay stable.

Architecture rebuild removed the platform data path from study pages, making KS-26 unnecessary.

Trust pages reconciled to match actual verification state

Four trust-page surfaces were still describing retired pipeline features as live. The Verification Ledger heading was wired to the wrong integer source. Sidebar navigation referenced old gate numbers and a removed anchor. The AI Transparency page described a retired study tier. The llms.txt page listed meta-tag names the plugin no longer emits. All four now match the actual ship state.

Zero-tolerance audit found stale copy on trust pages that survived the prior architecture cleanup.

Reading-grade ceiling enforced — every study must read at 8th-grade level or lower

Every study narrative now passes a Flesch-Kincaid reading-grade gate before reaching the human-experience review. Articles must score grade 8 or lower (magazine target). A score above grade 8 and up to grade 9 is flagged for review. A score above grade 9 blocks the study for rewriting. Any single paragraph above grade 10 is fixed in place or flagged.

Post-rebuild alignment sweep found no numeric reading-grade enforcement in the readability agent.
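
For reference, the standard Flesch-Kincaid grade formula is 0.39 x (words per sentence) + 11.8 x (syllables per word) - 15.59. The sketch below applies it with a deliberately rough syllable counter (vowel groups), which is an assumption and will differ from production readability tools. The verdict boundaries follow one reading of the thresholds above: up to 8 passes, above 8 and up to 9 is flagged, above 9 blocks.

```python
import re


def count_syllables(word):
    """Rough heuristic: count vowel groups, minimum one. Real tools use better rules."""
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))


def fk_grade(words, sentences, syllables):
    """Flesch-Kincaid grade level from raw counts."""
    return 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59


def text_grade(text):
    """Estimate the grade of non-empty text using the heuristic syllable counter."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    return fk_grade(len(words), len(sentences), syllables)


def verdict(grade):
    """Map a grade to pass / flag / block under the assumed boundaries."""
    if grade <= 8:
        return "pass"
    if grade <= 9:
        return "flag"
    return "block"
```

Because the syllable count is approximate, scores near a boundary should be treated as estimates rather than exact grades.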

New gate: causal language detection for observational studies

Added a two-layer defense against causal language in articles based on observational research. When a study's design is observational (cohort, case-control, meta-analysis of observational studies), FitChef's editorial voice now must use associational language — 'was associated with,' 'showed an association' — never causal language like 'protects,' 'contributes,' 'prevents,' or 'scored.' Layer 1: a new Sacred Rule (SR6B) in the Editorial Polish agent catches causal phrasing during writing. Layer 2: a new mandatory code gate (Check 12C) in the Infrastructure Validator programmatically scans every field in the import JSON before publication, blocking any article that contains causal language for observational findings. This distinction matters because observational studies show correlations, not proven cause-and-effect — and implying causation from correlational data is both scientifically inaccurate and a legal risk.

Manual audit of a mortality/cancer/CVD meta-analysis (Naghshi 2020) found 47 instances of subtly causal language that passed all existing verification gates. Words like 'protection,' 'contributes,' and 'scored' imply proven effects but don't match classic overclaim patterns. Cross-study audit confirmed all 16 other articles were clean — but the pipeline must prevent this systematically for all future studies.
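
The Layer 2 code gate can be pictured as a word scan that only activates for observational designs. In the sketch below, the term list contains just the words quoted in this entry, and the `study_design` key and design labels are hypothetical; the real Check 12C may use a longer list and different field names.

```python
import re

# Design labels taken from the entry above; the exact stored values are assumed.
OBSERVATIONAL_DESIGNS = {"cohort", "case-control", "meta-analysis of observational studies"}

# Only the causal words quoted in this entry; a real gate would likely need more.
CAUSAL_TERMS = re.compile(
    r"\b(protects?|protection|contributes?|prevents?|scored)\b", re.IGNORECASE
)


def causal_language_hits(import_json):
    """Return (field, word) pairs for causal wording in an observational study's fields."""
    if import_json.get("study_design") not in OBSERVATIONAL_DESIGNS:
        return []  # the gate applies to observational designs only
    hits = []
    for field, value in import_json.items():
        if isinstance(value, str):
            hits.extend((field, m.group(0).lower()) for m in CAUSAL_TERMS.finditer(value))
    return hits
```

A non-empty result would block the import until the sentence is rewritten in associational language such as "was associated with".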

P2 anti-anchoring rule for finding count

Added explicit instruction to P2 Section 9 preventing Claude from anchoring to a fixed finding count. 10/14 extractions had exactly 10 findings due to unconscious pattern-matching. Rule states: no target count, paper complexity decides, pause if count is a round number.

Pattern analysis — 10/14 studies converged to exactly 10 findings

Independent verification prompt strengthened after Casuso-Goossens review

The Gate 3b independent skeptic prompt (used when a fresh Claude session audits every extraction) was rewritten from a 12-line general checklist to a 50-line forensic audit protocol. The new prompt requires a field-by-field walkthrough of the entire extraction JSON (no high-level skimming), explicit verification marks for every field, and specific anti-fabrication checks for metadata values like dropout rates and sample sizes that may be invented rather than sourced from the paper. This was triggered by the Casuso-Goossens 2025 review, where the independent skeptic initially gave a surface-level pass and found 11 errors (including a fabricated dropout rate) only after being explicitly asked to check more thoroughly.

Casuso-Goossens 2025 Gate 3b independent review — skeptic required prompting for full audit

Three-layer anti-delegation enforcement prevents invalid subagent execution

Added three structural safeguards to prevent Claude from delegating pipeline agent execution to subagents (which receive summarized instructions and produce invalid output). Layer 1: CLAUDE.md Rule 18 — absolute Agent tool ban during pipeline execution, placed in the file read FIRST every session so it survives context compaction. Layer 2: PIPELINE_ORCHESTRATOR.md Rule 10B compaction-survival clause with mandatory execution_method field in post-agent logs. Layer 3: Subagent tripwire in all 14 Run Phase spec reading gates — if a subagent reads the spec, it encounters a STOP instruction before execution begins. Together these make delegation structurally impossible, not just rule-prohibited.

Schoenfeld 2018 production failure — after context compaction, Claude delegated R6-R10 to subagents that produced invalid verification outputs with false PASS/SHIP verdicts

Attribute & Report enforcement extended to audio scripts, titles, card headlines, and AI answer capsules

Audit found that audio scripts, post titles, card headlines, and answer capsules lacked the same Attribute & Report checks applied to body text. All 3 live studies had audio scripts where FitChef stated health outcomes as its own claims instead of attributing to researchers. Fixed the audio scripts and extended verification gates across 7 pipeline specs: R1 (title creation), R2 (card headlines), R4 (audio writing), R5 (answer capsules), R7 (audio verification), R9 (quality audit scope), and R11 (import validation).

Pre-scaling legal audit found all 3 live studies contained unattributed health verdicts in audio scripts and non-body-text content types

Kill Switch 29: No medical verdicts in any content type

New kill switch added after pre-scaling legal audit found all 3 live studies contained unattributed health verdicts in audio scripts, post titles, persona actions, and FAQs. KS-29 catches FitChef stating health outcomes as its own claims ('builds zero muscle', 'does nothing for your muscles', 'boosts metabolism') and medical screening statements ('This applies to healthy people only') without researcher attribution. Distinct from existing KS-20 (prescriptive 'you should' language): KS-29 targets the subtler pattern of FitChef asserting health facts as a journalistic authority rather than attributing to researchers.

Pre-scaling legal audit found all 3 live studies contained unattributed health verdicts in audio scripts, post titles, persona actions, and FAQs

No medical verdicts in titles, meta descriptions, or featured snippets

Added a three-layer verification check preventing health conclusions from appearing as FitChef claims in search-visible elements. Titles and meta descriptions now must create curiosity about what research found — they can never state a health outcome as fact. P5 SEO Strategist validation checklist now includes title, meta, and snippet medical-verdict checks. R7 Field Skeptic now verifies all search elements against the Attribute & Report Rule before any study ships. This protects FitChef's legal position as data journalism (reporting what researchers found) rather than a medical authority (making health claims).

During Devries 2018 P5 execution, initial title stated 'Zero Damage' and meta stated 'zero effect on kidney function' — both medical verdicts positioned as FitChef claims rather than attributed research findings.

Creative Director now enforces Attribute & Report from the source

The Creative Director agent (P6) sets the creative direction that the Narrative Writer executes. Previously, P6 had no direct rule preventing medical verdict language in its creative briefs — it relied on downstream agents to catch and fix verdict framing inherited from the brief. This created a gap: if the creative direction said 'protein is safe for kidneys,' the writer had to actively resist its own input to comply with data journalism rules. SR-7 now requires all creative direction language to attribute findings to the study ('the meta-analysis found no effect') rather than state health conclusions as FitChef's voice ('protein is safe'). Every sentence in the creative brief must pass the test: if the writer copied this phrasing into the published article, would it pass KS-20 (no prescriptive health language) and KS-29 (no medical verdicts)?

Pre-scaling legal audit found P6 was the only content-directing agent without direct Attribute & Report enforcement

Four agents patched with direct Attribute & Report enforcement

Four agents that produce or modify reader-facing text had insufficient or zero Attribute & Report enforcement. R10 (Reader Simulation) runs after all verification gates and can add/edit prose — but had only an indirect FITCHEF_VOICE.md reference, no decision rules. V1 (Visual Creator) and V2 (Social Image Creator) produce headlines, insight lines, and social text with zero A&R rules. R3 (Editorial Polish) had a single anti-pattern line but no structured enforcement. All four now have dedicated Sacred Rules with self-tests, kill gates, and explicit prohibitions against dropping attribution during engagement-focused editing. R10 also gained a new quality test (Test 5: The Attribution Test) that blocks shipping if any enhancement introduced a medical verdict.

Systematic A&R audit across all 23+ pipeline agents revealed 3 high-risk and 1 medium-risk gap in agents that produce published content

Mechanical Override Gate prevents recurring import format failures

Added S28 Hardcoded Value Override Gate to R11 Import Builder. This gate runs after all assembly is complete and mechanically overwrites values that agents consistently get wrong: audio URL forced to 'pending' (was empty, making Audio Generator invisible), primary_results forced to array format (was flat dict, making Card 3 empty), sample_size stripped to digits only, DOI prefix enforced, trust_score forced to integer. These failures recurred across multiple studies despite existing documentation because agents reason about values instead of mechanically applying them. S28 eliminates agent discretion for these fields.

Devries 2018 production failures: empty audio URL (no Generate Audio button), flat dict primary_results (empty Card 3), non-numeric sample_size, missing DOI prefix
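
The idea behind S28 is that certain fields are overwritten by code rather than left to agent judgment. The sketch below shows what such an override pass might look like. Several details are assumptions not stated in the entry: the `audio_url` and `doi` key names, the `https://doi.org/` prefix, wrapping a flat dict in a one-item list, and converting the digits-only sample size to an integer.

```python
import re

DOI_PREFIX = "https://doi.org/"  # assumed; the entry does not say which prefix is enforced


def apply_overrides(record):
    """Return a copy of the record with the hardcoded-value overrides applied."""
    out = dict(record)

    # Audio URL is always forced to 'pending' at import-build time.
    out["audio_url"] = "pending"

    # primary_results must be an array; a flat dict is wrapped (assumed conversion).
    if isinstance(out.get("primary_results"), dict):
        out["primary_results"] = [out["primary_results"]]

    # sample_size is stripped to digits only, then stored as an integer (assumed).
    digits = re.sub(r"\D", "", str(out.get("sample_size", "")))
    out["sample_size"] = int(digits) if digits else None

    # DOI gets the prefix if it is missing.
    doi = out.get("doi")
    if doi and not doi.startswith(DOI_PREFIX):
        out["doi"] = DOI_PREFIX + doi

    # trust_score is forced to an integer.
    if out.get("trust_score") is not None:
        out["trust_score"] = int(float(out["trust_score"]))

    return out
```

Note that digit stripping is only safe for values like "1,863". A freeform value with several numbers would be merged into one wrong number, so a gate like this works best when the field already holds a single count.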

Three-Zone Page Architecture for Clearer Evidence Hierarchy

Redesigned the study page structure into three semantic zones: Narrative (immersion), Personalization (reader-specific insights), and Evidence & Trust (skeptical review). This architectural change strengthens verification integrity by clarifying the reader's scroll journey and assigning distinct verification territory to each zone. Narrative sections remain the story. Personalization fields (so_what, persona_actions) render after the narrative with reader-specific context. Evidence fields (skeptic_note, findings, controversy, FAQ) render together in a dedicated zone where skeptical readers expect detailed verification. Added a deduplication rule that prevents the narrative and the controversy field from both covering the same scientific debate, reducing reader confusion about what's established vs. disputed.

Study page template v12.1.0 redesign to improve reader navigation and field clarity.

Citability Pipeline — Machine Layer Integration

R5 Citability Engineer now feeds structured claim data, citation hints, and self-contained answer capsules through R11 into the WordPress schema and citation toolkit. LLMs and journalists see richer structured data and copy-ready citable paragraphs on every study page.

Citability pipeline integration — R5→R11 contract completion

Excerpt Defensibility Gate — Sentence-Level Precision Check

Field Skeptic (Gate 2B) now verifies the boldest claim sentences can stand alone when excerpted — by social media, AI summaries, or skimming readers — without overstating the paper's actual evidence strength. Ensures every sentence is individually defensible, not just globally hedged.

Adversarial content review of published study — identified that bold claims excerpted out of context bypass global hedging sections

Headline Defensibility Gates — Medical Positioning & Metaphor Escalation Checks

Field Skeptic now runs two additional checks on every headline and title. The Medical Authority Positioning Check catches titles that frame FitChef against healthcare professionals rather than against guidelines (e.g., 'Your Doctor Is Wrong' → rewrite to critique the government number, not the doctor). The Editorial Metaphor Escalation Check catches vivid metaphors that characterize findings more strongly than the paper's own language (e.g., 'Broken Math' for what the paper calls 'systematic bias'). Both checks enforce the Attribute & Report rule at the headline level, where excerpt defensibility matters most.

Legal safety audit of Jäger 2017 article found headline 'Your Doctor's Protein Advice Is Based on Broken Math' — defensible in body text context but not as a standalone excerpt. Two patterns identified: medical counter-positioning and editorial metaphor escalation.

Kill switch 28: Source URL liveness verification

Added a new verification gate that checks every source URL in the article actually resolves to a live page with relevant content. Dead URLs are auto-fixed if a correct URL can be found, or flagged for human review. Additionally, P4 (Source Hunter) now verifies all URLs are live before presenting them to the human operator — catching hallucinated URLs at the source instead of after publication.

Morton 2018 shipped with a dead Grand View Research URL. Claude hallucinated a plausible URL suffix (-report) that didn't exist. The content was real (human-verified) but the link was broken from day one. No agent in the 14-agent pipeline checked if the URL actually worked.
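
A rough sketch of the liveness half of this check, using only the Python standard library. The function names and the injectable `status_of` parameter are hypothetical, and the sketch only tests whether a URL answers with a 2xx or 3xx status; judging whether the page content is relevant, as the gate also does, is not covered here.

```python
from typing import Callable, Iterable, List
from urllib.error import HTTPError
from urllib.request import Request, urlopen


def http_status(url: str, timeout: float = 10.0) -> int:
    """Return the HTTP status for url, or 0 if the request fails outright."""
    try:
        request = Request(url, method="HEAD", headers={"User-Agent": "link-check-sketch"})
        with urlopen(request, timeout=timeout) as response:
            return response.status
    except HTTPError as err:
        # The server answered, but with an error status such as 404.
        return err.code
    except (OSError, ValueError):
        # DNS failure, refused connection, timeout, or a malformed URL.
        return 0


def find_dead_urls(
    urls: Iterable[str],
    status_of: Callable[[str], int] = http_status,
) -> List[str]:
    """Return the URLs whose status is outside the 200-399 range."""
    return [url for url in urls if not 200 <= status_of(url) < 400]
```

Passing `status_of` as a parameter lets the check run against recorded statuses in tests, without touching the network. Some servers reject HEAD requests, so a production check would likely fall back to GET before declaring a link dead.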

Reader Simulation Anti-Rationalization Rule — No More 'Necessary But Flat' Passes

R10 (Reader Simulation) now has an explicit anti-rationalization rule that prevents flat sections from shipping with labels like 'necessary mechanism beat' or 'lower energy but appropriate.' If R10 notices a section is flat but finds itself thinking 'it's necessary,' that IS a drag point — fix it or flag it. Additionally, the Dwell Test now checks for relative attention valleys: 3+ consecutive elements at attention 7-8 surrounded by 9-10 sections are flagged as drag even though each element individually passes the absolute threshold.

Jäger 2017 shipped with a 4-paragraph nitrogen balance methodology section rated attention 7-8 that R10 labeled 'necessary mechanism beat' without enhancement. Human reader (Mark) flagged the article as feeling technical and flat. R10 had zero enhancements across the entire 54-element article — the anti-rationalization escape hatch let boring content pass verification.
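
The relative-valley rule can be sketched as a scan over per-element attention scores. This is an illustration under assumptions: the function name is hypothetical, scores are taken to be integers on a 1-10 scale, and "surrounded" is read as the elements immediately before and after the run both scoring 9 or 10.

```python
from typing import List, Sequence, Tuple


def find_attention_valleys(scores: Sequence[int], min_run: int = 3) -> List[Tuple[int, int]]:
    """Return inclusive (start, end) index pairs for each relative attention valley."""
    valleys = []
    i, n = 0, len(scores)
    while i < n:
        if 7 <= scores[i] <= 8:
            # Extend the run of consecutive 7-8 scores as far as it goes.
            j = i
            while j + 1 < n and 7 <= scores[j + 1] <= 8:
                j += 1
            long_enough = (j - i + 1) >= min_run
            high_before = i > 0 and scores[i - 1] >= 9
            high_after = j + 1 < n and scores[j + 1] >= 9
            if long_enough and high_before and high_after:
                valleys.append((i, j))
            i = j + 1
        else:
            i += 1
    return valleys
```

For example, `find_attention_valleys([9, 10, 7, 8, 7, 8, 9, 10])` returns `[(2, 5)]`: each of the four middle elements passes an absolute threshold of 7, but together they form a dip between two high-attention sections.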

All verified sources now get numbered references in articles

Fixed a classification gap where human-verified sources used in the article text could be excluded from the numbered Sources list. Previously, sources classified as 'bomb amplifiers' or 'hidden gems' in the editorial pipeline were named in the text but not given clickable [N] reference numbers — even when they contained specific, verifiable claims. Now any human-verified source with attributable claims in the article gets a numbered reference, regardless of editorial classification.

Morton 2018 article referenced Greg Nuckols (Stronger by Science) and Menno Henselmans by name with specific claims but neither had a [N] marker or appeared in the Sources list. Readers had no way to verify those claims.
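
As an illustration of the corrected rule, the sketch below numbers every human-verified source that carries attributable claims and ignores its editorial classification. The dictionary keys (`human_verified`, `claims`, `classification`, `ref`) are hypothetical names chosen for this example.

```python
from typing import Dict, List


def number_sources(sources: List[Dict]) -> List[Dict]:
    """Assign sequential reference numbers to every verified source with claims."""
    numbered = []
    for source in sources:
        # The editorial classification is deliberately not consulted here.
        if source.get("human_verified") and source.get("claims"):
            numbered.append({**source, "ref": len(numbered) + 1})
    return numbered
```

A source tagged as a 'hidden gem' now receives a [N] marker like any other, as long as it is verified and the article attributes a claim to it.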

Audio Script Validation Gate

Added a mandatory validation check that blocks study imports if the audio script is missing. Previously, a study could import without its audio data even when the audio creative work was complete — causing the audio player to show no content. The gate now verifies that every audio field (script, title, narrator) is properly transferred from creative data to the import file.

Trommelen-2023 study imported without audio script despite complete audio.json
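
A minimal sketch of such a transfer check, assuming the creative audio data and the import file's audio section are both available as dictionaries. The three field names come from the description above; the function name and the message wording are hypothetical.

```python
from typing import Dict, List

REQUIRED_AUDIO_FIELDS = ("script", "title", "narrator")


def audio_transfer_problems(creative_audio: Dict, import_audio: Dict) -> List[str]:
    """List every required audio field that is missing or altered in the import."""
    problems = []
    for field in REQUIRED_AUDIO_FIELDS:
        source_value = str(creative_audio.get(field) or "").strip()
        imported_value = str(import_audio.get(field) or "").strip()
        if not imported_value:
            problems.append(f"{field}: missing from import")
        elif imported_value != source_value:
            problems.append(f"{field}: differs from creative data")
    return problems
```

An import would be blocked whenever the returned list is non-empty, so a complete audio.json can no longer be silently dropped on the way to the import file.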

Post-Enhancement Grounding Verification

R10 Reader Simulation now adds real-world connection moments to articles — but every sentence it adds must cite the exact source file and quote from verified fuel that grounds it. This closes the gap where text added after the Triple Skeptic (R6-R8) could bypass grounding checks.

CEO-level pipeline audit — Check 3: Post-Verification Addition Gap
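
One way to picture the grounding check: each added sentence carries a source file name and a quote, and the quote must appear in that file's text. The sketch below assumes additions are dictionaries with `sentence`, `source_file`, and `quote` keys and that verified fuel is a mapping from file name to text; all of these names are hypothetical.

```python
from typing import Dict, List


def _normalize(text: str) -> str:
    """Collapse whitespace so line wrapping does not break quote matching."""
    return " ".join(text.split())


def ungrounded_additions(additions: List[Dict], fuel: Dict[str, str]) -> List[str]:
    """Return the added sentences whose quote is not found in the cited file."""
    failures = []
    for item in additions:
        quote = _normalize(item.get("quote") or "")
        source_text = _normalize(fuel.get(item.get("source_file") or "", ""))
        # An empty quote or an unknown file counts as ungrounded.
        if not quote or quote not in source_text:
            failures.append(item.get("sentence", ""))
    return failures
```

Exact substring matching is strict on purpose in this sketch: a paraphrased quote fails, which forces the addition to point at text that really exists in the verified material.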

Kill switch 27: Health-condition study safety gate

Added a new verification gate for studies involving diagnosed medical conditions (kidney disease, diabetes, etc.). The gate ensures null findings ('no harmful effects detected') are never presented as positive safety claims, and mandates a healthcare-provider caveat in the reality check section. This prevents readers with medical conditions from interpreting research translations as personal medical guidance.

Legal safety audit identified that null findings in condition-specific studies could be misread as safety endorsements by patients

Removed invalid study rating from schema

The Skeptic Review schema on study pages incorrectly included a reviewRating block that implied FitChef rates studies on a 1-5 scale. FitChef does not rate studies — the value was an extraction-accuracy metric (trust context), not a quality judgment. The rating block has been removed. The Skeptic Review still documents what was verified (review body and notes), but no longer includes a numeric rating. This aligns with FitChef's documented position as a transparent translator, not an evaluator.

External schema audit identified Rating node with text value in ratingValue field — both schema-invalid and a positioning violation
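
For illustration, removing the rating from a JSON-LD structure can be as simple as dropping every `reviewRating` key while leaving the rest of the review intact. `reviewRating` and `reviewBody` are schema.org property names; the helper name and the sample data in the usage note are hypothetical, and this is not necessarily how the FitChef template implements the change.

```python
from typing import Any


def strip_review_rating(node: Any) -> Any:
    """Return a copy of a JSON-LD structure with every reviewRating block removed."""
    if isinstance(node, dict):
        return {
            key: strip_review_rating(value)
            for key, value in node.items()
            if key != "reviewRating"
        }
    if isinstance(node, list):
        return [strip_review_rating(value) for value in node]
    return node
```

Applied to `{"@type": "Review", "reviewBody": "Checked", "reviewRating": {"@type": "Rating", "ratingValue": "trust context"}}`, the function keeps the review body and drops the rating node.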

Anti-template upgrade: Scene-based opening detection

Discovered and fixed a systematic pattern where the creative brainstorm process was producing identical 'It's 10pm, you're lying in bed...' opening scenes across different studies. The root cause: generic reader context (time-of-day, physical location) was being fed into the creative process instead of unique study data. Two published articles were affected. Fix ensures only study-specific findings and tensions drive creative direction — never interchangeable reader scenarios. Added mandatory cross-study scene detection to catch any future pattern matches before publication.

Pattern detected during Trommelen 2023 creative brainstorm: every study defaulted to nighttime phone-in-bed openings

Study-derived persona takeaways replace fixed template

Persona takeaway cards on each study page are now derived from the study's actual data instead of forced into four fixed categories. Each study defines its own audience segments based on who the study tested and what subgroups the data speaks to. Labels, count, and selection are all study-specific. Studies that tested specific age groups, training levels, or goals now show those exact categories instead of generic placeholders.

Architectural review revealed fixed persona keys conflicted with core principle that each study is unique

Verification Pipeline v2.0 Launched

FitChef Verification Pipeline v2.0 launched. 23 agents, 3 verification gates, 28 kill switches. Every study processed through triple-skeptic review before publication.

Pipeline development complete. First production study (Morton 2018) published and verified.

Recent Skeptic Catches

Real findings from real pipeline runs — not hypotheticals.

Does Eating Fat Make You Fat? What 57,000 People Show

Explore Our Trust Layer

Methodology: The full verification pipeline

How We Verify: Step-by-step verification process

Corrections Log: When we get it wrong, we say so

Trust Dashboard: Live verification numbers

Grounded Truth Map: The complete evidence map

Live Verification

Studies Verified 81

Claims Grounded 91

Checks Run 2,268

Verification Gates 5

Kill Switches 28

What FitChef Is

FitChef creates nutrition content people love — the kind you save, share, and cook from. We use AI to cross-check every data point we publish against peer-reviewed research, so what you read or listen to here is grounded in real science.

We don't conduct original research. We're not PubMed, and we're not your doctor. Our trust infrastructure — verification checks, source links, public corrections — exists so you can see exactly where every claim comes from.

This is science-grounded inspiration — not medical advice. Always consult a qualified professional before making health decisions.

Read our full disclaimer →
