Our Methodology

AI extracts. Humans review. Science decides.

Most platforms hide how they use AI. We built our entire methodology around it — and we can show you exactly how it works.

We Don't Use AI to Be Fast. We Use AI to Be Good.

Most companies use AI to produce the same content as everyone else — just faster. More articles. More pages. More noise. That's not what we do.

We use AI to create content that simply wasn't possible without AI.

A single FitChef library page synthesizes findings from dozens of peer-reviewed studies. Not summaries — synthesis. The kind of deep, cross-referenced analysis that would take a research team weeks to produce manually. We do it with AI extraction, machine verification, and human review — and the result is something no human team and no AI alone could build.

When you read a FitChef page, you're not reading a blog post. You're reading the conclusion of a process that started with hundreds of pages of scientific literature, distilled into something you can take in within minutes.

That's not efficiency. That's a new kind of content — and it's what we exist to build.

How We Actually Work

Every piece of content on FitChef follows the same path — from raw science to something you love reading. Here's what that path looks like.

1. We build topic clusters (Human decision)

Every topic our users care about — protein timing, gut health, sleep and recovery — becomes a cluster. Each cluster maps out every question worth answering and every study worth reading.

2. AI reads the full papers (AI extraction)

Not the abstract. Not the conclusion. Every page, every table, every footnote, every limitation. AI extracts every finding — numbers, study design, population details, and what the researchers said might be wrong with their own work.

3. The Skeptic verifies everything (Skeptic gate 1)

Our Skeptic Protocol checks every extraction against the original paper. Does the number match? Is the claim within scope? Did AI overstate a finding? 28 kill switches catch overreach, missing context, and distorted conclusions.

4. Verified findings become the evidence base (Foundation layer)

Every verified finding — with its source study, verification status, and limitations — enters our evidence base. This is the foundation for everything on FitChef. Each finding links back to its paper. Every study gets a full deep-dive linking forward to every answer that uses its findings.

5. Findings become evidence-based answers (Skeptic gate 4)

Multiple studies on the same question? We synthesize their findings into one evidence-verified answer — backed by the full weight of evidence, not just one paper. A dedicated Skeptic gate verifies that the synthesis is honest and that the writing matches the evidence.

6. Answers become flagship guides (Skeptic gate 5)

Verified answers are woven into flagship guides — the one page that replaces everything else on the topic. A final Skeptic gate verifies every claim is accurately represented. No overstatement. No creative drift.

7. Humans press publish (Human approval)

Our team reviews every Skeptic flag before anything goes live. AI doesn't press the publish button. Humans do — after seeing every flag, every context note, and every limitation the Skeptic raised.

Seven steps. 3 pipelines. 5 Skeptic gates. One rule: nothing reaches you that hasn't survived the full chain.
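The publishing rule above can be sketched as a small gating function. This is a minimal illustration under stated assumptions, not FitChef's implementation: the names `Gate`, `Review`, `run_gates`, and `can_publish` are hypothetical, and the sketch assumes gates raise flags for human review rather than rejecting drafts on their own. What it encodes is the rule the page states: nothing publishes without a human approver who has seen every flag.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional

# Hypothetical gate signature: takes draft text, returns a list of flag messages.
# An empty list means the gate raised nothing.
Gate = Callable[[str], list[str]]


@dataclass
class Review:
    flags: list[str] = field(default_factory=list)
    acknowledged: set[str] = field(default_factory=set)  # flags a human has seen
    approved_by: Optional[str] = None  # set only when a human presses publish


def run_gates(draft: str, gates: dict[str, Gate]) -> Review:
    """Run every gate in order and collect all of their flags for human review."""
    review = Review()
    for gate_name, gate in gates.items():
        review.flags.extend(f"{gate_name}: {message}" for message in gate(draft))
    return review


def can_publish(review: Review) -> bool:
    """Publishing requires a named human approver who has seen every flag.

    No code path here sets approved_by automatically, so the pipeline
    never publishes on its own.
    """
    every_flag_seen = all(flag in review.acknowledged for flag in review.flags)
    return review.approved_by is not None and every_flag_seen
```

For example, a toy gate that flags the word "always" leaves a draft unpublishable until an editor both acknowledges that flag and sets `approved_by`. Even a draft with zero flags still waits for a human.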

The Foundation That Makes the Difference

Every website on the internet can write "you need 1.6 to 2.2 grams of protein per kilogram of bodyweight." That sentence takes five seconds to type. It's on a thousand pages already.

When FitChef says it, that number is connected to a Grounded Truth Map — a living database of verified claims, each linked to the specific study it came from, the exact page and table where the finding appears, the limitations the researchers disclosed, and the verification status our Skeptic Protocol assigned after checking every data point.

That one number — 1.6 to 2.2 grams — might be supported by a dozen studies across our map. Different populations. Different methods. Different years. Each one extracted and cross-checked through our automated pipeline. The claim you read on our page is the synthesis of all of them.

This is the difference between content that sounds evidence-based and content that is evidence-based. The Grounded Truth Map is what makes that difference visible — to you, to researchers, and to every AI system that evaluates our pages.

Peer-Reviewed Studies: Full papers. Every page. Every table.
Verified Findings: Verified. Source-linked. Limitation-tagged.
Evidence-Based Answers: Multi-study. Consistency-verified. Certainty-tiered.
The Page You Read: Every sentence traces back through five verification gates.
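One way to picture a Grounded Truth Map entry is as a claim holding a list of findings, each carrying its study, page, table, disclosed limitations, and verification status. The sketch below assumes that shape; the class names, the `Status` values, and all identifiers are hypothetical, since the page describes the map only as a living database of verified claims.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Optional


class Status(Enum):
    VERIFIED = "verified"
    FLAGGED = "flagged"  # hypothetical second status; real statuses may differ


@dataclass(frozen=True)
class Finding:
    study_id: str                 # hypothetical identifier for the source paper
    page: int                     # page of the paper where the finding appears
    table: Optional[str]          # table label, if the finding comes from a table
    limitations: tuple[str, ...]  # limitations the authors disclosed
    status: Status                # verification status assigned by the checks


@dataclass
class Claim:
    text: str
    findings: list[Finding] = field(default_factory=list)

    def verified_sources(self) -> list[str]:
        """Sorted, de-duplicated study IDs whose findings passed verification."""
        return sorted({f.study_id for f in self.findings if f.status is Status.VERIFIED})

    def trace(self) -> list[str]:
        """One provenance line per finding: study, page, and table if any."""
        return [
            f"{f.study_id} p.{f.page}" + (f" {f.table}" if f.table else "")
            for f in self.findings
        ]
```

Because each finding keeps its own page, table, and limitations, a single claim like the 1.6 to 2.2 g/kg protein range can list every supporting study without flattening them into one citation.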

What Our Pipeline Extracts From Every Study

Not every study is created equal. A 12-person pilot study and a 10,000-person randomized controlled trial both count as "research" — but they don't carry the same weight. Our system knows the difference.

Verification Ledger

PASS / FAIL

Every study runs through a 28-point verification check. Each check is pass or fail — no scores, no ratings, no editorial opinion about the research itself. What we verify is the fidelity of our own data extraction. When multiple studies are combined into a claim, that claim gets an Evidence Consistency Index and a Certainty Tier (High, Moderate, or Low) — algorithmically determined, not editorially assigned.

Sample size

Larger populations produce more reliable findings. A study of 2,000 people carries more weight than a study of 20.

Study design

Randomized controlled trials and systematic reviews carry more weight than observational studies and case reports.

Replication

Findings confirmed across multiple independent studies receive the highest confidence. A single study — no matter how large — is a starting point, not a conclusion.

Journal quality

Peer-reviewed publications in established journals. We don't cite preprints, press releases, or studies whose full text we couldn't read.

Effect size

Statistical significance isn't enough. We look at whether the effect is large enough to matter in real life — not just in a statistics table.

Disclosed limitations

Honest disclosure of limitations counts in a study's favor. If researchers know what might be wrong with their work, we trust them more — not less.

When findings are combined into answers, the Certainty Tier tells you how consistent the evidence is: High Certainty means strong consistency across multiple high-quality studies. Low Certainty means preliminary or limited evidence — and we say so clearly. Every answer carries source links so you can verify it yourself.
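The pass/fail verification ledger described above can be illustrated in a few lines. This sketch assumes only what the page states: 28 binary checks per study and no aggregate score. The check names and the `ledger_verdict` function are hypothetical, and it deliberately leaves out the Evidence Consistency Index and Certainty Tier, since the algorithms behind them aren't described here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CheckResult:
    name: str     # hypothetical check name, e.g. "number_matches_source"
    passed: bool  # strictly pass or fail; there is no partial credit


def ledger_verdict(results: list[CheckResult], expected_checks: int = 28) -> tuple[str, list[str]]:
    """Return ("PASS", []) only when every expected check ran and passed.

    There is deliberately no weighted score: a single failed check fails the
    extraction, and the names of the failed checks are returned for review.
    """
    if len(results) != expected_checks:
        raise ValueError(f"expected {expected_checks} checks, got {len(results)}")
    failed = [r.name for r in results if not r.passed]
    return ("FAIL", failed) if failed else ("PASS", [])
```

Refusing to return a verdict when checks are missing keeps an incomplete ledger from passing by default, which fits the "nothing skipped" spirit of a pass/fail design.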

When We're Wrong — And What Happens Next

We will get things wrong. New research will contradict old findings. A study we relied on will be retracted. Our own Skeptic will catch an error we missed. This is certain — and we designed our entire system around it.

Here's what happens when a claim in our Grounded Truth Map changes:

The claim is updated

The Grounded Truth Map entry is corrected with the new finding, the reason for the change, and the source that prompted it.

Every connected page updates

Every library page, every synthesis, every piece of content that references that claim is flagged for revision. Not just the page where we noticed the error — all of them. Atomically.

We publish a correction note

A public note is added to every affected page explaining what changed and why. Not buried in a changelog — visible on the page itself. Because if you read the old version, you deserve to know what's different now.

The corrections log records it

Every correction is permanently documented in our public corrections log — with dates, reasons, and the old vs. new information. Nothing is quietly deleted.

Is this efficient? No. Not at all. Updating every connected page for a single changed claim is expensive, time-consuming, and most platforms would never bother. But when the internet is drowning in AI-generated noise where nobody knows what's true anymore — this is the only way to earn your trust.
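The correction steps above amount to a reverse index from each claim to every page that uses it. Here is a minimal in-memory sketch, assuming hypothetical names (`TruthMap`, `Correction`, `correct`) and string page IDs. A production system would need to wrap these updates in a single database transaction to make them truly atomic; this sketch does not.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from datetime import date


@dataclass(frozen=True)
class Correction:
    claim_id: str
    old_text: str
    new_text: str
    reason: str
    source: str
    on: date


@dataclass
class TruthMap:
    claims: dict[str, str] = field(default_factory=dict)
    # Reverse index: claim ID -> every page that references the claim.
    pages_by_claim: dict[str, set[str]] = field(default_factory=lambda: defaultdict(set))
    flagged_for_revision: set[str] = field(default_factory=set)
    page_notes: dict[str, list[str]] = field(default_factory=lambda: defaultdict(list))
    log: list[Correction] = field(default_factory=list)  # append-only

    def link(self, claim_id: str, page: str) -> None:
        """Record that a page references a claim."""
        self.pages_by_claim[claim_id].add(page)

    def correct(self, claim_id: str, new_text: str, reason: str,
                source: str, on: date) -> set[str]:
        """Update one claim and propagate the change to every page using it."""
        old_text = self.claims[claim_id]  # KeyError if the claim is unknown
        affected = set(self.pages_by_claim.get(claim_id, set()))
        self.claims[claim_id] = new_text
        self.flagged_for_revision |= affected  # every page, not just one
        for page in affected:
            self.page_notes[page].append(f"Corrected {on.isoformat()}: {reason}")
        self.log.append(Correction(claim_id, old_text, new_text, reason, source, on))
        return affected
```

Keeping the log as an append-only list mirrors the rule that nothing is quietly deleted: each entry preserves both the old and the new text.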

Three Content Layers. Each One Deeper.

FitChef doesn't just publish articles. We build three layers of content — each one synthesizing the layer below it. Each layer has its own dedicated pipeline, its own verification gates, and its own editorial voice.

1

Study Analysis

Every peer-reviewed paper gets a deep editorial analysis — like a sports commentator breaking down a match. What did they test? What did they find? What does nobody tell you about this study? Three verification gates check every number, every sentence, and every import.

2

Evidence-Based Answers

Findings from multiple studies are synthesized into evidence-verified answers. Not one study's opinion — the weight of all evidence. A dedicated verification gate checks that our synthesis is honest and that our writing matches the evidence.

3

Flagship Guides

Verified answers are woven into flagship guides — the one page that replaces everything else on the topic. A final verification gate checks that every claim is accurately represented. No overstatement. No creative drift.

This is why we can show you the Skeptic flags on this very page. We're not afraid of transparency — transparency is the product.

If your methodology page doesn't survive its own methodology — you don't have a methodology.
— The Skeptic

Three Pipelines. Every Team. Every Decision.

Verified research doesn't just power our content. It powers the entire platform.

Content

Every study analysis, every evidence-based answer, and every flagship guide is built from verified evidence. 3 pipelines. 5 verification gates. Nothing is published without passing the full chain.

Product

Meal plans, recipes, and features are built on verified claims. When research shows a specific protein range for trained athletes, that finding becomes the formula behind your dinner — not a guess, not a trend.

Support

When a member asks "why no dairy in this plan?" — the platform surfaces the specific studies, the specific findings, and the certainty level. Not opinions. Sourced answers from the same structured data.

The Skeptic Itself

Every review, every correction, every edge case makes the verification system sharper. The Skeptic Protocol is the only part of FitChef that improves automatically — because every interaction teaches it something new.

A System That Gets Smarter Every Day

Most verification systems are built once and left to rot. Ours evolves.

Every time the Skeptic catches an overreach, it learns what overreach looks like in that context. Every time a human reviewer corrects a flag in Slack, that correction feeds back into the system. Every edge case — a study with unusual methodology, a claim that sits on the boundary between verified and contested — makes the next review sharper.

Six months from now, the Skeptic will catch things it can't catch today. A year from now, it will flag patterns across hundreds of studies that no human reviewer would ever notice. This is the compounding advantage of building verification into the foundation — not bolting it on as an afterthought.

Don't Take Our Word for It

Everything above describes how we work. But we're a methodology page — so we should prove it. Below are the peer-reviewed studies that support our approach. We ran our own Skeptic Protocol on every one of them, because if we're going to claim our methodology is evidence-based, the evidence itself has to survive our verification system.

Key Finding

AI-Assisted Data Extraction With a Large Language Model

Gartlehner et al., 2025 · Annals of Internal Medicine · 9,341 data elements · 63 studies

AI-assisted extraction achieved 91.0% accuracy compared to 89.0% for human-only extraction. Errors were fewer (9.0% vs 11.0%), time was reduced by 41 minutes per study, and among discordant items, humans were wrong more often (41.7%) than the AI-assisted approach (32.9%).

Skeptic flag: AI hallucination rate was slightly higher (0.8% vs 0.5%). The 91% accuracy includes human verification of AI output — not AI alone. This is exactly FitChef's model.

This study is the most relevant to FitChef's approach because it used the same AI model we use, in a real-world workflow, with blinded adjudicators creating the reference standard. It wasn't a lab test — it was a workflow validation across six ongoing systematic reviews.

AI Tools for Evidence Synthesis

Helms Andersen et al., 2025 · Danish Health Technology Assessment

AI achieved approximately 90% accuracy matching human reviewers for data extraction. But AI makes different kinds of errors than humans — confabulations vs. omissions.

Skeptic flag: Small sample size. The 15% error rate came from a different study. Supports human-in-the-loop, not AI-only.

LLM vs Human Reviewers in RCTs

Bianchi et al., 2025 · Cochrane Evidence Synthesis · 20 RCTs

AI extracted more data than humans in 29.3% of cases. Only 4.3% of extractions were classified as wrong. But for complex variables like intervention effects, AI missed nuance in 95% of cases.

Skeptic flag: Small convenience sample (20 RCTs). No blinded reference standard. Used Elicit, not our LLM.

Data Extraction Error Frequency

Mathes et al., 2017 · BMC Medical Research Methodology · Systematic review

Human extraction error rates ranged from 8% to 50%. Experience had no significant effect on accuracy. Even Cochrane reviews — the gold standard — contained errors in half the cases reviewed.

Skeptic flag: Only 6 included studies — thin evidence base. "Up to 50%" is the ceiling, not the average. Errors rarely changed final review conclusions.

Overtime Work and Cognitive Function

Virtanen et al., 2009 · Whitehall II Study · 2,214 participants

Long working hours were associated with lower cognitive function in reasoning and vocabulary tests. The cognitive cost of sustained knowledge work is measurable and significant.

Skeptic flag: Observational study — cannot prove causation. Effect found in only 2 of 5 cognitive tests. Population was predominantly white-collar British civil servants.

No single study proves our exact pipeline. But the evidence points in one direction: AI extraction plus human verification outperforms either alone. That's the architecture we built.

Why Humans Alone Aren't Enough

Here's something the research community knows but rarely talks about publicly: even trained experts make errors when extracting data from studies. Not occasionally — routinely.

A systematic review by Mathes, Klaßen, and Pieper examined error rates across multiple studies of data extraction in systematic reviews. What they found was striking: error rates ranged from 8% to 50% across studies. Even in gold-standard Cochrane reviews — the most rigorous reviews in medicine — half contained at least one extraction error. And experience didn't help. Reviewers with years of systematic review experience made errors at roughly the same rate as newcomers.

Separate research on cognitive performance shows why. The Whitehall II study, tracking over 2,000 British civil servants, found that sustained long working hours were associated with measurable cognitive decline. The people doing the hardest knowledge work are the most vulnerable to the fatigue that causes errors.

This isn't a criticism of human reviewers. It's a recognition of human biology. We get tired. We miss things. We read "standard deviation" when the paper says "standard error." We transpose digits. And these small errors can cascade through an entire evidence synthesis.

That's exactly why our system combines AI extraction, which never gets tired and reads every page with the same attention, with human review that catches what AI misses: context, nuance, the confabulations AI is prone to, and the judgment calls that require understanding the real world.

What's In. What's Coming. What's Not Ready Yet.

Most platforms show you what they offer today and hope you don't ask questions. We'd rather show you everything — including what we haven't built yet and why.

Our roadmap is driven by evidence, not trends. A feature makes it into the platform when the research supports it — and the Skeptic has verified that research. Until then, it stays on this list. Honestly.

Live — Verified
AI-powered study extraction from peer-reviewed papers
5-gate Skeptic verification across 3 content pipelines
Human review in Slack before any publish
Public corrections log — every error documented
Grounded Truth Map linking claims to source studies
Building — Evidence Reviewed
Meal plans grounded in verified nutrition claims
Recipe formulations based on evidence-backed macros
Skeptic Plugin for continuous self-improvement
Cross-claim contradiction detection
Researching — Not Yet Proven
Personalized nutrition based on biomarker research
Gut microbiome integration — promising but early-stage
Chronobiology-based meal timing optimization
Supplement efficacy database — evidence still mixed

This list changes. When new evidence passes the Skeptic Protocol, features move from "Researching" to "Building" to "Live." When evidence weakens, we say so. The roadmap is as honest as the content.

See the full picture: Our Public Roadmap. What's live, what's building, and what the science isn't ready for yet.

Not p-values. Not Jargon. Content So Good It's the Final Destination.

We didn't build all of this so we could throw statistics at you. We built it so that when you read a FitChef page about protein timing or gut health or sleep recovery, you never need to read another page about that topic. Ever.

The science is the foundation. But the product is something else entirely — content so clear, so well-sourced, and so practical that it becomes the last thing you need to read. In a few minutes, you learn what took dozens of studies and hundreds of pages of research to establish. Not because we dumbed it down. Because we built a system that makes the complex clear without losing what makes it true.

Most nutrition platforms could adopt this approach. They won't. Because reading full papers is expensive. Verification slows you down. Public corrections feel risky. And building a system that flags your own mistakes requires a kind of institutional humility that doesn't come naturally.

We built it anyway. Not because we're better people. Because we spent a decade doing it the other way, and we know where that road leads. It leads to a million readers and zero confidence that what you told them was accurate.

This methodology is our answer to that. Not perfect — honest. And getting better every single day.

See the verification system in action: The Skeptic Protocol. 5 gates. 3 pipelines. 28 kill switches. Zero exceptions.
Explore Our Trust Layer