AI Prompts – Data Validation Tool

Fraudulent Data Detection

About

Market Research Fraud continues YoY growth with recent news from the DOJ"Tuesday, April 15, 2025 Press Release: Eight Defendants Indicted in International Conspiracy to Bill $10 Million for Fraudulent Market Survey Data".

If you're in the field or wrangled in the participation with an agency, there are ways to gain quality and insights, but it takes work and active involvement. Buying the research unchecked - without validation - should give you pause. If you aren’t provided customer ride-alongs, that’s a flag. If you’re receiving more data and inputs than can be validated - be concerned. You don’t need 1,000 survey results to know whether something makes sense or not.

Overview

Armed with that knowledge, here is a turnkey "AI‐validation" prompt you can drop into ChatGPT (or any LLM) to interactively audit your survey data for signs of fraud. It's designed for small‐to‐mid-sized companies and will guide the model through a series of forensic checks and follow-up questions.

Simplier than it looks

  1. Copy & paste the markdown in your LLM (ChatGPT) of choice and add your data to the markdown
  2. Add your survey data batch as the User message in the markdown
  3. Submit in LLM (ChatGPT)
  4. Run-through Prompt Staging (locate in the "How to Use This Prompt Instructions" section)
  5. Run your Fraud-Detection routine (locate in the "Follow-up AI Prompt Questions section)

The following markdown includes a mix of "clean" and suspicious records illustrating fast completions, straight-lining, contradictions, IP repeats, and duplicate open-text answers for testing.


What to look for:

  • Too fast (e.g. IDs 002 & 004 at 5 s & 8 s vs. avg ~80 s)
  • Straight-lining (002 all 7’s; 008 monotonic 1–4)
  • Contradictions (003 says “No” to Q1 owning but “Yes” brand affinity in Q5)
  • IP repeats (001 & 002 share 203.0.113.5; 005 & 006 share 192.0.2.45; 003/004/008 share 198.51.100.12)
  • Duplicate open-text (002 & 004: “TextA” / “Short” across every Q7–Q10; 005 & 006 identical answers)
                    
markdown - sample CSV with clean and sus included
**System Prompt** You are SurveyGuard™, an AI assistant specialized in validating marketing research data and detecting potential fraud or low-quality responses. Your goal is to flag suspicious patterns, ask clarifying follow-up questions, and give a final fraud-risk score for each batch of responses. **Interaction Template** **User** Here is a batch of raw survey data (in CSV, JSON, or pasted rows). Each record has respondent ID, timestamp, duration, answers to Q1–Q10, IP (if available), and any metadata. ```csv respondent_id,timestamp,duration_seconds,Q1,Q2,Q3,Q4,Q5,Q6,Q7,Q8,Q9,Q10,ip 001,2025-04-15T10:02:12Z,12,Yes,No,3,4,Yes,7,TextA,TextB,TextC,TextD,192.0.2.1 002,2025-04-15T10:03:05Z,8,Yes,No,3,4,Yes,7,TextA,TextB,TextC,TextD,192.0.2.1 003,2025-04-20T09:12:34Z,35,Yes,No,4,2,No,5,"I use brand X","Occasionally","I’m satisfied","None",203.0.113.5 004,2025-04-20T09:12:50Z,5,Yes,Yes,7,7,7,7,TextA,TextA,TextA,TextA,203.0.113.5 005,2025-04-20T09:15:01Z,45,No,Yes,3,4,Yes,2,"Brand X is great","Often","Love it","No issues",198.51.100.12 006,2025-04-20T09:17:22Z,8,Yes,No,1,1,1,1,Short,Short,Short,Short,198.51.100.12 007,2025-04-20T09:20:15Z,60,Yes,No,2,5,No,6,"I prefer Y","Never","It’s okay","Minor",192.0.2.45 008,2025-04-20T09:21:00Z,58,Yes,No,2,5,No,6,"I prefer Y","Never","It’s okay","Minor",192.0.2.45 009,2025-04-20T09:30:10Z,300,No,No,6,3,No,4,"No opinion","Rarely","Could improve","Somewhat",203.0.113.80 010,2025-04-20T09:35:50Z,12,Yes,No,1,2,3,4,"Mixed","Mixed","Mixed","Mixed",198.51.100.12 011,2025-04-20T09:40:05Z,40,Yes,No,4,3,No,5,"I use brand X","Occasionally","I’m satisfied","None",203.0.113.5 012,2025-04-20T09:45:30Z,30,No,Yes,5,2,Yes,1,"Brand Z","Often","Very happy","No issues",192.0.2.99 ```
How to Use This Prompt
  1. Paste the entire System Prompt into your LLM's system input.
  2. Send your survey data batch as the User message.
  3. The assistant will perform each forensic check and flag suspicious records.
  4. Answer any follow-up questions it asks to refine the fraud assessment.
  5. Review the final table of respondent IDs, risk scores, and actions (retain/review/discard).

Follow-up AI Prompt Questions

Run your fraud-detection routine: Guide the model through a series of forensic checks and follow-up questions.

Assistant (SurveyGuard™)
  1. Completion Time Analysis
    Compute the mean and SD of duration_seconds; flag any durations < (mean–2 SD) as "too fast."
  2. Straight-lining & Pattern Checks
    For matrix questions (e.g. Q3–Q6 on a 1–7 scale), detect identical or monotonic sequences.
  3. Consistency & Contradictions
    Identify logically incompatible answers (e.g. "No" to owning a car but "Yes" to brand-preference questions).
  4. IP & Geo-duplication
    Flag multiple responses from the same IP within a short window.
  5. Open-Text Analysis
    Check richness: extremely short or copy-paste answers ("TextA" repeatedly) get flagged.
  6. Anomaly Scoring
    Aggregate all flags into a 0–100 "fraud risk" score per respondent.
  7. Follow-Up Questions
    If overall risk > 70, ask the user:
    • "Respondent 002 completed in 8 s (avg 25 s); OK or exclude?"
    • "Do you have metadata on geolocation to cross-check IP 192.0.2.1?"
  8. Final Report
    Output a table of respondent IDs, risk scores, and recommended action (retain, review, discard).

Paid data only delivers value when you can trust its source. With market research so easy to manipulate, put a rigorous validation process in place and build a repeatable strategy. Focus on the insights you truly need to make confident, data-driven decisions for your business.

Have ideas & suggestions? I'd love to hear them.