# Coding Agent Swarm Brief: Synthetic Client Graph Generation

**Date:** 2026-04-18

**Status:** Draft

**Owner:** Sagnik

**Reviewers:** Sayan, Sourik

**Scope:** Generate 250 full synthetic client graphs for Project Velocity CRM and Client 360 validation

**Purpose:** Provide a decision-complete brief to a coding-agent swarm that will synthesize realistic client datasets aligned to the future founder CRM schema.

**Decision Boundary:** This brief defines the data generation target. It does not itself generate the data.

## 1. Mission

Generate **250 fully synthetic client graphs** that can be imported into Project Velocity and used to validate the future CRM, import, Client 360, Oracle, QD, and reminder workflows.

These are not toy demo leads. They should behave like real premium real-estate clients and surrounding commercial context.

## 2. Target Domain Assumptions

The synthetic data must align to the planned canonical domains:

- `crm_*`
- `intel_*`
- `inventory_*`
- `workflow_*`

The dataset must feel like it was generated for an AI-native CRM rather than a flat spreadsheet app.

## 3. Geography and Inventory Pool

Every synthetic client graph should be interested in one or more of these Kolkata-area projects:

- Eden Devprayag
- Sugam Prakriti
- Atri Aqua
- Atri Surya Toron
- Siddha Suburbia Bungalow
- Merlin Avana
- DTC Good Earth
- Siddha Serena
- Siddha Sky Waterfront
- Godrej Blue
- DTC Sojon
- Shriram Grand City
- Godrej Elevate
- Ambuja Utpaala

## 4. Dataset Composition

Generate at least:

- 250 synthetic people / primary clients
- linked family or co-buyer structures where relevant
- linked accounts/organizations where relevant
- multiple lead and opportunity states
- multiple interaction histories per client

## 5. Required Synthetic Output Classes

### Identity and CRM records

- person identity
- contact details
- demographic hints
- account or employer context
- household/co-buyer relationships
- lead status and opportunity stage history

### Commercial records

- project interests
- unit preferences
- budget bands
- urgency
- financing posture
- timeline to decision
- objections and motivations

### Communication records

- WhatsApp messages
- WhatsApp voice-call records
- voice-call transcripts with speaker segmentation
- email threads
- meeting notes
- reminders and task chains

### Timeline and visit records

- site visits
- revisit intent
- stage changes over time
- follow-up loops

### Intelligence and enrichment

- QD score history
- QD time-series shifts
- intent and urgency summaries
- inferred persona labels
- risk flags
- recommended next actions

### Evidence placeholders

Generate metadata placeholders for:

- CCTV references
- number-plate events
- room or perception events
- media asset references

Do not attempt to generate real CCTV image/video payloads unless the swarm can do so well. Metadata placeholders are acceptable.

## 6. File Formats Required

The swarm should produce:

- import-ready CSV files for major canonical entities
- JSON sidecars for nested or graph-heavy artifacts
- relationship maps where one flat CSV is insufficient
- one README describing how the synthetic dataset is organized

## 7. Required Realism Rules

- names, organizations, communication tone, and buying patterns must feel believable
- communication history should reflect premium property sales behavior, not generic consumer retail behavior
- stage transitions should make narrative sense
- reminders and follow-up tasks should reflect actual sales cadence
- transcripts should contain realistic but synthetic dialogue

## 8. Distribution Guidance

The 250 graphs should be varied across:

- high-intent buyers
- slow-burn investors
- NRI buyers
- family decision units
- price-sensitive but aspirational prospects
- brokers/referral chains
- repeat visitors

## 9. Output Quality Checks

The swarm must ensure:

- referential integrity across IDs
- no impossible date ordering
- no orphaned opportunities or interactions
- no transcript without parent interaction/call references
- every QD or enrichment artifact points back to a plausible source

## 10. Acceptance Criteria

- 250 complete synthetic client graphs produced
- all listed project names are represented
- output spans CRM, interaction, opportunity, reminder, transcript, and enrichment layers
- files are structured to support future CSV-first import testing
- a human reviewer can inspect a client graph and believe it is coherent

## 11. Bottom Line

The swarm’s job is to generate the synthetic world Velocity needs in order to stop designing around empty CRM tables and start validating against realistic client intelligence data.
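
The output quality checks in section 9 lend themselves to an automated pass before human review. The following is a minimal sketch of what such a pass might look like, not a prescribed implementation. This brief does not define a schema, so every field name here (`client_id`, `interaction_id`, `source_interaction_id`, `stage_history`, `changed_on`) and the `check_graph` helper itself are hypothetical placeholders that the swarm would replace with the real canonical columns.

```python
from datetime import date
from typing import Dict, List


def check_graph(graph: Dict) -> List[str]:
    """Return human-readable integrity problems; an empty list means clean."""
    issues: List[str] = []
    # All key names below are assumptions for illustration only.
    client_ids = {c["client_id"] for c in graph.get("clients", [])}
    interaction_ids = {i["interaction_id"] for i in graph.get("interactions", [])}

    # No orphaned opportunities or interactions: each must reference a known client.
    for kind, key in (("opportunities", "opportunity_id"),
                      ("interactions", "interaction_id")):
        for row in graph.get(kind, []):
            if row["client_id"] not in client_ids:
                issues.append(
                    f"{kind}:{row[key]} references unknown client {row['client_id']}"
                )

    # No transcript without a parent interaction/call reference.
    for t in graph.get("transcripts", []):
        if t.get("interaction_id") not in interaction_ids:
            issues.append(f"transcripts:{t['transcript_id']} has no parent interaction")

    # Every QD or enrichment artifact must point back to a source interaction.
    for e in graph.get("enrichments", []):
        if e.get("source_interaction_id") not in interaction_ids:
            issues.append(f"enrichments:{e['enrichment_id']} has no valid source")

    # No impossible date ordering: stage history must not move backwards in time.
    for o in graph.get("opportunities", []):
        dates = [date.fromisoformat(s["changed_on"])
                 for s in o.get("stage_history", [])]
        if dates != sorted(dates):
            issues.append(
                f"opportunities:{o['opportunity_id']} has out-of-order stage dates"
            )

    return issues


# A deliberately broken example graph (all IDs and values are made up).
example = {
    "clients": [{"client_id": "C001"}],
    "interactions": [{"interaction_id": "I001", "client_id": "C001"}],
    "opportunities": [
        {"opportunity_id": "O001", "client_id": "C001", "stage_history": [
            {"stage": "site_visit", "changed_on": "2026-02-10"},
            {"stage": "enquiry", "changed_on": "2026-01-05"},
        ]},
        {"opportunity_id": "O002", "client_id": "C999", "stage_history": []},
    ],
    "transcripts": [{"transcript_id": "T001", "interaction_id": "I404"}],
    "enrichments": [{"enrichment_id": "Q001", "source_interaction_id": "I001"}],
}

for problem in check_graph(example):
    print(problem)
```

In this example the checker reports three problems: an opportunity pointing at a missing client, a transcript with no parent interaction, and a stage history whose dates run backwards. Returning a list of messages rather than raising on the first failure is a design choice that lets a reviewer see every defect in a graph in one pass.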