# Coding Agent Swarm Brief: Synthetic Client Graph Generation

**Date:** 2026-04-18

**Status:** Draft

**Owner:** Sagnik

**Reviewers:** Sayan, Sourik

**Scope:** Generate 250 complete synthetic client graphs for Project Velocity CRM and Client 360 validation.

**Purpose:** Provide a decision-complete brief to a coding-agent swarm that will synthesize realistic client datasets aligned to the future founder CRM schema.

**Decision Boundary:** This brief defines the data generation target. It does not itself generate the data.

## 1. Mission

Generate **250 fully synthetic client graphs** that can be imported into Project Velocity and used to validate the future CRM, import, Client 360, Oracle, QD, and reminder workflows.

These are not toy demo leads. Each graph should behave like a real premium real-estate client, together with the commercial context that surrounds that client.

## 2. Target Domain Assumptions

The synthetic data must align to the planned canonical domains:

- `crm_*`
- `intel_*`
- `inventory_*`
- `workflow_*`

The dataset must feel like it was generated for an AI-native CRM rather than a flat spreadsheet app.

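
One way a swarm agent might enforce this alignment is to reject any output table whose name does not start with a canonical domain prefix. The sketch below assumes a hypothetical `<domain>_<entity>` snake_case naming convention; `domain_of` and `group_by_domain` are illustrative helpers, not part of any existing Velocity code.

```python
import re

# The four canonical domain prefixes named in the brief.
CANONICAL_DOMAINS = ("crm", "intel", "inventory", "workflow")

# Hypothetical naming convention: <domain>_<entity> in lowercase snake_case.
_TABLE_NAME = re.compile(r"^(" + "|".join(CANONICAL_DOMAINS) + r")_[a-z][a-z0-9_]*$")


def domain_of(table_name: str) -> str:
    """Return the canonical domain of a table name, or raise ValueError."""
    match = _TABLE_NAME.match(table_name)
    if match is None:
        raise ValueError(f"{table_name!r} does not belong to a canonical domain")
    return match.group(1)


def group_by_domain(table_names):
    """Bucket output table names by domain so empty domains stand out."""
    groups = {domain: [] for domain in CANONICAL_DOMAINS}
    for name in table_names:
        groups[domain_of(name)].append(name)
    return groups


if __name__ == "__main__":
    print(group_by_domain(["crm_person", "crm_opportunity", "intel_qd_score"]))
```

Grouping by domain also makes it easy to see whether a run produced nothing at all for, say, `inventory_*`.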
## 3. Geography and Inventory Pool

Every synthetic client should be interested in one or more of these Kolkata-area projects:

- Eden Devprayag
- Sugam Prakriti
- Atri Aqua
- Atri Surya Toron
- Siddha Suburbia Bungalow
- Merlin Avana
- DTC Good Earth
- Siddha Serena
- Siddha Sky Waterfront
- Godrej Blue
- DTC Sojon
- Shriram Grand City
- Godrej Elevate
- Ambuja Utpaala

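
Because the acceptance criteria require every listed project to appear, interest assignment should guarantee coverage rather than rely on random sampling alone. A minimal sketch, assuming one round-robin "anchor" project per client plus up to two random extras; `assign_project_interests`, `max_per_client`, and the seed value are hypothetical choices:

```python
import random

# The inventory pool listed in the brief.
PROJECTS = [
    "Eden Devprayag", "Sugam Prakriti", "Atri Aqua", "Atri Surya Toron",
    "Siddha Suburbia Bungalow", "Merlin Avana", "DTC Good Earth",
    "Siddha Serena", "Siddha Sky Waterfront", "Godrej Blue", "DTC Sojon",
    "Shriram Grand City", "Godrej Elevate", "Ambuja Utpaala",
]


def assign_project_interests(n_clients, max_per_client=3, seed=7):
    """Give each client 1..max_per_client distinct projects.

    Assumption: a round-robin anchor project per client guarantees that every
    project is covered whenever n_clients >= len(PROJECTS); any extra
    interests are sampled at random from the remaining projects.
    """
    rng = random.Random(seed)  # seeded so reruns produce the same dataset
    interests = []
    for i in range(n_clients):
        anchor = PROJECTS[i % len(PROJECTS)]
        others = [p for p in PROJECTS if p != anchor]
        extra = rng.sample(others, rng.randint(0, max_per_client - 1))
        interests.append([anchor] + extra)
    return interests
```

With 250 clients and 14 projects, the anchors alone touch every project many times over, and the fixed seed keeps regenerated datasets identical, which helps when diffing import runs.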
## 4. Dataset Composition

Generate at least:

- 250 synthetic people / primary clients
- linked family or co-buyer structures where relevant
- linked accounts/organizations where relevant
- multiple lead and opportunity states
- multiple interaction histories per client

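
A client graph can be pictured as one primary person with optional co-buyers, an optional account, and lists of opportunities and interactions that all reference the primary person's ID. The dataclasses below are a hypothetical shape for illustration only; the field names, the `min_interactions=2` reading of "multiple", and the `composition_problems` helper are assumptions, not the founder CRM schema.

```python
from dataclasses import dataclass, field
from typing import List, Optional


# Hypothetical record shapes; field names are illustrative only.
@dataclass
class Person:
    person_id: str
    full_name: str


@dataclass
class Opportunity:
    opportunity_id: str
    person_id: str      # owning primary client
    project: str
    stage: str


@dataclass
class Interaction:
    interaction_id: str
    person_id: str
    channel: str        # e.g. "whatsapp", "email", "site_visit"


@dataclass
class ClientGraph:
    primary: Person
    co_buyers: List[Person] = field(default_factory=list)
    account_id: Optional[str] = None  # employer or organization, if relevant
    opportunities: List[Opportunity] = field(default_factory=list)
    interactions: List[Interaction] = field(default_factory=list)


def composition_problems(graphs, min_graphs=250, min_interactions=2):
    """List violations of the minimum composition described above."""
    problems = []
    if len(graphs) < min_graphs:
        problems.append(f"only {len(graphs)} graphs, need {min_graphs}")
    for g in graphs:
        if len(g.interactions) < min_interactions:
            problems.append(f"{g.primary.person_id}: too few interactions")
    stages = {o.stage for g in graphs for o in g.opportunities}
    if len(stages) < 2:
        problems.append("opportunities cover fewer than two stages")
    return problems
```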
## 5. Required Synthetic Output Classes

### Identity and CRM records

- person identity
- contact details
- demographic hints
- account or employer context
- household/co-buyer relationships
- lead status and opportunity stage history

### Commercial records

- project interests
- unit preferences
- budget bands
- urgency
- financing posture
- timeline to decision
- objections and motivations

### Communication records

- WhatsApp messages
- WhatsApp voice-call records
- voice-call transcripts with speaker segmentation
- email threads
- meeting notes
- reminders and task chains

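
Speaker-segmented transcripts are easier to validate when every segment carries its parent call ID. This sketch assumes a simplified model in which turns never overlap (real calls often contain crosstalk) and times are offsets in seconds from the start of the call; `Segment` and `segmentation_errors` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List


# Hypothetical segment shape: the parent call ID lives on every segment,
# so a transcript cannot exist without the call it belongs to.
@dataclass
class Segment:
    call_id: str
    speaker: str      # e.g. "agent" or "client"
    start_s: float    # offset from call start, in seconds
    end_s: float
    text: str


def segmentation_errors(segments: List[Segment], call_duration_s: float) -> List[str]:
    """Check that segments share one call, are ordered, and fit inside it."""
    errors = []
    if len({s.call_id for s in segments}) > 1:
        errors.append("segments reference more than one call")
    previous_end = 0.0
    for i, s in enumerate(segments):
        if s.start_s < previous_end:
            errors.append(f"segment {i} overlaps the previous segment")
        if s.end_s <= s.start_s:
            errors.append(f"segment {i} has non-positive duration")
        if s.end_s > call_duration_s:
            errors.append(f"segment {i} runs past the end of the call")
        previous_end = s.end_s
    return errors
```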
### Timeline and visit records

- site visits
- revisit intent
- stage changes over time
- follow-up loops

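
Follow-up loops can be derived from visit records so that reminders can never predate the visit they follow. The day offsets below, and the rule that revisit intent extends the chain, are hypothetical placeholders for whatever cadence the sales team actually uses; `follow_up_chain` is an illustrative helper.

```python
from datetime import date, timedelta

# Hypothetical cadence: days after a site visit at which a follow-up is due.
FOLLOW_UP_OFFSETS_DAYS = (1, 3, 7, 14)


def follow_up_chain(visit_id: str, visit_date: date, revisit_intent: bool):
    """Build reminder records that always fall after the visit they follow.

    Assumption: clients with revisit intent get the full chain; others get
    only the first two touches.
    """
    offsets = FOLLOW_UP_OFFSETS_DAYS if revisit_intent else FOLLOW_UP_OFFSETS_DAYS[:2]
    return [
        {
            "reminder_id": f"{visit_id}-R{n}",
            "parent_visit_id": visit_id,
            "due_date": (visit_date + timedelta(days=days)).isoformat(),
        }
        for n, days in enumerate(offsets, start=1)
    ]
```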
### Intelligence and enrichment

- QD score history
- QD time-series shifts
- intent and urgency summaries
- inferred persona labels
- risk flags
- recommended next actions

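
For QD history to be auditable, each score point should name the interaction that justified it, and time-series shifts are then differences between consecutive points. A minimal sketch; `QDPoint`, its fields, and `qd_shifts` are assumptions, since the brief does not define the QD scale or record layout.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Tuple


# Hypothetical QD history point.
@dataclass
class QDPoint:
    person_id: str
    scored_at: datetime
    score: float
    source_interaction_id: str  # the interaction that justified this score


def qd_shifts(history: List[QDPoint]) -> List[Tuple[str, float]]:
    """Return (source_interaction_id, delta) for each step in the QD series."""
    ordered = sorted(history, key=lambda p: p.scored_at)
    return [
        (cur.source_interaction_id, round(cur.score - prev.score, 2))
        for prev, cur in zip(ordered, ordered[1:])
    ]
```

Attributing each shift to its source interaction is what lets a reviewer answer "why did this client's QD change?" from the data alone.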
### Evidence placeholders

Generate metadata placeholders for:

- CCTV references
- number-plate events
- room or perception events
- media asset references

Do not attempt to generate real CCTV image or video payloads unless the swarm can make them convincing. Metadata placeholders are acceptable.

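
A placeholder record can carry all the linkage an evidence event needs while leaving the payload empty. The sketch assumes hypothetical field names, a `synthetic` flag, and a four-value `kind` vocabulary mirroring the list above; it serializes to a JSON sidecar with the standard library.

```python
import json
from dataclasses import asdict, dataclass
from typing import Optional

# Hypothetical vocabulary mirroring the four placeholder classes above.
ALLOWED_KINDS = {"cctv", "number_plate", "perception", "media"}


@dataclass
class EvidencePlaceholder:
    evidence_id: str
    kind: str
    person_id: str
    site_visit_id: str                 # ties the event to a visit record
    observed_at: str                   # ISO 8601 timestamp
    payload_uri: Optional[str] = None  # stays None: metadata only, no media
    synthetic: bool = True


def to_sidecar(items):
    """Validate placeholder kinds and serialize them as a JSON sidecar."""
    for item in items:
        if item.kind not in ALLOWED_KINDS:
            raise ValueError(f"unknown evidence kind: {item.kind}")
    return json.dumps([asdict(item) for item in items], indent=2)
```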
## 6. File Formats Required

The swarm should produce:

- import-ready CSV files for major canonical entities
- JSON sidecars for nested or graph-heavy artifacts
- relationship maps where one flat CSV is insufficient
- one README describing how the synthetic dataset is organized

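
The CSV-plus-sidecar split maps directly onto Python's standard `csv` and `json` modules. A sketch, assuming hypothetical file names such as `crm_person.csv` and `relationships.json`; `write_entity_csv` and `write_sidecar` are illustrative helpers.

```python
import csv
import json
import tempfile
from pathlib import Path


def write_entity_csv(path: Path, rows, fieldnames):
    """Write one flat canonical entity (e.g. people) as an import-ready CSV."""
    with path.open("w", newline="", encoding="utf-8") as fh:
        writer = csv.DictWriter(fh, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)


def write_sidecar(path: Path, payload):
    """Write a nested artifact (transcript, relationship map) as JSON."""
    path.write_text(json.dumps(payload, indent=2, ensure_ascii=False), encoding="utf-8")


if __name__ == "__main__":
    out = Path(tempfile.mkdtemp())
    write_entity_csv(out / "crm_person.csv",
                     [{"person_id": "P1", "full_name": "Ananya Sen"}],
                     ["person_id", "full_name"])
    write_sidecar(out / "relationships.json", {"P1": {"co_buyers": ["P2"]}})
    print(sorted(p.name for p in out.iterdir()))
```

Opening the CSV with `newline=""` follows the `csv` module's guidance for file objects, and `ensure_ascii=False` keeps any non-ASCII text in names or transcripts readable in the sidecar.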
## 7. Required Realism Rules

- names, organizations, communication tone, and buying patterns must feel believable
- communication history should reflect premium property sales behavior, not generic consumer retail behavior
- stage transitions should make narrative sense
- reminders and follow-up tasks should reflect actual sales cadence
- transcripts should contain realistic but synthetic dialogue

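
"Narrative sense" for stage transitions can be made checkable with an explicit transition table. The stage names and allowed moves below are hypothetical placeholders, since the brief does not yet define Velocity's pipeline stages; `invalid_transitions` is an illustrative helper.

```python
# Hypothetical stage model: each stage maps to the stages it may move to next.
ALLOWED_TRANSITIONS = {
    "new": {"contacted", "lost"},
    "contacted": {"site_visit_scheduled", "nurture", "lost"},
    "site_visit_scheduled": {"site_visit_done", "contacted", "lost"},
    "site_visit_done": {"negotiation", "nurture", "site_visit_scheduled", "lost"},
    "nurture": {"contacted", "lost"},
    "negotiation": {"booked", "nurture", "lost"},
    "booked": set(),
    "lost": {"contacted"},  # a lost lead may be revived
}


def invalid_transitions(stage_history):
    """Return the (from, to) pairs in a stage history that are not allowed."""
    return [
        (a, b)
        for a, b in zip(stage_history, stage_history[1:])
        if b not in ALLOWED_TRANSITIONS.get(a, set())
    ]
```

Allowing `lost` to return to `contacted` keeps revived leads possible, which keeps histories from looking artificially linear.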
## 8. Distribution Guidance

The 250 graphs should be varied across:

- high-intent buyers
- slow-burn investors
- NRI buyers
- family decision units
- price-sensitive but aspirational prospects
- brokers/referral chains
- repeat visitors

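
The brief asks for variety but gives no target mix, so an even split is one neutral starting point. This sketch assumes one primary persona per graph and uses hypothetical slugs for the seven archetypes; `persona_quotas` is an illustrative helper.

```python
# Hypothetical slugs for the seven archetypes listed above.
PERSONAS = [
    "high_intent_buyer", "slow_burn_investor", "nri_buyer",
    "family_decision_unit", "price_sensitive_aspirational",
    "broker_referral_chain", "repeat_visitor",
]


def persona_quotas(total=250, personas=PERSONAS):
    """Split `total` graphs as evenly as possible across personas.

    Assumption: an even split; the first `remainder` personas get one extra.
    """
    base, remainder = divmod(total, len(personas))
    return {p: base + (1 if i < remainder else 0) for i, p in enumerate(personas)}
```

For 250 graphs this gives five personas 36 graphs and two personas 35. Weights can replace the even split once reviewers agree on a mix.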
## 9. Output Quality Checks

The swarm must ensure:

- referential integrity across IDs
- no impossible date ordering
- no orphaned opportunities or interactions
- no transcript without parent interaction/call references
- every QD or enrichment artifact points back to a plausible source

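
Several of these checks reduce to set lookups and timestamp comparisons. A minimal sketch over plain dict records; the key names (`person_id`, `created_at`, `occurred_at`, and so on) and the `integrity_errors` helper are assumptions rather than the canonical column names.

```python
from datetime import datetime


def integrity_errors(people, opportunities, interactions, transcripts):
    """Check orphan, date-ordering, and transcript-parent rules.

    Assumed keys: people have "person_id" and "created_at" (datetime);
    opportunities have "opportunity_id" and "person_id"; interactions have
    "interaction_id", "person_id", and "occurred_at" (datetime); transcripts
    have "transcript_id" and "interaction_id".
    """
    errors = []
    person_created = {p["person_id"]: p["created_at"] for p in people}
    interaction_ids = {i["interaction_id"] for i in interactions}

    for o in opportunities:
        if o["person_id"] not in person_created:
            errors.append(f"orphaned opportunity {o['opportunity_id']}")
    for i in interactions:
        created = person_created.get(i["person_id"])
        if created is None:
            errors.append(f"orphaned interaction {i['interaction_id']}")
        elif i["occurred_at"] < created:
            errors.append(f"interaction {i['interaction_id']} predates its client")
    for t in transcripts:
        if t["interaction_id"] not in interaction_ids:
            errors.append(f"transcript {t['transcript_id']} has no parent interaction")
    return errors


if __name__ == "__main__":
    people = [{"person_id": "P1", "created_at": datetime(2026, 1, 2)}]
    interactions = [{"interaction_id": "I1", "person_id": "P1",
                     "occurred_at": datetime(2026, 1, 1)}]
    transcripts = [{"transcript_id": "T1", "interaction_id": "I9"}]
    # prints both the date-ordering error and the orphaned-transcript error
    print(integrity_errors(people, [], interactions, transcripts))
```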
## 10. Acceptance Criteria

- 250 complete synthetic client graphs produced
- all listed project names are represented
- output spans CRM, interaction, opportunity, reminder, transcript, and enrichment layers
- files are structured to support future CSV-first import testing
- a human reviewer can inspect a client graph and believe it is coherent

## 11. Bottom Line

The swarm’s job is to generate the synthetic world Velocity needs in order to stop designing around empty CRM tables and start validating against realistic client intelligence data.