# Project Velocity - Synthetic Client Graph Dataset

**Generated:** 2026-04-18
**Dataset Version:** 1.0.0
**Target:** 250 full synthetic client graphs
**Owner:** Sagnik
**Alignment:** Founder CRM and Platform Delivery Pack (Doc 16)

---

## Overview

This dataset contains 250 fully synthetic client graphs aligned to the Project Velocity canonical domain model. It is designed for:

- CRM module validation and testing
- Import pipeline replay testing
- Client 360 aggregation validation
- Oracle intelligence and writeback testing
- QD score and timeseries validation
- Communication capture and transcript processing
- Workflow and approval governance testing

The data simulates premium real-estate sales behavior in the Kolkata market across 14 projects.

---

## Geography and Inventory

**Market:** Kolkata and surrounding micro-markets
**Projects:** 14 premium residential projects

| Project ID | Project Name | Developer | Micro-Market |
|------------|--------------|-----------|--------------|
| PRJ-001 | Eden Devprayag | Eden Group | Rajarhat |
| PRJ-002 | Sugam Prakriti | Sugam Homes | Barasat |
| PRJ-003 | Atri Aqua | Atri Developers | New Town |
| PRJ-004 | Atri Surya Toron | Atri Developers | Rajarhat |
| PRJ-005 | Siddha Suburbia Bungalow | Siddha Group | Madanpur |
| PRJ-006 | Merlin Avana | Merlin Group | Tangra |
| PRJ-007 | DTC Good Earth | DTC Projects | New Town |
| PRJ-008 | Siddha Serena | Siddha Group | New Town |
| PRJ-009 | Siddha Sky Waterfront | Siddha Group | Beliaghata |
| PRJ-010 | Godrej Blue | Godrej Properties | New Town |
| PRJ-011 | DTC Sojon | DTC Projects | Rajarhat |
| PRJ-012 | Shriram Grand City | Shriram Properties | Howrah |
| PRJ-013 | Godrej Elevate | Godrej Properties | Dum Dum |
| PRJ-014 | Ambuja Utpaala | Ambuja Neotia | Tollygunge |

---

## Dataset Composition

### Primary Entities

| Entity | Count | Description |
|--------|-------|-------------|
| Primary Clients (People) | 250 | Main decision-makers and buyers |
| Co-buyers/Family | 91 | Secondary contacts linked to households |
| Accounts (Organizations) | 153 | Employers, businesses, referral partners |
| Households | 118 | Family decision units |
| Relationships | 91 | Spouse, parent, sibling, business partner links |
| Leads | 250 | Funnel-stage qualification records |
| Opportunities | 400 | Deal pipeline objects (1-3 per client) |
| Property Interests | 400 | Project/unit preference records |
| Stage History | 1,373 | Lead stage transition audit trail |

### Interaction Graph

| Artifact | Count | Description |
|----------|-------|-------------|
| Interactions | 1,897 | Umbrella communication events |
| WhatsApp Messages | 3,367 | Text messages with realistic dialogue |
| WhatsApp Threads | 606 | Conversation thread summaries |
| Phone Calls | 478 | Call records with duration and direction |
| Transcripts | 231 | Speaker-segmented call transcripts |
| Emails | 149 | Business correspondence with subjects and bodies |
| Site Visits | 305 | Physical site visit records with notes |
| Reminders/Tasks | 759 | Follow-up items and action reminders |

### Intelligence & Enrichment

| Artifact | Count | Description |
|----------|-------|-------------|
| QD Scores | 250 | Latest qualification/disposition scores |
| QD Timeseries | 1,953 | Historical score propagation (4-12 pts/client) |
| Vehicle Events | 80 | Number-plate detection events |
| Perception Events | 60 | Behavioral/dwell-time intelligence |
| CCTV Links | 120 | Video clip references linked to visits |

### Workflow & Governance

| Artifact | Count | Description |
|----------|-------|-------------|
| Workflow Actions | 100 | Import reviews, merge proposals, writebacks |
| Approvals | 49 | Human review decisions |
| Writebacks | 28 | Approved canonical mutations |

### Inventory

| Artifact | Count | Description |
|----------|-------|-------------|
| Projects | 14 | Master project records |
| Units | 209 | Individual unit inventory (8-20 per project) |
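
The counts above can be checked mechanically after an export. The sketch below is a minimal Python example, assuming each table row maps to the similarly named file under `csv/` (for example, WhatsApp Messages to `intel_messages.csv` and Reminders/Tasks to `intel_reminders.csv`) and that every file has exactly one header row. `crm_people.csv` is left out because the tables do not say whether it holds only the 250 primary clients or also the 91 co-buyers. The helper names are hypothetical, not part of any Velocity tooling.

```python
import csv
from pathlib import Path

# Expected record counts copied from the Dataset Composition tables.
# Entity-to-file mapping is an assumption based on matching names.
EXPECTED_ROWS = {
    "crm_accounts.csv": 153,
    "crm_households.csv": 118,
    "crm_relationships.csv": 91,
    "crm_leads.csv": 250,
    "crm_opportunities.csv": 400,
    "crm_property_interests.csv": 400,
    "crm_stage_history.csv": 1373,
    "intel_interactions.csv": 1897,
    "intel_messages.csv": 3367,
    "intel_whatsapp_threads.csv": 606,
    "intel_calls.csv": 478,
    "intel_transcripts.csv": 231,
    "intel_emails.csv": 149,
    "intel_visits.csv": 305,
    "intel_reminders.csv": 759,
    "intel_qd_scores.csv": 250,
    "intel_qd_timeseries.csv": 1953,
    "intel_vehicle_events.csv": 80,
    "intel_perception_events.csv": 60,
    "intel_cctv_links.csv": 120,
    "workflow_actions.csv": 100,
    "workflow_approvals.csv": 49,
    "workflow_writebacks.csv": 28,
    "inventory_projects.csv": 14,
    "inventory_units.csv": 209,
}


def count_records(lines) -> int:
    """Count data records, excluding the header row and blank lines.

    csv.reader is used instead of counting raw lines so that a quoted
    field containing newlines (e.g. a message body) counts as one record.
    """
    reader = csv.reader(lines)
    next(reader, None)  # skip the header row
    return sum(1 for row in reader if row)


def check_row_counts(csv_dir: Path) -> dict:
    """Return {filename: (expected, actual)} for every file that mismatches."""
    mismatches = {}
    for name, expected in EXPECTED_ROWS.items():
        with open(csv_dir / name, newline="", encoding="utf-8") as fh:
            actual = count_records(fh)
        if actual != expected:
            mismatches[name] = (expected, actual)
    return mismatches
```

`check_row_counts(Path("csv"))`, run from the dataset root, should return an empty dict when every file matches the tables.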

---

## File Structure

```
synthetic_client_graphs/
├── csv/
│   ├── inventory_projects.csv
│   ├── inventory_units.csv
│   ├── crm_people.csv
│   ├── crm_accounts.csv
│   ├── crm_households.csv
│   ├── crm_relationships.csv
│   ├── crm_leads.csv
│   ├── crm_opportunities.csv
│   ├── crm_property_interests.csv
│   ├── crm_stage_history.csv
│   ├── intel_interactions.csv
│   ├── intel_messages.csv
│   ├── intel_calls.csv
│   ├── intel_transcripts.csv
│   ├── intel_emails.csv
│   ├── intel_whatsapp_threads.csv
│   ├── intel_visits.csv
│   ├── intel_reminders.csv
│   ├── intel_qd_scores.csv
│   ├── intel_qd_timeseries.csv
│   ├── intel_vehicle_events.csv
│   ├── intel_perception_events.csv
│   ├── intel_cctv_links.csv
│   ├── workflow_actions.csv
│   ├── workflow_approvals.csv
│   └── workflow_writebacks.csv
├── json/
│   ├── client_360_snapshots_batch_1.json   (Clients 1-50)
│   ├── client_360_snapshots_batch_2.json   (Clients 51-100)
│   ├── client_360_snapshots_batch_3.json   (Clients 101-150)
│   ├── client_360_snapshots_batch_4.json   (Clients 151-200)
│   ├── client_360_snapshots_batch_5.json   (Clients 201-250)
│   ├── import_mapping_manifest_example.json
│   ├── relationship_graph_map.json
│   └── transcript_sidecars.json
└── README.md
```
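
The snapshot batches split the 250 clients into fixed blocks of 50. A small helper (hypothetical, not shipped with the dataset) can locate the batch file for a given client, assuming "client number" means the 1-based ordinal used in the tree annotations rather than a `person_id` value:

```python
BATCH_SIZE = 50      # clients per snapshot batch, per the tree above
TOTAL_CLIENTS = 250  # primary clients in the dataset


def snapshot_batch_for(client_number: int) -> str:
    """Return the snapshot file expected to hold a 1-based client number."""
    if not 1 <= client_number <= TOTAL_CLIENTS:
        raise ValueError(f"client_number must be in 1..{TOTAL_CLIENTS}")
    # Clients 1-50 -> batch 1, 51-100 -> batch 2, ..., 201-250 -> batch 5
    batch = (client_number - 1) // BATCH_SIZE + 1
    return f"json/client_360_snapshots_batch_{batch}.json"
```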

---

## Buyer Persona Distribution

The 250 primary clients are distributed across realistic premium real-estate buyer personas:

| Persona | Percentage | Count | Characteristics |
|---------|-----------|-------|-----------------|
| High-Intent Buyer | 20% | ~50 | Quick decision cycle, clear requirements, responsive |
| Slow-Burn Investor | 18% | ~45 | Long horizon, price-sensitive, comparison-heavy |
| NRI Buyer | 12% | ~30 | Remote decision-making, video calls, family proxies |
| Family Decision Unit | 20% | ~50 | Multiple stakeholders, consensus-driven, Vastu-conscious |
| Price-Sensitive Aspirational | 15% | ~37 | Stretch budget, EMI-focused, festival-offer hunters |
| Broker/Referral Chain | 8% | ~20 | Multiple client representations, commission-focused |
| Repeat Visitor | 7% | ~18 | High engagement, multiple visits, decision paralysis |
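
The percentages and counts can be cross-checked: 15% and 7% of 250 are 37.5 and 17.5, so one rounds down and the other up for the counts to total 250. A minimal Python check of that arithmetic, using only the numbers in the table (the function name is illustrative):

```python
# (persona, percentage, approximate count) as listed in the table above
PERSONAS = [
    ("High-Intent Buyer", 20, 50),
    ("Slow-Burn Investor", 18, 45),
    ("NRI Buyer", 12, 30),
    ("Family Decision Unit", 20, 50),
    ("Price-Sensitive Aspirational", 15, 37),
    ("Broker/Referral Chain", 8, 20),
    ("Repeat Visitor", 7, 18),
]
TOTAL_CLIENTS = 250


def check_persona_table(rows, total):
    """Return (sum of percentages, sum of counts, personas off by more than one client)."""
    pct_sum = sum(pct for _, pct, _ in rows)
    count_sum = sum(count for _, _, count in rows)
    off = [
        name
        for name, pct, count in rows
        if abs(count - pct * total / 100) > 1  # exact share is pct% of total
    ]
    return pct_sum, count_sum, off


print(check_persona_table(PERSONAS, TOTAL_CLIENTS))  # (100, 250, [])
```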

---

## Canonical Domain Alignment

This dataset maps to the planned Velocity canonical domains:

### `crm_*` Domain
- `crm_people`: Contact identity and demographics
- `crm_accounts`: Organization and employer records
- `crm_households`: Family and co-buyer structures
- `crm_relationships`: Person-to-person linkages
- `crm_leads`: Funnel stage and qualification
- `crm_opportunities`: Deal pipeline and valuation
- `crm_property_interests`: Project/unit preferences
- `crm_stage_history`: Audit trail of stage transitions

### `intel_*` Domain
- `intel_interactions`: Unified communication events
- `intel_messages`: Text-level message records
- `intel_calls`: Call metadata and duration
- `intel_transcripts`: Speaker-segmented conversation text
- `intel_emails`: Email correspondence
- `intel_whatsapp_threads`: Thread-level summaries
- `intel_visits`: Site visit records and notes
- `intel_reminders`: Task and follow-up tracking
- `intel_qd_scores`: Qualification/disposition scores
- `intel_qd_timeseries`: Temporal score evolution
- `intel_vehicle_events`: Parking/entry detection
- `intel_perception_events`: Behavioral intelligence
- `intel_cctv_links`: Video evidence references

### `inventory_*` Domain
- `inventory_projects`: Master project catalog
- `inventory_units`: Unit-level availability and pricing

### `workflow_*` Domain
- `workflow_actions`: Proposed AI/human actions
- `workflow_approvals`: Review decisions
- `workflow_writebacks`: Committed mutations

---

## Quality Assurance

### Referential Integrity
All foreign key relationships have been validated:

- ✅ All `lead.person_id` values exist in `crm_people`
- ✅ All `opportunity.lead_id` values exist in `crm_leads`
- ✅ All `interaction.person_id` values exist in `crm_people`
- ✅ All `visit.person_id` values exist in `crm_people`
- ✅ All `qd_score.person_id` values exist in `crm_people`
- ✅ No orphaned stage history records
- ✅ All `opportunity.project_id` values exist in `inventory_projects`
- ✅ All `property_interest.project_id` values exist in `inventory_projects`
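
The foreign-key checks listed above can be reproduced with a short script. This sketch assumes the parent tables use the same column name as the foreign key (`person_id` in `crm_people.csv`, `lead_id` in `crm_leads.csv`, `project_id` in `inventory_projects.csv`) and that the `interaction`, `visit`, and `qd_score` rows live in `intel_interactions.csv`, `intel_visits.csv`, and `intel_qd_scores.csv`. The stage history check is left out because its foreign-key column is not named here, and blank foreign keys are reported as orphans, which may be stricter than the dataset intends for optional links.

```python
import csv
from pathlib import Path

# (child file, foreign-key column, parent file, parent key column)
# Parent key column names are assumed to match the foreign-key names.
FK_CHECKS = [
    ("crm_leads.csv", "person_id", "crm_people.csv", "person_id"),
    ("crm_opportunities.csv", "lead_id", "crm_leads.csv", "lead_id"),
    ("intel_interactions.csv", "person_id", "crm_people.csv", "person_id"),
    ("intel_visits.csv", "person_id", "crm_people.csv", "person_id"),
    ("intel_qd_scores.csv", "person_id", "crm_people.csv", "person_id"),
    ("crm_opportunities.csv", "project_id", "inventory_projects.csv", "project_id"),
    ("crm_property_interests.csv", "project_id", "inventory_projects.csv", "project_id"),
]


def key_set(rows, column):
    """Collect the distinct values of a key column."""
    return {row[column] for row in rows}


def find_orphans(child_rows, fk_column, parent_keys):
    """Return child rows whose foreign key is missing from parent_keys.

    Blank foreign keys count as orphans (a strict reading).
    """
    return [row for row in child_rows if row[fk_column] not in parent_keys]


def run_fk_checks(csv_dir: Path) -> dict:
    """Return {"file.column": orphan_count} for every failing check."""
    cache = {}

    def rows(name):
        # Load each CSV once, even when several checks reuse it.
        if name not in cache:
            with open(csv_dir / name, newline="", encoding="utf-8") as fh:
                cache[name] = list(csv.DictReader(fh))
        return cache[name]

    failures = {}
    for child, fk, parent, pk in FK_CHECKS:
        orphans = find_orphans(rows(child), fk, key_set(rows(parent), pk))
        if orphans:
            failures[f"{child}.{fk}"] = len(orphans)
    return failures
```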

### Temporal Consistency
- ✅ Lead creation dates precede interaction dates
- ✅ Stage history transitions are monotonic in time
- ✅ QD timeseries points are chronologically ordered
- ✅ Visit dates align with lead stage progression
- ✅ Reminder due dates follow interaction dates
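
The stage history and QD timeseries ordering rules reduce to one per-group check: within each lead or client, timestamps must never go backwards. This sketch assumes ISO-8601 timestamp strings; the column names used in the usage note (`lead_id`, `changed_at`, `scored_at`) are hypothetical placeholders for whatever the CSVs actually use.

```python
from collections import defaultdict
from datetime import datetime


def out_of_order_groups(rows, group_col, time_col):
    """Return group keys whose rows go backwards in time, in stored order.

    Timestamps must be ISO-8601 strings accepted by datetime.fromisoformat,
    e.g. "2026-03-14" or "2026-03-14T10:30:00", all in the same style.
    """
    times_by_group = defaultdict(list)
    for row in rows:
        times_by_group[row[group_col]].append(datetime.fromisoformat(row[time_col]))
    return sorted(
        key
        for key, times in times_by_group.items()
        if any(later < earlier for earlier, later in zip(times, times[1:]))
    )
```

For example, `out_of_order_groups(stage_rows, "lead_id", "changed_at")` for stage history, or `out_of_order_groups(qd_rows, "person_id", "scored_at")` for the QD timeseries. The check follows the order rows are stored in; if a file carries an explicit sequence column, sort by it first.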

### Realism Rules Applied
- **Names:** Realistic Indian names (Bengali, Hindi, mixed demographics)
- **Organizations:** Major Indian IT, banking, manufacturing, and consulting firms
- **Communication:** Premium property sales tone, not generic retail
- **Stage Transitions:** Narratively coherent (enquiry → visit → negotiation → booking)
- **Sales Cadence:** Realistic follow-up intervals (3-15 days between touches)
- **Dialogue:** Context-aware transcripts referencing specific projects, prices, and objections
- **Budgets:** Aligned to the Kolkata premium market (₹1.5 crore to ₹25 crore)
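
Two of these rules are numeric and can be spot-checked. A sketch, assuming budgets are stored in rupees and treating the stated ranges as hard bounds (the generator may intend them only as typical values); 1 crore is 10,000,000 rupees. The function names are illustrative.

```python
from datetime import date
from typing import Iterable, List, Tuple

CRORE = 10_000_000  # 1 crore = 10 million rupees
BUDGET_MIN_INR = 1.5 * CRORE
BUDGET_MAX_INR = 25 * CRORE
MIN_GAP_DAYS, MAX_GAP_DAYS = 3, 15


def budget_in_range(amount_inr: float) -> bool:
    """True if a budget falls inside the stated 1.5 to 25 crore band."""
    return BUDGET_MIN_INR <= amount_inr <= BUDGET_MAX_INR


def cadence_violations(touch_dates: Iterable[date]) -> List[Tuple[date, date]]:
    """Return consecutive (earlier, later) touch dates whose gap is outside 3-15 days."""
    ordered = sorted(touch_dates)
    return [
        (earlier, later)
        for earlier, later in zip(ordered, ordered[1:])
        if not MIN_GAP_DAYS <= (later - earlier).days <= MAX_GAP_DAYS
    ]
```

Pass one client's touch dates at a time. Several channels on the same day produce 0-day gaps, so deduplicating dates per client first may give a fairer reading.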

---

## Usage Instructions

### CSV-First Import Testing
1. Start with `crm_people.csv` as the identity anchor
2. Join `crm_leads.csv` on `person_id`
3. Join `crm_opportunities.csv` on `lead_id`
4. Join `inventory_projects.csv` and `inventory_units.csv` on the project and unit IDs
5. Join `intel_interactions.csv` on `person_id` to attach communication history
6. Aggregate `intel_qd_scores.csv` and `intel_qd_timeseries.csv` per client to attach QD (qualification/disposition) intelligence
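
Steps 1, 2, 3, and 5 can be sketched as a nested build keyed on `person_id` and `lead_id`. This assumes `crm_people.csv` and `crm_leads.csv` use those names for their own key columns; the inventory join (step 4) and QD aggregation (step 6) are omitted because their column names are not documented here. The function names are illustrative.

```python
import csv
from collections import defaultdict
from pathlib import Path


def read_rows(path: Path):
    """Read a CSV file into a list of dicts keyed by header names."""
    with open(path, newline="", encoding="utf-8") as fh:
        return list(csv.DictReader(fh))


def index_by(rows, column):
    """Group rows by a key column: {key: [row, ...]}."""
    grouped = defaultdict(list)
    for row in rows:
        grouped[row[column]].append(row)
    return grouped


def assemble_graphs(people, leads, opportunities, interactions):
    """Nest leads, their opportunities, and interactions under each person."""
    leads_by_person = index_by(leads, "person_id")
    opps_by_lead = index_by(opportunities, "lead_id")
    interactions_by_person = index_by(interactions, "person_id")
    graphs = {}
    for person in people:  # step 1: people are the identity anchor
        pid = person["person_id"]
        graphs[pid] = {
            "person": person,
            "leads": [  # steps 2-3: leads, each with its opportunities
                {"lead": lead, "opportunities": opps_by_lead.get(lead["lead_id"], [])}
                for lead in leads_by_person.get(pid, [])
            ],
            "interactions": interactions_by_person.get(pid, []),  # step 5
        }
    return graphs
```

Usage, from the dataset root: `assemble_graphs(read_rows(Path("csv/crm_people.csv")), read_rows(Path("csv/crm_leads.csv")), read_rows(Path("csv/crm_opportunities.csv")), read_rows(Path("csv/intel_interactions.csv")))`. People with no leads or interactions still get an entry, which makes gaps easy to spot.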

### Client 360 Validation
Load `json/client_360_snapshots_batch_*.json` to validate:

- Aggregation accuracy
- Cross-domain joining
- Derived field computation
- Missing data handling
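
The snapshot schema is not documented in this README. As a sketch, assuming each batch file's top level is a JSON array of client snapshot objects, the batches can be loaded in numeric order and counted before any field-level validation. The helper names are hypothetical.

```python
import json
from pathlib import Path


def batch_number(path: Path) -> int:
    """Extract N from client_360_snapshots_batch_N.json."""
    return int(path.stem.rsplit("_", 1)[1])


def parse_batch(text: str, name: str) -> list:
    """Parse one batch file; assumes its top level is a JSON array of snapshots."""
    batch = json.loads(text)
    if not isinstance(batch, list):
        raise ValueError(f"{name}: expected a top-level JSON array")
    return batch


def load_snapshots(json_dir: Path) -> list:
    """Load all snapshot batches, in numeric batch order, into one list."""
    snapshots = []
    for path in sorted(json_dir.glob("client_360_snapshots_batch_*.json"), key=batch_number):
        snapshots.extend(parse_batch(path.read_text(encoding="utf-8"), path.name))
    return snapshots
```

If the assumed shape holds, `len(load_snapshots(Path("json")))` should equal 250.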

### Oracle Writeback Testing
Use `workflow_actions.csv` and `workflow_writebacks.csv` to test:

- Proposal generation
- Approval flow simulation
- Canonical mutation application
- Audit trail completeness
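
Because writebacks are described as approved canonical mutations, one useful audit-trail invariant is that every writeback traces back to an existing action that was approved. The sketch assumes all three workflow files share a hypothetical `action_id` column and that `workflow_approvals.csv` has a hypothetical `decision` column with the value `approved`; neither is documented in this README.

```python
def unapproved_writebacks(actions, approvals, writebacks, approved_value="approved"):
    """Return writebacks that do not trace to an existing, approved action.

    actions, approvals, and writebacks are lists of dicts (for example from
    csv.DictReader). The action_id and decision column names are assumptions.
    """
    action_ids = {row["action_id"] for row in actions}
    approved_ids = {
        row["action_id"]
        for row in approvals
        if row["decision"] == approved_value and row["action_id"] in action_ids
    }
    return [row for row in writebacks if row["action_id"] not in approved_ids]
```

An empty result is consistent with the funnel in the Workflow & Governance table, where 100 actions narrow to 49 approvals and 28 writebacks.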

### Transcript Processing
Load `json/transcript_sidecars.json` for:

- Speaker diarization validation
- Conversation context extraction
- Sentiment and intent inference testing
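
The sidecar schema is not documented here either. As a sketch of diarization validation, assuming each transcript is a list of segments with hypothetical `speaker` and `text` keys, the following counts turns and words per speaker and rejects segments with no speaker label. Word counts are used instead of durations because the transcripts are structured dialogue rather than ASR output and may not carry timings.

```python
from collections import Counter


def speaker_stats(segments):
    """Return {speaker: (turns, words)} for one transcript's segments.

    Raises ValueError if any segment is missing a speaker label.
    """
    turns = Counter()
    words = Counter()
    for seg in segments:
        speaker = (seg.get("speaker") or "").strip()
        if not speaker:
            raise ValueError(f"segment without a speaker label: {seg!r}")
        turns[speaker] += 1
        words[speaker] += len((seg.get("text") or "").split())
    return {speaker: (turns[speaker], words[speaker]) for speaker in turns}
```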

---

## Evidence Placeholders

The dataset includes metadata placeholders for:

- CCTV clip references (`clips/VIS_{visit_id}_{random}.mp4`)
- Call recording references (`rec/CAL_{call_id}.mp3`)
- Transcript references (`trx/CAL_{call_id}.json`)
- Camera IDs and gate references

These are structured metadata only. Actual media payloads are not included.
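
Since only the references exist, they can be validated against the templates without any media present. This sketch treats `{random}` as an opaque token and assumes `visit_id` values contain no underscores, since `VIS_{visit_id}_{random}` would otherwise be ambiguous. The helper names are hypothetical.

```python
import re

# Regexes mirroring the placeholder templates listed above.
CLIP_RE = re.compile(r"^clips/VIS_(?P<visit_id>[^_/]+)_(?P<random>[^/]+)\.mp4$")
RECORDING_RE = re.compile(r"^rec/CAL_(?P<call_id>[^/]+)\.mp3$")
TRANSCRIPT_RE = re.compile(r"^trx/CAL_(?P<call_id>[^/]+)\.json$")


def recording_path(call_id: str) -> str:
    """Build the call recording placeholder path for a call."""
    return f"rec/CAL_{call_id}.mp3"


def transcript_path(call_id: str) -> str:
    """Build the transcript placeholder path for a call."""
    return f"trx/CAL_{call_id}.json"


def clip_visit_id(path: str):
    """Return the visit_id embedded in a CCTV clip path, or None if malformed."""
    match = CLIP_RE.match(path)
    return match.group("visit_id") if match else None


def call_id_of(path: str):
    """Return the call_id from a recording or transcript path, or None."""
    match = RECORDING_RE.match(path) or TRANSCRIPT_RE.match(path)
    return match.group("call_id") if match else None
```

Extracted IDs can then be checked against `intel_visits.csv` and `intel_calls.csv` with the same set-membership approach used for referential integrity.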

---

## Synthetic Data Limitations

1. **Names and addresses** are fictional but culturally realistic
2. **Phone numbers** follow Indian format but are not real
3. **Email addresses** are synthetic and non-deliverable
4. **Prices** are representative of Kolkata premium market but approximate
5. **Communication text** is template-generated but contextually coherent
6. **Transcripts** are structured dialogue, not actual ASR output

---

## Acceptance Criteria Verification

| Criterion | Status |
|-----------|--------|
| 250 complete synthetic client graphs | ✅ |
| All 14 project names represented | ✅ |
| Spans CRM, interaction, opportunity, reminder, transcript, enrichment layers | ✅ |
| Files structured for CSV-first import testing | ✅ |
| Human reviewer can inspect a graph and believe it is coherent | ✅ (sample review recommended) |
| Referential integrity across all IDs | ✅ |
| No impossible date ordering | ✅ |
| No orphaned opportunities or interactions | ✅ |
| Every QD artifact points back to plausible evidence | ✅ |

---

## Next Steps

1. **Import Replay:** Load CSVs into the Velocity import pipeline and validate mapping proposals
2. **Client 360 Render:** Use JSON snapshots to test frontend dossier rendering
3. **QD Validation:** Verify score computation logic against interaction density
4. **Oracle Testing:** Use workflow items to test writeback proposal generation
5. **Synthetic Expansion:** Add more projects, cities, or persona types as needed

---

**Generated for:** Project Velocity Founder CRM and Platform Planning
**Canonical Source:** Doc 16 - Coding Agent Swarm Brief: Synthetic Client Graph Generation
**Reviewers:** Sayan, Sourik