Platform Reality and Communications Capture Strategy
Date: 2026-04-17
Status: Second-pass reality check and execution guardrail
Owner: Sagnik
Primary Reader: Sayan
Purpose: Keep the communication-memory requirement fully in scope while removing technically false assumptions about what iPhone, Android, telephony, and messaging platforms will actually allow.
1. Business Requirement
The requirement is valid and important:
- remember who called or messaged
- remember what they said
- remember what was promised
- remember when to follow up
- surface this back to the operator later as reminders and context
This requirement stays in scope.
What changes is the implementation assumption.
The product should not be specified as "the app can always directly record every call and read every message thread on the phone." That assumption is false on multiple platforms.
For this Sprint 1 planning pass, the active priority channels are:
- cellular calls
- WhatsApp messages
- WhatsApp voice calls
- optional later-stage email ingestion
SMS is not a Sprint 1 focus.
2. What Actually Stops Direct Capture
iPhone
iPhone does not expose a public general-purpose API that lets third-party apps read iMessage or standard SMS thread history. CallKit integrates calling apps with the system and lets an app observe call state where that is appropriate to the app, but it does not turn a third-party business assistant app into a universal recorder for carrier calls or a universal reader of message content.
Practical implication:
- do not design the iPhone edge app around unrestricted Phone app call audio capture
- do not design it around unrestricted iMessage or SMS history extraction
- do design it around supported provider integrations, explicit imports, notes, reminders, and app-owned or business-owned channels
- if cellular call recording is required on iPhone, treat provider-routed or explicit import workflows as the baseline, not silent in-app interception
Android
Android is less restrictive in some areas, but it is still not correct to assume universal call or message access. Deeper integration depends on whether the app is the default dialer, default SMS app, enterprise-managed, paired companion, or using an approved provider path. Even then, capture and storage behavior must be treated as channel-specific and permission-sensitive.
Practical implication:
- Android phone edge can go deeper than iPhone in some approved scenarios
- it still must be designed around explicit capability gates, not blanket assumptions
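One way to make "explicit capability gates" concrete is a deny-by-default lookup. The sketch below is illustrative only: the `Role` values mirror the conditions listed above, while `Capability`, `GATES`, and `allowed` are hypothetical names. The role-to-capability pairings are placeholders, not verified platform facts, and would need to be checked against actual Android policy before use.

```python
from enum import Enum, auto


class Role(Enum):
    # Conditions named in the text above; each must be verified per device.
    DEFAULT_DIALER = auto()
    DEFAULT_SMS_APP = auto()
    ENTERPRISE_MANAGED = auto()
    PAIRED_COMPANION = auto()
    APPROVED_PROVIDER = auto()


class Capability(Enum):
    # Hypothetical capability names, for illustration only.
    CALL_METADATA = auto()
    SMS_CONTENT = auto()
    CALL_RECORDING = auto()


# Placeholder mapping: which roles are ASSUMED to unlock which capability.
# These pairings are not verified platform facts.
GATES = {
    Capability.CALL_METADATA: {Role.DEFAULT_DIALER, Role.APPROVED_PROVIDER},
    Capability.SMS_CONTENT: {Role.DEFAULT_SMS_APP},
    Capability.CALL_RECORDING: {Role.APPROVED_PROVIDER},
}


def allowed(capability: Capability, granted_roles: set[Role]) -> bool:
    """Deny by default; allow only if a granted role unlocks the capability."""
    return bool(GATES.get(capability, set()) & granted_roles)
```

With a gate like this, a feature that has no documented pathway is simply off, rather than assumed to work.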
WhatsApp
WhatsApp business integration is real, but it must be treated as business-platform integration, not as a magical full-device backdoor. Messaging is a first-class supported path. Voice and video capabilities must be treated as provider- and rollout-dependent, with business API constraints checked explicitly before any product promise is made.
Practical implication:
- supported WhatsApp business messaging is in scope
- use a dedicated business messaging and calling surface in our edge apps and WebOS where WhatsApp Business Platform integration supports it
- do not promise universal WhatsApp call recording as the default implementation assumption
- if voice or video calling support is used, it must be routed through the approved business stack and documented as such
Cellular Calling
Cellular calling remains a valid business requirement, but its implementation path should be treated as channel engineering, not as a generic mobile feature toggle.
Practical implication:
- if office-assigned numbers can be moved behind a business telephony provider, that path is preferable because call recording, transcription, and metadata become server-governed
- if office-assigned numbers remain plain carrier numbers, iPhone and Android behavior must be handled separately and any recording path must be explicit and compliant
- an audible recording disclosure is acceptable from a product standpoint, but the recording architecture still needs to be supported by the chosen channel setup
Email, Facebook, Instagram, Business Telephony
These are more realistic channels for durable ingestion because they already support server-side, provider-side, or webhook-based integration patterns.
Practical implication:
- these channels should be treated as priority sources for communication memory
3. Correct Product Framing
The product feature is:
- communication memory
The product feature is not:
- universal hidden interception of all device communications
That means the app stack should be designed to answer:
- what happened with this lead
- what reminders should exist
- what transcript, message, or note supports that reminder
instead of assuming every communication path can be captured identically.
4. Required Three-Mode Strategy
All communication ingestion must fall into one of these modes.
Mode A: Direct Supported Ingestion
Use this where the platform or provider explicitly supports it.
Examples:
- business messaging webhooks
- approved provider call metadata
- app-owned VoIP or in-app communication flows
- approved default-handler scenarios on Android where the team intentionally accepts that role
Mode B: Provider-Routed Server-Side Ingestion
This is the most important path for durable enterprise behavior.
Examples:
- business telephony provider records call metadata and recordings server-side
- WhatsApp business messaging enters through provider or business API webhooks
- WhatsApp business calling events and recordings enter through approved business-calling infrastructure where available
- email is mirrored or ingested through mailbox integration
- Facebook and Instagram business messaging arrive through official business APIs
Mode C: Operator-Assisted Import and Confirmation
This is the fallback that keeps the business requirement alive when direct capture is blocked.
Examples:
- user uploads a recording
- user confirms a note after a call
- user marks a follow-up promise manually
- system converts note plus metadata into a communication memory fact
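The three modes can be sketched as a small enum plus a selection rule. This is a minimal illustration, not a specified design: `IngestionMode` and `choose_mode` are hypothetical names, and preferring Mode B over Mode A when both are available is an assumption drawn from the statement that Mode B is the most important path for durable behavior.

```python
from enum import Enum


class IngestionMode(Enum):
    DIRECT = "A"             # Mode A: direct supported ingestion
    PROVIDER_ROUTED = "B"    # Mode B: provider-routed server-side ingestion
    OPERATOR_ASSISTED = "C"  # Mode C: operator-assisted import and confirmation


def choose_mode(direct_supported: bool, provider_routed: bool) -> IngestionMode:
    """Pick a mode for a channel; Mode C is the fallback that always exists.

    Assumption: provider routing wins when both paths are available.
    """
    if provider_routed:
        return IngestionMode.PROVIDER_ROUTED
    if direct_supported:
        return IngestionMode.DIRECT
    return IngestionMode.OPERATOR_ASSISTED
```

The point of the fallback branch is that no channel ever resolves to "unsupported": when capture is blocked, the operator-assisted path still applies.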
5. Product Consequences for Sayan
iPhone Edge App
Treat the iPhone app as:
- control surface
- lead context viewer
- reminder surface
- business WhatsApp surface where supported
- note and import surface
- provider-routed communication memory viewer
Do not treat it as a guaranteed universal recorder or inbox scraper.
Android Phone Edge App
Treat the Android app as:
- the same baseline as iPhone
- plus optional deeper integration only when a specific supported pathway is chosen and documented
Backend
Backend must own:
- communication event persistence
- transcription job state
- speaker-separated transcript storage
- memory fact extraction
- reminder scheduling
- calendar action suggestions and confirmed calendar writes
- provider provenance
- import workflows
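As a rough sketch of what the backend might persist, the records below cover event persistence, transcription job state, speaker-separated transcripts, provider provenance, and extracted memory facts. All class names, field names, and state strings here are hypothetical, chosen only to illustrate the list above.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Optional


@dataclass
class TranscriptSegment:
    speaker: str  # speaker label, supporting speaker-separated storage
    text: str


@dataclass
class CommunicationEvent:
    event_id: str
    channel: str          # e.g. "whatsapp_message", "cellular_call"
    provider: str         # provider provenance: who delivered this event
    ingestion_mode: str   # "A", "B", or "C"
    occurred_at: datetime
    transcription_state: str = "not_requested"  # hypothetical job state
    transcript: list[TranscriptSegment] = field(default_factory=list)


@dataclass
class MemoryFact:
    event_id: str    # links the fact back to its supporting event
    statement: str   # e.g. what was promised
    follow_up_at: Optional[datetime] = None
    confirmed: bool = False


def due_follow_ups(facts: list[MemoryFact], now: datetime) -> list[MemoryFact]:
    """Return facts whose follow-up time has arrived, earliest first."""
    due = [f for f in facts if f.follow_up_at is not None and f.follow_up_at <= now]
    return sorted(due, key=lambda f: f.follow_up_at)
```

Keeping `event_id` on every `MemoryFact` is what lets a reminder point back to the transcript, message, or note that supports it.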
Oracle
Oracle must be able to reason over:
- communication events
- extracted memory facts
- follow-up dates
- reminder confidence and provenance
6. Channel Matrix Sayan Must Produce
Sayan should explicitly map, for each channel:
- can we directly ingest it
- do we need provider routing
- do we only support import or notes
- what consent or policy gate applies
- what reminder data can be extracted
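The five questions above map naturally onto one row per channel. The sketch below shows one possible shape for that matrix; `ChannelRow`, `QUESTIONS`, and `unresolved` are hypothetical names, and the example answers are placeholders ("tbd"), not findings.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ChannelRow:
    channel: str
    direct_ingest: str         # "yes", "no", or "tbd"
    provider_routing: str      # "yes", "no", or "tbd"
    import_or_notes_only: str  # "yes", "no", or "tbd"
    consent_gate: str          # free text: consent or policy gate that applies
    reminder_data: str         # free text: reminder data that can be extracted


# The three yes/no questions that must be answered before a row is complete.
QUESTIONS = ("direct_ingest", "provider_routing", "import_or_notes_only")


def unresolved(rows: list[ChannelRow]) -> list[str]:
    """List channels that still have at least one 'tbd' answer."""
    return [r.channel for r in rows if any(getattr(r, q) == "tbd" for q in QUESTIONS)]


# Placeholder rows: every answer is deliberately left open.
matrix = [
    ChannelRow("WhatsApp business messages", "tbd", "tbd", "tbd", "tbd", "tbd"),
    ChannelRow("WhatsApp voice calls", "tbd", "tbd", "tbd", "tbd", "tbd"),
]
print(unresolved(matrix))
```

A check like `unresolved` gives a simple completion criterion: the matrix is done when it returns an empty list.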
Minimum channels:
- PSTN or business telephony calls
- WhatsApp business messages
- WhatsApp voice calls
- optional email integration
- calendar events created from communication-derived follow-ups
- CRM and QD score side effects from confirmed insights
7. Calendar and Insight Consequences
The communications stack is not complete if it only stores recordings and transcripts. It must also drive:
- user-exclusive calendar events
- follow-up reminders
- CRM updates
- QD score updates
- operator-facing insight summaries
NemoClaw should operate on top of stored recordings, transcripts, and extracted memory facts. Only confirmed actions should write into critical systems such as calendar, CRM, and QD score, unless product explicitly allows autopilot behavior later.
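The confirmed-actions rule can be expressed as a single write guard. This is a minimal sketch under stated assumptions: `ProposedAction`, `CRITICAL_SYSTEMS`, and `may_write` are hypothetical names, the `autopilot_enabled` flag stands in for a future product decision, and allowing unconfirmed writes to non-critical targets is an assumption, not something this document specifies.

```python
from dataclasses import dataclass

# Critical systems named in the text; the identifiers are illustrative.
CRITICAL_SYSTEMS = {"calendar", "crm", "qd_score"}


@dataclass
class ProposedAction:
    target: str        # system the action would write to
    description: str
    confirmed: bool = False


def may_write(action: ProposedAction, autopilot_enabled: bool = False) -> bool:
    """Critical systems require confirmation unless autopilot is explicitly on."""
    if action.target not in CRITICAL_SYSTEMS:
        return True  # assumption: non-critical targets need no confirmation
    return action.confirmed or autopilot_enabled
```

Defaulting `autopilot_enabled` to `False` keeps the guard aligned with the current rule, so enabling autopilot later is an explicit change rather than an accident.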
8. Bottom Line
Nothing about platform restrictions removes the business need.
What they remove is lazy product wording.
The correct plan is:
- keep communication memory as a critical feature
- model channel reality explicitly
- route what can be routed through supported providers
- capture what can be captured directly
- import and confirm the rest
That is the version of this feature that can actually ship.