From each mailbox we build an in-memory conversation graph. What is stored today:
Node types
- Participant: One per email address (sender or recipient). Attributes: message count, sent/received, ham/spam/URL counts, first/last seen, internal vs. external (by domain).
- Domain: One per domain (e.g. company.com). Attributes: message count, ham/spam/URL counts, participant count, first/last seen, external flag. Used for hub and spam-ratio analysis.
- Email: One per message. Attributes: message ID, subject, from, to addresses, received time, conversation ID (to link messages in the same thread), sent-folder flag. Threat types detected for that email (e.g. auth_failure, sxl_high_risk_url) are stored on the node so algorithms and the UI can use per-message threat context.
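As a rough illustration of the node attributes listed above, the participant and email nodes could be modeled along these lines. This is a minimal sketch, not the actual implementation: the class names, field names, and the `is_internal` helper are hypothetical, and the domain node and the ham/spam/URL counters are left out for brevity.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List, Optional, Set


def domain_of(address: str) -> str:
    """Return the lower-cased domain part of an email address."""
    return address.rsplit("@", 1)[-1].lower()


def is_internal(address: str, internal_domains: Set[str]) -> bool:
    """Assumption: 'internal' means the address domain is in a known set."""
    return domain_of(address) in internal_domains


@dataclass
class ParticipantNode:  # hypothetical name
    address: str
    internal: bool
    message_count: int = 0
    sent: int = 0
    received: int = 0
    first_seen: Optional[datetime] = None
    last_seen: Optional[datetime] = None

    def observe(self, when: datetime, sent: bool) -> None:
        """Update the counters and the first/last seen window for one message."""
        self.message_count += 1
        if sent:
            self.sent += 1
        else:
            self.received += 1
        if self.first_seen is None or when < self.first_seen:
            self.first_seen = when
        if self.last_seen is None or when > self.last_seen:
            self.last_seen = when


@dataclass
class EmailNode:  # hypothetical name
    message_id: str
    subject: str
    sender: str
    to: List[str]
    received: datetime
    conversation_id: str
    sent_folder: bool = False
    # Filled in after the scan, e.g. {"auth_failure", "sxl_high_risk_url"}.
    threat_types: Set[str] = field(default_factory=set)
```

Keeping `threat_types` as a set on the email node means an algorithm walking the graph can check per-message threat context without a separate lookup.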
Edge types
- Directed send: Participant A → Participant B. One edge per (sender, recipient) pair with at least one message. Attributes: message count, ham/spam/URL counts, first/last time, relationship score (used for cold outreach and anomaly detection).
- From: Participant (sender) → Email. Links each email to its sender for pathfinding and threat visibility.
- To: Email → Participant. Links each email to each To recipient.
- CC: Email → Participant. Links each email to each CC recipient.
- Reply: Email → Email. Links a message to its parent when In-Reply-To is present (same conversation thread). Enables reply-chain and propagation algorithms.
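The edge types above can be sketched as a single pass over the parsed messages. This is an illustrative sketch under stated assumptions: the message dict shape (`id`, `from`, `to`, `cc`, `in_reply_to`) and the `build_edges` name are hypothetical, CC recipients are assumed to count toward the directed send edge, and a reply edge is only created when the parent message is part of the same scan.

```python
from collections import defaultdict


def build_edges(messages):
    """Build edges from a list of message dicts (hypothetical shape).

    Returns (send_counts, typed_edges), where send_counts maps
    (sender, recipient) to a message count and typed_edges is a list
    of (kind, source, target) tuples.
    """
    send_counts = defaultdict(int)
    typed_edges = []
    known_ids = {m["id"] for m in messages}

    for m in messages:
        sender, mid = m["from"], m["id"]
        to = m.get("to", [])
        cc = m.get("cc", [])

        typed_edges.append(("from", sender, mid))      # Participant -> Email
        for recipient in to:
            typed_edges.append(("to", mid, recipient))  # Email -> Participant
        for recipient in cc:
            typed_edges.append(("cc", mid, recipient))  # Email -> Participant

        # One directed send edge per (sender, recipient) pair; a recipient
        # listed in both To and CC is counted once per message.
        for recipient in set(to) | set(cc):
            send_counts[(sender, recipient)] += 1

        # Reply edge: message -> parent, only if the parent was seen.
        parent = m.get("in_reply_to")
        if parent in known_ids:
            typed_edges.append(("reply", mid, parent))

    return dict(send_counts), typed_edges
```

The per-pair count is the simplest of the send-edge attributes; the ham/spam/URL counts, first/last time, and relationship score would be accumulated in the same loop.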
Conversations (threads) and messages-by-ID are also kept. The graph is not persisted; it is built per scan. After the scan, threat types are attached to each email node. The graph view in Investigate still renders the participant-level graph; the full node and edge set is available for algorithms.
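The post-scan step and the participant-level view described above might look roughly like this. It is a sketch only: the function names, the `(message_id, threat_type)` finding pairs, and the dict-based node and edge shapes are all assumptions made for illustration.

```python
def attach_threats(emails_by_id, findings):
    """Attach threat types to email nodes after a scan.

    emails_by_id: dict mapping message ID to a node attribute dict.
    findings: iterable of (message_id, threat_type) pairs (hypothetical shape).
    """
    for message_id, threat_type in findings:
        node = emails_by_id.get(message_id)
        if node is not None:  # ignore findings for messages not in this scan
            node.setdefault("threat_types", set()).add(threat_type)


def participant_view(nodes, edges):
    """Project the full graph down to participants and directed send edges.

    nodes: dict mapping node ID to {"type": ...}.
    edges: list of (kind, source, target) tuples.
    """
    people = {nid for nid, attrs in nodes.items() if attrs["type"] == "participant"}
    sends = [
        (src, dst)
        for kind, src, dst in edges
        if kind == "send" and src in people and dst in people
    ]
    return people, sends
```

Because the graph is rebuilt per scan, both functions operate on in-memory structures only and nothing needs to be written back to storage.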