Threat Signal Verification and Risk Indexing
This note shows how threat signals, external feeds, AI verification, and post-audit indexing improve evidence quality, supporting platform growth through cleaner, searchable risk records.
Written date: 07/20/2025 17:18:41
Engineering Notes
Introduction
This section outlines the Threat Analysis pipeline used to detect malicious activity and fraud from community reports, external feeds, and automated validation. It turns raw signals into verified cases through ingestion, AI/OCR review, post-audit processing, and indexed lookup, which helps reduce noisy reports and supports safer risk monitoring.
Data Ingestion (Community Signals)
Community Signals are ingested from user reports and external feeds, then converted into structured cases for verification and lookup across phone numbers, bank accounts, websites, social profiles, emails, URLs, and evidence. The diagram below shows the main verification and indexing flow for those signals.
All examples in this note use synthetic or masked indicator references. They are included only to explain the validation workflow and do not represent real personal, financial, or account data.
Figure: How threat signals move through verification and indexing.
Indicators: sample_phone_ref, sample_bank_ref, sample_social_ref, sample_domain_ref, sample_email_ref, URLs.
Evidence: screenshots, chat logs, payment proof, media files, report descriptions.
Normalization: Vietnamese phone numbers are converted to the canonical +84 form, aliases are mapped, and duplicate identifiers are checked.
Storage: new cases are stored in SQL with a Processing status before they are exposed to search.
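$phone_ref = normalize_indicator_ref($phone_ref);
// $pdo is an assumed PDO connection; the value is bound rather than interpolated into the SQL string.
$stmt = $pdo->prepare("INSERT INTO scam_check SET indicator_ref = ?, status = 'Processing'");
$stmt->execute([$phone_ref]);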
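The +84 normalization step can be illustrated with a minimal Python sketch. The function name `normalize_vn_phone` and the prefix rules are assumptions made for illustration; the note does not show what `normalize_indicator_ref` actually does, and the numbers used are synthetic.

```python
import re


def normalize_vn_phone(raw: str) -> str:
    """Return a phone number in +84 form (illustrative rules, not the production ones)."""
    digits = re.sub(r"\D", "", raw)  # drop spaces, dots, dashes, and the leading "+"
    # Strip one known prefix: international dialing, country code, or national trunk 0.
    for prefix in ("0084", "84", "0"):
        if digits.startswith(prefix):
            digits = digits[len(prefix):]
            break
    if not digits:
        raise ValueError(f"no usable digits in {raw!r}")
    return "+84" + digits


# Different spellings of the same synthetic number collapse to one canonical key.
variants = ["0900 000 001", "+84 900-000-001", "0084900000001"]
canonical = {normalize_vn_phone(v) for v in variants}
```

Collapsing the variants to a single key is what makes the later duplicate check meaningful. A real implementation would also need length checks and a rule for national numbers that happen to begin with 84.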
External feeds use a crawler pipeline. Python loads report lists, downloads detail pages, stores raw HTML, then parses identifiers, phone numbers, accounts, bank names, descriptions, images, and update time into JSON. PHP reads the JSON, checks duplicates, creates SQL case records, maps image paths, and queues the case for AI verification.
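As a rough sketch of the parsing stage, the Python below turns one stored detail page into a JSON record. The page markup, the `parse_detail_page` helper, the record field names, and the phone pattern are all hypothetical; real feeds will need their own selectors, and the sample page is synthetic.

```python
import json
import re
from html.parser import HTMLParser


class DetailPageParser(HTMLParser):
    """Collect visible text and image sources from a stored detail page."""

    def __init__(self):
        super().__init__()
        self.text_parts = []
        self.images = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            src = dict(attrs).get("src")
            if src:
                self.images.append(src)

    def handle_data(self, data):
        if data.strip():
            self.text_parts.append(data.strip())


def parse_detail_page(report_id: str, raw_html: str) -> dict:
    """Build a JSON-ready record from raw HTML (field names are assumptions)."""
    parser = DetailPageParser()
    parser.feed(raw_html)
    text = " ".join(parser.text_parts)
    # Assumed pattern: "0" or "+84" followed by exactly nine digits.
    phones = re.findall(r"(?<!\d)(?:\+84|0)\d{9}(?!\d)", text)
    return {
        "identifier": report_id,
        "phones": phones,
        "description": text,
        "images": parser.images,
    }


sample_html = (
    "<div><p>Reported number 0900000001 asked for a transfer.</p>"
    '<img src="evidence/sample_1.png"></div>'
)
record = parse_detail_page("sample_report_ref", sample_html)
payload = json.dumps(record, ensure_ascii=False)  # what the PHP side would read
```

Keeping the raw HTML on disk and parsing it in a separate step means the parser can be rerun when a feed changes its layout, without downloading the pages again.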
Community Signals / External Feeds
-> Normalize phone, bank, social, website, email
-> Check duplicate identifier
-> Save SQL case + evidence
-> Queue for AI verification
The full flow connects user reports, crawler feeds, normalization, SQL case storage, queue handling, and AI verification.
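The flow above can be sketched end to end in Python. This version substitutes an in-memory SQLite database and a `deque` for the real SQL store and queue, and the `ingest` helper and the lowercase/trim normalization are placeholders. Only the `scam_check` table, the `indicator_ref` column, and the `Processing` status come from the note.

```python
import sqlite3
from collections import deque


def open_store() -> sqlite3.Connection:
    """Create an in-memory stand-in for the SQL case store."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE scam_check ("
        " id INTEGER PRIMARY KEY,"
        " indicator_ref TEXT UNIQUE NOT NULL,"
        " status TEXT NOT NULL)"
    )
    return conn


def ingest(conn: sqlite3.Connection, queue: deque, indicator_ref: str):
    """Normalize, check for a duplicate, save as Processing, then queue for AI review.

    Returns the new case id, or None when the indicator is already stored.
    """
    ref = indicator_ref.strip().lower()  # placeholder for the real normalization
    existing = conn.execute(
        "SELECT id FROM scam_check WHERE indicator_ref = ?", (ref,)
    ).fetchone()
    if existing is not None:
        return None
    cur = conn.execute(
        "INSERT INTO scam_check (indicator_ref, status) VALUES (?, 'Processing')",
        (ref,),
    )
    conn.commit()
    queue.append(cur.lastrowid)  # the case waits here for AI verification
    return cur.lastrowid


conn = open_store()
ai_queue = deque()
first = ingest(conn, ai_queue, " Sample_Phone_Ref ")
second = ingest(conn, ai_queue, "sample_phone_ref")  # same indicator after normalization
```

Because the duplicate check runs on the normalized value, the second call returns `None` and nothing new is queued.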
AI Verification (Semantic + OCR Evidence)
AI verification filters noisy, duplicated, or unsafe reports before public exposure. Reports pass input validation, rule checks, spam limits, semantic review, OCR/evidence checks, and queue-based approval.
Input quality layer:
PHP receives the report as JSON and validates its required fields. A rule engine blocks malformed or repeated reports, and daily limits reduce spam. Base64-encoded evidence images are saved as physical files and linked to the case. The AI step then checks names, bank references, descriptions, and evidence text.
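The input quality checks can be sketched as follows. The note describes this layer as PHP; Python is used here only for illustration, and the required field names, the daily limit of 5, and the evidence file naming are assumptions rather than the platform's real values.

```python
import base64
import binascii
from collections import Counter
from pathlib import Path

REQUIRED_FIELDS = ("indicator_ref", "description", "reporter_id")  # hypothetical names
DAILY_LIMIT = 5  # hypothetical per-reporter limit


def validate_report(report: dict, daily_counts: Counter, today: str) -> list:
    """Return a list of rejection reasons; an empty list means the report passes."""
    errors = [f"missing field: {name}" for name in REQUIRED_FIELDS if not report.get(name)]
    if errors:
        return errors
    key = (report["reporter_id"], today)
    if daily_counts[key] >= DAILY_LIMIT:
        return ["daily report limit reached"]
    daily_counts[key] += 1  # count only accepted reports
    return []


def save_evidence(image_b64: str, evidence_dir: Path, case_id: int, index: int) -> Path:
    """Decode a Base64 image, write it to disk, and return the path to link to the case."""
    try:
        data = base64.b64decode(image_b64, validate=True)
    except (binascii.Error, ValueError) as exc:
        raise ValueError("evidence is not valid Base64") from exc
    evidence_dir.mkdir(parents=True, exist_ok=True)
    path = evidence_dir / f"case_{case_id}_{index}.bin"
    path.write_bytes(data)
    return path
```

A caller would run `validate_report` first and call `save_evidence` only for accepted reports, so rejected submissions never write files to disk.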