System Monitor: Threat Detection Logic

Introduction

This section outlines the threat-analysis pipeline used to detect malicious activity and fraud from community reports, external feeds, and automated validation. It turns raw signals into verified cases through ingestion, AI/OCR review, post-audit processing, and indexed lookup, reducing noisy reports and supporting safer risk monitoring.


Data Ingestion (Community Signals)

Community signals are ingested from user reports and external feeds, then converted into structured cases for verification and lookup across phone numbers, bank accounts, websites, social profiles, emails, URLs, and evidence. The flow below shows the main verification and indexing path for those signals.

Threat Signal Process
How threat signals move through verification and indexing

Indicators: phone numbers, bank accounts, bank names, websites, social handles, emails, URLs.
Evidence: screenshots, chat logs, payment proof, media files, report descriptions.
Normalization: Vietnamese phone numbers are canonicalized to +84, aliases are mapped, and duplicate identifiers are checked.
Storage: new cases enter SQL with Processing status before search exposure.

// Normalize local 0-prefixed numbers to canonical +84
$phone_number = $phone_number[0] === '0' ? '+84' . substr($phone_number, 1) : $phone_number;
$stmt = $pdo->prepare("INSERT INTO scam_check SET phone_number = ?, status = 'Processing'");
$stmt->execute([$phone_number]); // prepared statement avoids SQL injection

External feeds use a crawler pipeline. Python loads report lists, downloads detail pages, stores raw HTML, then parses identifiers, phone numbers, accounts, bank names, descriptions, images, and update time into JSON. PHP reads the JSON, checks duplicates, creates SQL case records, maps image paths, and queues the case for AI verification.
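The parse step of the crawler can be sketched in Python. The `<span class="...">` field markers, the `parse_report_html` helper, and the sample page below are illustrative assumptions; the real feed's markup is not shown in this document.

```python
import json
import re

def parse_report_html(raw_html: str) -> dict:
    """Parse one crawled detail page into the JSON record the PHP side ingests.
    Field markers here are illustrative, not the real feed's markup."""
    def grab(field):
        m = re.search(rf'<span class="{field}">(.*?)</span>', raw_html)
        return m.group(1).strip() if m else ""

    return {
        "phone_number": grab("phone"),
        "bank_account": grab("account"),
        "bank_name": grab("bank"),
        "description": grab("description"),
        "updated_at": grab("updated"),
    }

# Hypothetical detail page for demonstration
sample = ('<span class="phone">0912345678</span>'
          '<span class="account">0123456789</span>'
          '<span class="bank">ABC</span>'
          '<span class="description">Fake shop</span>'
          '<span class="updated">2024-01-01</span>')
print(json.dumps(parse_report_html(sample)))
```

Storing raw HTML first, then parsing into JSON, lets the parser be re-run when extraction rules change without re-crawling the source.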

Community Signals / External Feeds
  -> Normalize phone, bank, social, website, email
  -> Check duplicate identifier
  -> Save SQL case + evidence
  -> Queue for AI verification



AI Verification (Semantic + OCR Evidence)

AI verification filters noisy, duplicated, or unsafe reports before public exposure. Reports pass through input validation, rule checks, spam limits, semantic review, OCR/evidence checks, and queue-based approval.

Input quality layer:

PHP receives JSON and validates required fields. Rule engine blocks malformed or repeated reports. Daily limits reduce spam. Base64 evidence images are saved as physical files and linked to the case. AI checks names, bank references, descriptions, and evidence text.
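The base64-to-file step for evidence images can be sketched as follows. The `save_evidence_image` helper, its naming scheme, and the `.jpg` extension are assumptions for illustration, not the production code.

```python
import base64
import hashlib
import os
import tempfile

def save_evidence_image(case_id: int, b64_payload: str, root: str) -> str:
    """Decode a base64 evidence image, store it as a physical file,
    and return the path to be linked to the case record."""
    data = base64.b64decode(b64_payload, validate=True)  # reject malformed input
    digest = hashlib.sha256(data).hexdigest()[:12]       # content hash aids dedup
    path = os.path.join(root, f"case_{case_id}_{digest}.jpg")
    with open(path, "wb") as fh:
        fh.write(data)
    return path

root = tempfile.mkdtemp()
payload = base64.b64encode(b"\xff\xd8fake-jpeg").decode()
p = save_evidence_image(42, payload, root)
print(os.path.basename(p))
```

Hashing the decoded bytes into the filename also gives the rule engine a cheap way to spot repeated evidence across reports.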

Valid reports enter the orders table with category_code = scams_check and Processing status. A batch worker locks records as In progress, gathers text and images, sends them to the AI service, then updates the case to Completed or Canceled.

{
  "signal_type": "bank_account",
  "indicator": "0123456789",
  "evidence": {
    "ocr_text": "BANK: ABC | ACCOUNT: 0123456789",
    "semantic_score": 0.92
  },
  "review": {
    "status": "validated",
    "category": "payment_fraud"
  }
}

Only validated cases move from Processing to Completed, keeping unverified reports out of public results.
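The worker's status gate can be sketched with in-memory stand-ins. The `ai_review` stub and the `CASES` list are assumptions for illustration; the production system calls the real AI service and a SQL store.

```python
# Minimal sketch of the batch worker's status gate: lock Processing rows
# as "In progress", run the (stubbed) AI review, then finalize each case.
CASES = [
    {"id": 1, "status": "Processing", "text": "transfer to account 0123456789"},
    {"id": 2, "status": "Processing", "text": ""},
]

def ai_review(text: str) -> bool:
    # Stub: the real check combines semantic review and OCR evidence
    return bool(text.strip())

def run_batch(cases):
    for case in cases:
        if case["status"] != "Processing":
            continue                     # gate: never re-handle finished work
        case["status"] = "In progress"   # lock before the slow AI call
        case["status"] = "Completed" if ai_review(case["text"]) else "Canceled"

run_batch(CASES)
print([c["status"] for c in CASES])  # ['Completed', 'Canceled']
```

Locking to In progress before the AI call keeps concurrent workers from claiming the same record, and the Processing check makes the batch safe to re-run.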


Real-time Post-Audit

Post-audit keeps reports clean, searchable, and reusable. Small batch workers use status gates so completed steps are not repeated and failed steps can retry.

process_scam_url: link extraction.
process_json_url: JSON normalization.
process_ai_url: AI-based indicator mapping.
Invalid or low-quality descriptions fall back to scam_url = []. Each worker updates its own process flag.

If scam_url is empty but the description is valid, AI extracts links and stores normalized JSON in scam_description_1.scam_url.

SELECT * FROM scam_process WHERE process_scam_url='0' LIMIT 4; -- claim a small unprocessed batch
$text_vip = get_link_ai($description_scam); // AI extracts links from the description
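The fallback extraction can be sketched in Python. The real pipeline delegates to get_link_ai(); the regex, the `extract_scam_urls` helper, and the minimum-length rule below are stand-ins chosen for this sketch.

```python
import re

URL_RE = re.compile(r'https?://[^\s<>"\']+')

def extract_scam_urls(description: str) -> list:
    """Fallback link extraction when scam_url is empty: pull URLs out of
    the free-text description, or return [] for low-quality input."""
    if not description or len(description.strip()) < 10:
        return []  # invalid/low-quality description -> scam_url = []
    return URL_RE.findall(description)

print(extract_scam_urls("Fake shop at https://scam.example.com selling phones"))
print(extract_scam_urls("spam"))
```

Returning an explicit empty list rather than failing lets the worker still set its process flag, so the batch never re-claims the same bad description.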

Parsed indicators are mapped into lookup indexes:

Website/domain → MongoDB website index.
Email → normalized email index.
Facebook/YouTube/Telegram → resolved social ID.
MongoDB stores id_scam_check for traceability. MySQL keeps full case details, descriptions, account data, status, and audit history.

Signals
  -> Extract links
  -> Normalize JSON indicators
  -> Map website / email / social
  -> Index in MongoDB
  -> Keep full case in MySQL
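The mapping step above can be sketched with an in-memory stand-in for MongoDB. The `INDEXES` dict, collection names, and `index_indicator` helper are illustrative assumptions; only the id_scam_check traceability rule comes from the design.

```python
from urllib.parse import urlparse

# In-memory stand-in for the MongoDB lookup collections
INDEXES = {"website": [], "email": [], "social": []}

def index_indicator(kind: str, value: str, id_scam_check: int) -> None:
    """Route a parsed indicator to its lookup index, normalized and
    tagged with id_scam_check for traceability back to the SQL case."""
    if kind == "website":
        value = urlparse(value).netloc or value  # reduce URL to bare domain
    elif kind == "email":
        value = value.strip().lower()            # canonical email form
    INDEXES[kind].append({"value": value, "id_scam_check": 42 if False else id_scam_check})

index_indicator("website", "https://scam.example.com/page", 42)
index_indicator("email", " Fraud@Example.com ", 42)
print(INDEXES["website"][0]["value"], INDEXES["email"][0]["value"])
```

Keeping only normalized indicators plus id_scam_check in the fast index, while MySQL holds the full record, is what lets lookup stay small without losing audit detail.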

MongoDB keeps lookup fast, while SQL stores the full record for review, masking, and audit.


Threat Analysis Outcomes

Validated reports are promoted into a Threat Intelligence index for real-time lookup. A PHP rule engine classifies input before selecting the lookup path.

Website/email: normalize the domain or email, then find the linked id_scam_check in MongoDB.
Social: resolve Facebook, YouTube, or Telegram handles into a stable social ID.
Phone/account: normalize the phone number and match SQL fields such as phone, account, bank, name, and URL.
Response: return only Completed cases and mask sensitive fields before public output.

$found = find_website_entry($mongo_results, $domain);        // website/email path
$found = find_scam_social_entry($mongo_results, $id_social); // social path
$account_name = substr($account_name, 0, 5) . '***';         // mask before output
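The classification and masking rules can be sketched in Python. `mask_account_name` mirrors the substr($account_name, 0, 5) . '***' rule above; `classify_lookup` is a toy stand-in for the PHP rule engine, whose real rules are richer than these string checks.

```python
def mask_account_name(name: str) -> str:
    """Mask sensitive fields before public output: keep the first
    five characters, replace the rest with '***'."""
    return name[:5] + "***"

def classify_lookup(query: str) -> str:
    """Toy rule engine choosing a lookup path for an input string."""
    if "@" in query:
        return "email"
    if query.startswith(("http://", "https://")) or "." in query:
        return "website"
    if query.lstrip("+").isdigit():
        return "phone_or_account"
    return "social"

print(mask_account_name("NGUYEN VAN A"))   # NGUYE***
print(classify_lookup("scam.example.com")) # website
print(classify_lookup("+84912345678"))     # phone_or_account
```

Classifying before querying keeps each lookup on one index (MongoDB for web/social, SQL for phone/account) instead of scanning every store per request.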

Validated signals support real-time lookup for suspicious phone numbers, accounts, domains, emails, and social profiles. The system keeps traceable case history through id_scam_check, reduces false positives through AI review, and exposes only masked, verified data for stable platform operation.

Reader Value

Readers can apply this pattern to organize community signals, external feeds, and OCR evidence into a cleaner validation flow for their own projects. In practice, it reduces noisy submissions, speeds up review, and improves lookup reliability in scalable web systems.

Conclusion

This threat detection design combines structured ingestion, AI-assisted verification, post-audit control, and indexed lookup into one consistent monitoring layer. It strengthens system integration and supports stable operation across connected platform components.
