Entity Extraction
Extract structured entities (prices, emails, phones, PII) from text for policy evaluation.
The veto-sdk/extractors module provides deterministic, regex-based entity extraction from arbitrary text. It detects prices, emails, phone numbers, salary figures, equity percentages, government IDs, credit cards, and API keys.
This is the same extraction engine used by the Veto browser extension to populate arguments.extracted_entities in browser agent contexts.
Installation
import { extractEntities } from 'veto-sdk/extractors';
// or
import { extractEntities } from 'veto-sdk';
extractEntities(text, options?)
function extractEntities(
  text: string,
  options?: ExtractEntitiesOptions,
): ExtractedEntities
Returns an ExtractedEntities object. Returns empty defaults if text is shorter than 3 characters. Text is capped at textCap characters before processing (default 200,000).
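The two input guards described here (minimum length and the textCap limit) can be pictured with a minimal sketch. The helper name `prepareText` and the `null` return convention are hypothetical and not part of veto-sdk; only the thresholds come from the text above.

```typescript
// Hypothetical helper, not part of veto-sdk. It only illustrates the two
// documented guards: the 3-character minimum and the textCap truncation.
const DEFAULT_TEXT_CAP = 200000;
const MIN_TEXT_LENGTH = 3;

function prepareText(text: string, textCap: number = DEFAULT_TEXT_CAP): string | null {
  // Shorter than 3 characters: a caller would return empty defaults instead.
  if (text.length < MIN_TEXT_LENGTH) return null;
  // Only the first textCap characters are scanned; the rest is ignored.
  return text.slice(0, textCap);
}
```

In this sketch a `null` result stands in for "skip extraction and return empty defaults".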
ExtractedEntities
| Field | Type | Description |
|---|---|---|
| prices | number[] | Prices found in the text. Values above $1M are excluded. |
| max_price | number | Highest price in prices, or 0 if none. |
| min_price | number | Lowest price in prices, or 0 if none. |
| emails | string[] | Deduplicated emails, lowercased. |
| phone_numbers | string[] | Deduplicated phone numbers that pass length heuristics. |
| salary_figures | number[] | Salary/compensation amounts. Range: $1,001–$9,999,999. |
| has_salary_figures | boolean | true if at least one salary figure was found. |
| equity_percentages | number[] | Equity percentages in the range 0–100. |
| has_equity_info | boolean | true if at least one equity percentage was found. |
| sensitive_terms | string[] | Labels for which sensitive entities were detected: salary, equity, gov_id, credit_card, api_key, email, phone. |
| has_sensitive_pii | boolean | true if sensitive_terms is non-empty. |
| has_credit_cards | boolean | true if a Luhn-valid 16-digit card number was found. |
| has_gov_ids | boolean | true if a government ID pattern matched. |
| has_api_keys | boolean | true if an API key pattern matched. |
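As a reading aid, the field table can be transcribed into a TypeScript interface. The interface below is reconstructed from the table, not copied from the SDK's type definitions. The helpers `emptyEntities` and `summarizePrices` are hypothetical, and the exact "empty defaults" (empty arrays, 0, false) are an assumption.

```typescript
// Shape reconstructed from the field table; the SDK's own typings may differ.
interface ExtractedEntities {
  prices: number[];
  max_price: number;
  min_price: number;
  emails: string[];
  phone_numbers: string[];
  salary_figures: number[];
  has_salary_figures: boolean;
  equity_percentages: number[];
  has_equity_info: boolean;
  sensitive_terms: string[];
  has_sensitive_pii: boolean;
  has_credit_cards: boolean;
  has_gov_ids: boolean;
  has_api_keys: boolean;
}

// Hypothetical helper: one plausible reading of "empty defaults".
function emptyEntities(): ExtractedEntities {
  return {
    prices: [], max_price: 0, min_price: 0,
    emails: [], phone_numbers: [],
    salary_figures: [], has_salary_figures: false,
    equity_percentages: [], has_equity_info: false,
    sensitive_terms: [], has_sensitive_pii: false,
    has_credit_cards: false, has_gov_ids: false, has_api_keys: false,
  };
}

// Hypothetical helper: max_price and min_price fall back to 0 when no prices exist.
function summarizePrices(prices: number[]): { max_price: number; min_price: number } {
  if (prices.length === 0) return { max_price: 0, min_price: 0 };
  return { max_price: Math.max(...prices), min_price: Math.min(...prices) };
}
```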
ExtractEntitiesOptions
| Option | Type | Default | Description |
|---|---|---|---|
| maxPrices | number | 100 | Maximum number of prices to collect. |
| maxEmails | number | 50 | Maximum number of emails to collect. |
| maxPhones | number | 50 | Maximum number of phone numbers to collect. |
| maxSalaryFigures | number | 50 | Maximum number of salary figures to collect. |
| maxEquityPercentages | number | 50 | Maximum number of equity percentages to collect. |
| textCap | number | 200000 | Characters to process. Text beyond this limit is ignored. |
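The defaults in the table suggest a merge of caller options over fixed defaults. A minimal sketch of that idea follows; the interface is transcribed from the table with every field assumed optional, and `DEFAULT_OPTIONS` and `resolveOptions` are hypothetical names rather than SDK exports.

```typescript
// Transcribed from the options table; optionality of each field is assumed.
interface ExtractEntitiesOptions {
  maxPrices?: number;
  maxEmails?: number;
  maxPhones?: number;
  maxSalaryFigures?: number;
  maxEquityPercentages?: number;
  textCap?: number;
}

// Default values as listed in the table.
const DEFAULT_OPTIONS: Required<ExtractEntitiesOptions> = {
  maxPrices: 100,
  maxEmails: 50,
  maxPhones: 50,
  maxSalaryFigures: 50,
  maxEquityPercentages: 50,
  textCap: 200000,
};

// Hypothetical helper: caller-supplied values override the defaults.
function resolveOptions(
  options: ExtractEntitiesOptions = {},
): Required<ExtractEntitiesOptions> {
  return { ...DEFAULT_OPTIONS, ...options };
}
```

With this sketch, `resolveOptions({ maxEmails: 5 })` keeps every other limit at its documented default.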
Supported entity types
Prices
Multi-currency: USD ($), EUR (€), GBP (£), JPY (¥), INR (₹), KRW (₩), CHF, AUD, CAD, CNY. Currency code prefixes (USD 1,200) are also matched. Values at or above $1,000,000 are excluded.
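To make the matching rules concrete, here is a simplified sketch of symbol-prefixed and code-prefixed price matching with the upper bound applied. The regex is an illustration written for this page, not the SDK's actual pattern, and `collectPrices` and `PRICE_CEILING` are hypothetical names.

```typescript
// Simplified illustration, NOT the SDK's regex: a currency symbol or a
// currency-code prefix, followed by an amount with optional thousands separators.
const PRICE_CEILING = 1000000; // values at or above this are dropped

function collectPrices(text: string, maxPrices: number = 100): number[] {
  const pattern =
    /(?:[$€£¥₹₩]|\b(?:USD|EUR|GBP|JPY|INR|KRW|CHF|AUD|CAD|CNY))\s?(\d{1,3}(?:,\d{3})+(?:\.\d+)?|\d+(?:\.\d+)?)/g;
  const prices: number[] = [];
  let match: RegExpExecArray | null;
  // Stop once maxPrices values have been collected.
  while (prices.length < maxPrices && (match = pattern.exec(text)) !== null) {
    // Strip thousands separators before converting to a number.
    const value = parseFloat(match[1].replace(/,/g, ""));
    if (value < PRICE_CEILING) prices.push(value);
  }
  return prices;
}
```

The sketch returns raw numeric amounts and performs no currency conversion.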
$49.99
EUR 1,200
¥3000
Emails
RFC-style pattern with length limits: 64-character local part, 255-character domain. Deduplicated case-insensitively.
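The length limits and case-insensitive deduplication can be sketched as follows. The pattern is a rough approximation chosen for this example (it bounds the local part at 64 characters and the domain at roughly 255), and `collectEmails` is a hypothetical name, not the SDK's implementation.

```typescript
// Approximate pattern for illustration only; the SDK's regex may differ.
function collectEmails(text: string, maxEmails: number = 50): string[] {
  const pattern = /[A-Za-z0-9._%+-]{1,64}@[A-Za-z0-9.-]{1,255}\.[A-Za-z]{2,}/g;
  const emails: string[] = [];
  let match: RegExpExecArray | null;
  while (emails.length < maxEmails && (match = pattern.exec(text)) !== null) {
    // Lowercasing first makes the duplicate check case-insensitive.
    const email = match[0].toLowerCase();
    if (emails.indexOf(email) === -1) emails.push(email);
  }
  return emails;
}
```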
user@example.com
hr+payroll@company.co.uk
Phone numbers
International format (+country code) and domestic (10+ digits). Short numeric sequences are filtered out: international numbers require 8+ digits, domestic numbers require 10+.
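The digit-count thresholds can be expressed as a small predicate. This is a sketch of the documented heuristic only; the function name `passesLengthHeuristic` is hypothetical, and treating a leading `+` as the marker for international format is an assumption.

```typescript
// Hypothetical predicate illustrating the documented thresholds:
// international candidates need 8+ digits, domestic candidates need 10+.
function passesLengthHeuristic(candidate: string): boolean {
  const digits = candidate.replace(/\D/g, ""); // keep digits only
  const isInternational = candidate.trim().charAt(0) === "+";
  return isInternational ? digits.length >= 8 : digits.length >= 10;
}
```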
+1 415 555 0100
(800) 867-5309
Salary figures
Keyword-anchored: salary, compensation, comp, pay, wage, income, base, total comp, OTE, CTC. Supports K suffix ($150K). Required range: $1,001–$9,999,999.
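The K suffix and the range check can be illustrated in isolation. The sketch below covers only amount parsing, not keyword anchoring; `parseSalaryAmount` is a hypothetical helper, and treating both range bounds as inclusive is an assumption.

```typescript
const SALARY_MIN = 1001;
const SALARY_MAX = 9999999;

// Hypothetical helper: parse one amount such as "$120,000" or "$150K" and
// return null when it falls outside the documented salary range.
function parseSalaryAmount(raw: string): number | null {
  const match = /(\d[\d,]*(?:\.\d+)?)\s*(k)?/i.exec(raw);
  if (match === null) return null;
  const base = parseFloat(match[1].replace(/,/g, ""));
  // A trailing K (either case) multiplies the amount by 1,000.
  const value = match[2] ? base * 1000 : base;
  return value >= SALARY_MIN && value <= SALARY_MAX ? value : null;
}
```

Under these assumptions `$150K` parses to 150000, while `$500` is rejected as below the range.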
Base salary: $120,000
OTE $200K/yr
compensation: EUR 85,000
Equity percentages
Keyword-anchored after the percentage: equity, vesting, options, ownership, stake, shares, stock, RSUs, ESOP. Range: 0–100%.
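Keyword anchoring after the percentage can be sketched with a single pattern. The regex is an illustration built from the keyword list above, not the SDK's own, and `collectEquityPercentages` is a hypothetical name. The range is treated as inclusive, which is an assumption.

```typescript
// Illustration only: a number, a percent sign, then one of the listed keywords.
function collectEquityPercentages(text: string, maxCount: number = 50): number[] {
  const pattern =
    /(\d+(?:\.\d+)?)\s*%\s*(?:equity|vesting|options|ownership|stake|shares|stock|RSUs|ESOP)\b/gi;
  const results: number[] = [];
  let match: RegExpExecArray | null;
  while (results.length < maxCount && (match = pattern.exec(text)) !== null) {
    const value = parseFloat(match[1]);
    // Percentages outside 0 to 100 are discarded.
    if (value >= 0 && value <= 100) results.push(value);
  }
  return results;
}
```

A percentage with no equity keyword after it, such as "20% discount", is ignored by this sketch.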
0.5% equity
2% vesting
15% ESOP
Government IDs
Three patterns:
- US SSN: XXX-XX-XXXX
- UK NIN: XX XX XX XX X
- US EIN: XX-XXXXXXX
Detection is boolean — matched IDs are not stored in the return value.
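The three shapes can be approximated with regular expressions. This sketch assumes X stands for a digit in the SSN and EIN shapes, and that the NIN shape is two letters, six digits, and one letter (the usual UK National Insurance number layout). The patterns and the name `containsGovId` are illustrative, not the SDK's.

```typescript
// Illustrative patterns only; the SDK's actual expressions may be stricter.
const GOV_ID_PATTERNS: RegExp[] = [
  /\b\d{3}-\d{2}-\d{4}\b/,                          // US SSN: XXX-XX-XXXX
  /\b[A-Z]{2}\s?\d{2}\s?\d{2}\s?\d{2}\s?[A-Z]\b/i,  // UK NIN: XX XX XX XX X
  /\b\d{2}-\d{7}\b/,                                // US EIN: XX-XXXXXXX
];

// Returns only a boolean, mirroring the documented behavior that
// matched IDs are not stored.
function containsGovId(text: string): boolean {
  for (const pattern of GOV_ID_PATTERNS) {
    if (pattern.test(text)) return true;
  }
  return false;
}
```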
Credit cards
16-digit patterns (XXXX-XXXX-XXXX-XXXX or space-separated). Luhn checksum validation reduces false positives. Detection is boolean.
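The Luhn check mentioned above is a standard checksum, so it can be shown directly. The 16-digit matching pattern and the names `passesLuhn` and `containsCreditCard` are assumptions made for this sketch, not the SDK's internals.

```typescript
// Standard Luhn checksum: double every second digit from the right,
// subtract 9 from results above 9, and require the total to be divisible by 10.
function passesLuhn(digits: string): boolean {
  let sum = 0;
  let doubleNext = false;
  for (let i = digits.length - 1; i >= 0; i--) {
    let d = digits.charCodeAt(i) - 48; // '0' has char code 48
    if (doubleNext) {
      d *= 2;
      if (d > 9) d -= 9;
    }
    sum += d;
    doubleNext = !doubleNext;
  }
  return sum % 10 === 0;
}

// Hypothetical detector: four groups of four digits, optionally separated
// by hyphens or spaces, that also pass the Luhn check.
function containsCreditCard(text: string): boolean {
  const pattern = /\b\d{4}[- ]?\d{4}[- ]?\d{4}[- ]?\d{4}\b/g;
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(text)) !== null) {
    if (passesLuhn(match[0].replace(/[- ]/g, ""))) return true;
  }
  return false;
}
```

The checksum step is what lets a detector skip arbitrary 16-digit strings such as order numbers, most of which fail it.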
API keys
Tokens starting with sk, pk, api, key, token, secret, or bearer, followed by 20+ alphanumeric characters. Case-insensitive. Detection is boolean.
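A rough sketch of the prefix rule follows. It assumes a separator (hyphen, underscore, whitespace, colon, or equals sign) sits between the prefix and the token body, which the description does not state explicitly. The pattern and the name `containsApiKey` are illustrative only.

```typescript
// Illustration only: a listed prefix that does not continue a longer word,
// an assumed separator, then 20 or more alphanumeric characters.
const API_KEY_PATTERN =
  /(?:^|[^A-Za-z0-9])(?:sk|pk|api|key|token|secret|bearer)[-_\s:=]+[A-Za-z0-9]{20,}/i;

// Returns only a boolean; the matched token is not retained.
function containsApiKey(text: string): boolean {
  return API_KEY_PATTERN.test(text);
}
```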
sk-abc123...
Bearer eyJhbGc...
api_key_ABCD...
Usage with rules
Extracted entities can feed directly into evaluateRulesLocally via the arguments.extracted_entities field, which is the same path the browser extension populates.
import { extractEntities } from 'veto-sdk/extractors';
import { evaluateRulesLocally } from 'veto-sdk';
const entities = extractEntities(pageText);
const result = evaluateRulesLocally(rules, 'browser_click', {
  arguments: { extracted_entities: entities }
});
Example rule that blocks actions when salary data is present:
- id: block-salary-exfil
  name: Block actions on pages with salary data
  enabled: true
  severity: high
  action: block
  tools: [browser_click, form_submit]
  conditions:
    - field: arguments.extracted_entities.has_salary_figures
      operator: equals
      value: true
Related
- Browser Agents — using Veto in Chrome extensions
- Output Validation — validating agent outputs before they reach users
- Economic Authorization — price-based policy enforcement