Agent Mask | PII Detection & Anonymization Native to Snowflake

The Hidden Cost

The Data Tax

You’re paying to store data you can’t safely use.

A customer shares their credit card number in a live chat to process a refund.
A nurse types a patient’s name into a support ticket to report a software bug.
An account holder reads their SSN to a rep to verify their identity.
An employee signs their name in an "anonymous" survey to ensure a follow-up.

Now there’s a credit card number in your chat logs. A patient identity in your help desk. A Social Security number in your call transcripts. A personal identity in your "anonymous" dataset.

That data is now trapped. You can’t safely feed it to your LLMs. You can’t run analytics on it. You can’t use it for training or research without creating massive risk.

It’s either a compliance liability or wasted insights—pick one.

That data has value—if you could just separate what’s sensitive from what’s useful.

Compliance Liability

Unmasked PII = audit findings, fines, breach risk

Locked Analytics

Data teams can’t use what they can’t safely access

Blocked AI/ML

Models can’t train on data riddled with PII

Data Sprawl

PII copies spreading across dev, test, and staging

The Solution

Reclaim the Value

Scan text fields for hidden PII and mask it automatically—all inside Snowflake.

Comprehensive Detection

From obvious SSNs to names hidden in messy transcripts, Agent Mask catches what simpler tools can’t. One solution for healthcare, finance, government, and enterprise data.

Native to Snowflake

Your data never leaves your environment. No transfers, no third-party exposure, no additional infrastructure to manage.

Production Ready

GPU-optimized batch processing handles enterprise workloads. Detect and anonymize millions of records without breaking a sweat.

Multi-Language

Detect PII in English, Spanish, German, French, and more. Global data, single solution.

Flexible Output

Mask, redact, or replace. Configure anonymization per entity type and use case.

Structured Mapping

Know exactly what was found and where. Get entity positions and types, not just redacted text.
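
To make the idea of entity positions concrete, here is a minimal sketch of what a structured mapping could look like and how a downstream job might use it. The `EntitySpan` type and `redact` helper are hypothetical names chosen for illustration, not Agent Mask's actual output schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EntitySpan:
    """Hypothetical record for one detected entity."""
    start: int  # inclusive character offset
    end: int    # exclusive character offset
    label: str  # entity type, e.g. "PERSON"


def redact(text: str, spans: list[EntitySpan]) -> str:
    """Replace each span with its label, right to left so earlier offsets stay valid."""
    out = text
    for span in sorted(spans, key=lambda s: s.start, reverse=True):
        out = out[:span.start] + f"[{span.label}]" + out[span.end:]
    return out


text = "Call Linda Chen at 415-555-0189."
spans = [EntitySpan(5, 15, "PERSON"), EntitySpan(19, 31, "PHONE")]
redacted = redact(text, spans)
print(redacted)  # Call [PERSON] at [PHONE].
```

Because the positions travel with the text, the same spans can drive redaction, counting by entity type, or review of what was found, without re-running detection.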

Solutions for

HIPAA-Compliant Analytics

Clinical notes, medical transcripts, discharge summaries—healthcare data is unstructured and sensitive. Agent Mask detects PHI in freeform text with the precision HIPAA demands.

De-identify clinical text for research and analytics
Share with research partners without expanding BAA scope
Train ML models on real patient language, not PII

Outcome

Share data with internal teams and external partners for research, analytics, and care coordination—without compromising patient privacy. Train AI on real clinical notes and transcripts, safely.

Discharge Summary

Margaret Chen [PERSON], 67F, discharged 03/14/2024 [DATE] following cardiac catheterization. Attending: Dr. Robert Okonkwo [PERSON] (NPI: 1528496372 [NPI]). Pt to follow up with cardiology in 2 weeks. Daughter Linda Chen [PERSON] (415-555-0189 [PHONE]) designated emergency contact. Insurance: Blue Cross ID 7294851036 [INSURANCE_ID].

PCI-DSS and Privacy Compliance

Loan applications, transaction notes, customer communications—financial data lives in documents and conversations. Agent Mask finds sensitive data wherever it hides.

Clean transaction data for fraud analytics
Enable BI teams to query without compliance risk
Provision safe datasets for dev and QA environments

Outcome

Run fraud models on transaction notes that were previously off-limits. Enable data-driven decisions while maintaining the regulatory compliance your business depends on.

Advisor Call Notes

Client David Ramirez [PERSON] from Meridian Capital Partners [ORG] called re: wire. IBAN: DE89370400440532013000 [IBAN]. Verified via SSN 412-68-6789 [SSN] and DOB 11/03/1978 [DOB]. Card 4532-7891-2345-4421 [CREDIT_CARD]. Callback: 832-555-0147 [PHONE].

Safe Data Sharing at Scale

Employee feedback, user research, customer surveys—valuable data locked behind privacy concerns. Agent Mask makes it safe to share across teams.

De-identify employee surveys for workforce analytics
Clean user research before sharing with product
Prepare customer feedback for company-wide insights

Outcome

Turn restricted data into company-wide assets. Analyze employee feedback without exposing who said what.

Employee Survey Response

Honestly, my manager Kevin Walsh [PERSON] has been great but the workload since March [DATE] is unsustainable. I've talked to Priya [PERSON] and James [PERSON] on my team and they feel the same. I'm starting to look elsewhere. You can reach me at t.morrison@company.com [EMAIL] if HR wants to discuss.

FOIA and Public Records Compliance

Court filings, body cam transcripts, investigative reports—government records require redaction before release. Agent Mask automates what used to take hours of manual review.

Accelerate FOIA response turnaround
Enable public records search without exposure
Prepare documents for inter-agency sharing

Outcome

Prepare public records without manual review of every document. Meet disclosure deadlines without compromising privacy.

Constituent Complaint

My name is Barbara Hendricks [PERSON] and I live at 2847 Oak Street, Apt 4B [ADDRESS]. I'm writing about the situation at Riverside Elementary [ORG]. Please contact me at bhendricks@gmail.com [EMAIL] or 555-294-8831 [PHONE]. My case number is GOV-2024-08472 [CASE_ID].

Specifications

Enterprise-Grade Specifications

Technical Capabilities

Supported Entities

SSN, credit cards, emails, phones, names, addresses, medical IDs, financial IDs, and more

Languages

Major world languages — European, Asian, Middle Eastern

Detection

Context-aware AI with high accuracy on unstructured text

Processing

GPU-optimized batch processing for high-volume workloads

Deployment

Snowflake Native App—no external infrastructure

Data Residency

All processing within your Snowflake account

Anonymization

Masking, redaction, replacement, tokenization

Personal Identifiers

SSN, Name, Email, Phone, Address, DOB, Driver's License, Passport

Financial

Credit Card, Bank Account, IBAN, Tax ID, Account Numbers

Healthcare

NPI, MBI, DEA, MRN, Dates, PHI

Digital & Location

IP Address, URL, Username, Device ID, ZIP Code, GPS

Other

VIN, License Plate, National ID

HIPAA

HIPAA Ready

Healthcare data protection. BAA support and PHI detection.

GDPR

GDPR Compliant

EU data types, right to erasure, data minimization.

CCPA

CCPA Ready

California consumer data protection and disclosure.

PCI

PCI-DSS Aligned

Credit card detection and masking for payments.

SOC2

SOC 2 Ready

Built with SOC 2 controls for enterprise security.

Zero Trust Architecture

Agent Mask operates on a zero-trust model. We never see your data, never store your data, never have access to your data. The application runs in your Snowflake environment with the permissions you grant—nothing more.

The Trade-Offs End Here

What You Give Up With Every Alternative

Cloud APIs move your data off-platform. LLMs hallucinate. Regex misses context. Open source requires an engineering team. Pick your poison, or don't.

Cloud APIs

AWS Comprehend · Google Cloud DLP

Your data leaves Snowflake, crosses the network, and lands on AWS or Google servers for processing. Per-character pricing means costs scale unpredictably. And Comprehend only supports 2 languages.

Their Limitations

Data egress to third-party servers
English/Spanish only (Comprehend)
$1-3/GB pricing explodes at scale
No pseudonymization—redact or label only
No checksum validation (false positives on random numbers)

Agent Mask Advantage

Zero data egress—runs inside Snowflake
14 languages with dedicated models
Predictable infrastructure costs
Consistent pseudonymization across columns
Luhn/checksum validation reduces false positives
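
As an illustration of the last point, the sketch below shows the standard Luhn check that payment card numbers satisfy. It is a generic version of the algorithm, not Agent Mask's code; the function name `luhn_valid` and the 12-digit minimum length are assumptions made for this example.

```python
def luhn_valid(number: str) -> bool:
    """Return True if the digits in `number` pass the Luhn checksum."""
    digits = [int(c) for c in number if c.isdigit()]
    if len(digits) < 12:  # assumed lower bound; too short to be a card number
        return False
    total = 0
    # Walk from the rightmost digit, doubling every second one.
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9  # same as summing the two digits of the product
        total += d
    return total % 10 == 0


print(luhn_valid("4111 1111 1111 1111"))  # True  (well-known test card number)
print(luhn_valid("4111 1111 1111 1112"))  # False (last digit changed)
```

A 16-digit string that merely looks like a card number fails this check about nine times out of ten, which is why pairing detection with a checksum cuts false positives on order IDs and other random numbers.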

LLM APIs

GPT-4 · Claude · Gemini

Run the same prompt twice, get different results. LLMs hallucinate PII that isn't there and miss PII that is. Your compliance team will love explaining that to auditors.

Their Limitations

Data sent to OpenAI/Anthropic/Google servers
Non-deterministic—different results each run
Hallucinates entities that don't exist
Returns prose, not structured positions
Per-token costs at $15-60/M tokens
No audit trail or reproducibility

Agent Mask Advantage

Data never leaves Snowflake
Deterministic—same input, same output, every time
ML detection + checksum validation combined
Returns exact character positions for each entity
Flat infrastructure costs, no token fees
Full audit trail for compliance

Snowflake AI_REDACT

Cortex Built-in Function

Snowflake's own docs call it "best-effort" redaction that "requires manual review." English-only, 4K token limit, no entity positions returned. Good for demos, not production.

Their Limitations

"Best-effort"—Snowflake says manual review required
English only (degrades on other languages)
4K token input limit, 1K output limit
US/UK/CA entities only—no EU, APAC, LATAM
No medical identifiers (NPI, MBI, DEA)
Returns redacted text only, no entity positions
"John Doe" in column A ≠ "John Doe" in column B

Agent Mask Advantage

14 languages with dedicated AI models
No token limits—process any text length
Returns exact character positions for every entity
40+ entity types including EU, APAC, LATAM PII
Full healthcare support: NPI, MBI, DEA with validation
Cross-column consistency: same person = same ID everywhere
AI models + checksum validation combined
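
One common way to get cross-column consistency is keyed hashing: the same normalized value always maps to the same pseudonym, and the mapping cannot be recomputed without the key. The sketch below assumes that approach for illustration; it is not a description of Agent Mask's internals, and the `pseudonym` function, the normalization rule, and the key handling are all hypothetical.

```python
import hashlib
import hmac


def pseudonym(value: str, label: str, key: bytes) -> str:
    """Map an entity value to a stable ID such as PERSON_1a2b3c4d5e6f."""
    # Assumed normalization: collapse whitespace and ignore case.
    normalized = " ".join(value.split()).casefold()
    message = f"{label}:{normalized}".encode("utf-8")
    digest = hmac.new(key, message, hashlib.sha256).hexdigest()
    return f"{label}_{digest[:12]}"


key = b"example-key-keep-the-real-one-secret"  # placeholder for illustration only
id_in_column_a = pseudonym("John Doe", "PERSON", key)
id_in_column_b = pseudonym("  john   doe ", "PERSON", key)
print(id_in_column_a == id_in_column_b)  # True
```

With stable IDs, joins and group-bys on the pseudonymized columns still line up, so analysts can follow one person across tables without seeing the name.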

Regex & Rule-Based

In-house Keyword Lists · Custom Scripts

"Smith & Wesson" flagged as a person. "123-45-6789" caught, but "my social is one two three..." missed entirely. You'll spend more time tuning patterns than doing actual work.

Their Limitations

No context: "Smith & Wesson" = person name
Misses spelled-out numbers and variations
Can't detect names without a dictionary
New PII format? Write another pattern
False positives on valid-format random numbers
Engineering time sink that never ends

Agent Mask Advantage

AI models understand semantic context
Checksum validation catches false positives
Knows "John Smith" ≠ "Smith & Wesson"
Catches spelled-out and formatted variations
Luhn validation rejects random number sequences
Zero pattern maintenance—models handle edge cases
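
The spelled-out SSN case is easy to reproduce. The sketch below uses a typical hand-written SSN pattern and a hypothetical helper, `spoken_digit_runs`, to show the kind of extra rule a regex-only approach needs for each new variation. Both are illustrations written for this page, not part of any product.

```python
import re

# A typical hand-written rule: hyphenated SSN format only.
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")

DIGIT_WORDS = {
    "zero": "0", "one": "1", "two": "2", "three": "3", "four": "4",
    "five": "5", "six": "6", "seven": "7", "eight": "8", "nine": "9",
}


def spoken_digit_runs(text: str) -> list[str]:
    """Collect runs of consecutive spelled-out digits as digit strings."""
    runs: list[str] = []
    current: list[str] = []
    for token in re.findall(r"[a-z]+", text.lower()):
        if token in DIGIT_WORDS:
            current.append(DIGIT_WORDS[token])
            continue
        if current:
            runs.append("".join(current))
            current = []
    if current:
        runs.append("".join(current))
    return runs


written = "SSN on file: 123-45-6789"
spoken = "my social is one two three four five six seven eight nine"

written_hit = SSN_PATTERN.search(written) is not None
spoken_hit = SSN_PATTERN.search(spoken) is not None
spoken_runs = spoken_digit_runs(spoken)

print(written_hit)  # True
print(spoken_hit)   # False: the pattern never sees the spelled-out number
print(spoken_runs)  # ['123456789']
```

Even this extra helper only covers one variation. Mixed forms such as "one twenty-three" or digits separated by filler words each need another rule, which is the maintenance burden described above.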

Microsoft Presidio

Open Source Framework

Great foundation—we built on it. But raw Presidio has known false positive issues at production confidence scores, no checksum validation, and no Snowflake integration. You'll need a team to make it work.

Their Limitations

No checksum/format validation layer
Known false positives at 0.7+ confidence scores
No cross-column entity consistency
You build and maintain the infrastructure
No native Snowflake integration
GPU optimization? That's on you too

Agent Mask Advantage

Context-aware ML + format validation combined
ML + validation layers with conflict resolution
Cross-column pseudonymization built in
Runs in Snowflake Container Services
GPU-optimized batch processing included
Production-ready in 30 minutes, not 30 days

Data Privacy Vaults

Skyflow · Protegrity

Enterprise vaults that tokenize and encrypt structured data fields. Great for credit card storage—but they can't scan free-form text for PII you don't know is there. $100K-$200K+/year for a different problem.

Their Limitations

Structured data only—no unstructured text NLP
Can't discover unknown PII in free-form fields
Requires data migration to external vault
$100K-$200K+ annual contracts
Overkill if you just need text anonymization
No AI-based entity recognition

Agent Mask Advantage

Purpose-built for unstructured text at scale
Detects PII in any text you send it
No data migration—runs inside Snowflake
Fraction of enterprise vault pricing
Right-sized for text anonymization
AI models + checksum validation

AI-Native DLP

Nightfall · Protecto

Different tools for different jobs. Nightfall watches SaaS apps for data leakage in real-time. Protecto sanitizes prompts before they hit LLMs. Neither is designed to redact text columns in your data warehouse.

Their Limitations

Nightfall: real-time SaaS monitoring, not batch redaction
Protecto: LLM prompt sanitization, not warehouse-scale
No native Snowflake integration
Data flows through their cloud infrastructure
Per-API-call pricing at warehouse scale
No cross-column pseudonymization

Agent Mask Advantage

Purpose-built for Snowflake text redaction
Runs inside your Snowflake environment
Batch processing for warehouse-scale workloads
Predictable infrastructure costs
Cross-column consistency: same entity = same ID
14 languages with dedicated models

FAQ

Frequently Asked Questions

Most PII tools require you to send data to their service via API. Agent Mask runs natively inside your Snowflake account—your data never leaves. This eliminates data transfer risk, simplifies compliance, and keeps your sensitive information exactly where it belongs.

We detect dozens of entity types including SSNs, credit cards, emails, phone numbers, names, addresses, and industry-specific identifiers like NPI, MBI, and DEA numbers. Our context-aware detection catches sensitive data that simpler pattern-matching tools miss.

Most customers are up and running in under 30 minutes. Install from the Snowflake Marketplace, grant access to your schemas, and start detecting. No infrastructure to deploy, no agents to install.

Yes. We've built specialized detection for healthcare data including unstructured clinical text, medical transcripts, and health records. Our approach is based on proven de-identification methodology designed specifically for medical language.

Agent Mask supports major world languages across European, Asian, and Middle Eastern regions. Contact us for the current list of supported languages.

Agent Mask is designed to support HIPAA compliance. Because your data never leaves your Snowflake account, there’s no PHI transfer to external services. We’re happy to sign a BAA and discuss how Agent Mask fits into your compliance program.

You can mask (partial redaction), fully redact, replace with synthetic data, or tokenize detected PII. Anonymization strategies are configurable per entity type, so you can mask names while redacting SSNs, for example.
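
As a rough sketch of what per-entity configuration means in practice, the example below masks names and redacts SSNs in one pass. The strategy functions, the `STRATEGIES` mapping, and the `(start, end, label)` tuple format are hypothetical names for illustration and do not reflect Agent Mask's actual configuration syntax.

```python
from typing import Callable

Strategy = Callable[[str, str], str]  # (value, label) -> replacement text


def mask(value: str, label: str) -> str:
    """Partial redaction: keep the first character of each word."""
    return " ".join(w[0] + "*" * (len(w) - 1) for w in value.split())


def redact(value: str, label: str) -> str:
    """Full redaction: replace the value with its entity type."""
    return f"[{label}]"


# Hypothetical per-entity configuration: mask names, redact SSNs.
STRATEGIES: dict[str, Strategy] = {"PERSON": mask, "SSN": redact}


def anonymize(text: str, entities: list[tuple[int, int, str]],
              strategies: dict[str, Strategy], default: Strategy = redact) -> str:
    """Apply the configured strategy to each (start, end, label) span."""
    out = text
    # Work right to left so earlier offsets stay valid after each replacement.
    for start, end, label in sorted(entities, reverse=True):
        fn = strategies.get(label, default)
        out = out[:start] + fn(out[start:end], label) + out[end:]
    return out


def span(text: str, value: str, label: str) -> tuple[int, int, str]:
    """Helper for the example: locate `value` in `text` and build a span."""
    start = text.index(value)
    return (start, start + len(value), label)


text = "Verified David Ramirez via SSN 412-68-6789."
entities = [span(text, "David Ramirez", "PERSON"), span(text, "412-68-6789", "SSN")]
result = anonymize(text, entities, STRATEGIES)
print(result)  # Verified D**** R****** via SSN [SSN].
```

Entity types without an explicit entry fall back to full redaction in this sketch, which is the safer default when a new type shows up.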

Yes. We offer guided proof-of-concept engagements where we help you evaluate Agent Mask on your actual data. Contact us to schedule a technical assessment.

No system is perfect, but context-aware processing significantly outperforms regex. Test on a sample first so you can see exactly what gets masked before you run on production data.

Configure sensitivity per entity type. Exclude specific patterns. Test on samples and adjust until output matches what you need.

Nothing. The function returns new anonymized output—your source columns stay exactly as they were.

Stop Hoarding DATA

How Much Sensitive Data Is Hiding in Your Snowflake Tables Right Now?

Install

Point

Get Clean Data
