Stop Hoarding Data

PII hides everywhere—support tickets, clinical notes, chat logs. You don't see it until there's a breach. Pass it through one function. Get clean data back.

The Data Tax

You’re paying to store data you can’t safely use.

  • A customer shares their credit card number in a live chat to process a refund.
  • A nurse types a patient’s name into a support ticket to report a software bug.
  • An account holder reads their SSN to a rep to verify their identity.
  • An employee signs their name in an "anonymous" survey to ensure a follow-up.

Now there’s a credit card number in your chat logs. A patient identity in your help desk. A Social Security number in your call transcripts. A personal identity in your "anonymous" dataset.

That data is now trapped. You can’t safely feed it to your LLMs. You can’t run analytics on it. You can’t use it for training or research without creating massive risk.

It’s either a compliance liability or wasted insights—pick one.

That data has value—if you could just separate what’s sensitive from what’s useful.

Compliance Liability

Unmasked PII = audit findings, fines, breach risk

Locked Analytics

Data teams can’t use what they can’t safely access

Blocked AI/ML

Models can’t train on data riddled with PII

Data Sprawl

PII copies spreading across dev, test, and staging

Reclaim the Value

Scan text fields for hidden PII and mask it automatically—all inside Snowflake.

Comprehensive Detection

From obvious SSNs to names hidden in messy transcripts, Agent Mask catches what simpler tools can’t. One solution for healthcare, finance, government, and enterprise data.

Native to Snowflake

Your data never leaves your environment. No transfers, no third-party exposure, no additional infrastructure to manage.

Production Ready

GPU-optimized batch processing handles enterprise workloads. Detect and anonymize millions of records without breaking a sweat.

Multi-Language

Detect PII in English, Spanish, German, French, and more. Global data, single solution.

Flexible Output

Mask, redact, or replace. Configure anonymization per entity type and use case.

Structured Mapping

Know exactly what was found and where. Get entity positions and types, not just redacted text.
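To make the idea of structured mapping concrete, here is a minimal sketch of what entity positions plus per-type anonymization could look like. The `Entity` class, the `anonymize` helper, and the strategy functions are hypothetical names chosen for illustration; they are not Agent Mask's actual API or output schema. The sketch assumes detected spans do not overlap.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Entity:
    """A detected span (hypothetical shape, for illustration only)."""
    start: int  # inclusive character offset
    end: int    # exclusive character offset
    type: str   # e.g. "PERSON", "PHONE"


def redact(value: str, label: str) -> str:
    """Replace the whole value with its entity label."""
    return f"[{label}]"


def mask_keep_last4(value: str, label: str) -> str:
    """Hide everything except the last four characters."""
    return "*" * max(len(value) - 4, 0) + value[-4:]


def anonymize(
    text: str,
    entities: List[Entity],
    strategies: Dict[str, Callable[[str, str], str]],
) -> str:
    """Rewrite each span with the strategy configured for its type.

    Types without a configured strategy fall back to full redaction.
    Assumes spans do not overlap.
    """
    pieces = []
    cursor = 0
    for ent in sorted(entities, key=lambda e: e.start):
        pieces.append(text[cursor:ent.start])
        strategy = strategies.get(ent.type, redact)
        pieces.append(strategy(text[ent.start:ent.end], ent.type))
        cursor = ent.end
    pieces.append(text[cursor:])
    return "".join(pieces)


text = "Daughter Linda Chen (415-555-0189) designated emergency contact."
entities = [Entity(9, 19, "PERSON"), Entity(21, 33, "PHONE")]
strategies = {"PHONE": mask_keep_last4}  # PERSON falls back to redact

print(anonymize(text, entities, strategies))
# Daughter [PERSON] (********0189) designated emergency contact.
```

Because the positions come back alongside the text, the same detections can drive different outputs for different audiences without re-scanning the source.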

HIPAA-Compliant Analytics

Clinical notes, medical transcripts, discharge summaries—healthcare data is unstructured and sensitive. Agent Mask detects PHI in freeform text with the precision HIPAA demands.

  • De-identify clinical text for research and analytics
  • Share with research partners without expanding BAA scope
  • Train ML models on real patient language, not PII
Outcome

Share data with internal teams and external partners for research, analytics, and care coordination—without compromising patient privacy. Train AI on real clinical notes and transcripts, safely.

Discharge Summary
Margaret Chen [PERSON], 67F, discharged 03/14/2024 [DATE] following cardiac catheterization. Attending: Dr. Robert Okonkwo [PERSON] (NPI: 1528496372 [NPI]). Pt to follow up with cardiology in 2 weeks. Daughter Linda Chen [PERSON] (415-555-0189 [PHONE]) designated emergency contact. Insurance: Blue Cross ID 7294851036 [INSURANCE_ID].

PCI-DSS and Privacy Compliance

Loan applications, transaction notes, customer communications—financial data lives in documents and conversations. Agent Mask finds sensitive data wherever it hides.

  • Clean transaction data for fraud analytics
  • Enable BI teams to query without compliance risk
  • Provision safe datasets for dev and QA environments
Outcome

Run fraud models on transaction notes that were previously off-limits. Enable data-driven decisions while maintaining the regulatory compliance your business depends on.

Advisor Call Notes
Client David Ramirez [PERSON] from Meridian Capital Partners [ORG] called re: wire. IBAN: DE89370400440532013000 [IBAN]. Verified via SSN 412-68-6789 [SSN] and DOB 11/03/1978 [DOB]. Card 4532-7891-2345-4421 [CREDIT_CARD]. Callback: 832-555-0147 [PHONE].

Safe Data Sharing at Scale

Employee feedback, user research, customer surveys—valuable data locked behind privacy concerns. Agent Mask makes it safe to share across teams.

  • De-identify employee surveys for workforce analytics
  • Clean user research before sharing with product
  • Prepare customer feedback for company-wide insights
Outcome

Turn restricted data into company-wide assets. Analyze employee feedback without exposing who said what.

Employee Survey Response
Honestly, my manager Kevin Walsh [PERSON] has been great but the workload since March [DATE] is unsustainable. I've talked to Priya [PERSON] and James [PERSON] on my team and they feel the same. I'm starting to look elsewhere. You can reach me at t.morrison@company.com [EMAIL] if HR wants to discuss.

FOIA and Public Records Compliance

Court filings, body cam transcripts, investigative reports—government records require redaction before release. Agent Mask automates what used to take hours of manual review.

  • Accelerate FOIA response turnaround
  • Enable public records search without exposure
  • Prepare documents for inter-agency sharing
Outcome

Prepare public records without manual review of every document. Meet disclosure deadlines without compromising privacy.

Constituent Complaint
My name is Barbara Hendricks [PERSON] and I live at 2847 Oak Street, Apt 4B [ADDRESS]. I'm writing about the situation at Riverside Elementary [ORG]. Please contact me at bhendricks@gmail.com [EMAIL] or 555-294-8831 [PHONE]. My case number is GOV-2024-08472 [CASE_ID].

Enterprise-Grade Specifications

Technical Capabilities
Supported Entities: SSN, credit cards, emails, phones, names, addresses, medical IDs, financial IDs, and more
Languages: Major world languages (European, Asian, Middle Eastern)
Detection: Context-aware AI with high accuracy on unstructured text
Processing: GPU-optimized batch processing for high-volume workloads
Deployment: Snowflake Native App, no external infrastructure
Data Residency: All processing within your Snowflake account
Anonymization: Masking, redaction, replacement, tokenization

Personal Identifiers

SSN, Name, Email, Phone, Address, DOB, Driver's License, Passport

Financial

Credit Card, Bank Account, IBAN, Tax ID, Account Numbers

Healthcare

NPI, MBI, DEA, MRN, Dates, PHI

Digital & Location

IP Address, URL, Username, Device ID, ZIP Code, GPS

Other

VIN, License Plate, National ID

HIPAA Ready

Healthcare data protection. BAA support and PHI detection.

GDPR Compliant

EU data types, right to erasure, data minimization.

CCPA Ready

California consumer data protection and disclosure.

PCI-DSS Aligned

Credit card detection and masking for payments.

SOC 2 Ready

Built with SOC 2 controls for enterprise security.

Zero Trust Architecture

Agent Mask operates on a zero-trust model. We never see your data, never store your data, never have access to your data. The application runs in your Snowflake environment with the permissions you grant—nothing more.

What You Give Up With Every Alternative

Cloud APIs take your data out of Snowflake. LLMs hallucinate. Regex misses context. Open source requires an engineering team. Pick your poison, or don't.

Cloud APIs

AWS Comprehend · Google Cloud DLP

Your data leaves Snowflake, crosses the network, and lands on AWS or Google servers for processing. Per-character pricing means costs scale unpredictably. And Comprehend's PII detection supports only two languages, English and Spanish.

Their Limitations
  • Data egress to third-party servers
  • English/Spanish only (Comprehend)
  • $1-3/GB pricing explodes at scale
  • No pseudonymization—redact or label only
  • No checksum validation (false positives on random numbers)
Agent Mask Advantage
  • Zero data egress—runs inside Snowflake
  • 14 languages with dedicated models
  • Predictable infrastructure costs
  • Consistent pseudonymization across columns
  • Luhn/checksum validation reduces false positives
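For readers unfamiliar with checksum validation, the sketch below shows the standard Luhn algorithm used by payment card numbers. It illustrates the general technique only; how Agent Mask combines it with ML detection is not shown here, and the function name `luhn_valid` is chosen for this example.

```python
def luhn_valid(number: str) -> bool:
    """Return True if the digits in `number` pass the Luhn checksum.

    Non-digit characters such as spaces and dashes are ignored.
    """
    digits = [int(ch) for ch in number if ch.isdigit()]
    if len(digits) < 2:
        return False

    total = 0
    # Walk from the rightmost digit; double every second digit.
    for position, digit in enumerate(reversed(digits)):
        if position % 2 == 1:
            digit *= 2
            if digit > 9:
                digit -= 9  # same as summing the two digits of the product
        total += digit
    return total % 10 == 0


print(luhn_valid("4111 1111 1111 1111"))  # True  (a well-known test card number)
print(luhn_valid("4111 1111 1111 1112"))  # False (last digit changed)
```

A random 16-digit string has roughly a one-in-ten chance of passing, so the check filters out most order numbers and tracking IDs that merely look like cards.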

LLM APIs

GPT-4 · Claude · Gemini

Run the same prompt twice, get different results. LLMs hallucinate PII that isn't there and miss PII that is. Your compliance team will love explaining that to auditors.

Their Limitations
  • Data sent to OpenAI/Anthropic/Google servers
  • Non-deterministic—different results each run
  • Hallucinates entities that don't exist
  • Returns prose, not structured positions
  • Per-token costs at $15-60/M tokens
  • No audit trail or reproducibility
Agent Mask Advantage
  • Data never leaves Snowflake
  • Deterministic—same input, same output, every time
  • ML detection + checksum validation combined
  • Returns exact character positions for each entity
  • Flat infrastructure costs, no token fees
  • Full audit trail for compliance

Snowflake AI_REDACT

Cortex Built-in Function

Snowflake's own docs call it "best-effort" redaction that "requires manual review." English-only, 4K token limit, no entity positions returned. Good for demos, not production.

Their Limitations
  • "Best-effort"—Snowflake says manual review required
  • English only (degrades on other languages)
  • 4K token input limit, 1K output limit
  • US/UK/CA entities only—no EU, APAC, LATAM
  • No medical identifiers (NPI, MBI, DEA)
  • Returns redacted text only, no entity positions
  • "John Doe" in column A ≠ "John Doe" in column B
Agent Mask Advantage
  • 14 languages with dedicated AI models
  • No token limits—process any text length
  • Returns exact character positions for every entity
  • 40+ entity types including EU, APAC, LATAM PII
  • Full healthcare support: NPI, MBI, DEA with validation
  • Cross-column consistency: same person = same ID everywhere
  • AI models + checksum validation combined
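Cross-column consistency means the same person maps to the same replacement ID wherever they appear. One common way to get that property is a keyed hash over a normalized value, sketched below. This is an assumption about the general technique, not a description of Agent Mask's internals; the `pseudonym` function, the normalization rules, and the ID format are all hypothetical.

```python
import hashlib
import hmac


def pseudonym(value: str, entity_type: str, key: bytes) -> str:
    """Derive a stable pseudonym for `value` using HMAC-SHA256.

    The same (value, entity_type, key) always yields the same ID, so a name
    in one column matches the same name in another column. Without the key,
    the ID cannot be recomputed from a guessed value.
    """
    # Normalize whitespace and case so "John  Doe" and "john doe" agree.
    normalized = " ".join(value.split()).casefold()
    message = f"{entity_type}:{normalized}".encode("utf-8")
    digest = hmac.new(key, message, hashlib.sha256).hexdigest()
    # Truncated for readability; a longer prefix lowers collision risk.
    return f"{entity_type}_{digest[:8].upper()}"


key = b"example-secret-key"  # placeholder; keep real keys in a secrets manager

column_a = pseudonym("John Doe", "PERSON", key)
column_b = pseudonym("john  doe", "PERSON", key)
other = pseudonym("Jane Doe", "PERSON", key)

print(column_a == column_b)  # True
print(column_a == other)     # False
```

Joins and group-bys on the pseudonymized columns still work, because identity is preserved even though the name itself is gone.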

Regex & Rule-Based

In-house Keyword Lists · Custom Scripts

"Smith & Wesson" flagged as a person. "123-45-6789" caught, but "my social is one two three..." missed entirely. You'll spend more time tuning patterns than doing actual work.

Their Limitations
  • No context: "Smith & Wesson" = person name
  • Misses spelled-out numbers and variations
  • Can't detect names without a dictionary
  • New PII format? Write another pattern
  • False positives on valid-format random numbers
  • Engineering time sink that never ends
Agent Mask Advantage
  • AI models understand semantic context
  • Checksum validation catches false positives
  • Knows "John Smith" ≠ "Smith & Wesson"
  • Catches spelled-out and formatted variations
  • Luhn validation rejects random number sequences
  • Zero pattern maintenance—models handle edge cases
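The limits of pattern matching are easy to reproduce. The sketch below uses a typical SSN regex (an illustrative pattern, not one taken from any particular tool) and shows it catching the formatted number, missing the spelled-out one, and flagging an unrelated reference number that happens to share the format.

```python
import re
from typing import List

# A typical rule-based SSN pattern: three digits, two digits, four digits.
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")


def find_ssns(text: str) -> List[str]:
    """Return every substring that matches the SSN format."""
    return SSN_PATTERN.findall(text)


formatted = "My SSN is 123-45-6789."
spelled_out = "my social is one two three four five six seven eight nine"
lookalike = "Order ref 987-65-4321 shipped Tuesday."

print(find_ssns(formatted))    # ['123-45-6789']
print(find_ssns(spelled_out))  # []  (missed entirely)
print(find_ssns(lookalike))    # ['987-65-4321']  (false positive)
```

Each miss can be patched with another pattern, which is exactly the maintenance burden described above.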

Microsoft Presidio

Open Source Framework

Great foundation—we built on it. But raw Presidio has known false positive issues at production confidence scores, no checksum validation, and no Snowflake integration. You'll need a team to make it work.

Their Limitations
  • No checksum/format validation layer
  • Known false positives at 0.7+ confidence scores
  • No cross-column entity consistency
  • You build and maintain the infrastructure
  • No native Snowflake integration
  • GPU optimization? That's on you too
Agent Mask Advantage
  • Context-aware ML + format validation combined
  • Conflict resolution between ML and validation layers cuts false positives
  • Cross-column pseudonymization built in
  • Runs in Snowpark Container Services
  • GPU-optimized batch processing included
  • Production-ready in 30 minutes, not 30 days
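When an ML model and a format validator both flag the same stretch of text, something has to decide which label wins. The sketch below shows one simple policy: keep the highest-scoring detection and drop anything that overlaps it. The `Detection` class, the scores, and the greedy policy are assumptions made for illustration; they are not Agent Mask's or Presidio's actual logic.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Detection:
    """One candidate span from one detection layer (hypothetical shape)."""
    start: int    # inclusive character offset
    end: int      # exclusive character offset
    type: str
    score: float  # confidence in [0, 1]
    source: str   # e.g. "ml" or "checksum"


def resolve_overlaps(detections: List[Detection]) -> List[Detection]:
    """Greedily keep the best detection among any overlapping group.

    Ranking: higher score first, then longer span, then earlier start.
    """
    ranked = sorted(
        detections,
        key=lambda d: (-d.score, -(d.end - d.start), d.start),
    )
    kept: List[Detection] = []
    for cand in ranked:
        overlaps = any(
            cand.start < k.end and k.start < cand.end for k in kept
        )
        if not overlaps:
            kept.append(cand)
    return sorted(kept, key=lambda d: d.start)


candidates = [
    Detection(5, 24, "PHONE", 0.62, "ml"),
    # Assumption: a span that passes a checksum is given full confidence.
    Detection(5, 24, "CREDIT_CARD", 1.0, "checksum"),
    Detection(30, 42, "PERSON", 0.91, "ml"),
]

for det in resolve_overlaps(candidates):
    print(det.type, det.start, det.end, det.source)
# CREDIT_CARD 5 24 checksum
# PERSON 30 42 ml
```

The point of the design is that a deterministic validator can overrule a lower-confidence ML guess on the same span, while non-overlapping ML detections pass through untouched.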

Data Privacy Vaults

Skyflow · Protegrity

Enterprise vaults that tokenize and encrypt structured data fields. Great for credit card storage—but they can't scan free-form text for PII you don't know is there. $100K-$200K+/year for a different problem.

Their Limitations
  • Structured data only—no unstructured text NLP
  • Can't discover unknown PII in free-form fields
  • Requires data migration to external vault
  • $100K-$200K+ annual contracts
  • Overkill if you just need text anonymization
  • No AI-based entity recognition
Agent Mask Advantage
  • Purpose-built for unstructured text at scale
  • Detects PII in any text you send it
  • No data migration—runs inside Snowflake
  • Fraction of enterprise vault pricing
  • Right-sized for text anonymization
  • AI models + checksum validation

AI-Native DLP

Nightfall · Protecto

Different tools for different jobs. Nightfall watches SaaS apps for data leakage in real-time. Protecto sanitizes prompts before they hit LLMs. Neither is designed to redact text columns in your data warehouse.

Their Limitations
  • Nightfall: real-time SaaS monitoring, not batch redaction
  • Protecto: LLM prompt sanitization, not warehouse-scale
  • No native Snowflake integration
  • Data flows through their cloud infrastructure
  • Per-API-call pricing at warehouse scale
  • No cross-column pseudonymization
Agent Mask Advantage
  • Purpose-built for Snowflake text redaction
  • Runs inside your Snowflake environment
  • Batch processing for warehouse-scale workloads
  • Predictable infrastructure costs
  • Cross-column consistency: same entity = same ID
  • 14 languages with dedicated models

Frequently Asked Questions

Most PII tools require you to send data to their service via API. Agent Mask runs natively inside your Snowflake account—your data never leaves. This eliminates data transfer risk, simplifies compliance, and keeps your sensitive information exactly where it belongs.
We detect dozens of entity types including SSNs, credit cards, emails, phone numbers, names, addresses, and industry-specific identifiers like NPI, MBI, and DEA numbers. Our context-aware detection catches sensitive data that simpler pattern-matching tools miss.
Most customers are up and running in under 30 minutes. Install from the Snowflake Marketplace, grant access to your schemas, and start detecting. No infrastructure to deploy, no agents to install.
Yes. We've built specialized detection for healthcare data including unstructured clinical text, medical transcripts, and health records. Our approach is based on proven de-identification methodology designed specifically for medical language.
Agent Mask supports major world languages across European, Asian, and Middle Eastern regions. Contact us for the current list of supported languages.
Agent Mask is designed to support HIPAA compliance. Because your data never leaves your Snowflake account, there’s no PHI transfer to external services. We’re happy to sign a BAA and discuss how Agent Mask fits into your compliance program.
You can mask (partial redaction), fully redact, replace with synthetic data, or tokenize detected PII. Anonymization strategies are configurable per entity type, so you can mask names while redacting SSNs, for example.
Yes. We offer guided proof-of-concept engagements where we help you evaluate Agent Mask on your actual data. Contact us to schedule a technical assessment.
No system is perfect, but context-aware processing significantly outperforms regex. Test on a sample first—you see exactly what gets masked before running on production data.
Configure sensitivity per entity type. Exclude specific patterns. Test on samples and adjust until output matches what you need.
Nothing happens to your source data. The function returns new anonymized output, and your source columns stay exactly as they were.

How Much Sensitive Data Is Hiding in Your Snowflake Tables Right Now?

Every organization we've tested found PII they didn't know they had. The question isn't whether it's there—it's whether you find it before an auditor does.

Setup in under 30 minutes · No infrastructure to deploy · Free proof-of-concept available

You're already on Snowflake—that's the hard part done.

1. Install

Get Agent Mask from the Marketplace. Grant access to your schemas.

2. Point

Pass your text columns through the function—support tickets, clinical notes, survey responses, whatever contains unstructured data.

3. Get Clean Data

Receive anonymized output with PII replaced. Your original stays intact.

Start with a free proof of concept on your actual data.