Stop Hoarding Data

Your warehouse is full of PII that’s one query away from exposure. Support tickets, call transcripts, intake forms, chat logs — and now AI agents are generating sensitive text faster than humans can clean it.

Detect and de-identify it in a single function call, entirely inside Snowflake. No data egress. No external processing by Agent Mask.

99% · Name detection accuracy

We detect dozens of entities and even custom entities. But names are what you care about most, and the hardest to get right. 99.4% accuracy across 174K+ name spans in our 150K‑test benchmark.

14 · Languages

Most de‑identification tools are English‑only. Yours doesn’t have to be.

0 · Data egress

Your data stays in Snowflake. No external processing by Agent Mask, no data residency workarounds. The simplest compliance story you’ll ever tell.

<30 min · Setup to first scan

Install from the Marketplace, call one function. You’re scanning real data today, not next quarter.

The Data Tax

You’re paying to store data you can’t safely use.

  • A customer shares their credit card number in a live chat to process a refund.
  • A nurse types a patient’s name into a support ticket to report a software bug.
  • An account holder reads their SSN to a rep to verify their identity.
  • An employee signs their name in an "anonymous" survey to ensure a follow-up.

Now there’s a credit card number in your chat logs. A patient identity in your help desk. A Social Security number in your call transcripts. A personal identity in your "anonymous" dataset.

That’s the data tax. You can’t safely feed it to your LLMs. Your analysts need six approvals to query it. Your ML team won’t touch it for training. It’s either a compliance liability or wasted insights—pick one.

That data has value—if you could just separate what’s sensitive from what’s useful.

Regulatory Risk

Unmasked PII = audit findings, fines, breach risk

Locked Analytics

Data teams can’t use what they can’t safely access

Blocked AI/ML

Models shouldn’t be trained on data riddled with PII

Risky Data Sprawl

PII copies spreading across dev, test, and staging

Reclaim the Value

Scan text fields and documents for hidden PII and mask it automatically—all inside Snowflake.

HIPAA-Compliant Analytics

Clinical notes, medical transcripts, discharge summaries—healthcare data is unstructured and sensitive. Agent Mask detects PII in freeform text with the precision HIPAA demands.

  • De-identify clinical text for research and analytics
  • Share with research partners without expanding BAA scope
  • Train ML models on the language, not the PII
Outcome

Share data with internal teams and external partners for research, analytics, and care coordination—without compromising patient privacy. Train AI on real clinical notes and transcripts, safely.

Discharge Summary
Margaret Chen [PERSON], 67F, discharged 03/14/2024 [DATE] following cardiac catheterization. Attending: Dr. Robert Okonkwo [PERSON] (NPI: 1528496379 [NPI]). Pt to follow up with cardiology in 2 weeks. Daughter Linda Chen [PERSON] (415-555-0189 [PHONE]) designated emergency contact. Insurance: Blue Cross ID 7294851036 [INSURANCE_ID].

PCI-DSS and Privacy Compliance

Loan applications, transaction notes, customer communications—financial data lives in documents and conversations. Agent Mask finds sensitive data wherever it hides.

  • Clean transaction data for fraud analytics
  • Enable BI teams to query without compliance risk
  • Provision safe datasets for dev and QA environments
Outcome

Run fraud models on transaction notes that were previously off-limits. Enable data-driven decisions while maintaining the regulatory compliance your business depends on.

Advisor Call Notes
Client David Ramirez [PERSON] from Meridian Capital Partners [ORG] called re: wire. IBAN: DE89370400440532013000 [IBAN]. Verified via SSN 412-68-6789 [SSN] and DOB 11/03/1978 [DOB]. Card 4532-7891-2345-4421 [CREDIT_CARD]. Callback: 832-555-0147 [PHONE].

Safe Data Sharing at Scale

Employee feedback, user research, customer surveys—valuable data locked behind privacy concerns. Agent Mask makes it safe to share across teams.

  • De-identify employee surveys for workforce analytics
  • Clean user research before sharing with product
  • Prepare customer feedback for company-wide insights
Outcome

Turn restricted data into company-wide assets. Analyze employee feedback without exposing who said what.

Employee Survey Response
Honestly, my manager Kevin Walsh [PERSON] has been great but the workload since March [DATE] is unsustainable. I've talked to Priya [PERSON] and James [PERSON] on my team and they feel the same. I'm starting to look elsewhere. You can reach me at t.morrison@company.com [EMAIL] if HR wants to discuss.

FOIA and Public Records Compliance

Court filings, body cam transcripts, investigative reports—government records require redaction before release. Agent Mask automates what used to take hours of manual review.

  • Accelerate FOIA response turnaround
  • Enable public records search without exposure
  • Prepare documents for inter-agency sharing
Outcome

Prepare public records without manual review of every document. Meet disclosure deadlines without compromising privacy.

Constituent Complaint
My name is Barbara Hendricks [PERSON] and I live at 2847 Oak Street, Apt 4B [ADDRESS]. I'm writing about the situation at Riverside Elementary [ORG]. Please contact me at bhendricks@gmail.com [EMAIL] or 555-294-8831 [PHONE]. My case number is GOV-2024-08472 [CASE_ID].

Under the Hood

Comprehensive Detection

Detecting SSNs and credit cards is the easy part — every tool does that. The hard part is everything else: ambiguous contexts where Austin is a person, not a city. Drug names buried in clinical prose. Sensitive data unique to your industry that no generic model knows to look for.

Agent Mask understands context, resolves name variants to a single identity, and lets you define custom categories in plain English.

One engine for healthcare, finance, government, and enterprise data across 14 languages.

Native to Snowflake

Runs inside your Snowflake account. No file transfers, no external processing by Agent Mask, and no additional infrastructure to manage.

Document Redaction

Submit PDFs and scanned documents. Get back extracted text with PII de-identified, plus visually redacted files with PII masked in both the text layer and the rendered image — so no one can copy-paste or extract their way around it.

Multi-Language

Detect PII across 14 languages with dedicated models for each. Your EU, APAC, and LATAM data gets the same coverage — no extra tools, no extra vendors.

Flexible De-Identification

Eight operators, configured per entity type. Every response includes a full entity mapping for audit trails and authorized re-identification.

Patient Sarah Chen (DOB: 03/15/1987, SSN: 078-05-1120) presented with recurring lower back pain and bilateral hip stiffness. Symptoms began approximately six months ago and have worsened with prolonged sitting. No history of acute trauma. Referring physician Dr. James Whitfield documented initial assessment on 01/08/2025 and noted prior conservative treatment including physical therapy and NSAIDs with limited improvement. Imaging ordered. Patient to follow up with orthopedics within two weeks. Reach Sarah Chen at sarah@acme.com or 555-867-5309 to confirm scheduling.
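As a rough illustration of what "configured per entity type" can mean, the Python sketch below applies a different operator to each entity type in an excerpt of the note above and records an entity mapping as it goes. The operator functions, the span format, and the `find_span` and `deidentify` helpers are all assumptions made for this example; they are not Agent Mask's actual interface.

```python
import hashlib

def mask(value):
    # Hide everything except the last four characters.
    return "*" * max(len(value) - 4, 0) + value[-4:]

def hash_value(value):
    # Truncated SHA-256 digest: stable for equal inputs, not reversible.
    return hashlib.sha256(value.encode("utf-8")).hexdigest()[:12]

def redact(value):
    return "[REDACTED]"

# Hypothetical configuration: one operator per entity type.
OPERATORS = {"PERSON": redact, "SSN": mask, "EMAIL": hash_value}

def find_span(text, needle, entity_type):
    # Stand-in for a detector: locate a known string and label it.
    start = text.index(needle)
    return (start, start + len(needle), entity_type)

def deidentify(text, spans):
    """Rewrite each (start, end, entity_type) span with its operator.

    Returns the new text and a list that maps each replacement to its original.
    """
    pieces, mapping, cursor = [], [], 0
    for start, end, entity_type in sorted(spans):
        original = text[start:end]
        replacement = OPERATORS.get(entity_type, redact)(original)
        pieces.append(text[cursor:start])
        pieces.append(replacement)
        mapping.append(
            {"type": entity_type, "original": original, "replacement": replacement}
        )
        cursor = end
    pieces.append(text[cursor:])
    return "".join(pieces), mapping

text = "Patient Sarah Chen (SSN: 078-05-1120), sarah@acme.com"
spans = [
    find_span(text, "Sarah Chen", "PERSON"),
    find_span(text, "078-05-1120", "SSN"),
    find_span(text, "sarah@acme.com", "EMAIL"),
]
clean, mapping = deidentify(text, spans)
print(clean)
```

In this sketch the mapping list plays the role of the audit trail: stored under access control, it is what would let an authorized user trace a replacement back to its original value.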

Your Business. Their Blind Spot.

Other tools ship with fixed lists and patterns. Your most sensitive data falls through the cracks. Describe what you're looking for in plain English—Agent Mask figures out what matches.

Semantic Inference, Not Pattern Matching

Category Definitions
mrn · medical record numbers (MRN)
insurance · health insurance: plan names, group numbers
rx_med · prescription drug names: Zoloft, Prozac, Ambien, metformin
dosage · medication dosages: 50mg BID, 10mg IV push, 500mg TID
mental_dx · psychiatric diagnoses: schizophrenia, OCD, anorexia, ADHD
substance · substance abuse: cocaine, heroin, methamphetamine, alcohol dependence
terminal_dx · terminal diagnoses: ALS, stage IV cancer, end-stage renal
genetic · genetic markers and test results: BRCA2, HER2, Lynch syndrome
orientation · sexual orientation: gay, lesbian, queer, LGBTQ+
immigration · immigration status: visa type, undocumented, asylum
Patient Record
CONFIDENTIAL - Integrated Care Assessment
Patient: Maria Santos [PERSON] (MRN: 847291 [MRN], DOB: 03/15/1978 [DATE])
Insurance: Blue Cross PPO [INSURANCE]
Psychiatric History:
Current medications:
- Lexapro [RX_MED] - 20mg PO daily [DOSAGE] for generalized anxiety disorder [MENTAL_DX]
- Klonopin [RX_MED] - 0.5mg SL PRN [DOSAGE] for panic attacks
- Seroquel [RX_MED] - 100mg PO QHS [DOSAGE] for sleep/mood
Diagnoses: bipolar II disorder [MENTAL_DX], post-traumatic stress disorder [MENTAL_DX], persistent depressive disorder [MENTAL_DX]
Pain Management:
- Percocet [RX_MED] for cancer pain
- Suboxone [RX_MED] - 8mg/2mg SL daily [DOSAGE] for opioid addiction [SUBSTANCE], in remission
Oncology:
Diagnosis: metastatic pancreatic adenocarcinoma [TERMINAL_DX]
Genetic testing: BRCA1 [GENETIC] positive
Social Assessment: Currently at Salvation Army shelter following eviction. Patient identifies as bisexual [ORIENTATION]. On H-1B visa [IMMIGRATION], renewal pending.
No regex. No lookup tables. Just describe what's sensitive — Agent Mask understands what you mean.

Industry Starter Kits

Pre-built. Ready to go.

Load a preset and start detecting industry-specific data immediately—diagnoses, medications, account numbers, employee IDs, and more. Mix with your own definitions for complete coverage.

Format Enforcement

Flexible detection. Strict matching.

Layer pattern rules on top of semantic detection to kill false positives. Enforce org-specific formats like MRNs, account numbers, and case IDs—the model detects, you decide what’s real.
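One way to picture this layering is as a filter applied after detection: the model proposes candidates, and a per-type pattern decides which ones survive. The Python sketch below assumes a simple `(entity_type, value)` candidate format and two made-up format rules; none of these names or patterns come from Agent Mask itself.

```python
import re

# Hypothetical org-specific formats, invented for this example.
FORMAT_RULES = {
    "MRN": re.compile(r"\d{6}"),
    "CASE_ID": re.compile(r"GOV-\d{4}-\d{5}"),
}

def enforce_formats(candidates):
    """Keep a candidate only if it fully matches the rule for its type.

    Entity types without a rule pass through unchanged.
    """
    kept = []
    for entity_type, value in candidates:
        rule = FORMAT_RULES.get(entity_type)
        if rule is None or rule.fullmatch(value):
            kept.append((entity_type, value))
    return kept

# Candidates as a semantic detector might propose them (illustrative data).
candidates = [
    ("MRN", "847291"),
    ("MRN", "room 12"),            # plausible in context, wrong format
    ("CASE_ID", "GOV-2024-08472"),
    ("PERSON", "Maria Santos"),    # no rule for this type
]
confirmed = enforce_formats(candidates)
print(confirmed)
```

Using `fullmatch` rather than `search` is the design choice that matters here: a candidate that merely contains a valid-looking ID is rejected, which is what removes the false positive.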

Fine-Grained Control

Tune each type independently.

Set different sensitivity levels for different data types. Aggressive detection for medication names, strict matching for structured IDs—each with its own de-identification method, without one affecting the other.
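A minimal sketch of per-type tuning, assuming that "sensitivity" can be modeled as a minimum confidence score per entity type. The settings table, the score values, and the `select` helper are hypothetical and only illustrate how one type can be tuned without affecting another.

```python
# Hypothetical per-type settings: a minimum confidence score and an operator name.
TYPE_SETTINGS = {
    "RX_MED": {"min_score": 0.30, "operator": "redact"},  # aggressive: keep weak hits
    "MRN": {"min_score": 0.90, "operator": "hash"},       # strict: strong hits only
}
DEFAULT_SETTINGS = {"min_score": 0.50, "operator": "redact"}

def select(detections):
    """Return (value, operator) for detections that clear their type's threshold."""
    chosen = []
    for entity_type, value, score in detections:
        settings = TYPE_SETTINGS.get(entity_type, DEFAULT_SETTINGS)
        if score >= settings["min_score"]:
            chosen.append((value, settings["operator"]))
    return chosen

# Illustrative detections: (entity_type, value, confidence score).
detections = [
    ("RX_MED", "Lexapro", 0.42),  # kept: RX_MED threshold is low
    ("MRN", "847291", 0.95),      # kept: clears the strict threshold
    ("MRN", "2024", 0.60),        # dropped: below the MRN threshold
]
selected = select(detections)
print(selected)
```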

That clinical record above? Your data looks just like it. Psychiatric diagnoses, substance history, genetic markers, immigration status — buried in free text that generic tools don’t know to look for.

Same Entity. Same Mask.

Other tools give the same person three different placeholders — and your data stops making sense. Contextual matching and cross-field consistency keep your data analytically useful.

PERSON_1 · Sarah Elizabeth Chen, Chen, Sarah
PERSON_2 · James Park, Park
PERSON_3 · Lisa Chen-Nakamura, Lisa
LOCATION_1 · Mercy General Hospital, Mercy General
Demographics
Patient: Sarah Elizabeth Chen [PERSON_1]
Employer: Mercy General Hospital [LOCATION_1]
Referred by: Dr. James Park [PERSON_2]
Emergency contact: Lisa Chen-Nakamura [PERSON_3] (sister)
Clinical Notes
Dr. Park [PERSON_2] referred pt for chronic migraine. Chen [PERSON_1] reports worsening with aura. Seen at Mercy General [LOCATION_1] outpatient neuro. Sarah [PERSON_1] declines imaging. Sister Lisa [PERSON_3] present. Park [PERSON_2] to follow up in 4wk.

Reference Resolution

Every variant, one mask.

Contextual AI matches name variants, abbreviations, and acronyms that rules alone would miss—so “Sarah Elizabeth Chen”, “Chen”, and “Sarah” all collapse to a single placeholder. Deterministic normalization does the same for structured data—“(555) 123-4567” and “555.123.4567”, or “January 15, 2024” and “01/15/2024”. Your de-identified data reads like real data—not a bag of disconnected placeholders.
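The deterministic half of this is easy to sketch. The Python example below reduces the phone and date variants quoted above to one canonical form each, so that equal values can share a placeholder. The helper names and the list of accepted date formats are assumptions for illustration, not a description of Agent Mask's internals.

```python
import re
from datetime import datetime

def normalize_phone(raw):
    """Reduce a phone number to its digits so formatting differences disappear."""
    return re.sub(r"\D", "", raw)

# Date layouts this sketch understands; a real system would need many more.
DATE_FORMATS = ("%B %d, %Y", "%m/%d/%Y")

def normalize_date(raw):
    """Parse a date in a known layout and return it as ISO 8601 (YYYY-MM-DD)."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(raw, fmt).date().isoformat()
        except ValueError:
            continue
    return raw  # unknown layout: leave the value unchanged

print(normalize_phone("(555) 123-4567"), normalize_phone("555.123.4567"))
print(normalize_date("January 15, 2024"), normalize_date("01/15/2024"))
```

Month names are parsed with `%B`, which depends on the process locale; the sketch assumes the default English month names.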

Corpus Consistency

Mix formats. One identity map.

Send text columns, PDFs, images, and DOCX files together and Agent Mask connects the dots across all of them. The same name always gets the same replacement, everywhere it appears—no manual alignment needed.
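Conceptually, a shared identity map is a lookup that hands out one placeholder per resolved identity and reuses it on every later mention. The Python sketch below assumes an upstream step has already resolved each mention to an identity key (the keys shown are invented); the `IdentityMap` class is hypothetical and only illustrates the bookkeeping.

```python
class IdentityMap:
    """Assign one stable placeholder per resolved identity."""

    def __init__(self):
        self._placeholders = {}  # (entity_type, identity_key) -> placeholder
        self._counters = {}      # entity_type -> last number handed out

    def placeholder(self, entity_type, identity_key):
        key = (entity_type, identity_key)
        if key not in self._placeholders:
            number = self._counters.get(entity_type, 0) + 1
            self._counters[entity_type] = number
            self._placeholders[key] = f"{entity_type}_{number}"
        return self._placeholders[key]

# Identity keys are assumed to come from an earlier coreference step.
ids = IdentityMap()
in_demographics = ids.placeholder("PERSON", "sarah-chen")   # "Sarah Elizabeth Chen"
in_referral = ids.placeholder("PERSON", "james-park")       # "Dr. James Park"
in_notes = ids.placeholder("PERSON", "sarah-chen")          # "Chen" in another field
employer = ids.placeholder("LOCATION", "mercy-general")     # "Mercy General"
print(in_demographics, in_referral, in_notes, employer)
```

Because the map is keyed by identity rather than by surface string, the same instance can be reused across columns and files, which is what keeps placeholders aligned without manual work.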

vs. Simple Redaction

  • Every mention = different placeholder
  • Your data loses all referential meaning
  • Useless for analytics or ML

vs. Rule-Based & Name-Part Matching

  • Exact strings or split name parts—no context
  • Can’t disambiguate “Smith” when John Smith and Jane Smith both appear
  • Names only—no locations, orgs, dates, or phone numbers

vs. Manual Review

  • Doesn't scale past dozens of records
  • Human reviewers miss cross-column links
  • Can't link name variants to the same person

Built for Production

Technical Capabilities
Supported Entities
Dozens of built-in entity types across personal, financial, healthcare, and digital categories—plus unlimited custom types
Languages
14 languages with dedicated models for each
Detection
Context-aware AI — 97% person name detection, 94% overall NER quality across 17 locales
Processing
GPU-optimized batch processing for high-volume workloads
Deployment
Snowflake Native App—no external infrastructure
Data Residency
All processing within your Snowflake account
De-Identification
Pseudonymization, masking, hashing, encryption, synthetic data (Faker), redaction, keep (detect-only)
Document Formats
PDF (text & scanned/OCR), DOCX — visual redaction with bounding-box metadata
Entity Collapsing
AI-driven coreference for names, places, and orgs (strict / moderate / broad threshold) + deterministic normalization for everything else

Personal Identifiers

Person · Email · Phone · SSN · ITIN · Driver's License · Passport

Financial

Credit Card · Bank Account

Healthcare

NPI · MBI · DEA · Health Plan ID · Date

Digital & Location

IP Address · URL · ZIP Code · Location

Organizations & Groups

Organization · Religion, Nationality, Political Affiliation

Your Custom Entities

Define domain-specific entity types with natural language, let the model do the rest. Ship with industry starter kits or build your own.

HIPAA

Safe Harbor De-identification

Masks names, dates, contacts, SSNs, and medical record numbers across clinical text.

GDPR

EU Personal Data

Pseudonymization and data minimization across 14 supported languages.

CCPA

Consumer Privacy

Detects and de-identifies California-regulated personal information in unstructured text.

PCI

Payment Data

Detects and masks credit card numbers, expiration dates, and CVVs.

SOC2

Audit-Ready Controls

No data egress, permission-scoped access, and deterministic processing. One less thing for the auditor to flag.

Zero Trust Architecture

Agent Mask runs inside your Snowflake account. We do not receive, store, or process your data outside your environment. The app uses only the permissions you grant—nothing more.

Batch-First. Snowflake-Native. Expanding.

Call Agent Mask from dbt models, scheduled tasks, or batch queries — it only touches the fields you hand it. No scanning. No crawling. Today, that means Snowflake-native de-identification with no extra infrastructure to run.

What You Give Up With Every Alternative

LLM APIs look easy until token costs, parsing, and auditability show up. Cloud APIs need pipelines. Enterprise platforms weren't built for free text.

You're probably not using any of these.

Most teams aren't switching from another tool—they're doing nothing: sitting on unprotected PII and hoping no audit or breach exposes it. Every month without coverage is another month your data stays risky, restricted, and unusable for AI. Agent Mask makes it easy to start today.

LLM APIs

GPT · Claude · Gemini · Bedrock

You probably thought of using an LLM already. Send text to GPT, Claude, or Gemini and prompt-engineer it to redact the PII. Works great for a demo. Then you do the math on per-token costs at production volume. And shipping your most sensitive data to a third-party API doesn't sit right. Oh, and they just raised their prices. Again.

Their Limitations
  • Metered per-token pricing. The prompts, the inputs, the outputs, the retries—costs explode at warehouse scale
  • Export text from Snowflake, send it to the API, parse the response, and write results back. Most of your effort goes into deploying and maintaining yet another bespoke pipeline
  • Non-deterministic output can vary between runs and can miss real PII
Agent Mask Advantage
  • Predictable pricing based on Snowflake compute, not per-token usage
  • Snowflake-native function—process text inside your account with no external pipeline
  • Deterministic output with exact character positions and reproducible audit trails

Google Sensitive Data Protection

Cloud DLP

Google Sensitive Data Protection is a serious platform: a deep library of built-in detectors, custom infoTypes, deterministic tokenization, and mature DLP controls. If your data already lives in Google Cloud, this one belongs on the shortlist. If your data lives in Snowflake, you still need to move it through a Google Cloud workflow and stitch the results back into your warehouse.

Their Limitations
  • Snowflake data has to be exported or streamed to Google, processed through DLP jobs or API calls, then loaded back
  • Usage-based per-GB pricing adds a second meter for inspection, transformation, storage, and orchestration
  • Deterministic tokenization preserves identical strings, but it does not resolve aliases like "John Smith," "Dr. Smith," and "Smith" as one person
Agent Mask Advantage
  • Snowflake-native function—no extra infrastructure, pipeline, templates, keys, or writeback path to operate
  • Snowflake compute pricing model and Marketplace procurement keep the workflow tied to the warehouse you already control
  • Coreference-aware pseudonymization with entity spans—the same real-world identity gets the same replacement, even when mentions vary

AWS Comprehend

PII Detection API

AWS Comprehend PII is useful if your stack already runs on AWS and you need English or Spanish PII offsets or redaction. But it requires stitching together more cloud infrastructure: IAM, S3 or API jobs, retries, parsing, and writeback. Custom entities are a separate trained-model path, not something you describe in the de-identification call.

Their Limitations
  • Snowflake teams still build AWS plumbing: IAM, S3 or API jobs, retry handling, output parsing, and writeback
  • Comprehend PII is English and Spanish with fixed PII types; Comprehend Medical is a separate English clinical service
  • Custom entities require annotations or entity lists to train a separate recognizer, not a plain-English description in the masking request
Agent Mask Advantage

John Snow Labs

Healthcare NLP

John Snow Labs is a serious healthcare NLP platform: 96% F1 on clinical PHI, 50+ medical entity types, deployment across Databricks, Snowflake, Azure, and AWS. Health systems running a full clinical AI program get real value from that depth. But that depth ships across multiple products, each with its own price tag and limited integration options.

Their Limitations
  • Healthcare-first by design—entity coverage and tuning assume clinical text, so fintech, retail, HR, and SaaS warehouses sit outside the optimization target
  • On Snowflake Marketplace, clinical text and DICOM de-identification ship as separate listings; a mixed text + image workflow means stitching multiple JSL products
  • Custom entities require annotation projects, zero-shot labels, custom models, or rules, not a plain-English description in the masking call
Agent Mask Advantage
  • Domain-agnostic out of the box—clinical, financial, retail, and custom categories side by side in one setup
  • One Snowflake-native workflow for text columns and supported files, installed from a single Marketplace listing
  • Custom entities in plain English—describe what to detect in the masking call, no annotation projects, labels, models, or rules to set up

Snowflake AI_REDACT

Cortex Built-in Function

Snowflake's built-in option. Their docs say it "works best with well-formed English text." Convenient for a quick demo, but the gaps show fast in production.

Their Limitations
  • Tight limits: 4K tokens for input and output combined, with a 1K-token output cap
  • English-optimized only: currently supports only US PII and some UK and Canadian PII
  • No pseudonymization or other advanced operators, and no cross-column consistency
Agent Mask Advantage
  • No token limits—process documents of any length
  • 14 languages with dedicated models, broader regional coverage, and custom entity types
  • Pseudonymization and other advanced replacements, and cross-column coreference resolution

Data Privacy Vaults

Skyflow · Protegrity

Enterprise platforms built around structured-data governance—Skyflow’s privacy vault for tokenized PII columns, Protegrity’s field-level protection across enterprise data stores. Both have layered on unstructured-text capabilities recently, but their flagship products were designed for known PII in fixed schemas. Expect enterprise sales cycles, governance rollouts, and six-figure annual contracts—for a platform built around a different problem than yours.

Their Limitations
  • Flagship products are structured-data platforms—Skyflow’s privacy vault and Protegrity’s field-level tokenization; unstructured-text capabilities are recent layers on top, not the core engineering focus
  • Protegrity: external function calls route data out of Snowflake for processing
  • Enterprise procurement path—security review, governance approval, and professional-services engagement—before the first query runs
Agent Mask Advantage
  • Purpose-built for unstructured text with custom entity types, coreference resolution, and cross-column consistency
  • Runs inside Snowflake—no vault infrastructure, no data egress
  • Focused unstructured-text de-identification—Snowflake Marketplace install, no vault to provision, no field-mapping engagement

Stop Paying the Data Tax

You already know your data is filled with PII. You’re already paying the data tax. Get it de-identified this afternoon. Not next quarter. Not after a six-month integration. Today.

Setup in under 30 minutes · No infrastructure to deploy · Free proof of concept

You're already on Snowflake—that's the hard part done.

1. Install

Get Agent Mask from the Marketplace. Grant access to your schemas.

2. Point

Pass your text columns through the function.

3. Get Clean Data

Receive de-identified output with PII replaced. Your original stays intact.

No $200K enterprise contracts. No per-character API fees. Start with a free proof of concept on your actual data.

Talk to a human

Enterprise deals, BAAs, custom integration work, or anything beyond the Snowflake Marketplace. Tell us what you need to de-identify and where Agent Mask needs to fit.

Response
Within one business day

Usually faster for Marketplace access, BAA questions, and Snowflake deployment blockers.

Useful context
Your data type and target workflow

Clinical notes, tickets, call transcripts, LLM pipelines, or anything else packed with unstructured PII.

Other deployments
Need another deployment target?

Tell us what environment you need and what would make Snowflake-native coverage incomplete.

Direct email
info@agentmask.io

Same inbox, same humans, no mailing list.

Secure intake · Takes about 30 seconds

We only use this to reply. No newsletter, no third-party list.