Stop Hoarding Data
Your warehouse is full of PII that’s one query away from exposure. Support tickets, call transcripts, intake forms, chat logs — and now AI agents are generating sensitive text faster than humans can clean it.
Detect and de-identify it in a single function call, entirely inside Snowflake. No data egress. No external processing by Agent Mask.
The Data Tax
You’re paying to store data you can’t safely use.
- A customer shares their credit card number in a live chat to process a refund.
- A nurse types a patient’s name into a support ticket to report a software bug.
- An account holder reads their SSN to a rep to verify their identity.
- An employee signs their name in an "anonymous" survey to ensure a follow-up.
Now there’s a credit card number in your chat logs. A patient identity in your help desk. A Social Security number in your call transcripts. A personal identity in your "anonymous" dataset.
That’s the data tax. You can’t safely feed it to your LLMs. Your analysts need six approvals to query it. Your ML team won’t touch it for training. It’s either a compliance liability or wasted insights—pick one.
That data has value—if you could just separate what’s sensitive from what’s useful.
Regulatory Risk
Unmasked PII = audit findings, fines, breach risk
Locked Analytics
Data teams can’t use what they can’t safely access
Blocked AI/ML
Models shouldn’t be trained on data riddled with PII
Risky Data Sprawl
PII copies spreading across dev, test, and staging
Reclaim the Value
Scan text fields and documents for hidden PII and mask it automatically—all inside Snowflake.
HIPAA-Compliant Analytics
Clinical notes, medical transcripts, discharge summaries—healthcare data is unstructured and sensitive. Agent Mask detects PII in freeform text with the precision HIPAA demands.
- De-identify clinical text for research and analytics
- Share with research partners without expanding BAA scope
- Train ML models on the language, not the PII
Share data with internal teams and external partners for research, analytics, and care coordination—without compromising patient privacy. Train AI on real clinical notes and transcripts, safely.
PCI-DSS and Privacy Compliance
Loan applications, transaction notes, customer communications—financial data lives in documents and conversations. Agent Mask finds sensitive data wherever it hides.
- Clean transaction data for fraud analytics
- Enable BI teams to query without compliance risk
- Provision safe datasets for dev and QA environments
Run fraud models on transaction notes that were previously off-limits. Enable data-driven decisions while maintaining the regulatory compliance your business depends on.
Safe Data Sharing at Scale
Employee feedback, user research, customer surveys—valuable data locked behind privacy concerns. Agent Mask makes it safe to share across teams.
- De-identify employee surveys for workforce analytics
- Clean user research before sharing with product
- Prepare customer feedback for company-wide insights
Turn restricted data into company-wide assets. Analyze employee feedback without exposing who said what.
FOIA and Public Records Compliance
Court filings, body cam transcripts, investigative reports—government records require redaction before release. Agent Mask automates what used to take hours of manual review.
- Accelerate FOIA response turnaround
- Enable public records search without exposure
- Prepare documents for inter-agency sharing
Prepare public records without manual review of every document. Meet disclosure deadlines without compromising privacy.
Comprehensive Detection
Detecting SSNs and credit cards is the easy part — every tool does that. The hard part is everything else: ambiguous contexts where Austin is a person, not a city. Drug names buried in clinical prose. Sensitive data unique to your industry that no generic model knows to look for.
Agent Mask understands context, resolves name variants to a single identity, and lets you define custom categories in plain English.
One engine for healthcare, finance, government, and enterprise data across 14 languages.
Native to Snowflake
Runs inside your Snowflake account. No file transfers, no external processing by Agent Mask, and no additional infrastructure to manage.
Document Redaction
Submit PDFs and scanned documents. Get back extracted text with PII de-identified, plus visually redacted files with PII masked in both the text layer and the rendered image — so no one can copy-paste or extract their way around it.
Multi-Language
Detect PII across 14 languages with dedicated models for each. Your EU, APAC, and LATAM data gets the same coverage — no extra tools, no extra vendors.
Flexible De-Identification
Eight operators, configured per entity type. Every response includes a full entity mapping for audit trails and authorized re-identification.
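Here is a minimal Python sketch of per-entity-type operators. The names are hypothetical: `Span`, `deidentify`, and the three operators `redact`, `mask`, and `placeholder` are illustrations, not Agent Mask's actual API or its eight operators. The sketch assumes a detector has already produced non-overlapping spans with exact character offsets. It applies the operator configured for each type and returns an entity mapping of offsets, originals, and replacements, the kind of record an audit trail or authorized re-identification relies on.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Span:
    """A detected entity: exact character offsets into the source text plus its type."""
    start: int
    end: int
    entity_type: str


def apply_operator(op: str, value: str, placeholder_id: int, entity_type: str) -> str:
    """Three illustrative operators (hypothetical names, not Agent Mask's)."""
    if op == "redact":
        return "[REDACTED]"
    if op == "mask":
        # Keep the last four characters visible; fully mask very short values.
        if len(value) <= 4:
            return "*" * len(value)
        return "*" * (len(value) - 4) + value[-4:]
    if op == "placeholder":
        return f"<{entity_type}_{placeholder_id}>"
    raise ValueError(f"unknown operator: {op}")


def deidentify(text: str, spans: list[Span], config: dict[str, str]):
    """Apply the operator configured for each entity type.

    Assumes spans do not overlap. Returns the de-identified text plus an
    entity mapping (offsets, type, original value, replacement) for audit.
    """
    ids: dict[tuple[str, str], int] = {}  # same type + value -> same placeholder id
    per_type: dict[str, int] = {}
    mapping: list[dict] = []
    pieces: list[str] = []
    cursor = 0
    for span in sorted(spans, key=lambda s: s.start):
        value = text[span.start:span.end]
        key = (span.entity_type, value)
        if key not in ids:
            per_type[span.entity_type] = per_type.get(span.entity_type, 0) + 1
            ids[key] = per_type[span.entity_type]
        replacement = apply_operator(config[span.entity_type], value, ids[key], span.entity_type)
        mapping.append({"start": span.start, "end": span.end, "type": span.entity_type,
                        "original": value, "replacement": replacement})
        pieces.append(text[cursor:span.start])  # untouched text before the span
        pieces.append(replacement)
        cursor = span.end
    pieces.append(text[cursor:])  # untouched tail
    return "".join(pieces), mapping


if __name__ == "__main__":
    text = "Card 4111111111111111 used by Jane Doe."
    card, name = text.index("4111111111111111"), text.index("Jane Doe")
    spans = [Span(card, card + 16, "CREDIT_CARD"), Span(name, name + 8, "PERSON")]
    clean, mapping = deidentify(text, spans, {"CREDIT_CARD": "mask", "PERSON": "placeholder"})
    print(clean)  # Card ************1111 used by <PERSON_1>.
    for entry in mapping:
        print(entry)
```

Because the operator is looked up per entity type, changing how credit cards are handled never affects how names are handled.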
Your Business. Their Blind Spot.
Other tools ship with fixed lists and patterns. Your most sensitive data falls through the cracks. Describe what you're looking for in plain English—Agent Mask figures out what matches.
Semantic Inference, Not Pattern Matching
- mrn: medical record numbers (MRN)
- insurance: health insurance (plan names, group numbers)
- rx_med: prescription drug names (Zoloft, Prozac, Ambien, metformin)
- dosage: medication dosages (50mg BID, 10mg IV push, 500mg TID)
- mental_dx: psychiatric diagnoses (schizophrenia, OCD, anorexia, ADHD)
- substance: substance abuse (cocaine, heroin, methamphetamine, alcohol dependence)
- terminal_dx: terminal diagnoses (ALS, stage IV cancer, end-stage renal disease)
- genetic: genetic markers and test results (BRCA2, HER2, Lynch syndrome)
- orientation: sexual orientation (gay, lesbian, queer, LGBTQ+)
- immigration: immigration status (visa type, undocumented, asylum)
Industry Starter Kits
Pre-built. Ready to go.
Load a preset and start detecting industry-specific data immediately: diagnoses, medications, account numbers, employee IDs, and more. Mix with your own definitions for complete coverage.
Format Enforcement
Flexible detection. Strict matching.
Layer pattern rules on top of semantic detection to kill false positives. Enforce org-specific formats like MRNs, account numbers, and case IDs. The model detects; you decide what’s real.
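Format enforcement can be pictured as a filter that runs after semantic detection. This sketch assumes the detector emits `(entity_type, text)` candidates. The `FORMAT_RULES` patterns and the `enforce_formats` helper are hypothetical. They show the general technique, not Agent Mask's configuration syntax.

```python
from __future__ import annotations

import re

# Hypothetical org-specific formats, keyed by entity type. These patterns are
# illustrative; substitute your own MRN, account, or case-ID conventions.
FORMAT_RULES: dict[str, re.Pattern[str]] = {
    "MRN": re.compile(r"MRN-\d{7}"),
    "ACCOUNT": re.compile(r"\d{4}-\d{6}"),
}


def enforce_formats(candidates: list[tuple[str, str]]) -> list[tuple[str, str]]:
    """Filter (entity_type, text) candidates produced by a semantic detector.

    Types with a rule are kept only on a full-string match. Types without a
    rule (e.g. PERSON) pass through unchanged, so their recall is unaffected.
    """
    kept = []
    for entity_type, value in candidates:
        rule = FORMAT_RULES.get(entity_type)
        if rule is None or rule.fullmatch(value):
            kept.append((entity_type, value))
    return kept


if __name__ == "__main__":
    candidates = [
        ("MRN", "MRN-0012345"),      # matches the org format: kept
        ("MRN", "MRN-12"),           # wrong length: dropped as a false positive
        ("ACCOUNT", "1234-567890"),  # matches: kept
        ("PERSON", "Austin"),        # no rule for PERSON: kept
    ]
    print(enforce_formats(candidates))
```

The rule only narrows what a type may match. It never adds detections, which is why it is useful for cutting false positives on structured IDs.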
Fine-Grained Control
Tune each type independently.
Set different sensitivity levels for different data types: aggressive detection for medication names, strict matching for structured IDs. Each type gets its own de-identification method, and no setting affects the others.
Categories like these are in your data too: psychiatric diagnoses, substance history, genetic markers, immigration status, buried in free text that generic tools don’t know to look for.
Same Entity. Same Mask.
Other tools give the same person three different placeholders — and your data stops making sense. Contextual matching and cross-field consistency keep your data analytically useful.
Reference Resolution
Every variant, one mask.
Contextual AI matches name variants, abbreviations, and acronyms that rules alone would miss, so “Sarah Elizabeth Chen”, “Chen”, and “Sarah” all collapse to a single placeholder. Deterministic normalization does the same for structured data, such as “(555) 123-4567” and “555.123.4567”, or “January 15, 2024” and “01/15/2024”. Your de-identified data reads like real data, not a bag of disconnected placeholders.
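Deterministic normalization of structured values can be sketched in a few lines of Python. Each value is reduced to a canonical form, and each canonical value gets one placeholder. Everything here is hypothetical: `normalize_phone`, `normalize_date`, `PlaceholderMap`, and the accepted date formats. The sketch assumes US-style month/day ordering. It does not attempt the contextual name matching described above, which needs a model rather than rules.

```python
from __future__ import annotations

import re
from datetime import datetime

# Assumed input formats; ambiguous strings like 01/02/2024 are read month-first.
DATE_FORMATS = ("%B %d, %Y", "%m/%d/%Y", "%Y-%m-%d")


def normalize_phone(raw: str) -> str:
    """Reduce a phone number to its digits, dropping a leading US country code."""
    digits = re.sub(r"\D", "", raw)
    if len(digits) == 11 and digits.startswith("1"):
        digits = digits[1:]
    return digits


def normalize_date(raw: str) -> str:
    """Parse a date in any known format and return it as ISO 8601 (YYYY-MM-DD)."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(raw.strip(), fmt).date().isoformat()
        except ValueError:
            continue
    raise ValueError(f"unrecognized date: {raw!r}")


class PlaceholderMap:
    """Assigns exactly one placeholder per (entity type, canonical value)."""

    def __init__(self) -> None:
        self._map: dict[tuple[str, str], str] = {}
        self._counts: dict[str, int] = {}

    def get(self, entity_type: str, canonical: str) -> str:
        key = (entity_type, canonical)
        if key not in self._map:
            n = self._counts.get(entity_type, 0) + 1
            self._counts[entity_type] = n
            self._map[key] = f"<{entity_type}_{n}>"
        return self._map[key]


if __name__ == "__main__":
    placeholders = PlaceholderMap()
    for raw in ("(555) 123-4567", "555.123.4567", "+1 555 123 4567"):
        print(raw, "->", placeholders.get("PHONE", normalize_phone(raw)))
    for raw in ("January 15, 2024", "01/15/2024"):
        print(raw, "->", placeholders.get("DATE", normalize_date(raw)))
```

Reusing one map across every column and file is what keeps a replacement stable across a whole corpus.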
Corpus Consistency
Mix formats. One identity map.
Send text columns, PDFs, images, and DOCX files together and Agent Mask connects the dots across all of them. The same name always gets the same replacement everywhere it appears, with no manual alignment needed.
vs. Simple Redaction
- Every mention = different placeholder
- Your data loses all referential meaning
- Useless for analytics or ML
vs. Rule-Based & Name-Part Matching
- Exact strings or split name parts—no context
- Can’t disambiguate “Smith” when John Smith and Jane Smith both appear
- Names only—no locations, orgs, dates, or phone numbers
vs. Manual Review
- Doesn't scale past dozens of records
- Human reviewers miss cross-column links
- Can't link name variants to the same person
Built for Production
Personal Identifiers
Financial
Healthcare
Digital & Location
Organizations & Groups
Your Custom Entities
Define domain-specific entity types with natural language, let the model do the rest. Ship with industry starter kits or build your own.
Safe Harbor De-identification
Masks names, dates, contacts, SSNs, and medical record numbers across clinical text.
EU Personal Data
Pseudonymization and data minimization across 14 supported languages.
Consumer Privacy
Detects and de-identifies California-regulated personal information in unstructured text.
Payment Data
Detects and masks credit card numbers, expiration dates, and CVVs.
Audit-Ready Controls
No data egress, permission-scoped access, and deterministic processing. One less thing for the auditor to flag.
Zero Trust Architecture
Agent Mask runs inside your Snowflake account. We do not receive, store, or process your data outside your environment. The app uses only the permissions you grant—nothing more.
Batch-First. Snowflake-Native. Expanding.
Call Agent Mask from dbt models, scheduled tasks, or batch queries — it only touches the fields you hand it. No scanning. No crawling. Today, that means Snowflake-native de-identification with no extra infrastructure to run.
What You Give Up With Every Alternative
LLM APIs look easy until token costs, parsing, and auditability show up. Cloud APIs need pipelines. Enterprise platforms weren't built for free text.
You're probably not using any of these.
Most teams aren't switching from another tool—they're doing nothing: sitting on unprotected PII and hoping no audit or breach exposes it. Every month without coverage is another month your data stays risky, restricted, and unusable for AI. Agent Mask makes it easy to start today.
LLM APIs
GPT · Claude · Gemini · Bedrock
You probably thought of using an LLM already. Send text to GPT, Claude, or Gemini and prompt-engineer it to redact the PII. Works great for a demo. Then you do the math on per-token costs at production volume. And shipping your most sensitive data to a third-party API doesn't sit right. Oh, and they just raised their prices. Again.
- Metered per-token pricing. The prompts, the inputs, the outputs, the retries—costs explode at warehouse scale
- Export text from Snowflake, send it to the API, parse the response, and write results back: you end up building and maintaining yet another bespoke pipeline
- Non-deterministic output that can vary between runs or miss real PII
- Predictable pricing based on Snowflake compute, not per-token fees
- Snowflake-native function—process text inside your account with no external pipeline
- Deterministic output with exact character positions and reproducible audit trails
Google Sensitive Data Protection
Cloud DLP
Google Sensitive Data Protection is a serious platform: a deep library of built-in detectors, custom infoTypes, deterministic tokenization, and mature DLP controls. If your data already lives in Google Cloud, this one belongs on the shortlist. If your data lives in Snowflake, you still need to move it through a Google Cloud workflow and stitch the results back into your warehouse.
- Snowflake data has to be exported or streamed to Google, processed through DLP jobs or API calls, then loaded back
- Usage-based per-GB pricing adds a second meter for inspection, transformation, storage, and orchestration
- Deterministic tokenization preserves identical strings, but it does not resolve aliases like "John Smith," "Dr. Smith," and "Smith" as one person
- Snowflake-native function—no extra infrastructure, pipeline, templates, keys, or writeback path to operate
- Snowflake compute pricing model and Marketplace procurement keep the workflow tied to the warehouse you already control
- Coreference-aware pseudonymization with entity spans—the same real-world identity gets the same replacement, even when mentions vary
AWS Comprehend
PII Detection API
AWS Comprehend PII is useful if your stack already runs on AWS and you need English or Spanish PII offsets or redaction. But it requires stitching together more cloud infrastructure: IAM, S3 or API jobs, retries, parsing, and writeback. Custom entities are a separate trained-model path, not something you describe in the de-identification call.
- Snowflake teams still build AWS plumbing: IAM, S3 or API jobs, retry handling, output parsing, and writeback
- Comprehend PII is English and Spanish with fixed PII types; Comprehend Medical is a separate English clinical service
- Custom entities require annotations or entity lists to train a separate recognizer, not a plain-English description in the masking request
- One Snowflake function workflow for text columns and documents—no AWS job orchestration
- Built-in entity detection across 14 languages, not just English and Spanish
- Custom entity support—describe what you want to detect in plain English
John Snow Labs
Healthcare NLP
John Snow Labs is a serious healthcare NLP platform: 96% F1 on clinical PHI, 50+ medical entity types, deployment across Databricks, Snowflake, Azure, and AWS. Health systems running a full clinical AI program get real value from that depth. But that depth ships across multiple products, each with its own price tag and limited integration options.
- Healthcare-first by design—entity coverage and tuning assume clinical text, so fintech, retail, HR, and SaaS warehouses sit outside the optimization target
- On Snowflake Marketplace, clinical text and DICOM de-identification ship as separate listings; a mixed text + image workflow means stitching multiple JSL products
- Custom entities require annotation projects, zero-shot labels, custom models, or rules, not a plain-English description in the masking call
- Domain-agnostic out of the box—clinical, financial, retail, and custom categories side by side in one setup
- One Snowflake-native workflow for text columns and supported files, installed from a single Marketplace listing
- Custom entities in plain English—describe what to detect in the masking call, no annotation projects, labels, models, or rules to set up
Snowflake AI_REDACT
Cortex Built-in Function
Snowflake's built-in option. Their docs say it "works best with well-formed English text." Convenient for a quick demo, but the gaps show fast in production.
- Tight limits: 4K tokens for input and output combined, with a 1K-token output cap
- English-optimized, and currently supports only US PII plus some UK and Canadian PII
- No pseudonymization, other advanced operators, or cross-column consistency
- No token limits—process documents of any length
- 14 languages with dedicated models, broader regional coverage, and custom entity types
- Pseudonymization, other advanced replacement operators, and cross-column coreference resolution
Data Privacy Vaults
Skyflow · Protegrity
Enterprise platforms built around structured-data governance: Skyflow’s privacy vault for tokenized PII columns and Protegrity’s field-level protection across enterprise data stores. Both have recently layered on unstructured-text capabilities, but their flagship products were designed for known PII in fixed schemas. Expect enterprise sales cycles, governance rollouts, and six-figure annual contracts for a platform built around a different problem than yours.
- Flagship products are structured-data platforms—Skyflow’s privacy vault and Protegrity’s field-level tokenization; unstructured-text capabilities are recent layers on top, not the core engineering focus
- Protegrity: external function calls route data out of Snowflake for processing
- Enterprise procurement path—security review, governance approval, and professional-services engagement—before the first query runs
- Purpose-built for unstructured text with custom entity types, coreference resolution, and cross-column consistency
- Runs inside Snowflake—no vault infrastructure, no data egress
- Focused unstructured-text de-identification—Snowflake Marketplace install, no vault to provision, no field-mapping engagement
Stop Paying the Data Tax
You already know your data is filled with PII. You’re already paying the data tax. Get it de-identified this afternoon. Not next quarter. Not after a six-month integration. Today.
You're already on Snowflake—that's the hard part done.
Install
Get Agent Mask from the Marketplace. Grant access to your schemas.
Point
Pass your text columns through the function.
Get Clean Data
Receive de-identified output with PII replaced. Your original stays intact.
No $200K enterprise contracts. No per-character API fees. Start with a free proof of concept on your actual data.
Talk to a human
Enterprise deals, BAAs, custom integration work, or anything beyond the Snowflake Marketplace. Tell us what you need to de-identify and where Agent Mask needs to fit.
Clinical notes, tickets, call transcripts, LLM pipelines, or anything else packed with unstructured PII.
Tell us what environment you need and what would make Snowflake-native coverage incomplete.
Same inbox, same humans, no mailing list.