ingestlayer/blog


Redact PII from webhook events before they hit Slack.

Signup and payment notifications carry customer emails, names, and IPs into Slack. How to redact PII from webhook events, per destination, in the pipe.

ben · 7 min read


If your app posts signups or payments to Slack, those messages carry customer data — an email, a name, sometimes a phone number or an IP address. That's PII, and the channel copy of it is the one that never shows up in a data inventory.

The fix is to redact PII from webhook events before they reach the channel, and to decide it per destination: Slack and your database do different jobs, so they shouldn't receive the same body.

the copy nobody audits

The database copy of a customer is the one you've thought about. It's access-controlled, it's in the data map, and when a deletion request arrives there's a row you can point at and remove. The Slack copy has none of that: it's searchable by every member of the workspace, included in exports, and retained on whatever schedule the workspace happens to have.

The same goes for every other place the event lands — the Telegram group, the CRM webhook payload, a log line somewhere. When a customer asks to be deleted, you delete the row; the messages stay. The notification copy is personal data that left your systems without ever being written down as such.

one privacy level doesn't fit three destinations

The first fix everybody writes is masking at the call site:

notify.ts
// the fix everybody writes first: mask at the call site
await slack.post("#payments",
  mask(charge.email) + " paid " + fmt(charge.amount));
await crm.upsert(charge.email, charge);      // full copy — the CRM needs it
await db.insert("events.payments", charge);  // full copy — so does the table
// three call sites, three opinions about privacy. a new hire adds a fourth.

This rots on contact with a codebase. Every integration becomes its own opinion about what's safe to send, and the next destination ships with whatever its author remembered about privacy that afternoon. The failure mode isn't malice; it's that the policy lives in every call site and in zero documents.

Masking once for the whole pipeline fails in the other direction. Now the warehouse fills with j•••@acme.com — joins break, support can't look anyone up, and the one copy that was supposed to be the truth isn't. The actual requirement has a shape no single switch can express: Slack needs recognition (a real customer paid, roughly who), the CRM needs the record, the warehouse needs the truth. Same event, three exposure levels.

redact pii once, decide per destination

So treat it as what it is: a matrix. PII types down the side, destinations across the top, and in each cell one of four verbs — preserve, mask, hash, or drop. The email cell says mask for Slack and preserve for Postgres. The phone row says preserve everywhere except Slack, where it's drop — the channel doesn't need it, so it doesn't get it.
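It helps to see the matrix as plain data. Here's a minimal TypeScript sketch using the cells from the payment example in this post. The type names (`Verb`, `PiiType`, `Destination`, `Matrix`) and the `verbFor` and `columnFor` helpers are hypothetical, chosen for illustration; they are not ingestlayer's configuration format.

```typescript
// Hypothetical types for the policy matrix: one verb per (PII type, destination) cell.
type Verb = "preserve" | "mask" | "hash" | "drop";
type PiiType = "email" | "phone" | "ip" | "customer_name";
type Destination = "slack" | "webhook" | "postgres";

// PII types down the side, destinations across the top.
type Matrix = Record<PiiType, Record<Destination, Verb>>;

const matrix: Matrix = {
  email:         { slack: "mask", webhook: "preserve", postgres: "preserve" },
  phone:         { slack: "drop", webhook: "preserve", postgres: "preserve" },
  ip:            { slack: "drop", webhook: "hash",     postgres: "hash" },
  customer_name: { slack: "mask", webhook: "preserve", postgres: "preserve" },
};

// Look up the single verb that governs one PII type at one destination.
function verbFor(m: Matrix, type: PiiType, dest: Destination): Verb {
  return m[type][dest];
}

// Read one destination's column: everything that destination is allowed to carry.
function columnFor(m: Matrix, dest: Destination): Record<PiiType, Verb> {
  const column = {} as Record<PiiType, Verb>;
  for (const type of Object.keys(m) as PiiType[]) {
    column[type] = verbFor(m, type, dest);
  }
  return column;
}

// Slack's column: email and customer_name masked, phone and ip dropped.
console.log(columnFor(matrix, "slack"));
```

The point of the shape is that the policy is one reviewable object rather than a judgment made again at every call site.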

Detection takes two mechanisms, because PII comes in two kinds. An email, a phone number, an IP address — those pattern-match, so content rules catch them wherever they appear in the payload. A name or a company doesn't; Dana Miller is just two words. Those you redact by field name — wherever a customer_name key shows up, at any depth, apply the rule, whatever the value looks like.
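A minimal sketch of both mechanisms together, assuming each string value is classified as a whole and that the only key rule is `customer_name`. The regexes are deliberately simple stand-ins, not production validators (the phone pattern, for one, would also flag a long purely numeric ID), and `detect`, `Finding`, and the `$.a.b` path format are hypothetical.

```typescript
type PiiType = "email" | "phone" | "ip" | "customer_name";

interface Finding {
  path: string; // location in the payload, e.g. $.customer.email
  type: PiiType;
}

// Content rules: recognise PII by what a string value looks like.
// Order matters: "203.0.113.7" would also satisfy the loose phone pattern.
const contentRules: Array<[PiiType, RegExp]> = [
  ["email", /^[^\s@]+@[^\s@]+\.[^\s@]+$/],
  ["ip", /^(?:\d{1,3}\.){3}\d{1,3}$/],
  ["phone", /^\+?\d[\d\s().-]{6,}\d$/],
];

// Key rules: field names whose values are PII whatever they look like.
const keyRules = new Map<string, PiiType>([["customer_name", "customer_name"]]);

// Walk the payload at any depth; a key rule wins before content rules are tried.
function detect(value: unknown, path = "$"): Finding[] {
  if (Array.isArray(value)) {
    return value.flatMap((item, i) => detect(item, `${path}[${i}]`));
  }
  if (value !== null && typeof value === "object") {
    return Object.entries(value).flatMap(([key, child]): Finding[] => {
      const childPath = `${path}.${key}`;
      const keyType = keyRules.get(key);
      return keyType ? [{ path: childPath, type: keyType }] : detect(child, childPath);
    });
  }
  if (typeof value === "string") {
    const text = value;
    const hit = contentRules.find(([, pattern]) => pattern.test(text));
    return hit ? [{ path, type: hit[0] }] : [];
  }
  return []; // numbers, booleans, null: nothing to detect
}

const event = {
  id: "evt_1",
  type: "charge.succeeded",
  amount: 4900,
  customer: { email: "jane@acme.com", phone: "+1 415 555 0134", customer_name: "Dana Miller" },
  request: { ip: "203.0.113.7" },
};

// Finds email, phone and customer_name under $.customer, and ip under $.request.
console.log(detect(event));
```

Note that "Dana Miller" is caught only because of its key; no pattern would have flagged it.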

Two of the verbs deserve a sentence. Hash is for fields you want joinable but not readable — count distinct IPs, group by user, without storing anything a human can act on. And the default column is quietly the most important cell: the destination you add next month inherits it automatically. A new integration can't silently opt out of the policy, which is exactly the property the call-site version never had.
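Both ideas fit in a few lines. This sketch assumes a keyed HMAC-SHA256 for the hash verb (an unkeyed hash of an IPv4 address can be reversed by hashing all 2^32 candidates) and assumes `drop` as the IP row's default. `Row`, `verbFor`, `applyVerb`, the `PII_HASH_KEY` variable, and the `telegram` destination are hypothetical names for illustration.

```typescript
import { createHmac } from "node:crypto";

type Verb = "preserve" | "mask" | "hash" | "drop";

// One row of the matrix: explicit cells for some destinations, plus a default
// column that every destination without its own cell inherits.
interface Row {
  default: Verb;
  cells: Map<string, Verb>;
}

const ipRow: Row = {
  default: "drop",
  cells: new Map<string, Verb>([["webhook", "hash"], ["postgres", "hash"]]),
};

function verbFor(row: Row, destination: string): Verb {
  return row.cells.get(destination) ?? row.default;
}

// Assumption: a secret key held by the pipeline, never by the destinations.
const HASH_KEY = process.env.PII_HASH_KEY ?? "dev-only-key";

function applyVerb(verb: Verb, value: string): string | undefined {
  switch (verb) {
    case "preserve":
      return value;
    case "mask":
      return value.length > 0 ? value[0] + "•••" : value;
    case "hash":
      // Same input, same token: countable and joinable, not readable.
      return createHmac("sha256", HASH_KEY).update(value).digest("hex");
    case "drop":
      return undefined;
  }
}

// "telegram" has no cell of its own, so it inherits the default column.
for (const destination of ["slack", "postgres", "telegram"]) {
  const verb = verbFor(ipRow, destination);
  console.log(destination, verb, applyVerb(verb, "203.0.113.7"));
}
```

The default lives in the row rather than in each integration, so a destination nobody has reviewed yet gets the conservative verb without anyone remembering to ask for it.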

the same event, three bodies

Here's the whole thing on a payment event, as the pipeline diagram you'd see in the app. The redact.pii step is unfolded so you can read the matrix itself: fields down the side, destinations across the top, a verb in every cell.

01 source

  source: http.webhook (Webhook)
  match: charge.succeeded

02 pipeline · 2 steps

  • 01 [CTL] dedupe · key $event.id · within 24h
  • 02 [MUT] redact.pii · one matrix, applied per destination

    field                 slack   webhook   postgres
    email                 mask    preserve  preserve
    phone                 drop    preserve  preserve
    ip                    drop    hash      hash
    customer_name · key   mask    preserve  preserve

03 destinations · 3

  • to slack (Slack)
    channel: #payments
  • to webhook (Webhook)
    url: $env.CRM_URL
  • to warehouse.pg (Postgres)
    table: events.payments

The Slack template still says {{ $event.email }}; nothing about the template knows redaction exists. The matrix is applied at fan-out, per destination, before the template renders, so Slack sees j•••@acme.com while the CRM webhook and the Postgres row get the real address. One event in, three bodies out, each shaped to what its room should know.

Drop and hash do their quieter work in the same pass: the phone number and IP never reach Slack, and the IP lands in the CRM payload and the warehouse as a hash, still countable and joinable but no longer readable.
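The order is the whole trick: redact per destination first, render second. A minimal sketch of that fan-out, assuming detection has already mapped each top-level field to its PII type (so the matrix is keyed by field name) and using a toy `{{ $event.field }}` renderer. `redactFor`, `render`, `maskValue`, `hashValue`, `PII_HASH_KEY`, and the Slack template text are hypothetical.

```typescript
import { createHmac } from "node:crypto";

type Verb = "preserve" | "mask" | "hash" | "drop";
type Destination = "slack" | "webhook" | "postgres";
type Payload = Record<string, unknown>;

// Field name -> destination -> verb. Fields without a row are not PII and pass through.
const matrix: Record<string, Record<Destination, Verb>> = {
  email:         { slack: "mask", webhook: "preserve", postgres: "preserve" },
  phone:         { slack: "drop", webhook: "preserve", postgres: "preserve" },
  ip:            { slack: "drop", webhook: "hash",     postgres: "hash" },
  customer_name: { slack: "mask", webhook: "preserve", postgres: "preserve" },
};

const HASH_KEY = process.env.PII_HASH_KEY ?? "dev-only-key"; // assumed secret

function hashValue(value: string): string {
  return createHmac("sha256", HASH_KEY).update(value).digest("hex");
}

// Keep the first character, plus the domain for emails: jane@acme.com -> j•••@acme.com
function maskValue(value: string): string {
  if (value.length === 0) return value;
  const at = value.indexOf("@");
  return value[0] + "•••" + (at > 0 ? value.slice(at) : "");
}

// Build one destination's copy of the event by applying that destination's column.
function redactFor(event: Payload, dest: Destination): Payload {
  const out: Payload = {};
  for (const [field, value] of Object.entries(event)) {
    const verb: Verb = matrix[field]?.[dest] ?? "preserve";
    if (verb === "drop") continue; // the field never reaches this destination
    if (verb === "preserve") out[field] = value;
    else out[field] = verb === "mask" ? maskValue(String(value)) : hashValue(String(value));
  }
  return out;
}

// Toy renderer: it only ever sees the already-redacted copy.
function render(template: string, event: Payload): string {
  return template.replace(/\{\{\s*\$event\.(\w+)\s*\}\}/g, (_match: string, field: string) =>
    event[field] === undefined ? "" : String(event[field]),
  );
}

const event: Payload = {
  id: "evt_1",
  email: "jane@acme.com",
  phone: "+1 415 555 0134",
  ip: "203.0.113.7",
  customer_name: "Dana Miller",
  amount: "49.00",
};

const slackTemplate = "{{ $event.email }} paid {{ $event.amount }}";
console.log(render(slackTemplate, redactFor(event, "slack"))); // j•••@acme.com paid 49.00
console.log(redactFor(event, "webhook"));  // record intact except ip, which is hashed
console.log(redactFor(event, "postgres")); // same shape as the webhook copy
```

The template never changes; only the copy it renders from does.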

how ingestlayer does this

Everything above is real. redact.pii is a first-class action you drop into a pipeline like any other step — content rules for the four detectable types, key rules for everything else, and the per-destination matrix applied when the event fans out.

It also composes with the rest of the pipe. The noise moves — filter, dedupe, throttle — decide whether a message reaches the channel; redaction decides what the message is allowed to carry when it does. Wire both once and the channel copy stops being the one nobody audited.
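Dedupe is the gate already sitting in the payment pipeline (key $event.id, within 24h). Here's a minimal single-process sketch of it, assuming an in-memory map; the `Dedupe` class is hypothetical, and a real deployment would need shared storage and would expire old keys. It answers yes or no and never touches the body, which is exactly the split between deciding whether and deciding what.

```typescript
// Hypothetical in-memory dedupe: an event passes only if its key has not been
// seen within the window. Single process only; entries are never expired here.
class Dedupe {
  private readonly seen = new Map<string, number>();
  private readonly windowMs: number;

  constructor(windowMs: number) {
    this.windowMs = windowMs;
  }

  // Returns true if the event should continue down the pipe, false if it is a repeat.
  admit(key: string, now: number = Date.now()): boolean {
    const firstSeen = this.seen.get(key);
    if (firstSeen !== undefined && now - firstSeen < this.windowMs) return false;
    this.seen.set(key, now);
    return true;
  }
}

const DAY_MS = 24 * 60 * 60 * 1000;
const dedupe = new Dedupe(DAY_MS);
const t0 = Date.UTC(2024, 0, 1);

console.log(dedupe.admit("evt_1", t0));          // true: first delivery
console.log(dedupe.admit("evt_1", t0 + 60_000)); // false: provider retry a minute later
console.log(dedupe.admit("evt_1", t0 + DAY_MS)); // true: the 24h window has passed
```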

