The Situation
This Pipette depicts the typical configuration of a Data Compliance processor group within a Telemetry Pipeline. In this situation, the objectives are to send financial transaction and account access data to a storage location in case it is needed for later analysis, and to redact or encrypt Personally Identifying Information (PII).
This group uses the $link[page,220155,auto$] to send specific events to the $link[page,260911,auto$] and the $link[page,220149,auto$] before storage. These Processors obfuscate user IDs and credit card numbers, while still allowing credit card numbers to be decrypted if they are needed for specific analysis.
Interactive Demo
This demo shows how to configure a Processor chain for encrypting and redacting data. To view the demo, you need to have pop-ups enabled in your browser, either globally or for docs.mezmo.com. You can also view this demo on the mezmo.com website without a pop-up.
Overview
This schematic of the Pipette illustrates the Processor chain for redacting and encrypting Personally Identifying Information, focusing on login User IDs and credit card numbers. Each Processor configuration is described in detail in the section that matches its number in the schematic.
1 - Demo Logs Source
This Pipette uses the $link[page,223618,auto$] with the Financial Data option to send a sample of data containing PII through the Processor chain.
2 - Route Processor
The $link[page,220155,auto$] enables you to set conditions under which telemetry data will be sent to other points in the processing chain. In this case, it filters three types of events from the incoming data for processing: Access, Transaction, and Bootup. Any events that don't match these three types are sent directly to the storage location.
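The routing decision described above can be sketched in a few lines of Python. This is only an illustration of the logic, not the Route Processor's actual configuration: the top-level "type" key, the route names, and the "unmatched" label are all assumptions made for the example.

```python
from typing import Any, Dict

# Hypothetical route names mirroring the three event types named above.
ROUTED_TYPES = ("access", "transaction", "bootup")


def choose_route(event: Dict[str, Any]) -> str:
    """Return the matching route name, or "unmatched" for everything else."""
    # Assumption: the event type lives in a top-level "type" key.
    event_type = str(event.get("type", "")).strip().lower()
    return event_type if event_type in ROUTED_TYPES else "unmatched"
```

In this sketch, an event that lands on the "unmatched" route corresponds to data that is sent directly to the storage location.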
3 - Encrypt Processor
The transaction events contain credit card information that should be redacted or encrypted before being sent to storage. In this case, since the credit card numbers may be needed later, for example for fraud analysis, the $link[page,220149,auto$] is set to encrypt the card numbers, so that they can later be decrypted using the encryption key.
4 - Redact Processor
Information that is redacted is obfuscated completely, and cannot be recovered after processing. For this reason, the $link[page,260911,auto$] should be used to remove PII that is particularly sensitive, but doesn't need to be used for later analysis. In this case, the login User ID from Access events is redacted, since this is information that could be used to compromise user accounts, but isn't needed for analysis. The Processor works by searching for specific built-in patterns, such as social security numbers or email addresses, or for custom patterns, and then using a hash or replacement pattern to obfuscate the matching data. In this case, it searches the field .access.user_id for a custom pattern based on a regular expression, and then hashes each match using the MD5 algorithm.
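As a rough sketch of the pattern-then-hash approach, the Python below matches a user ID with a regular expression and replaces each match with its MD5 hex digest. It assumes events are nested dictionaries with the user ID stored under "access" and then "user_id". The regular expression is a placeholder, since the custom pattern used in the demo is not given here, and the function name is hypothetical.

```python
import hashlib
import re
from typing import Any, Dict

# Placeholder pattern (assumption): the demo's actual custom regex is not shown.
USER_ID_PATTERN = re.compile(r"[A-Za-z0-9_.-]+")


def redact_user_id(event: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of the event with access.user_id matches replaced by MD5 hashes."""
    access = event.get("access")
    if not isinstance(access, dict):
        return event  # nothing to redact
    user_id = access.get("user_id")
    if not isinstance(user_id, str):
        return event
    # Replace every regex match with the hex digest of the matched text.
    hashed = USER_ID_PATTERN.sub(
        lambda m: hashlib.md5(m.group(0).encode("utf-8")).hexdigest(),
        user_id,
    )
    return {**event, "access": {**access, "user_id": hashed}}
```

Because the same input always produces the same hash, events from one user can still be grouped together after redaction, even though the original ID is no longer visible.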
5 - Blackhole Destination
The $link[page,227761,auto$] Destination drops all data sent to it. This makes it useful for testing your Processor chain to make sure you are getting the expected results before sending them on to a production destination. Mezmo supports a wide variety of popular destinations including $link[page,220137,auto$], $link[page,227633,auto$], and $link[page,223093,auto$].
In this case, note that the data volume from the Source to the Destination has increased by almost 22%. It's typical for data volume to increase with these Processors because they add characters to the message strings. However, fine-tuning the algorithms and encryption keys can limit the increase in data volume.
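One way to see why hashing adds characters: an MD5 hex digest is always 32 characters long, so any shorter value grows when it is replaced by its hash. The Python sketch below shows the arithmetic. The sample user ID and the helper function are made up for illustration and are not taken from the demo data.

```python
import hashlib


def percent_increase(size_before: int, size_after: int) -> float:
    """Percentage growth from size_before to size_after."""
    return (size_after - size_before) / size_before * 100.0


# Hypothetical sample value; real user IDs in the demo data may differ.
user_id = "jdoe42"
hashed_id = hashlib.md5(user_id.encode("utf-8")).hexdigest()

# The hashed field is longer than the original 6-character ID.
field_growth = percent_increase(len(user_id), len(hashed_id))
```

The growth of the whole event is much smaller than the growth of the single field, because the rest of the message is unchanged.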
For More Information
For more information on how to implement security modules for your Pipeline data management needs, contact our Solutions Engineering team to schedule a free consultation.