Profiling Log Data

Why it matters

The Mezmo Data Profiler gives teams real-time visibility into the shape and quality of their telemetry data as it flows through the pipeline. It automatically surfaces schemas, cardinality, and value distributions to highlight noisy, inconsistent, or high-cost fields. With these insights, teams can make informed decisions on filtering, redaction, and enrichment—optimizing both observability costs and the usefulness of their data.

The Data Profiler currently works with log data. It categorizes this data by app, host, level, and type, based on how the processor is configured. It will also identify other fields and report on their cardinality. To build a Data Profile report, you'll need a pipeline with the Data Profiler processor in it. Mezmo recommends that this pipeline be separate from pipelines that feed your observability destinations, as the Data Profiler adds additional annotation data that is specific to it and not useful for observability purposes.
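
For example, an event whose top-level fields already line up with those categories needs no additional mapping. The shape below is purely illustrative:

    // Hypothetical event: field names chosen to match the Profiler's
    // default app/host/level categories.
    const event = {
      app: "checkout-service",
      host: "ip-10-0-1-12",
      level: "error",
      message: "payment authorization failed: card_expired",
    };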

The Data Profiler consumes 1 million events at a time and builds the report in real time as those events are consumed. To consume additional events, use the Rerun Analysis button, which adds another 1 million events to the profile. A single profile can contain up to 3 days' worth of events before it needs to be reset. In addition, fields from an event can be mapped into Data Profiler fields to customize the report results (see Step 3).

OpenTelemetry Log Profile

Step 1: Create an "Exploration" Pipeline

Create a new Mezmo Pipeline by clicking New Pipeline in the platform. Give this a name like Log Explorer and select Create a blank pipeline.

Create Pipeline

Step 2: Add OpenTelemetry Log Source

Click Add Source and select your OpenTelemetry Log source from the Shared Sources list.

OpenTelemetry Log Source

OpenTelemetry Shared Source Selection

Step 3: Insert Otel-to-Profiler Mapping Script

To take full advantage of the Mezmo Data Profiler, let's modify the structure of this data so the profiler can surface more insights.

To do this, connect a Script Processor to the Log Source as the first processor and copy in the following code:

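A minimal sketch of that mapping is shown below. It assumes the Script processor invokes a processEvent(event) function and that each OpenTelemetry log record arrives under event.message with the resource and severity paths shown here; these paths vary by source configuration, so inspect a live event and adjust them to match your data.

    // Sketch only: maps common OpenTelemetry log fields onto the Data
    // Profiler's default app/host/level categories. Field paths are
    // assumptions; verify them against a live event before deploying.
    function processEvent(event) {
      const record = event.message || {};
      const resource = (record.resource && record.resource.attributes) || {};

      // Resource attributes identify the emitting service and host.
      event.app = resource["service.name"] || "unknown";
      event.host = resource["host.name"] || "unknown";

      // OTel severity maps onto the Profiler's level category.
      event.level = record.severity_text || record.severityText || "info";

      // Keep the raw log body as the message the Profiler analyzes.
      if (record.body !== undefined) {
        event.message = record.body;
      }

      return event;
    }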

The above script simply maps Otel data to the default fields of Mezmo's profiling nodes. Note that the Profiler is fully configurable, so this mapping is not always needed.

Step 4: Connect a Profiler Node

Add a Data Profiler node connected after the Script processor from Step 3. Give it a name like Otel Demo Log Exploration and leave the default configuration. To complete the pipeline, add a Blackhole Destination and name it Log Data Dump.

Add Log Profiler

Step 5: Deploy

Finally, deploy your pipeline to start exploring your log data.


Final Pipeline: Log Exploration

Step 6: Analyze Log Patterns

Once your data profiling pipeline has been built and deployed, the profiler report begins building in real time within seconds. Within minutes, you will see something similar to the following:

OpenTelemetry Log Profile

Immediately there are two insights of note:

  1. An inordinate number of logs from the load-generator service simply state that the homepage is being flooded. This is standard behavior of the OpenTelemetry Demo when the loadgeneratorFloodHomepage feature flag is enabled, but as one can see, it is quite noisy and inevitably costly to retain.

Homepage Flood Log Profile

  2. Unparsed events that appear to be custom Apache logs from the frontend-proxy service. While these are defined in the demo code here, we can ensure this data is structured and parsed properly so it is fully searchable in any downstream observability system.

Custom Apache Profile

In the next section, we will build out a log data pipeline to address both these concerns.
