Profiling Log Data
Why it matters
The Mezmo Data Profiler gives teams real-time visibility into the shape and quality of their telemetry data as it flows through the pipeline. It automatically surfaces schemas, cardinality, and value distributions to highlight noisy, inconsistent, or high-cost fields. With these insights, teams can make informed decisions on filtering, redaction, and enrichment—optimizing both observability costs and the usefulness of their data.
The Data Profiler currently works with log data. It categorizes this data by app, host, level, and type, based on how the processor is configured. It will also identify other fields and report on their cardinality. To build a Data Profile report, you'll need a pipeline with the Data Profiler processor in it. Mezmo recommends that this pipeline be separate from pipelines that feed your observability destinations, as the Data Profiler adds additional annotation data that is specific to it and not useful for observability purposes.
The Data Profiler consumes 1 million events at a time and builds the report in real time as those events are consumed. To consume additional events, use the Rerun Analysis button, which adds another 1 million events to the profile. A single profile can contain up to 3 days' worth of events before it needs to be reset. In addition, fields from an event can be mapped into Data Profiler fields to customize the report results (see Step 3).
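For reference, after any mapping is applied, an event shaped for the Profiler's default fields might look like the following. This is a purely hypothetical example; the values are invented, and only the line, app, host, and level field names come from the defaults described above.

// A hypothetical event shaped for the Data Profiler's default field mapping.
// "line" holds the raw log text; "app", "host", and "level" drive categorization.
{
  "line": "GET /api/cart HTTP/1.1 200 1024",
  "app": "cart-service",
  "host": "pod-abc123",
  "level": "info"
}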

OpenTelemetry Log Profile
Step 1: Create an "Exploration" Pipeline
Create a new Mezmo Pipeline by clicking New Pipeline in the platform. Give it a name like Log Explorer and select Create a blank pipeline.

Create Pipeline
Step 2: Add OpenTelemetry Log Source
Click Add Source and select your OpenTelemetry Log source from the Shared Sources list.

OpenTelemetry Log Source

OpenTelemetry Shared Source Selection
Step 3: Insert an OTel-to-Profile Mapping Script
To take full advantage of the Mezmo Data Profiler, let's modify the structure of this data to increase the insights the profiler provides.
To do this, connect a Script Processor to the Log Source as the first processor and copy in the following code:
function processEvent(message, metadata, timestamp, annotations) {
  let line = message
  // Pull the Profiler's default fields from common OTel metadata locations.
  let app = metadata.resource.attributes["container.name"]
  let host = metadata.resource.attributes["container.hostname"]
  let level = metadata.level
  // Fall back through alternate locations for the app name.
  if (app == null || app == '') {
    app = metadata.resource["service.name"]
  }
  if (app == null || app == '') {
    app = metadata.resource["service_name"]
  }
  if (app == null || app == '') {
    app = metadata.scope.name
  }
  if (app == null || app == '') {
    app = 'na'
  }
  // Fall back through alternate locations for the host.
  if (host == null || host == '') {
    host = metadata.headers["x-kafka-partition-key"]
  }
  if (host == null || host == '') {
    host = metadata.attributes["log.file.path"]
  }
  if (host == null || host == '') {
    host = 'na'
  }
  // Use the annotation-derived level when metadata has none.
  if (level == null || level == '') {
    level = annotations.level
  }
  // Build the event shape the Data Profiler expects by default.
  let new_msg = {
    "line": line,
    "app": app,
    "host": host,
    "level": level
  }
  // Copy remaining metadata to prefixed top-level fields for profiling.
  for (const [key, value] of Object.entries(metadata)) {
    new_msg['metadataotel_' + key] = value
  }
  return new_msg
}
The above script simply maps OTel data to the default fields of Mezmo's profiling nodes. Note that the Profiler is completely configurable, so this mapping is not always needed.
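To make the mapping concrete, here is a sketch of one event passing through the script. The input values are hypothetical; the output shape follows directly from the code above.

// Hypothetical input to processEvent (values invented for illustration):
//   message  = "added item to cart"
//   metadata = {
//     level: "info",
//     resource: { attributes: { "container.name": "cart-service",
//                               "container.hostname": "pod-abc123" } }
//   }
//
// The script would return:
//   {
//     line: "added item to cart",
//     app: "cart-service",
//     host: "pod-abc123",
//     level: "info",
//     metadataotel_level: "info",
//     metadataotel_resource: { attributes: { "container.name": "cart-service",
//                                            "container.hostname": "pod-abc123" } }
//   }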
Step 4: Connect a Profiler node
Add a Data Profiler node connected after the Script processor from Step 3. Give it a name like Otel Demo Log Exploration and leave the default configuration. To complete the pipeline, add a Blackhole Destination and name it Log Data Dump.

Add Log Profiler
Step 5: Deploy
Finally, you must deploy your pipeline in order to start exploring your log data.

Final Pipeline: Log Exploration
Step 6: Analyze Log Patterns
Once your data profiling pipeline has been built and deployed, the profiler report will start building in real time within a few seconds. Within minutes, you will see something similar to the following:

OpenTelemetry Log Profile
Immediately, there are two insights of note:
- There is an inordinate number of logs coming from the load-generator service simply stating that a homepage is being flooded. This is standard behavior of the OpenTelemetry Demo when using the loadgeneratorFloodHomepage feature flag, but as one can see, it is quite noisy and inevitably costly to retain. A minimal filter sketch follows the screenshot below.

Homepage Flood Log Profile
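We will handle this properly in the next section, but as a preview, noise like this can be dropped with another Script Processor. The following is a minimal sketch, assuming the processor discards an event when the function returns null; the match string is hypothetical and would need to reflect the actual flood message.

// Minimal sketch: drop homepage-flood noise from the load-generator service.
// Assumes returning null discards the event; the match string is hypothetical.
function processEvent(message, metadata, timestamp, annotations) {
  if (message.app === 'load-generator' &&
      typeof message.line === 'string' &&
      message.line.includes('flood')) {
    return null
  }
  return message
}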
- Unparsed events that appear to be custom Apache logs from the frontend-proxy service. While these are defined in the demo code here, we can ensure this data is structured and parsed properly to be fully searchable in any downstream Observability system. A hypothetical parsing sketch follows the screenshot below.

Custom Apache Profile
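Likewise, these Apache-style lines can be parsed into structured fields before they reach a destination. The sketch below is hypothetical: the regex targets a common-log-format-like line, and the parsed field names are illustrative rather than the demo's exact format.

// Minimal sketch: parse an Apache-style access log line into structured fields.
// The regex and field names are illustrative, not the demo's exact log format.
function processEvent(message, metadata, timestamp, annotations) {
  const pattern = /^(\S+) \S+ \S+ \[([^\]]+)\] "(\S+) (\S+)[^"]*" (\d{3}) (\d+|-)/
  const match = typeof message.line === 'string' ? message.line.match(pattern) : null
  if (match) {
    message.parsed = {
      client_ip: match[1],
      timestamp: match[2],
      method: match[3],
      path: match[4],
      status: Number(match[5]),
      bytes: match[6]
    }
  }
  return message
}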
In the next section, we will build out a log data pipeline to fully address both of these concerns.