Archiving Logs

This guide covers how to use the archiving feature, located under the Settings pane of the LogDNA web app.

Overview

Archiving is an automatic function that exports your logs from Mezmo to an external source. Archived logs are in JSON format and preserve the metadata associated with each line. Once archiving is configured for your account, your logs are exported hourly in a compressed format (.json.gz). Only retained logs are archived; for example, logs dropped by exclusion rules are not archived. The first time you configure archiving, your archived logs typically appear within 12-24 hours.
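
For reference, each line in an archive file is a standalone JSON object. The line below is illustrative only; the exact field names and metadata set (such as _host and _app here) may differ in your account's archives:

    {"_id": "a1b2c3d4e5", "_ts": 1673751600000, "_host": "web-01", "_app": "nginx", "_line": "GET /index.html HTTP/1.1 200"}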

Hourly Archiving

Hourly archives create 24 or more gzipped JSON files per day. If there are no logs for a given hour, no file is uploaded for that hour. Archives are expected to appear within 24 hours but may take up to 72 hours for larger customers. The file contents are the same as the daily archives, except that each file is now stored as year=YYYY/month=MM/day=DD/<accountID>.<YYYY>-<MM>-<DD>.<HH>00.json.gz (where HH is the hour in 24-hour format) for all providers.

Note: If log lines are attributed to or received more than 6 hours after the hour bucket they belong to, subsequent archive files are created with the name format <accountID>.<YYYY>-<MM>-<DD>.<HH>00.<NUMBER>.json.gz, where <NUMBER> is an incrementing number starting from 1, to prevent filename conflicts.
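
As a minimal sketch of this naming scheme, the hypothetical helper below builds the object key for a given hour; the account ID and timestamp are placeholders, and the optional sequence number covers the late-arrival case described above:

    from datetime import datetime, timezone

    def archive_key(account_id: str, ts: datetime, sequence: int = 0) -> str:
        # year=YYYY/month=MM/day=DD/<accountID>.<YYYY>-<MM>-<DD>.<HH>00[.<NUMBER>].json.gz
        base = (f"year={ts:%Y}/month={ts:%m}/day={ts:%d}/"
                f"{account_id}.{ts:%Y}-{ts:%m}-{ts:%d}.{ts:%H}00")
        suffix = f".{sequence}" if sequence > 0 else ""
        return base + suffix + ".json.gz"

    print(archive_key("abc123", datetime(2023, 1, 15, 3, tzinfo=timezone.utc)))
    # year=2023/month=01/day=15/abc123.2023-01-15.0300.json.gz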

Hourly archiving may occasionally duplicate log lines in storage (roughly 1% of the time). You can tell a line has been duplicated if two lines share the same log line ID, so you can safely deduplicate on that ID; see the sketch below.
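
A minimal deduplication sketch, assuming each archived line carries its log line ID in an _id field (that field name is an assumption, not confirmed by this guide):

    import gzip
    import json

    seen = set()
    unique = []
    with gzip.open("archive.json.gz", "rt", encoding="utf-8") as f:
        for raw in f:
            record = json.loads(raw)
            line_id = record.get("_id")  # assumed field carrying the log line ID
            if line_id in seen:
                continue  # duplicate created by hourly archiving
            seen.add(line_id)
            unique.append(record)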

AWS S3

To export your logs to an S3 bucket, make sure that you have an AWS account with access to S3. If you need to create a new S3 bucket for log storage, follow the instructions in the AWS S3 Getting Started Guide.

Add Mezmo as a Grantee for Your S3 Bucket Access Control List

To set up log archiving for your bucket, follow the AWS instructions for Using the S3 console to set ACL permissions for a bucket. In those instructions, follow the steps under “To grant access to another AWS account” and use this canonical ID for Mezmo:

659c621e261e7ffa5d8f925bbe9fe1698f3637878e96bc1a9e7216838799b71a

Enable these permissions for Mezmo:

  • Objects List
  • Objects Write
  • Bucket ACL Read
  • Bucket ACL Write
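
If you prefer to script this rather than use the S3 console, the following boto3 sketch adds the same four grants (READ and WRITE correspond to Objects List/Write; READ_ACP and WRITE_ACP to Bucket ACL Read/Write). The bucket name is a placeholder, and fetching the current ACL first preserves your existing grants:

    import boto3

    MEZMO_ID = "659c621e261e7ffa5d8f925bbe9fe1698f3637878e96bc1a9e7216838799b71a"
    bucket = "your-log-archive-bucket"  # placeholder

    s3 = boto3.client("s3")
    acl = s3.get_bucket_acl(Bucket=bucket)  # keep the owner's existing grants
    grantee = {"Type": "CanonicalUser", "ID": MEZMO_ID}
    for permission in ("READ", "WRITE", "READ_ACP", "WRITE_ACP"):
        acl["Grants"].append({"Grantee": grantee, "Permission": permission})

    s3.put_bucket_acl(
        Bucket=bucket,
        AccessControlPolicy={"Grants": acl["Grants"], "Owner": acl["Owner"]},
    )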

Configure Mezmo

  1. Go to the Archive pane of the LogDNA web app.
  2. Under the S3 Archiving section, input the name of your newly created S3 bucket, and then click Save.

Azure Blob Storage

To export your logs to Azure Blob Storage, ensure that you have an Azure account with access to storage accounts.

  1. Create a Storage Account on Microsoft Azure.
  2. Once created, click your storage account, and then click Access Keys under the heading Settings.
  3. Create a key if you do not already have one.
  4. Go to the Archive pane of the LogDNA web app
  5. Under the Azure Blob Storage archiving section, input your storage account name and key and then click Save.
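
Once archives start arriving, a quick way to verify is to list blobs with the Azure SDK for Python. This is a minimal sketch; the storage account name, access key, and container name are placeholders:

    from azure.storage.blob import BlobServiceClient

    service = BlobServiceClient(
        account_url="https://<STORAGE_ACCOUNT>.blob.core.windows.net",
        credential="<ACCESS_KEY>",
    )
    container = service.get_container_client("<CONTAINER>")
    for blob in container.list_blobs():
        print(blob.name)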

Google Cloud Storage

To export your logs to Google Cloud Storage, ensure that you have a Google Cloud Platform account and project with access to storage.

  1. Ensure that the Google Cloud Storage JSON API is enabled.
  2. Create a new bucket (or use an existing one) in Google Cloud Storage.
  3. Update the permissions of the bucket and add a new member [email protected] with the role of Storage Admin.
  4. Go to the Archive pane of the LogDNA web app.
  5. Under the Google Cloud Storage Archiving section, input your ProjectId and Bucket and then click Save.
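
To confirm archives are landing in the bucket, here is a minimal sketch using the google-cloud-storage client; the project ID and bucket name are placeholders, and the year= prefix follows the naming scheme described above:

    from google.cloud import storage

    client = storage.Client(project="<PROJECT_ID>")
    for blob in client.list_blobs("<BUCKET>", prefix="year="):
        print(blob.name)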

OpenStack Swift

To export your logs to OpenStack Swift, ensure that you have an OpenStack account with access to Swift.

  1. Set up Swift by following these instructions.
  2. Go to the Archive pane of the LogDNA web app.
  3. Under the OpenStack Swift Archiving section, input your Username, Password, Auth URL, and Tenant Name and then click Save.
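
For a quick check that the same credentials work outside the web app, here is a minimal sketch using python-swiftclient; all values are placeholders, and auth_version="2" assumes a v2 Keystone endpoint, which matches the Tenant Name field above:

    import swiftclient

    conn = swiftclient.Connection(
        authurl="<AUTH_URL>",
        user="<USERNAME>",
        key="<PASSWORD>",
        tenant_name="<TENANT_NAME>",
        auth_version="2",
    )
    headers, containers = conn.get_account()  # lists containers visible to this account
    for container in containers:
        print(container["name"])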

Digital Ocean Spaces

To export your logs to Digital Ocean Spaces, ensure that you have a Digital Ocean account with access to storage.

  1. Create a new space (or use an existing one) in Digital Ocean Spaces.
  2. Create a new spaces access key in Digital Ocean Applications & API. Make sure to save the access key and secret key.
  3. Go to the Archive pane of the LogDNA web app.
  4. Under the Digital Ocean Spaces Archiving section, input your Bucket, Region, AccessKey, and SecretKey, and then click Save. Note that your region can be found in your Spaces URL; e.g., https://my-logdna-bucket.nyc3.digitaloceanspaces.com has the region nyc3.
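
Because Spaces is S3-compatible, you can verify uploads with boto3 pointed at the Spaces endpoint. A minimal sketch, with the nyc3 region and bucket name as placeholders:

    import boto3

    spaces = boto3.client(
        "s3",
        region_name="nyc3",
        endpoint_url="https://nyc3.digitaloceanspaces.com",
        aws_access_key_id="<ACCESS_KEY>",
        aws_secret_access_key="<SECRET_KEY>",
    )
    resp = spaces.list_objects_v2(Bucket="my-logdna-bucket", Prefix="year=")
    for obj in resp.get("Contents", []):
        print(obj["Key"])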

IBM Cloud Object Storage Archiving

To export your logs to IBM Cloud Object Storage, ensure that you have an IBM Cloud account with access to storage.

  1. Create a new object storage service (or use an existing one) in IBM Cloud Object Storage.
  2. Create a new bucket (or use an existing one) in your service for Mezmo dump files.
  3. Go to the Archive pane of the LogDNA web app.
  4. Under the IBM Cloud Object Storage Archiving section, input your Bucket, Public Endpoint, API Key, and Resource Instance ID, and then click Save.
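
To verify the bucket with the IBM COS SDK for Python (ibm-cos-sdk), here is a minimal sketch; the API key, resource instance ID, endpoint, and bucket name are placeholders, and the endpoint must match your bucket's public endpoint:

    import ibm_boto3
    from ibm_botocore.client import Config

    cos = ibm_boto3.client(
        "s3",
        ibm_api_key_id="<API_KEY>",
        ibm_service_instance_id="<RESOURCE_INSTANCE_ID>",
        config=Config(signature_version="oauth"),
        endpoint_url="https://s3.us-south.cloud-object-storage.appdomain.cloud",
    )
    resp = cos.list_objects_v2(Bucket="<BUCKET>", Prefix="year=")
    for obj in resp.get("Contents", []):
        print(obj["Key"])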

Security

By default, Mezmo encrypts your archived data in transit and requests server-side encryption where possible, including setting the x-amz-server-side-encryption header when uploading logs to S3.

Reading archived logs

Log files are stored in gzipped JSON Lines format. While we do not currently support re-ingesting historical data, there are a number of tools we can recommend for parsing your archived logs.
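
As a starting point, here is a minimal Python sketch that streams a downloaded hourly archive and parses each JSON line; the file name is a placeholder following the naming scheme above:

    import gzip
    import json

    path = "abc123.2023-01-15.0300.json.gz"  # placeholder archive file

    with gzip.open(path, "rt", encoding="utf-8") as f:
        for raw in f:
            record = json.loads(raw)  # one JSON object per line
            print(record)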

Amazon Athena

Amazon Athena is a serverless interactive query service that can analyze large datasets residing in S3 buckets. You can use Amazon Athena to define a schema over your archived logs and query them using SQL. More information about Amazon Athena is available here.

Google BigQuery

Google BigQuery is a serverless enterprise data warehouse that can analyze large datasets. One of our customers, Life.Church, has generously shared a command-line utility, DNAQuery, that loads Mezmo archived data into Google BigQuery. More information about Google BigQuery is available here.

📘 DNAQuery

This repository is a great reference. Note: it has not been updated since 2019.

IBM SQL Query

IBM SQL Query is a serverless data processing and analytics service for large volumes of data stored on IBM Cloud Object Storage. You can use it to query and transform data as-is on object storage using SQL. You can optionally also define a table definition and query against that. See this blog article for details on using IBM SQL Query with Mezmo data in IBM Cloud.

jq

jq is a handy command-line tool for parsing JSON data. Once your archive has been uncompressed, you can use jq to parse your archived log files, as in the example below. More information about jq is available here.
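
For example, this pipeline decompresses an hourly archive and pretty-prints each line; the file name is a placeholder, and the _line field selected in the second command is an assumed metadata field:

    gunzip -c abc123.2023-01-15.0300.json.gz | jq '.'
    gunzip -c abc123.2023-01-15.0300.json.gz | jq -r '._line'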