Ad Hoc Log Analysis with Kibana and Docker

Recently, one of our clients had an issue with thousands of requests of unknown origin. It quickly became clear that the requests weren't malicious and had been caused by either the application or the infrastructure. However, the problem was complicated by the fact that not all of the systems involved had logging enabled, and no aggregation was in place for the remaining log files.

That's the point where the ELK stack (Elasticsearch, Logstash and Kibana) and Docker jumped in to help us. The ELK stack is absolutely amazing for log analysis, no question. But it was Docker that allowed us to set up a working system in less than an hour. This is a great example of the power of Docker, so let's see how we proceeded.

First, our goal was to find a suitable Docker image, because creating our own ELK image would certainly have taken more time. So we simply searched Google for "docker logstash" and found the image pblittle/docker-logstash, which looked pretty complete. And indeed, it came with Elasticsearch, Logstash and Kibana all in one image.

Time to fire it up for a first trial:

docker run -d \
  -p 9292:9292 \
  -p 9200:9200 \
  pblittle/docker-logstash

This runs the image and provides a fully functional demo with some example logs already indexed. Kibana can be accessed by opening http://localhost:9292.

Logs visualized with Kibana

The next step was to change the Logstash configuration and inject the Apache log files we wanted to analyze. For this, a URL to a custom config file can be provided as an environment variable. The image then downloads and uses the given config instead of the default one. We based our config on the provided example and changed the input, filter and output sections to fit our requirements for Apache log parsing:

input {  
  file {
    type => "apache"
    path => [ "/data/*.log" ]
    start_position => "beginning"
  }
}

filter {
  grok {
    match => { "message" => "%{COMBINEDAPACHELOG}" }
  }
  date {
    locale => "en"
    match => [ "timestamp", "dd/MMM/yyyy:HH:mm:ss Z" ]
    timezone => "Europe/Rome"
  }
}

output {  
  elasticsearch {
    embedded => ES_EMBEDDED
    host => "ES_HOST"
    port => "ES_PORT"
    protocol => "http"
  }
}

As you can see, the logs are imported from the /data/ folder, and our only output is Elasticsearch, as we don't want to log everything to stdout. The config is available for you as a Gist.
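If you are unsure whether your log lines will actually match the grok filter, a quick sanity check is possible outside of Logstash. The sketch below approximates the %{COMBINEDAPACHELOG} pattern with a plain extended regex (the sample line and the regex are our own rough approximations, not the exact grok pattern):

```shell
# Hypothetical sample line in Apache combined log format:
LINE='127.0.0.1 - - [10/Oct/2014:13:55:36 +0200] "GET /index.html HTTP/1.1" 200 2326 "-" "Mozilla/5.0"'

# Rough ERE approximation of %{COMBINEDAPACHELOG}; lines that fail here
# would most likely end up with a _grokparsefailure tag in Logstash.
PATTERN='^[^ ]+ [^ ]+ [^ ]+ \[[^]]+\] "[^"]*" [0-9]{3} (-|[0-9]+) "[^"]*" "[^"]*"$'

echo "$LINE" | grep -Eq "$PATTERN" && echo "line matches"
```

Running your real logs through a check like this before indexing saves a round of debugging inside Kibana.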

After getting the configuration right, we still needed to inject our log files. This was done by mounting a host directory into the container at the /data path, which also allows us to add more logs over time.

So the command to run our ad hoc ELK image looks like this:

docker run -d \
  -e LOGSTASH_CONFIG_URL=http://bit.ly/1DQrVqd \
  -p 9292:9292 \
  -p 9200:9200 \
  -v /tmp/logs:/data \
  pblittle/docker-logstash

Now copy your log files to /tmp/logs on your host machine. Depending on the size of your log files, indexing will take some time. It's probably a good idea to test your Logstash configuration first with a small demo log file before indexing the full log set.
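For such a dry run, a single hand-written line in combined log format is enough (the entry below is made up; the path /tmp/logs matches the host directory mounted above):

```shell
# Create the mounted log directory and drop in one sample line
# (hypothetical entry in Apache combined log format):
mkdir -p /tmp/logs
cat > /tmp/logs/sample.log <<'EOF'
127.0.0.1 - - [10/Oct/2014:13:55:36 +0200] "GET /index.html HTTP/1.1" 200 2326 "-" "curl/7.37.0"
EOF

# Logstash picks the file up from /data inside the container;
# once the entry shows up in Kibana, copy the real logs into /tmp/logs.
```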

Afterwards, your on-demand ELK stack is ready and can be accessed at http://localhost:9292, where you can start analysing your logs like a champ.

We went on to move the ELK container to a proper server and assigned the application more memory, as the default of 500m is barely enough for Elasticsearch. Simply set the environment variable LS_HEAP_SIZE when starting the container:

docker run -d \
  -e LS_HEAP_SIZE=8192m \
  ...

And if you are wondering what else you can do with Kibana, this Kibana 101 is pretty helpful.

In conclusion, thanks to Docker and the great image by P. Barrett Little, log analysis with Kibana is possible on demand, without a permanent setup. With our step-by-step instructions it should only take minutes to set up, and we will definitely use this approach again the next time we have to dig through log files.