Elasticsearch - Extract Logstash logs

So you have sent all your logs from Logstash to your Elasticsearch instance. Now they are stored safely, and with Kibana on top you can query, filter and visualise them beautifully. This post is about how to get your logs back out of Elasticsearch!

Export Examples

Export all logs, no filter or query

The elasticdump project can export Elasticsearch indexes in JSON format. We will use it to get those logs back; the command below downloads every log stored in the logstash-* indexes of your Elasticsearch instance.

Caution: if you have a lot of logs in Elasticsearch, this command will take a long time and use a lot of resources on your Elasticsearch instance.

elasticdump --input='http://localhost:9200/logstash-*' --output=$ --type=data > logstash-logs.json

Example Output

[
...
    {
      "_index":"logstash-2015.02.20",
      "_type":"log-type-one",
      "_id":"1U3KsV1C2HFGG7A9c15eD",
      "_score":0,
      "_source":
      {
        "message":"2015/02/20-12:05:01.632 +0000: Hello World!",
        "@version":"1",
        "@timestamp":"2015-02-20T12:05:01.632Z",
        "type":"log-type-one",
        "host":"31ac3e434210",
        "path":"/var/mylog.log",
        "timestamp":"2015/02/20-12:05:01.632 +0000",
        "text":"Hello World!"
      }
    }
...
]

Export all logs, only including specific fields

You'll notice that in the previous output the _source section contains a lot of extra fields you may not need. Elasticsearch's source filtering, passed through elasticdump's --searchBody argument, lets you return only the _source fields you want. The example below returns only the message field in _source.

Caution: if you have a lot of logs in Elasticsearch, this command will take a long time and use a lot of resources on your Elasticsearch instance.

elasticdump \
  --input='http://localhost:9200/logstash-*' \
  --output=$ \
  --type=data \
  --searchBody='{ "_source": "message", "query": {"match_all": {}} }' \
  > logstash-logs.json
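Hand-writing JSON inside shell quotes gets fiddly once the body grows. A minimal sketch, assuming you are happy to generate the body in Python first: _source also accepts a list of field names, so you can keep, say, message and @timestamp together. The helper name source_filter_body is hypothetical, not part of elasticdump.

```python
import json
import shlex


def source_filter_body(fields):
    """Hypothetical helper: a --searchBody that keeps only the given _source fields.

    It serialises a match_all query with source filtering to a JSON string.
    """
    body = {"_source": list(fields), "query": {"match_all": {}}}
    return json.dumps(body)


body = source_filter_body(["message", "@timestamp"])
# shlex.quote wraps the JSON in single quotes so it can be pasted into a shell command.
print("--searchBody=" + shlex.quote(body))
# --searchBody='{"_source": ["message", "@timestamp"], "query": {"match_all": {}}}'
```

The printed argument can replace the --searchBody line in the command above.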

Export logs within a time range and filter

Filtering on a time range lets you pull out specific logs. The example below returns logs from a 10-second window (the ||+10s date math adds 10 seconds to the start timestamp), and an additional filter makes sure only log lines of type log-type-one are returned.

elasticdump \
  --input='http://localhost:9200/logstash-*' \
  --output=$ \
  --type=data \
  --searchBody='{
    "_source": "message",
    "filter": {"type": {"value": "log-type-one"}},
    "query": {"range": {"@timestamp": {"gte": "2015-02-20T12:02:00.632Z", "lt": "2015-02-20T12:02:00.632Z||+10s"}}}
  }' \
  > logstash-logs.json
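The body above uses the top-level filter and type filter syntax of the Elasticsearch 1.x era. As far as I know, later Elasticsearch releases dropped the top-level filter key, so on a recent cluster you would likely express the same request as a bool query. This is a sketch under assumptions: that your cluster supports bool/filter queries, that the type field is indexed for exact matching (if it is analysed text, you may need a sub-field such as type.keyword), and that you prefer an explicit end timestamp over the ||+10s date math. The helper names time_range_body and to_es_timestamp are hypothetical.

```python
import json
from datetime import datetime, timedelta, timezone


def to_es_timestamp(dt):
    """Format an aware datetime as an ISO 8601 UTC string with milliseconds."""
    utc = dt.astimezone(timezone.utc)
    # %f gives microseconds; drop the last three digits to keep milliseconds.
    return utc.strftime("%Y-%m-%dT%H:%M:%S.%f")[:-3] + "Z"


def time_range_body(start, seconds, log_type):
    """Hypothetical helper: messages of log_type in [start, start + seconds).

    Assumes a newer Elasticsearch with bool/filter queries and a `type`
    field indexed for exact (keyword) matching.
    """
    end = start + timedelta(seconds=seconds)
    body = {
        "_source": "message",
        "query": {
            "bool": {
                "filter": [
                    {"term": {"type": log_type}},
                    {"range": {"@timestamp": {"gte": to_es_timestamp(start),
                                              "lt": to_es_timestamp(end)}}},
                ]
            }
        },
    }
    return json.dumps(body)


start = datetime(2015, 2, 20, 12, 2, 0, 632000, tzinfo=timezone.utc)
print(time_range_body(start, 10, "log-type-one"))
```

Pass a timezone-aware datetime; a naive one would be interpreted as local time by astimezone.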

Every one of these commands produces JSON, and each document always contains the fields _index, _type, _id and _source. If you want something other than JSON, you still need to parse the output; for example, you can recreate the original raw logs by extracting only the message field, which holds the original log line.
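As a small illustration of that structure, here is a trimmed copy of the example document shown earlier, parsed with Python's standard json module. The helper raw_lines is a hypothetical name; it simply pulls _source.message out of each document and skips any document without one.

```python
import json

# A trimmed copy of the example document shown earlier in this post.
EXAMPLE = """
{
  "_index": "logstash-2015.02.20",
  "_type": "log-type-one",
  "_id": "1U3KsV1C2HFGG7A9c15eD",
  "_score": 0,
  "_source": {
    "message": "2015/02/20-12:05:01.632 +0000: Hello World!",
    "@timestamp": "2015-02-20T12:05:01.632Z",
    "type": "log-type-one"
  }
}
"""


def raw_lines(documents):
    """Yield the original log line (_source.message) of each exported document."""
    for doc in documents:
        message = doc.get("_source", {}).get("message")
        if message is not None:  # skip documents that lack a message field
            yield message


record = json.loads(EXAMPLE)
print(record["_index"])           # logstash-2015.02.20
print(list(raw_lines([record])))  # ['2015/02/20-12:05:01.632 +0000: Hello World!']
```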

As shown above, elasticdump's --searchBody accepts Elasticsearch's query DSL (both queries and filters), which is very powerful and worth exploring further if you need to narrow your logs down even more.

Script converting JSON logs to the original raw log format

Below is an example Python script that converts the exported logs back into their original raw format by printing the message field of each entry.

# pip install ijson
import argparse
import sys

import ijson


def convert_to_raw_logs(input_file_name):
    """Stream the elasticdump JSON array and print each _source.message."""
    # ijson parses incrementally, so large exports don't have to fit in memory.
    # Open in binary mode and let ijson handle the UTF-8 decoding.
    with open(input_file_name, "rb") as json_file:
        for prefix, event, value in ijson.parse(json_file):
            # 'item' is each element of the top-level JSON array.
            if prefix == "item._source.message":
                print(value)


def main():
    parser = argparse.ArgumentParser(usage="%(prog)s <input-file>")
    parser.add_argument("input_file", metavar="<input-file>")
    args = parser.parse_args()
    # Make stdout UTF-8 so non-ASCII log lines print correctly (Python 3.7+).
    sys.stdout.reconfigure(encoding="utf-8")
    convert_to_raw_logs(args.input_file)


if __name__ == "__main__":
    main()
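Depending on the elasticdump version, the export may be written as one JSON document per line rather than as a single JSON array; in that case the item. prefix used above will never match. A hedged sketch for that case, assuming newline-delimited input and using only the standard library; iter_ndjson_messages and convert_ndjson_to_raw_logs are hypothetical names.

```python
import json
import sys


def iter_ndjson_messages(lines):
    """Yield _source.message from newline-delimited JSON documents."""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # ignore blank lines such as a trailing newline
        doc = json.loads(line)
        message = doc.get("_source", {}).get("message")
        if message is not None:
            yield message


def convert_ndjson_to_raw_logs(input_file_name):
    """Print the raw log line for every document in an NDJSON export."""
    with open(input_file_name, encoding="utf-8") as json_file:
        for message in iter_ndjson_messages(json_file):
            sys.stdout.write(message + "\n")
```

You could call convert_ndjson_to_raw_logs from the script's main() in place of the array-based conversion function. Reading line by line keeps memory use flat, much like ijson does for the array format.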

Usage of the script

The script writes to stdout, so redirect its output to a file if you want to save it.

usage: convert-logs-to-raw.py <input-file>

Example of running the script

$ python convert-logs-to-raw.py logstash-logs.json > raw-log-output.txt