Elasticsearch - Extract logstash logs
So you have shipped all your logs from logstash to your elasticsearch instance; now they are stored safely, with Kibana on top to query, filter and visualise them beautifully. This post is about how to get your logs back out of elasticsearch!
Export Examples
Export all logs, no filter or query
The elasticdump project allows elasticsearch indexes to be exported in JSON format. We will use it to get those logs back; the command below downloads all the logs from your elasticsearch instance.
Caution: With a lot of logs in elasticsearch this command will take a long time and use up a lot of resources on your elasticsearch instance.
elasticdump --input=http://localhost:9200/logstash-* --output=$ --type=data > logstash-logs.json
Example Output
[
...
  {
    "_index": "logstash-2015.02.20",
    "_type": "log-type-one",
    "_id": "1U3KsV1C2HFGG7A9c15eD",
    "_score": 0,
    "_source":
    {
      "message": "2015/02/20-12:05:01.632 +0000: Hello World!",
      "@version": "1",
      "@timestamp": "2015-02-20T12:05:01.632Z",
      "type": "log-type-one",
      "host": "31ac3e434210",
      "path": "/var/mylog.log",
      "timestamp": "2015/02/20-12:05:01.632 +0000",
      "text": "Hello World!"
    }
  }
...
]
Export all logs, only including specific fields
You'll notice the previous command returns a lot of extra fields in the _source section that you may not need. You can use source filtering via the --searchBody argument in elasticdump to return only the source fields you need. Below is an example that returns only the message field from _source.
Caution: With a lot of logs in elasticsearch this command will take a long time and use up a lot of resources on your elasticsearch instance.
elasticdump \
--input=http://localhost:9200/logstash-* \
--output=$ \
--type=data \
--searchBody='{ "_source": "message", "query": {"match_all": {}} }' \
> logstash-logs.json
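Source filtering is not limited to one field: _source also accepts an array of field names. For example, to keep both message and @timestamp:

elasticdump \
--input=http://localhost:9200/logstash-* \
--output=$ \
--type=data \
--searchBody='{ "_source": ["message", "@timestamp"], "query": {"match_all": {}} }' \
> logstash-logs.json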
Export logs within a time range and filter
Using a filter together with a time range query lets you fetch specific logs. The example below returns logs within a 10 second time range, and a filter ensures that only log lines of type log-type-one are returned.
elasticdump \
--input=http://localhost:9200/logstash-* \
--output=$ \
--type=data \
--searchBody='{ "_source": "message", "filter": {"type": {"value":"log-type-one"}}, "query": {"range": {"@timestamp" : { "gte":"2015-02-20T12:02:00.632Z", "lt": "2015-02-20T12:02:00.632Z||+10s"}} }}' \
> logstash-logs.json
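If you want "the last hour" rather than a fixed window, elasticsearch date math also understands now in range queries. The following --searchBody (a sketch, otherwise identical to the command above) would return the message field for the last hour of logs:

--searchBody='{ "_source": "message", "query": {"range": {"@timestamp": { "gte": "now-1h" }}} }'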
The results from all of these commands are JSON and will always contain the fields _index, _type, _id and _source. So the returned JSON array will still need to be parsed if you don't want JSON; for example, you could recreate the original raw logs by grabbing only the message field, which holds the original line.
As shown above, the --searchBody argument in elasticdump drives elasticsearch's query APIs, such as search queries and filters. These are very powerful and worth exploring further if you need to extract even more specific logs.
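As a sketch of a more specific request (assuming the elasticsearch 1.x-style filtered query that these logstash-* examples date from, and with "Hello" standing in for whatever text you are looking for), you could combine a full-text match on message with the same type filter used above:

elasticdump \
--input=http://localhost:9200/logstash-* \
--output=$ \
--type=data \
--searchBody='{ "_source": "message", "query": { "filtered": { "query": { "match": { "message": "Hello" } }, "filter": { "type": { "value": "log-type-one" } } } } }' \
> logstash-logs.json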
Script converting JSON logs to the original raw log format
Below is an example python script that converts the logs back into their original format.
# pip install ijson
import ijson, os, sys, codecs
from optparse import OptionParser


def convertToRawLogs(inputFileName):
    with open(inputFileName) as json_file:
        # Make stdout utf-8
        sys.stdout = codecs.getwriter('utf8')(sys.stdout)
        # Stream-parse the JSON so large exports don't have to fit in memory
        parser = ijson.parse(json_file)
        for prefix, event, value in parser:
            # Each element of the top-level array appears under the 'item'
            # prefix; we only want the raw log line stored in _source.message
            if prefix == 'item._source.message':
                print(value)


def main():
    parser = OptionParser()
    (options, args) = parser.parse_args()
    if len(args) != 1:
        scriptName = os.path.basename(__file__)
        parser.error("\n\tusage: %s <input-file>" % scriptName)
    else:
        convertToRawLogs(args[0])


if __name__ == "__main__":
    main()
Usage of the script
The script writes to stdout, so redirect the output to a file if you want to save it.
usage: convert-logs-to-raw.py <input-file>
Example of running the script
$ python convert-logs-to-raw.py logstash-logs.json > raw-log-output.txt
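Because elasticdump writes to stdout with --output=$, on Linux or macOS you can also skip the intermediate file by pointing the script at /dev/stdin (assuming your system provides it):

$ elasticdump --input=http://localhost:9200/logstash-* --output=$ --type=data | python convert-logs-to-raw.py /dev/stdin > raw-log-output.txt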