Debugging dropped logs from LaaS
Causes
There can be multiple causes for dropped logs in LaaS:
- A field in the logged object clashes with the type LaaS expects for it; this originates in the codebase
- The Index Pattern stored in LaaS is out of date for the app and needs to be refreshed
Understanding the issue
When logs are sent to LaaS, an index mapping is created automatically for the object, and each field within the object is assigned a type. This is where most issues arise. Take an object containing a field called 'message': it is initially auto-indexed as a string. If code further down passes an object into a field called 'message', the types clash and LaaS drops the log.
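The clash can be illustrated with a small simulation. This is only a sketch of the idea, not how LaaS or Elasticsearch implements its mapping: the function names are made up for illustration, only top-level fields are checked, and types are reduced to "object" versus "concrete" value.

```python
def value_kind(value):
    """Classify a value the way the two error messages distinguish them."""
    return "object" if isinstance(value, dict) else "concrete"


def find_clashes(events):
    """Record the first-seen kind of each top-level field and report later mismatches."""
    mapping = {}   # field name -> kind recorded by the first event that used it
    clashes = []   # (event position, field, expected kind, received kind)
    for position, event in enumerate(events):
        for field, value in event.items():
            if value is None:
                continue  # simplification: a null value records no type
            kind = value_kind(value)
            expected = mapping.setdefault(field, kind)
            if expected != kind:
                clashes.append((position, field, expected, kind))
    return mapping, clashes


events = [
    {"message": "user logged in", "requestOrder": 1},
    # Same field name, but now an object: this event would be dropped.
    {"message": {"text": "user logged in", "userId": 42}},
]
mapping, clashes = find_clashes(events)
print(clashes)  # [(1, 'message', 'concrete', 'object')]
```

The first event fixes 'message' as a concrete value, so the second event is the one that clashes.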
We would expect the error message to look like one of the following:
Could not index event to Elasticsearch. {:status=>400, :action=>["index", {:_id=>nil, :_index=>"ems-me-client-pipeline-2020.01.30", :_type=>"logs", :_routing=>nil}, 2020-01-30T14:32:36.212Z %{host} {emsSdk=2.7.1, applicationVersion=10.1.2, timezone=+0200, contactReference=cr:8309fb0330f61f63:eu-west-1, language=hu, pushToken=null, platform=android, requestTime=1580394756173, application={customerId=213239625, id=188, appCode=EMS00-56D5E, contactHardwareId=100011990}, osVersion=9.2.0, requestOrder=20200130153235, hardwareId=HWID-InApp-SSeg3299-Anon-ME_InApp_SEL_Staging_Pipeline, deviceModel=Samsung,Galaxy A8, applicationId=EMS00-56D5E}], :response=>{"index"=>{"_index"=>"ems-me-client-pipeline-2020.01.30", "_type"=>"logs", "_id"=>"AW_23joAew-k5ixNUW6-", "status"=>400, "error"=>{"type"=>"mapper_parsing_exception", "reason"=>"failed to parse [message]", "caused_by"=>{"type"=>"illegal_state_exception", "reason"=>"Can't get text on a START_OBJECT at 1:268"}}}}}
This indicates that an object was passed in the field 'message', while some other type (here, text) was expected.
This can go both ways, i.e. an object is expected but a concrete value is passed, in which case the error message might look like this:
Could not index event to Elasticsearch. {:status=>400, :action=>["index", {:_id=>nil, :_index=>"ems-me-event-publisher-pipe-2020.02.03", :_type=>"logs", :_routing=>nil}, 2020-02-03T12:38:07.283Z %{host} %{message}], :response=>{"index"=>{"_index"=>"ems-me-event-publisher-pipe-2020.02.03", "_type"=>"logs", "_id"=>"AXALDuGMew-k5ixN82KL", "status"=>400, "error"=>{"type"=>"mapper_parsing_exception", "reason"=>"object mapping for [event] tried to parse field [event] as object, but found a concrete value"}}}}
It is possible that the field was meant to be an object when the initial index was created, but from now on it will always be e.g. a string. In that case we might be able to solve the issue by simply updating the mapping in LaaS.
If in doubt, ask in a channel with one of the involved developers.
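When there are many of these messages, it can help to pull out just the index, the affected field, and the reason. The following is a rough sketch that assumes the messages keep the shape shown above; the function name and the regular expressions are illustrative and may need adjusting for other error types.

```python
import re

# Patterns based on the shape of the error messages shown above.
INDEX_RE = re.compile(r'"_index"=>"([^"]+)"')
REASON_RE = re.compile(r'"reason"=>"([^"]+)"')
FIELD_RE = re.compile(r"\[([^\]]+)\]")


def summarise_dropped_log(line):
    """Extract the index, the first bracketed field name, and all reasons."""
    index = INDEX_RE.search(line)
    reasons = REASON_RE.findall(line)
    # The first reason names the affected field in square brackets.
    field = FIELD_RE.search(reasons[0]) if reasons else None
    return {
        "index": index.group(1) if index else None,
        "field": field.group(1) if field else None,
        "reasons": reasons,
    }


# Shortened excerpt of the second error message above.
sample = (
    'Could not index event to Elasticsearch. {:status=>400, '
    ':response=>{"index"=>{"_index"=>"ems-me-event-publisher-pipe-2020.02.03", '
    '"_type"=>"logs", "status"=>400, "error"=>{"type"=>"mapper_parsing_exception", '
    '"reason"=>"object mapping for [event] tried to parse field [event] as object, '
    'but found a concrete value"}}}}'
)
summary = summarise_dropped_log(sample)
print(summary["index"], summary["field"])
```

The field name tells you which logger call to look for in the affected repo.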
Querying dropped logs on LaaS
These dropped logs are stored under the following index:
laas-*
Here is a sample query that you can use to filter for all ME-related dropped logs caused by conflicting types:
message: ("Could not index event to Elasticsearch" AND ("ems-me-*" OR "ems-segment-diff") AND "mapper_parsing_exception")
Feel free to play around with that.
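If you want to vary the app names without retyping the query, it can be assembled in code. The sketch below builds the same query string and wraps it in an Elasticsearch `query_string` search body. Whether you can send such a body to the laas-* index directly, rather than pasting the query into the LaaS search bar, is an assumption; the function name and the `size` default are illustrative.

```python
import json


def dropped_logs_query(app_patterns, size=50):
    """Build a search body matching dropped logs for the given app name patterns."""
    apps = " OR ".join(f'"{pattern}"' for pattern in app_patterns)
    query = (
        'message: ("Could not index event to Elasticsearch" '
        f'AND ({apps}) AND "mapper_parsing_exception")'
    )
    # query_string accepts the same syntax as the search bar query above.
    return {"size": size, "query": {"query_string": {"query": query}}}


body = dropped_logs_query(["ems-me-*", "ems-segment-diff"])
print(json.dumps(body, indent=2))
```

Called with the two patterns above, the generated query string is identical to the sample query.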
Solutions
Clashing Fields
Go into the code of the affected repo and find the line where we log a message that could cause this issue.
Most commonly it is caused by lazily logging e.g. a message object or an event object without specifying a different field name in the call to the logger.
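One way to avoid the clash is to make sure 'message' is always a string and that objects go under their own, differently named fields. The following is a minimal sketch of that idea, not the API of any particular logger: `build_log_entry` and the field names `message_payload` and `event_payload` are hypothetical.

```python
import json


def build_log_entry(message, **fields):
    """Build a log payload whose 'message' field is always a string."""
    entry = dict(fields)
    if isinstance(message, str):
        entry["message"] = message
    else:
        # Keep 'message' a string and move the object to its own field
        # (hypothetical field name) so the indexed type never changes.
        entry["message"] = "non-string message, see message_payload"
        entry["message_payload"] = message
    return entry


# Risky: the whole object ends up in 'message'.
#   logger.info({"message": event})
# Safer: a fixed string message, with the object under its own field name.
entry = build_log_entry("event published", event_payload={"id": 188, "type": "push"})
print(json.dumps(entry))
```

Whichever field names you choose, keep each one tied to a single type across the whole codebase.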
Index Pattern refresh
If you are absolutely certain that the object is being parsed incorrectly in LaaS, you can refresh the field list of the Index Pattern in LaaS. To do this, go to LaaS and:
- Select 'Management' on the left
- Select 'Index Patterns'
- Select the affected index from the list
- Then you can hit the 'Refresh Field List' button in the top right
This will refresh the field types that LaaS holds for the Index Pattern, covering both incoming and existing messages. This may take a while.