Posted to users@zeppelin.apache.org by ashish rawat <dc...@gmail.com> on 2016/04/19 18:53:29 UTC

Elastic Search Interpreter limitation?

Hi,

I am trying to use the filters aggregation of Elasticsearch:
https://www.elastic.co/guide/en/elasticsearch/reference/2.2/search-aggregations-bucket-filters-aggregation.html


As documented on the Elastic page, I made the following query through
Zeppelin:
{
  "aggs" : {
    "messages" : {
      "filters" : {
        "filters" : {
          "error" : { "term" : { "logLevel" : "error" }},
          "trace" : { "term" : { "logLevel" : "trace" }}
        }
      },
      "aggs" : {
        "messages_over_time" : {
          "date_histogram" : {
            "field" : "timestamp",
            "interval" : "day",
            "format" : "yyyy-MM-dd"
          }
        }
      }
    }
  }
}

but the response only contained the fields 'key' and 'doc_count', whereas
if I run the same query through Elasticsearch's REST interface, I get the
following result:

  "aggregations": {
    "messages": {
      "buckets": {
        "error": {
          "doc_count": 57,
          "messages_over_time": {
            "buckets": [
              {
                "key_as_string": "2016-03-21",
                "key": 1458518400000,
                "doc_count": 1
              },
              {
                "key_as_string": "2016-03-22",
                "key": 1458604800000,
                "doc_count": 0
              },
              {
                "key_as_string": "2016-03-23",
                "key": 1458691200000,
                "doc_count": 0
              },
              {
                "key_as_string": "2016-03-24",
                "key": 1458777600000,
                "doc_count": 0
              },
              {
                "key_as_string": "2016-03-25",
                "key": 1458864000000,
                "doc_count": 0
              },
              {
                "key_as_string": "2016-03-26",
                "key": 1458950400000,
                "doc_count": 0
              },
              {
                "key_as_string": "2016-03-27",
                "key": 1459036800000,
                "doc_count": 0
              },
              {
                "key_as_string": "2016-03-28",
                "key": 1459123200000,
                "doc_count": 0
              },
              {
                "key_as_string": "2016-03-29",
                "key": 1459209600000,
                "doc_count": 0
              },
              {
                "key_as_string": "2016-03-30",
                "key": 1459296000000,
                "doc_count": 0
              },
              {
                "key_as_string": "2016-03-31",
                "key": 1459382400000,
                "doc_count": 0
              },
              {
                "key_as_string": "2016-04-01",
                "key": 1459468800000,
                "doc_count": 8
              },
              {
                "key_as_string": "2016-04-02",
                "key": 1459555200000,
                "doc_count": 0
              },
              {
                "key_as_string": "2016-04-03",
                "key": 1459641600000,
                "doc_count": 0
              },
              {
                "key_as_string": "2016-04-04",
                "key": 1459728000000,
                "doc_count": 48
              }
            ]
          }
        },
        "trace": {
          "doc_count": 372,
          "messages_over_time": {
            "buckets": [
              {
                "key_as_string": "2016-04-04",
                "key": 1459728000000,
                "doc_count": 372
              }
            ]
          }
        }
      }
    }
  }

As expected, it has the time series for the 'error' and 'trace' messages.

Is there any limitation in the Elasticsearch interpreter that prevents it from
parsing complex responses?

Regards,
Ashish

Re: Elastic Search Interpreter limitation?

Posted by ashish rawat <dc...@gmail.com>.
Hi Bruno,

I believe I have found the issue. There is indeed a dependency on the
_source field, at line 461:

final String json = hit.getSourceAsString();
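
For illustration, a minimal sketch of the kind of fallback that could remove this
dependency. This is only an assumption about a possible fix, not the interpreter's
actual code; it assumes the Elasticsearch 2.x Java client API
(SearchHit#getSourceAsString, SearchHit#getFields, SearchHitField#getValues) plus
Gson for the serialization, and the class name is just for the example.

import java.util.HashMap;
import java.util.Map;

import org.elasticsearch.search.SearchHit;
import org.elasticsearch.search.SearchHitField;

import com.google.gson.Gson;

public class HitJsonFallback {

  private static final Gson GSON = new Gson();

  // Hypothetical helper: return the hit's _source JSON when it is present,
  // otherwise serialize the returned fields so that a query using "fields"
  // does not end up with a null string.
  public static String hitToJson(final SearchHit hit) {
    final String source = hit.getSourceAsString();
    if (source != null) {
      return source;
    }
    final Map<String, Object> values = new HashMap<>();
    for (final Map.Entry<String, SearchHitField> field : hit.getFields().entrySet()) {
      values.put(field.getKey(), field.getValue().getValues());
    }
    return GSON.toJson(values);
  }
}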

Regards,
Ashish


Re: Elastic Search Interpreter limitation?

Posted by ashish rawat <dc...@gmail.com>.
Hi Bruno,

I am encountering another issue, which might also be related to the
interpreter.

When using the "fields" attribute in the query to select the exact fields
to return, I get an "Error: String is null" through Zeppelin, while the
same query works through the REST interface.

I noticed that a normal query, of the form

{
  "query": {"regexp":{"log":"module"}}
}

returns results in the following format:
"hits": {....
    "hits": [
      {....
        "_source": {


while a query with "fields" returns results in the format:
"hits": {....
    "hits": [
      {....
        "fields": {

Could this be the issue? I had a quick scan of
ElasticsearchInterpreter.buildSearchHitsResponseMessage, but couldn't find
any dependency on "_source" to validate my assumption.

Do you think this could be an interpreter issue?

Regards,
Ashish


Re: Elastic Search Interpreter limitation?

Posted by ashish rawat <dc...@gmail.com>.
Thanks Bruno for the prompt reply. Do you know of any indirect way of
achieving the same, i.e. a time series for each value of a field (e.g. logLevel,
httpMethod)?
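
For reference, one query DSL shape that would produce such a per-value time
series is a terms aggregation with a nested date_histogram; the aggregation
names here are only illustrative, and whether the interpreter renders its
response any better than the filters aggregation is exactly the open question:

{
  "aggs" : {
    "levels" : {
      "terms" : { "field" : "logLevel" },
      "aggs" : {
        "messages_over_time" : {
          "date_histogram" : {
            "field" : "timestamp",
            "interval" : "day",
            "format" : "yyyy-MM-dd"
          }
        }
      }
    }
  }
}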

Regards,
Ashish


Re: Elastic Search Interpreter limitation?

Posted by Bruno Bonnin <bb...@gmail.com>.
Hello,

You are right, there are some limitations with the Elasticsearch
interpreter.
I developed it, and I'm going to check how I can change the component
to handle this kind of more complex request.

Regards,
Bruno
