Posted to users@nifi.apache.org by johny casanova <co...@gmail.com> on 2016/10/25 18:36:46 UTC

Fwd: fetch elasticsearch http

Hello,

Do you guys have an example config of how this processor should look? I
have a regular Elasticsearch install that is only receiving syslogs. I'm
trying to figure out how to find, or what to put for, the Document
Identifier. I ran a curl against Elasticsearch and saw a field "id", but
that does not seem to work.

Re: fetch elasticsearch http

Posted by johny casanova <co...@gmail.com>.

Matt,

Thanks for the template. I'll see if it works and respond back. Looks like
in 1.1.0 it will be easier to implement so I might wait until then.

On Wed, Oct 26, 2016 at 1:18 PM, Matt Burgess <ma...@apache.org> wrote:

> Johny,
>
> I have a template that I created before the Elasticsearch processors
> were available. It uses InvokeHTTP to do a query, then uses InvokeHTTP
> again to get the individual documents (if you didn't ask for the full
> doc text to be returned by the query). The second InvokeHTTP can be
> replaced with FetchElasticsearch or FetchElasticsearchHttp, and after
> 1.1.0 comes out, the first one can be replaced by either
> QueryElasticsearchHttp or ScrollElasticsearchHttp (depending on how
> you want to page the results). For now, it sounds like you want the
> first part of the flow: create a flow file, configure the InvokeHTTP
> processor to query an ES index, then parse the JSON results.
>
> I put the template up as a gist:
> https://gist.github.com/mattyb149/f612d052adb07434c975e4f930a995eb
>
> Regards,
> Matt

Re: fetch elasticsearch http

Posted by Matt Burgess <ma...@apache.org>.

Johny,

I have a template that I created before the Elasticsearch processors
were available. It uses InvokeHTTP to do a query, then uses InvokeHTTP
again to get the individual documents (if you didn't ask for the full
doc text to be returned by the query). The second InvokeHTTP can be
replaced with FetchElasticsearch or FetchElasticsearchHttp, and after
1.1.0 comes out, the first one can be replaced by either
QueryElasticsearchHttp or ScrollElasticsearchHttp (depending on how
you want to page the results). For now, it sounds like you want the
first part of the flow: create a flow file, configure the InvokeHTTP
processor to query an ES index, then parse the JSON results.
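
To make the query step concrete, here is a minimal Python sketch of the
kind of URL the InvokeHTTP processor could be pointed at (through its
Remote URL property). It assumes Elasticsearch's URI search with the q,
size, and from parameters. The host, the index name "syslog", and the
helper build_search_url are placeholders for illustration and are not
taken from the template.

```python
from urllib.parse import urlencode


def build_search_url(base_url, index, query="*", size=100, offset=0):
    """Build an Elasticsearch URI-search URL for one page of results.

    base_url and index are placeholders; size/offset map to the
    'size' and 'from' request parameters.
    """
    params = urlencode({"q": query, "size": size, "from": offset})
    return f"{base_url}/{index}/_search?{params}"


if __name__ == "__main__":
    # Hypothetical host and index name, first two pages of 100 hits each.
    for page in range(2):
        print(build_search_url("http://localhost:9200", "syslog",
                               size=100, offset=page * 100))
```

Stepping the offset by the page size walks through the index one page at
a time. For large indices this kind of paging gets expensive, which is the
case the scroll-based processor is meant to cover.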

I put the template up as a gist:
https://gist.github.com/mattyb149/f612d052adb07434c975e4f930a995eb

Regards,
Matt

On Wed, Oct 26, 2016 at 12:50 PM, johny casanova
<co...@gmail.com> wrote:
> Matt,
>
> I'm trying out the 1.0 version of NiFi. I'm trying to get documents using
> FetchElasticsearch(Http); maybe that's the problem I'm having. I didn't
> notice anything in the docs about using InvokeHTTP. Basically, what I'm
> trying to do is get all the syslogs in a specific index using NiFi, then
> store them on HDFS.

Re: fetch elasticsearch http

Posted by johny casanova <co...@gmail.com>.

Matt,

I'm trying out the 1.0 version of NiFi. I'm trying to get documents using
FetchElasticsearch(Http); maybe that's the problem I'm having. I didn't
notice anything in the docs about using InvokeHTTP. Basically, what I'm
trying to do is get all the syslogs in a specific index using NiFi, then
store them on HDFS.

On Tue, Oct 25, 2016 at 6:34 PM, Matt Burgess <ma...@apache.org> wrote:

> Johny,
>
> What version of NiFi are you using? Also, are you trying to get
> documents from ES using FetchElasticsearch(Http) or put docs into it
> using PutElasticsearch(Http)? For fetching, the Document Identifier
> is the _id of the document you want to retrieve. If you're looking to
> do a search on documents from a given index, type, etc., then (before
> NiFi 1.1.0 comes out) you'd have to use InvokeHTTP to interact with
> the Elasticsearch REST API, then parse the response to get the
> document identifier for each of the results and put that into
> FetchElasticsearch. NiFi 1.1.0 will have QueryElasticsearchHttp and
> ScrollElasticsearchHttp [1], which are made for getting results from
> searches vs. direct "gets" (via FetchElasticsearch). Out of curiosity,
> what REST endpoint are you using with curl?
>
> If you are trying to put docs into ES, then the field is named
> Document Identifier Attribute, and that refers to the name of a
> FlowFile attribute whose value is the identifier you want to use for
> the document (whose body is the content of the FlowFile).
> PutElasticsearchHttp supports leaving that field blank when adding to
> an index (the ID will be auto-generated), but it is an open issue [2]
> to support auto-generation in PutElasticsearch.
>
> Does this answer your question? If not please let me know and I can
> provide more info.
>
> Regards,
> Matt
>
> [1] https://issues.apache.org/jira/browse/NIFI-2417
> [2] https://issues.apache.org/jira/browse/NIFI-1576

Re: fetch elasticsearch http

Posted by Matt Burgess <ma...@apache.org>.

Johny,

What version of NiFi are you using? Also, are you trying to get
documents from ES using FetchElasticsearch(Http) or put docs into it
using PutElasticsearch(Http)? For fetching, the Document Identifier
is the _id of the document you want to retrieve. If you're looking to
do a search on documents from a given index, type, etc., then (before
NiFi 1.1.0 comes out) you'd have to use InvokeHTTP to interact with
the Elasticsearch REST API, then parse the response to get the
document identifier for each of the results and put that into
FetchElasticsearch. NiFi 1.1.0 will have QueryElasticsearchHttp and
ScrollElasticsearchHttp [1], which are made for getting results from
searches vs. direct "gets" (via FetchElasticsearch). Out of curiosity,
what REST endpoint are you using with curl?
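
As a rough illustration of the "parse the response" step, the Python
sketch below pulls the _id of each hit out of a search response body. It
assumes the usual Elasticsearch search response layout, where each entry
under hits.hits carries its metadata (_index, _type, _id) next to
_source. The sample body, its ID values, and the function name
extract_document_ids are made up for the example.

```python
import json


def extract_document_ids(search_response_text):
    """Return the _id of every hit in an Elasticsearch search response."""
    response = json.loads(search_response_text)
    hits = response.get("hits", {}).get("hits", [])
    return [hit["_id"] for hit in hits]


# Hypothetical response body, trimmed to the fields that matter here.
sample_response = """
{
  "hits": {
    "total": 2,
    "hits": [
      {"_index": "syslog", "_type": "logs", "_id": "AVf1",
       "_source": {"message": "first"}},
      {"_index": "syslog", "_type": "logs", "_id": "AVf2",
       "_source": {"message": "second"}}
    ]
  }
}
"""

if __name__ == "__main__":
    for doc_id in extract_document_ids(sample_response):
        print(doc_id)
```

Each value returned this way is what would go into the Document
Identifier property. A field called "id" inside _source is ordinary
document data and is not necessarily the same as the _id metadata field.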

If you are trying to put docs into ES, then the field is named
Document Identifier Attribute, and that refers to the name of a
FlowFile attribute whose value is the identifier you want to use for
the document (whose body is the content of the FlowFile).
PutElasticsearchHttp supports leaving that field blank when adding to
an index (the ID will be auto-generated), but it is an open issue [2]
to support auto-generation in PutElasticsearch.
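
The difference between supplying an ID and leaving it blank can be
sketched in Python against the plain Elasticsearch REST convention: an
explicit ID is a PUT to /index/type/id, while a POST to /index/type lets
Elasticsearch generate one. This only illustrates that convention and the
attribute lookup described above. It is not how either processor builds
its requests, and every name in it (resolve_document_id, index_request,
the doc_id attribute, the host) is hypothetical.

```python
from urllib.parse import quote


def resolve_document_id(flowfile_attributes, id_attribute_name):
    """Look up the document ID in the attribute named by the property.

    Returns None when the property is blank or the attribute is missing.
    """
    if not id_attribute_name:
        return None
    return flowfile_attributes.get(id_attribute_name)


def index_request(base_url, index, doc_type, doc_id=None):
    """Return (method, url) for indexing a single document over REST."""
    path = f"{base_url}/{quote(index, safe='')}/{quote(doc_type, safe='')}"
    if doc_id:
        # Explicit ID: PUT to /index/type/id.
        return "PUT", f"{path}/{quote(doc_id, safe='')}"
    # No ID: POST to /index/type and let Elasticsearch generate one.
    return "POST", path


if __name__ == "__main__":
    # Hypothetical FlowFile attributes; "doc_id" holds the identifier.
    attributes = {"filename": "syslog-0001.json", "doc_id": "42"}
    print(index_request("http://localhost:9200", "syslog", "logs",
                        resolve_document_id(attributes, "doc_id")))
    print(index_request("http://localhost:9200", "syslog", "logs",
                        resolve_document_id(attributes, "")))
```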

Does this answer your question? If not please let me know and I can
provide more info.

Regards,
Matt

[1] https://issues.apache.org/jira/browse/NIFI-2417
[2] https://issues.apache.org/jira/browse/NIFI-1576

On Tue, Oct 25, 2016 at 2:36 PM, johny casanova
<co...@gmail.com> wrote:
>
> Hello,
>
> Do you guys have an example config of how this processor should look? I have
> a regular Elasticsearch install that is only receiving syslogs. I'm trying to
> figure out how to find, or what to put for, the Document Identifier. I ran a
> curl against Elasticsearch and saw a field "id", but that does not seem to
> work.
>