Posted to solr-user@lucene.apache.org by "Sreenivas.T" <sr...@gmail.com> on 2017/12/07 03:57:34 UTC

Calling rest API from Solr custom tokenizer plugin

All,

I need help from experts. We are trying to build a cognitive search
platform with enterprise content from content sources like SharePoint, file
shares, etc. Before content is indexed into Solr, I need to call our
internal AI platform to get additional metadata such as classification
tags.

I'm planning to use ManifoldCF to fetch the content from the sources
and to write a custom tokenizer plugin that sends the content to the AI
platform, which in turn returns additional tags. I'll index the additional
tags dynamically through the plugin code.

Is this a feasible solution? Is there any other way to achieve the same? I
was planning not to customize ManifoldCF.

Please advise.



Regards,
Sreenivas

Re: Calling rest API from Solr custom tokenizer plugin

Posted by "Sreenivas.T" <sr...@gmail.com>.
Thanks, Doug. Now I think it's better to customize ManifoldCF's output
connector for Solr.

Sreenivas

Re: Calling rest API from Solr custom tokenizer plugin

Posted by Doug Turnbull <dt...@opensourceconnections.com>.
A tokenizer plugin is probably not what you want. You probably want
something more like an UpdateProcessor, which can manipulate the whole
document as it comes into Solr. Or you may want to avoid having a Solr
plugin call an API at all and do this work outside of Solr (for example,
what happens when the API is down: should doc updates fail?).
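
As a rough sketch of the enrichment step, whether it runs outside of Solr
or inside Solr's UpdateRequestProcessor.processAdd, the Java below models a
document as a plain field map, calls a tag service, and writes the result
to a separate field. TagService, TagEnricher, and the field names content,
tags, and tags_pending are all hypothetical names made up for illustration;
none of them are Solr or ManifoldCF APIs. Indexing the document without
tags when the service is down is only one possible answer to the failure
question; failing the update is just as defensible.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical client for the internal AI platform; not a Solr or ManifoldCF API. */
interface TagService {
    List<String> classify(String content) throws Exception;
}

/** Enriches a document (modelled as a plain field map) before it is sent to Solr. */
final class TagEnricher {
    private final TagService service;

    TagEnricher(TagService service) {
        this.service = service;
    }

    /**
     * Returns a copy of the document with a separate "tags" field added.
     * If the tag service fails, the copy is still returned, flagged with
     * "tags_pending" so it can be re-enriched later.
     */
    Map<String, Object> enrich(Map<String, Object> doc) {
        Map<String, Object> out = new HashMap<>(doc);
        Object content = doc.get("content");
        try {
            List<String> tags = service.classify(content == null ? "" : content.toString());
            out.put("tags", new ArrayList<>(tags));
        } catch (Exception e) {
            // Policy choice (an assumption): index without tags rather than fail the update.
            out.put("tags_pending", Boolean.TRUE);
        }
        return out;
    }
}
```

A caller would build the field map, pass it through
new TagEnricher(service).enrich(doc), and send the returned map to Solr
with whatever client it already uses.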

A tokenizer plugin would definitely not be recommended. Tokenizers need to
be fast, low-level code that splits up text into tokens based on readily
accessible config & data. The overhead of a network call would be far too
high.
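
To illustrate the kind of work a tokenizer does, here is a toy splitter in
plain Java. It is not Lucene's Tokenizer API; SimpleTokenizer and tokenize
are hypothetical names. The point is that everything happens in memory on
one field's text, which is why a network round trip does not fit at this
level.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustration only: this is not Lucene's Tokenizer API. */
final class SimpleTokenizer {
    /** Splits one field's text into lower-cased letter/digit runs, in memory, with no I/O. */
    static List<String> tokenize(String fieldText) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < fieldText.length(); i++) {
            char c = fieldText.charAt(i);
            if (Character.isLetterOrDigit(c)) {
                current.append(Character.toLowerCase(c));
            } else if (current.length() > 0) {
                // A non-alphanumeric character ends the current token.
                tokens.add(current.toString());
                current.setLength(0);
            }
        }
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }
}
```

For example, SimpleTokenizer.tokenize("Solr indexes SharePoint files.")
returns [solr, indexes, sharepoint, files].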

You probably want to put your extracted tags into a different field anyway,
and a tokenizer only works on text within a single field.

-Doug

-- 
Consultant, OpenSource Connections. Contact info at
http://o19s.com/about-us/doug-turnbull/; Free/Busy (http://bit.ly/dougs_cal)