Posted to solr-user@lucene.apache.org by Zheng Lin Edwin Yeo <ed...@gmail.com> on 2018/08/01 09:59:42 UTC

Questions on The Tagger Handler

Hi,

I am trying out the Tagger Handler in Solr 7.4.0 by following the tutorial
from
https://lucene.apache.org/solr/guide/7_4/the-tagger-handler.html#tutorial-with-geonames

I have managed to set it up to work, but what I do not really understand is
how to analyse the output. From the example, it seems to be trying to tag
'Hello New York City', and it returns one output. This seems more like
searching for the 'name' field (in the example, the 'name' field is copied
to the 'name_tag' field for tagging) and getting the records with the name
"New York City".

What is the actual purpose of doing this?

Also, what do "startOffset" and "endOffset" mean, and how are the values
calculated?

{
  "responseHeader":{
    "status":0,
    "QTime":1},
  "tagsCount":1,
  "tags":[[
      "startOffset",6,
      "endOffset",19,
      "ids",["5128581"]]],
  "response":{"numFound":1,"start":0,"docs":[
      {
        "id":"5128581",
        "name":["New York City"],
        "countrycode":["US"]}]
  }}


Regards,
Edwin

Re: Questions on The Tagger Handler

Posted by Zheng Lin Edwin Yeo <ed...@gmail.com>.
Hi Alexandre,

I have found that the ConcatenateGraphFilterFactory sits at the end of the
indexing chain and merges the tokens back into a single token (even though
the Standard Tokenizer, which is what we use for normal phrase search, has
split the text). I believe this is why, when I search for "Hello New York
City" or "Hello New York", it is not able to match "New York City".

Is this how the Tagger Handler is supposed to work?

Regards,
Edwin

On Thu, 2 Aug 2018 at 11:41, Alexandre Rafalovitch <ar...@gmail.com>
wrote:

> You have "Hello New York City" as both a working and a non-working
> example. I am not sure what specifically the issue is.
>
> In general, analysis runs at both index time and query time, and then
> the tokens must match in the right order, just like a normal phrase
> search, but in reverse.
>
> Regards,
>    Alex.

Re: Questions on The Tagger Handler

Posted by Alexandre Rafalovitch <ar...@gmail.com>.
You have "Hello New York City" as both a working and a non-working
example. I am not sure what specifically the issue is.

In general, analysis runs at both index time and query time, and then
the tokens must match in the right order, just like a normal phrase
search, but in reverse.
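
To make the "phrase search in reverse" idea concrete, here is a rough
Python sketch. It is not how Solr implements the tagger; the regex
tokenizer, the lowercasing and the function names are all assumptions
made for illustration. It only shows the matching rule: a run of
consecutive tokens in the submitted text must equal a whole indexed name.

```python
import re

def tokenize(text):
    """Lowercased word tokens with character offsets (a crude stand-in
    for the analysis chain, not Solr's real tokenizer)."""
    return [(m.group().lower(), m.start(), m.end())
            for m in re.finditer(r"\w+", text)]

def build_dictionary(names):
    """Map each indexed name's token tuple to its document id."""
    return {tuple(tok for tok, _, _ in tokenize(name)): doc_id
            for doc_id, name in names.items()}

def tag(text, dictionary):
    """Return (startOffset, endOffset, id) for each longest run of
    consecutive tokens in `text` that equals a whole dictionary name."""
    tokens = tokenize(text)
    max_len = max((len(key) for key in dictionary), default=0)
    tags, i = [], 0
    while i < len(tokens):
        # Try the longest candidate first, then shorter ones.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            key = tuple(tok for tok, _, _ in tokens[i:i + n])
            if key in dictionary:
                tags.append((tokens[i][1], tokens[i + n - 1][2], dictionary[key]))
                i += n
                break
        else:
            i += 1  # no name starts at this token
    return tags

# Hypothetical one-entry dictionary, using the id from the tutorial output.
dictionary = build_dictionary({"5128581": "New York City"})
print(tag("Hello New York City", dictionary))  # [(6, 19, '5128581')]
print(tag("Hello New York", dictionary))       # []
```

Under these assumptions, "Hello New York" produces no tag because only
part of the indexed name appears in the text.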

Regards,
   Alex.

On 1 August 2018 at 22:13, Zheng Lin Edwin Yeo <ed...@gmail.com> wrote:
> Hi Alexandre,
>
> Thanks for the information.
>
> I found that it is able to retrieve the record if I search for "Hello New
> York City" or "New York City".
> However, I am not able to retrieve it if I search for "Hello New York City"
> or "Hello New York".
> Is that the right behavior?
>
> Regards,
> Edwin

Re: Questions on The Tagger Handler

Posted by Zheng Lin Edwin Yeo <ed...@gmail.com>.
Hi Alexandre,

Thanks for the information.

I found that it is able to retrieve the record if I search for "Hello New
York City" or "New York City".
However, I am not able to retrieve it if I search for "Hello New York City"
or "Hello New York".
Is that the right behavior?

Regards,
Edwin

On Wed, 1 Aug 2018 at 22:13, Alexandre Rafalovitch <ar...@gmail.com>
wrote:

> You may find this interesting:
>
> https://slideshare.net/arafalov/searching-for-ai-leveraging-solr-for-classic-artificial-intelligence-tasks/
> Specifically, slides 15-18.
>
> Basically, it is the reverse of a normal search. You are searching for
> occurrences of the already indexed terms (here, the place names) in
> the text you sent. It returns information about what it found and
> where in your original text it is (the offsets). The text you send to
> the tagger does not end up in Solr.
>
> What is missing is a good visualization of what it found, which would
> be a bit like a highlighter, taking those offsets and applying them to
> the original text.
>
> Regards,
>    Alex.

Re: Questions on The Tagger Handler

Posted by Alexandre Rafalovitch <ar...@gmail.com>.
You may find this interesting:
https://slideshare.net/arafalov/searching-for-ai-leveraging-solr-for-classic-artificial-intelligence-tasks/
Specifically, slides 15-18.

Basically, it is the reverse of a normal search. You are searching for
occurrences of the already indexed terms (here, the place names) in
the text you sent. It returns information about what it found and
where in your original text it is (the offsets). The text you send to
the tagger does not end up in Solr.

What is missing is a good visualization of what it found, which would
be a bit like a highlighter, taking those offsets and applying them to
the original text.
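
As a minimal sketch of that idea (in Python, with made-up helper names
tag_to_dict and mark_tags), the offsets from the response in the
original question can be applied to the submitted text like this. It
assumes startOffset is the index of the first tagged character and
endOffset is one past the last, which matches 6 and 19 for "New York
City" in "Hello New York City".

```python
import json

# Tagger response fields copied from the original question in this thread.
RESPONSE = json.loads("""
{
  "tagsCount": 1,
  "tags": [["startOffset", 6, "endOffset", 19, "ids", ["5128581"]]]
}
""")

def tag_to_dict(flat):
    """Turn ["startOffset", 6, "endOffset", 19, ...] into a dict."""
    return dict(zip(flat[0::2], flat[1::2]))

def mark_tags(text, tags, open_mark="[", close_mark="]"):
    """Wrap each tagged span of `text` in markers, working right to left
    so that earlier offsets stay valid while the string grows."""
    spans = sorted((tag_to_dict(t) for t in tags),
                   key=lambda t: t["startOffset"], reverse=True)
    for t in spans:
        start, end = t["startOffset"], t["endOffset"]
        text = text[:start] + open_mark + text[start:end] + close_mark + text[end:]
    return text

text = "Hello New York City"
print(text[6:19])                          # New York City
print(mark_tags(text, RESPONSE["tags"]))   # Hello [New York City]
```

One caveat, stated as an assumption rather than tested here: the offsets
come from Java, so for text with characters outside the Basic
Multilingual Plane they may not line up with Python string indexes.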

Regards,
   Alex.

On 1 August 2018 at 05:59, Zheng Lin Edwin Yeo <ed...@gmail.com> wrote:
> Hi,
>
> I am trying out the Tagger Handler in Solr 7.4.0 by following the tutorial
> from
> https://lucene.apache.org/solr/guide/7_4/the-tagger-handler.html#tutorial-with-geonames
>
> I have managed to set it up to work, but what I do not really understand is
> how to analyse the output. From the example, it seems to be trying to tag
> 'Hello New York City', and it returns one output. This seems more like
> searching for the 'name' field (in the example, the 'name' field is copied
> to the 'name_tag' field for tagging) and getting the records with the name
> "New York City".
>
> What is the actual purpose of doing this?
>
> Also, what do "startOffset" and "endOffset" mean, and how are the values
> calculated?
>
> {
>   "responseHeader":{
>     "status":0,
>     "QTime":1},
>   "tagsCount":1,
>   "tags":[[
>       "startOffset",6,
>       "endOffset",19,
>       "ids",["5128581"]]],
>   "response":{"numFound":1,"start":0,"docs":[
>       {
>         "id":"5128581",
>         "name":["New York City"],
>         "countrycode":["US"]}]
>   }}
>
>
> Regards,
> Edwin