Posted to solr-user@lucene.apache.org by "Jamal, Sarfaraz" <Sa...@VerizonWireless.com.INVALID> on 2016/06/02 18:50:51 UTC

Question(s) about Highlighting

I am having some difficulty understanding how to do something, and whether it is even possible.

I have tried the following sets of synonyms:

1.  sarfaraz, sas, sasjamal
2.  sasjamal,sas => Sarfaraz

In the second instance, any searches with the word 'sasjamal' do not appear in the results, as it has been converted to Sarfaraz (I believe).
In the first instance it works better: I believe all instances of any of those words appear in the results. However, the highlighted snippets stop working when any of those words are matched. Is there any documentation, insight, or help available on this issue?

Thanks in advance,

Sas

-----Original Message-----
From: Shawn Heisey [mailto:apache@elyograg.org] 
Sent: Thursday, June 2, 2016 2:43 PM
To: solr-user@lucene.apache.org
Subject: [E] Re: MongoDB and Solr - Massive re-indexing

On 6/2/2016 11:56 AM, Robert Brown wrote:
> My question is whether sending batches of 1,000 documents to Solr is 
> still beneficial (thinking about docs that may not change), or if I 
> should look at the MongoDB connector for Solr, based on the volume of 
> incoming data we see.
>
> Would the connector still see all docs updating if I re-insert them 
> blindly, and thus still send all 50m documents back to Solr everyday 
> anyway?
>
> Is my setup quite typical for the MongoDB connector?

Sending update requests to Solr containing batches of 1000 docs is a good idea.  Depending on how large they are, you may be able to send even more than 1000.  If you can avoid sending documents that haven't changed, Solr will likely perform better and relevance scoring will be better, because you won't have as many deleted docs.
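
As a rough illustration of the advice above, here is a minimal sketch of grouping documents into batches of 1000 and skipping ones that have not changed. The helper names (doc_fingerprint, changed_docs, batches), the "id" field, and the in-memory fingerprint store are all hypothetical; nothing here talks to Solr, it only prepares one JSON array per update request.

```python
import hashlib
import json
from itertools import islice


def doc_fingerprint(doc):
    """Stable hash of a document's content (keys sorted so order does not matter)."""
    payload = json.dumps(doc, sort_keys=True).encode("utf-8")
    return hashlib.sha1(payload).hexdigest()


def changed_docs(docs, seen):
    """Yield only documents whose content differs from the fingerprint stored in `seen`."""
    for doc in docs:
        fp = doc_fingerprint(doc)
        if seen.get(doc["id"]) != fp:
            seen[doc["id"]] = fp
            yield doc


def batches(docs, size=1000):
    """Group an iterable of documents into lists of at most `size` items."""
    it = iter(docs)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


if __name__ == "__main__":
    seen = {}  # hypothetical fingerprint store; a real one would need to persist
    docs = [{"id": str(i), "title": f"doc {i}"} for i in range(2500)]
    for batch in batches(changed_docs(docs, seen)):
        body = json.dumps(batch)  # one JSON array per update request
        print(len(batch), "docs,", len(body), "bytes")
```

Whether the fingerprint check is worth the bookkeeping depends on how many documents really stay unchanged between runs; the batch size is just a starting point to tune.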

The mongo connector is not software from the Solr project, or even from Apache.  We don't know anything about it.  If you have questions about that software, please contact the people who maintain it.  If their answers lead to questions about Solr itself, then you can bring those back here.

Thanks,
Shawn

RE: [E] Re: Question(s) about Highlighting

Posted by "Jamal, Sarfaraz" <Sa...@VerizonWireless.com.INVALID>.

Update on this:

I feel I have a good grasp of synonyms now,

in that I am applying them only at query time and not at indexing time.

It looks like this in Synonyms.txt:
sarfaraz jamal,sasjamal, sas,sarfaraz,wiggidy

Each one of those brings back the exact same records.

However, it only highlights "Jamal" (with a space in front of it).

Is there a way I can get highlight snippets for each of these synonyms?

Thank you !

Sas


-----Original Message-----
From: Jamal, Sarfaraz [mailto:Sarfaraz.Jamal@VerizonWireless.com.INVALID] 
Sent: Friday, June 3, 2016 9:52 AM
To: solr-user@lucene.apache.org
Subject: RE: [E] Re: Question(s) about Highlighting


RE: [E] Re: Question(s) about Highlighting

Posted by "Jamal, Sarfaraz" <Sa...@VerizonWireless.com.INVALID>.

Good Morning Alessandro,

I verified it through the analysis tool (thanks for pointing it out), and it appears to be working correctly: I see all of them as being synonyms of each other for this entry:

sasjamal, sarfaraz, sas

- When I do it only at indexing time, and disable it during query time (editing the synonyms.txt file, Solr 6), it does not treat them equally.

When I do it at both indexing and query time, it seems to work, but the highlight snippets stop working.

I believe it is working, MINUS the highlighting/snippets, if that makes sense?

Thanks

Sarfaraz Jamal (Sas)
Revenue Assurance Tech Ops
614-560-8556
Sarfaraz.Jamal@VerizonWireless.com

-----Original Message-----
From: Alessandro Benedetti [mailto:abenedetti@apache.org] 
Sent: Thursday, June 2, 2016 5:41 PM
To: solr-user@lucene.apache.org
Subject: [E] Re: Question(s) about Highlighting


Re: Question(s) about Highlighting

Posted by Alessandro Benedetti <ab...@apache.org>.

Hi Jamal,
I assume you are using the Synonym token filter.
From what you describe, I assume you are applying it only at indexing time.
This means that, when you index:

1) for a comma-separated row in the synonym.txt, all the terms of the row are
indexed in place of any single term from that row;

2) for an explicit mapping, any term on the left side of the expression is
replaced in the index by the term on the right side of the expression.

You can verify this easily with the analysis tool in the Solr UI.
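
To make the two cases concrete, here is a small Python sketch that mimics the behaviour described above. It is a simplified model and not Solr's implementation: parse_rules and analyze are hypothetical helpers, only single-word synonyms are handled, and everything is lowercased as if a lowercase filter were in the chain.

```python
def parse_rules(lines):
    """Parse simplified synonym rows into a dict: term -> list of terms to emit."""
    rules = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        if "=>" in line:
            # Explicit mapping: left-side terms are replaced by right-side terms.
            left, right = line.split("=>", 1)
            targets = [t.strip().lower() for t in right.split(",") if t.strip()]
            for term in left.split(","):
                term = term.strip().lower()
                if term:
                    rules[term] = targets
        else:
            # Equivalent row: any term expands to every term in the row.
            terms = [t.strip().lower() for t in line.split(",") if t.strip()]
            for term in terms:
                rules[term] = terms
    return rules


def analyze(text, rules):
    """Lowercase, split on whitespace, and apply the rules to each token."""
    out = []
    for token in text.lower().split():
        out.extend(rules.get(token, [token]))
    return out


equivalent = parse_rules(["sarfaraz, sas, sasjamal"])
explicit = parse_rules(["sasjamal,sas => Sarfaraz"])

indexed_1 = set(analyze("sasjamal", equivalent))  # all three terms are indexed
indexed_2 = set(analyze("sasjamal", explicit))    # only 'sarfaraz' is indexed

query_plain = set(analyze("sasjamal", {}))        # query analyzed without synonyms
print(query_plain & indexed_1)                    # {'sasjamal'}: match
print(query_plain & indexed_2)                    # set(): no match
print(set(analyze("sasjamal", explicit)) & indexed_2)  # {'sarfaraz'}: match
```

With the explicit mapping, a query for sasjamal only matches when the query is analyzed with the same rules, which is consistent with what the analysis tool should show.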



On Thu, Jun 2, 2016 at 7:50 PM, Jamal, Sarfaraz <
Sarfaraz.Jamal@verizonwireless.com.invalid> wrote:

> I am having some difficulty understanding how to do something and if it is
> even possible
>
> I have tried the following sets of Synonyms:
>
> 1.  sarfaraz, sas, sasjamal
> 2.  sasjamal,sas => Sarfaraz
>
> In the second instance, any searches with the word 'sasjamal' do not
> appear in the results, as it has been converted to Sarfaraz (I believe) -
>

This means you are not applying the same synonym.txt at query time. Indeed,
sasjamal is not in the index at all.


> In the first instance it works better - I believe all instances of any of
> those words  appear in the results. However the highlighted snippets also
> stop working when any of those words are
> Matched. Is there any documentation, insights or help about this issue?
>

I would need to verify that; it could be related to the term offsets.
Please also take a look at the analysis tool to better understand how
the offsets are assigned.
I remember that a long time ago there was a discussion about this, and a bug
or something similar was raised.
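
For readers unfamiliar with offsets, here is a toy Python model of why they matter for highlighting. It is only a sketch under stated assumptions, not Solr's highlighter: tokenize, add_synonyms and highlight are hypothetical helpers, tokens are split on whitespace, and each synonym is assumed to carry the start/end offsets of the original token it was derived from. Multi-word entries such as "sarfaraz jamal" span several tokens and are not modeled here.

```python
import re


def tokenize(text):
    """Return (term, start, end) for each whitespace-delimited token, lowercased."""
    return [(m.group().lower(), m.start(), m.end()) for m in re.finditer(r"\S+", text)]


def add_synonyms(tokens, groups):
    """Add each synonym as an extra term carrying the offsets of the original token."""
    out = []
    for term, start, end in tokens:
        out.append((term, start, end))
        for group in groups:
            if term in group:
                out.extend((syn, start, end) for syn in group if syn != term)
    return out


def highlight(text, tokens, query_terms, pre="<em>", post="</em>"):
    """Wrap every span of the original text whose token matches a query term."""
    spans = sorted({(s, e) for term, s, e in tokens if term in query_terms})
    pieces, cursor = [], 0
    for start, end in spans:
        pieces.append(text[cursor:start])
        pieces.append(pre + text[start:end] + post)
        cursor = end
    pieces.append(text[cursor:])
    return "".join(pieces)


text = "Report filed by sasjamal yesterday"
groups = [{"sarfaraz", "sas", "sasjamal"}]
tokens = add_synonyms(tokenize(text), groups)
print(highlight(text, tokens, {"sarfaraz"}))
# Report filed by <em>sasjamal</em> yesterday
```

In this model the snippet is built purely from offsets into the original text, so if a synonym were emitted with offsets that do not line up with the original token, the wrong span (or nothing) would be marked. That is the kind of thing the analysis tool can help check.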

Cheers

>
> Thanks in advance,
>
> Sas
>
>


-- 
--------------------------

Benedetti Alessandro
Visiting card : http://about.me/alessandro_benedetti

"Tyger, tyger burning bright
In the forests of the night,
What immortal hand or eye
Could frame thy fearful symmetry?"

William Blake - Songs of Experience -1794 England