Posted to solr-user@lucene.apache.org by Ajay Sharma <aj...@indiamart.com.INVALID> on 2021/01/13 12:04:09 UTC
Cursor Performance Issue
Hi All,
I am using cursors to search and export documents in Solr, following
https://lucene.apache.org/solr/guide/6_6/pagination-of-results.html#fetching-a-large-number-of-sorted-results-cursors
Solr version: 6.5.0
Number of documents: 10 crore (100 million)
Before implementing the cursor, I was using the start and rows parameters to
fetch records.
Service response time used to be 2 sec.
*Solr URL before implementing the cursor:*
http://localhost:8080/solr/search/select?q=bird toy&qt=mapping&ps=3&rows=25&mm=100
The request handler looks like this (fl contains approx. 20 fields):
<requestHandler name="mapping" class="solr.SearchHandler">
<lst name="invariants">
<str name="defType">edismax</str>
<str name="indent">on</str>
<float name="tie">0.01</float>
</lst>
<lst name="appends">
<str name="fl">id,refid,title,smalldesc:""</str>
</lst>
<lst name="defaults">
<str name="echoParams">none</str>
<str name="wt">json</str>
<int name="rows">25</int>
<str name="timeAllowed">15000</str>
<str name="qf">smalldesc</str>
<str name="qf">title_text</str>
<str name="qf">titlews^3</str>
<str name="qf">sdescnisq</str>
<str name="qs">1</str>
<!-- retrieve following fields -->
<str name="mm">2&lt;-1 4&lt;70%</str>
</lst>
</requestHandler>
Response with echoParams=all (QTime is 6):
responseHeader: {
status: 0,
QTime: 6,
params: {
ps: "3",
echoParams: "all",
indent: "on",
fl: "id,refid,title,smalldesc:"",
tie: "0.01",
defType: "edismax",
qf: "customphonetic",
wt: "json",
qs: "1",
qt: "mapping",
rows: "25",
q: "bird toy",
timeAllowed: "15000"
}
},
response: {
numFound: 17,
start: 0,
maxScore: 26.616478,
docs: [
{
id: "22347708097",
refid: "152585558",
title: "Round BIRD COLOURFUL SWINGING CIRCULAR SITTING TOY",
smalldesc: "",
score: 26.616478
}
]
}
I am now facing a performance issue after implementing the cursor. Service
response time has increased 3 to 4 times, i.e. to 8 sec in some cases.
*Query after implementing the cursor:*
localhost:8080/solr/search/select?q=bird toy&qt=cursor&ps=3&rows=1000&mm=100&sort=score desc,id asc&cursorMark=*
I only added &sort=score desc,id asc&cursorMark=* to the previous query;
rows is now 1000 and fl contains just a single field.
The request handler is the same as before, except that I changed the name,
changed fl, and added df to the defaults:
<requestHandler name="cursor" class="solr.SearchHandler">
<lst name="invariants">
<str name="defType">edismax</str>
<str name="indent">on</str>
<float name="tie">0.01</float>
</lst>
<lst name="appends">
<str name="fl">refid</str>
</lst>
<lst name="defaults">
<str name="echoParams">none</str>
<str name="wt">json</str>
<int name="rows">1000</int>
<str name="qf">smalldesc</str>
<str name="qf">title_text</str>
<str name="qf">titlews^3</str>
<str name="qf">sdescnisq</str>
<str name="qs">1</str>
<str name="mm">2&lt;-1 4&lt;70%</str>
<str name="df">product_titles</str>
</lst>
</requestHandler>
Response with the cursor and echoParams=all (*QTime is now 17*, i.e. approx.
3 times the previous QTime):
responseHeader: {
status: 0,
QTime: 17,
params: {
df: "product_titles",
ps: "3",
echoParams: "all",
indent: "on",
fl: "refid",
tie: "0.01",
defType: "edismax",
qf: "customphonetic",
qs: "1",
qt: "cursor",
sort: "score desc,id asc",
rows: "1000",
q: "bird toy",
cursorMark: "*",
}
},
response: {
numFound: 17,
start: 0,
docs: [
{
refid: "152585558"
},
{
refid: "157276077"
}
]
}
When I curl http://localhost:8080/solr/search/select?q=bird toy&qt=mapping&ps=3&rows=25&mm=100, I get results in 3 seconds.
When I curl localhost:8080/solr/search/select?q=bird toy&qt=cursor&ps=3&rows=1000&mm=100&sort=score desc,id asc&cursorMark=*, it
takes 8 seconds to return a result, even when the result count is 0.
BTW, this is the schema definition of the id field used in the sort:
<field name="id" type="string" indexed="true" stored="true" required="true"
omitNorms="true" multiValued="false"/>
Is this due to the sort I have applied, or have I implemented it the wrong
way?
Please help or point me in the right direction to solve this issue.
Thanks in advance
--
Thanks & Regards,
Ajay Sharma
Product Search
Indiamart Intermesh Ltd.
--
Re: Cursor Performance Issue
Posted by Ajay Sharma <aj...@indiamart.com.INVALID>.
Hi Mike,
Thanks for your reply.
As far as I remember, docValues is enabled by default since Solr 6.
If it is not, and I reindex the data with docValues=true for the id field,
how much will my index size increase?
My index size is currently 90 GB.
On Wed, 13 Jan, 2021, 9:14 pm Mike Drob, <md...@mdrob.com> wrote:
> You should be using docvalues on your id, but note that switching this
> would require a reindex.
--
Re: Cursor Performance Issue
Posted by Mike Drob <md...@mdrob.com>.
You should be using docvalues on your id, but note that switching this
would require a reindex.
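
A minimal sketch of the schema change suggested here, based on the id field definition from the original post. The thread does not show the definition of the "string" field type, so the fieldType line below is an assumption; whether docValues is already in effect depends on that definition. Setting docValues="true" on the field itself makes the choice explicit, regardless of what the field type says.

```xml
<!-- Assumed field type; the real definition is not shown in the thread. -->
<fieldType name="string" class="solr.StrField" sortMissingLast="true"/>

<!-- id field from the original post, with docValues enabled explicitly.
     Existing documents need to be reindexed after this change. -->
<field name="id" type="string" indexed="true" stored="true" required="true"
       omitNorms="true" multiValued="false" docValues="true"/>
```

After reindexing, the sort clause "score desc,id asc" stays the same; only the way Solr reads id values for the tie-break changes.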