Posted to solr-user@lucene.apache.org by Montu v Boda <mo...@highqsolutions.com> on 2013/04/16 12:22:40 UTC

first time with new keyword, solr take to much time to give the result

Hi,

When we search for a new keyword for the first time, Solr 4.2.1 takes too
much time to return the result.

We have 5,060,000 documents indexed in Solr, and the index size is 400GB.

When we search for the keyword "test", it takes 1 minute to return the
response for 10,000 rows.

We fire the query from a Java application using the SolrJ client.

This behavior is the same with Solr 1.4, 3.5 and 4.2.1.

All 400GB of data is indexed in one folder called "<Solr Home>\data\index".

After firing the query, when we open the resource monitor, it shows
that most of the cost is disk I/O.

Any help would be appreciated.

Thanks & Regards
Montu v Boda



--
View this message in context: http://lucene.472066.n3.nabble.com/first-time-with-new-keyword-solr-take-to-much-time-to-give-the-result-tp4056254.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: first time with new keyword, solr take to much time to give the result

Posted by Raymond Wiker <rw...@gmail.com>.
On Tue, Apr 16, 2013 at 3:13 PM, Montu v Boda <montu.boda@highqsolutions.com> wrote:

> hi
>
> We are trying to return 10,000 rows.
>
> It is necessary to return 10,000 rows because from those 10,000 we pick
> only the top 100 records based on the user's permissions, and the
> permissions are stored in a database, not in Solr.
>
> If we return only 100 rows, it is possible that the user does not have
> permission for any of those 100 documents, and the user will get a blank
> search result.
>
You may have some other options:

1) Add the access rights to SOLR, and have a front-end that takes a user id
and expands it into a set of access rights (groups, mainly) for the user.
This is then added as a filter to the queries.

2) Run the query with a smaller number of hits requested, and use the
"start" parameter to fetch more hits (if necessary).

Also, you may want to restrict the fields returned by your query, to the
minimal set required.
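
Option 2 and the field restriction could look roughly like the sketch below. It is not code from this thread: `fetch_page` and `is_permitted` are hypothetical callables standing in for the actual Solr request and the database permission check, and the `id` field name is an assumption. Only `q`, `start`, `rows` and `fl` are standard Solr request parameters.

```python
from urllib.parse import urlencode


def page_params(query, start, rows=100, fields=("id",)):
    """Build the request parameters for one page of hits."""
    return {
        "q": query,
        "start": start,          # offset of the first hit on this page
        "rows": rows,            # page size, far below 10,000
        "fl": ",".join(fields),  # return only the fields that are needed
    }


def to_query_string(params):
    """URL-encode the parameters for a /select request."""
    return urlencode(params)


def collect_permitted(fetch_page, is_permitted, query,
                      wanted=100, rows=100, max_pages=20):
    """Fetch pages until `wanted` permitted documents are found.

    fetch_page(params) -> list of dicts with an "id" key (hypothetical).
    is_permitted(doc_id) -> bool, e.g. a database lookup (hypothetical).
    """
    permitted = []
    for page in range(max_pages):
        docs = fetch_page(page_params(query, start=page * rows, rows=rows))
        if not docs:
            break  # no more hits
        permitted.extend(d for d in docs if is_permitted(d["id"]))
        if len(permitted) >= wanted:
            break  # enough results, stop paging
    return permitted[:wanted]
```

With this shape, a user who may see most documents costs one small request, and only users with few permitted documents pay for additional pages.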

Re: first time with new keyword, solr take to much time to give the result

Posted by Otis Gospodnetic <ot...@gmail.com>.
Hi,

Have you considered ManifoldCF?

Otis
--
SOLR Performance Monitoring - http://sematext.com/spm/index.html





Re: first time with new keyword, solr take to much time to give the result

Posted by Montu v Boda <mo...@highqsolutions.com>.
Hi

Thanks for your reply.

We will try to index the permissions in Solr, add a filter query, and
fetch a smaller number of rows (100 or 150) from Solr.

In the future we will try SSDs as well.


Thanks to all for such a great response.

Thanks & Regards
Montu v Boda




Re: first time with new keyword, solr take to much time to give the result

Posted by Duncan Irvine <du...@gmail.com>.
tl;dr: retrieving 10,000 docs is a bad idea. Look into docValues for
storing security info

I suspect that you'll be better served by keeping the permissions
up-to-date in solr and invalidating the caches rather than trying to return
10,000 docs.  On average, you'll be attempting to read up to 800MB of data
per query (400GB * 10000/5060000), and that will be accessed randomly.
As Toke said earlier, on a spinning disc this will just be a bad idea.  If
you must persist in querying like this, then I'd second the SSDs - a pair
in RAID 1 should give you good read performance, adequate write and
redundancy. You might be able to get that query down to something in the
region of 5-10 seconds at a guess.  That assumes you're not actually
returning the entire document in the response: an 800MB network
response, even on GbE, will be on the order of 10s just for layer-2, let
alone the serialisation overhead.
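
The back-of-the-envelope numbers above can be reproduced as follows. This is only a sketch: it assumes decimal gigabytes, documents of uniform average size, and a Gigabit link running at its full theoretical rate with no protocol overhead.

```python
# Figures quoted in the thread
index_bytes = 400e9      # 400GB index (decimal GB assumed)
total_docs = 5_060_000   # documents in the index
rows = 10_000            # rows requested per query

# Average share of the index per document, and per 10,000-row query
avg_doc_bytes = index_bytes / total_docs   # roughly 79 KB
bytes_per_query = avg_doc_bytes * rows     # roughly 790 MB

# Gigabit Ethernet at theoretical line rate: 1e9 bits/s = 125 MB/s
gbe_bytes_per_s = 1e9 / 8
transfer_s = bytes_per_query / gbe_bytes_per_s   # a bit over 6 seconds

print(f"{bytes_per_query / 1e6:.0f} MB per query, "
      f"{transfer_s:.1f} s on GbE at line rate")
```

The line-rate figure comes out somewhat below the rough 10s quoted above; real transfers add framing and protocol overhead, so the two are of the same order of magnitude.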

You might try looking into storing your security information in docValues
fields - set docValues=true against the field in schema.xml (needs Solr 4.2).
 That ought to give greater performance when reading that field and may
circumvent your concerns over cache invalidation although I haven't played
with them yet, so don't quote me on that.

Can you be more specific about the security model? What is being stored in
the DB? How does that get applied to the document? Can you translate that
into a query that solr could understand?  Is it too complex, or are you
really just worried about cache invalidation?

Would it be acceptable to have the security info in solr, but lagging the
DB somewhat, and then select a smaller set and post-filter in your
business layer?
i.e. instead of running just q=foo:bar&rows=10000 then filtering, you run a
query such as q=foo:bar&fq=security_group:(2 3 19)&rows=150 and then
filter against your DB as a final double-check before presenting to your
user.  This would mean that they would immediately be prevented from seeing
something that they're no longer allowed to, but may have to wait for the
next update to see something they've just been allowed to.
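
A minimal sketch of that two-step approach is shown below. The `security_group` field comes from the example query above; the `id` field, the function names, and the idea of passing in a pre-fetched set of allowed ids are assumptions made for illustration.

```python
from urllib.parse import urlencode


def secured_query(query, group_ids, rows=150):
    """Build request parameters with a filter query on the user's groups."""
    if not group_ids:
        raise ValueError("user belongs to no groups; nothing to search")
    groups = " ".join(str(g) for g in sorted(group_ids))
    return {
        "q": query,
        "fq": f"security_group:({groups})",  # cached separately from q by Solr
        "rows": rows,                        # small over-fetch instead of 10,000
    }


def double_check(docs, allowed_ids, limit=100):
    """Final check against the authoritative permissions from the DB.

    allowed_ids is a set of document ids the user may see right now
    (hypothetical: obtained from the database for the returned ids only).
    """
    return [d for d in docs if d["id"] in allowed_ids][:limit]


# The example query from the message above, URL-encoded
print(urlencode(secured_query("foo:bar", [2, 3, 19])))
```

Because the filter query already removes most forbidden documents, the database check only has to cover the 150 returned ids rather than 10,000.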

Regards,
  Duncan.





-- 
Don't let your mind wander -- it's too little to be let out alone.

Re: first time with new keyword, solr take to much time to give the result

Posted by Montu v Boda <mo...@highqsolutions.com>.
Hi

The problem is that permissions are updated frequently in our system, so we
would have to update the index just as often; otherwise it will give wrong
results.

In that case I think the caches will be affected and performance may be
reduced.


Thanks & Regards
Montu v Boda 




Re: first time with new keyword, solr take to much time to give the result

Posted by Jack Krupansky <ja...@basetechnology.com>.
Why not just add a filter query for user permissions?

-- Jack Krupansky



Re: first time with new keyword, solr take to much time to give the result

Posted by Montu v Boda <mo...@highqsolutions.com>.
Hi

The problem is that permissions are updated frequently in our system, so we
would have to update the index just as often; otherwise it will give wrong
results.

In that case I think the caches will be affected and performance may be
reduced.


Thanks & Regards
Montu v Boda




Re: first time with new keyword, solr take to much time to give the result

Posted by Ahmet Arslan <io...@yahoo.com>.
Hi Montu,

Regarding permissions, you may find this solution more elegant:

http://searchhub.org/2012/02/22/custom-security-filtering-in-solr/

http://hokiesuns.blogspot.com/2012/11/using-solrs-postfiltering-to-collect.html



Re: first time with new keyword, solr take to much time to give the result

Posted by Montu v Boda <mo...@highqsolutions.com>.
hi

We are trying to return 10,000 rows.

It is necessary to return 10,000 rows because from those 10,000 we pick
only the top 100 records based on the user's permissions, and the
permissions are stored in a database, not in Solr.

If we return only 100 rows, it is possible that the user does not have
permission for any of those 100 documents, and the user will get a blank
search result.

Thanks & Regards
Montu v Boda




Re: first time with new keyword, solr take to much time to give the result

Posted by Duncan Irvine <du...@gmail.com>.
Are you actually trying to return 10,000 records, or is that the number of
hits, and you're only retrieving the top 10?

Cheers,
  Duncan.





-- 
Don't let your mind wander -- it's too little to be let out alone.

Re: first time with new keyword, solr take to much time to give the result

Posted by Montu v Boda <mo...@highqsolutions.com>.
Hi

Thanks for the info.

We did the same thing, but it had no effect on the first query.

What can we do about the first query with a new keyword?

How can we make the first query with a new keyword faster?

For example, if I search for the keyword "test" for the first time, it
takes too much time to execute.

The second time, the same keyword works faster...

Thanks & Regards
Montu v Boda




Re: first time with new keyword, solr take to much time to give the result

Posted by Dmitry Kan <so...@gmail.com>.
In the admin page you can monitor the cache parameters, such as evictions. If
your cache evicts too much, you can increase its capacity. NOTE: this will
affect RAM consumption, so you would need to change the tomcat config
too.



Re: first time with new keyword, solr take to much time to give the result

Posted by Montu v Boda <mo...@highqsolutions.com>.
Hi

Currently, my Solr is deployed in tomcat1, and we have given 4GB of memory
to that Tomcat.

Thanks & Regards
Montu v Boda




Re: first time with new keyword, solr take to much time to give the result

Posted by Dmitry Kan <so...@gmail.com>.
Hi,

Things to google ;)

1. warmup queries
2. solr cache

How much RAM does your index take now?

Dmitry



Re: first time with new keyword, solr take to much time to give the result

Posted by Toke Eskildsen <te...@statsbiblioteket.dk>.
On Tue, 2013-04-16 at 12:22 +0200, Montu v Boda wrote:
> We have 5,060,000 documents indexed in Solr, and the index size is 400GB.
> 
> When we search for the keyword "test", it takes 1 minute to return the
> response for 10,000 rows.

At this point, you have searched for other keywords before you measure
on keyword "test", right? The first search on a newly opened index is
notoriously slow.

> After firing the query, when we open the resource monitor, it shows
> that most of the cost is disk I/O.

Both searching and value retrieval (for the 10K rows) require a lot of
random access in Lucene/Solr and, I guess, in just about every other
comparable search engine.

I will bet a cake that your underlying storage is spinning disks. When
you perform a search for a keyword that has not been used before or not
in a while, the disk cache has little data for that search so there will
be a lot of random access to the underlying storage. Spinning disks are
really bad at this.

> Any help would be appreciated.

Short answer: Use an SSD.

Longer answer: You need to either lower the number of seeks or make them
faster (or both). You lower the number of seeks by (in your case)
copious amounts of RAM and a lot of warming of your searchers. You make
the seeks faster by switching storage type.
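
To see why seek latency dominates, a rough estimate helps. The latencies below are assumed round numbers (about 10 ms per random read on a spinning disk, about 0.1 ms on an SSD), and one uncached random read per returned row is an assumption as well; none of these were measured on the system in question.

```python
def estimate_seconds(random_reads, latency_us):
    """Total time for sequentially issued random reads, in seconds."""
    return random_reads * latency_us / 1_000_000


ROWS = 10_000               # rows requested per query (from the thread)
SPINNING_DISK_US = 10_000   # ~10 ms per random read (assumed)
SSD_US = 100                # ~0.1 ms per random read (assumed)

print("spinning disk:", estimate_seconds(ROWS, SPINNING_DISK_US), "s")
print("SSD:          ", estimate_seconds(ROWS, SSD_US), "s")
```

Under these assumptions the spinning-disk estimate lands in the same range as the reported 1 minute, while the SSD estimate is about a second. Data already in the disk cache costs no seek, which is also why a repeated query is so much faster.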

RAIDing spinning drives does not help much, as its benefits are higher
bulk transfer rates and/or more concurrent requests, whereas you need
lower latency. You could buy faster spinning drives, but with
current prices of SSDs I would really advise that you choose that road
instead.

Regards,
Toke Eskildsen, State and University Library, Denmark