Posted to solr-user@lucene.apache.org by Chien Nguyen <ch...@gmail.com> on 2016/11/03 08:49:07 UTC

Apache Solr Question

Hi everyone! 
I'm a newbie to Apache Solr. I've read some documents about it, but I
can't answer some questions.
1. How many documents can Solr search at one time?
2. Can Solr index media data?
3. What is the maximum size of a document that Solr can index?
Can you help me and explain these? Please! It's important to me.
Thank you so much!




--
View this message in context: http://lucene.472066.n3.nabble.com/Apache-Solr-Question-tp4304308.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: Apache Solr Question

Posted by Rick Leir <rl...@leirtech.com>.

On November 3, 2016 4:49:07 AM EDT, Chien Nguyen <ch...@gmail.com> wrote:
>Hi everyone!
>I'm a newbie to Apache Solr.

Welcome!

>I've read some documents about it, but I
>can't answer some questions.
>1. How many documents can Solr search at one time?

I would like to say unlimited, but it depends on your hardware. Solr can index huge numbers of documents.

>2. Can Solr index media data?

Metadata? Yes.

>3. What is the maximum size of a document that Solr can index?

Again, huge. You could read some intros and blogs on Solr, then come back and talk more.

>Can you help me and explain it for me??? Please! It's important to me.
>Thank you so much!

RE: Apache Solr Question

Posted by "Davis, Daniel (NIH/NLM) [C]" <da...@nih.gov>.
Case in point - https://collections.nlm.nih.gov/ has one index (core) for documents and another index (core) for the pages within those documents.
I think LOC (Library of Congress) does something similar, judging from a presentation they gave at Lucene/DC Exchange.
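
To illustrate the two-index idea, here is a minimal sketch that uses plain Python structures as stand-ins for the two cores. It is not how the NLM site is actually built; every field name, title, and text value is made up for the example, and the substring match stands in for a real query. It shows only how a hit in the page index can be resolved back to its parent document.

```python
from collections import defaultdict

# In-memory stand-ins for the two cores. All ids, field names and
# contents are hypothetical and exist only for this illustration.
documents_index = {
    "doc1": {"id": "doc1", "title": "Anatomy Atlas"},
    "doc2": {"id": "doc2", "title": "Field Surgery Manual"},
}
pages_index = [
    {"id": "doc1_p1", "doc_id": "doc1", "page": 1, "text": "bones of the hand"},
    {"id": "doc1_p2", "doc_id": "doc1", "page": 2, "text": "muscles of the arm"},
    {"id": "doc2_p1", "doc_id": "doc2", "page": 1, "text": "treating a broken hand"},
]


def search_pages(term: str) -> dict[str, list[int]]:
    """Return {document title: [matching page numbers]} for a naive substring match."""
    hits = defaultdict(list)
    for page in pages_index:
        if term.lower() in page["text"].lower():
            # Resolve the page hit to its parent record in the document index.
            title = documents_index[page["doc_id"]]["title"]
            hits[title].append(page["page"])
    return dict(hits)


print(search_pages("hand"))
```

The page records carry a `doc_id` so that a match can be reported as "this page of that document" rather than only as a whole-document hit.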

-----Original Message-----
From: Doug Turnbull [mailto:dturnbull@opensourceconnections.com] 
Sent: Thursday, November 03, 2016 10:26 AM
To: solr-user@lucene.apache.org
Subject: Re: Apache Solr Question

For general search use cases, it's generally not a good idea to index giant documents. A relevance score for an entire book is generally less meaningful than one for a chapter or section. Those subdivisions are often much more useful to a user: they show not just that a book is relevant, but that a particular section of the book is relevant to the query.

Just my 2 cents
-Doug


Re: Apache Solr Question

Posted by Doug Turnbull <dt...@opensourceconnections.com>.
For general search use cases, it's generally not a good idea to index giant
documents. A relevance score for an entire book is generally less
meaningful than one for a chapter or section. Those subdivisions are often
much more useful to a user: they show not just that a book is relevant, but
that a particular section of the book is relevant to the query.
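
As a rough sketch of what breaking a book up could look like before indexing, the function below splits a text into one record per chapter. The heading convention ("Chapter <number>" at the start of a line) and all field names are assumptions made for the example, not anything Solr requires.

```python
import re


def split_into_sections(book_id: str, title: str, text: str) -> list[dict]:
    """Return one dict per chapter, each small enough to score on its own."""
    # Hypothetical heading convention: a line that starts with "Chapter <number>".
    # The capturing group makes re.split keep the headings in the result:
    # [preamble, heading1, body1, heading2, body2, ...]
    parts = re.split(r"(?m)^(Chapter \d+.*)$", text)
    docs = []
    for i in range(1, len(parts) - 1, 2):
        heading, body = parts[i].strip(), parts[i + 1].strip()
        docs.append({
            "id": f"{book_id}_s{len(docs) + 1}",  # unique id per section
            "book_id": book_id,                    # link back to the whole book
            "book_title": title,
            "section_title": heading,
            "text": body,
        })
    return docs
```

Each record keeps `book_id` and `book_title`, so a result can still be shown as "this section of that book".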

Just my 2 cents
-Doug


Re: Apache Solr Question

Posted by Erick Erickson <er...@gmail.com>.
bq: I have encountered someone who has a collection with five billion
documents in it...

I know of installations many times that size. Admittedly, when you start
getting into the hundreds of billions you must plan carefully.

Erick


Re: Apache Solr Question

Posted by Chien Nguyen <ch...@gmail.com>.
Great! Thank you so much. ^^




Re: Apache Solr Question

Posted by Susheel Kumar <su...@gmail.com>.
For media such as images, there is the LIRE Solr plugin. I have used it
in the past, and it may meet your requirement. See
http://www.lire-project.net/

Thanks,
Susheel


Re: Apache Solr Question

Posted by Shawn Heisey <ap...@elyograg.org>.
On 11/3/2016 2:49 AM, Chien Nguyen wrote:
> Hi everyone! I'm a newbie to Apache Solr. I've read some
> documents about it, but I can't answer some questions.

Second reply, so I'm aiming for more detail. 

> 1. How many documents can Solr search at one time?

A *single* Solr index has Lucene's limitation of slightly more than 2
billion documents.  This is part of the problem solved by SolrCloud.  By
throwing multiple machines/shards at the problem, there is effectively
no limit to the size of a SolrCloud collection.  I have encountered
someone who has a collection with five billion documents in it.

That 2 billion document limit I mentioned, which is Java's
Integer.MAX_VALUE, is the ONLY hard limit that I know of in the
software, and only applies when the index is not sharded.
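
For a sense of the arithmetic, here is a small sketch that uses Java's Integer.MAX_VALUE (2**31 - 1) as the per-index ceiling and works out the smallest shard count for a given collection size. The function name is made up for the example, it assumes documents are spread evenly, and it gives only a theoretical lower bound; a real deployment would likely size shards far below this ceiling.

```python
# Java's Integer.MAX_VALUE, the per-index ceiling mentioned above.
JAVA_INT_MAX = 2**31 - 1  # 2,147,483,647


def min_shards(total_docs: int, per_shard_limit: int = JAVA_INT_MAX) -> int:
    """Smallest shard count that keeps every shard at or under the limit.

    Assumes an even spread of documents across shards (hypothetical helper).
    """
    if total_docs < 0:
        raise ValueError("total_docs must be non-negative")
    # Ceiling division without floating point; always at least one shard.
    return max(1, -(-total_docs // per_shard_limit))


print(min_shards(5_000_000_000))  # the five-billion-document collection
```

By this count, the five-billion-document collection would need at least three shards.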

> 2. Can Solr index media data?

I have no idea what you meant here, but if you mean metadata, Solr most
likely can handle it.  If you meant actual media, like an image, I
believe there is a binary field type that you can even store a full
source document in, but that is not normally the way Solr is used, and I
don't recommend it.
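
If you did want to try the binary field route, my understanding is that the raw bytes are sent base64-encoded. The sketch below only builds a JSON payload; the field name `thumbnail_bin` is hypothetical and would have to be defined with a binary field type in your schema. Nothing here talks to a Solr server.

```python
import base64
import json


def make_binary_doc(doc_id: str, raw: bytes, field: str = "thumbnail_bin") -> str:
    """Build a JSON update payload with the raw bytes base64-encoded.

    `field` is a hypothetical name; it is assumed to be declared as a
    binary field in the schema.
    """
    encoded = base64.b64encode(raw).decode("ascii")
    # A JSON array of documents, the shape accepted by Solr's JSON update handler.
    return json.dumps([{"id": doc_id, field: encoded}])


# Example with a few placeholder bytes instead of a real image file.
print(make_binary_doc("img1", b"\x89PNG"))
```

Base64 inflates the payload by about a third, which is one more reason to keep large media outside the index and store only a reference to it.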

> 3. What is the maximum size of a document that Solr can index?

I don't think there is a limit.  I think there are some limits on the
number and size of individual terms, but not on the total size of a
document.  If documents get particularly large and numerous, performance
might suffer, but I am not aware of any total size limitations.
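
On the term size limit: to the best of my knowledge, Lucene rejects any single indexed term longer than 32766 bytes once encoded as UTF-8 (IndexWriter.MAX_TERM_LENGTH). A minimal pre-check along those lines, with a hypothetical helper name, could look like this:

```python
# Lucene's per-term ceiling in UTF-8 bytes (IndexWriter.MAX_TERM_LENGTH),
# stated here from memory rather than read from a running system.
MAX_TERM_BYTES = 32766


def oversized_terms(tokens: list[str]) -> list[str]:
    """Return the tokens whose UTF-8 encoding exceeds the per-term limit."""
    return [t for t in tokens if len(t.encode("utf-8")) > MAX_TERM_BYTES]
```

The limit is counted in bytes, not characters, so a token of non-ASCII characters can exceed it with far fewer than 32766 characters. It mostly matters for untokenized string fields holding very long values; tokenized text rarely produces a single term that large.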

Thanks,
Shawn