Posted to solr-user@lucene.apache.org by Zahoor Mohamed <za...@indix.com> on 2013/10/04 10:51:39 UTC

Size of ID field

Hi

Does the size of the ID field matter in terms of memory usage and query
performance?

I.e., will Solr use more memory if you use a URL string as the ID field
instead of an int value?

./zahoor

Re: Size of ID field

Posted by Zahoor Mohamed <za...@indix.com>.
Thanks

Re: Size of ID field

Posted by Jack Krupansky <ja...@basetechnology.com>.
It all depends. I mean, if you have 20 million URLs averaging 40 characters 
each, that's 800 MB (at roughly one byte per character), not a big deal at 
all, but if you have 20 billion URLs that would take up 800 GB, which might 
be a big deal. But if you shard those 20 billion documents into 10 shards, 
80 GB per shard may or may not be a big deal, all depending on your hardware 
and expectations, not to mention all the rest of the fields in your documents.
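
A rough sketch of that arithmetic in Python, in case it helps. It assumes one byte per character (plain ASCII URLs), decimal units, and counts only the raw ID values, not any index overhead. The function name raw_id_bytes is made up for illustration; it is not part of Solr.

```python
# Back-of-the-envelope sizing for ID values only.
# Assumptions: 1 byte per character, decimal MB/GB, no index overhead.

MB = 10**6
GB = 10**9


def raw_id_bytes(num_docs: int, avg_bytes_per_id: int) -> int:
    """Raw bytes needed to hold one ID value per document."""
    return num_docs * avg_bytes_per_id


# 20 million URL IDs at ~40 bytes each.
small_urls = raw_id_bytes(20_000_000, 40)

# The same 20 million documents with 4-byte int IDs, for comparison.
small_ints = raw_id_bytes(20_000_000, 4)

# 20 billion URL IDs, in total and split evenly over 10 shards.
large_urls = raw_id_bytes(20_000_000_000, 40)
per_shard = large_urls // 10

print(f"20M URL ids: {small_urls / MB:.0f} MB")
print(f"20M int ids: {small_ints / MB:.0f} MB")
print(f"20B URL ids: {large_urls / GB:.0f} GB total")
print(f"20B URL ids: {per_shard / GB:.0f} GB per shard (10 shards)")
```

The real footprint depends on how the field is indexed, stored, and cached, so treat these numbers as a floor for the raw values rather than a prediction of heap usage.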

Sure, a string longer than 4 characters takes up more space than a 4-byte 
int. Is that what you are asking?

But, generally, the ID should be a string - there are plenty of places in 
Solr which only support string IDs.

-- Jack Krupansky


Re: Size of ID field

Posted by Dmitry Kan <so...@gmail.com>.
Using arbitrary strings affects, at the very least, the traffic between the
shard(s) and a querying client, or between the shards and a frontend Solr
instance. We have actually hit such an issue, described here:
https://issues.apache.org/jira/browse/SOLR-4903, which triggered the
suggestion for ID compaction:
https://issues.apache.org/jira/browse/SOLR-4904
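
To give a feel for the traffic effect, here is a minimal sketch that compares the size of a list of IDs when the IDs are URL strings versus ints. JSON is used only as a stand-in serialization; it is not Solr's actual wire format. The example.com URLs and the helper name id_list_payload_bytes are hypothetical.

```python
import json


def id_list_payload_bytes(ids) -> int:
    """Bytes needed to send the IDs as a UTF-8 encoded JSON array."""
    return len(json.dumps(list(ids)).encode("utf-8"))


n = 1000

# Hypothetical URL-style IDs versus plain integer IDs for the same documents.
url_ids = [f"http://example.com/products/item-{i:06d}" for i in range(n)]
int_ids = list(range(n))

url_bytes = id_list_payload_bytes(url_ids)
int_bytes = id_list_payload_bytes(int_ids)

print(f"{n} URL ids: {url_bytes} bytes")
print(f"{n} int ids: {int_bytes} bytes")
print(f"ratio: {url_bytes / int_bytes:.1f}x")
```

The exact ratio depends on the ID lengths and the serialization, but the payload grows linearly with ID length whenever full IDs are passed around.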

Lucene uses internal doc IDs, so my wild guess is that it shouldn't matter
which IDs are used at the application level, but I would love to hear more
on this topic from someone who knows for sure.

Dmitry

