Posted to solr-user@lucene.apache.org by Damien Fontaine <df...@rosebud.fr> on 2011/01/24 09:03:02 UTC

Taxonomy in SOLR

Hi,

I am trying out Solr and I have one question. In the schema I set up,
there are 10 fields that always hold the same data (hierarchical taxonomies), but
with 4 million documents the disk space and indexing time must get big. I need these
fields for auto-complete. Is there another way to do this kind of operation?

Damien

Re: Taxonomy in SOLR

Posted by Em <ma...@yahoo.de>.
Just for illustration:

This is your original data:

doc1: hello world
doc2: hello daniem
doc3: hello pal

Now, Lucene produces something like this from the input:
hello: id_doc1,id_doc2,id_doc3
daniem: id_doc2
pal: id_doc3

Well, it's more complex, but enough for illustration.
As you can see, the representation of a document is completely different.
A document costs only a few bytes for a Lucene-internal id per word.

If words occur more than once per document AND you do not store
termVectors, Lucene just adds the number of occurrences per word per doc to
its index:

hello: id_doc1[1],id_doc2[1],id_doc3[1]
daniem: id_doc2[1]
pal: id_doc3[1]

Imagine what happens with longer texts, where stopwords in particular, or other
important words, occur more than once.

I would suggest starting with the Lucene wiki if you want to learn more
about Lucene.

Regards,
Em
-- 
View this message in context: http://lucene.472066.n3.nabble.com/Taxonomy-in-SOLR-tp2317955p2319920.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: Taxonomy in SOLR

Posted by Damien Fontaine <df...@rosebud.fr>.

Le 24/01/2011 13:10, Em a écrit :
> Hi Damien,
>
> ahm, the formula I wrote was no definitive guide, just some numbers I
> combined to visualize the amount of data - perhaps not even a complete
> formula.
>
> Well, when you can keep your taxonomy as indexed-only, you do not double the
> used disk space when you index two documents with the same taxonomy values.
So five documents, or 4 million, with the same taxonomy use the same disk
space as one?

> Lucene - and also Solr - works with an inverted index: this means each indexed
> term is mapped to the documents that contain it.
> So your index size will depend on the number of unique taxonomy terms and
> the pointers from those terms to the documents. That's it. Usually the
> disk space used by an index is much smaller than the size of the original data.
>
> I hope what I tried to explain was easy to understand.
Thanks, that's very helpful!

How can I find more explanation of the internal structure of the Lucene
index?

Damien

Re: Taxonomy in SOLR

Posted by Em <ma...@yahoo.de>.
Hi Damien,

ahm, the formula I wrote was no definitive guide, just some numbers I
combined to visualize the amount of data - perhaps not even a complete
formula.

Well, when you can keep your taxonomy as indexed-only, you do not double the
used disk space when you index two documents with the same taxonomy values.

Lucene - and also Solr - works with an inverted index: this means each indexed
term is mapped to the documents that contain it.
So your index size will depend on the number of unique taxonomy terms and
the pointers from those terms to the documents. That's it. Usually the
disk space used by an index is much smaller than the size of the original data.
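
If you want to see this for your own index, the Luke request handler that ships with
Solr can show you the terms per field - a small sketch (untested, assuming the default
/admin/luke mapping and your english_taxon_hierarchy field):

http://localhost:8983/solr/admin/luke?fl=english_taxon_hierarchy&numTerms=10

That shows term statistics and the top indexed terms for the field, which is exactly
the "unique taxonomy terms" part of the estimate above.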

I hope what I tried to explain was easy to understand.

Regards
-- 
View this message in context: http://lucene.472066.n3.nabble.com/Taxonomy-in-SOLR-tp2317955p2319202.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: Taxonomy in SOLR

Posted by Damien Fontaine <df...@rosebud.fr>.
Thanks Em,

How can I estimate the indexing time, update time, and disk space used by one
taxonomy?

Le 24/01/2011 10:58, Em a écrit :
> 100 Entries per taxon?
> Well, with Solr you got 100 taxon-entries * 4mio docs * 10 taxons.
> If your indexed taxon-versions are looking okay, you could leave out the
> DB-overhead and could do everything in Solr.
>
>


Re: Taxonomy in SOLR

Posted by Em <ma...@yahoo.de>.
100 entries per taxon?
Well, with Solr you get 100 taxon entries * 4 million docs * 10 taxons.
If your indexed taxon versions look okay, you could leave out the
DB overhead and do everything in Solr.


-- 
View this message in context: http://lucene.472066.n3.nabble.com/Taxonomy-in-SOLR-tp2317955p2318550.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: Taxonomy in SOLR

Posted by Damien Fontaine <df...@rosebud.fr>.
Thanks Em and Erick for your answers,

Now I better understand how Solr works.

Damien

Le 24/01/2011 16:23, Erick Erickson a écrit :
> First, the redundancy is certainly there, but that's what Solr does, handles
> large
> amounts of data. 4 million documents is actually a pretty small corpus by
> Solr
> standards, so you may well be able to do exactly what you propose with
> acceptable performance/size. I'd advise just trying it with, say, 200,000
> docs.
> Why 200K? because index growth is non-linear with the first bunch of
> documents
> taking up more space than the second. So index 100K, examine your indexes
> and index 100K more. Now use the delta to extrapolate to 4M.
>
> You don't need to store the taxonomy in each doc for auto-complete, you can
> get your auto-completion from a different index. Or you can index your
> taxonomies
> in a "special" document in Solr and query the (unique) field in that
> document for
> autocomplete.
>
> For faceting, you do need taxonomies. But remember that the nature of the
> inverted index is that unique terms are only stored once, and the document
> ID for each document that that term appears in is recorded. So if you have
> 3/europe/germany/berlin stored in 1M documents, your index space is really
> <string length + overhead>  +<space for 1M ids>.
>
> Best
> Erick
>
> On Mon, Jan 24, 2011 at 4:53 AM, Damien Fontaine<df...@rosebud.fr>wrote:
>
>> Yes, i am not obliged to store taxonomies.
>>
>> My taxonomies are type of
>>
>> english_taxon_label = Berlin
>> english_taxon_type = location
>> english_taxon_hierarchy = 0/world
>>                                               1/world/europe
>>                                               2/world/europe/germany
>>                                               3/world/europe/germany/berlin
>>
>> I need *_taxon_hierarchy to faceting and label to auto complete.
>>
>> With an RDBMS, I have at most 100 entries for one taxonomy, but with Solr and 4
>> million documents the redundancy is huge, no?
>>
>> And i have 10 different taxonomies per document ....
>>
>> Damien
>>
>> Le 24/01/2011 10:30, Em a écrit :
>>
>>   Hi Damien,
>>> why are you storing the taxonomies?
>>> When it comes to faceting, it only depends on indexed values. If there is
>>> a
>>> meaningful difference between the indexed and the stored value, I would
>>> prefer to use an RDBMs or something like that to reduce redundancy.
>>>
>>> Does this help?
>>>
>>> Regards
>>>
>>


Re: Taxonomy in SOLR

Posted by Em <ma...@yahoo.de>.
Thank you for the advice, Erick!

I will take a look at extending the StandardRequestHandler for such
use cases.


Erick Erickson wrote:
> 
> I wasn't thinking about this for adding information to the *request*.
> Rather, in this
> case the autocomplete uses an Ajax call that just uses the TermsComponent
> to get the autocomplete data and display it. This is just textual, so
> adding
> it to the
> request is client-side magic.
> 
> If you want your app to have access to the meta-data for other purposes,
> you'd
> just query and cache it from the app. You could use that to build up the
> links
> you embed in the page for new queries if you chose, no custom handlers
> necessary.
> 
> Otherwise, I guess you'd create a custom request handler, that seems like
> a
> reasonable place.
> 
> Best
> Erick
> 
> On Mon, Jan 24, 2011 at 11:03 AM, Em <ma...@yahoo.de> wrote:
> 
>>
>> Hi Erick,
>>
>> in some usecases I really think that your suggestion with some
>> unique-documents for meta-information is a good approach to solve some
>> issues.
>> However there is a hurdle for me and maybe you can help me to clear it:
>>
>> What is the best way to get such meta-data?
>> I see three possible approaches:
>> 1st: get it in another request
>> 2nd: get it with a requestHandler
>> 3rd: get it with a searchComponent
>>
>> I think the 2nd and 3rd are the cleanest ways.
>> But to make a decision between them I run into two problems:
>> RequestHandler: Should I extend the StandardRequestHandler to do what I
>> need? If so, I could just query my index for the needed information and
>> add
>> it to the request before I pass it up the SearchComponents.
>>
>> SearchComponent: The problem with the SearchComponent is the distributed
>> thing and how to test it. However, if this would be the cleanest way to
>> go,
>> one should go it.
>>
>> What would you do, if you want to add some meta-information to your
>> request
>> that was not given by the user?
>>
>> Regards,
>> Em
>>
>>
>> Erick Erickson wrote:
>> >
>> > First, the redundancy is certainly there, but that's what Solr does,
>> > handles
>> > large
>> > amounts of data. 4 million documents is actually a pretty small corpus
>> by
>> > Solr
>> > standards, so you may well be able to do exactly what you propose with
>> > acceptable performance/size. I'd advise just trying it with, say,
>> 200,000
>> > docs.
>> > Why 200K? because index growth is non-linear with the first bunch of
>> > documents
>> > taking up more space than the second. So index 100K, examine your
>> indexes
>> > and index 100K more. Now use the delta to extrapolate to 4M.
>> >
>> > You don't need to store the taxonomy in each doc for auto-complete, you
>> > can
>> > get your auto-completion from a different index. Or you can index your
>> > taxonomies
>> > in a "special" document in Solr and query the (unique) field in that
>> > document for
>> > autocomplete.
>> >
>> > For faceting, you do need taxonomies. But remember that the nature of
>> the
>> > inverted index is that unique terms are only stored once, and the
>> document
>> > ID for each document that that term appears in is recorded. So if you
>> have
>> > 3/europe/germany/berlin stored in 1M documents, your index space is
>> really
>> > <string length + overhead> + <space for 1M ids>.
>> >
>> > Best
>> > Erick
>> >
>> > On Mon, Jan 24, 2011 at 4:53 AM, Damien Fontaine
>> > <df...@rosebud.fr>wrote:
>> >
>> >> Yes, i am not obliged to store taxonomies.
>> >>
>> >> My taxonomies are type of
>> >>
>> >> english_taxon_label = Berlin
>> >> english_taxon_type = location
>> >> english_taxon_hierarchy = 0/world
>> >>                                              1/world/europe
>> >>                                              2/world/europe/germany
>> >>
>> >> 3/world/europe/germany/berlin
>> >>
>> >> I need *_taxon_hierarchy to faceting and label to auto complete.
>> >>
>> >> With an RDBMS, I have at most 100 entries for one taxonomy, but with Solr and 4
>> >> million documents the redundancy is huge, no?
>> >>
>> >> And i have 10 different taxonomies per document ....
>> >>
>> >> Damien
>> >>
>> >> Le 24/01/2011 10:30, Em a écrit :
>> >>
>> >>  Hi Damien,
>> >>>
>> >>> why are you storing the taxonomies?
>> >>> When it comes to faceting, it only depends on indexed values. If
>> there
>> >>> is
>> >>> a
>> >>> meaningful difference between the indexed and the stored value, I
>> would
>> >>> prefer to use an RDBMs or something like that to reduce redundancy.
>> >>>
>> >>> Does this help?
>> >>>
>> >>> Regards
>> >>>
>> >>
>> >>
>> >
>> >
>>
>> --
>> View this message in context:
>> http://lucene.472066.n3.nabble.com/Taxonomy-in-SOLR-tp2317955p2320666.html
>> Sent from the Solr - User mailing list archive at Nabble.com.
>>
> 
> 

-- 
View this message in context: http://lucene.472066.n3.nabble.com/Taxonomy-in-SOLR-tp2317955p2321340.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: Taxonomy in SOLR

Posted by Erick Erickson <er...@gmail.com>.
I wasn't thinking about this for adding information to the *request*.
Rather, in this
case the autocomplete uses an Ajax call that just uses the TermsComponent
to get the autocomplete data and display it. This is just textual, so adding
it to the
request is client-side magic.
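
For example, a minimal TermsComponent setup for that Ajax call (untested sketch,
assuming Solr 1.4+ and the english_taxon_label field from Damien's schema) - in
solrconfig.xml:

<searchComponent name="termsComponent" class="org.apache.solr.handler.component.TermsComponent"/>

<requestHandler name="/terms" class="org.apache.solr.handler.component.SearchHandler">
  <lst name="defaults">
    <bool name="terms">true</bool>
  </lst>
  <arr name="components">
    <str>termsComponent</str>
  </arr>
</requestHandler>

and then the client calls something like:

http://localhost:8983/solr/terms?terms.fl=english_taxon_label&terms.prefix=Ber&terms.limit=10

Note that on a plain string field the prefix match is case-sensitive, so "Ber" matches
"Berlin" but "ber" would not.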

If you want your app to have access to the meta-data for other purposes,
you'd
just query and cache it from the app. You could use that to build up the
links
you embed in the page for new queries if you chose, no custom handlers
necessary.

Otherwise, I guess you'd create a custom request handler, that seems like a
reasonable place.

Best
Erick

On Mon, Jan 24, 2011 at 11:03 AM, Em <ma...@yahoo.de> wrote:

>
> Hi Erick,
>
> in some usecases I really think that your suggestion with some
> unique-documents for meta-information is a good approach to solve some
> issues.
> However there is a hurdle for me and maybe you can help me to clear it:
>
> What is the best way to get such meta-data?
> I see three possible approaches:
> 1st: get it in another request
> 2nd: get it with a requestHandler
> 3rd: get it with a searchComponent
>
> I think the 2nd and 3rd are the cleanest ways.
> But to make a decision between them I run into two problems:
> RequestHandler: Should I extend the StandardRequestHandler to do what I
> need? If so, I could just query my index for the needed information and add
> it to the request before I pass it up the SearchComponents.
>
> SearchComponent: The problem with the SearchComponent is the distributed
> thing and how to test it. However, if this would be the cleanest way to go,
> one should go it.
>
> What would you do, if you want to add some meta-information to your request
> that was not given by the user?
>
> Regards,
> Em
>
>
> Erick Erickson wrote:
> >
> > First, the redundancy is certainly there, but that's what Solr does,
> > handles
> > large
> > amounts of data. 4 million documents is actually a pretty small corpus by
> > Solr
> > standards, so you may well be able to do exactly what you propose with
> > acceptable performance/size. I'd advise just trying it with, say, 200,000
> > docs.
> > Why 200K? because index growth is non-linear with the first bunch of
> > documents
> > taking up more space than the second. So index 100K, examine your indexes
> > and index 100K more. Now use the delta to extrapolate to 4M.
> >
> > You don't need to store the taxonomy in each doc for auto-complete, you
> > can
> > get your auto-completion from a different index. Or you can index your
> > taxonomies
> > in a "special" document in Solr and query the (unique) field in that
> > document for
> > autocomplete.
> >
> > For faceting, you do need taxonomies. But remember that the nature of the
> > inverted index is that unique terms are only stored once, and the
> document
> > ID for each document that that term appears in is recorded. So if you
> have
> > 3/europe/germany/berlin stored in 1M documents, your index space is
> really
> > <string length + overhead> + <space for 1M ids>.
> >
> > Best
> > Erick
> >
> > On Mon, Jan 24, 2011 at 4:53 AM, Damien Fontaine
> > <df...@rosebud.fr>wrote:
> >
> >> Yes, i am not obliged to store taxonomies.
> >>
> >> My taxonomies are type of
> >>
> >> english_taxon_label = Berlin
> >> english_taxon_type = location
> >> english_taxon_hierarchy = 0/world
> >>                                              1/world/europe
> >>                                              2/world/europe/germany
> >>
> >> 3/world/europe/germany/berlin
> >>
> >> I need *_taxon_hierarchy to faceting and label to auto complete.
> >>
> >> With an RDBMS, I have at most 100 entries for one taxonomy, but with Solr and 4
> >> million documents the redundancy is huge, no?
> >>
> >> And i have 10 different taxonomies per document ....
> >>
> >> Damien
> >>
> >> Le 24/01/2011 10:30, Em a écrit :
> >>
> >>  Hi Damien,
> >>>
> >>> why are you storing the taxonomies?
> >>> When it comes to faceting, it only depends on indexed values. If there
> >>> is
> >>> a
> >>> meaningful difference between the indexed and the stored value, I would
> >>> prefer to use an RDBMs or something like that to reduce redundancy.
> >>>
> >>> Does this help?
> >>>
> >>> Regards
> >>>
> >>
> >>
> >
> >
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Taxonomy-in-SOLR-tp2317955p2320666.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>

Re: Taxonomy in SOLR

Posted by Em <ma...@yahoo.de>.
Hi Erick,

in some use cases I really think that your suggestion of using special
documents for meta-information is a good approach to solve some
issues.
However there is a hurdle for me and maybe you can help me to clear it:

What is the best way to get such meta-data?
I see three possible approaches:
1st: get it in another request
2nd: get it with a requestHandler
3rd: get it with a searchComponent

I think the 2nd and 3rd are the cleanest ways.
But to make a decision between them I run into two problems:
RequestHandler: Should I extend the StandardRequestHandler to do what I
need? If so, I could just query my index for the needed information and add
it to the request before I pass it up the SearchComponents.

SearchComponent: The problem with the SearchComponent is the distributed
case and how to test it. However, if this is the cleanest way to go,
one should take it.
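
Just to make the SearchComponent option concrete, the solrconfig.xml wiring would look
roughly like this (only a sketch - the component class name here is made up):

<searchComponent name="metaData" class="com.example.MetaDataComponent"/>

<requestHandler name="/metasearch" class="org.apache.solr.handler.component.SearchHandler">
  <arr name="first-components">
    <str>metaData</str>
  </arr>
</requestHandler>

With first-components the custom component runs before the standard query and facet
components, so it could fetch the meta-information and attach it to the request or
the response there.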

What would you do, if you want to add some meta-information to your request
that was not given by the user?

Regards,
Em


Erick Erickson wrote:
> 
> First, the redundancy is certainly there, but that's what Solr does,
> handles
> large
> amounts of data. 4 million documents is actually a pretty small corpus by
> Solr
> standards, so you may well be able to do exactly what you propose with
> acceptable performance/size. I'd advise just trying it with, say, 200,000
> docs.
> Why 200K? because index growth is non-linear with the first bunch of
> documents
> taking up more space than the second. So index 100K, examine your indexes
> and index 100K more. Now use the delta to extrapolate to 4M.
> 
> You don't need to store the taxonomy in each doc for auto-complete, you
> can
> get your auto-completion from a different index. Or you can index your
> taxonomies
> in a "special" document in Solr and query the (unique) field in that
> document for
> autocomplete.
> 
> For faceting, you do need taxonomies. But remember that the nature of the
> inverted index is that unique terms are only stored once, and the document
> ID for each document that that term appears in is recorded. So if you have
> 3/europe/germany/berlin stored in 1M documents, your index space is really
> <string length + overhead> + <space for 1M ids>.
> 
> Best
> Erick
> 
> On Mon, Jan 24, 2011 at 4:53 AM, Damien Fontaine
> <df...@rosebud.fr>wrote:
> 
>> Yes, i am not obliged to store taxonomies.
>>
>> My taxonomies are type of
>>
>> english_taxon_label = Berlin
>> english_taxon_type = location
>> english_taxon_hierarchy = 0/world
>>                                              1/world/europe
>>                                              2/world/europe/germany
>>                                             
>> 3/world/europe/germany/berlin
>>
>> I need *_taxon_hierarchy to faceting and label to auto complete.
>>
>> With an RDBMS, I have at most 100 entries for one taxonomy, but with Solr and 4
>> million documents the redundancy is huge, no?
>>
>> And i have 10 different taxonomies per document ....
>>
>> Damien
>>
>> Le 24/01/2011 10:30, Em a écrit :
>>
>>  Hi Damien,
>>>
>>> why are you storing the taxonomies?
>>> When it comes to faceting, it only depends on indexed values. If there
>>> is
>>> a
>>> meaningful difference between the indexed and the stored value, I would
>>> prefer to use an RDBMs or something like that to reduce redundancy.
>>>
>>> Does this help?
>>>
>>> Regards
>>>
>>
>>
> 
> 

-- 
View this message in context: http://lucene.472066.n3.nabble.com/Taxonomy-in-SOLR-tp2317955p2320666.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: Taxonomy in SOLR

Posted by Erick Erickson <er...@gmail.com>.
First, the redundancy is certainly there, but that's what Solr does: it handles
large amounts of data. 4 million documents is actually a pretty small corpus by
Solr standards, so you may well be able to do exactly what you propose with
acceptable performance/size. I'd advise just trying it with, say, 200,000 docs.
Why 200K? Because index growth is non-linear, with the first bunch of documents
taking up more space than the second. So index 100K, examine your indexes,
and index 100K more. Now use the delta to extrapolate to 4M.
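
Just to illustrate the method with invented numbers: if 100K docs gave you a 300 MB
index and 200K docs gave you 500 MB, the marginal cost is roughly 200 MB per 100K docs,
so 4M docs would come out around 500 MB + 38 * 200 MB, i.e. somewhere near 8 GB.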

You don't need to store the taxonomy in each doc for auto-complete, you can
get your auto-completion from a different index. Or you can index your
taxonomies
in a "special" document in Solr and query the (unique) field in that
document for
autocomplete.

For faceting, you do need taxonomies. But remember that the nature of the
inverted index is that unique terms are only stored once, and the document
ID for each document that that term appears in is recorded. So if you have
3/europe/germany/berlin stored in 1M documents, your index space is really
<string length + overhead> + <space for 1M ids>.
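
A sketch of the usual drill-down with that encoding (untested, using the
english_taxon_hierarchy field and the 0/world, 1/world/europe, ... values from earlier
in the thread):

http://localhost:8983/solr/select?q=*:*&rows=0&facet=true&facet.field=english_taxon_hierarchy&facet.prefix=2/world/europe/

This returns counts only for the depth-2 entries under world/europe (e.g.
2/world/europe/germany); the next click just moves the prefix down to
3/world/europe/germany/ for the following level.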

Best
Erick

On Mon, Jan 24, 2011 at 4:53 AM, Damien Fontaine <df...@rosebud.fr> wrote:

> Yes, i am not obliged to store taxonomies.
>
> My taxonomies are type of
>
> english_taxon_label = Berlin
> english_taxon_type = location
> english_taxon_hierarchy = 0/world
>                                              1/world/europe
>                                              2/world/europe/germany
>                                              3/world/europe/germany/berlin
>
> I need *_taxon_hierarchy to faceting and label to auto complete.
>
> With an RDBMS, I have at most 100 entries for one taxonomy, but with Solr and 4
> million documents the redundancy is huge, no?
>
> And i have 10 different taxonomies per document ....
>
> Damien
>
> Le 24/01/2011 10:30, Em a écrit :
>
>  Hi Damien,
>>
>> why are you storing the taxonomies?
>> When it comes to faceting, it only depends on indexed values. If there is
>> a
>> meaningful difference between the indexed and the stored value, I would
>> prefer to use an RDBMs or something like that to reduce redundancy.
>>
>> Does this help?
>>
>> Regards
>>
>
>

Re: Taxonomy in SOLR

Posted by Damien Fontaine <df...@rosebud.fr>.
Yes, I am not obliged to store the taxonomies.

My taxonomies look like this:

english_taxon_label = Berlin
english_taxon_type = location
english_taxon_hierarchy = 0/world
                                               1/world/europe
                                               2/world/europe/germany
                                               3/world/europe/germany/berlin

I need *_taxon_hierarchy for faceting and the label for auto-complete.

With an RDBMS, I have at most 100 entries for one taxonomy, but with Solr and 4
million documents the redundancy is huge, no?

And I have 10 different taxonomies per document...

Damien

Le 24/01/2011 10:30, Em a écrit :
> Hi Damien,
>
> why are you storing the taxonomies?
> When it comes to faceting, it only depends on indexed values. If there is a
> meaningful difference between the indexed and the stored value, I would
> prefer to use an RDBMs or something like that to reduce redundancy.
>
> Does this help?
>
> Regards


Re: Taxonomy in SOLR

Posted by Em <ma...@yahoo.de>.
Hi Damien,

why are you storing the taxonomies?
When it comes to faceting, it only depends on indexed values. If there is a
meaningful difference between the indexed and the stored value, I would
prefer to use an RDBMS or something like that to reduce redundancy.
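
To make that concrete: an indexed-only version of the hierarchy field from your schema
would just drop the stored copy, something like

<dynamicField name="*_taxon_hierarchy" type="string" indexed="true" stored="false" multiValued="true" />

Faceting only needs the indexed terms, so nothing is lost there; you just can no longer
read the value back out of the returned Solr document.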

Does this help?

Regards
-- 
View this message in context: http://lucene.472066.n3.nabble.com/Taxonomy-in-SOLR-tp2317955p2318363.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: Taxonomy in SOLR

Posted by Damien Fontaine <df...@rosebud.fr>.
My schema:

<field name="id" type="string" indexed="true" stored="true" required="true" />

<!-- Document -->
<field name="lead" type="string" indexed="true" stored="true" />
<field name="title" type="string" indexed="true" stored="true" required="true" />
<field name="text" type="string" indexed="true" stored="true" required="true" />

<!-- taxo -->
<dynamicField name="*_taxon_label" type="string" indexed="true" stored="true" />
<dynamicField name="*_taxon_type" type="string" indexed="true" stored="true" />
<dynamicField name="*_taxon_hierarchy" type="string" indexed="true" stored="true" multiValued="true" />

<field name="type" type="string" indexed="true" stored="true" required="true" />
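
An example document for this schema (made-up values, just to show how the dynamic
taxonomy fields are filled) would be posted to /update as:

<add>
  <doc>
    <field name="id">article-1</field>
    <field name="title">Sample title about Berlin</field>
    <field name="lead">Sample lead</field>
    <field name="text">Sample body text</field>
    <field name="type">article</field>
    <field name="english_taxon_label">Berlin</field>
    <field name="english_taxon_type">location</field>
    <field name="english_taxon_hierarchy">0/world</field>
    <field name="english_taxon_hierarchy">1/world/europe</field>
    <field name="english_taxon_hierarchy">2/world/europe/germany</field>
    <field name="english_taxon_hierarchy">3/world/europe/germany/berlin</field>
  </doc>
</add>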


Le 24/01/2011 09:56, Em a écrit :
> Hi Damien,
>
> can you provide a schema sample plus example-data?
> Since your information is really general, I think no one can give you a
> situation-specific advice.
>
> Regards


Re: Taxonomy in SOLR

Posted by Em <ma...@yahoo.de>.
Hi Damien,

can you provide a schema sample plus some example data?
Since your information is really general, I think no one can give you
situation-specific advice.

Regards
-- 
View this message in context: http://lucene.472066.n3.nabble.com/Taxonomy-in-SOLR-tp2317955p2318200.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: Taxonomy in SOLR

Posted by Jonathan Rochkind <ro...@jhu.edu>.
There aren't any great general-purpose, out-of-the-box ways to handle
hierarchical data in Solr.  Solr isn't an rdbms.

There may be some particular advice on how to set up a particular Solr
index to answer particular questions with regard to hierarchical data.

I saw a great point made recently comparing rdbms to NoSQL stores, which 
applied to Solr too even though Solr is NOT a "noSQL store".  In rdbms, 
you set up your schema thinking only about your _data_, and modelling 
your data as flexibly as possible. Then once you've done that, you can 
ask pretty much any well-specified question you want of your data, and 
get a correct and reasonably performant answer.

In Solr, on the other hand, we set up our schemas to answer particular 
questions. You have to first figure out what kinds of questions you will 
want to ask Solr, what kinds of queries you'll want to make, and then 
you can figure out how to structure your data to ask those questions.  
Some questions are actually very hard to set up Solr to answer -- in 
general Solr is about setting up your data so whatever question you have 
can be reduced to asking "is token X in field Y".
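
For the taxonomy case in this thread, for example, "which documents are tagged under
world/europe?" reduces to a query like (sketch, reusing the field names and values from
the earlier messages):

http://localhost:8983/solr/select?q=*:*&fq=english_taxon_hierarchy:"1/world/europe"

i.e. "is the token 1/world/europe in the field english_taxon_hierarchy".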

This can be especially tricky in cases where you want to use a single 
Solr index to answer multiple questions, where the questions are such 
that you really need to set up your data _differently_ to get Solr to 
optimally answer each question.

Solr is not a general purpose store like an rdbms, where you can set up 
your schema once in terms of your data and use it to answer nearly any 
conceivable well-specified question after that.  Instead, Solr does 
things that rdbms can't do quickly or can't do at all.  But you lose 
some things too.

On 1/24/2011 3:03 AM, Damien Fontaine wrote:
> Hi,
>
> I am trying out Solr and I have one question. In the schema I set up,
> there are 10 fields that always hold the same data (hierarchical taxonomies), but
> with 4 million documents the disk space and indexing time must get big. I need these
> fields for auto-complete. Is there another way to do this kind of operation?
>
> Damien
>