Posted to solr-user@lucene.apache.org by "Olson, Ron" <RO...@lbpc.com> on 2011/09/19 18:12:47 UTC

Two unrelated questions

Hi all-

I'm not sure if I should break this out into two separate questions to the list for searching purposes, or if one is more acceptable (don't want to flood).

I have two (hopefully) straightforward questions:

1. Is it possible to expose the unique ID of a document to a DIH query? I ask because I use the unique ID of the row in the table as the unique ID of the Lucene document, but I've noticed that the count of documents in the index doesn't match the count of rows in the table; I'd like to add the missing rows and was hoping to avoid writing a custom SolrJ app to do it.

2. Is there any limit to the number of conditions in a Boolean search? We're working on a new project where the user can choose either, for example, "Ford Vehicles", in which case I can simply search for "Ford", but if the user chooses specific makes and models, then I have to say something like "Crown Vic OR Focus OR Taurus OR F-150", etc., where they could theoretically choose every model of Ford ever made except one. This could lead to a *very* large query, and I was worried both about whether it is even possible and about the impact on performance.


Thanks, and I apologize if this really should be two separate messages.

Ron

DISCLAIMER: This electronic message, including any attachments, files or documents, is intended only for the addressee and may contain CONFIDENTIAL, PROPRIETARY or LEGALLY PRIVILEGED information.  If you are not the intended recipient, you are hereby notified that any use, disclosure, copying or distribution of this message or any of the information included in or with it is  unauthorized and strictly prohibited.  If you have received this message in error, please notify the sender immediately by reply e-mail and permanently delete and destroy this message and its attachments, along with any copies thereof. This message does not create any contractual obligation on behalf of the sender or Law Bulletin Publishing Company.
Thank you.

Re: Two unrelated questions

Posted by "tamanjit.bindra@yahoo.co.in" <ta...@yahoo.co.in>.
For *1* I have faced similar issues, and have realized that it has more
to do with the data I am trying to index. In some cases, even when I run a
full-import with DIH, unless it's a flat table that I am trying to index,
there are often issues at the data end when I use joins and then index
the data.

I am not too sure whether you are joining two tables. If not, I would suggest that
you re-check your data and then re-index using full-import.

--
View this message in context: http://lucene.472066.n3.nabble.com/Two-unrelated-questions-tp3348991p3357720.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: Two unrelated questions

Posted by Rob Casson <ro...@gmail.com>.
for #1, i don't use DIH, but is there any possibility of that column
having duplicate keys, with subsequent docs replacing existing ones?

and for #2, for some cases you could use a negative filterquery:

     http://wiki.apache.org/solr/SimpleFacetParameters#Retrieve_docs_with_facets_missing

so instead of that "fq=-facetField:[* TO *]", something like
"fq=-car_make:Taurus".  picking "negatives" might even make the UI a
bit easier.
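
A rough sketch of that idea in Python: when the user has picked nearly everything, send the handful of exclusions as negative filter queries instead of a long OR list. The field name `car_model`, the helper name, and the sample values are hypothetical, and the sketch assumes the full list of models is known to the client.

```python
def exclusion_filters(field, all_values, selected):
    """Return one negative fq string per value the user did NOT select."""
    chosen = set(selected)
    excluded = [v for v in all_values if v not in chosen]
    # Quote each value so names containing spaces stay a single phrase.
    # Assumes the values themselves contain no double quotes or backslashes.
    return [f'-{field}:"{v}"' for v in excluded]


# Hypothetical data: the user picked every model except one.
all_models = ["Crown Vic", "Focus", "Taurus", "F-150"]
selected = ["Crown Vic", "Focus", "F-150"]
fqs = exclusion_filters("car_model", all_models, selected)
print(fqs)  # ['-car_model:"Taurus"']
```

Each returned string would go into its own fq parameter next to the main query, so the request stays short no matter how many models were selected.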

anyway, just some thoughts.  cheers,
rob

On Wed, Sep 21, 2011 at 5:17 PM, Olson, Ron <RO...@lbpc.com> wrote:
> Thanks for the reply. As far as #1, my table that I'm indexing via DIH has a PK field, generated by a sequence, so there are records with ID of 1, 2, 3, etc. That same id is the one I use in my unique id field in the document (<uniqueKey>ID</uniqueKey>).
>
> I've noticed that the table has, say, 10 rows, but my index only has 8. I don't know why that is, but I'd like to figure out which records are missing and add them (and hopefully understand why they weren't added in the first place). I was just wondering if there was some way to compare the two as part of a SQL query, but on reflection, it does seem like an absurd request, so I apologize; I think what I'll have to do is write a SolrJ program that gets every ID in the table, searches for that ID in the index, and adds the ones that are missing.
>
> Regarding the second item, yes, it's crazy but I'm not sure what to do; there really are that many options and some searches will be extremely specific, yet broad enough in terms for this to be a problem.
>
> -----Original Message-----
> From: Erick Erickson [mailto:erickerickson@gmail.com]
> Sent: Wednesday, September 21, 2011 3:55 PM
> To: solr-user@lucene.apache.org
> Subject: Re: Two unrelated questions
>
> for <1> I don't quite get what you're driving at. Your DIH
> query assigns the uniqueKey, it's not like it's something
> auto-generated. Perhaps a concrete example would
> help.
>
> <2> There's a limit you can adjust that defaults to
> 1024 (maxBooleanClauses in solrconfig.xml). You can
> bump this very high, but you're right, if anyone actually
> does something absurd it'll slow *that* query down. But
> just bumping this limit higher won't change performance
> absent someone actually putting a ton of items in a query...
>
> Best
> Erick
>

RE: Two unrelated questions

Posted by "Olson, Ron" <RO...@lbpc.com>.
Thanks for the reply. As far as #1, my table that I'm indexing via DIH has a PK field, generated by a sequence, so there are records with ID of 1, 2, 3, etc. That same id is the one I use in my unique id field in the document (<uniqueKey>ID</uniqueKey>).

I've noticed that the table has, say, 10 rows, but my index only has 8. I don't know why that is, but I'd like to figure out which records are missing and add them (and hopefully understand why they weren't added in the first place). I was just wondering if there was some way to compare the two as part of a SQL query, but on reflection, it does seem like an absurd request, so I apologize; I think what I'll have to do is write a SolrJ program that gets every ID in the table, searches for that ID in the index, and adds the ones that are missing.
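
The comparison itself is small once both ID lists are in hand. A minimal sketch in Python rather than SolrJ, assuming the table's primary keys and the index's unique-key values have already been fetched (for example with a SELECT on the ID column and a Solr query such as `q=*:*&fl=ID`); the function name `find_missing_ids` and the sample data are hypothetical.

```python
def find_missing_ids(table_ids, indexed_ids):
    """Return IDs present in the table but absent from the index."""
    # Normalize to strings so that 7 (from the database) matches "7" (from Solr).
    table = {str(i) for i in table_ids}
    indexed = {str(i) for i in indexed_ids}
    # Sort by length, then text, so non-negative integer IDs come out in numeric order.
    return sorted(table - indexed, key=lambda s: (len(s), s))


# Hypothetical data: the table has 10 rows, the index only 8 documents.
table_ids = range(1, 11)
indexed_ids = ["1", "2", "3", "4", "6", "7", "8", "10"]
missing = find_missing_ids(table_ids, indexed_ids)
print(missing)  # ['5', '9']
```

The missing IDs could then be used to re-import just those rows, and looking at those rows directly may also show why they were skipped.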

Regarding the second item, yes, it's crazy but I'm not sure what to do; there really are that many options and some searches will be extremely specific, yet broad enough in terms for this to be a problem.

-----Original Message-----
From: Erick Erickson [mailto:erickerickson@gmail.com]
Sent: Wednesday, September 21, 2011 3:55 PM
To: solr-user@lucene.apache.org
Subject: Re: Two unrelated questions

for <1> I don't quite get what you're driving at. Your DIH
query assigns the uniqueKey, it's not like it's something
auto-generated. Perhaps a concrete example would
help.

<2> There's a limit you can adjust that defaults to
1024 (maxBooleanClauses in solrconfig.xml). You can
bump this very high, but you're right, if anyone actually
does something absurd it'll slow *that* query down. But
just bumping this limit higher won't change performance
absent someone actually putting a ton of items in a query...

Best
Erick


Re: Two unrelated questions

Posted by Erick Erickson <er...@gmail.com>.
for <1> I don't quite get what you're driving at. Your DIH
query assigns the uniqueKey, it's not like it's something
auto-generated. Perhaps a concrete example would
help.

<2> There's a limit you can adjust that defaults to
1024 (maxBooleanClauses in solrconfig.xml). You can
bump this very high, but you're right, if anyone actually
does something absurd it'll slow *that* query down. But
just bumping this limit higher won't change performance
absent someone actually putting a ton of items in a query...
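
One way to stay under that limit is to count clauses on the client before sending the query. A rough sketch in Python, assuming a hypothetical field name `car_model` and that each selected value becomes one clause; the 1024 default below is the figure mentioned above and should be changed to match whatever maxBooleanClauses is set to in your solrconfig.xml.

```python
def quote_term(value):
    """Wrap a value in double quotes, escaping backslashes and quotes."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def build_or_query(field, values, max_clauses=1024):
    """Build field:("a" OR "b" ...), refusing lists longer than max_clauses."""
    if not values:
        raise ValueError("at least one value is required")
    if len(values) > max_clauses:
        raise ValueError(
            f"{len(values)} clauses exceeds the limit of {max_clauses}"
        )
    return f"{field}:(" + " OR ".join(quote_term(v) for v in values) + ")"


# Hypothetical selection taken from the example in the original question.
print(build_or_query("car_model", ["Crown Vic", "Focus", "Taurus", "F-150"]))
# car_model:("Crown Vic" OR "Focus" OR "Taurus" OR "F-150")
```

Failing early on the client gives the application a chance to fall back to a broader search instead of having the server reject the query.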

Best
Erick
