Posted to solr-user@lucene.apache.org by Ian Connor <ia...@gmail.com> on 2010/07/12 22:45:20 UTC

How to find first document for the ALL search

I have found that this search crashes:

/solr/select?q=*%3A*&fq=&start=0&rows=1&fl=id

SEVERE: java.lang.IndexOutOfBoundsException: Index: 114, Size: 90
    at java.util.ArrayList.RangeCheck(ArrayList.java:547)
    at java.util.ArrayList.get(ArrayList.java:322)
    at org.apache.lucene.index.FieldInfos.fieldInfo(FieldInfos.java:288)
    at org.apache.lucene.index.FieldsReader.doc(FieldsReader.java:217)
    at org.apache.lucene.index.SegmentReader.document(SegmentReader.java:948)
    at org.apache.lucene.index.DirectoryReader.document(DirectoryReader.java:506)
    at org.apache.solr.search.SolrIndexReader.document(SolrIndexReader.java:259)

but this one works:

/solr/select?q=*%3A*&fq=&start=1&rows=1&fl=id

It looks like just that first document is bad. I am happy to delete it - but
not sure how to get to it. Does anyone know how to find it?

- Ian

Re: How to find first document for the ALL search

Posted by Ian Connor <ia...@gmail.com>.
Hi,

The good news is that:

/solr/select?q=*%3A*&fq=&start=1&rows=1&fl=id

did work (kind of odd, really), so I am reading all the documents from the bad
index into a new Solr instance with the same configuration, using Ruby (a
complete rebuild).

So far so good: it has gone through 500k out of 1.7M documents, and this seems
to be the best approach I could think of.

Running the Luke tool to check the index on a copy ended up destroying that
copy, leaving only about 5k documents. Reading the documents out via Ruby
seemed better in this case (and less work than restoring from backup and
re-running a few days of transactions to catch it up).
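
A minimal sketch of the read-everything-out approach described above, in Python rather than the Ruby script used here (which was not posted). It only builds the paged select URLs, starting at offset 1 to skip the unreadable first document. The base URL, page size, and the name page_urls are assumptions for illustration. Reading documents back this way can only recover stored fields.

```python
from urllib.parse import urlencode


def page_urls(base, total, page_size, skip=1):
    """Yield select URLs that walk the index from offset `skip` up to `total`.

    `base` is a hypothetical Solr base URL; `skip=1` steps over the first
    document, which is the one that fails to load in this thread.
    """
    for start in range(skip, total, page_size):
        params = {
            "q": "*:*",                              # match all documents
            "start": start,                          # offset of this page
            "rows": min(page_size, total - start),   # do not read past the end
            "fl": "*",                               # all stored fields
            "wt": "json",
        }
        yield base + "/select?" + urlencode(params)


if __name__ == "__main__":
    for url in page_urls("http://localhost:8983/solr", total=10, page_size=4):
        print(url)
```

Each URL would then be fetched and its documents posted to the new index; that part depends on the client library and is left out here.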

Ian.


On Wed, Jul 14, 2010 at 9:22 PM, Chris Hostetter <ho...@fucit.org> wrote:

>
> : I have found that this search crashes:
> :
> : /solr/select?q=*%3A*&fq=&start=0&rows=1&fl=id
>
> Ouch ... that exception is kind of hairy. It suggests that your index may
> have been corrupted in some way -- do you have any idea what happened?
> Have you tried using the CheckIndex tool to see what it says?
>
> (I'd hate to help you work around this but get bitten by a time bomb of
> some other bad docs later.)
>
> : It looks like just that first document is bad. I am happy to delete it - but
> : not sure how to get to it. Does anyone know how to find it?
>
> CheckIndex might help ... if it doesn't, the next thing you might try is
> asking for a legitimate field name that you know no document has (e.g., if
> you have a dynamicField with the pattern "str_*" because you have fields
> like "str_foo" and "str_bar" but you never have fields named
> "str____BOGUS", then use fl=str____BOGUS) and then add debugQuery=true to
> the URL -- the debug info should contain the id.
>
> I'll be honest though: I'm guessing that if your example query doesn't
> work, my suggestion won't either -- because if you get that error just
> trying to access the "id" field, the same thing will probably happen when
> the DebugComponent tries to look it up as well.
>
>
>
> -Hoss
>
>



Re: How to find first document for the ALL search

Posted by Chris Hostetter <ho...@fucit.org>.
: I have found that this search crashes:
: 
: /solr/select?q=*%3A*&fq=&start=0&rows=1&fl=id

Ouch ... that exception is kind of hairy. It suggests that your index may
have been corrupted in some way -- do you have any idea what happened?
Have you tried using the CheckIndex tool to see what it says?

(I'd hate to help you work around this but get bitten by a time bomb of
some other bad docs later.)

: It looks like just that first document is bad. I am happy to delete it - but
: not sure how to get to it. Does anyone know how to find it?

CheckIndex might help ... if it doesn't, the next thing you might try is
asking for a legitimate field name that you know no document has (e.g., if
you have a dynamicField with the pattern "str_*" because you have fields
like "str_foo" and "str_bar" but you never have fields named
"str____BOGUS", then use fl=str____BOGUS) and then add debugQuery=true to
the URL -- the debug info should contain the id.

I'll be honest though: I'm guessing that if your example query doesn't
work, my suggestion won't either -- because if you get that error just
trying to access the "id" field, the same thing will probably happen when
the DebugComponent tries to look it up as well.
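
For illustration, the probe described above could be assembled like this. It is only a sketch: the base URL and the name debug_probe_url are assumptions, and str____BOGUS is the placeholder field name from this message, which only makes sense if the schema has a matching dynamicField.

```python
from urllib.parse import urlencode


def debug_probe_url(base, bogus_field="str____BOGUS", start=0):
    """Build a select URL that asks for one document, requests only a field
    no document has, and turns on debugQuery so the debug section can report
    the matching document's id."""
    params = {
        "q": "*:*",
        "start": start,
        "rows": 1,
        "fl": bogus_field,      # legitimate name, but present on no document
        "debugQuery": "true",
    }
    return base + "/select?" + urlencode(params)


if __name__ == "__main__":
    print(debug_probe_url("http://localhost:8983/solr"))
```

As noted above, this may well hit the same exception if the debug output has to load the broken document.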



-Hoss


Re: range faceting with integers

Posted by Chris Hostetter <ho...@fucit.org>.
: Subject: range faceting with integers
: References: <AA...@mail.gmail.com>
: In-Reply-To: <AA...@mail.gmail.com>

http://people.apache.org/~hossman/#threadhijack
Thread Hijacking on Mailing Lists

When starting a new discussion on a mailing list, please do not reply to 
an existing message, instead start a fresh email.  Even if you change the 
subject line of your email, other mail headers still track which thread 
you replied to and your question is "hidden" in that thread and gets less 
attention.   It makes following discussions in the mailing list archives 
particularly difficult.
See Also:  http://en.wikipedia.org/wiki/User:DonDiego/Thread_hijacking




-Hoss


range faceting with integers

Posted by Jonathan Rochkind <ro...@jhu.edu>.
So I want to provide some range facets on an integer Solr field (probably
tint, that is, a trie field with a non-zero precisionStep).

It's clear enough how to do this, along the lines of

facet.query=[1 TO 100]&facet.query=[101 TO 200]&facet.query=[201 TO 300]

etc.

The issue is that I'd like to calculate N equal ranges based on the min 
and max value found in the field. 

I can't think of any way to do this that doesn't require two queries:
one to get the min and max (within the current search set), then
calculate the ranges client-side (possibly making the boundaries 'nice'
numbers instead of strictly equal ranges), then do another query with
the calculated facet.query parameters set.
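
A rough sketch of the client-side step, assuming the min and max are already known from the first query. The function names and the field name "year" are hypothetical; the ranges are inclusive integer ranges, and any remainder is spread over the first few buckets rather than rounded to 'nice' numbers.

```python
from urllib.parse import urlencode


def equal_ranges(lo, hi, n):
    """Split the inclusive integer span [lo, hi] into at most n contiguous,
    non-overlapping ranges of (nearly) equal width."""
    if n < 1 or hi < lo:
        raise ValueError("need n >= 1 and lo <= hi")
    width = hi - lo + 1
    n = min(n, width)                 # never produce empty ranges
    ranges = []
    start = lo
    for i in range(n):
        # The first (width % n) buckets get one extra value each.
        size = width // n + (1 if i < width % n else 0)
        ranges.append((start, start + size - 1))
        start += size
    return ranges


def facet_query_params(field, ranges):
    """Turn ranges into repeated facet.query parameters for `field`."""
    return [("facet.query", f"{field}:[{a} TO {b}]") for a, b in ranges]


if __name__ == "__main__":
    params = [("q", "*:*"), ("facet", "true")]
    params += facet_query_params("year", equal_ranges(1, 300, 3))
    print(urlencode(params))
```

Keeping the range arithmetic in the client makes it easy to swap in rounding to 'nice' boundaries later without touching the query-building code.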

Is there any other trick I'm missing here?  If these were date values,
you could possibly use facet.date.gap, although I'm not even sure that
works without explicitly setting facet.date.start; I don't know whether
you can leave facet.date.start unset to mean "the minimum value in the
field".  In any case, I'm dealing with integers here, not dates.

So anything I'm missing, or just have the client do two queries?   For 
that matter, is there an easy way to ask for minimum and maximum values 
in a field, within a result set?
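
One possible answer to the min/max question, offered as an assumption rather than something confirmed in this thread: Solr 1.4 ships a StatsComponent, enabled with stats=true and stats.field, which reports min and max for a numeric field over the current result set. The sketch below builds such a request and reads the two values out of a JSON response; the field name "year" and the sample response are made up for illustration.

```python
from urllib.parse import urlencode


def stats_query(field, q="*:*"):
    """Query string asking only for statistics on `field` (rows=0 skips docs)."""
    return urlencode({
        "q": q,
        "rows": 0,
        "stats": "true",
        "stats.field": field,
        "wt": "json",
    })


def min_max(response, field):
    """Pull (min, max) for `field` out of a parsed JSON response, assuming the
    usual stats -> stats_fields -> <field> layout."""
    stats = response["stats"]["stats_fields"][field]
    return int(stats["min"]), int(stats["max"])


if __name__ == "__main__":
    print(stats_query("year"))
    # Made-up response fragment, shaped like StatsComponent output.
    sample = {"stats": {"stats_fields": {"year": {"min": 1.0, "max": 300.0}}}}
    print(min_max(sample, "year"))
```

This still means two round trips: one for the statistics, one for the facet queries built from them.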

Thanks for any advice,
Jonathan