Posted to java-user@lucene.apache.org by Mingfeng Yang <mf...@wisewindow.com> on 2013/06/15 00:53:17 UTC

Read an solr index with two different lucene formats

I have a solr index that was built with solr 1.4 a few years ago and later
upgraded to solr 3.6. The index now holds 150 million documents.

Now I want to read all values of a DateField from the index.  But it turns
out that for nearly 100 million documents, document.get("date") returns
null, while the other 50 million work just fine.

I used solr to query the index, and verified that each document does have a
non-blank date field.  I suspect that it's because the lucene-3.6 api I am
using can not read datefield correctly from documents written in lucene 1.4
format.

Is this possible?  If it is, is there any way to get the values right?

Ming-

Re: Read an solr index with two different lucene formats

Posted by Mingfeng Yang <mf...@wisewindow.com>.
Figured out the solution.

The date field in those documents was stored as binary, so what I should do
is:

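// Fetch the field itself rather than its string value.
Fieldable df = doc.getFieldable(fname);
byte[] ary = df.getBinaryValue();
ByteBuffer bb = ByteBuffer.wrap(ary);
long num = bb.getLong();  // time in milliseconds
Date dt = DateTools.stringToDate(DateTools.timeToString(num,
DateTools.Resolution.SECOND));  // stringToDate can throw ParseException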

Then dt holds the date, truncated to second resolution.
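
For anyone who wants to try the decoding step without a Lucene dependency, here is a minimal sketch using only the JDK. It assumes, as in the snippet above, that the stored value is exactly 8 bytes holding the time in milliseconds as a big-endian long. The class name BinaryDateDecoder and its decode method are made up for illustration and are not part of Lucene or Solr. Unlike the snippet above, it builds the Date directly from the milliseconds, so nothing is truncated to seconds.

```java
import java.nio.ByteBuffer;
import java.util.Date;

// Hypothetical helper, not part of Lucene or Solr.
final class BinaryDateDecoder {
    private BinaryDateDecoder() {}

    /**
     * Decodes a stored binary date.
     * Assumption: the 8 bytes starting at 'offset' hold epoch
     * milliseconds as a big-endian long.
     */
    static Date decode(byte[] bytes, int offset, int length) {
        if (bytes == null || length != Long.BYTES) {
            throw new IllegalArgumentException(
                "expected exactly 8 bytes of binary date data");
        }
        // ByteBuffer reads multi-byte values big-endian by default.
        long millis = ByteBuffer.wrap(bytes, offset, length).getLong();
        return new Date(millis);
    }
}
```

With the Lucene 3.x Fieldable from above, the call would presumably look like BinaryDateDecoder.decode(df.getBinaryValue(), df.getBinaryOffset(), df.getBinaryLength()). Passing the offset and length matters if the returned array is a shared buffer rather than a copy of just this field's bytes.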

Ming-



Re: Read an solr index with two different lucene formats

Posted by Mingfeng Yang <mf...@wisewindow.com>.
I did System.out.println(d.get("date")), and the output is
"stored,binary,omitNorms,indexOptions=DOCS_ONLY<da...@4cbfea1d>"

Emmm.





Re: Read an solr index with two different lucene formats

Posted by Mingfeng Yang <mf...@wisewindow.com>.
Hoss,

I verified it in two ways.  The first is 1) in your list: the numFound for
q=date:* matches the numFound for q=*:*.

Second, all fields are stored in the index.  I took a doc id (say 3315) and
ran q=id:3315; the output contains the date field and its value.

Anyway, I am 100% sure every doc has a date field with a value indexed and
stored there.  It's just that I can't read the value from some documents.

Ming-



Re: Read an solr index with two different lucene formats

Posted by Chris Hostetter <ho...@fucit.org>.
: I used solr to query the index, and verified that each document does have a
: non-blank date field.  I suspect that it's because the lucene-3.6 api I am
: using can not read datefield correctly from documents written in lucene 1.4
: format.

how did you verify that they all have a non-blank value?

my wild shot in the dark guess here...

1) you are "verifying" that every doc has a value in the date field by 
using something like q=date:[* TO *] and looking at the numfound and it 
matches q=*:*
2) at some point your date field was indexed but not stored, and a large
number of documents were added during this time.
3) so now all of your documents have an *indexed* value for the date 
field, but many of them have no *stored* value for the date field.


-Hoss

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org