Posted to java-user@lucene.apache.org by Mingfeng Yang <mf...@wisewindow.com> on 2013/06/15 00:53:17 UTC
Read an solr index with two different lucene formats
I have a Solr index that was built with Solr 1.4 a few years ago and later
upgraded to Solr 3.6; it now consists of 150 million documents.
Now I want to read all values of a DateField from the index. But it turns
out that for nearly 100 million documents, document.get("date") returns
null, while the other 50 million work just fine.
I used Solr to query the index and verified that each document does have a
non-blank date field. I suspect this is because the Lucene 3.6 API I am
using cannot read the DateField correctly from documents written in the
older Lucene format used by Solr 1.4.
Is this possible? If so, is there any way to get the values right?
Ming-
Re: Read an solr index with two different lucene formats
Posted by Mingfeng Yang <mf...@wisewindow.com>.
Figured out the solution.
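The date field in those documents was stored as binary, so what I should do
is:
import java.nio.ByteBuffer;
import java.util.Date;
import org.apache.lucene.document.DateTools;
import org.apache.lucene.document.Fieldable;

Fieldable df = doc.getFieldable(fname);  // stored field as written to the index
byte[] ary = df.getBinaryValue();        // raw bytes: one long (epoch millis)
ByteBuffer bb = ByteBuffer.wrap(ary);
long num = bb.getLong();                 // big-endian, ByteBuffer's default
// stringToDate throws java.text.ParseException; result is truncated to seconds
Date dt = DateTools.stringToDate(DateTools.timeToString(num,
        DateTools.Resolution.SECOND));
Then dt holds the correct value as a java.util.Date.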
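For readers without Lucene at hand, the same decoding can be sketched with the JDK alone. This assumes, as the code above does, that the binary value is a single big-endian long of milliseconds since the epoch; BinaryDateDecoder and decode are hypothetical names. Truncating to whole seconds mirrors the net effect of the DateTools round trip at Resolution.SECOND, and Instant.toString() prints ISO-8601 in UTC, the same shape as Solr's date format.

```java
import java.nio.ByteBuffer;
import java.time.Instant;
import java.time.temporal.ChronoUnit;

final class BinaryDateDecoder {
    // Reads the stored bytes as one big-endian long (ByteBuffer's default order),
    // assumed to be milliseconds since the epoch, and drops the milliseconds,
    // which is the net effect of the DateTools round trip at Resolution.SECOND.
    static Instant decode(byte[] stored) {
        if (stored == null || stored.length != Long.BYTES) {
            throw new IllegalArgumentException("expected exactly 8 bytes");
        }
        long millis = ByteBuffer.wrap(stored).getLong();
        return Instant.ofEpochMilli(millis).truncatedTo(ChronoUnit.SECONDS);
    }

    public static void main(String[] args) {
        // Encode a sample timestamp the same way, then decode it again.
        byte[] stored = ByteBuffer.allocate(Long.BYTES).putLong(1371257597123L).array();
        // Instant.toString() is ISO-8601 in UTC.
        System.out.println(decode(stored)); // prints 2013-06-15T00:53:17Z
    }
}
```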
Ming-
On Fri, Jun 14, 2013 at 4:24 PM, Mingfeng Yang <mf...@wisewindow.com> wrote:
> I did System.out.println(d.get("date")), and the output is
> "stored,binary,omitNorms,indexOptions=DOCS_ONLY<da...@4cbfea1d>"
>
> Emmm.
Re: Read an solr index with two different lucene formats
Posted by Mingfeng Yang <mf...@wisewindow.com>.
I did System.out.println(d.get("date")), and the output is
"stored,binary,omitNorms,indexOptions=DOCS_ONLY<da...@4cbfea1d>"
Emmm.
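That output describes a stored field flagged as binary. In Lucene 3.x, Document.get(name) returns only string values and returns null when the only fields with that name are binary, which fits the null results in the first message. Code that must cope with both stored forms could branch on the value's type, as in this JDK-only sketch. MixedDateReader and readDate are hypothetical (in Lucene 3.x the branch would correspond to checking Fieldable.isBinary()), and it assumes text values are ISO-8601 UTC, with or without a trailing Z, while binary values are one big-endian long of epoch milliseconds.

```java
import java.nio.ByteBuffer;
import java.time.Instant;

// Hypothetical helper, not Lucene's API: reads a date that may be stored
// either as text or as raw bytes.
final class MixedDateReader {
    static Instant readDate(Object storedValue) {
        if (storedValue instanceof byte[]) {
            // Assumed layout: one big-endian long of milliseconds since the epoch.
            byte[] bytes = (byte[]) storedValue;
            return Instant.ofEpochMilli(ByteBuffer.wrap(bytes).getLong());
        }
        if (storedValue instanceof String) {
            // Assumed ISO-8601 in UTC; append 'Z' if the stored text omits it.
            String s = (String) storedValue;
            return Instant.parse(s.endsWith("Z") ? s : s + "Z");
        }
        throw new IllegalArgumentException("no stored date value: " + storedValue);
    }

    public static void main(String[] args) {
        byte[] binary = ByteBuffer.allocate(Long.BYTES).putLong(1371257597000L).array();
        System.out.println(readDate(binary));                 // prints 2013-06-15T00:53:17Z
        System.out.println(readDate("2013-06-15T00:53:17Z")); // prints 2013-06-15T00:53:17Z
    }
}
```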
Re: Read an solr index with two different lucene formats
Posted by Mingfeng Yang <mf...@wisewindow.com>.
Hoss,
I verified it in two ways. The first is 1) in your list: the numFound for
q=date:* matches the numFound for q=*:*.
The second: all fields are stored in the index, so I took a doc id (say
3315) and queried q=id:3315; the output contains the date field and its
value.
Anyway, I am 100% sure every doc has a date field with a value, both indexed
and stored. It's just that I can't read the value from some documents.
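The two checks described here can be sketched as Solr select URLs built with the JDK. The base URL (the default example port 8983 and the /solr/select handler) is an assumption, and SolrChecks and selectUrl are hypothetical names. Following Hoss's point, equal numFound values for the first two queries only show that every document has an indexed date term; the third query is the one that actually returns stored values.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

final class SolrChecks {
    // Assumed base URL; adjust host, port, and core path for a real install.
    static final String BASE = "http://localhost:8983/solr/select";

    // Builds a select URL with URL-encoded q and fl parameters.
    static String selectUrl(String q, String fl, int rows) {
        return BASE + "?q=" + URLEncoder.encode(q, StandardCharsets.UTF_8)
                + "&fl=" + URLEncoder.encode(fl, StandardCharsets.UTF_8)
                + "&rows=" + rows;
    }

    public static void main(String[] args) {
        // Compare numFound of these two (rows=0 returns counts only):
        // equal counts prove indexed terms exist, not stored values.
        System.out.println(selectUrl("date:*", "id", 0));
        System.out.println(selectUrl("*:*", "id", 0));
        // Fetch one document's stored fields, e.g. id 3315 from the message above.
        System.out.println(selectUrl("id:3315", "id,date", 10));
    }
}
```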
Ming-
On Fri, Jun 14, 2013 at 4:05 PM, Chris Hostetter <ho...@fucit.org> wrote:
>
> : I used solr to query the index, and verified that each document does have a
> : non-blank date field. I suspect that it's because the lucene-3.6 api I am
> : using can not read datefield correctly from documents written in lucene 1.4
> : format.
>
> how did you verify that they all have a non-blank value?
>
> my wild shot in the dark guess here...
>
> 1) you are "verifying" that every doc has a value in the date field by
> using something like q=date:[* TO *] and looking at the numFound and it
> matches q=*:*
> 2) at some point your date field was indexed but not stored, and a large
> number of documents were added during this time.
> 3) so now all of your documents have an *indexed* value for the date
> field, but many of them have no *stored* value for the date field.
>
> -Hoss
Re: Read an solr index with two different lucene formats
Posted by Chris Hostetter <ho...@fucit.org>.
: I used solr to query the index, and verified that each document does have a
: non-blank date field. I suspect that it's because the lucene-3.6 api I am
: using can not read datefield correctly from documents written in lucene 1.4
: format.
how did you verify that they all have a non-blank value?
my wild shot in the dark guess here...
1) you are "verifying" that every doc has a value in the date field by
using something like q=date:[* TO *] and looking at the numFound and it
matches q=*:*
2) at some point your date field was indexed but not stored, and a large
number of documents were added during this time.
3) so now all of your documents have an *indexed* value for the date
field, but many of them have no *stored* value for the date field.
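Point 3 can be illustrated with a toy model in plain Java. The classes here are hypothetical and are not Lucene's API; they only mimic the fact that a field's indexed terms and its stored value are kept separately, so a field can be indexed without being stored. The sample data is invented for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model only: indexed terms and stored values live in separate maps.
final class IndexedVsStored {
    static final class ToyDoc {
        final Map<String, String> indexed = new HashMap<>();
        final Map<String, String> stored = new HashMap<>();
    }

    // Roughly what numFound for q=date:[* TO *] reflects: docs with an indexed term.
    static long countIndexed(List<ToyDoc> docs, String field) {
        return docs.stream().filter(d -> d.indexed.containsKey(field)).count();
    }

    // What reading stored fields back reflects: docs that have a stored value.
    static long countStored(List<ToyDoc> docs, String field) {
        return docs.stream().filter(d -> d.stored.containsKey(field)).count();
    }

    static List<ToyDoc> sample() {
        List<ToyDoc> docs = new ArrayList<>();
        for (int i = 0; i < 5; i++) {
            ToyDoc d = new ToyDoc();
            d.indexed.put("date", "2013-06-14T00:00:00Z");
            // Invented scenario: docs 0-2 were added while "date" was indexed
            // but not stored, so only docs 3-4 keep a stored value.
            if (i >= 3) {
                d.stored.put("date", "2013-06-14T00:00:00Z");
            }
            docs.add(d);
        }
        return docs;
    }

    public static void main(String[] args) {
        List<ToyDoc> docs = sample();
        System.out.println("indexed=" + countIndexed(docs, "date")
                + " stored=" + countStored(docs, "date")); // prints indexed=5 stored=2
    }
}
```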
-Hoss
---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org