You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@lucene.apache.org by Drew Farris <al...@prodigy.net> on 2004/05/28 03:26:04 UTC

Binary stored fields (was Re: suggestions for a student project)

On Thu, 2004-05-27 at 17:34, Dmitry Serebrennikov wrote:
> Drew Farris wrote:
> >
> >I send out a patch for this soon after the original discussion...
> >
>
> Drew, I'm sorry I missed this. Definetely didn't mean to ignore you 
> work! I just assumed that this is still undone. Oops....

No worries, it might have helped for me to include [PATCH] in the
subject line as the Jakarta guidelines suggest :) Thanks for the good
words and taking the time to look at it now -- I will be glad if I can
contribute something of value. Comments follow inline.

> I just took a look at your patch. Looks great! Simple, and does the trick.
> One comment:
> - I understand why getBinaryValues(String fieldName) has to return 
> byte[][]. This is to deal with multiple fields with the same name, 
> right? 

Yep.

> I think this is a relatively rare case, so perhaps it would be 
> good to have a method getBinaryValue(String fieldName) that just returns 
> the first byte[] for a given name. I think you can avoid allocating an 
> array and most applications would find it more convinient.

Sounds great, I'll implement byte[] getBinaryValue(String fieldName)
which returns the first binary field of the given name encountered in
the results from getFields(fieldName);

> As far as testing, I think unit tests are enough. It would be good to 
> have a few tests though. One test to have as with a large amount of data.

Ok. I've updated DocHelper, TestFieldInfos, TestSegmentReader in
addition to TestDocument which I had already implemented in the last
patch. I will implement a few more as well including one with a ton of
data.

> What happens if there are two fields with the same name but one binary 
> and one not? Will there be an error? If so, when will it be given?

The code in the patch doesn't address this very well at all, stuffing
nulls into arrays which requires the api user to do some extra checking
which . If binary and non binary fields with the same name can co-exist
in a document, I could either:

1) Do away with the call to getFields(fieldName) in getValues and
getBinaryValue/s entirely and walk the fields member in these methods
checking for name/binary-ness.

2) Change Field[] getFields(String fieldName) so it only returns
non-binary fields and implement Field[] getBinaryFields(String
fieldName) that only returns binary fields

Does anyone have a preference? I'm on the fence because I'm not sure if
it's ok to change getFields to not return all of the fields of a given
name.

> Is it possible to set confliciting flags on a field if one does not use 
> the new Field.Binary method? 

I don't think so. The only way to set the isBinary flag in Field is by 
using Field.Binary, or new Field(String name, byte[] value); Each of the
flag member variables are private and there are no public set methods
for any of them.

> I'm +1 for including this (provided it works and all :), I have not 
> actually tested it myself).

Excellent! I'll work on some more extensive unit tests to really
exercise this addition and hopefully that plus your suggestions will
whip it into shape.

> Again, Drew, great work! Thanks for your contribution to Lucene.
> Sorry to have missed your original message.

No problem! Thanks and for taking the time to review it and offering the
great suggestions.

Drew


---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-dev-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-dev-help@jakarta.apache.org


Re: Binary stored fields (was Re: suggestions for a student project)

Posted by Roy <ro...@gmail.com>.
Drew,

Thanks for the pointer! I will test it out somtime this week.

Roy

On Tue, 20 Jul 2004 10:10:10 -0400, Drew Farris <dr...@gmail.com> wrote:
> Hi Roy,
> 
> The binary field feature has not been rolled into a release of Lucene,
> but there is a patch available as an attachment to the bugzilla entry:
> http://issues.apache.org/bugzilla/show_bug.cgi?id=29370
> 
> The patch can be accessed directly here:
> http://issues.apache.org/bugzilla/showattachment.cgi?attach_id=11751
> 
> You will need to check out the lucene source code from CVS, apply the
> patch and recompile. Although the patch was generated against a CVS
> snapshot prior to the final 1.4 release, I just tested it against a
> checkout using the lucene_1_4_final tag and it seems to be ok.
> 
> I haven't tested this patch extensively, but the JUnit tests work. Let
> me know if you run into any problems.
> 
> Drew
> 
> 
> 
> On Mon, 19 Jul 2004 17:55:40 -0700, Roy <ro...@gmail.com> wrote:
> > Drew,
> >
> > I am very interested in the binary field feature. Do you have any
> > updates on that?
> >
> > Thanks.
> >
> > Roy
> >
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: lucene-dev-unsubscribe@jakarta.apache.org
> For additional commands, e-mail: lucene-dev-help@jakarta.apache.org
> 
>

---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-dev-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-dev-help@jakarta.apache.org


Re: Binary stored fields (was Re: suggestions for a student project)

Posted by Drew Farris <dr...@gmail.com>.
Hi Roy,

The binary field feature has not been rolled into a release of Lucene,
but there is a patch available as an attachment to the bugzilla entry:
http://issues.apache.org/bugzilla/show_bug.cgi?id=29370

The patch can be accessed directly here:
http://issues.apache.org/bugzilla/showattachment.cgi?attach_id=11751

You will need to check out the lucene source code from CVS, apply the
patch and recompile. Although the patch was generated against a CVS
snapshot prior to the final 1.4 release, I just tested it against a
checkout using the lucene_1_4_final tag and it seems to be ok.

I haven't tested this patch extensively, but the JUnit tests work. Let
me know if you run into any problems.

Drew

On Mon, 19 Jul 2004 17:55:40 -0700, Roy <ro...@gmail.com> wrote:
> Drew,
> 
> I am very interested in the binary field feature. Do you have any
> updates on that?
> 
> Thanks.
> 
> Roy
>

---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-dev-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-dev-help@jakarta.apache.org


Re: Binary stored fields (was Re: suggestions for a student project)

Posted by Roy <ro...@gmail.com>.
Drew,

I am very interested in the binary field feature. Do you have any
updates on that?

Thanks.

Roy

On Thu, 27 May 2004 21:26:04 -0400, Drew Farris <al...@prodigy.net> wrote:
> On Thu, 2004-05-27 at 17:34, Dmitry Serebrennikov wrote:
> > Drew Farris wrote:
> > >
> > >I send out a patch for this soon after the original discussion...
> > >
> >
> > Drew, I'm sorry I missed this. Definetely didn't mean to ignore you
> > work! I just assumed that this is still undone. Oops....
> 
> No worries, it might have helped for me to include [PATCH] in the
> subject line as the Jakarta guidelines suggest :) Thanks for the good
> words and taking the time to look at it now -- I will be glad if I can
> contribute something of value. Comments follow inline.
> 
> > I just took a look at your patch. Looks great! Simple, and does the trick.
> > One comment:
> > - I understand why getBinaryValues(String fieldName) has to return
> > byte[][]. This is to deal with multiple fields with the same name,
> > right?
> 
> Yep.
> 
> > I think this is a relatively rare case, so perhaps it would be
> > good to have a method getBinaryValue(String fieldName) that just returns
> > the first byte[] for a given name. I think you can avoid allocating an
> > array and most applications would find it more convinient.
> 
> Sounds great, I'll implement byte[] getBinaryValue(String fieldName)
> which returns the first binary field of the given name encountered in
> the results from getFields(fieldName);
> 
> > As far as testing, I think unit tests are enough. It would be good to
> > have a few tests though. One test to have as with a large amount of data.
> 
> Ok. I've updated DocHelper, TestFieldInfos, TestSegmentReader in
> addition to TestDocument which I had already implemented in the last
> patch. I will implement a few more as well including one with a ton of
> data.
> 
> > What happens if there are two fields with the same name but one binary
> > and one not? Will there be an error? If so, when will it be given?
> 
> The code in the patch doesn't address this very well at all, stuffing
> nulls into arrays which requires the api user to do some extra checking
> which . If binary and non binary fields with the same name can co-exist
> in a document, I could either:
> 
> 1) Do away with the call to getFields(fieldName) in getValues and
> getBinaryValue/s entirely and walk the fields member in these methods
> checking for name/binary-ness.
> 
> 2) Change Field[] getFields(String fieldName) so it only returns
> non-binary fields and implement Field[] getBinaryFields(String
> fieldName) that only returns binary fields
> 
> Does anyone have a preference? I'm on the fence because I'm not sure if
> it's ok to change getFields to not return all of the fields of a given
> name.
> 
> > Is it possible to set confliciting flags on a field if one does not use
> > the new Field.Binary method?
> 
> I don't think so. The only way to set the isBinary flag in Field is by
> using Field.Binary, or new Field(String name, byte[] value); Each of the
> flag member variables are private and there are no public set methods
> for any of them.
> 
> > I'm +1 for including this (provided it works and all :), I have not
> > actually tested it myself).
> 
> Excellent! I'll work on some more extensive unit tests to really
> exercise this addition and hopefully that plus your suggestions will
> whip it into shape.
> 
> > Again, Drew, great work! Thanks for your contribution to Lucene.
> > Sorry to have missed your original message.
> 
> No problem! Thanks and for taking the time to review it and offering the
> great suggestions.
> 
> Drew
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: lucene-dev-unsubscribe@jakarta.apache.org
> For additional commands, e-mail: lucene-dev-help@jakarta.apache.org
> 
>

---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-dev-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-dev-help@jakarta.apache.org