Posted to java-user@lucene.apache.org by manoj raj <ma...@gmail.com> on 2013/11/04 13:33:02 UTC
Lucene Empty Non-empty Fields
I did some experiments to find empty fields, but I want to know whether
there is a better method. I have to reduce hard disk space usage.
Method 1: Add a "NULL" string to empty fields
We can then search for the null string to find documents where the column is
empty, or exclude it to find documents where the column is non-empty.
Observations:
- The index size will grow.
- If we add a new column, old documents will not have the null string for
that column in the index.
- More I/O happens while fetching results because of the null strings.
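A minimal plain-Java model of Method 1 may make the second observation concrete. It is not Lucene code; the class name and the placeholder term __NULL__ are hypothetical. It keeps a tiny inverted index, indexes an empty column as the placeholder term, and answers the empty and non-empty searches from that index.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/**
 * Plain-Java model of Method 1 (not Lucene code). NULL_TOKEN is a hypothetical
 * placeholder; any string that can never be a real value would do.
 */
class NullPlaceholderIndex {
    static final String NULL_TOKEN = "__NULL__";

    // column -> term -> ids of documents containing that term
    private final Map<String, Map<String, Set<Integer>>> postings = new HashMap<>();
    private int nextDocId = 0;

    /** Indexes one document; each column known at indexing time gets its value or NULL_TOKEN. */
    int addDocument(Map<String, String> doc, List<String> knownColumns) {
        int docId = nextDocId++;
        for (String column : knownColumns) {
            String value = doc.get(column);
            String term = (value == null || value.isEmpty()) ? NULL_TOKEN : value;
            postings.computeIfAbsent(column, c -> new HashMap<>())
                    .computeIfAbsent(term, t -> new HashSet<>())
                    .add(docId);
        }
        return docId;
    }

    /** Models the term search column:NULL_TOKEN. */
    Set<Integer> emptyDocs(String column) {
        Map<String, Set<Integer>> terms = postings.getOrDefault(column, Collections.emptyMap());
        return new HashSet<>(terms.getOrDefault(NULL_TOKEN, Collections.emptySet()));
    }

    /** Documents that have any real (non-placeholder) term for the column. */
    Set<Integer> nonEmptyDocs(String column) {
        Set<Integer> result = new HashSet<>();
        for (Map.Entry<String, Set<Integer>> e
                : postings.getOrDefault(column, Collections.emptyMap()).entrySet()) {
            if (!e.getKey().equals(NULL_TOKEN)) {
                result.addAll(e.getValue());
            }
        }
        return result;
    }
}
```

If document 0 is indexed with only a "title" column and the "author" column is added later, document 0 shows up in neither emptyDocs("author") nor nonEmptyDocs("author"). It has no placeholder for the new column until it is reindexed.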
Method 2: Add one extra field, NON_EMPTY_COLUMN, and put the names of all
non-empty columns in it.
We can search with NON_EMPTY_COLUMN:Field_Name; for documents where the
column is empty, we have to search with NOT on the field name.
Observations:
- Again, the index size will grow.
- Fetching is not costly.
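A comparable sketch of Method 2, again a plain-Java model rather than Lucene code, with a hypothetical class name; only the NON_EMPTY_COLUMN field is modelled. One Lucene detail does carry over: a BooleanQuery containing only a MUST_NOT clause matches nothing, so the NOT search is normally written as *:* -NON_EMPTY_COLUMN:Field_Name (all documents minus the non-empty ones), which is what empty() imitates.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/**
 * Plain-Java model of Method 2 (not Lucene code): each document contributes one
 * NON_EMPTY_COLUMN term per column that actually has a value.
 */
class NonEmptyColumnIndex {
    // NON_EMPTY_COLUMN term (a column name) -> ids of documents where that column is non-empty
    private final Map<String, Set<Integer>> nonEmptyColumnPostings = new HashMap<>();
    private int docCount = 0;

    int addDocument(Map<String, String> doc) {
        int docId = docCount++;
        for (Map.Entry<String, String> e : doc.entrySet()) {
            String value = e.getValue();
            if (value != null && !value.isEmpty()) {
                nonEmptyColumnPostings.computeIfAbsent(e.getKey(), k -> new HashSet<>()).add(docId);
            }
        }
        return docId;
    }

    /** Models the query NON_EMPTY_COLUMN:column. */
    Set<Integer> nonEmpty(String column) {
        return new TreeSet<>(nonEmptyColumnPostings.getOrDefault(column, Collections.emptySet()));
    }

    /** Models *:* -NON_EMPTY_COLUMN:column, i.e. all documents minus the non-empty ones. */
    Set<Integer> empty(String column) {
        Set<Integer> result = new TreeSet<>();
        for (int d = 0; d < docCount; d++) {
            result.add(d);
        }
        result.removeAll(nonEmpty(column));
        return result;
    }
}
```

Unlike Method 1, a document indexed before a column existed is reported as empty for that column, because it simply has no NON_EMPTY_COLUMN term for it.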
Method 3: While searching, iterate through the results and check each one for
an empty column using doc.get.
Observations:
- Iteration becomes costly if the required results are not in the first set
of results.
- The I/O impact is also big.
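The cost of Method 3 can be seen in a small plain-Java model (not Lucene code; the class name is hypothetical). Each ranked hit stands in for a stored document that must be loaded, as with searcher.doc(hit), before doc.get can inspect it, and a counter records how many loads were needed.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Plain-Java model of Method 3 (not Lucene code): post-filter ranked hits by loading each one. */
class PostFilterSearch {
    int documentsLoaded = 0; // number of stored-document loads, i.e. the I/O this method costs

    /** Returns the ranks of the first `wanted` hits whose column is missing or empty. */
    List<Integer> firstEmpty(List<Map<String, String>> rankedHits, String column, int wanted) {
        List<Integer> matches = new ArrayList<>();
        for (int rank = 0; rank < rankedHits.size() && matches.size() < wanted; rank++) {
            Map<String, String> stored = rankedHits.get(rank); // one stored-document load
            documentsLoaded++;
            String value = stored.get(column);                  // plays the role of doc.get(column)
            if (value == null || value.isEmpty()) {
                matches.add(rank);
            }
        }
        return matches;
    }
}
```

If the only document with an empty column is ranked last among ten hits, all ten stored documents are loaded to return a single match.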
RE: Lucene Empty Non-empty Fields
Posted by Vitaly Funstein <vf...@gmail.com>.
Or use FieldValueFilter; that's probably easier to use.
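In Lucene 4.x, FieldValueFilter(String field, boolean negate) accepts documents that have at least one value in the field or, with negate set to true, documents that have none, so it covers both the empty and non-empty searches without an extra indexed field; it obtains those per-document bits from the FieldCache. The plain-Java model below only imitates the accept/reject logic on a list of hits. It is not Lucene code, the class name is hypothetical, and the BitSet of documents with a value is taken as given.

```java
import java.util.Arrays;
import java.util.BitSet;

/**
 * Plain-Java model of FieldValueFilter's behaviour (not Lucene code): keep hits whose
 * document has a value in the field, or, when negate is true, hits whose document has none.
 */
class FieldValueFilterModel {
    private final BitSet docsWithField; // stand-in for the per-segment "has a value" bits
    private final boolean negate;

    FieldValueFilterModel(BitSet docsWithField, boolean negate) {
        this.docsWithField = docsWithField;
        this.negate = negate;
    }

    /** True if the document passes the filter. */
    boolean accepts(int docId) {
        return docsWithField.get(docId) != negate;
    }

    /** Applies the filter to the hits of any query, preserving their order. */
    int[] filter(int[] hits) {
        return Arrays.stream(hits).filter(this::accepts).toArray();
    }
}
```

With bits set for documents 0 and 2 and hits 3, 2, 1, 0, the non-negated filter keeps 2 and 0 (non-empty) and the negated one keeps 3 and 1 (empty).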
> -----Original Message-----
> From: Michael McCandless [mailto:lucene@mikemccandless.com]
> Sent: Monday, November 04, 2013 4:37 AM
> To: Lucene Users
> Subject: Re: Lucene Empty Non-empty Fields
>
> You can also use FieldCache.getDocsWithField?
>
> Mike McCandless
>
> http://blog.mikemccandless.com
>
---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org
Re: Lucene Empty Non-empty Fields
Posted by Michael McCandless <lu...@mikemccandless.com>.
You can also use FieldCache.getDocsWithField?
Mike McCandless
http://blog.mikemccandless.com
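FieldCache.getDocsWithField is called per segment reader and returns a Bits marking the documents that have at least one value in the field, cached so later searches reuse it. A rough plain-Java model of that idea follows; it is not Lucene's implementation, and the class name and the postings map standing in for a segment's terms are hypothetical. It walks every term of the field, marks each document in the term's postings, and caches the result. Because the bits are computed at search time and kept in memory (roughly one bit per document per field), nothing extra is written to the index, which is relevant to the disk-space goal in the original question.

```java
import java.util.BitSet;
import java.util.HashMap;
import java.util.Map;

/** Plain-Java model of the "docs with field" idea (not Lucene's FieldCache implementation). */
class DocsWithFieldCache {
    // field -> term -> ids of documents containing that term (stand-in for one segment's terms)
    private final Map<String, Map<String, int[]>> postings;
    // field -> cached bits, built on first request and reused afterwards
    private final Map<String, BitSet> cache = new HashMap<>();

    DocsWithFieldCache(Map<String, Map<String, int[]>> postings) {
        this.postings = postings;
    }

    /** Bits of the documents that have at least one term in the field. */
    BitSet docsWithField(String field) {
        return cache.computeIfAbsent(field, f -> {
            BitSet bits = new BitSet();
            Map<String, int[]> terms = postings.get(f);
            if (terms != null) {
                for (int[] docs : terms.values()) {
                    for (int doc : docs) {
                        bits.set(doc);
                    }
                }
            }
            return bits;
        });
    }
}
```

A document missing from the returned bits has an empty field; the complement of the bits gives the empty-field documents, as the negated filter does.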