Posted to java-user@lucene.apache.org by manoj raj <ma...@gmail.com> on 2013/11/04 13:33:02 UTC

Lucene Empty Non-empty Fields

I ran some experiments on finding documents with empty fields, but I want to
know whether there is a better method. I also have to keep disk usage down.


Method 1: Add a "NULL" string to empty fields

We can then search for the null string to find documents where a column is
empty, and exclude it to find documents where the column is not empty.


Observations:

   - Index size will grow.
   - If we add a new column later, old documents will not have the null
   string for that column in the index.
   - Fetching results will need more I/O because of the null string.
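
To make Method 1 concrete, here is a minimal sketch in plain Java, with no
Lucene calls. The class name, the "__NULL__" token, and the map-based row are
all assumptions for illustration; the real field-adding code would wrap
whatever this returns.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class NullSentinelSketch {
    // Hypothetical sentinel; it must be a token that can never be a real value.
    static final String NULL_TOKEN = "__NULL__";

    // Returns a copy of the row in which every schema column that is missing,
    // null, or empty carries the sentinel, so it can be searched for later.
    static Map<String, String> withSentinels(List<String> schema, Map<String, String> row) {
        Map<String, String> out = new LinkedHashMap<>();
        for (String column : schema) {
            String value = row.get(column);
            out.put(column, (value == null || value.isEmpty()) ? NULL_TOKEN : value);
        }
        return out;
    }
}
```

The sentinel is only written for columns that are in the schema at indexing
time, which is why documents indexed before a column was added would have to
be reindexed before a search for the sentinel can find them.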


Method 2: Add one extra field, NON_EMPTY_COLUMN, and add the names of all
non-empty columns to it.
We can search with NON_EMPTY_COLUMN:Field_Name for documents where the column
is not empty; for documents where it is empty, we have to search with NOT on
the field name.

Observations:

   - Again, index size will grow.
   - Fetching is not costly.
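
A rough sketch of Method 2, again in plain Java. The class and method names
are made up for illustration, and the query strings assume the classic query
parser syntax, where a purely negative clause matches nothing on its own and
is therefore paired with *:* (match all documents).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

class NonEmptyColumnSketch {
    // Collects the names of all columns with a non-empty value. Each name
    // would be indexed as one untokenized term of the NON_EMPTY_COLUMN field.
    static List<String> nonEmptyColumns(Map<String, String> row) {
        List<String> names = new ArrayList<>();
        for (Map.Entry<String, String> entry : row.entrySet()) {
            String value = entry.getValue();
            if (value != null && !value.isEmpty()) {
                names.add(entry.getKey());
            }
        }
        return names;
    }

    // Query string for documents where the column has a value.
    static String hasValueQuery(String column) {
        return "NON_EMPTY_COLUMN:" + column;
    }

    // Query string for documents where the column is empty: all documents
    // minus those that list the column name.
    static String isEmptyQuery(String column) {
        return "*:* -NON_EMPTY_COLUMN:" + column;
    }
}
```

Column names containing query-syntax characters would need escaping before
being pasted into a query string; the sketch assumes simple names.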


Method 3: While searching, iterate through the results and check each
document for the empty column using doc.get

Observations:

   - Iteration becomes costly if the required results are not in the first
   set of results.
   - This also has a big I/O impact.
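
The cost of Method 3 can be seen in a small plain-Java sketch. The hits are
modelled as a list of maps in rank order, which is an assumption for
illustration; in the real code each hits.get(i) would be a stored-document
load followed by doc.get.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

class PostFilterSketch {
    // Walks the hits in rank order and keeps the positions of those whose
    // column is missing or empty, stopping once `wanted` matches are found.
    // Every inspected hit stands for one stored-document load.
    static List<Integer> firstEmpty(List<Map<String, String>> hits, String column, int wanted) {
        List<Integer> matches = new ArrayList<>();
        for (int i = 0; i < hits.size() && matches.size() < wanted; i++) {
            String value = hits.get(i).get(column);
            if (value == null || value.isEmpty()) {
                matches.add(i);
            }
        }
        return matches;
    }
}
```

If the documents with an empty column rank low, the loop has to load nearly
every hit before it collects enough matches, which is where the iteration and
I/O cost comes from.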

RE: Lucene Empty Non-empty Fields

Posted by Vitaly Funstein <vf...@gmail.com>.

Or FieldValueFilter - that's probably easier to use.

> -----Original Message-----
> From: Michael McCandless [mailto:lucene@mikemccandless.com]
> Sent: Monday, November 04, 2013 4:37 AM
> To: Lucene Users
> Subject: Re: Lucene Empty Non-empty Fields
> 
> You can also use FieldCache.getDocsWithField?
> 
> Mike McCandless
> 
> http://blog.mikemccandless.com



---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org

Re: Lucene Empty Non-empty Fields

Posted by Michael McCandless <lu...@mikemccandless.com>.

You can also use FieldCache.getDocsWithField?
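
To illustrate the idea behind this suggestion: as far as I recall, in Lucene
4.x this call returns one bit per document, set when the document has a value
for the field. The sketch below is NOT the Lucene API. It is a stand-in built
on java.util.BitSet, with made-up class and method names, that shows how such
a per-document bit set answers both the "non-empty" and the "empty" question
without storing anything extra per document.

```java
import java.util.BitSet;
import java.util.List;
import java.util.Map;

class DocsWithFieldSketch {
    // Builds one bit per document: set when the document has a non-empty
    // value for the field. The list index plays the role of the doc id.
    static BitSet docsWithField(List<Map<String, String>> docs, String field) {
        BitSet bits = new BitSet(docs.size());
        for (int docId = 0; docId < docs.size(); docId++) {
            String value = docs.get(docId).get(field);
            if (value != null && !value.isEmpty()) {
                bits.set(docId);
            }
        }
        return bits;
    }

    // Documents without the field are the complement over [0, maxDoc).
    static BitSet docsWithoutField(BitSet withField, int maxDoc) {
        BitSet without = (BitSet) withField.clone();
        without.flip(0, maxDoc);
        return without;
    }
}
```

The FieldValueFilter mentioned elsewhere in this thread wraps the same idea as
a filter, and I believe it has a negate option for the "empty" case; check the
Javadoc of your Lucene version before relying on that.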

Mike McCandless

http://blog.mikemccandless.com


