Posted to dev@lucene.apache.org by David Arthur <mu...@gmail.com> on 2013/03/14 20:43:57 UTC

DocValue field "families"

I have an experimental patch that adds support for field families for
doc values. The idea is taken from various BigTable implementations,
where a set of fields can be configured to appear in the same physical
file. Rather than putting all doc value fields into a single file,
fields that are commonly accessed together can be grouped into the same file.

Would this be something worth fleshing out and contributing?

-David
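
For illustration only, the grouping described above could be sketched as follows. This is not the patch mentioned in the message. The class FieldFamilies, the family names, and the "<segment>_<family>.dv" file-name pattern are all hypothetical, and the sketch assumes that fields without a configured family fall into a "default" group.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical sketch: assign doc value fields to families, one file per family. */
public class FieldFamilies {
    public static final String DEFAULT_FAMILY = "default";

    /** Groups field names by family; fields with no configured family go to "default". */
    public static Map<String, List<String>> groupByFamily(List<String> fields,
                                                          Map<String, String> fieldToFamily) {
        // TreeMap keeps the families in a stable, sorted order for printing.
        Map<String, List<String>> families = new TreeMap<>();
        for (String field : fields) {
            String family = fieldToFamily.getOrDefault(field, DEFAULT_FAMILY);
            families.computeIfAbsent(family, k -> new ArrayList<>()).add(field);
        }
        return families;
    }

    /** Made-up file name for a family; ".dv" is not a real Lucene file extension. */
    public static String fileNameFor(String segment, String family) {
        return segment + "_" + family + ".dv";
    }

    public static void main(String[] args) {
        // Fields A, T, and U are assumed to be read together, so they share a family.
        Map<String, String> config = new HashMap<>();
        config.put("A", "hot");
        config.put("T", "hot");
        config.put("U", "hot");

        List<String> fields = Arrays.asList("A", "B", "T", "U", "Z");
        groupByFamily(fields, config).forEach((family, members) ->
            System.out.println(fileNameFor("_0", family) + " -> " + members));
    }
}
```

Whether one file per family actually pays off depends on the storage layout and access pattern, which is the question the replies in this thread discuss.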

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


Re: DocValue field "families"

Posted by Simon Willnauer <si...@gmail.com>.
On Thu, Mar 14, 2013 at 9:23 PM, David Arthur <mu...@gmail.com> wrote:
> If you have doc value fields A-Z, and you know that fields A, T, and U are
> commonly accessed together, they could be put into the same family and
> therefore the same file. Wouldn't this give you some small I/O gain since
> they are contiguous? It could very well be that the overhead of additional
> files is not worth it.

Doc values are column-stride storage, so even if you put them into
the same file you would still need to seek to fetch all the data. If you
know that this data is accessed together, you should put it in the same
field. But what you are describing is what stored fields are for.

simon
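
To make the column-stride point concrete, here is a toy offset calculation. It assumes fixed-width 8-byte values and fields written back to back in one file, which is a simplification and not the actual Lucene 4.2 doc values encoding. The class name, method names, and document counts are made up for the example.

```java
/** Toy model comparing column-stride and row-oriented offsets (not Lucene's real format). */
public class ColumnStrideOffsets {
    static final int VALUE_BYTES = 8; // assumed fixed-width long per value

    /** Offset of (field, doc) when each field's values form one contiguous column. */
    public static long columnOffset(int fieldIndex, int doc, int maxDoc) {
        return ((long) fieldIndex * maxDoc + doc) * VALUE_BYTES;
    }

    /** Offset of (field, doc) when all fields of a document are stored together as a row. */
    public static long rowOffset(int fieldIndex, int doc, int numFields) {
        return ((long) doc * numFields + fieldIndex) * VALUE_BYTES;
    }

    public static void main(String[] args) {
        int maxDoc = 1_000_000; // documents in the segment (made-up number)
        int numFields = 3;      // fields A, T, U placed in the same file
        int doc = 42;
        for (int f = 0; f < numFields; f++) {
            System.out.printf("field %d: column offset %d, row offset %d%n",
                f, columnOffset(f, doc, maxDoc), rowOffset(f, doc, numFields));
        }
    }
}
```

Under these assumptions, the values of one document for consecutive fields are 8,000,000 bytes apart in the column layout but only 8 bytes apart in the row layout, which is why sharing a file does not by itself avoid the seeks.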



Re: DocValue field "families"

Posted by David Arthur <mu...@gmail.com>.
If you have doc value fields A-Z, and you know that fields A, T, and U
are commonly accessed together, they could be put into the same family
and therefore the same file. Wouldn't this give you some small I/O gain
since they are contiguous? It could very well be that the overhead of
additional files is not worth it.

On 3/14/13 3:51 PM, Robert Muir wrote:
> What's the advantage of using more files?




Re: DocValue field "families"

Posted by Robert Muir <rc...@gmail.com>.
What's the advantage of using more files?

FYI, if you wanted to do this, you can abuse PerFieldDocValuesFormat
today to do it, e.g.:

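    // Needs (Lucene 4.2): org.apache.lucene.codecs.DocValuesFormat,
    // org.apache.lucene.codecs.lucene42.Lucene42Codec,
    // org.apache.lucene.codecs.lucene42.Lucene42DocValuesFormat
    iwc.setCodec(new Lucene42Codec() {

      // One DocValuesFormat instance per group of fields.
      final DocValuesFormat group1 = new Lucene42DocValuesFormat();
      final DocValuesFormat group2 = new Lucene42DocValuesFormat();
      ...

      @Override
      public DocValuesFormat getDocValuesFormatForField(String field) {
        // group1Fields is a placeholder for some Set<String> of field names.
        if (group1Fields.contains(field)) {
          return group1;
        } else {
          return group2;
        } ...
      }
    });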

