Posted to dev@lucene.apache.org by David Arthur <mu...@gmail.com> on 2013/03/14 20:43:57 UTC
DocValue field "families"
I have an experimental patch that adds support for field families for
doc values. The idea is taken from various BigTable implementations,
where a set of fields can be configured to appear in the same physical
file. Rather than putting all doc value fields into a single file,
fields that are commonly accessed together can be grouped into the same file.
Would this be something worth fleshing out and contributing?
-David
---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org
Re: DocValue field "families"
Posted by Simon Willnauer <si...@gmail.com>.
On Thu, Mar 14, 2013 at 9:23 PM, David Arthur <mu...@gmail.com> wrote:
> If you have doc value fields A-Z, and you know that fields A, T, and U are
> commonly accessed together, they could be put into the same family and
> therefore the same file. Wouldn't this give you some small I/O gain since
> they are contiguous? It could very well be that the overhead of additional
> files is not worth it.
Doc values are column-stride storage, so even if you put them into
the same file you would still need to seek to fetch all the data. If you
know that this data is accessed together, you should put it in the same
field. But what you are describing is what stored fields are for.
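To illustrate the point about seeking, here is a minimal sketch of a toy layout model. It is not Lucene's actual on-disk format: the fixed 8-byte value width, the offset formulas, and all class and method names are assumptions made for illustration. It counts how many separate positions must be visited to read three fields of one document when the values are laid out column by column versus row by row.

```java
public class ColumnStrideSketch {
    // Toy model, NOT Lucene's on-disk format: each field is one contiguous
    // column of fixed-width (8-byte) values, and the columns are laid out
    // one after another in a single file.
    static long columnOffset(int fieldIndex, int docId, int numDocs) {
        long columnStart = (long) fieldIndex * numDocs * Long.BYTES;
        return columnStart + (long) docId * Long.BYTES;
    }

    // Row-oriented layout (closer in spirit to stored fields): all values
    // of one document sit next to each other.
    static long rowOffset(int fieldIndex, int docId, int numFields) {
        long rowStart = (long) docId * numFields * Long.BYTES;
        return rowStart + (long) fieldIndex * Long.BYTES;
    }

    // Counts reads that do not start where the previous read ended,
    // i.e. how many separate positions must be visited.
    static int countSeeks(long[] offsets) {
        int seeks = 0;
        long expected = -1;
        for (long off : offsets) {
            if (off != expected) {
                seeks++;
            }
            expected = off + Long.BYTES;
        }
        return seeks;
    }

    public static void main(String[] args) {
        int numDocs = 1_000_000;
        int numFields = 3;
        int docId = 42;
        long[] col = new long[numFields];
        long[] row = new long[numFields];
        for (int f = 0; f < numFields; f++) {
            col[f] = columnOffset(f, docId, numDocs);
            row[f] = rowOffset(f, docId, numFields);
        }
        System.out.println("column-stride seeks: " + countSeeks(col));
        System.out.println("row-oriented seeks:  " + countSeeks(row));
    }
}
```

In this model the column layout visits one position per field even though everything lives in one file, while the row layout reads the three values in a single contiguous pass.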
simon
Re: DocValue field "families"
Posted by David Arthur <mu...@gmail.com>.
If you have doc value fields A-Z, and you know that fields A, T, and U
are commonly accessed together, they could be put into the same family
and therefore the same file. Wouldn't this give you some small I/O gain
since they are contiguous? It could very well be that the overhead of
additional files is not worth it.
Re: DocValue field "families"
Posted by Robert Muir <rc...@gmail.com>.
What's the advantage of using more files?
FYI, if you wanted to do this, you can abuse PerFieldDocValuesFormat
today to do it, e.g.:
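iwc.setCodec(new Lucene42Codec() {
  // One DocValuesFormat instance per group of fields.
  final DocValuesFormat group1 = new Lucene42DocValuesFormat();
  final DocValuesFormat group2 = new Lucene42DocValuesFormat();
  // ...

  @Override
  public DocValuesFormat getDocValuesFormatForField(String field) {
    // group1Fields is a hypothetical Set<String> of field names
    // ("field in some list"); it is not part of Lucene.
    if (group1Fields.contains(field)) {
      return group1;
    } else {
      return group2;
    } // ...
  }
});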
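To make the "field in some list" check concrete, here is a minimal sketch with no Lucene dependency. The class name, the GROUP1_FIELDS set, and the use of plain strings in place of DocValuesFormat instances are all assumptions for illustration; the fields A, T, and U are just the example names used elsewhere in this thread.

```java
import java.util.Set;

public class FieldGroupSketch {
    // Hypothetical family: fields A, T and U go together; every other
    // field falls into the default group.
    static final Set<String> GROUP1_FIELDS = Set.of("A", "T", "U");

    // Stand-in for returning a DocValuesFormat instance: here a group is
    // represented only by its name.
    static String groupForField(String field) {
        return GROUP1_FIELDS.contains(field) ? "group1" : "group2";
    }

    public static void main(String[] args) {
        for (String field : new String[] {"A", "B", "T", "U", "Z"}) {
            System.out.println(field + " -> " + groupForField(field));
        }
    }
}
```

A set lookup keeps the per-field decision cheap, and the same shape would extend to more groups by mapping field names to group names instead of testing a single set.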