Posted to dev@lucene.apache.org by "Shai Erera (JIRA)" <ji...@apache.org> on 2013/01/10 12:06:12 UTC
[jira] [Updated] (LUCENE-4620) Explore IntEncoder/Decoder bulk API

     [ https://issues.apache.org/jira/browse/LUCENE-4620?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shai Erera updated LUCENE-4620:
-------------------------------

    Attachment: LUCENE-4620.patch

Patch makes the following changes:

* {{IntEncoder.encode()}} takes an {{IntsRef}} and a {{BytesRef}} and encodes the integers from the {{IntsRef}} into the {{BytesRef}}. Similarly, {{IntDecoder.decode()}} takes a {{BytesRef}} and an {{IntsRef}} and decodes the integers from the byte array into the integer array.

* {{CategoryListIterator}} and {{Aggregator}} were changed to do bulk handling of category ordinals as well.

* In the process I merged some methods, such as {{PayloadIterator.setdoc}} and {{PayloadIterator.getPayload}}, as well as {{AssociationsPayloadIterator}}, to further reduce the number of method calls that happen during search.

* Added a test which tests MultiCategoryListIterator (we didn't have one!) and improved EncodingTest to test a large number of random values.
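
To illustrate the shape of a bulk encode/decode pair as described in the first bullet, here is a minimal sketch, assuming a VInt-style encoding (7 bits per byte, high bit set when more bytes follow). It is not the code from the patch: {{IntSlice}} and {{ByteSlice}} are hypothetical stand-ins for {{IntsRef}} and {{BytesRef}}, and the class and method names are invented for the example.

```java
import java.util.Arrays;

class BulkVIntSketch {

    /** Hypothetical stand-in for an int slice; NOT Lucene's IntsRef. */
    static final class IntSlice {
        int[] ints = new int[8];
        int length; // number of valid values, starting at index 0
    }

    /** Hypothetical stand-in for a byte slice; NOT Lucene's BytesRef. */
    static final class ByteSlice {
        byte[] bytes = new byte[8];
        int length; // number of valid bytes, starting at index 0
    }

    /** Encodes all values in a single call instead of one call per int. */
    static void encode(IntSlice values, ByteSlice out) {
        out.length = 0;
        for (int i = 0; i < values.length; i++) {
            int v = values.ints[i];
            // A 32-bit int needs at most 5 VInt bytes; grow the buffer if needed.
            if (out.length + 5 > out.bytes.length) {
                out.bytes = Arrays.copyOf(out.bytes, Math.max(out.bytes.length * 2, out.length + 5));
            }
            while ((v & ~0x7F) != 0) {
                out.bytes[out.length++] = (byte) ((v & 0x7F) | 0x80);
                v >>>= 7;
            }
            out.bytes[out.length++] = (byte) v;
        }
    }

    /** Decodes every value in the byte slice into the int slice in a single call. */
    static void decode(ByteSlice in, IntSlice values) {
        values.length = 0;
        int pos = 0;
        while (pos < in.length) {
            int v = 0;
            int shift = 0;
            byte b;
            do {
                b = in.bytes[pos++];
                v |= (b & 0x7F) << shift;
                shift += 7;
            } while ((b & 0x80) != 0);
            if (values.length == values.ints.length) {
                values.ints = Arrays.copyOf(values.ints, values.ints.length * 2);
            }
            values.ints[values.length++] = v;
        }
    }

    public static void main(String[] args) {
        IntSlice ordinals = new IntSlice();
        ordinals.ints = new int[] {3, 127, 128, 70000};
        ordinals.length = ordinals.ints.length;

        ByteSlice buf = new ByteSlice();
        encode(ordinals, buf);

        IntSlice decoded = new IntSlice();
        decode(buf, decoded);
        System.out.println(Arrays.toString(Arrays.copyOf(decoded.ints, decoded.length)));
    }
}
```

The point of the sketch is that the caller makes one call per document rather than one call per ordinal, and both buffers can be reused across documents because each call resets {{length}} before writing.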

All tests pass, and 'ant javadocs' passes too.
                
> Explore IntEncoder/Decoder bulk API
> -----------------------------------
>
>                 Key: LUCENE-4620
>                 URL: https://issues.apache.org/jira/browse/LUCENE-4620
>             Project: Lucene - Core
>          Issue Type: Improvement
>          Components: modules/facet
>            Reporter: Shai Erera
>         Attachments: LUCENE-4620.patch
>
>
> Today, IntEncoder/Decoder offer a streaming API, where you can encode(int) and decode(int). Originally, we believed that this layer can be useful for other scenarios, but in practice it's used only for writing/reading the category ordinals from payload/DV.
> Therefore, Mike and I would like to explore a bulk API, something like encode(IntsRef, BytesRef) and decode(BytesRef, IntsRef). Perhaps the Encoder can still be streaming (as we don't know in advance how many ints will be written), dunno. Will figure this out as we go.
> One thing to check is whether the bulk API can work w/ e.g. facet associations, which can write arbitrary byte[], and so maybe decoding to an IntsRef won't make sense. This too we'll figure out as we go. I don't rule out that associations will use a different bulk API.
> At the end of the day, the requirement is for someone to be able to configure how ordinals are written (i.e. different encoding schemes: VInt, PackedInts etc.) and later read, with as little overhead as possible.
