Posted to issues@carbondata.apache.org by "ASF GitHub Bot (JIRA)" <ji...@apache.org> on 2016/10/03 16:31:20 UTC
[jira] [Commented] (CARBONDATA-284) Abstracting Index and Segment interface
[ https://issues.apache.org/jira/browse/CARBONDATA-284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15542797#comment-15542797 ]
ASF GitHub Bot commented on CARBONDATA-284:
-------------------------------------------
GitHub user jackylk opened a pull request:
https://github.com/apache/incubator-carbondata/pull/208
[CARBONDATA-284][WIP] Abstracting index and segment interface
This PR adds a new User API and Dev API to the carbon-hadoop module:
### User API
- `CarbonColumnarInputFormat/OutputFormat`: uses the current `CarbonInputFormat` as its internal implementation.
- `CarbonRowInputFormat/OutputFormat`: still needs to be implemented.
- `CarbonOutputCommitter`: used for managing segment commit
They are based on `CarbonInputFormatBase/OutputFormatBase`
### Dev API
- Segment: an abstract class that represents a single load of data. It is used by `CarbonInputFormatBase` to get all `InputSplit`s that match the `QueryModel`, and by `CarbonOutputCommitter` to prepare for reading. Implementation examples are `IndexedSegment` and `StreamingSegment`.
- SegmentManager: an interface for managing segments. The current implementation is `ZkSegmentManager`, which needs to be mapped to the existing logic.
- Index: an interface used by `IndexedSegment` to filter `InputSplit`s. The current implementation is `InMemoryBTreeIndex`, which loads the index into the driver's memory.
`CarbonInputFormatUtil` is modified so that it can also be used by `CarbonColumnarInputFormat`.
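To make the relationship between `Segment` and `Index` concrete, here is a minimal, self-contained sketch. It is not the code in this PR: the method signatures, the `Split` and `RangeFilter` stand-ins (for Hadoop's `InputSplit` and CarbonData's `QueryModel`), and the `MinMaxIndex` class are all hypothetical, and the index uses simple min/max pruning rather than the B-tree of `InMemoryBTreeIndex`.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for an InputSplit: a file plus the key range it covers.
final class Split {
    final String path;
    final int minKey;
    final int maxKey;

    Split(String path, int minKey, int maxKey) {
        this.path = path;
        this.minKey = minKey;
        this.maxKey = maxKey;
    }
}

// Hypothetical stand-in for QueryModel: an inclusive range predicate on the key.
final class RangeFilter {
    final int low;
    final int high;

    RangeFilter(int low, int high) {
        this.low = low;
        this.high = high;
    }
}

// Assumed shape of the Index interface: prune candidate splits using a filter.
interface Index {
    List<Split> filter(List<Split> candidates, RangeFilter filter);
}

// Simplified in-memory index: keeps only splits whose [minKey, maxKey]
// range overlaps the filter range. A linear scan, not a B-tree.
final class MinMaxIndex implements Index {
    @Override
    public List<Split> filter(List<Split> candidates, RangeFilter filter) {
        List<Split> result = new ArrayList<>();
        for (Split split : candidates) {
            if (split.minKey <= filter.high && split.maxKey >= filter.low) {
                result.add(split);
            }
        }
        return result;
    }
}

// Assumed shape of Segment: one data load that can produce matching splits.
abstract class Segment {
    final String id;

    Segment(String id) {
        this.id = id;
    }

    abstract List<Split> getSplits(RangeFilter filter);
}

// A segment that delegates split pruning to a pluggable Index.
final class IndexedSegment extends Segment {
    private final List<Split> allSplits;
    private final Index index;

    IndexedSegment(String id, List<Split> allSplits, Index index) {
        super(id);
        this.allSplits = allSplits;
        this.index = index;
    }

    @Override
    List<Split> getSplits(RangeFilter filter) {
        return index.filter(allSplits, filter);
    }
}
```

Because `IndexedSegment` depends only on the `Index` interface, a different index (for example, one backed by an external service) could be swapped in without changing the segment or the input format that calls it.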
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/jackylk/incubator-carbondata index-interface
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/incubator-carbondata/pull/208.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #208
----
commit 398d2ec3e6706c615918a734a90f9dc4111067d8
Author: jackylk <ja...@huawei.com>
Date: 2016-10-03T16:01:48Z
add User API
commit 1d92a00403faeebc09bf595ba11b3e55d4c997f2
Author: jackylk <ja...@huawei.com>
Date: 2016-10-03T16:02:04Z
add Developer API
commit 1812a0a68b53ba5d48fc030e2a59329b0e827b05
Author: jackylk <ja...@huawei.com>
Date: 2016-10-03T16:02:49Z
refactory existing code
commit 430e7710b88725b587c1f3542d4d66ab02958cbc
Author: jackylk <ja...@huawei.com>
Date: 2016-10-03T16:27:10Z
change Index interface
----
> Abstracting Index and Segment interface
> ---------------------------------------
>
> Key: CARBONDATA-284
> URL: https://issues.apache.org/jira/browse/CARBONDATA-284
> Project: CarbonData
> Issue Type: Improvement
> Components: hadoop-integration
> Affects Versions: 0.1.0-incubating
> Reporter: Jacky Li
> Fix For: 0.2.0-incubating
>
>
> This issue is intended to abstract the developer API and user API to achieve the following goals:
> Goal 1: Users can choose where to store index data: it can be stored in
> the processing framework's memory space (for example, in Spark driver memory) or in
> another service outside of the processing framework (for example, an
> independent database service that can be shared across clients).
> Goal 2: Developers can add more indices of their choice to CarbonData files.
> Besides the B+ tree on a multi-dimensional key that CarbonData currently supports,
> developers are free to add other indexing technologies to make certain
> workloads faster. These new indices should be added in a pluggable way.
> This Jira has been discussed on the mailing list:
> http://apache-carbondata-mailing-list-archive.1130556.n5.nabble.com/Abstracting-CarbonData-s-Index-Interface-td1587.html