Posted to issues@geode.apache.org by "ASF subversion and git services (JIRA)" <ji...@apache.org> on 2015/07/13 19:49:05 UTC

[jira] [Commented] (GEODE-10) HDFS Integration

    [ https://issues.apache.org/jira/browse/GEODE-10?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14625023#comment-14625023 ] 

ASF subversion and git services commented on GEODE-10:
------------------------------------------------------

Commit 3772869d02148eec8b5ce97fbf1af9415bccd98c in incubator-geode's branch refs/heads/feature/GEODE-10 from Ashvin Agrawal
[ https://git-wip-us.apache.org/repos/asf?p=incubator-geode.git;h=3772869 ]

GEODE-10: Refactor HdfsStore api to match spec

* Currently HdfsStore's configuration object is nested, so a user needs to
  create multiple sub-objects to manage a store instance. This is less usable,
  gets confusing at times, and exposes the user to a lot of internal details.
  Replacing the nested configuration with a flat structure is better (see the
  sketch after this list).
* Rename members
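
A minimal sketch of the before/after shape, assuming an existing Cache
instance; the class and method names below (HDFSEventQueueAttributesFactory,
HDFSCompactionConfigFactory, HDFSStoreFactory, setNameNodeURL, setBatchSize,
and so on) are illustrative guesses and may not match the final API:

    // Before: nested configuration. The user builds separate sub-objects for
    // the event queue and compaction, then attaches them to the store.
    // (Names here are illustrative, not the confirmed Geode API.)
    HDFSEventQueueAttributesFactory queueFactory = new HDFSEventQueueAttributesFactory();
    queueFactory.setBatchSizeMB(32);
    HDFSCompactionConfigFactory compactionFactory = cache.createHDFSCompactionConfigFactory();
    compactionFactory.setMaxInputFileSizeMB(512);
    HDFSStoreFactory nestedFactory = cache.createHDFSStoreFactory();
    nestedFactory.setHDFSEventQueueAttributes(queueFactory.create());
    nestedFactory.setHDFSCompactionConfig(compactionFactory.create());
    HDFSStore nestedStore = nestedFactory.create("myStore");

    // After: flat configuration. All settings hang directly off one factory,
    // so a single object is enough to create and manage the store.
    HDFSStoreFactory flatFactory = cache.createHDFSStoreFactory();
    flatFactory.setNameNodeURL("hdfs://namenode:8020");
    flatFactory.setHomeDir("/geode/data");
    flatFactory.setBatchSize(32);
    flatFactory.setMaxInputFileSizeMB(512);
    HDFSStore flatStore = flatFactory.create("myStore");

The flat form also lines up with the other single-level management factories,
which is the consistency point raised in the issue description below.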


> HDFS Integration
> ----------------
>
>                 Key: GEODE-10
>                 URL: https://issues.apache.org/jira/browse/GEODE-10
>             Project: Geode
>          Issue Type: New Feature
>          Components: hdfs
>            Reporter: Dan Smith
>            Assignee: Ashvin
>         Attachments: GEODE-HDFSPersistence-Draft-060715-2109-21516.pdf
>
>
> The ability to persist data on HDFS has been under development for GemFire and was part of the latest code drop, GEODE-8. As part of this feature we are proposing some changes to the HdfsStore management API (see the attached doc for details).
> # The current API uses nested configuration objects for compaction and the async queue. This nesting forces the user to execute multiple steps to manage a store, and it is not consistent with other management APIs.
> # Some member names in the current API are confusing.
> HDFS Integration: Geode acts as a transactional layer that micro-batches data out to Hadoop. This makes Geode a NoSQL store that sits on top of Hadoop and parallelizes the movement of data from the in-memory tier into Hadoop, which makes it very useful for capturing and processing fast data while keeping that data available to Hadoop jobs relatively quickly. The key requirements being met here are (a usage sketch follows this list):
> # Ingest data into HDFS in parallel
> # Cache bloom filters and allow fast lookups of individual elements
> # Have programmable policies for deciding what stays in memory
> # Roll files in HDFS
> # Index data that is in memory
> # Have expiration policies that allow the transactional set to decay out older data
> # The solution needs to support replicated and partitioned regions
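
A rough end-to-end sketch of how a region might be wired to such a store,
again assuming an existing Cache and using illustrative names
(setHDFSStoreName and RegionShortcut.PARTITION are assumptions here, not
confirmed API):

    // Assumes an existing Cache; identifiers are illustrative only.
    HDFSStoreFactory storeFactory = cache.createHDFSStoreFactory();
    storeFactory.setNameNodeURL("hdfs://namenode:8020"); // target HDFS cluster
    storeFactory.setHomeDir("/geode/streaming");         // base dir for rolled files
    HDFSStore store = storeFactory.create("streamingStore");

    // A partitioned region backed by the store: recent entries stay in memory
    // for fast lookups while micro-batches are written out to HDFS.
    Region<String, String> events = cache
        .<String, String>createRegionFactory(RegionShortcut.PARTITION)
        .setHDFSStoreName("streamingStore")
        .create("events");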



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)