Posted to dev@streams.apache.org by "Steve Blackmon (JIRA)" <ji...@apache.org> on 2015/03/23 19:59:52 UTC

[jira] [Created] (STREAMS-300) processor to fix handling of non-string fields from mongoexport

Steve Blackmon created STREAMS-300:
--------------------------------------

             Summary: processor to fix handling of non-string fields from mongoexport
                 Key: STREAMS-300
                 URL: https://issues.apache.org/jira/browse/STREAMS-300
             Project: Streams
          Issue Type: Improvement
            Reporter: Steve Blackmon


mongoexport is useful for producing files full of JSON documents, which Streams can read in lieu of paging through documents in Mongo. However, some artifacts of the export must be cleaned up to reconstruct the original document.

Specifically, dates and numbers show up as nested objects (dictionaries) instead of plain field values. For example:

    "created_at": {
        "$date": "2015-02-11T04:24:48.101+0000"
    },
    "id": {
        "$numberLong": "2405068880"
    }

Write a processor that can sit behind WebHdfsPersistReader and clean this up, so that mongoexport -> WebHdfsPersistReader -> MongoExportCleanup -> downstream works equivalently to MongoPersistReader -> downstream.
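
As a rough illustration of the cleanup step only, here is a minimal sketch that works on an already-parsed document (nested java.util.Map / List values). The class name MongoExportCleanupSketch and the method clean are hypothetical and are not part of the Streams API. The sketch assumes that a "$date" wrapper should collapse to its timestamp string and a "$numberLong" wrapper to a Long; the representation MongoPersistReader actually produces may differ.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch, not the Streams processor interface.
public class MongoExportCleanupSketch {

    // Recursively replace single-key mongoexport wrappers with plain values.
    public static Object clean(Object node) {
        if (node instanceof Map) {
            Map<?, ?> map = (Map<?, ?>) node;
            if (map.size() == 1) {
                Object date = map.get("$date");
                if (date != null) {
                    return date; // assumption: keep the timestamp string as-is
                }
                Object number = map.get("$numberLong");
                if (number != null) {
                    return Long.parseLong(number.toString());
                }
            }
            // Ordinary object: clean each value, preserving key order.
            Map<String, Object> out = new LinkedHashMap<>();
            for (Map.Entry<?, ?> e : map.entrySet()) {
                out.put(String.valueOf(e.getKey()), clean(e.getValue()));
            }
            return out;
        }
        if (node instanceof List) {
            List<Object> out = new ArrayList<>();
            for (Object item : (List<?>) node) {
                out.add(clean(item));
            }
            return out;
        }
        return node; // strings, numbers, booleans, null pass through
    }

    public static void main(String[] args) {
        Map<String, Object> createdAt = new LinkedHashMap<>();
        createdAt.put("$date", "2015-02-11T04:24:48.101+0000");
        Map<String, Object> id = new LinkedHashMap<>();
        id.put("$numberLong", "2405068880");

        Map<String, Object> doc = new LinkedHashMap<>();
        doc.put("created_at", createdAt);
        doc.put("id", id);

        System.out.println(clean(doc));
    }
}
```

A real processor would additionally need to parse each JSON line coming from WebHdfsPersistReader and emit the cleaned document downstream; other wrapper types that mongoexport may emit are not handled in this sketch.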



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)