Posted to dev@streams.apache.org by "Steve Blackmon (JIRA)" <ji...@apache.org> on 2015/03/23 19:59:52 UTC
[jira] [Created] (STREAMS-300) processor to fix handling of non-string fields from mongoexport
Steve Blackmon created STREAMS-300:
--------------------------------------
Summary: processor to fix handling of non-string fields from mongoexport
Key: STREAMS-300
URL: https://issues.apache.org/jira/browse/STREAMS-300
Project: Streams
Issue Type: Improvement
Reporter: Steve Blackmon
mongoexport is useful for producing files full of JSON documents that Streams can read in lieu of paging through documents in Mongo. However, the export leaves some artifacts that must be cleaned up to reconstruct the original document.
Specifically, dates and numbers show up as nested dictionaries wrapping the value instead of as plain fields. For example:
"created_at": {
    "$date": "2015-02-11T04:24:48.101+0000"
},
"id": {
    "$numberLong": "2405068880"
}
Write a processor that can sit behind WebHdfsPersistReader and clean this up, so that mongoexport -> WebHdfsPersistReader -> MongoExportCleanup -> downstream works equivalently to MongoPersistReader -> downstream.
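The cleanup itself could look roughly like the sketch below. It is written in Python only to illustrate the transformation; the actual processor would be written against the Streams processor API. The function name clean_mongoexport is hypothetical, and the sketch assumes that a "$date" wrapper should collapse to its inner value (the string is left unparsed) and that a "$numberLong" wrapper should become an integer. Other mongoexport wrappers are left untouched.

```python
import json


def clean_mongoexport(value):
    """Recursively collapse mongoexport wrapper objects into plain values.

    Assumption: only the two wrappers named in the issue are handled.
    """
    if isinstance(value, dict):
        # A wrapper is a dictionary whose only key is the type marker.
        if set(value) == {"$date"}:
            # Assumption: keep the date as exported; no parsing is attempted.
            return clean_mongoexport(value["$date"])
        if set(value) == {"$numberLong"}:
            return int(value["$numberLong"])
        # Ordinary object: clean each field in turn.
        return {key: clean_mongoexport(item) for key, item in value.items()}
    if isinstance(value, list):
        return [clean_mongoexport(item) for item in value]
    # Strings, numbers, booleans and None pass through unchanged.
    return value


# One line of mongoexport output, using the fields from the example above.
line = (
    '{"created_at": {"$date": "2015-02-11T04:24:48.101+0000"}, '
    '"id": {"$numberLong": "2405068880"}}'
)
cleaned = clean_mongoexport(json.loads(line))
print(json.dumps(cleaned))
# prints {"created_at": "2015-02-11T04:24:48.101+0000", "id": 2405068880}
```

Matching on the exact key set, rather than on the mere presence of "$date" or "$numberLong", avoids rewriting a legitimate document that happens to contain such a key alongside other fields.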
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)