Posted to dev@apex.apache.org by Shubham Pathak <sh...@datatorrent.com> on 2016/08/04 08:13:57 UTC

Enricher - Support to load data from other file formats in FSLoader

Hello Community,

I would like to take up this JIRA issue
https://issues.apache.org/jira/browse/APEXMALHAR-2151

In the current implementation, FSLoader loads data from a file, but the
data must be in JSON format.
Support for reading other formats would be a good addition.

To make it easy to plug in support for different formats, I propose the
following design changes:
1. Make FSLoader
<https://github.com/apache/apex-malhar/blob/master/contrib/src/main/java/com/datatorrent/contrib/enrich/FSLoader.java>
abstract
2. Add an abstract method extractFields(String line) that is called from
loadInitialData()
<https://github.com/apache/apex-malhar/blob/master/contrib/src/main/java/com/datatorrent/contrib/enrich/FSLoader.java#L94>
3. Concrete implementations of FSLoader will implement
extractFields(String line) to parse the line, extract the fields and return
them as a Map, for example JSONFSLoader, DelimitedFSLoader and FixedLengthFSLoader.
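
A minimal sketch of the proposed split, assuming simplified stand-ins rather than the actual Malhar classes: AbstractLineLoader and DelimitedLineLoader are hypothetical names, loadInitialData() here takes a BufferedReader and returns the parsed records, and the real operator's file-system access and lookup-key handling are omitted. The point is only that the line-reading loop stays format-agnostic while each subclass supplies extractFields(String line).

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

// Hypothetical, simplified stand-in for an abstract FSLoader (not the Malhar class).
abstract class AbstractLineLoader {

  // Format-specific hook (design item 2): parse one line into field name -> value.
  protected abstract Map<String, Object> extractFields(String line);

  // Format-agnostic loop, standing in for loadInitialData().
  public List<Map<String, Object>> loadInitialData(BufferedReader reader) throws IOException {
    List<Map<String, Object>> records = new ArrayList<>();
    String line;
    while ((line = reader.readLine()) != null) {
      if (line.trim().isEmpty()) {
        continue; // skip blank lines
      }
      Map<String, Object> fields = extractFields(line);
      if (fields != null) {
        records.add(fields);
      }
    }
    return records;
  }
}

// Hypothetical delimited implementation (design item 3); no quoting/escaping support.
class DelimitedLineLoader extends AbstractLineLoader {
  private final String[] fieldNames;
  private final Pattern delimiter;

  DelimitedLineLoader(String[] fieldNames, String delimiter) {
    this.fieldNames = fieldNames.clone();
    // Quote the delimiter so characters like "|" are not treated as regex syntax.
    this.delimiter = Pattern.compile(Pattern.quote(delimiter));
  }

  @Override
  protected Map<String, Object> extractFields(String line) {
    // A limit of -1 keeps trailing empty values, e.g. "2,Bob," -> ["2", "Bob", ""].
    String[] values = delimiter.split(line, -1);
    Map<String, Object> fields = new HashMap<>();
    for (int i = 0; i < fieldNames.length && i < values.length; i++) {
      fields.put(fieldNames[i], values[i]);
    }
    return fields;
  }
}
```

A JSON variant would override the same method using a JSON library; keeping parsing behind one method means the loader's reading loop does not change per format.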

To start with, I will provide implementations of JSONFSLoader and
DelimitedFSLoader.

I would like to receive feedback on the proposed design changes.

Thanks,
Shubham

Re: Enricher - Support to load data from other file formats in FSLoader

Posted by Shubham Pathak <sh...@datatorrent.com>.
Hello Community,

I would like to take up this JIRA issue
https://issues.apache.org/jira/browse/APEXMALHAR-2152

Alongside JSONFSLoader and DelimitedFSLoader, support for loading data
from fixed-width files would be a good addition.
To parse fixed-width files, I plan to use
https://github.com/uniVocity/univocity-parsers#parsing-fixed-width-files .
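
To show what per-line fixed-width extraction involves, here is a hand-rolled sketch that does not use univocity-parsers: each field is a fixed number of characters, sliced in order and trimmed of padding. FixedWidthExtractor and its constructor arguments are hypothetical names for illustration, and the behaviour for short lines (missing fields become empty strings) is an assumption, not something the library or the proposal specifies.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical hand-rolled fixed-width extraction (the proposal plans to use
// univocity-parsers instead). Field i occupies widths[i] characters.
class FixedWidthExtractor {
  private final String[] fieldNames;
  private final int[] widths;

  FixedWidthExtractor(String[] fieldNames, int[] widths) {
    if (fieldNames.length != widths.length) {
      throw new IllegalArgumentException("fieldNames and widths must have the same length");
    }
    this.fieldNames = fieldNames.clone();
    this.widths = widths.clone();
  }

  // Same shape as the proposed extractFields(String line) hook.
  Map<String, Object> extractFields(String line) {
    Map<String, Object> fields = new LinkedHashMap<>(); // keeps field order
    int start = 0;
    for (int i = 0; i < fieldNames.length; i++) {
      int end = Math.min(start + widths[i], line.length());
      // Assumption: fields beyond the end of a short line become empty strings.
      String raw = start < line.length() ? line.substring(start, end) : "";
      fields.put(fieldNames[i], raw.trim()); // strip padding spaces
      start += widths[i];
    }
    return fields;
  }
}
```

For example, with widths {3, 8, 6}, the line "001Alice   Pune  " yields id "001", name "Alice" and city "Pune".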

Thanks,
Shubham
