Posted to dev@griffin.apache.org by "Chitral Verma (Jira)" <ji...@apache.org> on 2020/04/07 14:57:00 UTC

[jira] [Updated] (GRIFFIN-297) Allow support for additional file based data sources

     [ https://issues.apache.org/jira/browse/GRIFFIN-297?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chitral Verma updated GRIFFIN-297:
----------------------------------
        Parent: GRIFFIN-302
    Issue Type: Sub-task  (was: Improvement)

> Allow support for additional file based data sources
> ----------------------------------------------------
>
>                 Key: GRIFFIN-297
>                 URL: https://issues.apache.org/jira/browse/GRIFFIN-297
>             Project: Griffin
>          Issue Type: Sub-task
>            Reporter: Chitral Verma
>            Priority: Major
>              Labels: features
>             Fix For: 0.6.0
>
>          Time Spent: 5h 20m
>  Remaining Estimate: 0h
>
> In the current version of Apache Griffin (0.5.0), support for file-based data sources is very limited: only Avro and text files are supported.
> I propose adding support for additional file-based data sources such as Parquet, CSV, TSV, and ORC in batch mode. Since most of these sources already have first-class support in Spark, the implementation is straightforward.
> This feature will also allow data to be read directly from standalone files as well as directories, on both local and distributed filesystems.
> A sample config would look like this:
> {noformat}
> {
>   "name": "source",
>   "baseline": true,
>   "connectors": [
>     {
>       "type": "file",
>       "version": "1.7",
>       "config": {
>         "format": "parquet",
>         "options": { 
>           "k1": "v1",
>           "k2": "v2"
>         },
>         "paths": [
>           "/home/chitral/path/to/source/",
>           "/home/chitral/path/to/test.parquet"
>         ]
>       }
>     }
>   ]
> }{noformat}
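
To illustrate how a connector block like the one above could be turned into a Spark read, here is a minimal sketch in Python. It is not Griffin's implementation: the helper names `parse_file_connector` and `read_source`, the `SUPPORTED_FORMATS` allow-list, and the handling of TSV as CSV with a tab separator are all assumptions made for illustration. The only Spark calls used are the generic `DataFrameReader` methods `format`, `options`, and `load`.

```python
# Hypothetical allow-list for this sketch, taken from the formats named in the proposal.
SUPPORTED_FORMATS = {"parquet", "csv", "tsv", "orc"}


def parse_file_connector(connector):
    """Validate a "file" connector block and normalise it into reader arguments.

    Returns a dict with the keys "format", "options" and "paths".
    """
    if connector.get("type") != "file":
        raise ValueError("expected a connector of type 'file'")

    config = connector.get("config", {})
    fmt = str(config.get("format", "")).lower()
    if fmt not in SUPPORTED_FORMATS:
        raise ValueError("unsupported format: %r" % fmt)

    paths = list(config.get("paths", []))
    if not paths:
        raise ValueError("at least one path is required")

    # Copy the options so the caller's config is never mutated.
    options = dict(config.get("options", {}))

    # Assumption: Spark has no dedicated TSV source, so read TSV through the
    # CSV source with a tab separator unless the user already set one.
    if fmt == "tsv":
        fmt = "csv"
        options.setdefault("sep", "\t")

    return {"format": fmt, "options": options, "paths": paths}


def read_source(spark, connector):
    """Load the connector's paths into a DataFrame.

    `spark` is expected to be a pyspark.sql.SparkSession; each path may be a
    single file or a directory, on a local or distributed filesystem.
    """
    args = parse_file_connector(connector)
    reader = spark.read.format(args["format"]).options(**args["options"])
    return reader.load(args["paths"])
```

With the sample config, `parse_file_connector` would pass `"parquet"`, the two options, and both paths through unchanged; `read_source` then needs a running Spark session to actually load the data. Keeping validation separate from the read means the config handling can be checked without Spark installed.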


