Posted to dev@griffin.apache.org by "ASF GitHub Bot (JIRA)" <ji...@apache.org> on 2019/08/14 00:45:00 UTC

[jira] [Work logged] (GRIFFIN-278) AvroBatchDataConnector handle input is directory

     [ https://issues.apache.org/jira/browse/GRIFFIN-278?focusedWorklogId=294308&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-294308 ]

ASF GitHub Bot logged work on GRIFFIN-278:
------------------------------------------

                Author: ASF GitHub Bot
            Created on: 14/Aug/19 00:44
            Start Date: 14/Aug/19 00:44
    Worklog Time Spent: 10m 
      Work Description: joohnnie commented on pull request #521: GRIFFIN-278 AvroBatchDataConnector handle input is directory
URL: https://github.com/apache/griffin/pull/521
 
 
   AvroBatchDataConnector processes data at the file level and needs to handle the case where the input is a directory.
   
   
 
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Issue Time Tracking
-------------------

            Worklog Id:     (was: 294308)
            Time Spent: 10m
    Remaining Estimate: 0h

> AvroBatchDataConnector handle input is directory
> ------------------------------------------------
>
>                 Key: GRIFFIN-278
>                 URL: https://issues.apache.org/jira/browse/GRIFFIN-278
>             Project: Griffin
>          Issue Type: Improvement
>            Reporter: Johnnie
>            Priority: Major
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> The Griffin data connector is designed to compare a dataset's accuracy between source and target.
> However, in the big data ecosystem, most sources are huge and have hundreds of files in one folder. I think it would be great if Griffin could handle the source by folder instead of by file by default.
> In addition, Spark normally reads data from a folder, so in this case we don't need to union all the files in one folder.
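
For illustration only, here is a minimal sketch of the directory case described above, written in Python for brevity. The helper name `resolve_avro_input` is hypothetical; it is not Griffin's API and not necessarily what pull request #521 does. It shows what a file-level connector would have to enumerate when given a folder. The sketch assumes that data files end in `.avro` and that names starting with `_` or `.` (marker files such as `_SUCCESS`) should be skipped. When the reader accepts a directory path directly, as Spark normally does, this listing and any later union are unnecessary.

```python
from pathlib import Path
from typing import List


def resolve_avro_input(path_str: str) -> List[str]:
    """Return the data files a file-level connector would need to read.

    Hypothetical helper, not part of Griffin. A single file yields itself;
    a directory yields its visible *.avro files in sorted order.
    """
    path = Path(path_str)
    if path.is_file():
        # Input is already a single file: nothing to expand.
        return [str(path)]
    if path.is_dir():
        # Assumption: skip marker/hidden files whose names start with "_" or ".".
        return sorted(
            str(p)
            for p in path.iterdir()
            if p.is_file()
            and p.suffix == ".avro"
            and not p.name.startswith(("_", "."))
        )
    raise FileNotFoundError(f"input path does not exist: {path_str}")
```

Calling `resolve_avro_input` on a folder that holds several part files returns one entry per part file, which is the set that would otherwise have to be read and unioned one by one.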



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)