Posted to issues@quickstep.apache.org by "ASF GitHub Bot (JIRA)" <ji...@apache.org> on 2016/09/29 22:00:22 UTC

[jira] [Commented] (QUICKSTEP-46) Fault tolerance in bulk loading data

    [ https://issues.apache.org/jira/browse/QUICKSTEP-46?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15534189#comment-15534189 ] 

ASF GitHub Bot commented on QUICKSTEP-46:
-----------------------------------------

GitHub user tarunbansal opened a pull request:

    https://github.com/apache/incubator-quickstep/pull/103

    QUICKSTEP-46 Fault tolerance in bulk loading data

    This PR adds fault tolerance to bulk loading data from a file. Quickstep now skips a faulty record (a row) and moves on to the next one. The possible error cases are listed below, classified as DONE or TO-DO:
    DONE
    1) Record has too few fields.
    2) Record has too many fields.
    3) A field could not be parsed properly as per the attribute type in a schema.
    4) Null literal specified for a column with a non-nullable type.
    TO-DO
    5) Null character mixed in with other data for a column.
    6) Backslash line splicing support.
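
A minimal sketch of the skip-and-continue behaviour for the four DONE cases, written in Python for illustration only. It is not the code in this pull request: the schema layout, the "|" delimiter, the "\N" null literal, and the names check_record and bulk_load are all assumptions made for the example.

```python
# Hypothetical schema: (column name, parser, nullable).
# This is NOT Quickstep's catalog format; it only drives the example.
SCHEMA = [
    ("id", int, False),
    ("name", str, True),
    ("score", float, True),
]

NULL_LITERAL = "\\N"  # assumed null marker; the PR text does not name one


def check_record(fields, schema):
    """Return (values, None) for a good record, or (None, reason) for a faulty one."""
    # Cases 1 and 2: wrong number of fields.
    if len(fields) < len(schema):
        return None, "too few fields"
    if len(fields) > len(schema):
        return None, "too many fields"
    values = []
    for raw, (name, parse, nullable) in zip(fields, schema):
        if raw == NULL_LITERAL:
            # Case 4: null literal in a non-nullable column.
            if not nullable:
                return None, f"null in non-nullable column {name!r}"
            values.append(None)
            continue
        try:
            values.append(parse(raw))
        except ValueError:
            # Case 3: field does not parse as the column's type.
            return None, f"cannot parse {raw!r} for column {name!r}"
    return values, None


def bulk_load(text, schema, delimiter="|"):
    """Load every valid row; collect (line_number, reason) for each skipped row."""
    loaded, skipped = [], []
    for line_number, line in enumerate(text.splitlines(), start=1):
        values, reason = check_record(line.split(delimiter), schema)
        if reason is None:
            loaded.append(values)
        else:
            skipped.append((line_number, reason))  # skip and keep going
    return loaded, skipped


SAMPLE = "\n".join([
    "1|alice|3.5",                        # valid
    "2|bob",                              # too few fields
    "3|carol|4.0|extra",                  # too many fields
    "x|dave|1.0",                         # id is not an integer
    NULL_LITERAL + "|erin|2.0",           # null in non-nullable id
    "6|" + NULL_LITERAL + "|" + NULL_LITERAL,  # valid, nullable columns are null
])

loaded, skipped = bulk_load(SAMPLE, SCHEMA)
```

Returning the reason alongside the line number lets the loader report what was dropped instead of discarding rows silently. The two TO-DO cases (embedded null characters and backslash line splicing) are not handled in this sketch.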


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/tarunbansal/incubator-quickstep master

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/incubator-quickstep/pull/103.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #103
    
----
commit 596c783d161fbbfd5f1f987c3aa40d621e142ab9
Author: Tarun Bansal <ta...@rockhopper-09.cs.wisc.edu>
Date:   2016-09-23T14:30:23Z

    changes to fix QUICKSTEP-46

commit b753d5209e11718b39869fce8659611a7c9f6945
Author: Tarun Bansal <ta...@rockhopper-05.cs.wisc.edu>
Date:   2016-09-24T22:37:36Z

    QUICKSTEP-46 fixed

commit daa4a37f1a6cc18c999ed8c8b39fc37d6cd67819
Author: Tarun Bansal <ta...@rockhopper-09.cs.wisc.edu>
Date:   2016-09-25T01:11:43Z

    QUICKSTEP-46 fixed

commit b1f91650aa9806e018a64043b0e35861d78a8e6f
Author: Tarun Bansal <ta...@rockhopper-06.cs.wisc.edu>
Date:   2016-09-28T00:10:57Z

    QUICKSTEP-46 fixed

commit 9c30199658a462cded24c71bf7788f121294f29d
Author: Tarun Bansal <ta...@rockhopper-09.cs.wisc.edu>
Date:   2016-09-23T14:30:23Z

    changes to fix QUICKSTEP-46

commit dfeb999b2818a95837f84eccfd5215f70bee14a8
Author: Tarun Bansal <ta...@rockhopper-05.cs.wisc.edu>
Date:   2016-09-24T22:37:36Z

    QUICKSTEP-46 fixed

commit 1bbaf4121e23aa76041c58d10cd984d86844315b
Author: Tarun Bansal <ta...@gmail.com>
Date:   2016-09-25T01:11:43Z

    QUICKSTEP-46 fixed

commit cd5a4561d91d0b670630dbccf6ebe345f363a6c8
Author: Tarun Bansal <ta...@gmail.com>
Date:   2016-09-28T00:30:29Z

    QUICKSTEP-46 fixed

commit fb0d65841a6b623c3a8c22e4f6767cd87d838c56
Author: Tarun Bansal <ta...@gmail.com>
Date:   2016-09-28T01:03:08Z

    QUICKSTEP-46 fixed

----


> Fault tolerance in bulk loading data
> ------------------------------------
>
>                 Key: QUICKSTEP-46
>                 URL: https://issues.apache.org/jira/browse/QUICKSTEP-46
>             Project: Apache Quickstep
>          Issue Type: Improvement
>          Components: Storage
>            Reporter: Harshad Deshmukh
>
> Background: The bulk load ("COPY FROM" command) of data into Quickstep tables can't handle errors gracefully. Examples include a faulty row with fewer columns than the table, an attribute mismatch, or misalignment.
> Proposed solutions:
> 1. Discard the faulty row and move on to the next row, instead of terminating the whole process. (Easiest to implement and most practical.)
> 2. Let the user choose what to do with the erroneous tuple: discard it or supply a value for the missing column.
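
The second proposal could look roughly like the sketch below, again in Python and purely illustrative. The on_error and fill_value parameters and the apply_policy name are hypothetical; nothing here reflects actual COPY FROM syntax or options in Quickstep.

```python
def apply_policy(fields, num_columns, on_error="discard", fill_value=None):
    """Return the field list to load, or None if the row should be dropped.

    on_error (hypothetical option):
      "discard" - drop any row whose field count is wrong (proposal 1)
      "fill"    - pad a short row with fill_value (proposal 2); rows with
                  too many fields are still dropped
    """
    if len(fields) == num_columns:
        return list(fields)
    if on_error == "fill" and len(fields) < num_columns:
        missing = num_columns - len(fields)
        return list(fields) + [fill_value] * missing
    return None


dropped = apply_policy(["2", "bob"], 3)                     # default: discard
padded = apply_policy(["2", "bob"], 3, on_error="fill", fill_value="0.0")
```

Padding only covers the missing-column case named in the proposal; a type mismatch would still need the discard path.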



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)