Posted to issues@quickstep.apache.org by "ASF GitHub Bot (JIRA)" <ji...@apache.org> on 2016/09/29 22:00:22 UTC
[jira] [Commented] (QUICKSTEP-46) Fault tolerance in bulk loading data
[ https://issues.apache.org/jira/browse/QUICKSTEP-46?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15534189#comment-15534189 ]
ASF GitHub Bot commented on QUICKSTEP-46:
-----------------------------------------
GitHub user tarunbansal opened a pull request:
https://github.com/apache/incubator-quickstep/pull/103
QUICKSTEP-46 Fault tolerance in bulk loading data
This PR adds fault tolerance when bulk loading data from a file. Quickstep now skips a faulty record (a row) and moves on to the next one. The possible error cases are listed below, classified as DONE or TO-DO:
DONE
1) Record has too few fields.
2) Record has too many fields.
3) A field cannot be parsed as the attribute type declared in the schema.
4) Null literal specified for a column with a non-nullable type.
TO-DO
5) Null character mixed in with other data for a column.
6) Backslash line splicing support.
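The DONE cases above can be illustrated with a minimal sketch. This is Python written for illustration only, not Quickstep's actual C++ loader; the names Column, parse_record and bulk_load, the "|" delimiter, and the "\N" null literal are all assumptions and do not come from the PR.

```python
from dataclasses import dataclass
from typing import Any, Callable, Iterable, List, Sequence, Tuple

NULL_LITERAL = "\\N"  # assumed null marker, not taken from the PR


@dataclass
class Column:
    """Hypothetical schema entry: a name, a parser for the type, and nullability."""
    name: str
    parse: Callable[[str], Any]
    nullable: bool


def parse_record(line: str, schema: Sequence[Column], delimiter: str = "|") -> List[Any]:
    """Parse one text record, raising ValueError for error cases 1 to 4."""
    fields = line.rstrip("\n").split(delimiter)
    if len(fields) < len(schema):
        raise ValueError("too few fields")           # case 1
    if len(fields) > len(schema):
        raise ValueError("too many fields")          # case 2
    row = []
    for raw, col in zip(fields, schema):
        if raw == NULL_LITERAL:
            if not col.nullable:
                raise ValueError(f"null in non-nullable column {col.name}")  # case 4
            row.append(None)
        else:
            # int(), float() and similar parsers raise ValueError on bad input (case 3).
            row.append(col.parse(raw))
    return row


def bulk_load(lines: Iterable[str], schema: Sequence[Column],
              delimiter: str = "|") -> Tuple[List[List[Any]], List[Tuple[int, str]]]:
    """Load every valid record; skip faulty ones and remember why they failed."""
    rows, errors = [], []
    for lineno, line in enumerate(lines, start=1):
        try:
            rows.append(parse_record(line, schema, delimiter))
        except ValueError as exc:
            errors.append((lineno, str(exc)))  # skip the record, keep loading
    return rows, errors
```

Collecting the skipped line numbers alongside the loaded rows is a design choice of this sketch: it lets the caller report which records were dropped instead of discarding them silently.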
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/tarunbansal/incubator-quickstep master
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/incubator-quickstep/pull/103.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #103
----
commit 596c783d161fbbfd5f1f987c3aa40d621e142ab9
Author: Tarun Bansal <ta...@rockhopper-09.cs.wisc.edu>
Date: 2016-09-23T14:30:23Z
changes to fix QUICKSTEP-46
commit b753d5209e11718b39869fce8659611a7c9f6945
Author: Tarun Bansal <ta...@rockhopper-05.cs.wisc.edu>
Date: 2016-09-24T22:37:36Z
QUICKSTEP-46 fixed
commit daa4a37f1a6cc18c999ed8c8b39fc37d6cd67819
Author: Tarun Bansal <ta...@rockhopper-09.cs.wisc.edu>
Date: 2016-09-25T01:11:43Z
QUICKSTEP-46 fixed
commit b1f91650aa9806e018a64043b0e35861d78a8e6f
Author: Tarun Bansal <ta...@rockhopper-06.cs.wisc.edu>
Date: 2016-09-28T00:10:57Z
QUICKSTEP-46 fixed
commit 9c30199658a462cded24c71bf7788f121294f29d
Author: Tarun Bansal <ta...@rockhopper-09.cs.wisc.edu>
Date: 2016-09-23T14:30:23Z
changes to fix QUICKSTEP-46
commit dfeb999b2818a95837f84eccfd5215f70bee14a8
Author: Tarun Bansal <ta...@rockhopper-05.cs.wisc.edu>
Date: 2016-09-24T22:37:36Z
QUICKSTEP-46 fixed
commit 1bbaf4121e23aa76041c58d10cd984d86844315b
Author: Tarun Bansal <ta...@gmail.com>
Date: 2016-09-25T01:11:43Z
QUICKSTEP-46 fixed
commit cd5a4561d91d0b670630dbccf6ebe345f363a6c8
Author: Tarun Bansal <ta...@gmail.com>
Date: 2016-09-28T00:30:29Z
QUICKSTEP-46 fixed
commit fb0d65841a6b623c3a8c22e4f6767cd87d838c56
Author: Tarun Bansal <ta...@gmail.com>
Date: 2016-09-28T01:03:08Z
QUICKSTEP-46 fixed
----
> Fault tolerance in bulk loading data
> ------------------------------------
>
> Key: QUICKSTEP-46
> URL: https://issues.apache.org/jira/browse/QUICKSTEP-46
> Project: Apache Quickstep
> Issue Type: Improvement
> Components: Storage
> Reporter: Harshad Deshmukh
>
> Background: The bulk load ("COPY FROM" command) of data into Quickstep tables can't handle errors gracefully. Examples include a faulty row with fewer columns than the target table, and attribute mismatch or misalignment.
> Proposed solutions:
> 1. Discard the faulty row and move on to the next row, instead of terminating the whole process. (Easiest to implement and most practical.)
> 2. Let the user choose what to do with the erroneous tuple: discard it or supply a value for the missing column.
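The second proposed solution could look roughly like the sketch below. It is a hypothetical Python illustration, not part of Quickstep or of this PR; the OnError enum, the handle_short_row function, and the default parameter are assumed names.

```python
from enum import Enum
from typing import Any, List, Optional, Sequence


class OnError(Enum):
    """Hypothetical user-selectable action for a row with missing columns."""
    DISCARD = "discard"
    FILL_DEFAULT = "fill_default"


def handle_short_row(fields: Sequence[Any], num_columns: int,
                     policy: OnError, default: Any = None) -> Optional[List[Any]]:
    """Return the row to load, or None if the row should be discarded.

    Rows that are not short are returned unchanged.
    """
    missing = num_columns - len(fields)
    if missing <= 0:
        return list(fields)
    if policy is OnError.FILL_DEFAULT:
        # Pad the missing trailing columns with the user-supplied value.
        return list(fields) + [default] * missing
    return None  # OnError.DISCARD
```

For example, a one-field row loaded into a three-column table is either dropped under DISCARD or padded with two copies of the supplied value under FILL_DEFAULT.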
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)