Posted to dev@sqoop.apache.org by "ASF subversion and git services (JIRA)" <ji...@apache.org> on 2016/02/01 20:14:39 UTC
[jira] [Commented] (SQOOP-2811) Sqoop2: Extracting sequence files may result in duplicates
[ https://issues.apache.org/jira/browse/SQOOP-2811?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15126820#comment-15126820 ]
ASF subversion and git services commented on SQOOP-2811:
--------------------------------------------------------
Commit 118aa7c4f9cb7ed3a81ce792e7bf56d31f9107e5 in sqoop's branch refs/heads/sqoop2 from [~jarcec]
[ https://git-wip-us.apache.org/repos/asf?p=sqoop.git;h=118aa7c ]
SQOOP-2811: Sqoop2: Extracting sequence files may result in duplicates
(Abraham Fine via Jarek Jarcec Cecho)
> Sqoop2: Extracting sequence files may result in duplicates
> ----------------------------------------------------------
>
> Key: SQOOP-2811
> URL: https://issues.apache.org/jira/browse/SQOOP-2811
> Project: Sqoop
> Issue Type: Bug
> Affects Versions: 1.99.6
> Reporter: Abraham Fine
> Assignee: Abraham Fine
> Fix For: 1.99.7
>
> Attachments: SQOOP-2811.patch
>
>
> In the HDFS extractor we use:
> {code:java}
> if (start > filereader.getPosition()) {
>   filereader.sync(start); // sync to start
> }
> {code}
> to jump to the correct point in the sequence file from which we want to extract.
> `sync(start)` moves the reader to the next sync marker past `start`, so if the sequence file is small (only a few sync markers), multiple start points may `sync` to the same point and we could end up extracting the same record multiple times.
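The effect can be reproduced without Hadoop in a small simulation. The sketch below is hypothetical and is neither the Sqoop extractor code nor the committed SQOOP-2811 patch: `ToyReader` is a simplified stand-in whose `getPosition()`, `sync(long)`, `next()` and `syncSeen()` loosely mirror Hadoop's `SequenceFile.Reader` (it ignores the file header, and its `next()` returns a record index instead of filling a `Writable`). The toy file has six 10-byte records and a single sync marker in front of record 4, and it is cut into four 15-byte splits. `extractNaive` (a hypothetical loop) emits the first record after `sync(start)` without checking it against the split end, so the two splits that both sync to offset 40 each extract records 4 and 5. `extractGuarded` adds the two checks that Hadoop's `SequenceFileRecordReader` applies: skip the split if the position after syncing is already at or past `end`, and stop before a record that starts at or past `end` directly behind a sync marker.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical simulation; not the Sqoop HdfsExtractor or the SQOOP-2811 patch.
class SequenceSplitDemo {
  /** Toy "sequence file": record offsets, and whether a sync marker precedes each record. */
  static class ToySequenceFile {
    final long[] offsets;       // start offset of each record
    final boolean[] syncBefore; // true if a sync marker precedes the record
    final long fileEnd;         // file length in bytes
    ToySequenceFile(long[] offsets, boolean[] syncBefore, long fileEnd) {
      this.offsets = offsets; this.syncBefore = syncBefore; this.fileEnd = fileEnd;
    }
  }

  /** Simplified stand-in loosely mirroring SequenceFile.Reader (no header handling). */
  static class ToyReader {
    private final ToySequenceFile f;
    private int nextIndex = 0;        // index of the next record to read
    private boolean syncSeen = false; // was the last record read preceded by a sync marker?
    ToyReader(ToySequenceFile f) { this.f = f; }

    long getPosition() {
      return nextIndex < f.offsets.length ? f.offsets[nextIndex] : f.fileEnd;
    }

    /** Jump to the first sync marker at or after position, or to end of file if none. */
    void sync(long position) {
      nextIndex = f.offsets.length;
      for (int i = 0; i < f.offsets.length; i++) {
        if (f.syncBefore[i] && f.offsets[i] >= position) { nextIndex = i; break; }
      }
    }

    /** Returns the index of the record read, or -1 at end of file. */
    int next() {
      if (nextIndex >= f.offsets.length) return -1;
      syncSeen = f.syncBefore[nextIndex];
      return nextIndex++;
    }

    boolean syncSeen() { return syncSeen; }
  }

  /** Hypothetical naive loop: the first record after sync(start) is emitted unchecked. */
  static List<Integer> extractNaive(ToySequenceFile f, long start, long end) {
    List<Integer> out = new ArrayList<>();
    ToyReader r = new ToyReader(f);
    if (start > r.getPosition()) r.sync(start); // several splits may land on the same marker
    int rec = r.next();
    while (rec != -1) {
      out.add(rec);
      long pos = r.getPosition();
      rec = r.next();
      if (pos >= end && r.syncSeen()) break; // the next split owns records from here on
    }
    return out;
  }

  /** Guarded loop with the two checks used by Hadoop's SequenceFileRecordReader. */
  static List<Integer> extractGuarded(ToySequenceFile f, long start, long end) {
    List<Integer> out = new ArrayList<>();
    ToyReader r = new ToyReader(f);
    if (start > r.getPosition()) r.sync(start);
    if (r.getPosition() >= end) return out; // sync jumped past this split: nothing is ours
    while (true) {
      long pos = r.getPosition();
      int rec = r.next();
      if (rec == -1 || (pos >= end && r.syncSeen())) break; // next split owns this record
      out.add(rec);
    }
    return out;
  }

  /** Extracts every split of size splitSize and concatenates the results. */
  static List<Integer> extractAll(ToySequenceFile f, long splitSize, boolean guarded) {
    List<Integer> all = new ArrayList<>();
    for (long start = 0; start < f.fileEnd; start += splitSize) {
      long end = Math.min(start + splitSize, f.fileEnd);
      all.addAll(guarded ? extractGuarded(f, start, end) : extractNaive(f, start, end));
    }
    return all;
  }

  /** Six 10-byte records and a single sync marker, in front of record 4 (offset 40). */
  static ToySequenceFile smallFile() {
    return new ToySequenceFile(
        new long[] {0, 10, 20, 30, 40, 50},
        new boolean[] {false, false, false, false, true, false},
        60);
  }

  public static void main(String[] args) {
    ToySequenceFile f = smallFile();
    System.out.println("naive:   " + extractAll(f, 15, false)); // [0, 1, 2, 3, 4, 5, 4, 5]
    System.out.println("guarded: " + extractAll(f, 15, true));  // [0, 1, 2, 3, 4, 5]
  }
}
```

In this toy, the splits starting at offsets 15 and 30 both sync to the marker at offset 40; only the guarded loop lets the split that actually contains offset 40 claim those records, while the split ending at 30 extracts nothing.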
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)