Posted to dev@sqoop.apache.org by "Abraham Fine (JIRA)" <ji...@apache.org> on 2016/01/29 23:16:39 UTC
[jira] [Created] (SQOOP-2811) Sqoop2: Extracting sequence files may result in duplicates
Abraham Fine created SQOOP-2811:
-----------------------------------
Summary: Sqoop2: Extracting sequence files may result in duplicates
Key: SQOOP-2811
URL: https://issues.apache.org/jira/browse/SQOOP-2811
Project: Sqoop
Issue Type: Bug
Affects Versions: 1.99.6
Reporter: Abraham Fine
Assignee: Abraham Fine
In the hdfs extractor we use:
```
if (start > filereader.getPosition()) {
  filereader.sync(start); // sync to start
}
```
to jump to the point in the sequence file where extraction for a partition should begin.
If the sequence file is small, the start offsets of several partitions may `sync` to the same sync point, and we could end up extracting the same records multiple times.
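
A minimal sketch of the problem, using a simplified model rather than the real Hadoop classes. Everything here is hypothetical and for illustration only: the class `SyncDuplicateSketch`, the sync-point offsets, the file length, and the helpers `syncTo` and `blocksRead`. `syncTo` approximates `sync(start)` as "move to the first sync point at or after `start`". The guarded variant skips a partition whose synced position lies at or beyond its own end offset; that is one possible guard, not necessarily the fix adopted for this issue.

```java
import java.util.ArrayList;
import java.util.List;

public class SyncDuplicateSketch {
    // Hypothetical small file: only two sync points in 3000 bytes.
    static final long[] SYNC_POINTS = {0L, 2000L};
    static final long FILE_LENGTH = 3000L;

    // Simplified stand-in for sync(start): the first sync point at or
    // after start, or the end of the file if there is none.
    static long syncTo(long start) {
        for (long p : SYNC_POINTS) {
            if (p >= start) {
                return p;
            }
        }
        return FILE_LENGTH;
    }

    // Returns the sync point each partition starts reading from.
    // A value that appears more than once means duplicated records.
    static List<Long> blocksRead(long splitSize, boolean guarded) {
        List<Long> read = new ArrayList<>();
        for (long start = 0; start < FILE_LENGTH; start += splitSize) {
            long end = Math.min(start + splitSize, FILE_LENGTH);
            long pos = syncTo(start);
            if (pos >= FILE_LENGTH) {
                continue; // nothing left to read
            }
            if (guarded && pos >= end) {
                continue; // synced past this partition's range: not ours
            }
            read.add(pos);
        }
        return read;
    }

    public static void main(String[] args) {
        System.out.println("naive:   " + blocksRead(500L, false));
        System.out.println("guarded: " + blocksRead(500L, true));
    }
}
```

With 500-byte partitions in this model, the partitions starting at 500, 1000, 1500 and 2000 all land on the sync point at 2000 in the naive variant, while the guarded variant reads each sync point once.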