Posted to dev@sqoop.apache.org by "chen kai (JIRA)" <ji...@apache.org> on 2016/12/21 10:57:58 UTC

[jira] [Commented] (SQOOP-951) --export-dir to support subdirectories

    [ https://issues.apache.org/jira/browse/SQOOP-951?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15766753#comment-15766753 ] 

chen kai commented on SQOOP-951:
--------------------------------

Should the parent path be removed when it contains subdirectories? Leaving it in the list will cause more splits.

```
  // Recursively replaces a directory in pathList with the files found beneath it.
  private void scanSubDirectory(Path path, FileSystem fs, List<Path> pathList) throws IOException {
    FileStatus[] status = fs.listStatus(path);
    // Remove the parent path so that only its files remain as inputs.
    pathList.remove(path);
    for (FileStatus fstat : status) {
      if (fstat.isDir()) {
        // Descend into the subdirectory; only its files end up in pathList.
        scanSubDirectory(fstat.getPath(), fs, pathList);
      } else {
        pathList.add(fstat.getPath());
      }
    }
  }
```
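
For anyone who wants to try the idea without a cluster, here is a minimal, self-contained sketch of the same recursive walk using java.nio.file on a local file system instead of the Hadoop FileSystem API. The class name RecursiveInputLister and the method listFiles are hypothetical and are not part of Sqoop or the attached patch. Skipping names that start with "_" or "." (such as the _SUCCESS marker in the listing quoted below) is an assumption made for this example, not something the snippet above does.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; not Sqoop code.
public class RecursiveInputLister {

  // Returns every regular file below root, however deeply nested.
  public static List<Path> listFiles(Path root) throws IOException {
    List<Path> files = new ArrayList<>();
    collect(root, files);
    return files;
  }

  private static void collect(Path dir, List<Path> files) throws IOException {
    try (DirectoryStream<Path> entries = Files.newDirectoryStream(dir)) {
      for (Path entry : entries) {
        String name = entry.getFileName().toString();
        // Assumption: marker and hidden entries such as _SUCCESS are not data.
        if (name.startsWith("_") || name.startsWith(".")) {
          continue;
        }
        if (Files.isDirectory(entry)) {
          // Directories are never added themselves; only their files are.
          collect(entry, files);
        } else {
          files.add(entry);
        }
      }
    }
  }
}
```

Because directories are never added to the result, there is no parent path to remove afterwards, and the list contains only files. The iteration order of a DirectoryStream is unspecified, so callers that need a stable order should sort the result.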


> --export-dir to support subdirectories
> --------------------------------------
>
>                 Key: SQOOP-951
>                 URL: https://issues.apache.org/jira/browse/SQOOP-951
>             Project: Sqoop
>          Issue Type: Improvement
>    Affects Versions: 1.4.3
>         Environment: Debian GNU/Linux 6.0
>            Reporter: Matthieu Labour
>            Assignee: Vasanth kumar RJ
>         Attachments: SQOOP-951.patch
>
>
> I am using sqoop-1.4.2 to export to Sql.
> --export-dir does not work when the directory being passed is the root of subdirectories. --export-dir does not do any recursive lookup for files. It expects a directory containing the files that you want to export.
> It would be great if one could pass a directory with subdirectories.
> Example:
> The following command exports the data to Sql
> ~/sqoop-1.4.2.bin__hadoop-1.0.0/bin/sqoop export --connect jdbc:postgresql://ec2-XX-XXX-XXX-XXX.compute-1.amazonaws.com:XXXX/xxxxxxxxxxxxx --username xxxxxxxxxx --password xxxxxxxxxx --table ml_ys_log_gmt_daily_experiment_2 --export-dir =hdfs:///mnt/var/lib/hadoop/dfs/logs_daily_sanitized/dt=2013-02-01 --input-fields-terminated-by='\t' --lines-terminated-by='\n' --verbose --batch
> hadoop@domU-XX-XX-XX-XX-XX-XX:/mnt/var/lib/hadoop/steps/2$ hadoop fs -ls hdfs:///mnt/var/lib/hadoop/dfs/logs_daily_sanitized/dt=2013-02-01
> Found 1 items
> -rw-r--r--   1 hadoop supergroup   15931406 2013-03-15 17:03 /mnt/var/lib/hadoop/dfs/logs_daily_sanitized/dt=2013-02-01/part-r-00001
> The following command does not export the data to Sql
> ~/sqoop-1.4.2.bin__hadoop-1.0.0/bin/sqoop export --connect jdbc:postgresql://ec2-XX-XXX-XXX-XXX.compute-1.amazonaws.com:XXXX/xxxxxxxxxxxxx --username xxxxxxxxxx --password xxxxxxxxxx --table ml_ys_log_gmt_daily_experiment_2 --export-dir =hdfs:///mnt/var/lib/hadoop/dfs/logs_daily_sanitized --input-fields-terminated-by='\t' --lines-terminated-by='\n' --verbose --batch
> hadoop@domU-XX-XX-XX-XX-XX-XX:/mnt/var/lib/hadoop/steps/2$ hadoop fs -ls hdfs:///mnt/var/lib/hadoop/dfs/logs_daily_sanitized/
> Found 44 items
> -rw-r--r--   1 hadoop supergroup          0 2013-03-15 17:03 /mnt/var/lib/hadoop/dfs/logs_daily_sanitized/_SUCCESS
> drwxr-xr-x   - hadoop supergroup          0 2013-03-15 17:03 /mnt/var/lib/hadoop/dfs/logs_daily_sanitized/dt=2013-02-01
> drwxr-xr-x   - hadoop supergroup          0 2013-03-15 17:03 /mnt/var/lib/hadoop/dfs/logs_daily_sanitized/dt=2013-02-02
> drwxr-xr-x   - hadoop supergroup          0 2013-03-15 17:03 /mnt/var/lib/hadoop/dfs/logs_daily_sanitized/dt=2013-02-03
> drwxr-xr-x   - hadoop supergroup          0 2013-03-15 17:03 /mnt/var/lib/hadoop/dfs/logs_daily_sanitized/dt=2013-02-04
> drwxr-xr-x   - hadoop supergroup          0 2013-03-15 17:03 /mnt/var/lib/hadoop/dfs/logs_daily_sanitized/dt=2013-02-05
> drwxr-xr-x   - hadoop supergroup          0 2013-03-15 17:03 /mnt/var/lib/hadoop/dfs/logs_daily_sanitized/dt=2013-02-06


