Posted to dev@hive.apache.org by "Kirk True (JIRA)" <ji...@apache.org> on 2011/02/16 21:21:25 UTC

[jira] Created: (HIVE-1996) "LOAD DATA INPATH" fails when the table already contains a file of the same name

"LOAD DATA INPATH" fails when the table already contains a file of the same name
--------------------------------------------------------------------------------

                 Key: HIVE-1996
                 URL: https://issues.apache.org/jira/browse/HIVE-1996
             Project: Hive
          Issue Type: Bug
    Affects Versions: 0.7.0
            Reporter: Kirk True
            Assignee: Kirk True


Steps:

1. From the command line copy the kv2.txt data file into the current user's HDFS directory:

    $ hadoop fs -copyFromLocal /path/to/hive/sources/data/files/kv2.txt kv2.txt

2. In Hive, create the table:

    create table tst_src1 (key_ int, value_ string);

3. Load the data into the table from HDFS:

    load data inpath './kv2.txt' into table tst_src1;

4. Repeat step 1
5. Repeat step 3

Expected:

To have kv2.txt renamed in HDFS and then copied to the destination as per HIVE-307.

Actual:

The file is renamed, but Hive.copyFiles doesn't "see" the change in "srcs": it continues to use the same array elements, which still hold the old, pre-rename file names. It crashes with this error:

java.lang.NullPointerException
    at org.apache.hadoop.hive.ql.metadata.Hive.copyFiles(Hive.java:1725)
    at org.apache.hadoop.hive.ql.metadata.Table.copyFiles(Table.java:541)
    at org.apache.hadoop.hive.ql.metadata.Hive.loadTable(Hive.java:1173)
    at org.apache.hadoop.hive.ql.exec.MoveTask.execute(MoveTask.java:197)
    at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:130)
    at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:57)
    at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1060)
    at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:897)
    at org.apache.hadoop.hive.ql.Driver.run(Driver.java:745)
    at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:164)
    at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:241)
    at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:456)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
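The diagnosis above (rename succeeds, but the copy loop keeps the stale `srcs` entries) can be modeled without Hive or Hadoop. The Java sketch below is a simplified, hypothetical model, not Hive's code: an in-memory `Map` stands in for HDFS, and `StaleSrcsSketch`, `renameIfConflict`, `loadStale` and `loadFixed` are invented names. It assumes a `_copy_N` suffix for HIVE-307-style renaming, and it assumes the NPE comes from dereferencing a null lookup for a path that no longer exists; the stack trace alone does not show which call returned null at Hive.java:1725.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class StaleSrcsSketch {
    static final String TABLE = "/warehouse/tst_src1";

    // Hypothetical stand-in for HDFS: full path -> file contents.
    static final Map<String, String> fs = new HashMap<>();

    // State after steps 1-4 of the report: the table already holds kv2.txt
    // and a second kv2.txt has been staged in the user's directory.
    static void stage() {
        fs.clear();
        fs.put(TABLE + "/kv2.txt", "first load");
        fs.put("/user/me/kv2.txt", "second load");
    }

    static String fileName(String path) {
        return path.substring(path.lastIndexOf('/') + 1);
    }

    // Hypothetical rename-on-conflict step. If destDir already has a file with
    // src's name, renames src in place to base_copy_N.ext (assumed scheme)
    // and returns the new path; otherwise returns src unchanged.
    static String renameIfConflict(String src, String destDir) {
        String name = fileName(src);
        if (!fs.containsKey(destDir + "/" + name)) {
            return src;
        }
        String dir = src.substring(0, src.lastIndexOf('/'));
        int dot = name.lastIndexOf('.');
        String base = dot < 0 ? name : name.substring(0, dot);
        String ext = dot < 0 ? "" : name.substring(dot);
        String candidate;
        int n = 1;
        do {
            candidate = base + "_copy_" + n++ + ext;
        } while (fs.containsKey(destDir + "/" + candidate)
                || fs.containsKey(dir + "/" + candidate));
        String renamed = dir + "/" + candidate;
        fs.put(renamed, fs.remove(src));
        return renamed;
    }

    // Buggy pattern: the rename happens, but the move loop reuses the srcs
    // array captured beforehand, which still holds the old names.
    static void loadStale(String[] srcs, String destDir) {
        for (String src : srcs) {
            renameIfConflict(src, destDir); // new path is discarded
        }
        for (String src : srcs) {
            String data = fs.get(src); // null: this path was renamed away
            // Dereferencing the null result throws NullPointerException.
            System.out.println("moving " + src + " (" + data.length() + " chars)");
            fs.put(destDir + "/" + fileName(src), fs.remove(src));
        }
    }

    // Fixed pattern: move the paths the rename step actually returned.
    static List<String> loadFixed(String[] srcs, String destDir) {
        List<String> current = new ArrayList<>();
        for (String src : srcs) {
            current.add(renameIfConflict(src, destDir));
        }
        List<String> loaded = new ArrayList<>();
        for (String src : current) {
            String dest = destDir + "/" + fileName(src);
            fs.put(dest, fs.remove(src));
            loaded.add(dest);
        }
        return loaded;
    }

    public static void main(String[] args) {
        String[] srcs = {"/user/me/kv2.txt"};
        stage();
        try {
            loadStale(srcs, TABLE);
        } catch (NullPointerException e) {
            System.out.println("stale srcs: NullPointerException");
        }
        stage();
        System.out.println("fixed: loaded " + loadFixed(srcs, TABLE));
    }
}
```

In this model the fix is only to carry the renamed path forward instead of re-reading the original array; the stale run also shows that the rename has already happened by the time the crash occurs.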

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Updated] (HIVE-1996) "LOAD DATA INPATH" fails when the table already contains a file of the same name

Posted by "Chinna Rao Lalam (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-1996?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chinna Rao Lalam updated HIVE-1996:
-----------------------------------

    Status: Patch Available  (was: Open)

> "LOAD DATA INPATH" fails when the table already contains a file of the same name
> --------------------------------------------------------------------------------
>
>                 Key: HIVE-1996
>                 URL: https://issues.apache.org/jira/browse/HIVE-1996
>             Project: Hive
>          Issue Type: Bug
>    Affects Versions: 0.7.0
>            Reporter: Kirk True
>            Assignee: Chinna Rao Lalam
>         Attachments: HIVE-1996.Patch
>

[jira] [Assigned] (HIVE-1996) "LOAD DATA INPATH" fails when the table already contains a file of the same name

Posted by "Kirk True (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-1996?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kirk True reassigned HIVE-1996:
-------------------------------

    Assignee: Chinna Rao Lalam  (was: Kirk True)

> "LOAD DATA INPATH" fails when the table already contains a file of the same name
> --------------------------------------------------------------------------------
>
>                 Key: HIVE-1996
>                 URL: https://issues.apache.org/jira/browse/HIVE-1996
>             Project: Hive
>          Issue Type: Bug
>    Affects Versions: 0.7.0
>            Reporter: Kirk True
>            Assignee: Chinna Rao Lalam

[jira] [Updated] (HIVE-1996) "LOAD DATA INPATH" fails when the table already contains a file of the same name

Posted by "He Yongqiang (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-1996?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

He Yongqiang updated HIVE-1996:
-------------------------------

    Status: Open  (was: Patch Available)

> "LOAD DATA INPATH" fails when the table already contains a file of the same name
> --------------------------------------------------------------------------------
>
>                 Key: HIVE-1996
>                 URL: https://issues.apache.org/jira/browse/HIVE-1996
>             Project: Hive
>          Issue Type: Bug
>    Affects Versions: 0.7.0
>            Reporter: Kirk True
>            Assignee: Chinna Rao Lalam
>         Attachments: HIVE-1996.1.Patch, HIVE-1996.Patch
>

[jira] [Updated] (HIVE-1996) "LOAD DATA INPATH" fails when the table already contains a file of the same name

Posted by "Chinna Rao Lalam (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-1996?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chinna Rao Lalam updated HIVE-1996:
-----------------------------------

    Status: Patch Available  (was: Open)

> "LOAD DATA INPATH" fails when the table already contains a file of the same name
> --------------------------------------------------------------------------------
>
>                 Key: HIVE-1996
>                 URL: https://issues.apache.org/jira/browse/HIVE-1996
>             Project: Hive
>          Issue Type: Bug
>    Affects Versions: 0.7.0
>            Reporter: Kirk True
>            Assignee: Chinna Rao Lalam
>         Attachments: HIVE-1996.1.Patch, HIVE-1996.Patch
>

[jira] Updated: (HIVE-1996) "LOAD DATA INPATH" fails when the table already contains a file of the same name

Posted by "Kirk True (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-1996?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kirk True updated HIVE-1996:
----------------------------

    Description: 
Steps:

1. From the command line copy the kv2.txt data file into the current user's HDFS directory:

    {{$ hadoop fs -copyFromLocal /path/to/hive/sources/data/files/kv2.txt kv2.txt}}

2. In Hive, create the table:

    {{create table tst_src1 (key_ int, value_ string);}}

3. Load the data into the table from HDFS:

    {{load data inpath './kv2.txt' into table tst_src1;}}

4. Repeat step 1
5. Repeat step 3

Expected:

To have kv2.txt renamed in HDFS and then copied to the destination as per HIVE-307.

Actual:

The file is renamed, but Hive.copyFiles doesn't "see" the change in "srcs": it continues to use the same array elements, which still hold the old, pre-rename file names. It crashes with this error:

{{java.lang.NullPointerException
    at org.apache.hadoop.hive.ql.metadata.Hive.copyFiles(Hive.java:1725)
    at org.apache.hadoop.hive.ql.metadata.Table.copyFiles(Table.java:541)
    at org.apache.hadoop.hive.ql.metadata.Hive.loadTable(Hive.java:1173)
    at org.apache.hadoop.hive.ql.exec.MoveTask.execute(MoveTask.java:197)
    at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:130)
    at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:57)
    at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1060)
    at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:897)
    at org.apache.hadoop.hive.ql.Driver.run(Driver.java:745)
    at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:164)
    at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:241)
    at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:456)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
}}

> "LOAD DATA INPATH" fails when the table already contains a file of the same name
> --------------------------------------------------------------------------------
>
>                 Key: HIVE-1996
>                 URL: https://issues.apache.org/jira/browse/HIVE-1996
>             Project: Hive
>          Issue Type: Bug
>    Affects Versions: 0.7.0
>            Reporter: Kirk True
>            Assignee: Kirk True
>

[jira] [Commented] (HIVE-1996) "LOAD DATA INPATH" fails when the table already contains a file of the same name

Posted by "Namit Jain (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-1996?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13166594#comment-13166594 ] 

Namit Jain commented on HIVE-1996:
----------------------------------

Yongqiang, can you take a look?
                
> "LOAD DATA INPATH" fails when the table already contains a file of the same name
> --------------------------------------------------------------------------------
>
>                 Key: HIVE-1996
>                 URL: https://issues.apache.org/jira/browse/HIVE-1996
>             Project: Hive
>          Issue Type: Bug
>    Affects Versions: 0.7.0
>            Reporter: Kirk True
>            Assignee: Chinna Rao Lalam
>         Attachments: HIVE-1996.1.Patch, HIVE-1996.2.Patch, HIVE-1996.Patch
>

[jira] [Updated] (HIVE-1996) "LOAD DATA INPATH" fails when the table already contains a file of the same name

Posted by "Chinna Rao Lalam (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-1996?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chinna Rao Lalam updated HIVE-1996:
-----------------------------------

    Attachment: HIVE-1996.2.Patch
    
> "LOAD DATA INPATH" fails when the table already contains a file of the same name
> --------------------------------------------------------------------------------
>
>                 Key: HIVE-1996
>                 URL: https://issues.apache.org/jira/browse/HIVE-1996
>             Project: Hive
>          Issue Type: Bug
>    Affects Versions: 0.7.0
>            Reporter: Kirk True
>            Assignee: Chinna Rao Lalam
>         Attachments: HIVE-1996.1.Patch, HIVE-1996.2.Patch, HIVE-1996.Patch
>

[jira] [Commented] (HIVE-1996) "LOAD DATA INPATH" fails when the table already contains a file of the same name

Posted by "Chinna Rao Lalam (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-1996?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13103814#comment-13103814 ] 

Chinna Rao Lalam commented on HIVE-1996:
----------------------------------------

Agreed, I will update the patch with a configuration option.

> "LOAD DATA INPATH" fails when the table already contains a file of the same name
> --------------------------------------------------------------------------------
>
>                 Key: HIVE-1996
>                 URL: https://issues.apache.org/jira/browse/HIVE-1996
>             Project: Hive
>          Issue Type: Bug
>    Affects Versions: 0.7.0
>            Reporter: Kirk True
>            Assignee: Chinna Rao Lalam
>         Attachments: HIVE-1996.1.Patch, HIVE-1996.Patch
>

[jira] [Commented] (HIVE-1996) "LOAD DATA INPATH" fails when the table already contains a file of the same name

Posted by "Chinna Rao Lalam (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-1996?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13109247#comment-13109247 ] 

Chinna Rao Lalam commented on HIVE-1996:
----------------------------------------

Hi He Yongqiang,
Two scenarios need to be considered here:
1) If rename is disabled, suppose we load a data folder containing 10 files (1.txt, 2.txt, ..., 10.txt) and the table already contains a file with the same name, 5.txt. Loading 5.txt throws an exception and the operation fails, but the files loaded before it (1.txt ... 4.txt) remain in the table.

If the conflicting file is instead 6.txt, loading 6.txt throws the exception and the operation fails, but 1.txt ... 5.txt remain in the table.

So the result depends on the load order and can leave the table inconsistent.

2) The current implementation, "org.apache.hadoop.hive.ql.metadata.Hive.copyFiles(Path, Path, FileSystem)", behaves the same way: if it is unable to rename any one file it throws an exception, but some files from the same operation will already have been loaded.

Proposed solution: while loading, if an exception occurs for a file, record that file as unloaded and continue loading the remaining files. The operation then fails with an exception that includes the unloaded-file information, so the user can retry loading just those files. This way there is no inconsistent data.
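The proposed solution might look roughly like the following. This is a hypothetical illustration, not the patch attached to this issue: it uses java.nio.file on the local file system in place of Hadoop's FileSystem, and the names BestEffortLoader, loadAll and PartialLoadException are invented for the example. Each file is attempted independently; failures are collected and reported together once every file has been tried.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: local java.nio.file stands in for Hadoop's FileSystem.
final class BestEffortLoader {

    private BestEffortLoader() {}

    // Raised only after every file has been attempted; lists the failures.
    static final class PartialLoadException extends IOException {
        private static final long serialVersionUID = 1L;
        private final List<Path> unloaded;

        PartialLoadException(List<Path> unloaded) {
            super("Failed to load " + unloaded.size() + " file(s): " + unloaded);
            this.unloaded = List.copyOf(unloaded);
        }

        List<Path> unloaded() {
            return unloaded;
        }
    }

    // Moves each source into destDir. A failure (for example an existing file
    // with the same name) is recorded and the remaining files are still loaded.
    static void loadAll(List<Path> srcs, Path destDir) throws PartialLoadException {
        List<Path> unloaded = new ArrayList<>();
        for (Path src : srcs) {
            try {
                // Without REPLACE_EXISTING, Files.move fails if the target exists.
                Files.move(src, destDir.resolve(src.getFileName()));
            } catch (IOException e) {
                unloaded.add(src);
            }
        }
        if (!unloaded.isEmpty()) {
            throw new PartialLoadException(unloaded);
        }
    }
}
```

Because failed sources are left where they were, a retry could pass exactly the list from PartialLoadException.unloaded() back to loadAll.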

Please give your inputs.

> "LOAD DATA INPATH" fails when the table already contains a file of the same name
> --------------------------------------------------------------------------------
>
>                 Key: HIVE-1996
>                 URL: https://issues.apache.org/jira/browse/HIVE-1996
>             Project: Hive
>          Issue Type: Bug
>    Affects Versions: 0.7.0
>            Reporter: Kirk True
>            Assignee: Chinna Rao Lalam
>         Attachments: HIVE-1996.1.Patch, HIVE-1996.Patch

[jira] Commented: (HIVE-1996) "LOAD DATA INPATH" fails when the table already contains a file of the same name

Posted by "Kirk True (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-1996?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12995499#comment-12995499 ] 

Kirk True commented on HIVE-1996:
---------------------------------

This is very closely related to, but not the same as, HIVE-307. That bug specifically pertains to {{LOCAL}} files.

> "LOAD DATA INPATH" fails when the table already contains a file of the same name
> --------------------------------------------------------------------------------
>
>                 Key: HIVE-1996
>                 URL: https://issues.apache.org/jira/browse/HIVE-1996
>             Project: Hive
>          Issue Type: Bug
>    Affects Versions: 0.7.0
>            Reporter: Kirk True
>            Assignee: Kirk True

[jira] [Commented] (HIVE-1996) "LOAD DATA INPATH" fails when the table already contains a file of the same name

Posted by "jiraposter@reviews.apache.org (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-1996?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13147146#comment-13147146 ] 

jiraposter@reviews.apache.org commented on HIVE-1996:
-----------------------------------------------------


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/1610/
-----------------------------------------------------------

(Updated 2011-11-09 16:44:55.708392)


Review request for hive, Carl Steinbach and John Sichi.


Changes
-------

Made the rename-on-load behavior configurable.


Summary
-------

"LOAD DATA INPATH" fails when the table already contains a file of the same name. If any name confilcts occurs it will rename the file, After file name got changed it is trying to load with the old name because of this load is failed. Now we have changed the code like, load with the changed filename for that introduced a map it will maintain the old name and changed filename as key value pair and while loading need to use this map.


This addresses bug HIVE-1996.
    https://issues.apache.org/jira/browse/HIVE-1996


Diffs (updated)
-----

  trunk/common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 1198626 
  trunk/conf/hive-default.xml 1198626 
  trunk/data/conf/hive-site.xml 1198626 
  trunk/ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java 1198626 
  trunk/ql/src/java/org/apache/hadoop/hive/ql/metadata/Table.java 1198626 
  trunk/ql/src/test/queries/clientpositive/input47.q PRE-CREATION 
  trunk/ql/src/test/results/clientpositive/input47.q.out PRE-CREATION 

Diff: https://reviews.apache.org/r/1610/diff


Testing
-------

Added a test case for this scenario.


Thanks,

chinna


                
> "LOAD DATA INPATH" fails when the table already contains a file of the same name
> --------------------------------------------------------------------------------
>
>                 Key: HIVE-1996
>                 URL: https://issues.apache.org/jira/browse/HIVE-1996
>             Project: Hive
>          Issue Type: Bug
>    Affects Versions: 0.7.0
>            Reporter: Kirk True
>            Assignee: Chinna Rao Lalam
>         Attachments: HIVE-1996.1.Patch, HIVE-1996.2.Patch, HIVE-1996.Patch

[jira] [Updated] (HIVE-1996) "LOAD DATA INPATH" fails when the table already contains a file of the same name

Posted by "Chinna Rao Lalam (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-1996?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chinna Rao Lalam updated HIVE-1996:
-----------------------------------

    Attachment: HIVE-1996.1.Patch

> "LOAD DATA INPATH" fails when the table already contains a file of the same name
> --------------------------------------------------------------------------------
>
>                 Key: HIVE-1996
>                 URL: https://issues.apache.org/jira/browse/HIVE-1996
>             Project: Hive
>          Issue Type: Bug
>    Affects Versions: 0.7.0
>            Reporter: Kirk True
>            Assignee: Chinna Rao Lalam
>         Attachments: HIVE-1996.1.Patch, HIVE-1996.Patch

[jira] [Commented] (HIVE-1996) "LOAD DATA INPATH" fails when the table already contains a file of the same name

Posted by "jiraposter@reviews.apache.org (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-1996?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13088911#comment-13088911 ] 

jiraposter@reviews.apache.org commented on HIVE-1996:
-----------------------------------------------------


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/1610/
-----------------------------------------------------------

Review request for hive, Carl Steinbach and John Sichi.


Summary
-------

"LOAD DATA INPATH" fails when the table already contains a file of the same name. If any name confilcts occurs it will rename the file, After file name got changed it is trying to load with the old name because of this load is failed. Now we have changed the code like, load with the changed filename for that introduced a map it will maintain the old name and changed filename as key value pair and while loading need to use this map.


This addresses bug HIVE-1996.
    https://issues.apache.org/jira/browse/HIVE-1996


Diffs
-----

  trunk/ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java 1160102 
  trunk/ql/src/test/queries/clientpositive/input44.q PRE-CREATION 
  trunk/ql/src/test/results/clientpositive/input44.q.out PRE-CREATION 

Diff: https://reviews.apache.org/r/1610/diff


Testing
-------

Added a test case for this scenario.


Thanks,

chinna



> "LOAD DATA INPATH" fails when the table already contains a file of the same name
> --------------------------------------------------------------------------------
>
>                 Key: HIVE-1996
>                 URL: https://issues.apache.org/jira/browse/HIVE-1996
>             Project: Hive
>          Issue Type: Bug
>    Affects Versions: 0.7.0
>            Reporter: Kirk True
>            Assignee: Chinna Rao Lalam
>         Attachments: HIVE-1996.1.Patch, HIVE-1996.Patch

[jira] [Updated] (HIVE-1996) "LOAD DATA INPATH" fails when the table already contains a file of the same name

Posted by "Chinna Rao Lalam (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-1996?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chinna Rao Lalam updated HIVE-1996:
-----------------------------------

    Status: Patch Available  (was: Open)
    
> "LOAD DATA INPATH" fails when the table already contains a file of the same name
> --------------------------------------------------------------------------------
>
>                 Key: HIVE-1996
>                 URL: https://issues.apache.org/jira/browse/HIVE-1996
>             Project: Hive
>          Issue Type: Bug
>    Affects Versions: 0.7.0
>            Reporter: Kirk True
>            Assignee: Chinna Rao Lalam
>         Attachments: HIVE-1996.1.Patch, HIVE-1996.2.Patch, HIVE-1996.Patch

[jira] [Updated] (HIVE-1996) "LOAD DATA INPATH" fails when the table already contains a file of the same name

Posted by "Chinna Rao Lalam (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-1996?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chinna Rao Lalam updated HIVE-1996:
-----------------------------------

    Attachment: HIVE-1996.Patch

> "LOAD DATA INPATH" fails when the table already contains a file of the same name
> --------------------------------------------------------------------------------
>
>                 Key: HIVE-1996
>                 URL: https://issues.apache.org/jira/browse/HIVE-1996
>             Project: Hive
>          Issue Type: Bug
>    Affects Versions: 0.7.0
>            Reporter: Kirk True
>            Assignee: Chinna Rao Lalam
>         Attachments: HIVE-1996.Patch

[jira] [Updated] (HIVE-1996) "LOAD DATA INPATH" fails when the table already contains a file of the same name

Posted by "John Sichi (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-1996?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

John Sichi updated HIVE-1996:
-----------------------------

    Status: Open  (was: Patch Available)

I tried running the test, but it fails for me.  I looked in hive.log and found

{noformat}
2011-07-07 14:45:34,359 ERROR hive.log (MetaStoreUtils.java:logAndThrowMetaException(778)) - java.io.FileNotFoundException: File file:/tmp1/load2_overwrite2 does not exist.
{noformat}


> "LOAD DATA INPATH" fails when the table already contains a file of the same name
> --------------------------------------------------------------------------------
>
>                 Key: HIVE-1996
>                 URL: https://issues.apache.org/jira/browse/HIVE-1996
>             Project: Hive
>          Issue Type: Bug
>    Affects Versions: 0.7.0
>            Reporter: Kirk True
>            Assignee: Chinna Rao Lalam
>         Attachments: HIVE-1996.Patch

[jira] [Commented] (HIVE-1996) "LOAD DATA INPATH" fails when the table already contains a file of the same name

Posted by "Chinna Rao Lalam (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-1996?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13088903#comment-13088903 ] 

Chinna Rao Lalam commented on HIVE-1996:
----------------------------------------

This scenario works when loading from local, but fails when loading from the file system. To reproduce it I used the query "create table load_overwrite2 (key string, value string) stored as textfile location 'file:/tmp1/load2_overwrite2';", whose execution should create "file:/tmp1/load2_overwrite2". I have verified in my environment that it works without failure. Please let me know if there are any issues.

> "LOAD DATA INPATH" fails when the table already contains a file of the same name
> --------------------------------------------------------------------------------
>
>                 Key: HIVE-1996
>                 URL: https://issues.apache.org/jira/browse/HIVE-1996
>             Project: Hive
>          Issue Type: Bug
>    Affects Versions: 0.7.0
>            Reporter: Kirk True
>            Assignee: Chinna Rao Lalam
>         Attachments: HIVE-1996.1.Patch, HIVE-1996.Patch

[jira] Updated: (HIVE-1996) "LOAD DATA INPATH" fails when the table already contains a file of the same name

Posted by "Kirk True (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-1996?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kirk True updated HIVE-1996:
----------------------------

    Description: 
Steps:

1. From the command line copy the kv2.txt data file into the current user's HDFS directory:

{{$ hadoop fs -copyFromLocal /path/to/hive/sources/data/files/kv2.txt kv2.txt}}

2. In Hive, create the table:

{{create table tst_src1 (key_ int, value_ string);}}

3. Load the data into the table from HDFS:

{{load data inpath './kv2.txt' into table tst_src1;}}

4. Repeat step 1
5. Repeat step 3

Expected:

To have kv2.txt renamed in HDFS and then copied to the destination as per HIVE-307.

Actual:

File is renamed, but Hive.copyFiles doesn't "see" the change in "srcs" as it continues to use the same array elements (with the un-renamed, old file names). It crashes with this error:

{noformat}
java.lang.NullPointerException
    at org.apache.hadoop.hive.ql.metadata.Hive.copyFiles(Hive.java:1725)
    at org.apache.hadoop.hive.ql.metadata.Table.copyFiles(Table.java:541)
    at org.apache.hadoop.hive.ql.metadata.Hive.loadTable(Hive.java:1173)
    at org.apache.hadoop.hive.ql.exec.MoveTask.execute(MoveTask.java:197)
    at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:130)
    at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:57)
    at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1060)
    at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:897)
    at org.apache.hadoop.hive.ql.Driver.run(Driver.java:745)
    at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:164)
    at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:241)
    at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:456)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
{noformat}

  was:
Steps:

1. From the command line copy the kv2.txt data file into the current user's HDFS directory:

    {{$ hadoop fs -copyFromLocal /path/to/hive/sources/data/files/kv2.txt kv2.txt}}

2. In Hive, create the table:

    {{create table tst_src1 (key_ int, value_ string);}}

3. Load the data into the table from HDFS:

    {{load data inpath './kv2.txt' into table tst_src1;}}

4. Repeat step 1
5. Repeat step 3

Expected:

To have kv2.txt renamed in HDFS and then copied to the destination as per HIVE-307.

Actual:

File is renamed, but Hive.copyFiles doesn't "see" the change in "srcs" as it continues to use the same array elements (with the un-renamed, old file names). It crashes with this error:

{{java.lang.NullPointerException
    at org.apache.hadoop.hive.ql.metadata.Hive.copyFiles(Hive.java:1725)
    at org.apache.hadoop.hive.ql.metadata.Table.copyFiles(Table.java:541)
    at org.apache.hadoop.hive.ql.metadata.Hive.loadTable(Hive.java:1173)
    at org.apache.hadoop.hive.ql.exec.MoveTask.execute(MoveTask.java:197)
    at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:130)
    at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:57)
    at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1060)
    at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:897)
    at org.apache.hadoop.hive.ql.Driver.run(Driver.java:745)
    at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:164)
    at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:241)
    at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:456)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
}}


> "LOAD DATA INPATH" fails when the table already contains a file of the same name
> --------------------------------------------------------------------------------
>
>                 Key: HIVE-1996
>                 URL: https://issues.apache.org/jira/browse/HIVE-1996
>             Project: Hive
>          Issue Type: Bug
>    Affects Versions: 0.7.0
>            Reporter: Kirk True
>            Assignee: Kirk True


[jira] [Commented] (HIVE-1996) "LOAD DATA INPATH" fails when the table already contains a file of the same name

Posted by "He Yongqiang (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-1996?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13103155#comment-13103155 ] 

He Yongqiang commented on HIVE-1996:
------------------------------------

For this, we need to make the rename optional and disabled by default. If renaming is disabled, Hive should throw an error to the user.
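A minimal sketch of the proposed policy, assuming a simple string-keyed configuration map. The property name `example.load.rename.on.collision`, the class `LoadCollisionPolicy`, and the `_copy_N` naming are hypothetical, not real Hive settings or APIs. With the flag absent or false, a name collision raises an error; with it enabled, the first free `_copy_N` name is chosen.

```java
import java.util.Map;
import java.util.Set;

public class LoadCollisionPolicy {
    // Hypothetical property name; not an actual Hive configuration key.
    static final String RENAME_ON_COLLISION = "example.load.rename.on.collision";

    // Returns the name under which a file should be loaded into a table
    // directory that already contains the names in `existing`.
    static String resolveTarget(Map<String, String> conf, Set<String> existing, String name) {
        if (!existing.contains(name)) {
            return name; // no collision: keep the original name
        }
        // Renaming is opt-in: disabled unless the flag is explicitly "true".
        boolean renameEnabled =
                Boolean.parseBoolean(conf.getOrDefault(RENAME_ON_COLLISION, "false"));
        if (!renameEnabled) {
            throw new IllegalStateException("Target already contains a file named " + name
                    + "; set " + RENAME_ON_COLLISION + "=true to load it under a new name");
        }
        // Pick the first hypothetical _copy_N suffix that is not taken yet.
        for (int i = 1; ; i++) {
            String candidate = name + "_copy_" + i;
            if (!existing.contains(candidate)) {
                return candidate;
            }
        }
    }
}
```

Failing by default keeps the user in control of file names, at the cost of requiring an explicit opt-in for the renaming behaviour introduced by HIVE-307.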

> "LOAD DATA INPATH" fails when the table already contains a file of the same name
> --------------------------------------------------------------------------------
>
>                 Key: HIVE-1996
>                 URL: https://issues.apache.org/jira/browse/HIVE-1996
>             Project: Hive
>          Issue Type: Bug
>    Affects Versions: 0.7.0
>            Reporter: Kirk True
>            Assignee: Chinna Rao Lalam
>         Attachments: HIVE-1996.1.Patch, HIVE-1996.Patch


[jira] Updated: (HIVE-1996) "LOAD DATA INPATH" fails when the table already contains a file of the same name

Posted by "Kirk True (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-1996?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kirk True updated HIVE-1996:
----------------------------

    Description: 
Steps:

1. From the command line copy the kv2.txt data file into the current user's HDFS directory:

{{$ hadoop fs -copyFromLocal /path/to/hive/sources/data/files/kv2.txt kv2.txt}}

2. In Hive, create the table:

{{create table tst_src1 (key_ int, value_ string);}}

3. Load the data into the table from HDFS:

{{load data inpath './kv2.txt' into table tst_src1;}}

4. Repeat step 1
5. Repeat step 3

Expected:

To have kv2.txt renamed in HDFS and then copied to the destination as per HIVE-307.

Actual:

File is renamed, but {{Hive.copyFiles}} doesn't "see" the change in {{srcs}} as it continues to use the same array elements (with the un-renamed, old file names). It crashes with this error:

{noformat}
java.lang.NullPointerException
    at org.apache.hadoop.hive.ql.metadata.Hive.copyFiles(Hive.java:1725)
    at org.apache.hadoop.hive.ql.metadata.Table.copyFiles(Table.java:541)
    at org.apache.hadoop.hive.ql.metadata.Hive.loadTable(Hive.java:1173)
    at org.apache.hadoop.hive.ql.exec.MoveTask.execute(MoveTask.java:197)
    at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:130)
    at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:57)
    at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1060)
    at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:897)
    at org.apache.hadoop.hive.ql.Driver.run(Driver.java:745)
    at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:164)
    at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:241)
    at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:456)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
{noformat}



> "LOAD DATA INPATH" fails when the table already contains a file of the same name
> --------------------------------------------------------------------------------
>
>                 Key: HIVE-1996
>                 URL: https://issues.apache.org/jira/browse/HIVE-1996
>             Project: Hive
>          Issue Type: Bug
>    Affects Versions: 0.7.0
>            Reporter: Kirk True
>            Assignee: Kirk True


[jira] [Commented] (HIVE-1996) "LOAD DATA INPATH" fails when the table already contains a file of the same name

Posted by "Chinna Rao Lalam (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-1996?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13036197#comment-13036197 ] 

Chinna Rao Lalam commented on HIVE-1996:
----------------------------------------

After the file is renamed, Hive still tries to load it under the old name, which is why the load fails.
We have changed the code to load the file under its new name: a map now records each original file name (key) together with its renamed file name (value), and the load step uses this map to find the file to load.
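The map-based approach described in this comment can be sketched as two passes, under stated assumptions: file names are plain strings, the set of names already in the table directory is known, and the `_copy_N` suffix is hypothetical. `RenameMapDemo`, `planRenames`, and `namesToLoad` are illustrative names, not the methods in the attached patch.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class RenameMapDemo {
    // Pass 1: choose new names for sources that collide with files already
    // in the table directory, recording original -> new name in a map.
    static Map<String, String> planRenames(List<String> srcs, Set<String> existing) {
        Map<String, String> renamed = new HashMap<>();
        Set<String> taken = new HashSet<>(existing);
        for (String src : srcs) {
            if (!taken.contains(src)) {
                taken.add(src); // no collision: the original name is kept
                continue;
            }
            int i = 1;
            while (taken.contains(src + "_copy_" + i)) {
                i++;
            }
            String newName = src + "_copy_" + i;
            taken.add(newName);
            renamed.put(src, newName);
        }
        return renamed;
    }

    // Pass 2: when loading, look each original name up in the map so the
    // renamed file is used instead of the stale original name.
    static List<String> namesToLoad(List<String> srcs, Map<String, String> renamed) {
        List<String> result = new ArrayList<>();
        for (String src : srcs) {
            result.add(renamed.getOrDefault(src, src));
        }
        return result;
    }
}
```

For sources `kv2.txt` and `kv3.txt` with `kv2.txt` already in the table, the map holds only `kv2.txt -> kv2.txt_copy_1`, and the load step reads `kv2.txt_copy_1` and `kv3.txt`.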

> "LOAD DATA INPATH" fails when the table already contains a file of the same name
> --------------------------------------------------------------------------------
>
>                 Key: HIVE-1996
>                 URL: https://issues.apache.org/jira/browse/HIVE-1996
>             Project: Hive
>          Issue Type: Bug
>    Affects Versions: 0.7.0
>            Reporter: Kirk True
>            Assignee: Chinna Rao Lalam
>         Attachments: HIVE-1996.Patch
