Posted to dev@hive.apache.org by "Joydeep Sen Sarma (JIRA)" <ji...@apache.org> on 2010/08/20 00:24:16 UTC

[jira] Created: (HIVE-1570) referencing an added file by it's name in a transform script does not work in hive local mode

referencing an added file by it's name in a transform script does not work in hive local mode
---------------------------------------------------------------------------------------------

                 Key: HIVE-1570
                 URL: https://issues.apache.org/jira/browse/HIVE-1570
             Project: Hadoop Hive
          Issue Type: Bug
          Components: Query Processor
            Reporter: Joydeep Sen Sarma


Yongqiang tried this and it fails in local mode:

add file ../data/scripts/dumpdata_script.py;

select count(distinct subq.key) from
(FROM src MAP src.key USING 'python dumpdata_script.py' AS key WHERE src.key = 10) subq;


This needs to be fixed: as it stands, we cannot choose local mode automatically for queries that use transform scripts, since the script would have to be referenced by a different path for cluster vs. local mode execution.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Updated: (HIVE-1570) referencing an added file by it's name in a transform script does not work in hive local mode

Posted by "Joydeep Sen Sarma (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HIVE-1570?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Joydeep Sen Sarma updated HIVE-1570:
------------------------------------

    Attachment: 1570.3.patch

Added console output for local mapred jobs that shows the location of the execution log, for debugging.


[jira] Updated: (HIVE-1570) referencing an added file by it's name in a transform script does not work in hive local mode

Posted by "Joydeep Sen Sarma (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HIVE-1570?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Joydeep Sen Sarma updated HIVE-1570:
------------------------------------

    Attachment: 1570.4.patch

Added a fix for HIVE-1520: don't reset HADOOP_HEAPSIZE unless the child JVM is being launched for local mode execution.

It's a one-liner, so it is simpler to get it all in in one shot.
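
The HADOOP_HEAPSIZE rule can be sketched as below. This is only a Python illustration of the condition, not the actual one-line change in Hive's Java code. The function name child_environment and the local_heap_mb parameter are hypothetical, and the sketch assumes that "reset" means overriding the inherited value with one chosen for local execution.

```python
def child_environment(parent_env, local_mode, local_heap_mb):
    """Build the environment for a child JVM (hypothetical helper).

    HADOOP_HEAPSIZE is overridden only when the child is launched for
    local mode execution; otherwise the parent's setting, or its absence,
    is passed through unchanged.
    """
    env = dict(parent_env)  # copy, so the caller's environment is untouched
    if local_mode:
        env["HADOOP_HEAPSIZE"] = str(local_heap_mb)
    return env
```

For a child that is not launched for local mode execution, the returned environment is an unmodified copy of the parent's.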


[jira] Updated: (HIVE-1570) referencing an added file by it's name in a transform script does not work in hive local mode

Posted by "Joydeep Sen Sarma (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HIVE-1570?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Joydeep Sen Sarma updated HIVE-1570:
------------------------------------

    Attachment: 1570.2.patch

Working patch. No new test is needed, but I had to modify some other tests to use 'add file'.


[jira] Commented: (HIVE-1570) referencing an added file by it's name in a transform script does not work in hive local mode

Posted by "Ning Zhang (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HIVE-1570?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12909943#action_12909943 ] 

Ning Zhang commented on HIVE-1570:
----------------------------------

Joy, scriptfile1.q actually failed on TestMinimrCliDriver with the command 

ant test -Dhadoop.version=0.20.0 -Dtestcase=TestMinimrCliDriver -Dminimr.query.files=scriptfile1.q

It gives an NPE at ExecDriver.java:625. This NPE is a different issue and can be solved by changing 'conf' to 'job'. But even with that change, although the NPE is gone, the test still fails. Should we move this test out of minimr.query.files for now, until this JIRA is fixed?


[jira] Assigned: (HIVE-1570) referencing an added file by it's name in a transform script does not work in hive local mode

Posted by "John Sichi (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HIVE-1570?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

John Sichi reassigned HIVE-1570:
--------------------------------

    Assignee: Joydeep Sen Sarma


[jira] Commented: (HIVE-1570) referencing an added file by it's name in a transform script does not work in hive local mode

Posted by "Joydeep Sen Sarma (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HIVE-1570?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12909950#action_12909950 ] 

Joydeep Sen Sarma commented on HIVE-1570:
-----------------------------------------

Sure. I'm confused, though, because the tests were all passing earlier when I added the minimr tests.


[jira] Updated: (HIVE-1570) referencing an added file by it's name in a transform script does not work in hive local mode

Posted by "Namit Jain (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HIVE-1570?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Namit Jain updated HIVE-1570:
-----------------------------

       Resolution: Fixed
    Fix Version/s: 0.7.0
     Hadoop Flags: [Reviewed]
           Status: Resolved  (was: Patch Available)

Committed. Thanks, Joy.


[jira] Updated: (HIVE-1570) referencing an added file by it's name in a transform script does not work in hive local mode

Posted by "Joydeep Sen Sarma (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HIVE-1570?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Joydeep Sen Sarma updated HIVE-1570:
------------------------------------

    Attachment: 1570.5.patch

Also adding a trivial patch for HIVE-1473. I filed separate patches for 1473 and 1520 as well, but have folded everything in here.


[jira] Updated: (HIVE-1570) referencing an added file by it's name in a transform script does not work in hive local mode

Posted by "Joydeep Sen Sarma (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HIVE-1570?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Joydeep Sen Sarma updated HIVE-1570:
------------------------------------

    Attachment: 1570.1.patch

Before running a map-reduce job in local mode we now:
1. set a new working directory
2. create symlinks in that working directory to all added files

This is pretty much identical to how Hadoop sets up the task execution environment. All references to scripts and added files by name alone now resolve correctly in local mode.
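
The two steps above can be sketched roughly as follows. This is a minimal Python illustration, not the patch itself (which changes Hive's Java code). The helper names setup_local_workdir and run_in_workdir are hypothetical, and the sketch assumes a platform where symlinks can be created.

```python
import os
import subprocess
import tempfile


def setup_local_workdir(added_files):
    """Create a fresh working directory and symlink each added file into it.

    Each link is named after the file's base name, so a command can refer
    to the file by name alone once it runs inside the directory.
    """
    workdir = tempfile.mkdtemp(prefix="local_job_")
    for path in added_files:
        link = os.path.join(workdir, os.path.basename(path))
        os.symlink(os.path.abspath(path), link)
    return workdir


def run_in_workdir(command, workdir, stdin_text):
    """Run a transform command with the working directory as its cwd.

    `command` is an argument list; its stdout is returned as text.
    """
    result = subprocess.run(
        command,
        cwd=workdir,
        input=stdin_text,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout
```

Because the command runs with the new directory as its current directory, a bare name such as dumpdata_script.py resolves through the symlink no matter where the file was originally added from.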

There was some hacky code in SemanticAnalyzer.java to deal with this, but it doesn't work in all cases (when the referenced file is not the first item on the command line, or in automatic local mode). I have deleted it.
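
To illustrate the "not the first item" failure mode, here is a hypothetical Python sketch of a rewrite that only looks at the first word of the command. It is not the deleted SemanticAnalyzer.java code. The function rewrite_first_token is invented for illustration, on the assumption that only the first word was matched against the added files.

```python
import os


def rewrite_first_token(command, added_files):
    """Replace the first word of `command` with the added file's path.

    Only the first whitespace-separated word is checked, so a script
    passed as an argument to an interpreter is never rewritten.
    """
    by_name = {os.path.basename(path): path for path in added_files}
    tokens = command.split()
    if tokens and tokens[0] in by_name:
        tokens[0] = by_name[tokens[0]]
    return " ".join(tokens)
```

With 'testgrep' the first word is the added file, so it gets rewritten. With 'python dumpdata_script.py' the first word is the interpreter, so the script name is left as a bare relative name.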

Duplicated one of the tests so that we get coverage against both a real cluster (scriptfile1.q, executed against minimr) and local mode (scriptfile2.q).

Still running tests.


[jira] Commented: (HIVE-1570) referencing an added file by it's name in a transform script does not work in hive local mode

Posted by "Joydeep Sen Sarma (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HIVE-1570?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12900501#action_12900501 ] 

Joydeep Sen Sarma commented on HIVE-1570:
-----------------------------------------

hmmm - how come scriptfile1.q works then?


CREATE TABLE dest1(key INT, value STRING);

ADD FILE src/test/scripts/testgrep;

FROM (
  FROM src
  SELECT TRANSFORM(src.key, src.value)
         USING 'testgrep' AS (tkey, tvalue) 
  CLUSTER BY tkey 
) tmap
INSERT OVERWRITE TABLE dest1 SELECT tmap.tkey, tmap.tvalue;



[jira] Updated: (HIVE-1570) referencing an added file by it's name in a transform script does not work in hive local mode

Posted by "Joydeep Sen Sarma (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HIVE-1570?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Joydeep Sen Sarma updated HIVE-1570:
------------------------------------

    Status: Patch Available  (was: Open)
