Posted to dev@hive.apache.org by "Azim Uddin (JIRA)" <ji...@apache.org> on 2014/07/03 18:18:24 UTC

[jira] [Created] (HIVE-7347) Pig Query with defined schema fails when submitted via WebHcat -Query parameter

Azim Uddin created HIVE-7347:
--------------------------------

             Summary: Pig Query with defined schema fails when submitted via WebHcat -Query parameter
                 Key: HIVE-7347
                 URL: https://issues.apache.org/jira/browse/HIVE-7347
             Project: Hive
          Issue Type: Bug
          Components: WebHCat
    Affects Versions: 0.13.0, 0.12.0
         Environment: HDP 2.1 on Windows; HDInsight deploying HDP 2.1  
            Reporter: Azim Uddin


1. Suppose you are running HDP 2.1 on Windows and have a TSV file named rawInput.tsv with contents like this (just an example; any file will do) -

http://a.com	http://b.com	1
http://b.com	http://c.com	2
http://d.com	http://e.com	3
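For reproduction, the sample file can be generated locally and then copied into HDFS. A minimal sketch (the local filename and HDFS path are just the ones used in this report):

```python
# Write the three sample rows as a tab-separated file (rawInput.tsv).
rows = [
    ("http://a.com", "http://b.com", 1),
    ("http://b.com", "http://c.com", 2),
    ("http://d.com", "http://e.com", 3),
]
with open("rawInput.tsv", "w") as f:
    for src, dst, count in rows:
        f.write(f"{src}\t{dst}\t{count}\n")

# Then upload it to the path used by the query, e.g.:
#   hdfs dfs -mkdir -p /test/data
#   hdfs dfs -put rawInput.tsv /test/data/
```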

2. With the tsv file uploaded to HDFS, submit the following Pig job via WebHCat using the 'execute' parameter, something like this -

curl.exe -d execute="rawInput = load '/test/data' using PigStorage as (SourceUrl:chararray, DestinationUrl:chararray, InstanceCount:int); readyInput = limit rawInput 10; store readyInput into '/test/output' using PigStorage;" -d statusdir="/test/status" "http://localhost:50111/templeton/v1/pig?user.name=hadoop" --user hadoop:any

The job fails with exit code 255 -
"[main] org.apache.hive.hcatalog.templeton.tool.LaunchMapper: templeton: job failed with exit code 255"

From stderr, we see the following: "readyInput was unexpected at this time."
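The message "readyInput was unexpected at this time." is the wording of a Windows cmd.exe parser error, which suggests that when the query is passed along unquoted on a Windows command line, cmd.exe treats the semicolons between Pig statements as separators. As an illustration only (the character list below and the diagnosis itself are assumptions, not confirmed from the WebHCat source), one can check which characters in the query are special to cmd.exe:

```python
# Characters cmd.exe treats specially when they appear unquoted (assumed list).
CMD_SPECIALS = set(';,&|<>^()=')

query = ("rawInput = load '/test/data' using PigStorage as "
         "(SourceUrl:chararray, DestinationUrl:chararray, InstanceCount:int); "
         "readyInput = limit rawInput 10; "
         "store readyInput into '/test/output' using PigStorage;")

# Any of these appearing unquoted on a Windows command line can split it,
# so everything after the first ';' is interpreted as a new command -
# consistent with "readyInput was unexpected at this time."
hits = sorted({c for c in query if c in CMD_SPECIALS})
print(hits)
```

Note that the schema-less variant in item 4 below still contains semicolons, so the schema's parentheses, commas, and '=' inside the 'as (...)' clause may be the part that actually trips the parser; that is a guess based on the symptom.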

3. The same job works via the Pig Grunt shell, and also via WebHCat if we use the 'file' parameter instead of the 'execute' parameter -

a. Create a Pig script called pig_script.txt with the query below and put it in HDFS under /test/script
rawInput = load '/test/data' using PigStorage as (SourceUrl:chararray, DestinationUrl:chararray, InstanceCount:int);
readyInput = limit rawInput 10;
store readyInput into '/test/output' using PigStorage;

b. Run the job via WebHCat:
curl.exe -d file="/test/script/pig_script.txt" -d statusdir="/test/status" "http://localhost:50111/templeton/v1/pig?user.name=hadoop" --user hadoop:any
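The same file-based submission can be scripted without curl; a minimal sketch using Python's standard library (host, port, and paths are the ones from the example above; the request body is only constructed here, not sent):

```python
from urllib.parse import urlencode

# Form body equivalent to the curl flags: -d file=... -d statusdir=...
params = {
    "file": "/test/script/pig_script.txt",
    "statusdir": "/test/status",
}
body = urlencode(params)
url = "http://localhost:50111/templeton/v1/pig?user.name=hadoop"

# To actually submit, POST `body` to `url` with
# Content-Type: application/x-www-form-urlencoded (e.g. via urllib.request),
# supplying the same credentials curl passes with --user.
print(body)
```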

4. Also, the WebHCat 'execute' option works if we don't define a schema in the Pig query, something like this -

curl.exe -d execute="rawInput = load '/test/data' using PigStorage; readyInput = limit rawInput 10; store readyInput into '/test/output' using PigStorage;" -d statusdir="/test/status" "http://localhost:50111/templeton/v1/pig?user.name=hadoop" --user hadoop:any


The ask is:
The WebHCat 'execute' option should work for a Pig query with a schema defined; this appears to be a parsing issue in WebHCat.



--
This message was sent by Atlassian JIRA
(v6.2#6252)