Posted to user@hive.apache.org by Shuja Rehman <sh...@gmail.com> on 2010/06/10 00:07:00 UTC

Load data from xml using Mapper.py in hive

Hi
I have created a table in Hive (suppose table1, with two columns, col1 and
col2).

Now I have an XML file, and I have written a Python script that reads the
XML file and transforms it into single rows with tab-separated fields,
e.g. the output of the Python script can be

row 1 = val1     val2
row 2 = val3     val4

So, with the help of the Python script, the output file has plain rows.
Now I want to load this into the created table. I have seen the example in
which the data is first loaded into a u_data table and then transformed
into u_data_new using a Python script, but that does not fit my scenario,
as I have an XML file as the source.

Kindly let me know how I can achieve this?
Thanks

-- 
Regards
Shuja-ur-Rehman Baig
_________________________________
MS CS - School of Science and Engineering
Lahore University of Management Sciences (LUMS)
Sector U, DHA, Lahore, 54792, Pakistan
Cell: +92 3214207445
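
For reference, a minimal sketch of such a transform script in Python; the
element names row, col1, and col2 are hypothetical placeholders, not taken
from the thread:

    #!/usr/bin/env python
    # Sketch of a Hive TRANSFORM mapper: read one XML document per input
    # record from stdin and emit one tab-separated line per <row> element.
    # Element names (row, col1, col2) are hypothetical.
    import sys
    import xml.etree.ElementTree as ET

    for line in sys.stdin:
        line = line.strip()
        if not line:
            continue
        root = ET.fromstring(line)
        for row in root.findall('row'):
            print('\t'.join([row.findtext('col1', ''),
                             row.findtext('col2', '')]))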

Re: Load data from xml using Mapper.py in hive

Posted by Shuja Rehman <sh...@gmail.com>.
Hi Ashish

Can you tell me how to create a table using \001 as the record delimiter?
I am trying this:

create table test (xmlFile STRING) ROW FORMAT DELIMITED FIELDS TERMINATED
BY '\t' LINES TERMINATED BY '\001';

but it gives me an error saying:

ERROR ql.Driver: FAILED: Error in semantic analysis: LINES TERMINATED BY
only supports newline '\n' right now
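
Since Hive only accepts '\n' as the record delimiter, one workaround (not
confirmed in this thread) is to flatten the XML onto a single physical
line before LOAD DATA, so the whole document becomes one record. A minimal
Python sketch, with hypothetical file names:

    # flatten_xml.py: collapse a multi-line XML file onto one line so the
    # whole document loads as a single Hive record. File names are
    # hypothetical.
    with open('1.xml') as src, open('1_flat.xml', 'w') as dst:
        dst.write(' '.join(line.strip() for line in src))
        dst.write('\n')  # exactly one trailing newline = one record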



On Thu, Jun 10, 2010 at 3:05 AM, Ashish Thusoo <at...@facebook.com> wrote:

>  You could load this whole xml file into a table with a single row and a
> single column. The default record delimiter is \n but you can create a table
> where the record delimiter is \001. Once you do that you can follow the
> approach that you described below. Will this solve your problem?
>
> Ashish
>
> [snip: the original message, quoted in full at the top of this archive]



-- 
Regards
Shuja-ur-Rehman Baig

Re: Load data from xml using Mapper.py in hive

Posted by Shuja Rehman <sh...@gmail.com>.
Hi Tomasz,
Thanks for the answer. That problem is solved now; the exception was due
to a file that was missing before. The program now runs fine if the whole
XML file is on one line, with no '\n'. But the underlying problem is that,
according to my research, Hive does not support a record delimiter other
than '\n'. So I want to load the whole XML file into a single row and a
single column, so that the Groovy script receives the whole XML file as
input and can then parse it.

Please let me know how to do this.
Thanks
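
Once the whole file sits in a single row, the transform script can read
all of stdin at once instead of line by line. A sketch of such a mapper in
Python, assuming the <xy><a><b>..</b><c>..</c></a></xy> structure used for
testing later in this thread:

    #!/usr/bin/env python
    # Whole-document mapper sketch: read ALL of stdin as one XML document
    # and emit a single tab-separated row. Structure assumed from the
    # thread's test file: <xy><a><b>Hello</b><c>world</c></a></xy>
    import sys
    import xml.etree.ElementTree as ET

    root = ET.fromstring(sys.stdin.read().strip())
    print('\t'.join([root.findtext('a/b', ''),
                     root.findtext('a/c', '')]))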

2010/6/11 Tomasz Domański <do...@gmail.com>

> [snip: Tomasz's reply and the rest of the earlier thread, reproduced in
> full in the messages below]


-- 
Regards
Shuja-ur-Rehman Baig

Re: Load data from xml using Mapper.py in hive

Posted by Tomasz Domański <do...@gmail.com>.
Hi Shuja,

the answer seems to be in these lines:

Caused by: java.io.IOException: Cannot run program
"sampleMapper.groovy": java.io.IOException: error=2, No such file or
directory

Hadoop can't see this file or can't run it.

1. Make sure you added the file correctly.
2. Check whether Hadoop can run the script on your Hadoop machines.

Can you run this script in a console on a Hadoop machine, like

>sampleMapper.groovy

or do you run it as:

> groovy sampleMapper.groovy

Maybe you should specify that groovy is needed to run your script.

Try changing your SELECT into: "  ... USING 'groovy sampleMapper.groovy'
... "


On 10 June 2010 14:01, Shuja Rehman <sh...@gmail.com> wrote:

> [snip: the stack trace and the earlier messages, reproduced in full in
> the messages below]

Re: Load data from xml using Mapper.py in hive

Posted by Shuja Rehman <sh...@gmail.com>.
And on the link

http://localhost:50030/jobfailures.jsp?jobid=job_201006101118_0009&kind=map&cause=failed

I have found this output.

java.lang.RuntimeException:
org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error
while processing row {"xmlfile":"<xy><a><b>Hello</b><c>world</c></a></xy>"}
	at org.apache.hadoop.hive.ql.exec.ExecMapper.map(ExecMapper.java:171)
	at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
	at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:358)
	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307)
	at org.apache.hadoop.mapred.Child.main(Child.java:170)
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive
Runtime Error while processing row {"xmlfile":"<xy><a><b>Hello</b><c>world</c></a></xy>"}
	at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:417)
	at org.apache.hadoop.hive.ql.exec.ExecMapper.map(ExecMapper.java:153)
	... 4 more
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Cannot
initialize ScriptOperator
	at org.apache.hadoop.hive.ql.exec.ScriptOperator.processOp(ScriptOperator.java:319)
	at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:456)
	at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:696)
	at org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:84)
	at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:456)
	at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:696)
	at org.apache.hadoop.hive.ql.exec.TableScanOperator.processOp(TableScanOperator.java:45)
	at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:456)
	at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:696)
	at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:400)
	... 5 more
Caused by: java.io.IOException: Cannot run program
"sampleMapper.groovy": java.io.IOException: error=2, No such file or
directory
	at java.lang.ProcessBuilder.start(ProcessBuilder.java:460)
	at org.apache.hadoop.hive.ql.exec.ScriptOperator.processOp(ScriptOperator.java:279)
	... 14 more
Caused by: java.io.IOException: java.io.IOException: error=2, No such
file or directory
	at java.lang.UNIXProcess.<init>(UNIXProcess.java:148)
	at java.lang.ProcessImpl.start(ProcessImpl.java:65)
	at java.lang.ProcessBuilder.start(ProcessBuilder.java:453)
	... 15 more




On Thu, Jun 10, 2010 at 1:57 PM, Shuja Rehman <sh...@gmail.com> wrote:

> [snip: the console log and the earlier messages, reproduced in full in
> the messages below]



-- 
Regards
Shuja-ur-Rehman Baig

Re: Load data from xml using Mapper.py in hive

Posted by Shuja Rehman <sh...@gmail.com>.
I have changed the logging level with this command:

bin/hive -hiveconf hive.root.logger=INFO,console

and the output is

------------------------------------------------------------------------------------------------------------------------------
10/06/10 13:51:20 INFO parse.ParseDriver: Parsing command: INSERT OVERWRITE
TABLE test_new
SELECT
  TRANSFORM (xmlfile)
  USING 'sampleMapper.groovy'
  AS (b,c)
FROM test
10/06/10 13:51:20 INFO parse.ParseDriver: Parse Completed
10/06/10 13:51:20 INFO parse.SemanticAnalyzer: Starting Semantic Analysis
10/06/10 13:51:20 INFO parse.SemanticAnalyzer: Completed phase 1 of Semantic
Analysis
10/06/10 13:51:20 INFO parse.SemanticAnalyzer: Get metadata for source
tables
10/06/10 13:51:20 INFO metastore.HiveMetaStore: 0: Opening raw store with
implemenation class:org.apache.hadoop.hive.metastore.ObjectStore
10/06/10 13:51:20 INFO metastore.ObjectStore: ObjectStore, initialize called
10/06/10 13:51:20 ERROR DataNucleus.Plugin: Bundle "org.eclipse.jdt.core"
requires "org.eclipse.core.resources" but it cannot be resolved.
10/06/10 13:51:20 ERROR DataNucleus.Plugin: Bundle "org.eclipse.jdt.core"
requires "org.eclipse.core.runtime" but it cannot be resolved.
10/06/10 13:51:20 ERROR DataNucleus.Plugin: Bundle "org.eclipse.jdt.core"
requires "org.eclipse.text" but it cannot be resolved.
10/06/10 13:51:22 INFO metastore.ObjectStore: Initialized ObjectStore
10/06/10 13:51:22 INFO metastore.HiveMetaStore: 0: get_table : db=default
tbl=test
10/06/10 13:51:23 INFO hive.log: DDL: struct test { string xmlfile}
10/06/10 13:51:23 INFO parse.SemanticAnalyzer: Get metadata for subqueries
10/06/10 13:51:23 INFO parse.SemanticAnalyzer: Get metadata for destination
tables
10/06/10 13:51:23 INFO metastore.HiveMetaStore: 0: get_table : db=default
tbl=test_new
10/06/10 13:51:23 INFO hive.log: DDL: struct test_new { string b, string c}
10/06/10 13:51:23 INFO parse.SemanticAnalyzer: Completed getting MetaData in
Semantic Analysis
10/06/10 13:51:23 INFO hive.log: DDL: struct test_new { string b, string c}
10/06/10 13:51:23 INFO ppd.OpProcFactory: Processing for FS(3)
10/06/10 13:51:23 INFO ppd.OpProcFactory: Processing for SCR(2)
10/06/10 13:51:23 INFO ppd.OpProcFactory: Processing for SEL(1)
10/06/10 13:51:23 INFO ppd.OpProcFactory: Processing for TS(0)
10/06/10 13:51:23 INFO hive.log: DDL: struct test { string xmlfile}
10/06/10 13:51:23 INFO hive.log: DDL: struct test { string xmlfile}
10/06/10 13:51:23 INFO hive.log: DDL: struct test { string xmlfile}
10/06/10 13:51:23 INFO hive.log: DDL: struct test { string xmlfile}
10/06/10 13:51:23 INFO hive.log: DDL: struct test { string xmlfile}
10/06/10 13:51:23 INFO hive.log: DDL: struct test { string xmlfile}
10/06/10 13:51:23 INFO parse.SemanticAnalyzer: Completed plan generation
10/06/10 13:51:23 INFO ql.Driver: Semantic Analysis Completed
10/06/10 13:51:23 INFO ql.Driver: Returning Hive schema:
Schema(fieldSchemas:[FieldSchema(name:b, type:string, comment:null),
FieldSchema(name:c, type:string, comment:null)], properties:null)
10/06/10 13:51:23 INFO ql.Driver: query plan =
file:/tmp/root/hive_2010-06-10_13-51-20_112_5091815325633732890/queryplan.xml
10/06/10 13:51:24 INFO ql.Driver: Starting command: INSERT OVERWRITE TABLE
test_new
SELECT
  TRANSFORM (xmlfile)
  USING 'sampleMapper.groovy'
  AS (b,c)
FROM test
Total MapReduce jobs = 2
10/06/10 13:51:24 INFO ql.Driver: Total MapReduce jobs = 2
Launching Job 1 out of 2
10/06/10 13:51:24 INFO ql.Driver: Launching Job 1 out of 2
Number of reduce tasks is set to 0 since there's no reduce operator
10/06/10 13:51:24 INFO exec.ExecDriver: Number of reduce tasks is set to 0
since there's no reduce operator
10/06/10 13:51:24 INFO exec.ExecDriver: Using
org.apache.hadoop.hive.ql.io.HiveInputFormat
10/06/10 13:51:24 INFO exec.ExecDriver: Processing alias test
10/06/10 13:51:24 INFO exec.ExecDriver: Adding input file
hdfs://localhost:9000/user/hive/warehouse/test
10/06/10 13:51:24 WARN mapred.JobClient: Use GenericOptionsParser for
parsing the arguments. Applications should implement Tool for the same.
10/06/10 13:51:24 INFO mapred.FileInputFormat: Total input paths to process
: 1
Starting Job = job_201006101118_0009, Tracking URL =
http://localhost:50030/jobdetails.jsp?jobid=job_201006101118_0009
10/06/10 13:51:25 INFO exec.ExecDriver: Starting Job =
job_201006101118_0009, Tracking URL =
http://localhost:50030/jobdetails.jsp?jobid=job_201006101118_0009
Kill Command = /usr/local/hadoop/hadoop-0.20.2/bin/../bin/hadoop job
-Dmapred.job.tracker=localhost:9001 -kill job_201006101118_0009
10/06/10 13:51:25 INFO exec.ExecDriver: Kill Command =
/usr/local/hadoop/hadoop-0.20.2/bin/../bin/hadoop job
-Dmapred.job.tracker=localhost:9001 -kill job_201006101118_0009
2010-06-10 13:51:32,255 Stage-1 map = 0%,  reduce = 0%
10/06/10 13:51:32 INFO exec.ExecDriver: 2010-06-10 13:51:32,255 Stage-1 map
= 0%,  reduce = 0%
2010-06-10 13:51:35,305 Stage-1 map = 50%,  reduce = 0%
10/06/10 13:51:35 INFO exec.ExecDriver: 2010-06-10 13:51:35,305 Stage-1 map
= 50%,  reduce = 0%
2010-06-10 13:51:58,505 Stage-1 map = 100%,  reduce = 100%
10/06/10 13:51:58 INFO exec.ExecDriver: 2010-06-10 13:51:58,505 Stage-1 map
= 100%,  reduce = 100%
Ended Job = job_201006101118_0009 with errors
10/06/10 13:51:58 ERROR exec.ExecDriver: Ended Job = job_201006101118_0009
with errors

Task with the most failures(4):
-----
Task ID:
  task_201006101118_0009_m_000000

URL:

http://localhost:50030/taskdetails.jsp?jobid=job_201006101118_0009&tipid=task_201006101118_0009_m_000000
-----

10/06/10 13:51:58 ERROR exec.ExecDriver:
Task with the most failures(4):
-----
Task ID:
  task_201006101118_0009_m_000000

URL:

http://localhost:50030/taskdetails.jsp?jobid=job_201006101118_0009&tipid=task_201006101118_0009_m_000000
-----

FAILED: Execution Error, return code 2 from
org.apache.hadoop.hive.ql.exec.ExecDriver
10/06/10 13:51:58 ERROR ql.Driver: FAILED: Execution Error, return code 2
from org.apache.hadoop.hive.ql.exec.ExecDriver

-----------------------------------------------------------------------------------------------------------------------------

Any clue???

On Thu, Jun 10, 2010 at 1:43 PM, Sonal Goyal <so...@gmail.com> wrote:

> [snip: Sonal's reply and the earlier messages, reproduced in full in the
> messages below]



-- 
Regards
Shuja-ur-Rehman Baig

Re: Load data from xml using Mapper.py in hive

Posted by Sonal Goyal <so...@gmail.com>.
Can you try changing your logging level to debug and see the exact
error message in hive.log?

Thanks and Regards,
Sonal
www.meghsoft.com
http://in.linkedin.com/in/sonalgoyal
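
Later in this thread the level is set from the command line with
hive.root.logger=INFO,console; the DEBUG variant Sonal suggests would
presumably be:

    bin/hive -hiveconf hive.root.logger=DEBUG,console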



On Thu, Jun 10, 2010 at 5:07 PM, Shuja Rehman <sh...@gmail.com> wrote:
> Hi
> I have tried to do as you described. Let me explain the steps.
>
> 1- create table test (xmlFile String);
> ----------------------------------------------------------------------------------
>
> 2-LOAD DATA LOCAL INPATH '1.xml'
> OVERWRITE INTO TABLE test;
> ----------------------------------------------------------------------------------
>
> 3-CREATE TABLE test_new (
>     b STRING,
>     c STRING
>   )
> ROW FORMAT DELIMITED
> FIELDS TERMINATED BY '\t';
>
> ----------------------------------------------------------------------------------
> 4-add FILE sampleMapper.groovy;
> ----------------------------------------------------------------------------------
> 5- INSERT OVERWRITE TABLE test_new
> SELECT
>   TRANSFORM (xmlfile)
>   USING 'sampleMapper.groovy'
>   AS (b,c)
> FROM test;
> ----------------------------------------------------------------------------------
> XML FILE:
> xml file has only one row for testing purpose which is
>
> <xy><a><b>Hello</b><c>world</c></a></xy>
> ----------------------------------------------------------------------------------
> MAPPER
> and i have write the mapper in groovy to parse it. the mapper is
>
>    def xmlData =""
>  System.in.withReader {
>         xmlData=xmlData+ it.readLine()
> }
>
> def xy = new XmlParser().parseText(xmlData)
> def b=xy.a.b.text()
>     def c=xy.a.c.text()
>     println  ([b,c].join('\t') )
> ----------------------------------------------------------------------------------
> Now step 1-4 are fine but when i perform step 5 which will load the data
> from test table to new table using mapper, it throws the error. The error on
> console is
>
> FAILED: Execution Error, return code 2 from
> org.apache.hadoop.hive.ql.exec.ExecDriver
>
> I am facing hard time. Any suggestions
> Thanks
>
> On Thu, Jun 10, 2010 at 3:05 AM, Ashish Thusoo <at...@facebook.com> wrote:
>>
>> You could load this whole xml file into a table with a single row and a
>> single column. The default record delimiter is \n but you can create a table
>> where the record delimiter is \001. Once you do that you can follow the
>> approach that you described below. Will this solve your problem?
>>
>> Ashish
>> ________________________________
>> From: Shuja Rehman [mailto:shujamughal@gmail.com]
>> Sent: Wednesday, June 09, 2010 3:07 PM
>> To: hive-user@hadoop.apache.org
>> Subject: Load data from xml using Mapper.py in hive
>>
>> Hi
>> I have created a table in hive (Suppose table1 with two columns, col1 and
>> col2 )
>>
>> now i have an xml file for which i have write a python script which read
>> the xml file and transform it in single row with tab seperated
>> e.g the output of python script can be
>>
>> row 1 = val1     val2
>> row2 =  val3     val4
>>
>> so the output of file has straight rows with the help of python script.
>> now i want to load this into created table. I have seen the example of in
>> which the data is first loaded in u_data table then transform it using
>> python script in u_data_new but in m scenario. it does not fit as i have xml
>> file as source.
>>
>>
>> Kindly let me know can I achieve this??
>> Thanks
>>
>> --
>
> --
> Regards
> Baig
>
>

Re: Load data from xml using Mapper.py in hive

Posted by Shuja Rehman <sh...@gmail.com>.
Hi
I have tried to do as you described. Let me explain step by step.

1- create table test (xmlFile String);
----------------------------------------------------------------------------------

2-LOAD DATA LOCAL INPATH '1.xml'
OVERWRITE INTO TABLE test;
----------------------------------------------------------------------------------

3-CREATE TABLE test_new (
    b STRING,
    c STRING
  )
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\t';

----------------------------------------------------------------------------------
4-add FILE sampleMapper.groovy;
----------------------------------------------------------------------------------
5- INSERT OVERWRITE TABLE test_new
SELECT
  TRANSFORM (xmlfile)
  USING 'sampleMapper.groovy'
  AS (b,c)
FROM test;
----------------------------------------------------------------------------------
*XML FILE*:
The xml file has only one row for testing purposes, which is

<xy><a><b>Hello</b><c>world</c></a></xy>
----------------------------------------------------------------------------------
*MAPPER*
I have written the mapper in Groovy to parse it. The mapper is

// accumulate the single input row that Hive streams in on stdin
def xmlData = ""
System.in.withReader {
    xmlData = xmlData + it.readLine()
}

// parse the row as xml and emit the two fields, tab-separated
def xy = new XmlParser().parseText(xmlData)
def b = xy.a.b.text()
def c = xy.a.c.text()
println([b, c].join('\t'))
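
As a quick sanity check outside Hive (assuming groovy is on the PATH), the
mapper can be run by hand with the same row Hive would stream to it:

echo '<xy><a><b>Hello</b><c>world</c></a></xy>' | groovy sampleMapper.groovy
# should print the two fields tab-separated: Hello	world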
----------------------------------------------------------------------------------
Now steps 1-4 are fine, but when I perform step 5, which loads the data
from the test table into the new table using the mapper, it throws an error.
The error on the console is

*FAILED: Execution Error, return code 2 from
org.apache.hadoop.hive.ql.exec.ExecDriver*

I am having a hard time with this. Any suggestions?
Thanks

On Thu, Jun 10, 2010 at 3:05 AM, Ashish Thusoo <at...@facebook.com> wrote:

>  You could load this whole xml file into a table with a single row and a
> single column. The default record delimiter is \n but you can create a table
> where the record delimiter is \001. Once you do that you can follow the
> approach that you described below. Will this solve your problem?
>
> Ashish
>
>  ------------------------------
> *From:* Shuja Rehman [mailto:shujamughal@gmail.com]
> *Sent:* Wednesday, June 09, 2010 3:07 PM
> *To:* hive-user@hadoop.apache.org
> *Subject:* Load data from xml using Mapper.py in hive
>
> Hi
> I have created a table in hive (Suppose table1 with two columns, col1 and
> col2 )
>
> now i have an xml file for which i have write a python script which read
> the xml file and transform it in single row with tab seperated
> e.g the output of python script can be
>
> row 1 = val1     val2
> row2 =  val3     val4
>
> so the output of file has straight rows with the help of python script. now
> i want to load this into created table. I have seen the example of in which
> the data is first loaded in u_data table then transform it using python
> script in u_data_new but in m scenario. it does not fit as i have xml file
> as source.
>
>
> Kindly let me know can I achieve this??
> Thanks
>
> --
>

-- 
Regards
Baig

RE: Load data from xml using Mapper.py in hive

Posted by Ashish Thusoo <at...@facebook.com>.
You could load this whole xml file into a table with a single row and a
single column. The default record delimiter is \n but you can create a
table where the record delimiter is \001. Once you do that you can follow
the approach that you described below. Will this solve your problem?
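
If the non-default record delimiter gives you trouble, a rough workaround
(just a sketch, assuming the xml is small enough to sit on one line) is to
strip the newlines from the file before loading, so that the default '\n'
delimiter sees the whole file as a single record:

# flatten the xml so it becomes one record under the default '\n' delimiter
tr -d '\n' < yourfile.xml > 1.xml

The LOAD DATA and TRANSFORM steps then work unchanged.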

Ashish

________________________________
From: Shuja Rehman [mailto:shujamughal@gmail.com]
Sent: Wednesday, June 09, 2010 3:07 PM
To: hive-user@hadoop.apache.org
Subject: Load data from xml using Mapper.py in hive

Hi
I have created a table in hive (Suppose table1 with two columns, col1 and col2 )

now i have an xml file for which i have write a python script which read the xml file and transform it in single row with tab seperated
e.g the output of python script can be

row 1 = val1     val2
row2 =  val3     val4

so the output of file has straight rows with the help of python script. now i want to load this into created table. I have seen the example of in which the data is first loaded in u_data table then transform it using python script in u_data_new but in m scenario. it does not fit as i have xml file as source.


Kindly let me know can I achieve this??
Thanks

--
Regards
Shuja-ur-Rehman Baig
_________________________________
MS CS - School of Science and Engineering
Lahore University of Management Sciences (LUMS)
Sector U, DHA, Lahore, 54792, Pakistan
Cell: +92 3214207445