You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user@hive.apache.org by David Novogrodsky <da...@gmail.com> on 2014/12/08 20:35:44 UTC
Using xPATH and Hive SQL to access XML data, but xPath a problem
I created a Hive table using one column. Each row contains one XML record.
Here is the script I used to create this first table:
CREATE EXTERNAL TABLE xml_event_table (
xmlevent string)
STORED AS TEXTFILE
LOCATION “/user/cloudera/vector/events”;
Here is a sample XML in one row of the xm-Levent_table:
<Event xmlns=”http://schemas.microsoft.com/win/2004/08/events/event”><System><ProviderName=”Microsoft-Windows-Security-Auditing”
Guid=”54849625-5478-4994-a5ba-3e3b0328c30d”</Provider> <EventID
Qualifiers=””>4672</EventID> <Version>0</Version>…</Event>
I want to create a view that contains the EventID. But the XPath is not
working correctly:
CREATE VIEW xpath_xml_event_view01(event_id, computer, user_id)
AS SELECT
xpath_string(xmlevent, ‘Event/System/EventID’)
FROM xml_event_table;
I modeled the solution using this web site:
https://communities.intel.com/community/itpeernetwork/datastack/blog/2013/08/15/hadoop-tutorialsingesting-xml-in-hive-using-xpath
Also,
If I change the Hive script to:
CREATE VIEW xpath_xml_event_view01(event_id)
AS SELECT
xpath(xmlevent, '/Event[@xmlns="
http://schemas.microsoft.com/win/2004/08/events/event
"]/System/EventID[@Qualifiers=""]/text()')
FROM xml_event_table;
I get this result when I select all using this view:
0 []
1 []
If I try this Hive script:
CREATE VIEW xpath_xml_event_view01(event_id)
AS SELECT
xpath_string(xmlevent, '/Event[1]/System[1]/EventID')
FROM xml_event_table;
or this Hive script:
CREATE VIEW xpath_xml_event_view01(event_id)
AS SELECT
xpath_string(xmlevent, '/Event[@xmlns="
http://schemas.microsoft.com/win/2004/08/events/event
"]/System/EventID[@Qualifiers=""]/text()')
FROM xml_event_table;
I get this result(empty rows):
0
1
2
David Novogrodsky
david.novogrodsky@gmail.com
http://www.linkedin.com/in/davidnovogrodsky