You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user@hive.apache.org by David Novogrodsky <da...@gmail.com> on 2014/12/08 20:35:44 UTC

Using xPATH and Hive SQL to access XML data, but xPath a problem

I created a Hive table using one column. Each row contains one XML record.
Here is the script I used to create this first table:

    CREATE EXTERNAL TABLE xml_event_table (
    xmlevent string)
    STORED AS TEXTFILE
    LOCATION “/user/cloudera/vector/events”;

Here is a sample XML in one row of the xm-Levent_table:

    <Event xmlns=”http://schemas.microsoft.com/win/2004/08/events/event”><System><ProviderName=”Microsoft-Windows-Security-Auditing”
Guid=”54849625-5478-4994-a5ba-3e3b0328c30d”</Provider> <EventID
Qualifiers=””>4672</EventID> <Version>0</Version>…</Event>

I want to create a view that contains the EventID. But the XPath is not
working correctly:

    CREATE VIEW xpath_xml_event_view01(event_id, computer, user_id)
    AS SELECT
    xpath_string(xmlevent, ‘Event/System/EventID’)
    FROM xml_event_table;

I modeled the solution using this web site:

https://communities.intel.com/community/itpeernetwork/datastack/blog/2013/08/15/hadoop-tutorialsingesting-xml-in-hive-using-xpath

Also,
If I change the Hive script to:

    CREATE VIEW xpath_xml_event_view01(event_id)
    AS SELECT
    xpath(xmlevent, '/Event[@xmlns="
http://schemas.microsoft.com/win/2004/08/events/event
"]/System/EventID[@Qualifiers=""]/text()')
    FROM xml_event_table;
I get this result when I select all using this view:

    0   []
    1   []

If I try this Hive script:

    CREATE VIEW xpath_xml_event_view01(event_id)
    AS SELECT
    xpath_string(xmlevent, '/Event[1]/System[1]/EventID')
    FROM xml_event_table;
or this Hive script:

    CREATE VIEW xpath_xml_event_view01(event_id)
    AS SELECT
    xpath_string(xmlevent, '/Event[@xmlns="
http://schemas.microsoft.com/win/2004/08/events/event
"]/System/EventID[@Qualifiers=""]/text()')
    FROM xml_event_table;

I get this result(empty rows):

    0
    1
    2
David Novogrodsky
david.novogrodsky@gmail.com
http://www.linkedin.com/in/davidnovogrodsky