Posted to user@spark.apache.org by Chetan Khatri <ch...@gmail.com> on 2020/05/11 21:29:15 UTC

XPATH_INT behavior - XML - Function in Spark

Hi Spark Users,

I want to parse XML stored in a query column and extract a value from it. I am
using xpath_int, which works for my requirement with a fixed path, but it fails
when I embed other query columns inside the XPath string in a Spark SQL query.

select timesheet_profile_id,
xpath_int(timesheet_profile_code, '(/timesheetprofile/weeks/week[td.current_week]/td.day)[1]')

This failed, whereas hardcoded values work for the same scenario:

scala> spark.sql("""select timesheet_profile_id,
  xpath_int(timesheet_profile_code, '(/timesheetprofile/weeks/week[2]/friday)[1]')
  from TIMESHEET_PROFILE_ATT""").show(false)

Has anyone worked on this? Thanks in advance.

Thanks
- Chetan

Re: XPATH_INT behavior - XML - Function in Spark

Posted by Chetan Khatri <ch...@gmail.com>.

Can anyone please suggest how I can achieve this?

On Tue, May 12, 2020 at 5:35 PM Jeff Evans <je...@gmail.com>
wrote:

> It sounds like you're expecting the XPath expression to evaluate embedded
> Spark SQL expressions?  From the documentation
> <https://spark.apache.org/docs/2.4.5/api/sql/index.html#xpath>, there
> appears to be no reason to expect that to work.

Re: XPATH_INT behavior - XML - Function in Spark

Posted by Chetan Khatri <ch...@gmail.com>.

Thank you for the clarification.
What do you suggest to achieve this use case?
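
One possible direction, offered as a sketch rather than a confirmed answer from the thread: since the built-in function does not substitute column values into the path string, the path can be assembled per row inside a user-defined function. The helper below uses only the JDK's javax.xml.xpath classes. The names DynamicXPath and dayValue are hypothetical, and it assumes td.current_week is an integer week index and td.day is an element name such as friday.

```scala
import java.io.StringReader
import javax.xml.xpath.{XPathConstants, XPathFactory}
import org.xml.sax.InputSource
import scala.util.Try

object DynamicXPath {
  // Hypothetical helper (not part of Spark): builds the XPath from per-row
  // values and returns the integer found there, or None when the input is
  // missing, the day name is not a plain identifier, the XML cannot be
  // parsed, or nothing numeric matches.
  def dayValue(xml: String, week: Int, day: String): Option[Int] = {
    // Accept only simple element names so a row value cannot inject XPath syntax.
    if (xml == null || day == null || !day.matches("[A-Za-z_][A-Za-z0-9_]*")) None
    else {
      val path = s"(/timesheetprofile/weeks/week[$week]/$day)[1]"
      Try {
        XPathFactory.newInstance().newXPath()
          .evaluate(path, new InputSource(new StringReader(xml)), XPathConstants.NUMBER)
          .asInstanceOf[Double]
      }.toOption.filterNot(_.isNaN).map(_.toInt)
    }
  }
}
```

In a Spark session this could presumably be registered with spark.udf.register("day_value", (xml: String, week: Int, day: String) => DynamicXPath.dayValue(xml, week, day)) and then called as day_value(timesheet_profile_code, td.current_week, td.day). The UDF name and the column types are assumptions; a fresh XPath object per row is simple but not the fastest option.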


Re: XPATH_INT behavior - XML - Function in Spark

Posted by Jeff Evans <je...@gmail.com>.

It sounds like you're expecting the XPath expression to evaluate embedded
Spark SQL expressions?  From the documentation
<https://spark.apache.org/docs/2.4.5/api/sql/index.html#xpath>, there
appears to be no reason to expect that to work.
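
To illustrate the point with plain JDK XPath (a sketch outside Spark, using a made-up sample document because the real XML layout is not shown in this thread): the path is just a string, so td.current_week and td.day inside the quotes are read as XML element names, not as SQL columns. With no elements of those names, the lookup matches nothing, while the hardcoded path finds the value.

```scala
import java.io.StringReader
import javax.xml.xpath.{XPathConstants, XPathFactory}
import org.xml.sax.InputSource

object LiteralXPathDemo {
  // Hypothetical sample; the actual timesheet_profile_code content is an assumption.
  val sampleXml: String =
    "<timesheetprofile><weeks>" +
      "<week><friday>7</friday></week>" +
      "<week><friday>8</friday></week>" +
      "</weeks></timesheetprofile>"

  // Evaluate an XPath expression against the XML and return it as a number.
  def evalNumber(xml: String, path: String): Double =
    XPathFactory.newInstance().newXPath()
      .evaluate(path, new InputSource(new StringReader(xml)), XPathConstants.NUMBER)
      .asInstanceOf[Double]

  // Hardcoded path: selects the friday element of the second week (8.0 here).
  val hardcoded: Double =
    evalNumber(sampleXml, "(/timesheetprofile/weeks/week[2]/friday)[1]")

  // "Embedded" path: the names are treated as element names, nothing matches,
  // and converting an empty node set to a number gives NaN.
  val embedded: Double =
    evalNumber(sampleXml, "(/timesheetprofile/weeks/week[td.current_week]/td.day)[1]")

  def main(args: Array[String]): Unit = {
    println(s"hardcoded = $hardcoded")
    println(s"embedded  = $embedded")
  }
}
```

How Spark reports the second case (an error, a null, or a default value) is not stated in the thread, so that part is left open here.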

On Tue, May 12, 2020 at 2:09 PM Chetan Khatri <ch...@gmail.com>
wrote:

> Can someone please help.. Thanks in advance.

Re: XPATH_INT behavior - XML - Function in Spark

Posted by Chetan Khatri <ch...@gmail.com>.

Can someone please help? Thanks in advance.
