Posted to user@hive.apache.org by Justin Workman <ju...@gmail.com> on 2013/08/21 07:43:58 UTC

First/last in npath

When is support for lead/lag/first_value/last_value expected in the
npath result statement?

Thanks



Re: First/last in npath

Posted by Edward Capriolo <ed...@gmail.com>.
If you can find no open jira issue on this functionality then that means no
one is currently working on it.



Re: First/last in npath

Posted by Justin Workman <ju...@gmail.com>.
Confirmed: indexing with tpath[size(tpath) - 1] does not work. I get the
following error:

"Non-constant expressions for array indexes not supported"

FWIW, I think I have written a UDF that will do what I want. I still
have some work to do to make sure it accepts and returns the correct data
type for the field being retrieved, but it is basically
ExtractFromTpath(tpath, -1, 'productid'). Arg1 is the tpath object, arg2 is
the row to get (-1 for the last row), and arg3 is the name of the field to
retrieve.
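
The UDF's source is not shown in this thread, so the following is only a minimal Python sketch of the behaviour described above, not the actual Hive UDF. The name extract_from_tpath, the use of dictionaries for rows, the None-on-miss behaviour, and the sample session data are all assumptions made for illustration.

```python
from typing import Any, Mapping, Sequence


def extract_from_tpath(tpath: Sequence[Mapping[str, Any]], row: int, field: str) -> Any:
    """Return `field` from the row at position `row` of `tpath`.

    Negative positions count from the end, so -1 means the last row.
    Returns None when the path is empty, the position is out of range,
    or the field is missing (an assumed convention, not Hive's).
    """
    if not tpath:
        return None
    if not -len(tpath) <= row < len(tpath):
        return None
    return tpath[row].get(field)


# Hypothetical session path: one search, one intermediate click, one product view.
tpath = [
    {"page": "SEARCH", "search_terms": "red shoes", "productid": None},
    {"page": "CATEGORY", "search_terms": None, "productid": None},
    {"page": "PRODUCT", "search_terms": None, "productid": 42},
]

last_productid = extract_from_tpath(tpath, -1, "productid")
first_terms = extract_from_tpath(tpath, 0, "search_terms")
```

Taking the row position as a function argument is what sidesteps the constant-index restriction in the error message: the position is resolved inside the function rather than by the array-index operator.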



Re: First/last in npath

Posted by Justin Workman <ju...@gmail.com>.
I believe I tried that, both in the return argument and the outer query. If
memory serves me, I got an error about the array index needing to be a
constant value.

I will try again when I get back to a computer.


Re: First/last in npath

Posted by Harish Butani <hb...@hortonworks.com>.
Can you try this:

select search_terms, productid, clicks_to_product from npath ( on clicks
                distribute by sessionid sort by timestamp
                arg1('SEARCH.NOTPRODUCT*.PRODUCT'),
                arg2('SEARCH'), arg3(page = 'SEARCH'),
                arg4('PRODUCT'), arg5(page = 'PRODUCT'),
                arg6('NOTPRODUCT'), arg7(page != 'PRODUCT'),
                arg8('search_terms,  (size(tpath)-1) as clicks_to_product, tpath[size(tpath) -1].productid as productid')
                );


- Added NOTPRODUCT to capture the clicks between SEARCH and PRODUCT.
- You don't need first_value for search_terms, because the row you get back is the one at which the pattern match starts.
- To get the last value, I am hoping this works: tpath[size(tpath) -1].productid
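
To make the intent of the pattern concrete, here is a rough Python sketch of what the query above is meant to compute for one session. It is not Hive code and does not reproduce npath's implementation. It assumes rows are already sorted by timestamp within a session, that a match is attempted starting from every row, and that rows are plain dictionaries; the function names and sample data are hypothetical.

```python
from typing import Any, Dict, List, Optional

Row = Dict[str, Any]


def match_from(rows: List[Row], start: int) -> Optional[List[Row]]:
    """Try to match SEARCH.NOTPRODUCT*.PRODUCT beginning at rows[start].

    Returns the matched path (the analogue of tpath) or None.
    """
    # SEARCH: the first row of the path must be a search page.
    if rows[start]["page"] != "SEARCH":
        return None
    end = start + 1
    # NOTPRODUCT*: skip any number of rows that are not product views.
    while end < len(rows) and rows[end]["page"] != "PRODUCT":
        end += 1
    # PRODUCT: the path must finish on a product view.
    if end == len(rows):
        return None
    return rows[start:end + 1]


def search_to_product(rows: List[Row]) -> List[Row]:
    """Emit one result row per starting position where the pattern matches."""
    results = []
    for start in range(len(rows)):
        tpath = match_from(rows, start)
        if tpath is None:
            continue
        results.append({
            # Taken from the row at which the match starts.
            "search_terms": tpath[0]["search_terms"],
            # size(tpath) - 1
            "clicks_to_product": len(tpath) - 1,
            # tpath[size(tpath) - 1].productid
            "productid": tpath[-1]["productid"],
        })
    return results


session = [
    {"page": "SEARCH", "search_terms": "red shoes", "productid": None},
    {"page": "CATEGORY", "search_terms": None, "productid": None},
    {"page": "PRODUCT", "search_terms": None, "productid": 42},
]
result = search_to_product(session)
```

Without the NOTPRODUCT* step, the pattern would only match a search that is immediately followed by a product view, so the session above would produce no row.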




-- 
CONFIDENTIALITY NOTICE
NOTICE: This message is intended for the use of the individual or entity to 
which it is addressed and may contain information that is confidential, 
privileged and exempt from disclosure under applicable law. If the reader 
of this message is not the intended recipient, you are hereby notified that 
any printing, copying, dissemination, distribution, disclosure or 
forwarding of this communication is strictly prohibited. If you have 
received this communication in error, please contact the sender immediately 
and delete it from your system. Thank You.

Re: First/last in npath

Posted by Justin Workman <ju...@gmail.com>.
Assuming clickstream-type data, I want to get the search terms from the
first search request, and return the product id that was eventually viewed
and the number of clicks it took to reach the product. So something like this:

select search_terms, productid, clicks_to_product from npath ( on clicks
                distribute by sessionid sort by timestamp
                arg1('SEARCH.PRODUCT'),
                arg2('SEARCH'), arg3(page = 'SEARCH'),
                arg4('PRODUCT'), arg5(page = 'PRODUCT'),
                arg6('first_value(search_terms) as search_terms, last_value(productid) as productid, (size(tpath)-1) as clicks_to_product')
                );

From what I have seen, I will get the search terms from the first search
without first_value; however, it would be nice to be able to use
first_value to guarantee that. I cannot get the productid from the last
tpath object using this. I did try to get last_value(tpath.productid)
in the outer query; however, that returned the productid (and all the nulls
leading up to the product view page) from the tpath of the very
last row returned by the inner npath select, i.e. not the last productid
value for the current row. I can use tpath.productid in place of productid
in the outer query, and it returns the nulls for each row in the current
tpath, up to the final product view.

Hope this makes sense.

Thanks
Justin



Re: First/last in npath

Posted by Harish Butani <hb...@hortonworks.com>.
Can you provide details on what you want to do?
You may be able to express this by stacking queries: execute npath in a subquery in the from clause and then do windowing in an outer select.
Also, you get the 'path' object back from npath, so you can apply array indexing to it.

regards,
Harish.
