You are viewing a plain text version of this content. The canonical link for it is here.
Posted to hdfs-user@hadoop.apache.org by Sam Mohamed <sa...@voltage.com> on 2012/10/18 03:04:07 UTC

Hive Query with UDF

I have some encrypted data in an HDFS csv, that I've created a Hive table for, and I want to run a Hive query that first encrypts the query param, then does the lookup.  I have a UDF that does encryption as follows:

public class ParamEncrypt extends UDF {

  public Text evaluate(String name) throws Exception {

      String result = new String();

      if (name == null) { return null; }

      result = ParamData.encrypt(name);

      return new Text(result);
  }
}

Then I run the Hive query as:

  select * from cc_details where first_name = encrypt('Ann');

The problem is, it's running encrypt('Ann') across every single record in the table.  I want it do the encryption once, then do the matchup.  I've tried:

  select * from cc_details where first_name in (select encrypt('Ann') from cc_details limit 1);

But Hive doesn't support **IN** or select queries in the where clause.

What can I do?

Can I do something like:

  select encrypt('Ann') as ann from cc_details where first_name = ann;

That also doesn't work because the query parser throws an error saying **ann** is not a known column

Thanks,

Sam

Re: Hive Query with UDF

Posted by Sam Mohamed <sa...@voltage.com>.
Finally got it to work like this:

select * from cc_details ssn_tbl
right outer join ( select encrypt('850-37-8230','ssn') as ssn
from cc_details limit 1) ssn_tmp
on (ssn_tbl.ssn = ssn_tmp.ssn);

Sam
On Oct 17, 2012, at 7:21 PM, Michael Segel <mi...@hotmail.com>> wrote:

You really don't want to do that.
It becomes a nightmare in that you now ship a derivative of Hive and then have to maintain it and keep it in lock step w Hive from Apache.


There are other options and designs but since this is for a commercial product. I'm not going to talk about them.

Keep in mind that Hive isn't a relational database per se and works on immutable flat files. So that's going to hurt you as well.

On Oct 17, 2012, at 9:13 PM, lohit <lo...@gmail.com>> wrote:

One idea is to write your own translation layer  which sits in between query and actual job submission.
You would most likely end up having your own version of hive jar which has your translation changes on top of HIVE sources.
This has the added advantage that users need not change their queries, they would do it as normal HIVE query, like
   select * from cc_details where first_name = 'Ann'
Disadvantage is you have to maintain a fork.

Even otherwise, my initial guess is you might have to modify command line parser which does encrypt once instead of for every record

2012/10/17 Sam Mohamed <sa...@voltage.com>>
Thanks for the quick response.

The idea is that we are selling the encryption product for customers who use HDFS.  Hence, encryption is a requirement.

Any other suggestions.

Sam
________________________________________
From: Michael Segel [michael_segel@hotmail.com<ma...@hotmail.com>]
Sent: Wednesday, October 17, 2012 6:10 PM
To: user@hadoop.apache.org<ma...@hadoop.apache.org>
Subject: Re: Hive Query with UDF

You don't need an UDF...

You encrypt the string 'Ann' first then use that encrypted value in the Select statement.

That should make things a bit simpler.



On Oct 17, 2012, at 8:04 PM, Sam Mohamed <sa...@voltage.com>> wrote:

> I have some encrypted data in an HDFS csv, that I've created a Hive table for, and I want to run a Hive query that first encrypts the query param, then does the lookup.  I have a UDF that does encryption as follows:
>
> public class ParamEncrypt extends UDF {
>
>  public Text evaluate(String name) throws Exception {
>
>      String result = new String();
>
>      if (name == null) { return null; }
>
>      result = ParamData.encrypt(name);
>
>      return new Text(result);
>  }
> }
>
> Then I run the Hive query as:
>
>  select * from cc_details where first_name = encrypt('Ann');
>
> The problem is, it's running encrypt('Ann') across every single record in the table.  I want it do the encryption once, then do the matchup.  I've tried:
>
>  select * from cc_details where first_name in (select encrypt('Ann') from cc_details limit 1);
>
> But Hive doesn't support **IN** or select queries in the where clause.
>
> What can I do?
>
> Can I do something like:
>
>  select encrypt('Ann') as ann from cc_details where first_name = ann;
>
> That also doesn't work because the query parser throws an error saying **ann** is not a known column
>
> Thanks,
>
> Sam




--
Have a Nice Day!
Lohit



Re: Hive Query with UDF

Posted by Sam Mohamed <sa...@voltage.com>.
Finally got it to work like this:

select * from cc_details ssn_tbl
right outer join ( select encrypt('850-37-8230','ssn') as ssn
from cc_details limit 1) ssn_tmp
on (ssn_tbl.ssn = ssn_tmp.ssn);

Sam
On Oct 17, 2012, at 7:21 PM, Michael Segel <mi...@hotmail.com>> wrote:

You really don't want to do that.
It becomes a nightmare in that you now ship a derivative of Hive and then have to maintain it and keep it in lock step w Hive from Apache.


There are other options and designs but since this is for a commercial product. I'm not going to talk about them.

Keep in mind that Hive isn't a relational database per se and works on immutable flat files. So that's going to hurt you as well.

On Oct 17, 2012, at 9:13 PM, lohit <lo...@gmail.com>> wrote:

One idea is to write your own translation layer  which sits in between query and actual job submission.
You would most likely end up having your own version of hive jar which has your translation changes on top of HIVE sources.
This has the added advantage that users need not change their queries, they would do it as normal HIVE query, like
   select * from cc_details where first_name = 'Ann'
Disadvantage is you have to maintain a fork.

Even otherwise, my initial guess is you might have to modify command line parser which does encrypt once instead of for every record

2012/10/17 Sam Mohamed <sa...@voltage.com>>
Thanks for the quick response.

The idea is that we are selling the encryption product for customers who use HDFS.  Hence, encryption is a requirement.

Any other suggestions.

Sam
________________________________________
From: Michael Segel [michael_segel@hotmail.com<ma...@hotmail.com>]
Sent: Wednesday, October 17, 2012 6:10 PM
To: user@hadoop.apache.org<ma...@hadoop.apache.org>
Subject: Re: Hive Query with UDF

You don't need an UDF...

You encrypt the string 'Ann' first then use that encrypted value in the Select statement.

That should make things a bit simpler.



On Oct 17, 2012, at 8:04 PM, Sam Mohamed <sa...@voltage.com>> wrote:

> I have some encrypted data in an HDFS csv, that I've created a Hive table for, and I want to run a Hive query that first encrypts the query param, then does the lookup.  I have a UDF that does encryption as follows:
>
> public class ParamEncrypt extends UDF {
>
>  public Text evaluate(String name) throws Exception {
>
>      String result = new String();
>
>      if (name == null) { return null; }
>
>      result = ParamData.encrypt(name);
>
>      return new Text(result);
>  }
> }
>
> Then I run the Hive query as:
>
>  select * from cc_details where first_name = encrypt('Ann');
>
> The problem is, it's running encrypt('Ann') across every single record in the table.  I want it do the encryption once, then do the matchup.  I've tried:
>
>  select * from cc_details where first_name in (select encrypt('Ann') from cc_details limit 1);
>
> But Hive doesn't support **IN** or select queries in the where clause.
>
> What can I do?
>
> Can I do something like:
>
>  select encrypt('Ann') as ann from cc_details where first_name = ann;
>
> That also doesn't work because the query parser throws an error saying **ann** is not a known column
>
> Thanks,
>
> Sam




--
Have a Nice Day!
Lohit



Re: Hive Query with UDF

Posted by Sam Mohamed <sa...@voltage.com>.
Finally got it to work like this:

select * from cc_details ssn_tbl
right outer join ( select encrypt('850-37-8230','ssn') as ssn
from cc_details limit 1) ssn_tmp
on (ssn_tbl.ssn = ssn_tmp.ssn);

Sam
On Oct 17, 2012, at 7:21 PM, Michael Segel <mi...@hotmail.com>> wrote:

You really don't want to do that.
It becomes a nightmare in that you now ship a derivative of Hive and then have to maintain it and keep it in lock step w Hive from Apache.


There are other options and designs but since this is for a commercial product. I'm not going to talk about them.

Keep in mind that Hive isn't a relational database per se and works on immutable flat files. So that's going to hurt you as well.

On Oct 17, 2012, at 9:13 PM, lohit <lo...@gmail.com>> wrote:

One idea is to write your own translation layer  which sits in between query and actual job submission.
You would most likely end up having your own version of hive jar which has your translation changes on top of HIVE sources.
This has the added advantage that users need not change their queries, they would do it as normal HIVE query, like
   select * from cc_details where first_name = 'Ann'
Disadvantage is you have to maintain a fork.

Even otherwise, my initial guess is you might have to modify command line parser which does encrypt once instead of for every record

2012/10/17 Sam Mohamed <sa...@voltage.com>>
Thanks for the quick response.

The idea is that we are selling the encryption product for customers who use HDFS.  Hence, encryption is a requirement.

Any other suggestions.

Sam
________________________________________
From: Michael Segel [michael_segel@hotmail.com<ma...@hotmail.com>]
Sent: Wednesday, October 17, 2012 6:10 PM
To: user@hadoop.apache.org<ma...@hadoop.apache.org>
Subject: Re: Hive Query with UDF

You don't need an UDF...

You encrypt the string 'Ann' first then use that encrypted value in the Select statement.

That should make things a bit simpler.



On Oct 17, 2012, at 8:04 PM, Sam Mohamed <sa...@voltage.com>> wrote:

> I have some encrypted data in an HDFS csv, that I've created a Hive table for, and I want to run a Hive query that first encrypts the query param, then does the lookup.  I have a UDF that does encryption as follows:
>
> public class ParamEncrypt extends UDF {
>
>  public Text evaluate(String name) throws Exception {
>
>      String result = new String();
>
>      if (name == null) { return null; }
>
>      result = ParamData.encrypt(name);
>
>      return new Text(result);
>  }
> }
>
> Then I run the Hive query as:
>
>  select * from cc_details where first_name = encrypt('Ann');
>
> The problem is, it's running encrypt('Ann') across every single record in the table.  I want it do the encryption once, then do the matchup.  I've tried:
>
>  select * from cc_details where first_name in (select encrypt('Ann') from cc_details limit 1);
>
> But Hive doesn't support **IN** or select queries in the where clause.
>
> What can I do?
>
> Can I do something like:
>
>  select encrypt('Ann') as ann from cc_details where first_name = ann;
>
> That also doesn't work because the query parser throws an error saying **ann** is not a known column
>
> Thanks,
>
> Sam




--
Have a Nice Day!
Lohit



Re: Hive Query with UDF

Posted by Sam Mohamed <sa...@voltage.com>.
Finally got it to work like this:

select * from cc_details ssn_tbl
right outer join ( select encrypt('850-37-8230','ssn') as ssn
from cc_details limit 1) ssn_tmp
on (ssn_tbl.ssn = ssn_tmp.ssn);

Sam
On Oct 17, 2012, at 7:21 PM, Michael Segel <mi...@hotmail.com>> wrote:

You really don't want to do that.
It becomes a nightmare in that you now ship a derivative of Hive and then have to maintain it and keep it in lock step w Hive from Apache.


There are other options and designs but since this is for a commercial product. I'm not going to talk about them.

Keep in mind that Hive isn't a relational database per se and works on immutable flat files. So that's going to hurt you as well.

On Oct 17, 2012, at 9:13 PM, lohit <lo...@gmail.com>> wrote:

One idea is to write your own translation layer  which sits in between query and actual job submission.
You would most likely end up having your own version of hive jar which has your translation changes on top of HIVE sources.
This has the added advantage that users need not change their queries, they would do it as normal HIVE query, like
   select * from cc_details where first_name = 'Ann'
Disadvantage is you have to maintain a fork.

Even otherwise, my initial guess is you might have to modify command line parser which does encrypt once instead of for every record

2012/10/17 Sam Mohamed <sa...@voltage.com>>
Thanks for the quick response.

The idea is that we are selling the encryption product for customers who use HDFS.  Hence, encryption is a requirement.

Any other suggestions.

Sam
________________________________________
From: Michael Segel [michael_segel@hotmail.com<ma...@hotmail.com>]
Sent: Wednesday, October 17, 2012 6:10 PM
To: user@hadoop.apache.org<ma...@hadoop.apache.org>
Subject: Re: Hive Query with UDF

You don't need an UDF...

You encrypt the string 'Ann' first then use that encrypted value in the Select statement.

That should make things a bit simpler.



On Oct 17, 2012, at 8:04 PM, Sam Mohamed <sa...@voltage.com>> wrote:

> I have some encrypted data in an HDFS csv, that I've created a Hive table for, and I want to run a Hive query that first encrypts the query param, then does the lookup.  I have a UDF that does encryption as follows:
>
> public class ParamEncrypt extends UDF {
>
>  public Text evaluate(String name) throws Exception {
>
>      String result = new String();
>
>      if (name == null) { return null; }
>
>      result = ParamData.encrypt(name);
>
>      return new Text(result);
>  }
> }
>
> Then I run the Hive query as:
>
>  select * from cc_details where first_name = encrypt('Ann');
>
> The problem is, it's running encrypt('Ann') across every single record in the table.  I want it do the encryption once, then do the matchup.  I've tried:
>
>  select * from cc_details where first_name in (select encrypt('Ann') from cc_details limit 1);
>
> But Hive doesn't support **IN** or select queries in the where clause.
>
> What can I do?
>
> Can I do something like:
>
>  select encrypt('Ann') as ann from cc_details where first_name = ann;
>
> That also doesn't work because the query parser throws an error saying **ann** is not a known column
>
> Thanks,
>
> Sam




--
Have a Nice Day!
Lohit



Re: Hive Query with UDF

Posted by Michael Segel <mi...@hotmail.com>.
You really don't want to do that.
It becomes a nightmare in that you now ship a derivative of Hive and then have to maintain it and keep it in lock step w Hive from Apache. 


There are other options and designs but since this is for a commercial product. I'm not going to talk about them. 

Keep in mind that Hive isn't a relational database per se and works on immutable flat files. So that's going to hurt you as well. 

On Oct 17, 2012, at 9:13 PM, lohit <lo...@gmail.com> wrote:

> One idea is to write your own translation layer  which sits in between query and actual job submission. 
> You would most likely end up having your own version of hive jar which has your translation changes on top of HIVE sources.
> This has the added advantage that users need not change their queries, they would do it as normal HIVE query, like
>    select * from cc_details where first_name = 'Ann'
> Disadvantage is you have to maintain a fork. 
> 
> Even otherwise, my initial guess is you might have to modify command line parser which does encrypt once instead of for every record
> 
> 2012/10/17 Sam Mohamed <sa...@voltage.com>
> Thanks for the quick response.
> 
> The idea is that we are selling the encryption product for customers who use HDFS.  Hence, encryption is a requirement.
> 
> Any other suggestions.
> 
> Sam
> ________________________________________
> From: Michael Segel [michael_segel@hotmail.com]
> Sent: Wednesday, October 17, 2012 6:10 PM
> To: user@hadoop.apache.org
> Subject: Re: Hive Query with UDF
> 
> You don't need an UDF...
> 
> You encrypt the string 'Ann' first then use that encrypted value in the Select statement.
> 
> That should make things a bit simpler.
> 
> 
> 
> On Oct 17, 2012, at 8:04 PM, Sam Mohamed <sa...@voltage.com> wrote:
> 
> > I have some encrypted data in an HDFS csv, that I've created a Hive table for, and I want to run a Hive query that first encrypts the query param, then does the lookup.  I have a UDF that does encryption as follows:
> >
> > public class ParamEncrypt extends UDF {
> >
> >  public Text evaluate(String name) throws Exception {
> >
> >      String result = new String();
> >
> >      if (name == null) { return null; }
> >
> >      result = ParamData.encrypt(name);
> >
> >      return new Text(result);
> >  }
> > }
> >
> > Then I run the Hive query as:
> >
> >  select * from cc_details where first_name = encrypt('Ann');
> >
> > The problem is, it's running encrypt('Ann') across every single record in the table.  I want it do the encryption once, then do the matchup.  I've tried:
> >
> >  select * from cc_details where first_name in (select encrypt('Ann') from cc_details limit 1);
> >
> > But Hive doesn't support **IN** or select queries in the where clause.
> >
> > What can I do?
> >
> > Can I do something like:
> >
> >  select encrypt('Ann') as ann from cc_details where first_name = ann;
> >
> > That also doesn't work because the query parser throws an error saying **ann** is not a known column
> >
> > Thanks,
> >
> > Sam
> 
> 
> 
> 
> -- 
> Have a Nice Day!
> Lohit


Re: Hive Query with UDF

Posted by Michael Segel <mi...@hotmail.com>.
You really don't want to do that.
It becomes a nightmare in that you now ship a derivative of Hive and then have to maintain it and keep it in lock step w Hive from Apache. 


There are other options and designs but since this is for a commercial product. I'm not going to talk about them. 

Keep in mind that Hive isn't a relational database per se and works on immutable flat files. So that's going to hurt you as well. 

On Oct 17, 2012, at 9:13 PM, lohit <lo...@gmail.com> wrote:

> One idea is to write your own translation layer  which sits in between query and actual job submission. 
> You would most likely end up having your own version of hive jar which has your translation changes on top of HIVE sources.
> This has the added advantage that users need not change their queries, they would do it as normal HIVE query, like
>    select * from cc_details where first_name = 'Ann'
> Disadvantage is you have to maintain a fork. 
> 
> Even otherwise, my initial guess is you might have to modify command line parser which does encrypt once instead of for every record
> 
> 2012/10/17 Sam Mohamed <sa...@voltage.com>
> Thanks for the quick response.
> 
> The idea is that we are selling the encryption product for customers who use HDFS.  Hence, encryption is a requirement.
> 
> Any other suggestions.
> 
> Sam
> ________________________________________
> From: Michael Segel [michael_segel@hotmail.com]
> Sent: Wednesday, October 17, 2012 6:10 PM
> To: user@hadoop.apache.org
> Subject: Re: Hive Query with UDF
> 
> You don't need an UDF...
> 
> You encrypt the string 'Ann' first then use that encrypted value in the Select statement.
> 
> That should make things a bit simpler.
> 
> 
> 
> On Oct 17, 2012, at 8:04 PM, Sam Mohamed <sa...@voltage.com> wrote:
> 
> > I have some encrypted data in an HDFS csv, that I've created a Hive table for, and I want to run a Hive query that first encrypts the query param, then does the lookup.  I have a UDF that does encryption as follows:
> >
> > public class ParamEncrypt extends UDF {
> >
> >  public Text evaluate(String name) throws Exception {
> >
> >      String result = new String();
> >
> >      if (name == null) { return null; }
> >
> >      result = ParamData.encrypt(name);
> >
> >      return new Text(result);
> >  }
> > }
> >
> > Then I run the Hive query as:
> >
> >  select * from cc_details where first_name = encrypt('Ann');
> >
> > The problem is, it's running encrypt('Ann') across every single record in the table.  I want it do the encryption once, then do the matchup.  I've tried:
> >
> >  select * from cc_details where first_name in (select encrypt('Ann') from cc_details limit 1);
> >
> > But Hive doesn't support **IN** or select queries in the where clause.
> >
> > What can I do?
> >
> > Can I do something like:
> >
> >  select encrypt('Ann') as ann from cc_details where first_name = ann;
> >
> > That also doesn't work because the query parser throws an error saying **ann** is not a known column
> >
> > Thanks,
> >
> > Sam
> 
> 
> 
> 
> -- 
> Have a Nice Day!
> Lohit


Re: Hive Query with UDF

Posted by Michael Segel <mi...@hotmail.com>.
You really don't want to do that.
It becomes a nightmare in that you now ship a derivative of Hive and then have to maintain it and keep it in lock step w Hive from Apache. 


There are other options and designs but since this is for a commercial product. I'm not going to talk about them. 

Keep in mind that Hive isn't a relational database per se and works on immutable flat files. So that's going to hurt you as well. 

On Oct 17, 2012, at 9:13 PM, lohit <lo...@gmail.com> wrote:

> One idea is to write your own translation layer  which sits in between query and actual job submission. 
> You would most likely end up having your own version of hive jar which has your translation changes on top of HIVE sources.
> This has the added advantage that users need not change their queries, they would do it as normal HIVE query, like
>    select * from cc_details where first_name = 'Ann'
> Disadvantage is you have to maintain a fork. 
> 
> Even otherwise, my initial guess is you might have to modify command line parser which does encrypt once instead of for every record
> 
> 2012/10/17 Sam Mohamed <sa...@voltage.com>
> Thanks for the quick response.
> 
> The idea is that we are selling the encryption product for customers who use HDFS.  Hence, encryption is a requirement.
> 
> Any other suggestions.
> 
> Sam
> ________________________________________
> From: Michael Segel [michael_segel@hotmail.com]
> Sent: Wednesday, October 17, 2012 6:10 PM
> To: user@hadoop.apache.org
> Subject: Re: Hive Query with UDF
> 
> You don't need an UDF...
> 
> You encrypt the string 'Ann' first then use that encrypted value in the Select statement.
> 
> That should make things a bit simpler.
> 
> 
> 
> On Oct 17, 2012, at 8:04 PM, Sam Mohamed <sa...@voltage.com> wrote:
> 
> > I have some encrypted data in an HDFS csv, that I've created a Hive table for, and I want to run a Hive query that first encrypts the query param, then does the lookup.  I have a UDF that does encryption as follows:
> >
> > public class ParamEncrypt extends UDF {
> >
> >  public Text evaluate(String name) throws Exception {
> >
> >      String result = new String();
> >
> >      if (name == null) { return null; }
> >
> >      result = ParamData.encrypt(name);
> >
> >      return new Text(result);
> >  }
> > }
> >
> > Then I run the Hive query as:
> >
> >  select * from cc_details where first_name = encrypt('Ann');
> >
> > The problem is, it's running encrypt('Ann') across every single record in the table.  I want it do the encryption once, then do the matchup.  I've tried:
> >
> >  select * from cc_details where first_name in (select encrypt('Ann') from cc_details limit 1);
> >
> > But Hive doesn't support **IN** or select queries in the where clause.
> >
> > What can I do?
> >
> > Can I do something like:
> >
> >  select encrypt('Ann') as ann from cc_details where first_name = ann;
> >
> > That also doesn't work because the query parser throws an error saying **ann** is not a known column
> >
> > Thanks,
> >
> > Sam
> 
> 
> 
> 
> -- 
> Have a Nice Day!
> Lohit


Re: Hive Query with UDF

Posted by Michael Segel <mi...@hotmail.com>.
You really don't want to do that.
It becomes a nightmare in that you now ship a derivative of Hive and then have to maintain it and keep it in lock step w Hive from Apache. 


There are other options and designs but since this is for a commercial product. I'm not going to talk about them. 

Keep in mind that Hive isn't a relational database per se and works on immutable flat files. So that's going to hurt you as well. 

On Oct 17, 2012, at 9:13 PM, lohit <lo...@gmail.com> wrote:

> One idea is to write your own translation layer  which sits in between query and actual job submission. 
> You would most likely end up having your own version of hive jar which has your translation changes on top of HIVE sources.
> This has the added advantage that users need not change their queries, they would do it as normal HIVE query, like
>    select * from cc_details where first_name = 'Ann'
> Disadvantage is you have to maintain a fork. 
> 
> Even otherwise, my initial guess is you might have to modify command line parser which does encrypt once instead of for every record
> 
> 2012/10/17 Sam Mohamed <sa...@voltage.com>
> Thanks for the quick response.
> 
> The idea is that we are selling the encryption product for customers who use HDFS.  Hence, encryption is a requirement.
> 
> Any other suggestions.
> 
> Sam
> ________________________________________
> From: Michael Segel [michael_segel@hotmail.com]
> Sent: Wednesday, October 17, 2012 6:10 PM
> To: user@hadoop.apache.org
> Subject: Re: Hive Query with UDF
> 
> You don't need an UDF...
> 
> You encrypt the string 'Ann' first then use that encrypted value in the Select statement.
> 
> That should make things a bit simpler.
> 
> 
> 
> On Oct 17, 2012, at 8:04 PM, Sam Mohamed <sa...@voltage.com> wrote:
> 
> > I have some encrypted data in an HDFS csv, that I've created a Hive table for, and I want to run a Hive query that first encrypts the query param, then does the lookup.  I have a UDF that does encryption as follows:
> >
> > public class ParamEncrypt extends UDF {
> >
> >  public Text evaluate(String name) throws Exception {
> >
> >      String result = new String();
> >
> >      if (name == null) { return null; }
> >
> >      result = ParamData.encrypt(name);
> >
> >      return new Text(result);
> >  }
> > }
> >
> > Then I run the Hive query as:
> >
> >  select * from cc_details where first_name = encrypt('Ann');
> >
> > The problem is, it's running encrypt('Ann') across every single record in the table.  I want it do the encryption once, then do the matchup.  I've tried:
> >
> >  select * from cc_details where first_name in (select encrypt('Ann') from cc_details limit 1);
> >
> > But Hive doesn't support **IN** or select queries in the where clause.
> >
> > What can I do?
> >
> > Can I do something like:
> >
> >  select encrypt('Ann') as ann from cc_details where first_name = ann;
> >
> > That also doesn't work because the query parser throws an error saying **ann** is not a known column
> >
> > Thanks,
> >
> > Sam
> 
> 
> 
> 
> -- 
> Have a Nice Day!
> Lohit


Re: Hive Query with UDF

Posted by lohit <lo...@gmail.com>.
One idea is to write your own translation layer  which sits in between
query and actual job submission.
You would most likely end up having your own version of hive jar which has
your translation changes on top of HIVE sources.
This has the added advantage that users need not change their queries, they
would do it as normal HIVE query, like
   select * from cc_details where first_name = 'Ann'
Disadvantage is you have to maintain a fork.

Even otherwise, my initial guess is you might have to modify command line
parser which does encrypt once instead of for every record

2012/10/17 Sam Mohamed <sa...@voltage.com>

> Thanks for the quick response.
>
> The idea is that we are selling the encryption product for customers who
> use HDFS.  Hence, encryption is a requirement.
>
> Any other suggestions.
>
> Sam
> ________________________________________
> From: Michael Segel [michael_segel@hotmail.com]
> Sent: Wednesday, October 17, 2012 6:10 PM
> To: user@hadoop.apache.org
> Subject: Re: Hive Query with UDF
>
> You don't need an UDF...
>
> You encrypt the string 'Ann' first then use that encrypted value in the
> Select statement.
>
> That should make things a bit simpler.
>
>
>
> On Oct 17, 2012, at 8:04 PM, Sam Mohamed <sa...@voltage.com> wrote:
>
> > I have some encrypted data in an HDFS csv, that I've created a Hive
> table for, and I want to run a Hive query that first encrypts the query
> param, then does the lookup.  I have a UDF that does encryption as follows:
> >
> > public class ParamEncrypt extends UDF {
> >
> >  public Text evaluate(String name) throws Exception {
> >
> >      String result = new String();
> >
> >      if (name == null) { return null; }
> >
> >      result = ParamData.encrypt(name);
> >
> >      return new Text(result);
> >  }
> > }
> >
> > Then I run the Hive query as:
> >
> >  select * from cc_details where first_name = encrypt('Ann');
> >
> > The problem is, it's running encrypt('Ann') across every single record
> in the table.  I want it do the encryption once, then do the matchup.  I've
> tried:
> >
> >  select * from cc_details where first_name in (select encrypt('Ann')
> from cc_details limit 1);
> >
> > But Hive doesn't support **IN** or select queries in the where clause.
> >
> > What can I do?
> >
> > Can I do something like:
> >
> >  select encrypt('Ann') as ann from cc_details where first_name = ann;
> >
> > That also doesn't work because the query parser throws an error saying
> **ann** is not a known column
> >
> > Thanks,
> >
> > Sam
>
>


-- 
Have a Nice Day!
Lohit

Re: Hive Query with UDF

Posted by lohit <lo...@gmail.com>.
One idea is to write your own translation layer  which sits in between
query and actual job submission.
You would most likely end up having your own version of hive jar which has
your translation changes on top of HIVE sources.
This has the added advantage that users need not change their queries, they
would do it as normal HIVE query, like
   select * from cc_details where first_name = 'Ann'
Disadvantage is you have to maintain a fork.

Even otherwise, my initial guess is you might have to modify command line
parser which does encrypt once instead of for every record

2012/10/17 Sam Mohamed <sa...@voltage.com>

> Thanks for the quick response.
>
> The idea is that we are selling the encryption product for customers who
> use HDFS.  Hence, encryption is a requirement.
>
> Any other suggestions.
>
> Sam
> ________________________________________
> From: Michael Segel [michael_segel@hotmail.com]
> Sent: Wednesday, October 17, 2012 6:10 PM
> To: user@hadoop.apache.org
> Subject: Re: Hive Query with UDF
>
> You don't need an UDF...
>
> You encrypt the string 'Ann' first then use that encrypted value in the
> Select statement.
>
> That should make things a bit simpler.
>
>
>
> On Oct 17, 2012, at 8:04 PM, Sam Mohamed <sa...@voltage.com> wrote:
>
> > I have some encrypted data in an HDFS csv, that I've created a Hive
> table for, and I want to run a Hive query that first encrypts the query
> param, then does the lookup.  I have a UDF that does encryption as follows:
> >
> > public class ParamEncrypt extends UDF {
> >
> >  public Text evaluate(String name) throws Exception {
> >
> >      String result = new String();
> >
> >      if (name == null) { return null; }
> >
> >      result = ParamData.encrypt(name);
> >
> >      return new Text(result);
> >  }
> > }
> >
> > Then I run the Hive query as:
> >
> >  select * from cc_details where first_name = encrypt('Ann');
> >
> > The problem is, it's running encrypt('Ann') across every single record
> in the table.  I want it do the encryption once, then do the matchup.  I've
> tried:
> >
> >  select * from cc_details where first_name in (select encrypt('Ann')
> from cc_details limit 1);
> >
> > But Hive doesn't support **IN** or select queries in the where clause.
> >
> > What can I do?
> >
> > Can I do something like:
> >
> >  select encrypt('Ann') as ann from cc_details where first_name = ann;
> >
> > That also doesn't work because the query parser throws an error saying
> **ann** is not a known column
> >
> > Thanks,
> >
> > Sam
>
>


-- 
Have a Nice Day!
Lohit

Re: Hive Query with UDF

Posted by lohit <lo...@gmail.com>.
One idea is to write your own translation layer  which sits in between
query and actual job submission.
You would most likely end up having your own version of hive jar which has
your translation changes on top of HIVE sources.
This has the added advantage that users need not change their queries, they
would do it as normal HIVE query, like
   select * from cc_details where first_name = 'Ann'
Disadvantage is you have to maintain a fork.

Even otherwise, my initial guess is you might have to modify command line
parser which does encrypt once instead of for every record

2012/10/17 Sam Mohamed <sa...@voltage.com>

> Thanks for the quick response.
>
> The idea is that we are selling the encryption product for customers who
> use HDFS.  Hence, encryption is a requirement.
>
> Any other suggestions.
>
> Sam
> ________________________________________
> From: Michael Segel [michael_segel@hotmail.com]
> Sent: Wednesday, October 17, 2012 6:10 PM
> To: user@hadoop.apache.org
> Subject: Re: Hive Query with UDF
>
> You don't need an UDF...
>
> You encrypt the string 'Ann' first then use that encrypted value in the
> Select statement.
>
> That should make things a bit simpler.
>
>
>
> On Oct 17, 2012, at 8:04 PM, Sam Mohamed <sa...@voltage.com> wrote:
>
> > I have some encrypted data in an HDFS csv, that I've created a Hive
> table for, and I want to run a Hive query that first encrypts the query
> param, then does the lookup.  I have a UDF that does encryption as follows:
> >
> > public class ParamEncrypt extends UDF {
> >
> >  public Text evaluate(String name) throws Exception {
> >
> >      String result = new String();
> >
> >      if (name == null) { return null; }
> >
> >      result = ParamData.encrypt(name);
> >
> >      return new Text(result);
> >  }
> > }
> >
> > Then I run the Hive query as:
> >
> >  select * from cc_details where first_name = encrypt('Ann');
> >
> > The problem is, it's running encrypt('Ann') across every single record
> in the table.  I want it do the encryption once, then do the matchup.  I've
> tried:
> >
> >  select * from cc_details where first_name in (select encrypt('Ann')
> from cc_details limit 1);
> >
> > But Hive doesn't support **IN** or select queries in the where clause.
> >
> > What can I do?
> >
> > Can I do something like:
> >
> >  select encrypt('Ann') as ann from cc_details where first_name = ann;
> >
> > That also doesn't work because the query parser throws an error saying
> **ann** is not a known column
> >
> > Thanks,
> >
> > Sam
>
>


-- 
Have a Nice Day!
Lohit

Re: Hive Query with UDF

Posted by lohit <lo...@gmail.com>.
One idea is to write your own translation layer  which sits in between
query and actual job submission.
You would most likely end up having your own version of hive jar which has
your translation changes on top of HIVE sources.
This has the added advantage that users need not change their queries, they
would do it as normal HIVE query, like
   select * from cc_details where first_name = 'Ann'
Disadvantage is you have to maintain a fork.

Even otherwise, my initial guess is you might have to modify command line
parser which does encrypt once instead of for every record

2012/10/17 Sam Mohamed <sa...@voltage.com>

> Thanks for the quick response.
>
> The idea is that we are selling the encryption product for customers who
> use HDFS.  Hence, encryption is a requirement.
>
> Any other suggestions.
>
> Sam
> ________________________________________
> From: Michael Segel [michael_segel@hotmail.com]
> Sent: Wednesday, October 17, 2012 6:10 PM
> To: user@hadoop.apache.org
> Subject: Re: Hive Query with UDF
>
> You don't need an UDF...
>
> You encrypt the string 'Ann' first then use that encrypted value in the
> Select statement.
>
> That should make things a bit simpler.
>
>
>
> On Oct 17, 2012, at 8:04 PM, Sam Mohamed <sa...@voltage.com> wrote:
>
> > I have some encrypted data in an HDFS csv, that I've created a Hive
> table for, and I want to run a Hive query that first encrypts the query
> param, then does the lookup.  I have a UDF that does encryption as follows:
> >
> > public class ParamEncrypt extends UDF {
> >
> >  public Text evaluate(String name) throws Exception {
> >
> >      String result = new String();
> >
> >      if (name == null) { return null; }
> >
> >      result = ParamData.encrypt(name);
> >
> >      return new Text(result);
> >  }
> > }
> >
> > Then I run the Hive query as:
> >
> >  select * from cc_details where first_name = encrypt('Ann');
> >
> > The problem is, it's running encrypt('Ann') across every single record
> in the table.  I want it do the encryption once, then do the matchup.  I've
> tried:
> >
> >  select * from cc_details where first_name in (select encrypt('Ann')
> from cc_details limit 1);
> >
> > But Hive doesn't support **IN** or select queries in the where clause.
> >
> > What can I do?
> >
> > Can I do something like:
> >
> >  select encrypt('Ann') as ann from cc_details where first_name = ann;
> >
> > That also doesn't work because the query parser throws an error saying
> **ann** is not a known column
> >
> > Thanks,
> >
> > Sam
>
>


-- 
Have a Nice Day!
Lohit

RE: Hive Query with UDF

Posted by Sam Mohamed <sa...@voltage.com>.
Thanks for the quick response.

The idea is that we are selling the encryption product for customers who use HDFS.  Hence, encryption is a requirement.

Any other suggestions.

Sam
________________________________________
From: Michael Segel [michael_segel@hotmail.com]
Sent: Wednesday, October 17, 2012 6:10 PM
To: user@hadoop.apache.org
Subject: Re: Hive Query with UDF

You don't need an UDF...

You encrypt the string 'Ann' first then use that encrypted value in the Select statement.

That should make things a bit simpler.



On Oct 17, 2012, at 8:04 PM, Sam Mohamed <sa...@voltage.com> wrote:

> I have some encrypted data in an HDFS csv, that I've created a Hive table for, and I want to run a Hive query that first encrypts the query param, then does the lookup.  I have a UDF that does encryption as follows:
>
> public class ParamEncrypt extends UDF {
>
>  public Text evaluate(String name) throws Exception {
>
>      String result = new String();
>
>      if (name == null) { return null; }
>
>      result = ParamData.encrypt(name);
>
>      return new Text(result);
>  }
> }
>
> Then I run the Hive query as:
>
>  select * from cc_details where first_name = encrypt('Ann');
>
> The problem is, it's running encrypt('Ann') across every single record in the table.  I want it do the encryption once, then do the matchup.  I've tried:
>
>  select * from cc_details where first_name in (select encrypt('Ann') from cc_details limit 1);
>
> But Hive doesn't support **IN** or select queries in the where clause.
>
> What can I do?
>
> Can I do something like:
>
>  select encrypt('Ann') as ann from cc_details where first_name = ann;
>
> That also doesn't work because the query parser throws an error saying **ann** is not a known column
>
> Thanks,
>
> Sam


RE: Hive Query with UDF

Posted by Sam Mohamed <sa...@voltage.com>.
Thanks for the quick response.

The idea is that we are selling the encryption product for customers who use HDFS.  Hence, encryption is a requirement.

Any other suggestions.

Sam
________________________________________
From: Michael Segel [michael_segel@hotmail.com]
Sent: Wednesday, October 17, 2012 6:10 PM
To: user@hadoop.apache.org
Subject: Re: Hive Query with UDF

You don't need an UDF...

You encrypt the string 'Ann' first then use that encrypted value in the Select statement.

That should make things a bit simpler.



On Oct 17, 2012, at 8:04 PM, Sam Mohamed <sa...@voltage.com> wrote:

> I have some encrypted data in an HDFS csv, that I've created a Hive table for, and I want to run a Hive query that first encrypts the query param, then does the lookup.  I have a UDF that does encryption as follows:
>
> public class ParamEncrypt extends UDF {
>
>  public Text evaluate(String name) throws Exception {
>
>      String result = new String();
>
>      if (name == null) { return null; }
>
>      result = ParamData.encrypt(name);
>
>      return new Text(result);
>  }
> }
>
> Then I run the Hive query as:
>
>  select * from cc_details where first_name = encrypt('Ann');
>
> The problem is, it's running encrypt('Ann') across every single record in the table.  I want it do the encryption once, then do the matchup.  I've tried:
>
>  select * from cc_details where first_name in (select encrypt('Ann') from cc_details limit 1);
>
> But Hive doesn't support **IN** or select queries in the where clause.
>
> What can I do?
>
> Can I do something like:
>
>  select encrypt('Ann') as ann from cc_details where first_name = ann;
>
> That also doesn't work because the query parser throws an error saying **ann** is not a known column
>
> Thanks,
>
> Sam


RE: Hive Query with UDF

Posted by Sam Mohamed <sa...@voltage.com>.
Thanks for the quick response.

The idea is that we are selling the encryption product for customers who use HDFS.  Hence, encryption is a requirement.

Any other suggestions.

Sam
________________________________________
From: Michael Segel [michael_segel@hotmail.com]
Sent: Wednesday, October 17, 2012 6:10 PM
To: user@hadoop.apache.org
Subject: Re: Hive Query with UDF

You don't need an UDF...

You encrypt the string 'Ann' first then use that encrypted value in the Select statement.

That should make things a bit simpler.



On Oct 17, 2012, at 8:04 PM, Sam Mohamed <sa...@voltage.com> wrote:

> I have some encrypted data in an HDFS csv, that I've created a Hive table for, and I want to run a Hive query that first encrypts the query param, then does the lookup.  I have a UDF that does encryption as follows:
>
> public class ParamEncrypt extends UDF {
>
>  public Text evaluate(String name) throws Exception {
>
>      String result = new String();
>
>      if (name == null) { return null; }
>
>      result = ParamData.encrypt(name);
>
>      return new Text(result);
>  }
> }
>
> Then I run the Hive query as:
>
>  select * from cc_details where first_name = encrypt('Ann');
>
> The problem is, it's running encrypt('Ann') across every single record in the table.  I want it do the encryption once, then do the matchup.  I've tried:
>
>  select * from cc_details where first_name in (select encrypt('Ann') from cc_details limit 1);
>
> But Hive doesn't support **IN** or select queries in the where clause.
>
> What can I do?
>
> Can I do something like:
>
>  select encrypt('Ann') as ann from cc_details where first_name = ann;
>
> That also doesn't work because the query parser throws an error saying **ann** is not a known column
>
> Thanks,
>
> Sam


RE: Hive Query with UDF

Posted by Sam Mohamed <sa...@voltage.com>.
Thanks for the quick response.

The idea is that we are selling the encryption product for customers who use HDFS.  Hence, encryption is a requirement.

Any other suggestions.

Sam
________________________________________
From: Michael Segel [michael_segel@hotmail.com]
Sent: Wednesday, October 17, 2012 6:10 PM
To: user@hadoop.apache.org
Subject: Re: Hive Query with UDF

You don't need an UDF...

You encrypt the string 'Ann' first then use that encrypted value in the Select statement.

That should make things a bit simpler.



On Oct 17, 2012, at 8:04 PM, Sam Mohamed <sa...@voltage.com> wrote:

> I have some encrypted data in an HDFS csv, that I've created a Hive table for, and I want to run a Hive query that first encrypts the query param, then does the lookup.  I have a UDF that does encryption as follows:
>
> public class ParamEncrypt extends UDF {
>
>  public Text evaluate(String name) throws Exception {
>
>      String result = new String();
>
>      if (name == null) { return null; }
>
>      result = ParamData.encrypt(name);
>
>      return new Text(result);
>  }
> }
>
> Then I run the Hive query as:
>
>  select * from cc_details where first_name = encrypt('Ann');
>
> The problem is, it's running encrypt('Ann') across every single record in the table.  I want it do the encryption once, then do the matchup.  I've tried:
>
>  select * from cc_details where first_name in (select encrypt('Ann') from cc_details limit 1);
>
> But Hive doesn't support **IN** or select queries in the where clause.
>
> What can I do?
>
> Can I do something like:
>
>  select encrypt('Ann') as ann from cc_details where first_name = ann;
>
> That also doesn't work because the query parser throws an error saying **ann** is not a known column
>
> Thanks,
>
> Sam


Re: Hive Query with UDF

Posted by Michael Segel <mi...@hotmail.com>.
You don't need an UDF... 

You encrypt the string 'Ann' first then use that encrypted value in the Select statement. 

That should make things a bit simpler. 



On Oct 17, 2012, at 8:04 PM, Sam Mohamed <sa...@voltage.com> wrote:

> I have some encrypted data in an HDFS csv, that I've created a Hive table for, and I want to run a Hive query that first encrypts the query param, then does the lookup.  I have a UDF that does encryption as follows:
> 
> public class ParamEncrypt extends UDF {
> 
>  public Text evaluate(String name) throws Exception {
> 
>      String result = new String();
> 
>      if (name == null) { return null; }
> 
>      result = ParamData.encrypt(name);
> 
>      return new Text(result);
>  }
> }
> 
> Then I run the Hive query as:
> 
>  select * from cc_details where first_name = encrypt('Ann');
> 
> The problem is, it's running encrypt('Ann') across every single record in the table.  I want it do the encryption once, then do the matchup.  I've tried:
> 
>  select * from cc_details where first_name in (select encrypt('Ann') from cc_details limit 1);
> 
> But Hive doesn't support **IN** or select queries in the where clause.
> 
> What can I do?
> 
> Can I do something like:
> 
>  select encrypt('Ann') as ann from cc_details where first_name = ann;
> 
> That also doesn't work because the query parser throws an error saying **ann** is not a known column
> 
> Thanks,
> 
> Sam


Re: Hive Query with UDF

Posted by Michael Segel <mi...@hotmail.com>.
You don't need an UDF... 

You encrypt the string 'Ann' first then use that encrypted value in the Select statement. 

That should make things a bit simpler. 



On Oct 17, 2012, at 8:04 PM, Sam Mohamed <sa...@voltage.com> wrote:

> I have some encrypted data in an HDFS csv, that I've created a Hive table for, and I want to run a Hive query that first encrypts the query param, then does the lookup.  I have a UDF that does encryption as follows:
> 
> public class ParamEncrypt extends UDF {
> 
>  public Text evaluate(String name) throws Exception {
> 
>      String result = new String();
> 
>      if (name == null) { return null; }
> 
>      result = ParamData.encrypt(name);
> 
>      return new Text(result);
>  }
> }
> 
> Then I run the Hive query as:
> 
>  select * from cc_details where first_name = encrypt('Ann');
> 
> The problem is, it's running encrypt('Ann') across every single record in the table.  I want it do the encryption once, then do the matchup.  I've tried:
> 
>  select * from cc_details where first_name in (select encrypt('Ann') from cc_details limit 1);
> 
> But Hive doesn't support **IN** or select queries in the where clause.
> 
> What can I do?
> 
> Can I do something like:
> 
>  select encrypt('Ann') as ann from cc_details where first_name = ann;
> 
> That also doesn't work because the query parser throws an error saying **ann** is not a known column
> 
> Thanks,
> 
> Sam


Re: Hive Query with UDF

Posted by Michael Segel <mi...@hotmail.com>.
You don't need an UDF... 

You encrypt the string 'Ann' first then use that encrypted value in the Select statement. 

That should make things a bit simpler. 



On Oct 17, 2012, at 8:04 PM, Sam Mohamed <sa...@voltage.com> wrote:

> I have some encrypted data in an HDFS csv, that I've created a Hive table for, and I want to run a Hive query that first encrypts the query param, then does the lookup.  I have a UDF that does encryption as follows:
> 
> public class ParamEncrypt extends UDF {
> 
>  public Text evaluate(String name) throws Exception {
> 
>      String result = new String();
> 
>      if (name == null) { return null; }
> 
>      result = ParamData.encrypt(name);
> 
>      return new Text(result);
>  }
> }
> 
> Then I run the Hive query as:
> 
>  select * from cc_details where first_name = encrypt('Ann');
> 
> The problem is, it's running encrypt('Ann') across every single record in the table.  I want it do the encryption once, then do the matchup.  I've tried:
> 
>  select * from cc_details where first_name in (select encrypt('Ann') from cc_details limit 1);
> 
> But Hive doesn't support **IN** or select queries in the where clause.
> 
> What can I do?
> 
> Can I do something like:
> 
>  select encrypt('Ann') as ann from cc_details where first_name = ann;
> 
> That also doesn't work because the query parser throws an error saying **ann** is not a known column
> 
> Thanks,
> 
> Sam


Re: Hive Query with UDF

Posted by Michael Segel <mi...@hotmail.com>.
You don't need an UDF... 

You encrypt the string 'Ann' first then use that encrypted value in the Select statement. 

That should make things a bit simpler. 



On Oct 17, 2012, at 8:04 PM, Sam Mohamed <sa...@voltage.com> wrote:

> I have some encrypted data in an HDFS csv, that I've created a Hive table for, and I want to run a Hive query that first encrypts the query param, then does the lookup.  I have a UDF that does encryption as follows:
> 
> public class ParamEncrypt extends UDF {
> 
>  public Text evaluate(String name) throws Exception {
> 
>      String result = new String();
> 
>      if (name == null) { return null; }
> 
>      result = ParamData.encrypt(name);
> 
>      return new Text(result);
>  }
> }
> 
> Then I run the Hive query as:
> 
>  select * from cc_details where first_name = encrypt('Ann');
> 
> The problem is, it's running encrypt('Ann') across every single record in the table.  I want it do the encryption once, then do the matchup.  I've tried:
> 
>  select * from cc_details where first_name in (select encrypt('Ann') from cc_details limit 1);
> 
> But Hive doesn't support **IN** or select queries in the where clause.
> 
> What can I do?
> 
> Can I do something like:
> 
>  select encrypt('Ann') as ann from cc_details where first_name = ann;
> 
> That also doesn't work because the query parser throws an error saying **ann** is not a known column
> 
> Thanks,
> 
> Sam