Posted to dev@hive.apache.org by Swarnim Kulkarni <ku...@gmail.com> on 2013/02/03 02:02:44 UTC

Review Request: Add support for pulling HBase columns with prefixes

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/9276/
-----------------------------------------------------------

Review request for hive.


Description
-------

Added support for pulling HBase columns by providing just a prefix and a wildcard. A table definition could now look something like this:

CREATE EXTERNAL TABLE hive_hbase_test
ROW FORMAT SERDE 'org.apache.hadoop.hive.hbase.HBaseSerDe' 
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler' 
WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,fam1:col*") 
TBLPROPERTIES ("hbase.table.name" = "TEST_NEW_KEPLER_TABLE");

This would pull in all columns under column family "fam1" whose names start with "col". This gives a little more flexibility than the existing pull-all-columns format.
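
For readers following along, the mapping semantics described above can be sketched roughly as follows. This is a minimal, self-contained Java illustration under stated assumptions, not the code in the attached diff: the class PrefixMappingSketch, its Entry type, and the parse and matches helpers are hypothetical names, and the local startsWith is only a stand-in for a byte-prefix comparison. The sketch also ignores the existing whole-family form of the mapping.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only; none of these names come from the patch.
class PrefixMappingSketch {

    /** One parsed entry of the hbase.columns.mapping string. */
    static final class Entry {
        final String family;     // null for the special ":key" entry
        final String qualifier;  // exact qualifier, or the prefix when isPrefix is true
        final boolean isPrefix;  // true when the spec ended with the "*" wildcard

        Entry(String family, String qualifier, boolean isPrefix) {
            this.family = family;
            this.qualifier = qualifier;
            this.isPrefix = isPrefix;
        }
    }

    /** Splits a mapping such as ":key,fam1:col*" into entries. */
    static List<Entry> parse(String mapping) {
        List<Entry> entries = new ArrayList<>();
        for (String spec : mapping.split(",")) {
            if (spec.equals(":key")) {
                entries.add(new Entry(null, null, false));
                continue;
            }
            String[] parts = spec.split(":", 2);
            String qualifier = parts.length > 1 ? parts[1] : "";
            if (qualifier.endsWith("*")) {
                // Drop the trailing wildcard and remember that this is a prefix.
                String prefix = qualifier.substring(0, qualifier.length() - 1);
                entries.add(new Entry(parts[0], prefix, true));
            } else {
                entries.add(new Entry(parts[0], qualifier, false));
            }
        }
        return entries;
    }

    /** Byte-wise prefix test (assumed stand-in for a library helper). */
    static boolean startsWith(byte[] bytes, byte[] prefix) {
        if (prefix.length > bytes.length) {
            return false;
        }
        for (int i = 0; i < prefix.length; i++) {
            if (bytes[i] != prefix[i]) {
                return false;
            }
        }
        return true;
    }

    /** True when an HBase column (family, qualifier) is selected by the entry. */
    static boolean matches(Entry entry, String family, byte[] qualifier) {
        if (entry.family == null || !entry.family.equals(family)) {
            return false;
        }
        byte[] wanted = entry.qualifier.getBytes(StandardCharsets.UTF_8);
        return entry.isPrefix ? startsWith(qualifier, wanted)
                              : Arrays.equals(qualifier, wanted);
    }
}
```

With the mapping from the example, parse yields the key entry plus one prefix entry for family "fam1" and prefix "col", so a qualifier such as "col1" is selected while "other" is not.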


This addresses bug HIVE-3725.
    https://issues.apache.org/jira/browse/HIVE-3725


Diffs
-----

  hbase-handler/src/java/org/apache/hadoop/hive/hbase/HBaseSerDe.java 7f37ba5 
  hbase-handler/src/java/org/apache/hadoop/hive/hbase/LazyHBaseCellMap.java a8ba9d9 
  hbase-handler/src/java/org/apache/hadoop/hive/hbase/LazyHBaseRow.java d35bb52 
  hbase-handler/src/test/org/apache/hadoop/hive/hbase/TestHBaseSerDe.java e821282 

Diff: https://reviews.apache.org/r/9276/diff/


Testing
-------

Added unit tests to demonstrate the new functionality. Also made sure that all existing unit tests passed.


Thanks,

Swarnim Kulkarni


Re: Review Request 9276: Add support for pulling HBase columns with prefixes

Posted by Brock Noland <br...@cloudera.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/9276/#review23644
-----------------------------------------------------------

Ship it!


- Brock Noland


On July 22, 2013, 1:20 a.m., Swarnim Kulkarni wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/9276/
> -----------------------------------------------------------
> 
> (Updated July 22, 2013, 1:20 a.m.)
> 
> 
> Review request for hive.
> 
> 
> Bugs: HIVE-3725
>     https://issues.apache.org/jira/browse/HIVE-3725
> 
> 
> Repository: hive-git
> 
> 
> Description
> -------
> 
> Added support for pulling hbase columns just by providing prefixes and a wildcard. So a query now could look something like this:
> 
> CREATE EXTERNAL TABLE hive_hbase_test
> ROW FORMAT SERDE 'org.apache.hadoop.hive.hbase.HBaseSerDe' 
> STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler' 
> WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,fam1:col*") 
> TBLPROPERTIES ("hbase.table.name" = "TEST_HBASE_TABLE");
> 
> This would pull in all columns under column family "fam1" which start with "col". This gives a little more flexibility over pull all columns format.
> 
> 
> Diffs
> -----
> 
>   hbase-handler/src/java/org/apache/hadoop/hive/hbase/HBaseSerDe.java 65c81bf 
>   hbase-handler/src/java/org/apache/hadoop/hive/hbase/HBaseStorageHandler.java b550f45 
>   hbase-handler/src/java/org/apache/hadoop/hive/hbase/HiveHBaseTableInputFormat.java 01938a7 
>   hbase-handler/src/java/org/apache/hadoop/hive/hbase/LazyHBaseCellMap.java a8ba9d9 
>   hbase-handler/src/java/org/apache/hadoop/hive/hbase/LazyHBaseRow.java 10a9207 
>   hbase-handler/src/test/org/apache/hadoop/hive/hbase/TestHBaseSerDe.java e821282 
> 
> Diff: https://reviews.apache.org/r/9276/diff/
> 
> 
> Testing
> -------
> 
> Added unit tests to demonstrate the new functionality. Also made sure that all existing unit tests passed.
> 
> 
> Thanks,
> 
> Swarnim Kulkarni
> 
>


Re: Review Request 9276: Add support for pulling HBase columns with prefixes

Posted by Swarnim Kulkarni <ku...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/9276/
-----------------------------------------------------------

(Updated July 22, 2013, 1:20 a.m.)


Review request for hive.


Changes
-------

Updated diff with proposed changes.


Bugs: HIVE-3725
    https://issues.apache.org/jira/browse/HIVE-3725


Repository: hive-git


Description
-------

Added support for pulling HBase columns by providing just a prefix and a wildcard. A table definition could now look something like this:

CREATE EXTERNAL TABLE hive_hbase_test
ROW FORMAT SERDE 'org.apache.hadoop.hive.hbase.HBaseSerDe' 
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler' 
WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,fam1:col*") 
TBLPROPERTIES ("hbase.table.name" = "TEST_HBASE_TABLE");

This would pull in all columns under column family "fam1" whose names start with "col". This gives a little more flexibility than the existing pull-all-columns format.


Diffs (updated)
-----

  hbase-handler/src/java/org/apache/hadoop/hive/hbase/HBaseSerDe.java 65c81bf 
  hbase-handler/src/java/org/apache/hadoop/hive/hbase/HBaseStorageHandler.java b550f45 
  hbase-handler/src/java/org/apache/hadoop/hive/hbase/HiveHBaseTableInputFormat.java 01938a7 
  hbase-handler/src/java/org/apache/hadoop/hive/hbase/LazyHBaseCellMap.java a8ba9d9 
  hbase-handler/src/java/org/apache/hadoop/hive/hbase/LazyHBaseRow.java 10a9207 
  hbase-handler/src/test/org/apache/hadoop/hive/hbase/TestHBaseSerDe.java e821282 

Diff: https://reviews.apache.org/r/9276/diff/


Testing
-------

Added unit tests to demonstrate the new functionality. Also made sure that all existing unit tests passed.


Thanks,

Swarnim Kulkarni


Re: Review Request 9276: Add support for pulling HBase columns with prefixes

Posted by Brock Noland <br...@cloudera.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/9276/#review23388
-----------------------------------------------------------


Hi,

Regarding the earlier discussion between you, Mark, and me: we weren't saying to use a regex to decide whether the incoming column is a wildcard. We were saying that it should be possible for someone to specify a regex in hbase.columns.mapping, which we'd use to match column qualifiers. However, since we don't know the type of the incoming column qualifiers (from HBase), this might be tough.

How about this: for now, we require a very simple .* to match all characters. This is a valid regex, so when we add regex support later we won't have to deal with backwards-incompatibility issues. Basically, what this would mean is:

1) Instead of col* matching everything that starts with col, col.* matches everything that starts with col.
2) Eliminate the regex matching against hbase.columns.mapping
3) Add a property which defaults to true named something like hbase.columns.mapping.regex.matching so users could turn this off if needed.
4) As you do today, you'd use Bytes.startsWith to do the match. Later we'd implement regex matching.
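
The four points above could be sketched roughly like this. It is a minimal, self-contained Java illustration, not a proposed implementation: the class DotStarMatchSketch and its matches method are hypothetical, the property name is only the "something like" suggestion from point 3, and the local startsWith is an assumed stand-in for the byte-prefix helper mentioned in point 4.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Properties;

// Hypothetical illustration of the ".*" suffix proposal.
class DotStarMatchSketch {

    // Name floated in the thread; an assumption, not a confirmed Hive setting.
    static final String REGEX_MATCHING = "hbase.columns.mapping.regex.matching";

    /** Byte-wise prefix test (assumed stand-in for a library helper). */
    static boolean startsWith(byte[] bytes, byte[] prefix) {
        if (prefix.length > bytes.length) {
            return false;
        }
        for (int i = 0; i < prefix.length; i++) {
            if (bytes[i] != prefix[i]) {
                return false;
            }
        }
        return true;
    }

    /**
     * Decides whether an HBase qualifier is selected by one qualifier spec.
     * With matching enabled (the default), a spec ending in ".*" is treated
     * as a prefix; otherwise the spec must equal the qualifier exactly.
     */
    static boolean matches(String qualifierSpec, byte[] qualifier, Properties props) {
        boolean enabled = Boolean.parseBoolean(props.getProperty(REGEX_MATCHING, "true"));
        if (enabled && qualifierSpec.endsWith(".*")) {
            String prefix = qualifierSpec.substring(0, qualifierSpec.length() - 2);
            return startsWith(qualifier, prefix.getBytes(StandardCharsets.UTF_8));
        }
        return Arrays.equals(qualifier, qualifierSpec.getBytes(StandardCharsets.UTF_8));
    }
}
```

Because "col.*" is already a valid regex, a later switch from the prefix comparison to real regex matching would keep existing table definitions working, which is the point of the proposal.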

Brock

- Brock Noland


On Feb. 9, 2013, 9:56 p.m., Swarnim Kulkarni wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/9276/
> -----------------------------------------------------------
> 
> (Updated Feb. 9, 2013, 9:56 p.m.)
> 
> 
> Review request for hive.
> 
> 
> Bugs: HIVE-3725
>     https://issues.apache.org/jira/browse/HIVE-3725
> 
> 
> Repository: hive-git
> 
> 
> Description
> -------
> 
> Added support for pulling hbase columns just by providing prefixes and a wildcard. So a query now could look something like this:
> 
> CREATE EXTERNAL TABLE hive_hbase_test
> ROW FORMAT SERDE 'org.apache.hadoop.hive.hbase.HBaseSerDe' 
> STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler' 
> WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,fam1:col*") 
> TBLPROPERTIES ("hbase.table.name" = "TEST_HBASE_TABLE");
> 
> This would pull in all columns under column family "fam1" which start with "col". This gives a little more flexibility over pull all columns format.
> 
> 
> Diffs
> -----
> 
>   hbase-handler/src/java/org/apache/hadoop/hive/hbase/HBaseSerDe.java 7f37ba5 
>   hbase-handler/src/java/org/apache/hadoop/hive/hbase/LazyHBaseCellMap.java a8ba9d9 
>   hbase-handler/src/java/org/apache/hadoop/hive/hbase/LazyHBaseRow.java d35bb52 
>   hbase-handler/src/test/org/apache/hadoop/hive/hbase/TestHBaseSerDe.java e821282 
> 
> Diff: https://reviews.apache.org/r/9276/diff/
> 
> 
> Testing
> -------
> 
> Added unit tests to demonstrate the new functionality. Also made sure that all existing unit tests passed.
> 
> 
> Thanks,
> 
> Swarnim Kulkarni
> 
>


Re: Review Request: Add support for pulling HBase columns with prefixes

Posted by Swarnim Kulkarni <ku...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/9276/
-----------------------------------------------------------

(Updated Feb. 9, 2013, 9:56 p.m.)


Review request for hive.


Changes
-------

Updated diff with the proposed changes.


Description
-------

Added support for pulling HBase columns by providing just a prefix and a wildcard. A table definition could now look something like this:

CREATE EXTERNAL TABLE hive_hbase_test
ROW FORMAT SERDE 'org.apache.hadoop.hive.hbase.HBaseSerDe' 
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler' 
WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,fam1:col*") 
TBLPROPERTIES ("hbase.table.name" = "TEST_HBASE_TABLE");

This would pull in all columns under column family "fam1" whose names start with "col". This gives a little more flexibility than the existing pull-all-columns format.


This addresses bug HIVE-3725.
    https://issues.apache.org/jira/browse/HIVE-3725


Diffs (updated)
-----

  hbase-handler/src/java/org/apache/hadoop/hive/hbase/HBaseSerDe.java 7f37ba5 
  hbase-handler/src/java/org/apache/hadoop/hive/hbase/LazyHBaseCellMap.java a8ba9d9 
  hbase-handler/src/java/org/apache/hadoop/hive/hbase/LazyHBaseRow.java d35bb52 
  hbase-handler/src/test/org/apache/hadoop/hive/hbase/TestHBaseSerDe.java e821282 

Diff: https://reviews.apache.org/r/9276/diff/


Testing
-------

Added unit tests to demonstrate the new functionality. Also made sure that all existing unit tests passed.


Thanks,

Swarnim Kulkarni


Re: Review Request: Add support for pulling HBase columns with prefixes

Posted by Brock Noland <br...@cloudera.com>.

> On Feb. 5, 2013, 3:43 a.m., Mark Grover wrote:
> > hbase-handler/src/java/org/apache/hadoop/hive/hbase/HBaseSerDe.java, line 192
> > <https://reviews.apache.org/r/9276/diff/1/?file=254957#file254957line192>
> >
> >     This seems like a limited case of pattern matching. Swarnim, any way we can support generic regex matching instead?
> 
> Swarnim Kulkarni wrote:
>     Mark, in this case I specifically wanted to only allow strings that end with exactly the character "*" and using String#endsWith seemed simpler and more readable than a regex. Do you still want me to replace this with regex matching?

I think the issue is that this would make it difficult to implement enhanced pattern matching later. Implementing it now, you'd only need to specify:

col.*

in the table configuration. The remaining issue would be detecting whether a particular column is a regex pattern. Because #, comma, and : are used as separators, those characters could not be used in a pattern.
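
The separator constraint can be illustrated with a small check. This is only a sketch under assumptions: the class ReservedCharsSketch and the method isUsablePattern are hypothetical names, and the reserved set is just the three characters named above.

```java
// Hypothetical illustration of the separator constraint on qualifier patterns.
class ReservedCharsSketch {

    // Characters named in the thread as separators in the column mapping.
    private static final String RESERVED = "#,:";

    /** True when the pattern is non-empty and contains no separator character. */
    static boolean isUsablePattern(String qualifierPattern) {
        if (qualifierPattern.isEmpty()) {
            return false;
        }
        for (int i = 0; i < qualifierPattern.length(); i++) {
            if (RESERVED.indexOf(qualifierPattern.charAt(i)) >= 0) {
                return false;
            }
        }
        return true;
    }
}
```

A pattern such as col.* passes this check, while anything containing a comma, a colon, or # would be ambiguous with the mapping syntax and is rejected.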


- Brock


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/9276/#review16080
-----------------------------------------------------------


On Feb. 3, 2013, 1:04 a.m., Swarnim Kulkarni wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/9276/
> -----------------------------------------------------------
> 
> (Updated Feb. 3, 2013, 1:04 a.m.)
> 
> 
> Review request for hive.
> 
> 
> Description
> -------
> 
> Added support for pulling hbase columns just by providing prefixes and a wildcard. So a query now could look something like this:
> 
> CREATE EXTERNAL TABLE hive_hbase_test
> ROW FORMAT SERDE 'org.apache.hadoop.hive.hbase.HBaseSerDe' 
> STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler' 
> WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,fam1:col*") 
> TBLPROPERTIES ("hbase.table.name" = "TEST_HBASE_TABLE");
> 
> This would pull in all columns under column family "fam1" which start with "col". This gives a little more flexibility over pull all columns format.
> 
> 
> This addresses bug HIVE-3725.
>     https://issues.apache.org/jira/browse/HIVE-3725
> 
> 
> Diffs
> -----
> 
>   hbase-handler/src/java/org/apache/hadoop/hive/hbase/HBaseSerDe.java 7f37ba5 
>   hbase-handler/src/java/org/apache/hadoop/hive/hbase/LazyHBaseCellMap.java a8ba9d9 
>   hbase-handler/src/java/org/apache/hadoop/hive/hbase/LazyHBaseRow.java d35bb52 
>   hbase-handler/src/test/org/apache/hadoop/hive/hbase/TestHBaseSerDe.java e821282 
> 
> Diff: https://reviews.apache.org/r/9276/diff/
> 
> 
> Testing
> -------
> 
> Added unit tests to demonstrate the new functionality. Also made sure that all existing unit tests passed.
> 
> 
> Thanks,
> 
> Swarnim Kulkarni
> 
>


Re: Review Request: Add support for pulling HBase columns with prefixes

Posted by Swarnim Kulkarni <ku...@gmail.com>.

> On Feb. 5, 2013, 3:43 a.m., Mark Grover wrote:
> > hbase-handler/src/java/org/apache/hadoop/hive/hbase/HBaseSerDe.java, line 192
> > <https://reviews.apache.org/r/9276/diff/1/?file=254957#file254957line192>
> >
> >     This seems like a limited case of pattern matching. Swarnim, any way we can support generic regex matching instead?

Mark, in this case I specifically wanted to only allow strings that end with exactly the character "*" and using String#endsWith seemed simpler and more readable than a regex. Do you still want me to replace this with regex matching?


- Swarnim


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/9276/#review16080
-----------------------------------------------------------


On Feb. 3, 2013, 1:04 a.m., Swarnim Kulkarni wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/9276/
> -----------------------------------------------------------
> 
> (Updated Feb. 3, 2013, 1:04 a.m.)
> 
> 
> Review request for hive.
> 
> 
> Description
> -------
> 
> Added support for pulling hbase columns just by providing prefixes and a wildcard. So a query now could look something like this:
> 
> CREATE EXTERNAL TABLE hive_hbase_test
> ROW FORMAT SERDE 'org.apache.hadoop.hive.hbase.HBaseSerDe' 
> STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler' 
> WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,fam1:col*") 
> TBLPROPERTIES ("hbase.table.name" = "TEST_HBASE_TABLE");
> 
> This would pull in all columns under column family "fam1" which start with "col". This gives a little more flexibility over pull all columns format.
> 
> 
> This addresses bug HIVE-3725.
>     https://issues.apache.org/jira/browse/HIVE-3725
> 
> 
> Diffs
> -----
> 
>   hbase-handler/src/java/org/apache/hadoop/hive/hbase/HBaseSerDe.java 7f37ba5 
>   hbase-handler/src/java/org/apache/hadoop/hive/hbase/LazyHBaseCellMap.java a8ba9d9 
>   hbase-handler/src/java/org/apache/hadoop/hive/hbase/LazyHBaseRow.java d35bb52 
>   hbase-handler/src/test/org/apache/hadoop/hive/hbase/TestHBaseSerDe.java e821282 
> 
> Diff: https://reviews.apache.org/r/9276/diff/
> 
> 
> Testing
> -------
> 
> Added unit tests to demonstrate the new functionality. Also made sure that all existing unit tests passed.
> 
> 
> Thanks,
> 
> Swarnim Kulkarni
> 
>


Re: Review Request: Add support for pulling HBase columns with prefixes

Posted by Mark Grover <gr...@gmail.com>.

> On Feb. 5, 2013, 3:43 a.m., Mark Grover wrote:
> > hbase-handler/src/java/org/apache/hadoop/hive/hbase/HBaseSerDe.java, line 192
> > <https://reviews.apache.org/r/9276/diff/1/?file=254957#file254957line192>
> >
> >     This seems like a limited case of pattern matching. Swarnim, any way we can support generic regex matching instead?
> 
> Swarnim Kulkarni wrote:
>     Mark, in this case I specifically wanted to only allow strings that end with exactly the character "*" and using String#endsWith seemed simpler and more readable than a regex. Do you still want me to replace this with regex matching?
> 
> Brock Noland wrote:
>     I think the issue is that this would make it difficult to implement enhanced pattern matching later. Implementing it now, you'd only need to specify:
>     
>     col.*
>     
>     in the table configuration. Now the issue would be detecting if the particular column was a regex pattern. Because #, comma, and : are used as separators that would exclude those characters from being used.
> 
> Swarnim Kulkarni wrote:
>     Thanks Brock. Makes sense. To be sure I am understanding you right, the change now would be just to replace the "parts[1].endsWith(*)" with something more regexy that would still imply that the string ends with "*". Correct?

I think that should do it.

Personally, I think having limited regex matching is just going to confuse people, so if you could implement (and test) full Java-style regex matching (like we do for RegexSerDe, for example), that would be fantastic. Of course, let me know if you have questions!
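
As a rough idea of what full regex matching on qualifiers might look like, here is a minimal sketch using java.util.regex. The class RegexQualifierSketch and its select method are hypothetical, and the sketch assumes qualifiers are UTF-8 strings, which (as noted elsewhere in this thread) is not guaranteed for data coming from HBase.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical illustration of Java-style regex matching on column qualifiers.
class RegexQualifierSketch {

    /**
     * Returns the qualifiers that the regex matches in full.
     * Assumption: each qualifier decodes as a UTF-8 string.
     */
    static List<String> select(String regex, List<byte[]> qualifiers) {
        Pattern pattern = Pattern.compile(regex);
        List<String> selected = new ArrayList<>();
        for (byte[] raw : qualifiers) {
            String qualifier = new String(raw, StandardCharsets.UTF_8);
            // matches() requires the whole qualifier to match, not a substring.
            if (pattern.matcher(qualifier).matches()) {
                selected.add(qualifier);
            }
        }
        return selected;
    }
}
```

With this approach col.* behaves like the prefix form, and a stricter pattern such as col[0-9]+ selects only qualifiers made of "col" followed by digits.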

Thanks for doing this, BTW!


- Mark


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/9276/#review16080
-----------------------------------------------------------


On Feb. 3, 2013, 1:04 a.m., Swarnim Kulkarni wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/9276/
> -----------------------------------------------------------
> 
> (Updated Feb. 3, 2013, 1:04 a.m.)
> 
> 
> Review request for hive.
> 
> 
> Description
> -------
> 
> Added support for pulling hbase columns just by providing prefixes and a wildcard. So a query now could look something like this:
> 
> CREATE EXTERNAL TABLE hive_hbase_test
> ROW FORMAT SERDE 'org.apache.hadoop.hive.hbase.HBaseSerDe' 
> STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler' 
> WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,fam1:col*") 
> TBLPROPERTIES ("hbase.table.name" = "TEST_HBASE_TABLE");
> 
> This would pull in all columns under column family "fam1" which start with "col". This gives a little more flexibility over pull all columns format.
> 
> 
> This addresses bug HIVE-3725.
>     https://issues.apache.org/jira/browse/HIVE-3725
> 
> 
> Diffs
> -----
> 
>   hbase-handler/src/java/org/apache/hadoop/hive/hbase/HBaseSerDe.java 7f37ba5 
>   hbase-handler/src/java/org/apache/hadoop/hive/hbase/LazyHBaseCellMap.java a8ba9d9 
>   hbase-handler/src/java/org/apache/hadoop/hive/hbase/LazyHBaseRow.java d35bb52 
>   hbase-handler/src/test/org/apache/hadoop/hive/hbase/TestHBaseSerDe.java e821282 
> 
> Diff: https://reviews.apache.org/r/9276/diff/
> 
> 
> Testing
> -------
> 
> Added unit tests to demonstrate the new functionality. Also made sure that all existing unit tests passed.
> 
> 
> Thanks,
> 
> Swarnim Kulkarni
> 
>


Re: Review Request: Add support for pulling HBase columns with prefixes

Posted by Swarnim Kulkarni <ku...@gmail.com>.

> On Feb. 5, 2013, 3:43 a.m., Mark Grover wrote:
> > hbase-handler/src/java/org/apache/hadoop/hive/hbase/HBaseSerDe.java, line 192
> > <https://reviews.apache.org/r/9276/diff/1/?file=254957#file254957line192>
> >
> >     This seems like a limited case of pattern matching. Swarnim, any way we can support generic regex matching instead?
> 
> Swarnim Kulkarni wrote:
>     Mark, in this case I specifically wanted to only allow strings that end with exactly the character "*" and using String#endsWith seemed simpler and more readable than a regex. Do you still want me to replace this with regex matching?
> 
> Brock Noland wrote:
>     I think the issue is that this would make it difficult to implement enhanced pattern matching later. Implementing it now, you'd only need to specify:
>     
>     col.*
>     
>     in the table configuration. Now the issue would be detecting if the particular column was a regex pattern. Because #, comma, and : are used as separators that would exclude those characters from being used.

Thanks, Brock. That makes sense. To be sure I understand you correctly: the change now would be just to replace the parts[1].endsWith("*") check with something more regex-like that would still imply that the string ends with "*". Correct?


- Swarnim


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/9276/#review16080
-----------------------------------------------------------


On Feb. 3, 2013, 1:04 a.m., Swarnim Kulkarni wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/9276/
> -----------------------------------------------------------
> 
> (Updated Feb. 3, 2013, 1:04 a.m.)
> 
> 
> Review request for hive.
> 
> 
> Description
> -------
> 
> Added support for pulling hbase columns just by providing prefixes and a wildcard. So a query now could look something like this:
> 
> CREATE EXTERNAL TABLE hive_hbase_test
> ROW FORMAT SERDE 'org.apache.hadoop.hive.hbase.HBaseSerDe' 
> STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler' 
> WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,fam1:col*") 
> TBLPROPERTIES ("hbase.table.name" = "TEST_HBASE_TABLE");
> 
> This would pull in all columns under column family "fam1" which start with "col". This gives a little more flexibility over pull all columns format.
> 
> 
> This addresses bug HIVE-3725.
>     https://issues.apache.org/jira/browse/HIVE-3725
> 
> 
> Diffs
> -----
> 
>   hbase-handler/src/java/org/apache/hadoop/hive/hbase/HBaseSerDe.java 7f37ba5 
>   hbase-handler/src/java/org/apache/hadoop/hive/hbase/LazyHBaseCellMap.java a8ba9d9 
>   hbase-handler/src/java/org/apache/hadoop/hive/hbase/LazyHBaseRow.java d35bb52 
>   hbase-handler/src/test/org/apache/hadoop/hive/hbase/TestHBaseSerDe.java e821282 
> 
> Diff: https://reviews.apache.org/r/9276/diff/
> 
> 
> Testing
> -------
> 
> Added unit tests to demonstrate the new functionality. Also made sure that all existing unit tests passed.
> 
> 
> Thanks,
> 
> Swarnim Kulkarni
> 
>


Re: Review Request: Add support for pulling HBase columns with prefixes

Posted by Swarnim Kulkarni <ku...@gmail.com>.

> On Feb. 5, 2013, 3:43 a.m., Mark Grover wrote:
> > hbase-handler/src/java/org/apache/hadoop/hive/hbase/HBaseSerDe.java, line 192
> > <https://reviews.apache.org/r/9276/diff/1/?file=254957#file254957line192>
> >
> >     This seems like a limited case of pattern matching. Swarnim, any way we can support generic regex matching instead?
> 
> Swarnim Kulkarni wrote:
>     Mark, in this case I specifically wanted to only allow strings that end with exactly the character "*" and using String#endsWith seemed simpler and more readable than a regex. Do you still want me to replace this with regex matching?
> 
> Brock Noland wrote:
>     I think the issue is that this would make it difficult to implement enhanced pattern matching later. Implementing it now, you'd only need to specify:
>     
>     col.*
>     
>     in the table configuration. Now the issue would be detecting if the particular column was a regex pattern. Because #, comma, and : are used as separators that would exclude those characters from being used.
> 
> Swarnim Kulkarni wrote:
>     Thanks Brock. Makes sense. To be sure I am understanding you right, the change now would be just to replace the "parts[1].endsWith(*)" with something more regexy that would still imply that the string ends with "*". Correct?
> 
> Mark Grover wrote:
>     I think that should do it.
>     
>     Personally, I think having limited regex matching is just going to confuse people, so if you could implement (and test) full Java-style regex matching (like we do for RegexSerDe, for example), that would be fantastic. Of course, let me know if you have questions!
>     
>     Thanks for doing this, BTW!

Thanks for the suggestions. I incorporated them and updated the review. If you get a chance, please let me know if they look any better.


- Swarnim


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/9276/#review16080
-----------------------------------------------------------


On Feb. 9, 2013, 9:56 p.m., Swarnim Kulkarni wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/9276/
> -----------------------------------------------------------
> 
> (Updated Feb. 9, 2013, 9:56 p.m.)
> 
> 
> Review request for hive.
> 
> 
> Description
> -------
> 
> Added support for pulling hbase columns just by providing prefixes and a wildcard. So a query now could look something like this:
> 
> CREATE EXTERNAL TABLE hive_hbase_test
> ROW FORMAT SERDE 'org.apache.hadoop.hive.hbase.HBaseSerDe' 
> STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler' 
> WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,fam1:col*") 
> TBLPROPERTIES ("hbase.table.name" = "TEST_HBASE_TABLE");
> 
> This would pull in all columns under column family "fam1" which start with "col". This gives a little more flexibility over pull all columns format.
> 
> 
> This addresses bug HIVE-3725.
>     https://issues.apache.org/jira/browse/HIVE-3725
> 
> 
> Diffs
> -----
> 
>   hbase-handler/src/java/org/apache/hadoop/hive/hbase/HBaseSerDe.java 7f37ba5 
>   hbase-handler/src/java/org/apache/hadoop/hive/hbase/LazyHBaseCellMap.java a8ba9d9 
>   hbase-handler/src/java/org/apache/hadoop/hive/hbase/LazyHBaseRow.java d35bb52 
>   hbase-handler/src/test/org/apache/hadoop/hive/hbase/TestHBaseSerDe.java e821282 
> 
> Diff: https://reviews.apache.org/r/9276/diff/
> 
> 
> Testing
> -------
> 
> Added unit tests to demonstrate the new functionality. Also made sure that all existing unit tests passed.
> 
> 
> Thanks,
> 
> Swarnim Kulkarni
> 
>


Re: Review Request: Add support for pulling HBase columns with prefixes

Posted by Mark Grover <gr...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/9276/#review16080
-----------------------------------------------------------



hbase-handler/src/java/org/apache/hadoop/hive/hbase/HBaseSerDe.java
<https://reviews.apache.org/r/9276/#comment34401>

    This seems like a limited case of pattern matching. Swarnim, any way we can support generic regex matching instead?


- Mark Grover


On Feb. 3, 2013, 1:04 a.m., Swarnim Kulkarni wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/9276/
> -----------------------------------------------------------
> 
> (Updated Feb. 3, 2013, 1:04 a.m.)
> 
> 
> Review request for hive.
> 
> 
> Description
> -------
> 
> Added support for pulling hbase columns just by providing prefixes and a wildcard. So a query now could look something like this:
> 
> CREATE EXTERNAL TABLE hive_hbase_test
> ROW FORMAT SERDE 'org.apache.hadoop.hive.hbase.HBaseSerDe' 
> STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler' 
> WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,fam1:col*") 
> TBLPROPERTIES ("hbase.table.name" = "TEST_HBASE_TABLE");
> 
> This would pull in all columns under column family "fam1" which start with "col". This gives a little more flexibility over pull all columns format.
> 
> 
> This addresses bug HIVE-3725.
>     https://issues.apache.org/jira/browse/HIVE-3725
> 
> 
> Diffs
> -----
> 
>   hbase-handler/src/java/org/apache/hadoop/hive/hbase/HBaseSerDe.java 7f37ba5 
>   hbase-handler/src/java/org/apache/hadoop/hive/hbase/LazyHBaseCellMap.java a8ba9d9 
>   hbase-handler/src/java/org/apache/hadoop/hive/hbase/LazyHBaseRow.java d35bb52 
>   hbase-handler/src/test/org/apache/hadoop/hive/hbase/TestHBaseSerDe.java e821282 
> 
> Diff: https://reviews.apache.org/r/9276/diff/
> 
> 
> Testing
> -------
> 
> Added unit tests to demonstrate the new functionality. Also made sure that all existing unit tests passed.
> 
> 
> Thanks,
> 
> Swarnim Kulkarni
> 
>


Re: Review Request: Add support for pulling HBase columns with prefixes

Posted by Swarnim Kulkarni <ku...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/9276/
-----------------------------------------------------------

(Updated Feb. 3, 2013, 1:04 a.m.)


Review request for hive.


Changes
-------

Updated description.


Description (updated)
-------

Added support for pulling HBase columns by providing just a prefix and a wildcard. A table definition could now look something like this:

CREATE EXTERNAL TABLE hive_hbase_test
ROW FORMAT SERDE 'org.apache.hadoop.hive.hbase.HBaseSerDe' 
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler' 
WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,fam1:col*") 
TBLPROPERTIES ("hbase.table.name" = "TEST_HBASE_TABLE");

This would pull in all columns under column family "fam1" whose names start with "col". This gives a little more flexibility than the existing pull-all-columns format.


This addresses bug HIVE-3725.
    https://issues.apache.org/jira/browse/HIVE-3725


Diffs
-----

  hbase-handler/src/java/org/apache/hadoop/hive/hbase/HBaseSerDe.java 7f37ba5 
  hbase-handler/src/java/org/apache/hadoop/hive/hbase/LazyHBaseCellMap.java a8ba9d9 
  hbase-handler/src/java/org/apache/hadoop/hive/hbase/LazyHBaseRow.java d35bb52 
  hbase-handler/src/test/org/apache/hadoop/hive/hbase/TestHBaseSerDe.java e821282 

Diff: https://reviews.apache.org/r/9276/diff/


Testing
-------

Added unit tests to demonstrate the new functionality. Also made sure that all existing unit tests passed.


Thanks,

Swarnim Kulkarni