Posted to user@hbase.apache.org by "Kumar, Suresh" <Su...@emc.com> on 2012/10/19 01:48:46 UTC

Thrift Python client with regex

I am using Thrift (0.8.0) to scan column values from a table.

This code returns all the values:

columns = ['mylog']
scanner = client.scannerOpen('apachelogs', '', columns)
result = client.scannerGet(scanner)
while result:
  printRow(result[0])
  result = client.scannerGet(scanner)
print "Scanner finished"
client.scannerClose(scanner)

The scannerOpen Python API says you can pass a regex in the column
qualifier, so if I send columns = ['mylog:suresh'], it should return all
the values which have the string suresh, right? I don't get any result.

 

Thanks,
Suresh


Re: Thrift Python client with regex

Posted by Stack <st...@duboce.net>.
On Thu, Oct 18, 2012 at 7:13 PM, Norbert Burger
<no...@gmail.com> wrote:
> We had the same question earlier.  Unfortunately the documentation is
> wrong on this account; scannerOpen resolves to either a call to
> scan.addFamily or scan.addColumn, and neither directly supports regex
> matching.
>
> Regex pattern matching against colquals is definitely supported on the
> Java side, so Thrift2 (0.94.0) is a possible solution, if you can
> upgrade.  Another approach, depending on how large your rows are,
> would be to grab the full list of cols, filter via regex on the client
> side, and then specify explicitly in scannerOpen().
>

Thanks Norbert.

Or if one of you fellas wants to put up a patch that adds the
regex'ing to thrift1, we'll commit it.

But what about '10.3.1. Filter Language' in
http://hbase.apache.org/book.html ?  Have you fellas tried it?  The
doc looks like it might be wrong regards how you open the scanner --
it seems like you pass the filter string to the thrift Scan object --
but maybe this'll work?  Let us know and if inclined, tell us how to
fix the doc.
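
For anyone who wants to try the filter-language route, here is a rough, untested sketch of what it might look like from Python. The helper name qualifier_regex_filter is made up for illustration. The commented-out part assumes your generated Thrift bindings expose a TScan struct with a filterString field and a scannerOpenWithScan call, which depends on the HBase version; some versions also take an extra attributes argument.

```python
def qualifier_regex_filter(pattern):
    """Build a filter-language string that matches column qualifiers by regex.

    Hypothetical helper, not part of the HBase Thrift API.
    """
    # In the filter language, a single quote inside an argument is
    # escaped by writing it twice.
    escaped = pattern.replace("'", "''")
    return "QualifierFilter(=, 'regexstring:%s')" % escaped


filter_string = qualifier_regex_filter('.*suresh.*')

# Assumed usage against a live Thrift1 server (not verified here):
#
#   from hbase.ttypes import TScan
#   scan = TScan(columns=['mylog'], filterString=filter_string)
#   scanner = client.scannerOpenWithScan('apachelogs', scan)
#   # then loop with client.scannerGet(scanner) as in the original post
```

If the goal is to match on cell values rather than qualifiers, a ValueFilter with a substring comparator would be the analogous thing to try.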

Thanks,
St.Ack

Re: Thrift Python client with regex

Posted by Norbert Burger <no...@gmail.com>.
We had the same question earlier.  Unfortunately the documentation is
wrong on this account; scannerOpen resolves to either a call to
scan.addFamily or scan.addColumn, and neither directly supports regex
matching.

Regex pattern matching against colquals is definitely supported on the
Java side, so Thrift2 (0.94.0) is a possible solution, if you can
upgrade.  Another approach, depending on how large your rows are,
would be to grab the full list of cols, filter via regex on the client
side, and then specify explicitly in scannerOpen().
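
A minimal sketch of the client-side approach, assuming the column names have already been collected as 'family:qualifier' strings (for example from the keys of result[0].columns during an initial scan). The function name filter_columns and the sample qualifiers are made up for illustration.

```python
import re


def filter_columns(all_columns, pattern):
    """Return the 'family:qualifier' names whose qualifier matches pattern."""
    regex = re.compile(pattern)
    matched = []
    for name in all_columns:
        family, sep, qualifier = name.partition(':')
        # Skip bare family names; only the qualifier part is tested.
        if sep and regex.search(qualifier):
            matched.append(name)
    return matched


# Hypothetical column names, as they might be gathered from a first scan.
columns_seen = ['mylog:suresh_ip', 'mylog:other', 'mylog:agent_suresh']
columns = filter_columns(columns_seen, 'suresh')

# The filtered list can then be passed explicitly:
#   scanner = client.scannerOpen('apachelogs', '', columns)
```

This costs one extra pass to discover the column names, so it is only practical when rows are not too wide.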

Norbert
