Posted to user@phoenix.apache.org by Manoj Ganesan <mg...@23andme.com> on 2018/11/02 21:27:50 UTC

Python phoenixdb adapter and JSON serialization on PQS

Hey everyone,

I'm trying to make the Python phoenixdb adapter work with JSON serialization
on PQS.

I'm using Phoenix 4.14 and the adapter works fine with protobuf, but when I
try making it work with an older version of phoenixdb (before the JSON to
protobuf switch was introduced), it just returns 0 rows. I don't see
anything in particular wrong with the HTTP requests themselves, and they seem
to conform to the Avatica JSON spec (
http://calcite.apache.org/avatica/docs/json_reference.html).

Here's the result (with some debug statements) that returns 0 rows.
Notice the *"firstFrame":{"offset":0,"done":true,"rows":[]* below:

request body =  {"maxRowCount": -2, "connectionId":
"68c05d12-5770-47d6-b3e4-dba556db4790", "request": "prepareAndExecute",
"statementId": 3, "sql": "SELECT col1, col2 from table limit 20"}
request headers =  {'content-type': 'application/json'}
_post_request: got response {'fp': <socket._fileobject object at
0x7f858330b9d0>, 'status': 200, 'will_close': False, 'chunk_left':
'UNKNOWN', 'length': 1395, 'strict': 0, 'reason': 'OK', 'version': 11,
'debuglevel': 0, 'msg': <httplib.HTTPMessage instance at 0x7f84fb50be18>,
'chunked': 0, '_method': 'POST'}
response.read(): body =
{"response":"executeResults","missingStatement":false,"rpcMetadata":{"response":"rpcMetadata","serverAddress":"ip-10-55-6-247:8765"},"results":[{"response":"resultSet","connectionId":"68c05d12-5770-47d6-b3e4-dba556db4790","statementId":3,"ownStatement":true,"signature":{"columns":[{"ordinal":0,"autoIncrement":false,"caseSensitive":false,"searchable":true,"currency":false,"nullable
":0,"signed":true,"displaySize":40,"label":"COL1","columnName":"COL1","schemaName":"","precision":0,"scale":0,"tableName":"TABLE","catalogName":"","type":{"type":"scalar","id":4,"name":"INTEGER","rep":"PRIMITIVE_INT"},"readOnly":true,"writable":false,"definitelyWritable":false,"columnClassName":"java.lang.Integer"},{"ordinal":1,"autoIncrement":false,"caseSensitive":false,"searchable":true,"currency":false,"nullable":0,"signed":true,"displaySize":40,"label":"COL2","columnName":"COL2","schemaName":"","precision":0,"scale":0,"tableName":"TABLE","catalogName":"","type":{"type":"scalar","id":4,"name":"INTEGER","rep":"PRIMITIVE_INT"},"readOnly":true,"writable":false,"definitelyWritable":false,"columnClassName":"java.lang.Integer"}],"sql":null,"parameters":[],"cursorFactory":{"style":"LIST","clazz":null,"fieldNames":null},"statementType":null},
*"firstFrame":{"offset":0,"done":true,"rows":[]*
},"updateCount":-1,"rpcMetadata":{"response":"rpcMetadata","serverAddress":"ip-10-55-6-247:8765"}}]}

The same query issued against a PQS started with PROTOBUF serialization and
using a newer phoenixdb adapter returns the correct number of rows.

Has anyone had luck making this work?

Thanks,
Manoj

Re: Python phoenixdb adapter and JSON serialization on PQS

Posted by Manoj Ganesan <mg...@23andme.com>.
It was probably lost in the wall of text, but the point I was trying to
convey was that fetching all rows via protobuf crashed PQS, while fetching
them via JSON didn't. Basically, the only way I can get protobuf
serialization to work is to (1) bump up the PQS memory and/or (2) fetch in
batches, whereas I don't have to do either with JSON.
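
For reference, a minimal sketch of the batched-fetch workaround, assuming the
standard DB-API fetchmany() on the phoenixdb cursor (the URL, batch size, and
row counting below are placeholders for illustration, not code from this thread):

import phoenixdb

database_url = 'http://localhost:8765/'  # placeholder PQS URL
conn = phoenixdb.connect(database_url, autocommit=True)
cursor = conn.cursor()
cursor.itersize = 2000  # frame size requested from PQS (2000 is the default)
cursor.execute('select ind_id, a, b, c from test_perf')

total = 0
while True:
    batch = cursor.fetchmany(2000)  # pull one batch at a time instead of fetchall()
    if not batch:
        break
    total += len(batch)
print('fetched %d rows' % total)

conn.close()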

- Manoj

On Fri, Nov 9, 2018 at 10:59 AM Josh Elser <el...@apache.org> wrote:

> Manoj,
>
> re: #2 Please start by reading:
> https://plumbr.io/outofmemoryerror/gc-overhead-limit-exceeded
>
> This should give you a thorough explanation of what this error means.
> You need to increase the memory footprint and/or JVM GC properties for
> PQS to address your issue. When you request all of the rows at once, of
> course PQS needs to hold those all in memory. Please see my earlier
> instructions on how you do set JVM properties for PQS.
>
> re: #3
>
> I don't understand what you mean by "need to make protobuf faster". Just
> telling me what you observe from a high-elevation doesn't help me give
> you any kind of recommendations. I would imagine that you first need to
> solve the above issue, before you start requesting large batches of
> records from PQS (with *either* serialization).
>
> It sounds like the JSON serialization in the Python driver does not
> match what PQS/Avatica currently expects. Avatica provides no guarantee
> of wire compatibility for JSON, so you may just be hitting code that
> needs to be updated.
>
> On 11/8/18 7:14 PM, Manoj Ganesan wrote:
> > Thanks for the pointers Josh.
> >
> > Here’s a set of instructions to demonstrate the performance issues I’m
> > seeing when using protobuf serialization. Hopefully these are
> > comprehensive enough to reproduce the issue. If not, please let me know
> > and I’m happy to provide clarifications.
> >
> > Basic outline of the instructions:
> > 1. Create and populate a test table
> > 2. Use the python adapter to fetch rows
> > 3. JSON serialization
> >
> > 1. Create and populate a test table
> >
> > 1.1. Create a table:
> >
> > $> /usr/lib/phoenix/bin/sqlline.py localhost
> > CREATE TABLE test_perf (ind_id integer not null, a tinyint , b tinyint ,
> > c varchar(12), constraint pk primary key (ind_id));
> >
> > 1.2. I wrote this simple python script to populate the table with about
> > 6 million rows:
> >
> > # data_generator.py
> > import string
> > import csv
> > import random
> >
> > for i in range(0, 6000000):
> >      print ','.join([str(i), str(random.randint(-127, 127)),
> > str(random.randint(-127, 127)),
> > ''.join(random.choice(string.ascii_uppercase + string.digits) for _ in
> > range(12))])
> >
> > # run the script
> > $> python data_generator.py > in.csv
> >
> > 1.3. Import rows into the table
> >
> > $> /opt/phoenix/bin/psql.py -t TEST_PERF <cluster_address> in.csv
> >
> > 2. Use the python adapter to fetch rows
> >
> > 2.1. Install the Python adapter
> >
> > The instructions are detailed on the project README
> > (https://github.com/apache/phoenix/tree/master/python/phoenixdb).
> >
> > pip install phoenixdb
> >
> > 2.2. Write a script to fetch rows
> >
> > import phoenixdb
> > from time import time
> >
> > database_url = ‘<your_pqs_url>’
> > sql = 'select ind_id, a, b, c from test_perf'
> >
> > conn = phoenixdb.connect(database_url, autocommit=True)
> >
> > print 'connecting to %s' % database_url
> > cursor = conn.cursor()
> > cursor.itersize = -2  # to fetch all rows at once
> > s = time()
> > print 'executing query %s' % sql
> > cursor.execute()
> > print 'time to execute = %s' % (time() - s)
> >
> > s = time()
> > print 'fetching results'
> > results = cursor.fetchall()
> > print 'time to fetchall = %s' % (time() - s)
> >
> > 2.3. Execute the fetch script
> >
> > When I execute the above script to fetch all rows at once (i.e., using
> > cursor.itersize = -2), PQS crashes and returns and error:
> >
> > Traceback (most recent call last):
> >    File "queries/protobuf_client.py", line 12, in <module>
> >      cursor = conn.cursor()
> >    File
> >
> "/Users/mganesan/ttam/local-scripts/venv/lib/python2.7/site-packages/phoenixdb/cursor.py",
>
> > line 201, in execute
> >      operation, first_frame_max_size=self.itersize)
> >    File
> >
> "/Users/mganesan/ttam/local-scripts/venv/lib/python2.7/site-packages/phoenixdb/avatica.py",
>
> > line 405, in prepare_and_execute
> >      response_data = self._apply(request, 'ExecuteResponse')
> >    File
> >
> "/Users/mganesan/ttam/local-scripts/venv/lib/python2.7/site-packages/phoenixdb/avatica.py",
>
> > line 213, in _apply
> >      parse_error_page(response_body)
> >    File
> >
> "/Users/mganesan/ttam/local-scripts/venv/lib/python2.7/site-packages/phoenixdb/avatica.py",
>
> > line 117, in parse_error_page
> >      raise errors.InternalError(message)
> > phoenixdb.errors.InternalError: ('GC overhead limit exceeded', None,
> > None, None)
> >
> > If I fetch it 2000 rows at a time (i.e., cursor.itersize = 2000, which
> > is also the default), the fetch finishes in 360s.
> >
> > 3. JSON serialization
> >
> > One of the issues I mentioned in my previous email was about how JSON
> > serialization seems to be not supported via the Python client even
> > though I was making the queries as per the Avatica JSON reference
> > (https://calcite.apache.org/avatica/docs/json_reference.html). The
> > client I wrote always received 0 rows, but no errors. On digging through
> > the source a bit, I realized that with the version being used within
> > Phoenix (the latest?), 2 new arguments are expected with the
> > ‘prepareAndExecute’ method: ‘maxRowsTotal’ and ‘maxRowsInFirstFrame’.
> > Essentially the request should look like the following (note the last 2
> > args):
> > {
> >    'request': 'prepareAndExecute',
> >    'sql': 'select ind_id, a, b, c from test_perf’,
> >    'connectionId': '2747c631-440e-405b-8185-44d1cc7f266c',
> >    'statementId': 1,
> >    'maxRowCount': -2,
> >    'maxRowsTotal': -2,
> >    'maxRowsInFirstFrame': -2,
> > }
> >
> > “sqlline-thin.json -s JSON” works because the Avatica client driver
> > Phoenix/Sqlline uses makes the right HTTP calls, with the correct args.
> >
> > The response time with using JSON serialization and fetching *all* rows
> > (cursor.itersize = -2) was 60s, which is much faster than the 360s that
> > it takes with protobuf when passing cursor.itersize=2K (note that
> > protobuf did not work with fetching all rows at once, i.e.,
> > cursor.itersize=-2)
> >
> > Because of these issues, for now we have switched PQS to using JSON
> > serialization and updated our clients to use the same. We’re obviously
> > very much interested in understanding how the protobuf path can be made
> > faster.
> >
> > Thanks again for the help!
> > Manoj
> >
> > On Tue, Nov 6, 2018 at 9:51 AM Josh Elser <elserj@apache.org
> > <ma...@apache.org>> wrote:
> >
> >
> >
> >     On 11/5/18 10:10 PM, Manoj Ganesan wrote:
> >      > Thanks for the pointers Josh. I'm working on getting a
> >     representative
> >      > concise test to demonstrate the issue.
> >      >
> >      > Meanwhile, I had one question regarding the following:
> >      >
> >      >     You are right that the operations in PQS should be exactly
> >     the same,
> >      >     regardless of the client you're using -- that is how this
> >      >     architecture works.
> >      >
> >      >
> >      > IIUC, this means the following 2 methods should yield the same
> >     result:
> >      >
> >      >  1. sqlline-thin.py -s JSON <query_file>
> >      >  2. using a python avatica client script making JSON requests
> >
> >     That's correct. Any client which speaks to PQS should see the same
> >     results. There may be bugs in the client implementation, of course,
> >     which make this statement false.
> >
> >      > I made the following change in hbase-site.xml on the PQS host:
> >      >
> >      > <property>
> >      >      <name>phoenix.queryserver.serialization</name>
> >      >      <value>JSON</value>
> >      > </property>
> >      >
> >      > I notice that executing "sqlline-thin.py -s JSON <query_file>"
> >     returns
> >      > results just fine. However, when I use a simple script to try the
> >     same
> >      > query, it returns 0 rows. I'm attaching the Python script here.
> The
> >      > script essentially makes HTTP calls using the Avatica JSON
> reference
> >      > <https://calcite.apache.org/avatica/docs/json_reference.html>. I
> >     assumed
> >      > that the sqlline-thin wrapper (when passed the -s JSON flag) also
> >     make
> >      > HTTP calls based on the JSON reference, is that not correct?
> >
> >     Apache mailing lists strip attachments. Please consider hosting it
> >     somewhere else, along with instructions/scripts to generate the
> >     required
> >     tables. Please provide some more analysis of the problem than just a
> >     summarization of what you see as an end-user -- I don't have the
> cycles
> >     or interest to debug the entire system for you :)
> >
> >     Avatica is a protocol that interprets JDBC using some serialization
> >     (JSON or Protobuf today) and a transport (only HTTP) to a remote
> server
> >     to run the JDBC oeprations. So, yes: an Avatica client is always
> using
> >     HTTP, given whatever serialization you instruct it to use.
> >
> >      > I'll work on getting some test cases here soon to illustrate this
> as
> >      > well as the performance problem.
> >      >
> >      > Thanks again!
> >      > Manoj
> >      >
> >      > On Mon, Nov 5, 2018 at 10:43 AM Josh Elser <elserj@apache.org
> >     <ma...@apache.org>
> >      > <mailto:elserj@apache.org <ma...@apache.org>>> wrote:
> >      >
> >      >     Is the OOME issue regardless of using the Java client
> >     (sqlline-thin)
> >      >     and
> >      >     the Python client? I would like to know more about this one.
> >     If you can
> >      >     share something that reproduces the problem for you, I'd like
> >     to look
> >      >     into it. The only suggestion I have at this point in time is
> >     to make
> >      >     sure you set a reasonable max-heap size in hbase-env.sh (e.g.
> >     -Xmx) via
> >      >     PHOENIX_QUERYSERVER_OPTS and have HBASE_CONF_DIR pointing to
> >     the right
> >      >     directory when you launch PQS.
> >      >
> >      >     Regarding performance, as you've described it, it sounds like
> the
> >      >     Python
> >      >     driver is just slower than the Java driver. You are right
> >     that the
> >      >     operations in PQS should be exactly the same, regardless of
> >     the client
> >      >     you're using -- that is how this architecture works. Avatica
> >     is a wire
> >      >     protocol that all clients use to talk to PQS. More
> >     digging/information
> >      >     you can provide about the exact circumstances (and, again,
> >      >     steps/environment to reproduce what you see) would be
> >     extremely helpful.
> >      >
> >      >     Thanks Manoj.
> >      >
> >      >     - Josh
> >      >
> >      >     On 11/2/18 7:16 PM, Manoj Ganesan wrote:
> >      >      > Thanks Josh for the response!
> >      >      >
> >      >      > I would definitely like to use protobuf serialization, but
> I'm
> >      >     observing
> >      >      > performance issues trying to run queries with a large
> >     number of
> >      >     results.
> >      >      > One problem is that I observe PQS runs out of memory, when
> its
> >      >     trying to
> >      >      > (what looks like to me) serialize the results in Avatica.
> The
> >      >     other is
> >      >      > that the phoenixdb python adapter itself spends a large
> >     amount of
> >      >     time
> >      >      > in the logic
> >      >      >
> >      >
> >       <
> https://github.com/apache/phoenix/blob/master/python/phoenixdb/phoenixdb/cursor.py#L248
> >
> >      >
> >      >      > where its converting the protobuf rows to python objects.
> >      >      >
> >      >      > Interestingly when we use sqlline-thin.py instead of python
> >      >     phoenixdb,
> >      >      > the protobuf serialization works fine and responses are
> fast.
> >      >     It's not
> >      >      > clear to me why PQS would have problems when using the
> python
> >      >     adapter
> >      >      > and not when using sqlline-thin, do they follow different
> >     code paths
> >      >      > (especially around serialization)?
> >      >      >
> >      >      > Thanks again,
> >      >      > Manoj
> >      >      >
> >      >      > On Fri, Nov 2, 2018 at 4:05 PM Josh Elser
> >     <elserj@apache.org <ma...@apache.org>
> >      >     <mailto:elserj@apache.org <ma...@apache.org>>
> >      >      > <mailto:elserj@apache.org <ma...@apache.org>
> >     <mailto:elserj@apache.org <ma...@apache.org>>>> wrote:
> >      >      >
> >      >      >     I would strongly suggest you do not use the JSON
> >     serialization.
> >      >      >
> >      >      >     The JSON support is implemented via Jackson which has
> no
> >      >     means to make
> >      >      >     backwards compatibility "easy". On the contrast,
> protobuf
> >      >     makes this
> >      >      >     extremely easy and we have multiple examples over the
> past
> >      >     years where
> >      >      >     we've been able to fix bugs in a backwards compatible
> >     manner.
> >      >      >
> >      >      >     If you want the thin client to continue to work across
> >      >     versions, stick
> >      >      >     with protobuf.
> >      >      >
> >      >      >     On 11/2/18 5:27 PM, Manoj Ganesan wrote:
> >      >      >      > Hey everyone,
> >      >      >      >
> >      >      >      > I'm trying to use the Python phoenixdb adapter work
> >     with JSON
> >      >      >      > serialization on PQS.
> >      >      >      >
> >      >      >      > I'm using Phoenix 4.14 and the adapter works fine
> with
> >      >     protobuf, but
> >      >      >      > when I try making it work with an older version of
> >     phoenixdb
> >      >      >     (before the
> >      >      >      > JSON to protobuf switch was introduced), it just
> >     returns 0
> >      >     rows.
> >      >      >     I don't
> >      >      >      > see anything in particular wrong with the HTTP
> requests
> >      >     itself,
> >      >      >     and they
> >      >      >      > seem to conform to the Avatica JSON spec
> >      >      >      >
> >     (http://calcite.apache.org/avatica/docs/json_reference.html).
> >      >      >      >
> >      >      >      > Here's the result (with some debug statements) that
> >      >     returns 0 rows.
> >      >      >      > Notice the
> >      >     *"firstFrame":{"offset":0,"done":true,"rows":[]* below:
> >      >      >      >
> >      >      >      > request body =  {"maxRowCount": -2, "connectionId":
> >      >      >      > "68c05d12-5770-47d6-b3e4-dba556db4790", "request":
> >      >      >     "prepareAndExecute",
> >      >      >      > "statementId": 3, "sql": "SELECT col1, col2 from
> table
> >      >     limit 20"}
> >      >      >      > request headers =  {'content-type':
> 'application/json'}
> >      >      >      > _post_request: got response {'fp':
> <socket._fileobject
> >      >     object at
> >      >      >      > 0x7f858330b9d0>, 'status': 200, 'will_close': False,
> >      >     'chunk_left':
> >      >      >      > 'UNKNOWN', 'length': 1395, 'strict': 0, 'reason':
> 'OK',
> >      >      >     'version': 11,
> >      >      >      > 'debuglevel': 0, 'msg': <httplib.HTTPMessage
> >     instance at
> >      >      >      > 0x7f84fb50be18>, 'chunked': 0, '_method': 'POST'}
> >      >      >      > response.read(): body =
> >      >      >      >
> >      >      >
> >      >
> >
>  {"response":"executeResults","missingStatement":false,"rpcMetadata":{"response":"rpcMetadata","serverAddress":"ip-10-55-6-247:8765"},"results":[{"response":"resultSet","connectionId":"68c05d12-5770-47d6-b3e4-dba556db4790","statementId":3,"ownStatement":true,"signature":{"columns":[{"ordinal":0,"autoIncrement":false,"caseSensitive":false,"searchable":true,"currency":false,"nullable
> >      >      >      >
> >      >      >
> >      >
> >
>  ":0,"signed":true,"displaySize":40,"label":"COL1","columnName":"COL1","schemaName":"","precision":0,"scale":0,"tableName":"TABLE","catalogName":"","type":{"type":"scalar","id":4,"name":"INTEGER","rep":"PRIMITIVE_INT"},"readOnly":true,"writable":false,"definitelyWritable":false,"columnClassName":"java.lang.Integer"},{"ordinal":1,"autoIncrement":false,"caseSensitive":false,"searchable":true,"currency":false,"nullable":0,"signed":true,"displaySize":40,"label":"COL2","columnName":"COL2","schemaName":"","precision":0,"scale":0,"tableName":"TABLE","catalogName":"","type":{"type":"scalar","id":4,"name":"INTEGER","rep":"PRIMITIVE_INT"},"readOnly":true,"writable":false,"definitelyWritable":false,"columnClassName":"java.lang.Integer"}],"sql":null,"parameters":[],"cursorFactory":{"style":"LIST","clazz":null,"fieldNames":null},"statementType":null},*"firstFrame":{"offset":0,"done":true,"rows":[]*},"updateCount":-1,"rpcMetadata":{"response":"rpcMetadata","serverAddress":"ip-10-55-6-247:8765"}}]}
> >      >      >
> >      >      >      >
> >      >      >      >
> >      >      >      > The same query issued against a PQS started with
> >     PROTOBUF
> >      >      >     serialization
> >      >      >      > and using a newer phoenixdb adapter returns the
> correct
> >      >     number of
> >      >      >     rows.
> >      >      >      >
> >      >      >      > Has anyone had luck making this work?
> >      >      >      >
> >      >      >      > Thanks,
> >      >      >      > Manoj
> >      >      >      >
> >      >      >
> >      >
> >
>

Re: Python phoenixdb adapter and JSON serialization on PQS

Posted by Josh Elser <el...@apache.org>.
Manoj,

re: #2 Please start by reading: 
https://plumbr.io/outofmemoryerror/gc-overhead-limit-exceeded

This should give you a thorough explanation of what this error means.
You need to increase the memory footprint and/or tune the JVM GC properties for
PQS to address your issue. When you request all of the rows at once, of
course PQS needs to hold them all in memory. Please see my earlier
instructions on how to set JVM properties for PQS.

re: #3

I don't understand what you mean by "need to make protobuf faster". Just
telling me what you observe from a high elevation doesn't help me give
you any kind of recommendation. I would imagine that you first need to
solve the above issue before you start requesting large batches of
records from PQS (with *either* serialization).

It sounds like the JSON serialization in the Python driver does not 
match what PQS/Avatica currently expects. Avatica provides no guarantee 
of wire compatibility for JSON, so you may just be hitting code that 
needs to be updated.

On 11/8/18 7:14 PM, Manoj Ganesan wrote:
> Thanks for the pointers Josh.
> 
> Here’s a set of instructions to demonstrate the performance issues I’m 
> seeing when using protobuf serialization. Hopefully these are 
> comprehensive enough to reproduce the issue. If not, please let me know 
> and I’m happy to provide clarifications.
> 
> Basic outline of the instructions:
> 1. Create and populate a test table
> 2. Use the python adapter to fetch rows
> 3. JSON serialization
> 
> 1. Create and populate a test table
> 
> 1.1. Create a table:
> 
> $> /usr/lib/phoenix/bin/sqlline.py localhost
> CREATE TABLE test_perf (ind_id integer not null, a tinyint , b tinyint , 
> c varchar(12), constraint pk primary key (ind_id));
> 
> 1.2. I wrote this simple python script to populate the table with about 
> 6 million rows:
> 
> # data_generator.py
> import string
> import csv
> import random
> 
> for i in range(0, 6000000):
>      print ','.join([str(i), str(random.randint(-127, 127)), 
> str(random.randint(-127, 127)), 
> ''.join(random.choice(string.ascii_uppercase + string.digits) for _ in 
> range(12))])
> 
> # run the script
> $> python data_generator.py > in.csv
> 
> 1.3. Import rows into the table
> 
> $> /opt/phoenix/bin/psql.py -t TEST_PERF <cluster_address> in.csv
> 
> 2. Use the python adapter to fetch rows
> 
> 2.1. Install the Python adapter
> 
> The instructions are detailed on the project README 
> (https://github.com/apache/phoenix/tree/master/python/phoenixdb).
> 
> pip install phoenixdb
> 
> 2.2. Write a script to fetch rows
> 
> import phoenixdb
> from time import time
> 
> database_url = ‘<your_pqs_url>’
> sql = 'select ind_id, a, b, c from test_perf'
> 
> conn = phoenixdb.connect(database_url, autocommit=True)
> 
> print 'connecting to %s' % database_url
> cursor = conn.cursor()
> cursor.itersize = -2  # to fetch all rows at once
> s = time()
> print 'executing query %s' % sql
> cursor.execute()
> print 'time to execute = %s' % (time() - s)
> 
> s = time()
> print 'fetching results'
> results = cursor.fetchall()
> print 'time to fetchall = %s' % (time() - s)
> 
> 2.3. Execute the fetch script
> 
> When I execute the above script to fetch all rows at once (i.e., using 
> cursor.itersize = -2), PQS crashes and returns and error:
> 
> Traceback (most recent call last):
>    File "queries/protobuf_client.py", line 12, in <module>
>      cursor = conn.cursor()
>    File 
> "/Users/mganesan/ttam/local-scripts/venv/lib/python2.7/site-packages/phoenixdb/cursor.py", 
> line 201, in execute
>      operation, first_frame_max_size=self.itersize)
>    File 
> "/Users/mganesan/ttam/local-scripts/venv/lib/python2.7/site-packages/phoenixdb/avatica.py", 
> line 405, in prepare_and_execute
>      response_data = self._apply(request, 'ExecuteResponse')
>    File 
> "/Users/mganesan/ttam/local-scripts/venv/lib/python2.7/site-packages/phoenixdb/avatica.py", 
> line 213, in _apply
>      parse_error_page(response_body)
>    File 
> "/Users/mganesan/ttam/local-scripts/venv/lib/python2.7/site-packages/phoenixdb/avatica.py", 
> line 117, in parse_error_page
>      raise errors.InternalError(message)
> phoenixdb.errors.InternalError: ('GC overhead limit exceeded', None, 
> None, None)
> 
> If I fetch it 2000 rows at a time (i.e., cursor.itersize = 2000, which 
> is also the default), the fetch finishes in 360s.
> 
> 3. JSON serialization
> 
> One of the issues I mentioned in my previous email was about how JSON 
> serialization seems to be not supported via the Python client even 
> though I was making the queries as per the Avatica JSON reference 
> (https://calcite.apache.org/avatica/docs/json_reference.html). The 
> client I wrote always received 0 rows, but no errors. On digging through 
> the source a bit, I realized that with the version being used within 
> Phoenix (the latest?), 2 new arguments are expected with the 
> ‘prepareAndExecute’ method: ‘maxRowsTotal’ and ‘maxRowsInFirstFrame’. 
> Essentially the request should look like the following (note the last 2 
> args):
> {
>    'request': 'prepareAndExecute',
>    'sql': 'select ind_id, a, b, c from test_perf’,
>    'connectionId': '2747c631-440e-405b-8185-44d1cc7f266c',
>    'statementId': 1,
>    'maxRowCount': -2,
>    'maxRowsTotal': -2,
>    'maxRowsInFirstFrame': -2,
> }
> 
> “sqlline-thin.json -s JSON” works because the Avatica client driver 
> Phoenix/Sqlline uses makes the right HTTP calls, with the correct args.
> 
> The response time with using JSON serialization and fetching *all* rows 
> (cursor.itersize = -2) was 60s, which is much faster than the 360s that 
> it takes with protobuf when passing cursor.itersize=2K (note that 
> protobuf did not work with fetching all rows at once, i.e., 
> cursor.itersize=-2)
> 
> Because of these issues, for now we have switched PQS to using JSON 
> serialization and updated our clients to use the same. We’re obviously 
> very much interested in understanding how the protobuf path can be made 
> faster.
> 
> Thanks again for the help!
> Manoj
> 
> On Tue, Nov 6, 2018 at 9:51 AM Josh Elser <elserj@apache.org 
> <ma...@apache.org>> wrote:
> 
> 
> 
>     On 11/5/18 10:10 PM, Manoj Ganesan wrote:
>      > Thanks for the pointers Josh. I'm working on getting a
>     representative
>      > concise test to demonstrate the issue.
>      >
>      > Meanwhile, I had one question regarding the following:
>      >
>      >     You are right that the operations in PQS should be exactly
>     the same,
>      >     regardless of the client you're using -- that is how this
>      >     architecture works.
>      >
>      >
>      > IIUC, this means the following 2 methods should yield the same
>     result:
>      >
>      >  1. sqlline-thin.py -s JSON <query_file>
>      >  2. using a python avatica client script making JSON requests
> 
>     That's correct. Any client which speaks to PQS should see the same
>     results. There may be bugs in the client implementation, of course,
>     which make this statement false.
> 
>      > I made the following change in hbase-site.xml on the PQS host:
>      >
>      > <property>
>      >      <name>phoenix.queryserver.serialization</name>
>      >      <value>JSON</value>
>      > </property>
>      >
>      > I notice that executing "sqlline-thin.py -s JSON <query_file>"
>     returns
>      > results just fine. However, when I use a simple script to try the
>     same
>      > query, it returns 0 rows. I'm attaching the Python script here. The
>      > script essentially makes HTTP calls using the Avatica JSON reference
>      > <https://calcite.apache.org/avatica/docs/json_reference.html>. I
>     assumed
>      > that the sqlline-thin wrapper (when passed the -s JSON flag) also
>     make
>      > HTTP calls based on the JSON reference, is that not correct?
> 
>     Apache mailing lists strip attachments. Please consider hosting it
>     somewhere else, along with instructions/scripts to generate the
>     required
>     tables. Please provide some more analysis of the problem than just a
>     summarization of what you see as an end-user -- I don't have the cycles
>     or interest to debug the entire system for you :)
> 
>     Avatica is a protocol that interprets JDBC using some serialization
>     (JSON or Protobuf today) and a transport (only HTTP) to a remote server
>     to run the JDBC oeprations. So, yes: an Avatica client is always using
>     HTTP, given whatever serialization you instruct it to use.
> 
>      > I'll work on getting some test cases here soon to illustrate this as
>      > well as the performance problem.
>      >
>      > Thanks again!
>      > Manoj
>      >
>      > On Mon, Nov 5, 2018 at 10:43 AM Josh Elser <elserj@apache.org
>     <ma...@apache.org>
>      > <mailto:elserj@apache.org <ma...@apache.org>>> wrote:
>      >
>      >     Is the OOME issue regardless of using the Java client
>     (sqlline-thin)
>      >     and
>      >     the Python client? I would like to know more about this one.
>     If you can
>      >     share something that reproduces the problem for you, I'd like
>     to look
>      >     into it. The only suggestion I have at this point in time is
>     to make
>      >     sure you set a reasonable max-heap size in hbase-env.sh (e.g.
>     -Xmx) via
>      >     PHOENIX_QUERYSERVER_OPTS and have HBASE_CONF_DIR pointing to
>     the right
>      >     directory when you launch PQS.
>      >
>      >     Regarding performance, as you've described it, it sounds like the
>      >     Python
>      >     driver is just slower than the Java driver. You are right
>     that the
>      >     operations in PQS should be exactly the same, regardless of
>     the client
>      >     you're using -- that is how this architecture works. Avatica
>     is a wire
>      >     protocol that all clients use to talk to PQS. More
>     digging/information
>      >     you can provide about the exact circumstances (and, again,
>      >     steps/environment to reproduce what you see) would be
>     extremely helpful.
>      >
>      >     Thanks Manoj.
>      >
>      >     - Josh
>      >
>      >     On 11/2/18 7:16 PM, Manoj Ganesan wrote:
>      >      > Thanks Josh for the response!
>      >      >
>      >      > I would definitely like to use protobuf serialization, but I'm
>      >     observing
>      >      > performance issues trying to run queries with a large
>     number of
>      >     results.
>      >      > One problem is that I observe PQS runs out of memory, when its
>      >     trying to
>      >      > (what looks like to me) serialize the results in Avatica. The
>      >     other is
>      >      > that the phoenixdb python adapter itself spends a large
>     amount of
>      >     time
>      >      > in the logic
>      >      >
>      >   
>       <https://github.com/apache/phoenix/blob/master/python/phoenixdb/phoenixdb/cursor.py#L248>
>      >
>      >      > where its converting the protobuf rows to python objects.
>      >      >
>      >      > Interestingly when we use sqlline-thin.py instead of python
>      >     phoenixdb,
>      >      > the protobuf serialization works fine and responses are fast.
>      >     It's not
>      >      > clear to me why PQS would have problems when using the python
>      >     adapter
>      >      > and not when using sqlline-thin, do they follow different
>     code paths
>      >      > (especially around serialization)?
>      >      >
>      >      > Thanks again,
>      >      > Manoj
>      >      >
>      >      > On Fri, Nov 2, 2018 at 4:05 PM Josh Elser
>     <elserj@apache.org <ma...@apache.org>
>      >     <mailto:elserj@apache.org <ma...@apache.org>>
>      >      > <mailto:elserj@apache.org <ma...@apache.org>
>     <mailto:elserj@apache.org <ma...@apache.org>>>> wrote:
>      >      >
>      >      >     I would strongly suggest you do not use the JSON
>     serialization.
>      >      >
>      >      >     The JSON support is implemented via Jackson which has no
>      >     means to make
>      >      >     backwards compatibility "easy". On the contrast, protobuf
>      >     makes this
>      >      >     extremely easy and we have multiple examples over the past
>      >     years where
>      >      >     we've been able to fix bugs in a backwards compatible
>     manner.
>      >      >
>      >      >     If you want the thin client to continue to work across
>      >     versions, stick
>      >      >     with protobuf.
>      >      >
>      >      >     On 11/2/18 5:27 PM, Manoj Ganesan wrote:
>      >      >      > Hey everyone,
>      >      >      >
>      >      >      > I'm trying to use the Python phoenixdb adapter work
>     with JSON
>      >      >      > serialization on PQS.
>      >      >      >
>      >      >      > I'm using Phoenix 4.14 and the adapter works fine with
>      >     protobuf, but
>      >      >      > when I try making it work with an older version of
>     phoenixdb
>      >      >     (before the
>      >      >      > JSON to protobuf switch was introduced), it just
>     returns 0
>      >     rows.
>      >      >     I don't
>      >      >      > see anything in particular wrong with the HTTP requests
>      >     itself,
>      >      >     and they
>      >      >      > seem to conform to the Avatica JSON spec
>      >      >      >
>     (http://calcite.apache.org/avatica/docs/json_reference.html).
>      >      >      >
>      >      >      > Here's the result (with some debug statements) that
>      >     returns 0 rows.
>      >      >      > Notice the
>      >     *"firstFrame":{"offset":0,"done":true,"rows":[]* below:
>      >      >      >
>      >      >      > request body =  {"maxRowCount": -2, "connectionId":
>      >      >      > "68c05d12-5770-47d6-b3e4-dba556db4790", "request":
>      >      >     "prepareAndExecute",
>      >      >      > "statementId": 3, "sql": "SELECT col1, col2 from table
>      >     limit 20"}
>      >      >      > request headers =  {'content-type': 'application/json'}
>      >      >      > _post_request: got response {'fp': <socket._fileobject
>      >     object at
>      >      >      > 0x7f858330b9d0>, 'status': 200, 'will_close': False,
>      >     'chunk_left':
>      >      >      > 'UNKNOWN', 'length': 1395, 'strict': 0, 'reason': 'OK',
>      >      >     'version': 11,
>      >      >      > 'debuglevel': 0, 'msg': <httplib.HTTPMessage
>     instance at
>      >      >      > 0x7f84fb50be18>, 'chunked': 0, '_method': 'POST'}
>      >      >      > response.read(): body =
>      >      >      >
>      >      >
>      >     
>       {"response":"executeResults","missingStatement":false,"rpcMetadata":{"response":"rpcMetadata","serverAddress":"ip-10-55-6-247:8765"},"results":[{"response":"resultSet","connectionId":"68c05d12-5770-47d6-b3e4-dba556db4790","statementId":3,"ownStatement":true,"signature":{"columns":[{"ordinal":0,"autoIncrement":false,"caseSensitive":false,"searchable":true,"currency":false,"nullable
>      >      >      >
>      >      >
>      >     
>       ":0,"signed":true,"displaySize":40,"label":"COL1","columnName":"COL1","schemaName":"","precision":0,"scale":0,"tableName":"TABLE","catalogName":"","type":{"type":"scalar","id":4,"name":"INTEGER","rep":"PRIMITIVE_INT"},"readOnly":true,"writable":false,"definitelyWritable":false,"columnClassName":"java.lang.Integer"},{"ordinal":1,"autoIncrement":false,"caseSensitive":false,"searchable":true,"currency":false,"nullable":0,"signed":true,"displaySize":40,"label":"COL2","columnName":"COL2","schemaName":"","precision":0,"scale":0,"tableName":"TABLE","catalogName":"","type":{"type":"scalar","id":4,"name":"INTEGER","rep":"PRIMITIVE_INT"},"readOnly":true,"writable":false,"definitelyWritable":false,"columnClassName":"java.lang.Integer"}],"sql":null,"parameters":[],"cursorFactory":{"style":"LIST","clazz":null,"fieldNames":null},"statementType":null},*"firstFrame":{"offset":0,"done":true,"rows":[]*},"updateCount":-1,"rpcMetadata":{"response":"rpcMetadata","serverAddress":"ip-10-55-6-247:8765"}}]}
>      >      >
>      >      >      >
>      >      >      >
>      >      >      > The same query issued against a PQS started with
>     PROTOBUF
>      >      >     serialization
>      >      >      > and using a newer phoenixdb adapter returns the correct
>      >     number of
>      >      >     rows.
>      >      >      >
>      >      >      > Has anyone had luck making this work?
>      >      >      >
>      >      >      > Thanks,
>      >      >      > Manoj
>      >      >      >
>      >      >
>      >
> 

Re: Python phoenixdb adapter and JSON serialization on PQS

Posted by Manoj Ganesan <mg...@23andme.com>.
Thanks for the pointers Josh.

Here’s a set of instructions to demonstrate the performance issues I’m
seeing when using protobuf serialization. Hopefully these are comprehensive
enough to reproduce the issue. If not, please let me know and I’m happy to
provide clarifications.

Basic outline of the instructions:
1. Create and populate a test table
2. Use the python adapter to fetch rows
3. JSON serialization

1. Create and populate a test table

1.1. Create a table:

$> /usr/lib/phoenix/bin/sqlline.py localhost
CREATE TABLE test_perf (ind_id integer not null, a tinyint , b tinyint , c
varchar(12), constraint pk primary key (ind_id));

1.2. I wrote this simple python script to populate the table with about 6
million rows:

# data_generator.py
import string
import csv
import random

for i in range(0, 6000000):
    print ','.join([str(i), str(random.randint(-127, 127)),
str(random.randint(-127, 127)),
''.join(random.choice(string.ascii_uppercase + string.digits) for _ in
range(12))])

# run the script
$> python data_generator.py > in.csv

1.3. Import rows into the table

$> /opt/phoenix/bin/psql.py -t TEST_PERF <cluster_address> in.csv

2. Use the python adapter to fetch rows

2.1. Install the Python adapter

The instructions are detailed on the project README (
https://github.com/apache/phoenix/tree/master/python/phoenixdb).

pip install phoenixdb

2.2. Write a script to fetch rows

import phoenixdb
from time import time

database_url = '<your_pqs_url>'
sql = 'select ind_id, a, b, c from test_perf'

conn = phoenixdb.connect(database_url, autocommit=True)

print 'connecting to %s' % database_url
cursor = conn.cursor()
cursor.itersize = -2  # to fetch all rows at once
s = time()
print 'executing query %s' % sql
cursor.execute(sql)
print 'time to execute = %s' % (time() - s)

s = time()
print 'fetching results'
results = cursor.fetchall()
print 'time to fetchall = %s' % (time() - s)

2.3. Execute the fetch script

When I execute the above script to fetch all rows at once (i.e., using
cursor.itersize = -2), PQS crashes and returns an error:

Traceback (most recent call last):
  File "queries/protobuf_client.py", line 12, in <module>
    cursor = conn.cursor()
  File
"/Users/mganesan/ttam/local-scripts/venv/lib/python2.7/site-packages/phoenixdb/cursor.py",
line 201, in execute
    operation, first_frame_max_size=self.itersize)
  File
"/Users/mganesan/ttam/local-scripts/venv/lib/python2.7/site-packages/phoenixdb/avatica.py",
line 405, in prepare_and_execute
    response_data = self._apply(request, 'ExecuteResponse')
  File
"/Users/mganesan/ttam/local-scripts/venv/lib/python2.7/site-packages/phoenixdb/avatica.py",
line 213, in _apply
    parse_error_page(response_body)
  File
"/Users/mganesan/ttam/local-scripts/venv/lib/python2.7/site-packages/phoenixdb/avatica.py",
line 117, in parse_error_page
    raise errors.InternalError(message)
phoenixdb.errors.InternalError: ('GC overhead limit exceeded', None, None,
None)

If I instead fetch 2000 rows at a time (i.e., cursor.itersize = 2000, which is
also the default), the fetch finishes in 360s.

3. JSON serialization

One of the issues I mentioned in my previous email was that JSON
serialization does not seem to be supported via the Python client, even though I
was making the queries as per the Avatica JSON reference (
https://calcite.apache.org/avatica/docs/json_reference.html). The client I
wrote always received 0 rows, but no errors. On digging through the source
a bit, I realized that with the Avatica version being used within Phoenix (the
latest?), 2 new arguments are expected with the 'prepareAndExecute' method:
'maxRowsTotal' and 'maxRowsInFirstFrame'. Essentially the request should
look like the following (note the last 2 args):
{
  'request': 'prepareAndExecute',
  'sql': 'select ind_id, a, b, c from test_perf',
  'connectionId': '2747c631-440e-405b-8185-44d1cc7f266c',
  'statementId': 1,
  'maxRowCount': -2,
  'maxRowsTotal': -2,
  'maxRowsInFirstFrame': -2,
}
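
For illustration, here is a rough sketch of issuing such a request directly
against PQS over HTTP. It assumes the requests library, a placeholder endpoint,
and the surrounding openConnection/createStatement/close calls as I read them
from the Avatica JSON reference; this is not the phoenixdb driver's own code,
just a minimal assumed client:

import json
import uuid
import requests  # assumed HTTP client; any other would do

pqs_url = 'http://localhost:8765/'  # placeholder PQS endpoint
headers = {'content-type': 'application/json'}

def post(body):
    # every Avatica JSON call is a POST of a single JSON object to PQS
    return requests.post(pqs_url, data=json.dumps(body), headers=headers).json()

connection_id = str(uuid.uuid4())
post({'request': 'openConnection', 'connectionId': connection_id})
statement = post({'request': 'createStatement', 'connectionId': connection_id})

response = post({
    'request': 'prepareAndExecute',
    'connectionId': connection_id,
    'statementId': statement['statementId'],
    'sql': 'select ind_id, a, b, c from test_perf',
    'maxRowCount': -2,
    'maxRowsTotal': -2,
    'maxRowsInFirstFrame': -2,  # without these last two args the first frame came back empty
})
print(json.dumps(response['results'][0]['firstFrame'], indent=2))

post({'request': 'closeStatement', 'connectionId': connection_id,
      'statementId': statement['statementId']})
post({'request': 'closeConnection', 'connectionId': connection_id})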

"sqlline-thin.py -s JSON" works because the Avatica client driver that
Phoenix/sqlline uses makes the right HTTP calls, with the correct args.

The response time using JSON serialization and fetching *all* rows
(cursor.itersize = -2) was 60s, which is much faster than the 360s it
takes with protobuf when passing cursor.itersize=2000 (note that protobuf did
not work when fetching all rows at once, i.e., cursor.itersize=-2).

Because of these issues, for now we have switched PQS to using JSON
serialization and updated our clients to use the same. We’re obviously very
much interested in understanding how the protobuf path can be made faster.

Thanks again for the help!
Manoj

On Tue, Nov 6, 2018 at 9:51 AM Josh Elser <el...@apache.org> wrote:

>
>
> On 11/5/18 10:10 PM, Manoj Ganesan wrote:
> > Thanks for the pointers Josh. I'm working on getting a representative
> > concise test to demonstrate the issue.
> >
> > Meanwhile, I had one question regarding the following:
> >
> >     You are right that the operations in PQS should be exactly the same,
> >     regardless of the client you're using -- that is how this
> >     architecture works.
> >
> >
> > IIUC, this means the following 2 methods should yield the same result:
> >
> >  1. sqlline-thin.py -s JSON <query_file>
> >  2. using a python avatica client script making JSON requests
>
> That's correct. Any client which speaks to PQS should see the same
> results. There may be bugs in the client implementation, of course,
> which make this statement false.
>
> > I made the following change in hbase-site.xml on the PQS host:
> >
> > <property>
> >      <name>phoenix.queryserver.serialization</name>
> >      <value>JSON</value>
> > </property>
> >
> > I notice that executing "sqlline-thin.py -s JSON <query_file>" returns
> > results just fine. However, when I use a simple script to try the same
> > query, it returns 0 rows. I'm attaching the Python script here. The
> > script essentially makes HTTP calls using the Avatica JSON reference
> > <https://calcite.apache.org/avatica/docs/json_reference.html>. I
> assumed
> > that the sqlline-thin wrapper (when passed the -s JSON flag) also make
> > HTTP calls based on the JSON reference, is that not correct?
>
> Apache mailing lists strip attachments. Please consider hosting it
> somewhere else, along with instructions/scripts to generate the required
> tables. Please provide some more analysis of the problem than just a
> summarization of what you see as an end-user -- I don't have the cycles
> or interest to debug the entire system for you :)
>
> Avatica is a protocol that interprets JDBC using some serialization
> (JSON or Protobuf today) and a transport (only HTTP) to a remote server
> to run the JDBC oeprations. So, yes: an Avatica client is always using
> HTTP, given whatever serialization you instruct it to use.
>
> > I'll work on getting some test cases here soon to illustrate this as
> > well as the performance problem.
> >
> > Thanks again!
> > Manoj
> >
> > On Mon, Nov 5, 2018 at 10:43 AM Josh Elser <elserj@apache.org
> > <ma...@apache.org>> wrote:
> >
> >     Is the OOME issue regardless of using the Java client (sqlline-thin)
> >     and
> >     the Python client? I would like to know more about this one. If you
> can
> >     share something that reproduces the problem for you, I'd like to look
> >     into it. The only suggestion I have at this point in time is to make
> >     sure you set a reasonable max-heap size in hbase-env.sh (e.g. -Xmx)
> via
> >     PHOENIX_QUERYSERVER_OPTS and have HBASE_CONF_DIR pointing to the
> right
> >     directory when you launch PQS.
> >
> >     Regarding performance, as you've described it, it sounds like the
> >     Python
> >     driver is just slower than the Java driver. You are right that the
> >     operations in PQS should be exactly the same, regardless of the
> client
> >     you're using -- that is how this architecture works. Avatica is a
> wire
> >     protocol that all clients use to talk to PQS. More
> digging/information
> >     you can provide about the exact circumstances (and, again,
> >     steps/environment to reproduce what you see) would be extremely
> helpful.
> >
> >     Thanks Manoj.
> >
> >     - Josh
> >
> >     On 11/2/18 7:16 PM, Manoj Ganesan wrote:
> >      > Thanks Josh for the response!
> >      >
> >      > I would definitely like to use protobuf serialization, but I'm
> >     observing
> >      > performance issues trying to run queries with a large number of
> >     results.
> >      > One problem is that I observe PQS runs out of memory, when its
> >     trying to
> >      > (what looks like to me) serialize the results in Avatica. The
> >     other is
> >      > that the phoenixdb python adapter itself spends a large amount of
> >     time
> >      > in the logic
> >      >
> >     <
> https://github.com/apache/phoenix/blob/master/python/phoenixdb/phoenixdb/cursor.py#L248
> >
> >
> >      > where its converting the protobuf rows to python objects.
> >      >
> >      > Interestingly when we use sqlline-thin.py instead of python
> >     phoenixdb,
> >      > the protobuf serialization works fine and responses are fast.
> >     It's not
> >      > clear to me why PQS would have problems when using the python
> >     adapter
> >      > and not when using sqlline-thin, do they follow different code
> paths
> >      > (especially around serialization)?
> >      >
> >      > Thanks again,
> >      > Manoj
> >      >
> >      > On Fri, Nov 2, 2018 at 4:05 PM Josh Elser <elserj@apache.org
> >     <ma...@apache.org>
> >      > <mailto:elserj@apache.org <ma...@apache.org>>> wrote:
> >      >
> >      >     I would strongly suggest you do not use the JSON
> serialization.
> >      >
> >      >     The JSON support is implemented via Jackson which has no
> >     means to make
> >      >     backwards compatibility "easy". On the contrast, protobuf
> >     makes this
> >      >     extremely easy and we have multiple examples over the past
> >     years where
> >      >     we've been able to fix bugs in a backwards compatible manner.
> >      >
> >      >     If you want the thin client to continue to work across
> >     versions, stick
> >      >     with protobuf.
> >      >
> >      >     On 11/2/18 5:27 PM, Manoj Ganesan wrote:
> >      >      > Hey everyone,
> >      >      >
> >      >      > I'm trying to use the Python phoenixdb adapter work with
> JSON
> >      >      > serialization on PQS.
> >      >      >
> >      >      > I'm using Phoenix 4.14 and the adapter works fine with
> >     protobuf, but
> >      >      > when I try making it work with an older version of
> phoenixdb
> >      >     (before the
> >      >      > JSON to protobuf switch was introduced), it just returns 0
> >     rows.
> >      >     I don't
> >      >      > see anything in particular wrong with the HTTP requests
> >     itself,
> >      >     and they
> >      >      > seem to conform to the Avatica JSON spec
> >      >      > (
> http://calcite.apache.org/avatica/docs/json_reference.html).
> >      >      >
> >      >      > Here's the result (with some debug statements) that
> >     returns 0 rows.
> >      >      > Notice the
> >     *"firstFrame":{"offset":0,"done":true,"rows":[]* below:
> >      >      >
> >      >      > request body =  {"maxRowCount": -2, "connectionId":
> >      >      > "68c05d12-5770-47d6-b3e4-dba556db4790", "request":
> >      >     "prepareAndExecute",
> >      >      > "statementId": 3, "sql": "SELECT col1, col2 from table
> >     limit 20"}
> >      >      > request headers =  {'content-type': 'application/json'}
> >      >      > _post_request: got response {'fp': <socket._fileobject
> >     object at
> >      >      > 0x7f858330b9d0>, 'status': 200, 'will_close': False,
> >     'chunk_left':
> >      >      > 'UNKNOWN', 'length': 1395, 'strict': 0, 'reason': 'OK',
> >      >     'version': 11,
> >      >      > 'debuglevel': 0, 'msg': <httplib.HTTPMessage instance at
> >      >      > 0x7f84fb50be18>, 'chunked': 0, '_method': 'POST'}
> >      >      > response.read(): body =
> >      >      >
> >      >
> >
>  {"response":"executeResults","missingStatement":false,"rpcMetadata":{"response":"rpcMetadata","serverAddress":"ip-10-55-6-247:8765"},"results":[{"response":"resultSet","connectionId":"68c05d12-5770-47d6-b3e4-dba556db4790","statementId":3,"ownStatement":true,"signature":{"columns":[{"ordinal":0,"autoIncrement":false,"caseSensitive":false,"searchable":true,"currency":false,"nullable
> >      >      >
> >      >
> >
>  ":0,"signed":true,"displaySize":40,"label":"COL1","columnName":"COL1","schemaName":"","precision":0,"scale":0,"tableName":"TABLE","catalogName":"","type":{"type":"scalar","id":4,"name":"INTEGER","rep":"PRIMITIVE_INT"},"readOnly":true,"writable":false,"definitelyWritable":false,"columnClassName":"java.lang.Integer"},{"ordinal":1,"autoIncrement":false,"caseSensitive":false,"searchable":true,"currency":false,"nullable":0,"signed":true,"displaySize":40,"label":"COL2","columnName":"COL2","schemaName":"","precision":0,"scale":0,"tableName":"TABLE","catalogName":"","type":{"type":"scalar","id":4,"name":"INTEGER","rep":"PRIMITIVE_INT"},"readOnly":true,"writable":false,"definitelyWritable":false,"columnClassName":"java.lang.Integer"}],"sql":null,"parameters":[],"cursorFactory":{"style":"LIST","clazz":null,"fieldNames":null},"statementType":null},*"firstFrame":{"offset":0,"done":true,"rows":[]*},"updateCount":-1,"rpcMetadata":{"response":"rpcMetadata","serverAddress":"ip-10-55-6-247:8765"}}]}
> >      >
> >      >      >
> >      >      >
> >      >      > The same query issued against a PQS started with PROTOBUF
> >      >     serialization
> >      >      > and using a newer phoenixdb adapter returns the correct
> >     number of
> >      >     rows.
> >      >      >
> >      >      > Has anyone had luck making this work?
> >      >      >
> >      >      > Thanks,
> >      >      > Manoj
> >      >      >
> >      >
> >
>

Re: Python phoenixdb adapter and JSON serialization on PQS

Posted by Josh Elser <el...@apache.org>.

On 11/5/18 10:10 PM, Manoj Ganesan wrote:
> Thanks for the pointers Josh. I'm working on getting a representative 
> concise test to demonstrate the issue.
> 
> Meanwhile, I had one question regarding the following:
> 
>     You are right that the operations in PQS should be exactly the same,
>     regardless of the client you're using -- that is how this
>     architecture works.
> 
> 
> IIUC, this means the following 2 methods should yield the same result:
> 
>  1. sqlline-thin.py -s JSON <query_file>
>  2. using a python avatica client script making JSON requests

That's correct. Any client which speaks to PQS should see the same 
results. There may be bugs in the client implementation, of course, 
which make this statement false.

> I made the following change in hbase-site.xml on the PQS host:
> 
> <property>
>      <name>phoenix.queryserver.serialization</name>
>      <value>JSON</value>
> </property>
> 
> I notice that executing "sqlline-thin.py -s JSON <query_file>" returns 
> results just fine. However, when I use a simple script to try the same 
> query, it returns 0 rows. I'm attaching the Python script here. The 
> script essentially makes HTTP calls using the Avatica JSON reference 
> <https://calcite.apache.org/avatica/docs/json_reference.html>. I assumed 
> that the sqlline-thin wrapper (when passed the -s JSON flag) also make 
> HTTP calls based on the JSON reference, is that not correct?

Apache mailing lists strip attachments. Please consider hosting it 
somewhere else, along with instructions/scripts to generate the required 
tables. Please provide some more analysis of the problem than just a 
summarization of what you see as an end-user -- I don't have the cycles 
or interest to debug the entire system for you :)

Avatica is a protocol that interprets JDBC using some serialization
(JSON or Protobuf today) and a transport (only HTTP) to a remote server
to run the JDBC operations. So, yes: an Avatica client is always using
HTTP, with whatever serialization you instruct it to use.
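
As a rough illustration of that layering (the mapping of DB-API calls to
Avatica RPC names in the comments below is my own annotation, and the URL is
a placeholder):

import phoenixdb

# Each DB-API call is carried out as one or more Avatica RPCs, POSTed over HTTP
# to PQS using whichever serialization (protobuf or JSON) the server is configured for.
conn = phoenixdb.connect('http://localhost:8765/', autocommit=True)  # roughly: openConnection
cursor = conn.cursor()                                               # roughly: createStatement
cursor.execute('select ind_id, a, b, c from test_perf')              # roughly: prepareAndExecute
rows = cursor.fetchall()                                             # roughly: fetch, until the frame is "done"
cursor.close()                                                       # roughly: closeStatement
conn.close()                                                         # roughly: closeConnection
print('%d rows' % len(rows))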

> I'll work on getting some test cases here soon to illustrate this as 
> well as the performance problem.
> 
> Thanks again!
> Manoj
> 
> On Mon, Nov 5, 2018 at 10:43 AM Josh Elser <elserj@apache.org 
> <ma...@apache.org>> wrote:
> 
>     Is the OOME issue regardless of using the Java client (sqlline-thin)
>     and
>     the Python client? I would like to know more about this one. If you can
>     share something that reproduces the problem for you, I'd like to look
>     into it. The only suggestion I have at this point in time is to make
>     sure you set a reasonable max-heap size in hbase-env.sh (e.g. -Xmx) via
>     PHOENIX_QUERYSERVER_OPTS and have HBASE_CONF_DIR pointing to the right
>     directory when you launch PQS.
> 
>     Regarding performance, as you've described it, it sounds like the
>     Python
>     driver is just slower than the Java driver. You are right that the
>     operations in PQS should be exactly the same, regardless of the client
>     you're using -- that is how this architecture works. Avatica is a wire
>     protocol that all clients use to talk to PQS. More digging/information
>     you can provide about the exact circumstances (and, again,
>     steps/environment to reproduce what you see) would be extremely helpful.
> 
>     Thanks Manoj.
> 
>     - Josh
> 
>     On 11/2/18 7:16 PM, Manoj Ganesan wrote:
>      > Thanks Josh for the response!
>      >
>      > I would definitely like to use protobuf serialization, but I'm
>     observing
>      > performance issues trying to run queries with a large number of
>     results.
>      > One problem is that I observe PQS runs out of memory, when its
>     trying to
>      > (what looks like to me) serialize the results in Avatica. The
>     other is
>      > that the phoenixdb python adapter itself spends a large amount of
>     time
>      > in the logic
>      >
>     <https://github.com/apache/phoenix/blob/master/python/phoenixdb/phoenixdb/cursor.py#L248>
> 
>      > where its converting the protobuf rows to python objects.
>      >
>      > Interestingly when we use sqlline-thin.py instead of python
>     phoenixdb,
>      > the protobuf serialization works fine and responses are fast.
>     It's not
>      > clear to me why PQS would have problems when using the python
>     adapter
>      > and not when using sqlline-thin, do they follow different code paths
>      > (especially around serialization)?
>      >
>      > Thanks again,
>      > Manoj
>      >
>      > On Fri, Nov 2, 2018 at 4:05 PM Josh Elser <elserj@apache.org
>     <ma...@apache.org>
>      > <mailto:elserj@apache.org <ma...@apache.org>>> wrote:
>      >
>      >     I would strongly suggest you do not use the JSON serialization.
>      >
>      >     The JSON support is implemented via Jackson which has no
>     means to make
>      >     backwards compatibility "easy". On the contrast, protobuf
>     makes this
>      >     extremely easy and we have multiple examples over the past
>     years where
>      >     we've been able to fix bugs in a backwards compatible manner.
>      >
>      >     If you want the thin client to continue to work across
>     versions, stick
>      >     with protobuf.
>      >
>      >     On 11/2/18 5:27 PM, Manoj Ganesan wrote:
>      >      > Hey everyone,
>      >      >
>      >      > I'm trying to use the Python phoenixdb adapter work with JSON
>      >      > serialization on PQS.
>      >      >
>      >      > I'm using Phoenix 4.14 and the adapter works fine with
>     protobuf, but
>      >      > when I try making it work with an older version of phoenixdb
>      >     (before the
>      >      > JSON to protobuf switch was introduced), it just returns 0
>     rows.
>      >     I don't
>      >      > see anything in particular wrong with the HTTP requests
>     itself,
>      >     and they
>      >      > seem to conform to the Avatica JSON spec
>      >      > (http://calcite.apache.org/avatica/docs/json_reference.html).
>      >      >
>      >      > Here's the result (with some debug statements) that
>     returns 0 rows.
>      >      > Notice the
>     *"firstFrame":{"offset":0,"done":true,"rows":[]* below:
>      >      >
>      >      > request body =  {"maxRowCount": -2, "connectionId":
>      >      > "68c05d12-5770-47d6-b3e4-dba556db4790", "request":
>      >     "prepareAndExecute",
>      >      > "statementId": 3, "sql": "SELECT col1, col2 from table
>     limit 20"}
>      >      > request headers =  {'content-type': 'application/json'}
>      >      > _post_request: got response {'fp': <socket._fileobject
>     object at
>      >      > 0x7f858330b9d0>, 'status': 200, 'will_close': False,
>     'chunk_left':
>      >      > 'UNKNOWN', 'length': 1395, 'strict': 0, 'reason': 'OK',
>      >     'version': 11,
>      >      > 'debuglevel': 0, 'msg': <httplib.HTTPMessage instance at
>      >      > 0x7f84fb50be18>, 'chunked': 0, '_method': 'POST'}
>      >      > response.read(): body =
>      >      >
>      >   
>       {"response":"executeResults","missingStatement":false,"rpcMetadata":{"response":"rpcMetadata","serverAddress":"ip-10-55-6-247:8765"},"results":[{"response":"resultSet","connectionId":"68c05d12-5770-47d6-b3e4-dba556db4790","statementId":3,"ownStatement":true,"signature":{"columns":[{"ordinal":0,"autoIncrement":false,"caseSensitive":false,"searchable":true,"currency":false,"nullable
>      >      >
>      >   
>       ":0,"signed":true,"displaySize":40,"label":"COL1","columnName":"COL1","schemaName":"","precision":0,"scale":0,"tableName":"TABLE","catalogName":"","type":{"type":"scalar","id":4,"name":"INTEGER","rep":"PRIMITIVE_INT"},"readOnly":true,"writable":false,"definitelyWritable":false,"columnClassName":"java.lang.Integer"},{"ordinal":1,"autoIncrement":false,"caseSensitive":false,"searchable":true,"currency":false,"nullable":0,"signed":true,"displaySize":40,"label":"COL2","columnName":"COL2","schemaName":"","precision":0,"scale":0,"tableName":"TABLE","catalogName":"","type":{"type":"scalar","id":4,"name":"INTEGER","rep":"PRIMITIVE_INT"},"readOnly":true,"writable":false,"definitelyWritable":false,"columnClassName":"java.lang.Integer"}],"sql":null,"parameters":[],"cursorFactory":{"style":"LIST","clazz":null,"fieldNames":null},"statementType":null},*"firstFrame":{"offset":0,"done":true,"rows":[]*},"updateCount":-1,"rpcMetadata":{"response":"rpcMetadata","serverAddress":"ip-10-55-6-247:8765"}}]}
>      >
>      >      >
>      >      >
>      >      > The same query issued against a PQS started with PROTOBUF
>      >     serialization
>      >      > and using a newer phoenixdb adapter returns the correct
>     number of
>      >     rows.
>      >      >
>      >      > Has anyone had luck making this work?
>      >      >
>      >      > Thanks,
>      >      > Manoj
>      >      >
>      >
> 

Re: Python phoenixdb adapter and JSON serialization on PQS

Posted by Manoj Ganesan <mg...@23andme.com>.
Thanks for the pointers, Josh. I'm working on getting a concise,
representative test to demonstrate the issue.

Meanwhile, I had one question regarding the following:

> You are right that the operations in PQS should be exactly the same,
> regardless of the client you're using -- that is how this architecture
> works.


IIUC, this means the following 2 methods should yield the same result:

   1. sqlline-thin.py -s JSON <query_file>
   2. using a python avatica client script making JSON requests

I made the following change in hbase-site.xml on the PQS host:

<property>
    <name>phoenix.queryserver.serialization</name>
    <value>JSON</value>
</property>

I notice that executing "sqlline-thin.py -s JSON <query_file>" returns
results just fine. However, when I use a simple script to try the same
query, it returns 0 rows. I'm attaching the Python script here. The script
essentially makes HTTP calls using the Avatica JSON reference
<https://calcite.apache.org/avatica/docs/json_reference.html>. I assumed
that the sqlline-thin wrapper (when passed the -s JSON flag) also makes HTTP
calls based on the JSON reference; is that not correct?
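
Since the list strips attachments, here is a rough sketch of that kind of 
script rather than the original attachment. It only mirrors the request 
shapes from the Avatica JSON reference and the debug output earlier in this 
thread; the PQS URL, the table name, and the use of the requests library 
are assumptions:

import json
import uuid

import requests  # any HTTP client would do; requests is assumed here

PQS_URL = 'http://localhost:8765/'
HEADERS = {'content-type': 'application/json'}

def post(payload):
    # Every Avatica JSON call is a single JSON object POSTed over HTTP.
    response = requests.post(PQS_URL, data=json.dumps(payload), headers=HEADERS)
    response.raise_for_status()
    return response.json()

connection_id = str(uuid.uuid4())
post({'request': 'openConnection', 'connectionId': connection_id})

statement_id = post({'request': 'createStatement',
                     'connectionId': connection_id})['statementId']

result = post({
    'request': 'prepareAndExecute',
    'connectionId': connection_id,
    'statementId': statement_id,
    'sql': 'SELECT col1, col2 FROM my_table LIMIT 20',
    'maxRowCount': -1,  # sentinel semantics vary across Avatica versions
})
frame = result['results'][0]['firstFrame']
print(len(frame['rows']), 'rows in first frame, done =', frame['done'])

post({'request': 'closeStatement', 'connectionId': connection_id,
      'statementId': statement_id})
post({'request': 'closeConnection', 'connectionId': connection_id})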

I'll work on getting some test cases here soon to illustrate this as well
as the performance problem.

Thanks again!
Manoj

On Mon, Nov 5, 2018 at 10:43 AM Josh Elser <el...@apache.org> wrote:

> Is the OOME issue regardless of using the Java client (sqlline-thin) and
> the Python client? I would like to know more about this one. If you can
> share something that reproduces the problem for you, I'd like to look
> into it. The only suggestion I have at this point in time is to make
> sure you set a reasonable max-heap size in hbase-env.sh (e.g. -Xmx) via
> PHOENIX_QUERYSERVER_OPTS and have HBASE_CONF_DIR pointing to the right
> directory when you launch PQS.
>
> Regarding performance, as you've described it, it sounds like the Python
> driver is just slower than the Java driver. You are right that the
> operations in PQS should be exactly the same, regardless of the client
> you're using -- that is how this architecture works. Avatica is a wire
> protocol that all clients use to talk to PQS. More digging/information
> you can provide about the exact circumstances (and, again,
> steps/environment to reproduce what you see) would be extremely helpful.
>
> Thanks Manoj.
>
> - Josh
>
> On 11/2/18 7:16 PM, Manoj Ganesan wrote:
> > Thanks Josh for the response!
> >
> > I would definitely like to use protobuf serialization, but I'm observing
> > performance issues trying to run queries with a large number of results.
> > One problem is that I observe PQS runs out of memory, when its trying to
> > (what looks like to me) serialize the results in Avatica. The other is
> > that the phoenixdb python adapter itself spends a large amount of time
> > in the logic
> > <
> https://github.com/apache/phoenix/blob/master/python/phoenixdb/phoenixdb/cursor.py#L248>
>
> > where its converting the protobuf rows to python objects.
> >
> > Interestingly when we use sqlline-thin.py instead of python phoenixdb,
> > the protobuf serialization works fine and responses are fast. It's not
> > clear to me why PQS would have problems when using the python adapter
> > and not when using sqlline-thin, do they follow different code paths
> > (especially around serialization)?
> >
> > Thanks again,
> > Manoj
> >
> > On Fri, Nov 2, 2018 at 4:05 PM Josh Elser <elserj@apache.org
> > <ma...@apache.org>> wrote:
> >
> >     I would strongly suggest you do not use the JSON serialization.
> >
> >     The JSON support is implemented via Jackson which has no means to
> make
> >     backwards compatibility "easy". On the contrast, protobuf makes this
> >     extremely easy and we have multiple examples over the past years
> where
> >     we've been able to fix bugs in a backwards compatible manner.
> >
> >     If you want the thin client to continue to work across versions,
> stick
> >     with protobuf.
> >
> >     On 11/2/18 5:27 PM, Manoj Ganesan wrote:
> >      > Hey everyone,
> >      >
> >      > I'm trying to use the Python phoenixdb adapter work with JSON
> >      > serialization on PQS.
> >      >
> >      > I'm using Phoenix 4.14 and the adapter works fine with protobuf,
> but
> >      > when I try making it work with an older version of phoenixdb
> >     (before the
> >      > JSON to protobuf switch was introduced), it just returns 0 rows.
> >     I don't
> >      > see anything in particular wrong with the HTTP requests itself,
> >     and they
> >      > seem to conform to the Avatica JSON spec
> >      > (http://calcite.apache.org/avatica/docs/json_reference.html).
> >      >
> >      > Here's the result (with some debug statements) that returns 0
> rows.
> >      > Notice the *"firstFrame":{"offset":0,"done":true,"rows":[]* below:
> >      >
> >      > request body =  {"maxRowCount": -2, "connectionId":
> >      > "68c05d12-5770-47d6-b3e4-dba556db4790", "request":
> >     "prepareAndExecute",
> >      > "statementId": 3, "sql": "SELECT col1, col2 from table limit 20"}
> >      > request headers =  {'content-type': 'application/json'}
> >      > _post_request: got response {'fp': <socket._fileobject object at
> >      > 0x7f858330b9d0>, 'status': 200, 'will_close': False, 'chunk_left':
> >      > 'UNKNOWN', 'length': 1395, 'strict': 0, 'reason': 'OK',
> >     'version': 11,
> >      > 'debuglevel': 0, 'msg': <httplib.HTTPMessage instance at
> >      > 0x7f84fb50be18>, 'chunked': 0, '_method': 'POST'}
> >      > response.read(): body =
> >      >
> >
>  {"response":"executeResults","missingStatement":false,"rpcMetadata":{"response":"rpcMetadata","serverAddress":"ip-10-55-6-247:8765"},"results":[{"response":"resultSet","connectionId":"68c05d12-5770-47d6-b3e4-dba556db4790","statementId":3,"ownStatement":true,"signature":{"columns":[{"ordinal":0,"autoIncrement":false,"caseSensitive":false,"searchable":true,"currency":false,"nullable
> >      >
> >
>  ":0,"signed":true,"displaySize":40,"label":"COL1","columnName":"COL1","schemaName":"","precision":0,"scale":0,"tableName":"TABLE","catalogName":"","type":{"type":"scalar","id":4,"name":"INTEGER","rep":"PRIMITIVE_INT"},"readOnly":true,"writable":false,"definitelyWritable":false,"columnClassName":"java.lang.Integer"},{"ordinal":1,"autoIncrement":false,"caseSensitive":false,"searchable":true,"currency":false,"nullable":0,"signed":true,"displaySize":40,"label":"COL2","columnName":"COL2","schemaName":"","precision":0,"scale":0,"tableName":"TABLE","catalogName":"","type":{"type":"scalar","id":4,"name":"INTEGER","rep":"PRIMITIVE_INT"},"readOnly":true,"writable":false,"definitelyWritable":false,"columnClassName":"java.lang.Integer"}],"sql":null,"parameters":[],"cursorFactory":{"style":"LIST","clazz":null,"fieldNames":null},"statementType":null},*"firstFrame":{"offset":0,"done":true,"rows":[]*},"updateCount":-1,"rpcMetadata":{"response":"rpcMetadata","serverAddress":"ip-10-55-6-247:8765"}}]}
> >
> >      >
> >      >
> >      > The same query issued against a PQS started with PROTOBUF
> >     serialization
> >      > and using a newer phoenixdb adapter returns the correct number of
> >     rows.
> >      >
> >      > Has anyone had luck making this work?
> >      >
> >      > Thanks,
> >      > Manoj
> >      >
> >
>

Re: Python phoenixdb adapter and JSON serialization on PQS

Posted by Josh Elser <el...@apache.org>.
Does the OOME occur regardless of whether you use the Java client 
(sqlline-thin) or the Python client? I would like to know more about this 
one. If you can 
share something that reproduces the problem for you, I'd like to look 
into it. The only suggestion I have at this point in time is to make 
sure you set a reasonable max-heap size in hbase-env.sh (e.g. -Xmx) via 
PHOENIX_QUERYSERVER_OPTS and have HBASE_CONF_DIR pointing to the right 
directory when you launch PQS.

Regarding performance, as you've described it, it sounds like the Python 
driver is just slower than the Java driver. You are right that the 
operations in PQS should be exactly the same, regardless of the client 
you're using -- that is how this architecture works. Avatica is a wire 
protocol that all clients use to talk to PQS. More digging/information 
you can provide about the exact circumstances (and, again, 
steps/environment to reproduce what you see) would be extremely helpful.

Thanks Manoj.

- Josh

On 11/2/18 7:16 PM, Manoj Ganesan wrote:
> Thanks Josh for the response!
> 
> I would definitely like to use protobuf serialization, but I'm observing 
> performance issues trying to run queries with a large number of results. 
> One problem is that I observe PQS runs out of memory, when its trying to 
> (what looks like to me) serialize the results in Avatica. The other is 
> that the phoenixdb python adapter itself spends a large amount of time 
> in the logic 
> <https://github.com/apache/phoenix/blob/master/python/phoenixdb/phoenixdb/cursor.py#L248> 
> where its converting the protobuf rows to python objects.
> 
> Interestingly when we use sqlline-thin.py instead of python phoenixdb, 
> the protobuf serialization works fine and responses are fast. It's not 
> clear to me why PQS would have problems when using the python adapter 
> and not when using sqlline-thin, do they follow different code paths 
> (especially around serialization)?
> 
> Thanks again,
> Manoj
> 
> On Fri, Nov 2, 2018 at 4:05 PM Josh Elser <elserj@apache.org 
> <ma...@apache.org>> wrote:
> 
>     I would strongly suggest you do not use the JSON serialization.
> 
>     The JSON support is implemented via Jackson which has no means to make
>     backwards compatibility "easy". On the contrast, protobuf makes this
>     extremely easy and we have multiple examples over the past years where
>     we've been able to fix bugs in a backwards compatible manner.
> 
>     If you want the thin client to continue to work across versions, stick
>     with protobuf.
> 
>     On 11/2/18 5:27 PM, Manoj Ganesan wrote:
>      > Hey everyone,
>      >
>      > I'm trying to use the Python phoenixdb adapter work with JSON
>      > serialization on PQS.
>      >
>      > I'm using Phoenix 4.14 and the adapter works fine with protobuf, but
>      > when I try making it work with an older version of phoenixdb
>     (before the
>      > JSON to protobuf switch was introduced), it just returns 0 rows.
>     I don't
>      > see anything in particular wrong with the HTTP requests itself,
>     and they
>      > seem to conform to the Avatica JSON spec
>      > (http://calcite.apache.org/avatica/docs/json_reference.html).
>      >
>      > Here's the result (with some debug statements) that returns 0 rows.
>      > Notice the *"firstFrame":{"offset":0,"done":true,"rows":[]* below:
>      >
>      > request body =  {"maxRowCount": -2, "connectionId":
>      > "68c05d12-5770-47d6-b3e4-dba556db4790", "request":
>     "prepareAndExecute",
>      > "statementId": 3, "sql": "SELECT col1, col2 from table limit 20"}
>      > request headers =  {'content-type': 'application/json'}
>      > _post_request: got response {'fp': <socket._fileobject object at
>      > 0x7f858330b9d0>, 'status': 200, 'will_close': False, 'chunk_left':
>      > 'UNKNOWN', 'length': 1395, 'strict': 0, 'reason': 'OK',
>     'version': 11,
>      > 'debuglevel': 0, 'msg': <httplib.HTTPMessage instance at
>      > 0x7f84fb50be18>, 'chunked': 0, '_method': 'POST'}
>      > response.read(): body =
>      >
>     {"response":"executeResults","missingStatement":false,"rpcMetadata":{"response":"rpcMetadata","serverAddress":"ip-10-55-6-247:8765"},"results":[{"response":"resultSet","connectionId":"68c05d12-5770-47d6-b3e4-dba556db4790","statementId":3,"ownStatement":true,"signature":{"columns":[{"ordinal":0,"autoIncrement":false,"caseSensitive":false,"searchable":true,"currency":false,"nullable
>      >
>     ":0,"signed":true,"displaySize":40,"label":"COL1","columnName":"COL1","schemaName":"","precision":0,"scale":0,"tableName":"TABLE","catalogName":"","type":{"type":"scalar","id":4,"name":"INTEGER","rep":"PRIMITIVE_INT"},"readOnly":true,"writable":false,"definitelyWritable":false,"columnClassName":"java.lang.Integer"},{"ordinal":1,"autoIncrement":false,"caseSensitive":false,"searchable":true,"currency":false,"nullable":0,"signed":true,"displaySize":40,"label":"COL2","columnName":"COL2","schemaName":"","precision":0,"scale":0,"tableName":"TABLE","catalogName":"","type":{"type":"scalar","id":4,"name":"INTEGER","rep":"PRIMITIVE_INT"},"readOnly":true,"writable":false,"definitelyWritable":false,"columnClassName":"java.lang.Integer"}],"sql":null,"parameters":[],"cursorFactory":{"style":"LIST","clazz":null,"fieldNames":null},"statementType":null},*"firstFrame":{"offset":0,"done":true,"rows":[]*},"updateCount":-1,"rpcMetadata":{"response":"rpcMetadata","serverAddress":"ip-10-55-6-247:8765"}}]}
> 
>      >
>      >
>      > The same query issued against a PQS started with PROTOBUF
>     serialization
>      > and using a newer phoenixdb adapter returns the correct number of
>     rows.
>      >
>      > Has anyone had luck making this work?
>      >
>      > Thanks,
>      > Manoj
>      >
> 

Re: Python phoenixdb adapter and JSON serialization on PQS

Posted by Manoj Ganesan <mg...@23andme.com>.
Thanks Josh for the response!

I would definitely like to use protobuf serialization, but I'm observing
performance issues trying to run queries with a large number of results.
One problem is that I observe PQS running out of memory when, as far as I
can tell, it is serializing the results in Avatica. The other is that the
phoenixdb python adapter itself spends a large amount of time in the logic
<https://github.com/apache/phoenix/blob/master/python/phoenixdb/phoenixdb/cursor.py#L248>
where it's converting the protobuf rows to Python objects.
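
For what it's worth, one way to at least bound the client-side conversion 
and memory cost is to stream rows in batches through the standard DB-API 
fetchmany() instead of fetchall(). A sketch follows, with a placeholder 
connection URL, table, and batch size; whether this also eases the pressure 
on PQS depends on how the result frames are sized:

import phoenixdb

conn = phoenixdb.connect('http://localhost:8765/', autocommit=True)
cursor = conn.cursor()
cursor.execute('SELECT col1, col2 FROM my_table')

BATCH_SIZE = 2000  # placeholder; tune to the workload
while True:
    rows = cursor.fetchmany(BATCH_SIZE)
    if not rows:
        break
    for row in rows:
        pass  # process each row incrementally instead of materializing all rows

conn.close()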

Interestingly, when we use sqlline-thin.py instead of python phoenixdb, the
protobuf serialization works fine and responses are fast. It's not clear to
me why PQS would have problems when using the python adapter and not when
using sqlline-thin; do they follow different code paths (especially around
serialization)?

Thanks again,
Manoj

On Fri, Nov 2, 2018 at 4:05 PM Josh Elser <el...@apache.org> wrote:

> I would strongly suggest you do not use the JSON serialization.
>
> The JSON support is implemented via Jackson which has no means to make
> backwards compatibility "easy". On the contrast, protobuf makes this
> extremely easy and we have multiple examples over the past years where
> we've been able to fix bugs in a backwards compatible manner.
>
> If you want the thin client to continue to work across versions, stick
> with protobuf.
>
> On 11/2/18 5:27 PM, Manoj Ganesan wrote:
> > Hey everyone,
> >
> > I'm trying to use the Python phoenixdb adapter work with JSON
> > serialization on PQS.
> >
> > I'm using Phoenix 4.14 and the adapter works fine with protobuf, but
> > when I try making it work with an older version of phoenixdb (before the
> > JSON to protobuf switch was introduced), it just returns 0 rows. I don't
> > see anything in particular wrong with the HTTP requests itself, and they
> > seem to conform to the Avatica JSON spec
> > (http://calcite.apache.org/avatica/docs/json_reference.html).
> >
> > Here's the result (with some debug statements) that returns 0 rows.
> > Notice the *"firstFrame":{"offset":0,"done":true,"rows":[]* below:
> >
> > request body =  {"maxRowCount": -2, "connectionId":
> > "68c05d12-5770-47d6-b3e4-dba556db4790", "request": "prepareAndExecute",
> > "statementId": 3, "sql": "SELECT col1, col2 from table limit 20"}
> > request headers =  {'content-type': 'application/json'}
> > _post_request: got response {'fp': <socket._fileobject object at
> > 0x7f858330b9d0>, 'status': 200, 'will_close': False, 'chunk_left':
> > 'UNKNOWN', 'length': 1395, 'strict': 0, 'reason': 'OK', 'version': 11,
> > 'debuglevel': 0, 'msg': <httplib.HTTPMessage instance at
> > 0x7f84fb50be18>, 'chunked': 0, '_method': 'POST'}
> > response.read(): body =
> >
> {"response":"executeResults","missingStatement":false,"rpcMetadata":{"response":"rpcMetadata","serverAddress":"ip-10-55-6-247:8765"},"results":[{"response":"resultSet","connectionId":"68c05d12-5770-47d6-b3e4-dba556db4790","statementId":3,"ownStatement":true,"signature":{"columns":[{"ordinal":0,"autoIncrement":false,"caseSensitive":false,"searchable":true,"currency":false,"nullable
> >
> ":0,"signed":true,"displaySize":40,"label":"COL1","columnName":"COL1","schemaName":"","precision":0,"scale":0,"tableName":"TABLE","catalogName":"","type":{"type":"scalar","id":4,"name":"INTEGER","rep":"PRIMITIVE_INT"},"readOnly":true,"writable":false,"definitelyWritable":false,"columnClassName":"java.lang.Integer"},{"ordinal":1,"autoIncrement":false,"caseSensitive":false,"searchable":true,"currency":false,"nullable":0,"signed":true,"displaySize":40,"label":"COL2","columnName":"COL2","schemaName":"","precision":0,"scale":0,"tableName":"TABLE","catalogName":"","type":{"type":"scalar","id":4,"name":"INTEGER","rep":"PRIMITIVE_INT"},"readOnly":true,"writable":false,"definitelyWritable":false,"columnClassName":"java.lang.Integer"}],"sql":null,"parameters":[],"cursorFactory":{"style":"LIST","clazz":null,"fieldNames":null},"statementType":null},*"firstFrame":{"offset":0,"done":true,"rows":[]*},"updateCount":-1,"rpcMetadata":{"response":"rpcMetadata","serverAddress":"ip-10-55-6-247:8765"}}]}
>
> >
> >
> > The same query issued against a PQS started with PROTOBUF serialization
> > and using a newer phoenixdb adapter returns the correct number of rows.
> >
> > Has anyone had luck making this work?
> >
> > Thanks,
> > Manoj
> >
>

Re: Python phoenixdb adapter and JSON serialization on PQS

Posted by Josh Elser <el...@apache.org>.
I would strongly suggest you do not use the JSON serialization.

The JSON support is implemented via Jackson, which has no means to make 
backwards compatibility "easy". By contrast, protobuf makes this 
extremely easy, and we have multiple examples over the past years where 
we've been able to fix bugs in a backwards-compatible manner.

If you want the thin client to continue to work across versions, stick 
with protobuf.

On 11/2/18 5:27 PM, Manoj Ganesan wrote:
> Hey everyone,
> 
> I'm trying to use the Python phoenixdb adapter work with JSON 
> serialization on PQS.
> 
> I'm using Phoenix 4.14 and the adapter works fine with protobuf, but 
> when I try making it work with an older version of phoenixdb (before the 
> JSON to protobuf switch was introduced), it just returns 0 rows. I don't 
> see anything in particular wrong with the HTTP requests itself, and they 
> seem to conform to the Avatica JSON spec 
> (http://calcite.apache.org/avatica/docs/json_reference.html).
> 
> Here's the result (with some debug statements) that returns 0 rows. 
> Notice the *"firstFrame":{"offset":0,"done":true,"rows":[]* below:
> 
> request body =  {"maxRowCount": -2, "connectionId": 
> "68c05d12-5770-47d6-b3e4-dba556db4790", "request": "prepareAndExecute", 
> "statementId": 3, "sql": "SELECT col1, col2 from table limit 20"}
> request headers =  {'content-type': 'application/json'}
> _post_request: got response {'fp': <socket._fileobject object at 
> 0x7f858330b9d0>, 'status': 200, 'will_close': False, 'chunk_left': 
> 'UNKNOWN', 'length': 1395, 'strict': 0, 'reason': 'OK', 'version': 11, 
> 'debuglevel': 0, 'msg': <httplib.HTTPMessage instance at 
> 0x7f84fb50be18>, 'chunked': 0, '_method': 'POST'}
> response.read(): body =  
> {"response":"executeResults","missingStatement":false,"rpcMetadata":{"response":"rpcMetadata","serverAddress":"ip-10-55-6-247:8765"},"results":[{"response":"resultSet","connectionId":"68c05d12-5770-47d6-b3e4-dba556db4790","statementId":3,"ownStatement":true,"signature":{"columns":[{"ordinal":0,"autoIncrement":false,"caseSensitive":false,"searchable":true,"currency":false,"nullable
> ":0,"signed":true,"displaySize":40,"label":"COL1","columnName":"COL1","schemaName":"","precision":0,"scale":0,"tableName":"TABLE","catalogName":"","type":{"type":"scalar","id":4,"name":"INTEGER","rep":"PRIMITIVE_INT"},"readOnly":true,"writable":false,"definitelyWritable":false,"columnClassName":"java.lang.Integer"},{"ordinal":1,"autoIncrement":false,"caseSensitive":false,"searchable":true,"currency":false,"nullable":0,"signed":true,"displaySize":40,"label":"COL2","columnName":"COL2","schemaName":"","precision":0,"scale":0,"tableName":"TABLE","catalogName":"","type":{"type":"scalar","id":4,"name":"INTEGER","rep":"PRIMITIVE_INT"},"readOnly":true,"writable":false,"definitelyWritable":false,"columnClassName":"java.lang.Integer"}],"sql":null,"parameters":[],"cursorFactory":{"style":"LIST","clazz":null,"fieldNames":null},"statementType":null},*"firstFrame":{"offset":0,"done":true,"rows":[]*},"updateCount":-1,"rpcMetadata":{"response":"rpcMetadata","serverAddress":"ip-10-55-6-247:8765"}}]} 
> 
> 
> The same query issued against a PQS started with PROTOBUF serialization 
> and using a newer phoenixdb adapter returns the correct number of rows.
> 
> Has anyone had luck making this work?
> 
> Thanks,
> Manoj
>