You are viewing a plain text version of this content. The canonical link for it is here.

Posted to user@phoenix.apache.org by alchemist <al...@gmail.com> on 2018/07/11 11:53:08 UTC

Upsert is EXTREMELY slow

I have written a threaded program to batch upsert data into Phoenix.  I am
using Phoenix because of secondary index capability.  I am getting extremely
slow performance in write.

Explain query looks like following

0: jdbc:phoenix:localhost:2181:/hbase> EXPLAIN select count(1) from
VBQL_PHOENIX_TRANSCRIPT5
. . . . . . . . . . . . . . . . . . .> ;
+---------------------------------------------------------------------------------------------------------------+-----------------+------+
|                                                     PLAN                                                     
| EST_BYTES_READ  | EST_ |
+---------------------------------------------------------------------------------------------------------------+-----------------+------+
| CLIENT 100-CHUNK 6838539 ROWS 314572800 BYTES PARALLEL 100-WAY FULL SCAN
OVER VBQL_PHOENIX_TRANSCRIPT_INDEX5  | 314572800       | 6838 |
|     SERVER FILTER BY FIRST KEY ONLY                                                                          
| 314572800       | 6838 |
|     SERVER AGGREGATE INTO SINGLE ROW                                                                         
| 314572800       | 6838 |
+---------------------------------------------------------------------------------------------------------------+-----------------+------+

I have secondary index that looks like this:

0: jdbc:phoenix:localhost:2181:/hbase> EXPLAIN select count(1) from
VBQL_PHOENIX_TRANSCRIPT_INDEX5;
+---------------------------------------------------------------------------------------------------------------+-----------------+------+
|                                                     PLAN                                                     
| EST_BYTES_READ  | EST_ |
+---------------------------------------------------------------------------------------------------------------+-----------------+------+
| CLIENT 100-CHUNK 6838539 ROWS 314572800 BYTES PARALLEL 100-WAY FULL SCAN
OVER VBQL_PHOENIX_TRANSCRIPT_INDEX5  | 314572800       | 6838 |
|     SERVER FILTER BY FIRST KEY ONLY                                                                          
| 314572800       | 6838 |
|     SERVER AGGREGATE INTO SINGLE ROW                                                                         
| 314572800       | 6838 |
+---------------------------------------------------------------------------------------------------------------+-----------------+------+


Tuning Settings used in HBase.  Problem is it is very hard to scale, I tried
adding more nodes to Hbase cluster, and I also tried adding more threads to
the client program but it is not scaling beyond 6K per minutes, which is
VERY SLOW.  Any help is greatly appreciated.

 <property>
      <name>index.writer.threads.max</name>
      <value>30</value>
 </property>
 <property>
      <name>index.builder.threads.max</name>
      <value>30</value>
 </property>
 <property>
      <name>phoenix.query.threadPoolSize</name>
      <value>256</value>
 </property>
 <property>
      <name>index.builder.threads.keepalivetime</name>
      <value>90000</value>
 </property>
<property>
      <name>phoenix.query.timeoutMs</name>
      <value>90000</value>
 </property>

 Rest of the settings I am using are default.

Source Code  (Simple multithreded  with write need to be saved individually/
we cannot use batch  write)

 public void generateAndInsertDataToHbase(int iterationCount ){ 
   try {
	   	 int iterations =0;
	   	 Instant start = Instant.now();
	   	 ExecutorService executorService = Executors.newCachedThreadPool();
	      while (true) {
	    	if (iterations == iterationCount) {
	    		verifyRowCountAndExit(start,executorService);
	    	}
	    	Future<Void> future = executorService.submit(new Callable<Void>() {
	    	    public Void call() throws Exception {
	    	    	List<VbqlBean> vbqlList =
VbqlUtils.generateRandomAccountVbql(accountCategoryMap);
	    	    	if(vbqlList.size() >0) {
	    	    		addVbqlAccountToHbaseAsBatch( vbqlList,connStr);
	    	    	}
	    	        return null;
	    	    }
	    	});
	    	future.get(); 
	    	iterations ++;
	      }
   }catch(Exception e) {
       e.printStackTrace();
   }
 }
 

 public void addVbqlAccountToHbaseAsBatch(List<VbqlBean> vbqlList, String
connStr) throws Exception{
	 Connection connection = null;
	 try {
        connection = DBUtils.initSingleConnection(connStr);
       // connection.setAutoCommit(false);
        if(vbqlList.size() >0) {
	        for (VbqlBean vbqlBean : vbqlList) {
	        	DBUtils.executeSingleVbqlQuery( connStr,  vbqlBean, tableName);
	        }
        }
        connection.commit();
	 } finally {
        if (connection != null) try { connection.close(); } catch
(SQLException ignore) {}
    }
 }





--
Sent from: http://apache-phoenix-user-list.1124778.n5.nabble.com/

Re: Upsert is EXTREMELY slow

Posted by Josh Elser <el...@apache.org>.

Phoenix does not recommend connection pooling because Phoenix 
Connections are not expensive to create as most DB connections are.

The first connection you make from a JVM is expensive. Every subsequent 
one is cheap.

On 7/11/18 2:55 PM, alchemist wrote:
> Since Phoenix does not recommend connection pooling.  Then even if we have
> multiple threads and processes, each thread will take time to get connection
> to Phoenix server, execute upsert operation then commit the operation.  This
> whole process will take sometime.  And no matter what level of threading we
> have creating connection will take time.  Not sure if my understanding is
> correct.
> 
> 
> 
> --
> Sent from: http://apache-phoenix-user-list.1124778.n5.nabble.com/
>

Re: Upsert is EXTREMELY slow

Posted by alchemist <al...@gmail.com>.

Since Phoenix does not recommend connection pooling.  Then even if we have
multiple threads and processes, each thread will take time to get connection
to Phoenix server, execute upsert operation then commit the operation.  This
whole process will take sometime.  And no matter what level of threading we
have creating connection will take time.  Not sure if my understanding is
correct.



--
Sent from: http://apache-phoenix-user-list.1124778.n5.nabble.com/

Re: Upsert is EXTREMELY slow

Posted by alchemist <al...@gmail.com>.

Thanks Josh!  My use case is very simple,  I have data in S3 that is backed
by Hbase in EMR.  I need to access this data from outside EMR cluster, in
this case I created three EC2 nodes that executes multithreaded Java program
in parallel.  These multithreaded program from different EC2 nodes are
trying to get the connection using Phoenix, call upsert to insert data and
commit.

The issue is even though I have 3 nodes running multithreaded program that
inserts data into Hbase is not giving me 10K per minute transactions.



--
Sent from: http://apache-phoenix-user-list.1124778.n5.nabble.com/

Re: Upsert is EXTREMELY slow

Posted by alchemist <al...@gmail.com>.

Thanks so much for your response.

Now I am getting better perforamnce i.e 15K per minute,  I made two changes. 
I disabled pheonix transaction.

<property>
  <name>phoenix.transactions.enabled</name>
  <value>false</value>
</property>

And I removed the transaction connection.commit();  Logically this should
not make any difference because by default transactions are disabled.  



--
Sent from: http://apache-phoenix-user-list.1124778.n5.nabble.com/

Re: Upsert is EXTREMELY slow

Posted by Josh Elser <el...@apache.org>.

Sorry, I was brief and didn't get my point across. I meant to say the 
same thing you did.

Someone manually submitting two updates to an index is naively faster 
that what Phoenix goes through to automatically (and safely) do this.

On 7/13/18 12:07 PM, James Taylor wrote:
> Phoenix won’t be slower to update secondary indexes than a use case 
> would be. Both have to do the writes to a second table to keep it in sync.
> 
> On Fri, Jul 13, 2018 at 8:39 AM Josh Elser <elserj@apache.org 
> <ma...@apache.org>> wrote:
> 
>     Also, they're relying on Phoenix to do secondary index updates for them.
> 
>     Obviously, you can do this faster than Phoenix can if you know the
>     exact
>     use-case.
> 
>     On 7/12/18 6:31 PM, Pedro Boado wrote:
>      > A tip for performance is reusing the same preparedStatement , just
>      > clearParameters() , set values and executeUpdate() over and over
>     again.
>      > Don't close the statement or connections after each upsert. Also, I
>      > haven't seen any noticeable benefit on using jdbc batches as Phoenix
>      > controls batching by when commit() is called.
>      >
>      > Keep an eye on not calling commit after every executeUpdate
>     (that's a
>      > real performance killer) . Batch commits in every ~1k upserts .
>      >
>      > Also that attempt of asynchronous code is probably another
>     performance
>      > killer. Are you creating a new Runnable per database write and
>     opening
>      > and closing dB connections per write? Just spawn a few threads (5
>     to 10,
>      > if client cpu is not maxed keep increasing it) and send upserts
>     in a for
>      > loop reusing preparedStatement and connections.
>      >
>      > With a cluster that size I would expect seeing tens of thousands of
>      > writes per second.
>      >
>      > Finally have you checked that all RS receive same traffic ?
>      >
>      > On Thu, 12 Jul 2018, 23:10 Pedro Boado, <pedro.boado@gmail.com
>     <ma...@gmail.com>
>      > <mailto:pedro.boado@gmail.com <ma...@gmail.com>>> wrote:
>      >
>      >     I believe it's related to your client code - In our use case
>     we do
>      >     easily 15k writes/sec in a cluster lower specced than yours.
>      >
>      >     Check that your jdbc connection has autocommit off so Phoenix can
>      >     batch writes and that table has a reasonable
>     UPDATE_CACHE_FREQUENCY
>      >     ( more than 60000 ).
>      >
>      >
>      >     On Thu, 12 Jul 2018, 21:54 alchemist,
>     <alchemistsrivastava@gmail.com <ma...@gmail.com>
>      >     <mailto:alchemistsrivastava@gmail.com
>     <ma...@gmail.com>>> wrote:
>      >
>      >         Thanks a lot for your help.
>      >         Our test is inserting new rows individually. For our use
>     case,
>      >         we are
>      >         benchmarking that we could be able to get 10,000 new rows
>     in a
>      >         minute, using
>      >         a cluster of writers if needed.
>      >         When executing the inserts with Phoenix API (UPSERT) we have
>      >         been able to
>      >         get up to 6,000 new rows per minute.
>      >
>      >         We changed our test to perform the inserts individually using
>      >         the HBase API
>      >         (Put) rather than Phoenix API (UPSERT) and got an
>     improvement of
>      >         more than
>      >         10x. (up to 60,000 rows per minute).
>      >
>      >         What would explain this difference? I assume that in both
>     cases
>      >         HBase must
>      >         grab the locks individually in the same way.
>      >
>      >
>      >
>      >         --
>      >         Sent from:
>     http://apache-phoenix-user-list.1124778.n5.nabble.com/
>      >
>

Re: Upsert is EXTREMELY slow

Posted by James Taylor <ja...@apache.org>.

Phoenix won’t be slower to update secondary indexes than a use case would
be. Both have to do the writes to a second table to keep it in sync.

On Fri, Jul 13, 2018 at 8:39 AM Josh Elser <el...@apache.org> wrote:

> Also, they're relying on Phoenix to do secondary index updates for them.
>
> Obviously, you can do this faster than Phoenix can if you know the exact
> use-case.
>
> On 7/12/18 6:31 PM, Pedro Boado wrote:
> > A tip for performance is reusing the same preparedStatement , just
> > clearParameters() , set values and executeUpdate() over and over again.
> > Don't close the statement or connections after each upsert. Also, I
> > haven't seen any noticeable benefit on using jdbc batches as Phoenix
> > controls batching by when commit() is called.
> >
> > Keep an eye on not calling commit after every executeUpdate (that's a
> > real performance killer) . Batch commits in every ~1k upserts .
> >
> > Also that attempt of asynchronous code is probably another performance
> > killer. Are you creating a new Runnable per database write and opening
> > and closing dB connections per write? Just spawn a few threads (5 to 10,
> > if client cpu is not maxed keep increasing it) and send upserts in a for
> > loop reusing preparedStatement and connections.
> >
> > With a cluster that size I would expect seeing tens of thousands of
> > writes per second.
> >
> > Finally have you checked that all RS receive same traffic ?
> >
> > On Thu, 12 Jul 2018, 23:10 Pedro Boado, <pedro.boado@gmail.com
> > <ma...@gmail.com>> wrote:
> >
> >     I believe it's related to your client code - In our use case we do
> >     easily 15k writes/sec in a cluster lower specced than yours.
> >
> >     Check that your jdbc connection has autocommit off so Phoenix can
> >     batch writes and that table has a reasonable UPDATE_CACHE_FREQUENCY
> >     ( more than 60000 ).
> >
> >
> >     On Thu, 12 Jul 2018, 21:54 alchemist, <alchemistsrivastava@gmail.com
> >     <ma...@gmail.com>> wrote:
> >
> >         Thanks a lot for your help.
> >         Our test is inserting new rows individually. For our use case,
> >         we are
> >         benchmarking that we could be able to get 10,000 new rows in a
> >         minute, using
> >         a cluster of writers if needed.
> >         When executing the inserts with Phoenix API (UPSERT) we have
> >         been able to
> >         get up to 6,000 new rows per minute.
> >
> >         We changed our test to perform the inserts individually using
> >         the HBase API
> >         (Put) rather than Phoenix API (UPSERT) and got an improvement of
> >         more than
> >         10x. (up to 60,000 rows per minute).
> >
> >         What would explain this difference? I assume that in both cases
> >         HBase must
> >         grab the locks individually in the same way.
> >
> >
> >
> >         --
> >         Sent from:
> http://apache-phoenix-user-list.1124778.n5.nabble.com/
> >
>

Re: Upsert is EXTREMELY slow

Posted by Josh Elser <el...@apache.org>.

Also, they're relying on Phoenix to do secondary index updates for them.

Obviously, you can do this faster than Phoenix can if you know the exact 
use-case.

On 7/12/18 6:31 PM, Pedro Boado wrote:
> A tip for performance is reusing the same preparedStatement , just 
> clearParameters() , set values and executeUpdate() over and over again. 
> Don't close the statement or connections after each upsert. Also, I 
> haven't seen any noticeable benefit on using jdbc batches as Phoenix 
> controls batching by when commit() is called.
> 
> Keep an eye on not calling commit after every executeUpdate (that's a 
> real performance killer) . Batch commits in every ~1k upserts .
> 
> Also that attempt of asynchronous code is probably another performance 
> killer. Are you creating a new Runnable per database write and opening 
> and closing dB connections per write? Just spawn a few threads (5 to 10, 
> if client cpu is not maxed keep increasing it) and send upserts in a for 
> loop reusing preparedStatement and connections.
> 
> With a cluster that size I would expect seeing tens of thousands of 
> writes per second.
> 
> Finally have you checked that all RS receive same traffic ?
> 
> On Thu, 12 Jul 2018, 23:10 Pedro Boado, <pedro.boado@gmail.com 
> <ma...@gmail.com>> wrote:
> 
>     I believe it's related to your client code - In our use case we do
>     easily 15k writes/sec in a cluster lower specced than yours.
> 
>     Check that your jdbc connection has autocommit off so Phoenix can
>     batch writes and that table has a reasonable UPDATE_CACHE_FREQUENCY 
>     ( more than 60000 ).
> 
> 
>     On Thu, 12 Jul 2018, 21:54 alchemist, <alchemistsrivastava@gmail.com
>     <ma...@gmail.com>> wrote:
> 
>         Thanks a lot for your help.
>         Our test is inserting new rows individually. For our use case,
>         we are
>         benchmarking that we could be able to get 10,000 new rows in a
>         minute, using
>         a cluster of writers if needed.
>         When executing the inserts with Phoenix API (UPSERT) we have
>         been able to
>         get up to 6,000 new rows per minute.
> 
>         We changed our test to perform the inserts individually using
>         the HBase API
>         (Put) rather than Phoenix API (UPSERT) and got an improvement of
>         more than
>         10x. (up to 60,000 rows per minute).
> 
>         What would explain this difference? I assume that in both cases
>         HBase must
>         grab the locks individually in the same way.
> 
> 
> 
>         --
>         Sent from: http://apache-phoenix-user-list.1124778.n5.nabble.com/
>

Re: Upsert is EXTREMELY slow

Posted by Pedro Boado <pe...@gmail.com>.

A tip for performance is reusing the same preparedStatement , just
clearParameters() , set values and executeUpdate() over and over again.
Don't close the statement or connections after each upsert. Also, I haven't
seen any noticeable benefit on using jdbc batches as Phoenix controls
batching by when commit() is called.

Keep an eye on not calling commit after every executeUpdate (that's a real
performance killer) . Batch commits in every ~1k upserts .

Also that attempt of asynchronous code is probably another performance
killer. Are you creating a new Runnable per database write and opening and
closing dB connections per write? Just spawn a few threads (5 to 10, if
client cpu is not maxed keep increasing it) and send upserts in a for loop
reusing preparedStatement and connections.

With a cluster that size I would expect seeing tens of thousands of writes
per second.

Finally have you checked that all RS receive same traffic ?

On Thu, 12 Jul 2018, 23:10 Pedro Boado, <pe...@gmail.com> wrote:

> I believe it's related to your client code - In our use case we do easily
> 15k writes/sec in a cluster lower specced than yours.
>
> Check that your jdbc connection has autocommit off so Phoenix can batch
> writes and that table has a reasonable UPDATE_CACHE_FREQUENCY  ( more than
> 60000 ).
>
>
> On Thu, 12 Jul 2018, 21:54 alchemist, <al...@gmail.com>
> wrote:
>
>> Thanks a lot for your help.
>> Our test is inserting new rows individually. For our use case, we are
>> benchmarking that we could be able to get 10,000 new rows in a minute,
>> using
>> a cluster of writers if needed.
>> When executing the inserts with Phoenix API (UPSERT) we have been able to
>> get up to 6,000 new rows per minute.
>>
>> We changed our test to perform the inserts individually using the HBase
>> API
>> (Put) rather than Phoenix API (UPSERT) and got an improvement of more than
>> 10x. (up to 60,000 rows per minute).
>>
>> What would explain this difference? I assume that in both cases HBase must
>> grab the locks individually in the same way.
>>
>>
>>
>> --
>> Sent from: http://apache-phoenix-user-list.1124778.n5.nabble.com/
>>
>

Re: Upsert is EXTREMELY slow

Posted by Pedro Boado <pe...@gmail.com>.

I believe it's related to your client code - In our use case we do easily
15k writes/sec in a cluster lower specced than yours.

Check that your jdbc connection has autocommit off so Phoenix can batch
writes and that table has a reasonable UPDATE_CACHE_FREQUENCY  ( more than
60000 ).


On Thu, 12 Jul 2018, 21:54 alchemist, <al...@gmail.com> wrote:

> Thanks a lot for your help.
> Our test is inserting new rows individually. For our use case, we are
> benchmarking that we could be able to get 10,000 new rows in a minute,
> using
> a cluster of writers if needed.
> When executing the inserts with Phoenix API (UPSERT) we have been able to
> get up to 6,000 new rows per minute.
>
> We changed our test to perform the inserts individually using the HBase API
> (Put) rather than Phoenix API (UPSERT) and got an improvement of more than
> 10x. (up to 60,000 rows per minute).
>
> What would explain this difference? I assume that in both cases HBase must
> grab the locks individually in the same way.
>
>
>
> --
> Sent from: http://apache-phoenix-user-list.1124778.n5.nabble.com/
>

Re: Upsert is EXTREMELY slow

Posted by alchemist <al...@gmail.com>.

Thanks a lot for your help.
Our test is inserting new rows individually. For our use case, we are
benchmarking that we could be able to get 10,000 new rows in a minute, using
a cluster of writers if needed.
When executing the inserts with Phoenix API (UPSERT) we have been able to
get up to 6,000 new rows per minute.

We changed our test to perform the inserts individually using the HBase API
(Put) rather than Phoenix API (UPSERT) and got an improvement of more than
10x. (up to 60,000 rows per minute).

What would explain this difference? I assume that in both cases HBase must
grab the locks individually in the same way.



--
Sent from: http://apache-phoenix-user-list.1124778.n5.nabble.com/

Re: Upsert is EXTREMELY slow

Posted by Josh Elser <el...@apache.org>.

HBase must grab a lock for the row which is being updated. Normally, for 
a batch of updates sent to a region server, the RS will grab as many row 
locks as it can at once. If you only send one row to update at a time, 
you obviously get no amortization.

It's just the normal semantics of batching which you are completely 
missing out on. There are multiple manifestations of this. Row-locks are 
just one (network overhead, serialization, and rpc scheduling/execution 
are three others I can easily see)

On 7/11/18 4:10 PM, alchemist wrote:
> Josh Elser-2 wrote
>> Josh thanks so much for all your help.  I do not understand why you
>> "However, you are still fighting yourself when you have threads all trying
>> to grab the same lock to write their data."  My understanding is if we
>> have many processes running many threads, the data will be logged into WAL
>> as they come then flushed to the disk.  Where is the contention in this
>> case? Do we have one write per region/region server?
>>
>>
>>
>> On 7/11/18 11:33 AM, alchemist wrote:
>>> Thanks so much Josh!  I am unable to understand why performance is
>>> extremely
>>> slow.
>>>
>>> 1.  If I perform update using PreparedStatement addBatch and executeBatch
>>> then I get nearly 6000 transactions per minute.
>>>
>>> 2.  But in our case we need to save each transaction so cannot perform
>>> update batch,  so I am using PreparedStatement executeQuery and commit()
>>> getting nearly 100 transactions per minute.
>>>
>>> These numbers seems extremely slow,  therefore I am wondering I am doing
>>> something very incorrect.
>>>
>>>
>>>
>>> --
>>> Sent from: http://apache-phoenix-user-list.1124778.n5.nabble.com/
>>>
> 
> 
> 
> 
> 
> --
> Sent from: http://apache-phoenix-user-list.1124778.n5.nabble.com/
>

Re: Upsert is EXTREMELY slow

Posted by alchemist <al...@gmail.com>.

Josh Elser-2 wrote
> Josh thanks so much for all your help.  I do not understand why you
> "However, you are still fighting yourself when you have threads all trying
> to grab the same lock to write their data."  My understanding is if we
> have many processes running many threads, the data will be logged into WAL
> as they come then flushed to the disk.  Where is the contention in this
> case? Do we have one write per region/region server?
> 
> 
> 
> On 7/11/18 11:33 AM, alchemist wrote:
>> Thanks so much Josh!  I am unable to understand why performance is
>> extremely
>> slow.
>> 
>> 1.  If I perform update using PreparedStatement addBatch and executeBatch
>> then I get nearly 6000 transactions per minute.
>> 
>> 2.  But in our case we need to save each transaction so cannot perform
>> update batch,  so I am using PreparedStatement executeQuery and commit()
>> getting nearly 100 transactions per minute.
>> 
>> These numbers seems extremely slow,  therefore I am wondering I am doing
>> something very incorrect.
>> 
>> 
>> 
>> --
>> Sent from: http://apache-phoenix-user-list.1124778.n5.nabble.com/
>>





--
Sent from: http://apache-phoenix-user-list.1124778.n5.nabble.com/

Re: Upsert is EXTREMELY slow

Posted by Josh Elser <el...@apache.org>.

Your real-world situation is not a single-threaded application, is it? 
You will have multiple threads which are all updating Phoenix concurrently.

Given the semantics that your application needs from the requirements 
you stated, I'm not sure what else you can do differently. You can get 
low-latency out of HBase, but that's at the cost of throughput (not a 
unique characteristic of HBase).

Denormalizing your tables will reduce the amount of work each update 
will have to execute. Every secondary index is another update that needs 
to be executed to satisfy your UPSERT.

Updates in HBase go in memory and to the WAL. New updates to HBase are 
blocked when the memstore fills up and needs to flush to disk. Thus, it 
is optimal to keep flush-times short so that you don't have many threads 
blocked. However, you are still fighting yourself when you have threads 
all trying to grab the same lock to write their data.

You can also try reaching out to your vendor (EMR) to see what other 
tunings they recommend. I don't know what this architecture looks like.

On 7/11/18 11:33 AM, alchemist wrote:
> Thanks so much Josh!  I am unable to understand why performance is extremely
> slow.
> 
> 1.  If I perform update using PreparedStatement addBatch and executeBatch
> then I get nearly 6000 transactions per minute.
> 
> 2.  But in our case we need to save each transaction so cannot perform
> update batch,  so I am using PreparedStatement executeQuery and commit()
> getting nearly 100 transactions per minute.
> 
> These numbers seems extremely slow,  therefore I am wondering I am doing
> something very incorrect.
> 
> 
> 
> --
> Sent from: http://apache-phoenix-user-list.1124778.n5.nabble.com/
>

Re: Upsert is EXTREMELY slow

Posted by alchemist <al...@gmail.com>.

Thanks so much Josh!  I am unable to understand why performance is extremely
slow.  

1.  If I perform update using PreparedStatement addBatch and executeBatch
then I get nearly 6000 transactions per minute.

2.  But in our case we need to save each transaction so cannot perform
update batch,  so I am using PreparedStatement executeQuery and commit()
getting nearly 100 transactions per minute.

These numbers seems extremely slow,  therefore I am wondering I am doing
something very incorrect.



--
Sent from: http://apache-phoenix-user-list.1124778.n5.nabble.com/

Re: Upsert is EXTREMELY slow

Posted by Josh Elser <el...@apache.org>.

Some thoughts:

* Please _remove_ commented lines before sharing configuration next 
time. We don't need to see all of the things you don't have set :)

* 100 salt buckets is really excessive for a 4 node cluster. Salt 
buckets are not synonymous with pre-splitting HBase tables. This many 
salt buckets will have a negative impact on the read performance of your 
cluster.

* m4.10xlarge should be more than enough to run HBase extremely well.

* You seem to have set your HBase heapsize to be 1KB? I can't imagine 
that HBase would even run with that, so I'm left to assume that you're 
using the Java default heap size of 1GB instead.

export HBASE_HEAPSIZE=1024

You probably want to see what the heapsize your regionservers are 
running with and validate that..

On 7/11/18 10:53 AM, alchemist wrote:
> Thanks so much for all your help and response.  Here are the recommended
> details.
> 
> I have deployed Hbase backed by S3 on EMR cluster.  And performing upsert
> from EC2 node outside EMR using simple java multithreaded client to perform
> atomic update.
> 
> I have created a table like following
> 
> CREATE TABLE IF NOT EXISTS VBQL_PHOENIX_TRANSCRIPT ( PK VARCHAR NOT NULL
> PRIMARY KEY, IMMUTABLES.VBMETAJSON VARCHAR, IMMUTABLES.ACCOUNTID VARCHAR,
> IMMUTABLES.DATECREATED VARCHAR, IMMUTABLES.DATEFINISHED VARCHAR,
> IMMUTABLES.MEDIAID VARCHAR, IMMUTABLES.JOBID VARCHAR, IMMUTABLES.STATUS
> VARCHAR, UPDATABLE.METADATA VARCHAR, CATEGORIES.C_ACOUNTID_CATEGORYNAME
> VARCHAR, COMPUTED.ADDITIONALMETRICS VARCHAR) SALT_BUCKETS =100;
> 
> with secondary index like this:
> 
> CREATE INDEX  VBQL_PHOENIX_TRANSCRIPT_INDEX5  ON  VBQL_PHOENIX_TRANSCRIPT5
> (IMMUTABLES.MEDIAID) ;
> 
> 
> Sample Upsert
> UPSERT INTO VBQL_PHOENIX_TRANSCRIPT2  ( PK , IMMUTABLES.ACCOUNTID ,
> IMMUTABLES.DATECREATED , IMMUTABLES.DATEFINISHED ,
> IMMUTABLES.MEDIAID , IMMUTABLES.JOBID , IMMUTABLES.STATUS  )
> VALUES
> ('5DAD32BA-9656-41F3-BD38-BBF890B85CD62018-05-18T18:09:38.60700005D681A95C-8CDA-47B2-93BE-C165B1DEC7D8',
> 'AAAAAAAAAAAAAAA5DAD32BA-9656',
> '2018-04-18T18:09:38.607+0000',
> '2018-05-18T18:09:38.607+0000','5D681A95C-8CDA-47B2-93BE-C165B1DEC7D8',
> 'JOB123', 'FINISHED');
> 
> 
> HBASE IS INSTALLED ON EMR CLUSTER HERE TABLE IS CREATED USING ABOVE CREATE
> TABLE COMMANDS
> 
> EMR Cluster is 4 node m4.4xlarge cluster (32 vCore, 64 GiB memory, EBS only
> storage
> EBS Storage:32 GiB)
> 
> Client is a Java program running in EC2 (m4.10xlarge m4.10xlarge	40 CPU	160
> RAM 10 GiB Network EBS Only	10 Gbps	4,000 Mbps) Client is a multithreaded
> program that creates atomic connection to Hbase and performs inserts.
> 
> 
> CLIENT hbase-site.xml looks like following:
> 
> <?xml version="1.0"?>
> <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
> 
> <configuration>
>    <property>
>    <name>hbase.regionserver.wal.codec</name>
>   
> <value>org.apache.hadoop.hbase.regionserver.wal.IndexedWALEditCodec</value>
> </property>
> 
>    <property>
>      <name>hbase.cluster.distributed</name>
>      <value>true</value>
>    </property>
> 
>    <property>
>      <name>hbase.zookeeper.quorum</name>
>      <value>10.16.129.55</value>
>    </property>
> 
>    <property>
>      <name>hbase.rootdir</name>
>      <value>s3://dev-mock-transcription/</value>
>    </property>
> 
>    <property>
>      <name>dfs.support.append</name>
>      <value>true</value>
>    </property>
> 
>    <property>
>      <name>hbase.rest.port</name>
>      <value>8070</value>
>    </property>
> 
> 
>    <property>
>      <name>hbase.replication</name>
>      <value>false</value>
>    </property>
> 
>    <property>
>      <name>hbase.balancer.tablesOnMaster</name>
>      <value>hbase:meta</value>
>    </property>
> 
>    <property>
>      <name>hbase.bucketcache.size</name>
>      <value>8192</value>
>    </property>
> 
>    <property>
>      <name>hbase.master.balancer.uselocality</name>
>      <value>false</value>
>    </property>
> 
>    <property>
>      <name>hbase.master.startup.retainassign</name>
>      <value>false</value>
>    </property>
> 
>    <property>
>      <name>hbase.wal.dir</name>
>      <value>hdfs://10.16.129.55:8020/user/hbase/WAL</value>
>    </property>
> 
>    <property>
>      <name>hbase.bulkload.retries.retryOnIOException</name>
>      <value>true</value>
>    </property>
> 
>    <property>
>      <name>hbase.bucketcache.ioengine</name>
>      <value>files:/mnt1/hbase/bucketcache</value>
>    </property>
> 
>     <property>
>        <name>hbase.rpc.timeout</name>
>        <value>1800000</value>
>      </property>
> 
> 
>    <property>
>        <name>phoenix.query.timeoutMs</name>
>        <value>18000000</value>
>                      </property>
>       <property>
>        <name>phbase.regionserver.lease.period</name>
>        <value>18000000</value>
>      </property>
> 
>        <property>
>        <name>hbase.client.scanner.caching</name>
>        <value>180000</value>
>      </property>
> 
>        <property>
>        <name>phbase.client.scanner.timeout.period</name>
>        <value>18000000</value>
>      </property>
> 
>   <property>
>        <name>index.writer.threads.max</name>
>        <value>30</value>
>   </property>
>   <property>
>        <name>index.builder.threads.max</name>
>        <value>30</value>
>   </property>
>   <property>
>        <name>phoenix.query.threadPoolSize</name>
>        <value>256</value>
>   </property>
>   <property>
>        <name>index.builder.threads.keepalivetime</name>
>        <value>90000</value>
>   </property>
> <property>
>        <name>phoenix.query.timeoutMs</name>
>        <value>90000</value>
>   </property>
> 
> 
> </configuration>
> 
> 
> 
> HBASE ENV LOOKS LIKE FOLLOWING:
> 
> [ec2-user@ip-10-16-129-55 conf]$ cat hbase-env.sh
> #
> #/**
> # * Licensed to the Apache Software Foundation (ASF) under one
> # * or more contributor license agreements.  See the NOTICE file
> # * distributed with this work for additional information
> # * regarding copyright ownership.  The ASF licenses this file
> # * to you under the Apache License, Version 2.0 (the
> # * "License"); you may not use this file except in compliance
> # * with the License.  You may obtain a copy of the License at
> # *
> # *     http://www.apache.org/licenses/LICENSE-2.0
> # *
> # * Unless required by applicable law or agreed to in writing, software
> # * distributed under the License is distributed on an "AS IS" BASIS,
> # * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
> # * See the License for the specific language governing permissions and
> # * limitations under the License.
> # */
> 
> # Set environment variables here.
> 
> # This script sets variables multiple times over the course of starting an
> hbase process,
> # so try to keep things idempotent unless you want to take an even deeper
> look
> # into the startup scripts (bin/hbase, etc.)
> 
> # The java implementation to use.  Java 1.7+ required.
> # export JAVA_HOME=/usr/java/jdk1.6.0/
> 
> # Extra Java CLASSPATH elements.  Optional.
> export HBASE_CLASSPATH=/etc/hadoop/conf
> 
> # The maximum amount of heap to use. Default is left to JVM default.
> # export HBASE_HEAPSIZE=1G
> export HBASE_HEAPSIZE=1024
> 
> # Uncomment below if you intend to use off heap cache. For example, to
> allocate 8G of
> # offheap, set the value to "8G".
> # export HBASE_OFFHEAPSIZE=1G
> 
> # Extra Java runtime options.
> # Below are what we set by default.  May only work with SUN JVM.
> # For more on why as well as other possible settings,
> # see http://wiki.apache.org/hadoop/PerformanceTuning
> export HBASE_OPTS="$HBASE_OPTS -XX:+UseConcMarkSweepGC -XX:+UseParNewGC
> -XX:CMSInitiatingOccupancyFraction=70 -Dsun.net.inetaddr.ttl=60
> -Dnetworkaddress.cache.ttl=60"
> 
> 
> # Uncomment one of the below three options to enable java garbage collection
> logging for the server-side processes.
> 
> # This enables basic gc logging to the .out file.
> # export SERVER_GC_OPTS="-verbose:gc -XX:+PrintGCDetails
> -XX:+PrintGCDateStamps"
> 
> # This enables basic gc logging to its own file.
> # If FILE-PATH is not replaced, the log file(.gc) would still be generated
> in the HBASE_LOG_DIR .
> # export SERVER_GC_OPTS="-verbose:gc -XX:+PrintGCDetails
> -XX:+PrintGCDateStamps -Xloggc:<FILE-PATH>"
> 
> # This enables basic GC logging to its own file with automatic log rolling.
> Only applies to jdk 1.6.0_34+ and 1.7.0_2+.
> # If FILE-PATH is not replaced, the log file(.gc) would still be generated
> in the HBASE_LOG_DIR .
> # export SERVER_GC_OPTS="-verbose:gc -XX:+PrintGCDetails
> -XX:+PrintGCDateStamps -Xloggc:<FILE-PATH> -XX:+UseGCLogFileRotation
> -XX:NumberOfGCLogFiles=1 -XX:GCLogFileSize=512M"
> 
> # Uncomment one of the below three options to enable java garbage collection
> logging for the client processes.
> 
> # This enables basic gc logging to the .out file.
> # export CLIENT_GC_OPTS="-verbose:gc -XX:+PrintGCDetails
> -XX:+PrintGCDateStamps"
> 
> # This enables basic gc logging to its own file.
> # If FILE-PATH is not replaced, the log file(.gc) would still be generated
> in the HBASE_LOG_DIR .
> # export CLIENT_GC_OPTS="-verbose:gc -XX:+PrintGCDetails
> -XX:+PrintGCDateStamps -Xloggc:<FILE-PATH>"
> 
> # This enables basic GC logging to its own file with automatic log rolling.
> Only applies to jdk 1.6.0_34+ and 1.7.0_2+.
> # If FILE-PATH is not replaced, the log file(.gc) would still be generated
> in the HBASE_LOG_DIR .
> # export CLIENT_GC_OPTS="-verbose:gc -XX:+PrintGCDetails
> -XX:+PrintGCDateStamps -Xloggc:<FILE-PATH> -XX:+UseGCLogFileRotation
> -XX:NumberOfGCLogFiles=1 -XX:GCLogFileSize=512M"
> 
> # See the package documentation for org.apache.hadoop.hbase.io.hfile for
> other configurations
> # needed setting up off-heap block caching.
> 
> # Uncomment and adjust to enable JMX exporting
> # See jmxremote.password and jmxremote.access in $JRE_HOME/lib/management to
> configure remote password access.
> # More details at:
> http://java.sun.com/javase/6/docs/technotes/guides/management/agent.html
> # NOTE: HBase provides an alternative JMX implementation to fix the random
> ports issue, please see JMX
> # section in HBase Reference Guide for instructions.
> 
> # export HBASE_JMX_BASE="-Dcom.sun.management.jmxremote.ssl=false
> -Dcom.sun.management.jmxremote.authenticate=false"
> # export HBASE_MASTER_OPTS="$HBASE_MASTER_OPTS $HBASE_JMX_BASE
> -Dcom.sun.management.jmxremote.port=10101"
> # export HBASE_REGIONSERVER_OPTS="$HBASE_REGIONSERVER_OPTS $HBASE_JMX_BASE
> -Dcom.sun.management.jmxremote.port=10102"
> # export HBASE_THRIFT_OPTS="$HBASE_THRIFT_OPTS $HBASE_JMX_BASE
> -Dcom.sun.management.jmxremote.port=10103"
> # export HBASE_ZOOKEEPER_OPTS="$HBASE_ZOOKEEPER_OPTS $HBASE_JMX_BASE
> -Dcom.sun.management.jmxremote.port=10104"
> # export HBASE_REST_OPTS="$HBASE_REST_OPTS $HBASE_JMX_BASE
> -Dcom.sun.management.jmxremote.port=10105"
> 
> # File naming hosts on which HRegionServers will run.
> $HBASE_HOME/conf/regionservers by default.
> # export HBASE_REGIONSERVERS=${HBASE_HOME}/conf/regionservers
> 
> # Uncomment and adjust to keep all the Region Server pages mapped to be
> memory resident
> #HBASE_REGIONSERVER_MLOCK=true
> #HBASE_REGIONSERVER_UID="hbase"
> 
> # File naming hosts on which backup HMaster will run.
> $HBASE_HOME/conf/backup-masters by default.
> # export HBASE_BACKUP_MASTERS=${HBASE_HOME}/conf/backup-masters
> 
> # Extra ssh options.  Empty by default.
> # export HBASE_SSH_OPTS="-o ConnectTimeout=1 -o SendEnv=HBASE_CONF_DIR"
> 
> # Where log files are stored.  $HBASE_HOME/logs by default.
> # export HBASE_LOG_DIR=${HBASE_HOME}/logs
> 
> # Enable remote JDWP debugging of major HBase processes. Meant for Core
> Developers
> # export HBASE_MASTER_OPTS="$HBASE_MASTER_OPTS -Xdebug
> -Xrunjdwp:transport=dt_socket,server=y,suspend=n,address=8070"
> # export HBASE_REGIONSERVER_OPTS="$HBASE_REGIONSERVER_OPTS -Xdebug
> -Xrunjdwp:transport=dt_socket,server=y,suspend=n,address=8071"
> # export HBASE_THRIFT_OPTS="$HBASE_THRIFT_OPTS -Xdebug
> -Xrunjdwp:transport=dt_socket,server=y,suspend=n,address=8072"
> # export HBASE_ZOOKEEPER_OPTS="$HBASE_ZOOKEEPER_OPTS -Xdebug
> -Xrunjdwp:transport=dt_socket,server=y,suspend=n,address=8073"
> 
> # A string representing this instance of hbase. $USER by default.
> # export HBASE_IDENT_STRING=$USER
> 
> # The scheduling priority for daemon processes.  See 'man nice'.
> # export HBASE_NICENESS=10
> 
> # The directory where pid files are stored. /tmp by default.
> # export HBASE_PID_DIR=/var/hadoop/pids
> 
> # Seconds to sleep between slave commands.  Unset by default.  This
> # can be useful in large clusters, where, e.g., slave rsyncs can
> # otherwise arrive faster than the master can service them.
> # export HBASE_SLAVE_SLEEP=0.1
> 
> # Tell HBase whether it should manage it's own instance of Zookeeper or not.
> # export HBASE_MANAGES_ZK=true
> 
> # The default log rolling policy is RFA, where the log file is rolled as per
> the size defined for the
> # RFA appender. Please refer to the log4j.properties file to see more
> details on this appender.
> # In case one needs to do log rolling on a date change, one should set the
> environment property
> # HBASE_ROOT_LOGGER to "<DESIRED_LOG LEVEL>,DRFA".
> # For example:
> # HBASE_ROOT_LOGGER=INFO,DRFA
> # The reason for changing default to RFA is to avoid the boundary case of
> filling out disk space as
> # DRFA doesn't put any cap on the log size. Please refer to HBase-5655 for
> more context.
> 
> export HBASE_MANAGES_ZK=false
> export HBASE_DAEMON_DEFAULT_ROOT_LOGGER=INFO,DRFA
> export HBASE_DAEMON_DEFAULT_SECURITY_LOGGER=INFO,DRFAS
> export HBASE_CLASSPATH=${HBASE_CLASSPATH}${HBASE_CLASSPATH:+:}$(ls -1
> /usr/lib/phoenix/phoenix-*-HBase-*-server.jar)
> 
> AND SERVER SIDE hbase-site.xmllooks like this:
> <configuration>
>    <property>
>      <name>hbase.cluster.distributed</name>
>      <value>true</value>
>    </property>
> 
>    <property>
>      <name>hbase.zookeeper.quorum</name>
>      <value>ip-10-16-129-55.ec2.internal</value>
>    </property>
> 
>    <property>
>      <name>hbase.rootdir</name>
>      <value>s3://dev-mock-transcription/</value>
>    </property>
> 
>    <property>
>      <name>dfs.support.append</name>
>      <value>true</value>
>    </property>
> 
>    <property>
>      <name>hbase.rest.port</name>
>      <value>8070</value>
>    </property>
> 
>    <property>
>      <name>hbase.replication</name>
>      <value>false</value>
>    </property>
> 
>    <property>
>      <name>hbase.balancer.tablesOnMaster</name>
>      <value>hbase:meta</value>
>    </property>
> 
>    <property>
>      <name>hbase.bucketcache.size</name>
>      <value>8192</value>
>    </property>
> 
>    <property>
>      <name>hbase.master.balancer.uselocality</name>
>      <value>false</value>
>    </property>
> 
>    <property>
>      <name>hbase.master.startup.retainassign</name>
>      <value>false</value>
>    </property>
> 
>    <property>
>      <name>hbase.wal.dir</name>
>      <value>hdfs://ip-10-16-129-55.ec2.internal:8020/user/hbase/WAL</value>
>    </property>
> 
>    <property>
>      <name>hbase.bulkload.retries.retryOnIOException</name>
>      <value>true</value>
>    </property>
> 
>    <property>
>      <name>hbase.bucketcache.ioengine</name>
>      <value>files:/mnt1/hbase/bucketcache</value>
>    </property>
> 
>    
> 
> <property>
>        <name>hbase.rpc.timeout</name>
>        <value>180000</value>
>      </property>
>    <property>
>        <name>index.writer.threads.max</name>
>        <value>30</value>
>   </property>
>   <property>
>        <name>index.builder.threads.max</name>
>        <value>30</value>
>   </property>
>   </configuration>
> 
> 
> 
> 
> --
> Sent from: http://apache-phoenix-user-list.1124778.n5.nabble.com/
>

Re: Upsert is EXTREMELY slow

Posted by alchemist <al...@gmail.com>.

Thanks so much for all your help and response.  Here are the recommended
details.

I have deployed Hbase backed by S3 on EMR cluster.  And performing upsert
from EC2 node outside EMR using simple java multithreaded client to perform
atomic update.

I have created a table like following

CREATE TABLE IF NOT EXISTS VBQL_PHOENIX_TRANSCRIPT ( PK VARCHAR NOT NULL
PRIMARY KEY, IMMUTABLES.VBMETAJSON VARCHAR, IMMUTABLES.ACCOUNTID VARCHAR,
IMMUTABLES.DATECREATED VARCHAR, IMMUTABLES.DATEFINISHED VARCHAR,
IMMUTABLES.MEDIAID VARCHAR, IMMUTABLES.JOBID VARCHAR, IMMUTABLES.STATUS
VARCHAR, UPDATABLE.METADATA VARCHAR, CATEGORIES.C_ACOUNTID_CATEGORYNAME
VARCHAR, COMPUTED.ADDITIONALMETRICS VARCHAR) SALT_BUCKETS =100;

with secondary index like this:

CREATE INDEX  VBQL_PHOENIX_TRANSCRIPT_INDEX5  ON  VBQL_PHOENIX_TRANSCRIPT5
(IMMUTABLES.MEDIAID) ;


Sample Upsert
UPSERT INTO VBQL_PHOENIX_TRANSCRIPT2  ( PK , IMMUTABLES.ACCOUNTID ,
IMMUTABLES.DATECREATED , IMMUTABLES.DATEFINISHED ,
IMMUTABLES.MEDIAID , IMMUTABLES.JOBID , IMMUTABLES.STATUS  ) 
VALUES
('5DAD32BA-9656-41F3-BD38-BBF890B85CD62018-05-18T18:09:38.60700005D681A95C-8CDA-47B2-93BE-C165B1DEC7D8',
'AAAAAAAAAAAAAAA5DAD32BA-9656', 
'2018-04-18T18:09:38.607+0000',
'2018-05-18T18:09:38.607+0000','5D681A95C-8CDA-47B2-93BE-C165B1DEC7D8',
'JOB123', 'FINISHED');


HBASE IS INSTALLED ON EMR CLUSTER HERE TABLE IS CREATED USING ABOVE CREATE
TABLE COMMANDS

EMR Cluster is 4 node m4.4xlarge cluster (32 vCore, 64 GiB memory, EBS only
storage
EBS Storage:32 GiB)

Client is a Java program running in EC2 (m4.10xlarge m4.10xlarge	40 CPU	160 
RAM 10 GiB Network EBS Only	10 Gbps	4,000 Mbps) Client is a multithreaded
program that creates atomic connection to Hbase and performs inserts.


CLIENT hbase-site.xml looks like following:

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<configuration>
  <property>
  <name>hbase.regionserver.wal.codec</name>
 
<value>org.apache.hadoop.hbase.regionserver.wal.IndexedWALEditCodec</value>
</property>

  <property>
    <name>hbase.cluster.distributed</name>
    <value>true</value>
  </property>

  <property>
    <name>hbase.zookeeper.quorum</name>
    <value>10.16.129.55</value>
  </property>

  <property>
    <name>hbase.rootdir</name>
    <value>s3://dev-mock-transcription/</value>
  </property>

  <property>
    <name>dfs.support.append</name>
    <value>true</value>
  </property>

  <property>
    <name>hbase.rest.port</name>
    <value>8070</value>
  </property>


  <property>
    <name>hbase.replication</name>
    <value>false</value>
  </property>

  <property>
    <name>hbase.balancer.tablesOnMaster</name>
    <value>hbase:meta</value>
  </property>

  <property>
    <name>hbase.bucketcache.size</name>
    <value>8192</value>
  </property>

  <property>
    <name>hbase.master.balancer.uselocality</name>
    <value>false</value>
  </property>

  <property>
    <name>hbase.master.startup.retainassign</name>
    <value>false</value>
  </property>

  <property>
    <name>hbase.wal.dir</name>
    <value>hdfs://10.16.129.55:8020/user/hbase/WAL</value>
  </property>

  <property>
    <name>hbase.bulkload.retries.retryOnIOException</name>
    <value>true</value>
  </property>

  <property>
    <name>hbase.bucketcache.ioengine</name>
    <value>files:/mnt1/hbase/bucketcache</value>
  </property>

   <property>
      <name>hbase.rpc.timeout</name>
      <value>1800000</value>
    </property>


  <property>
      <name>phoenix.query.timeoutMs</name>
      <value>18000000</value>
                    </property>
     <property>
      <name>phbase.regionserver.lease.period</name>
      <value>18000000</value>
    </property>

      <property>
      <name>hbase.client.scanner.caching</name>
      <value>180000</value>
    </property>

      <property>
      <name>phbase.client.scanner.timeout.period</name>
      <value>18000000</value>
    </property>

 <property>
      <name>index.writer.threads.max</name>
      <value>30</value>
 </property>
 <property>
      <name>index.builder.threads.max</name>
      <value>30</value>
 </property>
 <property>
      <name>phoenix.query.threadPoolSize</name>
      <value>256</value>
 </property>
 <property>
      <name>index.builder.threads.keepalivetime</name>
      <value>90000</value>
 </property>
<property>
      <name>phoenix.query.timeoutMs</name>
      <value>90000</value>
 </property>


</configuration>



HBASE ENV LOOKS LIKE FOLLOWING:

[ec2-user@ip-10-16-129-55 conf]$ cat hbase-env.sh 
#
#/**
# * Licensed to the Apache Software Foundation (ASF) under one
# * or more contributor license agreements.  See the NOTICE file
# * distributed with this work for additional information
# * regarding copyright ownership.  The ASF licenses this file
# * to you under the Apache License, Version 2.0 (the
# * "License"); you may not use this file except in compliance
# * with the License.  You may obtain a copy of the License at
# *
# *     http://www.apache.org/licenses/LICENSE-2.0
# *
# * Unless required by applicable law or agreed to in writing, software
# * distributed under the License is distributed on an "AS IS" BASIS,
# * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# * See the License for the specific language governing permissions and
# * limitations under the License.
# */

# Set environment variables here.

# This script sets variables multiple times over the course of starting an
hbase process,
# so try to keep things idempotent unless you want to take an even deeper
look
# into the startup scripts (bin/hbase, etc.)

# The java implementation to use.  Java 1.7+ required.
# export JAVA_HOME=/usr/java/jdk1.6.0/

# Extra Java CLASSPATH elements.  Optional.
export HBASE_CLASSPATH=/etc/hadoop/conf

# The maximum amount of heap to use. Default is left to JVM default.
# export HBASE_HEAPSIZE=1G
export HBASE_HEAPSIZE=1024

# Uncomment below if you intend to use off heap cache. For example, to
allocate 8G of 
# offheap, set the value to "8G".
# export HBASE_OFFHEAPSIZE=1G

# Extra Java runtime options.
# Below are what we set by default.  May only work with SUN JVM.
# For more on why as well as other possible settings,
# see http://wiki.apache.org/hadoop/PerformanceTuning
export HBASE_OPTS="$HBASE_OPTS -XX:+UseConcMarkSweepGC -XX:+UseParNewGC
-XX:CMSInitiatingOccupancyFraction=70 -Dsun.net.inetaddr.ttl=60
-Dnetworkaddress.cache.ttl=60"


# Uncomment one of the below three options to enable java garbage collection
logging for the server-side processes.

# This enables basic gc logging to the .out file.
# export SERVER_GC_OPTS="-verbose:gc -XX:+PrintGCDetails
-XX:+PrintGCDateStamps"

# This enables basic gc logging to its own file.
# If FILE-PATH is not replaced, the log file(.gc) would still be generated
in the HBASE_LOG_DIR .
# export SERVER_GC_OPTS="-verbose:gc -XX:+PrintGCDetails
-XX:+PrintGCDateStamps -Xloggc:<FILE-PATH>"

# This enables basic GC logging to its own file with automatic log rolling.
Only applies to jdk 1.6.0_34+ and 1.7.0_2+.
# If FILE-PATH is not replaced, the log file(.gc) would still be generated
in the HBASE_LOG_DIR .
# export SERVER_GC_OPTS="-verbose:gc -XX:+PrintGCDetails
-XX:+PrintGCDateStamps -Xloggc:<FILE-PATH> -XX:+UseGCLogFileRotation
-XX:NumberOfGCLogFiles=1 -XX:GCLogFileSize=512M"

# Uncomment one of the below three options to enable java garbage collection
logging for the client processes.

# This enables basic gc logging to the .out file.
# export CLIENT_GC_OPTS="-verbose:gc -XX:+PrintGCDetails
-XX:+PrintGCDateStamps"

# This enables basic gc logging to its own file.
# If FILE-PATH is not replaced, the log file(.gc) would still be generated
in the HBASE_LOG_DIR .
# export CLIENT_GC_OPTS="-verbose:gc -XX:+PrintGCDetails
-XX:+PrintGCDateStamps -Xloggc:<FILE-PATH>"

# This enables basic GC logging to its own file with automatic log rolling.
Only applies to jdk 1.6.0_34+ and 1.7.0_2+.
# If FILE-PATH is not replaced, the log file(.gc) would still be generated
in the HBASE_LOG_DIR .
# export CLIENT_GC_OPTS="-verbose:gc -XX:+PrintGCDetails
-XX:+PrintGCDateStamps -Xloggc:<FILE-PATH> -XX:+UseGCLogFileRotation
-XX:NumberOfGCLogFiles=1 -XX:GCLogFileSize=512M"

# See the package documentation for org.apache.hadoop.hbase.io.hfile for
other configurations
# needed setting up off-heap block caching. 

# Uncomment and adjust to enable JMX exporting
# See jmxremote.password and jmxremote.access in $JRE_HOME/lib/management to
configure remote password access.
# More details at:
http://java.sun.com/javase/6/docs/technotes/guides/management/agent.html
# NOTE: HBase provides an alternative JMX implementation to fix the random
ports issue, please see JMX
# section in HBase Reference Guide for instructions.

# export HBASE_JMX_BASE="-Dcom.sun.management.jmxremote.ssl=false
-Dcom.sun.management.jmxremote.authenticate=false"
# export HBASE_MASTER_OPTS="$HBASE_MASTER_OPTS $HBASE_JMX_BASE
-Dcom.sun.management.jmxremote.port=10101"
# export HBASE_REGIONSERVER_OPTS="$HBASE_REGIONSERVER_OPTS $HBASE_JMX_BASE
-Dcom.sun.management.jmxremote.port=10102"
# export HBASE_THRIFT_OPTS="$HBASE_THRIFT_OPTS $HBASE_JMX_BASE
-Dcom.sun.management.jmxremote.port=10103"
# export HBASE_ZOOKEEPER_OPTS="$HBASE_ZOOKEEPER_OPTS $HBASE_JMX_BASE
-Dcom.sun.management.jmxremote.port=10104"
# export HBASE_REST_OPTS="$HBASE_REST_OPTS $HBASE_JMX_BASE
-Dcom.sun.management.jmxremote.port=10105"

# File naming hosts on which HRegionServers will run. 
$HBASE_HOME/conf/regionservers by default.
# export HBASE_REGIONSERVERS=${HBASE_HOME}/conf/regionservers

# Uncomment and adjust to keep all the Region Server pages mapped to be
memory resident
#HBASE_REGIONSERVER_MLOCK=true
#HBASE_REGIONSERVER_UID="hbase"

# File naming hosts on which backup HMaster will run. 
$HBASE_HOME/conf/backup-masters by default.
# export HBASE_BACKUP_MASTERS=${HBASE_HOME}/conf/backup-masters

# Extra ssh options.  Empty by default.
# export HBASE_SSH_OPTS="-o ConnectTimeout=1 -o SendEnv=HBASE_CONF_DIR"

# Where log files are stored.  $HBASE_HOME/logs by default.
# export HBASE_LOG_DIR=${HBASE_HOME}/logs

# Enable remote JDWP debugging of major HBase processes. Meant for Core
Developers 
# export HBASE_MASTER_OPTS="$HBASE_MASTER_OPTS -Xdebug
-Xrunjdwp:transport=dt_socket,server=y,suspend=n,address=8070"
# export HBASE_REGIONSERVER_OPTS="$HBASE_REGIONSERVER_OPTS -Xdebug
-Xrunjdwp:transport=dt_socket,server=y,suspend=n,address=8071"
# export HBASE_THRIFT_OPTS="$HBASE_THRIFT_OPTS -Xdebug
-Xrunjdwp:transport=dt_socket,server=y,suspend=n,address=8072"
# export HBASE_ZOOKEEPER_OPTS="$HBASE_ZOOKEEPER_OPTS -Xdebug
-Xrunjdwp:transport=dt_socket,server=y,suspend=n,address=8073"

# A string representing this instance of hbase. $USER by default.
# export HBASE_IDENT_STRING=$USER

# The scheduling priority for daemon processes.  See 'man nice'.
# export HBASE_NICENESS=10

# The directory where pid files are stored. /tmp by default.
# export HBASE_PID_DIR=/var/hadoop/pids

# Seconds to sleep between slave commands.  Unset by default.  This
# can be useful in large clusters, where, e.g., slave rsyncs can
# otherwise arrive faster than the master can service them.
# export HBASE_SLAVE_SLEEP=0.1

# Tell HBase whether it should manage it's own instance of Zookeeper or not.
# export HBASE_MANAGES_ZK=true

# The default log rolling policy is RFA, where the log file is rolled as per
the size defined for the 
# RFA appender. Please refer to the log4j.properties file to see more
details on this appender.
# In case one needs to do log rolling on a date change, one should set the
environment property
# HBASE_ROOT_LOGGER to "<DESIRED_LOG LEVEL>,DRFA".
# For example:
# HBASE_ROOT_LOGGER=INFO,DRFA
# The reason for changing default to RFA is to avoid the boundary case of
filling out disk space as 
# DRFA doesn't put any cap on the log size. Please refer to HBase-5655 for
more context.

export HBASE_MANAGES_ZK=false
export HBASE_DAEMON_DEFAULT_ROOT_LOGGER=INFO,DRFA
export HBASE_DAEMON_DEFAULT_SECURITY_LOGGER=INFO,DRFAS
export HBASE_CLASSPATH=${HBASE_CLASSPATH}${HBASE_CLASSPATH:+:}$(ls -1
/usr/lib/phoenix/phoenix-*-HBase-*-server.jar)

AND SERVER SIDE hbase-site.xmllooks like this:
<configuration>
  <property>
    <name>hbase.cluster.distributed</name>
    <value>true</value>
  </property>

  <property>
    <name>hbase.zookeeper.quorum</name>
    <value>ip-10-16-129-55.ec2.internal</value>
  </property>

  <property>
    <name>hbase.rootdir</name>
    <value>s3://dev-mock-transcription/</value>
  </property>

  <property>
    <name>dfs.support.append</name>
    <value>true</value>
  </property>

  <property>
    <name>hbase.rest.port</name>
    <value>8070</value>
  </property>

  <property>
    <name>hbase.replication</name>
    <value>false</value>
  </property>

  <property>
    <name>hbase.balancer.tablesOnMaster</name>
    <value>hbase:meta</value>
  </property>

  <property>
    <name>hbase.bucketcache.size</name>
    <value>8192</value>
  </property>

  <property>
    <name>hbase.master.balancer.uselocality</name>
    <value>false</value>
  </property>

  <property>
    <name>hbase.master.startup.retainassign</name>
    <value>false</value>
  </property>

  <property>
    <name>hbase.wal.dir</name>
    <value>hdfs://ip-10-16-129-55.ec2.internal:8020/user/hbase/WAL</value>
  </property>

  <property>
    <name>hbase.bulkload.retries.retryOnIOException</name>
    <value>true</value>
  </property>

  <property>
    <name>hbase.bucketcache.ioengine</name>
    <value>files:/mnt1/hbase/bucketcache</value>
  </property>

  

<property>
      <name>hbase.rpc.timeout</name>
      <value>180000</value>
    </property> 
  <property>
      <name>index.writer.threads.max</name>
      <value>30</value>
 </property>
 <property>
      <name>index.builder.threads.max</name>
      <value>30</value>
 </property>
 </configuration>




--
Sent from: http://apache-phoenix-user-list.1124778.n5.nabble.com/

Re: Upsert is EXTREMELY slow

Posted by Josh Elser <el...@apache.org>.

The explain plan for your tables isn't a substitute for the DDLs. Please 
provide those.

How about sharing your completely hbase-site.xml and hbase-env.sh files, 
rather than just snippets like you have. A full picture is often needed.

Given that HBase cannot directly run on S3, please also describe how 
your have this setup functioning. Do you have an HDFS instance running 
for the HBase WALs? Finally, assuming that you're using EC2 if you're 
using S3, what kind of nodes do you have?

As a courtesy, please take the time to consolidate your thoughts into 
one email instead of sending multiple in rapid succession like this. 
Remember that Apache communities are volunteer based.

On 7/11/18 7:53 AM, alchemist wrote:
> 
> I have written a threaded program to batch upsert data into Phoenix.  I am
> using Phoenix because of secondary index capability.  I am getting extremely
> slow performance in write.
> 
> Explain query looks like following
> 
> 0: jdbc:phoenix:localhost:2181:/hbase> EXPLAIN select count(1) from
> VBQL_PHOENIX_TRANSCRIPT5
> . . . . . . . . . . . . . . . . . . .> ;
> +---------------------------------------------------------------------------------------------------------------+-----------------+------+
> |                                                     PLAN
> | EST_BYTES_READ  | EST_ |
> +---------------------------------------------------------------------------------------------------------------+-----------------+------+
> | CLIENT 100-CHUNK 6838539 ROWS 314572800 BYTES PARALLEL 100-WAY FULL SCAN
> OVER VBQL_PHOENIX_TRANSCRIPT_INDEX5  | 314572800       | 6838 |
> |     SERVER FILTER BY FIRST KEY ONLY
> | 314572800       | 6838 |
> |     SERVER AGGREGATE INTO SINGLE ROW
> | 314572800       | 6838 |
> +---------------------------------------------------------------------------------------------------------------+-----------------+------+
> 
> I have secondary index that looks like this:
> 
> 0: jdbc:phoenix:localhost:2181:/hbase> EXPLAIN select count(1) from
> VBQL_PHOENIX_TRANSCRIPT_INDEX5;
> +---------------------------------------------------------------------------------------------------------------+-----------------+------+
> |                                                     PLAN
> | EST_BYTES_READ  | EST_ |
> +---------------------------------------------------------------------------------------------------------------+-----------------+------+
> | CLIENT 100-CHUNK 6838539 ROWS 314572800 BYTES PARALLEL 100-WAY FULL SCAN
> OVER VBQL_PHOENIX_TRANSCRIPT_INDEX5  | 314572800       | 6838 |
> |     SERVER FILTER BY FIRST KEY ONLY
> | 314572800       | 6838 |
> |     SERVER AGGREGATE INTO SINGLE ROW
> | 314572800       | 6838 |
> +---------------------------------------------------------------------------------------------------------------+-----------------+------+
> 
> 
> Tuning Settings used in HBase.  Problem is it is very hard to scale, I tried
> adding more nodes to Hbase cluster, and I also tried adding more threads to
> the client program but it is not scaling beyond 6K per minutes, which is
> VERY SLOW.  Any help is greatly appreciated.
> 
>   <property>
>        <name>index.writer.threads.max</name>
>        <value>30</value>
>   </property>
>   <property>
>        <name>index.builder.threads.max</name>
>        <value>30</value>
>   </property>
>   <property>
>        <name>phoenix.query.threadPoolSize</name>
>        <value>256</value>
>   </property>
>   <property>
>        <name>index.builder.threads.keepalivetime</name>
>        <value>90000</value>
>   </property>
> <property>
>        <name>phoenix.query.timeoutMs</name>
>        <value>90000</value>
>   </property>
> 
>   Rest of the settings I am using are default.
> 
> Source Code  (Simple multithreded  with write need to be saved individually/
> we cannot use batch  write)
> 
>   public void generateAndInsertDataToHbase(int iterationCount ){
>     try {
> 	   	 int iterations =0;
> 	   	 Instant start = Instant.now();
> 	   	 ExecutorService executorService = Executors.newCachedThreadPool();
> 	      while (true) {
> 	    	if (iterations == iterationCount) {
> 	    		verifyRowCountAndExit(start,executorService);
> 	    	}
> 	    	Future<Void> future = executorService.submit(new Callable<Void>() {
> 	    	    public Void call() throws Exception {
> 	    	    	List<VbqlBean> vbqlList =
> VbqlUtils.generateRandomAccountVbql(accountCategoryMap);
> 	    	    	if(vbqlList.size() >0) {
> 	    	    		addVbqlAccountToHbaseAsBatch( vbqlList,connStr);
> 	    	    	}
> 	    	        return null;
> 	    	    }
> 	    	});
> 	    	future.get();
> 	    	iterations ++;
> 	      }
>     }catch(Exception e) {
>         e.printStackTrace();
>     }
>   }
>   
> 
>   public void addVbqlAccountToHbaseAsBatch(List<VbqlBean> vbqlList, String
> connStr) throws Exception{
> 	 Connection connection = null;
> 	 try {
>          connection = DBUtils.initSingleConnection(connStr);
>         // connection.setAutoCommit(false);
>          if(vbqlList.size() >0) {
> 	        for (VbqlBean vbqlBean : vbqlList) {
> 	        	DBUtils.executeSingleVbqlQuery( connStr,  vbqlBean, tableName);
> 	        }
>          }
>          connection.commit();
> 	 } finally {
>          if (connection != null) try { connection.close(); } catch
> (SQLException ignore) {}
>      }
>   }
> 
> 
> 
> 
> 
> --
> Sent from: http://apache-phoenix-user-list.1124778.n5.nabble.com/
>

Re: Upsert is EXTREMELY slow

Posted by alchemist <al...@gmail.com>.

Is there any configurations I am missing?



--
Sent from: http://apache-phoenix-user-list.1124778.n5.nabble.com/

Re: Upsert is EXTREMELY slow

Posted by alchemist <al...@gmail.com>.

I am doing a POC where I am using Pheonix for single write that updates the
database after the write.  So I cannot batch update the write.  I am getting
1 transaction per second as TPS.  I have 3 node EMR cluster.  I am using
HBase with S3 as backend.  

I tried tuning parameter I found online I created threaded app but still
performance is extemely slow.



--
Sent from: http://apache-phoenix-user-list.1124778.n5.nabble.com/

Re: Upsert is EXTREMELY slow

Posted by alchemist <al...@gmail.com>.

I have also tried adding cache frequency.

<property>
      <name>phoenix.default.update.cache.frequency</name>
      <value>300000</value>
 </property




--
Sent from: http://apache-phoenix-user-list.1124778.n5.nabble.com/