Posted to dev@phoenix.apache.org by "Peter Conrad (JIRA)" <ji...@apache.org> on 2017/01/25 01:23:26 UTC

[jira] [Comment Edited] (PHOENIX-3218) First draft of Phoenix Tuning Guide

    [ https://issues.apache.org/jira/browse/PHOENIX-3218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15836991#comment-15836991 ] 

Peter Conrad edited comment on PHOENIX-3218 at 1/25/17 1:22 AM:
----------------------------------------------------------------

[~elserj] Thanks again for the thorough and thoughtful review. I'm working on a revision, and I have one question for you:

bq. When using `UPSERT` to write a large number of records, turn off autocommit and batch records. Start with a batch size of 1000 and adjust as needed. Here's some pseudocode showing one way to commit records in batches:
_Recommend putting a caveat here that the use of commit() by Phoenix to control batches of data written to HBase as being "non-standard" in terms of JDBC._

Is this doc the right place to say this? It seems like the caveat would be hidden here. The Grammar page mentions commits only in passing, as does the Atomic Upsert page; the Transactions page comes closest to actually defining them. I wonder whether the Overview page is the right place to clarify this.
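For reference, here's one way the pseudocode in question could be made concrete. This is only a sketch: the table name, columns, and JDBC URL are made up for illustration, and the `commitCount` helper exists just to make the batch math explicit. It also shows where Josh's caveat applies, since using commit() to flush write batches is non-standard JDBC usage:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.util.List;

public class BatchUpsertSketch {

    // Pure helper: number of commit() calls for totalRows rows in batches of
    // batchSize (one per full batch, plus one for any partial final batch).
    static int commitCount(int totalRows, int batchSize) {
        return (totalRows + batchSize - 1) / batchSize;
    }

    // Sketch only: the "events" table and the JDBC URL are hypothetical,
    // and this needs a live Phoenix cluster to actually run.
    static void upsertInBatches(List<long[]> rows) throws Exception {
        try (Connection conn =
                 DriverManager.getConnection("jdbc:phoenix:localhost")) {
            conn.setAutoCommit(false);   // we control batch boundaries ourselves
            try (PreparedStatement ps = conn.prepareStatement(
                     "UPSERT INTO events (id, val) VALUES (?, ?)")) {
                int batchSize = 1000;    // starting point per the doc; tune as needed
                int pending = 0;
                for (long[] row : rows) {
                    ps.setLong(1, row[0]);
                    ps.setLong(2, row[1]);
                    ps.executeUpdate();  // buffered client-side until commit()
                    if (++pending % batchSize == 0) {
                        conn.commit();   // non-standard JDBC: flushes this batch to HBase
                    }
                }
                if (pending % batchSize != 0) {
                    conn.commit();       // flush the final partial batch
                }
            }
        }
    }

    public static void main(String[] args) {
        // Batch math only; upsertInBatches() itself needs a cluster.
        System.out.println(commitCount(2500, 1000)); // 3 commits: 1000 + 1000 + 500
    }
}
```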

... and some follow-on questions for [~apurtell] or [~jamestaylor]:

The doc says:
bq. When specifying machines for HBase, do not skimp on cores; HBase needs them.
Josh Elser says:
_How can this be made into a more concrete recommendation?_
Do we have any hardware recommendations?

The doc says:
bq. Set the `UPDATE_CACHE_FREQUENCY` [option](http://phoenix.apache.org/language/index.html#options) to 15 minutes or so if your metadata doesn't change very often
Josh Elser says:
_Don't guess, make a concrete recommendation. If 15 minutes isn't a good recommendation, let's come up with a good number._

Similar question: what's a more reliable way to determine a good cache update frequency?
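Whatever number we land on, it might help the doc to show what setting the option actually looks like. A sketch (the table name is hypothetical; 900000 ms just reflects the 15-minute figure currently in the draft, i.e. exactly the value under discussion):

```sql
-- Hypothetical table; 900000 ms = 15 minutes, the value being debated above.
ALTER TABLE my_table SET UPDATE_CACHE_FREQUENCY = 900000;

-- Or at creation time:
CREATE TABLE my_table (id BIGINT PRIMARY KEY, val VARCHAR)
    UPDATE_CACHE_FREQUENCY = 900000;
```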

The doc says:
bq. If you regularly scan large data sets from spinning disk, you're best off with GZIP (but watch write speed)
Josh Elser says:
_Numbers/reference-material to back this up?_

The doc says:
bq. When deleting a large data set, turn on autoCommit before issuing the `DELETE` query so that the client does not need to remember the row keys of all the keys as they are deleted.
Josh Elser says:
_Reasoning behind this one isn't clear to me. Batching DELETEs would have the same benefit of batching UPSERTs, no? (I may just be missing an implementation detail here..._
*Can you help me answer his questions?*
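On that last point, my understanding of the doc's claim, in toy form: this is my own sketch, not Phoenix internals, and it only models the client-side buffering the doc describes. With autoCommit off, the client retains every deleted row's key until commit(); with autoCommit on, each delete is flushed immediately (in real code: `conn.setAutoCommit(true)` before `stmt.executeUpdate("DELETE FROM ...")`), so the client buffers nothing:

```java
import java.util.ArrayList;
import java.util.List;

// Toy model (not Phoenix code) of why autoCommit matters for a big DELETE.
public class DeleteBufferToy {

    // Returns the peak number of row keys the client holds while deleting
    // rowsDeleted rows, under the buffering behavior described in the doc.
    static int peakClientBufferSize(int rowsDeleted, boolean autoCommit) {
        List<Long> pendingKeys = new ArrayList<>();
        int peak = 0;
        for (long key = 0; key < rowsDeleted; key++) {
            if (!autoCommit) {
                pendingKeys.add(key);  // retained client-side until commit()
            }
            // with autoCommit on, the mutation is flushed right away
            peak = Math.max(peak, pendingKeys.size());
        }
        return peak;
    }

    public static void main(String[] args) {
        System.out.println(peakClientBufferSize(100_000, false)); // 100000
        System.out.println(peakClientBufferSize(100_000, true));  // 0
    }
}
```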



> First draft of Phoenix Tuning Guide
> -----------------------------------
>
>                 Key: PHOENIX-3218
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-3218
>             Project: Phoenix
>          Issue Type: Improvement
>            Reporter: Peter Conrad
>         Attachments: Phoenix-Tuning-Guide-20170110.md, Phoenix-Tuning-Guide.md, Phoenix-Tuning-Guide.md
>
>
> Here's a first draft of a Tuning Guide for Phoenix performance. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)