Posted to user@phoenix.apache.org by "Riesland, Zack" <Za...@sensus.com> on 2015/07/22 15:23:48 UTC

How fast is upsert select?

I want to experiment with some options for splitting a table to test performance.

If I were to create a new table and perform an UPSERT SELECT * into it, with billions of rows in the source table, is that more of an overnight operation, or should it be pretty quick?

For reference, we have 6 (beefy) region servers in our cluster.

Thanks!
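(A minimal sketch of the experiment being described, with hypothetical table and column names — SALT_BUCKETS is the standard Phoenix option for pre-splitting the target table so the UPSERT SELECT doesn't funnel all writes into one region:)

```sql
-- Sketch only: table and column names are hypothetical.
-- SALT_BUCKETS pre-splits the new table so writes spread across region servers.
CREATE TABLE NEW_TABLE (
    ID   BIGINT NOT NULL PRIMARY KEY,
    COL1 VARCHAR,
    COL2 DATE
) SALT_BUCKETS = 12;

-- Copy everything over from the source table.
UPSERT INTO NEW_TABLE SELECT * FROM OLD_TABLE;
```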


RE: How fast is upsert select?

Posted by "Riesland, Zack" <Za...@sensus.com>.
Thanks Thomas,

My table has about 10 billion rows with about 12 columns.


Re: How fast is upsert select?

Posted by Thomas D'Silva <td...@salesforce.com>.
Zack,

It depends on how wide the rows are in your table. On an 8-node
cluster, creating an index with 3 columns (CHAR(15), VARCHAR, and
DATE) on a 1-billion-row table takes about 1 hour 15 minutes.
How many rows does your table have, and how wide are they?
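(Scaling those figures to the 10-billion-row table on 6 region servers mentioned in this thread — treating throughput as linear in row count and node count, which is only a ballpark assumption:)

```python
# Back-of-envelope estimate from the figures above; assumes throughput
# scales linearly with row count and inversely with node count.
REFERENCE_MINUTES = 75        # 1 billion rows took ~1h15m on 8 nodes
row_factor = 10e9 / 1e9       # 10 billion rows vs. 1 billion
node_factor = 8 / 6           # 6 region servers instead of 8 -> slower
est_minutes = REFERENCE_MINUTES * row_factor * node_factor
print(f"~{est_minutes / 60:.1f} hours")   # roughly 16.7 hours
```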


RE: How fast is upsert select?

Posted by "Riesland, Zack" <Za...@sensus.com>.
Thanks Ravi,

I think I may not have IndexTool in my version of Phoenix.

I’m calling:

HADOOP_CLASSPATH=/usr/hdp/current/hbase-master/conf/:/usr/hdp/current/hbase-master/lib/hbase-protocol.jar \
hadoop jar /usr/hdp/current/phoenix-client/phoenix-client.jar \
org.apache.phoenix.mapreduce.index.IndexTool

And getting: java.lang.ClassNotFoundException: org.apache.phoenix.mapreduce.index.IndexTool
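(One way to confirm whether the class ships in a given client jar — the path here matches the HDP layout used above, but may differ on other installs; if nothing prints, the Phoenix version predates the MapReduce IndexTool:)

```shell
# List the classes bundled in the Phoenix client jar and look for IndexTool.
jar tf /usr/hdp/current/phoenix-client/phoenix-client.jar | grep IndexTool
```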






Re: How fast is upsert select?

Posted by Ravi Kiran <ma...@gmail.com>.
Hi,

   Since you are dealing with billions of rows, why don't you try the
MapReduce route to speed up the process? You can take a look at how
IndexTool.java (
https://github.com/apache/phoenix/blob/359c255ba6c67d01a810d203825264907f580735/phoenix-core/src/main/java/org/apache/phoenix/mapreduce/index/IndexTool.java)
is written, as it does a similar task: it reads from a Phoenix table and
writes the data into the target table using a bulk load.


Regards
Ravi
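(For reference, the usual invocation pattern looks something like the sketch below — flag names follow the Phoenix 4.x secondary-index documentation, table names are placeholders, and older releases may lack the tool entirely, so check `--help` on your version:)

```shell
hadoop jar /path/to/phoenix-client.jar \
  org.apache.phoenix.mapreduce.index.IndexTool \
  --schema MY_SCHEMA \
  --data-table MY_TABLE \
  --index-table MY_INDEX \
  --output-path /tmp/MY_INDEX_HFILES
```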
