Posted to user@phoenix.apache.org by Vamsi Krishna <va...@gmail.com> on 2016/03/15 13:41:00 UTC

Does phoenix CsvBulkLoadTool write to WAL/Memstore

Team,

Does the Phoenix CsvBulkLoadTool write to the HBase WAL/Memstore?

Phoenix-Spark plugin:
Does the saveToPhoenix method on RDD[Tuple] write to the HBase WAL/Memstore?

Thanks,
Vamsi Attluri
-- 
Vamsi Attluri

Re: Does phoenix CsvBulkLoadTool write to WAL/Memstore

Posted by Vamsi Krishna <va...@gmail.com>.
Thanks Pari.

The job runs weekly.
The number of rows is around 10 billion.
The cluster has 13 nodes.
From what you have mentioned, I see that CsvBulkLoadTool is the best option for
my scenario.

I see you mentioned increasing the batch size to accommodate
more rows.
Are you talking about the 'phoenix.mutate.batchSize' configuration
parameter?

Vamsi Attluri

On Wed, Mar 16, 2016 at 9:01 AM Pariksheet Barapatre <pb...@gmail.com>
wrote:

> Hi Vamsi,
>
> How many rows are you expecting out of your transformation, and what
> is the frequency of the job?
>
> If there are fewer rows (< ~100K, and this depends on cluster size
> as well), you can go ahead with the phoenix-spark plug-in and increase the batch
> size to accommodate more rows; otherwise use CsvBulkLoadTool.
>
> Thanks
> Pari
>
-- 
Vamsi Attluri

Re: Does phoenix CsvBulkLoadTool write to WAL/Memstore

Posted by Pariksheet Barapatre <pb...@gmail.com>.
Hi Vamsi,

How many rows are you expecting out of your transformation, and what
is the frequency of the job?

If there are fewer rows (< ~100K, and this depends on cluster size
as well), you can go ahead with the phoenix-spark plug-in and increase the batch
size to accommodate more rows; otherwise use CsvBulkLoadTool.

Thanks
Pari

On 16 March 2016 at 20:03, Vamsi Krishna <va...@gmail.com> wrote:

> Thanks Gabriel & Ravi.
>
> I have a data processing job written in Spark-Scala.
> I do a join on data from 2 data files (CSV files) and do data
> transformation on the resulting data. Finally, I load the transformed data
> into a Phoenix table using the Phoenix-Spark plugin.
> On seeing that the Phoenix-Spark plugin goes through the regular HBase write path
> (writes to WAL), I'm thinking of option 2 to reduce the job execution time.
>
> *Option 2:* Do the data transformation in Spark, write the transformed
> data to a CSV file, and use the Phoenix CsvBulkLoadTool to load the data into the
> Phoenix table.
>
> Has anyone tried this kind of exercise? Any thoughts?
>
> Thanks,
> Vamsi Attluri



-- 
Cheers,
Pari

Re: Does phoenix CsvBulkLoadTool write to WAL/Memstore

Posted by Vamsi Krishna <va...@gmail.com>.
Thanks Gabriel & Ravi.

I have a data processing job written in Spark-Scala.
I do a join on data from 2 data files (CSV files) and do data
transformation on the resulting data. Finally, I load the transformed data
into a Phoenix table using the Phoenix-Spark plugin.
On seeing that the Phoenix-Spark plugin goes through the regular HBase write path
(writes to WAL), I'm thinking of option 2 to reduce the job execution time.

*Option 2:* Do the data transformation in Spark, write the transformed data
to a CSV file, and use the Phoenix CsvBulkLoadTool to load the data into the Phoenix
table.

Has anyone tried this kind of exercise? Any thoughts?
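
As an illustration of the CSV-writing step in Option 2, here is a minimal sketch of turning one transformed row into a CSV line. Nothing below was posted in the thread: the object name CsvRows, the Option-based row shape, and the quoting rules (comma delimiter, double-quote quoting, missing values as empty fields) are assumptions. Backslashes are passed through untouched, which may need extra care if the loader is configured with backslash as its escape character.

```scala
// Sketch only: render one transformed row as a CSV line for a later bulk load.
// Assumptions (not from the thread): comma delimiter, double-quote quoting,
// and None rendered as an empty field.
object CsvRows {
  private val Delimiter = ','
  private val Quote = '"'

  /** Quote a field only when it contains the delimiter, a quote, or a line break. */
  def escapeField(value: String): String = {
    val needsQuoting =
      value.exists(c => c == Delimiter || c == Quote || c == '\n' || c == '\r')
    // Embedded quotes are doubled inside a quoted field.
    if (needsQuoting) Quote.toString + value.replace("\"", "\"\"") + Quote
    else value
  }

  /** Join the fields of one row; missing values become empty fields. */
  def toCsvLine(fields: Seq[Option[Any]]): String =
    fields
      .map(_.map(v => escapeField(v.toString)).getOrElse(""))
      .mkString(Delimiter.toString)
}

// In a Spark job this could be applied per row before writing text files, e.g.
//   transformed.map(CsvRows.toCsvLine).saveAsTextFile(outputDir)
// where `transformed` (an RDD[Seq[Option[Any]]]) and `outputDir` are hypothetical names.
```

Keeping the row-to-line conversion as a pure function makes it easy to check the quoting locally before running the full job against 10 billion rows.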

Thanks,
Vamsi Attluri

On Tue, Mar 15, 2016 at 9:40 PM Ravi Kiran <ma...@gmail.com>
wrote:

> Hi Vamsi,
>    The upserts through the Phoenix-Spark plugin definitely go through the WAL.
>
-- 
Vamsi Attluri

Re: Does phoenix CsvBulkLoadTool write to WAL/Memstore

Posted by Ravi Kiran <ma...@gmail.com>.
Hi Vamsi,
   The upserts through the Phoenix-Spark plugin definitely go through the WAL.


On Tue, Mar 15, 2016 at 5:56 AM, Gabriel Reid <ga...@gmail.com>
wrote:

> Hi Vamsi,
>
> I can't answer your question about the Phoenix-Spark plugin (although
> I'm sure that someone else here can).
>
> However, I can tell you that the CsvBulkLoadTool does not write to the
> WAL or to the Memstore. It simply writes HFiles and then hands those
> HFiles over to HBase, so the memstore and WAL are never
> touched/affected by this.
>
> - Gabriel

Re: Does phoenix CsvBulkLoadTool write to WAL/Memstore

Posted by Gabriel Reid <ga...@gmail.com>.
Hi Vamsi,

I can't answer your question about the Phoenix-Spark plugin (although
I'm sure that someone else here can).

However, I can tell you that the CsvBulkLoadTool does not write to the
WAL or to the Memstore. It simply writes HFiles and then hands those
HFiles over to HBase, so the memstore and WAL are never
touched/affected by this.
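
For readers who have not run the tool, the bulk load described above is normally started as a MapReduce job through `hadoop jar`. The sketch below only assembles such a command line. The object name BulkLoadCommand, the jar path, the table name, and the input path are hypothetical placeholders, not values from this thread. The class name and the --table, --input, and --zookeeper options are those of the Phoenix bulk load tool as I understand them, so check them against the documentation for your Phoenix version.

```scala
// Sketch only: assemble a CsvBulkLoadTool invocation as an argument list.
// All concrete values passed in (jar path, table, input path) are placeholders.
object BulkLoadCommand {
  val ToolClass = "org.apache.phoenix.mapreduce.CsvBulkLoadTool"

  /** Build the `hadoop jar ...` argument list; the ZooKeeper quorum is optional. */
  def build(
      clientJar: String,
      table: String,
      input: String,
      zkQuorum: Option[String] = None
  ): Seq[String] = {
    val base =
      Seq("hadoop", "jar", clientJar, ToolClass, "--table", table, "--input", input)
    base ++ zkQuorum.toSeq.flatMap(zk => Seq("--zookeeper", zk))
  }
}

// Example with placeholder values, run through scala.sys.process:
//   import scala.sys.process._
//   val exitCode =
//     BulkLoadCommand.build("/path/to/phoenix-client.jar", "EXAMPLE", "/data/example.csv").!
```

The job writes HFiles and then hands them to HBase, which is why neither the WAL nor the memstore is involved.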

- Gabriel


On Tue, Mar 15, 2016 at 1:41 PM, Vamsi Krishna <va...@gmail.com> wrote:
> Team,
>
> Does the Phoenix CsvBulkLoadTool write to the HBase WAL/Memstore?
>
> Phoenix-Spark plugin:
> Does the saveToPhoenix method on RDD[Tuple] write to the HBase WAL/Memstore?
>
> Thanks,
> Vamsi Attluri
> --
> Vamsi Attluri