Posted to user@phoenix.apache.org by Mohanraj Ragupathiraj <mo...@gmail.com> on 2016/05/17 10:21:32 UTC

PHOENIX SPARK - DataFrame for BulkLoad

I have 100 million records to insert into an HBase (Phoenix) table as the
result of a Spark job. If I convert the records to a DataFrame and save it,
will that perform a bulk load, or is that not an efficient way to write
data to a Phoenix HBase table?

-- 
Thanks and Regards
Mohan
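
A minimal sketch of the DataFrame save path this question is about, through
the phoenix-spark connector (assuming Phoenix 4.x with Spark 1.6 on the
classpath; the table name, columns and ZooKeeper quorum below are
placeholders, not values from this thread):

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.{SQLContext, SaveMode}

object PhoenixDataFrameSave {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("phoenix-df-save"))
    val sqlContext = new SQLContext(sc)

    // In practice this DataFrame would be the output of the Spark job that
    // produces the 100 million records.
    val df = sqlContext.createDataFrame(Seq((1L, "a"), (2L, "b")))
      .toDF("ID", "COL1")

    // The connector translates the save into Phoenix UPSERTs executed in
    // parallel across the executors; it expects SaveMode.Overwrite.
    df.write
      .format("org.apache.phoenix.spark")
      .mode(SaveMode.Overwrite)
      .option("table", "OUTPUT_TABLE")
      .option("zkUrl", "zk-host:2181")
      .save()

    sc.stop()
  }
}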

Re: PHOENIX SPARK - DataFrame for BulkLoad

Posted by Mohanraj Ragupathiraj <mo...@gmail.com>.
Thank you very much. I will try it out and post an update.

On Wed, May 18, 2016 at 10:29 PM, Josh Mahonin <jm...@gmail.com> wrote:

> Hi,
>
> The Spark integration uses the Phoenix MapReduce framework, which under
> the hood translates those to UPSERTs spread across a number of workers.
>
> You should try out both methods and see which works best for your use
> case. For what it's worth, we routinely do load / save operations using the
> Spark integration on those data sizes.
>
> Josh
>
> On Tue, May 17, 2016 at 7:03 AM, Radha krishna <gr...@gmail.com> wrote:
>
>> Hi
>>
>> I have the same scenario. Can you share your metrics, such as the column
>> count per row, the number of SALT_BUCKETS, the compression technique you
>> used, and how long it takes to load the complete data set?
>>
>> In my scenario I have to load 1.9 billion records (approximately 20 files,
>> each containing 100 million rows with 102 columns per row). Currently it
>> takes 35 to 45 minutes to load one file.
>>
>>
>>
>> On Tue, May 17, 2016 at 3:51 PM, Mohanraj Ragupathiraj <
>> mohanaugust@gmail.com> wrote:
>>
>>> I have 100 million records to insert into an HBase (Phoenix) table as
>>> the result of a Spark job. If I convert the records to a DataFrame and
>>> save it, will that perform a bulk load, or is that not an efficient way
>>> to write data to a Phoenix HBase table?
>>>
>>> --
>>> Thanks and Regards
>>> Mohan
>>>
>>
>>
>>
>> --
>> Thanks & Regards
>>    Radha krishna
>>
>>
>>
>


-- 
Thanks and Regards
Mohan
VISA Pte Limited, Singapore.

Re: PHOENIX SPARK - DataFrame for BulkLoad

Posted by Josh Mahonin <jm...@gmail.com>.
Hi,

The Spark integration uses the Phoenix MapReduce framework, which under the
hood translates those to UPSERTs spread across a number of workers.

You should try out both methods and see which works best for your use case.
For what it's worth, we routinely do load / save operations using the Spark
integration on those data sizes.

Josh
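
A matching sketch of the load direction mentioned above, under the same
assumptions (phoenix-spark on the classpath; INPUT_TABLE and the ZooKeeper
quorum are placeholders):

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

object PhoenixDataFrameLoad {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("phoenix-df-load"))
    val sqlContext = new SQLContext(sc)

    // Reads the Phoenix table in parallel; from here on it behaves like any
    // other DataFrame.
    val df = sqlContext.read
      .format("org.apache.phoenix.spark")
      .option("table", "INPUT_TABLE")
      .option("zkUrl", "zk-host:2181")
      .load()

    println(df.count())
    sc.stop()
  }
}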

On Tue, May 17, 2016 at 7:03 AM, Radha krishna <gr...@gmail.com> wrote:

> Hi
>
> I have the same scenario. Can you share your metrics, such as the column
> count per row, the number of SALT_BUCKETS, the compression technique you
> used, and how long it takes to load the complete data set?
>
> In my scenario I have to load 1.9 billion records (approximately 20 files,
> each containing 100 million rows with 102 columns per row). Currently it
> takes 35 to 45 minutes to load one file.
>
>
>
> On Tue, May 17, 2016 at 3:51 PM, Mohanraj Ragupathiraj <
> mohanaugust@gmail.com> wrote:
>
>> I have 100 million records to insert into an HBase (Phoenix) table as the
>> result of a Spark job. If I convert the records to a DataFrame and save
>> it, will that perform a bulk load, or is that not an efficient way to
>> write data to a Phoenix HBase table?
>>
>> --
>> Thanks and Regards
>> Mohan
>>
>
>
>
> --
> Thanks & Regards
>    Radha krishna
>
>
>

Re: PHOENIX SPARK - DataFrame for BulkLoad

Posted by Radha krishna <gr...@gmail.com>.
Hi

I have the same scenario. Can you share your metrics, such as the column
count per row, the number of SALT_BUCKETS, the compression technique you
used, and how long it takes to load the complete data set?

In my scenario I have to load 1.9 billion records (approximately 20 files,
each containing 100 million rows with 102 columns per row). Currently it
takes 35 to 45 minutes to load one file.
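
For context on the SALT_BUCKETS and compression settings asked about here, a
hypothetical table definition issued over the Phoenix JDBC driver (the table
name, columns, bucket count and codec are illustrative only, not values from
this thread):

import java.sql.DriverManager

object CreateSaltedTable {
  def main(args: Array[String]): Unit = {
    val conn = DriverManager.getConnection("jdbc:phoenix:zk-host:2181")
    val stmt = conn.createStatement()
    // SALT_BUCKETS pre-splits the row key space to spread writes across
    // region servers; COMPRESSION is passed through to the underlying HBase
    // column family.
    stmt.execute(
      """CREATE TABLE IF NOT EXISTS INPUT_TABLE (
        |  ID BIGINT NOT NULL PRIMARY KEY,
        |  COL1 VARCHAR,
        |  COL2 INTEGER
        |) SALT_BUCKETS = 16, COMPRESSION = 'SNAPPY'""".stripMargin)
    stmt.close()
    conn.close()
  }
}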



On Tue, May 17, 2016 at 3:51 PM, Mohanraj Ragupathiraj <
mohanaugust@gmail.com> wrote:

> I have 100 million records to insert into an HBase (Phoenix) table as the
> result of a Spark job. If I convert the records to a DataFrame and save
> it, will that perform a bulk load, or is that not an efficient way to
> write data to a Phoenix HBase table?
>
> --
> Thanks and Regards
> Mohan
>



-- 
Thanks & Regards
   Radha krishna