Posted to user@kylin.apache.org by "Tokayer, Jason M." <Ja...@capitalone.com> on 2017/05/30 13:30:18 UTC

Hbase on S3

Does Kylin support storing data in S3 rather than HDFS?
________________________________________________________

The information contained in this e-mail is confidential and/or proprietary to Capital One and/or its affiliates and may only be used solely in performance of work or services for Capital One. The information transmitted herewith is intended only for use by the individual or entity to which it is addressed. If the reader of this message is not the intended recipient, you are hereby notified that any review, retransmission, dissemination, distribution, copying or other use of, or taking of any action in reliance upon this information is strictly prohibited. If you have received this communication in error, please contact the sender and delete the material from your computer.

Re: Reply: Reply: Hbase on S3

Posted by ShaoFeng Shi <sh...@apache.org>.
Hi Zhuoran,

Performance tuning is always an advanced topic; many factors can affect
performance. You need to identify the bottlenecks first, and then take
corresponding actions.

2017-05-31 9:55 GMT+08:00 吕卓然 <lv...@fosun.com>:

> I do not have a problem with AWS… I just replied to him. I thought he
> wanted to use a different storage type rather than HBase. Maybe I made a
> mistake.
>
> However, I have another problem…
>
> If I use exact count distinct on a large amount of data, say 300 million
> records, it takes a long time, more than 10 seconds, to get the result.
>
> Any suggestions?
>
> Thanks,
>
> Zhuoran

-- 
Best regards,

Shaofeng Shi 史少锋

Reply: Reply: Hbase on S3

Posted by 吕卓然 <lv...@fosun.com>.
I do not have a problem with AWS… I just replied to him. I thought he wanted to use a different storage type rather than HBase. Maybe I made a mistake.

However, I have another problem…
If I use exact count distinct on a large amount of data, say 300 million records, it takes a long time, more than 10 seconds, to get the result.

Any suggestions?

Thanks,
Zhuoran

From: ShaoFeng Shi [mailto:shaofengshi@apache.org]
Sent: May 31, 2017 9:48
To: user
Subject: Re: Reply: Hbase on S3

Zhuoran, what problem did you encounter? Some Kylin users already run on AWS EMR or Azure HDInsight, and we see no blocking issues. If you hit an issue, please share more information.


Re: Reply: Hbase on S3

Posted by ShaoFeng Shi <sh...@apache.org>.
Sorry, my initial statement was somewhat misleading. Adding
"hbase.zookeeper.quorum" to kylin_job_conf.xml lets Kylin work with EMR and
finish the cubing, but the default FS of EMR is still HDFS (on local
disks), not S3.

To use S3 as the data store, you need to 1) configure the Kylin working dir
as something like s3://your-bucket/kylin; 2) use an HBase version that
supports S3 as storage.
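
For illustration only, here is a minimal Python sketch that renders the
Hadoop-style property fragment for kylin_job_conf.xml and a working-dir line
matching the two steps above. The ZooKeeper host name, the helper name
render_property, and the property key kylin.env.hdfs-working-dir are
assumptions, not taken from this thread; check the key against the
documentation for your Kylin version.

```python
from xml.sax.saxutils import escape


def render_property(name: str, value: str) -> str:
    """Render one Hadoop-style <property> block as XML text (hypothetical helper)."""
    return (
        "<property>\n"
        f"    <name>{escape(name)}</name>\n"
        f"    <value>{escape(value)}</value>\n"
        "</property>"
    )


# Hypothetical ZooKeeper host of the EMR cluster; replace with your own.
ZK_QUORUM = "ip-10-0-0-1.ec2.internal"

# Fragment to paste inside <configuration> in kylin_job_conf.xml.
job_conf_fragment = render_property("hbase.zookeeper.quorum", ZK_QUORUM)

# Working dir on S3. The key name is an assumption and may differ by version;
# the bucket path is the placeholder used in the message above.
working_dir_line = "kylin.env.hdfs-working-dir=s3://your-bucket/kylin"

if __name__ == "__main__":
    print(job_conf_fragment)
    print(working_dir_line)
```

The sketch only produces text to paste into the configuration files; it does
not cover step 2, which depends on the HBase build shipped with the cluster.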





2017-05-31 12:33 GMT+08:00 iain wright <ia...@gmail.com>:

> It looks like HBase can use S3 via EMRFS. It must be a fairly new
> feature; it was asked about on the HBase list for years and wasn't
> possible before.
>
> http://docs.aws.amazon.com/emr/latest/ReleaseGuide/emr-hbase-s3.html

-- 
Best regards,

Shaofeng Shi 史少锋

Re: Reply: Hbase on S3

Posted by iain wright <ia...@gmail.com>.
It looks like HBase can use S3 via EMRFS. It must be a fairly new feature; it was asked about on the HBase list for years and wasn't possible before.

http://docs.aws.amazon.com/emr/latest/ReleaseGuide/emr-hbase-s3.html

Sent from my iPhone

> On May 30, 2017, at 6:48 PM, ShaoFeng Shi <sh...@apache.org> wrote:
> 
> Zhuoran, what problem did you encounter? Some Kylin users already run on AWS EMR or Azure HDInsight, and we see no blocking issues. If you hit an issue, please share more information.

Re: Reply: Hbase on S3

Posted by ShaoFeng Shi <sh...@apache.org>.
Zhuoran, what problem did you encounter? Some Kylin users already run on
AWS EMR or Azure HDInsight, and we see no blocking issues. If you hit an
issue, please share more information.

2017-05-31 9:41 GMT+08:00 吕卓然 <lv...@fosun.com>:

> I do not think so

-- 
Best regards,

Shaofeng Shi 史少锋

Reply: Hbase on S3

Posted by 吕卓然 <lv...@fosun.com>.
I do not think so

From: Tokayer, Jason M. [mailto:Jason.Tokayer@capitalone.com]
Sent: May 30, 2017 21:30
To: user@kylin.apache.org
Subject: Hbase on S3

Does Kylin support storing data in S3 rather than HDFS?
