Posted to dev@carbondata.apache.org by 刘feng <f....@neusoft.com> on 2017/09/15 03:42:26 UTC

Questions about loading data into CarbonData

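Hello,

   I have been evaluating CarbonData recently and ran into a few problems when loading data:

1. When loading more than 10 GB of data, the job fails at the stage "collect at GlobalDictionaryUtil.scala:746"
<http://namenode1:8088/proxy/application_1505443499883_0001/stages/stage?id=4&attempt=0>, and the load cannot proceed.

2. For data under 5 GB, an insert into a newly created table succeeds in one or two minutes, but an incremental insert is very slow and takes about thirty minutes.

Is there any way to optimize these cases? Thank you!

Configuration: the cluster has three data nodes, configured with 128 GB of memory, an 8-core CPU, and 10 hard disks.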

 

----------------------------------------------------------------------------
---------------------------

刘峰

Mobile:13889865456

 



---------------------------------------------------------------------------------------------------
Confidentiality Notice: The information contained in this e-mail and any accompanying attachment(s)
is intended only for the use of the intended recipient and may be confidential and/or privileged of
Neusoft Corporation, its subsidiaries and/or its affiliates. If any reader of this communication is
not the intended recipient, unauthorized use, forwarding, printing,  storing, disclosure or copying
is strictly prohibited, and may be unlawful.If you have received this communication in error,please
immediately notify the sender by return e-mail, and delete the original message and all copies from
your system. Thank you.
---------------------------------------------------------------------------------------------------

Re: Questions about loading data into CarbonData

Posted by Liang Chen <ch...@gmail.com>.
Hi

I have the same comments as cenyuhai: please provide more details. Which
version are you using?

Please refer to
https://github.com/apache/carbondata/blob/master/docs/useful-tips-on-carbondata.md.
For high-cardinality columns, you can use a table property such as TBLPROPERTIES
('DICTIONARY_EXCLUDE'='MSISDN') so that no dictionary is created for them.
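
To make the suggestion concrete, here is a minimal sketch in Python that only assembles the DDL text. The table name `cdr_records`, the column list, and the helper `build_create_table_ddl` are hypothetical and used purely for illustration. The `STORED BY 'carbondata'` clause assumes the CarbonData 1.x DDL syntax, so please check it against the documentation for the version you run.

```python
def build_create_table_ddl(table, columns, dictionary_exclude):
    """Build a CREATE TABLE statement that skips dictionary encoding
    for the listed high-cardinality columns (hypothetical helper)."""
    # Refuse to exclude a column that is not part of the schema.
    unknown = [c for c in dictionary_exclude if c not in columns]
    if unknown:
        raise ValueError(f"columns not in schema: {unknown}")
    column_defs = ", ".join(f"{name} {dtype}" for name, dtype in columns.items())
    excluded = ",".join(dictionary_exclude)
    return (
        f"CREATE TABLE IF NOT EXISTS {table} ({column_defs}) "
        f"STORED BY 'carbondata' "  # assumption: CarbonData 1.x syntax
        f"TBLPROPERTIES ('DICTIONARY_EXCLUDE'='{excluded}')"
    )


# Hypothetical schema: MSISDN is the high-cardinality column.
ddl = build_create_table_ddl(
    "cdr_records",
    {"MSISDN": "STRING", "call_time": "TIMESTAMP", "duration": "INT"},
    ["MSISDN"],
)
print(ddl)
```

The resulting string is what you would pass to the `sql(...)` method of your Carbon-enabled Spark session when creating the table, before running the load.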

Regards
Liang




Re: Questions about loading data into CarbonData

Posted by Sea <26...@qq.com>.
Hi, fengliu:
  Please use English. Describe your steps in as much detail as possible; the error message is also needed.



