Posted to user@kudu.apache.org by Andrey Kuznetsov <An...@epam.com> on 2017/08/09 16:05:15 UTC

[kudu] import from hdfs

Hi folks,
I have a problem with HDFS-to-Kudu import performance. I created an external table over CSV data and ran an "insert as select" from it into a Kudu table and into a Parquet table:
importing into the Parquet table is 3x faster than into Kudu. Do you know any tips or tricks to increase import performance?
I am importing 8 TB of data, so this is critical for me.
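
Roughly, this is what I'm running (a sketch with made-up table and column names; kudu_table and parquet_table are assumed to already exist):

    # External table over the CSV data, then one INSERT ... SELECT per target:
    impala-shell -q "CREATE EXTERNAL TABLE csv_staging (id BIGINT, val STRING)
                     ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
                     LOCATION '/data/csv_staging'"
    impala-shell -q "INSERT INTO kudu_table SELECT * FROM csv_staging"
    impala-shell -q "INSERT INTO parquet_table SELECT * FROM csv_staging"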

Best regards,
ANDREY KUZNETSOV


RE: [kudu] import from hdfs

Posted by Andrey Kuznetsov <An...@epam.com>.
Yep, I’ve added a few GBs ☺
But it has had minimal effect on import performance.

Best regards,
ANDREY KUZNETSOV


Re: [kudu] import from hdfs

Posted by Jean-Daniel Cryans <jd...@apache.org>.
Huh, this is confusing. How much memory did you say you have per node? You
mentioned 256GB, but I'm not sure what that refers to anymore, because I see
you gave 400GB to Kudu in there.

Also, why a single disk? Is HDFS using more than one?


RE: [kudu] import from hdfs

Posted by Andrey Kuznetsov <An...@epam.com>.
Hi Jean-Daniel,
No problem, you can find a screenshot in the attachment.
I can't provide the log for security reasons, sorry…

Best regards,
ANDREY KUZNETSOV

Re: [kudu] import from hdfs

Posted by Jean-Daniel Cryans <jd...@apache.org>.
Hi Andrey,

Can you double check how much memory is actually given to Kudu? That's
--memory_limit_hard_bytes. Providing us with a full kudu-tserver log could
be useful, as long as it starts with this line "Tablet server non-default
flags".

Without more data about your situation it's going to be really hard to help
you.

Thx,

J-D


RE: [kudu] import from hdfs

Posted by Andrey Kuznetsov <An...@epam.com>.
Hi Jean-Daniel,
Nice to hear from you :)

I use Kudu 1.3. I believe Kudu has enough memory (about 256 GB on each node).
I have played with the threads parameter, but it doesn't make much of a difference -
it is still extremely slow…

Best regards,
ANDREY KUZNETSOV

Re: [kudu] import from hdfs

Posted by Jean-Daniel Cryans <jd...@apache.org>.
Hi Andrey,

Which version of Kudu and Impala are you using? Just that can make a huge
difference.

Apart from that, make sure Kudu has enough memory (no memory back
pressure), you have enough maintenance manager threads (1/3 or 1/4 the
number of disks), and that your partitioning favors good load distribution.
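
Concretely, something along these lines (the values and names are illustrative, not recommendations):

    # Maintenance manager threads, e.g. ~1/3 of a 12-disk node:
    kudu-tserver --maintenance_manager_num_threads=4 \
        --fs_wal_dir=/data/kudu/wal \
        --fs_data_dirs=/data1/kudu,/data2/kudu
    # Hash-partition the Kudu table so inserts spread across tablets/servers
    # (Impala 2.8+ syntax for Kudu tables):
    impala-shell -q "CREATE TABLE kudu_table (id BIGINT, val STRING, PRIMARY KEY (id))
                     PARTITION BY HASH (id) PARTITIONS 32
                     STORED AS KUDU"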

But TBH writing to Parquet will remain faster than writing to Kudu, because
Kudu isn't just dropping the rows into a file and has to do more than that.

Hope this helps,

J-D
