Posted to user@drill.apache.org by Matt <bs...@gmail.com> on 2015/05/27 21:17:37 UTC
Monitoring long / stuck CTAS
Attempting to create a Parquet-backed table with a CTAS from a 44GB
tab-delimited file in HDFS. The process seemed to be running: CPU and IO
were seen on all 4 nodes in this cluster, and .parquet files were being
created in the expected path.
However, in the last two hours or so, all nodes show near-zero CPU and
IO, and the Last Modified dates on the .parquet files have not changed.
The same delay is shown in the Last Progress column in the active
fragment profile.
What approach can I take to determine what is happening (or not)?
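A generic first step for this (an editorial sketch, not advice from the thread; the jps-based pid lookup and the log location are assumptions about a typical Drill install) is to thread-dump the drillbit and tail its log:

```shell
# Sketch only: generic JVM-level checks for a drillbit that shows no
# CPU/IO progress. Process name and log path are assumptions.
LOG="${DRILL_LOG_DIR:-/var/log/drill}/drillbit.log"
PID=""
if command -v jps >/dev/null 2>&1; then
  PID=$(jps -l | awk 'tolower($0) ~ /drillbit/ {print $1; exit}')
fi
if [ -n "$PID" ]; then
  jstack "$PID" > /tmp/drillbit-threads.txt  # thread dump: look for blocked fragment threads
fi
if [ -f "$LOG" ]; then
  tail -n 200 "$LOG"                         # recent warnings/errors from the write path
fi
echo "inspected pid: ${PID:-none found}"
```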
Re: Monitoring long / stuck CTAS
Posted by Sudheesh Katkam <sk...@maprtech.com>.
See below:
> On May 27, 2015, at 12:17 PM, Matt <bs...@gmail.com> wrote:
>
> Attempting to create a Parquet-backed table with a CTAS from a 44GB tab-delimited file in HDFS. The process seemed to be running, as CPU and IO were seen on all 4 nodes in this cluster, and .parquet files were being created in the expected path.
>
> However, in the last two hours or so, all nodes show near-zero CPU and IO, and the Last Modified dates on the .parquet files have not changed. The same delay is shown in the Last Progress column in the active fragment profile.
Did you happen to notice the Last Update column in the profile? If so, was there a time delay in that too?
>
> What approach can I take to determine what is happening (or not)?
>
Re: Monitoring long / stuck CTAS
Posted by Matt <bs...@gmail.com>.
Bumping memory to:
DRILL_MAX_DIRECT_MEMORY="16G"
DRILL_HEAP="8G"
The 44GB file imported successfully in 25 minutes, which is acceptable
on this hardware.
I don't know whether the default memory settings were to blame.
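As a sketch, the corresponding drill-env.sh entries (per node; the values simply mirror the ones above and are not a tuning recommendation; each drillbit must be restarted for them to take effect):

```shell
# drill-env.sh (per node) -- both limits apply to each drillbit, not the cluster.
# Restart the drillbits after editing for the change to take effect.
export DRILL_MAX_DIRECT_MEMORY="16G"   # direct (off-heap) memory per node
export DRILL_HEAP="8G"                 # JVM heap per node
```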
On 28 May 2015, at 14:22, Andries Engelbrecht wrote:
> That is the Drill direct memory per node.
>
> DRILL_HEAP is for the heap size per node.
>
> More info here
> http://drill.apache.org/docs/configuring-drill-memory/
>
>
> —Andries
Re: Monitoring long / stuck CTAS
Posted by Andries Engelbrecht <ae...@maprtech.com>.
That is the Drill direct memory per node.
DRILL_HEAP is for the heap size per node.
More info here
http://drill.apache.org/docs/configuring-drill-memory/
—Andries
On May 28, 2015, at 11:09 AM, Matt <bs...@gmail.com> wrote:
> Referencing http://drill.apache.org/docs/configuring-drill-memory/
>
> Is DRILL_MAX_DIRECT_MEMORY the limit for each node, or the cluster?
>
> The root page on a drillbit at port 8047 lists four nodes, each showing a Maximum Direct Memory of 16G, equal to DRILL_MAX_DIRECT_MEMORY, so I am uncertain whether that is a per-node or a cluster limit.
Re: Monitoring long / stuck CTAS
Posted by Matt <bs...@gmail.com>.
Referencing http://drill.apache.org/docs/configuring-drill-memory/
Is DRILL_MAX_DIRECT_MEMORY the limit for each node, or the cluster?
The root page on a drillbit at port 8047 lists four nodes, each showing
a Maximum Direct Memory of 16G, equal to DRILL_MAX_DIRECT_MEMORY, so I
am uncertain whether that is a per-node or a cluster limit.
On 28 May 2015, at 12:23, Jason Altekruse wrote:
> That is correct. I guess it could be possible that HDFS might run out
> of
> heap, but I'm guessing that is unlikely the cause of the failure you
> are
> seeing. We should not be taxing zookeeper enough to be causing any
> issues
> there.
Re: Monitoring long / stuck CTAS
Posted by Jason Altekruse <al...@gmail.com>.
That is correct. I guess it could be possible that HDFS might run out of
heap, but I'm guessing that is unlikely to be the cause of the failure you
are seeing. We should not be taxing ZooKeeper enough to be causing any
issues there.
On Thu, May 28, 2015 at 9:17 AM, Matt <bs...@gmail.com> wrote:
> To make sure I am adjusting the correct config, these are heap parameters
> within the Drill configure path, not for Hadoop or Zookeeper?
>
>
> > On May 28, 2015, at 12:08 PM, Jason Altekruse <al...@gmail.com>
> wrote:
> >
> > There should be no upper limit on the size of the tables you can create
> > with Drill. Be advised that Drill does currently operate entirely
> > optimistically in regards to available resources. If a network connection
> > between two drillbits fails during a query, we will not currently
> > re-schedule the work to make use of remaining nodes and network
> connections
> > that are still live. While we have had a good amount of success using
> Drill
> > for data conversion, be aware that these conditions could cause long
> > running queries to fail.
> >
> > That being said, it isn't the only possible cause for such a failure. In
> > the case of a network failure we would expect to see a message returned
> to
> > you that part of the query was unsuccessful and that it had been
> cancelled.
> > Andries has a good suggestion in regards to checking the heap memory,
> this
> > should also be detected and reported back to you at the CLI, but we may
> be
> > failing to propagate the error back to the head node for the query. I
> > believe writing parquet may still be the most heap-intensive operation in
> > Drill, despite our efforts to refactor the write path to use direct
> memory
> > instead of on-heap for large buffers needed in the process of creating
> > parquet files.
> >
> >> On Thu, May 28, 2015 at 8:43 AM, Matt <bs...@gmail.com> wrote:
> >>
> >> Is 300MM records too much to do in a single CTAS statement?
> >>
> >> After almost 23 hours I killed the query (^c) and it returned:
> >>
> >> ~~~
> >> +-----------+----------------------------+
> >> | Fragment | Number of records written |
> >> +-----------+----------------------------+
> >> | 1_20 | 13568824 |
> >> | 1_15 | 12411822 |
> >> | 1_7 | 12470329 |
> >> | 1_12 | 13693867 |
> >> | 1_5 | 13292136 |
> >> | 1_18 | 13874321 |
> >> | 1_16 | 13303094 |
> >> | 1_9 | 13639049 |
> >> | 1_10 | 13698380 |
> >> | 1_22 | 13501073 |
> >> | 1_8 | 13533736 |
> >> | 1_2 | 13549402 |
> >> | 1_21 | 13665183 |
> >> | 1_0 | 13544745 |
> >> | 1_4 | 13532957 |
> >> | 1_19 | 12767473 |
> >> | 1_17 | 13670687 |
> >> | 1_13 | 13469515 |
> >> | 1_23 | 12517632 |
> >> | 1_6 | 13634338 |
> >> | 1_14 | 13611322 |
> >> | 1_3 | 13061900 |
> >> | 1_11 | 12760978 |
> >> +-----------+----------------------------+
> >> 23 rows selected (82294.854 seconds)
> >> ~~~
> >>
> >> The sum of those record counts is 306,772,763 which is close to the
> >> 320,843,454 in the source file:
> >>
> >> ~~~
> >> 0: jdbc:drill:zk=es05:2181> select count(*) FROM
> root.`sample_201501.dat`;
> >> +------------+
> >> | EXPR$0 |
> >> +------------+
> >> | 320843454 |
> >> +------------+
> >> 1 row selected (384.665 seconds)
> >> ~~~
> >>
> >>
> >> It represents one month of data, 4 key columns and 38 numeric measure
> >> columns, which could also be partitioned daily. The test here was to
> create
> >> monthly Parquet files to see how the min/max stats on Parquet chunks
> help
> >> with range select performance.
> >>
> >> Instead of a small number of large monthly RDBMS tables, I am attempting
> >> to determine how many Parquet files should be used with Drill / HDFS.
> >>
> >>
> >>
> >>
> >> On 27 May 2015, at 15:17, Matt wrote:
> >>
> >> Attempting to create a Parquet-backed table with a CTAS from a 44GB tab
> >>> delimited file in HDFS. The process seemed to be running, as CPU and
> IO was
> >>> seen on all 4 nodes in this cluster, and .parquet files being created
> in
> >>> the expected path.
> >>>
> >>> However, in the last two hours or so, all nodes show near zero CPU or
> >>> IO, and the Last Modified dates on the .parquet files have not changed. Same
> time
> >>> delay shown in the Last Progress column in the active fragment profile.
> >>>
> >>> What approach can I take to determine what is happening (or not)?
> >>
>
Re: Monitoring long / stuck CTAS
Posted by Matt <bs...@gmail.com>.
> 1) it isn't HDFS.
Is MapR-FS a replacement or stand-in for HDFS?
Re: Monitoring long / stuck CTAS
Posted by Carol McDonald <cm...@maprtech.com>.
What Ted just talked about is also explained in this On Demand Training
https://www.mapr.com/services/mapr-academy/mapr-distribution-essentials-training-course-on-demand
(which is free)
Re: Monitoring long / stuck CTAS
Posted by Ted Dunning <te...@gmail.com>.
There are two methods to support the HBase table APIs. The first is to simply
run HBase. That is just like, well, running HBase.
The more interesting alternative is to use a special client API that talks
a special table-oriented wire protocol to the file system which implements
a column-family / column oriented table API similar to what HBase uses.
The big differences have to do with the fact that code inside the file
system has capabilities available to it that are not available to HBase.
For instance, it can use a file oriented transaction and recovery system.
It can also make use of knowledge about file system layout that is not
available to HBase.
Because we can optimize the file layouts, we can also change the low level
protocols for disk reorganization. MapR tables have more levels of
sub-division than HBase and we use different low-level algorithms. This
results in having lots of write-ahead logs which would crush HDFS because
of the commit rate, but it allows very fast crash recovery (tens to low
hundreds of ms after the basic file system is back).
Also, since the tables are built using standard file-system primitives all
of the transactionally correct snapshots and mirrors carry over to tables
as well.
Oh, and it tends to be a lot faster and more failure-tolerant as well.
On Fri, May 29, 2015 at 7:00 AM, Yousef Lasi <yo...@gmail.com> wrote:
> Could you expand on the HBase table integration? How does that work?
Re: Monitoring long / stuck CTAS
Posted by Yousef Lasi <yo...@gmail.com>.
Could you expand on the HBase table integration? How does that work?
On Fri, May 29, 2015 at 5:55 AM, Ted Dunning <te...@gmail.com> wrote:
>
> 4) you get the use of the HBase API without having to run HBase. Tables
> are integrated directly into MapR FS.
>
>
>
>
>
> On Thu, May 28, 2015 at 9:37 AM, Matt <bs...@gmail.com> wrote:
>
> > I know I can / should assign individual disks to HDFS, but as a test
> > cluster there are apps that expect data volumes to work on. A dedicated
> > Hadoop production cluster would have a disk layout specific to the task.
>
Re: Monitoring long / stuck CTAS
Posted by Ted Dunning <te...@gmail.com>.
Apologies for the plug, but using MapR FS would help you a lot here. The
trick is that you can run an NFS server on every node and mount that server
as localhost.
The benefits are:
1) the entire cluster appears as a conventional POSIX-style file system, in
addition to being available via the HDFS APIs.
2) you get very high I/O speeds
3) you get real snapshots and mirrors if you need them
4) you get the use of the HBase API without having to run HBase. Tables
are integrated directly into MapR FS.
5) programs that need to exceed local disk size can do so
6) data can be isolated to single machines if you want.
7) you can get it for free or pay for support
The downsides are:
1) it isn't HDFS.
2) the data platform isn't Apache-licensed (all of the ecosystem code is
unchanged with regard to licensing)
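(Editor's sketch: the "mount as localhost" trick above conventionally looks something like the following on each node; the /mapr export path and mount options are assumptions about a typical install, not details from this thread.)

```shell
# Each node runs an NFS gateway, so every node can mount the cluster FS
# from localhost and see it as a POSIX file system.
MOUNT_SRC="localhost:/mapr"
MOUNT_POINT="/mapr"
# Shown as a dry run; the real command would need root and a running gateway:
echo "would run: mount -o nolock $MOUNT_SRC $MOUNT_POINT"
```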
On Thu, May 28, 2015 at 9:37 AM, Matt <bs...@gmail.com> wrote:
> I know I can / should assign individual disks to HDFS, but as a test
> cluster there are apps that expect data volumes to work on. A dedicated
> Hadoop production cluster would have a disk layout specific to the task.
Re: Monitoring long / stuck CTAS
Posted by Matt <bs...@gmail.com>.
CPU and IO went to near zero on the master and all nodes after about 1
hour. I do not know whether the bulk of the rows were written within that
hour or after.
> Is there any way you can read the table and try to validate if all of
> the data was written?
A simple join will show me where it stopped, and whether that was at a
specific point in scanning the source file top to bottom.
> While we certainly want to look into this more to find the issue in
> your case, you might have all of the data you need to start running
> queries against the parquet files.
A simple row-count comparison tells me about 5% of the rows are missing
from the destination, but I will be confirming that.
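A minimal sketch of that row-count check (the source name is from earlier in the thread; the destination table name is a placeholder):

~~~
-- Compare row counts between the delimited source and the CTAS output;
-- a shortfall confirms the write stopped early.
SELECT COUNT(*) FROM root.`sample_201501.dat`;
SELECT COUNT(*) FROM root.`sample_201501_parquet`;  -- placeholder CTAS table name
~~~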
On 28 May 2015, at 13:24, Jason Altekruse wrote:
> He mentioned in his original post that he saw CPU and IO on all of the
> nodes for a while when the query was active, but it suddenly dropped
> down
> to low CPU usage and stopped producing files. It seems like we are
> failing
> to detect an error and cancel the query.
>
> It is possible that the failure happened when we were finalizing the
> query,
> cleaning up resources, file handles, etc. Is there any way you can read the
> table and try to validate if all of the data was written? You can try
> to
> run a few of the same queries against the tab delimited files and
> resulting
> parquet files to see if all of the records were written. While we
> certainly
> want to look into this more to find the issue in your case, you might
> have
> all of the data you need to start running queries against the parquet
> files.
>
> On Thu, May 28, 2015 at 10:06 AM, Andries Engelbrecht <
> aengelbrecht@maprtech.com> wrote:
>
>> The time seems pretty long for that file size. What type of file is
>> it?
>>
>> Is the CTAS running single threaded?
>>
>> —Andries
>>
>>
>> On May 28, 2015, at 9:37 AM, Matt <bs...@gmail.com> wrote:
>>
>>>> How large is the data set you are working with, and your
>>>> cluster/nodes?
>>>
>>> Just testing with that single 44GB source file currently, and my
>>> test
>> cluster is made from 4 nodes, each with 8 CPU cores, 32GB RAM, a 6TB
>> Ext4
>> volume (RAID-10).
>>>
>>> Drill defaults were left as they come in v1.0. I will be adjusting memory and
>> retrying the CTAS.
>>>
>>> I know I can / should assign individual disks to HDFS, but as a test
>> cluster there are apps that expect data volumes to work on. A
>> dedicated
>> Hadoop production cluster would have a disk layout specific to the
>> task.
>>>
>>>
>>> On 28 May 2015, at 12:26, Andries Engelbrecht wrote:
>>>
>>>> Just check the drillbit.log and drillbit.out files in the log
>>>> directory.
>>>> Before adjusting memory, see if that is an issue first. It was for
>>>> me,
>> but as Jason mentioned there can be other causes as well.
>>>>
>>>> You adjust memory allocation in the drill-env.sh files, and have to
>> restart the drill bits.
>>>>
>>>> How large is the data set you are working with, and your
>>>> cluster/nodes?
>>>>
>>>> —Andries
>>>>
>>>>
>>>> On May 28, 2015, at 9:17 AM, Matt <bs...@gmail.com> wrote:
>>>>
>>>>> To make sure I am adjusting the correct config, these are heap
>> parameters within the Drill configure path, not for Hadoop or
>> Zookeeper?
>>>>>
>>>>>
>>>>>> On May 28, 2015, at 12:08 PM, Jason Altekruse <
>> altekrusejason@gmail.com> wrote:
>>>>>>
>>>>>> There should be no upper limit on the size of the tables you can
>> create
>>>>>> with Drill. Be advised that Drill does currently operate entirely
>>>>>> optimistically in regards to available resources. If a network
>> connection
>>>>>> between two drillbits fails during a query, we will not currently
>>>>>> re-schedule the work to make use of remaining nodes and network
>> connections
>>>>>> that are still live. While we have had a good amount of success
>>>>>> using
>> Drill
>>>>>> for data conversion, be aware that these conditions could cause
>>>>>> long
>>>>>> running queries to fail.
>>>>>>
>>>>>> That being said, it isn't the only possible cause for such a
>>>>>> failure.
>> In
>>>>>> the case of a network failure we would expect to see a message
>> returned to
>>>>>> you that part of the query was unsuccessful and that it had been
>> cancelled.
>>>>>> Andries has a good suggestion in regards to checking the heap
>>>>>> memory,
>> this
>>>>>> should also be detected and reported back to you at the CLI, but
>>>>>> we
>> may be
>>>>>> failing to propagate the error back to the head node for the
>>>>>> query. I
>>>>>> believe writing parquet may still be the most heap-intensive
>> operation in
>>>>>> Drill, despite our efforts to refactor the write path to use
>>>>>> direct
>> memory
>>>>>> instead of on-heap for large buffers needed in the process of
>>>>>> creating
>>>>>> parquet files.
>>>>>>
>>>>>>> On Thu, May 28, 2015 at 8:43 AM, Matt <bs...@gmail.com> wrote:
>>>>>>>
>>>>>>> Is 300MM records too much to do in a single CTAS statement?
>>>>>>>
>>>>>>> After almost 23 hours I killed the query (^c) and it returned:
>>>>>>>
>>>>>>> ~~~
>>>>>>> +-----------+----------------------------+
>>>>>>> | Fragment | Number of records written |
>>>>>>> +-----------+----------------------------+
>>>>>>> | 1_20 | 13568824 |
>>>>>>> | 1_15 | 12411822 |
>>>>>>> | 1_7 | 12470329 |
>>>>>>> | 1_12 | 13693867 |
>>>>>>> | 1_5 | 13292136 |
>>>>>>> | 1_18 | 13874321 |
>>>>>>> | 1_16 | 13303094 |
>>>>>>> | 1_9 | 13639049 |
>>>>>>> | 1_10 | 13698380 |
>>>>>>> | 1_22 | 13501073 |
>>>>>>> | 1_8 | 13533736 |
>>>>>>> | 1_2 | 13549402 |
>>>>>>> | 1_21 | 13665183 |
>>>>>>> | 1_0 | 13544745 |
>>>>>>> | 1_4 | 13532957 |
>>>>>>> | 1_19 | 12767473 |
>>>>>>> | 1_17 | 13670687 |
>>>>>>> | 1_13 | 13469515 |
>>>>>>> | 1_23 | 12517632 |
>>>>>>> | 1_6 | 13634338 |
>>>>>>> | 1_14 | 13611322 |
>>>>>>> | 1_3 | 13061900 |
>>>>>>> | 1_11 | 12760978 |
>>>>>>> +-----------+----------------------------+
>>>>>>> 23 rows selected (82294.854 seconds)
>>>>>>> ~~~
>>>>>>>
>>>>>>> The sum of those record counts is 306,772,763 which is close to
>>>>>>> the
>>>>>>> 320,843,454 in the source file:
>>>>>>>
>>>>>>> ~~~
>>>>>>> 0: jdbc:drill:zk=es05:2181> select count(*) FROM
>> root.`sample_201501.dat`;
>>>>>>> +------------+
>>>>>>> | EXPR$0 |
>>>>>>> +------------+
>>>>>>> | 320843454 |
>>>>>>> +------------+
>>>>>>> 1 row selected (384.665 seconds)
>>>>>>> ~~~
>>>>>>>
>>>>>>>
>>>>>>> It represents one month of data, 4 key columns and 38 numeric
>>>>>>> measure
>>>>>>> columns, which could also be partitioned daily. The test here
>>>>>>> was to
>> create
>>>>>>> monthly Parquet files to see how the min/max stats on Parquet
>>>>>>> chunks
>> help
>>>>>>> with range select performance.
>>>>>>>
>>>>>>> Instead of a small number of large monthly RDBMS tables, I am
>> attempting
>>>>>>> to determine how many Parquet files should be used with Drill /
>>>>>>> HDFS.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On 27 May 2015, at 15:17, Matt wrote:
>>>>>>>
>>>>>>> Attempting to create a Parquet backed table with a CTAS from a
>>>>>>> 44GB
>> tab
>>>>>>>> delimited file in HDFS. The process seemed to be running, as
>>>>>>>> CPU
>> and IO was
>>>>>>>> seen on all 4 nodes in this cluster, and .parquet files being
>> created in
>>>>>>>> the expected path.
>>>>>>>>
>>>>>>>> However, in the last two hours or so, all nodes show near
>>>>>>>> zero
>> CPU or
>>>>>>>> IO, and the Last Modified date on the .parquet have not
>>>>>>>> changed.
>> Same time
>>>>>>>> delay shown in the Last Progress column in the active fragment
>> profile.
>>>>>>>>
>>>>>>>> What approach can I take to determine what is happening (or
>>>>>>>> not)?
>>>>>>>
>>
>>
Re: Monitoring long / stuck CTAS
Posted by Jason Altekruse <al...@gmail.com>.
He mentioned in his original post that he saw CPU and IO on all of the
nodes for a while when the query was active, but it suddenly dropped down
to low CPU usage and stopped producing files. It seems like we are failing
to detect an error and cancel the query.
It is possible that the failure happened when we were finalizing the query,
cleaning up resources, file handles, etc. Is there any way you can read the
table and try to validate if all of the data was written? You can try to
run a few of the same queries against the tab delimited files and resulting
parquet files to see if all of the records were written. While we certainly
want to look into this more to find the issue in your case, you might have
all of the data you need to start running queries against the parquet files.
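A quick way to act on that suggestion is to compare counts and aggregates between the source file and the new table. The source-file query below is the one from the thread; the Parquet table name (`sample_201501_pq`) is a placeholder to substitute, and the column names are illustrative:

```sql
-- Source row count (query taken from earlier in the thread):
SELECT COUNT(*) FROM root.`sample_201501.dat`;

-- Row count of the CTAS output; `sample_201501_pq` is a placeholder name:
SELECT COUNT(*) FROM root.`sample_201501_pq`;

-- Spot-check one measure column on both sides. In Drill, delimited files
-- expose fields via the `columns` array, so cast before aggregating:
SELECT SUM(CAST(columns[4] AS BIGINT)) FROM root.`sample_201501.dat`;
SELECT SUM(bytes_1250) FROM root.`sample_201501_pq`;
```

If the counts and sums agree, all records were written despite the hang; if not, the gap tells you roughly how much data the failed fragment dropped.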
On Thu, May 28, 2015 at 10:06 AM, Andries Engelbrecht <
aengelbrecht@maprtech.com> wrote:
> The time seems pretty long for that file size. What type of file is it?
>
> Is the CTAS running single threaded?
>
> —Andries
>
>
> On May 28, 2015, at 9:37 AM, Matt <bs...@gmail.com> wrote:
>
> >> How large is the data set you are working with, and your cluster/nodes?
> >
> > Just testing with that single 44GB source file currently, and my test
> cluster is made from 4 nodes, each with 8 CPU cores, 32GB RAM, a 6TB Ext4
> volume (RAID-10).
> >
> > Drill defaults were left as they come in v1.0. I will be adjusting memory and
> retrying the CTAS.
> >
> > I know I can / should assign individual disks to HDFS, but as a test
> cluster there are apps that expect data volumes to work on. A dedicated
> Hadoop production cluster would have a disk layout specific to the task.
> >
> >
> > On 28 May 2015, at 12:26, Andries Engelbrecht wrote:
> >
> >> Just check the drillbit.log and drillbit.out files in the log directory.
> >> Before adjusting memory, see if that is an issue first. It was for me,
> but as Jason mentioned there can be other causes as well.
> >>
> >> You adjust memory allocation in the drill-env.sh files, and have to
> restart the drill bits.
> >>
> >> How large is the data set you are working with, and your cluster/nodes?
> >>
> >> —Andries
> >>
> >>
> >> On May 28, 2015, at 9:17 AM, Matt <bs...@gmail.com> wrote:
> >>
> >>> To make sure I am adjusting the correct config, these are heap
> parameters within the Drill configure path, not for Hadoop or Zookeeper?
> >>>
> >>>
> >>>> On May 28, 2015, at 12:08 PM, Jason Altekruse <
> altekrusejason@gmail.com> wrote:
> >>>>
> >>>> There should be no upper limit on the size of the tables you can
> create
> >>>> with Drill. Be advised that Drill does currently operate entirely
> >>>> optimistically in regards to available resources. If a network
> connection
> >>>> between two drillbits fails during a query, we will not currently
> >>>> re-schedule the work to make use of remaining nodes and network
> connections
> >>>> that are still live. While we have had a good amount of success using
> Drill
> >>>> for data conversion, be aware that these conditions could cause long
> >>>> running queries to fail.
> >>>>
> >>>> That being said, it isn't the only possible cause for such a failure.
> In
> >>>> the case of a network failure we would expect to see a message
> returned to
> >>>> you that part of the query was unsuccessful and that it had been
> cancelled.
> >>>> Andries has a good suggestion in regards to checking the heap memory,
> this
> >>>> should also be detected and reported back to you at the CLI, but we
> may be
> >>>> failing to propagate the error back to the head node for the query. I
> >>>> believe writing parquet may still be the most heap-intensive
> operation in
> >>>> Drill, despite our efforts to refactor the write path to use direct
> memory
> >>>> instead of on-heap for large buffers needed in the process of creating
> >>>> parquet files.
> >>>>
> >>>>> On Thu, May 28, 2015 at 8:43 AM, Matt <bs...@gmail.com> wrote:
> >>>>>
> >>>>> Is 300MM records too much to do in a single CTAS statement?
> >>>>>
> >>>>> After almost 23 hours I killed the query (^c) and it returned:
> >>>>>
> >>>>> ~~~
> >>>>> +-----------+----------------------------+
> >>>>> | Fragment | Number of records written |
> >>>>> +-----------+----------------------------+
> >>>>> | 1_20 | 13568824 |
> >>>>> | 1_15 | 12411822 |
> >>>>> | 1_7 | 12470329 |
> >>>>> | 1_12 | 13693867 |
> >>>>> | 1_5 | 13292136 |
> >>>>> | 1_18 | 13874321 |
> >>>>> | 1_16 | 13303094 |
> >>>>> | 1_9 | 13639049 |
> >>>>> | 1_10 | 13698380 |
> >>>>> | 1_22 | 13501073 |
> >>>>> | 1_8 | 13533736 |
> >>>>> | 1_2 | 13549402 |
> >>>>> | 1_21 | 13665183 |
> >>>>> | 1_0 | 13544745 |
> >>>>> | 1_4 | 13532957 |
> >>>>> | 1_19 | 12767473 |
> >>>>> | 1_17 | 13670687 |
> >>>>> | 1_13 | 13469515 |
> >>>>> | 1_23 | 12517632 |
> >>>>> | 1_6 | 13634338 |
> >>>>> | 1_14 | 13611322 |
> >>>>> | 1_3 | 13061900 |
> >>>>> | 1_11 | 12760978 |
> >>>>> +-----------+----------------------------+
> >>>>> 23 rows selected (82294.854 seconds)
> >>>>> ~~~
> >>>>>
> >>>>> The sum of those record counts is 306,772,763 which is close to the
> >>>>> 320,843,454 in the source file:
> >>>>>
> >>>>> ~~~
> >>>>> 0: jdbc:drill:zk=es05:2181> select count(*) FROM
> root.`sample_201501.dat`;
> >>>>> +------------+
> >>>>> | EXPR$0 |
> >>>>> +------------+
> >>>>> | 320843454 |
> >>>>> +------------+
> >>>>> 1 row selected (384.665 seconds)
> >>>>> ~~~
> >>>>>
> >>>>>
> >>>>> It represents one month of data, 4 key columns and 38 numeric measure
> >>>>> columns, which could also be partitioned daily. The test here was to
> create
> >>>>> monthly Parquet files to see how the min/max stats on Parquet chunks
> help
> >>>>> with range select performance.
> >>>>>
> >>>>> Instead of a small number of large monthly RDBMS tables, I am
> attempting
> >>>>> to determine how many Parquet files should be used with Drill / HDFS.
> >>>>>
> >>>>>
> >>>>>
> >>>>>
> >>>>> On 27 May 2015, at 15:17, Matt wrote:
> >>>>>
> >>>>> Attempting to create a Parquet backed table with a CTAS from a 44GB
> tab
> >>>>>> delimited file in HDFS. The process seemed to be running, as CPU
> and IO was
> >>>>>> seen on all 4 nodes in this cluster, and .parquet files being
> created in
> >>>>>> the expected path.
> >>>>>>
> >>>>>> However, in the last two hours or so, all nodes show near zero
> CPU or
> >>>>>> IO, and the Last Modified date on the .parquet have not changed.
> Same time
> >>>>>> delay shown in the Last Progress column in the active fragment
> profile.
> >>>>>>
> >>>>>> What approach can I take to determine what is happening (or not)?
> >>>>>
>
>
Re: Monitoring long / stuck CTAS
Posted by Matt <bs...@gmail.com>.
I have another test case that queries a table using a filter of a range
of dates and customer key, that SUMs 38 columns. The returned record set
encompasses all 42 columns in the table - not a good design for parquet
files or any RDBMS, but a modeling problem that is not yet fully in my
control (the application needs some changes).
Simply selecting all the columns in the parquet files with that filter
returns data to the client in about 3 seconds, but SUMming all of the 38
measure columns resulted in the query still running at the client 22
hours later.
However, the query profile shows no fragments with a Max Runtime of more
than 2h20m, much like the "stuck CTAS" I had before. Learning from that
case, I looked at the node hosting the one fragment that did not finish.
Could this be a communication failure between nodes that is not
signaling the client?
~~~
Major Fragment: 02-xx-xx
Minor Fragment ID | Host Name | Start  | End   | Runtime | Max Records | Max Batches | Last Update | Last Progress | Peak Memory | State
02-00-xx          | es06      | 1.011s | 2h20m | 2h20m   | 0           | 1           | 02:35:43    | 02:35:43      | 2MB         | CANCELLED
02-01-xx          | es08      | 0.999s | 4m33s | 4m32s   | 0           | 1           | 01:19:52    | 01:19:52      | 2MB         | FINISHED
02-02-xx          | es07      | 1.010s | 2m16s | 2m15s   | 0           | 1           | 01:17:34    | 01:17:34      | 2MB         | FINISHED
02-03-xx          | es05      | 1.009s | 2m56s | 2m55s   | 0           | 1           | 01:18:14    | 01:18:14      | 2MB         | FINISHED
~~~
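Scanning a profile like this for laggards can be done mechanically: parse each fragment's runtime and flag any fragment that runs far longer than its peers. A minimal sketch; the parsing helper and the 10x threshold are my own choices, not anything Drill provides:

```python
import re

def runtime_seconds(s):
    """Parse a Drill-profile style runtime such as '2h20m', '4m32s', or '1.011s'."""
    total = 0.0
    for value, unit in re.findall(r"([\d.]+)(h|m|s)", s):
        total += float(value) * {"h": 3600, "m": 60, "s": 1}[unit]
    return total

def laggards(fragments, factor=10):
    """Return fragment IDs whose runtime exceeds `factor` times the fastest runtime."""
    secs = {fid: runtime_seconds(rt) for fid, rt in fragments.items()}
    floor = min(secs.values())
    return sorted(fid for fid, s in secs.items() if s > factor * floor)

# Runtimes from the profile table above
profile = {
    "02-00-xx": "2h20m",   # the CANCELLED fragment on es06
    "02-01-xx": "4m32s",
    "02-02-xx": "2m15s",
    "02-03-xx": "2m55s",
}
print(laggards(profile))  # → ['02-00-xx']
```

With the four runtimes above, only the cancelled fragment on es06 is flagged.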
~~~
2015-05-29 05:23:07,822 [UserServer-1] INFO
o.a.drill.exec.work.foreman.Foreman - Failure while trying communicate
query result to initiating client. This would happen if a client is
disconnected before response notice can be sent.
org.apache.drill.exec.rpc.ChannelClosedException: null
at
org.apache.drill.exec.rpc.CoordinationQueue$RpcListener.operationComplete(CoordinationQueue.java:89)
[drill-java-exec-1.0.0-rebuffed.jar:1.0.0]
at
org.apache.drill.exec.rpc.CoordinationQueue$RpcListener.operationComplete(CoordinationQueue.java:67)
[drill-java-exec-1.0.0-rebuffed.jar:1.0.0]
at
io.netty.util.concurrent.DefaultPromise.notifyListener0(DefaultPromise.java:680)
[netty-common-4.0.27.Final.jar:4.0.27.Final]
at
io.netty.util.concurrent.DefaultPromise.notifyListeners0(DefaultPromise.java:603)
[netty-common-4.0.27.Final.jar:4.0.27.Final]
at
io.netty.util.concurrent.DefaultPromise.notifyListeners(DefaultPromise.java:563)
[netty-common-4.0.27.Final.jar:4.0.27.Final]
at
io.netty.util.concurrent.DefaultPromise.tryFailure(DefaultPromise.java:424)
[netty-common-4.0.27.Final.jar:4.0.27.Final]
at
io.netty.channel.AbstractChannel$AbstractUnsafe.safeSetFailure(AbstractChannel.java:788)
[netty-transport-4.0.27.Final.jar:4.0.27.Final]
at
io.netty.channel.AbstractChannel$AbstractUnsafe.write(AbstractChannel.java:689)
[netty-transport-4.0.27.Final.jar:4.0.27.Final]
at
io.netty.channel.DefaultChannelPipeline$HeadContext.write(DefaultChannelPipeline.java:1114)
[netty-transport-4.0.27.Final.jar:4.0.27.Final]
at
io.netty.channel.AbstractChannelHandlerContext.invokeWrite(AbstractChannelHandlerContext.java:705)
[netty-transport-4.0.27.Final.jar:4.0.27.Final]
at
io.netty.channel.AbstractChannelHandlerContext.access$1900(AbstractChannelHandlerContext.java:32)
[netty-transport-4.0.27.Final.jar:4.0.27.Final]
at
io.netty.channel.AbstractChannelHandlerContext$AbstractWriteTask.write(AbstractChannelHandlerContext.java:980)
[netty-transport-4.0.27.Final.jar:4.0.27.Final]
at
io.netty.channel.AbstractChannelHandlerContext$WriteAndFlushTask.write(AbstractChannelHandlerContext.java:1032)
[netty-transport-4.0.27.Final.jar:4.0.27.Final]
at
io.netty.channel.AbstractChannelHandlerContext$AbstractWriteTask.run(AbstractChannelHandlerContext.java:965)
[netty-transport-4.0.27.Final.jar:4.0.27.Final]
at
io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:357)
[netty-common-4.0.27.Final.jar:4.0.27.Final]
at
io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:254)
[netty-transport-native-epoll-4.0.27.Final-linux-x86_64.jar:na]
at
io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111)
[netty-common-4.0.27.Final.jar:4.0.27.Final]
at java.lang.Thread.run(Thread.java:745) [na:1.7.0_79]
2015-05-29 05:23:07,822 [UserServer-1] INFO
o.a.drill.exec.work.foreman.Foreman - State change requested. CANCELED
--> FAILED
org.apache.drill.exec.rpc.ChannelClosedException: null
at
org.apache.drill.exec.rpc.CoordinationQueue$RpcListener.operationComplete(CoordinationQueue.java:89)
[drill-java-exec-1.0.0-rebuffed.jar:1.0.0]
at
org.apache.drill.exec.rpc.CoordinationQueue$RpcListener.operationComplete(CoordinationQueue.java:67)
[drill-java-exec-1.0.0-rebuffed.jar:1.0.0]
at
io.netty.util.concurrent.DefaultPromise.notifyListener0(DefaultPromise.java:680)
[netty-common-4.0.27.Final.jar:4.0.27.Final]
at
io.netty.util.concurrent.DefaultPromise.notifyListeners0(DefaultPromise.java:603)
[netty-common-4.0.27.Final.jar:4.0.27.Final]
at
io.netty.util.concurrent.DefaultPromise.notifyListeners(DefaultPromise.java:563)
[netty-common-4.0.27.Final.jar:4.0.27.Final]
at
io.netty.util.concurrent.DefaultPromise.tryFailure(DefaultPromise.java:424)
[netty-common-4.0.27.Final.jar:4.0.27.Final]
at
io.netty.channel.AbstractChannel$AbstractUnsafe.safeSetFailure(AbstractChannel.java:788)
[netty-transport-4.0.27.Final.jar:4.0.27.Final]
at
io.netty.channel.AbstractChannel$AbstractUnsafe.write(AbstractChannel.java:689)
[netty-transport-4.0.27.Final.jar:4.0.27.Final]
at
io.netty.channel.DefaultChannelPipeline$HeadContext.write(DefaultChannelPipeline.java:1114)
[netty-transport-4.0.27.Final.jar:4.0.27.Final]
at
io.netty.channel.AbstractChannelHandlerContext.invokeWrite(AbstractChannelHandlerContext.java:705)
[netty-transport-4.0.27.Final.jar:4.0.27.Final]
at
io.netty.channel.AbstractChannelHandlerContext.access$1900(AbstractChannelHandlerContext.java:32)
[netty-transport-4.0.27.Final.jar:4.0.27.Final]
at
io.netty.channel.AbstractChannelHandlerContext$AbstractWriteTask.write(AbstractChannelHandlerContext.java:980)
[netty-transport-4.0.27.Final.jar:4.0.27.Final]
at
io.netty.channel.AbstractChannelHandlerContext$WriteAndFlushTask.write(AbstractChannelHandlerContext.java:1032)
[netty-transport-4.0.27.Final.jar:4.0.27.Final]
at
io.netty.channel.AbstractChannelHandlerContext$AbstractWriteTask.run(AbstractChannelHandlerContext.java:965)
[netty-transport-4.0.27.Final.jar:4.0.27.Final]
at
io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:357)
[netty-common-4.0.27.Final.jar:4.0.27.Final]
at
io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:254)
[netty-transport-native-epoll-4.0.27.Final-linux-x86_64.jar:na]
at
io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111)
[netty-common-4.0.27.Final.jar:4.0.27.Final]
at java.lang.Thread.run(Thread.java:745) [na:1.7.0_79]
~~~
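Since the error here only surfaced when the client disconnected, one way to catch a stuck fragment earlier is to reduce the per-fragment "State changed" lines in drillbit.log to each fragment's last known state. A minimal sketch; the sample log lines are abbreviated, and the RUNNING line is invented for illustration:

```python
import re

# Abbreviated lines in the format drillbit.log uses; the third line
# (a fragment left in RUNNING) is hypothetical.
LOG = """\
2015-05-27 18:39:49,981 [...] INFO o.a.d.e.w.f.AbstractStatusReporter - State changed for 2a9a10ec-...:1:20. New state: FINISHED
2015-05-27 18:40:05,650 [...] INFO o.a.d.e.w.f.AbstractStatusReporter - State changed for 2a9a10ec-...:1:12. New state: FINISHED
2015-05-27 18:41:57,444 [...] INFO o.a.d.e.w.f.AbstractStatusReporter - State changed for 2a9a10ec-...:1:1. New state: RUNNING
"""

def unfinished_fragments(log_text):
    """Map each minor fragment to its last reported state; return the non-FINISHED ones."""
    last = {}
    for frag, state in re.findall(
            r"State changed for \S+?:(\d+:\d+)\. New state: (\w+)", log_text):
        last[frag] = state
    return {f: s for f, s in last.items() if s != "FINISHED"}

print(unfinished_fragments(LOG))  # → {'1:1': 'RUNNING'}
```

Run against a real drillbit.log filtered to one query ID, any fragment whose last state is not FINISHED is the one to go inspect on its host.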
On 28 May 2015, at 16:43, Mehant Baid wrote:
> I think the problem might be related to a single laggard, looks like
> we are waiting for one minor fragment to complete. Based on the output
> you provided looks like the fragment 1_1 hasn't completed. You might
> want to find out where the fragment was scheduled and what is going on
> in that node. It might also be useful to look at the profile for that
> minor fragment to see how much data has been processed.
>
>
> Thanks
> Mehant
>
> On 5/28/15 10:57 AM, Matt wrote:
>>> Did you check the log files for any errors?
>>
>> No messages related to this query contain errors or warnings, nor
>> anything mentioning memory or heap. Querying now to determine what is
>> missing in the parquet destination.
>>
>> drillbit.out on the master shows no error messages, and what looks
>> like the last relevant line is:
>>
>> ~~~
>> May 27, 2015 6:43:50 PM INFO:
>> parquet.hadoop.ColumnChunkPageWriteStore: written 2,258,263B for
>> [bytes_1250] INT64: 3,069,414 values, 24,555,504B raw, 2,257,112B
>> comp, 24 pages, encodings: [RLE, PLAIN, BIT_PACKED]
>> May 27, 2015 6:43:51 PM INFO: parquet.haMay 28, 2015 5:13:42 PM
>> org.apache.calcite.sql.validate.SqlValidatorException <init>
>> ~~~
>>
>> The final lines in drillbit.log (which appear to use a different time
>> format in the log) that contain the profile ID:
>>
>> ~~~
>> 2015-05-27 18:39:49,980
>> [2a9a10ec-6f96-5dc5-54fc-dc5295a77e42:frag:1:20] INFO
>> o.a.d.e.w.fragment.FragmentExecutor -
>> 2a9a10ec-6f96-5dc5-54fc-dc5295a77e42:1:20: State change requested
>> from RUNNING --> FINISHED for
>> 2015-05-27 18:39:49,981
>> [2a9a10ec-6f96-5dc5-54fc-dc5295a77e42:frag:1:20] INFO
>> o.a.d.e.w.f.AbstractStatusReporter - State changed for
>> 2a9a10ec-6f96-5dc5-54fc-dc5295a77e42:1:20. New state: FINISHED
>> 2015-05-27 18:40:05,650
>> [2a9a10ec-6f96-5dc5-54fc-dc5295a77e42:frag:1:12] INFO
>> o.a.d.e.w.fragment.FragmentExecutor -
>> 2a9a10ec-6f96-5dc5-54fc-dc5295a77e42:1:12: State change requested
>> from RUNNING --> FINISHED for
>> 2015-05-27 18:40:05,650
>> [2a9a10ec-6f96-5dc5-54fc-dc5295a77e42:frag:1:12] INFO
>> o.a.d.e.w.f.AbstractStatusReporter - State changed for
>> 2a9a10ec-6f96-5dc5-54fc-dc5295a77e42:1:12. New state: FINISHED
>> 2015-05-27 18:41:57,444
>> [2a9a10ec-6f96-5dc5-54fc-dc5295a77e42:frag:1:16] INFO
>> o.a.d.e.w.fragment.FragmentExecutor -
>> 2a9a10ec-6f96-5dc5-54fc-dc5295a77e42:1:16: State change requested
>> from RUNNING --> FINISHED for
>> 2015-05-27 18:41:57,444
>> [2a9a10ec-6f96-5dc5-54fc-dc5295a77e42:frag:1:16] INFO
>> o.a.d.e.w.f.AbstractStatusReporter - State changed for
>> 2a9a10ec-6f96-5dc5-54fc-dc5295a77e42:1:16. New state: FINISHED
>> 2015-05-27 18:43:25,005
>> [2a9a10ec-6f96-5dc5-54fc-dc5295a77e42:frag:1:8] INFO
>> o.a.d.e.w.fragment.FragmentExecutor -
>> 2a9a10ec-6f96-5dc5-54fc-dc5295a77e42:1:8: State change requested from
>> RUNNING --> FINISHED for
>> 2015-05-27 18:43:25,005
>> [2a9a10ec-6f96-5dc5-54fc-dc5295a77e42:frag:1:8] INFO
>> o.a.d.e.w.f.AbstractStatusReporter - State changed for
>> 2a9a10ec-6f96-5dc5-54fc-dc5295a77e42:1:8. New state: FINISHED
>> 2015-05-27 18:43:54,539
>> [2a9a10ec-6f96-5dc5-54fc-dc5295a77e42:frag:1:0] INFO
>> o.a.d.e.w.fragment.FragmentExecutor -
>> 2a9a10ec-6f96-5dc5-54fc-dc5295a77e42:1:0: State change requested from
>> RUNNING --> FINISHED for
>> 2015-05-27 18:43:54,540
>> [2a9a10ec-6f96-5dc5-54fc-dc5295a77e42:frag:1:0] INFO
>> o.a.d.e.w.f.AbstractStatusReporter - State changed for
>> 2a9a10ec-6f96-5dc5-54fc-dc5295a77e42:1:0. New state: FINISHED
>> 2015-05-27 18:43:59,947
>> [2a9a10ec-6f96-5dc5-54fc-dc5295a77e42:frag:1:4] INFO
>> o.a.d.e.w.fragment.FragmentExecutor -
>> 2a9a10ec-6f96-5dc5-54fc-dc5295a77e42:1:4: State change requested from
>> RUNNING --> FINISHED for
>> 2015-05-27 18:43:59,947
>> [2a9a10ec-6f96-5dc5-54fc-dc5295a77e42:frag:1:4] INFO
>> o.a.d.e.w.f.AbstractStatusReporter - State changed for
>> 2a9a10ec-6f96-5dc5-54fc-dc5295a77e42:1:4. New state: FINISHED
>> ~~~
>>
>>
>> On 28 May 2015, at 13:42, Andries Engelbrecht wrote:
>>
>>> It should execute multi threaded, need to check on text file.
>>>
>>> Did you check the log files for any errors?
>>>
>>>
>>> On May 28, 2015, at 10:36 AM, Matt <bs...@gmail.com> wrote:
>>>
>>>>> The time seems pretty long for that file size. What type of file
>>>>> is it?
>>>>
>>>> Tab delimited UTF-8 text.
>>>>
>>>> I left the query to run overnight to see if it would complete, but
>>>> 24 hours for an import like this would indeed be too long.
>>>>
>>>>> Is the CTAS running single threaded?
>>>>
>>>> In the first hour, with this being the only client connected to the
>>>> cluster, I observed activity on all 4 nodes.
>>>>
>>>> Is multi-threaded query execution the default? I would not have
>>>> changed anything deliberately to force single thread execution.
>>>>
>>>>
>>>> On 28 May 2015, at 13:06, Andries Engelbrecht wrote:
>>>>
>>>>> The time seems pretty long for that file size. What type of file
>>>>> is it?
>>>>>
>>>>> Is the CTAS running single threaded?
>>>>>
>>>>> —Andries
>>>>>
>>>>>
>>>>> On May 28, 2015, at 9:37 AM, Matt <bs...@gmail.com> wrote:
>>>>>
>>>>>>> How large is the data set you are working with, and your
>>>>>>> cluster/nodes?
>>>>>>
>>>>>> Just testing with that single 44GB source file currently, and my
>>>>>> test cluster is made from 4 nodes, each with 8 CPU cores, 32GB
>>>>>> RAM, a 6TB Ext4 volume (RAID-10).
>>>>>>
>>>>>> Drill defaults were left as they come in v1.0. I will be adjusting memory
>>>>>> and retrying the CTAS.
>>>>>>
>>>>>> I know I can / should assign individual disks to HDFS, but as a
>>>>>> test cluster there are apps that expect data volumes to work on.
>>>>>> A dedicated Hadoop production cluster would have a disk layout
>>>>>> specific to the task.
>>>>>>
>>>>>>
>>>>>> On 28 May 2015, at 12:26, Andries Engelbrecht wrote:
>>>>>>
>>>>>>> Just check the drillbit.log and drillbit.out files in the log
>>>>>>> directory.
>>>>>>> Before adjusting memory, see if that is an issue first. It was
>>>>>>> for me, but as Jason mentioned there can be other causes as
>>>>>>> well.
>>>>>>>
>>>>>>> You adjust memory allocation in the drill-env.sh files, and have
>>>>>>> to restart the drill bits.
>>>>>>>
>>>>>>> How large is the data set you are working with, and your
>>>>>>> cluster/nodes?
>>>>>>>
>>>>>>> —Andries
>>>>>>>
>>>>>>>
>>>>>>> On May 28, 2015, at 9:17 AM, Matt <bs...@gmail.com> wrote:
>>>>>>>
>>>>>>>> To make sure I am adjusting the correct config, these are heap
>>>>>>>> parameters within the Drill configure path, not for Hadoop or
>>>>>>>> Zookeeper?
>>>>>>>>
>>>>>>>>
>>>>>>>>> On May 28, 2015, at 12:08 PM, Jason Altekruse
>>>>>>>>> <al...@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>> There should be no upper limit on the size of the tables you
>>>>>>>>> can create
>>>>>>>>> with Drill. Be advised that Drill does currently operate
>>>>>>>>> entirely
>>>>>>>>> optimistically in regards to available resources. If a network
>>>>>>>>> connection
>>>>>>>>> between two drillbits fails during a query, we will not
>>>>>>>>> currently
>>>>>>>>> re-schedule the work to make use of remaining nodes and
>>>>>>>>> network connections
>>>>>>>>> that are still live. While we have had a good amount of
>>>>>>>>> success using Drill
>>>>>>>>> for data conversion, be aware that these conditions could
>>>>>>>>> cause long
>>>>>>>>> running queries to fail.
>>>>>>>>>
>>>>>>>>> That being said, it isn't the only possible cause for such a
>>>>>>>>> failure. In
>>>>>>>>> the case of a network failure we would expect to see a message
>>>>>>>>> returned to
>>>>>>>>> you that part of the query was unsuccessful and that it had
>>>>>>>>> been cancelled.
>>>>>>>>> Andries has a good suggestion in regards to checking the heap
>>>>>>>>> memory, this
>>>>>>>>> should also be detected and reported back to you at the CLI,
>>>>>>>>> but we may be
>>>>>>>>> failing to propagate the error back to the head node for the
>>>>>>>>> query. I
>>>>>>>>> believe writing parquet may still be the most heap-intensive
>>>>>>>>> operation in
>>>>>>>>> Drill, despite our efforts to refactor the write path to use
>>>>>>>>> direct memory
>>>>>>>>> instead of on-heap for large buffers needed in the process of
>>>>>>>>> creating
>>>>>>>>> parquet files.
>>>>>>>>>
>>>>>>>>>> On Thu, May 28, 2015 at 8:43 AM, Matt <bs...@gmail.com>
>>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>> Is 300MM records too much to do in a single CTAS statement?
>>>>>>>>>>
>>>>>>>>>> After almost 23 hours I killed the query (^c) and it
>>>>>>>>>> returned:
>>>>>>>>>>
>>>>>>>>>> ~~~
>>>>>>>>>> +-----------+----------------------------+
>>>>>>>>>> | Fragment | Number of records written |
>>>>>>>>>> +-----------+----------------------------+
>>>>>>>>>> | 1_20 | 13568824 |
>>>>>>>>>> | 1_15 | 12411822 |
>>>>>>>>>> | 1_7 | 12470329 |
>>>>>>>>>> | 1_12 | 13693867 |
>>>>>>>>>> | 1_5 | 13292136 |
>>>>>>>>>> | 1_18 | 13874321 |
>>>>>>>>>> | 1_16 | 13303094 |
>>>>>>>>>> | 1_9 | 13639049 |
>>>>>>>>>> | 1_10 | 13698380 |
>>>>>>>>>> | 1_22 | 13501073 |
>>>>>>>>>> | 1_8 | 13533736 |
>>>>>>>>>> | 1_2 | 13549402 |
>>>>>>>>>> | 1_21 | 13665183 |
>>>>>>>>>> | 1_0 | 13544745 |
>>>>>>>>>> | 1_4 | 13532957 |
>>>>>>>>>> | 1_19 | 12767473 |
>>>>>>>>>> | 1_17 | 13670687 |
>>>>>>>>>> | 1_13 | 13469515 |
>>>>>>>>>> | 1_23 | 12517632 |
>>>>>>>>>> | 1_6 | 13634338 |
>>>>>>>>>> | 1_14 | 13611322 |
>>>>>>>>>> | 1_3 | 13061900 |
>>>>>>>>>> | 1_11 | 12760978 |
>>>>>>>>>> +-----------+----------------------------+
>>>>>>>>>> 23 rows selected (82294.854 seconds)
>>>>>>>>>> ~~~
>>>>>>>>>>
>>>>>>>>>> The sum of those record counts is 306,772,763 which is close
>>>>>>>>>> to the
>>>>>>>>>> 320,843,454 in the source file:
>>>>>>>>>>
>>>>>>>>>> ~~~
>>>>>>>>>> 0: jdbc:drill:zk=es05:2181> select count(*) FROM
>>>>>>>>>> root.`sample_201501.dat`;
>>>>>>>>>> +------------+
>>>>>>>>>> | EXPR$0 |
>>>>>>>>>> +------------+
>>>>>>>>>> | 320843454 |
>>>>>>>>>> +------------+
>>>>>>>>>> 1 row selected (384.665 seconds)
>>>>>>>>>> ~~~
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> It represents one month of data, 4 key columns and 38 numeric
>>>>>>>>>> measure
>>>>>>>>>> columns, which could also be partitioned daily. The test here
>>>>>>>>>> was to create
>>>>>>>>>> monthly Parquet files to see how the min/max stats on Parquet
>>>>>>>>>> chunks help
>>>>>>>>>> with range select performance.
>>>>>>>>>>
>>>>>>>>>> Instead of a small number of large monthly RDBMS tables, I am
>>>>>>>>>> attempting
>>>>>>>>>> to determine how many Parquet files should be used with Drill
>>>>>>>>>> / HDFS.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On 27 May 2015, at 15:17, Matt wrote:
>>>>>>>>>>
>>>>>>>>>> Attempting to create a Parquet backed table with a CTAS from
>>>>>>>>>> a 44GB tab
>>>>>>>>>>> delimited file in HDFS. The process seemed to be running, as
>>>>>>>>>>> CPU and IO was
>>>>>>>>>>> seen on all 4 nodes in this cluster, and .parquet files
>>>>>>>>>>> being created in
>>>>>>>>>>> the expected path.
>>>>>>>>>>>
>>>>>>>>>>> However, in the last two hours or so, all nodes show near
>>>>>>>>>>> zero CPU or
>>>>>>>>>>> IO, and the Last Modified date on the .parquet have not
>>>>>>>>>>> changed. Same time
>>>>>>>>>>> delay shown in the Last Progress column in the active
>>>>>>>>>>> fragment profile.
>>>>>>>>>>>
>>>>>>>>>>> What approach can I take to determine what is happening (or
>>>>>>>>>>> not)?
>>>>>>>>>>
>>
Re: Monitoring long / stuck CTAS
Posted by Matt <bs...@gmail.com>.
That is a good point. The difference between the number of source rows and
those that made it into the parquet files is about the same as the count
written by each of the other fragments.
Indeed, the query profile does show fragment 1_1 as CANCELED while the
others all have state FINISHED. Additionally, the other fragments have
runtimes of less than 30 minutes, while only fragment 1_1 lasted the 23
hours before cancellation.
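The arithmetic behind that observation checks out, using the per-fragment counts reported when the CTAS was cancelled:

```python
# Per-fragment record counts from the cancelled CTAS output
fragment_counts = [
    13568824, 12411822, 12470329, 13693867, 13292136, 13874321,
    13303094, 13639049, 13698380, 13501073, 13533736, 13549402,
    13665183, 13544745, 13532957, 12767473, 13670687, 13469515,
    12517632, 13634338, 13611322, 13061900, 12760978,
]

source_rows = 320_843_454          # count(*) on the source file
written = sum(fragment_counts)
missing = source_rows - written

print(written)   # 306772763, matching the thread
print(missing)   # 14070691: roughly one fragment's worth of rows
```

The shortfall is close to the average per-fragment count, consistent with exactly one minor fragment (1_1) never completing.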
On 28 May 2015, at 16:43, Mehant Baid wrote:
> I think the problem might be related to a single laggard, looks like
> we are waiting for one minor fragment to complete. Based on the output
> you provided looks like the fragment 1_1 hasn't completed. You might
> want to find out where the fragment was scheduled and what is going on
> in that node. It might also be useful to look at the profile for that
> minor fragment to see how much data has been processed.
>
>
> Thanks
> Mehant
>
> On 5/28/15 10:57 AM, Matt wrote:
>>> Did you check the log files for any errors?
>>
>> No messages related to this query contain errors or warnings, nor
>> anything mentioning memory or heap. Querying now to determine what is
>> missing in the parquet destination.
>>
>> drillbit.out on the master shows no error messages, and what looks
>> like the last relevant line is:
>>
>> ~~~
>> May 27, 2015 6:43:50 PM INFO:
>> parquet.hadoop.ColumnChunkPageWriteStore: written 2,258,263B for
>> [bytes_1250] INT64: 3,069,414 values, 24,555,504B raw, 2,257,112B
>> comp, 24 pages, encodings: [RLE, PLAIN, BIT_PACKED]
>> May 27, 2015 6:43:51 PM INFO: parquet.haMay 28, 2015 5:13:42 PM
>> org.apache.calcite.sql.validate.SqlValidatorException <init>
>> ~~~
>>
>> The final lines in drillbit.log (which appear to use a different time
>> format in the log) that contain the profile ID:
>>
>> ~~~
>> 2015-05-27 18:39:49,980
>> [2a9a10ec-6f96-5dc5-54fc-dc5295a77e42:frag:1:20] INFO
>> o.a.d.e.w.fragment.FragmentExecutor -
>> 2a9a10ec-6f96-5dc5-54fc-dc5295a77e42:1:20: State change requested
>> from RUNNING --> FINISHED for
>> 2015-05-27 18:39:49,981
>> [2a9a10ec-6f96-5dc5-54fc-dc5295a77e42:frag:1:20] INFO
>> o.a.d.e.w.f.AbstractStatusReporter - State changed for
>> 2a9a10ec-6f96-5dc5-54fc-dc5295a77e42:1:20. New state: FINISHED
>> 2015-05-27 18:40:05,650
>> [2a9a10ec-6f96-5dc5-54fc-dc5295a77e42:frag:1:12] INFO
>> o.a.d.e.w.fragment.FragmentExecutor -
>> 2a9a10ec-6f96-5dc5-54fc-dc5295a77e42:1:12: State change requested
>> from RUNNING --> FINISHED for
>> 2015-05-27 18:40:05,650
>> [2a9a10ec-6f96-5dc5-54fc-dc5295a77e42:frag:1:12] INFO
>> o.a.d.e.w.f.AbstractStatusReporter - State changed for
>> 2a9a10ec-6f96-5dc5-54fc-dc5295a77e42:1:12. New state: FINISHED
>> 2015-05-27 18:41:57,444
>> [2a9a10ec-6f96-5dc5-54fc-dc5295a77e42:frag:1:16] INFO
>> o.a.d.e.w.fragment.FragmentExecutor -
>> 2a9a10ec-6f96-5dc5-54fc-dc5295a77e42:1:16: State change requested
>> from RUNNING --> FINISHED for
>> 2015-05-27 18:41:57,444
>> [2a9a10ec-6f96-5dc5-54fc-dc5295a77e42:frag:1:16] INFO
>> o.a.d.e.w.f.AbstractStatusReporter - State changed for
>> 2a9a10ec-6f96-5dc5-54fc-dc5295a77e42:1:16. New state: FINISHED
>> 2015-05-27 18:43:25,005
>> [2a9a10ec-6f96-5dc5-54fc-dc5295a77e42:frag:1:8] INFO
>> o.a.d.e.w.fragment.FragmentExecutor -
>> 2a9a10ec-6f96-5dc5-54fc-dc5295a77e42:1:8: State change requested from
>> RUNNING --> FINISHED for
>> 2015-05-27 18:43:25,005
>> [2a9a10ec-6f96-5dc5-54fc-dc5295a77e42:frag:1:8] INFO
>> o.a.d.e.w.f.AbstractStatusReporter - State changed for
>> 2a9a10ec-6f96-5dc5-54fc-dc5295a77e42:1:8. New state: FINISHED
>> 2015-05-27 18:43:54,539
>> [2a9a10ec-6f96-5dc5-54fc-dc5295a77e42:frag:1:0] INFO
>> o.a.d.e.w.fragment.FragmentExecutor -
>> 2a9a10ec-6f96-5dc5-54fc-dc5295a77e42:1:0: State change requested from
>> RUNNING --> FINISHED for
>> 2015-05-27 18:43:54,540
>> [2a9a10ec-6f96-5dc5-54fc-dc5295a77e42:frag:1:0] INFO
>> o.a.d.e.w.f.AbstractStatusReporter - State changed for
>> 2a9a10ec-6f96-5dc5-54fc-dc5295a77e42:1:0. New state: FINISHED
>> 2015-05-27 18:43:59,947
>> [2a9a10ec-6f96-5dc5-54fc-dc5295a77e42:frag:1:4] INFO
>> o.a.d.e.w.fragment.FragmentExecutor -
>> 2a9a10ec-6f96-5dc5-54fc-dc5295a77e42:1:4: State change requested from
>> RUNNING --> FINISHED for
>> 2015-05-27 18:43:59,947
>> [2a9a10ec-6f96-5dc5-54fc-dc5295a77e42:frag:1:4] INFO
>> o.a.d.e.w.f.AbstractStatusReporter - State changed for
>> 2a9a10ec-6f96-5dc5-54fc-dc5295a77e42:1:4. New state: FINISHED
>> ~~~
>>
>>
>> On 28 May 2015, at 13:42, Andries Engelbrecht wrote:
>>
>>> It should execute multi-threaded; need to check on text files.
>>>
>>> Did you check the log files for any errors?
>>>
>>>
>>> On May 28, 2015, at 10:36 AM, Matt <bs...@gmail.com> wrote:
>>>
>>>>> The time seems pretty long for that file size. What type of file
>>>>> is it?
>>>>
>>>> Tab delimited UTF-8 text.
>>>>
>>>> I left the query to run overnight to see if it would complete, but
>>>> 24 hours for an import like this would indeed be too long.
>>>>
>>>>> Is the CTAS running single threaded?
>>>>
>>>> In the first hour, with this being the only client connected to the
>>>> cluster, I observed activity on all 4 nodes.
>>>>
>>>> Is multi-threaded query execution the default? I would not have
>>>> changed anything deliberately to force single thread execution.
>>>>
>>>>
>>>> On 28 May 2015, at 13:06, Andries Engelbrecht wrote:
>>>>
>>>>> The time seems pretty long for that file size. What type of file
>>>>> is it?
>>>>>
>>>>> Is the CTAS running single threaded?
>>>>>
>>>>> —Andries
>>>>>
>>>>>
>>>>> On May 28, 2015, at 9:37 AM, Matt <bs...@gmail.com> wrote:
>>>>>
>>>>>>> How large is the data set you are working with, and your
>>>>>>> cluster/nodes?
>>>>>>
>>>>>> Just testing with that single 44GB source file currently, and my
>>>>>> test cluster is made from 4 nodes, each with 8 CPU cores, 32GB
>>>>>> RAM, a 6TB Ext4 volume (RAID-10).
>>>>>>
>>>>>> Drill defaults left as they come in v1.0. I will be adjusting memory
>>>>>> and retrying the CTAS.
>>>>>>
>>>>>> I know I can / should assign individual disks to HDFS, but as a
>>>>>> test cluster there are apps that expect data volumes to work on.
>>>>>> A dedicated Hadoop production cluster would have a disk layout
>>>>>> specific to the task.
>>>>>>
>>>>>>
>>>>>> On 28 May 2015, at 12:26, Andries Engelbrecht wrote:
>>>>>>
>>>>>>> Just check the drillbit.log and drillbit.out files in the log
>>>>>>> directory.
>>>>>>> Before adjusting memory, see if that is an issue first. It was
>>>>>>> for me, but as Jason mentioned there can be other causes as
>>>>>>> well.
>>>>>>>
>>>>>>> You adjust memory allocation in the drill-env.sh files, and have
>>>>>>> to restart the drill bits.
>>>>>>>
>>>>>>> How large is the data set you are working with, and your
>>>>>>> cluster/nodes?
>>>>>>>
>>>>>>> —Andries
>>>>>>>
>>>>>>>
>>>>>>> On May 28, 2015, at 9:17 AM, Matt <bs...@gmail.com> wrote:
>>>>>>>
>>>>>>>> To make sure I am adjusting the correct config, these are heap
>>>>>>>> parameters within the Drill configure path, not for Hadoop or
>>>>>>>> Zookeeper?
>>>>>>>>
>>>>>>>>
>>>>>>>>> On May 28, 2015, at 12:08 PM, Jason Altekruse
>>>>>>>>> <al...@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>> There should be no upper limit on the size of the tables you
>>>>>>>>> can create
>>>>>>>>> with Drill. Be advised that Drill does currently operate
>>>>>>>>> entirely
>>>>>>>>> optimistically in regards to available resources. If a network
>>>>>>>>> connection
>>>>>>>>> between two drillbits fails during a query, we will not
>>>>>>>>> currently
>>>>>>>>> re-schedule the work to make use of remaining nodes and
>>>>>>>>> network connections
>>>>>>>>> that are still live. While we have had a good amount of
>>>>>>>>> success using Drill
>>>>>>>>> for data conversion, be aware that these conditions could
>>>>>>>>> cause long
>>>>>>>>> running queries to fail.
>>>>>>>>>
>>>>>>>>> That being said, it isn't the only possible cause for such a
>>>>>>>>> failure. In
>>>>>>>>> the case of a network failure we would expect to see a message
>>>>>>>>> returned to
>>>>>>>>> you that part of the query was unsuccessful and that it had
>>>>>>>>> been cancelled.
>>>>>>>>> Andries has a good suggestion in regards to checking the heap
>>>>>>>>> memory, this
>>>>>>>>> should also be detected and reported back to you at the CLI,
>>>>>>>>> but we may be
>>>>>>>>> failing to propagate the error back to the head node for the
>>>>>>>>> query. I
>>>>>>>>> believe writing parquet may still be the most heap-intensive
>>>>>>>>> operation in
>>>>>>>>> Drill, despite our efforts to refactor the write path to use
>>>>>>>>> direct memory
>>>>>>>>> instead of on-heap for large buffers needed in the process of
>>>>>>>>> creating
>>>>>>>>> parquet files.
>>>>>>>>>
>>>>>>>>>> On Thu, May 28, 2015 at 8:43 AM, Matt <bs...@gmail.com>
>>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>> Is 300MM records too much to do in a single CTAS statement?
>>>>>>>>>>
>>>>>>>>>> After almost 23 hours I killed the query (^c) and it
>>>>>>>>>> returned:
>>>>>>>>>>
>>>>>>>>>> ~~~
>>>>>>>>>> +-----------+----------------------------+
>>>>>>>>>> | Fragment | Number of records written |
>>>>>>>>>> +-----------+----------------------------+
>>>>>>>>>> | 1_20 | 13568824 |
>>>>>>>>>> | 1_15 | 12411822 |
>>>>>>>>>> | 1_7 | 12470329 |
>>>>>>>>>> | 1_12 | 13693867 |
>>>>>>>>>> | 1_5 | 13292136 |
>>>>>>>>>> | 1_18 | 13874321 |
>>>>>>>>>> | 1_16 | 13303094 |
>>>>>>>>>> | 1_9 | 13639049 |
>>>>>>>>>> | 1_10 | 13698380 |
>>>>>>>>>> | 1_22 | 13501073 |
>>>>>>>>>> | 1_8 | 13533736 |
>>>>>>>>>> | 1_2 | 13549402 |
>>>>>>>>>> | 1_21 | 13665183 |
>>>>>>>>>> | 1_0 | 13544745 |
>>>>>>>>>> | 1_4 | 13532957 |
>>>>>>>>>> | 1_19 | 12767473 |
>>>>>>>>>> | 1_17 | 13670687 |
>>>>>>>>>> | 1_13 | 13469515 |
>>>>>>>>>> | 1_23 | 12517632 |
>>>>>>>>>> | 1_6 | 13634338 |
>>>>>>>>>> | 1_14 | 13611322 |
>>>>>>>>>> | 1_3 | 13061900 |
>>>>>>>>>> | 1_11 | 12760978 |
>>>>>>>>>> +-----------+----------------------------+
>>>>>>>>>> 23 rows selected (82294.854 seconds)
>>>>>>>>>> ~~~
>>>>>>>>>>
>>>>>>>>>> The sum of those record counts is 306,772,763 which is close
>>>>>>>>>> to the
>>>>>>>>>> 320,843,454 in the source file:
>>>>>>>>>>
>>>>>>>>>> ~~~
>>>>>>>>>> 0: jdbc:drill:zk=es05:2181> select count(*) FROM
>>>>>>>>>> root.`sample_201501.dat`;
>>>>>>>>>> +------------+
>>>>>>>>>> | EXPR$0 |
>>>>>>>>>> +------------+
>>>>>>>>>> | 320843454 |
>>>>>>>>>> +------------+
>>>>>>>>>> 1 row selected (384.665 seconds)
>>>>>>>>>> ~~~
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> It represents one month of data, 4 key columns and 38 numeric
>>>>>>>>>> measure
>>>>>>>>>> columns, which could also be partitioned daily. The test here
>>>>>>>>>> was to create
>>>>>>>>>> monthly Parquet files to see how the min/max stats on Parquet
>>>>>>>>>> chunks help
>>>>>>>>>> with range select performance.
>>>>>>>>>>
>>>>>>>>>> Instead of a small number of large monthly RDBMS tables, I am
>>>>>>>>>> attempting
>>>>>>>>>> to determine how many Parquet files should be used with Drill
>>>>>>>>>> / HDFS.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On 27 May 2015, at 15:17, Matt wrote:
>>>>>>>>>>
>>>>>>>>>> Attempting to create a Parquet backed table with a CTAS from
>>>>>>>>>> a 44GB tab
>>>>>>>>>>> delimited file in HDFS. The process seemed to be running, as
>>>>>>>>>>> CPU and IO was
>>>>>>>>>>> seen on all 4 nodes in this cluster, and .parquet files
>>>>>>>>>>> being created in
>>>>>>>>>>> the expected path.
>>>>>>>>>>>
>>>>>>>>>>> However, in the last two hours or so, all nodes have shown
>>>>>>>>>>> near zero CPU or IO, and the Last Modified dates on the
>>>>>>>>>>> .parquet files have not changed. The same delay is shown in
>>>>>>>>>>> the Last Progress column in the active fragment profile.
>>>>>>>>>>>
>>>>>>>>>>> What approach can I take to determine what is happening (or
>>>>>>>>>>> not)?
>>>>>>>>>>
>>
Re: Monitoring long / stuck CTAS
Posted by Mehant Baid <ba...@gmail.com>.
I think the problem might be related to a single laggard; it looks like
we are waiting for one minor fragment to complete. Based on the output
you provided, it looks like fragment 1_1 hasn't completed. You might want
to find out where that fragment was scheduled and what is going on on
that node. It might also be useful to look at the profile for that minor
fragment to see how much data has been processed.
Thanks
Mehant
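Mehant's check can be done mechanically: the CTAS summary quoted further down in this thread reports counts for 23 minor fragments, while 24 (1_0 through 1_23) were scheduled. A small sketch to find the absentee (the ID list is copied from that output; nothing here is part of Drill itself):

```python
# Minor-fragment IDs that appear in the "Number of records written"
# table from the cancelled CTAS (24 fragments 1_0..1_23 were scheduled).
reported = [20, 15, 7, 12, 5, 18, 16, 9, 10, 22, 8, 2, 21, 0,
            4, 19, 17, 13, 23, 6, 14, 3, 11]

# The fragment that never reported a count is the likely laggard.
missing = sorted(set(range(24)) - set(reported))
print(missing)  # -> [1], i.e. fragment 1_1 never reported
```

The missing record count (320,843,454 minus 306,772,763, about 14 million rows) is also roughly one fragment's share, which is consistent with a single stuck minor fragment.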
On 5/28/15 10:57 AM, Matt wrote:
>> Did you check the log files for any errors?
>
> No messages related to this query contain errors or warnings, and
> nothing mentions memory or heap. Querying now to determine what is
> missing in the Parquet destination.
>
> drillbit.out on the master shows no error messages, and what looks
> like the last relevant line is:
>
> ~~~
> May 27, 2015 6:43:50 PM INFO:
> parquet.hadoop.ColumnChunkPageWriteStore: written 2,258,263B for
> [bytes_1250] INT64: 3,069,414 values, 24,555,504B raw, 2,257,112B
> comp, 24 pages, encodings: [RLE, PLAIN, BIT_PACKED]
> May 27, 2015 6:43:51 PM INFO: parquet.haMay 28, 2015 5:13:42 PM
> org.apache.calcite.sql.validate.SqlValidatorException <init>
> ~~~
>
> The final lines in drillbit.log (which appear to use a different time
> format in the log) that contain the profile ID:
>
> ~~~
> 2015-05-27 18:39:49,980
> [2a9a10ec-6f96-5dc5-54fc-dc5295a77e42:frag:1:20] INFO
> o.a.d.e.w.fragment.FragmentExecutor -
> 2a9a10ec-6f96-5dc5-54fc-dc5295a77e42:1:20: State change requested from
> RUNNING --> FINISHED for
> 2015-05-27 18:39:49,981
> [2a9a10ec-6f96-5dc5-54fc-dc5295a77e42:frag:1:20] INFO
> o.a.d.e.w.f.AbstractStatusReporter - State changed for
> 2a9a10ec-6f96-5dc5-54fc-dc5295a77e42:1:20. New state: FINISHED
> 2015-05-27 18:40:05,650
> [2a9a10ec-6f96-5dc5-54fc-dc5295a77e42:frag:1:12] INFO
> o.a.d.e.w.fragment.FragmentExecutor -
> 2a9a10ec-6f96-5dc5-54fc-dc5295a77e42:1:12: State change requested from
> RUNNING --> FINISHED for
> 2015-05-27 18:40:05,650
> [2a9a10ec-6f96-5dc5-54fc-dc5295a77e42:frag:1:12] INFO
> o.a.d.e.w.f.AbstractStatusReporter - State changed for
> 2a9a10ec-6f96-5dc5-54fc-dc5295a77e42:1:12. New state: FINISHED
> 2015-05-27 18:41:57,444
> [2a9a10ec-6f96-5dc5-54fc-dc5295a77e42:frag:1:16] INFO
> o.a.d.e.w.fragment.FragmentExecutor -
> 2a9a10ec-6f96-5dc5-54fc-dc5295a77e42:1:16: State change requested from
> RUNNING --> FINISHED for
> 2015-05-27 18:41:57,444
> [2a9a10ec-6f96-5dc5-54fc-dc5295a77e42:frag:1:16] INFO
> o.a.d.e.w.f.AbstractStatusReporter - State changed for
> 2a9a10ec-6f96-5dc5-54fc-dc5295a77e42:1:16. New state: FINISHED
> 2015-05-27 18:43:25,005
> [2a9a10ec-6f96-5dc5-54fc-dc5295a77e42:frag:1:8] INFO
> o.a.d.e.w.fragment.FragmentExecutor -
> 2a9a10ec-6f96-5dc5-54fc-dc5295a77e42:1:8: State change requested from
> RUNNING --> FINISHED for
> 2015-05-27 18:43:25,005
> [2a9a10ec-6f96-5dc5-54fc-dc5295a77e42:frag:1:8] INFO
> o.a.d.e.w.f.AbstractStatusReporter - State changed for
> 2a9a10ec-6f96-5dc5-54fc-dc5295a77e42:1:8. New state: FINISHED
> 2015-05-27 18:43:54,539
> [2a9a10ec-6f96-5dc5-54fc-dc5295a77e42:frag:1:0] INFO
> o.a.d.e.w.fragment.FragmentExecutor -
> 2a9a10ec-6f96-5dc5-54fc-dc5295a77e42:1:0: State change requested from
> RUNNING --> FINISHED for
> 2015-05-27 18:43:54,540
> [2a9a10ec-6f96-5dc5-54fc-dc5295a77e42:frag:1:0] INFO
> o.a.d.e.w.f.AbstractStatusReporter - State changed for
> 2a9a10ec-6f96-5dc5-54fc-dc5295a77e42:1:0. New state: FINISHED
> 2015-05-27 18:43:59,947
> [2a9a10ec-6f96-5dc5-54fc-dc5295a77e42:frag:1:4] INFO
> o.a.d.e.w.fragment.FragmentExecutor -
> 2a9a10ec-6f96-5dc5-54fc-dc5295a77e42:1:4: State change requested from
> RUNNING --> FINISHED for
> 2015-05-27 18:43:59,947
> [2a9a10ec-6f96-5dc5-54fc-dc5295a77e42:frag:1:4] INFO
> o.a.d.e.w.f.AbstractStatusReporter - State changed for
> 2a9a10ec-6f96-5dc5-54fc-dc5295a77e42:1:4. New state: FINISHED
> ~~~
>
>
> On 28 May 2015, at 13:42, Andries Engelbrecht wrote:
>
>> It should execute multi-threaded; need to check on text files.
>>
>> Did you check the log files for any errors?
>>
>>
>> On May 28, 2015, at 10:36 AM, Matt <bs...@gmail.com> wrote:
>>
>>>> The time seems pretty long for that file size. What type of file is
>>>> it?
>>>
>>> Tab delimited UTF-8 text.
>>>
>>> I left the query to run overnight to see if it would complete, but
>>> 24 hours for an import like this would indeed be too long.
>>>
>>>> Is the CTAS running single threaded?
>>>
>>> In the first hour, with this being the only client connected to the
>>> cluster, I observed activity on all 4 nodes.
>>>
>>> Is multi-threaded query execution the default? I would not have
>>> changed anything deliberately to force single thread execution.
>>>
>>>
>>> On 28 May 2015, at 13:06, Andries Engelbrecht wrote:
>>>
>>>> The time seems pretty long for that file size. What type of file is
>>>> it?
>>>>
>>>> Is the CTAS running single threaded?
>>>>
>>>> —Andries
>>>>
>>>>
>>>> On May 28, 2015, at 9:37 AM, Matt <bs...@gmail.com> wrote:
>>>>
>>>>>> How large is the data set you are working with, and your
>>>>>> cluster/nodes?
>>>>>
>>>>> Just testing with that single 44GB source file currently, and my
>>>>> test cluster is made from 4 nodes, each with 8 CPU cores, 32GB
>>>>> RAM, a 6TB Ext4 volume (RAID-10).
>>>>>
>>>>> Drill defaults left as they come in v1.0. I will be adjusting memory
>>>>> and retrying the CTAS.
>>>>>
>>>>> I know I can / should assign individual disks to HDFS, but as a
>>>>> test cluster there are apps that expect data volumes to work on. A
>>>>> dedicated Hadoop production cluster would have a disk layout
>>>>> specific to the task.
>>>>>
>>>>>
>>>>> On 28 May 2015, at 12:26, Andries Engelbrecht wrote:
>>>>>
>>>>>> Just check the drillbit.log and drillbit.out files in the log
>>>>>> directory.
>>>>>> Before adjusting memory, see if that is an issue first. It was
>>>>>> for me, but as Jason mentioned there can be other causes as well.
>>>>>>
>>>>>> You adjust memory allocation in the drill-env.sh files, and have
>>>>>> to restart the drill bits.
>>>>>>
>>>>>> How large is the data set you are working with, and your
>>>>>> cluster/nodes?
>>>>>>
>>>>>> —Andries
>>>>>>
>>>>>>
>>>>>> On May 28, 2015, at 9:17 AM, Matt <bs...@gmail.com> wrote:
>>>>>>
>>>>>>> To make sure I am adjusting the correct config, these are heap
>>>>>>> parameters within the Drill configure path, not for Hadoop or
>>>>>>> Zookeeper?
>>>>>>>
>>>>>>>
>>>>>>>> On May 28, 2015, at 12:08 PM, Jason Altekruse
>>>>>>>> <al...@gmail.com> wrote:
>>>>>>>>
>>>>>>>> There should be no upper limit on the size of the tables you
>>>>>>>> can create
>>>>>>>> with Drill. Be advised that Drill does currently operate entirely
>>>>>>>> optimistically in regards to available resources. If a network
>>>>>>>> connection
>>>>>>>> between two drillbits fails during a query, we will not currently
>>>>>>>> re-schedule the work to make use of remaining nodes and network
>>>>>>>> connections
>>>>>>>> that are still live. While we have had a good amount of success
>>>>>>>> using Drill
>>>>>>>> for data conversion, be aware that these conditions could cause
>>>>>>>> long
>>>>>>>> running queries to fail.
>>>>>>>>
>>>>>>>> That being said, it isn't the only possible cause for such a
>>>>>>>> failure. In
>>>>>>>> the case of a network failure we would expect to see a message
>>>>>>>> returned to
>>>>>>>> you that part of the query was unsuccessful and that it had
>>>>>>>> been cancelled.
>>>>>>>> Andries has a good suggestion in regards to checking the heap
>>>>>>>> memory, this
>>>>>>>> should also be detected and reported back to you at the CLI,
>>>>>>>> but we may be
>>>>>>>> failing to propagate the error back to the head node for the
>>>>>>>> query. I
>>>>>>>> believe writing parquet may still be the most heap-intensive
>>>>>>>> operation in
>>>>>>>> Drill, despite our efforts to refactor the write path to use
>>>>>>>> direct memory
>>>>>>>> instead of on-heap for large buffers needed in the process of
>>>>>>>> creating
>>>>>>>> parquet files.
>>>>>>>>
>>>>>>>>> On Thu, May 28, 2015 at 8:43 AM, Matt <bs...@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>> Is 300MM records too much to do in a single CTAS statement?
>>>>>>>>>
>>>>>>>>> After almost 23 hours I killed the query (^c) and it returned:
>>>>>>>>>
>>>>>>>>> ~~~
>>>>>>>>> +-----------+----------------------------+
>>>>>>>>> | Fragment | Number of records written |
>>>>>>>>> +-----------+----------------------------+
>>>>>>>>> | 1_20 | 13568824 |
>>>>>>>>> | 1_15 | 12411822 |
>>>>>>>>> | 1_7 | 12470329 |
>>>>>>>>> | 1_12 | 13693867 |
>>>>>>>>> | 1_5 | 13292136 |
>>>>>>>>> | 1_18 | 13874321 |
>>>>>>>>> | 1_16 | 13303094 |
>>>>>>>>> | 1_9 | 13639049 |
>>>>>>>>> | 1_10 | 13698380 |
>>>>>>>>> | 1_22 | 13501073 |
>>>>>>>>> | 1_8 | 13533736 |
>>>>>>>>> | 1_2 | 13549402 |
>>>>>>>>> | 1_21 | 13665183 |
>>>>>>>>> | 1_0 | 13544745 |
>>>>>>>>> | 1_4 | 13532957 |
>>>>>>>>> | 1_19 | 12767473 |
>>>>>>>>> | 1_17 | 13670687 |
>>>>>>>>> | 1_13 | 13469515 |
>>>>>>>>> | 1_23 | 12517632 |
>>>>>>>>> | 1_6 | 13634338 |
>>>>>>>>> | 1_14 | 13611322 |
>>>>>>>>> | 1_3 | 13061900 |
>>>>>>>>> | 1_11 | 12760978 |
>>>>>>>>> +-----------+----------------------------+
>>>>>>>>> 23 rows selected (82294.854 seconds)
>>>>>>>>> ~~~
>>>>>>>>>
>>>>>>>>> The sum of those record counts is 306,772,763 which is close
>>>>>>>>> to the
>>>>>>>>> 320,843,454 in the source file:
>>>>>>>>>
>>>>>>>>> ~~~
>>>>>>>>> 0: jdbc:drill:zk=es05:2181> select count(*) FROM
>>>>>>>>> root.`sample_201501.dat`;
>>>>>>>>> +------------+
>>>>>>>>> | EXPR$0 |
>>>>>>>>> +------------+
>>>>>>>>> | 320843454 |
>>>>>>>>> +------------+
>>>>>>>>> 1 row selected (384.665 seconds)
>>>>>>>>> ~~~
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> It represents one month of data, 4 key columns and 38 numeric
>>>>>>>>> measure
>>>>>>>>> columns, which could also be partitioned daily. The test here
>>>>>>>>> was to create
>>>>>>>>> monthly Parquet files to see how the min/max stats on Parquet
>>>>>>>>> chunks help
>>>>>>>>> with range select performance.
>>>>>>>>>
>>>>>>>>> Instead of a small number of large monthly RDBMS tables, I am
>>>>>>>>> attempting
>>>>>>>>> to determine how many Parquet files should be used with Drill
>>>>>>>>> / HDFS.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On 27 May 2015, at 15:17, Matt wrote:
>>>>>>>>>
>>>>>>>>> Attempting to create a Parquet backed table with a CTAS from
>>>>>>>>> a 44GB tab
>>>>>>>>>> delimited file in HDFS. The process seemed to be running, as
>>>>>>>>>> CPU and IO was
>>>>>>>>>> seen on all 4 nodes in this cluster, and .parquet files being
>>>>>>>>>> created in
>>>>>>>>>> the expected path.
>>>>>>>>>>
>>>>>>>>>> However, in the last two hours or so, all nodes have shown
>>>>>>>>>> near zero CPU or IO, and the Last Modified dates on the
>>>>>>>>>> .parquet files have not changed. The same delay is shown in
>>>>>>>>>> the Last Progress column in the active fragment profile.
>>>>>>>>>>
>>>>>>>>>> What approach can I take to determine what is happening (or
>>>>>>>>>> not)?
>>>>>>>>>
>
Re: Monitoring long / stuck CTAS
Posted by Matt <bs...@gmail.com>.
> Did you check the log files for any errors?
No messages related to this query contain errors or warnings, and
nothing mentions memory or heap. Querying now to determine what is
missing in the Parquet destination.
drillbit.out on the master shows no error messages, and what looks like
the last relevant line is:
~~~
May 27, 2015 6:43:50 PM INFO: parquet.hadoop.ColumnChunkPageWriteStore:
written 2,258,263B for [bytes_1250] INT64: 3,069,414 values, 24,555,504B
raw, 2,257,112B comp, 24 pages, encodings: [RLE, PLAIN, BIT_PACKED]
May 27, 2015 6:43:51 PM INFO: parquet.haMay 28, 2015 5:13:42 PM
org.apache.calcite.sql.validate.SqlValidatorException <init>
~~~
The final lines in drillbit.log (which appear to use a different time
format in the log) that contain the profile ID:
~~~
2015-05-27 18:39:49,980 [2a9a10ec-6f96-5dc5-54fc-dc5295a77e42:frag:1:20]
INFO o.a.d.e.w.fragment.FragmentExecutor -
2a9a10ec-6f96-5dc5-54fc-dc5295a77e42:1:20: State change requested from
RUNNING --> FINISHED for
2015-05-27 18:39:49,981 [2a9a10ec-6f96-5dc5-54fc-dc5295a77e42:frag:1:20]
INFO o.a.d.e.w.f.AbstractStatusReporter - State changed for
2a9a10ec-6f96-5dc5-54fc-dc5295a77e42:1:20. New state: FINISHED
2015-05-27 18:40:05,650 [2a9a10ec-6f96-5dc5-54fc-dc5295a77e42:frag:1:12]
INFO o.a.d.e.w.fragment.FragmentExecutor -
2a9a10ec-6f96-5dc5-54fc-dc5295a77e42:1:12: State change requested from
RUNNING --> FINISHED for
2015-05-27 18:40:05,650 [2a9a10ec-6f96-5dc5-54fc-dc5295a77e42:frag:1:12]
INFO o.a.d.e.w.f.AbstractStatusReporter - State changed for
2a9a10ec-6f96-5dc5-54fc-dc5295a77e42:1:12. New state: FINISHED
2015-05-27 18:41:57,444 [2a9a10ec-6f96-5dc5-54fc-dc5295a77e42:frag:1:16]
INFO o.a.d.e.w.fragment.FragmentExecutor -
2a9a10ec-6f96-5dc5-54fc-dc5295a77e42:1:16: State change requested from
RUNNING --> FINISHED for
2015-05-27 18:41:57,444 [2a9a10ec-6f96-5dc5-54fc-dc5295a77e42:frag:1:16]
INFO o.a.d.e.w.f.AbstractStatusReporter - State changed for
2a9a10ec-6f96-5dc5-54fc-dc5295a77e42:1:16. New state: FINISHED
2015-05-27 18:43:25,005 [2a9a10ec-6f96-5dc5-54fc-dc5295a77e42:frag:1:8]
INFO o.a.d.e.w.fragment.FragmentExecutor -
2a9a10ec-6f96-5dc5-54fc-dc5295a77e42:1:8: State change requested from
RUNNING --> FINISHED for
2015-05-27 18:43:25,005 [2a9a10ec-6f96-5dc5-54fc-dc5295a77e42:frag:1:8]
INFO o.a.d.e.w.f.AbstractStatusReporter - State changed for
2a9a10ec-6f96-5dc5-54fc-dc5295a77e42:1:8. New state: FINISHED
2015-05-27 18:43:54,539 [2a9a10ec-6f96-5dc5-54fc-dc5295a77e42:frag:1:0]
INFO o.a.d.e.w.fragment.FragmentExecutor -
2a9a10ec-6f96-5dc5-54fc-dc5295a77e42:1:0: State change requested from
RUNNING --> FINISHED for
2015-05-27 18:43:54,540 [2a9a10ec-6f96-5dc5-54fc-dc5295a77e42:frag:1:0]
INFO o.a.d.e.w.f.AbstractStatusReporter - State changed for
2a9a10ec-6f96-5dc5-54fc-dc5295a77e42:1:0. New state: FINISHED
2015-05-27 18:43:59,947 [2a9a10ec-6f96-5dc5-54fc-dc5295a77e42:frag:1:4]
INFO o.a.d.e.w.fragment.FragmentExecutor -
2a9a10ec-6f96-5dc5-54fc-dc5295a77e42:1:4: State change requested from
RUNNING --> FINISHED for
2015-05-27 18:43:59,947 [2a9a10ec-6f96-5dc5-54fc-dc5295a77e42:frag:1:4]
INFO o.a.d.e.w.f.AbstractStatusReporter - State changed for
2a9a10ec-6f96-5dc5-54fc-dc5295a77e42:1:4. New state: FINISHED
~~~
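Given state-change lines like the ones above, the stuck minor fragment can be found by keeping the last state logged per fragment and listing any that never reached FINISHED. A rough sketch, assuming the log format shown (in practice, read the real drillbit.log on each node; the sample lines here are abbreviated):

```python
import re

# Abbreviated sample lines in the drillbit.log format shown above;
# in practice these would be read from the drillbit.log on each node.
log = """\
2015-05-27 18:39:49,980 [2a9a10ec-6f96-5dc5-54fc-dc5295a77e42:frag:1:20] INFO ... State change requested from RUNNING --> FINISHED for
2015-05-27 18:40:05,650 [2a9a10ec-6f96-5dc5-54fc-dc5295a77e42:frag:1:12] INFO ... State change requested from RUNNING --> FINISHED for
2015-05-27 12:00:00,000 [2a9a10ec-6f96-5dc5-54fc-dc5295a77e42:frag:1:1] INFO ... State change requested from AWAITING_ALLOCATION --> RUNNING for
"""

pattern = re.compile(r"frag:(\d+:\d+)\].*--> (\w+)")
last_state = {}
for line in log.splitlines():
    m = pattern.search(line)
    if m:
        last_state[m.group(1)] = m.group(2)  # latest state seen wins

# Any fragment whose final logged state is not FINISHED is a candidate laggard.
stuck = [frag for frag, state in last_state.items() if state != "FINISHED"]
print(stuck)  # -> ['1:1']
```

Running this against the logs of all four drillbits would show which node is hosting the stalled fragment 1_1.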
On 28 May 2015, at 13:42, Andries Engelbrecht wrote:
> It should execute multi-threaded; need to check on text files.
>
> Did you check the log files for any errors?
>
>
> On May 28, 2015, at 10:36 AM, Matt <bs...@gmail.com> wrote:
>
>>> The time seems pretty long for that file size. What type of file is
>>> it?
>>
>> Tab delimited UTF-8 text.
>>
>> I left the query to run overnight to see if it would complete, but 24
>> hours for an import like this would indeed be too long.
>>
>>> Is the CTAS running single threaded?
>>
>> In the first hour, with this being the only client connected to the
>> cluster, I observed activity on all 4 nodes.
>>
>> Is multi-threaded query execution the default? I would not have
>> changed anything deliberately to force single thread execution.
>>
>>
>> On 28 May 2015, at 13:06, Andries Engelbrecht wrote:
>>
>>> The time seems pretty long for that file size. What type of file is
>>> it?
>>>
>>> Is the CTAS running single threaded?
>>>
>>> —Andries
>>>
>>>
>>> On May 28, 2015, at 9:37 AM, Matt <bs...@gmail.com> wrote:
>>>
>>>>> How large is the data set you are working with, and your
>>>>> cluster/nodes?
>>>>
>>>> Just testing with that single 44GB source file currently, and my
>>>> test cluster is made from 4 nodes, each with 8 CPU cores, 32GB RAM,
>>>> a 6TB Ext4 volume (RAID-10).
>>>>
>>>> Drill defaults left as they come in v1.0. I will be adjusting memory and
>>>> retrying the CTAS.
>>>>
>>>> I know I can / should assign individual disks to HDFS, but as a
>>>> test cluster there are apps that expect data volumes to work on. A
>>>> dedicated Hadoop production cluster would have a disk layout
>>>> specific to the task.
>>>>
>>>>
>>>> On 28 May 2015, at 12:26, Andries Engelbrecht wrote:
>>>>
>>>>> Just check the drillbit.log and drillbit.out files in the log
>>>>> directory.
>>>>> Before adjusting memory, see if that is an issue first. It was for
>>>>> me, but as Jason mentioned there can be other causes as well.
>>>>>
>>>>> You adjust memory allocation in the drill-env.sh files, and have
>>>>> to restart the drill bits.
>>>>>
>>>>> How large is the data set you are working with, and your
>>>>> cluster/nodes?
>>>>>
>>>>> —Andries
>>>>>
>>>>>
>>>>> On May 28, 2015, at 9:17 AM, Matt <bs...@gmail.com> wrote:
>>>>>
>>>>>> To make sure I am adjusting the correct config, these are heap
>>>>>> parameters within the Drill configure path, not for Hadoop or
>>>>>> Zookeeper?
>>>>>>
>>>>>>
>>>>>>> On May 28, 2015, at 12:08 PM, Jason Altekruse
>>>>>>> <al...@gmail.com> wrote:
>>>>>>>
>>>>>>> There should be no upper limit on the size of the tables you can
>>>>>>> create
>>>>>>> with Drill. Be advised that Drill does currently operate
>>>>>>> entirely
>>>>>>> optimistically in regards to available resources. If a network
>>>>>>> connection
>>>>>>> between two drillbits fails during a query, we will not
>>>>>>> currently
>>>>>>> re-schedule the work to make use of remaining nodes and network
>>>>>>> connections
>>>>>>> that are still live. While we have had a good amount of success
>>>>>>> using Drill
>>>>>>> for data conversion, be aware that these conditions could cause
>>>>>>> long
>>>>>>> running queries to fail.
>>>>>>>
>>>>>>> That being said, it isn't the only possible cause for such a
>>>>>>> failure. In
>>>>>>> the case of a network failure we would expect to see a message
>>>>>>> returned to
>>>>>>> you that part of the query was unsuccessful and that it had been
>>>>>>> cancelled.
>>>>>>> Andries has a good suggestion in regards to checking the heap
>>>>>>> memory, this
>>>>>>> should also be detected and reported back to you at the CLI, but
>>>>>>> we may be
>>>>>>> failing to propagate the error back to the head node for the
>>>>>>> query. I
>>>>>>> believe writing parquet may still be the most heap-intensive
>>>>>>> operation in
>>>>>>> Drill, despite our efforts to refactor the write path to use
>>>>>>> direct memory
>>>>>>> instead of on-heap for large buffers needed in the process of
>>>>>>> creating
>>>>>>> parquet files.
>>>>>>>
>>>>>>>> On Thu, May 28, 2015 at 8:43 AM, Matt <bs...@gmail.com> wrote:
>>>>>>>>
>>>>>>>> Is 300MM records too much to do in a single CTAS statement?
>>>>>>>>
>>>>>>>> After almost 23 hours I killed the query (^c) and it returned:
>>>>>>>>
>>>>>>>> ~~~
>>>>>>>> +-----------+----------------------------+
>>>>>>>> | Fragment | Number of records written |
>>>>>>>> +-----------+----------------------------+
>>>>>>>> | 1_20 | 13568824 |
>>>>>>>> | 1_15 | 12411822 |
>>>>>>>> | 1_7 | 12470329 |
>>>>>>>> | 1_12 | 13693867 |
>>>>>>>> | 1_5 | 13292136 |
>>>>>>>> | 1_18 | 13874321 |
>>>>>>>> | 1_16 | 13303094 |
>>>>>>>> | 1_9 | 13639049 |
>>>>>>>> | 1_10 | 13698380 |
>>>>>>>> | 1_22 | 13501073 |
>>>>>>>> | 1_8 | 13533736 |
>>>>>>>> | 1_2 | 13549402 |
>>>>>>>> | 1_21 | 13665183 |
>>>>>>>> | 1_0 | 13544745 |
>>>>>>>> | 1_4 | 13532957 |
>>>>>>>> | 1_19 | 12767473 |
>>>>>>>> | 1_17 | 13670687 |
>>>>>>>> | 1_13 | 13469515 |
>>>>>>>> | 1_23 | 12517632 |
>>>>>>>> | 1_6 | 13634338 |
>>>>>>>> | 1_14 | 13611322 |
>>>>>>>> | 1_3 | 13061900 |
>>>>>>>> | 1_11 | 12760978 |
>>>>>>>> +-----------+----------------------------+
>>>>>>>> 23 rows selected (82294.854 seconds)
>>>>>>>> ~~~
>>>>>>>>
>>>>>>>> The sum of those record counts is 306,772,763 which is close
>>>>>>>> to the
>>>>>>>> 320,843,454 in the source file:
>>>>>>>>
>>>>>>>> ~~~
>>>>>>>> 0: jdbc:drill:zk=es05:2181> select count(*) FROM
>>>>>>>> root.`sample_201501.dat`;
>>>>>>>> +------------+
>>>>>>>> | EXPR$0 |
>>>>>>>> +------------+
>>>>>>>> | 320843454 |
>>>>>>>> +------------+
>>>>>>>> 1 row selected (384.665 seconds)
>>>>>>>> ~~~
>>>>>>>>
>>>>>>>>
>>>>>>>> It represents one month of data, 4 key columns and 38 numeric
>>>>>>>> measure
>>>>>>>> columns, which could also be partitioned daily. The test here
>>>>>>>> was to create
>>>>>>>> monthly Parquet files to see how the min/max stats on Parquet
>>>>>>>> chunks help
>>>>>>>> with range select performance.
>>>>>>>>
>>>>>>>> Instead of a small number of large monthly RDBMS tables, I am
>>>>>>>> attempting
>>>>>>>> to determine how many Parquet files should be used with Drill /
>>>>>>>> HDFS.
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On 27 May 2015, at 15:17, Matt wrote:
>>>>>>>>
>>>>>>>> Attempting to create a Parquet backed table with a CTAS from a
>>>>>>>> 44GB tab
>>>>>>>>> delimited file in HDFS. The process seemed to be running, as
>>>>>>>>> CPU and IO was
>>>>>>>>> seen on all 4 nodes in this cluster, and .parquet files being
>>>>>>>>> created in
>>>>>>>>> the expected path.
>>>>>>>>>
>>>>>>>>> However, in the last two hours or so, all nodes have shown
>>>>>>>>> near zero CPU or IO, and the Last Modified dates on the
>>>>>>>>> .parquet files have not changed. The same delay is shown in
>>>>>>>>> the Last Progress column in the active fragment profile.
>>>>>>>>>
>>>>>>>>> What approach can I take to determine what is happening (or
>>>>>>>>> not)?
>>>>>>>>
Re: Monitoring long / stuck CTAS
Posted by Andries Engelbrecht <ae...@maprtech.com>.
It should execute multi-threaded; need to check on text files.
Did you check the log files for any errors?
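The memory advice quoted below refers to drill-env.sh on each node, with a drillbit restart required afterwards. A sketch of the relevant settings (these particular values are the ones Matt later reported using for this workload, not general recommendations):

```shell
# drill-env.sh (per node); restart the drillbits after editing.
# Values below are illustrative for a 32GB-RAM node, not defaults.
export DRILL_MAX_DIRECT_MEMORY="16G"   # direct memory used by the drillbit
export DRILL_HEAP="8G"                 # JVM heap used by the drillbit
```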
On May 28, 2015, at 10:36 AM, Matt <bs...@gmail.com> wrote:
>> The time seems pretty long for that file size. What type of file is it?
>
> Tab delimited UTF-8 text.
>
> I left the query to run overnight to see if it would complete, but 24 hours for an import like this would indeed be too long.
>
>> Is the CTAS running single threaded?
>
> In the first hour, with this being the only client connected to the cluster, I observed activity on all 4 nodes.
>
> Is multi-threaded query execution the default? I would not have changed anything deliberately to force single thread execution.
Re: Monitoring long / stuck CTAS
Posted by Matt <bs...@gmail.com>.
> The time seems pretty long for that file size. What type of file is
> it?
Tab delimited UTF-8 text.
I left the query to run overnight to see if it would complete, but 24
hours for an import like this would indeed be too long.
> Is the CTAS running single threaded?
In the first hour, with this being the only client connected to the
cluster, I observed activity on all 4 nodes.
Is multi-threaded query execution the default? I would not have changed
anything deliberately to force single-threaded execution.
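One way to tell working threads from stuck ones is a JVM thread dump of the drillbit, sketched below; it assumes the JDK's jps and jstack tools are on the PATH:

```shell
# Take a thread dump of the Drillbit JVM, if one is running on this node.
# jps/jstack ship with the JDK; the process name match is an assumption.
pid=$(jps 2>/dev/null | awk '/Drillbit/ {print $1}' | head -n 1)
if [ -n "$pid" ] && jstack "$pid" > /tmp/drillbit-threads.txt 2>/dev/null; then
  result="thread dump written to /tmp/drillbit-threads.txt (pid $pid)"
else
  result="no running Drillbit found on this node"
fi
echo "$result"
```

Two dumps taken a minute apart that show identical WAITING or BLOCKED stacks suggest the query is stuck rather than merely slow.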
On 28 May 2015, at 13:06, Andries Engelbrecht wrote:
> The time seems pretty long for that file size. What type of file is
> it?
>
> Is the CTAS running single threaded?
>
> —Andries
Re: Monitoring long / stuck CTAS
Posted by Andries Engelbrecht <ae...@maprtech.com>.
The time seems pretty long for that file size. What type of file is it?
Is the CTAS running single threaded?
—Andries
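The degree of parallelism shows up in the query profile. Besides browsing the web UI on any drillbit (port 8047 by default), the profile list can be fetched over HTTP; the JSON endpoint below is an assumption for this Drill version, while the HTML page at /profiles works regardless:

```shell
# Fetch the query-profile list from a drillbit's web UI.
# Host and the /profiles.json endpoint are assumptions; adjust as needed.
host="${DRILL_HOST:-localhost}"
if body=$(curl -fsS "http://$host:8047/profiles.json" 2>/dev/null); then
  result="$body"
else
  result="could not reach drillbit web UI at $host:8047"
fi
printf '%s\n' "$result" | head -c 300
```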
On May 28, 2015, at 9:37 AM, Matt <bs...@gmail.com> wrote:
>> How large is the data set you are working with, and your cluster/nodes?
>
> Just testing with that single 44GB source file currently, and my test cluster is made from 4 nodes, each with 8 CPU cores, 32GB RAM, a 6TB Ext4 volume (RAID-10).
>
> Drill defaults were left as they come in v1.0. I will be adjusting memory and retrying the CTAS.
>
> I know I can / should assign individual disks to HDFS, but as a test cluster there are apps that expect data volumes to work on. A dedicated Hadoop production cluster would have a disk layout specific to the task.
>
Re: Monitoring long / stuck CTAS
Posted by Matt <bs...@gmail.com>.
> How large is the data set you are working with, and your
> cluster/nodes?
Just testing with that single 44GB source file currently, and my test
cluster is made from 4 nodes, each with 8 CPU cores, 32GB RAM, a 6TB
Ext4 volume (RAID-10).
Drill defaults were left as they come in v1.0. I will be adjusting memory and
retrying the CTAS.
I know I can / should assign individual disks to HDFS, but on this test
cluster there are other apps that expect the data volumes to be available. A
dedicated Hadoop production cluster would have a disk layout specific to the task.
On 28 May 2015, at 12:26, Andries Engelbrecht wrote:
> Just check the drillbit.log and drillbit.out files in the log
> directory.
> Before adjusting memory, see if that is an issue first. It was for me,
> but as Jason mentioned there can be other causes as well.
>
> You adjust memory allocation in the drill-env.sh files, and have to
> restart the drill bits.
>
> How large is the data set you are working with, and your
> cluster/nodes?
>
> —Andries
Re: Monitoring long / stuck CTAS
Posted by Andries Engelbrecht <ae...@maprtech.com>.
Just check the drillbit.log and drillbit.out files in the log directory.
Before adjusting memory, see if that is an issue first. It was for me, but as Jason mentioned there can be other causes as well.
You adjust memory allocation in the drill-env.sh files, and then have to restart the drillbits.
How large is the data set you are working with, and your cluster/nodes?
—Andries
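As a drill-env.sh fragment, the change that ultimately resolved this thread (see the top of the page) looks like the following; the sizes are what worked on these 32GB-RAM nodes, not general recommendations:

```shell
# drill-env.sh (per node) -- restart each drillbit after editing.
# 16G direct / 8G heap are the values that worked on 32GB-RAM nodes here.
export DRILL_MAX_DIRECT_MEMORY="16G"
export DRILL_HEAP="8G"
```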
On May 28, 2015, at 9:17 AM, Matt <bs...@gmail.com> wrote:
> To make sure I am adjusting the correct config, these are heap parameters within the Drill configure path, not for Hadoop or Zookeeper?
Re: Monitoring long / stuck CTAS
Posted by Matt <bs...@gmail.com>.
To make sure I am adjusting the correct config: these are heap parameters within the Drill config path, not for Hadoop or ZooKeeper?
> On May 28, 2015, at 12:08 PM, Jason Altekruse <al...@gmail.com> wrote:
>
> There should be no upper limit on the size of the tables you can create
> with Drill. Be advised that Drill does currently operate entirely
> optimistically in regards to available resources. If a network connection
> between two drillbits fails during a query, we will not currently
> re-schedule the work to make use of remaining nodes and network connections
> that are still live. While we have had a good amount of success using Drill
> for data conversion, be aware that these conditions could cause long
> running queries to fail.
>
> That being said, it isn't the only possible cause for such a failure. In
> the case of a network failure we would expect to see a message returned to
> you that part of the query was unsuccessful and that it had been cancelled.
> Andries has a good suggestion in regards to checking the heap memory, this
> should also be detected and reported back to you at the CLI, but we may be
> failing to propagate the error back to the head node for the query. I
> believe writing parquet may still be the most heap-intensive operation in
> Drill, despite our efforts to refactor the write path to use direct memory
> instead of on-heap for large buffers needed in the process of creating
> parquet files.
Re: Monitoring long / stuck CTAS
Posted by Jason Altekruse <al...@gmail.com>.
There should be no upper limit on the size of the tables you can create
with Drill. Be advised that Drill currently operates entirely
optimistically with regard to available resources. If a network connection
between two drillbits fails during a query, we will not currently
re-schedule the work to make use of remaining nodes and network connections
that are still live. While we have had a good amount of success using Drill
for data conversion, be aware that these conditions could cause
long-running queries to fail.
That being said, a network failure is not the only possible cause. In
the case of a network failure we would expect to see a message returned to
you that part of the query was unsuccessful and that it had been cancelled.
Andries has a good suggestion about checking the heap memory; this
should also be detected and reported back to you at the CLI, but we may be
failing to propagate the error back to the head node for the query. I
believe writing parquet may still be the most heap-intensive operation in
Drill, despite our efforts to refactor the write path to use direct memory
instead of on-heap for large buffers needed in the process of creating
parquet files.
On Thu, May 28, 2015 at 8:43 AM, Matt <bs...@gmail.com> wrote:
> Is 300MM records too much to do in a single CTAS statement?
>
> After almost 23 hours I killed the query (^c) and it returned:
>
> ~~~
> +-----------+----------------------------+
> | Fragment  | Number of records written  |
> +-----------+----------------------------+
> | 1_20      | 13568824                   |
> | 1_15      | 12411822                   |
> | 1_7       | 12470329                   |
> | 1_12      | 13693867                   |
> | 1_5       | 13292136                   |
> | 1_18      | 13874321                   |
> | 1_16      | 13303094                   |
> | 1_9       | 13639049                   |
> | 1_10      | 13698380                   |
> | 1_22      | 13501073                   |
> | 1_8       | 13533736                   |
> | 1_2       | 13549402                   |
> | 1_21      | 13665183                   |
> | 1_0       | 13544745                   |
> | 1_4       | 13532957                   |
> | 1_19      | 12767473                   |
> | 1_17      | 13670687                   |
> | 1_13      | 13469515                   |
> | 1_23      | 12517632                   |
> | 1_6       | 13634338                   |
> | 1_14      | 13611322                   |
> | 1_3       | 13061900                   |
> | 1_11      | 12760978                   |
> +-----------+----------------------------+
> 23 rows selected (82294.854 seconds)
> ~~~
>
> The sum of those record counts is 306,772,763 which is close to the
> 320,843,454 in the source file:
>
> ~~~
> 0: jdbc:drill:zk=es05:2181> select count(*) FROM root.`sample_201501.dat`;
> +------------+
> | EXPR$0 |
> +------------+
> | 320843454 |
> +------------+
> 1 row selected (384.665 seconds)
> ~~~
>
>
> It represents one month of data, 4 key columns and 38 numeric measure
> columns, which could also be partitioned daily. The test here was to create
> monthly Parquet files to see how the min/max stats on Parquet chunks help
> with range select performance.
>
> Instead of a small number of large monthly RDBMS tables, I am attempting
> to determine how many Parquet files should be used with Drill / HDFS.
>
>
>
>
> On 27 May 2015, at 15:17, Matt wrote:
>
> Attempting to create a Parquet backed table with a CTAS from an 44GB tab
>> delimited file in HDFS. The process seemed to be running, as CPU and IO was
>> seen on all 4 nodes in this cluster, and .parquet files being created in
>> the expected path.
>>
>> In however in the last two hours or so, all nodes show near zero CPU or
>> IO, and the Last Modified date on the .parquet have not changed. Same time
>> delay shown in the Last Progress column in the active fragment profile.
>>
>> What approach can I take to determine what is happening (or not)?
>>
>
Re: Monitoring long / stuck CTAS
Posted by Matt <bs...@gmail.com>.
I did not note any memory errors or warnings in a quick scan of the logs, but to double-check: is there a specific log in which I would find such warnings?
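Heap exhaustion on a Drillbit typically surfaces as `java.lang.OutOfMemoryError` entries in drillbit.log, which lives under `$DRILL_HOME/log` in a default tarball install (that path is an assumption; packaged installs may differ). A minimal check, demonstrated here against a synthetic log line so it is self-contained:

```shell
# In practice, point LOG at the real drillbit.log on each node, e.g.
#   LOG="$DRILL_HOME/log/drillbit.log"   (default path; adjust per install)
# A synthetic line is written here so the example runs standalone.
LOG=$(mktemp)
printf '%s\n' '2015-05-28 10:01:02 ERROR ... java.lang.OutOfMemoryError: Java heap space' > "$LOG"

# Count heap-related failures; a nonzero count means the node ran out of heap.
grep -c -E 'OutOfMemoryError|GC overhead limit exceeded' "$LOG"

rm -f "$LOG"
```

Running the same grep across all nodes (e.g. via ssh in a loop) shows whether only some Drillbits hit the limit.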
> On May 28, 2015, at 12:01 PM, Andries Engelbrecht <ae...@maprtech.com> wrote:
>
> I have used a single CTAS to create tables using parquet with 1.5B rows.
>
> It did consume a lot of heap memory on the Drillbits and I had to increase the heap size. Check your logs to see if you are running out of heap memory.
>
> I used 128MB parquet block size.
>
> This was with Drill 0.9 , so I’m sure 1.0 will be better in this regard.
>
> —Andries
Re: Monitoring long / stuck CTAS
Posted by Andries Engelbrecht <ae...@maprtech.com>.
I have used a single CTAS to create tables using parquet with 1.5B rows.
It did consume a lot of heap memory on the Drillbits and I had to increase the heap size. Check your logs to see if you are running out of heap memory.
I used 128MB parquet block size.
This was with Drill 0.9, so I’m sure 1.0 will be better in this regard.
—Andries
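Both settings live in conf/drill-env.sh on each node; the values below are the ones reported to work later in this thread (restart the drillbits after changing them):

```shell
# conf/drill-env.sh -- per-node memory for each Drillbit.
# Values taken from this thread; tune to the RAM available on your nodes.
DRILL_MAX_DIRECT_MEMORY="16G"   # direct (off-heap) memory used for query execution
DRILL_HEAP="8G"                 # JVM heap, where the Parquet writer is hungriest
```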
Re: Monitoring long / stuck CTAS
Posted by Matt <bs...@gmail.com>.
Is 300MM records too much to do in a single CTAS statement?
After almost 23 hours I killed the query (^c) and it returned:
~~~
+-----------+----------------------------+
| Fragment | Number of records written |
+-----------+----------------------------+
| 1_20 | 13568824 |
| 1_15 | 12411822 |
| 1_7 | 12470329 |
| 1_12 | 13693867 |
| 1_5 | 13292136 |
| 1_18 | 13874321 |
| 1_16 | 13303094 |
| 1_9 | 13639049 |
| 1_10 | 13698380 |
| 1_22 | 13501073 |
| 1_8 | 13533736 |
| 1_2 | 13549402 |
| 1_21 | 13665183 |
| 1_0 | 13544745 |
| 1_4 | 13532957 |
| 1_19 | 12767473 |
| 1_17 | 13670687 |
| 1_13 | 13469515 |
| 1_23 | 12517632 |
| 1_6 | 13634338 |
| 1_14 | 13611322 |
| 1_3 | 13061900 |
| 1_11 | 12760978 |
+-----------+----------------------------+
23 rows selected (82294.854 seconds)
~~~
The sum of those record counts is 306,772,763 which is close to the
320,843,454 in the source file:
~~~
0: jdbc:drill:zk=es05:2181> select count(*) FROM
root.`sample_201501.dat`;
+------------+
| EXPR$0 |
+------------+
| 320843454 |
+------------+
1 row selected (384.665 seconds)
~~~
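As a sanity check, the per-fragment counts can be totalled outside Drill (figures copied from the CTAS output above):

```python
# Per-fragment record counts copied from the CTAS output above.
counts = [
    13568824, 12411822, 12470329, 13693867, 13292136, 13874321,
    13303094, 13639049, 13698380, 13501073, 13533736, 13549402,
    13665183, 13544745, 13532957, 12767473, 13670687, 13469515,
    12517632, 13634338, 13611322, 13061900, 12760978,
]
written = sum(counts)
source_rows = 320843454  # from the count(*) query above

print(written)                 # 306772763 rows written before cancellation
print(source_rows - written)   # 14070691 rows (~4.4%) never written
```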
The source file represents one month of data (4 key columns and 38 numeric
measure columns) and could also be partitioned daily. The test here was to
create monthly Parquet files and see how the min/max statistics on Parquet
row groups help with range-select performance.
Instead of a small number of large monthly RDBMS tables, I am attempting
to determine how many Parquet files should be used with Drill / HDFS.
On 27 May 2015, at 15:17, Matt wrote:
> Attempting to create a Parquet-backed table with a CTAS from a 44GB
> tab-delimited file in HDFS. The process seemed to be running, as CPU
> and IO were seen on all 4 nodes in this cluster, and .parquet files
> were being created in the expected path.
>
> However, in the last two hours or so, all nodes have shown near zero
> CPU or IO, and the Last Modified dates on the .parquet files have not
> changed. The same time delay is shown in the Last Progress column in
> the active fragment profile.
>
> What approach can I take to determine what is happening (or not)?