Posted to user@drill.apache.org by Matt <bs...@gmail.com> on 2015/05/27 21:17:37 UTC

Monitoring long / stuck CTAS

Attempting to create a Parquet-backed table with a CTAS from a 44GB 
tab-delimited file in HDFS. The process seemed to be running, as CPU and 
IO activity was seen on all 4 nodes in this cluster, and .parquet files 
were being created in the expected path.

However, in the last two hours or so, all nodes show near zero CPU or 
IO, and the Last Modified dates on the .parquet files have not changed. 
The same time delay is shown in the Last Progress column in the active 
fragment profile.

What approach can I take to determine what is happening (or not)?
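
For reference, the statement has roughly the shape below; the target 
workspace, output name, and column list are placeholders, not the exact 
schema:

~~~
-- Sketch only: target workspace, output table, and columns are placeholders.
CREATE TABLE dfs.tmp.`sample_201501_parquet` AS
SELECT CAST(columns[0] AS DATE)   AS event_date,
       columns[1]                 AS customer_key,
       CAST(columns[2] AS BIGINT) AS measure_01
       -- ... remaining key and measure columns ...
FROM root.`sample_201501.dat`;
~~~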


Re: Monitoring long / stuck CTAS

Posted by Sudheesh Katkam <sk...@maprtech.com>.
See below:

> On May 27, 2015, at 12:17 PM, Matt <bs...@gmail.com> wrote:
> 
> Attempting to create a Parquet backed table with a CTAS from an 44GB tab delimited file in HDFS. The process seemed to be running, as CPU and IO was seen on all 4 nodes in this cluster, and .parquet files being created in the expected path.
> 
> In however in the last two hours or so, all nodes show near zero CPU or IO, and the Last Modified date on the .parquet have not changed. Same time delay shown in the Last Progress column in the active fragment profile.

Did you happen to notice the Last Update column in the profile? If so, was there a time delay in that too?

> 
> What approach can I take to determine what is happening (or not)?
> 

Re: Monitoring long / stuck CTAS

Posted by Matt <bs...@gmail.com>.
Bumping memory to:

DRILL_MAX_DIRECT_MEMORY="16G"
DRILL_HEAP="8G"

The 44GB file imported successfully in 25 minutes - acceptable on this 
hardware.

I don't know if the default memory setting was to blame or not.


On 28 May 2015, at 14:22, Andries Engelbrecht wrote:

> That is the Drill direct memory per node.
>
> DRILL_HEAP is for the heap size per node.
>
> More info here
> http://drill.apache.org/docs/configuring-drill-memory/
>
>
> —Andries
>
> On May 28, 2015, at 11:09 AM, Matt <bs...@gmail.com> wrote:
>
>> Referencing http://drill.apache.org/docs/configuring-drill-memory/
>>
>> Is DRILL_MAX_DIRECT_MEMORY the limit for each node, or the cluster?
>>
>> The root page on a drillbit at port 8047 list for nodes, with the 16G 
>> Maximum Direct Memory equal to DRILL_MAX_DIRECT_MEMORY, thus 
>> uncertain if that is a node or cluster limit.
>>
>>
>> On 28 May 2015, at 12:23, Jason Altekruse wrote:
>>
>>> That is correct. I guess it could be possible that HDFS might run 
>>> out of
>>> heap, but I'm guessing that is unlikely the cause of the failure you 
>>> are
>>> seeing. We should not be taxing zookeeper enough to be causing any 
>>> issues
>>> there.
>>>
>>> On Thu, May 28, 2015 at 9:17 AM, Matt <bs...@gmail.com> wrote:
>>>
>>>> To make sure I am adjusting the correct config, these are heap 
>>>> parameters
>>>> within the Drill configure path, not for Hadoop or Zookeeper?
>>>>
>>>>
>>>>> On May 28, 2015, at 12:08 PM, Jason Altekruse 
>>>>> <al...@gmail.com>
>>>> wrote:
>>>>>
>>>>> There should be no upper limit on the size of the tables you can 
>>>>> create
>>>>> with Drill. Be advised that Drill does currently operate entirely
>>>>> optimistically in regards to available resources. If a network 
>>>>> connection
>>>>> between two drillbits fails during a query, we will not currently
>>>>> re-schedule the work to make use of remaining nodes and network
>>>> connections
>>>>> that are still live. While we have had a good amount of success 
>>>>> using
>>>> Drill
>>>>> for data conversion, be aware that these conditions could cause 
>>>>> long
>>>>> running queries to fail.
>>>>>
>>>>> That being said, it isn't the only possible cause for such a 
>>>>> failure. In
>>>>> the case of a network failure we would expect to see a message 
>>>>> returned
>>>> to
>>>>> you that part of the query was unsuccessful and that it had been
>>>> cancelled.
>>>>> Andries has a good suggestion in regards to checking the heap 
>>>>> memory,
>>>> this
>>>>> should also be detected and reported back to you at the CLI, but 
>>>>> we may
>>>> be
>>>>> failing to propagate the error back to the head node for the 
>>>>> query. I
>>>>> believe writing parquet may still be the most heap-intensive 
>>>>> operation in
>>>>> Drill, despite our efforts to refactor the write path to use 
>>>>> direct
>>>> memory
>>>>> instead of on-heap for large buffers needed in the process of 
>>>>> creating
>>>>> parquet files.
>>>>>
>>>>>> On Thu, May 28, 2015 at 8:43 AM, Matt <bs...@gmail.com> wrote:
>>>>>>
>>>>>> Is 300MM records too much to do in a single CTAS statement?
>>>>>>
>>>>>> After almost 23 hours I killed the query (^c) and it returned:
>>>>>>
>>>>>> ~~~
>>>>>> +-----------+----------------------------+
>>>>>> | Fragment  | Number of records written  |
>>>>>> +-----------+----------------------------+
>>>>>> | 1_20      | 13568824                   |
>>>>>> | 1_15      | 12411822                   |
>>>>>> | 1_7       | 12470329                   |
>>>>>> | 1_12      | 13693867                   |
>>>>>> | 1_5       | 13292136                   |
>>>>>> | 1_18      | 13874321                   |
>>>>>> | 1_16      | 13303094                   |
>>>>>> | 1_9       | 13639049                   |
>>>>>> | 1_10      | 13698380                   |
>>>>>> | 1_22      | 13501073                   |
>>>>>> | 1_8       | 13533736                   |
>>>>>> | 1_2       | 13549402                   |
>>>>>> | 1_21      | 13665183                   |
>>>>>> | 1_0       | 13544745                   |
>>>>>> | 1_4       | 13532957                   |
>>>>>> | 1_19      | 12767473                   |
>>>>>> | 1_17      | 13670687                   |
>>>>>> | 1_13      | 13469515                   |
>>>>>> | 1_23      | 12517632                   |
>>>>>> | 1_6       | 13634338                   |
>>>>>> | 1_14      | 13611322                   |
>>>>>> | 1_3       | 13061900                   |
>>>>>> | 1_11      | 12760978                   |
>>>>>> +-----------+----------------------------+
>>>>>> 23 rows selected (82294.854 seconds)
>>>>>> ~~~
>>>>>>
>>>>>> The sum of those record counts is  306,772,763 which is close to 
>>>>>> the
>>>>>> 320,843,454 in the source file:
>>>>>>
>>>>>> ~~~
>>>>>> 0: jdbc:drill:zk=es05:2181> select count(*)  FROM
>>>> root.`sample_201501.dat`;
>>>>>> +------------+
>>>>>> |   EXPR$0   |
>>>>>> +------------+
>>>>>> | 320843454  |
>>>>>> +------------+
>>>>>> 1 row selected (384.665 seconds)
>>>>>> ~~~
>>>>>>
>>>>>>
>>>>>> It represents one month of data, 4 key columns and 38 numeric 
>>>>>> measure
>>>>>> columns, which could also be partitioned daily. The test here was 
>>>>>> to
>>>> create
>>>>>> monthly Parquet files to see how the min/max stats on Parquet 
>>>>>> chunks
>>>> help
>>>>>> with range select performance.
>>>>>>
>>>>>> Instead of a small number of large monthly RDBMS tables, I am 
>>>>>> attempting
>>>>>> to determine how many Parquet files should be used with Drill / 
>>>>>> HDFS.
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> On 27 May 2015, at 15:17, Matt wrote:
>>>>>>
>>>>>> Attempting to create a Parquet backed table with a CTAS from an 
>>>>>> 44GB tab
>>>>>>> delimited file in HDFS. The process seemed to be running, as CPU 
>>>>>>> and
>>>> IO was
>>>>>>> seen on all 4 nodes in this cluster, and .parquet files being 
>>>>>>> created
>>>> in
>>>>>>> the expected path.
>>>>>>>
>>>>>>> In however in the last two hours or so, all nodes show near zero 
>>>>>>> CPU or
>>>>>>> IO, and the Last Modified date on the .parquet have not changed. 
>>>>>>> Same
>>>> time
>>>>>>> delay shown in the Last Progress column in the active fragment 
>>>>>>> profile.
>>>>>>>
>>>>>>> What approach can I take to determine what is happening (or 
>>>>>>> not)?
>>>>>>
>>>>

Re: Monitoring long / stuck CTAS

Posted by Andries Engelbrecht <ae...@maprtech.com>.
That is the Drill direct memory per node.

DRILL_HEAP is for the heap size per node.

More info here
http://drill.apache.org/docs/configuring-drill-memory/


—Andries

On May 28, 2015, at 11:09 AM, Matt <bs...@gmail.com> wrote:

> Referencing http://drill.apache.org/docs/configuring-drill-memory/
> 
> Is DRILL_MAX_DIRECT_MEMORY the limit for each node, or the cluster?
> 
> The root page on a drillbit at port 8047 list for nodes, with the 16G Maximum Direct Memory equal to DRILL_MAX_DIRECT_MEMORY, thus uncertain if that is a node or cluster limit.
> 
> 
> On 28 May 2015, at 12:23, Jason Altekruse wrote:
> 
>> That is correct. I guess it could be possible that HDFS might run out of
>> heap, but I'm guessing that is unlikely the cause of the failure you are
>> seeing. We should not be taxing zookeeper enough to be causing any issues
>> there.
>> 
>> On Thu, May 28, 2015 at 9:17 AM, Matt <bs...@gmail.com> wrote:
>> 
>>> To make sure I am adjusting the correct config, these are heap parameters
>>> within the Drill configure path, not for Hadoop or Zookeeper?
>>> 
>>> 
>>>> On May 28, 2015, at 12:08 PM, Jason Altekruse <al...@gmail.com>
>>> wrote:
>>>> 
>>>> There should be no upper limit on the size of the tables you can create
>>>> with Drill. Be advised that Drill does currently operate entirely
>>>> optimistically in regards to available resources. If a network connection
>>>> between two drillbits fails during a query, we will not currently
>>>> re-schedule the work to make use of remaining nodes and network
>>> connections
>>>> that are still live. While we have had a good amount of success using
>>> Drill
>>>> for data conversion, be aware that these conditions could cause long
>>>> running queries to fail.
>>>> 
>>>> That being said, it isn't the only possible cause for such a failure. In
>>>> the case of a network failure we would expect to see a message returned
>>> to
>>>> you that part of the query was unsuccessful and that it had been
>>> cancelled.
>>>> Andries has a good suggestion in regards to checking the heap memory,
>>> this
>>>> should also be detected and reported back to you at the CLI, but we may
>>> be
>>>> failing to propagate the error back to the head node for the query. I
>>>> believe writing parquet may still be the most heap-intensive operation in
>>>> Drill, despite our efforts to refactor the write path to use direct
>>> memory
>>>> instead of on-heap for large buffers needed in the process of creating
>>>> parquet files.
>>>> 
>>>>> On Thu, May 28, 2015 at 8:43 AM, Matt <bs...@gmail.com> wrote:
>>>>> 
>>>>> Is 300MM records too much to do in a single CTAS statement?
>>>>> 
>>>>> After almost 23 hours I killed the query (^c) and it returned:
>>>>> 
>>>>> ~~~
>>>>> +-----------+----------------------------+
>>>>> | Fragment  | Number of records written  |
>>>>> +-----------+----------------------------+
>>>>> | 1_20      | 13568824                   |
>>>>> | 1_15      | 12411822                   |
>>>>> | 1_7       | 12470329                   |
>>>>> | 1_12      | 13693867                   |
>>>>> | 1_5       | 13292136                   |
>>>>> | 1_18      | 13874321                   |
>>>>> | 1_16      | 13303094                   |
>>>>> | 1_9       | 13639049                   |
>>>>> | 1_10      | 13698380                   |
>>>>> | 1_22      | 13501073                   |
>>>>> | 1_8       | 13533736                   |
>>>>> | 1_2       | 13549402                   |
>>>>> | 1_21      | 13665183                   |
>>>>> | 1_0       | 13544745                   |
>>>>> | 1_4       | 13532957                   |
>>>>> | 1_19      | 12767473                   |
>>>>> | 1_17      | 13670687                   |
>>>>> | 1_13      | 13469515                   |
>>>>> | 1_23      | 12517632                   |
>>>>> | 1_6       | 13634338                   |
>>>>> | 1_14      | 13611322                   |
>>>>> | 1_3       | 13061900                   |
>>>>> | 1_11      | 12760978                   |
>>>>> +-----------+----------------------------+
>>>>> 23 rows selected (82294.854 seconds)
>>>>> ~~~
>>>>> 
>>>>> The sum of those record counts is  306,772,763 which is close to the
>>>>> 320,843,454 in the source file:
>>>>> 
>>>>> ~~~
>>>>> 0: jdbc:drill:zk=es05:2181> select count(*)  FROM
>>> root.`sample_201501.dat`;
>>>>> +------------+
>>>>> |   EXPR$0   |
>>>>> +------------+
>>>>> | 320843454  |
>>>>> +------------+
>>>>> 1 row selected (384.665 seconds)
>>>>> ~~~
>>>>> 
>>>>> 
>>>>> It represents one month of data, 4 key columns and 38 numeric measure
>>>>> columns, which could also be partitioned daily. The test here was to
>>> create
>>>>> monthly Parquet files to see how the min/max stats on Parquet chunks
>>> help
>>>>> with range select performance.
>>>>> 
>>>>> Instead of a small number of large monthly RDBMS tables, I am attempting
>>>>> to determine how many Parquet files should be used with Drill / HDFS.
>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>>> On 27 May 2015, at 15:17, Matt wrote:
>>>>> 
>>>>> Attempting to create a Parquet backed table with a CTAS from an 44GB tab
>>>>>> delimited file in HDFS. The process seemed to be running, as CPU and
>>> IO was
>>>>>> seen on all 4 nodes in this cluster, and .parquet files being created
>>> in
>>>>>> the expected path.
>>>>>> 
>>>>>> In however in the last two hours or so, all nodes show near zero CPU or
>>>>>> IO, and the Last Modified date on the .parquet have not changed. Same
>>> time
>>>>>> delay shown in the Last Progress column in the active fragment profile.
>>>>>> 
>>>>>> What approach can I take to determine what is happening (or not)?
>>>>> 
>>> 


Re: Monitoring long / stuck CTAS

Posted by Matt <bs...@gmail.com>.
Referencing http://drill.apache.org/docs/configuring-drill-memory/

Is DRILL_MAX_DIRECT_MEMORY the limit for each node, or the cluster?

The root page on a drillbit at port 8047 lists four nodes, each with the 
16G Maximum Direct Memory equal to DRILL_MAX_DIRECT_MEMORY, so I am 
uncertain whether that is a node or a cluster limit.
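
One way to check from SQL, assuming the sys.memory system table (and 
these column names) is available in this Drill version, is to look at 
the limits each drillbit reports; one row per node with a 16G direct_max 
would point to a per-node limit:

~~~
-- One row per drillbit; heap_max and direct_max are that node's limits
-- (column names assumed, from the sys.memory system table).
SELECT hostname, heap_max, direct_max
FROM sys.memory;
~~~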


On 28 May 2015, at 12:23, Jason Altekruse wrote:

> That is correct. I guess it could be possible that HDFS might run out 
> of
> heap, but I'm guessing that is unlikely the cause of the failure you 
> are
> seeing. We should not be taxing zookeeper enough to be causing any 
> issues
> there.
>
> On Thu, May 28, 2015 at 9:17 AM, Matt <bs...@gmail.com> wrote:
>
>> To make sure I am adjusting the correct config, these are heap 
>> parameters
>> within the Drill configure path, not for Hadoop or Zookeeper?
>>
>>
>>> On May 28, 2015, at 12:08 PM, Jason Altekruse 
>>> <al...@gmail.com>
>> wrote:
>>>
>>> There should be no upper limit on the size of the tables you can 
>>> create
>>> with Drill. Be advised that Drill does currently operate entirely
>>> optimistically in regards to available resources. If a network 
>>> connection
>>> between two drillbits fails during a query, we will not currently
>>> re-schedule the work to make use of remaining nodes and network
>> connections
>>> that are still live. While we have had a good amount of success 
>>> using
>> Drill
>>> for data conversion, be aware that these conditions could cause long
>>> running queries to fail.
>>>
>>> That being said, it isn't the only possible cause for such a 
>>> failure. In
>>> the case of a network failure we would expect to see a message 
>>> returned
>> to
>>> you that part of the query was unsuccessful and that it had been
>> cancelled.
>>> Andries has a good suggestion in regards to checking the heap 
>>> memory,
>> this
>>> should also be detected and reported back to you at the CLI, but we 
>>> may
>> be
>>> failing to propagate the error back to the head node for the query. 
>>> I
>>> believe writing parquet may still be the most heap-intensive 
>>> operation in
>>> Drill, despite our efforts to refactor the write path to use direct
>> memory
>>> instead of on-heap for large buffers needed in the process of 
>>> creating
>>> parquet files.
>>>
>>>> On Thu, May 28, 2015 at 8:43 AM, Matt <bs...@gmail.com> wrote:
>>>>
>>>> Is 300MM records too much to do in a single CTAS statement?
>>>>
>>>> After almost 23 hours I killed the query (^c) and it returned:
>>>>
>>>> ~~~
>>>> +-----------+----------------------------+
>>>> | Fragment  | Number of records written  |
>>>> +-----------+----------------------------+
>>>> | 1_20      | 13568824                   |
>>>> | 1_15      | 12411822                   |
>>>> | 1_7       | 12470329                   |
>>>> | 1_12      | 13693867                   |
>>>> | 1_5       | 13292136                   |
>>>> | 1_18      | 13874321                   |
>>>> | 1_16      | 13303094                   |
>>>> | 1_9       | 13639049                   |
>>>> | 1_10      | 13698380                   |
>>>> | 1_22      | 13501073                   |
>>>> | 1_8       | 13533736                   |
>>>> | 1_2       | 13549402                   |
>>>> | 1_21      | 13665183                   |
>>>> | 1_0       | 13544745                   |
>>>> | 1_4       | 13532957                   |
>>>> | 1_19      | 12767473                   |
>>>> | 1_17      | 13670687                   |
>>>> | 1_13      | 13469515                   |
>>>> | 1_23      | 12517632                   |
>>>> | 1_6       | 13634338                   |
>>>> | 1_14      | 13611322                   |
>>>> | 1_3       | 13061900                   |
>>>> | 1_11      | 12760978                   |
>>>> +-----------+----------------------------+
>>>> 23 rows selected (82294.854 seconds)
>>>> ~~~
>>>>
>>>> The sum of those record counts is  306,772,763 which is close to 
>>>> the
>>>> 320,843,454 in the source file:
>>>>
>>>> ~~~
>>>> 0: jdbc:drill:zk=es05:2181> select count(*)  FROM
>> root.`sample_201501.dat`;
>>>> +------------+
>>>> |   EXPR$0   |
>>>> +------------+
>>>> | 320843454  |
>>>> +------------+
>>>> 1 row selected (384.665 seconds)
>>>> ~~~
>>>>
>>>>
>>>> It represents one month of data, 4 key columns and 38 numeric 
>>>> measure
>>>> columns, which could also be partitioned daily. The test here was 
>>>> to
>> create
>>>> monthly Parquet files to see how the min/max stats on Parquet 
>>>> chunks
>> help
>>>> with range select performance.
>>>>
>>>> Instead of a small number of large monthly RDBMS tables, I am 
>>>> attempting
>>>> to determine how many Parquet files should be used with Drill / 
>>>> HDFS.
>>>>
>>>>
>>>>
>>>>
>>>> On 27 May 2015, at 15:17, Matt wrote:
>>>>
>>>> Attempting to create a Parquet backed table with a CTAS from an 
>>>> 44GB tab
>>>>> delimited file in HDFS. The process seemed to be running, as CPU 
>>>>> and
>> IO was
>>>>> seen on all 4 nodes in this cluster, and .parquet files being 
>>>>> created
>> in
>>>>> the expected path.
>>>>>
>>>>> In however in the last two hours or so, all nodes show near zero 
>>>>> CPU or
>>>>> IO, and the Last Modified date on the .parquet have not changed. 
>>>>> Same
>> time
>>>>> delay shown in the Last Progress column in the active fragment 
>>>>> profile.
>>>>>
>>>>> What approach can I take to determine what is happening (or not)?
>>>>
>>

Re: Monitoring long / stuck CTAS

Posted by Jason Altekruse <al...@gmail.com>.
That is correct. I guess it could be possible that HDFS might run out of
heap, but I'm guessing that is unlikely to be the cause of the failure you
are seeing. We should not be taxing ZooKeeper hard enough to cause any
issues there.

On Thu, May 28, 2015 at 9:17 AM, Matt <bs...@gmail.com> wrote:

> To make sure I am adjusting the correct config, these are heap parameters
> within the Drill configure path, not for Hadoop or Zookeeper?
>
>
> > On May 28, 2015, at 12:08 PM, Jason Altekruse <al...@gmail.com>
> wrote:
> >
> > There should be no upper limit on the size of the tables you can create
> > with Drill. Be advised that Drill does currently operate entirely
> > optimistically in regards to available resources. If a network connection
> > between two drillbits fails during a query, we will not currently
> > re-schedule the work to make use of remaining nodes and network
> connections
> > that are still live. While we have had a good amount of success using
> Drill
> > for data conversion, be aware that these conditions could cause long
> > running queries to fail.
> >
> > That being said, it isn't the only possible cause for such a failure. In
> > the case of a network failure we would expect to see a message returned
> to
> > you that part of the query was unsuccessful and that it had been
> cancelled.
> > Andries has a good suggestion in regards to checking the heap memory,
> this
> > should also be detected and reported back to you at the CLI, but we may
> be
> > failing to propagate the error back to the head node for the query. I
> > believe writing parquet may still be the most heap-intensive operation in
> > Drill, despite our efforts to refactor the write path to use direct
> memory
> > instead of on-heap for large buffers needed in the process of creating
> > parquet files.
> >
> >> On Thu, May 28, 2015 at 8:43 AM, Matt <bs...@gmail.com> wrote:
> >>
> >> Is 300MM records too much to do in a single CTAS statement?
> >>
> >> After almost 23 hours I killed the query (^c) and it returned:
> >>
> >> ~~~
> >> +-----------+----------------------------+
> >> | Fragment  | Number of records written  |
> >> +-----------+----------------------------+
> >> | 1_20      | 13568824                   |
> >> | 1_15      | 12411822                   |
> >> | 1_7       | 12470329                   |
> >> | 1_12      | 13693867                   |
> >> | 1_5       | 13292136                   |
> >> | 1_18      | 13874321                   |
> >> | 1_16      | 13303094                   |
> >> | 1_9       | 13639049                   |
> >> | 1_10      | 13698380                   |
> >> | 1_22      | 13501073                   |
> >> | 1_8       | 13533736                   |
> >> | 1_2       | 13549402                   |
> >> | 1_21      | 13665183                   |
> >> | 1_0       | 13544745                   |
> >> | 1_4       | 13532957                   |
> >> | 1_19      | 12767473                   |
> >> | 1_17      | 13670687                   |
> >> | 1_13      | 13469515                   |
> >> | 1_23      | 12517632                   |
> >> | 1_6       | 13634338                   |
> >> | 1_14      | 13611322                   |
> >> | 1_3       | 13061900                   |
> >> | 1_11      | 12760978                   |
> >> +-----------+----------------------------+
> >> 23 rows selected (82294.854 seconds)
> >> ~~~
> >>
> >> The sum of those record counts is  306,772,763 which is close to the
> >> 320,843,454 in the source file:
> >>
> >> ~~~
> >> 0: jdbc:drill:zk=es05:2181> select count(*)  FROM
> root.`sample_201501.dat`;
> >> +------------+
> >> |   EXPR$0   |
> >> +------------+
> >> | 320843454  |
> >> +------------+
> >> 1 row selected (384.665 seconds)
> >> ~~~
> >>
> >>
> >> It represents one month of data, 4 key columns and 38 numeric measure
> >> columns, which could also be partitioned daily. The test here was to
> create
> >> monthly Parquet files to see how the min/max stats on Parquet chunks
> help
> >> with range select performance.
> >>
> >> Instead of a small number of large monthly RDBMS tables, I am attempting
> >> to determine how many Parquet files should be used with Drill / HDFS.
> >>
> >>
> >>
> >>
> >> On 27 May 2015, at 15:17, Matt wrote:
> >>
> >> Attempting to create a Parquet backed table with a CTAS from an 44GB tab
> >>> delimited file in HDFS. The process seemed to be running, as CPU and
> IO was
> >>> seen on all 4 nodes in this cluster, and .parquet files being created
> in
> >>> the expected path.
> >>>
> >>> In however in the last two hours or so, all nodes show near zero CPU or
> >>> IO, and the Last Modified date on the .parquet have not changed. Same
> time
> >>> delay shown in the Last Progress column in the active fragment profile.
> >>>
> >>> What approach can I take to determine what is happening (or not)?
> >>
>

Re: Monitoring long / stuck CTAS

Posted by Matt <bs...@gmail.com>.
> 1) it isn't HDFS.

Is MapR-FS a replacement or stand-in for HDFS?


On 29 May 2015, at 5:55, Ted Dunning wrote:

> Apologies for the plug, but using MapR FS would help you a lot here.  The
> trick is that you can run an NFS server on every node and mount that server
> as localhost.
>
> The benefits are:
>
> 1) the entire cluster appears as a conventional POSIX style file system in
> addition to being available via HDFS API's.
>
> 2) you get very high I/O speeds
>
> 3) you get real snapshots and mirrors if you need them
>
> 4) you get the use of the HBase API without having to run HBase.  Tables
> are integrated directly into MapR FS.
>
> 5) programs that need to exceed local disk size can do so
>
> 6) data can be isolated to single machines if you want.
>
> 7) you can get it for free or pay for support
>
>
> The downsides are:
>
> 1) it isn't HDFS.
>
> 2) the data platform isn't Apache licensed (all of eco-system code is
> unchanged wrt licensing)
>
>
>
>
>
> On Thu, May 28, 2015 at 9:37 AM, Matt <bs...@gmail.com> wrote:
>
>> I know I can / should assign individual disks to HDFS, but as a test
>> cluster there are apps that expect data volumes to work on. A dedicated
>> Hadoop production cluster would have a disk layout specific to the task.

Re: Monitoring long / stuck CTAS

Posted by Carol McDonald <cm...@maprtech.com>.
What Ted just talked about is also explained in this On Demand Training

https://www.mapr.com/services/mapr-academy/mapr-distribution-essentials-training-course-on-demand

(which is free)


On Fri, May 29, 2015 at 5:29 PM, Ted Dunning <te...@gmail.com> wrote:

> There are two methods to support HBase table API's.  The first is to simply
> run HBase. That is just like, well, running HBase.
>
> The more interesting alternative is to use a special client API that talks
> a special table-oriented wire protocol to the file system which implements
> a column-family / column oriented table API similar to what HBase uses.
> The big differences have to do with the fact that code inside the file
> system has capabilities available to it that are not available to HBase.
> For instance, it can use a file oriented transaction and recovery system.
> It can also make use of knowledge about file system layout that is not
> available to HBase.
>
> Because we can optimize the file layouts, we can also change the low level
> protocols for disk reorganization.  MapR tables have more levels of
> sub-division than HBase and we use different low-level algorithms.  This
> results in having lots of write-ahead logs which would crush HDFS because
> of the commit rate, but it allows very fast crash recovery (10's to low
> 100's of ms after the basic file system is back)
>
> Also, since the tables are built using standard file-system primitives all
> of the transactionally correct snapshots and mirrors carry over to tables
> as well.
>
> Oh, and it tends to be a lot faster and failure tolerant as well.
>
>
>
> On Fri, May 29, 2015 at 7:00 AM, Yousef Lasi <yo...@gmail.com>
> wrote:
>
> > Could you expand on the HBase table integration? How does that work?
> >
> > On Fri, May 29, 2015 at 5:55 AM, Ted Dunning <te...@gmail.com>
> > wrote:
> >
> > >
> > > 4) you get the use of the HBase API without having to run HBase.
> Tables
> > > are integrated directly into MapR FS.
> > >
> > >
> > >
> > >
> > >
> > > On Thu, May 28, 2015 at 9:37 AM, Matt <bs...@gmail.com> wrote:
> > >
> > > > I know I can / should assign individual disks to HDFS, but as a test
> > > > cluster there are apps that expect data volumes to work on. A
> dedicated
> > > > Hadoop production cluster would have a disk layout specific to the
> > task.
> > >
> >
>

Re: Monitoring long / stuck CTAS

Posted by Ted Dunning <te...@gmail.com>.
There are two methods to support HBase table API's.  The first is to simply
run HBase. That is just like, well, running HBase.

The more interesting alternative is to use a special client API that talks
a special table-oriented wire protocol to the file system which implements
a column-family / column oriented table API similar to what HBase uses.
The big differences have to do with the fact that code inside the file
system has capabilities available to it that are not available to HBase.
For instance, it can use a file oriented transaction and recovery system.
It can also make use of knowledge about file system layout that is not
available to HBase.

Because we can optimize the file layouts, we can also change the low level
protocols for disk reorganization.  MapR tables have more levels of
sub-division than HBase and we use different low-level algorithms.  This
results in having lots of write-ahead logs which would crush HDFS because
of the commit rate, but it allows very fast crash recovery (10's to low
100's of ms after the basic file system is back)

Also, since the tables are built using standard file-system primitives all
of the transactionally correct snapshots and mirrors carry over to tables
as well.

Oh, and it tends to be a lot faster and more failure tolerant as well.



On Fri, May 29, 2015 at 7:00 AM, Yousef Lasi <yo...@gmail.com> wrote:

> Could you expand on the HBase table integration? How does that work?
>
> On Fri, May 29, 2015 at 5:55 AM, Ted Dunning <te...@gmail.com>
> wrote:
>
> >
> > 4) you get the use of the HBase API without having to run HBase.  Tables
> > are integrated directly into MapR FS.
> >
> >
> >
> >
> >
> > On Thu, May 28, 2015 at 9:37 AM, Matt <bs...@gmail.com> wrote:
> >
> > > I know I can / should assign individual disks to HDFS, but as a test
> > > cluster there are apps that expect data volumes to work on. A dedicated
> > > Hadoop production cluster would have a disk layout specific to the
> task.
> >
>

Re: Monitoring long / stuck CTAS

Posted by Yousef Lasi <yo...@gmail.com>.
Could you expand on the HBase table integration? How does that work?

On Fri, May 29, 2015 at 5:55 AM, Ted Dunning <te...@gmail.com> wrote:

>
> 4) you get the use of the HBase API without having to run HBase.  Tables
> are integrated directly into MapR FS.
>
>
>
>
>
> On Thu, May 28, 2015 at 9:37 AM, Matt <bs...@gmail.com> wrote:
>
> > I know I can / should assign individual disks to HDFS, but as a test
> > cluster there are apps that expect data volumes to work on. A dedicated
> > Hadoop production cluster would have a disk layout specific to the task.
>

Re: Monitoring long / stuck CTAS

Posted by Ted Dunning <te...@gmail.com>.
Apologies for the plug, but using MapR FS would help you a lot here.  The
trick is that you can run an NFS server on every node and mount that server
as localhost.

The benefits are:

1) the entire cluster appears as a conventional POSIX style file system in
addition to being available via HDFS API's.

2) you get very high I/O speeds

3) you get real snapshots and mirrors if you need them

4) you get the use of the HBase API without having to run HBase.  Tables
are integrated directly into MapR FS.

5) programs that need to exceed local disk size can do so

6) data can be isolated to single machines if you want.

7) you can get it for free or pay for support


The downsides are:

1) it isn't HDFS.

2) the data platform isn't Apache licensed (all of the eco-system code is
unchanged wrt licensing)





On Thu, May 28, 2015 at 9:37 AM, Matt <bs...@gmail.com> wrote:

> I know I can / should assign individual disks to HDFS, but as a test
> cluster there are apps that expect data volumes to work on. A dedicated
> Hadoop production cluster would have a disk layout specific to the task.

Re: Monitoring long / stuck CTAS

Posted by Matt <bs...@gmail.com>.
CPU and IO went to near zero on the master and all nodes after about 1 
hour. I do not know whether the bulk of rows were written within that 
hour or after.

> Is there any way you can read the table and try to validate if all of 
> the data was written?

A simple join will show me where it stopped, and if that was at a 
specific point in scanning the source file top to bottom.

> While we certainly want to look into this more to find the issue in 
> your case, you might have all of the data you need to start running 
> queries against the parquet files.

A simple row count comparison tells me about 5% of the rows are missing 
from the destination, but I will be confirming that.
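
For the comparison, something along these lines (the output table name 
and the date column are placeholders for the actual names):

~~~
-- Row counts in source vs. destination; `sample_201501_parquet` is a
-- placeholder for the CTAS output.
SELECT COUNT(*) FROM root.`sample_201501.dat`;
SELECT COUNT(*) FROM root.`sample_201501_parquet`;

-- How far into the source scan the writer got before stopping
-- (event_date is a hypothetical key column).
SELECT MAX(event_date) FROM root.`sample_201501_parquet`;
~~~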


On 28 May 2015, at 13:24, Jason Altekruse wrote:

> He mentioned in his original post that he saw CPU and IO on all of the
> nodes for a while when the query was active, but it suddenly dropped 
> down
> to low CPU usage and stopped producing files. It seems like we are 
> failing
> to detect an error an cancel the query.
>
> It is possible that the failure happened when we were finalizing the 
> query,
> cleanup resources/file handles/ etc. Is there any way you can read the
> table and try to validate if all of the data was written? You can try 
> to
> run a few of the same queries against the tab delimited files and 
> resulting
> parquet files to see if all of the records were written. While we 
> certainly
> want to look into this more to find the issue in your case, you might 
> have
> all of the data you need to start running queries against the parquet 
> files.
>
> On Thu, May 28, 2015 at 10:06 AM, Andries Engelbrecht <
> aengelbrecht@maprtech.com> wrote:
>
>> The time seems pretty long for that file size. What type of file is 
>> it?
>>
>> Is the CTAS running single threaded?
>>
>> —Andries
>>
>>
>> On May 28, 2015, at 9:37 AM, Matt <bs...@gmail.com> wrote:
>>
>>>> How large is the data set you are working with, and your 
>>>> cluster/nodes?
>>>
>>> Just testing with that single 44GB source file currently, and my 
>>> test
>> cluster is made from 4 nodes, each with 8 CPU cores, 32GB RAM, a 6TB 
>> Ext4
>> volume (RAID-10).
>>>
>>> Drill defaults left as come in v1.0. I will be adjusting memory and
>> retrying the CTAS.
>>>
>>> I know I can / should assign individual disks to HDFS, but as a test
>> cluster there are apps that expect data volumes to work on. A 
>> dedicated
>> Hadoop production cluster would have a disk layout specific to the 
>> task.
>>>
>>>
>>> On 28 May 2015, at 12:26, Andries Engelbrecht wrote:
>>>
>>>> Just check the drillbit.log and drillbit.out files in the log 
>>>> directory.
>>>> Before adjusting memory, see if that is an issue first. It was for 
>>>> me,
>> but as Jason mentioned there can be other causes as well.
>>>>
>>>> You adjust memory allocation in the drill-env.sh files, and have to
>> restart the drill bits.
>>>>
>>>> How large is the data set you are working with, and your 
>>>> cluster/nodes?
>>>>
>>>> —Andries
>>>>
>>>>
>>>> On May 28, 2015, at 9:17 AM, Matt <bs...@gmail.com> wrote:
>>>>
>>>>> To make sure I am adjusting the correct config, these are heap
>> parameters within the Drill configure path, not for Hadoop or 
>> Zookeeper?
>>>>>
>>>>>
>>>>>> On May 28, 2015, at 12:08 PM, Jason Altekruse <
>> altekrusejason@gmail.com> wrote:
>>>>>>
>>>>>> There should be no upper limit on the size of the tables you can
>> create
>>>>>> with Drill. Be advised that Drill does currently operate entirely
>>>>>> optimistically in regards to available resources. If a network
>> connection
>>>>>> between two drillbits fails during a query, we will not currently
>>>>>> re-schedule the work to make use of remaining nodes and network
>> connections
>>>>>> that are still live. While we have had a good amount of success 
>>>>>> using
>> Drill
>>>>>> for data conversion, be aware that these conditions could cause 
>>>>>> long
>>>>>> running queries to fail.
>>>>>>
>>>>>> That being said, it isn't the only possible cause for such a 
>>>>>> failure.
>> In
>>>>>> the case of a network failure we would expect to see a message
>> returned to
>>>>>> you that part of the query was unsuccessful and that it had been
>> cancelled.
>>>>>> Andries has a good suggestion in regards to checking the heap 
>>>>>> memory,
>> this
>>>>>> should also be detected and reported back to you at the CLI, but 
>>>>>> we
>> may be
>>>>>> failing to propagate the error back to the head node for the 
>>>>>> query. I
>>>>>> believe writing parquet may still be the most heap-intensive
>> operation in
>>>>>> Drill, despite our efforts to refactor the write path to use 
>>>>>> direct
>> memory
>>>>>> instead of on-heap for large buffers needed in the process of 
>>>>>> creating
>>>>>> parquet files.
>>>>>>
>>>>>>> On Thu, May 28, 2015 at 8:43 AM, Matt <bs...@gmail.com> wrote:
>>>>>>>
>>>>>>> Is 300MM records too much to do in a single CTAS statement?
>>>>>>>
>>>>>>> After almost 23 hours I killed the query (^c) and it returned:
>>>>>>>
>>>>>>> ~~~
>>>>>>> +-----------+----------------------------+
>>>>>>> | Fragment  | Number of records written  |
>>>>>>> +-----------+----------------------------+
>>>>>>> | 1_20      | 13568824                   |
>>>>>>> | 1_15      | 12411822                   |
>>>>>>> | 1_7       | 12470329                   |
>>>>>>> | 1_12      | 13693867                   |
>>>>>>> | 1_5       | 13292136                   |
>>>>>>> | 1_18      | 13874321                   |
>>>>>>> | 1_16      | 13303094                   |
>>>>>>> | 1_9       | 13639049                   |
>>>>>>> | 1_10      | 13698380                   |
>>>>>>> | 1_22      | 13501073                   |
>>>>>>> | 1_8       | 13533736                   |
>>>>>>> | 1_2       | 13549402                   |
>>>>>>> | 1_21      | 13665183                   |
>>>>>>> | 1_0       | 13544745                   |
>>>>>>> | 1_4       | 13532957                   |
>>>>>>> | 1_19      | 12767473                   |
>>>>>>> | 1_17      | 13670687                   |
>>>>>>> | 1_13      | 13469515                   |
>>>>>>> | 1_23      | 12517632                   |
>>>>>>> | 1_6       | 13634338                   |
>>>>>>> | 1_14      | 13611322                   |
>>>>>>> | 1_3       | 13061900                   |
>>>>>>> | 1_11      | 12760978                   |
>>>>>>> +-----------+----------------------------+
>>>>>>> 23 rows selected (82294.854 seconds)
>>>>>>> ~~~
>>>>>>>
>>>>>>> The sum of those record counts is  306,772,763 which is close to 
>>>>>>> the
>>>>>>> 320,843,454 in the source file:
>>>>>>>
>>>>>>> ~~~
>>>>>>> 0: jdbc:drill:zk=es05:2181> select count(*)  FROM
>> root.`sample_201501.dat`;
>>>>>>> +------------+
>>>>>>> |   EXPR$0   |
>>>>>>> +------------+
>>>>>>> | 320843454  |
>>>>>>> +------------+
>>>>>>> 1 row selected (384.665 seconds)
>>>>>>> ~~~
>>>>>>>
>>>>>>>
>>>>>>> It represents one month of data, 4 key columns and 38 numeric 
>>>>>>> measure
>>>>>>> columns, which could also be partitioned daily. The test here 
>>>>>>> was to
>> create
>>>>>>> monthly Parquet files to see how the min/max stats on Parquet 
>>>>>>> chunks
>> help
>>>>>>> with range select performance.
>>>>>>>
>>>>>>> Instead of a small number of large monthly RDBMS tables, I am
>> attempting
>>>>>>> to determine how many Parquet files should be used with Drill / 
>>>>>>> HDFS.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On 27 May 2015, at 15:17, Matt wrote:
>>>>>>>
>>>>>>> Attempting to create a Parquet backed table with a CTAS from an 
>>>>>>> 44GB
>> tab
>>>>>>>> delimited file in HDFS. The process seemed to be running, as 
>>>>>>>> CPU
>> and IO was
>>>>>>>> seen on all 4 nodes in this cluster, and .parquet files being
>> created in
>>>>>>>> the expected path.
>>>>>>>>
>>>>>>>> In however in the last two hours or so, all nodes show near 
>>>>>>>> zero
>> CPU or
>>>>>>>> IO, and the Last Modified date on the .parquet have not 
>>>>>>>> changed.
>> Same time
>>>>>>>> delay shown in the Last Progress column in the active fragment
>> profile.
>>>>>>>>
>>>>>>>> What approach can I take to determine what is happening (or 
>>>>>>>> not)?
>>>>>>>
>>
>>

Re: Monitoring long / stuck CTAS

Posted by Jason Altekruse <al...@gmail.com>.
He mentioned in his original post that he saw CPU and IO on all of the
nodes for a while when the query was active, but it suddenly dropped down
to low CPU usage and stopped producing files. It seems like we are failing
to detect an error and cancel the query.

It is possible that the failure happened when we were finalizing the query,
cleaning up resources/file handles, etc. Is there any way you can read the
table and try to validate if all of the data was written? You can try to
run a few of the same queries against the tab delimited files and resulting
parquet files to see if all of the records were written. While we certainly
want to look into this more to find the issue in your case, you might have
all of the data you need to start running queries against the parquet files.

On Thu, May 28, 2015 at 10:06 AM, Andries Engelbrecht <
aengelbrecht@maprtech.com> wrote:

> The time seems pretty long for that file size. What type of file is it?
>
> Is the CTAS running single threaded?
>
> —Andries
>
>
> On May 28, 2015, at 9:37 AM, Matt <bs...@gmail.com> wrote:
>
> >> How large is the data set you are working with, and your cluster/nodes?
> >
> > Just testing with that single 44GB source file currently, and my test
> cluster is made from 4 nodes, each with 8 CPU cores, 32GB RAM, a 6TB Ext4
> volume (RAID-10).
> >
> > Drill defaults left as come in v1.0. I will be adjusting memory and
> retrying the CTAS.
> >
> > I know I can / should assign individual disks to HDFS, but as a test
> cluster there are apps that expect data volumes to work on. A dedicated
> Hadoop production cluster would have a disk layout specific to the task.
> >
> >
> > On 28 May 2015, at 12:26, Andries Engelbrecht wrote:
> >
> >> Just check the drillbit.log and drillbit.out files in the log directory.
> >> Before adjusting memory, see if that is an issue first. It was for me,
> but as Jason mentioned there can be other causes as well.
> >>
> >> You adjust memory allocation in the drill-env.sh files, and have to
> restart the drill bits.
> >>
> >> How large is the data set you are working with, and your cluster/nodes?
> >>
> >> —Andries
> >>
> >>
> >> On May 28, 2015, at 9:17 AM, Matt <bs...@gmail.com> wrote:
> >>
> >>> To make sure I am adjusting the correct config, these are heap
> parameters within the Drill configure path, not for Hadoop or Zookeeper?
> >>>
> >>>
> >>>> On May 28, 2015, at 12:08 PM, Jason Altekruse <
> altekrusejason@gmail.com> wrote:
> >>>>
> >>>> There should be no upper limit on the size of the tables you can
> create
> >>>> with Drill. Be advised that Drill does currently operate entirely
> >>>> optimistically in regards to available resources. If a network
> connection
> >>>> between two drillbits fails during a query, we will not currently
> >>>> re-schedule the work to make use of remaining nodes and network
> connections
> >>>> that are still live. While we have had a good amount of success using
> Drill
> >>>> for data conversion, be aware that these conditions could cause long
> >>>> running queries to fail.
> >>>>
> >>>> That being said, it isn't the only possible cause for such a failure.
> In
> >>>> the case of a network failure we would expect to see a message
> returned to
> >>>> you that part of the query was unsuccessful and that it had been
> cancelled.
> >>>> Andries has a good suggestion in regards to checking the heap memory,
> this
> >>>> should also be detected and reported back to you at the CLI, but we
> may be
> >>>> failing to propagate the error back to the head node for the query. I
> >>>> believe writing parquet may still be the most heap-intensive
> operation in
> >>>> Drill, despite our efforts to refactor the write path to use direct
> memory
> >>>> instead of on-heap for large buffers needed in the process of creating
> >>>> parquet files.
> >>>>
> >>>>> On Thu, May 28, 2015 at 8:43 AM, Matt <bs...@gmail.com> wrote:
> >>>>>
> >>>>> Is 300MM records too much to do in a single CTAS statement?
> >>>>>
> >>>>> After almost 23 hours I killed the query (^c) and it returned:
> >>>>>
> >>>>> ~~~
> >>>>> +-----------+----------------------------+
> >>>>> | Fragment  | Number of records written  |
> >>>>> +-----------+----------------------------+
> >>>>> | 1_20      | 13568824                   |
> >>>>> | 1_15      | 12411822                   |
> >>>>> | 1_7       | 12470329                   |
> >>>>> | 1_12      | 13693867                   |
> >>>>> | 1_5       | 13292136                   |
> >>>>> | 1_18      | 13874321                   |
> >>>>> | 1_16      | 13303094                   |
> >>>>> | 1_9       | 13639049                   |
> >>>>> | 1_10      | 13698380                   |
> >>>>> | 1_22      | 13501073                   |
> >>>>> | 1_8       | 13533736                   |
> >>>>> | 1_2       | 13549402                   |
> >>>>> | 1_21      | 13665183                   |
> >>>>> | 1_0       | 13544745                   |
> >>>>> | 1_4       | 13532957                   |
> >>>>> | 1_19      | 12767473                   |
> >>>>> | 1_17      | 13670687                   |
> >>>>> | 1_13      | 13469515                   |
> >>>>> | 1_23      | 12517632                   |
> >>>>> | 1_6       | 13634338                   |
> >>>>> | 1_14      | 13611322                   |
> >>>>> | 1_3       | 13061900                   |
> >>>>> | 1_11      | 12760978                   |
> >>>>> +-----------+----------------------------+
> >>>>> 23 rows selected (82294.854 seconds)
> >>>>> ~~~
> >>>>>
> >>>>> The sum of those record counts is  306,772,763 which is close to the
> >>>>> 320,843,454 in the source file:
> >>>>>
> >>>>> ~~~
> >>>>> 0: jdbc:drill:zk=es05:2181> select count(*)  FROM
> root.`sample_201501.dat`;
> >>>>> +------------+
> >>>>> |   EXPR$0   |
> >>>>> +------------+
> >>>>> | 320843454  |
> >>>>> +------------+
> >>>>> 1 row selected (384.665 seconds)
> >>>>> ~~~
> >>>>>
> >>>>>
> >>>>> It represents one month of data, 4 key columns and 38 numeric measure
> >>>>> columns, which could also be partitioned daily. The test here was to
> create
> >>>>> monthly Parquet files to see how the min/max stats on Parquet chunks
> help
> >>>>> with range select performance.
> >>>>>
> >>>>> Instead of a small number of large monthly RDBMS tables, I am
> attempting
> >>>>> to determine how many Parquet files should be used with Drill / HDFS.
> >>>>>
> >>>>>
> >>>>>
> >>>>>
> >>>>> On 27 May 2015, at 15:17, Matt wrote:
> >>>>>
> >>>>> Attempting to create a Parquet backed table with a CTAS from an 44GB
> tab
> >>>>>> delimited file in HDFS. The process seemed to be running, as CPU
> and IO was
> >>>>>> seen on all 4 nodes in this cluster, and .parquet files being
> created in
> >>>>>> the expected path.
> >>>>>>
> >>>>>> In however in the last two hours or so, all nodes show near zero
> CPU or
> >>>>>> IO, and the Last Modified date on the .parquet have not changed.
> Same time
> >>>>>> delay shown in the Last Progress column in the active fragment
> profile.
> >>>>>>
> >>>>>> What approach can I take to determine what is happening (or not)?
> >>>>>
>
>

Re: Monitoring long / stuck CTAS

Posted by Matt <bs...@gmail.com>.
I have another test case that queries a table with a filter on a range 
of dates and a customer key, and SUMs 38 columns. The returned record 
set encompasses all 42 columns in the table - not a good design for 
Parquet files or any RDBMS, but a modeling problem that is not yet fully 
in my control (the application needs some changes).

Simply selecting all the columns in the Parquet files with that filter 
returns data to the client in about 3 seconds, but SUMming all 38 
measure columns resulted in the query still running at the client 22 
hours later.
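
The query is roughly of the shape below; the table, column names, and 
filter literals are placeholders (the real table has 4 key columns and 
38 SUMmed measures):

~~~
-- Sketch of the aggregate in question; identifiers and literals are hypothetical.
SELECT customer_key,
       event_date,
       SUM(measure_01) AS measure_01,
       SUM(measure_02) AS measure_02
       -- ... SUMs over the remaining 36 measure columns ...
FROM root.`sample_201501_parquet`
WHERE customer_key = 12345
  AND event_date BETWEEN DATE '2015-01-01' AND DATE '2015-01-15'
GROUP BY customer_key, event_date;
~~~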

However, the query profile shows no fragments with a Max Runtime of more 
than 2h20m, much like the "stuck CTAS" I had before. Learning from that 
case, I looked at the node hosting the one fragment that did not finish. 
Could this be a communication failure between nodes that is not being 
signaled to the client?

~~~
Major Fragment: 02-xx-xx

Minor Fragment ID | Host Name | Start  | End   | Runtime | Max Records | Max Batches | Last Update | Last Progress | Peak Memory | State
02-00-xx          | es06      | 1.011s | 2h20m | 2h20m   | 0           | 1           | 02:35:43    | 02:35:43      | 2MB         | CANCELLED
02-01-xx          | es08      | 0.999s | 4m33s | 4m32s   | 0           | 1           | 01:19:52    | 01:19:52      | 2MB         | FINISHED
02-02-xx          | es07      | 1.010s | 2m16s | 2m15s   | 0           | 1           | 01:17:34    | 01:17:34      | 2MB         | FINISHED
02-03-xx          | es05      | 1.009s | 2m56s | 2m55s   | 0           | 1           | 01:18:14    | 01:18:14      | 2MB         | FINISHED
~~~

~~~
2015-05-29 05:23:07,822 [UserServer-1] INFO  
o.a.drill.exec.work.foreman.Foreman - Failure while trying communicate 
query result to initiating client. This would happen if a client is 
disconnected before response notice can be sent.
org.apache.drill.exec.rpc.ChannelClosedException: null
         at 
org.apache.drill.exec.rpc.CoordinationQueue$RpcListener.operationComplete(CoordinationQueue.java:89) 
[drill-java-exec-1.0.0-rebuffed.jar:1.0.0]
         at 
org.apache.drill.exec.rpc.CoordinationQueue$RpcListener.operationComplete(CoordinationQueue.java:67) 
[drill-java-exec-1.0.0-rebuffed.jar:1.0.0]
         at 
io.netty.util.concurrent.DefaultPromise.notifyListener0(DefaultPromise.java:680) 
[netty-common-4.0.27.Final.jar:4.0.27.Final]
         at 
io.netty.util.concurrent.DefaultPromise.notifyListeners0(DefaultPromise.java:603) 
[netty-common-4.0.27.Final.jar:4.0.27.Final]
         at 
io.netty.util.concurrent.DefaultPromise.notifyListeners(DefaultPromise.java:563) 
[netty-common-4.0.27.Final.jar:4.0.27.Final]
         at 
io.netty.util.concurrent.DefaultPromise.tryFailure(DefaultPromise.java:424) 
[netty-common-4.0.27.Final.jar:4.0.27.Final]
         at 
io.netty.channel.AbstractChannel$AbstractUnsafe.safeSetFailure(AbstractChannel.java:788) 
[netty-transport-4.0.27.Final.jar:4.0.27.Final]
         at 
io.netty.channel.AbstractChannel$AbstractUnsafe.write(AbstractChannel.java:689) 
[netty-transport-4.0.27.Final.jar:4.0.27.Final]
         at 
io.netty.channel.DefaultChannelPipeline$HeadContext.write(DefaultChannelPipeline.java:1114) 
[netty-transport-4.0.27.Final.jar:4.0.27.Final]
         at 
io.netty.channel.AbstractChannelHandlerContext.invokeWrite(AbstractChannelHandlerContext.java:705) 
[netty-transport-4.0.27.Final.jar:4.0.27.Final]
         at 
io.netty.channel.AbstractChannelHandlerContext.access$1900(AbstractChannelHandlerContext.java:32) 
[netty-transport-4.0.27.Final.jar:4.0.27.Final]
         at 
io.netty.channel.AbstractChannelHandlerContext$AbstractWriteTask.write(AbstractChannelHandlerContext.java:980) 
[netty-transport-4.0.27.Final.jar:4.0.27.Final]
         at 
io.netty.channel.AbstractChannelHandlerContext$WriteAndFlushTask.write(AbstractChannelHandlerContext.java:1032) 
[netty-transport-4.0.27.Final.jar:4.0.27.Final]
         at 
io.netty.channel.AbstractChannelHandlerContext$AbstractWriteTask.run(AbstractChannelHandlerContext.java:965) 
[netty-transport-4.0.27.Final.jar:4.0.27.Final]
         at 
io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:357) 
[netty-common-4.0.27.Final.jar:4.0.27.Final]
         at 
io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:254) 
[netty-transport-native-epoll-4.0.27.Final-linux-x86_64.jar:na]
         at 
io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111) 
[netty-common-4.0.27.Final.jar:4.0.27.Final]
         at java.lang.Thread.run(Thread.java:745) [na:1.7.0_79]
2015-05-29 05:23:07,822 [UserServer-1] INFO  
o.a.drill.exec.work.foreman.Foreman - State change requested.  CANCELED 
--> FAILED
org.apache.drill.exec.rpc.ChannelClosedException: null
         at 
org.apache.drill.exec.rpc.CoordinationQueue$RpcListener.operationComplete(CoordinationQueue.java:89) 
[drill-java-exec-1.0.0-rebuffed.jar:1.0.0]
         at 
org.apache.drill.exec.rpc.CoordinationQueue$RpcListener.operationComplete(CoordinationQueue.java:67) 
[drill-java-exec-1.0.0-rebuffed.jar:1.0.0]
         at 
io.netty.util.concurrent.DefaultPromise.notifyListener0(DefaultPromise.java:680) 
[netty-common-4.0.27.Final.jar:4.0.27.Final]
         at 
io.netty.util.concurrent.DefaultPromise.notifyListeners0(DefaultPromise.java:603) 
[netty-common-4.0.27.Final.jar:4.0.27.Final]
         at 
io.netty.util.concurrent.DefaultPromise.notifyListeners(DefaultPromise.java:563) 
[netty-common-4.0.27.Final.jar:4.0.27.Final]
         at 
io.netty.util.concurrent.DefaultPromise.tryFailure(DefaultPromise.java:424) 
[netty-common-4.0.27.Final.jar:4.0.27.Final]
         at 
io.netty.channel.AbstractChannel$AbstractUnsafe.safeSetFailure(AbstractChannel.java:788) 
[netty-transport-4.0.27.Final.jar:4.0.27.Final]
         at 
io.netty.channel.AbstractChannel$AbstractUnsafe.write(AbstractChannel.java:689) 
[netty-transport-4.0.27.Final.jar:4.0.27.Final]
         at 
io.netty.channel.DefaultChannelPipeline$HeadContext.write(DefaultChannelPipeline.java:1114) 
[netty-transport-4.0.27.Final.jar:4.0.27.Final]
         at 
io.netty.channel.AbstractChannelHandlerContext.invokeWrite(AbstractChannelHandlerContext.java:705) 
[netty-transport-4.0.27.Final.jar:4.0.27.Final]
         at 
io.netty.channel.AbstractChannelHandlerContext.access$1900(AbstractChannelHandlerContext.java:32) 
[netty-transport-4.0.27.Final.jar:4.0.27.Final]
         at 
io.netty.channel.AbstractChannelHandlerContext$AbstractWriteTask.write(AbstractChannelHandlerContext.java:980) 
[netty-transport-4.0.27.Final.jar:4.0.27.Final]
         at 
io.netty.channel.AbstractChannelHandlerContext$WriteAndFlushTask.write(AbstractChannelHandlerContext.java:1032) 
[netty-transport-4.0.27.Final.jar:4.0.27.Final]
         at 
io.netty.channel.AbstractChannelHandlerContext$AbstractWriteTask.run(AbstractChannelHandlerContext.java:965) 
[netty-transport-4.0.27.Final.jar:4.0.27.Final]
         at 
io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:357) 
[netty-common-4.0.27.Final.jar:4.0.27.Final]
         at 
io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:254) 
[netty-transport-native-epoll-4.0.27.Final-linux-x86_64.jar:na]
         at 
io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111) 
[netty-common-4.0.27.Final.jar:4.0.27.Final]
         at java.lang.Thread.run(Thread.java:745) [na:1.7.0_79]
~~~



On 28 May 2015, at 16:43, Mehant Baid wrote:

> I think the problem might be related to a single laggard, looks like 
> we are waiting for one minor fragment to complete. Based on the output 
> you provided looks like the fragment 1_1 hasn't completed. You might 
> want to find out where the fragment was scheduled and what is going on 
> in that node. It might also be useful to look at the profile for that 
> minor fragment to see how much data has been processed.
>
>
> Thanks
> Mehant
>
> On 5/28/15 10:57 AM, Matt wrote:
>>> Did you check the log files for any errors?
>>
>> No messages related to this query contain errors or warnings, nor 
>> anything mentioning memory or heap. Querying now to determine what is 
>> missing in the parquet destination.
>>
>> drillbit.out on the master shows no error messages, and what looks 
>> like the last relevant line is:
>>
>> ~~~
>> May 27, 2015 6:43:50 PM INFO: 
>> parquet.hadoop.ColumnChunkPageWriteStore: written 2,258,263B for 
>> [bytes_1250] INT64: 3,069,414 values, 24,555,504B raw, 2,257,112B 
>> comp, 24 pages, encodings: [RLE, PLAIN, BIT_PACKED]
>> May 27, 2015 6:43:51 PM INFO: parquet.haMay 28, 2015 5:13:42 PM 
>> org.apache.calcite.sql.validate.SqlValidatorException <init>
>> ~~~
>>
>> The final lines in drillbit.log (which appear to use a different time 
>> format in the log) that contain the profile ID:
>>
>> ~~~
>> 2015-05-27 18:39:49,980 
>> [2a9a10ec-6f96-5dc5-54fc-dc5295a77e42:frag:1:20] INFO 
>> o.a.d.e.w.fragment.FragmentExecutor - 
>> 2a9a10ec-6f96-5dc5-54fc-dc5295a77e42:1:20: State change requested 
>> from RUNNING --> FINISHED for
>> 2015-05-27 18:39:49,981 
>> [2a9a10ec-6f96-5dc5-54fc-dc5295a77e42:frag:1:20] INFO 
>> o.a.d.e.w.f.AbstractStatusReporter - State changed for 
>> 2a9a10ec-6f96-5dc5-54fc-dc5295a77e42:1:20. New state: FINISHED
>> 2015-05-27 18:40:05,650 
>> [2a9a10ec-6f96-5dc5-54fc-dc5295a77e42:frag:1:12] INFO 
>> o.a.d.e.w.fragment.FragmentExecutor - 
>> 2a9a10ec-6f96-5dc5-54fc-dc5295a77e42:1:12: State change requested 
>> from RUNNING --> FINISHED for
>> 2015-05-27 18:40:05,650 
>> [2a9a10ec-6f96-5dc5-54fc-dc5295a77e42:frag:1:12] INFO 
>> o.a.d.e.w.f.AbstractStatusReporter - State changed for 
>> 2a9a10ec-6f96-5dc5-54fc-dc5295a77e42:1:12. New state: FINISHED
>> 2015-05-27 18:41:57,444 
>> [2a9a10ec-6f96-5dc5-54fc-dc5295a77e42:frag:1:16] INFO 
>> o.a.d.e.w.fragment.FragmentExecutor - 
>> 2a9a10ec-6f96-5dc5-54fc-dc5295a77e42:1:16: State change requested 
>> from RUNNING --> FINISHED for
>> 2015-05-27 18:41:57,444 
>> [2a9a10ec-6f96-5dc5-54fc-dc5295a77e42:frag:1:16] INFO 
>> o.a.d.e.w.f.AbstractStatusReporter - State changed for 
>> 2a9a10ec-6f96-5dc5-54fc-dc5295a77e42:1:16. New state: FINISHED
>> 2015-05-27 18:43:25,005 
>> [2a9a10ec-6f96-5dc5-54fc-dc5295a77e42:frag:1:8] INFO 
>> o.a.d.e.w.fragment.FragmentExecutor - 
>> 2a9a10ec-6f96-5dc5-54fc-dc5295a77e42:1:8: State change requested from 
>> RUNNING --> FINISHED for
>> 2015-05-27 18:43:25,005 
>> [2a9a10ec-6f96-5dc5-54fc-dc5295a77e42:frag:1:8] INFO 
>> o.a.d.e.w.f.AbstractStatusReporter - State changed for 
>> 2a9a10ec-6f96-5dc5-54fc-dc5295a77e42:1:8. New state: FINISHED
>> 2015-05-27 18:43:54,539 
>> [2a9a10ec-6f96-5dc5-54fc-dc5295a77e42:frag:1:0] INFO 
>> o.a.d.e.w.fragment.FragmentExecutor - 
>> 2a9a10ec-6f96-5dc5-54fc-dc5295a77e42:1:0: State change requested from 
>> RUNNING --> FINISHED for
>> 2015-05-27 18:43:54,540 
>> [2a9a10ec-6f96-5dc5-54fc-dc5295a77e42:frag:1:0] INFO 
>> o.a.d.e.w.f.AbstractStatusReporter - State changed for 
>> 2a9a10ec-6f96-5dc5-54fc-dc5295a77e42:1:0. New state: FINISHED
>> 2015-05-27 18:43:59,947 
>> [2a9a10ec-6f96-5dc5-54fc-dc5295a77e42:frag:1:4] INFO 
>> o.a.d.e.w.fragment.FragmentExecutor - 
>> 2a9a10ec-6f96-5dc5-54fc-dc5295a77e42:1:4: State change requested from 
>> RUNNING --> FINISHED for
>> 2015-05-27 18:43:59,947 
>> [2a9a10ec-6f96-5dc5-54fc-dc5295a77e42:frag:1:4] INFO 
>> o.a.d.e.w.f.AbstractStatusReporter - State changed for 
>> 2a9a10ec-6f96-5dc5-54fc-dc5295a77e42:1:4. New state: FINISHED
>> ~~~
>>
>>
>> On 28 May 2015, at 13:42, Andries Engelbrecht wrote:
>>
>>> It should execute multi threaded, need to check on text file.
>>>
>>> Did you check the log files for any errors?
>>>
>>>
>>> On May 28, 2015, at 10:36 AM, Matt <bs...@gmail.com> wrote:
>>>
>>>>> The time seems pretty long for that file size. What type of file 
>>>>> is it?
>>>>
>>>> Tab delimited UTF-8 text.
>>>>
>>>> I left the query to run overnight to see if it would complete, but 
>>>> 24 hours for an import like this would indeed be too long.
>>>>
>>>>> Is the CTAS running single threaded?
>>>>
>>>> In the first hour, with this being the only client connected to the 
>>>> cluster, I observed activity on all 4 nodes.
>>>>
>>>> Is multi-threaded query execution the default? I would not have 
>>>> changed anything deliberately to force single thread execution.
>>>>
>>>>
>>>> On 28 May 2015, at 13:06, Andries Engelbrecht wrote:
>>>>
>>>>> The time seems pretty long for that file size. What type of file 
>>>>> is it?
>>>>>
>>>>> Is the CTAS running single threaded?
>>>>>
>>>>> —Andries
>>>>>
>>>>>
>>>>> On May 28, 2015, at 9:37 AM, Matt <bs...@gmail.com> wrote:
>>>>>
>>>>>>> How large is the data set you are working with, and your 
>>>>>>> cluster/nodes?
>>>>>>
>>>>>> Just testing with that single 44GB source file currently, and my 
>>>>>> test cluster is made from 4 nodes, each with 8 CPU cores, 32GB 
>>>>>> RAM, a 6TB Ext4 volume (RAID-10).
>>>>>>
>>>>>> Drill defaults left as come in v1.0. I will be adjusting memory 
>>>>>> and retrying the CTAS.
>>>>>>
>>>>>> I know I can / should assign individual disks to HDFS, but as a 
>>>>>> test cluster there are apps that expect data volumes to work on. 
>>>>>> A dedicated Hadoop production cluster would have a disk layout 
>>>>>> specific to the task.
>>>>>>
>>>>>>
>>>>>> On 28 May 2015, at 12:26, Andries Engelbrecht wrote:
>>>>>>
>>>>>>> Just check the drillbit.log and drillbit.out files in the log 
>>>>>>> directory.
>>>>>>> Before adjusting memory, see if that is an issue first. It was 
>>>>>>> for me, but as Jason mentioned there can be other causes as 
>>>>>>> well.
>>>>>>>
>>>>>>> You adjust memory allocation in the drill-env.sh files, and have 
>>>>>>> to restart the drill bits.
>>>>>>>
>>>>>>> How large is the data set you are working with, and your 
>>>>>>> cluster/nodes?
>>>>>>>
>>>>>>> —Andries
>>>>>>>
>>>>>>>
>>>>>>> On May 28, 2015, at 9:17 AM, Matt <bs...@gmail.com> wrote:
>>>>>>>
>>>>>>>> To make sure I am adjusting the correct config, these are heap 
>>>>>>>> parameters within the Drill configure path, not for Hadoop or 
>>>>>>>> Zookeeper?
>>>>>>>>
>>>>>>>>
>>>>>>>>> On May 28, 2015, at 12:08 PM, Jason Altekruse 
>>>>>>>>> <al...@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>> There should be no upper limit on the size of the tables you 
>>>>>>>>> can create
>>>>>>>>> with Drill. Be advised that Drill does currently operate 
>>>>>>>>> entirely
>>>>>>>>> optimistically in regards to available resources. If a network 
>>>>>>>>> connection
>>>>>>>>> between two drillbits fails during a query, we will not 
>>>>>>>>> currently
>>>>>>>>> re-schedule the work to make use of remaining nodes and 
>>>>>>>>> network connections
>>>>>>>>> that are still live. While we have had a good amount of 
>>>>>>>>> success using Drill
>>>>>>>>> for data conversion, be aware that these conditions could 
>>>>>>>>> cause long
>>>>>>>>> running queries to fail.
>>>>>>>>>
>>>>>>>>> That being said, it isn't the only possible cause for such a 
>>>>>>>>> failure. In
>>>>>>>>> the case of a network failure we would expect to see a message 
>>>>>>>>> returned to
>>>>>>>>> you that part of the query was unsuccessful and that it had 
>>>>>>>>> been cancelled.
>>>>>>>>> Andries has a good suggestion in regards to checking the heap 
>>>>>>>>> memory, this
>>>>>>>>> should also be detected and reported back to you at the CLI, 
>>>>>>>>> but we may be
>>>>>>>>> failing to propagate the error back to the head node for the 
>>>>>>>>> query. I
>>>>>>>>> believe writing parquet may still be the most heap-intensive 
>>>>>>>>> operation in
>>>>>>>>> Drill, despite our efforts to refactor the write path to use 
>>>>>>>>> direct memory
>>>>>>>>> instead of on-heap for large buffers needed in the process of 
>>>>>>>>> creating
>>>>>>>>> parquet files.
>>>>>>>>>
>>>>>>>>>> On Thu, May 28, 2015 at 8:43 AM, Matt <bs...@gmail.com> 
>>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>> Is 300MM records too much to do in a single CTAS statement?
>>>>>>>>>>
>>>>>>>>>> After almost 23 hours I killed the query (^c) and it 
>>>>>>>>>> returned:
>>>>>>>>>>
>>>>>>>>>> ~~~
>>>>>>>>>> +-----------+----------------------------+
>>>>>>>>>> | Fragment  | Number of records written  |
>>>>>>>>>> +-----------+----------------------------+
>>>>>>>>>> | 1_20      | 13568824                   |
>>>>>>>>>> | 1_15      | 12411822                   |
>>>>>>>>>> | 1_7       | 12470329                   |
>>>>>>>>>> | 1_12      | 13693867                   |
>>>>>>>>>> | 1_5       | 13292136                   |
>>>>>>>>>> | 1_18      | 13874321                   |
>>>>>>>>>> | 1_16      | 13303094                   |
>>>>>>>>>> | 1_9       | 13639049                   |
>>>>>>>>>> | 1_10      | 13698380                   |
>>>>>>>>>> | 1_22      | 13501073                   |
>>>>>>>>>> | 1_8       | 13533736                   |
>>>>>>>>>> | 1_2       | 13549402                   |
>>>>>>>>>> | 1_21      | 13665183                   |
>>>>>>>>>> | 1_0       | 13544745                   |
>>>>>>>>>> | 1_4       | 13532957                   |
>>>>>>>>>> | 1_19      | 12767473                   |
>>>>>>>>>> | 1_17      | 13670687                   |
>>>>>>>>>> | 1_13      | 13469515                   |
>>>>>>>>>> | 1_23      | 12517632                   |
>>>>>>>>>> | 1_6       | 13634338                   |
>>>>>>>>>> | 1_14      | 13611322                   |
>>>>>>>>>> | 1_3       | 13061900                   |
>>>>>>>>>> | 1_11      | 12760978                   |
>>>>>>>>>> +-----------+----------------------------+
>>>>>>>>>> 23 rows selected (82294.854 seconds)
>>>>>>>>>> ~~~
>>>>>>>>>>
>>>>>>>>>> The sum of those record counts is  306,772,763 which is close 
>>>>>>>>>> to the
>>>>>>>>>> 320,843,454 in the source file:
>>>>>>>>>>
>>>>>>>>>> ~~~
>>>>>>>>>> 0: jdbc:drill:zk=es05:2181> select count(*) FROM 
>>>>>>>>>> root.`sample_201501.dat`;
>>>>>>>>>> +------------+
>>>>>>>>>> |   EXPR$0   |
>>>>>>>>>> +------------+
>>>>>>>>>> | 320843454  |
>>>>>>>>>> +------------+
>>>>>>>>>> 1 row selected (384.665 seconds)
>>>>>>>>>> ~~~
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> It represents one month of data, 4 key columns and 38 numeric 
>>>>>>>>>> measure
>>>>>>>>>> columns, which could also be partitioned daily. The test here 
>>>>>>>>>> was to create
>>>>>>>>>> monthly Parquet files to see how the min/max stats on Parquet 
>>>>>>>>>> chunks help
>>>>>>>>>> with range select performance.
>>>>>>>>>>
>>>>>>>>>> Instead of a small number of large monthly RDBMS tables, I am 
>>>>>>>>>> attempting
>>>>>>>>>> to determine how many Parquet files should be used with Drill 
>>>>>>>>>> / HDFS.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On 27 May 2015, at 15:17, Matt wrote:
>>>>>>>>>>
>>>>>>>>>> Attempting to create a Parquet backed table with a CTAS from 
>>>>>>>>>> an 44GB tab
>>>>>>>>>>> delimited file in HDFS. The process seemed to be running, as 
>>>>>>>>>>> CPU and IO was
>>>>>>>>>>> seen on all 4 nodes in this cluster, and .parquet files 
>>>>>>>>>>> being created in
>>>>>>>>>>> the expected path.
>>>>>>>>>>>
>>>>>>>>>>> In however in the last two hours or so, all nodes show near 
>>>>>>>>>>> zero CPU or
>>>>>>>>>>> IO, and the Last Modified date on the .parquet have not 
>>>>>>>>>>> changed. Same time
>>>>>>>>>>> delay shown in the Last Progress column in the active 
>>>>>>>>>>> fragment profile.
>>>>>>>>>>>
>>>>>>>>>>> What approach can I take to determine what is happening (or 
>>>>>>>>>>> not)?
>>>>>>>>>>
>>

Re: Monitoring long / stuck CTAS

Posted by Matt <bs...@gmail.com>.
That is a good point. The difference between the number of source rows 
and those that made it into the parquet files is roughly the record count 
of one of the other fragments.

Indeed the query profile does show fragment 1_1 as CANCELED while the 
others all have State FINISHED. Additionally, the other fragments each ran 
for less than 30 minutes, whereas fragment 1_1 lasted the full 23 hours 
before cancellation.
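
If it helps with the diagnosis, here is a minimal sketch I can run to pull 
the per-fragment states out of the stored query profile, assuming the 
Drill web UI is reachable on the default port 8047 and serves the profile 
as JSON at /profiles/<query_id>.json; the field names used below are 
assumptions and may differ between Drill versions:

~~~
import json
import urllib.request

DRILLBIT = "http://es05:8047"  # any node running the Drill web UI (default port assumed)
QUERY_ID = "2a9a10ec-6f96-5dc5-54fc-dc5295a77e42"  # profile ID from drillbit.log

# Fetch the stored profile for the CTAS query as JSON.
with urllib.request.urlopen("%s/profiles/%s.json" % (DRILLBIT, QUERY_ID)) as resp:
    profile = json.loads(resp.read().decode())

# Walk major -> minor fragments and print id, state and host, so the laggard
# (1_1 here) and the node it ran on stand out. Field names are assumptions.
for major in profile.get("fragmentProfile", []):
    for minor in major.get("minorFragmentProfile", []):
        frag = "%s_%s" % (major.get("majorFragmentId"), minor.get("minorFragmentId"))
        host = minor.get("endpoint", {}).get("address", "unknown")
        print(frag, minor.get("state"), host)
~~~

That should list which drillbit hosted fragment 1_1 without clicking 
through the UI.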


On 28 May 2015, at 16:43, Mehant Baid wrote:

> I think the problem might be related to a single laggard, looks like 
> we are waiting for one minor fragment to complete. Based on the output 
> you provided looks like the fragment 1_1 hasn't completed. You might 
> want to find out where the fragment was scheduled and what is going on 
> in that node. It might also be useful to look at the profile for that 
> minor fragment to see how much data has been processed.
>
>
> Thanks
> Mehant
>
> On 5/28/15 10:57 AM, Matt wrote:
>>> Did you check the log files for any errors?
>>
>> No messages related to this query containing errors or warning, nor 
>> nothing mentioning memory or heap. Querying now to determine what is 
>> missing in the parquet destination.
>>
>> drillbit.out on the master shows no error messages, and what looks 
>> like the last relevant line is:
>>
>> ~~~
>> May 27, 2015 6:43:50 PM INFO: 
>> parquet.hadoop.ColumnChunkPageWriteStore: written 2,258,263B for 
>> [bytes_1250] INT64: 3,069,414 values, 24,555,504B raw, 2,257,112B 
>> comp, 24 pages, encodings: [RLE, PLAIN, BIT_PACKED]
>> May 27, 2015 6:43:51 PM INFO: parquet.haMay 28, 2015 5:13:42 PM 
>> org.apache.calcite.sql.validate.SqlValidatorException <init>
>> ~~~
>>
>> The final lines in drillbit.log (which appear to use a different time 
>> format in the log) that contain the profile ID:
>>
>> ~~~
>> 2015-05-27 18:39:49,980 
>> [2a9a10ec-6f96-5dc5-54fc-dc5295a77e42:frag:1:20] INFO 
>> o.a.d.e.w.fragment.FragmentExecutor - 
>> 2a9a10ec-6f96-5dc5-54fc-dc5295a77e42:1:20: State change requested 
>> from RUNNING --> FINISHED for
>> 2015-05-27 18:39:49,981 
>> [2a9a10ec-6f96-5dc5-54fc-dc5295a77e42:frag:1:20] INFO 
>> o.a.d.e.w.f.AbstractStatusReporter - State changed for 
>> 2a9a10ec-6f96-5dc5-54fc-dc5295a77e42:1:20. New state: FINISHED
>> 2015-05-27 18:40:05,650 
>> [2a9a10ec-6f96-5dc5-54fc-dc5295a77e42:frag:1:12] INFO 
>> o.a.d.e.w.fragment.FragmentExecutor - 
>> 2a9a10ec-6f96-5dc5-54fc-dc5295a77e42:1:12: State change requested 
>> from RUNNING --> FINISHED for
>> 2015-05-27 18:40:05,650 
>> [2a9a10ec-6f96-5dc5-54fc-dc5295a77e42:frag:1:12] INFO 
>> o.a.d.e.w.f.AbstractStatusReporter - State changed for 
>> 2a9a10ec-6f96-5dc5-54fc-dc5295a77e42:1:12. New state: FINISHED
>> 2015-05-27 18:41:57,444 
>> [2a9a10ec-6f96-5dc5-54fc-dc5295a77e42:frag:1:16] INFO 
>> o.a.d.e.w.fragment.FragmentExecutor - 
>> 2a9a10ec-6f96-5dc5-54fc-dc5295a77e42:1:16: State change requested 
>> from RUNNING --> FINISHED for
>> 2015-05-27 18:41:57,444 
>> [2a9a10ec-6f96-5dc5-54fc-dc5295a77e42:frag:1:16] INFO 
>> o.a.d.e.w.f.AbstractStatusReporter - State changed for 
>> 2a9a10ec-6f96-5dc5-54fc-dc5295a77e42:1:16. New state: FINISHED
>> 2015-05-27 18:43:25,005 
>> [2a9a10ec-6f96-5dc5-54fc-dc5295a77e42:frag:1:8] INFO 
>> o.a.d.e.w.fragment.FragmentExecutor - 
>> 2a9a10ec-6f96-5dc5-54fc-dc5295a77e42:1:8: State change requested from 
>> RUNNING --> FINISHED for
>> 2015-05-27 18:43:25,005 
>> [2a9a10ec-6f96-5dc5-54fc-dc5295a77e42:frag:1:8] INFO 
>> o.a.d.e.w.f.AbstractStatusReporter - State changed for 
>> 2a9a10ec-6f96-5dc5-54fc-dc5295a77e42:1:8. New state: FINISHED
>> 2015-05-27 18:43:54,539 
>> [2a9a10ec-6f96-5dc5-54fc-dc5295a77e42:frag:1:0] INFO 
>> o.a.d.e.w.fragment.FragmentExecutor - 
>> 2a9a10ec-6f96-5dc5-54fc-dc5295a77e42:1:0: State change requested from 
>> RUNNING --> FINISHED for
>> 2015-05-27 18:43:54,540 
>> [2a9a10ec-6f96-5dc5-54fc-dc5295a77e42:frag:1:0] INFO 
>> o.a.d.e.w.f.AbstractStatusReporter - State changed for 
>> 2a9a10ec-6f96-5dc5-54fc-dc5295a77e42:1:0. New state: FINISHED
>> 2015-05-27 18:43:59,947 
>> [2a9a10ec-6f96-5dc5-54fc-dc5295a77e42:frag:1:4] INFO 
>> o.a.d.e.w.fragment.FragmentExecutor - 
>> 2a9a10ec-6f96-5dc5-54fc-dc5295a77e42:1:4: State change requested from 
>> RUNNING --> FINISHED for
>> 2015-05-27 18:43:59,947 
>> [2a9a10ec-6f96-5dc5-54fc-dc5295a77e42:frag:1:4] INFO 
>> o.a.d.e.w.f.AbstractStatusReporter - State changed for 
>> 2a9a10ec-6f96-5dc5-54fc-dc5295a77e42:1:4. New state: FINISHED
>> ~~~
>>
>>
>> On 28 May 2015, at 13:42, Andries Engelbrecht wrote:
>>
>>> It should execute multi threaded, need to check on text file.
>>>
>>> Did you check the log files for any errors?
>>>
>>>
>>> On May 28, 2015, at 10:36 AM, Matt <bs...@gmail.com> wrote:
>>>
>>>>> The time seems pretty long for that file size. What type of file 
>>>>> is it?
>>>>
>>>> Tab delimited UTF-8 text.
>>>>
>>>> I left the query to run overnight to see if it would complete, but 
>>>> 24 hours for an import like this would indeed be too long.
>>>>
>>>>> Is the CTAS running single threaded?
>>>>
>>>> In the first hour, with this being the only client connected to the 
>>>> cluster, I observed activity on all 4 nodes.
>>>>
>>>> Is multi-threaded query execution the default? I would not have 
>>>> changed anything deliberately to force single thread execution.
>>>>
>>>>
>>>> On 28 May 2015, at 13:06, Andries Engelbrecht wrote:
>>>>
>>>>> The time seems pretty long for that file size. What type of file 
>>>>> is it?
>>>>>
>>>>> Is the CTAS running single threaded?
>>>>>
>>>>> —Andries
>>>>>
>>>>>
>>>>> On May 28, 2015, at 9:37 AM, Matt <bs...@gmail.com> wrote:
>>>>>
>>>>>>> How large is the data set you are working with, and your 
>>>>>>> cluster/nodes?
>>>>>>
>>>>>> Just testing with that single 44GB source file currently, and my 
>>>>>> test cluster is made from 4 nodes, each with 8 CPU cores, 32GB 
>>>>>> RAM, a 6TB Ext4 volume (RAID-10).
>>>>>>
>>>>>> Drill defaults left as come in v1.0. I will be adjusting memory 
>>>>>> and retrying the CTAS.
>>>>>>
>>>>>> I know I can / should assign individual disks to HDFS, but as a 
>>>>>> test cluster there are apps that expect data volumes to work on. 
>>>>>> A dedicated Hadoop production cluster would have a disk layout 
>>>>>> specific to the task.
>>>>>>
>>>>>>
>>>>>> On 28 May 2015, at 12:26, Andries Engelbrecht wrote:
>>>>>>
>>>>>>> Just check the drillbit.log and drillbit.out files in the log 
>>>>>>> directory.
>>>>>>> Before adjusting memory, see if that is an issue first. It was 
>>>>>>> for me, but as Jason mentioned there can be other causes as 
>>>>>>> well.
>>>>>>>
>>>>>>> You adjust memory allocation in the drill-env.sh files, and have 
>>>>>>> to restart the drill bits.
>>>>>>>
>>>>>>> How large is the data set you are working with, and your 
>>>>>>> cluster/nodes?
>>>>>>>
>>>>>>> —Andries
>>>>>>>
>>>>>>>
>>>>>>> On May 28, 2015, at 9:17 AM, Matt <bs...@gmail.com> wrote:
>>>>>>>
>>>>>>>> To make sure I am adjusting the correct config, these are heap 
>>>>>>>> parameters within the Drill configure path, not for Hadoop or 
>>>>>>>> Zookeeper?
>>>>>>>>
>>>>>>>>
>>>>>>>>> On May 28, 2015, at 12:08 PM, Jason Altekruse 
>>>>>>>>> <al...@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>> There should be no upper limit on the size of the tables you 
>>>>>>>>> can create
>>>>>>>>> with Drill. Be advised that Drill does currently operate 
>>>>>>>>> entirely
>>>>>>>>> optimistically in regards to available resources. If a network 
>>>>>>>>> connection
>>>>>>>>> between two drillbits fails during a query, we will not 
>>>>>>>>> currently
>>>>>>>>> re-schedule the work to make use of remaining nodes and 
>>>>>>>>> network connections
>>>>>>>>> that are still live. While we have had a good amount of 
>>>>>>>>> success using Drill
>>>>>>>>> for data conversion, be aware that these conditions could 
>>>>>>>>> cause long
>>>>>>>>> running queries to fail.
>>>>>>>>>
>>>>>>>>> That being said, it isn't the only possible cause for such a 
>>>>>>>>> failure. In
>>>>>>>>> the case of a network failure we would expect to see a message 
>>>>>>>>> returned to
>>>>>>>>> you that part of the query was unsuccessful and that it had 
>>>>>>>>> been cancelled.
>>>>>>>>> Andries has a good suggestion in regards to checking the heap 
>>>>>>>>> memory, this
>>>>>>>>> should also be detected and reported back to you at the CLI, 
>>>>>>>>> but we may be
>>>>>>>>> failing to propagate the error back to the head node for the 
>>>>>>>>> query. I
>>>>>>>>> believe writing parquet may still be the most heap-intensive 
>>>>>>>>> operation in
>>>>>>>>> Drill, despite our efforts to refactor the write path to use 
>>>>>>>>> direct memory
>>>>>>>>> instead of on-heap for large buffers needed in the process of 
>>>>>>>>> creating
>>>>>>>>> parquet files.
>>>>>>>>>
>>>>>>>>>> On Thu, May 28, 2015 at 8:43 AM, Matt <bs...@gmail.com> 
>>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>> Is 300MM records too much to do in a single CTAS statement?
>>>>>>>>>>
>>>>>>>>>> After almost 23 hours I killed the query (^c) and it 
>>>>>>>>>> returned:
>>>>>>>>>>
>>>>>>>>>> ~~~
>>>>>>>>>> +-----------+----------------------------+
>>>>>>>>>> | Fragment  | Number of records written  |
>>>>>>>>>> +-----------+----------------------------+
>>>>>>>>>> | 1_20      | 13568824                   |
>>>>>>>>>> | 1_15      | 12411822                   |
>>>>>>>>>> | 1_7       | 12470329                   |
>>>>>>>>>> | 1_12      | 13693867                   |
>>>>>>>>>> | 1_5       | 13292136                   |
>>>>>>>>>> | 1_18      | 13874321                   |
>>>>>>>>>> | 1_16      | 13303094                   |
>>>>>>>>>> | 1_9       | 13639049                   |
>>>>>>>>>> | 1_10      | 13698380                   |
>>>>>>>>>> | 1_22      | 13501073                   |
>>>>>>>>>> | 1_8       | 13533736                   |
>>>>>>>>>> | 1_2       | 13549402                   |
>>>>>>>>>> | 1_21      | 13665183                   |
>>>>>>>>>> | 1_0       | 13544745                   |
>>>>>>>>>> | 1_4       | 13532957                   |
>>>>>>>>>> | 1_19      | 12767473                   |
>>>>>>>>>> | 1_17      | 13670687                   |
>>>>>>>>>> | 1_13      | 13469515                   |
>>>>>>>>>> | 1_23      | 12517632                   |
>>>>>>>>>> | 1_6       | 13634338                   |
>>>>>>>>>> | 1_14      | 13611322                   |
>>>>>>>>>> | 1_3       | 13061900                   |
>>>>>>>>>> | 1_11      | 12760978                   |
>>>>>>>>>> +-----------+----------------------------+
>>>>>>>>>> 23 rows selected (82294.854 seconds)
>>>>>>>>>> ~~~
>>>>>>>>>>
>>>>>>>>>> The sum of those record counts is  306,772,763 which is close 
>>>>>>>>>> to the
>>>>>>>>>> 320,843,454 in the source file:
>>>>>>>>>>
>>>>>>>>>> ~~~
>>>>>>>>>> 0: jdbc:drill:zk=es05:2181> select count(*) FROM 
>>>>>>>>>> root.`sample_201501.dat`;
>>>>>>>>>> +------------+
>>>>>>>>>> |   EXPR$0   |
>>>>>>>>>> +------------+
>>>>>>>>>> | 320843454  |
>>>>>>>>>> +------------+
>>>>>>>>>> 1 row selected (384.665 seconds)
>>>>>>>>>> ~~~
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> It represents one month of data, 4 key columns and 38 numeric 
>>>>>>>>>> measure
>>>>>>>>>> columns, which could also be partitioned daily. The test here 
>>>>>>>>>> was to create
>>>>>>>>>> monthly Parquet files to see how the min/max stats on Parquet 
>>>>>>>>>> chunks help
>>>>>>>>>> with range select performance.
>>>>>>>>>>
>>>>>>>>>> Instead of a small number of large monthly RDBMS tables, I am 
>>>>>>>>>> attempting
>>>>>>>>>> to determine how many Parquet files should be used with Drill 
>>>>>>>>>> / HDFS.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On 27 May 2015, at 15:17, Matt wrote:
>>>>>>>>>>
>>>>>>>>>> Attempting to create a Parquet backed table with a CTAS from 
>>>>>>>>>> an 44GB tab
>>>>>>>>>>> delimited file in HDFS. The process seemed to be running, as 
>>>>>>>>>>> CPU and IO was
>>>>>>>>>>> seen on all 4 nodes in this cluster, and .parquet files 
>>>>>>>>>>> being created in
>>>>>>>>>>> the expected path.
>>>>>>>>>>>
>>>>>>>>>>> In however in the last two hours or so, all nodes show near 
>>>>>>>>>>> zero CPU or
>>>>>>>>>>> IO, and the Last Modified date on the .parquet have not 
>>>>>>>>>>> changed. Same time
>>>>>>>>>>> delay shown in the Last Progress column in the active 
>>>>>>>>>>> fragment profile.
>>>>>>>>>>>
>>>>>>>>>>> What approach can I take to determine what is happening (or 
>>>>>>>>>>> not)?
>>>>>>>>>>
>>

Re: Monitoring long / stuck CTAS

Posted by Mehant Baid <ba...@gmail.com>.
I think the problem might be related to a single laggard; it looks like we 
are waiting for one minor fragment to complete. Based on the output you 
provided, it looks like fragment 1_1 hasn't completed. You might want to 
find out where that fragment was scheduled and what is going on on that 
node. It might also be useful to look at the profile for that minor 
fragment to see how much data has been processed.
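
Once you know which node hosted the laggard, a thread dump there can show 
what that fragment is actually doing. A rough sketch, assuming the JDK's 
jps/jstack tools are on the PATH and that the fragment thread names carry 
the same <query_id>:frag:<major>:<minor> token that shows up in 
drillbit.log:

~~~
import subprocess

QUERY_ID = "2a9a10ec-6f96-5dc5-54fc-dc5295a77e42"
FRAGMENT = "frag:1:1"  # the laggard minor fragment

# Find the Drillbit JVM on this node (assumes the JDK's jps is on the PATH).
jps = subprocess.check_output(["jps", "-l"]).decode()
pid = next(line.split()[0] for line in jps.splitlines() if "Drillbit" in line)

# Take a thread dump and keep only the stacks whose thread names mention the
# stuck fragment, to see whether it is writing, waiting on HDFS, or blocked.
dump = subprocess.check_output(["jstack", pid]).decode()
for block in dump.split("\n\n"):
    if QUERY_ID in block and FRAGMENT in block:
        print(block)
~~~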


Thanks
Mehant

On 5/28/15 10:57 AM, Matt wrote:
>> Did you check the log files for any errors?
>
> No messages related to this query containing errors or warning, nor 
> nothing mentioning memory or heap. Querying now to determine what is 
> missing in the parquet destination.
>
> drillbit.out on the master shows no error messages, and what looks 
> like the last relevant line is:
>
> ~~~
> May 27, 2015 6:43:50 PM INFO: 
> parquet.hadoop.ColumnChunkPageWriteStore: written 2,258,263B for 
> [bytes_1250] INT64: 3,069,414 values, 24,555,504B raw, 2,257,112B 
> comp, 24 pages, encodings: [RLE, PLAIN, BIT_PACKED]
> May 27, 2015 6:43:51 PM INFO: parquet.haMay 28, 2015 5:13:42 PM 
> org.apache.calcite.sql.validate.SqlValidatorException <init>
> ~~~
>
> The final lines in drillbit.log (which appear to use a different time 
> format in the log) that contain the profile ID:
>
> ~~~
> 2015-05-27 18:39:49,980 
> [2a9a10ec-6f96-5dc5-54fc-dc5295a77e42:frag:1:20] INFO 
> o.a.d.e.w.fragment.FragmentExecutor - 
> 2a9a10ec-6f96-5dc5-54fc-dc5295a77e42:1:20: State change requested from 
> RUNNING --> FINISHED for
> 2015-05-27 18:39:49,981 
> [2a9a10ec-6f96-5dc5-54fc-dc5295a77e42:frag:1:20] INFO 
> o.a.d.e.w.f.AbstractStatusReporter - State changed for 
> 2a9a10ec-6f96-5dc5-54fc-dc5295a77e42:1:20. New state: FINISHED
> 2015-05-27 18:40:05,650 
> [2a9a10ec-6f96-5dc5-54fc-dc5295a77e42:frag:1:12] INFO 
> o.a.d.e.w.fragment.FragmentExecutor - 
> 2a9a10ec-6f96-5dc5-54fc-dc5295a77e42:1:12: State change requested from 
> RUNNING --> FINISHED for
> 2015-05-27 18:40:05,650 
> [2a9a10ec-6f96-5dc5-54fc-dc5295a77e42:frag:1:12] INFO 
> o.a.d.e.w.f.AbstractStatusReporter - State changed for 
> 2a9a10ec-6f96-5dc5-54fc-dc5295a77e42:1:12. New state: FINISHED
> 2015-05-27 18:41:57,444 
> [2a9a10ec-6f96-5dc5-54fc-dc5295a77e42:frag:1:16] INFO 
> o.a.d.e.w.fragment.FragmentExecutor - 
> 2a9a10ec-6f96-5dc5-54fc-dc5295a77e42:1:16: State change requested from 
> RUNNING --> FINISHED for
> 2015-05-27 18:41:57,444 
> [2a9a10ec-6f96-5dc5-54fc-dc5295a77e42:frag:1:16] INFO 
> o.a.d.e.w.f.AbstractStatusReporter - State changed for 
> 2a9a10ec-6f96-5dc5-54fc-dc5295a77e42:1:16. New state: FINISHED
> 2015-05-27 18:43:25,005 
> [2a9a10ec-6f96-5dc5-54fc-dc5295a77e42:frag:1:8] INFO 
> o.a.d.e.w.fragment.FragmentExecutor - 
> 2a9a10ec-6f96-5dc5-54fc-dc5295a77e42:1:8: State change requested from 
> RUNNING --> FINISHED for
> 2015-05-27 18:43:25,005 
> [2a9a10ec-6f96-5dc5-54fc-dc5295a77e42:frag:1:8] INFO 
> o.a.d.e.w.f.AbstractStatusReporter - State changed for 
> 2a9a10ec-6f96-5dc5-54fc-dc5295a77e42:1:8. New state: FINISHED
> 2015-05-27 18:43:54,539 
> [2a9a10ec-6f96-5dc5-54fc-dc5295a77e42:frag:1:0] INFO 
> o.a.d.e.w.fragment.FragmentExecutor - 
> 2a9a10ec-6f96-5dc5-54fc-dc5295a77e42:1:0: State change requested from 
> RUNNING --> FINISHED for
> 2015-05-27 18:43:54,540 
> [2a9a10ec-6f96-5dc5-54fc-dc5295a77e42:frag:1:0] INFO 
> o.a.d.e.w.f.AbstractStatusReporter - State changed for 
> 2a9a10ec-6f96-5dc5-54fc-dc5295a77e42:1:0. New state: FINISHED
> 2015-05-27 18:43:59,947 
> [2a9a10ec-6f96-5dc5-54fc-dc5295a77e42:frag:1:4] INFO 
> o.a.d.e.w.fragment.FragmentExecutor - 
> 2a9a10ec-6f96-5dc5-54fc-dc5295a77e42:1:4: State change requested from 
> RUNNING --> FINISHED for
> 2015-05-27 18:43:59,947 
> [2a9a10ec-6f96-5dc5-54fc-dc5295a77e42:frag:1:4] INFO 
> o.a.d.e.w.f.AbstractStatusReporter - State changed for 
> 2a9a10ec-6f96-5dc5-54fc-dc5295a77e42:1:4. New state: FINISHED
> ~~~
>
>
> On 28 May 2015, at 13:42, Andries Engelbrecht wrote:
>
>> It should execute multi threaded, need to check on text file.
>>
>> Did you check the log files for any errors?
>>
>>
>> On May 28, 2015, at 10:36 AM, Matt <bs...@gmail.com> wrote:
>>
>>>> The time seems pretty long for that file size. What type of file is 
>>>> it?
>>>
>>> Tab delimited UTF-8 text.
>>>
>>> I left the query to run overnight to see if it would complete, but 
>>> 24 hours for an import like this would indeed be too long.
>>>
>>>> Is the CTAS running single threaded?
>>>
>>> In the first hour, with this being the only client connected to the 
>>> cluster, I observed activity on all 4 nodes.
>>>
>>> Is multi-threaded query execution the default? I would not have 
>>> changed anything deliberately to force single thread execution.
>>>
>>>
>>> On 28 May 2015, at 13:06, Andries Engelbrecht wrote:
>>>
>>>> The time seems pretty long for that file size. What type of file is 
>>>> it?
>>>>
>>>> Is the CTAS running single threaded?
>>>>
>>>> —Andries
>>>>
>>>>
>>>> On May 28, 2015, at 9:37 AM, Matt <bs...@gmail.com> wrote:
>>>>
>>>>>> How large is the data set you are working with, and your 
>>>>>> cluster/nodes?
>>>>>
>>>>> Just testing with that single 44GB source file currently, and my 
>>>>> test cluster is made from 4 nodes, each with 8 CPU cores, 32GB 
>>>>> RAM, a 6TB Ext4 volume (RAID-10).
>>>>>
>>>>> Drill defaults left as come in v1.0. I will be adjusting memory 
>>>>> and retrying the CTAS.
>>>>>
>>>>> I know I can / should assign individual disks to HDFS, but as a 
>>>>> test cluster there are apps that expect data volumes to work on. A 
>>>>> dedicated Hadoop production cluster would have a disk layout 
>>>>> specific to the task.
>>>>>
>>>>>
>>>>> On 28 May 2015, at 12:26, Andries Engelbrecht wrote:
>>>>>
>>>>>> Just check the drillbit.log and drillbit.out files in the log 
>>>>>> directory.
>>>>>> Before adjusting memory, see if that is an issue first. It was 
>>>>>> for me, but as Jason mentioned there can be other causes as well.
>>>>>>
>>>>>> You adjust memory allocation in the drill-env.sh files, and have 
>>>>>> to restart the drill bits.
>>>>>>
>>>>>> How large is the data set you are working with, and your 
>>>>>> cluster/nodes?
>>>>>>
>>>>>> —Andries
>>>>>>
>>>>>>
>>>>>> On May 28, 2015, at 9:17 AM, Matt <bs...@gmail.com> wrote:
>>>>>>
>>>>>>> To make sure I am adjusting the correct config, these are heap 
>>>>>>> parameters within the Drill configure path, not for Hadoop or 
>>>>>>> Zookeeper?
>>>>>>>
>>>>>>>
>>>>>>>> On May 28, 2015, at 12:08 PM, Jason Altekruse 
>>>>>>>> <al...@gmail.com> wrote:
>>>>>>>>
>>>>>>>> There should be no upper limit on the size of the tables you 
>>>>>>>> can create
>>>>>>>> with Drill. Be advised that Drill does currently operate entirely
>>>>>>>> optimistically in regards to available resources. If a network 
>>>>>>>> connection
>>>>>>>> between two drillbits fails during a query, we will not currently
>>>>>>>> re-schedule the work to make use of remaining nodes and network 
>>>>>>>> connections
>>>>>>>> that are still live. While we have had a good amount of success 
>>>>>>>> using Drill
>>>>>>>> for data conversion, be aware that these conditions could cause 
>>>>>>>> long
>>>>>>>> running queries to fail.
>>>>>>>>
>>>>>>>> That being said, it isn't the only possible cause for such a 
>>>>>>>> failure. In
>>>>>>>> the case of a network failure we would expect to see a message 
>>>>>>>> returned to
>>>>>>>> you that part of the query was unsuccessful and that it had 
>>>>>>>> been cancelled.
>>>>>>>> Andries has a good suggestion in regards to checking the heap 
>>>>>>>> memory, this
>>>>>>>> should also be detected and reported back to you at the CLI, 
>>>>>>>> but we may be
>>>>>>>> failing to propagate the error back to the head node for the 
>>>>>>>> query. I
>>>>>>>> believe writing parquet may still be the most heap-intensive 
>>>>>>>> operation in
>>>>>>>> Drill, despite our efforts to refactor the write path to use 
>>>>>>>> direct memory
>>>>>>>> instead of on-heap for large buffers needed in the process of 
>>>>>>>> creating
>>>>>>>> parquet files.
>>>>>>>>
>>>>>>>>> On Thu, May 28, 2015 at 8:43 AM, Matt <bs...@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>> Is 300MM records too much to do in a single CTAS statement?
>>>>>>>>>
>>>>>>>>> After almost 23 hours I killed the query (^c) and it returned:
>>>>>>>>>
>>>>>>>>> ~~~
>>>>>>>>> +-----------+----------------------------+
>>>>>>>>> | Fragment  | Number of records written  |
>>>>>>>>> +-----------+----------------------------+
>>>>>>>>> | 1_20      | 13568824                   |
>>>>>>>>> | 1_15      | 12411822                   |
>>>>>>>>> | 1_7       | 12470329                   |
>>>>>>>>> | 1_12      | 13693867                   |
>>>>>>>>> | 1_5       | 13292136                   |
>>>>>>>>> | 1_18      | 13874321                   |
>>>>>>>>> | 1_16      | 13303094                   |
>>>>>>>>> | 1_9       | 13639049                   |
>>>>>>>>> | 1_10      | 13698380                   |
>>>>>>>>> | 1_22      | 13501073                   |
>>>>>>>>> | 1_8       | 13533736                   |
>>>>>>>>> | 1_2       | 13549402                   |
>>>>>>>>> | 1_21      | 13665183                   |
>>>>>>>>> | 1_0       | 13544745                   |
>>>>>>>>> | 1_4       | 13532957                   |
>>>>>>>>> | 1_19      | 12767473                   |
>>>>>>>>> | 1_17      | 13670687                   |
>>>>>>>>> | 1_13      | 13469515                   |
>>>>>>>>> | 1_23      | 12517632                   |
>>>>>>>>> | 1_6       | 13634338                   |
>>>>>>>>> | 1_14      | 13611322                   |
>>>>>>>>> | 1_3       | 13061900                   |
>>>>>>>>> | 1_11      | 12760978                   |
>>>>>>>>> +-----------+----------------------------+
>>>>>>>>> 23 rows selected (82294.854 seconds)
>>>>>>>>> ~~~
>>>>>>>>>
>>>>>>>>> The sum of those record counts is  306,772,763 which is close 
>>>>>>>>> to the
>>>>>>>>> 320,843,454 in the source file:
>>>>>>>>>
>>>>>>>>> ~~~
>>>>>>>>> 0: jdbc:drill:zk=es05:2181> select count(*) FROM 
>>>>>>>>> root.`sample_201501.dat`;
>>>>>>>>> +------------+
>>>>>>>>> |   EXPR$0   |
>>>>>>>>> +------------+
>>>>>>>>> | 320843454  |
>>>>>>>>> +------------+
>>>>>>>>> 1 row selected (384.665 seconds)
>>>>>>>>> ~~~
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> It represents one month of data, 4 key columns and 38 numeric 
>>>>>>>>> measure
>>>>>>>>> columns, which could also be partitioned daily. The test here 
>>>>>>>>> was to create
>>>>>>>>> monthly Parquet files to see how the min/max stats on Parquet 
>>>>>>>>> chunks help
>>>>>>>>> with range select performance.
>>>>>>>>>
>>>>>>>>> Instead of a small number of large monthly RDBMS tables, I am 
>>>>>>>>> attempting
>>>>>>>>> to determine how many Parquet files should be used with Drill 
>>>>>>>>> / HDFS.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On 27 May 2015, at 15:17, Matt wrote:
>>>>>>>>>
>>>>>>>>> Attempting to create a Parquet backed table with a CTAS from 
>>>>>>>>> an 44GB tab
>>>>>>>>>> delimited file in HDFS. The process seemed to be running, as 
>>>>>>>>>> CPU and IO was
>>>>>>>>>> seen on all 4 nodes in this cluster, and .parquet files being 
>>>>>>>>>> created in
>>>>>>>>>> the expected path.
>>>>>>>>>>
>>>>>>>>>> In however in the last two hours or so, all nodes show near 
>>>>>>>>>> zero CPU or
>>>>>>>>>> IO, and the Last Modified date on the .parquet have not 
>>>>>>>>>> changed. Same time
>>>>>>>>>> delay shown in the Last Progress column in the active 
>>>>>>>>>> fragment profile.
>>>>>>>>>>
>>>>>>>>>> What approach can I take to determine what is happening (or 
>>>>>>>>>> not)?
>>>>>>>>>
>


Re: Monitoring long / stuck CTAS

Posted by Matt <bs...@gmail.com>.
> Did you check the log files for any errors?

No messages related to this query contain errors or warnings, nor 
anything mentioning memory or heap. Querying now to determine what is 
missing in the parquet destination.

drillbit.out on the master shows no error messages, and what looks like 
the last relevant line is:

~~~
May 27, 2015 6:43:50 PM INFO: parquet.hadoop.ColumnChunkPageWriteStore: 
written 2,258,263B for [bytes_1250] INT64: 3,069,414 values, 24,555,504B 
raw, 2,257,112B comp, 24 pages, encodings: [RLE, PLAIN, BIT_PACKED]
May 27, 2015 6:43:51 PM INFO: parquet.haMay 28, 2015 5:13:42 PM 
org.apache.calcite.sql.validate.SqlValidatorException <init>
~~~

The final lines in drillbit.log (which appears to use a different 
timestamp format) that contain the profile ID:

~~~
2015-05-27 18:39:49,980 [2a9a10ec-6f96-5dc5-54fc-dc5295a77e42:frag:1:20] 
INFO  o.a.d.e.w.fragment.FragmentExecutor - 
2a9a10ec-6f96-5dc5-54fc-dc5295a77e42:1:20: State change requested from 
RUNNING --> FINISHED for
2015-05-27 18:39:49,981 [2a9a10ec-6f96-5dc5-54fc-dc5295a77e42:frag:1:20] 
INFO  o.a.d.e.w.f.AbstractStatusReporter - State changed for 
2a9a10ec-6f96-5dc5-54fc-dc5295a77e42:1:20. New state: FINISHED
2015-05-27 18:40:05,650 [2a9a10ec-6f96-5dc5-54fc-dc5295a77e42:frag:1:12] 
INFO  o.a.d.e.w.fragment.FragmentExecutor - 
2a9a10ec-6f96-5dc5-54fc-dc5295a77e42:1:12: State change requested from 
RUNNING --> FINISHED for
2015-05-27 18:40:05,650 [2a9a10ec-6f96-5dc5-54fc-dc5295a77e42:frag:1:12] 
INFO  o.a.d.e.w.f.AbstractStatusReporter - State changed for 
2a9a10ec-6f96-5dc5-54fc-dc5295a77e42:1:12. New state: FINISHED
2015-05-27 18:41:57,444 [2a9a10ec-6f96-5dc5-54fc-dc5295a77e42:frag:1:16] 
INFO  o.a.d.e.w.fragment.FragmentExecutor - 
2a9a10ec-6f96-5dc5-54fc-dc5295a77e42:1:16: State change requested from 
RUNNING --> FINISHED for
2015-05-27 18:41:57,444 [2a9a10ec-6f96-5dc5-54fc-dc5295a77e42:frag:1:16] 
INFO  o.a.d.e.w.f.AbstractStatusReporter - State changed for 
2a9a10ec-6f96-5dc5-54fc-dc5295a77e42:1:16. New state: FINISHED
2015-05-27 18:43:25,005 [2a9a10ec-6f96-5dc5-54fc-dc5295a77e42:frag:1:8] 
INFO  o.a.d.e.w.fragment.FragmentExecutor - 
2a9a10ec-6f96-5dc5-54fc-dc5295a77e42:1:8: State change requested from 
RUNNING --> FINISHED for
2015-05-27 18:43:25,005 [2a9a10ec-6f96-5dc5-54fc-dc5295a77e42:frag:1:8] 
INFO  o.a.d.e.w.f.AbstractStatusReporter - State changed for 
2a9a10ec-6f96-5dc5-54fc-dc5295a77e42:1:8. New state: FINISHED
2015-05-27 18:43:54,539 [2a9a10ec-6f96-5dc5-54fc-dc5295a77e42:frag:1:0] 
INFO  o.a.d.e.w.fragment.FragmentExecutor - 
2a9a10ec-6f96-5dc5-54fc-dc5295a77e42:1:0: State change requested from 
RUNNING --> FINISHED for
2015-05-27 18:43:54,540 [2a9a10ec-6f96-5dc5-54fc-dc5295a77e42:frag:1:0] 
INFO  o.a.d.e.w.f.AbstractStatusReporter - State changed for 
2a9a10ec-6f96-5dc5-54fc-dc5295a77e42:1:0. New state: FINISHED
2015-05-27 18:43:59,947 [2a9a10ec-6f96-5dc5-54fc-dc5295a77e42:frag:1:4] 
INFO  o.a.d.e.w.fragment.FragmentExecutor - 
2a9a10ec-6f96-5dc5-54fc-dc5295a77e42:1:4: State change requested from 
RUNNING --> FINISHED for
2015-05-27 18:43:59,947 [2a9a10ec-6f96-5dc5-54fc-dc5295a77e42:frag:1:4] 
INFO  o.a.d.e.w.f.AbstractStatusReporter - State changed for 
2a9a10ec-6f96-5dc5-54fc-dc5295a77e42:1:4. New state: FINISHED
~~~
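
To see which minor fragments of this query ever reached FINISHED on a 
given node, I can scan drillbit.log along these lines (a sketch matching 
the log format above; adjusting the path to the Drill log directory on 
each node):

~~~
import re

QUERY_ID = "2a9a10ec-6f96-5dc5-54fc-dc5295a77e42"
# Matches lines like "...:1:20. New state: FINISHED" from the excerpt above.
pattern = re.compile(re.escape(QUERY_ID) + r":(\d+):(\d+)\. New state: (\w+)")

last_state = {}
with open("drillbit.log") as log:  # run in the Drill log directory on each node
    for line in log:
        m = pattern.search(line)
        if m:
            last_state[(int(m.group(1)), int(m.group(2)))] = m.group(3)

# A minor fragment that never reaches FINISHED here (or never appears on any
# node) is the one still running.
for (major, minor), state in sorted(last_state.items()):
    print("%d_%d  %s" % (major, minor, state))
~~~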


On 28 May 2015, at 13:42, Andries Engelbrecht wrote:

> It should execute multi threaded, need to check on text file.
>
> Did you check the log files for any errors?
>
>
> On May 28, 2015, at 10:36 AM, Matt <bs...@gmail.com> wrote:
>
>>> The time seems pretty long for that file size. What type of file is 
>>> it?
>>
>> Tab delimited UTF-8 text.
>>
>> I left the query to run overnight to see if it would complete, but 24 
>> hours for an import like this would indeed be too long.
>>
>>> Is the CTAS running single threaded?
>>
>> In the first hour, with this being the only client connected to the 
>> cluster, I observed activity on all 4 nodes.
>>
>> Is multi-threaded query execution the default? I would not have 
>> changed anything deliberately to force single thread execution.
>>
>>
>> On 28 May 2015, at 13:06, Andries Engelbrecht wrote:
>>
>>> The time seems pretty long for that file size. What type of file is 
>>> it?
>>>
>>> Is the CTAS running single threaded?
>>>
>>> —Andries
>>>
>>>
>>> On May 28, 2015, at 9:37 AM, Matt <bs...@gmail.com> wrote:
>>>
>>>>> How large is the data set you are working with, and your 
>>>>> cluster/nodes?
>>>>
>>>> Just testing with that single 44GB source file currently, and my 
>>>> test cluster is made from 4 nodes, each with 8 CPU cores, 32GB RAM, 
>>>> a 6TB Ext4 volume (RAID-10).
>>>>
>>>> Drill defaults left as come in v1.0. I will be adjusting memory and 
>>>> retrying the CTAS.
>>>>
>>>> I know I can / should assign individual disks to HDFS, but as a 
>>>> test cluster there are apps that expect data volumes to work on. A 
>>>> dedicated Hadoop production cluster would have a disk layout 
>>>> specific to the task.
>>>>
>>>>
>>>> On 28 May 2015, at 12:26, Andries Engelbrecht wrote:
>>>>
>>>>> Just check the drillbit.log and drillbit.out files in the log 
>>>>> directory.
>>>>> Before adjusting memory, see if that is an issue first. It was for 
>>>>> me, but as Jason mentioned there can be other causes as well.
>>>>>
>>>>> You adjust memory allocation in the drill-env.sh files, and have 
>>>>> to restart the drill bits.
>>>>>
>>>>> How large is the data set you are working with, and your 
>>>>> cluster/nodes?
>>>>>
>>>>> —Andries
>>>>>
>>>>>
>>>>> On May 28, 2015, at 9:17 AM, Matt <bs...@gmail.com> wrote:
>>>>>
>>>>>> To make sure I am adjusting the correct config, these are heap 
>>>>>> parameters within the Drill configure path, not for Hadoop or 
>>>>>> Zookeeper?
>>>>>>
>>>>>>
>>>>>>> On May 28, 2015, at 12:08 PM, Jason Altekruse 
>>>>>>> <al...@gmail.com> wrote:
>>>>>>>
>>>>>>> There should be no upper limit on the size of the tables you can 
>>>>>>> create
>>>>>>> with Drill. Be advised that Drill does currently operate 
>>>>>>> entirely
>>>>>>> optimistically in regards to available resources. If a network 
>>>>>>> connection
>>>>>>> between two drillbits fails during a query, we will not 
>>>>>>> currently
>>>>>>> re-schedule the work to make use of remaining nodes and network 
>>>>>>> connections
>>>>>>> that are still live. While we have had a good amount of success 
>>>>>>> using Drill
>>>>>>> for data conversion, be aware that these conditions could cause 
>>>>>>> long
>>>>>>> running queries to fail.
>>>>>>>
>>>>>>> That being said, it isn't the only possible cause for such a 
>>>>>>> failure. In
>>>>>>> the case of a network failure we would expect to see a message 
>>>>>>> returned to
>>>>>>> you that part of the query was unsuccessful and that it had been 
>>>>>>> cancelled.
>>>>>>> Andries has a good suggestion in regards to checking the heap 
>>>>>>> memory, this
>>>>>>> should also be detected and reported back to you at the CLI, but 
>>>>>>> we may be
>>>>>>> failing to propagate the error back to the head node for the 
>>>>>>> query. I
>>>>>>> believe writing parquet may still be the most heap-intensive 
>>>>>>> operation in
>>>>>>> Drill, despite our efforts to refactor the write path to use 
>>>>>>> direct memory
>>>>>>> instead of on-heap for large buffers needed in the process of 
>>>>>>> creating
>>>>>>> parquet files.
>>>>>>>
>>>>>>>> On Thu, May 28, 2015 at 8:43 AM, Matt <bs...@gmail.com> wrote:
>>>>>>>>
>>>>>>>> Is 300MM records too much to do in a single CTAS statement?
>>>>>>>>
>>>>>>>> After almost 23 hours I killed the query (^c) and it returned:
>>>>>>>>
>>>>>>>> ~~~
>>>>>>>> +-----------+----------------------------+
>>>>>>>> | Fragment  | Number of records written  |
>>>>>>>> +-----------+----------------------------+
>>>>>>>> | 1_20      | 13568824                   |
>>>>>>>> | 1_15      | 12411822                   |
>>>>>>>> | 1_7       | 12470329                   |
>>>>>>>> | 1_12      | 13693867                   |
>>>>>>>> | 1_5       | 13292136                   |
>>>>>>>> | 1_18      | 13874321                   |
>>>>>>>> | 1_16      | 13303094                   |
>>>>>>>> | 1_9       | 13639049                   |
>>>>>>>> | 1_10      | 13698380                   |
>>>>>>>> | 1_22      | 13501073                   |
>>>>>>>> | 1_8       | 13533736                   |
>>>>>>>> | 1_2       | 13549402                   |
>>>>>>>> | 1_21      | 13665183                   |
>>>>>>>> | 1_0       | 13544745                   |
>>>>>>>> | 1_4       | 13532957                   |
>>>>>>>> | 1_19      | 12767473                   |
>>>>>>>> | 1_17      | 13670687                   |
>>>>>>>> | 1_13      | 13469515                   |
>>>>>>>> | 1_23      | 12517632                   |
>>>>>>>> | 1_6       | 13634338                   |
>>>>>>>> | 1_14      | 13611322                   |
>>>>>>>> | 1_3       | 13061900                   |
>>>>>>>> | 1_11      | 12760978                   |
>>>>>>>> +-----------+----------------------------+
>>>>>>>> 23 rows selected (82294.854 seconds)
>>>>>>>> ~~~
>>>>>>>>
>>>>>>>> The sum of those record counts is  306,772,763 which is close 
>>>>>>>> to the
>>>>>>>> 320,843,454 in the source file:
>>>>>>>>
>>>>>>>> ~~~
>>>>>>>> 0: jdbc:drill:zk=es05:2181> select count(*)  FROM 
>>>>>>>> root.`sample_201501.dat`;
>>>>>>>> +------------+
>>>>>>>> |   EXPR$0   |
>>>>>>>> +------------+
>>>>>>>> | 320843454  |
>>>>>>>> +------------+
>>>>>>>> 1 row selected (384.665 seconds)
>>>>>>>> ~~~
>>>>>>>>
>>>>>>>>
>>>>>>>> It represents one month of data, 4 key columns and 38 numeric 
>>>>>>>> measure
>>>>>>>> columns, which could also be partitioned daily. The test here 
>>>>>>>> was to create
>>>>>>>> monthly Parquet files to see how the min/max stats on Parquet 
>>>>>>>> chunks help
>>>>>>>> with range select performance.
>>>>>>>>
>>>>>>>> Instead of a small number of large monthly RDBMS tables, I am 
>>>>>>>> attempting
>>>>>>>> to determine how many Parquet files should be used with Drill / 
>>>>>>>> HDFS.
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On 27 May 2015, at 15:17, Matt wrote:
>>>>>>>>
>>>>>>>> Attempting to create a Parquet backed table with a CTAS from an 
>>>>>>>> 44GB tab
>>>>>>>>> delimited file in HDFS. The process seemed to be running, as 
>>>>>>>>> CPU and IO was
>>>>>>>>> seen on all 4 nodes in this cluster, and .parquet files being 
>>>>>>>>> created in
>>>>>>>>> the expected path.
>>>>>>>>>
>>>>>>>>> In however in the last two hours or so, all nodes show near 
>>>>>>>>> zero CPU or
>>>>>>>>> IO, and the Last Modified date on the .parquet have not 
>>>>>>>>> changed. Same time
>>>>>>>>> delay shown in the Last Progress column in the active fragment 
>>>>>>>>> profile.
>>>>>>>>>
>>>>>>>>> What approach can I take to determine what is happening (or 
>>>>>>>>> not)?
>>>>>>>>

Re: Monitoring long / stuck CTAS

Posted by Andries Engelbrecht <ae...@maprtech.com>.
It should execute multi-threaded; I need to check how text files are 
handled.

Did you check the log files for any errors?


On May 28, 2015, at 10:36 AM, Matt <bs...@gmail.com> wrote:

>> The time seems pretty long for that file size. What type of file is it?
> 
> Tab delimited UTF-8 text.
> 
> I left the query to run overnight to see if it would complete, but 24 hours for an import like this would indeed be too long.
> 
>> Is the CTAS running single threaded?
> 
> In the first hour, with this being the only client connected to the cluster, I observed activity on all 4 nodes.
> 
> Is multi-threaded query execution the default? I would not have changed anything deliberately to force single thread execution.
> 
> 
> On 28 May 2015, at 13:06, Andries Engelbrecht wrote:
> 
>> The time seems pretty long for that file size. What type of file is it?
>> 
>> Is the CTAS running single threaded?
>> 
>> —Andries
>> 
>> 
>> On May 28, 2015, at 9:37 AM, Matt <bs...@gmail.com> wrote:
>> 
>>>> How large is the data set you are working with, and your cluster/nodes?
>>> 
>>> Just testing with that single 44GB source file currently, and my test cluster is made from 4 nodes, each with 8 CPU cores, 32GB RAM, a 6TB Ext4 volume (RAID-10).
>>> 
>>> Drill defaults left as come in v1.0. I will be adjusting memory and retrying the CTAS.
>>> 
>>> I know I can / should assign individual disks to HDFS, but as a test cluster there are apps that expect data volumes to work on. A dedicated Hadoop production cluster would have a disk layout specific to the task.
>>> 
>>> 
>>> On 28 May 2015, at 12:26, Andries Engelbrecht wrote:
>>> 
>>>> Just check the drillbit.log and drillbit.out files in the log directory.
>>>> Before adjusting memory, see if that is an issue first. It was for me, but as Jason mentioned there can be other causes as well.
>>>> 
>>>> You adjust memory allocation in the drill-env.sh files, and have to restart the drill bits.
>>>> 
>>>> How large is the data set you are working with, and your cluster/nodes?
>>>> 
>>>> —Andries
>>>> 
>>>> 
>>>> On May 28, 2015, at 9:17 AM, Matt <bs...@gmail.com> wrote:
>>>> 
>>>>> To make sure I am adjusting the correct config, these are heap parameters within the Drill configure path, not for Hadoop or Zookeeper?
>>>>> 
>>>>> 
>>>>>> On May 28, 2015, at 12:08 PM, Jason Altekruse <al...@gmail.com> wrote:
>>>>>> 
>>>>>> There should be no upper limit on the size of the tables you can create
>>>>>> with Drill. Be advised that Drill does currently operate entirely
>>>>>> optimistically in regards to available resources. If a network connection
>>>>>> between two drillbits fails during a query, we will not currently
>>>>>> re-schedule the work to make use of remaining nodes and network connections
>>>>>> that are still live. While we have had a good amount of success using Drill
>>>>>> for data conversion, be aware that these conditions could cause long
>>>>>> running queries to fail.
>>>>>> 
>>>>>> That being said, it isn't the only possible cause for such a failure. In
>>>>>> the case of a network failure we would expect to see a message returned to
>>>>>> you that part of the query was unsuccessful and that it had been cancelled.
>>>>>> Andries has a good suggestion in regards to checking the heap memory, this
>>>>>> should also be detected and reported back to you at the CLI, but we may be
>>>>>> failing to propagate the error back to the head node for the query. I
>>>>>> believe writing parquet may still be the most heap-intensive operation in
>>>>>> Drill, despite our efforts to refactor the write path to use direct memory
>>>>>> instead of on-heap for large buffers needed in the process of creating
>>>>>> parquet files.
>>>>>> 
>>>>>>> On Thu, May 28, 2015 at 8:43 AM, Matt <bs...@gmail.com> wrote:
>>>>>>> 
>>>>>>> Is 300MM records too much to do in a single CTAS statement?
>>>>>>> 
>>>>>>> After almost 23 hours I killed the query (^c) and it returned:
>>>>>>> 
>>>>>>> ~~~
>>>>>>> +-----------+----------------------------+
>>>>>>> | Fragment  | Number of records written  |
>>>>>>> +-----------+----------------------------+
>>>>>>> | 1_20      | 13568824                   |
>>>>>>> | 1_15      | 12411822                   |
>>>>>>> | 1_7       | 12470329                   |
>>>>>>> | 1_12      | 13693867                   |
>>>>>>> | 1_5       | 13292136                   |
>>>>>>> | 1_18      | 13874321                   |
>>>>>>> | 1_16      | 13303094                   |
>>>>>>> | 1_9       | 13639049                   |
>>>>>>> | 1_10      | 13698380                   |
>>>>>>> | 1_22      | 13501073                   |
>>>>>>> | 1_8       | 13533736                   |
>>>>>>> | 1_2       | 13549402                   |
>>>>>>> | 1_21      | 13665183                   |
>>>>>>> | 1_0       | 13544745                   |
>>>>>>> | 1_4       | 13532957                   |
>>>>>>> | 1_19      | 12767473                   |
>>>>>>> | 1_17      | 13670687                   |
>>>>>>> | 1_13      | 13469515                   |
>>>>>>> | 1_23      | 12517632                   |
>>>>>>> | 1_6       | 13634338                   |
>>>>>>> | 1_14      | 13611322                   |
>>>>>>> | 1_3       | 13061900                   |
>>>>>>> | 1_11      | 12760978                   |
>>>>>>> +-----------+----------------------------+
>>>>>>> 23 rows selected (82294.854 seconds)
>>>>>>> ~~~
>>>>>>> 
>>>>>>> The sum of those record counts is  306,772,763 which is close to the
>>>>>>> 320,843,454 in the source file:
>>>>>>> 
>>>>>>> ~~~
>>>>>>> 0: jdbc:drill:zk=es05:2181> select count(*)  FROM root.`sample_201501.dat`;
>>>>>>> +------------+
>>>>>>> |   EXPR$0   |
>>>>>>> +------------+
>>>>>>> | 320843454  |
>>>>>>> +------------+
>>>>>>> 1 row selected (384.665 seconds)
>>>>>>> ~~~
>>>>>>> 
>>>>>>> 
>>>>>>> It represents one month of data, 4 key columns and 38 numeric measure
>>>>>>> columns, which could also be partitioned daily. The test here was to create
>>>>>>> monthly Parquet files to see how the min/max stats on Parquet chunks help
>>>>>>> with range select performance.
>>>>>>> 
>>>>>>> Instead of a small number of large monthly RDBMS tables, I am attempting
>>>>>>> to determine how many Parquet files should be used with Drill / HDFS.
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> On 27 May 2015, at 15:17, Matt wrote:
>>>>>>> 
>>>>>>> Attempting to create a Parquet backed table with a CTAS from an 44GB tab
>>>>>>>> delimited file in HDFS. The process seemed to be running, as CPU and IO was
>>>>>>>> seen on all 4 nodes in this cluster, and .parquet files being created in
>>>>>>>> the expected path.
>>>>>>>> 
>>>>>>>> In however in the last two hours or so, all nodes show near zero CPU or
>>>>>>>> IO, and the Last Modified date on the .parquet have not changed. Same time
>>>>>>>> delay shown in the Last Progress column in the active fragment profile.
>>>>>>>> 
>>>>>>>> What approach can I take to determine what is happening (or not)?
>>>>>>> 


Re: Monitoring long / stuck CTAS

Posted by Matt <bs...@gmail.com>.
> The time seems pretty long for that file size. What type of file is 
> it?

Tab delimited UTF-8 text.

I left the query to run overnight to see if it would complete, but 24 
hours for an import like this would indeed be too long.

> Is the CTAS running single threaded?

In the first hour, with this being the only client connected to the 
cluster, I observed activity on all 4 nodes.

Is multi-threaded query execution the default? I would not have changed 
anything deliberately to force single-threaded execution.
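
If there is a setting I should double-check, I assume it is 
planner.width.max_per_node in sys.options. A sketch that reads the planner 
width settings over the REST interface, assuming a /query.json endpoint is 
available on port 8047 in this release:

~~~
import json
import urllib.request

# Read the parallelism-related planner options from sys.options; the
# /query.json endpoint and the sys.options column names are assumptions
# for this Drill release.
req = urllib.request.Request(
    "http://es05:8047/query.json",
    data=json.dumps({
        "queryType": "SQL",
        "query": "SELECT name, num_val FROM sys.options "
                 "WHERE name LIKE 'planner.width%'",
    }).encode(),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    print(resp.read().decode())
~~~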


On 28 May 2015, at 13:06, Andries Engelbrecht wrote:

> The time seems pretty long for that file size. What type of file is 
> it?
>
> Is the CTAS running single threaded?
>
> —Andries
>
>
> On May 28, 2015, at 9:37 AM, Matt <bs...@gmail.com> wrote:
>
>>> How large is the data set you are working with, and your 
>>> cluster/nodes?
>>
>> Just testing with that single 44GB source file currently, and my test 
>> cluster is made from 4 nodes, each with 8 CPU cores, 32GB RAM, a 6TB 
>> Ext4 volume (RAID-10).
>>
>> Drill defaults left as come in v1.0. I will be adjusting memory and 
>> retrying the CTAS.
>>
>> I know I can / should assign individual disks to HDFS, but as a test 
>> cluster there are apps that expect data volumes to work on. A 
>> dedicated Hadoop production cluster would have a disk layout specific 
>> to the task.
>>
>>
>> On 28 May 2015, at 12:26, Andries Engelbrecht wrote:
>>
>>> Just check the drillbit.log and drillbit.out files in the log 
>>> directory.
>>> Before adjusting memory, see if that is an issue first. It was for 
>>> me, but as Jason mentioned there can be other causes as well.
>>>
>>> You adjust memory allocation in the drill-env.sh files, and have to 
>>> restart the drill bits.
>>>
>>> How large is the data set you are working with, and your 
>>> cluster/nodes?
>>>
>>> —Andries
>>>
>>>
>>> On May 28, 2015, at 9:17 AM, Matt <bs...@gmail.com> wrote:
>>>
>>>> To make sure I am adjusting the correct config, these are heap 
>>>> parameters within the Drill configure path, not for Hadoop or 
>>>> Zookeeper?
>>>>
>>>>
>>>>> On May 28, 2015, at 12:08 PM, Jason Altekruse 
>>>>> <al...@gmail.com> wrote:
>>>>>
>>>>> There should be no upper limit on the size of the tables you can 
>>>>> create
>>>>> with Drill. Be advised that Drill does currently operate entirely
>>>>> optimistically in regards to available resources. If a network 
>>>>> connection
>>>>> between two drillbits fails during a query, we will not currently
>>>>> re-schedule the work to make use of remaining nodes and network 
>>>>> connections
>>>>> that are still live. While we have had a good amount of success 
>>>>> using Drill
>>>>> for data conversion, be aware that these conditions could cause 
>>>>> long
>>>>> running queries to fail.
>>>>>
>>>>> That being said, it isn't the only possible cause for such a 
>>>>> failure. In
>>>>> the case of a network failure we would expect to see a message 
>>>>> returned to
>>>>> you that part of the query was unsuccessful and that it had been 
>>>>> cancelled.
>>>>> Andries has a good suggestion in regards to checking the heap 
>>>>> memory, this
>>>>> should also be detected and reported back to you at the CLI, but 
>>>>> we may be
>>>>> failing to propagate the error back to the head node for the 
>>>>> query. I
>>>>> believe writing parquet may still be the most heap-intensive 
>>>>> operation in
>>>>> Drill, despite our efforts to refactor the write path to use 
>>>>> direct memory
>>>>> instead of on-heap for large buffers needed in the process of 
>>>>> creating
>>>>> parquet files.
>>>>>
>>>>>> On Thu, May 28, 2015 at 8:43 AM, Matt <bs...@gmail.com> wrote:
>>>>>>
>>>>>> Is 300MM records too much to do in a single CTAS statement?
>>>>>>
>>>>>> After almost 23 hours I killed the query (^c) and it returned:
>>>>>>
>>>>>> ~~~
>>>>>> +-----------+----------------------------+
>>>>>> | Fragment  | Number of records written  |
>>>>>> +-----------+----------------------------+
>>>>>> | 1_20      | 13568824                   |
>>>>>> | 1_15      | 12411822                   |
>>>>>> | 1_7       | 12470329                   |
>>>>>> | 1_12      | 13693867                   |
>>>>>> | 1_5       | 13292136                   |
>>>>>> | 1_18      | 13874321                   |
>>>>>> | 1_16      | 13303094                   |
>>>>>> | 1_9       | 13639049                   |
>>>>>> | 1_10      | 13698380                   |
>>>>>> | 1_22      | 13501073                   |
>>>>>> | 1_8       | 13533736                   |
>>>>>> | 1_2       | 13549402                   |
>>>>>> | 1_21      | 13665183                   |
>>>>>> | 1_0       | 13544745                   |
>>>>>> | 1_4       | 13532957                   |
>>>>>> | 1_19      | 12767473                   |
>>>>>> | 1_17      | 13670687                   |
>>>>>> | 1_13      | 13469515                   |
>>>>>> | 1_23      | 12517632                   |
>>>>>> | 1_6       | 13634338                   |
>>>>>> | 1_14      | 13611322                   |
>>>>>> | 1_3       | 13061900                   |
>>>>>> | 1_11      | 12760978                   |
>>>>>> +-----------+----------------------------+
>>>>>> 23 rows selected (82294.854 seconds)
>>>>>> ~~~
>>>>>>
>>>>>> The sum of those record counts is  306,772,763 which is close to 
>>>>>> the
>>>>>> 320,843,454 in the source file:
>>>>>>
>>>>>> ~~~
>>>>>> 0: jdbc:drill:zk=es05:2181> select count(*)  FROM 
>>>>>> root.`sample_201501.dat`;
>>>>>> +------------+
>>>>>> |   EXPR$0   |
>>>>>> +------------+
>>>>>> | 320843454  |
>>>>>> +------------+
>>>>>> 1 row selected (384.665 seconds)
>>>>>> ~~~
>>>>>>
>>>>>>
>>>>>> It represents one month of data, 4 key columns and 38 numeric 
>>>>>> measure
>>>>>> columns, which could also be partitioned daily. The test here was 
>>>>>> to create
>>>>>> monthly Parquet files to see how the min/max stats on Parquet 
>>>>>> chunks help
>>>>>> with range select performance.
>>>>>>
>>>>>> Instead of a small number of large monthly RDBMS tables, I am 
>>>>>> attempting
>>>>>> to determine how many Parquet files should be used with Drill / 
>>>>>> HDFS.
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> On 27 May 2015, at 15:17, Matt wrote:
>>>>>>
>>>>>> Attempting to create a Parquet backed table with a CTAS from an 
>>>>>> 44GB tab
>>>>>>> delimited file in HDFS. The process seemed to be running, as CPU 
>>>>>>> and IO was
>>>>>>> seen on all 4 nodes in this cluster, and .parquet files being 
>>>>>>> created in
>>>>>>> the expected path.
>>>>>>>
>>>>>>> In however in the last two hours or so, all nodes show near zero 
>>>>>>> CPU or
>>>>>>> IO, and the Last Modified date on the .parquet have not changed. 
>>>>>>> Same time
>>>>>>> delay shown in the Last Progress column in the active fragment 
>>>>>>> profile.
>>>>>>>
>>>>>>> What approach can I take to determine what is happening (or 
>>>>>>> not)?
>>>>>>

Re: Monitoring long / stuck CTAS

Posted by Andries Engelbrecht <ae...@maprtech.com>.
The time seems pretty long for that file size. What type of file is it?

Is the CTAS running single threaded?
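
One way to check how wide the CTAS is being planned (the option names 
below are the standard Drill planner width settings) is to compare the 
planner width options against the number of writer fragments shown in 
the profile:

~~~
SELECT * FROM sys.options WHERE name LIKE 'planner.width%';
~~~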

—Andries


On May 28, 2015, at 9:37 AM, Matt <bs...@gmail.com> wrote:

>> How large is the data set you are working with, and your cluster/nodes?
> 
> Just testing with that single 44GB source file currently, and my test cluster is made from 4 nodes, each with 8 CPU cores, 32GB RAM, a 6TB Ext4 volume (RAID-10).
> 
> Drill defaults left as come in v1.0. I will be adjusting memory and retrying the CTAS.
> 
> I know I can / should assign individual disks to HDFS, but as a test cluster there are apps that expect data volumes to work on. A dedicated Hadoop production cluster would have a disk layout specific to the task.
> 
> 
> On 28 May 2015, at 12:26, Andries Engelbrecht wrote:
> 
>> Just check the drillbit.log and drillbit.out files in the log directory.
>> Before adjusting memory, see if that is an issue first. It was for me, but as Jason mentioned there can be other causes as well.
>> 
>> You adjust memory allocation in the drill-env.sh files, and have to restart the drill bits.
>> 
>> How large is the data set you are working with, and your cluster/nodes?
>> 
>> —Andries
>> 
>> 
>> On May 28, 2015, at 9:17 AM, Matt <bs...@gmail.com> wrote:
>> 
>>> To make sure I am adjusting the correct config, these are heap parameters within the Drill configure path, not for Hadoop or Zookeeper?
>>> 
>>> 
>>>> On May 28, 2015, at 12:08 PM, Jason Altekruse <al...@gmail.com> wrote:
>>>> 
>>>> There should be no upper limit on the size of the tables you can create
>>>> with Drill. Be advised that Drill does currently operate entirely
>>>> optimistically in regards to available resources. If a network connection
>>>> between two drillbits fails during a query, we will not currently
>>>> re-schedule the work to make use of remaining nodes and network connections
>>>> that are still live. While we have had a good amount of success using Drill
>>>> for data conversion, be aware that these conditions could cause long
>>>> running queries to fail.
>>>> 
>>>> That being said, it isn't the only possible cause for such a failure. In
>>>> the case of a network failure we would expect to see a message returned to
>>>> you that part of the query was unsuccessful and that it had been cancelled.
>>>> Andries has a good suggestion in regards to checking the heap memory, this
>>>> should also be detected and reported back to you at the CLI, but we may be
>>>> failing to propagate the error back to the head node for the query. I
>>>> believe writing parquet may still be the most heap-intensive operation in
>>>> Drill, despite our efforts to refactor the write path to use direct memory
>>>> instead of on-heap for large buffers needed in the process of creating
>>>> parquet files.
>>>> 
>>>>> On Thu, May 28, 2015 at 8:43 AM, Matt <bs...@gmail.com> wrote:
>>>>> 
>>>>> Is 300MM records too much to do in a single CTAS statement?
>>>>> 
>>>>> After almost 23 hours I killed the query (^c) and it returned:
>>>>> 
>>>>> ~~~
>>>>> +-----------+----------------------------+
>>>>> | Fragment  | Number of records written  |
>>>>> +-----------+----------------------------+
>>>>> | 1_20      | 13568824                   |
>>>>> | 1_15      | 12411822                   |
>>>>> | 1_7       | 12470329                   |
>>>>> | 1_12      | 13693867                   |
>>>>> | 1_5       | 13292136                   |
>>>>> | 1_18      | 13874321                   |
>>>>> | 1_16      | 13303094                   |
>>>>> | 1_9       | 13639049                   |
>>>>> | 1_10      | 13698380                   |
>>>>> | 1_22      | 13501073                   |
>>>>> | 1_8       | 13533736                   |
>>>>> | 1_2       | 13549402                   |
>>>>> | 1_21      | 13665183                   |
>>>>> | 1_0       | 13544745                   |
>>>>> | 1_4       | 13532957                   |
>>>>> | 1_19      | 12767473                   |
>>>>> | 1_17      | 13670687                   |
>>>>> | 1_13      | 13469515                   |
>>>>> | 1_23      | 12517632                   |
>>>>> | 1_6       | 13634338                   |
>>>>> | 1_14      | 13611322                   |
>>>>> | 1_3       | 13061900                   |
>>>>> | 1_11      | 12760978                   |
>>>>> +-----------+----------------------------+
>>>>> 23 rows selected (82294.854 seconds)
>>>>> ~~~
>>>>> 
>>>>> The sum of those record counts is  306,772,763 which is close to the
>>>>> 320,843,454 in the source file:
>>>>> 
>>>>> ~~~
>>>>> 0: jdbc:drill:zk=es05:2181> select count(*)  FROM root.`sample_201501.dat`;
>>>>> +------------+
>>>>> |   EXPR$0   |
>>>>> +------------+
>>>>> | 320843454  |
>>>>> +------------+
>>>>> 1 row selected (384.665 seconds)
>>>>> ~~~
>>>>> 
>>>>> 
>>>>> It represents one month of data, 4 key columns and 38 numeric measure
>>>>> columns, which could also be partitioned daily. The test here was to create
>>>>> monthly Parquet files to see how the min/max stats on Parquet chunks help
>>>>> with range select performance.
>>>>> 
>>>>> Instead of a small number of large monthly RDBMS tables, I am attempting
>>>>> to determine how many Parquet files should be used with Drill / HDFS.
>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>>> On 27 May 2015, at 15:17, Matt wrote:
>>>>> 
>>>>> Attempting to create a Parquet backed table with a CTAS from an 44GB tab
>>>>>> delimited file in HDFS. The process seemed to be running, as CPU and IO was
>>>>>> seen on all 4 nodes in this cluster, and .parquet files being created in
>>>>>> the expected path.
>>>>>> 
>>>>>> In however in the last two hours or so, all nodes show near zero CPU or
>>>>>> IO, and the Last Modified date on the .parquet have not changed. Same time
>>>>>> delay shown in the Last Progress column in the active fragment profile.
>>>>>> 
>>>>>> What approach can I take to determine what is happening (or not)?
>>>>> 


Re: Monitoring long / stuck CTAS

Posted by Matt <bs...@gmail.com>.
> How large is the data set you are working with, and your 
> cluster/nodes?

Just testing with that single 44GB source file currently, and my test 
cluster is made up of 4 nodes, each with 8 CPU cores, 32GB RAM, and a 6TB 
Ext4 volume (RAID-10).

Drill defaults were left as they come in v1.0. I will be adjusting memory 
and retrying the CTAS.

I know I can / should assign individual disks to HDFS, but as this is a 
test cluster there are other apps that expect the existing data volumes. 
A dedicated Hadoop production cluster would have a disk layout specific 
to the task.
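
To confirm what each node is currently allocated before retrying, I plan 
to look at each drillbit's web UI metrics (the endpoint below is what I 
believe the 8047 UI exposes; verify on your build):

~~~
curl -s http://es05:8047/status/metrics
~~~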


On 28 May 2015, at 12:26, Andries Engelbrecht wrote:

> Just check the drillbit.log and drillbit.out files in the log 
> directory.
> Before adjusting memory, see if that is an issue first. It was for me, 
> but as Jason mentioned there can be other causes as well.
>
> You adjust memory allocation in the drill-env.sh files, and have to 
> restart the drill bits.
>
> How large is the data set you are working with, and your 
> cluster/nodes?
>
> —Andries
>
>
> On May 28, 2015, at 9:17 AM, Matt <bs...@gmail.com> wrote:
>
>> To make sure I am adjusting the correct config, these are heap 
>> parameters within the Drill configure path, not for Hadoop or 
>> Zookeeper?
>>
>>
>>> On May 28, 2015, at 12:08 PM, Jason Altekruse 
>>> <al...@gmail.com> wrote:
>>>
>>> There should be no upper limit on the size of the tables you can 
>>> create
>>> with Drill. Be advised that Drill does currently operate entirely
>>> optimistically in regards to available resources. If a network 
>>> connection
>>> between two drillbits fails during a query, we will not currently
>>> re-schedule the work to make use of remaining nodes and network 
>>> connections
>>> that are still live. While we have had a good amount of success 
>>> using Drill
>>> for data conversion, be aware that these conditions could cause long
>>> running queries to fail.
>>>
>>> That being said, it isn't the only possible cause for such a 
>>> failure. In
>>> the case of a network failure we would expect to see a message 
>>> returned to
>>> you that part of the query was unsuccessful and that it had been 
>>> cancelled.
>>> Andries has a good suggestion in regards to checking the heap 
>>> memory, this
>>> should also be detected and reported back to you at the CLI, but we 
>>> may be
>>> failing to propagate the error back to the head node for the query. 
>>> I
>>> believe writing parquet may still be the most heap-intensive 
>>> operation in
>>> Drill, despite our efforts to refactor the write path to use direct 
>>> memory
>>> instead of on-heap for large buffers needed in the process of 
>>> creating
>>> parquet files.
>>>
>>>> On Thu, May 28, 2015 at 8:43 AM, Matt <bs...@gmail.com> wrote:
>>>>
>>>> Is 300MM records too much to do in a single CTAS statement?
>>>>
>>>> After almost 23 hours I killed the query (^c) and it returned:
>>>>
>>>> ~~~
>>>> +-----------+----------------------------+
>>>> | Fragment  | Number of records written  |
>>>> +-----------+----------------------------+
>>>> | 1_20      | 13568824                   |
>>>> | 1_15      | 12411822                   |
>>>> | 1_7       | 12470329                   |
>>>> | 1_12      | 13693867                   |
>>>> | 1_5       | 13292136                   |
>>>> | 1_18      | 13874321                   |
>>>> | 1_16      | 13303094                   |
>>>> | 1_9       | 13639049                   |
>>>> | 1_10      | 13698380                   |
>>>> | 1_22      | 13501073                   |
>>>> | 1_8       | 13533736                   |
>>>> | 1_2       | 13549402                   |
>>>> | 1_21      | 13665183                   |
>>>> | 1_0       | 13544745                   |
>>>> | 1_4       | 13532957                   |
>>>> | 1_19      | 12767473                   |
>>>> | 1_17      | 13670687                   |
>>>> | 1_13      | 13469515                   |
>>>> | 1_23      | 12517632                   |
>>>> | 1_6       | 13634338                   |
>>>> | 1_14      | 13611322                   |
>>>> | 1_3       | 13061900                   |
>>>> | 1_11      | 12760978                   |
>>>> +-----------+----------------------------+
>>>> 23 rows selected (82294.854 seconds)
>>>> ~~~
>>>>
>>>> The sum of those record counts is  306,772,763 which is close to 
>>>> the
>>>> 320,843,454 in the source file:
>>>>
>>>> ~~~
>>>> 0: jdbc:drill:zk=es05:2181> select count(*)  FROM 
>>>> root.`sample_201501.dat`;
>>>> +------------+
>>>> |   EXPR$0   |
>>>> +------------+
>>>> | 320843454  |
>>>> +------------+
>>>> 1 row selected (384.665 seconds)
>>>> ~~~
>>>>
>>>>
>>>> It represents one month of data, 4 key columns and 38 numeric 
>>>> measure
>>>> columns, which could also be partitioned daily. The test here was 
>>>> to create
>>>> monthly Parquet files to see how the min/max stats on Parquet 
>>>> chunks help
>>>> with range select performance.
>>>>
>>>> Instead of a small number of large monthly RDBMS tables, I am 
>>>> attempting
>>>> to determine how many Parquet files should be used with Drill / 
>>>> HDFS.
>>>>
>>>>
>>>>
>>>>
>>>> On 27 May 2015, at 15:17, Matt wrote:
>>>>
>>>> Attempting to create a Parquet backed table with a CTAS from an 
>>>> 44GB tab
>>>>> delimited file in HDFS. The process seemed to be running, as CPU 
>>>>> and IO was
>>>>> seen on all 4 nodes in this cluster, and .parquet files being 
>>>>> created in
>>>>> the expected path.
>>>>>
>>>>> In however in the last two hours or so, all nodes show near zero 
>>>>> CPU or
>>>>> IO, and the Last Modified date on the .parquet have not changed. 
>>>>> Same time
>>>>> delay shown in the Last Progress column in the active fragment 
>>>>> profile.
>>>>>
>>>>> What approach can I take to determine what is happening (or not)?
>>>>

Re: Monitoring long / stuck CTAS

Posted by Andries Engelbrecht <ae...@maprtech.com>.
Just check the drillbit.log and drillbit.out files in the log directory.
Before adjusting memory, confirm whether heap exhaustion is actually the 
issue first. It was for me, but as Jason mentioned there can be other 
causes as well.

You adjust memory allocation in the drill-env.sh files, and then have to 
restart the drillbits.
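
As a minimal sketch, the relevant variables in drill-env.sh look like 
this (the names are the standard Drill ones; the values are only 
illustrative and should be sized to your nodes):

~~~
# conf/drill-env.sh on every drillbit node; restart the drillbits after editing
DRILL_MAX_DIRECT_MEMORY="16G"   # per-node direct memory used for query execution
DRILL_HEAP="8G"                 # per-node JVM heap; Parquet writing leans heavily on heap
~~~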

How large is the data set you are working with, and your cluster/nodes?

—Andries


On May 28, 2015, at 9:17 AM, Matt <bs...@gmail.com> wrote:

> To make sure I am adjusting the correct config, these are heap parameters within the Drill configure path, not for Hadoop or Zookeeper?
> 
> 
>> On May 28, 2015, at 12:08 PM, Jason Altekruse <al...@gmail.com> wrote:
>> 
>> There should be no upper limit on the size of the tables you can create
>> with Drill. Be advised that Drill does currently operate entirely
>> optimistically in regards to available resources. If a network connection
>> between two drillbits fails during a query, we will not currently
>> re-schedule the work to make use of remaining nodes and network connections
>> that are still live. While we have had a good amount of success using Drill
>> for data conversion, be aware that these conditions could cause long
>> running queries to fail.
>> 
>> That being said, it isn't the only possible cause for such a failure. In
>> the case of a network failure we would expect to see a message returned to
>> you that part of the query was unsuccessful and that it had been cancelled.
>> Andries has a good suggestion in regards to checking the heap memory, this
>> should also be detected and reported back to you at the CLI, but we may be
>> failing to propagate the error back to the head node for the query. I
>> believe writing parquet may still be the most heap-intensive operation in
>> Drill, despite our efforts to refactor the write path to use direct memory
>> instead of on-heap for large buffers needed in the process of creating
>> parquet files.
>> 
>>> On Thu, May 28, 2015 at 8:43 AM, Matt <bs...@gmail.com> wrote:
>>> 
>>> Is 300MM records too much to do in a single CTAS statement?
>>> 
>>> After almost 23 hours I killed the query (^c) and it returned:
>>> 
>>> ~~~
>>> +-----------+----------------------------+
>>> | Fragment  | Number of records written  |
>>> +-----------+----------------------------+
>>> | 1_20      | 13568824                   |
>>> | 1_15      | 12411822                   |
>>> | 1_7       | 12470329                   |
>>> | 1_12      | 13693867                   |
>>> | 1_5       | 13292136                   |
>>> | 1_18      | 13874321                   |
>>> | 1_16      | 13303094                   |
>>> | 1_9       | 13639049                   |
>>> | 1_10      | 13698380                   |
>>> | 1_22      | 13501073                   |
>>> | 1_8       | 13533736                   |
>>> | 1_2       | 13549402                   |
>>> | 1_21      | 13665183                   |
>>> | 1_0       | 13544745                   |
>>> | 1_4       | 13532957                   |
>>> | 1_19      | 12767473                   |
>>> | 1_17      | 13670687                   |
>>> | 1_13      | 13469515                   |
>>> | 1_23      | 12517632                   |
>>> | 1_6       | 13634338                   |
>>> | 1_14      | 13611322                   |
>>> | 1_3       | 13061900                   |
>>> | 1_11      | 12760978                   |
>>> +-----------+----------------------------+
>>> 23 rows selected (82294.854 seconds)
>>> ~~~
>>> 
>>> The sum of those record counts is  306,772,763 which is close to the
>>> 320,843,454 in the source file:
>>> 
>>> ~~~
>>> 0: jdbc:drill:zk=es05:2181> select count(*)  FROM root.`sample_201501.dat`;
>>> +------------+
>>> |   EXPR$0   |
>>> +------------+
>>> | 320843454  |
>>> +------------+
>>> 1 row selected (384.665 seconds)
>>> ~~~
>>> 
>>> 
>>> It represents one month of data, 4 key columns and 38 numeric measure
>>> columns, which could also be partitioned daily. The test here was to create
>>> monthly Parquet files to see how the min/max stats on Parquet chunks help
>>> with range select performance.
>>> 
>>> Instead of a small number of large monthly RDBMS tables, I am attempting
>>> to determine how many Parquet files should be used with Drill / HDFS.
>>> 
>>> 
>>> 
>>> 
>>> On 27 May 2015, at 15:17, Matt wrote:
>>> 
>>> Attempting to create a Parquet backed table with a CTAS from an 44GB tab
>>>> delimited file in HDFS. The process seemed to be running, as CPU and IO was
>>>> seen on all 4 nodes in this cluster, and .parquet files being created in
>>>> the expected path.
>>>> 
>>>> In however in the last two hours or so, all nodes show near zero CPU or
>>>> IO, and the Last Modified date on the .parquet have not changed. Same time
>>>> delay shown in the Last Progress column in the active fragment profile.
>>>> 
>>>> What approach can I take to determine what is happening (or not)?
>>> 


Re: Monitoring long / stuck CTAS

Posted by Matt <bs...@gmail.com>.
To make sure I am adjusting the correct config: these are heap parameters within the Drill config path, not for Hadoop or ZooKeeper?
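
For my own notes, I believe the files in play are under Drill's own conf 
directory, not Hadoop's or ZooKeeper's:

~~~
ls $DRILL_HOME/conf
# drill-env.sh  drill-override.conf  logback.xml  ...
~~~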


> On May 28, 2015, at 12:08 PM, Jason Altekruse <al...@gmail.com> wrote:
> 
> There should be no upper limit on the size of the tables you can create
> with Drill. Be advised that Drill does currently operate entirely
> optimistically in regards to available resources. If a network connection
> between two drillbits fails during a query, we will not currently
> re-schedule the work to make use of remaining nodes and network connections
> that are still live. While we have had a good amount of success using Drill
> for data conversion, be aware that these conditions could cause long
> running queries to fail.
> 
> That being said, it isn't the only possible cause for such a failure. In
> the case of a network failure we would expect to see a message returned to
> you that part of the query was unsuccessful and that it had been cancelled.
> Andries has a good suggestion in regards to checking the heap memory, this
> should also be detected and reported back to you at the CLI, but we may be
> failing to propagate the error back to the head node for the query. I
> believe writing parquet may still be the most heap-intensive operation in
> Drill, despite our efforts to refactor the write path to use direct memory
> instead of on-heap for large buffers needed in the process of creating
> parquet files.
> 
>> On Thu, May 28, 2015 at 8:43 AM, Matt <bs...@gmail.com> wrote:
>> 
>> Is 300MM records too much to do in a single CTAS statement?
>> 
>> After almost 23 hours I killed the query (^c) and it returned:
>> 
>> ~~~
>> +-----------+----------------------------+
>> | Fragment  | Number of records written  |
>> +-----------+----------------------------+
>> | 1_20      | 13568824                   |
>> | 1_15      | 12411822                   |
>> | 1_7       | 12470329                   |
>> | 1_12      | 13693867                   |
>> | 1_5       | 13292136                   |
>> | 1_18      | 13874321                   |
>> | 1_16      | 13303094                   |
>> | 1_9       | 13639049                   |
>> | 1_10      | 13698380                   |
>> | 1_22      | 13501073                   |
>> | 1_8       | 13533736                   |
>> | 1_2       | 13549402                   |
>> | 1_21      | 13665183                   |
>> | 1_0       | 13544745                   |
>> | 1_4       | 13532957                   |
>> | 1_19      | 12767473                   |
>> | 1_17      | 13670687                   |
>> | 1_13      | 13469515                   |
>> | 1_23      | 12517632                   |
>> | 1_6       | 13634338                   |
>> | 1_14      | 13611322                   |
>> | 1_3       | 13061900                   |
>> | 1_11      | 12760978                   |
>> +-----------+----------------------------+
>> 23 rows selected (82294.854 seconds)
>> ~~~
>> 
>> The sum of those record counts is  306,772,763 which is close to the
>> 320,843,454 in the source file:
>> 
>> ~~~
>> 0: jdbc:drill:zk=es05:2181> select count(*)  FROM root.`sample_201501.dat`;
>> +------------+
>> |   EXPR$0   |
>> +------------+
>> | 320843454  |
>> +------------+
>> 1 row selected (384.665 seconds)
>> ~~~
>> 
>> 
>> It represents one month of data, 4 key columns and 38 numeric measure
>> columns, which could also be partitioned daily. The test here was to create
>> monthly Parquet files to see how the min/max stats on Parquet chunks help
>> with range select performance.
>> 
>> Instead of a small number of large monthly RDBMS tables, I am attempting
>> to determine how many Parquet files should be used with Drill / HDFS.
>> 
>> 
>> 
>> 
>> On 27 May 2015, at 15:17, Matt wrote:
>> 
>> Attempting to create a Parquet backed table with a CTAS from an 44GB tab
>>> delimited file in HDFS. The process seemed to be running, as CPU and IO was
>>> seen on all 4 nodes in this cluster, and .parquet files being created in
>>> the expected path.
>>> 
>>> In however in the last two hours or so, all nodes show near zero CPU or
>>> IO, and the Last Modified date on the .parquet have not changed. Same time
>>> delay shown in the Last Progress column in the active fragment profile.
>>> 
>>> What approach can I take to determine what is happening (or not)?
>> 

Re: Monitoring long / stuck CTAS

Posted by Jason Altekruse <al...@gmail.com>.
There should be no upper limit on the size of the tables you can create
with Drill. Be advised that Drill does currently operate entirely
optimistically with regard to available resources. If a network connection
between two drillbits fails during a query, we will not currently
re-schedule the work to make use of remaining nodes and network connections
that are still live. While we have had a good amount of success using Drill
for data conversion, be aware that these conditions could cause
long-running queries to fail.

That being said, it isn't the only possible cause for such a failure. In
the case of a network failure we would expect to see a message returned to
you that part of the query was unsuccessful and that it had been cancelled.
Andries has a good suggestion about checking the heap memory; this
should also be detected and reported back to you at the CLI, but we may be
failing to propagate the error back to the head node for the query. I
believe writing parquet may still be the most heap-intensive operation in
Drill, despite our efforts to refactor the write path to use direct memory
instead of on-heap for large buffers needed in the process of creating
parquet files.
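
If the error never makes it back to the CLI, one low-tech way to watch 
heap pressure on the drillbits while the CTAS runs is JVM GC sampling 
(this is only a sketch; it assumes a JDK on the nodes and a single 
drillbit JVM per node):

~~~
# print GC / heap occupancy of the drillbit JVM every 10 seconds
jstat -gcutil $(pgrep -f Drillbit) 10s
~~~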

On Thu, May 28, 2015 at 8:43 AM, Matt <bs...@gmail.com> wrote:

> Is 300MM records too much to do in a single CTAS statement?
>
> After almost 23 hours I killed the query (^c) and it returned:
>
> ~~~
> +-----------+----------------------------+
> | Fragment  | Number of records written  |
> +-----------+----------------------------+
> | 1_20      | 13568824                   |
> | 1_15      | 12411822                   |
> | 1_7       | 12470329                   |
> | 1_12      | 13693867                   |
> | 1_5       | 13292136                   |
> | 1_18      | 13874321                   |
> | 1_16      | 13303094                   |
> | 1_9       | 13639049                   |
> | 1_10      | 13698380                   |
> | 1_22      | 13501073                   |
> | 1_8       | 13533736                   |
> | 1_2       | 13549402                   |
> | 1_21      | 13665183                   |
> | 1_0       | 13544745                   |
> | 1_4       | 13532957                   |
> | 1_19      | 12767473                   |
> | 1_17      | 13670687                   |
> | 1_13      | 13469515                   |
> | 1_23      | 12517632                   |
> | 1_6       | 13634338                   |
> | 1_14      | 13611322                   |
> | 1_3       | 13061900                   |
> | 1_11      | 12760978                   |
> +-----------+----------------------------+
> 23 rows selected (82294.854 seconds)
> ~~~
>
> The sum of those record counts is  306,772,763 which is close to the
> 320,843,454 in the source file:
>
> ~~~
> 0: jdbc:drill:zk=es05:2181> select count(*)  FROM root.`sample_201501.dat`;
> +------------+
> |   EXPR$0   |
> +------------+
> | 320843454  |
> +------------+
> 1 row selected (384.665 seconds)
> ~~~
>
>
> It represents one month of data, 4 key columns and 38 numeric measure
> columns, which could also be partitioned daily. The test here was to create
> monthly Parquet files to see how the min/max stats on Parquet chunks help
> with range select performance.
>
> Instead of a small number of large monthly RDBMS tables, I am attempting
> to determine how many Parquet files should be used with Drill / HDFS.
>
>
>
>
> On 27 May 2015, at 15:17, Matt wrote:
>
>  Attempting to create a Parquet backed table with a CTAS from an 44GB tab
>> delimited file in HDFS. The process seemed to be running, as CPU and IO was
>> seen on all 4 nodes in this cluster, and .parquet files being created in
>> the expected path.
>>
>> In however in the last two hours or so, all nodes show near zero CPU or
>> IO, and the Last Modified date on the .parquet have not changed. Same time
>> delay shown in the Last Progress column in the active fragment profile.
>>
>> What approach can I take to determine what is happening (or not)?
>>
>

Re: Monitoring long / stuck CTAS

Posted by Matt <bs...@gmail.com>.
I did not note any memory errors or warnings in a quick scan of the logs, but to double-check, is there a specific log where I would find such warnings?
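
For reference, something like this on each node is what I mean by a 
quick scan (the log path assumes a default install under /opt/drill; 
adjust as needed):

~~~
grep -iE "outofmemory|heap space|gc overhead" /opt/drill/log/drillbit.log /opt/drill/log/drillbit.out
~~~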


> On May 28, 2015, at 12:01 PM, Andries Engelbrecht <ae...@maprtech.com> wrote:
> 
> I have used a single CTAS to create tables using parquet with 1.5B rows.
> 
> It did consume a lot of heap memory on the Drillbits and I had to increase the heap size. Check your logs to see if you are running out of heap memory.
> 
> I used 128MB parquet block size.
> 
> This was with Drill 0.9 , so I’m sure 1.0 will be better in this regard.
> 
> —Andries
> 
> 
> 
>> On May 28, 2015, at 8:43 AM, Matt <bs...@gmail.com> wrote:
>> 
>> Is 300MM records too much to do in a single CTAS statement?
>> 
>> After almost 23 hours I killed the query (^c) and it returned:
>> 
>> ~~~
>> +-----------+----------------------------+
>> | Fragment  | Number of records written  |
>> +-----------+----------------------------+
>> | 1_20      | 13568824                   |
>> | 1_15      | 12411822                   |
>> | 1_7       | 12470329                   |
>> | 1_12      | 13693867                   |
>> | 1_5       | 13292136                   |
>> | 1_18      | 13874321                   |
>> | 1_16      | 13303094                   |
>> | 1_9       | 13639049                   |
>> | 1_10      | 13698380                   |
>> | 1_22      | 13501073                   |
>> | 1_8       | 13533736                   |
>> | 1_2       | 13549402                   |
>> | 1_21      | 13665183                   |
>> | 1_0       | 13544745                   |
>> | 1_4       | 13532957                   |
>> | 1_19      | 12767473                   |
>> | 1_17      | 13670687                   |
>> | 1_13      | 13469515                   |
>> | 1_23      | 12517632                   |
>> | 1_6       | 13634338                   |
>> | 1_14      | 13611322                   |
>> | 1_3       | 13061900                   |
>> | 1_11      | 12760978                   |
>> +-----------+----------------------------+
>> 23 rows selected (82294.854 seconds)
>> ~~~
>> 
>> The sum of those record counts is  306,772,763 which is close to the  320,843,454 in the source file:
>> 
>> ~~~
>> 0: jdbc:drill:zk=es05:2181> select count(*)  FROM root.`sample_201501.dat`;
>> +------------+
>> |   EXPR$0   |
>> +------------+
>> | 320843454  |
>> +------------+
>> 1 row selected (384.665 seconds)
>> ~~~
>> 
>> 
>> It represents one month of data, 4 key columns and 38 numeric measure columns, which could also be partitioned daily. The test here was to create monthly Parquet files to see how the min/max stats on Parquet chunks help with range select performance.
>> 
>> Instead of a small number of large monthly RDBMS tables, I am attempting to determine how many Parquet files should be used with Drill / HDFS.
>> 
>> 
>> 
>>> On 27 May 2015, at 15:17, Matt wrote:
>>> 
>>> Attempting to create a Parquet backed table with a CTAS from an 44GB tab delimited file in HDFS. The process seemed to be running, as CPU and IO was seen on all 4 nodes in this cluster, and .parquet files being created in the expected path.
>>> 
>>> In however in the last two hours or so, all nodes show near zero CPU or IO, and the Last Modified date on the .parquet have not changed. Same time delay shown in the Last Progress column in the active fragment profile.
>>> 
>>> What approach can I take to determine what is happening (or not)?
> 

Re: Monitoring long / stuck CTAS

Posted by Andries Engelbrecht <ae...@maprtech.com>.
I have used a single CTAS to create Parquet tables with 1.5B rows.

It did consume a lot of heap memory on the Drillbits and I had to increase the heap size. Check your logs to see if you are running out of heap memory.

I used 128MB parquet block size.

This was with Drill 0.9, so I’m sure 1.0 will be better in this regard.
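
If it helps, the block size can be set per session before running the 
CTAS; the option name below is the standard Drill setting (134217728 
bytes = 128MB):

~~~
ALTER SESSION SET `store.parquet.block-size` = 134217728;
~~~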

—Andries



On May 28, 2015, at 8:43 AM, Matt <bs...@gmail.com> wrote:

> Is 300MM records too much to do in a single CTAS statement?
> 
> After almost 23 hours I killed the query (^c) and it returned:
> 
> ~~~
> +-----------+----------------------------+
> | Fragment  | Number of records written  |
> +-----------+----------------------------+
> | 1_20      | 13568824                   |
> | 1_15      | 12411822                   |
> | 1_7       | 12470329                   |
> | 1_12      | 13693867                   |
> | 1_5       | 13292136                   |
> | 1_18      | 13874321                   |
> | 1_16      | 13303094                   |
> | 1_9       | 13639049                   |
> | 1_10      | 13698380                   |
> | 1_22      | 13501073                   |
> | 1_8       | 13533736                   |
> | 1_2       | 13549402                   |
> | 1_21      | 13665183                   |
> | 1_0       | 13544745                   |
> | 1_4       | 13532957                   |
> | 1_19      | 12767473                   |
> | 1_17      | 13670687                   |
> | 1_13      | 13469515                   |
> | 1_23      | 12517632                   |
> | 1_6       | 13634338                   |
> | 1_14      | 13611322                   |
> | 1_3       | 13061900                   |
> | 1_11      | 12760978                   |
> +-----------+----------------------------+
> 23 rows selected (82294.854 seconds)
> ~~~
> 
> The sum of those record counts is  306,772,763 which is close to the  320,843,454 in the source file:
> 
> ~~~
> 0: jdbc:drill:zk=es05:2181> select count(*)  FROM root.`sample_201501.dat`;
> +------------+
> |   EXPR$0   |
> +------------+
> | 320843454  |
> +------------+
> 1 row selected (384.665 seconds)
> ~~~
> 
> 
> It represents one month of data, 4 key columns and 38 numeric measure columns, which could also be partitioned daily. The test here was to create monthly Parquet files to see how the min/max stats on Parquet chunks help with range select performance.
> 
> Instead of a small number of large monthly RDBMS tables, I am attempting to determine how many Parquet files should be used with Drill / HDFS.
> 
> 
> 
> On 27 May 2015, at 15:17, Matt wrote:
> 
>> Attempting to create a Parquet backed table with a CTAS from an 44GB tab delimited file in HDFS. The process seemed to be running, as CPU and IO was seen on all 4 nodes in this cluster, and .parquet files being created in the expected path.
>> 
>> In however in the last two hours or so, all nodes show near zero CPU or IO, and the Last Modified date on the .parquet have not changed. Same time delay shown in the Last Progress column in the active fragment profile.
>> 
>> What approach can I take to determine what is happening (or not)?


Re: Monitoring long / stuck CTAS

Posted by Matt <bs...@gmail.com>.
Is 300MM records too much to do in a single CTAS statement?

After almost 23 hours I killed the query (^c) and it returned:

~~~
+-----------+----------------------------+
| Fragment  | Number of records written  |
+-----------+----------------------------+
| 1_20      | 13568824                   |
| 1_15      | 12411822                   |
| 1_7       | 12470329                   |
| 1_12      | 13693867                   |
| 1_5       | 13292136                   |
| 1_18      | 13874321                   |
| 1_16      | 13303094                   |
| 1_9       | 13639049                   |
| 1_10      | 13698380                   |
| 1_22      | 13501073                   |
| 1_8       | 13533736                   |
| 1_2       | 13549402                   |
| 1_21      | 13665183                   |
| 1_0       | 13544745                   |
| 1_4       | 13532957                   |
| 1_19      | 12767473                   |
| 1_17      | 13670687                   |
| 1_13      | 13469515                   |
| 1_23      | 12517632                   |
| 1_6       | 13634338                   |
| 1_14      | 13611322                   |
| 1_3       | 13061900                   |
| 1_11      | 12760978                   |
+-----------+----------------------------+
23 rows selected (82294.854 seconds)
~~~

The sum of those record counts is 306,772,763, which is close to the 
320,843,454 in the source file:

~~~
0: jdbc:drill:zk=es05:2181> select count(*)  FROM 
root.`sample_201501.dat`;
+------------+
|   EXPR$0   |
+------------+
| 320843454  |
+------------+
1 row selected (384.665 seconds)
~~~


It represents one month of data, 4 key columns and 38 numeric measure 
columns, which could also be partitioned daily. The test here was to 
create monthly Parquet files to see how the min/max stats on Parquet 
chunks help with range select performance.

Instead of a small number of large monthly RDBMS tables, I am attempting 
to determine how many Parquet files should be used with Drill / HDFS.
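
As a sketch of the daily split idea (the writable target workspace, the 
output table name, and the assumption that the first text column carries 
the date are all hypothetical; each day would get its own statement):

~~~
CREATE TABLE dfs.tmp.`sample_20150101` AS
SELECT * FROM root.`sample_201501.dat` WHERE columns[0] = '2015-01-01';
~~~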



On 27 May 2015, at 15:17, Matt wrote:

> Attempting to create a Parquet backed table with a CTAS from an 44GB 
> tab delimited file in HDFS. The process seemed to be running, as CPU 
> and IO was seen on all 4 nodes in this cluster, and .parquet files 
> being created in the expected path.
>
> In however in the last two hours or so, all nodes show near zero CPU 
> or IO, and the Last Modified date on the .parquet have not changed. 
> Same time delay shown in the Last Progress column in the active 
> fragment profile.
>
> What approach can I take to determine what is happening (or not)?