Posted to user@drill.apache.org by mi...@nomura.com on 2015/10/13 11:06:13 UTC

Stop Drill querying .tmp files

Hi,

I am trying to query data ingested by Flume in real time. However, Flume writes out data to a file ending in .tmp and then renames it once it has completed its writes. If you run a Drill query on a large data set and a .tmp file is renamed by Flume whilst the query is running, the query bombs out. I was looking for a way to specify a file exclusion pattern with a regex or something similar, but right now this doesn’t seem possible. Just making Drill exclude any files ending in .tmp or starting with a . or a _ would be very useful for this reason.

I have seen the following JIRAs relating to this issue:

https://issues.apache.org/jira/browse/DRILL-2424 - closed as a duplicate

https://issues.apache.org/jira/browse/DRILL-1131 - still open but related to Parquet

Is there another way to achieve this without having to wait for a change to the Drill code base? I wrote a custom Hive class to achieve the same functionality, but I am not sure this is possible in Drill.

Thanks,
Mike


This e-mail (including any attachments) is private and confidential, may contain proprietary or privileged information and is intended for the named recipient(s) only. Unintended recipients are strictly prohibited from taking action on the basis of information in this e-mail and must contact the sender immediately, delete this e-mail (and all attachments) and destroy any hard copies. Nomura will not accept responsibility or liability for the accuracy or completeness of, or the presence of any virus or disabling code in, this e-mail. If verification is sought please request a hard copy. Any reference to the terms of executed transactions should be treated as preliminary only and subject to formal written confirmation by Nomura. Nomura reserves the right to retain, monitor and intercept e-mail communications through its networks (subject to and in accordance with applicable laws). No confidentiality or privilege is waived or lost by Nomura by any mistransmission of this e-mail. Any reference to "Nomura" is a reference to any entity in the Nomura Holdings, Inc. group. Please read our Electronic Communications Legal Notice which forms part of this e-mail: http://www.Nomura.com/email_disclaimer.htm


Re: Stop Drill querying .tmp files

Posted by Daniel Barclay <db...@maprtech.com>.
Instead of defining a hard-coded set of prefixes, suffixes, and/or
patterns, can we give users some kind of configuration parameter
somewhere?

Perhaps the file-system plug-in should have a configuration parameter
that is a list of "glob" or regular-expression patterns specifying
names to ignore, with a default (bootstrap) setting that covers
common cases.

Daniel

Mehant Baid wrote:
> I addressed the issue mentioned in DRILL-1131, ignoring files starting with an underscore and dot, this was implicitly added as part of drop table support. However did not notice the additional file type that needs to be handled (files ending with .tmp). If this is a common use case then I can create a trivial patch to do this.
>
> Thanks
> Mehant


-- 
Daniel Barclay
MapR Technologies
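
Daniel's suggestion above (a configurable list of glob patterns naming files to ignore, with a default that covers the common cases) can be illustrated with a small sketch. This is not Drill code and does not reflect any existing Drill option. The names DEFAULT_IGNORE_PATTERNS, is_ignored and visible_files are hypothetical, and the default list is only an assumption drawn from the cases raised in this thread (names starting with a dot or an underscore, and names ending in .tmp).

```python
from fnmatch import fnmatchcase

# Hypothetical default ignore list (an assumption, not a Drill setting):
# hidden files, underscore-prefixed files, and in-progress .tmp files.
DEFAULT_IGNORE_PATTERNS = [".*", "_*", "*.tmp"]


def is_ignored(file_name, patterns=DEFAULT_IGNORE_PATTERNS):
    """Return True if the bare file name matches any glob-style ignore pattern."""
    # fnmatchcase is used so matching is case-sensitive on every platform.
    return any(fnmatchcase(file_name, pattern) for pattern in patterns)


def visible_files(file_names, patterns=DEFAULT_IGNORE_PATTERNS):
    """Keep only the names a scan would read under the given ignore list."""
    return [name for name in file_names if not is_ignored(name, patterns)]
```

Passing a different patterns list (for example only "*.tmp") stands in for a user overriding the default, which is the flexibility a hard-coded set of prefixes and suffixes would not offer.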

Re: Stop Drill querying .tmp files

Posted by Mehant Baid <ba...@gmail.com>.
I addressed the issue mentioned in DRILL-1131 (ignoring files starting
with an underscore or a dot); this was implicitly added as part of drop
table support. However, I did not notice the additional file type that
needs to be handled (files ending with .tmp). If this is a common use
case, I can create a trivial patch to do this.

Thanks
Mehant
On 10/13/15 1:48 PM, Steven Phillips wrote:
> DRILL-2424 has a comment from Mehant that this should be fixed, but that
> there was some sort of merge conflict. Was this ever resolved? Or a new
> jira filed?


Re: Stop Drill querying .tmp files

Posted by Steven Phillips <st...@dremio.com>.
DRILL-2424 has a comment from Mehant that this should be fixed, but that
there was some sort of merge conflict. Was this ever resolved? Or a new
jira filed?

On Tue, Oct 13, 2015 at 10:13 AM, Rajkumar Singh <rs...@maprtech.com>
wrote:

> There is related jira already filed
> https://issues.apache.org/jira/browse/DRILL-2799

Re: Stop Drill querying .tmp files

Posted by Rajkumar Singh <rs...@maprtech.com>.
There is a related JIRA already filed: https://issues.apache.org/jira/browse/DRILL-2799

> On 13-Oct-2015, at 10:36 PM, Christopher Matta <cm...@mapr.com> wrote:
> 
> I agree, would someone in more of a leadership position care to comment if
> this warrants an enhancement Jira?


Re: Stop Drill querying .tmp files

Posted by Christopher Matta <cm...@mapr.com>.
I agree. Would someone in more of a leadership position care to comment on
whether this warrants an enhancement Jira?

On Tuesday, October 13, 2015, <mi...@nomura.com> wrote:

> Thanks Chris, I have tested this and it works well. I think it would still
> be nice to be able to set an exclusion pattern in the workspace so that you
> don't have to code each query with this in mind.

-- 
Chris Matta
cmatta@mapr.com
215-701-3146

RE: Stop Drill querying .tmp files

Posted by mi...@nomura.com.
Thanks Chris, I have tested this and it works well. I think it would still be nice to be able to set an exclusion pattern in the workspace so that you don't have to code each query with this in mind.



-----Original Message-----
From: Christopher Matta [mailto:cmatta@mapr.com] 
Sent: 13 October 2015 14:21
To: user@drill.apache.org
Subject: Re: Stop Drill querying .tmp files

Drill respects a file *inclusion* pattern, so you could build a view sort of like:

select * from dfs.workspace.`dirname/*.csv`;

​



Re: Stop Drill querying .tmp files

Posted by Christopher Matta <cm...@mapr.com>.
Drill respects a file *inclusion* pattern, so you could build a view sort
of like:

select * from dfs.workspace.`dirname/*.csv`;
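
A minimal sketch of the view suggested above, assuming the workspace is writable and that Flume renames finished files so they end in .csv. The names dfs.workspace, dirname and flume_completed are placeholders for illustration, not anything confirmed in this thread:

```sql
-- Hypothetical names: dfs.workspace, dirname and flume_completed are placeholders.
-- The glob selects only files ending in .csv, so in-progress *.tmp files are not matched.
CREATE OR REPLACE VIEW dfs.workspace.flume_completed AS
SELECT * FROM dfs.workspace.`dirname/*.csv`;

-- Later queries go through the view instead of repeating the file pattern.
SELECT COUNT(*) FROM dfs.workspace.flume_completed;
```

This keeps the inclusion pattern in one place, but it remains a per-view workaround rather than a workspace-level exclusion setting.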

​

Chris Matta
cmatta@mapr.com
215-701-3146

On Tue, Oct 13, 2015 at 5:09 AM, <mi...@nomura.com> wrote:

> FYI - by real time I mean data files which Flume has finished writing
> to...so near real time!

RE: Stop Drill querying .tmp files

Posted by mi...@nomura.com.
FYI - by real time I mean data files which Flume has finished writing to...so near real time!



