Posted to user@drill.apache.org by Sean Hsuan-Yi Chu <hs...@usc.edu> on 2015/11/16 19:45:20 UTC

Proposal for Skipping Records

Hi all,
We have worked on coming up with a design document on this topic, which focuses
on external design. Thanks to Neeraja for summarizing it in the document below:

https://docs.google.com/document/d/1D4mDS-N722MZtkeYGSJbY-wUHG5E8IMT9rIMk1NHHGA/edit

Please help take a look and offer some feedback.

Re: Proposal for Skipping Records

Posted by Khurram Faraaz <kf...@maprtech.com>.
Agree with Julian.
Users definitely should not have to interpret failure scenarios (i.e.
warnings or errors) by having to look at Exceptions in the logs.

On Mon, Nov 16, 2015 at 3:07 PM, Julian Hyde <jh...@apache.org> wrote:

> Fair enough.
>
> Remember that end users don’t (in general) write Java functions and don’t
> know what exceptions are. If your intent is to write a specification, you
> should describe in SQL terms what are error conditions for the built-in
> operators.
>
> > On Nov 16, 2015, at 3:01 PM, Sean Hsuan-Yi Chu <hs...@usc.edu> wrote:
> >
> > It is defined with respect to the behavior of function evaluation. If a
> > function evaluation fails (throwing exceptions), we then considered
> > something bad with the input record of this function.
> >
> > I agree people might have different beliefs on the definition. However,
> > from the aspect of users' experience, they could just proceed and see the
> > different types of errors at the log, which helps them judge whether the
> > failure is tolerable or not.
> >
> > On Mon, Nov 16, 2015 at 11:56 AM, Julian Hyde <jh...@apache.org> wrote:
> >
> >> It would be useful if you could describe the different ways that a record
> >> can be “bad”. IIRC the SQL standard divides the conditions into errors and
> >> warnings. Examples of a warning would be a string column that is truncated
> >> because it is too large for a varchar(20), or numeric underflow when you
> >> add 10.00001 to 100000. Examples of errors would be divide-by-zero or
> >> inserting a NULL value into a column declared NOT NULL.
> >>
> >> Maybe Drill has a different set of error and warning conditions than this
> >> (but probably not THAT different). But it would be useful to spell them
> >> out. And it would be useful to be able to treat “error” and “warning”
> >> conditions differently.
> >>
> >> Julian
> >>
> >>
> >>> On Nov 16, 2015, at 10:45 AM, Sean Hsuan-Yi Chu <hs...@usc.edu> wrote:
> >>>
> >>> Hi all,
> >>> We have worked on coming up a design document on this topic, which focuses
> >>> on external design. Thanks Neeraja for summarizing a document as below:
> >>>
> >>> https://docs.google.com/document/d/1D4mDS-N722MZtkeYGSJbY-wUHG5E8IMT9rIMk1NHHGA/edit
> >>>
> >>> Please help take a look and offer some feedback.
> >>
> >>
>
>

Re: Proposal for Skipping Records

Posted by Julian Hyde <jh...@apache.org>.
Fair enough.

Remember that end users don’t (in general) write Java functions and don’t know what exceptions are. If your intent is to write a specification, you should describe in SQL terms what the error conditions are for the built-in operators.

> On Nov 16, 2015, at 3:01 PM, Sean Hsuan-Yi Chu <hs...@usc.edu> wrote:
> 
> It is defined with respect to the behavior of function evaluation. If a
> function evaluation fails (throwing exceptions), we then considered
> something bad with the input record of this function.
> 
> I agree people might have different beliefs on the definition. However,
> from the aspect of users' experience, they could just proceed and see the
> different types of errors at the log, which helps them judge whether the
> failure is tolerable or not.
> 
> On Mon, Nov 16, 2015 at 11:56 AM, Julian Hyde <jh...@apache.org> wrote:
> 
>> It would be useful if you could describe the different ways that a record
>> can be “bad”. IIRC the SQL standard divides the conditions into errors and
>> warnings. Examples of a warning would be a string column that is truncated
>> because it is too large for a varchar(20), or numeric underflow when you
>> add 10.00001 to 100000. Examples of errors would be divide-by-zero or
>> inserting a NULL value into a column declared NOT NULL.
>> 
>> Maybe Drill has a different set of error and warning conditions than this
>> (but probably not THAT different). But it would be useful to spell them
>> out. And it would be useful to be able to treat “error” and “warning”
>> conditions differently.
>> 
>> Julian
>> 
>> 
>>> On Nov 16, 2015, at 10:45 AM, Sean Hsuan-Yi Chu <hs...@usc.edu> wrote:
>>> 
>>> Hi all,
>>> We have worked on coming up a design document on this topic, which focuses
>>> on external design. Thanks Neeraja for summarizing a document as below:
>>>
>>> https://docs.google.com/document/d/1D4mDS-N722MZtkeYGSJbY-wUHG5E8IMT9rIMk1NHHGA/edit
>>> 
>>> Please help take a look and offer some feedback.
>> 
>> 


Re: Proposal for Skipping Records

Posted by Sean Hsuan-Yi Chu <hs...@usc.edu>.
It is defined with respect to the behavior of function evaluation. If a
function evaluation fails (by throwing an exception), we consider
the input record of that function to be bad.

I agree people might have different beliefs about the definition. However,
from the user-experience perspective, they could just proceed and see the
different types of errors in the log, which helps them judge whether the
failure is tolerable or not.
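
For illustration only, the definition above can be sketched in a few lines of Python. This is not Drill code and not the design in the linked document; the function name evaluate_skipping_bad_records, the logger name, and the sample rows are all hypothetical. The sketch treats a record as bad exactly when evaluating a function on it throws, skips that record, and logs the failure by exception type so that a user could judge afterwards whether the failures are tolerable.

```python
import logging
from collections import Counter

# Hypothetical logger name, used only for this sketch.
logger = logging.getLogger("skip_records_sketch")


def evaluate_skipping_bad_records(records, func):
    """Apply func to each record; skip and count records where func raises."""
    results = []
    failures = Counter()
    for index, record in enumerate(records):
        try:
            results.append(func(record))
        except Exception as exc:
            # A thrown exception is what marks the input record as "bad".
            failures[type(exc).__name__] += 1
            logger.warning("skipped record %d (%r): %s", index, record, exc)
    return results, failures


# Made-up input: "abc" fails to parse, "0" causes a division by zero.
rows = ["10", "abc", "0", "4"]
values, skipped = evaluate_skipping_bad_records(rows, lambda s: 100 / int(s))
```

Here values holds the results for the two good rows and skipped holds one count per exception type, which is roughly the information the log would give a user under this definition.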

On Mon, Nov 16, 2015 at 11:56 AM, Julian Hyde <jh...@apache.org> wrote:

> It would be useful if you could describe the different ways that a record
> can be “bad”. IIRC the SQL standard divides the conditions into errors and
> warnings. Examples of a warning would be a string column that is truncated
> because it is too large for a varchar(20), or numeric underflow when you
> add 10.00001 to 100000. Examples of errors would be divide-by-zero or
> inserting a NULL value into a column declared NOT NULL.
>
> Maybe Drill has a different set of error and warning conditions than this
> (but probably not THAT different). But it would be useful to spell them
> out. And it would be useful to be able to treat “error” and “warning”
> conditions differently.
>
> Julian
>
>
> > On Nov 16, 2015, at 10:45 AM, Sean Hsuan-Yi Chu <hs...@usc.edu> wrote:
> >
> > Hi all,
> > We have worked on coming up a design document on this topic, which focuses
> > on external design. Thanks Neeraja for summarizing a document as below:
> >
> > https://docs.google.com/document/d/1D4mDS-N722MZtkeYGSJbY-wUHG5E8IMT9rIMk1NHHGA/edit
> >
> > Please help take a look and offer some feedback.
>
>

Re: Proposal for Skipping Records

Posted by Julian Hyde <jh...@apache.org>.
It would be useful if you could describe the different ways that a record can be “bad”. IIRC the SQL standard divides the conditions into errors and warnings. Examples of a warning would be a string column that is truncated because it is too large for a varchar(20), or numeric underflow when you add 10.00001 to 100000. Examples of errors would be divide-by-zero or inserting a NULL value into a column declared NOT NULL.

Maybe Drill has a different set of error and warning conditions than this (but probably not THAT different). But it would be useful to spell them out. And it would be useful to be able to treat “error” and “warning” conditions differently.

Julian
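
As a rough illustration of treating the two kinds of condition differently, here is a minimal Python sketch. It is not Drill behavior and not taken from the SQL standard text; the names SqlError, Outcome, cast_varchar, divide, and run are hypothetical, and the classification (truncation as a warning, divide-by-zero as an error) simply follows the examples given above. A warning still produces a value and is only recorded, while an error means the record produces no value and is either skipped or fails the query.

```python
from dataclasses import dataclass, field


class SqlError(Exception):
    """Hypothetical error condition: the record cannot produce a value."""


@dataclass
class Outcome:
    """A computed value plus any warning conditions raised along the way."""
    value: object = None
    warnings: list = field(default_factory=list)


def cast_varchar(text, length):
    # Warning condition: the value is truncated but still usable.
    out = Outcome(value=text[:length])
    if len(text) > length:
        out.warnings.append(f"string truncated to {length} characters")
    return out


def divide(numerator, denominator):
    # Error condition: no meaningful value exists.
    if denominator == 0:
        raise SqlError("division by zero")
    return Outcome(value=numerator / denominator)


def run(records, op, skip_on_error=True):
    """Keep values, collect warnings, and skip (or re-raise) on errors."""
    kept, warnings, errors = [], [], []
    for rec in records:
        try:
            out = op(rec)
        except SqlError as exc:
            if not skip_on_error:
                raise
            errors.append((rec, str(exc)))
            continue
        kept.append(out.value)
        warnings.extend((rec, w) for w in out.warnings)
    return kept, warnings, errors


names, name_warnings, name_errors = run(
    ["short", "x" * 25], lambda s: cast_varchar(s, 20)
)
ratios, ratio_warnings, ratio_errors = run(
    [(1, 2), (1, 0)], lambda p: divide(*p)
)
```

With skip_on_error=False the same error would abort the run instead, so the skipping policy can differ for errors while warnings are always passed through.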


> On Nov 16, 2015, at 10:45 AM, Sean Hsuan-Yi Chu <hs...@usc.edu> wrote:
> 
> Hi all,
> We have worked on coming up a design document on this topic, which focuses
> on external design. Thanks Neeraja for summarizing a document as below:
> 
> https://docs.google.com/document/d/1D4mDS-N722MZtkeYGSJbY-wUHG5E8IMT9rIMk1NHHGA/edit
> 
> Please help take a look and offer some feedback.

