Posted to common-dev@hadoop.apache.org by David Bowen <db...@yahoo-inc.com> on 2006/05/01 04:43:10 UTC
Re: C API for Hadoop DFS
I'm curious about error handling.
Do dfsConnect and dfsOpenFile return NULL on failure?
Shouldn't dfsSeek, dfsRename, dfsCreateDirectory and
dfsSetWorkingDirectory each have a return value to indicate success or
failure? Or are they assumed to never fail?
- David
Re: C API for Hadoop DFS
Posted by Sameer Paranjpye <sa...@yahoo-inc.com>.
The common convention in C APIs is to have a return value that indicates
failure, usually -1 or NULL. The caller checks errno only if the return
value indicates failure.
This appears to be the convention followed in most places in the published
API. Functions returning 'void', like 'dfsCreateDirectory', should
probably return ints, with 0 indicating success and -1 indicating failure.
I also notice 'bool' being returned by a couple of functions; these
should return ints as well.
Doug Cutting wrote:
> The spec says:
>
> /** All APIs set errno to meaningful values */
>
> So callers should always check errno after each call. Whether this is
> the best way to handle errors in C can be debated, but an error
> mechanism was in fact specified.
>
> Doug
>
> Konstantin Shvachko wrote:
>
>> I think this is a very important issue raised by David.
>>
>> IMO __ALL__ functions should return an integer value indicating
>> success (=0) or failure (<0).
>> Unless we want to use C-style exceptions, we won't be able to
>> identify what went wrong, if anything.
>> NULL or bool is not enough in most cases, since we need to distinguish
>> e.g. between timeout (when we retry) and "file not found" cases.
>> The actual return objects should be passed as output parameters.
>> E.g.
>> dfsFS dfsConnect(char *host, tPort port);
>> will become
>> tCompletionCode dfsConnect(char *host, tPort port, dfsFS *fileSystem);
>> where tCompletionCode could be an integer for now. Or we can define a
>> structure
>> { int errCode; char *errDescription; }
>> to return the actual error descriptions along with the error code.
>>
>> --Konstantin
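Konstantin's proposal above could be sketched as follows. The type names come from his message, but the typedefs, struct details, and stub body are assumptions; note that the output parameter needs an extra level of indirection (`dfsFS *`) so the function can hand the handle back:

```c
/* Sketch of the completion-code proposal quoted above. tCompletionCode,
 * dfsFS, and tPort are names from the thread; everything else here is an
 * assumption for illustration. */
#include <stddef.h>

typedef unsigned short tPort;           /* assumed */
typedef struct dfsFS_internal *dfsFS;   /* opaque handle, assumed */

typedef struct {
    int errCode;            /* 0 on success, <0 on failure */
    char *errDescription;   /* human-readable error description */
} tCompletionCode;

/* The handle comes back through an output parameter, so the return value
 * is free to carry a rich error code instead of just NULL-or-not. */
tCompletionCode dfsConnect(const char *host, tPort port, dfsFS *fileSystem) {
    tCompletionCode cc = { 0, "ok" };
    (void)port;
    if (host == NULL || fileSystem == NULL) {
        cc.errCode = -1;
        cc.errDescription = "bad argument";
        return cc;
    }
    *fileSystem = NULL;  /* a real implementation would connect and allocate */
    return cc;
}
```

This lets the caller distinguish, say, a timeout from "file not found" directly from the returned code, which is exactly the distinction Konstantin asks for.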
>>
>> Devaraj Das wrote:
>>
>>>> Do dfsConnect and dfsOpenFile return NULL on failure?
>>>
>>> Yes.
>>>
>>>> Shouldn't dfsSeek, dfsRename, dfsCreateDirectory and
>>>> dfsSetWorkingDirectory each have a return value to indicate success
>>>> or failure? Or are they assumed to never fail?
>>>
>>> Yes, these functions should have return values. I will update the API
>>> spec. Thanks for pointing this out.
Re: C API for Hadoop DFS
Posted by Sameer Paranjpye <sa...@yahoo-inc.com>.
Negative numbers work fine when the return value is an int. But what do
you do when returning a pointer? Either you have an integer return value
and a pointer-to-pointer as an output parameter, or you return NULL and
indicate the error via errno.
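The two alternatives could be sketched like this; the names, suffixes, and stub bodies are invented for illustration:

```c
/* The two options for pointer-returning functions, sketched with assumed
 * names and placeholder bodies. */
#include <errno.h>
#include <stddef.h>

typedef struct dfsFile_internal *dfsFile;   /* opaque handle, assumed */

/* Style 1: return the pointer directly; NULL plus errno signals failure. */
dfsFile dfsOpenFile_v1(const char *path) {
    if (path == NULL) {
        errno = EINVAL;
        return NULL;
    }
    return (dfsFile)0x1;   /* placeholder for a real handle */
}

/* Style 2: integer status, pointer-to-pointer output parameter. */
int dfsOpenFile_v2(const char *path, dfsFile *out) {
    if (path == NULL || out == NULL)
        return -1;           /* the return code itself names the failure */
    *out = (dfsFile)0x1;     /* placeholder for a real handle */
    return 0;
}
```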
>> -----Original Message-----
>> From: Konstantin Shvachko [mailto:shv@yahoo-inc.com]
>> Sent: Wednesday, May 03, 2006 2:40 AM
>> To: hadoop-dev@lucene.apache.org
>> Subject: Re: C API for Hadoop DFS
>>
>> I don't think errno is a particularly good idea, for several reasons.
>> It is not common for application libraries to set errno codes.
>> If a system library function uses errno and we overwrite its value to
>> return something dfs-related, the library function's behavior becomes
>> unpredictable. This could be hard to debug.
>> We have a JNI layer between our C library and Java, which also might
>> generate errnos, overwriting the values we were trying to bring back
>> from Java.
>>
>> --Konstantin
Re: C API for Hadoop DFS
Posted by Sameer Paranjpye <sa...@yahoo-inc.com>.
All that said, I don't mean to be religious about errno. Using negative
return values is a perfectly reasonable approach as well. If folks feel
strongly about it I'm happy to go along ...
Sameer Paranjpye wrote:
> Errno is not a global and is thread safe in all modern libc
> implementations. If you compile with -D_REENTRANT you'll be just fine.
> There is a separate errno for each thread.
Re: C API for Hadoop DFS
Posted by Sameer Paranjpye <sa...@yahoo-inc.com>.
Errno is not a global and is thread-safe in all modern libc
implementations. If you compile with -D_REENTRANT you'll be just fine.
There is a separate errno for each thread.
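This claim is easy to check. The sketch below (POSIX threads, compile with -pthread) has two threads each set errno and read it back; with a thread-local errno, each sees only its own value:

```c
/* A small check of the claim above: with a threaded libc, errno is
 * thread-local, so the value one thread sets is not visible to another. */
#include <errno.h>
#include <pthread.h>
#include <stdint.h>

static void *set_and_read_errno(void *arg) {
    errno = (int)(intptr_t)arg;      /* set this thread's errno */
    return (void *)(intptr_t)errno;  /* read it back in the same thread */
}

/* returns 1 if each thread saw exactly the value it set */
int errno_is_thread_local(void) {
    pthread_t t1, t2;
    void *r1, *r2;
    pthread_create(&t1, NULL, set_and_read_errno, (void *)(intptr_t)11);
    pthread_create(&t2, NULL, set_and_read_errno, (void *)(intptr_t)22);
    pthread_join(t1, &r1);
    pthread_join(t2, &r2);
    return (intptr_t)r1 == 11 && (intptr_t)r2 == 22;
}
```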
RE: C API for Hadoop DFS
Posted by Runping Qi <ru...@yahoo-inc.com>.
The errno approach has proved problematic in multi-threaded environments.
Returning an error code is better.
Runping
RE: C API for Hadoop DFS
Posted by Devaraj Das <dd...@yahoo-inc.com>.
Returning error as a negative number works as well. We initially decided to
go with errno since it's a standard in most I/O centric APIs.
Re: C API for Hadoop DFS
Posted by Eric Baldeschwieler <er...@yahoo-inc.com>.
I'd vote against errno, because I don't see why we need it. Why not
just return the error as a negative number? Adding a global just
complicates the code and introduces an opportunity for further error.
What am I missing?
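Eric's alternative could look like the sketch below. The DFS_* codes and the signature are invented here for illustration, not taken from any spec:

```c
/* Sketch of returning the error itself as a negative number, with no
 * global state. The error codes are invented for illustration. */
enum {
    DFS_OK       =  0,
    DFS_ETIMEOUT = -1,   /* caller may retry */
    DFS_ENOENT   = -2,   /* file not found */
    DFS_EINVAL   = -3    /* bad argument */
};

/* assumed signature; a negative return both signals and identifies failure */
int dfsSeek(long desiredPos) {
    if (desiredPos < 0)
        return DFS_EINVAL;
    /* a real implementation would move the file position here */
    return DFS_OK;
}
```

The caller can switch on the returned code directly, with no errno involved.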
RE: C API for Hadoop DFS
Posted by Devaraj Das <dd...@yahoo-inc.com>.
In our case, the components involved are the C API library, the JNI layer, and the Java APIs, and in all of these we have control over errno. For example, if a particular C API uses a third-party library function that might fail and set errno, we already know about it; depending on the error, we decide whether to proceed further in the API implementation or return an error to the client invoking the API. The same applies to the functions in the JNI library that the API implementation calls. In the Java world we deal with exceptions and don't bother about errno. So if a Java method, invoked through JNI from a C API, throws an exception, the C API implementation gets the exception object, sets a meaningful errno based on it, and returns -1 or NULL to signify that an error occurred. As I said earlier, this includes the case where the JNI function itself fails (for some reason such as out-of-memory).
As an aside, the JNI layer doesn't generate errno-s.
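To make the convention Devaraj describes concrete, here is a minimal C sketch of a DFS-style call that returns -1 on failure and sets errno only on that path. The function names and the simulated failure are illustrative, not the actual libhdfs implementation; the "Java side" is stood in for by a stub.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for the real Java-side call: fails when the
 * path is empty, mimicking a Java exception crossing the JNI boundary. */
static int fakeJavaMkdir(const char *path) {
    return (path != NULL && path[0] != '\0') ? 0 : -1;
}

/* Sketch of the convention under discussion: return 0 on success,
 * -1 on failure, and set errno to a meaningful value only on failure. */
int dfsCreateDirectory(const char *path) {
    if (fakeJavaMkdir(path) < 0) {
        errno = ENOENT;   /* translated from the (simulated) Java exception */
        return -1;
    }
    return 0;             /* success: errno is deliberately left untouched */
}
```

A caller would check the return value first and consult errno only when it indicates failure, which is the standard C library pattern.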
-----Original Message-----
From: Konstantin Shvachko [mailto:shv@yahoo-inc.com]
Sent: Wednesday, May 03, 2006 2:40 AM
To: hadoop-dev@lucene.apache.org
Subject: Re: C API for Hadoop DFS
Don't think errno is a particularly good idea for several reasons.
It is not common to set errno codes.
If a system library function uses errno, and we overwrite its value to
return
something dfs related, the library function behavior becomes unpredictable.
This could be hard to debug.
We have a JNI layer between our C library and Java, which also might
generate
errno-s overwriting the values we were trying to bring back from Java.
--Konstantin
Re: C API for Hadoop DFS
Posted by Konstantin Shvachko <sh...@yahoo-inc.com>.
I don't think errno is a particularly good idea, for several reasons.
It is not common for library code to set errno codes.
If a system library function uses errno and we overwrite its value to
return something DFS-related, the library function's behavior becomes
unpredictable. This could be hard to debug.
We also have a JNI layer between our C library and Java, which might
itself generate errno values, overwriting the ones we were trying to
bring back from Java.
--Konstantin
Doug Cutting wrote:
> The spec says:
>
> /** All APIs set errno to meaningful values */
>
> So callers should always check errno after each call. Whether this is
> the best way to handle errors in C can be debated, but an error
> mechanism was in fact specified.
>
> Doug
Re: C API for Hadoop DFS
Posted by David Bowen <db...@yahoo-inc.com>.
Doug Cutting wrote:
> The spec says:
>
> /** All APIs set errno to meaningful values */
>
> So callers should always check errno after each call. Whether this is
> the best way to handle errors in C can be debated, but an error
> mechanism was in fact specified.
I don't much like errno, but if we are using it, we should follow the
normal convention that applications check errno only when a return value
indicates that there was an error. That way it does not need to be
modified when there isn't one.
- David
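David's point can be sketched in a few lines of C. The stubs below are illustrative, not the real API: a successful call leaves errno untouched (per the usual C convention), so inspecting errno without first checking the return value can pick up a stale code from an earlier, unrelated failure.

```c
#include <errno.h>

/* Hypothetical DFS call that succeeds and, following the usual C
 * library convention, does not reset errno on success. */
static int dfsSeekStub(int fd, long offset) {
    (void)fd; (void)offset;
    return 0;   /* success; errno deliberately untouched */
}

/* Wrong: errno may hold a stale value from an earlier failure,
 * so this can report a phantom error after a successful seek. */
int seekFailedWrong(int fd, long offset) {
    dfsSeekStub(fd, offset);
    return errno != 0;
}

/* Right: errno is meaningful only when the return value signals failure. */
int seekFailedRight(int fd, long offset) {
    return dfsSeekStub(fd, offset) < 0;
}
```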
Re: C API for Hadoop DFS
Posted by Doug Cutting <cu...@apache.org>.
The spec says:
/** All APIs set errno to meaningful values */
So callers should always check errno after each call. Whether this is
the best way to handle errors in C can be debated, but an error
mechanism was in fact specified.
Doug
Konstantin Shvachko wrote:
> I think this a very important issue raised by David.
>
> IMO __ALL__ functions should return an integer value indicating success
> (=0) or failure (<0).
> Unless we want to use C style Exceptions, otherwise we won't be able to
> identify what went
> wrong if anything.
> NULL or bool is not enough in most cases, since we need to distinguish
> e.g. between
> timeout (when we retry) and "file not found" cases.
> The actual return objects should be passed as outputs parameters.
> E.g.
> dfsFS dfsConnect(char *host, tPort port);
> will become
> tCompletionCode dfsConnect(char *host, tPort port, dfsFS fileSystem );
> where tCompletionCode could be integer for now. Or we can define a
> structure
> { int errCode; char *errDescription; }
> to return the actual error descriptions along with the error code.
>
> --Konstantin
>
Re: C API for Hadoop DFS
Posted by Konstantin Shvachko <sh...@yahoo-inc.com>.
I think this is a very important issue raised by David.
IMO __ALL__ functions should return an integer value indicating success
(=0) or failure (<0).
Unless we want to use C-style exceptions, we won't be able to identify
what went wrong, if anything did.
NULL or bool is not enough in most cases, since we need to distinguish
e.g. between timeout (when we retry) and "file not found" cases.
The actual return objects should be passed as output parameters.
E.g.
dfsFS dfsConnect(char *host, tPort port);
will become
tCompletionCode dfsConnect(char *host, tPort port, dfsFS fileSystem );
where tCompletionCode could be integer for now. Or we can define a structure
{ int errCode; char *errDescription; }
to return the actual error descriptions along with the error code.
--Konstantin
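Konstantin's proposal can be sketched as follows. Everything here is illustrative: the completion codes, the opaque handle type, and the stub connection logic are made up to show the shape of the API, in which the return value distinguishes failure modes and the connection object comes back through an out parameter.

```c
#include <stddef.h>

typedef int tCompletionCode;   /* 0 = success, <0 = a specific failure */
typedef unsigned short tPort;

/* Opaque filesystem handle, sketched as a pointer to a private struct. */
typedef struct dfsFS_internal { int handle; } *dfsFS;

#define DFS_OK         0
#define DFS_E_TIMEOUT -1   /* distinguishable: the caller may retry */
#define DFS_E_NOHOST  -2   /* distinguishable: the caller should give up */

static struct dfsFS_internal theFs = { 42 };   /* stand-in connection */

/* Out-parameter style: the completion code reports what went wrong,
 * and the connection object is returned through the last argument. */
tCompletionCode dfsConnect(const char *host, tPort port, dfsFS *fs) {
    (void)port;
    if (host == NULL || *host == '\0') return DFS_E_NOHOST;
    *fs = &theFs;
    return DFS_OK;
}
```

The alternative structure Konstantin mentions, { int errCode; char *errDescription; }, would extend this by carrying a human-readable message alongside the code.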
RE: C API for Hadoop DFS
Posted by Devaraj Das <dd...@yahoo-inc.com>.
> Do dfsConnect and dfsOpenFile return NULL on failure?
Yes.
> Shouldn't dfsSeek, dfsRename, dfsCreateDirectory and
> dfsSetWorkingDirectory each have a return value to indicate success or
> failure? Or are they assumed to never fail?
Yes, these functions should have return values; I will update the API spec.
Thanks for pointing this out.
-----Original Message-----
From: David Bowen [mailto:dbowen@yahoo-inc.com]
Sent: Monday, May 01, 2006 8:13 AM
To: hadoop-dev@lucene.apache.org
Subject: Re: C API for Hadoop DFS
I'm curious about error handling.
Do dfsConnect and dfsOpenFile return NULL on failure?
Shouldn't dfsSeek, dfsRename, dfsCreateDirectory and
dfsSetWorkingDirectory each have a return value to indicate success or
failure? Or are they assumed to never fail?
- David