Posted to common-dev@hadoop.apache.org by David Bowen <db...@yahoo-inc.com> on 2006/05/01 04:43:10 UTC

Re: C API for Hadoop DFS

I'm curious about error handling. 

Do dfsConnect and dfsOpenFile return NULL on failure?

Shouldn't dfsSeek, dfsRename, dfsCreateDirectory and 
dfsSetWorkingDirectory each have a return value to indicate success or 
failure?  Or are they assumed to never fail?

- David



Re: C API for Hadoop DFS

Posted by Sameer Paranjpye <sa...@yahoo-inc.com>.
The common convention in C APIs is to have a return value that indicates 
failure, usually -1 or NULL. The caller checks errno only if the return 
value indicates failure.

This appears to be the convention followed in the published API, in most 
places. Functions returning 'void' like 'dfsCreateDirectory' should 
probably return ints with 0 indicating success and -1 indicating failure.

I also notice 'bool' being returned by a couple of functions; these 
should return ints as well.



Doug Cutting wrote:

> The spec says:
> 
> /** All APIs set errno to meaningful values */
> 
> So callers should always check errno after each call.  Whether this is 
> the best way to handle errors in C can be debated, but an error 
> mechanism was in fact specified.
> 
> Doug
> 
> Konstantin Shvachko wrote:
> 
>> I think this is a very important issue raised by David.
>>
>> IMO __ALL__ functions should return an integer value indicating
>> success (=0) or failure (<0).
>> Unless we use C-style exceptions, we won't be able to identify what
>> went wrong, if anything.
>> NULL or bool is not enough in most cases, since we need to
>> distinguish, e.g., between timeout (when we retry) and "file not
>> found" cases.
>> The actual return objects should be passed as output parameters.
>> E.g.
>> dfsFS dfsConnect(char *host, tPort port);
>> will become
>> tCompletionCode dfsConnect(char *host, tPort port, dfsFS fileSystem );
>> where tCompletionCode could be an integer for now. Or we can define a 
>> structure
>> { int errCode; char *errDescription; }
>> to return the actual error descriptions along with the error code.
>>
>> --Konstantin
>>
>> Devaraj Das wrote:
>>
>>>> Do dfsConnect and dfsOpenFile return NULL on failure?
>>>>   
>>>
>>>
>>> Yes.
>>>
>>>  
>>>
>>>> Shouldn't dfsSeek, dfsRename, dfsCreateDirectory and 
>>>> dfsSetWorkingDirectory each have a return value to indicate success 
>>>> or failure?  Or are they assumed to never fail?
>>>>   
>>>
>>>
>>> Yes these functions should have return values. I will update the API 
>>> spec.
>>> Thanks for pointing this out.
>>>
>>> -----Original Message-----
>>> From: David Bowen [mailto:dbowen@yahoo-inc.com]
>>> Sent: Monday, May 01, 2006 8:13 AM
>>> To: hadoop-dev@lucene.apache.org
>>> Subject: Re: C API for Hadoop DFS
>>>
>>>
>>> I'm curious about error handling.
>>> Do dfsConnect and dfsOpenFile return NULL on failure?
>>>
>>> Shouldn't dfsSeek, dfsRename, dfsCreateDirectory and 
>>> dfsSetWorkingDirectory each have a return value to indicate success 
>>> or failure?  Or are they assumed to never fail?
>>>
>>> - David
>>
> 

Re: C API for Hadoop DFS

Posted by Sameer Paranjpye <sa...@yahoo-inc.com>.
Negative numbers work fine when the return value is an int. But what do 
you do when returning a pointer? Either you have an integer return value 
and a pointer-to-pointer output parameter, or you return NULL and 
indicate the error via errno.





Eric Baldeschwieler wrote:
> I'd vote against errno, because I don't see why we need it.  Why not  
> just return the error as a negative number?  Adding a global just  
> complicates the code and introduces an opportunity for further error.
> 
> What am I missing?
> 
> On May 2, 2006, at 11:19 PM, Devaraj Das wrote:
> 
>> In our case, the components involved are the C API library, the JNI
>> layer, and the Java APIs. In all of these, we have control over errno.
>> For example, if a particular C API uses a third-party library function
>> that might return an error and hence set errno, we know about it
>> already. Depending on the error, we decide whether to proceed further
>> in the API implementation code or return an error to the client
>> invoking the API. This includes the functions in the JNI library which
>> the API implementation calls. In the Java world, we deal with
>> exceptions and don't bother about errno. So, for example, if a Java
>> method, invoked through JNI from a C API, throws an exception, the C
>> API implementation will get the exception object and, depending on
>> that, will set a meaningful errno and return -1 or NULL to signify
>> that an error occurred. As I said earlier, this includes the case
>> where the JNI function itself fails (for some reason like
>> out-of-memory).
>> As an aside, the JNI layer doesn't generate errno-s.
>>
>> -----Original Message-----
>> From: Konstantin Shvachko [mailto:shv@yahoo-inc.com]
>> Sent: Wednesday, May 03, 2006 2:40 AM
>> To: hadoop-dev@lucene.apache.org
>> Subject: Re: C API for Hadoop DFS
>>
I don't think errno is a particularly good idea, for several reasons.
It is not common for libraries to set their own errno codes.
If a system library function uses errno and we overwrite its value to
return something dfs-related, the library function's behavior becomes
unpredictable. This could be hard to debug.
We have a JNI layer between our C library and Java, which also might
generate errno-s, overwriting the values we were trying to bring back
from Java.
>>
>> --Konstantin

Re: C API for Hadoop DFS

Posted by Sameer Paranjpye <sa...@yahoo-inc.com>.
All that said, I don't mean to be religious about errno. Using negative 
return values is a perfectly reasonable approach as well. If folks feel 
strongly about it, I'm happy to go along ...



Sameer Paranjpye wrote:

> Errno is not a global and is thread safe in all modern libc 
> implementations. If you compile with -D_REENTRANT you'll be just fine. 
> There is a separate errno for each thread.
> 
> 
> 
> Runping Qi wrote:
> 
>> Errno approach proved to be problematic in multi-thread environments.
>> Returning an error code is better.
>>
>> Runping
>>
>>
>> -----Original Message-----
>> From: Devaraj Das [mailto:ddas@yahoo-inc.com]
>> Sent: Wednesday, May 03, 2006 10:32 PM
>> To: hadoop-dev@lucene.apache.org
>> Subject: RE: C API for Hadoop DFS
>>
>> Returning error as a negative number works as well. We initially 
>> decided to
>> go with errno since it's a standard in most I/O centric APIs.

Re: C API for Hadoop DFS

Posted by Sameer Paranjpye <sa...@yahoo-inc.com>.
Errno is not a global and is thread safe in all modern libc 
implementations. If you compile with -D_REENTRANT you'll be just fine. 
There is a separate errno for each thread.



Runping Qi wrote:

> Errno approach proved to be problematic in multi-thread environments.
> Returning an error code is better.
> 
> Runping

RE: C API for Hadoop DFS

Posted by Runping Qi <ru...@yahoo-inc.com>.
The errno approach has proved problematic in multi-threaded environments.
Returning an error code is better.

Runping


-----Original Message-----
From: Devaraj Das [mailto:ddas@yahoo-inc.com] 
Sent: Wednesday, May 03, 2006 10:32 PM
To: hadoop-dev@lucene.apache.org
Subject: RE: C API for Hadoop DFS

Returning error as a negative number works as well. We initially decided to
go with errno since it's a standard in most I/O centric APIs.



RE: C API for Hadoop DFS

Posted by Devaraj Das <dd...@yahoo-inc.com>.
Returning the error as a negative number works as well. We initially 
decided to go with errno since it's standard in most I/O-centric APIs.

-----Original Message-----
From: Eric Baldeschwieler [mailto:eric14@yahoo-inc.com] 
Sent: Thursday, May 04, 2006 9:45 AM
To: hadoop-dev@lucene.apache.org
Subject: Re: C API for Hadoop DFS

I'd vote against errno, because I don't see why we need it.  Why not  
just return the error as a negative number?  Adding a global just  
complicates the code and introduces an opportunity for further error.

What am I missing?




Re: C API for Hadoop DFS

Posted by Eric Baldeschwieler <er...@yahoo-inc.com>.
I'd vote against errno, because I don't see why we need it.  Why not  
just return the error as a negative number?  Adding a global just  
complicates the code and introduces an opportunity for further error.

What am I missing?

On May 2, 2006, at 11:19 PM, Devaraj Das wrote:

> In our case, the components involved are the C API library, the JNI
> layer, and the Java APIs. In all of these, we have control over errno.
> For example, if a particular C API uses a third-party library function
> that might return an error and hence set errno, we know about it
> already. Depending on the error, we decide whether to proceed further
> in the API implementation code or return an error to the client
> invoking the API. This includes the functions in the JNI library which
> the API implementation calls. In the Java world, we deal with
> exceptions and don't bother about errno. So, for example, if a Java
> method, invoked through JNI from a C API, throws an exception, the C
> API implementation will get the exception object and, depending on
> that, will set a meaningful errno and return -1 or NULL to signify
> that an error occurred. As I said earlier, this includes the case
> where the JNI function itself fails (for some reason like
> out-of-memory).
> As an aside, the JNI layer doesn't generate errno-s.
>
> -----Original Message-----
> From: Konstantin Shvachko [mailto:shv@yahoo-inc.com]
> Sent: Wednesday, May 03, 2006 2:40 AM
> To: hadoop-dev@lucene.apache.org
> Subject: Re: C API for Hadoop DFS
>
> Don't think errno is a particularly good idea for several reasons.
> It is not common to set errno codes.
> If a system library function uses errno, and we overwrite its value to
> return
> something dfs related, the library function behavior becomes  
> unpredictable.
> This could be hard to debug.
> We have a JNI layer between our C library and Java, which also might
> generate
> errno-s overwriting the values we were trying to bring back from Java.
>
> --Konstantin
>
> Doug Cutting wrote:
>
>> The spec says:
>>
>> /** All APIs set errno to meaningful values */
>>
>> So callers should always check errno after each call.  Whether  
>> this is
>> the best way to handle errors in C can be debated, but an error
>> mechanism was in fact specified.
>>
>> Doug
>>
>> Konstantin Shvachko wrote:
>>
>>> I think this a very important issue raised by David.
>>>
>>> IMO __ALL__ functions should return an integer value indicating
>>> success (=0) or failure (<0).
>>> Unless we want to use C style Exceptions, otherwise we won't be able
>>> to identify what went
>>> wrong if anything.
>>> NULL or bool is not enough in most cases, since we need to
>>> distinguish e.g. between
>>> timeout (when we retry) and "file not found" cases.
>>> The actual return objects should be passed as outputs parameters.
>>> E.g.
>>> dfsFS dfsConnect(char *host, tPort port);
>>> will become
>>> tCompletionCode dfsConnect(char *host, tPort port, dfsFS  
>>> fileSystem );
>>> where tCompletionCode could be an integer for now. Or we can define a
>>> structure
>>> { int errCode; char *errDescription; }
>>> to return the actual error descriptions along with the error code.
>>>
>>> --Konstantin
>>>
>>> Devaraj Das wrote:
>>>
>>>>> Do dfsConnect and dfsOpenFile return NULL on failure?
>>>>>
>>>>
>>>>
>>>> Yes.
>>>>
>>>>
>>>>
>>>>> Shouldn't dfsSeek, dfsRename, dfsCreateDirectory and
>>>>> dfsSetWorkingDirectory each have a return value to indicate  
>>>>> success
>>>>> or failure?  Or are they assumed to never fail?
>>>>>
>>>>
>>>>
>>>> Yes these functions should have return values. I will update the  
>>>> API
>>>> spec.
>>>> Thanks for pointing this out.
>>>>
>>>> -----Original Message-----
>>>> From: David Bowen [mailto:dbowen@yahoo-inc.com] Sent: Monday, May
>>>> 01, 2006 8:13 AM
>>>> To: hadoop-dev@lucene.apache.org
>>>> Subject: Re: C API for Hadoop DFS
>>>>
>>>>
>>>> I'm curious about error handling.
>>>> Do dfsConnect and dfsOpenFile return NULL on failure?
>>>>
>>>> Shouldn't dfsSeek, dfsRename, dfsCreateDirectory and
>>>> dfsSetWorkingDirectory each have a return value to indicate success
>>>> or failure?  Or are they assumed to never fail?
>>>>
>>>> - David
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>
>>
>>
>>
>
>


RE: C API for Hadoop DFS

Posted by Devaraj Das <dd...@yahoo-inc.com>.
In our case, the components involved are the C API library, JNI layer and
Java APIs. In all these, we have control over errno. For example, if a
particular C API uses a third-party library function that might return an error
and hence set errno, we know about it already. Depending on the error, we
take a decision whether to proceed further in the API implementation code or
return an error to the client invoking the API. This includes the functions
in the JNI library which the API implementation calls. In the Java world, we
deal with exceptions and don't bother about errno. So for example, if a Java
method, invoked through JNI from a C API, throws an exception, the C API
implementation will get the exception object and depending on that the API
implementation will set a meaningful errno and return a (-1) or NULL to
signify that an error occurred. As I said earlier, this includes the case
where the JNI function itself fails (for some reason like out-of-memory or
something).
As an aside, the JNI layer doesn't generate errno-s.

-----Original Message-----
From: Konstantin Shvachko [mailto:shv@yahoo-inc.com] 
Sent: Wednesday, May 03, 2006 2:40 AM
To: hadoop-dev@lucene.apache.org
Subject: Re: C API for Hadoop DFS

Don't think errno is a particularly good idea for several reasons.
It is not common to set errno codes.
If a system library function uses errno, and we overwrite its value to 
return
something dfs related, the library function behavior becomes unpredictable.
This could be hard to debug.
We have a JNI layer between our C library and Java, which also might 
generate
errno-s overwriting the values we were trying to bring back from Java.

--Konstantin

Doug Cutting wrote:

> The spec says:
>
> /** All APIs set errno to meaningful values */
>
> So callers should always check errno after each call.  Whether this is 
> the best way to handle errors in C can be debated, but an error 
> mechanism was in fact specified.
>
> Doug
>
> Konstantin Shvachko wrote:
>
>> I think this is a very important issue raised by David.
>>
>> IMO __ALL__ functions should return an integer value indicating 
>> success (=0) or failure (<0).
>> Unless we want to use C-style exceptions, we won't be able to
>> identify what went wrong, if anything.
>> NULL or bool is not enough in most cases, since we need to 
>> distinguish e.g. between
>> timeout (when we retry) and "file not found" cases.
>> The actual return objects should be passed as output parameters.
>> E.g.
>> dfsFS dfsConnect(char *host, tPort port);
>> will become
>> tCompletionCode dfsConnect(char *host, tPort port, dfsFS fileSystem );
>> where tCompletionCode could be an integer for now. Or we can define a
>> structure
>> { int errCode; char *errDescription; }
>> to return the actual error descriptions along with the error code.
>>
>> --Konstantin
>>
>> Devaraj Das wrote:
>>
>>>> Do dfsConnect and dfsOpenFile return NULL on failure?
>>>>   
>>>
>>>
>>> Yes.
>>>
>>>  
>>>
>>>> Shouldn't dfsSeek, dfsRename, dfsCreateDirectory and 
>>>> dfsSetWorkingDirectory each have a return value to indicate success 
>>>> or failure?  Or are they assumed to never fail?
>>>>   
>>>
>>>
>>> Yes these functions should have return values. I will update the API 
>>> spec.
>>> Thanks for pointing this out.
>>>
>>> -----Original Message-----
>>> From: David Bowen [mailto:dbowen@yahoo-inc.com] Sent: Monday, May 
>>> 01, 2006 8:13 AM
>>> To: hadoop-dev@lucene.apache.org
>>> Subject: Re: C API for Hadoop DFS
>>>
>>>
>>> I'm curious about error handling.
>>> Do dfsConnect and dfsOpenFile return NULL on failure?
>>>
>>> Shouldn't dfsSeek, dfsRename, dfsCreateDirectory and 
>>> dfsSetWorkingDirectory each have a return value to indicate success 
>>> or failure?  Or are they assumed to never fail?
>>>
>>> - David
>>>
>>>
>>>
>>>
>>>
>>>  
>>>
>>
>
>
>



Re: C API for Hadoop DFS

Posted by Konstantin Shvachko <sh...@yahoo-inc.com>.
Don't think errno is a particularly good idea for several reasons.
It is not common to set errno codes.
If a system library function uses errno, and we overwrite its value to 
return
something dfs related, the library function behavior becomes unpredictable.
This could be hard to debug.
We have a JNI layer between our C library and Java, which also might 
generate
errno-s overwriting the values we were trying to bring back from Java.

--Konstantin

Doug Cutting wrote:

> The spec says:
>
> /** All APIs set errno to meaningful values */
>
> So callers should always check errno after each call.  Whether this is 
> the best way to handle errors in C can be debated, but an error 
> mechanism was in fact specified.
>
> Doug
>
> Konstantin Shvachko wrote:
>
>> I think this is a very important issue raised by David.
>>
>> IMO __ALL__ functions should return an integer value indicating 
>> success (=0) or failure (<0).
>> Unless we want to use C-style exceptions, we won't be able to
>> identify what went wrong, if anything.
>> NULL or bool is not enough in most cases, since we need to 
>> distinguish e.g. between
>> timeout (when we retry) and "file not found" cases.
>> The actual return objects should be passed as output parameters.
>> E.g.
>> dfsFS dfsConnect(char *host, tPort port);
>> will become
>> tCompletionCode dfsConnect(char *host, tPort port, dfsFS fileSystem );
>> where tCompletionCode could be an integer for now. Or we can define a
>> structure
>> { int errCode; char *errDescription; }
>> to return the actual error descriptions along with the error code.
>>
>> --Konstantin
>>
>> Devaraj Das wrote:
>>
>>>> Do dfsConnect and dfsOpenFile return NULL on failure?
>>>>   
>>>
>>>
>>> Yes.
>>>
>>>  
>>>
>>>> Shouldn't dfsSeek, dfsRename, dfsCreateDirectory and 
>>>> dfsSetWorkingDirectory each have a return value to indicate success 
>>>> or failure?  Or are they assumed to never fail?
>>>>   
>>>
>>>
>>> Yes these functions should have return values. I will update the API 
>>> spec.
>>> Thanks for pointing this out.
>>>
>>> -----Original Message-----
>>> From: David Bowen [mailto:dbowen@yahoo-inc.com] Sent: Monday, May 
>>> 01, 2006 8:13 AM
>>> To: hadoop-dev@lucene.apache.org
>>> Subject: Re: C API for Hadoop DFS
>>>
>>>
>>> I'm curious about error handling.
>>> Do dfsConnect and dfsOpenFile return NULL on failure?
>>>
>>> Shouldn't dfsSeek, dfsRename, dfsCreateDirectory and 
>>> dfsSetWorkingDirectory each have a return value to indicate success 
>>> or failure?  Or are they assumed to never fail?
>>>
>>> - David
>>>
>>>
>>>
>>>
>>>
>>>  
>>>
>>
>
>
>


Re: C API for Hadoop DFS

Posted by David Bowen <db...@yahoo-inc.com>.
Doug Cutting wrote:
> The spec says:
>
> /** All APIs set errno to meaningful values */
>
> So callers should always check errno after each call.  Whether this is 
> the best way to handle errors in C can be debated, but an error 
> mechanism was in fact specified.
I don't much like errno, but if we are using it we should follow the 
normal convention that applications only check errno when a return value 
indicates that there was an error.  So it does not need to be modified 
when there isn't an error.

- David


Re: C API for Hadoop DFS

Posted by Doug Cutting <cu...@apache.org>.
The spec says:

/** All APIs set errno to meaningful values */

So callers should always check errno after each call.  Whether this is 
the best way to handle errors in C can be debated, but an error 
mechanism was in fact specified.

Doug

Konstantin Shvachko wrote:
> I think this is a very important issue raised by David.
> 
> IMO __ALL__ functions should return an integer value indicating success 
> (=0) or failure (<0).
> Unless we want to use C-style exceptions, we won't be able to
> identify what went wrong, if anything.
> NULL or bool is not enough in most cases, since we need to distinguish 
> e.g. between
> timeout (when we retry) and "file not found" cases.
> The actual return objects should be passed as output parameters.
> E.g.
> dfsFS dfsConnect(char *host, tPort port);
> will become
> tCompletionCode dfsConnect(char *host, tPort port, dfsFS fileSystem );
> where tCompletionCode could be an integer for now. Or we can define a 
> structure
> { int errCode; char *errDescription; }
> to return the actual error descriptions along with the error code.
> 
> --Konstantin
> 
> Devaraj Das wrote:
> 
>>> Do dfsConnect and dfsOpenFile return NULL on failure?
>>>   
>>
>> Yes.
>>
>>  
>>
>>> Shouldn't dfsSeek, dfsRename, dfsCreateDirectory and 
>>> dfsSetWorkingDirectory each have a return value to indicate success 
>>> or failure?  Or are they assumed to never fail?
>>>   
>>
>> Yes these functions should have return values. I will update the API 
>> spec.
>> Thanks for pointing this out.
>>
>> -----Original Message-----
>> From: David Bowen [mailto:dbowen@yahoo-inc.com] Sent: Monday, May 01, 
>> 2006 8:13 AM
>> To: hadoop-dev@lucene.apache.org
>> Subject: Re: C API for Hadoop DFS
>>
>>
>> I'm curious about error handling.
>> Do dfsConnect and dfsOpenFile return NULL on failure?
>>
>> Shouldn't dfsSeek, dfsRename, dfsCreateDirectory and 
>> dfsSetWorkingDirectory each have a return value to indicate success or 
>> failure?  Or are they assumed to never fail?
>>
>> - David
>>
>>
>>
>>
>>
>>  
>>
> 

Re: C API for Hadoop DFS

Posted by Konstantin Shvachko <sh...@yahoo-inc.com>.
I think this is a very important issue raised by David.

IMO __ALL__ functions should return an integer value indicating success 
(=0) or failure (<0).
Unless we want to use C-style exceptions, we won't be able to identify
what went wrong, if anything.
NULL or bool is not enough in most cases, since we need to distinguish 
e.g. between
timeout (when we retry) and "file not found" cases.
The actual return objects should be passed as output parameters.
E.g.
dfsFS dfsConnect(char *host, tPort port);
will become
tCompletionCode dfsConnect(char *host, tPort port, dfsFS fileSystem );
where tCompletionCode could be an integer for now. Or we can define a structure
{ int errCode; char *errDescription; }
to return the actual error descriptions along with the error code.

--Konstantin

Devaraj Das wrote:

>>Do dfsConnect and dfsOpenFile return NULL on failure?
>>    
>>
>Yes.
>
>  
>
>>Shouldn't dfsSeek, dfsRename, dfsCreateDirectory and 
>>dfsSetWorkingDirectory each have a return value to indicate success or 
>>failure?  Or are they assumed to never fail?
>>    
>>
>Yes these functions should have return values. I will update the API spec.
>Thanks for pointing this out.
>
>-----Original Message-----
>From: David Bowen [mailto:dbowen@yahoo-inc.com] 
>Sent: Monday, May 01, 2006 8:13 AM
>To: hadoop-dev@lucene.apache.org
>Subject: Re: C API for Hadoop DFS
>
>
>I'm curious about error handling. 
>
>Do dfsConnect and dfsOpenFile return NULL on failure?
>
>Shouldn't dfsSeek, dfsRename, dfsCreateDirectory and 
>dfsSetWorkingDirectory each have a return value to indicate success or 
>failure?  Or are they assumed to never fail?
>
>- David
>
>
>
>
>
>  
>


RE: C API for Hadoop DFS

Posted by Devaraj Das <dd...@yahoo-inc.com>.
> Do dfsConnect and dfsOpenFile return NULL on failure?
Yes.

> Shouldn't dfsSeek, dfsRename, dfsCreateDirectory and 
> dfsSetWorkingDirectory each have a return value to indicate success or 
> failure?  Or are they assumed to never fail?
Yes these functions should have return values. I will update the API spec.
Thanks for pointing this out.

-----Original Message-----
From: David Bowen [mailto:dbowen@yahoo-inc.com] 
Sent: Monday, May 01, 2006 8:13 AM
To: hadoop-dev@lucene.apache.org
Subject: Re: C API for Hadoop DFS


I'm curious about error handling. 

Do dfsConnect and dfsOpenFile return NULL on failure?

Shouldn't dfsSeek, dfsRename, dfsCreateDirectory and 
dfsSetWorkingDirectory each have a return value to indicate success or 
failure?  Or are they assumed to never fail?

- David