Posted to common-user@hadoop.apache.org by Bram Biesbrouck <b...@beligum.com> on 2015/04/16 16:58:33 UTC

Found weird issue with HttpFS and WebHdfsFileSystem

Hi all,

I'm experiencing something strange while developing against the HttpFS
front-end webapp on Hadoop 2.6.0.

I'm currently digging into WebHdfsFileSystem and HttpFS to understand them
better and to understand how the REST API works. I've set up a local
single-node Hadoop instance, which I can query successfully with, e.g.,
http://localhost:50070/webhdfs/v1/?op=LISTSTATUS
which returns, e.g., this FileStatus object:

{
accessTime: 0,
blockSize: 0,
childrenNum: 0,
fileId: 16386,
group: "supergroup",
length: 0,
modificationTime: 1417964248854,
owner: "hadoop",
pathSuffix: "user",
permission: "755",
replication: 0,
storagePolicy: 0,
type: "DIRECTORY"
}

Now, when I start HttpFS and ask for the same data over its interface
(http://localhost:14000/webhdfs/v1/?op=LISTSTATUS), I get a different reply.
In particular, the childrenNum and fileId fields are missing compared to the
first result (same file or directory):

{
pathSuffix: "user",
type: "DIRECTORY",
length: 0,
owner: "hadoop",
group: "supergroup",
permission: "755",
accessTime: 0,
modificationTime: 1417964248854,
blockSize: 0,
replication: 0
}

Since I need the childrenNum property, I started digging into the code to
see where it's "lost" and found that WebHdfsFileSystem performs a
makeQualified() step (around line 1287 in WebHdfsFileSystem.java), just
before the list of filestatuses is returned. Basically, it converts
HdfsFileStatus objects into FileStatus objects, effectively chopping off
those two properties.

The sources for HdfsFileStatus clearly state that it's an "Interface that
represents the over the wire information for a file.", so I wonder why this
happens, since the HdfsFileStatus contains all the right properties,
according to the docs at
http://hadoop.apache.org/docs/r2.6.0/hadoop-project-dist/hadoop-hdfs/WebHDFS.html#List_a_Directory

It feels like the FileStatus class hasn't been updated to match the
HdfsFileStatus class. Since they don't share any interfaces or
superclasses, though, I get the feeling it's intentional; I just can't find
or figure out why.

Can somebody help or shed some light?

thanks,

b.
-- 

 Bram Biesbrouck - 0486/118280 - www.beligum.com -  the republic of
reinvention
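
The difference between the two replies quoted in the message above can be checked mechanically. The following is a minimal sketch, not part of the original post: the field names are copied from the two JSON objects, and the helper name missing_fields is made up for illustration. On this sample it also shows that storagePolicy, like childrenNum and fileId, appears only in the first reply.

```python
# Field names copied from the two LISTSTATUS replies quoted in the message above.
webhdfs_fields = {
    "accessTime", "blockSize", "childrenNum", "fileId", "group", "length",
    "modificationTime", "owner", "pathSuffix", "permission", "replication",
    "storagePolicy", "type",
}
httpfs_fields = {
    "pathSuffix", "type", "length", "owner", "group", "permission",
    "accessTime", "modificationTime", "blockSize", "replication",
}


def missing_fields(reference, other):
    """Return the names present in `reference` but absent from `other`, sorted.

    Hypothetical helper, introduced only for this illustration.
    """
    return sorted(reference - other)


# Fields the port-50070 reply carries that the port-14000 reply lacks.
print(missing_fields(webhdfs_fields, httpfs_fields))
# prints ['childrenNum', 'fileId', 'storagePolicy']
```

Comparing in the other direction, missing_fields(httpfs_fields, webhdfs_fields), gives an empty list for these samples, so the second reply is a strict subset of the first.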

Re: Found weird issue with HttpFS and WebHdfsFileSystem

Posted by Bram Biesbrouck <b...@beligum.com>.
Great, thanks a lot.

b.

On Mon, Apr 20, 2015 at 7:03 PM, Chris Nauroth <cn...@hortonworks.com>
wrote:

>   Hi Bram,
>
>  Your gut feeling is correct.  These 2 properties are used in private
> implementation details of cluster communication.  I believe these 2
> properties are currently the only difference compared to the public REST
> API.
>
>   Chris Nauroth
> Hortonworks
> http://hortonworks.com/
>
>
>   From: Bram Biesbrouck <b...@beligum.com>
> Reply-To: "user@hadoop.apache.org" <us...@hadoop.apache.org>
> Date: Monday, April 20, 2015 at 4:19 AM
>
> To: "user@hadoop.apache.org" <us...@hadoop.apache.org>
> Subject: Re: Found weird issue with HttpFS and WebHdfsFileSystem
>
>   Hi Chris,
>
>  Thanks for your insights. Last question: can you tell me the main
> differences (from a Hadoop dev point of view) between the public REST api
> and the HDFS wire protocol?
> My gut feeling tells me hdfs is mainly used in cluster communication and
> the public one is, well, for public api's. But maybe I'm missing some more
> subtle differences?
>
>  cheers,
>
>  b.
>
> On Fri, Apr 17, 2015 at 7:28 PM, Chris Nauroth <cn...@hortonworks.com>
> wrote:
>
>>   Hello Bram,
>>
>>  I'm glad to hear the information was helpful.
>>
>>  If you'd like to request access to childrenNum as part of a guaranteed
>> public API, then I encourage you to create a jira issue in the HDFS
>> project.  We could consider it for the future.
>>
>>  HdfsFileStatus is a representation of the HDFS wire protocol, and it's
>> intended to be decoupled from the public API FileStatus object so that the
>> two can evolve independently.  From a pure code reuse perspective, I
>> suppose the two could share a common base class, but then that common base
>> class would need to creep into the public API too.
>>
>>  Currently, there is no guarantee about the availability of these fields
>> in the public REST API.  We're going to remove mention of them in the
>> documentation.  We're not necessarily planning to remove the fields from
>> the JSON immediately, but there is also no guarantee that they'll stay
>> there.
>>
>>   Chris Nauroth
>> Hortonworks
>> http://hortonworks.com/
>>
>>
>>   From: Bram Biesbrouck <b...@beligum.com>
>> Reply-To: "user@hadoop.apache.org" <us...@hadoop.apache.org>
>> Date: Friday, April 17, 2015 at 5:26 AM
>> To: "user@hadoop.apache.org" <us...@hadoop.apache.org>
>> Subject: Re: Found weird issue with HttpFS and WebHdfsFileSystem
>>
>>   Hi Chris,
>>
>>  Thanks for this reply. I thought something funny was happening.
>>
>>  The childrenNum field is actually very useful (e.g. for deciding whether
>> to render an expansion marker next to a folder in a GUI when it has
>> children), so it's a pity the info is there but gets "eaten up" by the
>> general interface, only to be recalculated later on.
>> It would be nice to have the info as an optional field in the FileStatus
>> class (initialized to -1 like it is right now), so we can use it if it's
>> there or just ignore it when not initialized. While I'm
>> ranting, HdfsFileStatus should inherit from FileStatus because it's 95%
>> the same code anyway.
>>
>>  If I read your reply correctly, I assume the fields will be deleted
>> from the webhdfs JSON responses as well in the future?
>>
>>  Thanks again for the extensive reply, very useful and appreciated.
>>
>>  cheers,
>>
>>  b.
>>
>>
>>
>> On Fri, Apr 17, 2015 at 12:59 AM, Chris Nauroth <cnauroth@hortonworks.com>
>> wrote:
>>
>>>  Hello Bram,
>>>
>>>  There are a few Apache jiras with background discussion of the
>>> introduction of these fields in WebHDFS.
>>>
>>>  https://issues.apache.org/jira/browse/HDFS-4502
>>>
>>>  https://issues.apache.org/jira/browse/HDFS-4772
>>>
>>>  https://issues.apache.org/jira/browse/HDFS-4969
>>>
>>>  The new fields could not be supported in HTTPFS (only WebHDFS), and
>>> they were not intended to be guaranteed in the public REST API.
>>> Unfortunately, the fields were added to the documentation mistakenly in
>>> Apache Hadoop 2.5.0.
>>>
>>>  https://issues.apache.org/jira/browse/HDFS-6153
>>>
>>>  We're going to revert that documentation change in Apache Hadoop
>>> 2.8.0.  I suggest that your application does not rely on these fields, or
>>> at least includes fallback logic to keep working as best as it can if the
>>> fields are not present.  Another way to determine the number of children
>>> would be to make a subsequent LISTSTATUS call on the child path.
>>>
>>>  I apologize if this caused any inconvenience, and I hope the
>>> information helps.
>>>
>>>   Chris Nauroth
>>> Hortonworks
>>> http://hortonworks.com/
>>>
>>>
>>>   From: Bram Biesbrouck <b...@beligum.com>
>>> Reply-To: "user@hadoop.apache.org" <us...@hadoop.apache.org>
>>> Date: Thursday, April 16, 2015 at 7:58 AM
>>> To: "user@hadoop.apache.org" <us...@hadoop.apache.org>
>>> Subject: Found weird issue with HttpFS and WebHdfsFileSystem
>>>
>>>   Hi all,
>>>
>>>  I'm experiencing something strange while developing against the HttpFS
>>> front-end webapp on Hadoop 2.6.0.
>>>
>>>  I'm currently digging into WebHdfsFileSystem and HttpFS to understand
>>> them better and to understand how the REST API works. I've set up a local
>>> single-node Hadoop instance, which I can query successfully with, e.g.,
>>> http://localhost:50070/webhdfs/v1/?op=LISTSTATUS
>>> which returns, e.g., this FileStatus object:
>>>
>>>  {
>>> accessTime: 0,
>>> blockSize: 0,
>>> childrenNum: 0,
>>> fileId: 16386,
>>> group: "supergroup",
>>> length: 0,
>>> modificationTime: 1417964248854,
>>> owner: "hadoop",
>>> pathSuffix: "user",
>>> permission: "755",
>>> replication: 0,
>>> storagePolicy: 0,
>>> type: "DIRECTORY"
>>> }
>>>
>>>  Now, when I start HttpFS and ask for the same data over its interface
>>> (http://localhost:14000/webhdfs/v1/?op=LISTSTATUS), I get a different
>>> reply. In particular, the childrenNum and fileId fields are missing
>>> compared to the first result (same file or directory):
>>>
>>>  {
>>> pathSuffix: "user",
>>> type: "DIRECTORY",
>>> length: 0,
>>> owner: "hadoop",
>>> group: "supergroup",
>>> permission: "755",
>>> accessTime: 0,
>>> modificationTime: 1417964248854,
>>> blockSize: 0,
>>> replication: 0
>>> }
>>>
>>>  Since I need the childrenNum property, I started digging into the code
>>> to see where it's "lost" and found that WebHdfsFileSystem performs a
>>> makeQualified() step (around line 1287 in WebHdfsFileSystem.java), just
>>> before the list of filestatuses is returned. Basically, it converts
>>> HdfsFileStatus objects into FileStatus objects, effectively chopping off
>>> those two properties.
>>>
>>>  The sources for HdfsFileStatus clearly state that it's an "Interface
>>> that represents the over the wire information for a file.", so I wonder why
>>> this happens, since the HdfsFileStatus contains all the right properties,
>>> according to the docs at
>>> http://hadoop.apache.org/docs/r2.6.0/hadoop-project-dist/hadoop-hdfs/WebHDFS.html#List_a_Directory
>>>
>>>  It feels like the FileStatus class hasn't been updated to match the
>>> HdfsFileStatus class. Since they don't share any interfaces or
>>> superclasses, though, I get the feeling it's intentional; I just can't
>>> find or figure out why.
>>>
>>>  Can somebody help or shed some light?
>>>
>>>  thanks,
>>>
>>>  b.
>>> --
>>>
>>>  Bram Biesbrouck - 0486/118280 - www.beligum.com -  the republic of
>>> reinvention
>>>
>>>
>>
>>
>>  --
>>
>>  Bram Biesbrouck - 0486/118280 - www.beligum.com -  the republic of
>> reinvention
>>
>>
>
>
>  --
>
>  Bram Biesbrouck - 0486/118280 - www.beligum.com -  the republic of
> reinvention
>
>


-- 

 Bram Biesbrouck - 0486/118280 - www.beligum.com -  the republic of
reinvention
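
The fallback logic suggested in the quoted reply of Apr 17 (use childrenNum when the field is present, otherwise make a LISTSTATUS call on the child path) might look roughly like the sketch below. It is an illustration under assumptions, not code from the thread: child_count and list_children are hypothetical names, and list_children stands in for whatever function the application uses to issue LISTSTATUS on the entry's path and return the child entries.

```python
def child_count(status, list_children):
    """Return the number of children for one FileStatus entry.

    status        -- dict parsed from a FileStatus JSON object
    list_children -- hypothetical callable that performs LISTSTATUS on the
                     entry's path and returns a list of child FileStatus dicts
    """
    # Only directories can have children.
    if status.get("type") != "DIRECTORY":
        return 0
    # Prefer the field when the server happens to send it (not guaranteed).
    count = status.get("childrenNum")
    if isinstance(count, int) and count >= 0:
        return count
    # Fallback: one extra LISTSTATUS round trip on the child path.
    return len(list_children())


# Usage with in-memory stand-ins instead of real HTTP calls.
with_field = {"pathSuffix": "user", "type": "DIRECTORY", "childrenNum": 0}
without_field = {"pathSuffix": "user", "type": "DIRECTORY"}


def fake_listing():
    # Pretend the directory holds two entries.
    return [{"pathSuffix": "a"}, {"pathSuffix": "b"}]


print(child_count(with_field, fake_listing))     # uses the field
print(child_count(without_field, fake_listing))  # falls back to listing
```

The extra listing costs one request per directory, so a GUI that only needs to decide whether to draw an expansion marker may prefer to defer the call until the folder is actually shown.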

Re: Found weird issue with HttpFS and WebHdfsFileSystem

Posted by Bram Biesbrouck <b...@beligum.com>.
Great, thanks a lot.

b.

On Mon, Apr 20, 2015 at 7:03 PM, Chris Nauroth <cn...@hortonworks.com>
wrote:

>   Hi Bram,
>
>  Your gut feeling is correct.  These 2 properties are used in private
> implementation details of cluster communication.  I believe these 2
> properties are currently the only difference compared to the public REST
> API.
>
>   Chris Nauroth
> Hortonworks
> http://hortonworks.com/
>
>
>   From: Bram Biesbrouck <b...@beligum.com>
> Reply-To: "user@hadoop.apache.org" <us...@hadoop.apache.org>
> Date: Monday, April 20, 2015 at 4:19 AM
>
> To: "user@hadoop.apache.org" <us...@hadoop.apache.org>
> Subject: Re: Found weird issue with HttpFS and WebHdfsFileSystem
>
>   Hi Chris,
>
>  Thanks for your insights. Last question: can you tell me the main
> differences (from a Hadoop dev point of view) between the public REST api
> and the HDFS wire protocol?
> My gut feeling tells me hdfs is mainly used in cluster communication and
> the public one is, well, for public api's. But maybe I'm missing some more
> subtle differences?
>
>  cheers,
>
>  b.
>
> On Fri, Apr 17, 2015 at 7:28 PM, Chris Nauroth <cn...@hortonworks.com>
> wrote:
>
>>   Hello Bram,
>>
>>  I'm glad to hear the information was helpful.
>>
>>  If you'd like to request access to childNum as part of a guaranteed
>> public API, then I encourage you to create a jira issue in the HDFS
>> project.  We could consider it for the future.
>>
>>  HdfsFileStatus is a representation of the HDFS wire protocol, and it's
>> intended to be decoupled from the public API FileStatus object so that the
>> two can evolve independently.  From a pure code reuse perspective, I
>> suppose the two could share a common base class, but then that common base
>> class would need to creep into the public API too.
>>
>>  Currently, there is no guarantee about the availability of these fields
>> in the public REST API.  We're going to remove mention of them in the
>> documentation.  We're not necessarily planning to remove the fields from
>> the JSON immediately, but there is also no guarantee that they'll stay
>> there.
>>
>>   Chris Nauroth
>> Hortonworks
>> http://hortonworks.com/
>>
>>
>>   From: Bram Biesbrouck <b...@beligum.com>
>> Reply-To: "user@hadoop.apache.org" <us...@hadoop.apache.org>
>> Date: Friday, April 17, 2015 at 5:26 AM
>> To: "user@hadoop.apache.org" <us...@hadoop.apache.org>
>> Subject: Re: Found weird issue with HttpFS and WebHdfsFileSystem
>>
>>   Hi Chris,
>>
>>  Thanks for this reply. I thought something funny was happening.
>>
>>  The childNum field is actually very useful (eg for (not) rendering a
>> expansion marker next to a folder in a GUI when it has children), so it's a
>> pity the info is there, but get's "eaten up" by the general interface, only
>> to be re-calculated later on.
>> It would be nice to have the info as an optional field in the FileStatus
>> class (initialized to -1 like it is right now), so we can use it if it's
>> there or just ignore it when not initialized. While I'm
>> ranting, HdfsFileStatus should override from FileStatus because it's 95%
>> the same code anyway.
>>
>>  If I read your reply correctly, I assume the fields will be deleted
>> from the webhdfs JSON responses as well in the future?
>>
>>  Thanks again for the extensive reply, very useful and appreciated.
>>
>>  cheers,
>>
>>  b.
>>
>>
>>
>> On Fri, Apr 17, 2015 at 12:59 AM, Chris Nauroth <cnauroth@hortonworks.com
>> > wrote:
>>
>>>  Hello Bram,
>>>
>>>  There are a few Apache jiras with background discussion of the
>>> introduction of these fields in WebHDFS.
>>>
>>>  https://issues.apache.org/jira/browse/HDFS-4502
>>>
>>>  https://issues.apache.org/jira/browse/HDFS-4772
>>>
>>>  https://issues.apache.org/jira/browse/HDFS-4969
>>>
>>>  The new fields could not be supported in HTTPFS (only WebHDFS), and
>>> they were not intended to be guaranteed in the public REST API.
>>> Unfortunately, the fields were added to the documentation mistakenly in
>>> Apache Hadoop 2.5.0.
>>>
>>>  https://issues.apache.org/jira/browse/HDFS-6153
>>>
>>>  We're going to revert that documentation change in Apache Hadoop
>>> 2.8.0.  I suggest that your application does not rely on these fields, or
>>> at least includes fallback logic to keep working as best as it can if the
>>> fields are not present.  Another way to determine the number of children
>>> would be to make a subsequent LISTSTATUS call on the child path.
>>>
>>>  I apologize if this caused any inconvenience, and I hope the
>>> information helps.
>>>
>>>   Chris Nauroth
>>> Hortonworks
>>> http://hortonworks.com/
>>>
>>>
>>>   From: Bram Biesbrouck <b...@beligum.com>
>>> Reply-To: "user@hadoop.apache.org" <us...@hadoop.apache.org>
>>> Date: Thursday, April 16, 2015 at 7:58 AM
>>> To: "user@hadoop.apache.org" <us...@hadoop.apache.org>
>>> Subject: Found weird issue with HttpFS and WebHdfsFileSystem
>>>
>>>   Hi all,
>>>
>>>  I'm experiencing something strange while developing against the HttpFS
>>> front-end webapp on Hadoop 2.6.0.
>>>
>>>  I'm currently digging into WebHdfsFileSystem and HttpFS to understand
>>> it better and understand how the rest api works. I've setup a local single
>>> node Hadoop instance, which I can query successfully with eg.
>>> http://localhost:50070/webhdfs/v1/?op=LISTSTATUS
>>> Returning eg. this FileStatus object:
>>>
>>>  {
>>> accessTime: 0,
>>> blockSize: 0,
>>> childrenNum: 0,
>>> fileId: 16386,
>>> group: "supergroup",
>>> length: 0,
>>> modificationTime: 1417964248854,
>>> owner: "hadoop",
>>> pathSuffix: "user",
>>> permission: "755",
>>> replication: 0,
>>> storagePolicy: 0,
>>> type: "DIRECTORY"
>>> }
>>>
>>>  Now, when I start HttpFS and ask for the same data over it's interface
>>> (http://localhost:14000/webhdfs/v1/?op=LISTSTATUS), I get a different
>>> reply. Especially, the childrenNum and fileId fields are missing, compared
>>> to the first result (same file or directory):
>>>
>>>  {
>>> pathSuffix: "user",
>>> type: "DIRECTORY",
>>> length: 0,
>>> owner: "hadoop",
>>> group: "supergroup",
>>> permission: "755",
>>> accessTime: 0,
>>> modificationTime: 1417964248854,
>>> blockSize: 0,
>>> replication: 0
>>> }
>>>
>>>  Since I need the childrenNum property, I started digging into the code
>>> to see where it's "lost" and found that WebHdfsFileSystem performs a
>>> makeQualified() step (around line 1287 in WebHdfsFileSystem.java), just
>>> before the list of filestatuses is returned. Basically, it converts
>>> HdfsFileStatus objects into FileStatus objects, effectively chopping off
>>> those two properties.
>>>
>>>  The sources for HdfsFileStatus clearly state that it's an "Interface
>>> that represents the over the wire information for a file.", so I wonder why
>>> this happens, since the HdfsFileStatus contains all the right properties,
>>> according to the docs at
>>> http://hadoop.apache.org/docs/r2.6.0/hadoop-project-dist/hadoop-hdfs/WebHDFS.html#List_a_Directory
>>>
>>>  It feels like the FileStatus class hasn't been updated to match the
>>> HdfsFileStatus class, but since they don't share any interfaces or
>>> superclasses I get the feeling it's intentional, but I just can't find or
>>> figure out why.
>>>
>>>  Can somebody help or shed some light?
>>>
>>>  thanks,
>>>
>>>  b.
>>> --
>>>
>>>  Bram Biesbrouck - 0486/118280 - www.beligum.com -  the republic of
>>> reinvention
>>>
>>>
>>
>>
>>  --
>>
>>  Bram Biesbrouck - 0486/118280 - www.beligum.com -  the republic of
>> reinvention
>>
>>
>
>
>  --
>
>  Bram Biesbrouck - 0486/118280 - www.beligum.com -  the republic of
> reinvention
>
>


-- 

 Bram Biesbrouck - 0486/118280 - www.beligum.com -  the republic of
reinvention

Re: Found weird issue with HttpFS and WebHdfsFileSystem

Posted by Bram Biesbrouck <b...@beligum.com>.
Great, thanks a lot.

b.

On Mon, Apr 20, 2015 at 7:03 PM, Chris Nauroth <cn...@hortonworks.com>
wrote:

>   Hi Bram,
>
>  Your gut feeling is correct.  These 2 properties are used in private
> implementation details of cluster communication.  I believe these 2
> properties are currently the only difference compared to the public REST
> API.
>
>   Chris Nauroth
> Hortonworks
> http://hortonworks.com/
>
>
>   From: Bram Biesbrouck <b...@beligum.com>
> Reply-To: "user@hadoop.apache.org" <us...@hadoop.apache.org>
> Date: Monday, April 20, 2015 at 4:19 AM
>
> To: "user@hadoop.apache.org" <us...@hadoop.apache.org>
> Subject: Re: Found weird issue with HttpFS and WebHdfsFileSystem
>
>   Hi Chris,
>
>  Thanks for your insights. Last question: can you tell me the main
> differences (from a Hadoop dev point of view) between the public REST api
> and the HDFS wire protocol?
> My gut feeling tells me hdfs is mainly used in cluster communication and
> the public one is, well, for public api's. But maybe I'm missing some more
> subtle differences?
>
>  cheers,
>
>  b.
>
> On Fri, Apr 17, 2015 at 7:28 PM, Chris Nauroth <cn...@hortonworks.com>
> wrote:
>
>>   Hello Bram,
>>
>>  I'm glad to hear the information was helpful.
>>
>>  If you'd like to request access to childNum as part of a guaranteed
>> public API, then I encourage you to create a jira issue in the HDFS
>> project.  We could consider it for the future.
>>
>>  HdfsFileStatus is a representation of the HDFS wire protocol, and it's
>> intended to be decoupled from the public API FileStatus object so that the
>> two can evolve independently.  From a pure code reuse perspective, I
>> suppose the two could share a common base class, but then that common base
>> class would need to creep into the public API too.
>>
>>  Currently, there is no guarantee about the availability of these fields
>> in the public REST API.  We're going to remove mention of them in the
>> documentation.  We're not necessarily planning to remove the fields from
>> the JSON immediately, but there is also no guarantee that they'll stay
>> there.
>>
>>   Chris Nauroth
>> Hortonworks
>> http://hortonworks.com/
>>
>>
>>   From: Bram Biesbrouck <b...@beligum.com>
>> Reply-To: "user@hadoop.apache.org" <us...@hadoop.apache.org>
>> Date: Friday, April 17, 2015 at 5:26 AM
>> To: "user@hadoop.apache.org" <us...@hadoop.apache.org>
>> Subject: Re: Found weird issue with HttpFS and WebHdfsFileSystem
>>
>>   Hi Chris,
>>
>>  Thanks for this reply. I thought something funny was happening.
>>
>>  The childNum field is actually very useful (eg for (not) rendering a
>> expansion marker next to a folder in a GUI when it has children), so it's a
>> pity the info is there, but get's "eaten up" by the general interface, only
>> to be re-calculated later on.
>> It would be nice to have the info as an optional field in the FileStatus
>> class (initialized to -1 like it is right now), so we can use it if it's
>> there or just ignore it when not initialized. While I'm
>> ranting, HdfsFileStatus should override from FileStatus because it's 95%
>> the same code anyway.
>>
>>  If I read your reply correctly, I assume the fields will be deleted
>> from the webhdfs JSON responses as well in the future?
>>
>>  Thanks again for the extensive reply, very useful and appreciated.
>>
>>  cheers,
>>
>>  b.
>>
>>
>>
>> On Fri, Apr 17, 2015 at 12:59 AM, Chris Nauroth <cnauroth@hortonworks.com
>> > wrote:
>>
>>>  Hello Bram,
>>>
>>>  There are a few Apache jiras with background discussion of the
>>> introduction of these fields in WebHDFS.
>>>
>>>  https://issues.apache.org/jira/browse/HDFS-4502
>>>
>>>  https://issues.apache.org/jira/browse/HDFS-4772
>>>
>>>  https://issues.apache.org/jira/browse/HDFS-4969
>>>
>>>  The new fields could not be supported in HTTPFS (only WebHDFS), and
>>> they were not intended to be guaranteed in the public REST API.
>>> Unfortunately, the fields were added to the documentation mistakenly in
>>> Apache Hadoop 2.5.0.
>>>
>>>  https://issues.apache.org/jira/browse/HDFS-6153
>>>
>>>  We're going to revert that documentation change in Apache Hadoop
>>> 2.8.0.  I suggest that your application does not rely on these fields, or
>>> at least includes fallback logic to keep working as best as it can if the
>>> fields are not present.  Another way to determine the number of children
>>> would be to make a subsequent LISTSTATUS call on the child path.
>>>
>>>  I apologize if this caused any inconvenience, and I hope the
>>> information helps.
>>>
>>>   Chris Nauroth
>>> Hortonworks
>>> http://hortonworks.com/
>>>
>>>
>>>   From: Bram Biesbrouck <b...@beligum.com>
>>> Reply-To: "user@hadoop.apache.org" <us...@hadoop.apache.org>
>>> Date: Thursday, April 16, 2015 at 7:58 AM
>>> To: "user@hadoop.apache.org" <us...@hadoop.apache.org>
>>> Subject: Found weird issue with HttpFS and WebHdfsFileSystem
>>>
>>>   Hi all,
>>>
>>>  I'm experiencing something strange while developing against the HttpFS
>>> front-end webapp on Hadoop 2.6.0.
>>>
>>>  I'm currently digging into WebHdfsFileSystem and HttpFS to understand
>>> it better and understand how the rest api works. I've setup a local single
>>> node Hadoop instance, which I can query successfully with eg.
>>> http://localhost:50070/webhdfs/v1/?op=LISTSTATUS
>>> Returning eg. this FileStatus object:
>>>
>>>  {
>>> accessTime: 0,
>>> blockSize: 0,
>>> childrenNum: 0,
>>> fileId: 16386,
>>> group: "supergroup",
>>> length: 0,
>>> modificationTime: 1417964248854,
>>> owner: "hadoop",
>>> pathSuffix: "user",
>>> permission: "755",
>>> replication: 0,
>>> storagePolicy: 0,
>>> type: "DIRECTORY"
>>> }
>>>
>>>  Now, when I start HttpFS and ask for the same data over it's interface
>>> (http://localhost:14000/webhdfs/v1/?op=LISTSTATUS), I get a different
>>> reply. Especially, the childrenNum and fileId fields are missing, compared
>>> to the first result (same file or directory):
>>>
>>>  {
>>> pathSuffix: "user",
>>> type: "DIRECTORY",
>>> length: 0,
>>> owner: "hadoop",
>>> group: "supergroup",
>>> permission: "755",
>>> accessTime: 0,
>>> modificationTime: 1417964248854,
>>> blockSize: 0,
>>> replication: 0
>>> }
>>>
>>>  Since I need the childrenNum property, I started digging into the code
>>> to see where it's "lost" and found that WebHdfsFileSystem performs a
>>> makeQualified() step (around line 1287 in WebHdfsFileSystem.java), just
>>> before the list of filestatuses is returned. Basically, it converts
>>> HdfsFileStatus objects into FileStatus objects, effectively chopping off
>>> those two properties.
>>>
>>>  The sources for HdfsFileStatus clearly state that it's an "Interface
>>> that represents the over the wire information for a file.", so I wonder why
>>> this happens, since the HdfsFileStatus contains all the right properties,
>>> according to the docs at
>>> http://hadoop.apache.org/docs/r2.6.0/hadoop-project-dist/hadoop-hdfs/WebHDFS.html#List_a_Directory
>>>
>>>  It feels like the FileStatus class hasn't been updated to match the
>>> HdfsFileStatus class, but since they don't share any interfaces or
>>> superclasses I get the feeling it's intentional, but I just can't find or
>>> figure out why.
>>>
>>>  Can somebody help or shed some light?
>>>
>>>  thanks,
>>>
>>>  b.
>>> --
>>>
>>>  Bram Biesbrouck - 0486/118280 - www.beligum.com -  the republic of
>>> reinvention
>>>
>>>
>>
>>
>>  --
>>
>>  Bram Biesbrouck - 0486/118280 - www.beligum.com -  the republic of
>> reinvention
>>
>>
>
>
>  --
>
>  Bram Biesbrouck - 0486/118280 - www.beligum.com -  the republic of
> reinvention
>
>


-- 

 Bram Biesbrouck - 0486/118280 - www.beligum.com -  the republic of
reinvention

Re: Found weird issue with HttpFS and WebHdfsFileSystem

Posted by Bram Biesbrouck <b...@beligum.com>.
Great, thanks a lot.

b.

On Mon, Apr 20, 2015 at 7:03 PM, Chris Nauroth <cn...@hortonworks.com>
wrote:

>   Hi Bram,
>
>  Your gut feeling is correct.  These 2 properties are used in private
> implementation details of cluster communication.  I believe these 2
> properties are currently the only difference compared to the public REST
> API.
>
>   Chris Nauroth
> Hortonworks
> http://hortonworks.com/
>
>
>   From: Bram Biesbrouck <b...@beligum.com>
> Reply-To: "user@hadoop.apache.org" <us...@hadoop.apache.org>
> Date: Monday, April 20, 2015 at 4:19 AM
>
> To: "user@hadoop.apache.org" <us...@hadoop.apache.org>
> Subject: Re: Found weird issue with HttpFS and WebHdfsFileSystem
>
>   Hi Chris,
>
>  Thanks for your insights. Last question: can you tell me the main
> differences (from a Hadoop dev point of view) between the public REST api
> and the HDFS wire protocol?
> My gut feeling tells me hdfs is mainly used in cluster communication and
> the public one is, well, for public api's. But maybe I'm missing some more
> subtle differences?
>
>  cheers,
>
>  b.
>
> On Fri, Apr 17, 2015 at 7:28 PM, Chris Nauroth <cn...@hortonworks.com>
> wrote:
>
>>   Hello Bram,
>>
>>  I'm glad to hear the information was helpful.
>>
>>  If you'd like to request access to childNum as part of a guaranteed
>> public API, then I encourage you to create a jira issue in the HDFS
>> project.  We could consider it for the future.
>>
>>  HdfsFileStatus is a representation of the HDFS wire protocol, and it's
>> intended to be decoupled from the public API FileStatus object so that the
>> two can evolve independently.  From a pure code reuse perspective, I
>> suppose the two could share a common base class, but then that common base
>> class would need to creep into the public API too.
>>
>>  Currently, there is no guarantee about the availability of these fields
>> in the public REST API.  We're going to remove mention of them in the
>> documentation.  We're not necessarily planning to remove the fields from
>> the JSON immediately, but there is also no guarantee that they'll stay
>> there.
>>
>>   Chris Nauroth
>> Hortonworks
>> http://hortonworks.com/
>>
>>
>>   From: Bram Biesbrouck <b...@beligum.com>
>> Reply-To: "user@hadoop.apache.org" <us...@hadoop.apache.org>
>> Date: Friday, April 17, 2015 at 5:26 AM
>> To: "user@hadoop.apache.org" <us...@hadoop.apache.org>
>> Subject: Re: Found weird issue with HttpFS and WebHdfsFileSystem
>>
>>   Hi Chris,
>>
>>  Thanks for this reply. I thought something funny was happening.
>>
>>  The childNum field is actually very useful (eg for (not) rendering a
>> expansion marker next to a folder in a GUI when it has children), so it's a
>> pity the info is there, but get's "eaten up" by the general interface, only
>> to be re-calculated later on.
>> It would be nice to have the info as an optional field in the FileStatus
>> class (initialized to -1 like it is right now), so we can use it if it's
>> there or just ignore it when not initialized. While I'm
>> ranting, HdfsFileStatus should override from FileStatus because it's 95%
>> the same code anyway.
>>
>>  If I read your reply correctly, I assume the fields will be deleted
>> from the webhdfs JSON responses as well in the future?
>>
>>  Thanks again for the extensive reply, very useful and appreciated.
>>
>>  cheers,
>>
>>  b.
>>
>>
>>
>> On Fri, Apr 17, 2015 at 12:59 AM, Chris Nauroth <cnauroth@hortonworks.com
>> > wrote:
>>
>>>  Hello Bram,
>>>
>>>  There are a few Apache jiras with background discussion of the
>>> introduction of these fields in WebHDFS.
>>>
>>>  https://issues.apache.org/jira/browse/HDFS-4502
>>>
>>>  https://issues.apache.org/jira/browse/HDFS-4772
>>>
>>>  https://issues.apache.org/jira/browse/HDFS-4969
>>>
>>>  The new fields could not be supported in HTTPFS (only WebHDFS), and
>>> they were not intended to be guaranteed in the public REST API.
>>> Unfortunately, the fields were added to the documentation mistakenly in
>>> Apache Hadoop 2.5.0.
>>>
>>>  https://issues.apache.org/jira/browse/HDFS-6153
>>>
>>>  We're going to revert that documentation change in Apache Hadoop
>>> 2.8.0.  I suggest that your application does not rely on these fields, or
>>> at least includes fallback logic to keep working as best as it can if the
>>> fields are not present.  Another way to determine the number of children
>>> would be to make a subsequent LISTSTATUS call on the child path.
>>>
>>>  I apologize if this caused any inconvenience, and I hope the
>>> information helps.
>>>
>>>   Chris Nauroth
>>> Hortonworks
>>> http://hortonworks.com/
>>>
>>>
>>>   From: Bram Biesbrouck <b...@beligum.com>
>>> Reply-To: "user@hadoop.apache.org" <us...@hadoop.apache.org>
>>> Date: Thursday, April 16, 2015 at 7:58 AM
>>> To: "user@hadoop.apache.org" <us...@hadoop.apache.org>
>>> Subject: Found weird issue with HttpFS and WebHdfsFileSystem
>>>
>>>   Hi all,
>>>
>>>  I'm experiencing something strange while developing against the HttpFS
>>> front-end webapp on Hadoop 2.6.0.
>>>
>>>  I'm currently digging into WebHdfsFileSystem and HttpFS to understand
>>> it better and understand how the rest api works. I've setup a local single
>>> node Hadoop instance, which I can query successfully with eg.
>>> http://localhost:50070/webhdfs/v1/?op=LISTSTATUS
>>> Returning eg. this FileStatus object:
>>>
>>>  {
>>> accessTime: 0,
>>> blockSize: 0,
>>> childrenNum: 0,
>>> fileId: 16386,
>>> group: "supergroup",
>>> length: 0,
>>> modificationTime: 1417964248854,
>>> owner: "hadoop",
>>> pathSuffix: "user",
>>> permission: "755",
>>> replication: 0,
>>> storagePolicy: 0,
>>> type: "DIRECTORY"
>>> }
>>>
>>>  Now, when I start HttpFS and ask for the same data over it's interface
>>> (http://localhost:14000/webhdfs/v1/?op=LISTSTATUS), I get a different
>>> reply. Especially, the childrenNum and fileId fields are missing, compared
>>> to the first result (same file or directory):
>>>
>>>  {
>>> pathSuffix: "user",
>>> type: "DIRECTORY",
>>> length: 0,
>>> owner: "hadoop",
>>> group: "supergroup",
>>> permission: "755",
>>> accessTime: 0,
>>> modificationTime: 1417964248854,
>>> blockSize: 0,
>>> replication: 0
>>> }
>>>
>>>  Since I need the childrenNum property, I started digging into the code
>>> to see where it's "lost" and found that WebHdfsFileSystem performs a
>>> makeQualified() step (around line 1287 in WebHdfsFileSystem.java), just
>>> before the list of filestatuses is returned. Basically, it converts
>>> HdfsFileStatus objects into FileStatus objects, effectively chopping off
>>> those two properties.
>>>
>>>  The sources for HdfsFileStatus clearly state that it's an "Interface
>>> that represents the over the wire information for a file.", so I wonder why
>>> this happens, since the HdfsFileStatus contains all the right properties,
>>> according to the docs at
>>> http://hadoop.apache.org/docs/r2.6.0/hadoop-project-dist/hadoop-hdfs/WebHDFS.html#List_a_Directory
>>>
>>>  It feels like the FileStatus class hasn't been updated to match the
>>> HdfsFileStatus class. Since they don't share any interfaces or
>>> superclasses, I get the feeling it's intentional, but I can't find or
>>> figure out why.
>>>
>>>  Can somebody help or shed some light?
>>>
>>>  thanks,
>>>
>>>  b.
>>> --
>>>
>>>  Bram Biesbrouck - 0486/118280 - www.beligum.com -  the republic of
>>> reinvention
>>>
>>>


-- 

 Bram Biesbrouck - 0486/118280 - www.beligum.com -  the republic of
reinvention

Re: Found weird issue with HttpFS and WebHdfsFileSystem

Posted by Chris Nauroth <cn...@hortonworks.com>.
Hi Bram,

Your gut feeling is correct.  These 2 properties are used in private implementation details of cluster communication.  I believe these 2 properties are currently the only difference compared to the public REST API.

Chris Nauroth
Hortonworks
http://hortonworks.com/


From: Bram Biesbrouck <b...@beligum.com>
Reply-To: "user@hadoop.apache.org" <us...@hadoop.apache.org>
Date: Monday, April 20, 2015 at 4:19 AM
To: "user@hadoop.apache.org" <us...@hadoop.apache.org>
Subject: Re: Found weird issue with HttpFS and WebHdfsFileSystem

Hi Chris,

Thanks for your insights. Last question: can you tell me the main differences (from a Hadoop dev point of view) between the public REST API and the HDFS wire protocol?
My gut feeling tells me the HDFS wire protocol is mainly used for cluster communication and the public one is, well, for public APIs. But maybe I'm missing some more subtle differences?

cheers,

b.

On Fri, Apr 17, 2015 at 7:28 PM, Chris Nauroth <cn...@hortonworks.com> wrote:
Hello Bram,

I'm glad to hear the information was helpful.

If you'd like to request access to childrenNum as part of a guaranteed public API, then I encourage you to create a jira issue in the HDFS project.  We could consider it for the future.

HdfsFileStatus is a representation of the HDFS wire protocol, and it's intended to be decoupled from the public API FileStatus object so that the two can evolve independently.  From a pure code reuse perspective, I suppose the two could share a common base class, but then that common base class would need to creep into the public API too.
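To illustrate the decoupling described above, here is a minimal sketch in Python. It is not Hadoop code: WireStatus and PublicStatus are hypothetical stand-ins for a wire-protocol type and a public API type, with only a handful of fields. The conversion keeps whatever the public type declares and silently drops the rest, which is roughly the effect seen when the extra fields disappear.

```python
from dataclasses import asdict, dataclass, fields


@dataclass(frozen=True)
class PublicStatus:
    """Hypothetical public API type: only fields with a stability guarantee."""
    path_suffix: str
    type: str
    length: int


@dataclass(frozen=True)
class WireStatus:
    """Hypothetical wire type: free to carry extra, unguaranteed fields."""
    path_suffix: str
    type: str
    length: int
    file_id: int
    children_num: int

    def to_public(self) -> PublicStatus:
        # Copy only the fields the public type declares; the rest are dropped.
        public_names = {f.name for f in fields(PublicStatus)}
        kept = {k: v for k, v in asdict(self).items() if k in public_names}
        return PublicStatus(**kept)


wire = WireStatus("user", "DIRECTORY", 0, file_id=16386, children_num=0)
public = wire.to_public()
print(public)
```

Because the two types share no base class, a field can be added to or removed from the wire type without changing the public one; the price is an explicit conversion step like to_public().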

Currently, there is no guarantee about the availability of these fields in the public REST API.  We're going to remove mention of them in the documentation.  We're not necessarily planning to remove the fields from the JSON immediately, but there is also no guarantee that they'll stay there.

Chris Nauroth
Hortonworks
http://hortonworks.com/


From: Bram Biesbrouck <b...@beligum.com>
Reply-To: "user@hadoop.apache.org" <us...@hadoop.apache.org>
Date: Friday, April 17, 2015 at 5:26 AM
To: "user@hadoop.apache.org" <us...@hadoop.apache.org>
Subject: Re: Found weird issue with HttpFS and WebHdfsFileSystem

Hi Chris,

Thanks for this reply. I thought something funny was happening.

The childrenNum field is actually very useful (e.g. for deciding whether to render an expansion marker next to a folder in a GUI when it has children), so it's a pity the info is there but gets "eaten up" by the general interface, only to be recalculated later on.
It would be nice to have the info as an optional field in the FileStatus class (initialized to -1 like it is right now), so we can use it if it's there or just ignore it when not initialized. While I'm ranting, HdfsFileStatus should extend FileStatus because it's 95% the same code anyway.
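A small sketch of the GUI decision described here, assuming the -1 convention for "not initialized". The function name, the UNKNOWN constant and the three return values are made up for illustration; they are not part of any Hadoop API.

```python
UNKNOWN = -1  # assumed sentinel: the child count was not provided


def expansion_marker(is_directory: bool, children_num: int = UNKNOWN) -> str:
    """Decide how a tree view could render the marker next to an entry.

    Returns "expand" when children are known to exist, "none" when the
    entry cannot be expanded, and "lazy" when the count is unknown and
    has to be resolved by listing the directory on first expansion.
    """
    if not is_directory:
        return "none"
    if children_num == UNKNOWN:
        return "lazy"
    return "expand" if children_num > 0 else "none"


print(expansion_marker(True, 0))   # directory known to be empty
print(expansion_marker(True, 3))   # directory with children
print(expansion_marker(True))      # count not available
print(expansion_marker(False))     # plain file
```

The "lazy" case is what keeps a client working against an endpoint that does not return the field at all.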

If I read your reply correctly, I assume the fields will be deleted from the WebHDFS JSON responses as well in the future?

Thanks again for the extensive reply, very useful and appreciated.

cheers,

b.



On Fri, Apr 17, 2015 at 12:59 AM, Chris Nauroth <cn...@hortonworks.com> wrote:
Hello Bram,

There are a few Apache jiras with background discussion of the introduction of these fields in WebHDFS.

https://issues.apache.org/jira/browse/HDFS-4502

https://issues.apache.org/jira/browse/HDFS-4772

https://issues.apache.org/jira/browse/HDFS-4969

The new fields could not be supported in HttpFS (only WebHDFS), and they were not intended to be guaranteed in the public REST API.  Unfortunately, the fields were added to the documentation mistakenly in Apache Hadoop 2.5.0.

https://issues.apache.org/jira/browse/HDFS-6153

We're going to revert that documentation change in Apache Hadoop 2.8.0.  I suggest that your application does not rely on these fields, or at least includes fallback logic to keep working as best as it can if the fields are not present.  Another way to determine the number of children would be to make a subsequent LISTSTATUS call on the child path.
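The fallback suggested above could look roughly like the following Python sketch. It assumes each entry is the parsed JSON object of one FileStatus, and that list_status is a caller-supplied function (hypothetical here) that performs LISTSTATUS on a path and returns the parsed entries. A stub replaces the HTTP call so the example runs without a cluster.

```python
from typing import Callable, List, Mapping


def children_count(entry: Mapping, parent: str,
                   list_status: Callable[[str], List[Mapping]]) -> int:
    """Return the number of children, preferring childrenNum when present."""
    n = entry.get("childrenNum")
    if isinstance(n, int) and n >= 0:
        return n                      # field available, no extra request
    if entry.get("type") != "DIRECTORY":
        return 0                      # files have no children
    # Fallback: one more LISTSTATUS call on the child path.
    child_path = parent.rstrip("/") + "/" + entry["pathSuffix"]
    return len(list_status(child_path))


calls = []  # records which paths the stub was asked to list


def fake_list_status(path: str) -> List[Mapping]:
    """Stub standing in for a real LISTSTATUS request."""
    calls.append(path)
    return [{"pathSuffix": "hadoop", "type": "DIRECTORY"}]


with_field = {"pathSuffix": "user", "type": "DIRECTORY", "childrenNum": 0}
without_field = {"pathSuffix": "user", "type": "DIRECTORY"}

print(children_count(with_field, "/", fake_list_status))
print(children_count(without_field, "/", fake_list_status))
print(calls)
```

The cost of the fallback is one extra request per directory, so a client listing a large tree may want to defer it until the count is actually needed.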

I apologize if this caused any inconvenience, and I hope the information helps.

Chris Nauroth
Hortonworks
http://hortonworks.com/


From: Bram Biesbrouck <b...@beligum.com>
Reply-To: "user@hadoop.apache.org" <us...@hadoop.apache.org>
Date: Thursday, April 16, 2015 at 7:58 AM
To: "user@hadoop.apache.org" <us...@hadoop.apache.org>
Subject: Found weird issue with HttpFS and WebHdfsFileSystem

Hi all,

I'm experiencing something strange while developing against the HttpFS front-end webapp on Hadoop 2.6.0.

I'm currently digging into WebHdfsFileSystem and HttpFS to understand them better and to learn how the REST API works. I've set up a local single-node Hadoop instance, which I can query successfully with e.g. http://localhost:50070/webhdfs/v1/?op=LISTSTATUS
which returns e.g. this FileStatus object:

{
accessTime: 0,
blockSize: 0,
childrenNum: 0,
fileId: 16386,
group: "supergroup",
length: 0,
modificationTime: 1417964248854,
owner: "hadoop",
pathSuffix: "user",
permission: "755",
replication: 0,
storagePolicy: 0,
type: "DIRECTORY"
}
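For readers who want to reproduce the query from code, here is a hedged Python sketch. It assumes the documented WebHDFS response shape, in which LISTSTATUS returns the entries under FileStatuses.FileStatus, and it parses a hand-written sample body instead of contacting a server. The helper names are made up for this example.

```python
import json
from urllib.parse import quote, urlencode


def liststatus_url(base: str, path: str = "/") -> str:
    """Build a LISTSTATUS URL for `path` on a WebHDFS-style endpoint."""
    query = urlencode({"op": "LISTSTATUS"})
    return base + "/webhdfs/v1" + quote(path) + "?" + query


def parse_liststatus(body: str) -> list:
    """Return the FileStatus entries from a LISTSTATUS response body."""
    return json.loads(body)["FileStatuses"]["FileStatus"]


# Hand-written sample body, trimmed to a few of the fields shown above.
sample_body = json.dumps({
    "FileStatuses": {"FileStatus": [
        {"pathSuffix": "user", "type": "DIRECTORY",
         "childrenNum": 0, "fileId": 16386},
    ]}
})

entries = parse_liststatus(sample_body)
print(liststatus_url("http://localhost:50070"))
print(entries[0]["pathSuffix"], entries[0].get("childrenNum"))
```

Using .get() for childrenNum rather than indexing keeps the client from failing when an endpoint omits the field.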

Now, when I start HttpFS and ask for the same data over its interface (http://localhost:14000/webhdfs/v1/?op=LISTSTATUS), I get a different reply. In particular, the childrenNum and fileId fields are missing compared to the first result (same file or directory):

{
pathSuffix: "user",
type: "DIRECTORY",
length: 0,
owner: "hadoop",
group: "supergroup",
permission: "755",
accessTime: 0,
modificationTime: 1417964248854,
blockSize: 0,
replication: 0
}
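The two replies can be compared mechanically. The sketch below copies both objects from this message into Python dicts and computes which keys are present only in the first one. Going by the pasted output, storagePolicy is absent from the HttpFS reply as well, in addition to the two fields named above.

```python
# Entry as returned by the NameNode's WebHDFS endpoint (port 50070).
namenode_entry = {
    "accessTime": 0, "blockSize": 0, "childrenNum": 0, "fileId": 16386,
    "group": "supergroup", "length": 0, "modificationTime": 1417964248854,
    "owner": "hadoop", "pathSuffix": "user", "permission": "755",
    "replication": 0, "storagePolicy": 0, "type": "DIRECTORY",
}

# Entry for the same directory as returned by HttpFS (port 14000).
httpfs_entry = {
    "pathSuffix": "user", "type": "DIRECTORY", "length": 0,
    "owner": "hadoop", "group": "supergroup", "permission": "755",
    "accessTime": 0, "modificationTime": 1417964248854,
    "blockSize": 0, "replication": 0,
}

# Keys only the first reply carries.
missing = sorted(set(namenode_entry) - set(httpfs_entry))

# Shared keys whose values disagree between the two replies.
differing = sorted(k for k in httpfs_entry
                   if namenode_entry.get(k) != httpfs_entry[k])

print("missing from HttpFS:", missing)
print("differing values:", differing)
```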

Since I need the childrenNum property, I started digging into the code to see where it's "lost" and found that WebHdfsFileSystem performs a makeQualified() step (around line 1287 in WebHdfsFileSystem.java), just before the list of filestatuses is returned. Basically, it converts HdfsFileStatus objects into FileStatus objects, effectively chopping off those two properties.

The sources for HdfsFileStatus clearly state that it's an "Interface that represents the over the wire information for a file.", so I wonder why this happens, since the HdfsFileStatus contains all the right properties, according to the docs at http://hadoop.apache.org/docs/r2.6.0/hadoop-project-dist/hadoop-hdfs/WebHDFS.html#List_a_Directory

It feels like the FileStatus class hasn't been updated to match the HdfsFileStatus class. Since they don't share any interfaces or superclasses, I get the feeling it's intentional, but I can't find or figure out why.

Can somebody help or shed some light?

thanks,

b.
--

Bram Biesbrouck - 0486/118280 - www.beligum.com - the republic of reinvention

Re: Found weird issue with HttpFS and WebHdfsFileSystem

Posted by Bram Biesbrouck <b...@beligum.com>.
Hi Chris,

Thanks for your insights. Last question: can you tell me the main
differences (from a Hadoop dev point of view) between the public REST API
and the HDFS wire protocol?
My gut feeling tells me the HDFS wire protocol is mainly used for cluster
communication and the public one is, well, for public APIs. But maybe I'm
missing some more subtle differences?

cheers,

b.

On Fri, Apr 17, 2015 at 7:28 PM, Chris Nauroth <cn...@hortonworks.com>
wrote:

>   Hello Bram,
>
>  I'm glad to hear the information was helpful.
>
>  If you'd like to request access to childrenNum as part of a guaranteed
> public API, then I encourage you to create a jira issue in the HDFS
> project.  We could consider it for the future.
>
>  HdfsFileStatus is a representation of the HDFS wire protocol, and it's
> intended to be decoupled from the public API FileStatus object so that the
> two can evolve independently.  From a pure code reuse perspective, I
> suppose the two could share a common base class, but then that common base
> class would need to creep into the public API too.
>
>  Currently, there is no guarantee about the availability of these fields
> in the public REST API.  We're going to remove mention of them in the
> documentation.  We're not necessarily planning to remove the fields from
> the JSON immediately, but there is also no guarantee that they'll stay
> there.
>
>   Chris Nauroth
> Hortonworks
> http://hortonworks.com/
>
>
>   From: Bram Biesbrouck <b...@beligum.com>
> Reply-To: "user@hadoop.apache.org" <us...@hadoop.apache.org>
> Date: Friday, April 17, 2015 at 5:26 AM
> To: "user@hadoop.apache.org" <us...@hadoop.apache.org>
> Subject: Re: Found weird issue with HttpFS and WebHdfsFileSystem
>
>   Hi Chris,
>
>  Thanks for this reply. I thought something funny was happening.
>
>  The childrenNum field is actually very useful (e.g. for deciding whether to
> render an expansion marker next to a folder in a GUI), so it's a pity the
> info is there but gets "eaten up" by the general interface, only to be
> recalculated later on.
> It would be nice to have the info as an optional field in the FileStatus
> class (initialized to -1 like it is right now), so we can use it if it's
> there or just ignore it when not initialized. While I'm
> ranting, HdfsFileStatus should extend FileStatus because it's 95%
> the same code anyway.
>
>  If I read your reply correctly, I assume the fields will be deleted from
> the webhdfs JSON responses as well in the future?
>
>  Thanks again for the extensive reply, very useful and appreciated.
>
>  cheers,
>
>  b.
>
>
>
> On Fri, Apr 17, 2015 at 12:59 AM, Chris Nauroth <cn...@hortonworks.com>
> wrote:
>
>>  Hello Bram,
>>
>>  There are a few Apache jiras with background discussion of the
>> introduction of these fields in WebHDFS.
>>
>>  https://issues.apache.org/jira/browse/HDFS-4502
>>
>>  https://issues.apache.org/jira/browse/HDFS-4772
>>
>>  https://issues.apache.org/jira/browse/HDFS-4969
>>
>>  The new fields could not be supported in HTTPFS (only WebHDFS), and
>> they were not intended to be guaranteed in the public REST API.
>> Unfortunately, the fields were added to the documentation mistakenly in
>> Apache Hadoop 2.5.0.
>>
>>  https://issues.apache.org/jira/browse/HDFS-6153
>>
>>  We're going to revert that documentation change in Apache Hadoop
>> 2.8.0.  I suggest that your application does not rely on these fields, or
>> at least includes fallback logic to keep working as best as it can if the
>> fields are not present.  Another way to determine the number of children
>> would be to make a subsequent LISTSTATUS call on the child path.
>>
>>  I apologize if this caused any inconvenience, and I hope the
>> information helps.
>>
>>   Chris Nauroth
>> Hortonworks
>> http://hortonworks.com/
>>
>>
>>   From: Bram Biesbrouck <b...@beligum.com>
>> Reply-To: "user@hadoop.apache.org" <us...@hadoop.apache.org>
>> Date: Thursday, April 16, 2015 at 7:58 AM
>> To: "user@hadoop.apache.org" <us...@hadoop.apache.org>
>> Subject: Found weird issue with HttpFS and WebHdfsFileSystem
>>
>>   Hi all,
>>
>>  I'm experiencing something strange while developing against the HttpFS
>> front-end webapp on Hadoop 2.6.0.
>>
>>  I'm currently digging into WebHdfsFileSystem and HttpFS to understand
>> them better and to understand how the REST API works. I've set up a local
>> single-node Hadoop instance, which I can query successfully with e.g.
>> http://localhost:50070/webhdfs/v1/?op=LISTSTATUS
>> which returns e.g. this FileStatus object:
>>
>>  {
>> accessTime: 0,
>> blockSize: 0,
>> childrenNum: 0,
>> fileId: 16386,
>> group: "supergroup",
>> length: 0,
>> modificationTime: 1417964248854,
>> owner: "hadoop",
>> pathSuffix: "user",
>> permission: "755",
>> replication: 0,
>> storagePolicy: 0,
>> type: "DIRECTORY"
>> }
>>
>>  Now, when I start HttpFS and ask for the same data over its interface (
>> http://localhost:14000/webhdfs/v1/?op=LISTSTATUS), I get a different
>> reply. In particular, the childrenNum and fileId fields are missing compared
>> to the first result (same file or directory):
>>
>>  {
>> pathSuffix: "user",
>> type: "DIRECTORY",
>> length: 0,
>> owner: "hadoop",
>> group: "supergroup",
>> permission: "755",
>> accessTime: 0,
>> modificationTime: 1417964248854,
>> blockSize: 0,
>> replication: 0
>> }
>>
>>  Since I need the childrenNum property, I started digging into the code
>> to see where it's "lost" and found that WebHdfsFileSystem performs a
>> makeQualified() step (around line 1287 in WebHdfsFileSystem.java), just
>> before the list of filestatuses is returned. Basically, it converts
>> HdfsFileStatus objects into FileStatus objects, effectively chopping off
>> those two properties.
>>
>>  The sources for HdfsFileStatus clearly state that it's an "Interface
>> that represents the over the wire information for a file.", so I wonder why
>> this happens, since the HdfsFileStatus contains all the right properties,
>> according to the docs at
>> http://hadoop.apache.org/docs/r2.6.0/hadoop-project-dist/hadoop-hdfs/WebHDFS.html#List_a_Directory
>>
>>  It feels like the FileStatus class hasn't been updated to match the
>> HdfsFileStatus class, but since they don't share any interfaces or
>> superclasses I get the feeling it's intentional, but I just can't find or
>> figure out why.
>>
>>  Can somebody help or shed some light?
>>
>>  thanks,
>>
>>  b.
>> --
>>
>>  Bram Biesbrouck - 0486/118280 - www.beligum.com -  the republic of
>> reinvention
>>
>>
>
>
>  --
>
>  Bram Biesbrouck - 0486/118280 - www.beligum.com -  the republic of
> reinvention
>
>


-- 

 Bram Biesbrouck - 0486/118280 - www.beligum.com -  the republic of
reinvention

Re: Found weird issue with HttpFS and WebHdfsFileSystem

Posted by Chris Nauroth <cn...@hortonworks.com>.
Hello Bram,

I'm glad to hear the information was helpful.

If you'd like to request access to childrenNum as part of a guaranteed public API, then I encourage you to create a jira issue in the HDFS project.  We could consider it for the future.

HdfsFileStatus is a representation of the HDFS wire protocol, and it's intended to be decoupled from the public API FileStatus object so that the two can evolve independently.  From a pure code reuse perspective, I suppose the two could share a common base class, but then that common base class would need to creep into the public API too.
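
To illustrate the decoupling described above, here is a minimal sketch in Python. It is not Hadoop code: the class names WireStatus and PublicStatus and the helper to_public() are hypothetical stand-ins for HdfsFileStatus, FileStatus and the conversion step, and the field selection is abbreviated. It only shows why wire-only fields disappear once a status is converted to the public type.

```python
from dataclasses import asdict, dataclass, fields


@dataclass(frozen=True)
class PublicStatus:
    """Hypothetical stand-in for the public-API status type."""
    path_suffix: str
    type: str
    length: int
    owner: str
    group: str
    permission: str


@dataclass(frozen=True)
class WireStatus:
    """Hypothetical stand-in for the wire-protocol status type.

    It shares no base class with PublicStatus, so either side can gain
    or lose fields without touching the other.
    """
    path_suffix: str
    type: str
    length: int
    owner: str
    group: str
    permission: str
    file_id: int = 0
    children_num: int = -1  # -1 means "not known"


def to_public(wire: WireStatus) -> PublicStatus:
    """Copy only the fields the public type declares; the rest are dropped."""
    public_names = {f.name for f in fields(PublicStatus)}
    kept = {k: v for k, v in asdict(wire).items() if k in public_names}
    return PublicStatus(**kept)
```

Because the copy is driven by the public type's own field list, adding a field on the wire side never changes what public-API callers see.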

Currently, there is no guarantee about the availability of these fields in the public REST API.  We're going to remove mention of them in the documentation.  We're not necessarily planning to remove the fields from the JSON immediately, but there is also no guarantee that they'll stay there.

Chris Nauroth
Hortonworks
http://hortonworks.com/


From: Bram Biesbrouck <b...@beligum.com>
Reply-To: "user@hadoop.apache.org" <us...@hadoop.apache.org>
Date: Friday, April 17, 2015 at 5:26 AM
To: "user@hadoop.apache.org" <us...@hadoop.apache.org>
Subject: Re: Found weird issue with HttpFS and WebHdfsFileSystem

Hi Chris,

Thanks for this reply. I thought something funny was happening.

The childrenNum field is actually very useful (e.g. for deciding whether to render an expansion marker next to a folder in a GUI), so it's a pity the info is there but gets "eaten up" by the general interface, only to be recalculated later on.
It would be nice to have the info as an optional field in the FileStatus class (initialized to -1 like it is right now), so we can use it if it's there or just ignore it when not initialized. While I'm ranting, HdfsFileStatus should extend FileStatus because it's 95% the same code anyway.

If I read your reply correctly, I assume the fields will be deleted from the webhdfs JSON responses as well in the future?

Thanks again for the extensive reply, very useful and appreciated.

cheers,

b.



On Fri, Apr 17, 2015 at 12:59 AM, Chris Nauroth <cn...@hortonworks.com> wrote:
Hello Bram,

There are a few Apache jiras with background discussion of the introduction of these fields in WebHDFS.

https://issues.apache.org/jira/browse/HDFS-4502

https://issues.apache.org/jira/browse/HDFS-4772

https://issues.apache.org/jira/browse/HDFS-4969

The new fields could not be supported in HTTPFS (only WebHDFS), and they were not intended to be guaranteed in the public REST API.  Unfortunately, the fields were added to the documentation mistakenly in Apache Hadoop 2.5.0.

https://issues.apache.org/jira/browse/HDFS-6153

We're going to revert that documentation change in Apache Hadoop 2.8.0.  I suggest that your application does not rely on these fields, or at least includes fallback logic to keep working as best as it can if the fields are not present.  Another way to determine the number of children would be to make a subsequent LISTSTATUS call on the child path.
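
The fallback suggested above could look roughly like the following Python sketch. It assumes the LISTSTATUS response has the documented {"FileStatuses": {"FileStatus": [...]}} shape; the function names list_status() and children_count() and the opener parameter are hypothetical and exist only for this illustration. Authentication and error handling are left out.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen


def list_status(base_url, path, opener=urlopen):
    """Return the FileStatus entries that LISTSTATUS reports for `path`."""
    url = f"{base_url}/webhdfs/v1{quote(path)}?op=LISTSTATUS"
    with opener(url) as response:
        body = json.load(response)
    return body["FileStatuses"]["FileStatus"]


def children_count(base_url, parent, status, opener=urlopen):
    """Number of children of one entry taken from a listing of `parent`.

    Uses childrenNum when the server sent it (WebHDFS in this thread) and
    falls back to a second LISTSTATUS call when it is absent (HttpFS).
    """
    if status.get("type") != "DIRECTORY":
        return 0
    if "childrenNum" in status:
        return status["childrenNum"]
    child_path = parent.rstrip("/") + "/" + status["pathSuffix"]
    return len(list_status(base_url, child_path, opener))
```

For example, children_count("http://localhost:14000", "/", entry) would issue one extra request per directory when talking to HttpFS, so a GUI may prefer to call it lazily, only for the folders it is about to display.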

I apologize if this caused any inconvenience, and I hope the information helps.

Chris Nauroth
Hortonworks
http://hortonworks.com/


From: Bram Biesbrouck <b...@beligum.com>
Reply-To: "user@hadoop.apache.org" <us...@hadoop.apache.org>
Date: Thursday, April 16, 2015 at 7:58 AM
To: "user@hadoop.apache.org" <us...@hadoop.apache.org>
Subject: Found weird issue with HttpFS and WebHdfsFileSystem

Hi all,

I'm experiencing something strange while developing against the HttpFS front-end webapp on Hadoop 2.6.0.

I'm currently digging into WebHdfsFileSystem and HttpFS to understand them better and to understand how the REST API works. I've set up a local single-node Hadoop instance, which I can query successfully with e.g. http://localhost:50070/webhdfs/v1/?op=LISTSTATUS
which returns e.g. this FileStatus object:

{
accessTime: 0,
blockSize: 0,
childrenNum: 0,
fileId: 16386,
group: "supergroup",
length: 0,
modificationTime: 1417964248854,
owner: "hadoop",
pathSuffix: "user",
permission: "755",
replication: 0,
storagePolicy: 0,
type: "DIRECTORY"
}

Now, when I start HttpFS and ask for the same data over its interface (http://localhost:14000/webhdfs/v1/?op=LISTSTATUS), I get a different reply. In particular, the childrenNum and fileId fields are missing compared to the first result (same file or directory):

{
pathSuffix: "user",
type: "DIRECTORY",
length: 0,
owner: "hadoop",
group: "supergroup",
permission: "755",
accessTime: 0,
modificationTime: 1417964248854,
blockSize: 0,
replication: 0
}

Since I need the childrenNum property, I started digging into the code to see where it's "lost" and found that WebHdfsFileSystem performs a makeQualified() step (around line 1287 in WebHdfsFileSystem.java), just before the list of filestatuses is returned. Basically, it converts HdfsFileStatus objects into FileStatus objects, effectively chopping off those two properties.

The sources for HdfsFileStatus clearly state that it's an "Interface that represents the over the wire information for a file.", so I wonder why this happens, since the HdfsFileStatus contains all the right properties, according to the docs at http://hadoop.apache.org/docs/r2.6.0/hadoop-project-dist/hadoop-hdfs/WebHDFS.html#List_a_Directory

It feels like the FileStatus class hasn't been updated to match the HdfsFileStatus class, but since they don't share any interfaces or superclasses I get the feeling it's intentional, but I just can't find or figure out why.

Can somebody help or shed some light?

thanks,

b.
--

Bram Biesbrouck - 0486/118280 - www.beligum.com - the republic of reinvention




Re: Found weird issue with HttpFS and WebHdfsFileSystem

Posted by Chris Nauroth <cn...@hortonworks.com>.
Hello Bram,

I'm glad to hear the information was helpful.

If you'd like to request access to childrenNum as part of a guaranteed public API, then I encourage you to create a jira issue in the HDFS project.  We could consider it for the future.

HdfsFileStatus is a representation of the HDFS wire protocol, and it's intended to be decoupled from the public API FileStatus object so that the two can evolve independently.  From a pure code reuse perspective, I suppose the two could share a common base class, but then that common base class would need to creep into the public API too.

Currently, there is no guarantee about the availability of these fields in the public REST API.  We're going to remove mention of them in the documentation.  We're not necessarily planning to remove the fields from the JSON immediately, but there is also no guarantee that they'll stay there.

Chris Nauroth
Hortonworks
http://hortonworks.com/


From: Bram Biesbrouck <b...@beligum.com>
Reply-To: "user@hadoop.apache.org" <us...@hadoop.apache.org>
Date: Friday, April 17, 2015 at 5:26 AM
To: "user@hadoop.apache.org" <us...@hadoop.apache.org>
Subject: Re: Found weird issue with HttpFS and WebHdfsFileSystem

Hi Chris,

Thanks for this reply. I thought something funny was happening.

The childrenNum field is actually very useful (e.g. for deciding whether to render an expansion marker next to a folder in a GUI), so it's a pity that the info is there but gets "eaten up" by the general interface, only to be recalculated later on.
It would be nice to have the info as an optional field in the FileStatus class (initialized to -1, as it is right now), so we can use it if it's there or just ignore it when it isn't initialized. While I'm ranting, HdfsFileStatus should extend FileStatus, because it's 95% the same code anyway.

If I read your reply correctly, I assume the fields will eventually be removed from the WebHDFS JSON responses as well?

Thanks again for the extensive reply, very useful and appreciated.

cheers,

b.



On Fri, Apr 17, 2015 at 12:59 AM, Chris Nauroth <cn...@hortonworks.com> wrote:
Hello Bram,

There are a few Apache jiras with background discussion of the introduction of these fields in WebHDFS.

https://issues.apache.org/jira/browse/HDFS-4502

https://issues.apache.org/jira/browse/HDFS-4772

https://issues.apache.org/jira/browse/HDFS-4969

The new fields could not be supported in HttpFS (only in WebHDFS), and they were not intended to be guaranteed in the public REST API.  Unfortunately, the fields were mistakenly added to the documentation in Apache Hadoop 2.5.0.

https://issues.apache.org/jira/browse/HDFS-6153

We're going to revert that documentation change in Apache Hadoop 2.8.0.  I suggest that your application does not rely on these fields, or at least includes fallback logic to keep working as best as it can if the fields are not present.  Another way to determine the number of children would be to make a subsequent LISTSTATUS call on the child path.
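
A minimal sketch of such fallback logic, written in Python purely for illustration. The helper names (http_get_json, list_status, children_count) are hypothetical and not part of any Hadoop client. Authentication parameters such as user.name are left out, and the sketch assumes the FileStatuses/FileStatus JSON wrapper that LISTSTATUS returns.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen


def http_get_json(url):
    """Fetch a URL and decode its JSON body (performs a real HTTP request)."""
    with urlopen(url) as resp:
        return json.load(resp)


def list_status(base_url, path, get_json=http_get_json):
    """Return the list of FileStatus entries for `path` via op=LISTSTATUS."""
    url = f"{base_url}/webhdfs/v1{quote(path)}?op=LISTSTATUS"
    return get_json(url)["FileStatuses"]["FileStatus"]


def children_count(base_url, parent, status, get_json=http_get_json):
    """Number of children of one directory entry, with a fallback.

    Uses childrenNum when the server sent it; otherwise issues a second
    LISTSTATUS call on the child path and counts the returned entries.
    """
    if status["type"] != "DIRECTORY":
        return 0
    if "childrenNum" in status:
        return status["childrenNum"]
    child_path = parent.rstrip("/") + "/" + status["pathSuffix"]
    return len(list_status(base_url, child_path, get_json))
```

The get_json parameter exists only so that the HTTP call can be swapped out, for example in tests. Note that the fallback costs one extra request per directory, so a GUI would probably want to perform it lazily.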

I apologize if this caused any inconvenience, and I hope the information helps.

Chris Nauroth
Hortonworks
http://hortonworks.com/


From: Bram Biesbrouck <b...@beligum.com>
Reply-To: "user@hadoop.apache.org" <us...@hadoop.apache.org>
Date: Thursday, April 16, 2015 at 7:58 AM
To: "user@hadoop.apache.org" <us...@hadoop.apache.org>
Subject: Found weird issue with HttpFS and WebHdfsFileSystem

Hi all,

I'm experiencing something strange while developing against the HttpFS front-end webapp on Hadoop 2.6.0.

I'm currently digging into WebHdfsFileSystem and HttpFS to understand them better and to understand how the REST API works. I've set up a local single-node Hadoop instance, which I can query successfully with e.g. http://localhost:50070/webhdfs/v1/?op=LISTSTATUS
which returns e.g. this FileStatus object:

{
accessTime: 0,
blockSize: 0,
childrenNum: 0,
fileId: 16386,
group: "supergroup",
length: 0,
modificationTime: 1417964248854,
owner: "hadoop",
pathSuffix: "user",
permission: "755",
replication: 0,
storagePolicy: 0,
type: "DIRECTORY"
}

Now, when I start HttpFS and ask for the same data over its interface (http://localhost:14000/webhdfs/v1/?op=LISTSTATUS), I get a different reply. In particular, the childrenNum and fileId fields are missing compared to the first result (same file or directory):

{
pathSuffix: "user",
type: "DIRECTORY",
length: 0,
owner: "hadoop",
group: "supergroup",
permission: "755",
accessTime: 0,
modificationTime: 1417964248854,
blockSize: 0,
replication: 0
}
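
The difference between the two replies can be checked mechanically. The sketch below (Python, illustrative only; missing_fields is a hypothetical helper) compares the field names copied from the two JSON objects above and lists those that the HttpFS reply lacks.

```python
def missing_fields(reference, other):
    """Return the sorted field names present in `reference` but not in `other`."""
    return sorted(set(reference) - set(other))


# FileStatus as returned by WebHDFS on port 50070 (values from the reply above).
webhdfs_status = {
    "accessTime": 0, "blockSize": 0, "childrenNum": 0, "fileId": 16386,
    "group": "supergroup", "length": 0, "modificationTime": 1417964248854,
    "owner": "hadoop", "pathSuffix": "user", "permission": "755",
    "replication": 0, "storagePolicy": 0, "type": "DIRECTORY",
}

# The same entry as returned by HttpFS on port 14000.
httpfs_status = {
    "pathSuffix": "user", "type": "DIRECTORY", "length": 0, "owner": "hadoop",
    "group": "supergroup", "permission": "755", "accessTime": 0,
    "modificationTime": 1417964248854, "blockSize": 0, "replication": 0,
}

print(missing_fields(webhdfs_status, httpfs_status))
# ['childrenNum', 'fileId', 'storagePolicy']
```

Besides childrenNum and fileId, the comparison shows that storagePolicy is absent from the HttpFS reply as well.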

Since I need the childrenNum property, I started digging into the code to see where it's "lost" and found that WebHdfsFileSystem performs a makeQualified() step (around line 1287 in WebHdfsFileSystem.java), just before the list of filestatuses is returned. Basically, it converts HdfsFileStatus objects into FileStatus objects, effectively chopping off those two properties.

The sources for HdfsFileStatus clearly state that it's an "Interface that represents the over the wire information for a file.", so I wonder why this happens, since the HdfsFileStatus contains all the right properties, according to the docs at http://hadoop.apache.org/docs/r2.6.0/hadoop-project-dist/hadoop-hdfs/WebHDFS.html#List_a_Directory

It feels like the FileStatus class hasn't been updated to match the HdfsFileStatus class. However, since the two don't share any interfaces or superclasses, I get the feeling the split is intentional; I just can't figure out why.

Can somebody help or shed some light?

thanks,

b.
--

Bram Biesbrouck - 0486/118280 - www.beligum.com - the republic of reinvention

Re: Found weird issue with HttpFS and WebHdfsFileSystem

Posted by Bram Biesbrouck <b...@beligum.com>.
Hi Chris,

Thanks for this reply. I thought something funny was happening.

The childrenNum field is actually very useful (e.g. for deciding whether to
render an expansion marker next to a folder in a GUI), so it's a pity that
the info is there but gets "eaten up" by the general interface, only to be
recalculated later on.
It would be nice to have the info as an optional field in the FileStatus
class (initialized to -1, as it is right now), so we can use it if it's
there or just ignore it when it isn't initialized. While I'm ranting,
HdfsFileStatus should extend FileStatus, because it's 95% the same code
anyway.
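
As an illustration of how a client could treat the count as optional, here is a small Python sketch that uses -1 as the "unknown" sentinel. The DirEntry class and the from_status and expander_state functions are hypothetical names made up for this example; they are not Hadoop classes.

```python
from dataclasses import dataclass

UNKNOWN = -1  # sentinel: the server did not report a child count


@dataclass
class DirEntry:
    name: str
    is_dir: bool
    children_num: int = UNKNOWN


def from_status(status):
    """Build a DirEntry from one FileStatus JSON object (already parsed)."""
    return DirEntry(
        name=status["pathSuffix"],
        is_dir=status["type"] == "DIRECTORY",
        children_num=status.get("childrenNum", UNKNOWN),
    )


def expander_state(entry):
    """Decide how a tree view should draw the expansion marker.

    "none"       -> a file or a directory known to be empty
    "expandable" -> a directory known to have children
    "lazy"       -> count unknown, find out when the user expands the node
    """
    if not entry.is_dir:
        return "none"
    if entry.children_num == UNKNOWN:
        return "lazy"
    return "expandable" if entry.children_num > 0 else "none"
```

With this shape, a reply that carries childrenNum and one that omits it go through the same code path; only the "lazy" case needs an extra listing later.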

If I read your reply correctly, I assume the fields will eventually be
removed from the WebHDFS JSON responses as well?

Thanks again for the extensive reply, very useful and appreciated.

cheers,

b.



On Fri, Apr 17, 2015 at 12:59 AM, Chris Nauroth <cn...@hortonworks.com>
wrote:

>  Hello Bram,
>
>  There are a few Apache jiras with background discussion of the
> introduction of these fields in WebHDFS.
>
>  https://issues.apache.org/jira/browse/HDFS-4502
>
>  https://issues.apache.org/jira/browse/HDFS-4772
>
>  https://issues.apache.org/jira/browse/HDFS-4969
>
>  The new fields could not be supported in HttpFS (only in WebHDFS), and they
> were not intended to be guaranteed in the public REST API.  Unfortunately,
> the fields were mistakenly added to the documentation in Apache Hadoop
> 2.5.0.
>
>  https://issues.apache.org/jira/browse/HDFS-6153
>
>  We're going to revert that documentation change in Apache Hadoop 2.8.0.
> I suggest that your application does not rely on these fields, or at least
> includes fallback logic to keep working as best as it can if the fields are
> not present.  Another way to determine the number of children would be to
> make a subsequent LISTSTATUS call on the child path.
>
>  I apologize if this caused any inconvenience, and I hope the information
> helps.
>
>   Chris Nauroth
> Hortonworks
> http://hortonworks.com/
>
>
>   From: Bram Biesbrouck <b...@beligum.com>
> Reply-To: "user@hadoop.apache.org" <us...@hadoop.apache.org>
> Date: Thursday, April 16, 2015 at 7:58 AM
> To: "user@hadoop.apache.org" <us...@hadoop.apache.org>
> Subject: Found weird issue with HttpFS and WebHdfsFileSystem
>
>   Hi all,
>
>  I'm experiencing something strange while developing against the HttpFS
> front-end webapp on Hadoop 2.6.0.
>
>  I'm currently digging into WebHdfsFileSystem and HttpFS to understand them
> better and to understand how the REST API works. I've set up a local
> single-node Hadoop instance, which I can query successfully with e.g.
> http://localhost:50070/webhdfs/v1/?op=LISTSTATUS
> which returns e.g. this FileStatus object:
>
>  {
> accessTime: 0,
> blockSize: 0,
> childrenNum: 0,
> fileId: 16386,
> group: "supergroup",
> length: 0,
> modificationTime: 1417964248854,
> owner: "hadoop",
> pathSuffix: "user",
> permission: "755",
> replication: 0,
> storagePolicy: 0,
> type: "DIRECTORY"
> }
>
>  Now, when I start HttpFS and ask for the same data over its interface (
> http://localhost:14000/webhdfs/v1/?op=LISTSTATUS), I get a different
> reply. In particular, the childrenNum and fileId fields are missing compared
> to the first result (same file or directory):
>
>  {
> pathSuffix: "user",
> type: "DIRECTORY",
> length: 0,
> owner: "hadoop",
> group: "supergroup",
> permission: "755",
> accessTime: 0,
> modificationTime: 1417964248854,
> blockSize: 0,
> replication: 0
> }
>
>  Since I need the childrenNum property, I started digging into the code
> to see where it's "lost" and found that WebHdfsFileSystem performs a
> makeQualified() step (around line 1287 in WebHdfsFileSystem.java), just
> before the list of filestatuses is returned. Basically, it converts
> HdfsFileStatus objects into FileStatus objects, effectively chopping off
> those two properties.
>
>  The sources for HdfsFileStatus clearly state that it's an "Interface
> that represents the over the wire information for a file.", so I wonder why
> this happens, since the HdfsFileStatus contains all the right properties,
> according to the docs at
> http://hadoop.apache.org/docs/r2.6.0/hadoop-project-dist/hadoop-hdfs/WebHDFS.html#List_a_Directory
>
>  It feels like the FileStatus class hasn't been updated to match the
> HdfsFileStatus class. However, since the two don't share any interfaces or
> superclasses, I get the feeling the split is intentional; I just can't
> figure out why.
>
>  Can somebody help or shed some light?
>
>  thanks,
>
>  b.
> --
>
>  Bram Biesbrouck - 0486/118280 - www.beligum.com -  the republic of
> reinvention
>
>


-- 

 Bram Biesbrouck - 0486/118280 - www.beligum.com -  the republic of
reinvention

Re: Found weird issue with HttpFS and WebHdfsFileSystem

Posted by Bram Biesbrouck <b...@beligum.com>.
Hi Chris,

Thanks for this reply. I thought something funny was happening.

The childrenNum field is actually very useful (e.g. for deciding whether to
render an expansion marker next to a folder in a GUI), so it's a pity that
the info is there but gets "eaten up" by the general interface, only to be
re-calculated later on.
It would be nice to have the info as an optional field in the FileStatus
class (initialized to -1 like it is right now), so we can use it if it's
there or just ignore it when not initialized. While I'm
ranting, HdfsFileStatus should extend FileStatus because it's 95%
the same code anyway.
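
To make the GUI use case concrete, here is a rough Python sketch of how a client could treat childrenNum as optional. The function names and the marker characters are made up for illustration; the only assumptions about the data are the field names seen in the two JSON replies and -1 as the "not initialized" value.

```python
def has_children(status):
    """Return True or False when known, or None when the server did not say."""
    if status.get("type") != "DIRECTORY":
        return False  # only directories can have children
    children = status.get("childrenNum", -1)  # -1 stands for "not initialized"
    if children < 0:
        return None
    return children > 0


def expansion_marker(status):
    """Pick a (hypothetical) marker character for a tree view."""
    known = has_children(status)
    if known is None:
        return "?"  # unknown: show a lazy marker and resolve it on expand
    return "+" if known else " "


# The two replies from the original message, reduced to the relevant fields.
webhdfs_reply = {"pathSuffix": "user", "type": "DIRECTORY", "childrenNum": 0}
httpfs_reply = {"pathSuffix": "user", "type": "DIRECTORY"}
```

With the WebHDFS reply the answer is known immediately; with the HttpFS reply the client only learns that it has to find out some other way.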

If I read your reply correctly, I assume the fields will be deleted from
the webhdfs JSON responses as well in the future?

Thanks again for the extensive reply, very useful and appreciated.

cheers,

b.



On Fri, Apr 17, 2015 at 12:59 AM, Chris Nauroth <cn...@hortonworks.com>
wrote:

>  Hello Bram,
>
>  There are a few Apache jiras with background discussion of the
> introduction of these fields in WebHDFS.
>
>  https://issues.apache.org/jira/browse/HDFS-4502
>
>  https://issues.apache.org/jira/browse/HDFS-4772
>
>  https://issues.apache.org/jira/browse/HDFS-4969
>
>  The new fields could not be supported in HTTPFS (only WebHDFS), and they
> were not intended to be guaranteed in the public REST API.  Unfortunately,
> the fields were added to the documentation mistakenly in Apache Hadoop
> 2.5.0.
>
>  https://issues.apache.org/jira/browse/HDFS-6153
>
>  We're going to revert that documentation change in Apache Hadoop 2.8.0.
> I suggest that your application does not rely on these fields, or at least
> includes fallback logic to keep working as best as it can if the fields are
> not present.  Another way to determine the number of children would be to
> make a subsequent LISTSTATUS call on the child path.
>
>  I apologize if this caused any inconvenience, and I hope the information
> helps.
>
>   Chris Nauroth
> Hortonworks
> http://hortonworks.com/
>
>
>   From: Bram Biesbrouck <b...@beligum.com>
> Reply-To: "user@hadoop.apache.org" <us...@hadoop.apache.org>
> Date: Thursday, April 16, 2015 at 7:58 AM
> To: "user@hadoop.apache.org" <us...@hadoop.apache.org>
> Subject: Found weird issue with HttpFS and WebHdfsFileSystem
>
>   Hi all,
>
>  I'm experiencing something strange while developing against the HttpFS
> front-end webapp on Hadoop 2.6.0.
>
>  I'm currently digging into WebHdfsFileSystem and HttpFS to understand them
> better and to understand how the REST API works. I've set up a local
> single-node Hadoop instance, which I can query successfully with e.g.
> http://localhost:50070/webhdfs/v1/?op=LISTSTATUS
> which returns e.g. this FileStatus object:
>
>  {
> accessTime: 0,
> blockSize: 0,
> childrenNum: 0,
> fileId: 16386,
> group: "supergroup",
> length: 0,
> modificationTime: 1417964248854,
> owner: "hadoop",
> pathSuffix: "user",
> permission: "755",
> replication: 0,
> storagePolicy: 0,
> type: "DIRECTORY"
> }
>
>  Now, when I start HttpFS and ask for the same data over its interface (
> http://localhost:14000/webhdfs/v1/?op=LISTSTATUS), I get a different
> reply. In particular, the childrenNum and fileId fields are missing compared
> to the first result (same file or directory):
>
>  {
> pathSuffix: "user",
> type: "DIRECTORY",
> length: 0,
> owner: "hadoop",
> group: "supergroup",
> permission: "755",
> accessTime: 0,
> modificationTime: 1417964248854,
> blockSize: 0,
> replication: 0
> }
>
>  Since I need the childrenNum property, I started digging into the code
> to see where it's "lost" and found that WebHdfsFileSystem performs a
> makeQualified() step (around line 1287 in WebHdfsFileSystem.java), just
> before the list of filestatuses is returned. Basically, it converts
> HdfsFileStatus objects into FileStatus objects, effectively chopping off
> those two properties.
>
>  The sources for HdfsFileStatus clearly state that it's an "Interface
> that represents the over the wire information for a file.", so I wonder why
> this happens, since the HdfsFileStatus contains all the right properties,
> according to the docs at
> http://hadoop.apache.org/docs/r2.6.0/hadoop-project-dist/hadoop-hdfs/WebHDFS.html#List_a_Directory
>
>  It feels like the FileStatus class hasn't been updated to match the
> HdfsFileStatus class, but since they don't share any interfaces or
> superclasses I get the feeling it's intentional, but I just can't find or
> figure out why.
>
>  Can somebody help or shed some light?
>
>  thanks,
>
>  b.
> --
>
>  Bram Biesbrouck - 0486/118280 - www.beligum.com -  the republic of
> reinvention
>
>


-- 

 Bram Biesbrouck - 0486/118280 - www.beligum.com -  the republic of
reinvention

Re: Found weird issue with HttpFS and WebHdfsFileSystem

Posted by Chris Nauroth <cn...@hortonworks.com>.
Hello Bram,

There are a few Apache jiras with background discussion of the introduction of these fields in WebHDFS.

https://issues.apache.org/jira/browse/HDFS-4502

https://issues.apache.org/jira/browse/HDFS-4772

https://issues.apache.org/jira/browse/HDFS-4969

The new fields could not be supported in HTTPFS (only WebHDFS), and they were not intended to be guaranteed in the public REST API.  Unfortunately, the fields were added to the documentation mistakenly in Apache Hadoop 2.5.0.

https://issues.apache.org/jira/browse/HDFS-6153

We're going to revert that documentation change in Apache Hadoop 2.8.0.  I suggest that your application does not rely on these fields, or at least includes fallback logic to keep working as best as it can if the fields are not present.  Another way to determine the number of children would be to make a subsequent LISTSTATUS call on the child path.
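
A minimal Python sketch of such fallback logic follows. It assumes the LISTSTATUS reply has the documented shape {"FileStatuses": {"FileStatus": [...]}}. The function names and the `fetch` parameter are hypothetical, the base URL is the local HttpFS endpoint from the original message, and no authentication parameters or error handling are included.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen


def liststatus_url(base_url, path):
    """Build a LISTSTATUS URL such as <base_url>/user?op=LISTSTATUS."""
    return base_url.rstrip("/") + quote("/" + path.strip("/")) + "?op=LISTSTATUS"


def fetch_json(url):
    """Fetch a URL and decode the JSON body."""
    with urlopen(url) as response:
        return json.loads(response.read().decode("utf-8"))


def children_count(status, parent_path, base_url, fetch=fetch_json):
    """Return the number of children of the entry described by `status`.

    Uses childrenNum when the server sent it; otherwise makes a second
    LISTSTATUS call on the child path and counts the returned entries.
    """
    if status.get("type") != "DIRECTORY":
        return 0
    if "childrenNum" in status:
        return status["childrenNum"]
    child_path = parent_path.rstrip("/") + "/" + status["pathSuffix"]
    listing = fetch(liststatus_url(base_url, child_path))
    return len(listing["FileStatuses"]["FileStatus"])
```

For example, `children_count(entry, "/", "http://localhost:14000/webhdfs/v1")` would issue one extra request per directory when talking to HttpFS, and none when the field is present. The extra round trip is the cost of not depending on an unguaranteed field.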

I apologize if this caused any inconvenience, and I hope the information helps.

Chris Nauroth
Hortonworks
http://hortonworks.com/


From: Bram Biesbrouck <b...@beligum.com>
Reply-To: "user@hadoop.apache.org" <us...@hadoop.apache.org>
Date: Thursday, April 16, 2015 at 7:58 AM
To: "user@hadoop.apache.org" <us...@hadoop.apache.org>
Subject: Found weird issue with HttpFS and WebHdfsFileSystem

Hi all,

I'm experiencing something strange while developing against the HttpFS front-end webapp on Hadoop 2.6.0.

I'm currently digging into WebHdfsFileSystem and HttpFS to understand them better and to understand how the REST API works. I've set up a local single-node Hadoop instance, which I can query successfully with e.g. http://localhost:50070/webhdfs/v1/?op=LISTSTATUS
which returns e.g. this FileStatus object:

{
accessTime: 0,
blockSize: 0,
childrenNum: 0,
fileId: 16386,
group: "supergroup",
length: 0,
modificationTime: 1417964248854,
owner: "hadoop",
pathSuffix: "user",
permission: "755",
replication: 0,
storagePolicy: 0,
type: "DIRECTORY"
}

Now, when I start HttpFS and ask for the same data over its interface (http://localhost:14000/webhdfs/v1/?op=LISTSTATUS), I get a different reply. In particular, the childrenNum and fileId fields are missing compared to the first result (same file or directory):

{
pathSuffix: "user",
type: "DIRECTORY",
length: 0,
owner: "hadoop",
group: "supergroup",
permission: "755",
accessTime: 0,
modificationTime: 1417964248854,
blockSize: 0,
replication: 0
}

Since I need the childrenNum property, I started digging into the code to see where it's "lost" and found that WebHdfsFileSystem performs a makeQualified() step (around line 1287 in WebHdfsFileSystem.java), just before the list of filestatuses is returned. Basically, it converts HdfsFileStatus objects into FileStatus objects, effectively chopping off those two properties.

The sources for HdfsFileStatus clearly state that it's an "Interface that represents the over the wire information for a file.", so I wonder why this happens, since the HdfsFileStatus contains all the right properties, according to the docs at http://hadoop.apache.org/docs/r2.6.0/hadoop-project-dist/hadoop-hdfs/WebHDFS.html#List_a_Directory

It feels like the FileStatus class hasn't been updated to match the HdfsFileStatus class, but since they don't share any interfaces or superclasses I get the feeling it's intentional, but I just can't find or figure out why.

Can somebody help or shed some light?

thanks,

b.
--

Bram Biesbrouck - 0486/118280 - www.beligum.com - the republic of reinvention
