Posted to user@hadoop.apache.org by Rob Blah <tm...@gmail.com> on 2013/10/01 00:09:00 UTC

When to use DFSInputStream and HdfsDataInputStream

Hi

What is the use case difference between:
- DFSInputStream and HdfsDataInputStream
- DFSOutputStream and HdfsDataOutputStream

When should one be preferred over the other? From the sources I see they have
similar functionality; the only difference is that HdfsData*Stream "follows"
Data*Stream instead of *Stream. Also, is DFS*Stream more general than
HdfsData*Stream, in the sense that it works at a higher abstraction layer and
can work with other distributed file systems (even though it contacts
HDFS-specific components), or is it just a naming convention?

Which one should I choose to read/write data from/to HDFS, and why (sounds
like an academic question ;) )?

* -> means both Input and Output

regards
tmp

RE: When to use DFSInputStream and HdfsDataInputStream

Posted by Uma Maheswara Rao G <ma...@huawei.com>.
Hi Rob,

DFSInputStream: The InterfaceAudience for this class is private, so you should not use it directly. It implements the core read functionality and is a DFS-specific implementation only.
HdfsDataInputStream: The InterfaceAudience for this class is public, so you can use it. In fact, you get an HdfsDataInputStream object when you open a file for reading. This wrapper provides some additional DFS-specific APIs, such as getVisibleLength, which may not be intended for a general FS.
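
To make the split between the two classes concrete, here is a minimal sketch of the wrapper pattern described above. It does not use the Hadoop classes: CoreStream and PublicDataStream are hypothetical stand-ins for the roles played by DFSInputStream and HdfsDataInputStream, and visibleLength() is an assumed helper backed by an in-memory byte array. Only the JDK is needed to run it.

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical stand-in for the private core stream (the role DFSInputStream
// plays): it does the actual reading and knows file-system-specific details.
class CoreStream extends InputStream {
    private final byte[] data;
    private int pos = 0;

    CoreStream(byte[] data) {
        this.data = data;
    }

    @Override
    public int read() {
        // Return the next byte as an unsigned value, or -1 at end of data.
        return pos < data.length ? (data[pos++] & 0xff) : -1;
    }

    // A detail that a generic InputStream does not expose (assumed helper).
    long visibleLength() {
        return data.length;
    }
}

// Hypothetical stand-in for the public wrapper (the role HdfsDataInputStream
// plays): it "follows" DataInputStream and adds the file-system-specific call.
class PublicDataStream extends DataInputStream {
    PublicDataStream(CoreStream core) {
        super(core);
    }

    long getVisibleLength() {
        // 'in' is the wrapped stream inherited from FilterInputStream.
        return ((CoreStream) in).visibleLength();
    }
}

class WrapperSketch {
    public static void main(String[] args) throws IOException {
        byte[] bytes = "hello".getBytes(StandardCharsets.UTF_8);
        try (PublicDataStream in = new PublicDataStream(new CoreStream(bytes))) {
            System.out.println(in.getVisibleLength()); // extra, FS-specific call
            System.out.println((char) in.readByte());  // ordinary DataInputStream call
        }
    }
}
```

Client code in this sketch only ever touches PublicDataStream. That mirrors the advice above: use the public wrapper you are handed when opening a file, and leave the core stream to the implementation.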

The same applies on the write side (DFSOutputStream and HdfsDataOutputStream).
I hope this helps clarify your doubts.

Regards,
Uma
