Posted to common-dev@hadoop.apache.org by "Pete Wyckoff (JIRA)" <ji...@apache.org> on 2008/09/09 22:33:44 UTC

[jira] Created: (HADOOP-4136) implement DFSClient on top of thriftfs - this may require DFSClient or DN protocol changes

implement DFSClient on top of thriftfs - this may require DFSClient or DN protocol changes
------------------------------------------------------------------------------------------

                 Key: HADOOP-4136
                 URL: https://issues.apache.org/jira/browse/HADOOP-4136
             Project: Hadoop Core
          Issue Type: New Feature
          Components: contrib/thriftfs, dfs, fs
            Reporter: Pete Wyckoff
            Priority: Minor


Open up the DFS protocol to allow non-Hadoop DFS clients to implement reads/writes.  The NN need not be changed, because the thriftfs server will serve up the same metadata, i.e. it's a bridge to the NN.

This is useful because if we can do this in Java using more open APIs, we could do it in C++ or Python or Perl :)

Doing it in Java first makes sense because we already have the DFSClient - kind of a proof of concept.
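
To make the split concrete, here is a minimal sketch of the idea, assuming metadata comes from a bridge that forwards to the NN and block bytes come straight from the DNs. Every name in it (BridgeSketch, MetadataBridge, BlockReader, BlockLocation) is hypothetical and made up for illustration; none of them are existing Hadoop or thriftfs APIs, and the in-memory stand-ins replace any real network calls.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// All names below are hypothetical; none of them are Hadoop or thriftfs APIs.
class BridgeSketch {

    /** Where one block of a file lives. */
    static final class BlockLocation {
        final long blockId;
        final String dataNode; // made-up DN identifier

        BlockLocation(long blockId, String dataNode) {
            this.blockId = blockId;
            this.dataNode = dataNode;
        }
    }

    /** Metadata calls that a bridge server would forward to the NN unchanged. */
    interface MetadataBridge {
        List<BlockLocation> getBlockLocations(String path);
    }

    /** The open DN read call that a non-Java client would have to implement. */
    interface BlockReader {
        byte[] readBlock(String dataNode, long blockId);
    }

    /** Reads a whole file: metadata via the bridge, bytes directly from the DNs. */
    static byte[] readFile(String path, MetadataBridge bridge, BlockReader reader) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (BlockLocation loc : bridge.getBlockLocations(path)) {
            byte[] data = reader.readBlock(loc.dataNode, loc.blockId);
            out.write(data, 0, data.length);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        // In-memory stand-ins for the NN bridge and two DNs.
        Map<Long, byte[]> blocks = new HashMap<>();
        blocks.put(1L, "hello ".getBytes(StandardCharsets.UTF_8));
        blocks.put(2L, "world".getBytes(StandardCharsets.UTF_8));

        MetadataBridge bridge = path -> Arrays.asList(
                new BlockLocation(1L, "dn1"), new BlockLocation(2L, "dn2"));
        BlockReader reader = (dataNode, blockId) -> blocks.get(blockId);

        byte[] file = readFile("/demo.txt", bridge, reader);
        System.out.println(new String(file, StandardCharsets.UTF_8)); // prints "hello world"
    }
}
```

The point of the two interfaces is that only BlockReader touches the DN protocol, so that is the piece which would need an open, stable definition before a C++ or Python client could implement it.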



-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HADOOP-4136) implement DFSClient on top of thriftfs - this may require DFSClient or DN protocol changes

Posted by "Pete Wyckoff (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-4136?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12633568#action_12633568 ] 

Pete Wyckoff commented on HADOOP-4136:
--------------------------------------

> since it implies a lot of duplicated logic that will be hard to maintain

Yes, I agree, although once there are well-defined, stable APIs, could we not take the KFSClient and port it to these APIs? Admittedly, I haven't looked at it much yet, but it probably implements everything the DFSClient does, only in C++, so it should be a good starting point for a C++ client.




[jira] Commented: (HADOOP-4136) implement DFSClient on top of thriftfs - this may require DFSClient or DN protocol changes

Posted by "Doug Cutting (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-4136?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12629616#action_12629616 ] 

Doug Cutting commented on HADOOP-4136:
--------------------------------------

I'm not sure what you're proposing here.  Much of HDFS's logic is in the Java client code.  We prefer to support only the FileSystem API for accessing HDFS, rather than replicate the client logic.  The RPCs are thus intentionally not public and are subject to change without warning; they are for use only by the Java client code.



[jira] Commented: (HADOOP-4136) implement DFSClient on top of thriftfs - this may require DFSClient or DN protocol changes

Posted by "Doug Cutting (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-4136?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12629626#action_12629626 ] 

Doug Cutting commented on HADOOP-4136:
--------------------------------------

Okay, that is really what you meant!  Unfortunately I'm not convinced we yet want to support that, since it implies a lot of duplicated logic that will be hard to maintain.  Once we go 1.0 it might be easier, since the HDFS protocols should be more stable then.



[jira] Commented: (HADOOP-4136) implement DFSClient on top of thriftfs - this may require DFSClient or DN protocol changes

Posted by "Pete Wyckoff (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-4136?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12629620#action_12629620 ] 

Pete Wyckoff commented on HADOOP-4136:
--------------------------------------

It's nice to be able to change the APIs underneath without warning. I guess what I'm asking for is something more for 1.0+ that opens the APIs so I can write applications that take advantage of them (maybe we aren't there yet).

One big one could be a C++ DFSClient.  That's a lot of work, though (hence the idea of a Java proof of concept, since we have the code already). I know there is talk about changing the RPC, and I guess this JIRA expresses the idea that those RPCs should be open and cross-language compatible. I often find myself in situations where I want something from Hadoop but can't link the whole thing with my application. Admittedly, this is mostly management-type stuff.

If I had a pure C++ DFSClient, I could implement a FUSE (or direct kernel module) that talks to HDFS without any Java at all.  For the kernel case, obviously this would be critical.

I think having a C++ DFSClient would open up a lot of things.

-- pete






[jira] Commented: (HADOOP-4136) implement DFSClient on top of thriftfs - this may require DFSClient or DN protocol changes

Posted by "Pete Wyckoff (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-4136?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12629598#action_12629598 ] 

Pete Wyckoff commented on HADOOP-4136:
--------------------------------------

An approach could be:

1. Implement the FileSystem API on top of thriftfs (even if we have to pull in all of Hadoop; that doesn't matter as long as we're only using open APIs)
2. Implement the DFSClient-to-DN APIs

Done. Of course, 2 is easier said than done, and I need to look at the details. But it seems reasonable to refactor those APIs to be more standardized and open.

2 need not be done with anything like Thrift. It could be done with a Hadoop RPC layer that's well defined and can be implemented without reflection, etc.

We could also fake 2 by having thriftfs be a bridge to the DN APIs too.
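
As a rough illustration of what "well defined and implementable without reflection" could mean for step 2, here is a minimal sketch, assuming a request whose byte layout is written out field by field. The ReadBlockRequest class, its fields, and the version number are all hypothetical; this is not the actual Hadoop DN protocol, only an example of a layout that a client in any language could reproduce from a written spec.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

// Hypothetical message for illustration; not the actual Hadoop DN protocol.
class ReadBlockRequest {
    static final short VERSION = 1; // made-up protocol version

    final long blockId;
    final long offset;
    final int length;

    ReadBlockRequest(long blockId, long offset, int length) {
        this.blockId = blockId;
        this.offset = offset;
        this.length = length;
    }

    /** Layout (big-endian): int16 version, int64 blockId, int64 offset, int32 length. */
    byte[] encode() throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bytes);
        out.writeShort(VERSION);
        out.writeLong(blockId);
        out.writeLong(offset);
        out.writeInt(length);
        out.flush();
        return bytes.toByteArray();
    }

    /** Parses the same layout back, field by field, with no reflection involved. */
    static ReadBlockRequest decode(byte[] wire) throws IOException {
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(wire));
        short version = in.readShort();
        if (version != VERSION) {
            throw new IOException("unsupported version " + version);
        }
        long blockId = in.readLong();
        long offset = in.readLong();
        int length = in.readInt();
        return new ReadBlockRequest(blockId, offset, length);
    }

    public static void main(String[] args) throws IOException {
        byte[] wire = new ReadBlockRequest(42L, 1024L, 4096).encode();
        ReadBlockRequest back = decode(wire);
        System.out.println(wire.length + " bytes, block " + back.blockId); // prints "22 bytes, block 42"
    }
}
```

Because the layout is fixed and explicit (2 + 8 + 8 + 4 = 22 bytes), a C++ or Python client would only need the one-line layout comment to produce or parse the same request.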


