Posted to notifications@accumulo.apache.org by "Eric Newton (JIRA)" <ji...@apache.org> on 2014/02/01 18:56:12 UTC
[jira] [Commented] (ACCUMULO-118) accumulo could work across HDFS instances, which would help it to scale past a single namenode

    [ https://issues.apache.org/jira/browse/ACCUMULO-118?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13888661#comment-13888661 ] 

Eric Newton commented on ACCUMULO-118:
--------------------------------------

bq.  I think this feature was merged in before it was complete

Probably.  But it was a pretty massive change, and maintaining it as a patch set, even with git's help, would have been very hard.

bq. I did not realize all of the problems absolute paths could cause

Nor would we have if it had not been merged in.

bq. should have started with administrative use cases

I think we are getting better at this.  For example, I can think of lots of ways that the initial WAL implementation caused grief for unsuspecting administrators.  We addressed some of these after it was released into the wild, based on feedback from the administrators.  Ultimately they were fixed by moving the WAL to HDFS, and then ferreting out all the settings needed to make HDFS an appropriate store for the WAL.

I think the use case of "what if administrators change the URL of a NN?" is a reasonable one, but it was certainly not anything I was thinking about when I was changing thousands of lines of code to use full paths.  The more subtle issues, such as determining aliases for namespaces (hdfs://example:9000 vs hdfs://example.com:9000) and recognizing real namespaces under viewfs, are the sort of thing that we will only find through actual use.
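
To make the alias problem concrete, here is a minimal sketch using only java.net.URI.  It is not Accumulo code: the class name, the helper sameAuthority, and the file paths are made up for illustration.  It only shows that a plain comparison of scheme, host, and port treats the two spellings above as different namespaces, even if both names reach the same NN.

```java
import java.net.URI;

public class NamespaceAliasSketch {

    // Hypothetical helper: true only when scheme, host, and port match.
    // It does no DNS lookup, so it cannot tell that two host names are aliases.
    static boolean sameAuthority(String a, String b) {
        URI ua = URI.create(a);
        URI ub = URI.create(b);
        return ua.getScheme().equalsIgnoreCase(ub.getScheme())
            && ua.getHost().equalsIgnoreCase(ub.getHost())
            && ua.getPort() == ub.getPort();
    }

    public static void main(String[] args) {
        // Illustrative paths only; not taken from a real instance.
        String shortName = "hdfs://example:9000/accumulo/tables/1/f1.rf";
        String fullName = "hdfs://example.com:9000/accumulo/tables/1/f1.rf";

        // Both may refer to the same file, but the comparison says otherwise.
        System.out.println(sameAuthority(shortName, fullName));
    }
}
```

Deciding that two such names really are the same namespace needs information from outside the path itself (DNS, or the HDFS configuration), which is why this kind of thing tends to surface only in actual use.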

My initial goal of using concrete paths to simplify debugging might have been the wrong choice.  Using some kind of indirect configuration that points to a real namespace (like viewfs) may have been better.  But, that requires that you value "administrators should be able to easily move a NN to a new URL."  The ability to do this with the old relative paths was not a design goal, so much as a useful result of using the shortest name possible for each file.
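
A small sketch of why the old relative paths made moving a NN easy, again using only java.net.URI.  The class name, the helper resolve, the host names, and the paths are all hypothetical; this is not how Accumulo stores or resolves file references, just an illustration of the idea.

```java
import java.net.URI;

public class RelativePathSketch {

    // Hypothetical helper: resolve a stored relative path against a
    // configured base URI.  The base must end in "/" for the relative
    // path to be appended rather than replace the last segment.
    static String resolve(String base, String relative) {
        return URI.create(base).resolve(relative).toString();
    }

    public static void main(String[] args) {
        // What would be stored for a file: no namenode name in it.
        String stored = "tables/1/t-0001/f1.rf";

        // The same stored value follows the NN when only the base changes.
        System.out.println(resolve("hdfs://nn1.example.com:9000/accumulo/", stored));
        System.out.println(resolve("hdfs://nn2.example.com:9000/accumulo/", stored));
    }
}
```

With full paths, the NN name is part of every stored reference, so the same move means rewriting or re-mapping each one.  An indirect name (viewfs-style) sits in between: the stored path is absolute, but the name in it is not the real NN.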

bq. These really seem to be the long poll in the tent for the 1.6 release 

Seems to me to not be so far behind namespaces. Constructive criticism includes suggestions on how to make things better.  Working code is even more constructive.

> accumulo could work across HDFS instances, which would help it to scale past a single namenode
> ----------------------------------------------------------------------------------------------
>
>                 Key: ACCUMULO-118
>                 URL: https://issues.apache.org/jira/browse/ACCUMULO-118
>             Project: Accumulo
>          Issue Type: Improvement
>          Components: master, tserver
>            Reporter: Eric Newton
>            Assignee: Eric Newton
>            Priority: Blocker
>             Fix For: 1.6.0
>
>         Attachments: ACCUMULO-118-01.txt, ACCUMULO-118-02.txt
>
>   Original Estimate: 2,016h
>  Remaining Estimate: 2,016h
>
> Consider using full path names to files, which would allow the servers to access the files on any HDFS file system.
> Work may exist elsewhere to run HDFS using a number of NameNode instances to break up the namespace.
> We may need a pluggable strategy to determine namespace for new files.


