Posted to common-dev@hadoop.apache.org by "Konstantin Shvachko (JIRA)" <ji...@apache.org> on 2007/04/21 02:21:15 UTC

[jira] Created: (HADOOP-1283) Eliminate internal UTF8 to String and vice versa conversions in the name-node.

Eliminate internal UTF8 to String and vice versa conversions in the name-node.
------------------------------------------------------------------------------

                 Key: HADOOP-1283
                 URL: https://issues.apache.org/jira/browse/HADOOP-1283
             Project: Hadoop
          Issue Type: Improvement
          Components: dfs
    Affects Versions: 0.12.0
            Reporter: Konstantin Shvachko
             Fix For: 0.13.0


The name-node code converts between UTF8 and String internally. One example:
NameNode.complete(String src, String clientName)
calls
FSNamesystem.completeFile(new UTF8(src), new UTF8(clientName));
which in turn eventually calls
FSDirectory.addNode(path.toString(), newNode)
and, in another place,
FSDirectory.getNode(src.toString())

So the same parameter is converted back and forth several times during a single operation.
We should keep the parameter type consistent across these methods.
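
To make the cost concrete, here is a minimal sketch of the two call shapes. It does not use the real Hadoop classes: Utf8Name is a hypothetical stand-in for UTF8, the method names only mimic the chain above, and the counter exists purely for illustration.

```java
import java.nio.charset.StandardCharsets;

/** Sketch only: Utf8Name is a hypothetical stand-in for Hadoop's UTF8 class. */
final class ConversionSketch {
    /** Counts String <-> bytes conversions so the two call chains can be compared. */
    static int conversions = 0;

    /** Hypothetical wrapper that stores a name as UTF-8 bytes. */
    static final class Utf8Name {
        private final byte[] bytes;

        Utf8Name(String s) {
            conversions++; // String -> bytes
            bytes = s.getBytes(StandardCharsets.UTF_8);
        }

        @Override
        public String toString() {
            conversions++; // bytes -> String
            return new String(bytes, StandardCharsets.UTF_8);
        }
    }

    // Mixed-type chain, shaped like the example in the issue.
    static String completeMixed(String src) {
        return completeFileMixed(new Utf8Name(src));
    }

    static String completeFileMixed(Utf8Name path) {
        return addNode(path.toString());
    }

    // Consistent chain: the parameter stays a String throughout.
    static String completeConsistent(String src) {
        return addNode(src);
    }

    // Stand-in for the directory operation; it just returns the path it was given.
    static String addNode(String path) {
        return path;
    }

    public static void main(String[] args) {
        conversions = 0;
        completeMixed("/user/test/file");
        System.out.println("mixed chain conversions: " + conversions);

        conversions = 0;
        completeConsistent("/user/test/file");
        System.out.println("consistent chain conversions: " + conversions);
    }
}
```

In this sketch the mixed chain performs two conversions per call and the consistent chain performs none. The real code paths may convert more often than that.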

The question is which type should be used: String or Text?
From previous discussions I remember that Text is more efficient in space and time for non-ASCII
data. Here we mostly deal with file names and network addresses, which are ASCII.
Does it make sense to use Text in this case?

UTF8 is also used as a key in two maps: pendingCreates and leases.
These keys should be replaced too.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (HADOOP-1283) Eliminate internal UTF8 to String and vice versa conversions in the name-node.

Posted by "Hudson (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-1283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12507209 ] 

Hudson commented on HADOOP-1283:
--------------------------------

Integrated in Hadoop-Nightly #132 (See [http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Nightly/132/])

> Eliminate internal UTF8 to String and vice versa conversions in the name-node.
> ------------------------------------------------------------------------------
>
>                 Key: HADOOP-1283
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1283
>             Project: Hadoop
>          Issue Type: Improvement
>          Components: dfs
>    Affects Versions: 0.12.0
>            Reporter: Konstantin Shvachko
>            Assignee: Konstantin Shvachko
>             Fix For: 0.14.0
>
>         Attachments: EliminateUTF8-2.patch, EliminateUTF8.patch


[jira] Commented: (HADOOP-1283) Eliminate internal UTF8 to String and vice versa conversions in the name-node.

Posted by "Doug Cutting (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-1283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12505894 ] 

Doug Cutting commented on HADOOP-1283:
--------------------------------------

> AFAIK UTF8 and BytesWritable serializations differ only in the type of the length field.

I think UTF8 may also use "modified UTF-8" when encoding Strings.

http://java.sun.com/javase/6/docs/api/java/io/DataInput.html#modified-utf-8

Note that, for back-compatibility, we can still read files written with UTF8 once UTF8 is gone by using DataInput, since the format is identical. UTF8's implementation was optimized, but should be equivalent to the DataInput/DataOutput encoding.
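
A minimal sketch of that read path, using only java.io. It assumes, as stated above, that UTF8's on-disk format matches what DataOutput.writeUTF produces (a 2-byte length followed by modified UTF-8), so writeUTF stands in here for data written by UTF8. The class and method names are made up for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

/** Sketch only: round-trips a string through the DataOutput/DataInput "modified UTF-8" format. */
final class ModifiedUtf8Sketch {
    /** Encodes s as a 2-byte length followed by modified UTF-8 bytes. */
    static byte[] write(String s) {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buf)) {
            out.writeUTF(s);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buf.toByteArray();
    }

    /** Decodes one length-prefixed modified UTF-8 string from data. */
    static String read(byte[] data) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            return in.readUTF();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // ASCII path: 2 length bytes plus one byte per character.
        byte[] ascii = write("/user/test");
        System.out.println(ascii.length + " bytes, decoded: " + read(ascii));

        // Modified UTF-8 encodes U+0000 as the two bytes C0 80,
        // where standard UTF-8 would use a single zero byte.
        byte[] nul = write("\u0000");
        System.out.printf("%02X %02X%n", nul[2] & 0xFF, nul[3] & 0xFF);
    }
}
```

The U+0000 case is one place where modified UTF-8 differs from standard UTF-8, which is why the encoding matters when picking a replacement type.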

> Eliminate internal UTF8 to String and vice versa conversions in the name-node.
> ------------------------------------------------------------------------------
>
>                 Key: HADOOP-1283
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1283
>             Project: Hadoop
>          Issue Type: Improvement
>          Components: dfs
>    Affects Versions: 0.12.0
>            Reporter: Konstantin Shvachko
>            Assignee: Konstantin Shvachko
>         Attachments: EliminateUTF8.patch
