Posted to common-dev@hadoop.apache.org by "Arun C Murthy (JIRA)" <ji...@apache.org> on 2009/05/08 19:22:45 UTC

[jira] Created: (HADOOP-5795) Add a bulk FileSystem.getFileBlockLocations

Add a bulk FileSystem.getFileBlockLocations
-------------------------------------------

                 Key: HADOOP-5795
                 URL: https://issues.apache.org/jira/browse/HADOOP-5795
             Project: Hadoop Core
          Issue Type: New Feature
          Components: dfs
    Affects Versions: 0.20.0
            Reporter: Arun C Murthy
             Fix For: 0.21.0


Currently map-reduce applications (specifically file-based input-formats) use FileSystem.getFileBlockLocations to compute splits. However, they are forced to call it once per file.
The downsides are multiple:
   # Even with only a few thousand files to process, the number of RPCs quickly becomes noticeable.
   # The current implementation of getFileBlockLocations is too slow, since each call results in a 'search' in the namesystem. With a few thousand input files, that means as many RPCs and as many 'searches'.

It would be nice to have a FileSystem.getFileBlockLocations which can take in a directory and return the block locations for all files in that directory. We could then eliminate the per-file RPCs and replace the per-file 'searches' with a single 'scan'.

When I tested this for terasort, a moderate job with 8000 input files, the runtime halved from the current 8s to 4s. Clearly this is much more important for latency-sensitive applications...
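For illustration, a minimal sketch of the per-file pattern this forces today, next to one possible shape for the bulk call (fs is the relevant FileSystem and inputDir the job's input directory; the bulk method is the proposal here, not an existing API):
{code}
// Today: computing splits costs one NameNode RPC per input file.
for (FileStatus file : fs.listStatus(inputDir)) {
  BlockLocation[] locs =
      fs.getFileBlockLocations(file, 0, file.getLen()); // one RPC per file
  // ... turn locs into splits ...
}

// Proposed: one RPC, and one namespace 'scan', for the whole directory.
// Map<FileStatus, BlockLocation[]> all = fs.getFileBlockLocations(inputDir);
{code}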



[jira] Commented: (HADOOP-5795) Add a bulk FileSystem.getFileBlockLocations

Posted by "Doug Cutting (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-5795?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12707433#action_12707433 ] 

Doug Cutting commented on HADOOP-5795:
--------------------------------------

An alternative to passing directories might be to pass a list of files. The request might get larger, but this is more precise: e.g., when only a subset of the files in a directory will be used, only that subset need be passed. Since globbing is client-side, this requires two round trips, one to list files and one to list their blocks, but that would still be a huge improvement over per-file RPCs.



[jira] Commented: (HADOOP-5795) Add a bulk FileSystem.getFileBlockLocations

Posted by "dhruba borthakur (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-5795?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12707768#action_12707768 ] 

dhruba borthakur commented on HADOOP-5795:
------------------------------------------

If we adopt the approach that Doug has suggested, then the namenode still has to search for each input path in the filesystem namespace; the approach still has the advantage that the number of RPC calls is reduced. If we adopt Arun's proposal, where the RPC specifies a directory and returns the splits of all the files in that directory, then it reduces the number of searches in the FS namespace as well as the number of RPC calls. I was kind-of leaning towards Arun's proposal, but Doug's approach is a little more flexible in nature, isn't it?



[jira] Commented: (HADOOP-5795) Add a bulk FileSystem.getFileBlockLocations

Posted by "Doug Cutting (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-5795?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12707445#action_12707445 ] 

Doug Cutting commented on HADOOP-5795:
--------------------------------------

How about adding something like:
  Map<FileStatus, BlockLocation[]> listBlockLocations(Path[]);
This would permit a glob-free job to get everything it needs in a single RPC, and a globbing job to do so with two RPCs.
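As a sketch of how split construction might use this, assuming the proposed signature (listBlockLocations does not exist yet; fs is a FileSystem and inputPaths the job's Path[]; everything else is the current API):
{code}
// Glob-free job: everything needed for splits in a single RPC.
Map<FileStatus, BlockLocation[]> located = fs.listBlockLocations(inputPaths);

// Globbing job: expand the pattern client-side, then one bulk call (two RPCs).
FileStatus[] matches = fs.globStatus(new Path("/logs/2009-05-*"));
Path[] paths = new Path[matches.length];
for (int i = 0; i < matches.length; i++) {
  paths[i] = matches[i].getPath();
}
Map<FileStatus, BlockLocation[]> blocks = fs.listBlockLocations(paths);
{code}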



[jira] Commented: (HADOOP-5795) Add a bulk FileSystem.getFileBlockLocations

Posted by "Doug Cutting (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-5795?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12720235#action_12720235 ] 

Doug Cutting commented on HADOOP-5795:
--------------------------------------

> Would it be better to pass in an array of BlockLocationRequests, each of which would consist of the path, start and length? 

That is more general, but what is the use case?  The motivating use case for listBlockLocations() is map-reduce split construction, which typically takes a list of files as input, not a list of sections of files.  Adding a feature that won't be used will just make this new API harder to use.  -1 without a compelling use case.



[jira] Commented: (HADOOP-5795) Add a bulk FileSystem.getFileBlockLocations

Posted by "Doug Cutting (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-5795?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12708251#action_12708251 ] 

Doug Cutting commented on HADOOP-5795:
--------------------------------------

> Doug, can you please confirm?

Yes, I had assumed that any directories in the request would be expanded.  The goal is to have something we can call from FileInputFormat, which takes a list of patterns.  When the patterns contain no wildcards, we should be able to create splits with a single RPC to the NameNode.  So the semantics should match those of FileInputFormat in this case.



[jira] Commented: (HADOOP-5795) Add a bulk FileSystem.getFileBlockLocations

Posted by "dhruba borthakur (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-5795?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12720318#action_12720318 ] 

dhruba borthakur commented on HADOOP-5795:
------------------------------------------

I think the extended version of the API would help in doing incremental distcp once hdfs-append is supported. We use "distcp -update" to do an incremental copy of files that have changed in length, but this proposed extended API (and more) would allow distcp to copy only the changed portions of a file.





[jira] Assigned: (HADOOP-5795) Add a bulk FileSystem.getFileBlockLocations

Posted by "Jakob Homan (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-5795?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jakob Homan reassigned HADOOP-5795:
-----------------------------------

    Assignee: Jakob Homan



[jira] Commented: (HADOOP-5795) Add a bulk FileSystem.getFileBlockLocations

Posted by "Doug Cutting (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-5795?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12715295#action_12715295 ] 

Doug Cutting commented on HADOOP-5795:
--------------------------------------

> But not sure why it returns a map 

It could perhaps instead return an array of two-element structs, each containing a FileStatus/BlockLocation[] pair, but a Map seems simpler.

> How does a user figure out which were valid and which were invalid/non-existent/empty paths?

Non-existent paths should be ignored.  Paths whose URIs are for different filesystems or are somehow unparseable should cause an exception.
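A sketch of that contract as server-side pseudocode (paths and result are the request and response; the helper names are invented for illustration):
{code}
// Skip paths that do not exist; fail fast on paths that cannot be ours.
for (Path p : paths) {
  if (!belongsToThisFileSystem(p)) {       // e.g. wrong or unparseable URI
    throw new IllegalArgumentException("Wrong FS: " + p);
  }
  FileStatus stat = lookup(p);             // namespace search
  if (stat == null) {
    continue;                              // non-existent: silently ignored
  }
  result.put(stat, blockLocationsFor(stat));
}
{code}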



[jira] Commented: (HADOOP-5795) Add a bulk FileSystem.getFileBlockLocations

Posted by "Konstantin Shvachko (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-5795?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12708282#action_12708282 ] 

Konstantin Shvachko commented on HADOOP-5795:
---------------------------------------------

> (non-recursively?)

I think the RPC call itself should not be recursive. It is like with ls: the getListing() call is non-recursive, but the client recursively calls getListing() on sub-directories.
The idea is to prevent people from mistakenly calling getBlockLocations("/") recursively on a large directory tree, which could freeze the name-node for a long period of time.
A non-recursive variant should be sufficient to cover Arun's use case.
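By that analogy, a client that did want recursion could drive it itself over non-recursive RPCs; a sketch, where listBlockLocations is the bulk call proposed above:
{code}
// Client-side recursion over non-recursive RPCs, mirroring how ls works.
void collect(FileSystem fs, Path dir,
             Map<FileStatus, BlockLocation[]> out) throws IOException {
  out.putAll(fs.listBlockLocations(new Path[] { dir })); // this level's files
  for (FileStatus child : fs.listStatus(dir)) {          // find sub-directories
    if (child.isDir()) {
      collect(fs, child.getPath(), out);                 // recurse client-side
    }
  }
}
{code}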



[jira] Commented: (HADOOP-5795) Add a bulk FileSystem.getFileBlockLocations

Posted by "Arun C Murthy (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-5795?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12707446#action_12707446 ] 

Arun C Murthy commented on HADOOP-5795:
---------------------------------------

bq. Map<FileStatus, BlockLocation[]> listBlockLocations(Path[]);

+1



[jira] Commented: (HADOOP-5795) Add a bulk FileSystem.getFileBlockLocations

Posted by "dhruba borthakur (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-5795?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12714244#action_12714244 ] 

dhruba borthakur commented on HADOOP-5795:
------------------------------------------

> Is there any other interface that resembles this?

The only thing that comes relatively close to this one is the READDIRPLUS operation in NFS. This call is more like getFileStatusBulk() for HDFS.



[jira] Commented: (HADOOP-5795) Add a bulk FileSystem.getFileBlockLocations

Posted by "Doug Cutting (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-5795?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12707460#action_12707460 ] 

Doug Cutting commented on HADOOP-5795:
--------------------------------------

> Is there a reason to keep current per file getBlockLocations() if we had a more generic method?

Not that I can think of.  +1 for replacing it.




[jira] Commented: (HADOOP-5795) Add a bulk FileSystem.getFileBlockLocations

Posted by "Doug Cutting (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-5795?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12720299#action_12720299 ] 

Doug Cutting commented on HADOOP-5795:
--------------------------------------

I meant, what is the use case for passing in start/end positions per file?  I support the idea of a bulk call, but don't see the need to pass start/end positions per file.



[jira] Commented: (HADOOP-5795) Add a bulk FileSystem.getFileBlockLocations

Posted by "dhruba borthakur (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-5795?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12719929#action_12719929 ] 

dhruba borthakur commented on HADOOP-5795:
------------------------------------------

> pass in an array of BlockLocationRequests, each of which would consist of the path, start and length

+1. This sounds better than assuming that we need to send back all blocks for the specified path(s).



[jira] Commented: (HADOOP-5795) Add a bulk FileSystem.getFileBlockLocations

Posted by "Arun C Murthy (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-5795?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12720459#action_12720459 ] 

Arun C Murthy commented on HADOOP-5795:
---------------------------------------

Quick note: making the length mandatory (i.e., part of the API) has the unfortunate side-effect of forcing a stat on each file prior to the call to listBlockLocations. So, from a Map-Reduce perspective, it is important to have an API which does not force InputFormats to pass in the lengths.

OTOH, if we really need the more general version of the API, I'd like to pass in "-1" to imply the whole file.
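In other words, with a "-1" convention the client needs no prior stat; a sketch, assuming the BlockLocationRequest type under discussion (its constructor is illustrative, paths is the job's Path[]):
{code}
// Length -1 means "the whole file": no stat needed before the bulk call.
BlockLocationRequest[] reqs = new BlockLocationRequest[paths.length];
for (int i = 0; i < paths.length; i++) {
  reqs[i] = new BlockLocationRequest(paths[i], 0L, -1L); // offset 0, len -1
}
{code}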



[jira] Commented: (HADOOP-5795) Add a bulk FileSystem.getFileBlockLocations

Posted by "Raghu Angadi (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-5795?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12714189#action_12714189 ] 

Raghu Angadi commented on HADOOP-5795:
--------------------------------------


I see why the interface takes an array of paths, but I am not sure why it returns a map (not that there is anything wrong with it). This is probably the only RPC returning a map in Hadoop.

How does a user figure out which were valid and which were invalid/non-existent/empty paths? Maybe the user does not care?

getBlockLocations() returns the blocks (sort of) sorted w.r.t. the client. Should this interface do that too? The M/R use case does not need them sorted.

Is there any other interface that resembles this?



[jira] Commented: (HADOOP-5795) Add a bulk FileSystem.getFileBlockLocations

Posted by "Doug Cutting (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-5795?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12708595#action_12708595 ] 

Doug Cutting commented on HADOOP-5795:
--------------------------------------

Dhruba: yes, that sounds right to me.

A further clarification: should subdirectories be included, with empty block lists, or elided?  My hunch is to elide them, so that every FileStatus returned is for a plain file, never a directory.  Does that sound right to others?



[jira] Commented: (HADOOP-5795) Add a bulk FileSystem.getFileBlockLocations

Posted by "Arun C Murthy (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-5795?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12708132#action_12708132 ] 

Arun C Murthy commented on HADOOP-5795:
---------------------------------------

Dhruba, I was thinking it was implicit in Doug's proposal that if one of the paths in the Path[] is a directory, then the new API would return block locations of all its children (non-recursively?), which would satisfy the original requirement. Doug, can you please confirm?




[jira] Commented: (HADOOP-5795) Add a bulk FileSystem.getFileBlockLocations

Posted by "Doug Cutting (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-5795?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12720835#action_12720835 ] 

Doug Cutting commented on HADOOP-5795:
--------------------------------------

> I think the extended version of the API would help in doing incremental distcp when hdfs-append is supported.

Thanks for the use case!  An append-savvy incremental distcp might first use listStatus to get all file lengths and dates from both filesystems, then figure out which files had grown longer but whose creation dates had not changed, indicating they'd been appended to.  Then a batch call could be made to fetch block locations for just the newly appended sections, and these would be used to construct splits that can be localized well.  Does that sound right?

In this case we would not list directories, but rather always pass in a list of individual files.  The mapping from inputs to outputs would be 1:1, so it could take the form:

BlockLocation[] getBlockLocations(BlockLocationRequest[])

A corollary is that it does not make sense to pass start/end positions for a directory, although these could be ignored.

Do we want to try to develop a single swiss-army-knife batch call, or add operation-optimized calls as we go?
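A rough sketch of that append-savvy flow, assuming the signature above (srcFs/dstFs and srcDir/dstDir are the two sides of the copy; one location array is returned per request here for clarity, and the creation-date check is elided):
{code}
// Index the destination's current statuses by file name.
Map<String, FileStatus> dstByName = new HashMap<String, FileStatus>();
for (FileStatus d : dstFs.listStatus(dstDir)) {          // RPC: listStatus
  dstByName.put(d.getPath().getName(), d);
}

// Files that grew were (presumably) appended to; request block locations
// for only the appended tail of each.
List<BlockLocationRequest> reqs = new ArrayList<BlockLocationRequest>();
for (FileStatus src : srcFs.listStatus(srcDir)) {        // RPC: listStatus
  FileStatus dst = dstByName.get(src.getPath().getName());
  if (dst != null && src.getLen() > dst.getLen()) {
    reqs.add(new BlockLocationRequest(
        src.getPath(), dst.getLen(), src.getLen() - dst.getLen()));
  }
}

// One batch RPC covering every appended section, used to build
// well-localized splits.
BlockLocation[][] locs = srcFs.getBlockLocations(
    reqs.toArray(new BlockLocationRequest[reqs.size()]));
{code}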



[jira] Commented: (HADOOP-5795) Add a bulk FileSystem.getFileBlockLocations

Posted by "Jakob Homan (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-5795?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12720254#action_12720254 ] 

Jakob Homan commented on HADOOP-5795:
-------------------------------------

bq. That is more general, but what is the use case?
The original motivation was Arun and Owen noticing during the terasort work that a large number of RPC calls were made during task scheduling, and that a bulk method could ameliorate that.  That seems reasonable to me.  I'll let Arun lobby further.

One point that came up in discussions is that it would be a good idea to have a maximum number of files that can be returned at once, in order not to overwhelm the namenode.  Whether this limit is hard-coded or configurable was not decided.
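If such a cap existed, a client with a very large input set could batch its requests to stay under it; a sketch with an invented configuration key (conf, fs, and paths are assumed to be in scope):
{code}
// Hypothetical client-side batching against a server-enforced cap.
int max = conf.getInt("fs.bulk.block.locations.max", 1000); // invented key
Map<FileStatus, BlockLocation[]> all =
    new HashMap<FileStatus, BlockLocation[]>();
for (int i = 0; i < paths.length; i += max) {
  int n = Math.min(max, paths.length - i);
  Path[] batch = new Path[n];
  System.arraycopy(paths, i, batch, 0, n);
  all.putAll(fs.listBlockLocations(batch));               // proposed bulk call
}
{code}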



[jira] Commented: (HADOOP-5795) Add a bulk FileSystem.getFileBlockLocations

Posted by "Jakob Homan (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-5795?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12719880#action_12719880 ] 

Jakob Homan commented on HADOOP-5795:
-------------------------------------

The current API implementation of getBlockLocations includes parameters for the byte offset within the file and the number of bytes for which to return blocks. These parameters aren't currently provided in the specification for the new API.  Would it be better to pass in an array of BlockLocationRequests, each of which would consist of a path, a start and a length?

The other option would be to add start and length parameters that apply to each of the paths within the array, which doesn't seem particularly useful.
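For concreteness, such a request type might look like the following (a sketch only; nothing here is a committed design):
{code}
// Hypothetical request struct: one entry per file section of interest.
public class BlockLocationRequest implements Writable {
  private Path path;   // file to locate blocks for
  private long start;  // byte offset where the range of interest begins
  private long length; // number of bytes to cover

  public void write(DataOutput out) throws IOException {
    Text.writeString(out, path.toString());
    out.writeLong(start);
    out.writeLong(length);
  }

  public void readFields(DataInput in) throws IOException {
    path = new Path(Text.readString(in));
    start = in.readLong();
    length = in.readLong();
  }
}
{code}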



[jira] Commented: (HADOOP-5795) Add a bulk FileSystem.getFileBlockLocations

Posted by "Konstantin Shvachko (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-5795?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12707456#action_12707456 ] 

Konstantin Shvachko commented on HADOOP-5795:
---------------------------------------------

Currently {{getBlockLocations(src, offset, length)}} returns a class called {{LocatedBlocks}}, which contains the list of {{LocatedBlock}}s belonging to the file.
{code}
public class LocatedBlocks implements Writable {
  private long fileLength;
  private List<LocatedBlock> blocks; // array of blocks with prioritized locations
}
{code}

The question is whether we should modify {{LocatedBlocks}} to include the map proposed by Doug and extend the semantics of {{getBlockLocations()}} to handle directories, or whether we should introduce a new method (RPC) {{getBlockLocations(srcDir)}} returning a {{LocatedBlockMap}}.
Is there a reason to keep current per file {{getBlockLocations()}} if we had a more generic method?
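If the new-method route were taken, the return type might be a thin wrapper along these lines (purely illustrative; only the {{LocatedBlockMap}} name comes from the comment above):
{code}
// Hypothetical bulk return type: one LocatedBlocks per plain file found.
public class LocatedBlockMap {
  private Map<String, LocatedBlocks> blocksPerFile =
      new HashMap<String, LocatedBlocks>();

  public LocatedBlocks getBlocks(String path) {
    return blocksPerFile.get(path);
  }
}
{code}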



[jira] Commented: (HADOOP-5795) Add a bulk FileSystem.getFileBlockLocations

Posted by "dhruba borthakur (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-5795?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12708654#action_12708654 ] 

dhruba borthakur commented on HADOOP-5795:
------------------------------------------

> every FileStatus returned is for a plain file--no directories

Sounds good to me. 



[jira] Commented: (HADOOP-5795) Add a bulk FileSystem.getFileBlockLocations

Posted by "dhruba borthakur (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-5795?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12708433#action_12708433 ] 

dhruba borthakur commented on HADOOP-5795:
------------------------------------------

Ok, so from what I can understand, here is the proposal:

Map<FileStatus, BlockLocation[]> listBlockLocations(Path[] inputPaths);

The "inputPaths" can be a set of files and/or directories. If one of the inputPaths is a directory, then all items inside that directory (only one level, not recursive) are listed and their block locations are returned by this call. if one of the inputPaths is a file, then its block locations are returned by this call.  The FileStatus returned by this call should have the absolulte path of the object being returned. 
