Posted to common-dev@hadoop.apache.org by "Pete Wyckoff (JIRA)" <ji...@apache.org> on 2008/07/18 01:51:31 UTC

[jira] Created: (HADOOP-3784) Cleanup optimization of reads and change it to a flag and remove #ifdefs

Cleanup optimization of reads and change it to a flag and remove #ifdefs
------------------------------------------------------------------------

                 Key: HADOOP-3784
                 URL: https://issues.apache.org/jira/browse/HADOOP-3784
             Project: Hadoop Core
          Issue Type: Improvement
            Reporter: Pete Wyckoff


Optimized reads look like they work, so let's make them part of the core code instead of hiding them behind #ifdefs. We should still allow a flag to enable them and a custom-sized buffer.
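The issue asks for optimized reads to be controlled at run time, through a flag and a custom-sized buffer, rather than by #ifdefs. A minimal sketch of what that could look like, assuming options reach the filesystem as individual `key=value` strings: the `read_opts` struct, the `rdbuffer=` and `no_rdbuffer` option names, and the default size are all hypothetical, not fuse-dfs's actual options.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical runtime read options, standing in for a compile-time #ifdef. */
typedef struct {
    bool   use_read_buffer; /* hypothetical flag: buffer small reads or not */
    size_t rdbuffer_size;   /* bytes fetched per buffer fill */
} read_opts;

/* Assumed default; not taken from fuse-dfs. */
#define DEFAULT_RDBUFFER_SIZE ((size_t)10 * 1024 * 1024)

static read_opts default_read_opts(void)
{
    read_opts o = { true, DEFAULT_RDBUFFER_SIZE };
    return o;
}

/* Apply one option string such as "rdbuffer=131072" or "no_rdbuffer".
 * Returns 0 on success and -1 (leaving opts unchanged) if the option is
 * unknown or its value is not a positive decimal number. */
static int parse_read_opt(read_opts *opts, const char *opt)
{
    static const char prefix[] = "rdbuffer=";
    const size_t plen = sizeof prefix - 1;

    if (strcmp(opt, "no_rdbuffer") == 0) {
        opts->use_read_buffer = false;
        return 0;
    }
    if (strncmp(opt, prefix, plen) == 0) {
        const char *val = opt + plen;
        char *end = NULL;
        if (*val < '0' || *val > '9') {
            return -1; /* empty, signed, or non-numeric value */
        }
        errno = 0;
        unsigned long long v = strtoull(val, &end, 10);
        if (errno == ERANGE || *end != '\0' || v == 0) {
            return -1; /* overflow, trailing junk, or zero-sized buffer */
        }
        opts->rdbuffer_size = (size_t)v;
        opts->use_read_buffer = true;
        return 0;
    }
    return -1; /* not a read option */
}
```

With this in place, dfs_read could test `use_read_buffer` at run time where the old code relied on the preprocessor, so a single build serves both configurations.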


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Updated: (HADOOP-3784) Cleanup optimization of reads and change it to a flag and remove #ifdefs

Posted by "Pete Wyckoff (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HADOOP-3784?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Pete Wyckoff updated HADOOP-3784:
---------------------------------

    Component/s: contrib/fuse-dfs

[jira] Commented: (HADOOP-3784) Cleanup optimization of reads and change it to a flag and remove #ifdefs

Posted by "Pete Wyckoff (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-3784?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12635863#action_12635863 ] 

Pete Wyckoff commented on HADOOP-3784:
--------------------------------------

This is my cleaned-up version of dfs_read. I renamed the variables to make them clearer (Craig may not like it :)).


{code}
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>
#include <syslog.h>
#include <fuse.h>
#include "hdfs.h"
// dfs_context, dfs_fh and min() are defined elsewhere in fuse_dfs.c

static int dfs_read(const char *path, char *buf, size_t size, off_t offset,
                    struct fuse_file_info *fi)
{

  // retrieve dfs specific data
  dfs_context *dfs = (dfs_context*)fuse_get_context()->private_data;

  // check params and the context var
  assert(dfs);
  assert(path);
  assert(buf);
  assert(offset >= 0);

  dfs_fh *fh = (dfs_fh*)fi->fh;

  // grow the buffer so a read larger than rdbuffer_size fits in it
  const int grew_buffer = size > dfs->rdbuffer_size && ! dfs->direct_io;
  if (grew_buffer) {
    char *bigger = (char*)malloc(size);
    if (bigger == NULL) {
      // keep the old buffer so the file handle stays usable
      syslog(LOG_ERR, "ERROR: could not allocate memory for file buffer for a read for file %s dfs %s:%d\n", path, __FILE__, __LINE__);
      return -EIO;
    }
    free(fh->buf);  // free(NULL) is a no-op
    fh->buf = bigger;
    fh->bufferSize = 0;
  }

  assert(fh->bufferSize >= 0);

  // check if the buffer is empty or
  // the read starts before the buffer starts or
  // the read ends after the buffer ends

  if (fh->bufferSize == 0  ||
      offset < fh->buffersStartOffset ||
      offset + size > fh->buffersStartOffset + fh->bufferSize)
    {
      // Read into the buffer from DFS

      assert(dfs->rdbuffer_size > 0);

      tSize num_read = 0;  // signed: hdfsPread returns -1 on error
      off_t tmp_offset = offset;
      // fill the whole buffer: size bytes if it was just grown
      const size_t capacity = grew_buffer ? size : dfs->rdbuffer_size;
      size_t cur_left = capacity;
      char *cur_ptr = fh->buf;

      while(cur_left > 0 && (num_read = hdfsPread(fh->fs, fh->hdfsFH, tmp_offset, cur_ptr, cur_left)) > 0) {
        cur_ptr += num_read;
        cur_left -= num_read;
        tmp_offset += num_read;  // advance the file position as well
      }
      if (num_read < 0) {
        fh->bufferSize = 0;  // invalidate the buffer
        syslog(LOG_ERR, "Read error - pread failed for %s with return code %d %s:%d", path, (int)num_read, __FILE__, __LINE__);
        return -EIO;
      }
      fh->bufferSize = capacity - cur_left;
      fh->buffersStartOffset = offset;
      if (fh->bufferSize == 0) {
        return 0;  // offset is at or past EOF
      }
    }
  // near EOF the buffer may end before offset + size, so only the start is checked
  assert(offset >= fh->buffersStartOffset);

  const size_t bufferReadIndex = offset - fh->buffersStartOffset;
  assert(bufferReadIndex < fh->bufferSize);

  const size_t amount = min(fh->buffersStartOffset + fh->bufferSize - offset, size);
  assert(amount <= fh->bufferSize);

  const char *offsetPtr = fh->buf + bufferReadIndex;
  assert(offsetPtr >= fh->buf);
  assert(offsetPtr + amount <= fh->buf + fh->bufferSize);

  memcpy(buf, offsetPtr, amount);
  return amount;
}
{code}
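The cache test and the copy-out arithmetic in dfs_read can be pulled out into pure functions, which makes the boundary cases (empty buffer, a read ending exactly at the end of the buffer, a read running past it) easy to check in isolation. A minimal sketch: `buffer_window`, `window_covers` and `window_copy` are hypothetical helpers that mirror the `buf`, `buffersStartOffset` and `bufferSize` fields; they are not part of fuse-dfs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>
#include <sys/types.h>

/* Hypothetical mirror of the buffer bookkeeping kept in dfs_fh. */
typedef struct {
    const char *data;  /* buffered bytes (like fh->buf) */
    off_t       start; /* file offset of data[0] (like fh->buffersStartOffset) */
    size_t      len;   /* valid bytes in data (like fh->bufferSize) */
} buffer_window;

/* True when [offset, offset + size) lies entirely inside the buffered range,
 * i.e. the negation of the refill condition in dfs_read. */
static bool window_covers(const buffer_window *w, off_t offset, size_t size)
{
    if (w->len == 0 || offset < w->start) {
        return false;
    }
    return (size_t)(offset - w->start) + size <= w->len;
}

/* Copy up to `size` bytes starting at file offset `offset` out of the window.
 * Returns the number of bytes copied; 0 if offset is outside the window. */
static size_t window_copy(const buffer_window *w, off_t offset,
                          char *dst, size_t size)
{
    if (offset < w->start || (size_t)(offset - w->start) >= w->len) {
        return 0;
    }
    size_t index  = (size_t)(offset - w->start);  /* like bufferReadIndex */
    size_t avail  = w->len - index;               /* bytes left in window */
    size_t amount = avail < size ? avail : size;  /* min(avail, size) */
    memcpy(dst, w->data + index, amount);
    return amount;
}
```

Once the offset check has passed, the coverage test compares `(offset - start) + size` against `len` entirely in `size_t`, which avoids mixing the signed `off_t` with the unsigned `size_t` in one expression.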

[jira] Resolved: (HADOOP-3784) Cleanup optimization of reads and change it to a flag and remove #ifdefs

Posted by "Pete Wyckoff (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HADOOP-3784?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Pete Wyckoff resolved HADOOP-3784.
----------------------------------

    Resolution: Invalid

This is a refactoring JIRA and is superseded by a number of others that required rewrites of dfs_read.


[jira] Commented: (HADOOP-3784) Cleanup optimization of reads and change it to a flag and remove #ifdefs

Posted by "Pete Wyckoff (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-3784?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12635914#action_12635914 ] 

Pete Wyckoff commented on HADOOP-3784:
--------------------------------------

Correct implementation of reads:
{code}
#include <assert.h>
#include <errno.h>
#include <string.h>
#include <syslog.h>
#include <fuse.h>
#include "hdfs.h"
// dfs_context, dfs_fh and min() are defined elsewhere in fuse_dfs.c

/**
 * dfs_read
 *
 * Reads from dfs or the open file's buffer.  Note that FUSE requires that a
 * read either be satisfied in full or stop at EOF, unless direct_io is enabled.
 *
 */
static int dfs_read(const char *path, char *buf, size_t size, off_t offset,
                    struct fuse_file_info *fi)
{

  // retrieve dfs specific data
  dfs_context *dfs = (dfs_context*)fuse_get_context()->private_data;

  // check params and the context var
  assert(dfs);
  assert(path);
  assert(buf);
  assert(offset >= 0);

  dfs_fh *fh = (dfs_fh*)fi->fh;

  assert(fh->bufferSize >= 0);

  // check if the buffer is empty or
  // the read starts before the buffer starts or
  // the read ends after the buffer ends

  if (fh->bufferSize == 0  ||
      offset < fh->buffersStartOffset ||
      offset + size > fh->buffersStartOffset + fh->bufferSize)
    {
      // Read into the buffer from DFS

      tSize num_read = 0;  // signed: hdfsPread returns -1 on error
      size_t total_read = 0;

      // if the size is bigger than the read buffer, then use the passed in buffer
      char *buf_ptr = size >= dfs->rdbuffer_size ? buf : fh->buf;
      size_t cur_left = size >= dfs->rdbuffer_size ? size : dfs->rdbuffer_size;

      // keep reading until the request is filled, EOF (0) or an error (< 0)
      while(cur_left > 0 && (num_read = hdfsPread(fh->fs, fh->hdfsFH, offset + total_read, buf_ptr + total_read, cur_left)) > 0) {
        cur_left -= num_read;
        total_read += num_read;
      }

      if (num_read < 0) {
        // invalidate the buffer
        fh->bufferSize = 0;
        syslog(LOG_ERR, "Read error - pread failed for %s with return code %d %s:%d", path, (int)num_read, __FILE__, __LINE__);
        return -EIO;
      }

      if(size >= dfs->rdbuffer_size) {
        // we read into the passed in buffer, so no need to do anything else
        return total_read;
      }

      fh->bufferSize = total_read;
      fh->buffersStartOffset = offset;

      if (total_read == 0) {
        // offset is at or past EOF, so there is nothing to copy
        return 0;
      }
    }
  assert(offset >= fh->buffersStartOffset);

  const size_t bufferReadIndex = offset - fh->buffersStartOffset;
  assert(bufferReadIndex < fh->bufferSize);

  const size_t amount = min(fh->buffersStartOffset + fh->bufferSize - offset, size);
  assert(amount <= fh->bufferSize);

  const char *offsetPtr = fh->buf + bufferReadIndex;
  assert(offsetPtr >= fh->buf);
  assert(offsetPtr + amount <= fh->buf + fh->bufferSize);

  memcpy(buf, offsetPtr, amount);
  return amount;
}
{code}
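To follow the control flow of the corrected dfs_read without a running HDFS cluster or FUSE mount, the same algorithm can be exercised against an in-memory file. A minimal sketch under stated assumptions: `mem_file`, `mem_pread` (which deliberately returns at most `max_chunk` bytes per call to mimic short reads from hdfsPread), `read_fully`, `cached_reader`, `cached_read` and `RDBUF_CAP` are hypothetical stand-ins, not fuse-dfs or libhdfs APIs.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/types.h>

/* Hypothetical in-memory file standing in for an open HDFS file. */
typedef struct {
    const char *data;
    size_t      len;
    size_t      max_chunk; /* max bytes per call, to simulate short reads */
} mem_file;

/* Hypothetical stand-in for hdfsPread: bytes read, 0 at EOF, -1 on error. */
static long mem_pread(const mem_file *f, off_t pos, char *dst, size_t n)
{
    if (pos < 0) return -1;
    if ((size_t)pos >= f->len) return 0;
    size_t avail = f->len - (size_t)pos;
    if (n > avail) n = avail;
    if (n > f->max_chunk) n = f->max_chunk;
    memcpy(dst, f->data + pos, n);
    return (long)n;
}

#define RDBUF_CAP 8 /* plays the role of rdbuffer_size; tiny for testing */

typedef struct {
    char   buf[RDBUF_CAP];
    off_t  start; /* like fh->buffersStartOffset */
    size_t len;   /* like fh->bufferSize; 0 means empty */
    int    fills; /* number of buffer refills, to make caching observable */
} cached_reader;

/* Keep calling pread until `want` bytes arrive, EOF, or an error. */
static long read_fully(const mem_file *f, off_t pos, char *dst, size_t want)
{
    size_t total = 0;
    while (total < want) {
        long n = mem_pread(f, pos + (off_t)total, dst + total, want - total);
        if (n < 0) return -1;
        if (n == 0) break; /* EOF */
        total += (size_t)n;
    }
    return (long)total;
}

/* Mirrors the corrected dfs_read: serve from the buffer when it covers the
 * request, otherwise read directly (large reads) or refill the buffer. */
static long cached_read(cached_reader *r, const mem_file *f,
                        char *out, size_t size, off_t offset)
{
    if (r->len == 0 || offset < r->start ||
        (size_t)(offset - r->start) + size > r->len) {
        /* large reads go straight into the caller's memory */
        const int direct = size >= RDBUF_CAP;
        long n = read_fully(f, offset, direct ? out : r->buf,
                            direct ? size : RDBUF_CAP);
        if (n < 0) {
            r->len = 0; /* invalidate the buffer on error */
            return -1;
        }
        if (direct) return n;
        r->fills++;
        r->start = offset;
        r->len = (size_t)n;
        if (n == 0) return 0; /* offset at or past EOF */
    }
    size_t index = (size_t)(offset - r->start);
    size_t avail = r->len - index;
    size_t amount = avail < size ? avail : size;
    memcpy(out, r->buf + index, amount);
    return (long)amount;
}
```

In this model, a read of at least `RDBUF_CAP` bytes never touches the buffered window, so a large read leaves whatever was cached for the following small reads; the `fills` counter exists only so the caching can be observed.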
