Posted to common-dev@hadoop.apache.org by Aled Jones <Al...@comtec-europe.co.uk> on 2007/03/05 23:15:58 UTC

unsubscribe

 

> -----Original Message-----
> From: Tom White (JIRA) [mailto:jira@apache.org] 
> Sent: 05 March 2007 22:01
> To: hadoop-dev@lucene.apache.org
> Subject: [jira] Commented: (HADOOP-1061) S3 listSubPaths bug
> 
> 
>     [ https://issues.apache.org/jira/browse/HADOOP-1061?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12478174 ]
> 
> Tom White commented on HADOOP-1061:
> -----------------------------------
> 
> S3 files need to be written with the version number of the 
> client library that wrote them (as suggested in HADOOP-930). 
> If we did this now, then we could detect a mismatch and fail 
> fast (and informatively). While it would be possible (but 
> tricky) to support both versions, I don't feel we should do 
> that, since there is a workaround for data migration: copy 
> your S3 data from the old file system to a local file or HDFS 
> (on EC2 preferably, but this isn't necessary) using the old 
> version of Hadoop, then copy it back to a new S3 file system 
> using the new version of Hadoop. I'd be happy to write this.
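> 
> A minimal sketch of the version check (the metadata key, version 
> constant, and method here are hypothetical, not code from a patch):
> 
>   import java.io.IOException;
>   import java.util.Map;
> 
>   // Stamp each file with the writing client's version and fail
>   // fast on a mismatch when reading.
>   private static final String VERSION_KEY = "fs-version"; // assumed key
>   private static final String CLIENT_VERSION = "1";       // assumed value
> 
>   private void checkVersion(Map<String, String> metadata)
>       throws IOException {
>     String written = metadata.get(VERSION_KEY);
>     if (!CLIENT_VERSION.equals(written)) {
>       throw new IOException("S3 data written by client version "
>           + written + " but this client is version " + CLIENT_VERSION
>           + "; migrate the data as described above.");
>     }
>   }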
> 
> (I'm not saying that version-aware code will never be needed, 
> just that it isn't needed yet, since not many people are 
> using this feature.)
> 
> Thoughts?
> 
> > S3 listSubPaths bug
> > -------------------
> >
> >                 Key: HADOOP-1061
> >                 URL: 
> https://issues.apache.org/jira/browse/HADOOP-1061
> >             Project: Hadoop
> >          Issue Type: Bug
> >          Components: fs
> >    Affects Versions: 0.11.2, 0.12.0
> >            Reporter: Mike Smith
> >            Priority: Critical
> >         Attachments: 1061-hadoop.patch
> >
> >
> > I had a problem with the -ls command in the s3 file system. It 
> > returned an inconsistent number of "Found Items" across reruns 
> > and, more importantly, it returned recursive results (depth 1) 
> > for some folders. 
> > I looked into the code; the problem is caused by the jets3t 
> > library. The inconsistency problem is solved if, in listSubPaths 
> > of the Jets3tFileSystemStore class (line 227), we use 
> >   S3Object[] objects = 
> >       s3Service.listObjects(bucket, prefix, PATH_DELIMITER); 
> > instead of 
> >   S3Object[] objects = 
> >       s3Service.listObjects(bucket, prefix, PATH_DELIMITER, 0); 
> > This change lets the GET REST request carry a "max-keys" 
> > parameter with its default value of 1000. The s3 GET request 
> > seems to be sensitive to this parameter.
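> > 
> > In context, the change in listSubPaths would look roughly like 
> > this (a simplified sketch, not the actual patch; pathToKey, 
> > keyToPath, s3Service, and bucket are assumed from the 
> > surrounding class):
> > 
> >   // Inside Jets3tFileSystemStore (sketch):
> >   private Set<Path> listSubPaths(Path path) throws S3ServiceException {
> >     String prefix = pathToKey(path) + PATH_DELIMITER;
> >     // Three-arg form: no explicit listing length, so jets3t
> >     // sends the default max-keys=1000 with the GET request.
> >     S3Object[] objects =
> >         s3Service.listObjects(bucket, prefix, PATH_DELIMITER);
> >     Set<Path> subPaths = new TreeSet<Path>();
> >     for (S3Object object : objects) {
> >       subPaths.add(keyToPath(object.getKey()));
> >     }
> >     return subPaths;
> >   }
> > 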
> > But the recursive problem exists because the GET request 
> > doesn't apply the delimiter constraint correctly. The response 
> > contains all the keys with the given prefix, but they don't 
> > stop at the path_delimiter. You can test this simply by making 
> > a couple of folders on the hadoop s3 filesystem and running 
> > -ls. I traced the generated GET request and it all looks fine, 
> > but it is not executed correctly on the s3 server side. I still 
> > don't know why the response doesn't stop at the path_delimiter. 
> > Possible cause: the jets3t library already does URL encoding, 
> > so why do we need to do URL encoding in the 
> > Jets3tFileSystemStore class?
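> > 
> > A small standalone repro along these lines (untested sketch; 
> > credentials and bucket are placeholders) shows whether the 
> > listing stops at the delimiter:
> > 
> >   import org.jets3t.service.S3Service;
> >   import org.jets3t.service.impl.rest.httpclient.RestS3Service;
> >   import org.jets3t.service.model.S3Bucket;
> >   import org.jets3t.service.model.S3Object;
> >   import org.jets3t.service.security.AWSCredentials;
> > 
> >   public class DelimiterTest {
> >     public static void main(String[] args) throws Exception {
> >       // Placeholder credentials and bucket; fill in real values.
> >       S3Service s3 = new RestS3Service(
> >           new AWSCredentials("ACCESS_KEY", "SECRET_KEY"));
> >       S3Bucket bucket = new S3Bucket("my-bucket");
> >       // With a delimiter, only keys up to the first "/" past
> >       // the prefix should come back; deeper keys show the bug.
> >       S3Object[] objects = s3.listObjects(bucket, "user/root/", "/");
> >       for (S3Object object : objects) {
> >         System.out.println(object.getKey());
> >       }
> >     }
> >   }
> > 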
> > example:
> > The original path /user/root/folder is encoded to 
> > %2Fuser%2Froot%2Ffolder in the Jets3tFileSystemStore class. 
> > Then jets3t re-encodes this when making the REST request, so it 
> > is rewritten as %252Fuser%252Froot%252Ffolder, and the 
> > generated folder on S3 becomes %2Fuser%2Froot%2Ffolder after 
> > decoding on the amazon side. Wouldn't it be better to skip the 
> > encoding on the Hadoop side? This strange structure might be 
> > the reason that s3 doesn't stop at the path_delimiter. 
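> > 
> > The double encoding is easy to demonstrate with 
> > java.net.URLEncoder (a standalone sketch, independent of both 
> > libraries):
> > 
> >   import java.net.URLEncoder;
> > 
> >   public class DoubleEncode {
> >     public static void main(String[] args) throws Exception {
> >       String path = "/user/root/folder";
> >       // First encoding (as in Jets3tFileSystemStore):
> >       String once = URLEncoder.encode(path, "UTF-8");
> >       // Second encoding (as applied again for the REST request):
> >       String twice = URLEncoder.encode(once, "UTF-8");
> >       System.out.println(once);   // %2Fuser%2Froot%2Ffolder
> >       System.out.println(twice);  // %252Fuser%252Froot%252Ffolder
> >     }
> >   }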
> 
> --
> This message is automatically generated by JIRA.
> -
> You can reply to this email to add a comment to the issue online.
> 
> 