Posted to mapreduce-user@hadoop.apache.org by "Ravuri, Venkata Puneet" <vr...@ea.com> on 2014/10/31 11:23:41 UTC

Seek behavior difference between NativeS3FsInputStream and DFSInputStream

Hi,

I noticed a difference in seek behavior between a file in S3 read through NativeS3FileSystem$NativeS3FsInputStream and a file in HDFS read through DFSInputStream.

With NativeS3FsInputStream, seeking to the end of the file fails with the exception "java.io.EOFException: Attempted to seek or read past the end of the file".
This happens because a getObject request is issued on the S3 object with the range start set to the length of the file.

DFSInputStream handles the end-of-file case safely.
Shouldn't NativeS3FsInputStream also have similar checks?
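
A minimal sketch of the kind of check being asked about, with no Hadoop dependencies. Every name here (GuardedSeekSketch, RangeBackedStream, openRange, seekUnguarded) is hypothetical and only stands in for a stream that fetches data with ranged requests; it is not the actual NativeS3FsInputStream or DFSInputStream code. The idea is that a seek to exactly the file length records the position without issuing a ranged request, so a later read simply reports end of file.

```java
import java.io.EOFException;
import java.io.IOException;

// Hypothetical stand-in for a stream backed by ranged requests.
// None of these names come from Hadoop; they only illustrate the EOF check.
class GuardedSeekSketch {

    static class RangeBackedStream {
        private final byte[] object;   // stands in for the remote S3 object
        private long pos = 0;
        int rangeRequests = 0;         // counts simulated ranged requests

        RangeBackedStream(byte[] object) {
            this.object = object;
        }

        // Simulates a ranged GET: a range starting at or past the object length is rejected.
        private void openRange(long start) throws IOException {
            rangeRequests++;
            if (start >= object.length) {
                throw new EOFException("Attempted to seek or read past the end of the file");
            }
        }

        // Unguarded seek: always issues a ranged request, so seeking to the length fails.
        void seekUnguarded(long target) throws IOException {
            openRange(target);
            pos = target;
        }

        // Guarded seek: seeking exactly to the end only records the position.
        void seek(long target) throws IOException {
            if (target < 0 || target > object.length) {
                throw new EOFException("Cannot seek to " + target);
            }
            if (target < object.length) {
                openRange(target);
            }
            pos = target;
        }

        // Returns the next byte as an unsigned value, or -1 at end of file.
        int read() {
            return pos < object.length ? (object[(int) pos++] & 0xff) : -1;
        }

        long getPos() {
            return pos;
        }
    }

    public static void main(String[] args) throws IOException {
        RangeBackedStream in = new RangeBackedStream(new byte[] {10, 20, 30});
        in.seek(3);                       // end of file: no request issued
        System.out.println(in.read());    // -1
        try {
            in.seekUnguarded(3);          // same position, but a request is issued
        } catch (EOFException e) {
            System.out.println("unguarded seek failed: " + e.getMessage());
        }
    }
}
```

Whether a real fix belongs in seek() itself or in the code that opens the ranged request is a design choice for the patch; the sketch only shows that the end-of-file position can be accepted without a request.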

This issue causes errors when Hive reads files from S3.
Please advise.


Thanks and Regards,
Puneet

Re: Seek behavior difference between NativeS3FsInputStream and DFSInputStream

Posted by Ravi Prakash <ra...@ymail.com>.
Hi Venkata!

Please feel free to open a JIRA and upload a patch. You might also try
the new s3a implementation (instead of s3n), but there's a chance that
the behavior will be the same.

Cheers
Ravi


On 10/31/14 03:23, Ravuri, Venkata Puneet wrote:
> Hi,
>
> I noticed a difference in behavior while seeking a given file present
> in S3 using NativeS3FileSystem$NativeS3FsInputStream and the file
> present in HDFS using DFSInputStream.
>
> If we seek to the end of the file incase of NativeS3FsInputStream, it
> fails with exception "java.io.EOFException: Attempted to seek or read
> past the end of the file".
> That is because a getObject request is issued on the S3 object with
> range start as value of length of file.
>
> The end of file case is being handled safely in DFSInputStream.
> Shouldn't NativeS3FsInputStream also have similar checks?
>
> This issue is causing errors when Hive is trying to read S3 files.
> Please advise.
>
>
> Thanks and Regards,
> Puneet
>


