Posted to issues@commons.apache.org by "Woo Ju Shin (JIRA)" <ji...@apache.org> on 2013/01/03 04:48:12 UTC
[jira] [Updated] (COMPRESS-212) TarArchiveEntry getName() returns
wrongly encoded name even when you set encoding to TarArchiveInputStream
[ https://issues.apache.org/jira/browse/COMPRESS-212?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Woo Ju Shin updated COMPRESS-212:
---------------------------------
Description:
I have two file systems. One is Red Hat Linux, the other is MS Windows.
I created a *.tgz file in Red Hat Linux and tried to decompress it in MS Windows using Commons Compress.
The default system encodings differ: UTF-8 on Red Hat Linux and CP949 on MS Windows.
The file names appear to be decoded with the platform default encoding, even though I pass the source encoding explicitly when untarring, as follows:
// "*.tgz" is a placeholder for the path to the archive
FileInputStream fis = new FileInputStream(new File("*.tgz"));
TarArchiveInputStream zis = new TarArchiveInputStream(new BufferedInputStream(fis), encodingOfRedHatLinux);
TarArchiveEntry entry;
while ((entry = (TarArchiveEntry) zis.getNextEntry()) != null) {
    // The name is not decoded as UTF-8; it comes back as CP949, so it does not match the original file name
    entry.getName();
}
According to the Javadoc of this constructor, the encoding argument should be used for file names:
/**
 * Constructor for TarInputStream.
 * @param is the input stream to use
 * @param encoding name of the encoding to use for file names
 * @since Commons Compress 1.4
 */
public TarArchiveInputStream(InputStream is, String encoding) {
    this(is, TarBuffer.DEFAULT_BLKSIZE, TarBuffer.DEFAULT_RCDSIZE, encoding);
}
In practice, however, the encoding does not seem to be applied.
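The symptom described above can be reproduced without Commons Compress at all. The following is a minimal sketch using only the JDK; it is an illustration of the mismatch, not the library's decoding code. The class name NameDecodingDemo, the helper methods, and the sample Korean file name are all hypothetical. It also assumes the JVM knows the charset name "MS949" for Windows code page 949, and falls back to ISO-8859-1 as a stand-in when it does not.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class NameDecodingDemo {

    /** Decodes the raw bytes of a name field with the given charset. */
    static String decode(byte[] rawName, Charset charset) {
        return new String(rawName, charset);
    }

    /**
     * Assumption: "MS949" is the JVM's name for Windows code page 949.
     * If this runtime does not ship it, ISO-8859-1 is used as a stand-in
     * for "some charset other than UTF-8".
     */
    static Charset windowsKoreanOrFallback() {
        return Charset.isSupported("MS949")
                ? Charset.forName("MS949")
                : StandardCharsets.ISO_8859_1;
    }

    public static void main(String[] args) {
        // Hypothetical Korean file name, written with Unicode escapes so the
        // source file does not depend on the compiler's own encoding.
        String original = "\uD55C\uAE00.txt";

        // Bytes as a UTF-8 system would store them in the archive.
        byte[] rawName = original.getBytes(StandardCharsets.UTF_8);

        String asUtf8 = decode(rawName, StandardCharsets.UTF_8);
        String asPlatform = decode(rawName, windowsKoreanOrFallback());

        System.out.println("Decoded as UTF-8 matches original:      " + original.equals(asUtf8));
        System.out.println("Decoded with other charset matches it:  " + original.equals(asPlatform));
    }
}
```

If getName() returns a string like the second one even though UTF-8 was passed to the constructor, that would suggest the encoding argument is not reaching the code that decodes the name bytes.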
was:
I have two file systems. One is Red Hat Linux, one is MS Windows.
I created a *.tgz file in Red Hat Linux and tried to decompress it in MS Windows using Commons Compress.
The default system encodings differ: UTF-8 on Red Hat Linux and CP949 on MS Windows.
The file names appear to be decoded with the platform default encoding, even though I pass the source encoding explicitly when untarring, as follows:
// "*.tgz" is a placeholder for the path to the archive
FileInputStream fis = new FileInputStream(new File("*.tgz"));
TarArchiveInputStream zis = new TarArchiveInputStream(new BufferedInputStream(fis), encodingOfRedHatLinux);
TarArchiveEntry entry;
while ((entry = (TarArchiveEntry) zis.getNextEntry()) != null) {
    // The name is not decoded as UTF-8; it comes back as CP949, so it does not match the original file name
    entry.getName();
}
According to the Javadoc of this constructor, the encoding argument should be used for file names:
/**
 * Constructor for TarInputStream.
 * @param is the input stream to use
 * @param encoding name of the encoding to use for file names
 * @since Commons Compress 1.4
 */
public TarArchiveInputStream(InputStream is, String encoding) {
    this(is, TarBuffer.DEFAULT_BLKSIZE, TarBuffer.DEFAULT_RCDSIZE, encoding);
}
In practice, however, the encoding does not seem to be applied.
> TarArchiveEntry getName() returns wrongly encoded name even when you set encoding to TarArchiveInputStream
> ----------------------------------------------------------------------------------------------------------
>
> Key: COMPRESS-212
> URL: https://issues.apache.org/jira/browse/COMPRESS-212
> Project: Commons Compress
> Issue Type: Bug
> Affects Versions: 1.4.1
> Environment: Red Hat Enterprise Linux, MS Windows 7
> Reporter: Woo Ju Shin
> Priority: Minor