You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@commons.apache.org by Stefan Bodewig <bo...@apache.org> on 2010/05/14 09:27:36 UTC

Re: [COMPRESS] Add getArchiveType() method to [Archive|Compressor]InputStream classes?

On 2010-05-12, sebb <se...@gmail.com> wrote:

> Can we restart this?

> Compress currently has Archiver and Compressor InputStreamFactory
> classes which have the following signatures:

> public ArchiveInputStream createArchiveInputStream(
>             final String archiverName, final InputStream in)

>  public CompressorInputStream createCompressorInputStream(final String name,
>             final InputStream in)

> The archiverName or name parameters are used to determine which stream
> class to create, and should use one of the provided public constants.

> I just want to provide the inverse conversion, i.e. tag the created
> class with the key that was used to create it, so that this can be
> easily determined later e.g. if the autodetect mode is used.

In theory the inverse is not unique - the same implementation could be
used for several names.  It may become true in practice if we'd choose
to use different names for variants of archive types like pax, posixtar
and gnutar for tar.

I think it depends on what you want the method that returns the archive
type/name (s) to reflect.

 * The archive names you could pass into the create methods to obtain an
   instance of the same stream class?  Then the method actually should be
   inside the factory rather than the stream class since only the
   factory can know that for sure.

 * The format of the current stream?  In that case there can't be any
   reasonable default (other than asking the factory and potentially
   getting back a list of names rather than a single one).

> It might also be useful for the classes to provide access to their
> associated mime type(s) and "usual" file extension(s). These should
> both be read-only Lists.

OK.

This isn't anything the factory could do (unless we broaden its
contract) nor the base class could provide a reasonable default for.

> If we really want to distinguish different implementations of the same
> archive type, then maybe we can add a "fullName" or "version" field,
> as is done for the JSR-223 (javax..script) API. However I cannot see
> why that would be necessary.

I'm on the fence here.

> I think the next steps are to decide:

> 1) Is it OK to add this to the API? If so:

See above, it depends on what the method is supposed to return IMHO.

> 2) What method name should be used?

ask the Ant community and they'll tell you I'm not good at chosing
names.

Stefan

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
For additional commands, e-mail: dev-help@commons.apache.org


Re: [COMPRESS] Add getArchiveType() method to [Archive|Compressor]InputStream classes?

Posted by Stefan Bodewig <bo...@apache.org>.
So for I don't think we are really in any sort of disagreement, so we
should be able to cut this thread soon 8-)

On 2010-05-14, sebb <se...@gmail.com> wrote:

> On 14/05/2010, Stefan Bodewig <bo...@apache.org> wrote:
>> On 2010-05-14, sebb <se...@gmail.com> wrote:

>>> On 14/05/2010, Stefan Bodewig <bo...@apache.org> wrote:
>>>> On 2010-05-12, sebb <se...@gmail.com> wrote:

>>>>> Compress currently has Archiver and Compressor InputStreamFactory
>>>>> classes which have the following signatures:

>>>>> public ArchiveInputStream createArchiveInputStream(
>>>>>             final String archiverName, final InputStream in)

>>>>>  public CompressorInputStream createCompressorInputStream(final String name,
>>>>>             final InputStream in)

>>>>> The archiverName or name parameters are used to determine which stream
>>>>> class to create, and should use one of the provided public constants.

>>>>> I just want to provide the inverse conversion, i.e. tag the created
>>>>> class with the key that was used to create it, so that this can be
>>>>> easily determined later e.g. if the autodetect mode is used.

>>>>  I think it depends on what you want the method that returns the archive
>>>>  type/name (s) to reflect.

>>>>   * The archive names you could pass into the create methods to obtain an
>>>>    instance of the same stream class?

In this case I stick with my opinion that it should be part of the
factory and it should return a list for each implementation.  OTOH I
don't think this is the information you really won't but rather:

>>>>   * The format of the current stream?

In this case the method should be in ArchiveInputStream and should be
abstract IMHO.

>>  Only the implementation knows the real format it has detected.

> At present, the implementations *don't* actually know, as they all
> process format differences on the fly -. they don't check that the
> same format is being used throught the file and each record may have
> different settings so long as these are self-consistent.
> This is perhaps a bug.

Not sure about cpio but for it is rather common to encode to different
formats within the same archive.  The same would be true for zip/zip64
where an entry that fits into the old header format would use it any
only files that require zip64 features use the new format.

In this case there really isn't *the* format of a stream if we wanted to
split it further.  And we'd likely end up returning "tar" or "zip" from
the method you want anyway.  So my example of different format dialects
turns out to be a red herring.

> Only the factory currently knows which version of a format is being
> used, because the matching is performed in the factory code.

... by delegating to various signature sniffing methods in the stream
classes ...

Stefan

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
For additional commands, e-mail: dev-help@commons.apache.org


Re: [COMPRESS] Add getArchiveType() method to [Archive|Compressor]InputStream classes?

Posted by sebb <se...@gmail.com>.
On 14/05/2010, Stefan Bodewig <bo...@apache.org> wrote:
> On 2010-05-14, sebb <se...@gmail.com> wrote:
>
>  > On 14/05/2010, Stefan Bodewig <bo...@apache.org> wrote:
>  >> On 2010-05-12, sebb <se...@gmail.com> wrote:
>
>  >>> Can we restart this?
>
>  >>> Compress currently has Archiver and Compressor InputStreamFactory
>  >>> classes which have the following signatures:
>
>  >>> public ArchiveInputStream createArchiveInputStream(
>  >>>             final String archiverName, final InputStream in)
>
>  >>>  public CompressorInputStream createCompressorInputStream(final String name,
>  >>>             final InputStream in)
>
>  >>> The archiverName or name parameters are used to determine which stream
>  >>> class to create, and should use one of the provided public constants.
>
>  >>> I just want to provide the inverse conversion, i.e. tag the created
>  >>> class with the key that was used to create it, so that this can be
>  >>> easily determined later e.g. if the autodetect mode is used.
>
>  >> In theory the inverse is not unique - the same implementation could be
>  >>  used for several names.  It may become true in practice if we'd choose
>  >>  to use different names for variants of archive types like pax, posixtar
>  >>  and gnutar for tar.
>
>  >>  I think it depends on what you want the method that returns the archive
>  >>  type/name (s) to reflect.
>
>  >>   * The archive names you could pass into the create methods to obtain an
>  >>    instance of the same stream class?  Then the method actually should be
>  >>    inside the factory rather than the stream class since only the
>  >>    factory can know that for sure.
>
>  > Yes, but the factory can pass in the name to the created stream.
>
>
> What if I don't create the stream via the factory?

Then the default name is used.

>  To make this work all implementations would need a new constructor with
>  an additional argument - that wouldn't make any sense for users who
>  bypass the factory.

Only for classes that support multiple formats, but then the ctor
needs to be told which it is handling anyway.

>  If this is the purpose of the method I'd prefer to see it inside the
>  factory, some getArchiverNames(Class) or
>  getArchiverNames(ArchiveInputStream) that returned an immutable list of
>  strings.
>
>  We could still put a method into ArchiveInputStream that queried the
>  factory passing in itself.
>
>
>  >>   * The format of the current stream?  In that case there can't be any
>  >>    reasonable default (other than asking the factory and potentially
>  >>    getting back a list of names rather than a single one).
>
>  > That would not be necessary if every IS that is created is given its
>  > name via the ctor.
>
>
> Assume one we have multiple names for one implementation.  For example
>  old_ascii, old_binary, new_portable and new_portable_crc for cpio since
>  these really are different formats.  If we use autodetection - which is
>  why you want the method in the first place IIUC - then which name do you
>  expect the factory to pass into said constructor?

cpio + old_ascii etc (see below).

>  Only the implementation knows the real format it has detected.

At present, the implementations *don't* actually know, as they all
process format differences on the fly -. they don't check that the
same format is being used throught the file and each record may have
different settings so long as these are self-consistent.
This is perhaps a bug.

Only the factory currently knows which version of a format is being
used, because the matching is performed in the factory code. And that
information is currently lost once the input stream has been created.

If we do wish the IS classes to validate the format, then the classes
will need to be created accordingly, and can therefore return their
specific format.

But I think a class that processes various versions of cpio should
return "cpio" as its fundamental type; it may also return "old_ascii"
etc as the version of the type.

>  So if this is the purpose of the new method then it has to live inside
>  the implementation and I don't see which default implementation could
>  make sense.

Apart from Zip/Jar, all the current classes only support a single type.

>  >>> It might also be useful for the classes to provide access to their
>  >>> associated mime type(s) and "usual" file extension(s). These should
>  >>> both be read-only Lists.
>
>  >> OK.
>
>  >>  This isn't anything the factory could do (unless we broaden its
>  >>  contract) nor the base class could provide a reasonable default for.
>
>  > The base class could define an abstract method.
>  > The only possible defaults would be empty Lists.
>  > These would only be useful in the sense that subclasses would not need
>  > to implement the methods.
>
>
> That's what I was trying to say - the default of an empty list is not
>  reasonable ("it simply doesn't make sense" in case I'm just using the
>  wrong English word for what I mean).

We could define empty list to mean "unknown".

>
>  Stefan
>
>  ---------------------------------------------------------------------
>  To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
>  For additional commands, e-mail: dev-help@commons.apache.org
>
>

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
For additional commands, e-mail: dev-help@commons.apache.org


Re: [COMPRESS] Add getArchiveType() method to [Archive|Compressor]InputStream classes?

Posted by Stefan Bodewig <bo...@apache.org>.
On 2010-05-14, sebb <se...@gmail.com> wrote:

> On 14/05/2010, Stefan Bodewig <bo...@apache.org> wrote:
>> On 2010-05-12, sebb <se...@gmail.com> wrote:

>>> Can we restart this?

>>> Compress currently has Archiver and Compressor InputStreamFactory
>>> classes which have the following signatures:

>>> public ArchiveInputStream createArchiveInputStream(
>>>             final String archiverName, final InputStream in)

>>>  public CompressorInputStream createCompressorInputStream(final String name,
>>>             final InputStream in)

>>> The archiverName or name parameters are used to determine which stream
>>> class to create, and should use one of the provided public constants.

>>> I just want to provide the inverse conversion, i.e. tag the created
>>> class with the key that was used to create it, so that this can be
>>> easily determined later e.g. if the autodetect mode is used.

>> In theory the inverse is not unique - the same implementation could be
>>  used for several names.  It may become true in practice if we'd choose
>>  to use different names for variants of archive types like pax, posixtar
>>  and gnutar for tar.

>>  I think it depends on what you want the method that returns the archive
>>  type/name (s) to reflect.

>>   * The archive names you could pass into the create methods to obtain an
>>    instance of the same stream class?  Then the method actually should be
>>    inside the factory rather than the stream class since only the
>>    factory can know that for sure.

> Yes, but the factory can pass in the name to the created stream.

What if I don't create the stream via the factory?

To make this work all implementations would need a new constructor with
an additional argument - that wouldn't make any sense for users who
bypass the factory.

If this is the purpose of the method I'd prefer to see it inside the
factory, some getArchiverNames(Class) or
getArchiverNames(ArchiveInputStream) that returned an immutable list of
strings.

We could still put a method into ArchiveInputStream that queried the
factory passing in itself.

>>   * The format of the current stream?  In that case there can't be any
>>    reasonable default (other than asking the factory and potentially
>>    getting back a list of names rather than a single one).

> That would not be necessary if every IS that is created is given its
> name via the ctor.

Assume one we have multiple names for one implementation.  For example
old_ascii, old_binary, new_portable and new_portable_crc for cpio since
these really are different formats.  If we use autodetection - which is
why you want the method in the first place IIUC - then which name do you
expect the factory to pass into said constructor?

Only the implementation knows the real format it has detected.

So if this is the purpose of the new method then it has to live inside
the implementation and I don't see which default implementation could
make sense.

>>> It might also be useful for the classes to provide access to their
>>> associated mime type(s) and "usual" file extension(s). These should
>>> both be read-only Lists.

>> OK.

>>  This isn't anything the factory could do (unless we broaden its
>>  contract) nor the base class could provide a reasonable default for.

> The base class could define an abstract method.
> The only possible defaults would be empty Lists.
> These would only be useful in the sense that subclasses would not need
> to implement the methods.

That's what I was trying to say - the default of an empty list is not
reasonable ("it simply doesn't make sense" in case I'm just using the
wrong English word for what I mean).

Stefan

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
For additional commands, e-mail: dev-help@commons.apache.org


Re: [COMPRESS] Add getArchiveType() method to [Archive|Compressor]InputStream classes?

Posted by sebb <se...@gmail.com>.
On 14/05/2010, Stefan Bodewig <bo...@apache.org> wrote:
> On 2010-05-12, sebb <se...@gmail.com> wrote:
>
>  > Can we restart this?
>
>  > Compress currently has Archiver and Compressor InputStreamFactory
>  > classes which have the following signatures:
>
>  > public ArchiveInputStream createArchiveInputStream(
>  >             final String archiverName, final InputStream in)
>
>  >  public CompressorInputStream createCompressorInputStream(final String name,
>  >             final InputStream in)
>
>  > The archiverName or name parameters are used to determine which stream
>  > class to create, and should use one of the provided public constants.
>
>  > I just want to provide the inverse conversion, i.e. tag the created
>  > class with the key that was used to create it, so that this can be
>  > easily determined later e.g. if the autodetect mode is used.
>
>
> In theory the inverse is not unique - the same implementation could be
>  used for several names.  It may become true in practice if we'd choose
>  to use different names for variants of archive types like pax, posixtar
>  and gnutar for tar.
>
>  I think it depends on what you want the method that returns the archive
>  type/name (s) to reflect.
>
>   * The archive names you could pass into the create methods to obtain an
>    instance of the same stream class?  Then the method actually should be
>    inside the factory rather than the stream class since only the
>    factory can know that for sure.

Yes, but the factory can pass in the name to the created stream.

>   * The format of the current stream?  In that case there can't be any
>    reasonable default (other than asking the factory and potentially
>    getting back a list of names rather than a single one).

That would not be necessary if every IS that is created is given its
name via the ctor.

>
>  > It might also be useful for the classes to provide access to their
>  > associated mime type(s) and "usual" file extension(s). These should
>  > both be read-only Lists.
>
>
> OK.
>
>  This isn't anything the factory could do (unless we broaden its
>  contract) nor the base class could provide a reasonable default for.

The base class could define an abstract method.
The only possible defaults would be empty Lists.
These would only be useful in the sense that subclasses would not need
to implement the methods.

>
>  > If we really want to distinguish different implementations of the same
>  > archive type, then maybe we can add a "fullName" or "version" field,
>  > as is done for the JSR-223 (javax..script) API. However I cannot see
>  > why that would be necessary.
>
>
> I'm on the fence here.
>
>
>  > I think the next steps are to decide:
>
>  > 1) Is it OK to add this to the API? If so:
>
>
> See above, it depends on what the method is supposed to return IMHO.
>
>
>  > 2) What method name should be used?
>
>
> ask the Ant community and they'll tell you I'm not good at chosing
>  names.
>
>
>  Stefan
>
>
>  ---------------------------------------------------------------------
>  To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
>  For additional commands, e-mail: dev-help@commons.apache.org
>
>

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
For additional commands, e-mail: dev-help@commons.apache.org