Posted to dev@tika.apache.org by Jukka Zitting <ju...@gmail.com> on 2007/11/12 00:56:45 UTC

Tika v0.1 thoughts

Hi,

I don't think we'll have the 0.1 release out for ApacheCon (it's
probably even procedurally too late already), but it would still be
nice to target a release in the relatively near future. I think
we're already at a point where quite a few people would find a frozen
snapshot of Tika useful (even if the API still isn't stable).

There are a number of API and implementation improvements I have in
mind (I'll try to offload them to Jira), but generally I'm reasonably
happy with the current state. The main thing I'm worried about is
packaging (and documentation, but that's not so important yet).

Are we happy with releasing Tika just as a jar file with a related POM
to be published in the Maven repository, or should we come up with
some packaging that perhaps also bundles all the dependencies? I'd be
fine with just a jar artifact unless we want to make Tika runnable
just by itself (either as a webapp or a CLI application).

BR,

Jukka Zitting

Re: Tika v0.1 thoughts

Posted by Bertrand Delacretaz <bd...@apache.org>.
On 11/12/07, Jukka Zitting <ju...@gmail.com> wrote:

> ...Are we happy with releasing Tika just as a jar file with a related POM
> to be published in the Maven repository...

I'm ok with that, at least at this early stage.

-Bertrand

Re: Tika v0.1 thoughts

Posted by Bertrand Delacretaz <bd...@apache.org>.
On Nov 20, 2007 4:33 AM, Jukka Zitting <ju...@gmail.com> wrote:

> On Nov 19, 2007 3:59 PM, Bertrand Delacretaz <bd...@apache.org> wrote:
> >... Also, I think NOTICE and LICENSE (without .txt extension) are the
> > preferred filenames, can someone confirm?
>
> I believe those are the canonical names, but the .txt extensions are
> accepted. Having the .txt extension makes life easier at least on
> Windows....

Fine with me (as long as I don't have to use Windows ;-)
-Bertrand

Re: Tika v0.1 thoughts

Posted by Jukka Zitting <ju...@gmail.com>.
Hi,

On Nov 19, 2007 3:59 PM, Bertrand Delacretaz <bd...@apache.org> wrote:
> I think releasing binaries and source code in separate files is ok, as
> long as all distributed artifacts contain the NOTICE and LICENSE file.

Yes. The main release artifact is always the source package, and other
artifacts can be included in the release as long as they carry the
appropriate licensing information.

Looking back at my original message, it reads like I was suggesting
just the binary jar file, but that was certainly not my intention.
What I typically do for Apache releases is a) tag the source tree in
the repository, b) export it to local file system, c) package that
export as the main source artifact, and d) build any binary artifacts
from the exported source.
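
As a rough sketch of steps a) to d), the sequence could be scripted
along these lines. This assumes Subversion and Maven on the PATH; the
URLs, the export directory and the "tika-<version>-src.tar.gz" name are
hypothetical placeholders, not the project's actual layout. The
function only builds the command lines, it does not run them.

```python
import shlex


def release_commands(trunk_url, tag_url, version, export_dir="export"):
    """Return the commands for the four release steps, in order.

    All names here (artifact name, export directory) are illustrative
    assumptions; adjust them to the real repository layout.
    """
    name = f"tika-{version}"
    src_dir = f"{export_dir}/{name}"
    return [
        # a) tag the source tree in the repository
        ["svn", "copy", trunk_url, tag_url, "-m", f"Tagging {name}"],
        # b) export the tag to the local file system
        ["svn", "export", tag_url, src_dir],
        # c) package that export as the main source artifact
        ["tar", "czf", f"{name}-src.tar.gz", "-C", export_dir, name],
        # d) build any binary artifacts from the exported source
        ["mvn", "-f", f"{src_dir}/pom.xml", "package"],
    ]


if __name__ == "__main__":
    # Hypothetical URLs, for illustration only.
    for cmd in release_commands(
        "https://svn.example.org/tika/trunk",
        "https://svn.example.org/tika/tags/0.1",
        "0.1",
    ):
        print(shlex.join(cmd))
```

Building the binaries from the export rather than from a working copy
is the point of step d): the binaries then match the source artifact.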

> Currently, NOTICE.txt says
>
>   This product includes software developed by Andy Clark.
>
> Could we specify what this software is? It makes it easier to keep
> track of things.

Sami added that for the NekoHTML dependency. I don't think we really
need that line unless we actually bundle the dependencies with Tika
releases.

> Also, I think NOTICE and LICENSE (without .txt extension) are the
> preferred filenames, can someone confirm?

I believe those are the canonical names, but the .txt extensions are
accepted. Having the .txt extension makes life easier at least on
Windows.

BR,

Jukka Zitting

Re: Standing down

Posted by Bertrand Delacretaz <bd...@apache.org>.
Hi Mark,

On Nov 19, 2007 6:17 PM,  <ma...@searcharea.co.uk> wrote:
> I'd like to apologise for my lack of involvement with this project....

No need to apologize; changes in one's situation and needs happen
often. But thanks for letting us know!

> ...Please
> feel free to remove my commit rights and name from the project team....

I suggest leaving everything as is (or maybe marking yourself
"inactive" on the team page), and re-evaluating once Tika graduates.

-Bertrand

Standing down

Posted by ma...@searcharea.co.uk.
I'd like to apologise for my lack of involvement with this project.

When Tika kicked off my employer was interested in parsers and had agreed to let
me spend company time helping out with Tika.

Unfortunately, since then the company has selected a commercial offering and
Tika is no longer of such interest to them. While I would personally still like
to help get a good open-source parser framework off the ground I no longer have
the necessary time or incentives to contribute in any meaningful way. Please
feel free to remove my commit rights and name from the project team.

It's great to see Tika making good progress and I wish you all the best with the
project.

Cheers,
Mark


Re: Tika v0.1 thoughts

Posted by Bertrand Delacretaz <bd...@apache.org>.
On Nov 19, 2007 3:50 AM, Chris Mattmann <ch...@jpl.nasa.gov> wrote:
> Niall,
> > ...The real issue though is whether the IPMC will accept just a jar as a
> > release - since as an incubating project Tika can't actually release
> > anything without their approval. I would be surprised if they didn't
> > make the same comment as I did.
>
>  I'm not too familiar with the Incubator PMC, but people like Jukka, and
> Bertrand, who sit on it, I'm guessing are. So we'll see what they think....

I think releasing binaries and source code in separate files is ok, as
long as all distributed artifacts contain the NOTICE and LICENSE file.
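
A check like that could be automated before a vote. The following is
only a sketch: the function names are made up for illustration, and it
assumes that a NOTICE or LICENSE entry anywhere in the archive (for
example under META-INF/ in a jar), with or without a .txt extension,
counts as present.

```python
import tarfile
import zipfile

REQUIRED = ("NOTICE", "LICENSE")


def missing_legal_files(names):
    """Return the required files absent from a list of archive entry names."""
    found = set()
    for entry in names:
        # Compare base names only, so META-INF/NOTICE.txt matches NOTICE.
        base = entry.rstrip("/").rsplit("/", 1)[-1]
        if base.endswith(".txt"):
            base = base[:-4]
        if base in REQUIRED:
            found.add(base)
    return [required for required in REQUIRED if required not in found]


def check_archive(path):
    """List the entries of a zip/jar or tar archive and report what is missing."""
    if zipfile.is_zipfile(path):
        with zipfile.ZipFile(path) as archive:
            names = archive.namelist()
    else:
        with tarfile.open(path) as archive:
            names = archive.getnames()
    return missing_legal_files(names)
```

An empty list from check_archive() means both files were found; it says
nothing about whether their contents are correct.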

However, including the source code in the binary distribution is
useful for debugging, as someone using it is sure to have the correct
version of the source code.

If we want the binary distribution to be very small, we could also
include the compiled jar in the source code distribution files, so as
to provide that "source and binary in sync" file.

Currently, NOTICE.txt says

  This product includes software developed by Andy Clark.

Could we specify what this software is? It makes it easier to keep
track of things.

Also, I think NOTICE and LICENSE (without .txt extension) are the
preferred filenames, can someone confirm?

-Bertrand

Re: Tika v0.1 thoughts

Posted by Chris Mattmann <ch...@jpl.nasa.gov>.
Niall,

> Is a "binary only" release acceptable at the ASF? I always thought as
> an "open source" organization the minimum requirement for a release
> was a source distro from which someone could build it themselves - and
> a binary distro was a nice-to-have convenience as an optional extra.
> 

 I'm sorry, but you misinterpreted my (and, by transitivity, Jukka's)
suggestion. By releasing Tika as a "jar file" I didn't mean that the only
visible release of the system would be a "jar file". It simply means that the
build target would be a jar file rather than a binary deployment structure
with, say, a directory layout (e.g., /bin /etc /logs scripts/ conf/). That's
the only suggestion. The mvn assembly:assembly goal simply builds the jar
deliverable anyway (and packages it up as the "-bin".* files). Of course,
the source code is packaged as the "-src".* files as well, and any release
would have to include the source too.

> 
> The real issue though is whether the IPMC will accept just a jar as a
> release - since as an incubating project Tika can't actually release
> anything without their approval. I would be surprised if they didn't
> make the same comment as I did.

 I'm not too familiar with the Incubator PMC, but people like Jukka, and
Bertrand, who sit on it, I'm guessing are. So we'll see what they think.

Cheers,
  Chris

> 
> Niall
> 
>>  I can work on the release as soon as I get a go-ahead from the rest of you
>> guys. Should we call a vote? I think that the only blocker issue is TIKA-91
>> that needs to be fixed pre-release. The rest of the issues (8 major and 12
>> minor) are all such that they can be cast for the 0.2 release.
>> 
>> Cheers,
>>   Chris
>> 
>> 
>> 
>> On 11/11/07 3:56 PM, "Jukka Zitting" <ju...@gmail.com> wrote:
>> 
>>> Hi,
>>> 
>>> I don't think we'll have the 0.1 release out for ApacheCon (it's
>>> probably even procedurally too late already), but it would still be
>>> nice to target for a release in a relatively near future. I think
>>> we're already at a point where quite a few people would find a frozen
>>> snapshot of Tika useful (even if the API still isn't stable).
>>> 
>>> There are a number of API and implementation improvements I have in
>>> mind (I'll try to offload them to Jira), but generally I'm reasonably
>>> happy with the current state. The main thing I'm worried about is
>>> packaging (and documentation, but that's not so important yet).
>>> 
>>> Are we happy with releasing Tika just as a jar file with a related POM
>>> to be published in the Maven repository, or should we come up with
>>> some packaging that perhaps bundles also all the dependencies? I'd be
>>> fine with just a jar artifact unless we want to make Tika runnable
>>> just by itself (either as a webapp or a CLI application).
>>> 
>>> BR,
>>> 
>>> Jukka Zitting

______________________________________________
Chris Mattmann, Ph.D.
Chris.Mattmann@jpl.nasa.gov
Cognizant Development Engineer
Early Detection Research Network Project
_________________________________________________
Jet Propulsion Laboratory            Pasadena, CA
Office: 171-266B                     Mailstop:  171-246
_______________________________________________________

Disclaimer:  The opinions presented within are my own and do not reflect
those of either NASA, JPL, or the California Institute of Technology.



Re: Tika v0.1 thoughts

Posted by Niall Pemberton <ni...@gmail.com>.
On Nov 19, 2007 1:41 AM, Chris Mattmann <ch...@jpl.nasa.gov> wrote:
> Hi Jukka,
>
>  I'm in complete agreement with you on this. I'd like to push this release
> out sooner rather than later. I think that the system in its current state
> is highly useful (I'm already targeting its use in 2 projects at my job).
>
>  I'm happy releasing Tika as a jar file (+1) and a related POM. Let's target
> the webapp/Swing GUI/CLI as a 0.2 feature.

Is a "binary only" release acceptable at the ASF? I always thought as
an "open source" organization the minimum requirement for a release
was a source distro from which someone could build it themselves - and
a binary distro was a nice-to-have convenience as an optional extra.

Also, the changes for TIKA-101 added the facility to create source
and binary distros (run "mvn site assembly:assembly"), which hopefully
meet the criteria acceptable to the ASF for a release - so it should
be straightforward to do so.

The real issue though is whether the IPMC will accept just a jar as a
release - since as an incubating project Tika can't actually release
anything without their approval. I would be surprised if they didn't
make the same comment as I did.

Niall

>  I can work on the release as soon as I get a go-ahead from the rest of you
> guys. Should we call a vote? I think that the only blocker issue is TIKA-91
> that needs to be fixed pre-release. The rest of the issues (8 major and 12
> minor) are all such that they can be cast for the 0.2 release.
>
> Cheers,
>   Chris
>
>
>
> On 11/11/07 3:56 PM, "Jukka Zitting" <ju...@gmail.com> wrote:
>
> > Hi,
> >
> > I don't think we'll have the 0.1 release out for ApacheCon (it's
> > probably even procedurally too late already), but it would still be
> > nice to target for a release in a relatively near future. I think
> > we're already at a point where quite a few people would find a frozen
> > snapshot of Tika useful (even if the API still isn't stable).
> >
> > There are a number of API and implementation improvements I have in
> > mind (I'll try to offload them to Jira), but generally I'm reasonably
> > happy with the current state. The main thing I'm worried about is
> > packaging (and documentation, but that's not so important yet).
> >
> > Are we happy with releasing Tika just as a jar file with a related POM
> > to be published in the Maven repository, or should we come up with
> > some packaging that perhaps bundles also all the dependencies? I'd be
> > fine with just a jar artifact unless we want to make Tika runnable
> > just by itself (either as a webapp or a CLI application).
> >
> > BR,
> >
> > Jukka Zitting

Re: Tika v0.1 thoughts

Posted by Chris Mattmann <ch...@jpl.nasa.gov>.
Hi Jukka,

 I'm in complete agreement with you on this. I'd like to push this release
out sooner rather than later. I think that the system in its current state
is highly useful (I'm already targeting its use in 2 projects at my job).

 I'm happy releasing Tika as a jar file (+1) and a related POM. Let's target
the webapp/Swing GUI/CLI as a 0.2 feature.

 I can work on the release as soon as I get a go-ahead from the rest of you
guys. Should we call a vote? I think that the only blocker issue is TIKA-91
that needs to be fixed pre-release. The rest of the issues (8 major and 12
minor) are all such that they can be cast for the 0.2 release.

Cheers,
  Chris



On 11/11/07 3:56 PM, "Jukka Zitting" <ju...@gmail.com> wrote:

> Hi,
> 
> I don't think we'll have the 0.1 release out for ApacheCon (it's
> probably even procedurally too late already), but it would still be
> nice to target for a release in a relatively near future. I think
> we're already at a point where quite a few people would find a frozen
> snapshot of Tika useful (even if the API still isn't stable).
> 
> There are a number of API and implementation improvements I have in
> mind (I'll try to offload them to Jira), but generally I'm reasonably
> happy with the current state. The main thing I'm worried about is
> packaging (and documentation, but that's not so important yet).
> 
> Are we happy with releasing Tika just as a jar file with a related POM
> to be published in the Maven repository, or should we come up with
> some packaging that perhaps bundles also all the dependencies? I'd be
> fine with just a jar artifact unless we want to make Tika runnable
> just by itself (either as a webapp or a CLI application).
> 
> BR,
> 
> Jukka Zitting

______________________________________________
Chris Mattmann, Ph.D.
Chris.Mattmann@jpl.nasa.gov
Cognizant Development Engineer
Early Detection Research Network Project
_________________________________________________
Jet Propulsion Laboratory            Pasadena, CA
Office: 171-266B                     Mailstop:  171-246
_______________________________________________________

Disclaimer:  The opinions presented within are my own and do not reflect
those of either NASA, JPL, or the California Institute of Technology.