Posted to java-user@lucene.apache.org by Antony Bowesman <ad...@teamware.com> on 2007/02/23 06:04:53 UTC
TextMining.org Word extractor
I'm extracting text from Word using TextMining.org extractors - it works better
than POI because it extracts Word 6/95 as well as 97-2002, which POI cannot do.
However, I'm trying to find out about licence issues with the TM jar. The TM
website seems to be permanently hacked these days.
Anyone know?
Also, has anyone come up with a good solution for extracting data from
fast-saved files, something that neither TM nor POI can do?
Antony
---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org
Re: TextMining.org Word extractor
Posted by Antony Bowesman <ad...@teamware.com>.
Hi Hoss,
> : Yes, I found the info, but it seems his offer to hand over the software
> : went un-answered. Nutch uses Ryan Ackley's Word6 extractor, so I'm guessing it
>
> i don't know that you can assume that .. he specifically said "Send me an
> email directly if you are interested"
Yes, hence this thread ;) I'd not like to rely on the textmining parser only to
discover it's not useable. I can use POI if I have to, but it does not handle
Word 6, which is bad, so I'd rather use TM.
> : is still Apache 2, but as I am about to ship some software, I wanted to put the
> : right licence text where it should be.
>
> he did explicitly say it was apache 2 in that email. and whatever copy
> you have that you want to ship should have come with the licence.
Actually, the jar file is the one that comes in the LuceneInAction.zip download
from the Manning website
(http://www.lucenebook.com/LuceneInAction.zip, from http://www.manning.com/hatcher2/),
and there's no licence file in it. The book does not mention the licence, although it
refers to the parser as 'freely available'. The book just points to the website,
which is now unavailable.
I've tried sending Ryan Ackley mail direct. Hopefully he will clarify its status.
Antony
Re: TextMining.org Word extractor
Posted by Chris Hostetter <ho...@fucit.org>.
: Yes, I found the info, but it seems his offer to hand over the software
: went un-answered. Nutch uses Ryan Ackley's Word6 extractor, so I'm guessing it
i don't know that you can assume that .. he specifically said "Send me an
email directly if you are interested"
: is still Apache 2, but as I am about to ship some software, I wanted to put the
: right licence text where it should be.
he did explicitly say it was apache 2 in that email. and whatever copy
you have that you want to ship should have come with the licence.
-Hoss
Re: TextMining.org Word extractor
Posted by Antony Bowesman <ad...@teamware.com>.
Yes, I found the info, but it seems his offer to hand over the software
http://mail-archives.apache.org/mod_mbox/lucene-java-user/200602.mbox/%3Cfe93b8057e9.7e9fe93b805@tampabay.rr.com%3E
went un-answered. Nutch uses Ryan Ackley's Word6 extractor, so I'm guessing it
is still Apache 2, but as I am about to ship some software, I wanted to put the
right licence text where it should be.
Antony
Chris Hostetter wrote:
>
> googling...
> TextMining.org licence
> ...turns up lots of useful info, some from the archive of this list.
>
>
> : Date: Fri, 23 Feb 2007 16:04:53 +1100
> : From: Antony Bowesman <ad...@teamware.com>
> : Reply-To: java-user@lucene.apache.org
> : To: java-user@lucene.apache.org
> : Subject: TextMining.org Word extractor
> :
> : I'm extracting text from Word using TextMining.org extractors - it works better
> : than POI because it extracts Word 6/95 as well as 97-2002, which POI cannot do.
> : However, I'm trying to find out about licence issues with the TM jar. The TM
> : website seems to be permanently hacked these days.
> :
> : Anyone know?
Re: TextMining.org Word extractor
Posted by Chris Hostetter <ho...@fucit.org>.
googling...
TextMining.org licence
...turns up lots of useful info, some from the archive of this list.
: Date: Fri, 23 Feb 2007 16:04:53 +1100
: From: Antony Bowesman <ad...@teamware.com>
: Reply-To: java-user@lucene.apache.org
: To: java-user@lucene.apache.org
: Subject: TextMining.org Word extractor
:
: I'm extracting text from Word using TextMining.org extractors - it works better
: than POI because it extracts Word 6/95 as well as 97-2002, which POI cannot do.
: However, I'm trying to find out about licence issues with the TM jar. The TM
: website seems to be permanently hacked these days.
:
: Anyone know?
:
: Also, has anyone come up with a good solution for extracting data from
: fast-saved files, something that neither TM nor POI can do.
:
: Antony
-Hoss