Posted to java-user@lucene.apache.org by Antony Bowesman <ad...@teamware.com> on 2007/02/23 06:04:53 UTC

TextMining.org Word extractor

I'm extracting text from Word using TextMining.org extractors - it works better 
than POI because it extracts Word 6/95 as well as 97-2002, which POI cannot do. 
However, I'm trying to find out about licence issues with the TM jar. The TM
website seems to be permanently hacked these days.

Anyone know?

Also, has anyone come up with a good solution for extracting data from
fast-saved files, something that neither TM nor POI can do?

Antony



---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Re: TextMining.org Word extractor

Posted by Antony Bowesman <ad...@teamware.com>.
Hi Hoss,

> : Yes, I found the info, but it seems his offer to hand over the software
> : went un-answered.  Nutch uses Ryan Ackley's Word6 extractor, so I'm guessing it
> 
> i don't know that you can assume that .. he specifically said "Send me an
> email directly if you are interested"

Yes, hence this thread ;)  I'd not like to rely on the textmining parser only to 
discover it's not useable.  I can use POI if I have to, but it does not handle 
Word 6, which is bad, so I'd rather use TM.

> : is still Apache 2, but as I am about to ship some software, I wanted to put the
> : right licence text where it should be.
> 
> he did explicitly say it was apache 2 in that email.  and whatever copy
> you have that you want to ship should have come with the license.

Actually, the jar file is the one that's downloaded with the LuceneInAction.zip 
file from the Manning website

http://www.lucenebook.com/LuceneInAction.zip from http://www.manning.com/hatcher2/

and there's no licence file.  The book does not refer to the licence although it 
refers to the parser as 'freely available'.  The book just refers to the website 
- now unavailable.

I've tried sending Ryan Ackley mail direct.  Hopefully he will clarify its status.

Antony




Re: TextMining.org Word extractor

Posted by Chris Hostetter <ho...@fucit.org>.
: Yes, I found the info, but it seems his offer to hand over the software
: went un-answered.  Nutch uses Ryan Ackley's Word6 extractor, so I'm guessing it

i don't know that you can assume that .. he specifically said "Send me an
email directly if you are interested"

: is still Apache 2, but as I am about to ship some software, I wanted to put the
: right licence text where it should be.

he did explicitly say it was apache 2 in that email.  and whatever copy
you have that you want to ship should have come with the license.




-Hoss




Re: TextMining.org Word extractor

Posted by Antony Bowesman <ad...@teamware.com>.
Yes, I found the info, but it seems his offer to hand over the software

http://mail-archives.apache.org/mod_mbox/lucene-java-user/200602.mbox/%3Cfe93b8057e9.7e9fe93b805@tampabay.rr.com%3E

went unanswered.  Nutch uses Ryan Ackley's Word6 extractor, so I'm guessing it
is still Apache 2, but as I am about to ship some software, I wanted to put the 
right licence text where it should be.

Antony


Chris Hostetter wrote:
> 
> googling...
> 	TextMining.org licence
> ...turns up lots of useful info, some from the archive of this list.
> 
> 
> : Date: Fri, 23 Feb 2007 16:04:53 +1100
> : From: Antony Bowesman <ad...@teamware.com>
> : Reply-To: java-user@lucene.apache.org
> : To: java-user@lucene.apache.org
> : Subject: TextMining.org Word extractor
> :
> : I'm extracting text from Word using TextMining.org extractors - it works better
> : than POI because it extracts Word 6/95 as well as 97-2002, which POI cannot do.
> :   However, I'm trying to find out about licence issues with the TM jar. The TM
> : website seems to be permanently hacked these days.
> :
> : Anyone know?





Re: TextMining.org Word extractor

Posted by Chris Hostetter <ho...@fucit.org>.

googling...
	TextMining.org licence
...turns up lots of useful info, some from the archive of this list.


: Date: Fri, 23 Feb 2007 16:04:53 +1100
: From: Antony Bowesman <ad...@teamware.com>
: Reply-To: java-user@lucene.apache.org
: To: java-user@lucene.apache.org
: Subject: TextMining.org Word extractor
:
: I'm extracting text from Word using TextMining.org extractors - it works better
: than POI because it extracts Word 6/95 as well as 97-2002, which POI cannot do.
:   However, I'm trying to find out about licence issues with the TM jar. The TM
: website seems to be permanently hacked these days.
:
: Anyone know?
:
: Also, has anyone come up with a good solution for extracting data from
: fast-saved files, something that neither TM nor POI can do.
:
: Antony



-Hoss

