Posted to user@nutch.apache.org by Viksit Gaur <vi...@gmail.com> on 2008/07/01 18:40:54 UTC

Nutch SWF based on Adobe's latest spec?

Hi All,

I was wondering if there are plans to update the SWF parser with the
latest Adobe specification? Could someone also shed some light on how 
the new specifications could (would?) change the way this parser works?

Cheers,
Viksit

Re: Nutch SWF based on Adobe's latest spec?

Posted by Andrzej Bialecki <ab...@getopt.org>.
Viksit Gaur wrote:
> Hi All,
> 
> I was wondering if there are plans to update the SWF parser with the
> latest Adobe specification? Could someone also shed some light on how 
> the new specifications could (would?) change the way this parser works?

The Nutch community welcomes Slashdot readers ;)

Currently the Nutch SWF parser is based on a fairly old library, 
Java/SWF. I just browsed through the Open Source Flex SDK, and there is 
a chance that the swfutils/ package could provide the text / outlink 
extraction functionality that we need. It remains to be seen whether it 
works significantly better than the current parser ...

As far as I'm aware, there are no plans in Nutch or Tika to write an SWF
parser from scratch based on this updated specification. If someone
were to submit code that uses swfutils.jar to parse the newer formats,
we would certainly welcome such a contribution. :)

-- 
Best regards,
Andrzej Bialecki     <><
  ___. ___ ___ ___ _ _   __________________________________
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com