Posted to dev@tika.apache.org by Tyler Palsulich <tp...@apache.org> on 2014/08/07 18:56:58 UTC

[DISCUSS] Give examples of Parser, Detector, and Translator usage

Hi All,

I think we should add some consolidated documentation on how to use Tika's
Java API. It would be very helpful if we had short snippets of code that
showed how exactly you can use Parser.parse(), for example. I think I
remember a thread about testing example code a while back, but I'm not
sure. We have some developer documentation on the site, but the user docs
are somewhat lacking.

I can think of a few options:

*1) tika-example module*. This module would have example code of using each
main interface of Tika. Simplicity and organization would be king, so new
users can find exactly what they're looking for quickly. A big benefit of
this is that unit tests would be baked in. I like this option. One downside
is that reading source code in the browser is terrible (e.g. see [0]).

*2)* Examples section on the *wiki*. My impression is that the wiki is not
as popular as the root website. And, it's also very easy to forget about
and let go out of date. But, formatting and explanations would be pretty.

*3)* Examples section on the *website*. This has the benefit of pretty
formatting and coloring, without the potential user having to check out the
repo or view direct source in browser. Another benefit is this section
would be perfect for showing how to use the tika-app jar.

Right now, I think the best option is a combination of 1 and 3: we get some
end-to-end examples running in the tika-example module and put short usage
snippets on an examples page of the website.

What do you guys think? What other options should we consider?

Tyler

[0] http://svn.apache.org/repos/asf/tika/trunk/tika-core/src/main/java/org/apache/tika/parser/Parser.java

Re: [DISCUSS] Give examples of Parser, Detector, and Translator usage

Posted by Nick Burch <ap...@gagravarr.org>.
On Thu, 7 Aug 2014, Tyler Palsulich wrote:
>> This needs to pull from the examples in svn, so we make sure it compiles
>> and stays working. See above!
>
> Ahh. Thank you for the link. So, first, create the tika-example module and
> some examples.

Yup

> Then integrate Apache CMS into the website (never done this before,
> anyone have experience?). Integrate the tika-example code with the CMS 
> and website.

Not quite. After writing the examples, next investigate if it'd be easy to 
get Maven to pull in snippets of code into apt-generated pages (maybe via 
our own plugin if needed?). If it is, stick with what we have, since 
everyone knows it and it works! If not, switch the website from Apt to CMS 
+ Markdown, then inline the code using the CMS feature for that.

(If we do go down the latter route, it's probably something for a half day 
hackathon at ApacheCon or similar, it's not too hard, but best done with 
others around to help, especially infra folks)

Nick
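
To make the "pull snippets from the examples in svn" idea concrete, here is a
minimal sketch of marker-based snippet extraction in plain Java. It is not the
CMS feature or any existing Maven plugin: the class name SnippetExtractor and
the "START SNIPPET: id" / "END SNIPPET: id" comment markers are assumptions
chosen for illustration, and the sample source lines are made up.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of Tika, the Apache CMS, or Maven.
final class SnippetExtractor {

    /**
     * Returns the lines found between the "START SNIPPET: id" and
     * "END SNIPPET: id" marker lines, excluding the markers themselves.
     */
    static List<String> extract(List<String> sourceLines, String id) {
        String start = "START SNIPPET: " + id;
        String end = "END SNIPPET: " + id;
        List<String> snippet = new ArrayList<>();
        boolean inside = false;
        for (String line : sourceLines) {
            String trimmed = line.trim();
            if (!inside) {
                // Skip everything until the opening marker for this id.
                inside = trimmed.endsWith(start);
            } else if (trimmed.endsWith(end)) {
                return snippet;
            } else {
                snippet.add(line);
            }
        }
        throw new IllegalArgumentException(
                inside ? "Unterminated snippet: " + id : "No such snippet: " + id);
    }

    public static void main(String[] args) {
        // Stand-in for a file in the tika-example module (contents are made up).
        List<String> source = Arrays.asList(
                "public class ParsingExample {",
                "    // START SNIPPET: greeting",
                "    String greeting = \"hello\";",
                "    // END SNIPPET: greeting",
                "}");
        for (String line : extract(source, "greeting")) {
            System.out.println(line);
        }
    }
}
```

Because the snippet is cut from a source file that the build compiles and
tests, the text shown on the site cannot silently drift away from working code.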

Re: [DISCUSS] Give examples of Parser, Detector, and Translator usage

Posted by Tyler Palsulich <tp...@gmail.com>.
Thank you for the input, Nick and Avi.

> This needs to pull from the examples in svn, so we make sure it compiles
> and stays working. See above!

Ahh. Thank you for the link. So, first, create the tika-example module and
some examples. Then integrate Apache CMS into the website (never done this
before, anyone have experience?). Integrate the tika-example code with the
CMS and website.

How does that sound?

Tyler


On Thu, Aug 7, 2014 at 10:16 AM, Nick Burch <ap...@gagravarr.org> wrote:

> On Thu, 7 Aug 2014, Tyler Palsulich wrote:
>
>> I think we should add some consolidated documentation on how to use Tika's
>> Java API. It would be very helpful if we had short snippets of code that
>> showed how exactly you can use Parser.parse(), for example. I think I
>> remember a thread about testing example code a while back, but I'm not
>> sure.
>>
>
> We did[1]. The CMS coupled with Markdown is able to pull in snippets from
> SVN. There was a general sense that we should see if we could do it with
> Apt too (maybe with a plugin?), otherwise switch from Apt to Markdown + CMS
> for building + editing the site. That was about it... Volunteer(s) needed!
>
>> *1) tika-example module*. This module would have example code of using
>> each main interface of Tika. Simplicity and organization would be king, so
>> new users can find exactly what they're looking for quickly. A big benefit
>> of this is that unit tests would be baked in. I like this option.
>>
>
> We can check for breakages with the compiler and unit tests, which is very
> handy! (Nothing worse than examples that don't work any more...)
>
>> *3)* Examples section on the *website*. This has the benefit of pretty
>> formatting and coloring, without the potential user having to check out the
>> repo or view direct source in browser. Another benefit is this section
>> would be perfect for showing how to use the tika-app jar.
>>
>
> This needs to pull from the examples in svn, so we make sure it compiles
> and stays working. See above!
>
> Nick
>
> [1] eg http://mail-archives.apache.org/mod_mbox/tika-dev/201406.mbox/%3Calpine.DEB.2.02.1406022014580.13115%40urchin.earth.li%3E
>

Re: [DISCUSS] Give examples of Parser, Detector, and Translator usage

Posted by Nick Burch <ap...@gagravarr.org>.
On Thu, 7 Aug 2014, Tyler Palsulich wrote:
> I think we should add some consolidated documentation on how to use Tika's
> Java API. It would be very helpful if we had short snippets of code that
> showed how exactly you can use Parser.parse(), for example. I think I
> remember a thread about testing example code a while back, but I'm not
> sure.

We did[1]. The CMS coupled with Markdown is able to pull in snippets from 
SVN. There was a general sense that we should see if we could do it with 
Apt too (maybe with a plugin?), otherwise switch from Apt to Markdown + 
CMS for building + editing the site. That was about it... Volunteer(s) 
needed!

> *1) tika-example module*. This module would have example code of using 
> each main interface of Tika. Simplicity and organization would be king, 
> so new users can find exactly what they're looking for quickly. A big 
> benefit of this is that unit tests would be baked in. I like this 
> option.

We can check for breakages with the compiler and unit tests, which is very 
handy! (Nothing worse than examples that don't work any more...)

> *3)* Examples section on the *website*. This has the benefit of pretty 
> formatting and coloring, without the potential user having to check out 
> the repo or view direct source in browser. Another benefit is this 
> section would be perfect for showing how to use the tika-app jar.

This needs to pull from the examples in svn, so we make sure it compiles 
and stays working. See above!

Nick

[1] eg http://mail-archives.apache.org/mod_mbox/tika-dev/201406.mbox/%3Calpine.DEB.2.02.1406022014580.13115%40urchin.earth.li%3E

RE: [DISCUSS] Give examples of Parser, Detector, and Translator usage

Posted by Nick Burch <ap...@gagravarr.org>.
On Mon, 11 Aug 2014, Allison, Timothy B. wrote:
> For development on TIKA-1302, I've been using a modified version of the 
> recursive parser wrapper that I submitted (well, plagiarized from 
> Jukka+Nick's code on the wiki site above) as TIKA-1329.  For Tika 1.7, 
> I'd like to add this to tika-app and tika-server, and I'll be using it 
> in planned tika-batch and tika-eval modules.  So, should I add that to 
> the tika-core, tika-parsers or tika-example module?  Do we want tika-app 
> and tika-server to depend on tika-example?

I'd say that no production code should depend on tika-example.

Maybe tika-parsers is the right place for this? Not sure though, I can see
an argument for tika-core too...

Nick

RE: [DISCUSS] Give examples of Parser, Detector, and Translator usage

Posted by "Allison, Timothy B." <ta...@mitre.org>.
>Recursion is one that causes confusion, we've got some example programs on
>the wiki that we can include:
>https://wiki.apache.org/tika/RecursiveMetadata
>
>Ray Gauss is probably our best bet for advanced metadata stuff to send in
>some examples on that!

For development on TIKA-1302, I've been using a modified version of the recursive parser wrapper that I submitted (well, plagiarized from Jukka+Nick's code on the wiki site above) as TIKA-1329.  For Tika 1.7, I'd like to add this to tika-app and tika-server, and I'll be using it in planned tika-batch and tika-eval modules.  So, should I add that to the tika-core, tika-parsers or tika-example module?  Do we want tika-app and tika-server to depend on tika-example?

Re: [DISCUSS] Give examples of Parser, Detector, and Translator usage

Posted by "Mattmann, Chris A (3980)" <ch...@jpl.nasa.gov>.
FYI coming back to this thread, the Manning folks said it’s fine
to contribute the code! :)

I’ve filed this issue to track it:

https://issues.apache.org/jira/browse/TIKA-1562


++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Chris Mattmann, Ph.D.
Chief Architect
Instrument Software and Science Data Systems Section (398)
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 168-519, Mailstop: 168-527
Email: chris.a.mattmann@nasa.gov
WWW:  http://sunset.usc.edu/~mattmann/
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Adjunct Associate Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++






-----Original Message-----
From: <Mattmann>, Chris Mattmann <Ch...@jpl.nasa.gov>
Reply-To: "dev@tika.apache.org" <de...@tika.apache.org>
Date: Wednesday, August 13, 2014 at 3:09 PM
To: "dev@tika.apache.org" <de...@tika.apache.org>
Subject: Re: [DISCUSS] Give examples of Parser, Detector, and Translator
usage

>sure np, my point is (and you'll see this in the Tika in Action examples) that,
>depending on the package namespace, "Example" in the class name may be redundant.
>
>For example, is org.apache.tika.example.translator.ExampleTranslator
>redundant? :)


Re: [DISCUSS] Give examples of Parser, Detector, and Translator usage

Posted by "Mattmann, Chris A (3980)" <ch...@jpl.nasa.gov>.
sure np, my point is (and you'll see this in the Tika in Action examples) that,
depending on the package namespace, "Example" in the class name may be redundant.

For example, is org.apache.tika.example.translator.ExampleTranslator
redundant? :)

-----Original Message-----
From: Tyler Palsulich <tp...@gmail.com>
Reply-To: "dev@tika.apache.org" <de...@tika.apache.org>
Date: Wednesday, August 13, 2014 3:45 PM
To: "dev@tika.apache.org" <de...@tika.apache.org>
Subject: Re: [DISCUSS] Give examples of Parser, Detector, and Translator
usage

>I think the *Example names are useful, since then the names can overlap
>with the class they give an example of. For example, TranslatorExample
>should show how to use a Translator.
>
>Tyler


Re: [DISCUSS] Give examples of Parser, Detector, and Translator usage

Posted by Tyler Palsulich <tp...@gmail.com>.
I think the *Example names are useful, since then the names can overlap
with the class they give an example of. For example, TranslatorExample
should show how to use a Translator.

Tyler


On Tue, Aug 12, 2014 at 4:37 PM, Mattmann, Chris A (3980) <
chris.a.mattmann@jpl.nasa.gov> wrote:

> let's go with o.a.tika.example
> Class names don't need Example in them.
>
> Sound good?

Re: [DISCUSS] Give examples of Parser, Detector, and Translator usage

Posted by "Mattmann, Chris A (3980)" <ch...@jpl.nasa.gov>.
let's go with o.a.tika.example
Class names don't need Example in them.

Sound good?


-----Original Message-----
From: Tyler Palsulich <tp...@gmail.com>
Reply-To: "dev@tika.apache.org" <de...@tika.apache.org>
Date: Tuesday, August 12, 2014 4:34 PM
To: "dev@tika.apache.org" <de...@tika.apache.org>
Subject: Re: [DISCUSS] Give examples of Parser, Detector, and Translator
usage

>Woot! Any input on naming conventions for the examples?
>Package: org.apache.tika.example.
>File/class: *Example.java.
>
>Methods?
>
>Tyler
>On Aug 12, 2014 9:32 AM, "Mattmann, Chris A (3980)" <
>chris.a.mattmann@jpl.nasa.gov> wrote:
>
>> OK I checked with Manning! We can contribute the source code! :)
>>
>> I will prepare it as part of the tika-examples package. Woot!
>>
>> -----Original Message-----
>> From: <Mattmann>, Chris Mattmann <Ch...@jpl.nasa.gov>
>> Reply-To: "dev@tika.apache.org" <de...@tika.apache.org>
>> Date: Thursday, August 7, 2014 2:54 PM
>> To: "dev@tika.apache.org" <de...@tika.apache.org>
>> Subject: Re: [DISCUSS] Give examples of Parser, Detector, and Translator
>> usage
>>
>> >Hey Nick! :)
>> >
>> >I'd have no problem pinching the code from Tika in Action. I wonder if
>> >the Manning folks would mind.
>> >
>> >I'll reach out to them.
>> >
>> >Cheers,
>> >Chris
>> >
>> >-----Original Message-----
>> >From: Nick Burch <ap...@gagravarr.org>
>> >Reply-To: "dev@tika.apache.org" <de...@tika.apache.org>
>> >Date: Thursday, August 7, 2014 2:42 PM
>> >To: "dev@tika.apache.org" <de...@tika.apache.org>
>> >Subject: Re: [DISCUSS] Give examples of Parser, Detector, and
>>Translator
>> >usage
>> >
>> >>On Thu, 7 Aug 2014, Tyler Palsulich wrote:
>> >>> Sounds like the new module is a good idea. So, let's jump on it! I
>>will
>> >>> create a new 'example' JIRA tag and create issues for creating the
>> >>> module and adding Parse, Detect, and Translate examples. Others
>>should
>> >>> add issues/desired examples as they see fit. How's that sound?
>> >>
>> >>I wonder if it's worth approaching those crazy fools who wrote a book
>>on
>> >>Tika, to see if we could pinch one or two of their examples? If only
>>we
>> >>knew who they were... ;-)
>> >>
>> >>
>> >>Recursion is one that causes confusion, we've got some example
>>programs
>> >>on
>> >>the wiki that we can include:
>> >>https://wiki.apache.org/tika/RecursiveMetadata
>> >>
>> >>Ray Gauss is probably our best bet for advanced metadata stuff to
>>send in
>> >>some examples on that!
>> >>
>> >>Another one that has generated mailing list traffic lately is embedded
>> >>images, including re-writing links to them. There's some (LGPL) code
>>in
>> >>Alfresco which I wrote a few years ago to do that, Ray might be able
>>to
>> >>get the nod to contribute that (or a cut-down version) as an example
>>of
>> >>that style of parsing html + embedded resources in parallel
>> >>
>> >>Nick
>> >
>>
>>


Re: [DISCUSS] Give examples of Parser, Detector, and Translator usage

Posted by Tyler Palsulich <tp...@gmail.com>.
Woot! Any input on naming conventions for the examples?
Package: org.apache.tika.example.
File/class: *Example.java.

Methods?

Tyler
On Aug 12, 2014 9:32 AM, "Mattmann, Chris A (3980)" <
chris.a.mattmann@jpl.nasa.gov> wrote:

> OK I checked with Manning! We can contribute the source code! :)
>
> I will prepare it as part of the tika-examples package. Woot!

Re: [DISCUSS] Give examples of Parser, Detector, and Translator usage

Posted by "Mattmann, Chris A (3980)" <ch...@jpl.nasa.gov>.
OK I checked with Manning! We can contribute the source code! :)

I will prepare it as part of the tika-examples package. Woot!

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Chris Mattmann, Ph.D.
Chief Architect
Instrument Software and Science Data Systems Section (398)
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 168-519, Mailstop: 168-527
Email: chris.a.mattmann@nasa.gov
WWW:  http://sunset.usc.edu/~mattmann/
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Adjunct Associate Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++






-----Original Message-----
From: <Mattmann>, Chris Mattmann <Ch...@jpl.nasa.gov>
Reply-To: "dev@tika.apache.org" <de...@tika.apache.org>
Date: Thursday, August 7, 2014 2:54 PM
To: "dev@tika.apache.org" <de...@tika.apache.org>
Subject: Re: [DISCUSS] Give examples of Parser, Detector, and Translator
usage

>Hey Nick! :)
>
>I'd have no problem pinching the code from Tika in Action. I wonder if
>the Manning folks would mind.
>
>I'll reach out to them.
>
>Cheers,
>Chris


Re: [DISCUSS] Give examples of Parser, Detector, and Translator usage

Posted by "Mattmann, Chris A (3980)" <ch...@jpl.nasa.gov>.
Hey Nick! :)

I'd have no problem pinching the code from Tika in Action. I wonder if
the Manning folks would mind.

I'll reach out to them.

Cheers,
Chris


-----Original Message-----
From: Nick Burch <ap...@gagravarr.org>
Reply-To: "dev@tika.apache.org" <de...@tika.apache.org>
Date: Thursday, August 7, 2014 2:42 PM
To: "dev@tika.apache.org" <de...@tika.apache.org>
Subject: Re: [DISCUSS] Give examples of Parser, Detector, and Translator
usage

>On Thu, 7 Aug 2014, Tyler Palsulich wrote:
>> Sounds like the new module is a good idea. So, let's jump on it! I will
>> create a new 'example' JIRA tag and create issues for creating the
>> module and adding Parse, Detect, and Translate examples. Others should
>> add issues/desired examples as they see fit. How's that sound?
>
>I wonder if it's worth approaching those crazy fools who wrote a book on
>Tika, to see if we could pinch one or two of their examples? If only we
>knew who they were... ;-)
>
>
>Recursion is one that causes confusion, we've got some example programs on
>the wiki that we can include:
>https://wiki.apache.org/tika/RecursiveMetadata
>
>Ray Gauss is probably our best bet for advanced metadata stuff to send in
>some examples on that!
>
>Another one that has generated mailing list traffic lately is embedded
>images, including re-writing links to them. There's some (LGPL) code in
>Alfresco which I wrote a few years ago to do that, Ray might be able to
>get the nod to contribute that (or a cut-down version) as an example of
>that style of parsing html + embedded resources in parallel
>
>Nick


Re: [DISCUSS] Give examples of Parser, Detector, and Translator usage

Posted by Nick Burch <ap...@gagravarr.org>.
On Thu, 7 Aug 2014, Tyler Palsulich wrote:
> Sounds like the new module is a good idea. So, let's jump on it! I will 
> create a new 'example' JIRA tag and create issues for creating the 
> module and adding Parse, Detect, and Translate examples. Others should 
> add issues/desired examples as they see fit. How's that sound?

I wonder if it's worth approaching those crazy fools who wrote a book on 
Tika, to see if we could pinch one or two of their examples? If only we 
knew who they were... ;-)


Recursion is one that causes confusion, we've got some example programs on 
the wiki that we can include:
https://wiki.apache.org/tika/RecursiveMetadata

Ray Gauss is probably our best bet for advanced metadata stuff to send in 
some examples on that!

Another one that has generated mailing list traffic lately is embedded 
images, including re-writing links to them. There's some (LGPL) code in 
Alfresco which I wrote a few years ago to do that, Ray might be able to 
get the nod to contribute that (or a cut-down version) as an example of 
that style of parsing html + embedded resources in parallel

Nick

Re: [DISCUSS] Give examples of Parser, Detector, and Translator usage

Posted by Annie Burgess <an...@gmail.com>.
+1

I like it Tyler!


On Thu, Aug 7, 2014 at 1:37 PM, Mattmann, Chris A (3980) <
chris.a.mattmann@jpl.nasa.gov> wrote:

> thanks Tyler, perfect


-- 
------------------------------------------------------------------------------------------
Ann Bryant Burgess, PhD

Postdoctoral Fellow
Computer Science Department
University of Southern California
Viterbi School of Engineering
Los Angeles, CA

Alaska Science Center/USGS
Anchorage, AK

Cell:  (585) 738-7549
Office:  (907) 786-7059
Fax:  (907) 786-7150
E-mail: anniebryant.burgess@gmail.com
Office Address: 4210 University Dr., Anchorage, AK 99508-4626
-------------------------------------------------------------------------------------------

Re: [DISCUSS] Give examples of Parser, Detector, and Translator usage

Posted by "Mattmann, Chris A (3980)" <ch...@jpl.nasa.gov>.
thanks Tyler, perfect

-----Original Message-----
From: Tyler Palsulich <tp...@gmail.com>
Reply-To: "dev@tika.apache.org" <de...@tika.apache.org>
Date: Thursday, August 7, 2014 2:33 PM
To: "dev@tika.apache.org" <de...@tika.apache.org>
Subject: Re: [DISCUSS] Give examples of Parser, Detector, and Translator
usage

>Sounds like the new module is a good idea. So, let's jump on it! I will
>create a new 'example' JIRA tag and create issues for creating the module
>and adding Parse, Detect, and Translate examples. Others should add
>issues/desired examples as they see fit. How's that sound?
>
>Tyler


Re: [DISCUSS] Give examples of Parser, Detector, and Translator usage

Posted by Tyler Palsulich <tp...@gmail.com>.
Sounds like the new module is a good idea. So, let's jump on it! I will
create a new 'example' JIRA tag and create issues for creating the module
and adding Parse, Detect, and Translate examples. Others should add
issues/desired examples as they see fit. How's that sound?

Tyler


On Thu, Aug 7, 2014 at 1:08 PM, Mattmann, Chris A (3980) <
chris.a.mattmann@jpl.nasa.gov> wrote:

> Great idea! This is what we did with apache OODT radix you can scope here
> https://cwiki.apache.org/confluence/display/OODT/RADiX+Powered+By+OODT

Re: [DISCUSS] Give examples of Parser, Detector, and Translator usage

Posted by "Mattmann, Chris A (3980)" <ch...@jpl.nasa.gov>.
Great idea! This is what we did with Apache OODT RADiX, which you can check out here:
https://cwiki.apache.org/confluence/display/OODT/RADiX+Powered+By+OODT

On Aug 7, 2014, at 12:56 PM, "Hong-Thai Nguyen" <th...@gmail.com> wrote:

> Nice idea.
>
> We could do more than samples. We can generate a parser, detector, or
> translator Maven archetype: a kind of template so that users can quickly
> get a project set up to develop a new one.
>
> Regards,
>
> Hong-Thai


Re: [DISCUSS] Give examples of Parser, Detector, and Translator usage

Posted by Hong-Thai Nguyen <th...@gmail.com>.
Nice idea.

We could do more than samples. We can generate a parser, detector, or translator Maven archetype: a kind of template so that users can quickly get a project set up to develop a new one.

Regards,

Hong-Thai


Re: [DISCUSS] Give examples of Parser, Detector, and Translator usage

Posted by Avi Hayun <av...@gmail.com>.
+1 on tika-examples package

