Posted to dev@tika.apache.org by Remzi Düzağaç <re...@gmail.com> on 2015/04/04 16:20:35 UTC

Re: GSOC RDF Microformats Support

Hi Chris,

Sorry for the late reply; I was sick last week and couldn't write.
I have checked the links. If I want to do the job, I will need to use them, and I will.
Meanwhile, I need a mentor for the GSoC project. Would you consider
being my mentor?

On Sat, Mar 28, 2015 at 4:53 AM, Mattmann, Chris A (3980) <
chris.a.mattmann@jpl.nasa.gov> wrote:

> Hi Remzi,
>
> Sure!
>
> Check out this 5-minute guide to writing a parser in Tika:
>
> https://tika.apache.org/1.7/parser_guide.html
>
>
> OK, so then check out Any23:
>
> http://any23.apache.org/
>
> It has support for parsing RDF Microformats. So, you
> may want to create a MicroformatsParser in Tika; then
> if it’s supported in Tika, it will in turn be available
> in Nutch and its parse-tika plugin if you upgrade it to
> the latest version of Tika.
>
> You can see how to do this here:
>
> http://s.apache.org/fsY
>
> Cheers and best of luck - hope that’s enough to get
> your proposal kicked off.
>
> Cheers,
> Chris
>
> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> Chris Mattmann, Ph.D.
> Chief Architect
> Instrument Software and Science Data Systems Section (398)
> NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
> Office: 168-519, Mailstop: 168-527
> Email: chris.a.mattmann@nasa.gov
> WWW:  http://sunset.usc.edu/~mattmann/
> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> Adjunct Associate Professor, Computer Science Department
> University of Southern California, Los Angeles, CA 90089 USA
> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>
> -----Original Message-----
> From: Remzi Düzağaç <re...@gmail.com>
> Reply-To: "dev@nutch.apache.org" <de...@nutch.apache.org>
> Date: Friday, March 27, 2015 at 7:22 AM
> To: dev <de...@nutch.apache.org>
> Cc: "dev@tika.apache.org" <de...@tika.apache.org>, "dev@any23.apache.org"
> <de...@any23.apache.org>
> Subject: Re: GSOC RDF Microformats Support
>
> >Hi Chris,
> >
> >
> >Thanks for your feedback.
> >I was planning to use Any23 and Tika, but I don't have a detailed grasp of
> >either project. I guess I'm going to need to dive into both.
> >
> >
> >I would appreciate it if you could guide me.
> >
> >
> >Thanks
> >
> >On Fri, Mar 27, 2015 at 4:07 PM, Mattmann, Chris A (3980)
> ><ch...@jpl.nasa.gov> wrote:
> >
> >Hi Remzi - thanks! You may want to consider this as a Tika or
> >Any23 project since Nutch delegates its parsing to Tika (and
> >Any23 uses Tika [and vice versa] to handle microformats).
> >
> >Cheers,
> >Chris
> >
> >-----Original Message-----
> >From: Remzi Düzağaç <re...@gmail.com>
> >Reply-To: "dev@nutch.apache.org" <de...@nutch.apache.org>
> >Date: Friday, March 27, 2015 at 5:07 AM
> >To: "dev@nutch.apache.org" <de...@nutch.apache.org>
> >Subject: GSOC RDF Microformats Support
> >
> >>Hi Guys,
> >>
> >>
> >>I have submitted a proposal to GSoC. I would like to add RDF microformat
> >>support to Nutch, and I kindly ask for your support. Would anyone
> >>volunteer to be my mentor on this topic?
> >>
> >>
> >>Thank you very much
> >>