Posted to dev@nutch.apache.org by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2014/11/01 19:47:34 UTC
[jira] [Updated] (NUTCH-1644) Should have a parser that uses xpath
[ https://issues.apache.org/jira/browse/NUTCH-1644?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Lewis John McGibbney updated NUTCH-1644:
----------------------------------------
Fix Version/s: (was: 2.3)
2.4
> Should have a parser that uses xpath
> ------------------------------------
>
> Key: NUTCH-1644
> URL: https://issues.apache.org/jira/browse/NUTCH-1644
> Project: Nutch
> Issue Type: New Feature
> Components: parser
> Affects Versions: 2.2.1
> Reporter: cihad güzel
> Assignee: Lewis John McGibbney
> Labels: parser, xpath
> Fix For: 2.4
>
> Attachments: NUTCH-1644.patch
>
>
> Users may want to parse some URLs via XPath, for example blog or news web sites. There should be a plugin that parses using XPath.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
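For readers unfamiliar with the request, here is a minimal sketch of what XPath-based field extraction looks like with the JDK's built-in javax.xml.xpath API. It is not taken from NUTCH-1644.patch and does not use any Nutch classes; the class name XPathFieldSketch, the extract helper, and the sample markup are all hypothetical. It also assumes well-formed XHTML input, whereas real pages would first need an HTML-tolerant parser.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical illustration only: not part of Nutch or of the attached patch.
public class XPathFieldSketch {

    // Evaluates an XPath expression against well-formed XHTML and returns the
    // trimmed string value of the first matching node ("" if nothing matches).
    static String extract(String xhtml, String expression) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xhtml)));
        XPath xpath = XPathFactory.newInstance().newXPath();
        return xpath.evaluate(expression, doc).trim();
    }

    public static void main(String[] args) throws Exception {
        // Assumed sample page, standing in for a blog or news article.
        String page = "<html><body>"
                + "<h1 class=\"title\">Hello Nutch</h1>"
                + "<span class=\"author\">Jane</span>"
                + "</body></html>";
        System.out.println("title  = " + extract(page, "//h1[@class='title']"));
        System.out.println("author = " + extract(page, "//span[@class='author']"));
    }
}
```

The point of a plugin along these lines would be that the expressions live in configuration, so a new site only needs new expressions rather than new code.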
Re: [jira] [Updated] (NUTCH-1644) Should have a parser that uses xpath
Posted by Albin Vigier <al...@gmail.com>.
Hello Sebastian,
I'll look at the xjb failure. I'm so glad to see that it will be integrated
into ivy!
As for the examples, I should already have added some commented tests in the
tests folder. I'll also look into providing a conf if one does not already
exist. I'll keep you posted.
Thanks,
Albin
On Mon, Nov 3, 2014 at 11:50 PM, Sebastian Nagel <wa...@googlemail.com> wrote:
> Hi Albin,
>
> you mean NUTCH-1870, right?
> I'm in the process of reviewing your patch.
> I'm just stuck preparing the boilerplate required
> to integrate parse-xsl into build, tests, javadoc.
> I've added the jaxb dependencies to ivy,
> but the xjb task fails, presumably because
> there is a version mismatch.
> See attached patch. If you can resolve this problem,
> that would be great!
>
> Also we need a configuration template in conf/.
> Just one rules and one transformer file,
> ideally with some examples (commented out)
> so that people have something to start with, and do not need
> to read external stuff. Your blog [1] is great,
> but it's better to have it at hand. Also, conf/
> is the first place to look.
>
> Thanks,
> Sebastian
>
> [1] http://albinscoding.wordpress.com/2014/09/25/xsl-parser-for-apache-nutch/
>
>
> On 11/01/2014 09:48 PM, Albinscode wrote:
> > Hello everybody,
> >
> > If more work needs to be done on NUTCH-1740, I'll be glad to
> > help. I developed this plugin because I was among the people who didn't
> > want to create new plugins just for a few metadata extraction matters ;)
> >
> > 2014-11-01 19:47 GMT+01:00 Lewis John McGibbney (JIRA) <ji...@apache.org>:
> >>
> >> [ https://issues.apache.org/jira/browse/NUTCH-1644?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
> >>
> >> Lewis John McGibbney updated NUTCH-1644:
> >> ----------------------------------------
> >> Fix Version/s: (was: 2.3)
> >> 2.4
> >>
> >>> Should have a parser that uses xpath
> >>> ------------------------------------
> >>>
> >>> Key: NUTCH-1644
> >>> URL: https://issues.apache.org/jira/browse/NUTCH-1644
> >>> Project: Nutch
> >>> Issue Type: New Feature
> >>> Components: parser
> >>> Affects Versions: 2.2.1
> >>> Reporter: cihad güzel
> >>> Assignee: Lewis John McGibbney
> >>> Labels: parser, xpath
> >>> Fix For: 2.4
> >>>
> >>> Attachments: NUTCH-1644.patch
> >>>
> >>>
> >>> Users may want to parse some URLs via XPath, for example blog or news web sites. There should be a plugin that parses using XPath.
Re: [jira] [Updated] (NUTCH-1644) Should have a parser that uses xpath
Posted by Sebastian Nagel <wa...@googlemail.com>.
Hi Albin,
you mean NUTCH-1870, right?
I'm in the process of reviewing your patch.
I'm just stuck preparing the boilerplate required
to integrate parse-xsl into build, tests, javadoc.
I've added the jaxb dependencies to ivy,
but the xjb task fails, presumably because
there is a version mismatch.
See attached patch. If you can resolve this problem,
that would be great!
Also we need a configuration template in conf/.
Just one rules and one transformer file,
ideally with some examples (commented out)
so that people have something to start with, and do not need
to read external stuff. Your blog [1] is great,
but it's better to have it at hand. Also, conf/
is the first place to look.
Thanks,
Sebastian
[1] http://albinscoding.wordpress.com/2014/09/25/xsl-parser-for-apache-nutch/
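To make the idea of a "transformer file" concrete, the following is a rough sketch of applying a small XSLT stylesheet to a page with the JDK's javax.xml.transform API. It does not reproduce the actual parse-xsl rules or transformer file format, which is not described in this thread; the class name XslTransformSketch, the stylesheet, and the field names are assumptions made purely for illustration. The stylesheet carries one commented-out rule, in the spirit of the commented examples asked for above.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical illustration only: not the parse-xsl plugin's code or file format.
public class XslTransformSketch {

    // Assumed stylesheet: emits one "name=value" line per extracted field.
    static final String STYLESHEET =
        "<xsl:stylesheet version=\"1.0\""
        + " xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
        + "<xsl:output method=\"text\"/>"
        + "<xsl:template match=\"/\">"
        + "<xsl:text>title=</xsl:text><xsl:value-of select=\"//h1\"/>"
        + "<xsl:text>&#10;</xsl:text>"
        // Commented-out example rule that a user could enable:
        + "<!-- <xsl:text>author=</xsl:text>"
        + "<xsl:value-of select=\"//span[@class='author']\"/> -->"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    // Applies the given stylesheet to well-formed XML and returns the output text.
    static String transform(String xml, String xsl) throws Exception {
        Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(xsl)));
        StringWriter out = new StringWriter();
        transformer.transform(new StreamSource(new StringReader(xml)),
                new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        String page = "<html><body><h1>Hello Nutch</h1>"
                + "<span class=\"author\">Jane</span></body></html>";
        System.out.print(transform(page, STYLESHEET));
    }
}
```

Shipping a file of this kind in conf/ would let users uncomment and adapt rules without consulting external documentation.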
On 11/01/2014 09:48 PM, Albinscode wrote:
> Hello everybody,
>
> If more work needs to be done on NUTCH-1740, I'll be glad to
> help. I developed this plugin because I was among the people who didn't
> want to create new plugins just for a few metadata extraction matters ;)
>
> 2014-11-01 19:47 GMT+01:00 Lewis John McGibbney (JIRA) <ji...@apache.org>:
>>
>> [ https://issues.apache.org/jira/browse/NUTCH-1644?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
>>
>> Lewis John McGibbney updated NUTCH-1644:
>> ----------------------------------------
>> Fix Version/s: (was: 2.3)
>> 2.4
>>
>>> Should have a parser that uses xpath
>>> ------------------------------------
>>>
>>> Key: NUTCH-1644
>>> URL: https://issues.apache.org/jira/browse/NUTCH-1644
>>> Project: Nutch
>>> Issue Type: New Feature
>>> Components: parser
>>> Affects Versions: 2.2.1
>>> Reporter: cihad güzel
>>> Assignee: Lewis John McGibbney
>>> Labels: parser, xpath
>>> Fix For: 2.4
>>>
>>> Attachments: NUTCH-1644.patch
>>>
>>>
>>> Users may want to parse some URLs via XPath, for example blog or news web sites. There should be a plugin that parses using XPath.
Re: [jira] [Updated] (NUTCH-1644) Should have a parser that uses xpath
Posted by Albinscode <al...@gmail.com>.
Hello everybody,
If more work needs to be done on NUTCH-1740, I'll be glad to
help. I developed this plugin because I was among the people who didn't
want to create new plugins just for a few metadata extraction matters ;)
2014-11-01 19:47 GMT+01:00 Lewis John McGibbney (JIRA) <ji...@apache.org>:
>
> [ https://issues.apache.org/jira/browse/NUTCH-1644?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
>
> Lewis John McGibbney updated NUTCH-1644:
> ----------------------------------------
> Fix Version/s: (was: 2.3)
> 2.4
>
>> Should have a parser that uses xpath
>> ------------------------------------
>>
>> Key: NUTCH-1644
>> URL: https://issues.apache.org/jira/browse/NUTCH-1644
>> Project: Nutch
>> Issue Type: New Feature
>> Components: parser
>> Affects Versions: 2.2.1
>> Reporter: cihad güzel
>> Assignee: Lewis John McGibbney
>> Labels: parser, xpath
>> Fix For: 2.4
>>
>> Attachments: NUTCH-1644.patch
>>
>>
>> Users may want to parse some URLs via XPath, for example blog or news web sites. There should be a plugin that parses using XPath.