Posted to user@gora.apache.org by Lewis John Mcgibbney <le...@gmail.com> on 2013/01/18 22:13:07 UTC

Re: Nutch 2.x : readdb command dump

Hi Kiran & All,

We are currently discussing a modification of the WebTableReader tool [0]
over in Nutch 2.x. For reference the full mail thread can be seen here [1].
So currently Kiran's question is as below.
This list is much more appropriate to discuss the Gora Query API than
user@nutch. We may even be able to improve the Query API ;)
Any thoughts?

Best
Lewis

On Wed, Jan 16, 2013 at 10:55 PM, kiran chitturi
<ch...@gmail.com> wrote:

>
>
>
> So, my question is whether we can set a single field or multiple fields in
> the query, rather than all the fields as on line 319 in [0].
>
> [0] http://svn.apache.org/viewvc/nutch/branches/2.x/src/java/org/apache/nutch/crawl/WebTableReader.java?view=markup
>
> [1] http://www.mail-archive.com/user%40nutch.apache.org/msg08566.html

Re: Nutch 2.x : readdb command dump

Posted by kiran chitturi <ch...@gmail.com>.
Hi Lewis,

Thanks for posting my question here. I was working on something and wanted to
reply once I had it working.

Right now, I am able to dump only specific fields from Nutch using the command
(./bin/nutch readdb -dump dumpFields -field status).

This command dumps the fields 'baseURL' and 'status' to the folder
'dumpFields'. Since I have only 16k records, I imported this file into
Excel and sorted the URLs according to the parse status.

So, what I am looking to develop is a way to send a (key, value) pair to Gora
and have it return only the records that have that value.

That command could look like (./bin/nutch readdb -dump dumpFields
-fieldValues '{"status":2}'). Alternatively, I can do this filtering within
Nutch after Gora returns all the records. I thought giving the keys and values
as JSON fits into the scheme. Once I have 600,000 docs or more, this could be
really useful.
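
A minimal sketch of the second option mentioned above (filtering inside Nutch after all records have been returned), assuming each record is available as a plain map of field name to value. The class FieldValueFilter and the helpers filter and row are hypothetical names for illustration only; they are not part of Nutch or Gora, and the JSON parsing of the -fieldValues argument is left out.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of Nutch or Gora.
class FieldValueFilter {

    // Keeps only the records whose fields equal every (name, value) pair in 'wanted'.
    static List<Map<String, Object>> filter(List<Map<String, Object>> records,
                                            Map<String, ?> wanted) {
        List<Map<String, Object>> matches = new ArrayList<>();
        for (Map<String, Object> record : records) {
            boolean allMatch = true;
            for (Map.Entry<String, ?> e : wanted.entrySet()) {
                if (!e.getValue().equals(record.get(e.getKey()))) {
                    allMatch = false;
                    break;
                }
            }
            if (allMatch) {
                matches.add(record);
            }
        }
        return matches;
    }

    // Builds a stand-in record with the two fields the dump above contains.
    static Map<String, Object> row(String baseUrl, int status) {
        Map<String, Object> r = new LinkedHashMap<>();
        r.put("baseURL", baseUrl);
        r.put("status", status);
        return r;
    }

    public static void main(String[] args) {
        List<Map<String, Object>> records = new ArrayList<>();
        records.add(row("http://a.example/", 2));
        records.add(row("http://b.example/", 3));
        records.add(row("http://c.example/", 2));

        // Equivalent of the proposed -fieldValues '{"status":2}' argument.
        Map<String, Object> wanted = new LinkedHashMap<>();
        wanted.put("status", 2);

        for (Map<String, Object> r : filter(records, wanted)) {
            System.out.println(r.get("baseURL") + "\t" + r.get("status"));
        }
    }
}
```

This approach still reads every record from the store, so it only saves work on the output side; pushing the condition down into the store's query would be needed to avoid the full scan.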

So, what do you guys think? I am not very familiar with Gora, but would it be
better to implement this within Nutch itself?

Thanks,
Kiran.




-- 
Kiran Chitturi