Posted to user@nutch.apache.org by Shane Wood <sh...@cbm8bit.com> on 2014/03/27 04:15:08 UTC

MYSQL field meanings

Could someone explain what these fields mean when using Nutch with MySQL?
Or is there a web page where this information is already available?
Thanks



id
headers
text
status
markers
parseStatus
modifiedTime <---- this is always NULL. Any idea why?
prevModifiedTime <---- this is always NULL. Any idea why?
score
typ
batchId
baseUrl
content
title
reprUrl
fetchInterval
prevFetchTime
inlinks
prevSignature
outlinks
fetchTime
retriesSinceFetch
Ascending
protocolStatus
signature
metadata



Re: MYSQL field meanings

Posted by Shane Wood <sh...@cbm8bit.com>.
Can I tell generate to build a fetch list based on the status field in
MySQL? I wish to index only rows with status 1 (meaning not yet fetched) and
parse only those until they are all done. This would be a great help.

Cheers
Shane.




Re: MYSQL field meanings

Posted by Shane Wood <sh...@cbm8bit.com>.
Thanks, will read up on that...

Cheers. :)



On 27/03/14 18:48, Vangelis karv wrote:
> http://nlp.solutions.asia/?p=232
> Hope this helps :)


RE: MYSQL field meanings

Posted by Vangelis karv <ka...@hotmail.com>.
http://nlp.solutions.asia/?p=232
Hope this helps :)


Re: MYSQL field meanings

Posted by Talat Uyarer <ta...@uyarer.com>.
Yes,

The patch has been committed on the 2.x branch.
On 27 Mar 2014 12:31, "Shane Wood" <sh...@cbm8bit.com> wrote:

> I'm using Nutch 2.2, installed as per the tutorial below. Would this patch
> already have been added to the newer version?
> http://nlp.solutions.asia/?p=362
>
> Enjoy
> Shane.

Re: MYSQL field meanings

Posted by Shane Wood <sh...@cbm8bit.com>.
I'm using Nutch 2.2, installed as per the tutorial below. Would this patch
already have been added to the newer version?
http://nlp.solutions.asia/?p=362

Enjoy
Shane.

On 27/03/14 18:54, Talat Uyarer wrote:
> Hi Shane,
>
> Which version of Nutch do you use? If you use Nutch 2.2.1, this is a bug.
> You should take a look at https://issues.apache.org/jira/browse/NUTCH-1651
>
> Talat


Re: MYSQL field meanings

Posted by Talat Uyarer <ta...@uyarer.com>.
Hi Shane,

Which version of Nutch do you use? If you use Nutch 2.2.1, this is a bug.
You should take a look at https://issues.apache.org/jira/browse/NUTCH-1651

Talat




-- 
Talat UYARER
Website: http://talat.uyarer.com
Twitter: http://twitter.com/talatuyarer
Linkedin: http://tr.linkedin.com/pub/talat-uyarer/10/142/304