Posted to user@nutch.apache.org by feng lu <am...@gmail.com> on 2013/02/01 08:31:53 UTC

Re: Mysql don't save Markers properly

Hi vetus.

I found the same problem when I ran the crawl process
inject->generate->parse->updatedb. The MySQL output is:

mysql> SELECT convert(markers using utf8),baseUrl FROM `webpage` WHERE 1;

dist0_injmrk_y_updmrk_*1359699678-1110220041__prsmrk__*1359699678-1110220041_gnmrk_*1359699678-1110220041
_ftcmrk_*1359699678-1110220041  | http://www.apache.org/ |

The generate and fetch marks are still in the DB.
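
For anyone who wants to check their own rows, here is a rough Python sketch
that looks for those two leftover marks in the decoded markers text. The
marker names (_gnmrk_, _ftcmrk_) are copied from the output above; the
function name and the plain substring matching are my own assumptions, since
the decoded blob is not a clean text format.

```python
# Marker names as they appear in the decoded `markers` column above.
STALE_AFTER_UPDATEDB = ("_gnmrk_", "_ftcmrk_")


def leftover_markers(markers_text):
    """Return the generate/fetch marker names still present in the text.

    Hypothetical helper: it only does substring matching on the decoded
    column and does not parse the underlying serialized map.
    """
    return [name for name in STALE_AFTER_UPDATEDB if name in markers_text]


if __name__ == "__main__":
    # The row from the MySQL output above, joined back onto one line.
    row = (
        "dist0_injmrk_y_updmrk_*1359699678-1110220041"
        "__prsmrk__*1359699678-1110220041"
        "_gnmrk_*1359699678-1110220041"
        "_ftcmrk_*1359699678-1110220041"
    )
    print(leftover_markers(row))
```

An empty list for a row after updatedb would mean both marks were cleared.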

But when I use HBase as the back-end DB, with the same crawled URL and the
same crawl process, the Generate and Fetch marks are all removed after
running the updatedb command.

So maybe it's a bug in gora-sql.


On Thu, Jan 31, 2013 at 5:42 PM, amuseme <am...@gmail.com> wrote:

> Hi vetus.
>
> Why doesn't the updater delete the values from the database? I see that in
> the DbUpdateReducer class the Generate and Fetcher markers have already
> been removed from the WebPage, if they exist.



-- 
Don't Grow Old, Grow Up... :-)

Re: Mysql don't save Markers properly

Posted by kiran chitturi <ch...@gmail.com>.
gora-sql has a few bugs. It's recommended to use HBase with Nutch. I had a
problem fetching and parsing data.


-- 
Kiran Chitturi