Posted to dev@nutch.apache.org by Cihad Guzel <cg...@gmail.com> on 2015/05/18 22:26:20 UTC

Nutch-1741 in GSOC 2015

Hi all,

I applied to GSoC 2015 for the Apache Nutch project and my application has
been accepted. The main reason I chose the Nutch project for GSoC is that I
already know Nutch closely. My subject is "Nutch-1741 - Support of Sitemaps
in Nutch 2.x" [1]. Thanks to Lewis John McGibbney and Talat Uyarer for being
my mentors in this process. I hope I can contribute to this project.

[1] https://issues.apache.org/jira/browse/NUTCH-1741

Kind Regards

Re: Nutch-1741 in GSOC 2015

Posted by Cihad Guzel <cg...@gmail.com>.
OK Lewis,
I have signed up for the wiki. My wiki username is: cihadguzel

2015-05-18 23:44 GMT+03:00 Lewis John Mcgibbney <le...@gmail.com>:

> Fantastic Cihad,
> Thank you for introducing yourself.
> As you are in the community bonding period right now, please feel free to
> provide your wiki username to me and I will grant you access to the wiki.
> Please also feel free to pick up some lingering issues for Nutch 2.3.1
>
> https://issues.apache.org/jira/browse/NUTCH-1945?jql=project%20%3D%20NUTCH%20AND%20resolution%20%3D%20Unresolved%20AND%20fixVersion%20%3D%202.3.1%20ORDER%20BY%20priority%20DESC
> Thanks
> Lewis
>
>
> On Mon, May 18, 2015 at 1:26 PM, Cihad Guzel <cg...@gmail.com> wrote:
>
>> Hi all,
>>
>> I had applied the GSoC 2015 for Apache Nutch Project and my application
>> is accepted. The main reason why I have choosen the Nutch Project for GSOC
>> is knowing the Nutch closely. My subject is "Nutch-1741 - Support of
>> Sitemaps in Nutch 2.x"[1] . Thanks Lewis John McGibbney and Talat Uyarer
>> for being my mentors on this process. I hope I can contribute to this
>> project.
>>
>> [1] https://issues.apache.org/jira/browse/NUTCH-1741
>>
>> Kind Regards
>>
>
>
>
> --
> *Lewis*
>

Re: Nutch-1741 in GSOC 2015

Posted by Owen Lin <oa...@nyu.edu>.
Can I unsubscribe?

Thanks!

On Monday, May 18, 2015, Lewis John Mcgibbney <le...@gmail.com>
wrote:

> Cihad,
> Done. You can now edit the wiki.
> Thanks
>
> On Monday, May 18, 2015, Cihad Guzel <cguzelg@gmail.com> wrote:
>
>> ok. I signed up again. my username : CihadGuzel
>>
>> 2015-05-19 2:32 GMT+03:00 Lewis John Mcgibbney <lewis.mcgibbney@gmail.com
>> >:
>>
>>> Hi Cihad,
>>> You need to sign up for a username on the Nutch wiki [0].
>>> Once you've sent your username here, I will add you to the contributors
>>> group and you can edit pages and provide content.
>>> Thank you
>>> Lewis
>>>
>>>
>>> [0] http://wiki.apache.org/nutch/
>>>
>>> On Mon, May 18, 2015 at 4:14 PM, Cihad Guzel <cg...@gmail.com> wrote:
>>>
>>>> Hi Lewis.
>>>> I don't edit to wiki for my proposal. Could you provide permit for
>>>> editing?
>>>>
>>>> 2015-05-19 1:22 GMT+03:00 Lewis John Mcgibbney <
>>>> lewis.mcgibbney@gmail.com>:
>>>>
>>>>> Hi Cihad,
>>>>> Thank you for introducing yourself.
>>>>> You now have write access to the Nutch wiki so you can augment the
>>>>> wiki page and begin working on some documentation and issues from within
>>>>> Jira.
>>>>> Really looking forward to working alongside all you guys on your
>>>>> projects.
>>>>> Best
>>>>> Lewis
>>>>>
>>>>> On Mon, May 18, 2015 at 3:19 PM, Cihad Guzel <cg...@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> Hi all,
>>>>>>
>>>>>> I want to introduce myself.
>>>>>>
>>>>>> I am a Computer Engineer and I am doing master now. I like coding.I
>>>>>> have been following some open source project for about 3 years. I am
>>>>>> goaling to make some contribution with GSOC in opensource community.
>>>>>>
>>>>>> I also worked about frontend, middleware, backed development via
>>>>>> enterprise java technologies. Furthermore, experienced “Web Technologies”,
>>>>>> "Search Technologies", "Cloud Computing", "Distributed Systems" and "Big
>>>>>> Data". I took place in search engine project that Apache technologies were
>>>>>> used such as  Solr, HBase, Hadoop, Nutch, Gora and I used Nutch project
>>>>>> actively in this project. You can see more information on my
>>>>>> linkedin profile[1] about me.
>>>>>>
>>>>>> I mention some information for my process. My subject is "Nutch-1741
>>>>>> - Support of Sitemaps in Nutch 2.x" [2] .You know that the url’s can
>>>>>> be got from only pages that were scanned before in nutch crawler system.
>>>>>> Also, the degrees of importance and “change frequence” of these urls are
>>>>>> not known only guessed. But, it is possible to find the whole of urls in a
>>>>>> up-to-date sitemap file. For this reason, sitemap files in website should
>>>>>> be crawled.
>>>>>>
>>>>>> I have explained the features for this project on my proposal. I’ll
>>>>>> add it to wiki and you can see details of it on wiki at when I share . You
>>>>>> can see nutch sitemap lifecycle the drawing [3].
>>>>>>
>>>>>> [1] https://tr.linkedin.com/in/cihadguzel
>>>>>>
>>>>>> [2] https://issues.apache.org/jira/browse/NUTCH-1741
>>>>>>
>>>>>> [3]
>>>>>> https://issues.apache.org/jira/secure/attachment/12707721/SitemapCrawlerLifeCycle.pdf
>>>>>>
>>>>>> Kind Regards
>>>>>>
>>>>>>
>>>>>> 2015-05-19 1:16 GMT+03:00 Cihad Guzel <cg...@gmail.com>:
>>>>>>
>>>>>>> Ok Lewis,
>>>>>>> I signed up to wiki, my wiki username: cihadguzel
>>>>>>>
>>>>>>> Thanks
>>>>>>>
>>>>>>> 2015-05-18 23:44 GMT+03:00 Lewis John Mcgibbney <
>>>>>>> lewis.mcgibbney@gmail.com>:
>>>>>>>
>>>>>>>> Fantastic Cihad,
>>>>>>>> Thank you for introducing yourself.
>>>>>>>> As you are in the community bonding period right now, please feel
>>>>>>>> free to provide your wiki username to me and I will grant you access to the
>>>>>>>> wiki.
>>>>>>>> Please also feel free to pick up some lingering issues for Nutch
>>>>>>>> 2.3.1
>>>>>>>>
>>>>>>>> https://issues.apache.org/jira/browse/NUTCH-1945?jql=project%20%3D%20NUTCH%20AND%20resolution%20%3D%20Unresolved%20AND%20fixVersion%20%3D%202.3.1%20ORDER%20BY%20priority%20DESC
>>>>>>>> Thanks
>>>>>>>> Lewis
>>>>>>>>
>>>>>>>>
>>>>>>>> On Mon, May 18, 2015 at 1:26 PM, Cihad Guzel <cg...@gmail.com>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> Hi all,
>>>>>>>>>
>>>>>>>>> I had applied the GSoC 2015 for Apache Nutch Project and my
>>>>>>>>> application is accepted. The main reason why I have choosen the Nutch
>>>>>>>>> Project for GSOC is knowing the Nutch closely. My subject is "Nutch-1741 -
>>>>>>>>> Support of Sitemaps in Nutch 2.x"[1] . Thanks Lewis John McGibbney and
>>>>>>>>> Talat Uyarer for being my mentors on this process. I hope I can contribute
>>>>>>>>> to this project.
>>>>>>>>>
>>>>>>>>> [1] https://issues.apache.org/jira/browse/NUTCH-1741
>>>>>>>>>
>>>>>>>>> Kind Regards
>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> --
>>>>>>>> *Lewis*
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> *Lewis*
>>>>>
>>>>
>>>>
>>>
>>>
>>> --
>>> *Lewis*
>>>
>>
>>
>
> --
> *Lewis*
>
>

Re: Nutch-1741 in GSOC 2015

Posted by Lewis John Mcgibbney <le...@gmail.com>.
Cihad,
Done. You can now edit the wiki.
Thanks

On Monday, May 18, 2015, Cihad Guzel <cg...@gmail.com> wrote:

> ok. I signed up again. my username : CihadGuzel
>
> 2015-05-19 2:32 GMT+03:00 Lewis John Mcgibbney <lewis.mcgibbney@gmail.com>:
>
>> Hi Cihad,
>> You need to sign up for a username on the Nutch wiki [0].
>> Once you've sent your username here, I will add you to the contributors
>> group and you can edit pages and provide content.
>> Thank you
>> Lewis
>>
>>
>> [0] http://wiki.apache.org/nutch/
>>
>> On Mon, May 18, 2015 at 4:14 PM, Cihad Guzel <cguzelg@gmail.com> wrote:
>>
>>> Hi Lewis.
>>> I don't edit to wiki for my proposal. Could you provide permit for
>>> editing?
>>>
>>> 2015-05-19 1:22 GMT+03:00 Lewis John Mcgibbney <lewis.mcgibbney@gmail.com>:
>>>
>>>> Hi Cihad,
>>>> Thank you for introducing yourself.
>>>> You now have write access to the Nutch wiki so you can augment the wiki
>>>> page and begin working on some documentation and issues from within Jira.
>>>> Really looking forward to working alongside all you guys on your
>>>> projects.
>>>> Best
>>>> Lewis
>>>>
>>>> On Mon, May 18, 2015 at 3:19 PM, Cihad Guzel <cguzelg@gmail.com> wrote:
>>>>
>>>>> Hi all,
>>>>>
>>>>> I want to introduce myself.
>>>>>
>>>>> I am a Computer Engineer and I am doing master now. I like coding.I
>>>>> have been following some open source project for about 3 years. I am
>>>>> goaling to make some contribution with GSOC in opensource community.
>>>>>
>>>>> I also worked about frontend, middleware, backed development via
>>>>> enterprise java technologies. Furthermore, experienced “Web Technologies”,
>>>>> "Search Technologies", "Cloud Computing", "Distributed Systems" and "Big
>>>>> Data". I took place in search engine project that Apache technologies were
>>>>> used such as  Solr, HBase, Hadoop, Nutch, Gora and I used Nutch project
>>>>> actively in this project. You can see more information on my linkedin
>>>>> profile[1] about me.
>>>>>
>>>>> I mention some information for my process. My subject is "Nutch-1741 -
>>>>> Support of Sitemaps in Nutch 2.x" [2] .You know that the url’s can be
>>>>> got from only pages that were scanned before in nutch crawler system. Also,
>>>>> the degrees of importance and “change frequence” of these urls are not
>>>>> known only guessed. But, it is possible to find the whole of urls in a
>>>>> up-to-date sitemap file. For this reason, sitemap files in website should
>>>>> be crawled.
>>>>>
>>>>> I have explained the features for this project on my proposal. I’ll
>>>>> add it to wiki and you can see details of it on wiki at when I share . You
>>>>> can see nutch sitemap lifecycle the drawing [3].
>>>>>
>>>>> [1] https://tr.linkedin.com/in/cihadguzel
>>>>>
>>>>> [2] https://issues.apache.org/jira/browse/NUTCH-1741
>>>>>
>>>>> [3]
>>>>> https://issues.apache.org/jira/secure/attachment/12707721/SitemapCrawlerLifeCycle.pdf
>>>>>
>>>>> Kind Regards
>>>>>
>>>>>
>>>>> 2015-05-19 1:16 GMT+03:00 Cihad Guzel <cguzelg@gmail.com>:
>>>>>
>>>>>> Ok Lewis,
>>>>>> I signed up to wiki, my wiki username: cihadguzel
>>>>>>
>>>>>> Thanks
>>>>>>
>>>>>> 2015-05-18 23:44 GMT+03:00 Lewis John Mcgibbney <lewis.mcgibbney@gmail.com>:
>>>>>>
>>>>>>> Fantastic Cihad,
>>>>>>> Thank you for introducing yourself.
>>>>>>> As you are in the community bonding period right now, please feel
>>>>>>> free to provide your wiki username to me and I will grant you access to the
>>>>>>> wiki.
>>>>>>> Please also feel free to pick up some lingering issues for Nutch
>>>>>>> 2.3.1
>>>>>>>
>>>>>>> https://issues.apache.org/jira/browse/NUTCH-1945?jql=project%20%3D%20NUTCH%20AND%20resolution%20%3D%20Unresolved%20AND%20fixVersion%20%3D%202.3.1%20ORDER%20BY%20priority%20DESC
>>>>>>> Thanks
>>>>>>> Lewis
>>>>>>>
>>>>>>>
>>>>>>> On Mon, May 18, 2015 at 1:26 PM, Cihad Guzel <cguzelg@gmail.com> wrote:
>>>>>>>
>>>>>>>> Hi all,
>>>>>>>>
>>>>>>>> I had applied the GSoC 2015 for Apache Nutch Project and my
>>>>>>>> application is accepted. The main reason why I have choosen the Nutch
>>>>>>>> Project for GSOC is knowing the Nutch closely. My subject is "Nutch-1741 -
>>>>>>>> Support of Sitemaps in Nutch 2.x"[1] . Thanks Lewis John McGibbney and
>>>>>>>> Talat Uyarer for being my mentors on this process. I hope I can contribute
>>>>>>>> to this project.
>>>>>>>>
>>>>>>>> [1] https://issues.apache.org/jira/browse/NUTCH-1741
>>>>>>>>
>>>>>>>> Kind Regards
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>> *Lewis*
>>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>
>>>>
>>>> --
>>>> *Lewis*
>>>>
>>>
>>>
>>
>>
>> --
>> *Lewis*
>>
>
>

-- 
*Lewis*

Re: Nutch-1741 in GSOC 2015

Posted by Cihad Guzel <cg...@gmail.com>.
OK, I signed up again. My username: CihadGuzel

2015-05-19 2:32 GMT+03:00 Lewis John Mcgibbney <le...@gmail.com>:

> Hi Cihad,
> You need to sign up for a username on the Nutch wiki [0].
> Once you've sent your username here, I will add you to the contributors
> group and you can edit pages and provide content.
> Thank you
> Lewis
>
>
> [0] http://wiki.apache.org/nutch/
>
> On Mon, May 18, 2015 at 4:14 PM, Cihad Guzel <cg...@gmail.com> wrote:
>
>> Hi Lewis.
>> I don't edit to wiki for my proposal. Could you provide permit for
>> editing?
>>
>> 2015-05-19 1:22 GMT+03:00 Lewis John Mcgibbney <lewis.mcgibbney@gmail.com
>> >:
>>
>>> Hi Cihad,
>>> Thank you for introducing yourself.
>>> You now have write access to the Nutch wiki so you can augment the wiki
>>> page and begin working on some documentation and issues from within Jira.
>>> Really looking forward to working alongside all you guys on your
>>> projects.
>>> Best
>>> Lewis
>>>
>>> On Mon, May 18, 2015 at 3:19 PM, Cihad Guzel <cg...@gmail.com> wrote:
>>>
>>>> Hi all,
>>>>
>>>> I want to introduce myself.
>>>>
>>>> I am a Computer Engineer and I am doing master now. I like coding.I
>>>> have been following some open source project for about 3 years. I am
>>>> goaling to make some contribution with GSOC in opensource community.
>>>>
>>>> I also worked about frontend, middleware, backed development via
>>>> enterprise java technologies. Furthermore, experienced “Web Technologies”,
>>>> "Search Technologies", "Cloud Computing", "Distributed Systems" and "Big
>>>> Data". I took place in search engine project that Apache technologies were
>>>> used such as  Solr, HBase, Hadoop, Nutch, Gora and I used Nutch project
>>>> actively in this project. You can see more information on my linkedin
>>>> profile[1] about me.
>>>>
>>>> I mention some information for my process. My subject is "Nutch-1741 -
>>>> Support of Sitemaps in Nutch 2.x" [2] .You know that the url’s can be
>>>> got from only pages that were scanned before in nutch crawler system. Also,
>>>> the degrees of importance and “change frequence” of these urls are not
>>>> known only guessed. But, it is possible to find the whole of urls in a
>>>> up-to-date sitemap file. For this reason, sitemap files in website should
>>>> be crawled.
>>>>
>>>> I have explained the features for this project on my proposal. I’ll add
>>>> it to wiki and you can see details of it on wiki at when I share . You can
>>>> see nutch sitemap lifecycle the drawing [3].
>>>>
>>>> [1] https://tr.linkedin.com/in/cihadguzel
>>>>
>>>> [2] https://issues.apache.org/jira/browse/NUTCH-1741
>>>>
>>>> [3]
>>>> https://issues.apache.org/jira/secure/attachment/12707721/SitemapCrawlerLifeCycle.pdf
>>>>
>>>> Kind Regards
>>>>
>>>>
>>>> 2015-05-19 1:16 GMT+03:00 Cihad Guzel <cg...@gmail.com>:
>>>>
>>>>> Ok Lewis,
>>>>> I signed up to wiki, my wiki username: cihadguzel
>>>>>
>>>>> Thanks
>>>>>
>>>>> 2015-05-18 23:44 GMT+03:00 Lewis John Mcgibbney <
>>>>> lewis.mcgibbney@gmail.com>:
>>>>>
>>>>>> Fantastic Cihad,
>>>>>> Thank you for introducing yourself.
>>>>>> As you are in the community bonding period right now, please feel
>>>>>> free to provide your wiki username to me and I will grant you access to the
>>>>>> wiki.
>>>>>> Please also feel free to pick up some lingering issues for Nutch 2.3.1
>>>>>>
>>>>>> https://issues.apache.org/jira/browse/NUTCH-1945?jql=project%20%3D%20NUTCH%20AND%20resolution%20%3D%20Unresolved%20AND%20fixVersion%20%3D%202.3.1%20ORDER%20BY%20priority%20DESC
>>>>>> Thanks
>>>>>> Lewis
>>>>>>
>>>>>>
>>>>>> On Mon, May 18, 2015 at 1:26 PM, Cihad Guzel <cg...@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> Hi all,
>>>>>>>
>>>>>>> I had applied the GSoC 2015 for Apache Nutch Project and my
>>>>>>> application is accepted. The main reason why I have choosen the Nutch
>>>>>>> Project for GSOC is knowing the Nutch closely. My subject is "Nutch-1741 -
>>>>>>> Support of Sitemaps in Nutch 2.x"[1] . Thanks Lewis John McGibbney and
>>>>>>> Talat Uyarer for being my mentors on this process. I hope I can contribute
>>>>>>> to this project.
>>>>>>>
>>>>>>> [1] https://issues.apache.org/jira/browse/NUTCH-1741
>>>>>>>
>>>>>>> Kind Regards
>>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>> *Lewis*
>>>>>>
>>>>>
>>>>>
>>>>
>>>
>>>
>>> --
>>> *Lewis*
>>>
>>
>>
>
>
> --
> *Lewis*
>

Re: Nutch-1741 in GSOC 2015

Posted by Lewis John Mcgibbney <le...@gmail.com>.
Hi Cihad,
You need to sign up for a username on the Nutch wiki [0].
Once you've sent your username here, I will add you to the contributors
group and you can edit pages and provide content.
Thank you
Lewis


[0] http://wiki.apache.org/nutch/

On Mon, May 18, 2015 at 4:14 PM, Cihad Guzel <cg...@gmail.com> wrote:

> Hi Lewis.
> I don't edit to wiki for my proposal. Could you provide permit for editing?
>
> 2015-05-19 1:22 GMT+03:00 Lewis John Mcgibbney <le...@gmail.com>
> :
>
>> Hi Cihad,
>> Thank you for introducing yourself.
>> You now have write access to the Nutch wiki so you can augment the wiki
>> page and begin working on some documentation and issues from within Jira.
>> Really looking forward to working alongside all you guys on your projects.
>> Best
>> Lewis
>>
>> On Mon, May 18, 2015 at 3:19 PM, Cihad Guzel <cg...@gmail.com> wrote:
>>
>>> Hi all,
>>>
>>> I want to introduce myself.
>>>
>>> I am a Computer Engineer and I am doing master now. I like coding.I
>>> have been following some open source project for about 3 years. I am
>>> goaling to make some contribution with GSOC in opensource community.
>>>
>>> I also worked about frontend, middleware, backed development via
>>> enterprise java technologies. Furthermore, experienced “Web Technologies”,
>>> "Search Technologies", "Cloud Computing", "Distributed Systems" and "Big
>>> Data". I took place in search engine project that Apache technologies were
>>> used such as  Solr, HBase, Hadoop, Nutch, Gora and I used Nutch project
>>> actively in this project. You can see more information on my linkedin
>>> profile[1] about me.
>>>
>>> I mention some information for my process. My subject is "Nutch-1741 -
>>> Support of Sitemaps in Nutch 2.x" [2] .You know that the url’s can be
>>> got from only pages that were scanned before in nutch crawler system. Also,
>>> the degrees of importance and “change frequence” of these urls are not
>>> known only guessed. But, it is possible to find the whole of urls in a
>>> up-to-date sitemap file. For this reason, sitemap files in website should
>>> be crawled.
>>>
>>> I have explained the features for this project on my proposal. I’ll add
>>> it to wiki and you can see details of it on wiki at when I share . You can
>>> see nutch sitemap lifecycle the drawing [3].
>>>
>>> [1] https://tr.linkedin.com/in/cihadguzel
>>>
>>> [2] https://issues.apache.org/jira/browse/NUTCH-1741
>>>
>>> [3]
>>> https://issues.apache.org/jira/secure/attachment/12707721/SitemapCrawlerLifeCycle.pdf
>>>
>>> Kind Regards
>>>
>>>
>>> 2015-05-19 1:16 GMT+03:00 Cihad Guzel <cg...@gmail.com>:
>>>
>>>> Ok Lewis,
>>>> I signed up to wiki, my wiki username: cihadguzel
>>>>
>>>> Thanks
>>>>
>>>> 2015-05-18 23:44 GMT+03:00 Lewis John Mcgibbney <
>>>> lewis.mcgibbney@gmail.com>:
>>>>
>>>>> Fantastic Cihad,
>>>>> Thank you for introducing yourself.
>>>>> As you are in the community bonding period right now, please feel free
>>>>> to provide your wiki username to me and I will grant you access to the wiki.
>>>>> Please also feel free to pick up some lingering issues for Nutch 2.3.1
>>>>>
>>>>> https://issues.apache.org/jira/browse/NUTCH-1945?jql=project%20%3D%20NUTCH%20AND%20resolution%20%3D%20Unresolved%20AND%20fixVersion%20%3D%202.3.1%20ORDER%20BY%20priority%20DESC
>>>>> Thanks
>>>>> Lewis
>>>>>
>>>>>
>>>>> On Mon, May 18, 2015 at 1:26 PM, Cihad Guzel <cg...@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> Hi all,
>>>>>>
>>>>>> I had applied the GSoC 2015 for Apache Nutch Project and my
>>>>>> application is accepted. The main reason why I have choosen the Nutch
>>>>>> Project for GSOC is knowing the Nutch closely. My subject is "Nutch-1741 -
>>>>>> Support of Sitemaps in Nutch 2.x"[1] . Thanks Lewis John McGibbney and
>>>>>> Talat Uyarer for being my mentors on this process. I hope I can contribute
>>>>>> to this project.
>>>>>>
>>>>>> [1] https://issues.apache.org/jira/browse/NUTCH-1741
>>>>>>
>>>>>> Kind Regards
>>>>>>
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> *Lewis*
>>>>>
>>>>
>>>>
>>>
>>
>>
>> --
>> *Lewis*
>>
>
>


-- 
*Lewis*

Re: Nutch-1741 in GSOC 2015

Posted by Cihad Guzel <cg...@gmail.com>.
Hi Lewis,
I can't edit the wiki to add my proposal. Could you grant me permission to edit?

2015-05-19 1:22 GMT+03:00 Lewis John Mcgibbney <le...@gmail.com>:

> Hi Cihad,
> Thank you for introducing yourself.
> You now have write access to the Nutch wiki so you can augment the wiki
> page and begin working on some documentation and issues from within Jira.
> Really looking forward to working alongside all you guys on your projects.
> Best
> Lewis
>
> On Mon, May 18, 2015 at 3:19 PM, Cihad Guzel <cg...@gmail.com> wrote:
>
>> Hi all,
>>
>> I want to introduce myself.
>>
>> I am a Computer Engineer and I am doing master now. I like coding.I have
>> been following some open source project for about 3 years. I am goaling to
>> make some contribution with GSOC in opensource community.
>>
>> I also worked about frontend, middleware, backed development via
>> enterprise java technologies. Furthermore, experienced “Web Technologies”,
>> "Search Technologies", "Cloud Computing", "Distributed Systems" and "Big
>> Data". I took place in search engine project that Apache technologies were
>> used such as  Solr, HBase, Hadoop, Nutch, Gora and I used Nutch project
>> actively in this project. You can see more information on my linkedin
>> profile[1] about me.
>>
>> I mention some information for my process. My subject is "Nutch-1741 -
>> Support of Sitemaps in Nutch 2.x" [2] .You know that the url’s can be
>> got from only pages that were scanned before in nutch crawler system. Also,
>> the degrees of importance and “change frequence” of these urls are not
>> known only guessed. But, it is possible to find the whole of urls in a
>> up-to-date sitemap file. For this reason, sitemap files in website should
>> be crawled.
>>
>> I have explained the features for this project on my proposal. I’ll add
>> it to wiki and you can see details of it on wiki at when I share . You can
>> see nutch sitemap lifecycle the drawing [3].
>>
>> [1] https://tr.linkedin.com/in/cihadguzel
>>
>> [2] https://issues.apache.org/jira/browse/NUTCH-1741
>>
>> [3]
>> https://issues.apache.org/jira/secure/attachment/12707721/SitemapCrawlerLifeCycle.pdf
>>
>> Kind Regards
>>
>>
>> 2015-05-19 1:16 GMT+03:00 Cihad Guzel <cg...@gmail.com>:
>>
>>> Ok Lewis,
>>> I signed up to wiki, my wiki username: cihadguzel
>>>
>>> Thanks
>>>
>>> 2015-05-18 23:44 GMT+03:00 Lewis John Mcgibbney <
>>> lewis.mcgibbney@gmail.com>:
>>>
>>>> Fantastic Cihad,
>>>> Thank you for introducing yourself.
>>>> As you are in the community bonding period right now, please feel free
>>>> to provide your wiki username to me and I will grant you access to the wiki.
>>>> Please also feel free to pick up some lingering issues for Nutch 2.3.1
>>>>
>>>> https://issues.apache.org/jira/browse/NUTCH-1945?jql=project%20%3D%20NUTCH%20AND%20resolution%20%3D%20Unresolved%20AND%20fixVersion%20%3D%202.3.1%20ORDER%20BY%20priority%20DESC
>>>> Thanks
>>>> Lewis
>>>>
>>>>
>>>> On Mon, May 18, 2015 at 1:26 PM, Cihad Guzel <cg...@gmail.com> wrote:
>>>>
>>>>> Hi all,
>>>>>
>>>>> I had applied the GSoC 2015 for Apache Nutch Project and my
>>>>> application is accepted. The main reason why I have choosen the Nutch
>>>>> Project for GSOC is knowing the Nutch closely. My subject is "Nutch-1741 -
>>>>> Support of Sitemaps in Nutch 2.x"[1] . Thanks Lewis John McGibbney and
>>>>> Talat Uyarer for being my mentors on this process. I hope I can contribute
>>>>> to this project.
>>>>>
>>>>> [1] https://issues.apache.org/jira/browse/NUTCH-1741
>>>>>
>>>>> Kind Regards
>>>>>
>>>>
>>>>
>>>>
>>>> --
>>>> *Lewis*
>>>>
>>>
>>>
>>
>
>
> --
> *Lewis*
>

Re: Nutch-1741 in GSOC 2015

Posted by Lewis John Mcgibbney <le...@gmail.com>.
Hi Cihad,
Thank you for introducing yourself.
You now have write access to the Nutch wiki so you can augment the wiki
page and begin working on some documentation and issues from within Jira.
Really looking forward to working alongside all you guys on your projects.
Best
Lewis

On Mon, May 18, 2015 at 3:19 PM, Cihad Guzel <cg...@gmail.com> wrote:

> Hi all,
>
> I want to introduce myself.
>
> I am a computer engineer and am currently pursuing a master's degree. I
> like coding and have been following several open source projects for about
> 3 years. My goal is to contribute to the open source community through GSoC.
>
> I have also worked on frontend, middleware, and backend development with
> enterprise Java technologies, and I have experience with "Web Technologies",
> "Search Technologies", "Cloud Computing", "Distributed Systems" and "Big
> Data". I took part in a search engine project that used Apache technologies
> such as Solr, HBase, Hadoop, Nutch, and Gora, and I used Nutch actively in
> that project. You can find more information about me on my LinkedIn
> profile [1].
>
> Here is some information about my project. My subject is "Nutch-1741 -
> Support of Sitemaps in Nutch 2.x" [2]. As you know, the Nutch crawler can
> discover URLs only from pages that it has already crawled. Also, the
> importance and change frequency of these URLs are not known, only guessed.
> However, an up-to-date sitemap file can list all of a site's URLs. For this
> reason, the sitemap files of a website should be crawled.
>
> I have explained the features of this project in my proposal. I'll add it
> to the wiki, and you will be able to see the details there once I share it.
> You can see the Nutch sitemap lifecycle in the drawing [3].
>
> [1] https://tr.linkedin.com/in/cihadguzel
>
> [2] https://issues.apache.org/jira/browse/NUTCH-1741
>
> [3]
> https://issues.apache.org/jira/secure/attachment/12707721/SitemapCrawlerLifeCycle.pdf
>
> Kind Regards
>
>
> 2015-05-19 1:16 GMT+03:00 Cihad Guzel <cg...@gmail.com>:
>
>> Ok Lewis,
>> I signed up to wiki, my wiki username: cihadguzel
>>
>> Thanks
>>
>> 2015-05-18 23:44 GMT+03:00 Lewis John Mcgibbney <
>> lewis.mcgibbney@gmail.com>:
>>
>>> Fantastic Cihad,
>>> Thank you for introducing yourself.
>>> As you are in the community bonding period right now, please feel free
>>> to provide your wiki username to me and I will grant you access to the wiki.
>>> Please also feel free to pick up some lingering issues for Nutch 2.3.1
>>>
>>> https://issues.apache.org/jira/browse/NUTCH-1945?jql=project%20%3D%20NUTCH%20AND%20resolution%20%3D%20Unresolved%20AND%20fixVersion%20%3D%202.3.1%20ORDER%20BY%20priority%20DESC
>>> Thanks
>>> Lewis
>>>
>>>
>>> On Mon, May 18, 2015 at 1:26 PM, Cihad Guzel <cg...@gmail.com> wrote:
>>>
>>>> Hi all,
>>>>
>>>> I had applied the GSoC 2015 for Apache Nutch Project and my application
>>>> is accepted. The main reason why I have choosen the Nutch Project for GSOC
>>>> is knowing the Nutch closely. My subject is "Nutch-1741 - Support of
>>>> Sitemaps in Nutch 2.x"[1] . Thanks Lewis John McGibbney and Talat Uyarer
>>>> for being my mentors on this process. I hope I can contribute to this
>>>> project.
>>>>
>>>> [1] https://issues.apache.org/jira/browse/NUTCH-1741
>>>>
>>>> Kind Regards
>>>>
>>>
>>>
>>>
>>> --
>>> *Lewis*
>>>
>>
>>
>


-- 
*Lewis*

Re: Nutch-1741 in GSOC 2015

Posted by Talat Uyarer <ta...@uyarer.com>.
Superb, Cihad! This will make it easy to follow your work.

2015-05-25 9:53 GMT+03:00 Cihad Guzel <cg...@gmail.com>:
> Hi all,
>
> I fork nutch on my github acoount [1] . So you can see my next commits.
> [1] https://github.com/cguzel/nutch
>
> --
> Kind Regards
> Cihad Güzel
>
> 2015-05-20 23:50 GMT+03:00 Cihad Guzel <cg...@gmail.com>:
>>
>> Hi all.
>>
>> I have added my proposal to nutch wiki. You can see details of "Sitemap
>> Crawler" from here [1].
>>
>> [1]  https://wiki.apache.org/nutch/GoogleSummerOfCode/SitemapCrawler
>>
>> --
>> Kind Regards
>>
>>
>> 2015-05-19 1:19 GMT+03:00 Cihad Guzel <cg...@gmail.com>:
>>>
>>> Hi all,
>>>
>>>
>>> I want to introduce myself.
>>>
>>>
>>> I am a Computer Engineer and I am doing master now. I like coding.I have
>>> been following some open source project for about 3 years. I am goaling to
>>> make some contribution with GSOC in opensource community.
>>>
>>>
>>> I also worked about frontend, middleware, backed development via
>>> enterprise java technologies. Furthermore, experienced “Web Technologies”,
>>> "Search Technologies", "Cloud Computing", "Distributed Systems" and "Big
>>> Data". I took place in search engine project that Apache technologies were
>>> used such as  Solr, HBase, Hadoop, Nutch, Gora and I used Nutch project
>>> actively in this project. You can see more information on my linkedin
>>> profile[1] about me.
>>>
>>>
>>> I mention some information for my process. My subject is "Nutch-1741 -
>>> Support of Sitemaps in Nutch 2.x" [2] .You know that the url’s can be got
>>> from only pages that were scanned before in nutch crawler system. Also, the
>>> degrees of importance and “change frequence” of these urls are not known
>>> only guessed. But, it is possible to find the whole of urls in a up-to-date
>>> sitemap file. For this reason, sitemap files in website should be crawled.
>>>
>>>
>>> I have explained the features for this project on my proposal. I’ll add
>>> it to wiki and you can see details of it on wiki at when I share . You can
>>> see nutch sitemap lifecycle the drawing [3].
>>>
>>>
>>> [1] https://tr.linkedin.com/in/cihadguzel
>>>
>>> [2] https://issues.apache.org/jira/browse/NUTCH-1741
>>>
>>> [3]
>>> https://issues.apache.org/jira/secure/attachment/12707721/SitemapCrawlerLifeCycle.pdf
>>>
>>>
>>> Kind Regards
>>>
>>>
>>>
>>> 2015-05-19 1:16 GMT+03:00 Cihad Guzel <cg...@gmail.com>:
>>>>
>>>> Ok Lewis,
>>>> I signed up to wiki, my wiki username: cihadguzel
>>>>
>>>> Thanks
>>>>
>>>> 2015-05-18 23:44 GMT+03:00 Lewis John Mcgibbney
>>>> <le...@gmail.com>:
>>>>>
>>>>> Fantastic Cihad,
>>>>> Thank you for introducing yourself.
>>>>> As you are in the community bonding period right now, please feel free
>>>>> to provide your wiki username to me and I will grant you access to the wiki.
>>>>> Please also feel free to pick up some lingering issues for Nutch 2.3.1
>>>>>
>>>>> https://issues.apache.org/jira/browse/NUTCH-1945?jql=project%20%3D%20NUTCH%20AND%20resolution%20%3D%20Unresolved%20AND%20fixVersion%20%3D%202.3.1%20ORDER%20BY%20priority%20DESC
>>>>> Thanks
>>>>> Lewis
>>>>>
>>>>>
>>>>> On Mon, May 18, 2015 at 1:26 PM, Cihad Guzel <cg...@gmail.com> wrote:
>>>>>>
>>>>>> Hi all,
>>>>>>
>>>>>> I had applied the GSoC 2015 for Apache Nutch Project and my
>>>>>> application is accepted. The main reason why I have choosen the Nutch
>>>>>> Project for GSOC is knowing the Nutch closely. My subject is "Nutch-1741 -
>>>>>> Support of Sitemaps in Nutch 2.x"[1] . Thanks Lewis John McGibbney and Talat
>>>>>> Uyarer for being my mentors on this process. I hope I can contribute to this
>>>>>> project.
>>>>>>
>>>>>> [1] https://issues.apache.org/jira/browse/NUTCH-1741
>>>>>>
>>>>>> Kind Regards
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Lewis
>>>>
>>>>
>>>
>>
>



-- 
Talat UYARER
Website: http://talat.uyarer.com
Twitter: http://twitter.com/talatuyarer
LinkedIn: http://tr.linkedin.com/pub/talat-uyarer/10/142/304

Re: Nutch-1741 in GSOC 2015

Posted by Cihad Guzel <cg...@gmail.com>.
Hi all,

I have forked Nutch on my GitHub account [1], so you can follow my upcoming
commits there.
[1] https://github.com/cguzel/nutch

--
Kind Regards
Cihad Güzel

2015-05-20 23:50 GMT+03:00 Cihad Guzel <cg...@gmail.com>:

> Hi all.
>
> I have added my proposal to nutch wiki. You can see details of "Sitemap
> Crawler" from here [1].
>
> [1]  https://wiki.apache.org/nutch/GoogleSummerOfCode/SitemapCrawler
>
> --
> Kind Regards
>
>
> 2015-05-19 1:19 GMT+03:00 Cihad Guzel <cg...@gmail.com>:
>
>> Hi all,
>>
>> I want to introduce myself.
>>
>> I am a Computer Engineer and I am doing master now. I like coding.I have
>> been following some open source project for about 3 years. I am goaling to
>> make some contribution with GSOC in opensource community.
>>
>> I also worked about frontend, middleware, backed development via
>> enterprise java technologies. Furthermore, experienced “Web Technologies”,
>> "Search Technologies", "Cloud Computing", "Distributed Systems" and "Big
>> Data". I took place in search engine project that Apache technologies were
>> used such as  Solr, HBase, Hadoop, Nutch, Gora and I used Nutch project
>> actively in this project. You can see more information on my linkedin
>> profile[1] about me.
>>
>> I mention some information for my process. My subject is "Nutch-1741 -
>> Support of Sitemaps in Nutch 2.x" [2] .You know that the url’s can be
>> got from only pages that were scanned before in nutch crawler system. Also,
>> the degrees of importance and “change frequence” of these urls are not
>> known only guessed. But, it is possible to find the whole of urls in a
>> up-to-date sitemap file. For this reason, sitemap files in website should
>> be crawled.
>>
>> I have explained the features for this project on my proposal. I’ll add
>> it to wiki and you can see details of it on wiki at when I share . You can
>> see nutch sitemap lifecycle the drawing [3].
>>
>> [1] https://tr.linkedin.com/in/cihadguzel
>>
>> [2] https://issues.apache.org/jira/browse/NUTCH-1741
>>
>> [3]
>> https://issues.apache.org/jira/secure/attachment/12707721/SitemapCrawlerLifeCycle.pdf
>>
>> Kind Regards
>>
>>
>> 2015-05-19 1:16 GMT+03:00 Cihad Guzel <cg...@gmail.com>:
>>
>>> Ok Lewis,
>>> I signed up to wiki, my wiki username: cihadguzel
>>>
>>> Thanks
>>>
>>> 2015-05-18 23:44 GMT+03:00 Lewis John Mcgibbney <
>>> lewis.mcgibbney@gmail.com>:
>>>
>>>> Fantastic Cihad,
>>>> Thank you for introducing yourself.
>>>> As you are in the community bonding period right now, please feel free
>>>> to provide your wiki username to me and I will grant you access to the wiki.
>>>> Please also feel free to pick up some lingering issues for Nutch 2.3.1
>>>>
>>>> https://issues.apache.org/jira/browse/NUTCH-1945?jql=project%20%3D%20NUTCH%20AND%20resolution%20%3D%20Unresolved%20AND%20fixVersion%20%3D%202.3.1%20ORDER%20BY%20priority%20DESC
>>>> Thanks
>>>> Lewis
>>>>
>>>>
>>>> On Mon, May 18, 2015 at 1:26 PM, Cihad Guzel <cg...@gmail.com> wrote:
>>>>
>>>>> Hi all,
>>>>>
>>>>> I had applied the GSoC 2015 for Apache Nutch Project and my
>>>>> application is accepted. The main reason why I have choosen the Nutch
>>>>> Project for GSOC is knowing the Nutch closely. My subject is "Nutch-1741 -
>>>>> Support of Sitemaps in Nutch 2.x"[1] . Thanks Lewis John McGibbney and
>>>>> Talat Uyarer for being my mentors on this process. I hope I can contribute
>>>>> to this project.
>>>>>
>>>>> [1] https://issues.apache.org/jira/browse/NUTCH-1741
>>>>>
>>>>> Kind Regards
>>>>>
>>>>
>>>>
>>>>
>>>> --
>>>> *Lewis*
>>>>
>>>
>>>
>>
>

Re: Nutch-1741 in GSOC 2015

Posted by Cihad Guzel <cg...@gmail.com>.
Hi all.

I have added my proposal to the Nutch wiki. You can see the details of the
"Sitemap Crawler" here [1].

[1]  https://wiki.apache.org/nutch/GoogleSummerOfCode/SitemapCrawler

--
Kind Regards


2015-05-19 1:19 GMT+03:00 Cihad Guzel <cg...@gmail.com>:

> Hi all,
>
> I want to introduce myself.
>
> I am a Computer Engineer and I am doing master now. I like coding.I have
> been following some open source project for about 3 years. I am goaling to
> make some contribution with GSOC in opensource community.
>
> I also worked about frontend, middleware, backed development via
> enterprise java technologies. Furthermore, experienced “Web Technologies”,
> "Search Technologies", "Cloud Computing", "Distributed Systems" and "Big
> Data". I took place in search engine project that Apache technologies were
> used such as  Solr, HBase, Hadoop, Nutch, Gora and I used Nutch project
> actively in this project. You can see more information on my linkedin
> profile[1] about me.
>
> I mention some information for my process. My subject is "Nutch-1741 -
> Support of Sitemaps in Nutch 2.x" [2] .You know that the url’s can be got
> from only pages that were scanned before in nutch crawler system. Also, the
> degrees of importance and “change frequence” of these urls are not known
> only guessed. But, it is possible to find the whole of urls in a up-to-date
> sitemap file. For this reason, sitemap files in website should be crawled.
>
> I have explained the features for this project on my proposal. I’ll add it
> to wiki and you can see details of it on wiki at when I share . You can see
> nutch sitemap lifecycle the drawing [3].
>
> [1] https://tr.linkedin.com/in/cihadguzel
>
> [2] https://issues.apache.org/jira/browse/NUTCH-1741
>
> [3]
> https://issues.apache.org/jira/secure/attachment/12707721/SitemapCrawlerLifeCycle.pdf
>
> Kind Regards
>
>
> 2015-05-19 1:16 GMT+03:00 Cihad Guzel <cg...@gmail.com>:
>
>> Ok Lewis,
>> I signed up to wiki, my wiki username: cihadguzel
>>
>> Thanks
>>
>> 2015-05-18 23:44 GMT+03:00 Lewis John Mcgibbney <
>> lewis.mcgibbney@gmail.com>:
>>
>>> Fantastic Cihad,
>>> Thank you for introducing yourself.
>>> As you are in the community bonding period right now, please feel free
>>> to provide your wiki username to me and I will grant you access to the wiki.
>>> Please also feel free to pick up some lingering issues for Nutch 2.3.1
>>>
>>> https://issues.apache.org/jira/browse/NUTCH-1945?jql=project%20%3D%20NUTCH%20AND%20resolution%20%3D%20Unresolved%20AND%20fixVersion%20%3D%202.3.1%20ORDER%20BY%20priority%20DESC
>>> Thanks
>>> Lewis
>>>
>>>
>>> On Mon, May 18, 2015 at 1:26 PM, Cihad Guzel <cg...@gmail.com> wrote:
>>>
>>>> Hi all,
>>>>
>>>> I had applied the GSoC 2015 for Apache Nutch Project and my application
>>>> is accepted. The main reason why I have choosen the Nutch Project for GSOC
>>>> is knowing the Nutch closely. My subject is "Nutch-1741 - Support of
>>>> Sitemaps in Nutch 2.x"[1] . Thanks Lewis John McGibbney and Talat Uyarer
>>>> for being my mentors on this process. I hope I can contribute to this
>>>> project.
>>>>
>>>> [1] https://issues.apache.org/jira/browse/NUTCH-1741
>>>>
>>>> Kind Regards
>>>>
>>>
>>>
>>>
>>> --
>>> *Lewis*
>>>
>>
>>
>

Re: Nutch-1741 in GSOC 2015

Posted by Cihad Guzel <cg...@gmail.com>.
Hi all,

I would like to introduce myself.

I am a computer engineer, currently pursuing a master's degree. I like
coding. I have been following several open source projects for about 3
years, and I am aiming to contribute to the open source community through
GSoC.

I have also worked on frontend, middleware, and backend development using
enterprise Java technologies. Furthermore, I have experience with "Web
Technologies", "Search Technologies", "Cloud Computing", "Distributed
Systems" and "Big Data". I took part in a search engine project built on
Apache technologies such as Solr, HBase, Hadoop, Nutch, and Gora, and I used
Nutch actively in that project. You can find more information about me on
my LinkedIn profile [1].

Here is some information about my project. My subject is "Nutch-1741 -
Support of Sitemaps in Nutch 2.x" [2]. As you know, the Nutch crawler can
only discover URLs from pages that it has already crawled. Also, the
importance and "change frequency" of these URLs are not known, only
guessed. However, an up-to-date sitemap file can list all of a site's URLs.
For this reason, the sitemap files of a website should be crawled.
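
As an illustration only (this is not the NUTCH-1741 implementation), a
minimal Java sketch of reading the <loc>, <changefreq> and <priority> fields
of a sitemap might look like the following. The class name SitemapSketch,
the Entry type, and the sample URLs are hypothetical; the sketch assumes a
plain <urlset> sitemap (not a sitemap index) and uses only the JDK's
built-in XML parser.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical illustration; not part of Nutch.
class SitemapSketch {

  /** One <url> entry; optional fields are null when the sitemap omits them. */
  static final class Entry {
    final String loc;
    final String changeFreq;
    final Double priority;

    Entry(String loc, String changeFreq, Double priority) {
      this.loc = loc;
      this.changeFreq = changeFreq;
      this.priority = priority;
    }
  }

  // Made-up sample data: the second entry carries no hints at all.
  static final String SAMPLE =
      "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n"
    + "<urlset xmlns=\"http://www.sitemaps.org/schemas/sitemap/0.9\">\n"
    + "  <url><loc>http://example.com/</loc>"
    + "<changefreq>daily</changefreq><priority>0.8</priority></url>\n"
    + "  <url><loc>http://example.com/about</loc></url>\n"
    + "</urlset>\n";

  /** Parses a <urlset> document and returns its entries in document order. */
  static List<Entry> parse(String xml) throws Exception {
    DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
    factory.setNamespaceAware(true); // needed for getElementsByTagNameNS
    factory.setFeature(XMLConstants.FEATURE_SECURE_PROCESSING, true);
    Document doc = factory.newDocumentBuilder()
        .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));

    List<Entry> entries = new ArrayList<>();
    NodeList urls = doc.getElementsByTagNameNS("*", "url");
    for (int i = 0; i < urls.getLength(); i++) {
      Element url = (Element) urls.item(i);
      String loc = childText(url, "loc");
      if (loc == null) {
        continue; // an entry without a location is useless to a crawler
      }
      String priority = childText(url, "priority");
      entries.add(new Entry(loc, childText(url, "changefreq"),
          priority == null ? null : Double.valueOf(priority)));
    }
    return entries;
  }

  /** Returns the trimmed text of the first matching child, or null. */
  private static String childText(Element parent, String name) {
    NodeList nodes = parent.getElementsByTagNameNS("*", name);
    return nodes.getLength() == 0 ? null : nodes.item(0).getTextContent().trim();
  }

  public static void main(String[] args) throws Exception {
    for (Entry e : parse(SAMPLE)) {
      System.out.println(e.loc + " changefreq=" + e.changeFreq
          + " priority=" + e.priority);
    }
  }
}
```

The point of the sketch is that a sitemap hands the crawler both the URL
list and the site's own hints in one fetch, so a crawler could use these
values instead of guessing them.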

I have described the planned features in my proposal. I will add it to the
wiki, and you will be able to see the details there once I share it. You can
see the Nutch sitemap lifecycle in the drawing [3].

[1] https://tr.linkedin.com/in/cihadguzel

[2] https://issues.apache.org/jira/browse/NUTCH-1741

[3]
https://issues.apache.org/jira/secure/attachment/12707721/SitemapCrawlerLifeCycle.pdf

Kind Regards


2015-05-19 1:16 GMT+03:00 Cihad Guzel <cg...@gmail.com>:

> Ok Lewis,
> I signed up to wiki, my wiki username: cihadguzel
>
> Thanks
>
> 2015-05-18 23:44 GMT+03:00 Lewis John Mcgibbney <lewis.mcgibbney@gmail.com
> >:
>
>> Fantastic Cihad,
>> Thank you for introducing yourself.
>> As you are in the community bonding period right now, please feel free to
>> provide your wiki username to me and I will grant you access to the wiki.
>> Please also feel free to pick up some lingering issues for Nutch 2.3.1
>>
>> https://issues.apache.org/jira/browse/NUTCH-1945?jql=project%20%3D%20NUTCH%20AND%20resolution%20%3D%20Unresolved%20AND%20fixVersion%20%3D%202.3.1%20ORDER%20BY%20priority%20DESC
>> Thanks
>> Lewis
>>
>>
>> On Mon, May 18, 2015 at 1:26 PM, Cihad Guzel <cg...@gmail.com> wrote:
>>
>>> Hi all,
>>>
>>> I had applied the GSoC 2015 for Apache Nutch Project and my application
>>> is accepted. The main reason why I have choosen the Nutch Project for GSOC
>>> is knowing the Nutch closely. My subject is "Nutch-1741 - Support of
>>> Sitemaps in Nutch 2.x"[1] . Thanks Lewis John McGibbney and Talat Uyarer
>>> for being my mentors on this process. I hope I can contribute to this
>>> project.
>>>
>>> [1] https://issues.apache.org/jira/browse/NUTCH-1741
>>>
>>> Kind Regards
>>>
>>
>>
>>
>> --
>> *Lewis*
>>
>
>

Re: Nutch-1741 in GSOC 2015

Posted by Lewis John Mcgibbney <le...@gmail.com>.
Fantastic Cihad,
Thank you for introducing yourself.
As you are in the community bonding period right now, please feel free to
provide your wiki username to me and I will grant you access to the wiki.
Please also feel free to pick up some lingering issues for Nutch 2.3.1:
https://issues.apache.org/jira/browse/NUTCH-1945?jql=project%20%3D%20NUTCH%20AND%20resolution%20%3D%20Unresolved%20AND%20fixVersion%20%3D%202.3.1%20ORDER%20BY%20priority%20DESC
Thanks
Lewis


On Mon, May 18, 2015 at 1:26 PM, Cihad Guzel <cg...@gmail.com> wrote:

> Hi all,
>
> I had applied the GSoC 2015 for Apache Nutch Project and my application is
> accepted. The main reason why I have choosen the Nutch Project for GSOC is
> knowing the Nutch closely. My subject is "Nutch-1741 - Support of Sitemaps
> in Nutch 2.x"[1] . Thanks Lewis John McGibbney and Talat Uyarer for being
> my mentors on this process. I hope I can contribute to this project.
>
> [1] https://issues.apache.org/jira/browse/NUTCH-1741
>
> Kind Regards
>



-- 
*Lewis*
