Posted to dev@nutch.apache.org by Stefan Groschupf <sg...@media-style.com> on 2006/07/24 20:17:06 UTC

segread vs. readseg

Hi developers,

We have commands like readdb and readlinkdb, but segread. Wouldn't it be
more consistent to name the command readseg instead of segread?
... just a thought.

Stefan



Re: segread vs. readseg

Posted by Stefan Groschupf <sg...@media-style.com>.
I like it!

On 24.07.2006 at 16:10, Andrzej Bialecki wrote:

> Stefan Neufeind wrote:
>> Andrzej Bialecki wrote:
>>> Stefan Groschupf wrote:
>>>> Hi developers,
>>>>
>>>> We have commands like readdb and readlinkdb, but segread. Wouldn't
>>>> it be more consistent to name the command readseg instead of segread?
>>>> ... just a thought.
>>>
>>> Yes, it seems more consistent. However, if we change it then  
>>> scripts people wrote would break. We could support both aliases  
>>> in 0.8, and give a deprecation message.
>>>
>>> What do others think?
>>
>> Same feeling here. Agreed.
>
> What about the following?
>
> Index: bin/nutch
> ===================================================================
> --- bin/nutch    (revision 424960)
> +++ bin/nutch    (working copy)
> @@ -40,7 +40,7 @@
>   echo "  generate          generate new segments to fetch"
>   echo "  fetch             fetch a segment's pages"
>   echo "  parse             parse a segment's pages"
> -  echo "  segread           read / dump segment data"
> +  echo "  readseg           read / dump segment data"
>   echo "  mergesegs         merge several segments, with optional filtering and slicing"
>   echo "  updatedb          update crawl db from segments after fetching"
>   echo "  invertlinks       create a linkdb from parsed segments"
> @@ -158,7 +158,10 @@
>   CLASS=org.apache.nutch.crawl.CrawlDbMerger
> elif [ "$COMMAND" = "readlinkdb" ] ; then
>   CLASS=org.apache.nutch.crawl.LinkDbReader
> +elif [ "$COMMAND" = "readseg" ] ; then
> +  CLASS=org.apache.nutch.segment.SegmentReader
> elif [ "$COMMAND" = "segread" ] ; then
> +  echo "[DEPRECATED] Command 'segread' is deprecated, use 'readseg' instead."
>   CLASS=org.apache.nutch.segment.SegmentReader
> elif [ "$COMMAND" = "mergesegs" ] ; then
>   CLASS=org.apache.nutch.segment.SegmentMerger
>
>
> -- 
> Best regards,
> Andrzej Bialecki     <><
> ___. ___ ___ ___ _ _   __________________________________
> [__ || __|__/|__||\/|  Information Retrieval, Semantic Web
> ___|||__||  \|  ||  |  Embedded Unix, System Integration
> http://www.sigram.com  Contact: info at sigram dot com
>
>
>


Re: segread vs. readseg

Posted by Andrzej Bialecki <ab...@getopt.org>.
Stefan Neufeind wrote:
> Andrzej Bialecki wrote:
>> Stefan Groschupf wrote:
>>> Hi developers,
>>>
>>> We have commands like readdb and readlinkdb, but segread. Wouldn't it be
>>> more consistent to name the command readseg instead of segread?
>>> ... just a thought.
>>
>> Yes, it seems more consistent. However, if we change it then scripts 
>> people wrote would break. We could support both aliases in 0.8, and 
>> give a deprecation message.
>>
>> What do others think?
>
> Same feeling here. Agreed.

What about the following?

Index: bin/nutch
===================================================================
--- bin/nutch    (revision 424960)
+++ bin/nutch    (working copy)
@@ -40,7 +40,7 @@
   echo "  generate          generate new segments to fetch"
   echo "  fetch             fetch a segment's pages"
   echo "  parse             parse a segment's pages"
-  echo "  segread           read / dump segment data"
+  echo "  readseg           read / dump segment data"
   echo "  mergesegs         merge several segments, with optional filtering and slicing"
   echo "  updatedb          update crawl db from segments after fetching"
   echo "  invertlinks       create a linkdb from parsed segments"
@@ -158,7 +158,10 @@
   CLASS=org.apache.nutch.crawl.CrawlDbMerger
 elif [ "$COMMAND" = "readlinkdb" ] ; then
   CLASS=org.apache.nutch.crawl.LinkDbReader
+elif [ "$COMMAND" = "readseg" ] ; then
+  CLASS=org.apache.nutch.segment.SegmentReader
 elif [ "$COMMAND" = "segread" ] ; then
+  echo "[DEPRECATED] Command 'segread' is deprecated, use 'readseg' instead."
   CLASS=org.apache.nutch.segment.SegmentReader
 elif [ "$COMMAND" = "mergesegs" ] ; then
   CLASS=org.apache.nutch.segment.SegmentMerger
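
For anyone who wants to try the alias behaviour outside of bin/nutch, here is a minimal standalone sketch of the same dispatch idea. It is not part of the patch: the function name resolve_class is hypothetical, only the class names and the deprecation text come from the diff above, and sending the warning to stderr rather than stdout is an assumption made here so that captured output stays clean.

```shell
#!/bin/sh
# Sketch only: resolve_class is a hypothetical helper, not a function in bin/nutch.
# It maps a command name to the class names that appear in the patch.
resolve_class() {
  case "$1" in
    readlinkdb)
      echo "org.apache.nutch.crawl.LinkDbReader" ;;
    readseg)
      # New, consistent name.
      echo "org.apache.nutch.segment.SegmentReader" ;;
    segread)
      # Old name still works, but warns. The warning goes to stderr here
      # (an assumption; the patch itself uses a plain echo to stdout).
      echo "[DEPRECATED] Command 'segread' is deprecated, use 'readseg' instead." >&2
      echo "org.apache.nutch.segment.SegmentReader" ;;
    mergesegs)
      echo "org.apache.nutch.segment.SegmentMerger" ;;
    *)
      echo "unknown command: $1" >&2
      return 1 ;;
  esac
}
```

Calling resolve_class with either readseg or segread yields the same class name, so existing scripts would keep working while users see the hint to switch.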


-- 
Best regards,
Andrzej Bialecki     <><
 ___. ___ ___ ___ _ _   __________________________________
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com



Re: segread vs. readseg

Posted by Stefan Neufeind <ap...@stefan-neufeind.de>.
Andrzej Bialecki wrote:
> Stefan Groschupf wrote:
>> Hi developers,
>>
>> We have commands like readdb and readlinkdb, but segread. Wouldn't it be
>> more consistent to name the command readseg instead of segread?
>> ... just a thought.
> 
> Yes, it seems more consistent. However, if we change it then scripts 
> people wrote would break. We could support both aliases in 0.8, and give 
> a deprecation message.
> 
> What do others think?

Same feeling here. Agreed.

   Stefan

Re: segread vs. readseg

Posted by Andrzej Bialecki <ab...@getopt.org>.
Stefan Groschupf wrote:
> Hi developers,
>
> We have commands like readdb and readlinkdb, but segread. Wouldn't it be
> more consistent to name the command readseg instead of segread?
> ... just a thought.

Yes, it seems more consistent. However, if we change it then scripts 
people wrote would break. We could support both aliases in 0.8, and give 
a deprecation message.

What do others think?

-- 
Best regards,
Andrzej Bialecki     <><
 ___. ___ ___ ___ _ _   __________________________________
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com