Posted to dev@nutch.apache.org by "KuroSaka TeruHiko (JIRA)" <ji...@apache.org> on 2006/07/01 01:34:29 UTC

[jira] Created: (NUTCH-316) Confusion about query languages

Confusion about query languages
-------------------------------

         Key: NUTCH-316
         URL: http://issues.apache.org/jira/browse/NUTCH-316
     Project: Nutch
        Type: Bug

  Components: web gui  
    Versions: 0.8-dev    
 Environment: n/a
    Reporter: KuroSaka TeruHiko


In the 2006-6-16 nightly source code, src/web/jsp/search.jsp has these lines:

  String queryLang = request.getParameter("lang");
  if (queryLang == null) { queryLang = ""; }
  Query query = Query.parse(queryString, queryLang, nutchConf);

Judging from the URLs shown in the browser, the lang parameter reflects the language
of the GUI (the language in which GUI elements are labeled), which the user sets by clicking
one of the two-letter codes near the bottom of each Nutch GUI screen.

The Java API doc for Query does not make clear what queryLang means.  Is it the language of
the query (how the query should be lemmatized, if the analyzer supports that, and which stop
word list should be applied), or is it the language of the documents to be searched?

Although the two concepts above are closely related, they are not tied to the GUI language at all.

As a Japanese user, I might prefer to see the whole GUI in Japanese, but I would still need to
search English documents for English words.  The current implementation of search.jsp seems
to restrict the search domain to documents in the GUI language one way (by treating the
query terms as being in the GUI language) or the other (by restricting the search domain to
documents in the GUI language).

Ideally, there would be a drop-down list from which the language of the query analyzer
is selected, and a set of check boxes from which the document languages are selected,
in addition to the existing line of two-letter language codes from which the GUI language is chosen.

But that would clutter the screen too much.

Google uses a separate configuration screen to let the user choose the set of languages
of the documents to be searched.  That might be a good middle-of-the-road approach.
Because it does no language processing on search terms, Google does not need to know
the language of the query.  The Nutch GUI might want a drop-down list from which the language
of the query can be chosen, with the GUI language pre-selected.
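
To make the separation concrete, here is a rough sketch of how the three notions could be
read from the request independently. This is only an illustration, not existing Nutch code:
the LanguageSelection class and the "queryLang" and "docLang" parameter names are
hypothetical. Only the "lang" parameter comes from the current search.jsp. The map has the
shape that a servlet request's getParameterMap() returns.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;

/** Hypothetical helper (not part of Nutch) that keeps the three language notions apart. */
final class LanguageSelection {
    /** Language of the GUI labels; the existing "lang" parameter. */
    final String guiLang;
    /** Language used to analyze the query terms; assumed "queryLang" parameter. */
    final String queryLang;
    /** Languages of the documents to search; assumed "docLang" parameter. Empty means no restriction. */
    final List<String> docLangs;

    private LanguageSelection(String guiLang, String queryLang, List<String> docLangs) {
        this.guiLang = guiLang;
        this.queryLang = queryLang;
        this.docLangs = docLangs;
    }

    /** Builds a selection from request parameters (name -> values). */
    static LanguageSelection fromParameters(Map<String, String[]> params) {
        String gui = first(params, "lang", "");
        // The query language defaults to the GUI language (the "pre-selected" choice).
        String query = first(params, "queryLang", gui);
        // Document languages are NOT defaulted from the GUI language.
        List<String> docs = new ArrayList<>();
        String[] values = params.get("docLang");
        if (values != null) {
            for (String v : values) {
                if (v != null && !v.trim().isEmpty()) {
                    docs.add(v.trim());
                }
            }
        }
        return new LanguageSelection(gui, query, Collections.unmodifiableList(docs));
    }

    /** Returns the first non-blank value of a parameter, or the fallback if there is none. */
    private static String first(Map<String, String[]> params, String name, String fallback) {
        String[] values = params.get(name);
        if (values == null || values.length == 0
                || values[0] == null || values[0].trim().isEmpty()) {
            return fallback;
        }
        return values[0].trim();
    }
}
```

With something like this, search.jsp could pass queryLang from the selection to
Query.parse(queryString, queryLang, nutchConf) instead of the GUI language, and a Japanese
GUI with queryLang=en would analyze the terms as English. How docLangs would restrict the
search is left open here, since that depends on how Nutch indexes document languages.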

