Posted to dev@lucene.apache.org by "Chris A. Mattmann (JIRA)" <ji...@apache.org> on 2010/08/23 05:21:16 UTC

[jira] Created: (SOLR-2073) Geonames.org UpdateProcessor for Spatial

Geonames.org UpdateProcessor for Spatial
----------------------------------------

                 Key: SOLR-2073
                 URL: https://issues.apache.org/jira/browse/SOLR-2073
             Project: Solr
          Issue Type: New Feature
          Components: update
            Reporter: Chris A. Mattmann
             Fix For: 3.1


My student, William Quach, and I have put together the attached patch based on his final project in my CS572: Search Engines and Information Retrieval class at USC. This patch adds a Geonames.org UpdateProcessor to Solr so that it can take tab delimited data from Geonames.org and add it to the Solr index.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] Commented: (SOLR-2073) Geonames.org UpdateProcessor for Spatial

Posted by "Ryan McKinley (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-2073?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12901256#action_12901256 ] 

Ryan McKinley commented on SOLR-2073:
-------------------------------------

It seems odd since this assumes you have a schema of some set of fields (not included?)

Would something like this be better as a utility class that posts via the SolrJ API?  It could be embedded if 'speed' is the reason an update processor seems like a good idea.


[jira] Commented: (SOLR-2073) Geonames.org UpdateProcessor for Spatial

Posted by "Chris A. Mattmann (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-2073?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12923927#action_12923927 ] 

Chris A. Mattmann commented on SOLR-2073:
-----------------------------------------

Great, well if you guys don't find it useful, no problem. I took the time to help put this together, and we're using it, so I figured I'd share. If others find it useful, great.


[jira] Commented: (SOLR-2073) Geonames.org UpdateProcessor for Spatial

Posted by "Robert Muir (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-2073?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12901828#action_12901828 ] 

Robert Muir commented on SOLR-2073:
-----------------------------------

Hello, I am pretty familiar with this database (I do a lot of Lucene testing with it), so I took a glance here:
* Can't we do this with the CSV Loader? I always load this as CSV (but with the delimiter as tab).
* String.toLowerCase should not be used without specifying a locale. In fact, I think it would be best to avoid lowercasing here at all; someone can lowercase with analysis.
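
To illustrate the lowercasing point, here is a minimal Java sketch (not taken from the attached patch) of how the result of String.toLowerCase depends on the locale. The class name LowercaseLocaleDemo and the helper lower are hypothetical names chosen for this example. Under Turkish rules, an uppercase "I" lowercases to the dotless "ı" (U+0131), so the same place name can be indexed differently depending on the JVM's default locale.

```java
import java.util.Locale;

public class LowercaseLocaleDemo {

    // Lowercases a string using the rules of an explicitly supplied locale,
    // instead of relying on the JVM default locale.
    static String lower(String s, Locale locale) {
        return s.toLowerCase(locale);
    }

    public static void main(String[] args) {
        String name = "ISTANBUL";
        Locale turkish = Locale.forLanguageTag("tr-TR");

        // Locale.ROOT gives locale-neutral behaviour: "I" becomes "i".
        System.out.println(lower(name, Locale.ROOT).equals("istanbul")); // prints true

        // Turkish rules map "I" to dotless "\u0131", so the result differs.
        System.out.println(lower(name, turkish).equals("istanbul"));     // prints false
    }
}
```

Leaving the value untouched in the loader and lowercasing in the analysis chain, as suggested above, sidesteps this difference entirely.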


[jira] Commented: (SOLR-2073) Geonames.org UpdateProcessor for Spatial

Posted by "Chris A. Mattmann (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-2073?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12901263#action_12901263 ] 

Chris A. Mattmann commented on SOLR-2073:
-----------------------------------------

bq. It seems odd since this assumes you have a schema of some set of fields (not included?) 

The schema is coming in another patch.

bq. Would something like this be better as a utility class that posts via the SolrJ API? It could be embedded if 'speed' is the reason an update processor seems like a good idea.

Yeah I think speed is a good reason. Also it seemed to make the most sense to us that it be an UpdateProcessor.



[jira] Commented: (SOLR-2073) Geonames.org UpdateProcessor for Spatial

Posted by "Chris A. Mattmann (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-2073?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12901875#action_12901875 ] 

Chris A. Mattmann commented on SOLR-2073:
-----------------------------------------

Hi Robert: 

Thanks. Yep, this could probably be done with the CSVLoader. It just seemed as simple to do with a BufferedReader and some readLine calls, but it could just as easily have been subclassed. I am not super familiar with the CSVLoader and haven't had time to look at it. As for the lowercasing, yeah, that makes sense.

Cheers,
Chris
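
For readers without the patch at hand, the BufferedReader approach mentioned above might look roughly like the following sketch. It is not the code from the attached patch. The class name GeonamesLineReader and the field names are hypothetical, since the schema was not part of this thread. The column order is an assumption based on the commonly documented layout of the Geonames main dump (geonameid, name, asciiname, alternatenames, latitude, longitude, followed by further columns) and should be checked against the Geonames readme. The sample row in main is illustrative only.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class GeonamesLineReader {

    // Hypothetical field names for the first six columns of a Geonames row.
    // The real schema for the patch was not included in this thread.
    static final String[] FIELDS = {
        "id", "name", "asciiname", "alternatenames", "latitude", "longitude"
    };

    // Splits one tab-delimited line into a field-name -> value map.
    // Values are kept exactly as read (no lowercasing).
    static Map<String, String> parseLine(String line) {
        String[] cols = line.split("\t", -1); // -1 keeps trailing empty columns
        Map<String, String> doc = new LinkedHashMap<>();
        for (int i = 0; i < FIELDS.length && i < cols.length; i++) {
            doc.put(FIELDS[i], cols[i]);
        }
        return doc;
    }

    // Reads every non-empty line from the reader and parses it.
    static List<Map<String, String>> readAll(BufferedReader reader) throws IOException {
        List<Map<String, String>> docs = new ArrayList<>();
        String line;
        while ((line = reader.readLine()) != null) {
            if (line.isEmpty()) {
                continue; // skip blank lines
            }
            docs.add(parseLine(line));
        }
        return docs;
    }

    public static void main(String[] args) throws IOException {
        // Illustrative row only, not copied from an actual dump file.
        String sample = "5368361\tLos Angeles\tLos Angeles\tLA\t34.05223\t-118.24368\n";
        List<Map<String, String>> docs = readAll(new BufferedReader(new StringReader(sample)));
        System.out.println(docs.get(0).get("name"));
    }
}
```

Each resulting map would then be turned into a Solr document. Whether that step happens inside an UpdateProcessor, through the CSV loader, or via a SolrJ client is the design question discussed in this thread.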


[jira] Commented: (SOLR-2073) Geonames.org UpdateProcessor for Spatial

Posted by "Chris A. Mattmann (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-2073?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12901248#action_12901248 ] 

Chris A. Mattmann commented on SOLR-2073:
-----------------------------------------

My contribution to this patch was simply formatting and adding the Apache license headers. William is on the mailing list, and he can take it from here. I'm just getting this started. William has an ICLA on file as of yesterday in r24583 in the foundation repository. See here: http://people.apache.org/committer-index.html for confirmation as well.


[jira] Updated: (SOLR-2073) Geonames.org UpdateProcessor for Spatial

Posted by "Chris A. Mattmann (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/SOLR-2073?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chris A. Mattmann updated SOLR-2073:
------------------------------------

    Attachment: SOLR-2073.Quach.Mattmann.082210.patch.txt

- patch from William and me.


[jira] Commented: (SOLR-2073) Geonames.org UpdateProcessor for Spatial

Posted by "Chris A. Mattmann (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-2073?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12901253#action_12901253 ] 

Chris A. Mattmann commented on SOLR-2073:
-----------------------------------------

Note: the CHANGES.txt part of the patch seems to change an odd line that I didn't intend to modify (maybe because of the encoding?).


[jira] Commented: (SOLR-2073) Geonames.org UpdateProcessor for Spatial

Posted by "Grant Ingersoll (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-2073?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12923916#action_12923916 ] 

Grant Ingersoll commented on SOLR-2073:
---------------------------------------

When I saw the title, I was under the impression that, given an incoming document, this would use the GeoNames database to identify and extract names from the content and add them to other fields so that one could search, facet, etc. on them.  As others have stated, I don't see much need for a custom loader for this as it stands now.  If you want to create a TokenFilter that does that, it would be great.
