Posted to solr-dev@lucene.apache.org by "Thomas Peuss (JIRA)" <ji...@apache.org> on 2007/07/18 09:04:07 UTC

[jira] Created: (SOLR-308) Add a field that generates an unique id when you have none in your data to index

Add a field that generates an unique id when you have none in your data to index
--------------------------------------------------------------------------------

                 Key: SOLR-308
                 URL: https://issues.apache.org/jira/browse/SOLR-308
             Project: Solr
          Issue Type: New Feature
          Components: search
            Reporter: Thomas Peuss
            Priority: Minor


This patch adds a field that generates a unique id when the data you want to index has no unique id of its own.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


Re: [jira] Created: (SOLR-308) Add a field that generates an unique id when you have none in your data to index

Posted by tyball <do...@hotmail.com>.
I have a simple multi-core test configuration, each core with a UUID field
type as the unique id (using default="NEW"). When I run a query that uses
shards, I always get an exception saying that the UUID is not valid. It has
something to do with UUIDField's toString: the string used to retrieve the
UUID suddenly contains "java.util.UUID" plus the actual UUID. I patched it
myself to remove the "java.util.UUID" substring from the UUID, and everything
is fine. Everything also works fine when not using shards in the query, or, of course, when not using UUIDs.
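A minimal sketch of the workaround described above, assuming the corrupted value is the literal text "java.util.UUID" immediately followed by the UUID (the report does not show the exact string). The class and method names are hypothetical; this only tolerates the symptom when parsing and does not address where the prefix comes from.

```java
import java.util.UUID;

// Hypothetical helper illustrating the reporter's workaround. It ASSUMES the
// corrupted value is the literal text "java.util.UUID" immediately followed
// by the UUID; the real shape of the string may differ.
final class UuidPrefixWorkaround {
    private static final String LEAKED_PREFIX = "java.util.UUID";

    static UUID parseTolerant(String raw) {
        String value = raw.trim();
        if (value.startsWith(LEAKED_PREFIX)) {
            // Drop the class-name text that leaked into the value.
            value = value.substring(LEAKED_PREFIX.length());
        }
        // Throws IllegalArgumentException if what remains is not a UUID.
        return UUID.fromString(value);
    }

    public static void main(String[] args) {
        String id = "550e8400-e29b-41d4-a716-446655440000";
        System.out.println(parseTolerant(LEAKED_PREFIX + id)); // same UUID as the next line
        System.out.println(parseTolerant(id));
    }
}
```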


-- 
View this message in context: http://www.nabble.com/-jira--Created%3A-%28SOLR-308%29-Add-a-field-that-generates-an-unique-id-when-you-have-none-in-your-data-to-index-tp11663589p21617406.html
Sent from the Solr - Dev mailing list archive at Nabble.com.


[jira] Commented: (SOLR-308) Add a field that generates an unique id when you have none in your data to index

Posted by "Lance Norskog (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12662615#action_12662615 ] 

Lance Norskog commented on SOLR-308:
------------------------------------

This field type and its use are not documented in the Wiki: a search for 'UUID' finds only custom code in ExtractingRequestHandler.



> Add a field that generates an unique id when you have none in your data to index
> --------------------------------------------------------------------------------
>
>                 Key: SOLR-308
>                 URL: https://issues.apache.org/jira/browse/SOLR-308
>             Project: Solr
>          Issue Type: New Feature
>          Components: search
>            Reporter: Thomas Peuss
>            Assignee: Hoss Man
>            Priority: Minor
>             Fix For: 1.3
>
>         Attachments: UUIDField.patch, UUIDField.patch, UUIDField.patch, UUIDField.patch, UUIDField.patch
>
>
> This patch adds a field that generates an unique id when you have no unique id in your data you want to index.



[jira] Commented: (SOLR-308) Add a field that generates an unique id when you have none in your data to index

Posted by "Ryan McKinley (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12513960 ] 

Ryan McKinley commented on SOLR-308:
------------------------------------

The easiest option is to add a UUID when you index the data.  

Other options would be to make this FieldType a plugin and put it in the 'lib' directory.
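A minimal sketch of the "add a UUID when you index the data" option, done on the client before the document is sent to Solr. The field names ("id", "catalogid", "name") and the map-based document are assumptions for illustration; generics are written without the diamond operator to stay Java 1.5-compatible.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.UUID;

// Sketch: assign a random (version 4) UUID to a document on the client side.
// The field names used here are hypothetical.
final class ClientSideIds {
    static Map<String, String> withGeneratedId(Map<String, String> fields) {
        // Copy so the caller's row is not modified.
        Map<String, String> doc = new LinkedHashMap<String, String>(fields);
        // Only generate an id if the source row did not already carry one.
        if (!doc.containsKey("id")) {
            doc.put("id", UUID.randomUUID().toString());
        }
        return doc;
    }

    public static void main(String[] args) {
        Map<String, String> row = new LinkedHashMap<String, String>();
        row.put("catalogid", "vendor-42");
        row.put("name", "Widget");
        System.out.println(withGeneratedId(row));
    }
}
```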

> Add a field that generates an unique id when you have none in your data to index
> --------------------------------------------------------------------------------
>
>                 Key: SOLR-308
>                 URL: https://issues.apache.org/jira/browse/SOLR-308
>             Project: Solr
>          Issue Type: New Feature
>          Components: search
>            Reporter: Thomas Peuss
>            Priority: Minor
>         Attachments: GeneratedId.patch
>
>
> This patch adds a field that generates an unique id when you have no unique id in your data you want to index.



[jira] Commented: (SOLR-308) Add a field that generates an unique id when you have none in your data to index

Posted by "Pieter Berkel (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12513891 ] 

Pieter Berkel commented on SOLR-308:
------------------------------------

From the use case you have provided, it sounds like the unique id will change every time you delete and re-insert the document.  If this is the case, then perhaps it might be more efficient to use the Lucene document id as your unique id value rather than a separate field?  However, as far as I'm aware, there currently isn't any way to access the Lucene doc id from Solr (except perhaps the Luke request handler)?


> Add a field that generates an unique id when you have none in your data to index
> --------------------------------------------------------------------------------
>
>                 Key: SOLR-308
>                 URL: https://issues.apache.org/jira/browse/SOLR-308
>             Project: Solr
>          Issue Type: New Feature
>          Components: search
>            Reporter: Thomas Peuss
>            Priority: Minor
>         Attachments: GeneratedId.patch
>
>
> This patch adds a field that generates an unique id when you have no unique id in your data you want to index.



[jira] Updated: (SOLR-308) Add a field that generates an unique id when you have none in your data to index

Posted by "Thomas Peuss (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/SOLR-308?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Thomas Peuss updated SOLR-308:
------------------------------

    Attachment: UUIDField.patch

Added the missing test class and re-added strict checking that the given value is indeed a valid UUID, so this now behaves like DateField.
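One way strict checking could be sketched, assuming "valid" means the value parses with java.util.UUID and re-serializes to the same canonical text. The round-trip comparison matters because UUID.fromString is lenient about some inputs (for example "1-1-1-1-1"). The class name is hypothetical and this is not the patch's code.

```java
import java.util.UUID;

// Sketch of strict UUID validation: parse, then require that the canonical
// form produced by UUID.toString() matches the input (ignoring case).
final class StrictUuidCheck {
    static boolean isCanonicalUuid(String value) {
        if (value == null) {
            return false;
        }
        try {
            UUID parsed = UUID.fromString(value);
            // toString() always emits lowercase 8-4-4-4-12 hex digits.
            return parsed.toString().equalsIgnoreCase(value);
        } catch (IllegalArgumentException e) {
            // Includes NumberFormatException for non-hex characters.
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(isCanonicalUuid("550e8400-e29b-41d4-a716-446655440000")); // true
        System.out.println(isCanonicalUuid("1-1-1-1-1")); // false: parses, but not canonical
    }
}
```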

> Add a field that generates an unique id when you have none in your data to index
> --------------------------------------------------------------------------------
>
>                 Key: SOLR-308
>                 URL: https://issues.apache.org/jira/browse/SOLR-308
>             Project: Solr
>          Issue Type: New Feature
>          Components: search
>            Reporter: Thomas Peuss
>            Priority: Minor
>         Attachments: UUIDField.patch, UUIDField.patch, UUIDField.patch, UUIDField.patch
>
>
> This patch adds a field that generates an unique id when you have no unique id in your data you want to index.



[jira] Commented: (SOLR-308) Add a field that generates an unique id when you have none in your data to index

Posted by "Hoss Man (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12516307 ] 

Hoss Man commented on SOLR-308:
-------------------------------

a few misc comments...

1)  ...val.startsWith("NEW")... seems like a bad idea; why not just val.equals("NEW")?

2) classes like IntField and DateField don't currently do strong parsing validation in the toInternal method, but this UUIDField class does ... should it?

3) should toObject be strongly typed to return UUID?

4) there shouldn't be new methods in the output writers for this field type ... output writers should only need to know about the most primitive types of data that should be viable regardless of the client language (i.e. string, int, float, date, list, etc.); the UUIDField should just write itself out as a string (using <str> in the XML response writer)
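A stand-alone sketch of what points 1, 3 and 4 could look like, written without Solr dependencies. The class and method names only mirror the FieldType vocabulary used above; this is an illustration under those assumptions, not Solr's actual UUIDField.

```java
import java.util.UUID;

// Hypothetical stand-alone sketch (not Solr's UUIDField) of review points 1, 3, 4.
final class UuidFieldSketch {
    private static final String NEW_MARKER = "NEW";

    // Point 1: only the exact value "NEW" triggers generation of a fresh id,
    // so inputs such as "NEWS" are left alone.
    static String toInternal(String val) {
        if (NEW_MARKER.equals(val)) {
            return UUID.randomUUID().toString();
        }
        return val;
    }

    // Point 3: the typed view of an indexed value is a java.util.UUID.
    static UUID toObject(String indexed) {
        return UUID.fromString(indexed);
    }

    // Point 4: response writers only ever see a plain string (e.g. <str> in XML).
    static String toExternal(String indexed) {
        return indexed;
    }

    public static void main(String[] args) {
        String id = toInternal("NEW");
        System.out.println(toObject(id).equals(UUID.fromString(toExternal(id)))); // true
        System.out.println(toInternal("NEWS")); // NEWS, not a generated id
    }
}
```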

> Add a field that generates an unique id when you have none in your data to index
> --------------------------------------------------------------------------------
>
>                 Key: SOLR-308
>                 URL: https://issues.apache.org/jira/browse/SOLR-308
>             Project: Solr
>          Issue Type: New Feature
>          Components: search
>            Reporter: Thomas Peuss
>            Priority: Minor
>         Attachments: UUIDField.patch, UUIDField.patch
>
>
> This patch adds a field that generates an unique id when you have no unique id in your data you want to index.



[jira] Commented: (SOLR-308) Add a field that generates an unique id when you have none in your data to index

Posted by "Thomas Peuss (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12513947 ] 

Thomas Peuss commented on SOLR-308:
-----------------------------------

That would be a good replacement for my problem. From the Lucene docs I see that the document id is 32 bits (int). I don't know if the docid "wraps around" when this address space is exhausted (I assume not). Or is the docid field recomputed on "optimize"?

I will try to add the functionality to see the document id in the response, so we can close this issue for now.

> Add a field that generates an unique id when you have none in your data to index
> --------------------------------------------------------------------------------
>
>                 Key: SOLR-308
>                 URL: https://issues.apache.org/jira/browse/SOLR-308
>             Project: Solr
>          Issue Type: New Feature
>          Components: search
>            Reporter: Thomas Peuss
>            Priority: Minor
>         Attachments: GeneratedId.patch
>
>
> This patch adds a field that generates an unique id when you have no unique id in your data you want to index.



[jira] Updated: (SOLR-308) Add a field that generates an unique id when you have none in your data to index

Posted by "Thomas Peuss (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/SOLR-308?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Thomas Peuss updated SOLR-308:
------------------------------

    Attachment: UUIDField.patch

An updated version of the patch. In the XML response the UUIDField is now rendered as <uuid>...</uuid>.

> Add a field that generates an unique id when you have none in your data to index
> --------------------------------------------------------------------------------
>
>                 Key: SOLR-308
>                 URL: https://issues.apache.org/jira/browse/SOLR-308
>             Project: Solr
>          Issue Type: New Feature
>          Components: search
>            Reporter: Thomas Peuss
>            Priority: Minor
>         Attachments: UUIDField.patch, UUIDField.patch
>
>
> This patch adds a field that generates an unique id when you have no unique id in your data you want to index.



[jira] Commented: (SOLR-308) Add a field that generates an unique id when you have none in your data to index

Posted by "rassen (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12595757#action_12595757 ] 

rassen commented on SOLR-308:
-----------------------------

I have a small question:
how do I use these files?

> Add a field that generates an unique id when you have none in your data to index
> --------------------------------------------------------------------------------
>
>                 Key: SOLR-308
>                 URL: https://issues.apache.org/jira/browse/SOLR-308
>             Project: Solr
>          Issue Type: New Feature
>          Components: search
>            Reporter: Thomas Peuss
>            Assignee: Hoss Man
>            Priority: Minor
>             Fix For: 1.3
>
>         Attachments: UUIDField.patch, UUIDField.patch, UUIDField.patch, UUIDField.patch, UUIDField.patch
>
>
> This patch adds a field that generates an unique id when you have no unique id in your data you want to index.



[jira] Updated: (SOLR-308) Add a field that generates an unique id when you have none in your data to index

Posted by "Thomas Peuss (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/SOLR-308?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Thomas Peuss updated SOLR-308:
------------------------------

    Attachment:     (was: GeneratedId.patch)

> Add a field that generates an unique id when you have none in your data to index
> --------------------------------------------------------------------------------
>
>                 Key: SOLR-308
>                 URL: https://issues.apache.org/jira/browse/SOLR-308
>             Project: Solr
>          Issue Type: New Feature
>          Components: search
>            Reporter: Thomas Peuss
>            Priority: Minor
>         Attachments: UUIDField.patch
>
>
> This patch adds a field that generates an unique id when you have no unique id in your data you want to index.



[jira] Commented: (SOLR-308) Add a field that generates an unique id when you have none in your data to index

Posted by "Hoss Man (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12516888 ] 

Hoss Man commented on SOLR-308:
-------------------------------

> BTW: The DateField does strong parsing of the input... It tries to convert the input value to
> the internal representation and throws a SolrException when that is not possible...

...no, not quite.  DateField.toInternal(String) only does a quick sanity check to see if the string ends in a Z; if it does, it *assumes* it's in the correct date format and does no parsing. If it does not end in a Z, then it does DateMathParsing (which may include parsing the date and throwing an exception if that can't be done) ... that parsing is only done if necessary for the date math.

that was my point: if the UUIDField class is going to index the UUID value using the original human-readable format, then there isn't really any reason to attempt to parse it, except as a form of validation. i was just raising the question of whether or not we think it should do that validation.




> Add a field that generates an unique id when you have none in your data to index
> --------------------------------------------------------------------------------
>
>                 Key: SOLR-308
>                 URL: https://issues.apache.org/jira/browse/SOLR-308
>             Project: Solr
>          Issue Type: New Feature
>          Components: search
>            Reporter: Thomas Peuss
>            Priority: Minor
>         Attachments: UUIDField.patch, UUIDField.patch, UUIDField.patch, UUIDField.patch
>
>
> This patch adds a field that generates an unique id when you have no unique id in your data you want to index.



[jira] Updated: (SOLR-308) Add a field that generates an unique id when you have none in your data to index

Posted by "Thomas Peuss (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/SOLR-308?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Thomas Peuss updated SOLR-308:
------------------------------

    Attachment: UUIDField.patch

Patch for a UUIDField and an associated test.

> Add a field that generates an unique id when you have none in your data to index
> --------------------------------------------------------------------------------
>
>                 Key: SOLR-308
>                 URL: https://issues.apache.org/jira/browse/SOLR-308
>             Project: Solr
>          Issue Type: New Feature
>          Components: search
>            Reporter: Thomas Peuss
>            Priority: Minor
>         Attachments: GeneratedId.patch, UUIDField.patch
>
>
> This patch adds a field that generates an unique id when you have no unique id in your data you want to index.



[jira] Commented: (SOLR-308) Add a field that generates an unique id when you have none in your data to index

Posted by "Thomas Peuss (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12662640#action_12662640 ] 

Thomas Peuss commented on SOLR-308:
-----------------------------------

Some documentation can be found here: http://lucene.apache.org/solr/api/org/apache/solr/schema/UUIDField.html

> Add a field that generates an unique id when you have none in your data to index
> --------------------------------------------------------------------------------
>
>                 Key: SOLR-308
>                 URL: https://issues.apache.org/jira/browse/SOLR-308
>             Project: Solr
>          Issue Type: New Feature
>          Components: search
>            Reporter: Thomas Peuss
>            Assignee: Hoss Man
>            Priority: Minor
>             Fix For: 1.3
>
>         Attachments: UUIDField.patch, UUIDField.patch, UUIDField.patch, UUIDField.patch, UUIDField.patch
>
>
> This patch adds a field that generates an unique id when you have no unique id in your data you want to index.



[jira] Commented: (SOLR-308) Add a field that generates an unique id when you have none in your data to index

Posted by "Otis Gospodnetic (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12662626#action_12662626 ] 

Otis Gospodnetic commented on SOLR-308:
---------------------------------------

Lance - anyone can add/modify a Wiki page.  Do you mind adding info about this field type?


> Add a field that generates an unique id when you have none in your data to index
> --------------------------------------------------------------------------------
>
>                 Key: SOLR-308
>                 URL: https://issues.apache.org/jira/browse/SOLR-308
>             Project: Solr
>          Issue Type: New Feature
>          Components: search
>            Reporter: Thomas Peuss
>            Assignee: Hoss Man
>            Priority: Minor
>             Fix For: 1.3
>
>         Attachments: UUIDField.patch, UUIDField.patch, UUIDField.patch, UUIDField.patch, UUIDField.patch
>
>
> This patch adds a field that generates an unique id when you have no unique id in your data you want to index.



[jira] Commented: (SOLR-308) Add a field that generates an unique id when you have none in your data to index

Posted by "Thomas Peuss (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12513817 ] 

Thomas Peuss commented on SOLR-308:
-----------------------------------

The use case is the following:
* We get catalog data from vendors.
* The only unique thing is the catalogid, which is of course the same for all rows in one catalog.
* In our webapp we request first only a few fields that are needed for the search result display.
* When the customer clicks on a product in the search result he gets a detailed page. To get the info from Solr we need a unique id to read the rest of the fields (50+). This id is generated by this code.

So you see we need this id only for reference. We do nothing more with it.

Maybe I overlooked something and this can be achieved with existing code. Any hint is welcome.
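The two-step lookup in this use case can be sketched as two request URLs: a lightweight search for the result list, then a fetch of all stored fields by the generated id. The base URL and the field names ("id", "name", "price") are hypothetical; q, fl and rows are standard Solr request parameters. No request is actually sent here.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Sketch of the two-step retrieval described above. Base URL and field
// names are assumptions for illustration only.
final class TwoStepLookup {
    static final String SELECT = "http://localhost:8983/solr/select";

    static String enc(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always available
        }
    }

    // Step 1: search, fetching only the fields needed for the result list.
    static String resultListUrl(String userQuery) {
        return SELECT + "?q=" + enc(userQuery) + "&fl=" + enc("id,name,price") + "&rows=10";
    }

    // Step 2: fetch all stored fields (50+) for the product the customer clicked.
    static String detailUrl(String id) {
        return SELECT + "?q=" + enc("id:\"" + id + "\"") + "&fl=" + enc("*");
    }

    public static void main(String[] args) {
        System.out.println(resultListUrl("red shoes"));
        System.out.println(detailUrl("550e8400-e29b-41d4-a716-446655440000"));
    }
}
```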

> Add a field that generates an unique id when you have none in your data to index
> --------------------------------------------------------------------------------
>
>                 Key: SOLR-308
>                 URL: https://issues.apache.org/jira/browse/SOLR-308
>             Project: Solr
>          Issue Type: New Feature
>          Components: search
>            Reporter: Thomas Peuss
>            Priority: Minor
>         Attachments: GeneratedId.patch
>
>
> This patch adds a field that generates an unique id when you have no unique id in your data you want to index.



[jira] Assigned: (SOLR-308) Add a field that generates an unique id when you have none in your data to index

Posted by "Hoss Man (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/SOLR-308?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hoss Man reassigned SOLR-308:
-----------------------------

    Assignee: Hoss Man

Thomas: I understand your concerns, but in the balance of performance vs. safety Solr tends to err on the side of performance when dealing with indexing data. Index data comes from a finite number of controlled sources (you may get it from dozens of places, but *you* must trust them at least a little and have the chance to sanitize their data before deciding to use it), while query inputs are treated much more delicately since they typically come from a much more diverse group of users, many of whom you may outright distrust.

that said, i went ahead and left in the remaining validation you had, although i had to replace the isEmpty() call (Solr still uses Java 1.5)

I also changed the toInternal methods to always lowercase whatever value they get (the hex values need to be case insensitive in case someone tries to query/update using a different case than was originally indexed)
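A minimal sketch of the normalization described above, assuming an empty value should be rejected (the exact validation kept in the commit is not shown here). String.isEmpty() only appeared in Java 6, so the check uses length(). The class name is hypothetical.

```java
import java.util.Locale;

// Sketch: lowercase every incoming value so the same UUID written in
// different letter cases maps to a single indexed term.
final class UuidNormalizer {
    static String toInternal(String val) {
        // length() == 0 instead of isEmpty(), which requires Java 6.
        if (val == null || val.length() == 0) {
            throw new IllegalArgumentException("empty UUID value");
        }
        // A fixed locale avoids locale-specific case rules; hex digits are ASCII.
        return val.toLowerCase(Locale.ENGLISH);
    }

    public static void main(String[] args) {
        String upper = toInternal("550E8400-E29B-41D4-A716-446655440000");
        String lower = toInternal("550e8400-e29b-41d4-a716-446655440000");
        System.out.println(upper.equals(lower)); // true
    }
}
```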

> Add a field that generates an unique id when you have none in your data to index
> --------------------------------------------------------------------------------
>
>                 Key: SOLR-308
>                 URL: https://issues.apache.org/jira/browse/SOLR-308
>             Project: Solr
>          Issue Type: New Feature
>          Components: search
>            Reporter: Thomas Peuss
>            Assignee: Hoss Man
>            Priority: Minor
>         Attachments: UUIDField.patch, UUIDField.patch, UUIDField.patch, UUIDField.patch, UUIDField.patch
>
>
> This patch adds a field that generates an unique id when you have no unique id in your data you want to index.



[jira] Commented: (SOLR-308) Add a field that generates an unique id when you have none in your data to index

Posted by "Ryan McKinley (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12595766#action_12595766 ] 

Ryan McKinley commented on SOLR-308:
------------------------------------

if you are using trunk (the nightly builds, not 1.2) it is included.

> Add a field that generates an unique id when you have none in your data to index
> --------------------------------------------------------------------------------
>
>                 Key: SOLR-308
>                 URL: https://issues.apache.org/jira/browse/SOLR-308
>             Project: Solr
>          Issue Type: New Feature
>          Components: search
>            Reporter: Thomas Peuss
>            Assignee: Hoss Man
>            Priority: Minor
>             Fix For: 1.3
>
>         Attachments: UUIDField.patch, UUIDField.patch, UUIDField.patch, UUIDField.patch, UUIDField.patch
>
>
> This patch adds a field that generates an unique id when you have no unique id in your data you want to index.



[jira] Commented: (SOLR-308) Add a field that generates an unique id when you have none in your data to index

Posted by "Otis Gospodnetic (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12513582 ] 

Otis Gospodnetic commented on SOLR-308:
---------------------------------------

What type does the id end up being after this?  String?


> Add a field that generates an unique id when you have none in your data to index
> --------------------------------------------------------------------------------
>
>                 Key: SOLR-308
>                 URL: https://issues.apache.org/jira/browse/SOLR-308
>             Project: Solr
>          Issue Type: New Feature
>          Components: search
>            Reporter: Thomas Peuss
>            Priority: Minor
>         Attachments: GeneratedId.patch
>
>
> This patch adds a field that generates an unique id when you have no unique id in your data you want to index.



[jira] Commented: (SOLR-308) Add a field that generates an unique id when you have none in your data to index

Posted by "Thomas Peuss (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12514154 ] 

Thomas Peuss commented on SOLR-308:
-----------------------------------

Hoss Man: I changed the code in the way you described. Thanks for your notes on that.

> Add a field that generates an unique id when you have none in your data to index
> --------------------------------------------------------------------------------
>
>                 Key: SOLR-308
>                 URL: https://issues.apache.org/jira/browse/SOLR-308
>             Project: Solr
>          Issue Type: New Feature
>          Components: search
>            Reporter: Thomas Peuss
>            Priority: Minor
>         Attachments: GeneratedId.patch
>
>
> This patch adds a field that generates an unique id when you have no unique id in your data you want to index.



[jira] Updated: (SOLR-308) Add a field that generates an unique id when you have none in your data to index

Posted by "Thomas Peuss (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/SOLR-308?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Thomas Peuss updated SOLR-308:
------------------------------

    Attachment: UUIDField.patch

Changes based on comments...

> Add a field that generates an unique id when you have none in your data to index
> --------------------------------------------------------------------------------
>
>                 Key: SOLR-308
>                 URL: https://issues.apache.org/jira/browse/SOLR-308
>             Project: Solr
>          Issue Type: New Feature
>          Components: search
>            Reporter: Thomas Peuss
>            Priority: Minor
>         Attachments: UUIDField.patch, UUIDField.patch, UUIDField.patch
>
>
> This patch adds a field that generates an unique id when you have no unique id in your data you want to index.



[jira] Commented: (SOLR-308) Add a field that generates an unique id when you have none in your data to index

Posted by "Thomas Peuss (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12513568 ] 

Thomas Peuss commented on SOLR-308:
-----------------------------------

Well, indirectly yes. It is viewable in the response when you store the field. We use this field because we mainly rely on 3rd-party data, over which we do not have much control.

> Add a field that generates an unique id when you have none in your data to index
> --------------------------------------------------------------------------------
>
>                 Key: SOLR-308
>                 URL: https://issues.apache.org/jira/browse/SOLR-308
>             Project: Solr
>          Issue Type: New Feature
>          Components: search
>            Reporter: Thomas Peuss
>            Priority: Minor
>         Attachments: GeneratedId.patch
>
>
> This patch adds a field that generates an unique id when you have no unique id in your data you want to index.



[jira] Resolved: (SOLR-308) Add a field that generates an unique id when you have none in your data to index

Posted by "Hoss Man (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/SOLR-308?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hoss Man resolved SOLR-308.
---------------------------

       Resolution: Fixed
    Fix Version/s: 1.3

Committed revision 569279.


> Add a field that generates an unique id when you have none in your data to index
> --------------------------------------------------------------------------------
>
>                 Key: SOLR-308
>                 URL: https://issues.apache.org/jira/browse/SOLR-308
>             Project: Solr
>          Issue Type: New Feature
>          Components: search
>            Reporter: Thomas Peuss
>            Assignee: Hoss Man
>            Priority: Minor
>             Fix For: 1.3
>
>         Attachments: UUIDField.patch, UUIDField.patch, UUIDField.patch, UUIDField.patch, UUIDField.patch
>
>
> This patch adds a field that generates an unique id when you have no unique id in your data you want to index.



[jira] Commented: (SOLR-308) Add a field that generates an unique id when you have none in your data to index

Posted by "Hoss Man (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12513661 ] 

Hoss Man commented on SOLR-308:
-------------------------------

i'm confused by this issue .. what's the need?

solr doesn't require that you have a uniqueKey field, so if there isn't a unique id for your data, why add one artificially?

> Add a field that generates an unique id when you have none in your data to index
> --------------------------------------------------------------------------------
>
>                 Key: SOLR-308
>                 URL: https://issues.apache.org/jira/browse/SOLR-308
>             Project: Solr
>          Issue Type: New Feature
>          Components: search
>            Reporter: Thomas Peuss
>            Priority: Minor
>         Attachments: GeneratedId.patch
>
>
> This patch adds a field that generates an unique id when you have no unique id in your data you want to index.



[jira] Updated: (SOLR-308) Add a field that generates an unique id when you have none in your data to index

Posted by "Thomas Peuss (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/SOLR-308?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Thomas Peuss updated SOLR-308:
------------------------------

    Attachment: UUIDField.patch

Changed the input validation to do only basic checking: we now only check whether the value looks like a UUID.
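For illustration, "basic" validation of this kind can be sketched as a pure shape check. The class and method names below are hypothetical (not taken from the attached patch), and the pattern only checks the 8-4-4-4-12 hex-digit layout, not the UUID version or variant bits.

```java
import java.util.regex.Pattern;

// Hypothetical helper for "basic" validation: checks only the textual
// 8-4-4-4-12 hex layout of a UUID, not its version or variant bits.
public class BasicUuidCheck {
    private static final Pattern UUID_SHAPE = Pattern.compile(
            "[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}");

    // Returns true when the whole value has the shape of a UUID.
    public static boolean looksLikeUuid(String value) {
        return value != null && UUID_SHAPE.matcher(value).matches();
    }

    public static void main(String[] args) {
        System.out.println(looksLikeUuid("550e8400-e29b-41d4-a716-446655440000")); // true
        System.out.println(looksLikeUuid("NEW")); // false
    }
}
```

A shape check is cheap, but it accepts any hex string of the right layout, which is the trade-off discussed in the following comments.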

> Add a field that generates an unique id when you have none in your data to index
> --------------------------------------------------------------------------------
>
>                 Key: SOLR-308
>                 URL: https://issues.apache.org/jira/browse/SOLR-308
>             Project: Solr
>          Issue Type: New Feature
>          Components: search
>            Reporter: Thomas Peuss
>            Priority: Minor
>         Attachments: UUIDField.patch, UUIDField.patch, UUIDField.patch, UUIDField.patch, UUIDField.patch
>
>
> This patch adds a field that generates an unique id when you have no unique id in your data you want to index.



[jira] Commented: (SOLR-308) Add a field that generates an unique id when you have none in your data to index

Posted by "Thomas Peuss (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12516979 ] 

Thomas Peuss commented on SOLR-308:
-----------------------------------

I personally would prefer strong input checking. This avoids problems at search time. Better that we find the problem at index time than that the customer finds it at search time... ;-) Maybe I am a bit paranoid here. But we get content from many suppliers and the quality is often not that good (commas instead of dots as the decimal separator in floats, sometimes even changing from row to row of the catalogue).

> Add a field that generates an unique id when you have none in your data to index
> --------------------------------------------------------------------------------
>
>                 Key: SOLR-308
>                 URL: https://issues.apache.org/jira/browse/SOLR-308
>             Project: Solr
>          Issue Type: New Feature
>          Components: search
>            Reporter: Thomas Peuss
>            Priority: Minor
>         Attachments: UUIDField.patch, UUIDField.patch, UUIDField.patch, UUIDField.patch, UUIDField.patch
>
>
> This patch adds a field that generates an unique id when you have no unique id in your data you want to index.



[jira] Commented: (SOLR-308) Add a field that generates an unique id when you have none in your data to index

Posted by "Thomas Peuss (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12516638 ] 

Thomas Peuss commented on SOLR-308:
-----------------------------------

BTW: The DateField does strong parsing of the input... It tries to convert the input value to the internal representation and throws a SolrException when that is not possible...
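A minimal sketch of the "strong parsing" alternative, in the spirit of converting the input to an internal representation and failing loudly. It is not the patch's code: it uses java.util.UUID and throws IllegalArgumentException where Solr code would throw a SolrException, and the round-trip comparison is an added assumption meant to reject non-canonical strings that UUID.fromString may accept on some JDKs.

```java
import java.util.Locale;
import java.util.UUID;

// Hypothetical "strong parsing" helper: converts the input to java.util.UUID
// and rejects anything that does not parse or is not in canonical form.
public class StrictUuidParser {
    public static UUID parseStrict(String value) {
        if (value == null) {
            throw new IllegalArgumentException("UUID value is null");
        }
        UUID uuid;
        try {
            uuid = UUID.fromString(value);
        } catch (IllegalArgumentException e) {
            throw new IllegalArgumentException("Not a UUID: " + value, e);
        }
        // UUID.fromString can be lenient (e.g. "1-1-1-1-1"), so require that
        // the parsed value prints back as the same text, ignoring case.
        if (!uuid.toString().equals(value.toLowerCase(Locale.ROOT))) {
            throw new IllegalArgumentException("Not a canonical UUID: " + value);
        }
        return uuid;
    }

    public static void main(String[] args) {
        System.out.println(parseStrict("550E8400-E29B-41D4-A716-446655440000"));
        try {
            parseStrict("1-1-1-1-1");
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

Compared with a shape check, this rejects bad input at index time, at the cost of a parse per value.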

> Add a field that generates an unique id when you have none in your data to index
> --------------------------------------------------------------------------------
>
>                 Key: SOLR-308
>                 URL: https://issues.apache.org/jira/browse/SOLR-308
>             Project: Solr
>          Issue Type: New Feature
>          Components: search
>            Reporter: Thomas Peuss
>            Priority: Minor
>         Attachments: UUIDField.patch, UUIDField.patch, UUIDField.patch
>
>
> This patch adds a field that generates an unique id when you have no unique id in your data you want to index.



[jira] Commented: (SOLR-308) Add a field that generates an unique id when you have none in your data to index

Posted by "Ryan McKinley (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12513702 ] 

Ryan McKinley commented on SOLR-308:
------------------------------------

If I'm following correctly, this is a FieldType that generates a UUID regardless of the input value:

	public Field createField(SchemaField field, String externalVal, float boost) {
		// We ignore the external value and have our own
		return super.createField(field, UUID.randomUUID().toString(), boost);
	}

What is a use case for that? 

If you are looking for something like the sql auto increment, it might be a good candidate for the newfangled 'UpdateRequestProcessor': it could check whether the input document has a uniqueKey and, if not, add one and return the new value in the response.
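The UpdateRequestProcessor idea can be illustrated with a simplified, hypothetical model: documents are plain Map objects and the "response" is just a list of generated ids. None of these names are Solr's actual UpdateRequestProcessor API; the sketch only shows the check-and-assign logic.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.UUID;

// Hypothetical stand-in for an update processor, not Solr's API:
// assigns a UUID to documents lacking a unique key and remembers it.
public class AssignIdProcessor {
    private final String uniqueKeyField;
    private final List<String> generatedIds = new ArrayList<>();

    public AssignIdProcessor(String uniqueKeyField) {
        this.uniqueKeyField = uniqueKeyField;
    }

    // Adds a random UUID under the unique key if the document has none,
    // and records the new value so it could be returned to the client.
    public Map<String, Object> process(Map<String, Object> doc) {
        Object existing = doc.get(uniqueKeyField);
        if (existing == null || existing.toString().isEmpty()) {
            String id = UUID.randomUUID().toString();
            doc.put(uniqueKeyField, id);
            generatedIds.add(id);
        }
        return doc;
    }

    public List<String> getGeneratedIds() {
        return generatedIds;
    }

    public static void main(String[] args) {
        AssignIdProcessor p = new AssignIdProcessor("id");
        Map<String, Object> withoutId = new HashMap<>();
        withoutId.put("catalogId", "cat42");
        p.process(withoutId);
        Map<String, Object> withId = new HashMap<>();
        withId.put("id", "known-id");
        p.process(withId);
        System.out.println(p.getGeneratedIds().size()); // 1
    }
}
```

Collecting the generated ids is what would let the client learn the id of a document it just added.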

> Add a field that generates an unique id when you have none in your data to index
> --------------------------------------------------------------------------------
>
>                 Key: SOLR-308
>                 URL: https://issues.apache.org/jira/browse/SOLR-308
>             Project: Solr
>          Issue Type: New Feature
>          Components: search
>            Reporter: Thomas Peuss
>            Priority: Minor
>         Attachments: GeneratedId.patch
>
>
> This patch adds a field that generates an unique id when you have no unique id in your data you want to index.



[jira] Commented: (SOLR-308) Add a field that generates an unique id when you have none in your data to index

Posted by "Erik Hatcher (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12513508 ] 

Erik Hatcher commented on SOLR-308:
-----------------------------------

Can the client get the generated id back when adding a document?

> Add a field that generates an unique id when you have none in your data to index
> --------------------------------------------------------------------------------
>
>                 Key: SOLR-308
>                 URL: https://issues.apache.org/jira/browse/SOLR-308
>             Project: Solr
>          Issue Type: New Feature
>          Components: search
>            Reporter: Thomas Peuss
>            Priority: Minor
>         Attachments: GeneratedId.patch
>
>
> This patch adds a field that generates an unique id when you have no unique id in your data you want to index.



[jira] Commented: (SOLR-308) Add a field that generates an unique id when you have none in your data to index

Posted by "Thomas Peuss (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12516368 ] 

Thomas Peuss commented on SOLR-308:
-----------------------------------

1.) I'll change it.
2.) I'll remove the check. I understand that it has a performance impact.
3.) I changed it to what DateField and IntField do.
4.) I'll remove that as well.

If we don't do strong parsing, we should call this IDField instead of UUIDField. If we don't enforce that this is a UUID, we shouldn't name it that way. What do you think?

> Add a field that generates an unique id when you have none in your data to index
> --------------------------------------------------------------------------------
>
>                 Key: SOLR-308
>                 URL: https://issues.apache.org/jira/browse/SOLR-308
>             Project: Solr
>          Issue Type: New Feature
>          Components: search
>            Reporter: Thomas Peuss
>            Priority: Minor
>         Attachments: UUIDField.patch, UUIDField.patch
>
>
> This patch adds a field that generates an unique id when you have no unique id in your data you want to index.



[jira] Updated: (SOLR-308) Add a field that generates an unique id when you have none in your data to index

Posted by "Thomas Peuss (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/SOLR-308?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Thomas Peuss updated SOLR-308:
------------------------------

    Attachment: GeneratedId.patch

> Add a field that generates an unique id when you have none in your data to index
> --------------------------------------------------------------------------------
>
>                 Key: SOLR-308
>                 URL: https://issues.apache.org/jira/browse/SOLR-308
>             Project: Solr
>          Issue Type: New Feature
>          Components: search
>            Reporter: Thomas Peuss
>            Priority: Minor
>         Attachments: GeneratedId.patch
>
>
> This patch adds a field that generates an unique id when you have no unique id in your data you want to index.



[jira] Commented: (SOLR-308) Add a field that generates an unique id when you have none in your data to index

Posted by "Yonik Seeley (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12513959 ] 

Yonik Seeley commented on SOLR-308:
-----------------------------------

Lucene docids are transient (they change when the index changes); they should not be used across different instances of an IndexReader.

> Add a field that generates an unique id when you have none in your data to index
> --------------------------------------------------------------------------------
>
>                 Key: SOLR-308
>                 URL: https://issues.apache.org/jira/browse/SOLR-308
>             Project: Solr
>          Issue Type: New Feature
>          Components: search
>            Reporter: Thomas Peuss
>            Priority: Minor
>         Attachments: GeneratedId.patch
>
>
> This patch adds a field that generates an unique id when you have no unique id in your data you want to index.



[jira] Commented: (SOLR-308) Add a field that generates an unique id when you have none in your data to index

Posted by "Hoss Man (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12513972 ] 

Hoss Man commented on SOLR-308:
-------------------------------

I understood your data entry/delete reindexing strategy, but i hadn't considered the use case of doing a query, and then issuing a followup query to get more details about specific items.

As yonik points out, exposing the internal lucene docid would be a bad idea since it may change every time an IndexReader is opened: even if the doc you are interested in is still in the index (ie: hasn't been deleted), other deletions may have changed its internal id.
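A toy model of why a positional id is unsafe as a long-lived reference: here "internal ids" are positions in a list that is compacted when a deleted document is purged, loosely mimicking what segment merging can do. This is a simplification, not Lucene's actual implementation; the stored id strings stay the same while positions shift.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Toy model only: positions in a compacted list stand in for internal docids.
public class TransientDocIdDemo {
    // Returns the target's position after `deleted` is purged from a copy of `index`.
    static int positionAfterDeleting(List<String> index, String deleted, String target) {
        List<String> compacted = new ArrayList<>(index);
        compacted.remove(deleted); // remove(Object): drops the deleted document
        return compacted.indexOf(target);
    }

    public static void main(String[] args) {
        List<String> index = Arrays.asList("uuid-a", "uuid-b", "uuid-c");
        int before = index.indexOf("uuid-c");
        int after = positionAfterDeleting(index, "uuid-a", "uuid-c");
        System.out.println("uuid-c: position " + before + " -> " + after); // uuid-c: position 2 -> 1
    }
}
```

The document "uuid-c" was never touched, yet its position changed; only the stored id remains a reliable handle.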

i have no objection to adding a FieldType that can generate a UUID on demand for use cases like this, but having it ignore the input seems a little sketchy to me. it seems like a better approach would be a UUIDFieldType with a toInternal() method that tests its input for some marker token (like "NEW" or "*"): if it sees that token, it generates a new UUID; otherwise it uses the literal value. then you can configure the id field with a defaultValue of "NEW" in the schema, and any doc without an id will get a unique one, while someone updating an existing doc whose id they already know will still be able to do so.
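A sketch of the marker-token behaviour proposed above, assuming "NEW" as the marker. UuidFieldSketch and its static toInternal() are illustrative only and are not Solr's FieldType API; the sketch isolates the decision logic.

```java
import java.util.UUID;

// Hypothetical sketch of the proposed marker-token behaviour.
public class UuidFieldSketch {
    static final String NEW_MARKER = "NEW";

    // Marker token -> freshly generated UUID; anything else is kept as-is,
    // so documents whose id is already known can still be updated.
    public static String toInternal(String value) {
        if (NEW_MARKER.equals(value)) {
            return UUID.randomUUID().toString();
        }
        return value;
    }

    public static void main(String[] args) {
        System.out.println(toInternal("NEW")); // a random UUID, different on each call
        System.out.println(toInternal("550e8400-e29b-41d4-a716-446655440000")); // unchanged
    }
}
```

Combined with a schema default of "NEW", a document sent without an id would reach this method with the marker and receive a fresh UUID.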

> Add a field that generates an unique id when you have none in your data to index
> --------------------------------------------------------------------------------
>
>                 Key: SOLR-308
>                 URL: https://issues.apache.org/jira/browse/SOLR-308
>             Project: Solr
>          Issue Type: New Feature
>          Components: search
>            Reporter: Thomas Peuss
>            Priority: Minor
>         Attachments: GeneratedId.patch
>
>
> This patch adds a field that generates an unique id when you have no unique id in your data you want to index.



[jira] Issue Comment Edited: (SOLR-308) Add a field that generates an unique id when you have none in your data to index

Posted by "Thomas Peuss (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12513817 ] 

Thomas Peuss edited comment on SOLR-308 at 7/18/07 11:04 PM:
-------------------------------------------------------------

The use case is the following:
* We get catalog data from vendors (300+). We have no control over the data.
* The only unique thing is the catalogid, which is of course the same for all rows in one catalog.
* In our webapp we request first only a few fields that are needed for the search result display.
* When the customer clicks on a product in the search result he gets a detailed page. To get the info from Solr we need a unique id to read the rest of the fields (50+). This id is generated by this code.
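The two-step lookup in the list above can be sketched as two Solr select requests built with the standard q and fl parameters. The base URL and the field names (name, price) are assumptions for illustration; the class is hypothetical and only builds URL strings.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Hypothetical helper that builds the two request URLs.
public class TwoPhaseQuery {
    static final String SELECT_URL = "http://localhost:8983/solr/select"; // assumed location

    static String enc(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always supported
        }
    }

    // Phase 1: search, fetching only the fields needed for the result list.
    static String searchUrl(String userQuery) {
        return SELECT_URL + "?q=" + enc(userQuery) + "&fl=" + enc("id,name,price");
    }

    // Phase 2: fetch every stored field of the one document that was clicked.
    static String detailUrl(String id) {
        return SELECT_URL + "?q=" + enc("id:\"" + id + "\"") + "&fl=" + enc("*");
    }

    public static void main(String[] args) {
        System.out.println(searchUrl("ipod"));
        System.out.println(detailUrl("550e8400-e29b-41d4-a716-446655440000"));
    }
}
```

The second request only works if the id is stable between the two calls, which is why a stored unique field is needed rather than a position in the result list.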

Of course we could add the unique id in a preprocessing step but we wanted to achieve this with Solr alone.

The update procedure goes like this:
* Delete all documents with a specific catalogId
* Insert the updated catalog data
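The delete-then-reinsert cycle can be sketched with Solr's XML update messages (delete by query, add, commit). CatalogReload and its methods are hypothetical helpers; the sketch assumes catalog ids are plain tokens without query syntax and that the id field is a uuid type with default="NEW", so re-added rows omit it and receive fresh ids.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical builder for the XML update messages of one catalog reload.
public class CatalogReload {
    static String xmlEscape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;")
                .replace(">", "&gt;").replace("\"", "&quot;");
    }

    // Step 1: remove every document that came from this catalog
    // (assumes catalogId is a plain token, so no query escaping is done).
    static String deleteCatalog(String catalogId) {
        return "<delete><query>catalogId:" + xmlEscape(catalogId) + "</query></delete>";
    }

    // Step 2: re-add the rows; no "id" field is sent, relying on the
    // assumed default="NEW" uuid field to assign one per document.
    static String addRows(List<Map<String, String>> rows) {
        StringBuilder sb = new StringBuilder("<add>");
        for (Map<String, String> row : rows) {
            sb.append("<doc>");
            for (Map.Entry<String, String> f : row.entrySet()) {
                sb.append("<field name=\"").append(xmlEscape(f.getKey())).append("\">")
                  .append(xmlEscape(f.getValue())).append("</field>");
            }
            sb.append("</doc>");
        }
        return sb.append("</add>").toString();
    }

    public static void main(String[] args) {
        Map<String, String> row = new LinkedHashMap<>();
        row.put("catalogId", "cat42");
        row.put("name", "Widget & Co");
        List<Map<String, String>> rows = new ArrayList<>();
        rows.add(row);
        System.out.println(deleteCatalog("cat42"));
        System.out.println(addRows(rows));
        System.out.println("<commit/>"); // make both changes visible to searchers
    }
}
```

Note that with generated ids, each reload gives the re-added documents new ids, so ids handed out before the reload no longer match.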

So you see we need this id to find the exact same document we have in the search result. We do nothing more with it.

Maybe I overlooked something and this can be achieved with existing code. Any hint is welcome.


 was:
The use case is the following:
* We get catalog data from vendors.
* The only unique thing is the catalogid, which is of course the same for all rows in one catalog.
* In our webapp we request first only a few fields that are needed for the search result display.
* When the customer clicks on a product in the search result he gets a detailed page. To get the info from Solr we need a unique id to read the rest of the fields (50+). This id is generated by this code.

So you see we need this id only for reference. We do nothing more with it.

Maybe I overlooked something and this can be achieved with existing code. Any hint is welcome.

> Add a field that generates an unique id when you have none in your data to index
> --------------------------------------------------------------------------------
>
>                 Key: SOLR-308
>                 URL: https://issues.apache.org/jira/browse/SOLR-308
>             Project: Solr
>          Issue Type: New Feature
>          Components: search
>            Reporter: Thomas Peuss
>            Priority: Minor
>         Attachments: GeneratedId.patch
>
>
> This patch adds a field that generates an unique id when you have no unique id in your data you want to index.



[jira] Commented: (SOLR-308) Add a field that generates an unique id when you have none in your data to index

Posted by "Thomas Peuss (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12662654#action_12662654 ] 

Thomas Peuss commented on SOLR-308:
-----------------------------------

Fields are defined by:

<fieldType name="uuid" class="solr.UUIDField" indexed="true" />

and used by

<field name="id" type="uuid" indexed="true" stored="true" default="NEW"/>



> Add a field that generates an unique id when you have none in your data to index
> --------------------------------------------------------------------------------
>
>                 Key: SOLR-308
>                 URL: https://issues.apache.org/jira/browse/SOLR-308
>             Project: Solr
>          Issue Type: New Feature
>          Components: search
>            Reporter: Thomas Peuss
>            Assignee: Hoss Man
>            Priority: Minor
>             Fix For: 1.3
>
>         Attachments: UUIDField.patch, UUIDField.patch, UUIDField.patch, UUIDField.patch, UUIDField.patch
>
>
> This patch adds a field that generates an unique id when you have no unique id in your data you want to index.
