Posted to solr-dev@lucene.apache.org by "Cameron Pope (JIRA)" <ji...@apache.org> on 2008/12/04 18:04:44 UTC

[jira] Created: (SOLR-895) DataImportHandler does not import multiple documents specified in db-data-config.xml

DataImportHandler does not import multiple documents specified in db-data-config.xml
------------------------------------------------------------------------------------

                 Key: SOLR-895
                 URL: https://issues.apache.org/jira/browse/SOLR-895
             Project: Solr
          Issue Type: Bug
          Components: contrib - DataImportHandler
    Affects Versions: 1.3, 1.3.1, 1.4
            Reporter: Cameron Pope


In our system we have multiple kinds of items that need to be indexed. In the database, they are represented as 'one table per concrete class'. We are using the DataImportHandler to automatically create an index from our database. The db-data-config.xml file that we are using contains two 'Document' elements: one for each class of item that we are indexing.

Expected behavior: the DataImportHandler imports items for each 'Document' tag defined in the configuration file
Actual behavior: the DataImportHandler stops importing after it completes indexing of the first document

I am attaching a patch with a unit test that verifies the correct behavior. It should apply against the trunk without problems. I can also supply a patch against the 1.3 branch if you would like.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (SOLR-895) DataImportHandler does not import multiple documents specified in db-data-config.xml

Posted by "Cameron Pope (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/SOLR-895?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Cameron Pope updated SOLR-895:
------------------------------

    Attachment: import-multiple-documents.patch

This is a patch to DataImporter that causes it to import all documents defined in the config file. There is also a unit test to verify correct behavior. It should apply against the svn trunk without any problems.


[jira] Commented: (SOLR-895) DataImportHandler does not import multiple documents specified in db-data-config.xml

Posted by "Shalin Shekhar Mangar (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-895?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12655570#action_12655570 ] 

Shalin Shekhar Mangar commented on SOLR-895:
--------------------------------------------

Support for multiple documents is a legacy design that we are trying to remove altogether. Initially, DataImportHandler was an external stand-alone server that wrote documents to Solr, so it allowed multiple <document> elements in order to write to multiple cores. Once it was integrated into Solr itself, it made sense to move away from that design.

Always use a single document, with multiple root entities if needed. If you have multiple cores, each core should have its own DataImportHandler configuration.
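
For illustration, here is a minimal sketch of what a single <document> with two root entities might look like for a 'one table per concrete class' schema. The table names (book, movie), the column and field names, and the JDBC settings are all hypothetical and are not taken from the reporter's configuration. In the DataImportHandler versions I am aware of, entities placed directly under <document> are treated as root entities by default, so each row of each query should become its own Solr document.

```xml
<dataConfig>
  <!-- Hypothetical JDBC settings: substitute your own driver, URL, and credentials. -->
  <dataSource driver="org.hsqldb.jdbcDriver"
              url="jdbc:hsqldb:/tmp/example/ex"
              user="sa" />
  <!-- A single <document> element, as recommended above. -->
  <document>
    <!-- First root entity: one Solr document per row of the (hypothetical) book table. -->
    <entity name="book" query="SELECT id, title FROM book">
      <field column="id" name="id" />
      <field column="title" name="title" />
    </entity>
    <!-- Second root entity: one Solr document per row of the (hypothetical) movie table. -->
    <entity name="movie" query="SELECT id, title FROM movie">
      <field column="id" name="id" />
      <field column="title" name="title" />
    </entity>
  </document>
</dataConfig>
```

One thing to watch with this layout: if the two tables use overlapping primary key values, the rows would collide on the index's unique key, so the queries may need to produce ids that are distinct across tables.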


[jira] Commented: (SOLR-895) DataImportHandler does not import multiple documents specified in db-data-config.xml

Posted by "Cameron Pope (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-895?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12653396#action_12653396 ] 

Cameron Pope commented on SOLR-895:
-----------------------------------

I tried moving both root entities under the same <document> element and specifying 'docRoot="true"' for both of them and that appears to work. Thanks.

Since I am new to Solr, please forgive me for logging what is probably not a bug at all. Is specifying multiple 'root' entities the envisioned way to solve this problem, or is it a workaround? I am just curious and trying to gain a better understanding of the design (I noticed that parts of the DataImporter assume multiple <Document> elements and other parts assume only one). If it is the envisioned approach, I'd be happy to update the wiki to include it. I imagine I am not the only one with a database schema like this who wants to create an index with Solr.

All in all, I have been hugely impressed with Solr and the DataImportHandler - both are incredible pieces of work. Thanks!


[jira] Resolved: (SOLR-895) DataImportHandler does not import multiple documents specified in db-data-config.xml

Posted by "Shalin Shekhar Mangar (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/SOLR-895?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shalin Shekhar Mangar resolved SOLR-895.
----------------------------------------

    Resolution: Won't Fix

Marking this as won't fix per the discussions.


[jira] Commented: (SOLR-895) DataImportHandler does not import multiple documents specified in db-data-config.xml

Posted by "Noble Paul (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-895?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12653366#action_12653366 ] 

Noble Paul commented on SOLR-895:
---------------------------------

Why can't this be achieved using multiple root entities under the same document?
