Posted to solr-user@lucene.apache.org by Xavier Rodriguez <xe...@gmail.com> on 2010/07/06 17:19:25 UTC

Adding new elements to index

Hi,

I have Solr installed on a Tomcat application server. This Solr instance
has some data indexed from a Postgres database. Now I need to add some
entities from an Oracle database. When I run the full-import command, the
only documents indexed are the ones from Postgres. In fact, if I have 200
rows indexed from postgres and 100 rows from Oracle, the full-import process
only indexes 200 documents from oracle, although it shows clearly that the
query returned 300 rows.

I'm not doing a delta-import, simply a full import. I've tried cleaning the
index, reloading the configuration, and manually removing
dataimport.properties, because it's the only metadata I found. Is there any
other file I should check or modify to get all 300 rows indexed?

Of course, I tried searching for one of the Oracle fields, with no results.

Thanks a lot,

Xavier Rodriguez.

Re: Adding new elements to index

Posted by Govind Kanshi <go...@gmail.com>.
Just for testing purposes, I would:
1. Use curl to create new docs.
2. Use SolrJ to go to the individual DBs and collect the docs.
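
As a rough sketch of the first suggestion, a single test document can be posted to Solr's XML update handler with curl. The message below assumes the schema fields are named identificador and Nom (the names used in the DIH config posted in this thread); the values are invented for illustration.

```xml
<!-- Hypothetical test document. Field names come from the DIH config
     in this thread; the values are made up. -->
<add>
  <doc>
    <field name="identificador">s_123</field>
    <field name="Nom">test carrer</field>
  </doc>
</add>
```

The add has to be followed by a `<commit/>` message before the document becomes searchable. If a document posted this way can then be found, that suggests the schema accepts ids of this form and the problem is more likely in the import configuration.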




Re: Adding new elements to index

Posted by Alexey Serba <as...@gmail.com>.
1) Shouldn't you put your "entity" elements under a "document" tag, i.e.
<dataConfig>
  <dataSource ... />
  <dataSource ... />

  <document name="docs">
    <entity ...>...</entity>
    <entity ...>...</entity>
  </document>
</dataConfig>

2) What happens if you run full-import with an explicitly
specified "entity" GET parameter?
command=full-import&entity=carrers
command=full-import&entity=hidrants
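
Applied to the configuration posted in this thread, the first suggestion might look like the sketch below. The data source and entity definitions are copied from that configuration; wrapping them in dataConfig and document is the only change, and whether that alone fixes the import is not confirmed here.

```xml
<dataConfig>
  <!-- Data sources copied unchanged from the posted config. -->
  <dataSource type="JdbcDataSource" name="ds_ora"
              driver="oracle.jdbc.OracleDriver"
              url="jdbc:oracle:thin:@xxx.xxx.xxx.xxx:1521:SID"
              user="user" password="password" />
  <dataSource type="JdbcDataSource" name="ds_pg"
              driver="org.postgresql.Driver"
              url="jdbc:postgresql://xxx.xxx.xxx.yyy:5432/sid"
              user="user" password="password" />

  <!-- Both root entities now sit inside one document element. -->
  <document name="docs">
    <entity name="carrers" dataSource="ds_ora"
            query="select 's_'||id as id_carrer, 'a' as tooltip from imi_carrers">
      <field column="id_carrer" name="identificador" />
      <field column="tooltip" name="Nom" />
    </entity>
    <entity name="hidrants" dataSource="ds_pg"
            query="select 'h_'||id as id_hidrant, parc as tooltip from hidrants">
      <field column="id_hidrant" name="identificador" />
      <field column="tooltip" name="Nom" />
    </entity>
  </document>
</dataConfig>
```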



Re: Adding new elements to index

Posted by Erick Erickson <er...@gmail.com>.
Hmmm, let's see your schema definitions, please. I'm suspicious because
you've implied that you do use a unique key. If it's required, note that your
queries don't select it under the same name (i.e. you select it as
id_carrer in one and as id_hidrant in the other). So if id_hidrant were
defined as your unique key AND it is a required field, that would account
for it.
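
For reference, a unique key declaration in schema.xml might look like the fragment below. This is only a sketch: it assumes the key is the identificador field that both entities map to in the posted DIH config, and the field types are guesses, since the actual schema has not been shown.

```xml
<!-- Fragment of a hypothetical schema.xml; only the parts relevant
     to the unique key are shown, and a "string" field type is assumed
     to be defined elsewhere in the schema. -->
<fields>
  <!-- Assumed: both DIH entities map their id column to this field. -->
  <field name="identificador" type="string" indexed="true" stored="true" required="true" />
  <field name="Nom" type="string" indexed="true" stored="true" />
</fields>

<!-- Documents sharing the same identificador value replace each other. -->
<uniqueKey>identificador</uniqueKey>
```

With a declaration like this, a row that arrives without an identificador value would normally be rejected, and two rows with the same value would end up as a single document.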

What do your solr logs say?

HTH
Erick





Re: Adding new elements to index

Posted by Xavier Rodriguez <xe...@gmail.com>.
Thanks for the quick reply!

In fact it was a typo: the 200 documents I got were from Postgres. I meant
to say that the full-import was omitting the 100 Oracle rows.

When I run the full import, I run it as a single job, using the URL
command=full-import. I've tried clearing the index both with the clean
command and by manually deleting it, but when I run the full-import, the
number of indexed documents is the number of documents coming from Postgres.

To be sure that the id field is unique, I build the id by putting a letter
prefix before the id value. When indexed, the id looks like s_123, which is
id 123 for an entity identified as "s". Other entities use different
prefixes, but never "s".

I used DIH to index the data. My configuration is the following:

File db-data-config.xml

<dataSource
    type="JdbcDataSource"
    name="ds_ora"
    driver="oracle.jdbc.OracleDriver"
    url="jdbc:oracle:thin:@xxx.xxx.xxx.xxx:1521:SID"
    user="user"
    password="password"
/>

<dataSource
    type="JdbcDataSource"
    name="ds_pg"
    driver="org.postgresql.Driver"
    url="jdbc:postgresql://xxx.xxx.xxx.yyy:5432/sid"
    user="user"
    password="password"
/>

<entity name="carrers" dataSource="ds_ora"
    query="select 's_'||id as id_carrer, 'a' as tooltip from imi_carrers">
    <field column="id_carrer" name="identificador" />
    <field column="tooltip" name="Nom" />
</entity>

<entity name="hidrants" dataSource="ds_pg"
    query="select 'h_'||id as id_hidrant, parc as tooltip from hidrants">
    <field column="id_hidrant" name="identificador" />
    <field column="tooltip" name="Nom" />
</entity>

----------

In that configuration, all the fields coming from ds_pg are indexed, and the
fields coming from ds_ora are not. As I've said, what is strange to me is
that no error is logged in Tomcat, and the number of documents created is
the number of rows returned by "hidrants", while the number of rows fetched
is the sum of the rows from "hidrants" and "carrers".

Thanks in advance.

Xavi.








Re: Adding new elements to index

Posted by Erick Erickson <er...@gmail.com>.
First, do you have a unique key defined in your schema.xml? If you
do, some of those 300 rows could be replacing earlier rows.

You say: "if I have 200
rows indexed from postgres and 100 rows from Oracle, the full-import process
only indexes 200 documents from oracle, although it shows clearly that the
query returned 300 rows."

That really looks like a typo: if you have 100 rows from Oracle, how
did you get 200 rows from Oracle?

Are you perhaps doing this in two different jobs and deleting the
first import before running the second?

And if this is irrelevant, could you provide more details, like how you're
indexing things (I'm assuming DIH, but you don't state that anywhere)?
If it *is* DIH, providing that configuration would help.

Best
Erick
