Posted to user@manifoldcf.apache.org by Ro...@cognizant.com on 2010/06/10 10:52:40 UTC

Other document data.

I am using a JDBC connection to search for documents in the database.

The issue is that some document data (check-in date, etc.) is present in other columns. How do I send this data to Solr so that it gets indexed?

Why is the URL of the file used as the ID in Solr?

Thanks & Regards,

Rohan G Patil

Cognizant Programmer Analyst Trainee, Bangalore || Mob # +91 9535577001

Rohan.GPatil@cognizant.com

 




RE: Other document data.

Posted by Ro...@cognizant.com.
Hi,
 
Okay, so we can do something like this:

select idfield as $(IDCOLUMN), urlfield as $(URLCOLUMN), datafield as $(DATACOLUMN), createddate as $(DATECOLUMN), author as $(AUTHORCOLUMN) from documenttable where idfield IN $(IDLIST)
 
Thanks & Regards
-Rohan G Patil

________________________________

From: karl.wright@nokia.com [mailto:karl.wright@nokia.com]
Sent: Sun 6/13/2010 7:03 AM
To: connectors-user@incubator.apache.org
Subject: RE: Other document data.



No.  The data query (the same one that returns the blob info) can now include additional columns.  These columns will be sent to Solr as metadata fields.

Karl

________________________________________
From: ext Rohan.GPatil@cognizant.com [Rohan.GPatil@cognizant.com]
Sent: Friday, June 11, 2010 2:28 AM
To: connectors-user@incubator.apache.org
Subject: RE: Other document data.

Hi,

I see that the issue is resolved.

Now, is there a new query wherein we can specify the metadata fields?

Thanks & Regards,
Rohan G Patil
Cognizant  Programmer Analyst Trainee,Bangalore || Mob # +91 9535577001
Rohan.GPatil@cognizant.com


-----Original Message-----
From: karl.wright@nokia.com [mailto:karl.wright@nokia.com]
Sent: Thursday, June 10, 2010 4:12 PM
To: connectors-user@incubator.apache.org
Subject: RE: Other document data.

It is not possible to properly glom other fields onto a BLOB unless you know that the blob's contents are always encoded text.  So I suggest you create a jira enhancement request in the Lucene Connector Framework project to describe this enhancement (adding metadata support to JDBC connector).

The URL is: http://issues.apache.org/jira

You may need to create an account if you don't already have one.  Let me know if you have any difficulties.

Thanks,
Karl


-----Original Message-----
From: ext Rohan.GPatil@cognizant.com [mailto:Rohan.GPatil@cognizant.com]
Sent: Thursday, June 10, 2010 6:39 AM
To: connectors-user@incubator.apache.org
Subject: RE: Other document data.


Hi,

Using solution 1 is not a bad idea, but the problem is that the content is stored as a BLOB in the database, and gluing other fields onto a BLOB is not possible (is it?).

Regarding 2: yes, I guess I can make that modification; anyway, it all depends on how we show it to the user.

Thanks & Regards,
Rohan G Patil
Cognizant  Programmer Analyst Trainee,Bangalore || Mob # +91 9535577001
Rohan.GPatil@cognizant.com

-----Original Message-----
From: karl.wright@nokia.com [mailto:karl.wright@nokia.com]
Sent: Thursday, June 10, 2010 3:19 PM
To: connectors-user@incubator.apache.org
Subject: RE: Other document data.

(1) The JDBC connector is relatively primitive and has no support for "document metadata" at this time.  You can, of course, glom together multiple fields into the content field with it, but that's pretty crude.
(2) The LCF convention for identifying documents uniquely in the target index is to use the URL of the document.  All documents indexed with LCF have such a URL, and it is likely to be both useful and unique.  This URL is how LCF requests deletion of the document from the index, if necessary, and also how it overwrites the document.  So it maps pretty precisely to literal.id for the basic Solr setup.  Now, it may be that this is too tied to the example, and that the Solr connector should have a configuration setting that allows the name of the id field to be changed - that sounds like a reasonable modification that would not be too difficult to do.  Is this something you are looking for?
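
For reference, which field Solr treats as the unique key comes from the Solr schema; the example schema.xml that ships with Solr declares roughly the following (a sketch, not the exact file):

  <field name="id" type="string" indexed="true" stored="true" required="true"/>
  <uniqueKey>id</uniqueKey>

literal.id is what populates that field when posting through /update/extract.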

Karl

RE: Other document data.

Posted by ka...@nokia.com.
Checked in a fix for this problem.  I don't think anyone ever tried time fields with the JDBC connector before.

Thanks,
Karl
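
Until that fix is picked up, one possible workaround (a sketch only - to_char and the column names are illustrative and depend on your database) is to cast the date to a string in the data query itself, so the connector never sees a raw time value:

  select idfield as $(IDCOLUMN), urlfield as $(URLCOLUMN), datafield as $(DATACOLUMN),
         to_char(createddate, 'YYYY-MM-DD HH24:MI:SS') as createddate
  from documenttable where idfield IN $(IDLIST)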

From: ext Rohan.GPatil@cognizant.com [mailto:Rohan.GPatil@cognizant.com]
Sent: Wednesday, June 16, 2010 1:27 PM
To: connectors-user@incubator.apache.org
Subject: RE: Other document data.


Hi Karl!

Thanks! The Metadata for Solr is working! :)

The metadata is getting appended, but there is one small problem.

The date metadata is coming through like this:

org.apache.lcf.core.interfaces.TimeMarker@1d61ee4

Thanks & Regards,
Rohan G Patil
Cognizant  Programmer Analyst Trainee,Bangalore || Mob # +91 9535577001
Rohan.GPatil@cognizant.com


-----Original Message-----
From: karl.wright@nokia.com [mailto:karl.wright@nokia.com]
Sent: Wednesday, June 16, 2010 5:46 PM
To: connectors-user@incubator.apache.org
Subject: RE: Other document data.

Do your LCF processes have write access to D:\LCF_App\LCF_Sync?

There are a number of Windows bugs having to do with file locking, and it is possible you are tripping over one of them.
I suggest the following:

- shut down tomcat and the AgentRun process
- see if you can delete the file D:\LCF_App\LCF_Sync\737\563\lock-_Cache_JOBSTATUSES.lock
- if you can't delete it, reboot your system
- erase everything under D:\LCF_App\LCF_Sync
- start tomcat and the AgentRun process again
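
For steps 2 and 4, from a Windows command prompt that would be something like the following (paths copied from the log above; recreating the directory afterward is an assumption):

  del "D:\LCF_App\LCF_Sync\737\563\lock-_Cache_JOBSTATUSES.lock"
  rmdir /s /q "D:\LCF_App\LCF_Sync"
  mkdir "D:\LCF_App\LCF_Sync"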

It's exactly this kind of problem that is driving me to want to use something like zookeeper for cross-process lock management, rather than the file system.

Karl

-----Original Message-----
From: ext Rohan.GPatil@cognizant.com [mailto:Rohan.GPatil@cognizant.com]
Sent: Wednesday, June 16, 2010 7:56 AM
To: connectors-user@incubator.apache.org
Subject: RE: Other document data.


Hi Karl!

I am currently checking it out. Meanwhile, I am getting this error:

[2010-06-15 19:03:28,996]WARN  Attempt to set file lock 'D:\LCF_App\LCF_Sync\737\563\lock-_Cache_JOBSTATUSES.lock' failed: Access is denied
java.io.IOException: Access is denied
        at java.io.WinNTFileSystem.createFileExclusively(Native Method)
        at java.io.File.createNewFile(File.java:883)
        at org.apache.lcf.core.lockmanager.LockObject.grabFileLock(LockObject.java:550)
        at org.apache.lcf.core.lockmanager.LockObject.leaveReadLock(LockObject.java:489)
        at org.apache.lcf.core.lockmanager.LockManager.leaveReadLock(LockManager.java:752)
        at org.apache.lcf.core.lockmanager.LockManager.leaveLocks(LockManager.java:1216)
        at org.apache.lcf.core.cachemanager.CacheManager.leaveCache(CacheManager.java:660)
        at org.apache.lcf.core.cachemanager.CacheManager.findObjectsAndExecute(CacheManager.java:179)
        at org.apache.lcf.core.database.Database.executeQuery(Database.java:167)
        at org.apache.lcf.core.database.DBInterfacePostgreSQL.performQuery(DBInterfacePostgreSQL.java:774)
        at org.apache.lcf.core.database.BaseTable.performQuery(BaseTable.java:244)
        at org.apache.lcf.crawler.jobs.Jobs.deletingJobsPresent(Jobs.java:1753)
        at org.apache.lcf.crawler.jobs.JobManager.getNextDeletableDocuments(JobManager.java:807)
        at org.apache.lcf.crawler.system.DocumentDeleteStufferThread.run(DocumentDeleteStufferThread.java:103)
[2010-06-16 06:29:38,668]ERROR Couldn't write to lock file; disk may be full.  Shutting down process; locks may be left dangling.  You must cleanup before restarting.
java.io.FileNotFoundException: D:\LCF_App\LCF_Sync\737\563\lock-_Cache_JOBSTATUSES.file (Access is denied)
        at java.io.FileOutputStream.open(Native Method)
        at java.io.FileOutputStream.<init>(FileOutputStream.java:179)
        at java.io.FileOutputStream.<init>(FileOutputStream.java:131)
        at java.io.FileWriter.<init>(FileWriter.java:73)
        at org.apache.lcf.core.lockmanager.LockObject.writeFile(LockObject.java:732)
        at org.apache.lcf.core.lockmanager.LockObject.enterReadLockNoWait(LockObject.java:449)
        at org.apache.lcf.core.lockmanager.LockObject.enterReadLock(LockObject.java:401)
        at org.apache.lcf.core.lockmanager.LockManager.enterLocks(LockManager.java:924)
        at org.apache.lcf.core.cachemanager.CacheManager.enterCache(CacheManager.java:278)
        at org.apache.lcf.core.cachemanager.CacheManager.findObjectsAndExecute(CacheManager.java:98)
        at org.apache.lcf.core.database.Database.executeQuery(Database.java:167)
        at org.apache.lcf.core.database.DBInterfacePostgreSQL.performQuery(DBInterfacePostgreSQL.java:774)
        at org.apache.lcf.core.database.BaseTable.performQuery(BaseTable.java:244)
        at org.apache.lcf.crawler.jobs.Jobs.activeJobsPresent(Jobs.java:1770)
        at org.apache.lcf.crawler.jobs.JobManager.getNextDocuments(JobManager.java:1490)
        at org.apache.lcf.crawler.system.StufferThread.run(StufferThread.java:157)


What could be the problem?

Thanks & Regards,
Rohan G Patil
Cognizant  Programmer Analyst Trainee,Bangalore || Mob # +91 9535577001
Rohan.GPatil@cognizant.com


-----Original Message-----
From: karl.wright@nokia.com [mailto:karl.wright@nokia.com]
Sent: Wednesday, June 16, 2010 2:34 PM
To: connectors-user@incubator.apache.org
Subject: RE: Other document data.

A fix was checked in yesterday for the missing "literal." in all metadata field names.  Is this fix working for you?

Karl

________________________________________
From: ext Rohan.GPatil@cognizant.com [Rohan.GPatil@cognizant.com]
Sent: Tuesday, June 15, 2010 7:04 AM
To: connectors-user@incubator.apache.org
Subject: RE: Other document data.

Hi,

Yes, the metadata is sent to Solr, but it is being sent like this:

webapp=/solr path=/update/extract params={fmap.content=text&DDOCAUTHOR=sysadmin&uprefix=attr_&literal.id=/idc/groups/public/documents/sunil_layout/mfe001251.xml&lowernames=true&captureAttr=true}

As you can see, DDOCAUTHOR, which is a custom metadata field, is sent in the params part.

But to add custom data as metadata in Solr, we should pass it in the literal parameter along with the file.

So, there will be a small change in the code.

Instead of sending the bare column name in the params, we should send

literal.<<columnname>> (column names in lower case)
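
Applied to the request logged above, the params would then come out like this (a sketch of the expected form):

  webapp=/solr path=/update/extract params={fmap.content=text&literal.ddocauthor=sysadmin&uprefix=attr_&literal.id=/idc/groups/public/documents/sunil_layout/mfe001251.xml&lowernames=true&captureAttr=true}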


Thanks & Regards,
Rohan G Patil
Cognizant  Programmer Analyst Trainee,Bangalore || Mob # +91 9535577001
Rohan.GPatil@cognizant.com


-----Original Message-----
From: karl.wright@nokia.com [mailto:karl.wright@nokia.com]
Sent: Tuesday, June 15, 2010 4:03 PM
To: connectors-user@incubator.apache.org
Subject: RE: Other document data.

That's hard to determine without looking at the Solr logs.  I am not familiar with the log options available, but unless I'm mistaken the default configuration dumps every request to standard out.

Karl
________________________________________
From: ext Rohan.GPatil@cognizant.com [Rohan.GPatil@cognizant.com]
Sent: Tuesday, June 15, 2010 5:23 AM
To: connectors-user@incubator.apache.org
Subject: RE: Other document data.

Hi,

Yes, I get it. Thanks for the clarification.

I was doing a similar thing before and it used to run; now it doesn't, so I got confused.

Is there any way to check whether the metadata is actually sent to Solr? I am experiencing some problem there and can't figure out where it is going wrong.


Thanks & Regards,
Rohan G Patil
Cognizant  Programmer Analyst Trainee,Bangalore || Mob # +91 9535577001
Rohan.GPatil@cognizant.com

-----Original Message-----
From: karl.wright@nokia.com [mailto:karl.wright@nokia.com]
Sent: Tuesday, June 15, 2010 2:13 PM
To: connectors-user@incubator.apache.org
Subject: RE: Other document data.

LCF is an incremental crawler.  The version query is used to determine whether data needs to be refetched and reindexed.  If it returns the same thing each time the document is examined, the data query will not be run the second time.  I therefore suggest one of the following:

(1) Supply no version query at all.  That signals to the connector that there is no version information and the data must be reindexed on every job run.
(2) Supply a version query that properly reflects changes to the data.  For instance, if there's a timestamp in each record, you can use that by itself ONLY if any metadata changes are also associated with a change in that timestamp.  If not, you will need to glom the metadata into the version string as well as the timestamp (see the sketch below).  Is this understood?
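
For example, assuming the version query follows the same $(...) substitution convention as the connector's other queries, option (2) with a timestamp plus glommed-in metadata might look like this (a sketch; the column names are illustrative):

  select idfield as $(IDCOLUMN),
         lastmodified || '/' || author as $(VERSIONCOLUMN)
  from documenttable where idfield IN $(IDLIST)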

If you want to FORCE a reindex, there is a link in the crawler-ui for the output connection which allows you to force reindexing of all data associated with that connection.

If this still doesn't seem to describe what you are seeing, please clarify further.

Thanks,
Karl

________________________________________
From: ext Rohan.GPatil@cognizant.com [Rohan.GPatil@cognizant.com]
Sent: Tuesday, June 15, 2010 12:51 AM
To: connectors-user@incubator.apache.org
Subject: RE: Other document data.

Hi,

When we specify the metadata content, it runs fine the first time; the second time, it doesn't run the data query at all. What could be the problem?

Thanks & Regards,
Rohan G Patil
Cognizant  Programmer Analyst Trainee,Bangalore || Mob # +91 9535577001
Rohan.GPatil@cognizant.com






RE: Other document data.

Posted by Ro...@cognizant.com.
Hi Karl!

Thanks! The Metadata for Solr is working! :)

The metadata is getting appended. But there is one small problem

The date metadata is getting appended like this.

org.apache.lcf.core.interfaces.TimeMarker@1d61ee4

Thanks & Regards,
Rohan G Patil
Cognizant  Programmer Analyst Trainee,Bangalore || Mob # +91 9535577001
Rohan.GPatil@cognizant.com


-----Original Message-----
From: karl.wright@nokia.com [mailto:karl.wright@nokia.com]
Sent: Wednesday, June 16, 2010 5:46 PM
To: connectors-user@incubator.apache.org
Subject: RE: Other document data.

Do your LCF processes have write access to D:\LCF_App\LCF_Sync?

There are a number of Windows bugs having to do with file locking, and it is possible you are tripping over one of them.
I suggest the following:

- shut down tomcat and the AgentRun process
- see if you can delete the file D:\LCF_App\LCF_Sync\737\563\lock-_Cache_JOBSTATUSES.lock
- if you can't delete it, reboot your system
- erase everything under D:\LCF_App\LCF_Sync
- start tomcat and the AgentRun process again

It's exactly this kind of problem that is driving me to want to use something like zookeeper for cross-process lock management, rather than the file system.

Karl

-----Original Message-----
From: ext Rohan.GPatil@cognizant.com [mailto:Rohan.GPatil@cognizant.com]
Sent: Wednesday, June 16, 2010 7:56 AM
To: connectors-user@incubator.apache.org
Subject: RE: Other document data.


Hi Karl!

I am currently checking it out. And meanwhile I am getting this error

[2010-06-15 19:03:28,996]WARN  Attempt to set file lock 'D:\LCF_App\LCF_Sync\737\563\lock-_Cache_JOBSTATUSES.lock' failed: Access is denied
java.io.IOException: Access is denied
        at java.io.WinNTFileSystem.createFileExclusively(Native Method)
        at java.io.File.createNewFile(File.java:883)
        at org.apache.lcf.core.lockmanager.LockObject.grabFileLock(LockObject.java:550)
        at org.apache.lcf.core.lockmanager.LockObject.leaveReadLock(LockObject.java:489)
        at org.apache.lcf.core.lockmanager.LockManager.leaveReadLock(LockManager.java:752)
        at org.apache.lcf.core.lockmanager.LockManager.leaveLocks(LockManager.java:1216)
        at org.apache.lcf.core.cachemanager.CacheManager.leaveCache(CacheManager.java:660)
        at org.apache.lcf.core.cachemanager.CacheManager.findObjectsAndExecute(CacheManager.java:179)
        at org.apache.lcf.core.database.Database.executeQuery(Database.java:167)
        at org.apache.lcf.core.database.DBInterfacePostgreSQL.performQuery(DBInterfacePostgreSQL.java:774)
        at org.apache.lcf.core.database.BaseTable.performQuery(BaseTable.java:244)
        at org.apache.lcf.crawler.jobs.Jobs.deletingJobsPresent(Jobs.java:1753)
        at org.apache.lcf.crawler.jobs.JobManager.getNextDeletableDocuments(JobManager.java:807)
        at org.apache.lcf.crawler.system.DocumentDeleteStufferThread.run(DocumentDeleteStufferThread.java:103)
[2010-06-16 06:29:38,668]ERROR Couldn't write to lock file; disk may be full.  Shutting down process; locks may be left dangling.  You must cleanup before restarting.
java.io.FileNotFoundException: D:\LCF_App\LCF_Sync\737\563\lock-_Cache_JOBSTATUSES.file (Access is denied)
        at java.io.FileOutputStream.open(Native Method)
        at java.io.FileOutputStream.<init>(FileOutputStream.java:179)
        at java.io.FileOutputStream.<init>(FileOutputStream.java:131)
        at java.io.FileWriter.<init>(FileWriter.java:73)
        at org.apache.lcf.core.lockmanager.LockObject.writeFile(LockObject.java:732)
        at org.apache.lcf.core.lockmanager.LockObject.enterReadLockNoWait(LockObject.java:449)
        at org.apache.lcf.core.lockmanager.LockObject.enterReadLock(LockObject.java:401)
        at org.apache.lcf.core.lockmanager.LockManager.enterLocks(LockManager.java:924)
        at org.apache.lcf.core.cachemanager.CacheManager.enterCache(CacheManager.java:278)
        at org.apache.lcf.core.cachemanager.CacheManager.findObjectsAndExecute(CacheManager.java:98)
        at org.apache.lcf.core.database.Database.executeQuery(Database.java:167)
        at org.apache.lcf.core.database.DBInterfacePostgreSQL.performQuery(DBInterfacePostgreSQL.java:774)
        at org.apache.lcf.core.database.BaseTable.performQuery(BaseTable.java:244)
        at org.apache.lcf.crawler.jobs.Jobs.activeJobsPresent(Jobs.java:1770)
        at org.apache.lcf.crawler.jobs.JobManager.getNextDocuments(JobManager.java:1490)
        at org.apache.lcf.crawler.system.StufferThread.run(StufferThread.java:157)


What must be the problem ?

Thanks & Regards,
Rohan G Patil
Cognizant  Programmer Analyst Trainee,Bangalore || Mob # +91 9535577001
Rohan.GPatil@cognizant.com


-----Original Message-----
From: karl.wright@nokia.com [mailto:karl.wright@nokia.com]
Sent: Wednesday, June 16, 2010 2:34 PM
To: connectors-user@incubator.apache.org
Subject: RE: Other document data.

A fix was checked in yesterday for the missing "literal." in all metadata field names.  Is this fix working for you?

Karl

________________________________________
From: ext Rohan.GPatil@cognizant.com [Rohan.GPatil@cognizant.com]
Sent: Tuesday, June 15, 2010 7:04 AM
To: connectors-user@incubator.apache.org
Subject: RE: Other document data.

Hi,

Yes the metadata is sent to solr. But it is being sent like this.

webapp=/solr path=/update/extract params={fmap.content=text&DDOCAUTHOR=sysadmin&uprefix=attr_&literal.id=/idc/groups/public/documents/sunil_layout/mfe001251.xml&lowernames=true&captureAttr=true}

as you can see DDOCAUTHOR is sent in the params part which is a custom metadata.

But to add custom data as metadata in Solr we should pass in the literal parameter along with the file.

So, there will be a small change in the code.

Instead of sending the column name in the params, we should send

literal.<<columnname>> (columnnames in small case)


Thanks & Regards,
Rohan G Patil
Cognizant  Programmer Analyst Trainee,Bangalore || Mob # +91 9535577001
Rohan.GPatil@cognizant.com


-----Original Message-----
From: karl.wright@nokia.com [mailto:karl.wright@nokia.com]
Sent: Tuesday, June 15, 2010 4:03 PM
To: connectors-user@incubator.apache.org
Subject: RE: Other document data.

That's hard to determine without looking at the solr logs.  I am not familiar with the log options available, but unless I'm mistaken the default configuration should be dumping every request to standard out.

Karl
________________________________________
From: ext Rohan.GPatil@cognizant.com [Rohan.GPatil@cognizant.com]
Sent: Tuesday, June 15, 2010 5:23 AM
To: connectors-user@incubator.apache.org
Subject: RE: Other document data.

Hi,

Yes I get it. Thanks for the clarification.

I was doing the similar thing before and it used to run, now it didn't. So I got confused.

Is there any way to check if metadata is actually sent to solr? Because I am experiencing some problem there and I don't seem to figure out where it is going wrong.


Thanks & Regards,
Rohan G Patil
Cognizant  Programmer Analyst Trainee,Bangalore || Mob # +91 9535577001
Rohan.GPatil@cognizant.com

-----Original Message-----
From: karl.wright@nokia.com [mailto:karl.wright@nokia.com]
Sent: Tuesday, June 15, 2010 2:13 PM
To: connectors-user@incubator.apache.org
Subject: RE: Other document data.

LCF is an incremental crawler.  The version query is used to determine whether data needs to be refetched and reindexed.  If it returns the same thing each time the document is examined, the data query will not be run the second time.  I therefore suggest either the following:

(1) Supply no version query at all.  That signals to the connector that there is no version information and the data must be reindexed on every job run.
(2) Supply a version query that properly reflects changes to the data.  For instance, if there's a timestamp in each record, you can use that by itself ONLY if any metadata changes also are associated with a change in that timestamp.  If not, you will need to glom the metadata into the version string as well as the timestamp.  Is this understood?

If you want to FORCE a reindex, there is a link in the crawler-ui for the output connection which allows you to force reindexing of all data associated with that connection.

If this still doesn't seem to describe what you are seeing, please clarify further.

Thanks,
Karl

________________________________________
From: ext Rohan.GPatil@cognizant.com [Rohan.GPatil@cognizant.com]
Sent: Tuesday, June 15, 2010 12:51 AM
To: connectors-user@incubator.apache.org
Subject: RE: Other document data.

Hi,

When we specify the metadata content, It runs fine the first time, The second time it doesn't run the data query at all. What must be the problem ?

Thanks & Regards,
Rohan G Patil
Cognizant  Programmer Analyst Trainee,Bangalore || Mob # +91 9535577001
Rohan.GPatil@cognizant.com


-----Original Message-----
From: karl.wright@nokia.com [mailto:karl.wright@nokia.com]
Sent: Sunday, June 13, 2010 7:04 AM
To: connectors-user@incubator.apache.org
Subject: RE: Other document data.

No.  The data query (the same one that returns the blob info) can now include additional columns.  These columns will be sent to Solr as metadata fields.

Karl

________________________________________
From: ext Rohan.GPatil@cognizant.com [Rohan.GPatil@cognizant.com]
Sent: Friday, June 11, 2010 2:28 AM
To: connectors-user@incubator.apache.org
Subject: RE: Other document data.

Hi,

I see that the issue is resolved.

Now is there a new query where in we can specify the metadata fields ?

Thanks & Regards,
Rohan G Patil
Cognizant  Programmer Analyst Trainee,Bangalore || Mob # +91 9535577001
Rohan.GPatil@cognizant.com


-----Original Message-----
From: karl.wright@nokia.com [mailto:karl.wright@nokia.com]
Sent: Thursday, June 10, 2010 4:12 PM
To: connectors-user@incubator.apache.org
Subject: RE: Other document data.

It is not possible to properly glom other fields onto a BLOB unless you know that the blob's contents are always encoded text.  So I suggest you create a jira enhancement request in the Lucene Connector Framework project to describe this enhancement (adding metadata support to JDBC connector).

The url is: http://issues.apache.org/jira

You may need to create an account if you don't already have one.  Let me know if you have any difficulties.

Thanks,
Karl


-----Original Message-----
From: ext Rohan.GPatil@cognizant.com [mailto:Rohan.GPatil@cognizant.com]
Sent: Thursday, June 10, 2010 6:39 AM
To: connectors-user@incubator.apache.org
Subject: RE: Other document data.


Hi,

Using solution 1 was not a bad idea, but the problem is the content is stored as BLOB in the database and gluing other fields with BLOB is not possible (Is it ?) .

Regarding 2 : Yes I guess I can do that modification, and anyway it all depends on how we show it to the user.

Thanks & Regards,
Rohan G Patil
Cognizant  Programmer Analyst Trainee,Bangalore || Mob # +91 9535577001
Rohan.GPatil@cognizant.com

-----Original Message-----
From: karl.wright@nokia.com [mailto:karl.wright@nokia.com]
Sent: Thursday, June 10, 2010 3:19 PM
To: connectors-user@incubator.apache.org
Subject: RE: Other document data.

(1) The JDBC connector is currently relatively primitive and does not have any support for "document metadata" at this time.  You can, of course, glom together multiple fields into the content field with it, but that's pretty crude.
(2) The LCF convention for how to identify documents uniquely in the target index is to use the URL of the document.  All documents indexed with LCF have such a URL and it is likely to be both useful and unique.  This url is how LCF requests deletion of the document from the index, if necessary, and also overwrites the document.  So it maps pretty precisely to literal.id for the basic solr setup.  Now, it may be that this is too tied to the example, and that the solr connector should have a configuration setting to allow the name of the id field used to be changed - that sounds like a reasonable modification that would not be too difficult to do.  Is this something you are looking for?

Karl
________________________________________
From: ext Rohan.GPatil@cognizant.com [Rohan.GPatil@cognizant.com]
Sent: Thursday, June 10, 2010 4:52 AM
To: connectors-user@incubator.apache.org
Subject: Other document data.

I am using JDBC connection to search for the documents in the database.

The issue is  some document data(Check in date etc) is present in the other columns. How to send this data to Solr so as to index it.

Why is the URL of the file taken as ID in Solr.

Thanks & Regards,
Rohan G Patil
Cognizant  Programmer Analyst Trainee,Bangalore || Mob # +91 9535577001
Rohan.GPatil@cognizant.com<ma...@cognizant.com>

This e-mail and any files transmitted with it are for the sole use of
the intended recipient(s) and may contain confidential and privileged
information.
If you are not the intended recipient, please contact the sender by
reply e-mail and destroy all copies of the original message.
Any unauthorized review, use, disclosure, dissemination, forwarding,
printing or copying of this email or any action taken in reliance on this
e-mail is strictly prohibited and may be unlawful.




This e-mail and any files transmitted with it are for the sole use of
the intended recipient(s) and may contain confidential and privileged
information.
If you are not the intended recipient, please contact the sender by
reply e-mail and destroy all copies of the original message.
Any unauthorized review, use, disclosure, dissemination, forwarding,
printing or copying of this email or any action taken in reliance on this
e-mail is strictly prohibited and may be unlawful.



This e-mail and any files transmitted with it are for the sole use of
the intended recipient(s) and may contain confidential and privileged
information.
If you are not the intended recipient, please contact the sender by
reply e-mail and destroy all copies of the original message.
Any unauthorized review, use, disclosure, dissemination, forwarding,
printing or copying of this email or any action taken in reliance on this
e-mail is strictly prohibited and may be unlawful.



This e-mail and any files transmitted with it are for the sole use of
the intended recipient(s) and may contain confidential and privileged
information.
If you are not the intended recipient, please contact the sender by
reply e-mail and destroy all copies of the original message.
Any unauthorized review, use, disclosure, dissemination, forwarding,
printing or copying of this email or any action taken in reliance on this
e-mail is strictly prohibited and may be unlawful.



This e-mail and any files transmitted with it are for the sole use of
the intended recipient(s) and may contain confidential and privileged
information.
If you are not the intended recipient, please contact the sender by
reply e-mail and destroy all copies of the original message.
Any unauthorized review, use, disclosure, dissemination, forwarding,
printing or copying of this email or any action taken in reliance on this
e-mail is strictly prohibited and may be unlawful.



This e-mail and any files transmitted with it are for the sole use of
the intended recipient(s) and may contain confidential and privileged
information.
If you are not the intended recipient, please contact the sender by
reply e-mail and destroy all copies of the original message.
Any unauthorized review, use, disclosure, dissemination, forwarding,
printing or copying of this email or any action taken in reliance on this
e-mail is strictly prohibited and may be unlawful.



This e-mail and any files transmitted with it are for the sole use of
the intended recipient(s) and may contain confidential and privileged
information.
If you are not the intended recipient, please contact the sender by
reply e-mail and destroy all copies of the original message.
Any unauthorized review, use, disclosure, dissemination, forwarding,
printing or copying of this email or any action taken in reliance on this
e-mail is strictly prohibited and may be unlawful.






This e-mail and any files transmitted with it are for the sole use of 
the intended recipient(s) and may contain confidential and privileged 
information.
If you are not the intended recipient, please contact the sender by 
reply e-mail and destroy all copies of the original message.
Any unauthorized review, use, disclosure, dissemination, forwarding, 
printing or copying of this email or any action taken in reliance on this 
e-mail is strictly prohibited and may be unlawful.

  

RE: Other document data.

Posted by ka...@nokia.com.
Do your LCF processes have write access to D:\LCF_App\LCF_Sync?

There are a number of Windows bugs having to do with file locking, and it is possible you are tripping over one of them.
I suggest the following:

- shut down tomcat and the AgentRun process
- see if you can delete the file D:\LCF_App\LCF_Sync\737\563\lock-_Cache_JOBSTATUSES.lock
- if you can't delete it, reboot your system
- erase everything under D:\LCF_App\LCF_Sync
- start tomcat and the AgentRun process again

It's exactly this kind of problem that is driving me to want to use something like zookeeper for cross-process lock management, rather than the file system.

Karl

-----Original Message-----
From: ext Rohan.GPatil@cognizant.com [mailto:Rohan.GPatil@cognizant.com]
Sent: Wednesday, June 16, 2010 7:56 AM
To: connectors-user@incubator.apache.org
Subject: RE: Other document data.


Hi Karl!

I am currently checking it out. And meanwhile I am getting this error

[2010-06-15 19:03:28,996]WARN  Attempt to set file lock 'D:\LCF_App\LCF_Sync\737\563\lock-_Cache_JOBSTATUSES.lock' failed: Access is denied
java.io.IOException: Access is denied
        at java.io.WinNTFileSystem.createFileExclusively(Native Method)
        at java.io.File.createNewFile(File.java:883)
        at org.apache.lcf.core.lockmanager.LockObject.grabFileLock(LockObject.java:550)
        at org.apache.lcf.core.lockmanager.LockObject.leaveReadLock(LockObject.java:489)
        at org.apache.lcf.core.lockmanager.LockManager.leaveReadLock(LockManager.java:752)
        at org.apache.lcf.core.lockmanager.LockManager.leaveLocks(LockManager.java:1216)
        at org.apache.lcf.core.cachemanager.CacheManager.leaveCache(CacheManager.java:660)
        at org.apache.lcf.core.cachemanager.CacheManager.findObjectsAndExecute(CacheManager.java:179)
        at org.apache.lcf.core.database.Database.executeQuery(Database.java:167)
        at org.apache.lcf.core.database.DBInterfacePostgreSQL.performQuery(DBInterfacePostgreSQL.java:774)
        at org.apache.lcf.core.database.BaseTable.performQuery(BaseTable.java:244)
        at org.apache.lcf.crawler.jobs.Jobs.deletingJobsPresent(Jobs.java:1753)
        at org.apache.lcf.crawler.jobs.JobManager.getNextDeletableDocuments(JobManager.java:807)
        at org.apache.lcf.crawler.system.DocumentDeleteStufferThread.run(DocumentDeleteStufferThread.java:103)
[2010-06-16 06:29:38,668]ERROR Couldn't write to lock file; disk may be full.  Shutting down process; locks may be left dangling.  You must cleanup before restarting.
java.io.FileNotFoundException: D:\LCF_App\LCF_Sync\737\563\lock-_Cache_JOBSTATUSES.file (Access is denied)
        at java.io.FileOutputStream.open(Native Method)
        at java.io.FileOutputStream.<init>(FileOutputStream.java:179)
        at java.io.FileOutputStream.<init>(FileOutputStream.java:131)
        at java.io.FileWriter.<init>(FileWriter.java:73)
        at org.apache.lcf.core.lockmanager.LockObject.writeFile(LockObject.java:732)
        at org.apache.lcf.core.lockmanager.LockObject.enterReadLockNoWait(LockObject.java:449)
        at org.apache.lcf.core.lockmanager.LockObject.enterReadLock(LockObject.java:401)
        at org.apache.lcf.core.lockmanager.LockManager.enterLocks(LockManager.java:924)
        at org.apache.lcf.core.cachemanager.CacheManager.enterCache(CacheManager.java:278)
        at org.apache.lcf.core.cachemanager.CacheManager.findObjectsAndExecute(CacheManager.java:98)
        at org.apache.lcf.core.database.Database.executeQuery(Database.java:167)
        at org.apache.lcf.core.database.DBInterfacePostgreSQL.performQuery(DBInterfacePostgreSQL.java:774)
        at org.apache.lcf.core.database.BaseTable.performQuery(BaseTable.java:244)
        at org.apache.lcf.crawler.jobs.Jobs.activeJobsPresent(Jobs.java:1770)
        at org.apache.lcf.crawler.jobs.JobManager.getNextDocuments(JobManager.java:1490)
        at org.apache.lcf.crawler.system.StufferThread.run(StufferThread.java:157)


What must be the problem ?

Thanks & Regards,
Rohan G Patil
Cognizant  Programmer Analyst Trainee,Bangalore || Mob # +91 9535577001
Rohan.GPatil@cognizant.com


-----Original Message-----
From: karl.wright@nokia.com [mailto:karl.wright@nokia.com]
Sent: Wednesday, June 16, 2010 2:34 PM
To: connectors-user@incubator.apache.org
Subject: RE: Other document data.

A fix was checked in yesterday for the missing "literal." in all metadata field names.  Is this fix working for you?

Karl

________________________________________
From: ext Rohan.GPatil@cognizant.com [Rohan.GPatil@cognizant.com]
Sent: Tuesday, June 15, 2010 7:04 AM
To: connectors-user@incubator.apache.org
Subject: RE: Other document data.

Hi,

Yes the metadata is sent to solr. But it is being sent like this.

webapp=/solr path=/update/extract params={fmap.content=text&DDOCAUTHOR=sysadmin&uprefix=attr_&literal.id=/idc/groups/public/documents/sunil_layout/mfe001251.xml&lowernames=true&captureAttr=true}

as you can see DDOCAUTHOR is sent in the params part which is a custom metadata.

But to add custom data as metadata in Solr we should pass in the literal parameter along with the file.

So, there will be a small change in the code.

Instead of sending the column name in the params, we should send

literal.<<columnname>> (columnnames in small case)


Thanks & Regards,
Rohan G Patil
Cognizant  Programmer Analyst Trainee,Bangalore || Mob # +91 9535577001
Rohan.GPatil@cognizant.com


-----Original Message-----
From: karl.wright@nokia.com [mailto:karl.wright@nokia.com]
Sent: Tuesday, June 15, 2010 4:03 PM
To: connectors-user@incubator.apache.org
Subject: RE: Other document data.

That's hard to determine without looking at the solr logs.  I am not familiar with the log options available, but unless I'm mistaken the default configuration should be dumping every request to standard out.

Karl
________________________________________
From: ext Rohan.GPatil@cognizant.com [Rohan.GPatil@cognizant.com]
Sent: Tuesday, June 15, 2010 5:23 AM
To: connectors-user@incubator.apache.org
Subject: RE: Other document data.

Hi,

Yes I get it. Thanks for the clarification.

I was doing the similar thing before and it used to run, now it didn't. So I got confused.

Is there any way to check if metadata is actually sent to solr? Because I am experiencing some problem there and I don't seem to figure out where it is going wrong.


Thanks & Regards,
Rohan G Patil
Cognizant  Programmer Analyst Trainee,Bangalore || Mob # +91 9535577001
Rohan.GPatil@cognizant.com

-----Original Message-----
From: karl.wright@nokia.com [mailto:karl.wright@nokia.com]
Sent: Tuesday, June 15, 2010 2:13 PM
To: connectors-user@incubator.apache.org
Subject: RE: Other document data.

LCF is an incremental crawler.  The version query is used to determine whether data needs to be refetched and reindexed.  If it returns the same thing each time the document is examined, the data query will not be run the second time.  I therefore suggest either the following:

(1) Supply no version query at all.  That signals to the connector that there is no version information and the data must be reindexed on every job run.
(2) Supply a version query that properly reflects changes to the data.  For instance, if there's a timestamp in each record, you can use that by itself ONLY if any metadata changes also are associated with a change in that timestamp.  If not, you will need to glom the metadata into the version string as well as the timestamp.  Is this understood?

If you want to FORCE a reindex, there is a link in the crawler-ui for the output connection which allows you to force reindexing of all data associated with that connection.

If this still doesn't seem to describe what you are seeing, please clarify further.

Thanks,
Karl

________________________________________
From: ext Rohan.GPatil@cognizant.com [Rohan.GPatil@cognizant.com]
Sent: Tuesday, June 15, 2010 12:51 AM
To: connectors-user@incubator.apache.org
Subject: RE: Other document data.

Hi,

When we specify the metadata content, It runs fine the first time, The second time it doesn't run the data query at all. What must be the problem ?

Thanks & Regards,
Rohan G Patil
Cognizant  Programmer Analyst Trainee,Bangalore || Mob # +91 9535577001
Rohan.GPatil@cognizant.com


-----Original Message-----
From: karl.wright@nokia.com [mailto:karl.wright@nokia.com]
Sent: Sunday, June 13, 2010 7:04 AM
To: connectors-user@incubator.apache.org
Subject: RE: Other document data.

No.  The data query (the same one that returns the blob info) can now include additional columns.  These columns will be sent to Solr as metadata fields.

Karl

________________________________________
From: ext Rohan.GPatil@cognizant.com [Rohan.GPatil@cognizant.com]
Sent: Friday, June 11, 2010 2:28 AM
To: connectors-user@incubator.apache.org
Subject: RE: Other document data.

Hi,

I see that the issue is resolved.

Now is there a new query where in we can specify the metadata fields ?

Thanks & Regards,
Rohan G Patil
Cognizant  Programmer Analyst Trainee,Bangalore || Mob # +91 9535577001
Rohan.GPatil@cognizant.com


-----Original Message-----
From: karl.wright@nokia.com [mailto:karl.wright@nokia.com]
Sent: Thursday, June 10, 2010 4:12 PM
To: connectors-user@incubator.apache.org
Subject: RE: Other document data.

It is not possible to properly glom other fields onto a BLOB unless you know that the blob's contents are always encoded text.  So I suggest you create a jira enhancement request in the Lucene Connector Framework project to describe this enhancement (adding metadata support to JDBC connector).

The url is: http://issues.apache.org/jira

You may need to create an account if you don't already have one.  Let me know if you have any difficulties.

Thanks,
Karl


-----Original Message-----
From: ext Rohan.GPatil@cognizant.com [mailto:Rohan.GPatil@cognizant.com]
Sent: Thursday, June 10, 2010 6:39 AM
To: connectors-user@incubator.apache.org
Subject: RE: Other document data.


Hi,

Using solution 1 was not a bad idea, but the problem is the content is stored as BLOB in the database and gluing other fields with BLOB is not possible (Is it ?) .

Regarding 2 : Yes I guess I can do that modification, and anyway it all depends on how we show it to the user.

Thanks & Regards,
Rohan G Patil
Cognizant  Programmer Analyst Trainee,Bangalore || Mob # +91 9535577001
Rohan.GPatil@cognizant.com

-----Original Message-----
From: karl.wright@nokia.com [mailto:karl.wright@nokia.com]
Sent: Thursday, June 10, 2010 3:19 PM
To: connectors-user@incubator.apache.org
Subject: RE: Other document data.

(1) The JDBC connector is currently relatively primitive and does not have any support for "document metadata" at this time.  You can, of course, glom together multiple fields into the content field with it, but that's pretty crude.
(2) The LCF convention for how to identify documents uniquely in the target index is to use the URL of the document.  All documents indexed with LCF have such a URL and it is likely to be both useful and unique.  This url is how LCF requests deletion of the document from the index, if necessary, and also overwrites the document.  So it maps pretty precisely to literal.id for the basic solr setup.  Now, it may be that this is too tied to the example, and that the solr connector should have a configuration setting to allow the name of the id field used to be changed - that sounds like a reasonable modification that would not be too difficult to do.  Is this something you are looking for?

Karl
________________________________________
From: ext Rohan.GPatil@cognizant.com [Rohan.GPatil@cognizant.com]
Sent: Thursday, June 10, 2010 4:52 AM
To: connectors-user@incubator.apache.org
Subject: Other document data.

I am using JDBC connection to search for the documents in the database.

The issue is  some document data(Check in date etc) is present in the other columns. How to send this data to Solr so as to index it.

Why is the URL of the file taken as ID in Solr.

Thanks & Regards,
Rohan G Patil
Cognizant  Programmer Analyst Trainee,Bangalore || Mob # +91 9535577001
Rohan.GPatil@cognizant.com<ma...@cognizant.com>

This e-mail and any files transmitted with it are for the sole use of
the intended recipient(s) and may contain confidential and privileged
information.
If you are not the intended recipient, please contact the sender by
reply e-mail and destroy all copies of the original message.
Any unauthorized review, use, disclosure, dissemination, forwarding,
printing or copying of this email or any action taken in reliance on this
e-mail is strictly prohibited and may be unlawful.




This e-mail and any files transmitted with it are for the sole use of
the intended recipient(s) and may contain confidential and privileged
information.
If you are not the intended recipient, please contact the sender by
reply e-mail and destroy all copies of the original message.
Any unauthorized review, use, disclosure, dissemination, forwarding,
printing or copying of this email or any action taken in reliance on this
e-mail is strictly prohibited and may be unlawful.



This e-mail and any files transmitted with it are for the sole use of
the intended recipient(s) and may contain confidential and privileged
information.
If you are not the intended recipient, please contact the sender by
reply e-mail and destroy all copies of the original message.
Any unauthorized review, use, disclosure, dissemination, forwarding,
printing or copying of this email or any action taken in reliance on this
e-mail is strictly prohibited and may be unlawful.



This e-mail and any files transmitted with it are for the sole use of
the intended recipient(s) and may contain confidential and privileged
information.
If you are not the intended recipient, please contact the sender by
reply e-mail and destroy all copies of the original message.
Any unauthorized review, use, disclosure, dissemination, forwarding,
printing or copying of this email or any action taken in reliance on this
e-mail is strictly prohibited and may be unlawful.



This e-mail and any files transmitted with it are for the sole use of
the intended recipient(s) and may contain confidential and privileged
information.
If you are not the intended recipient, please contact the sender by
reply e-mail and destroy all copies of the original message.
Any unauthorized review, use, disclosure, dissemination, forwarding,
printing or copying of this email or any action taken in reliance on this
e-mail is strictly prohibited and may be unlawful.



This e-mail and any files transmitted with it are for the sole use of
the intended recipient(s) and may contain confidential and privileged
information.
If you are not the intended recipient, please contact the sender by
reply e-mail and destroy all copies of the original message.
Any unauthorized review, use, disclosure, dissemination, forwarding,
printing or copying of this email or any action taken in reliance on this
e-mail is strictly prohibited and may be unlawful.



This e-mail and any files transmitted with it are for the sole use of
the intended recipient(s) and may contain confidential and privileged
information.
If you are not the intended recipient, please contact the sender by
reply e-mail and destroy all copies of the original message.
Any unauthorized review, use, disclosure, dissemination, forwarding,
printing or copying of this email or any action taken in reliance on this
e-mail is strictly prohibited and may be unlawful.



RE: Other document data.

Posted by Ro...@cognizant.com.
Hi Karl!

I am currently checking it out. And meanwhile I am getting this error

[2010-06-15 19:03:28,996]WARN  Attempt to set file lock 'D:\LCF_App\LCF_Sync\737\563\lock-_Cache_JOBSTATUSES.lock' failed: Access is denied
java.io.IOException: Access is denied
	at java.io.WinNTFileSystem.createFileExclusively(Native Method)
	at java.io.File.createNewFile(File.java:883)
	at org.apache.lcf.core.lockmanager.LockObject.grabFileLock(LockObject.java:550)
	at org.apache.lcf.core.lockmanager.LockObject.leaveReadLock(LockObject.java:489)
	at org.apache.lcf.core.lockmanager.LockManager.leaveReadLock(LockManager.java:752)
	at org.apache.lcf.core.lockmanager.LockManager.leaveLocks(LockManager.java:1216)
	at org.apache.lcf.core.cachemanager.CacheManager.leaveCache(CacheManager.java:660)
	at org.apache.lcf.core.cachemanager.CacheManager.findObjectsAndExecute(CacheManager.java:179)
	at org.apache.lcf.core.database.Database.executeQuery(Database.java:167)
	at org.apache.lcf.core.database.DBInterfacePostgreSQL.performQuery(DBInterfacePostgreSQL.java:774)
	at org.apache.lcf.core.database.BaseTable.performQuery(BaseTable.java:244)
	at org.apache.lcf.crawler.jobs.Jobs.deletingJobsPresent(Jobs.java:1753)
	at org.apache.lcf.crawler.jobs.JobManager.getNextDeletableDocuments(JobManager.java:807)
	at org.apache.lcf.crawler.system.DocumentDeleteStufferThread.run(DocumentDeleteStufferThread.java:103)
[2010-06-16 06:29:38,668]ERROR Couldn't write to lock file; disk may be full.  Shutting down process; locks may be left dangling.  You must cleanup before restarting.
java.io.FileNotFoundException: D:\LCF_App\LCF_Sync\737\563\lock-_Cache_JOBSTATUSES.file (Access is denied)
	at java.io.FileOutputStream.open(Native Method)
	at java.io.FileOutputStream.<init>(FileOutputStream.java:179)
	at java.io.FileOutputStream.<init>(FileOutputStream.java:131)
	at java.io.FileWriter.<init>(FileWriter.java:73)
	at org.apache.lcf.core.lockmanager.LockObject.writeFile(LockObject.java:732)
	at org.apache.lcf.core.lockmanager.LockObject.enterReadLockNoWait(LockObject.java:449)
	at org.apache.lcf.core.lockmanager.LockObject.enterReadLock(LockObject.java:401)
	at org.apache.lcf.core.lockmanager.LockManager.enterLocks(LockManager.java:924)
	at org.apache.lcf.core.cachemanager.CacheManager.enterCache(CacheManager.java:278)
	at org.apache.lcf.core.cachemanager.CacheManager.findObjectsAndExecute(CacheManager.java:98)
	at org.apache.lcf.core.database.Database.executeQuery(Database.java:167)
	at org.apache.lcf.core.database.DBInterfacePostgreSQL.performQuery(DBInterfacePostgreSQL.java:774)
	at org.apache.lcf.core.database.BaseTable.performQuery(BaseTable.java:244)
	at org.apache.lcf.crawler.jobs.Jobs.activeJobsPresent(Jobs.java:1770)
	at org.apache.lcf.crawler.jobs.JobManager.getNextDocuments(JobManager.java:1490)
	at org.apache.lcf.crawler.system.StufferThread.run(StufferThread.java:157)


What could be the problem?

Thanks & Regards,
Rohan G Patil
Cognizant  Programmer Analyst Trainee,Bangalore || Mob # +91 9535577001 
Rohan.GPatil@cognizant.com


-----Original Message-----
From: karl.wright@nokia.com [mailto:karl.wright@nokia.com] 
Sent: Wednesday, June 16, 2010 2:34 PM
To: connectors-user@incubator.apache.org
Subject: RE: Other document data.

A fix was checked in yesterday for the missing "literal." in all metadata field names.  Is this fix working for you?

Karl

________________________________________
From: ext Rohan.GPatil@cognizant.com [Rohan.GPatil@cognizant.com]
Sent: Tuesday, June 15, 2010 7:04 AM
To: connectors-user@incubator.apache.org
Subject: RE: Other document data.

Hi,

Yes, the metadata is sent to Solr, but it is being sent like this:

webapp=/solr path=/update/extract params={fmap.content=text&DDOCAUTHOR=sysadmin&uprefix=attr_&literal.id=/idc/groups/public/documents/sunil_layout/mfe001251.xml&lowernames=true&captureAttr=true}

As you can see, DDOCAUTHOR, which is a custom metadata field, is sent in the params part.

But to add custom data as metadata in Solr, we should pass it in a literal parameter along with the file.

So there will be a small change in the code: instead of sending the bare column name in the params, we should send

literal.<<columnname>> (column names in lower case)
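
For illustration, the same request with that change applied would look something like this (a sketch based on the log line above, not an actual LCF request):

webapp=/solr path=/update/extract params={fmap.content=text&literal.ddocauthor=sysadmin&uprefix=attr_&literal.id=/idc/groups/public/documents/sunil_layout/mfe001251.xml&lowernames=true&captureAttr=true}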


Thanks & Regards,
Rohan G Patil
Cognizant  Programmer Analyst Trainee,Bangalore || Mob # +91 9535577001
Rohan.GPatil@cognizant.com


-----Original Message-----
From: karl.wright@nokia.com [mailto:karl.wright@nokia.com]
Sent: Tuesday, June 15, 2010 4:03 PM
To: connectors-user@incubator.apache.org
Subject: RE: Other document data.

That's hard to determine without looking at the solr logs.  I am not familiar with the log options available, but unless I'm mistaken the default configuration should be dumping every request to standard out.

Karl
________________________________________
From: ext Rohan.GPatil@cognizant.com [Rohan.GPatil@cognizant.com]
Sent: Tuesday, June 15, 2010 5:23 AM
To: connectors-user@incubator.apache.org
Subject: RE: Other document data.

Hi,

Yes I get it. Thanks for the clarification.

I was doing a similar thing before and it used to run; now it doesn't, so I got confused.

Is there any way to check whether metadata is actually sent to Solr? I am experiencing a problem there and can't figure out where it is going wrong.


Thanks & Regards,
Rohan G Patil
Cognizant  Programmer Analyst Trainee,Bangalore || Mob # +91 9535577001
Rohan.GPatil@cognizant.com

-----Original Message-----
From: karl.wright@nokia.com [mailto:karl.wright@nokia.com]
Sent: Tuesday, June 15, 2010 2:13 PM
To: connectors-user@incubator.apache.org
Subject: RE: Other document data.

LCF is an incremental crawler.  The version query is used to determine whether data needs to be refetched and reindexed.  If it returns the same thing each time the document is examined, the data query will not be run the second time.  I therefore suggest one of the following:

(1) Supply no version query at all.  That signals to the connector that there is no version information and the data must be reindexed on every job run.
(2) Supply a version query that properly reflects changes to the data.  For instance, if there's a timestamp in each record, you can use that by itself ONLY if any metadata changes are also associated with a change in that timestamp.  If not, you will need to glom the metadata into the version string as well as the timestamp (see the sketch below).  Is this understood?
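
A minimal sketch of option (2), assuming a lastmodified timestamp column, the JDBC connector's $(IDCOLUMN)/$(VERSIONCOLUMN)/$(IDLIST) substitution convention, and a database whose || operator concatenates strings (table and column names here are hypothetical):

-- hypothetical names; gloms the timestamp and a metadata column into one version string
select idfield as $(IDCOLUMN), lastmodified || '+' || author as $(VERSIONCOLUMN) from documenttable where idfield in $(IDLIST)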

If you want to FORCE a reindex, there is a link in the crawler-ui for the output connection which allows you to force reindexing of all data associated with that connection.

If this still doesn't seem to describe what you are seeing, please clarify further.

Thanks,
Karl

________________________________________
From: ext Rohan.GPatil@cognizant.com [Rohan.GPatil@cognizant.com]
Sent: Tuesday, June 15, 2010 12:51 AM
To: connectors-user@incubator.apache.org
Subject: RE: Other document data.

Hi,

When we specify the metadata content, it runs fine the first time, but the second time it doesn't run the data query at all. What could be the problem?

Thanks & Regards,
Rohan G Patil
Cognizant  Programmer Analyst Trainee,Bangalore || Mob # +91 9535577001
Rohan.GPatil@cognizant.com


-----Original Message-----
From: karl.wright@nokia.com [mailto:karl.wright@nokia.com]
Sent: Sunday, June 13, 2010 7:04 AM
To: connectors-user@incubator.apache.org
Subject: RE: Other document data.

No.  The data query (the same one that returns the blob info) can now include additional columns.  These columns will be sent to Solr as metadata fields.
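
For illustration, a data query along those lines might look like this (all names are hypothetical; the extra author column would be sent to Solr as a metadata field):

-- hypothetical names; author is passed along as document metadata
select idfield as $(IDCOLUMN), urlfield as $(URLCOLUMN), datafield as $(DATACOLUMN), author from documenttable where idfield in $(IDLIST)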

Karl

________________________________________
From: ext Rohan.GPatil@cognizant.com [Rohan.GPatil@cognizant.com]
Sent: Friday, June 11, 2010 2:28 AM
To: connectors-user@incubator.apache.org
Subject: RE: Other document data.

Hi,

I see that the issue is resolved.

Now, is there a new query in which we can specify the metadata fields?

Thanks & Regards,
Rohan G Patil
Cognizant  Programmer Analyst Trainee,Bangalore || Mob # +91 9535577001
Rohan.GPatil@cognizant.com


-----Original Message-----
From: karl.wright@nokia.com [mailto:karl.wright@nokia.com]
Sent: Thursday, June 10, 2010 4:12 PM
To: connectors-user@incubator.apache.org
Subject: RE: Other document data.

It is not possible to properly glom other fields onto a BLOB unless you know that the blob's contents are always encoded text.  So I suggest you create a jira enhancement request in the Lucene Connector Framework project to describe this enhancement (adding metadata support to JDBC connector).

The URL is: http://issues.apache.org/jira

You may need to create an account if you don't already have one.  Let me know if you have any difficulties.

Thanks,
Karl


-----Original Message-----
From: ext Rohan.GPatil@cognizant.com [mailto:Rohan.GPatil@cognizant.com]
Sent: Thursday, June 10, 2010 6:39 AM
To: connectors-user@incubator.apache.org
Subject: RE: Other document data.


Hi,

Using solution 1 is not a bad idea, but the problem is that the content is stored as a BLOB in the database, and gluing other fields onto a BLOB is not possible (is it?).

Regarding 2: yes, I guess I can do that modification; in any case, it all depends on how we show it to the user.

Thanks & Regards,
Rohan G Patil
Cognizant  Programmer Analyst Trainee,Bangalore || Mob # +91 9535577001
Rohan.GPatil@cognizant.com

-----Original Message-----
From: karl.wright@nokia.com [mailto:karl.wright@nokia.com]
Sent: Thursday, June 10, 2010 3:19 PM
To: connectors-user@incubator.apache.org
Subject: RE: Other document data.

(1) The JDBC connector is currently relatively primitive and does not have any support for "document metadata" at this time.  You can, of course, glom together multiple fields into the content field with it, but that's pretty crude.
(2) The LCF convention for identifying documents uniquely in the target index is to use the URL of the document.  All documents indexed with LCF have such a URL, and it is likely to be both useful and unique.  This URL is how LCF requests deletion of the document from the index, if necessary, and also how it overwrites the document.  So it maps pretty precisely to literal.id for the basic solr setup.  Now, it may be that this is too tied to the example, and that the solr connector should have a configuration setting to allow the name of the id field to be changed - that sounds like a reasonable modification that would not be too difficult to do.  Is this something you are looking for?

Karl
________________________________________
From: ext Rohan.GPatil@cognizant.com [Rohan.GPatil@cognizant.com]
Sent: Thursday, June 10, 2010 4:52 AM
To: connectors-user@incubator.apache.org
Subject: Other document data.

I am using a JDBC connection to search for documents in the database.

The issue is that some document data (check-in date, etc.) is present in other columns. How do I send this data to Solr so that it gets indexed?

Why is the URL of the file taken as the ID in Solr?

Thanks & Regards,
Rohan G Patil
Cognizant  Programmer Analyst Trainee,Bangalore || Mob # +91 9535577001
Rohan.GPatil@cognizant.com


RE: Other document data.

Posted by ka...@nokia.com.
A fix was checked in yesterday for the missing "literal." in all metadata field names.  Is this fix working for you?

Karl



RE: Other document data.

Posted by ka...@nokia.com.
Done.  r954843.

Karl


-----Original Message-----
From: ext Erik Hatcher [mailto:erik.hatcher@gmail.com] 
Sent: Tuesday, June 15, 2010 7:35 AM
To: connectors-user@incubator.apache.org
Subject: Re: Other document data.

Note that lowernames=true will take care of the case change on the Solr side anyway.  But "literal." does need to be prepended, and it doesn't really make sense for the SQL query to rename columns with that prefix via "as".  LCF should prepend it automatically.

	Erik



Re: Other document data.

Posted by Erik Hatcher <er...@gmail.com>.
Note that lowernames=true will take care of the case change on the Solr side anyway.  But "literal." does need to be prepended, and it doesn't really make sense for the SQL query to rename columns with that prefix via "as".  LCF should prepend it automatically.

	Erik

On Jun 15, 2010, at 7:31 AM, <ka...@nokia.com> wrote:

> When you write the query, you have complete control over the column names.
> We can certainly always attach "literal." to the front of each metadata
> parameter, which I'm happy to do, but mapping case etc. should, I think,
> remain under the user's control.
>
> So, your query should look something like this:
>
> SELECT DDOCAUTHOR as ddocauthor, ... FROM ...
>
> Does this work for you?
>
> Karl


RE: Other document data.

Posted by ka...@nokia.com.
When you write the query, you have complete control over the column names.  We can certainly always attach "literal." to the front of each metadata parameter, which I'm happy to do, but mapping case etc. should, I think, remain under the user's control.

So, your query should look something like this:

SELECT DDOCAUTHOR as ddocauthor, ... FROM ...

Does this work for you?
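
For illustration, a filled-out version of that query might look like this (everything except DDOCAUTHOR is a hypothetical name):

-- hypothetical table and column names, apart from DDOCAUTHOR
SELECT idfield AS $(IDCOLUMN), urlfield AS $(URLCOLUMN), datafield AS $(DATACOLUMN), DDOCAUTHOR AS ddocauthor FROM documenttable WHERE idfield IN $(IDLIST)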

Karl


-----Original Message-----
From: ext Rohan.GPatil@cognizant.com [mailto:Rohan.GPatil@cognizant.com] 
Sent: Tuesday, June 15, 2010 7:05 AM
To: connectors-user@incubator.apache.org
Subject: RE: Other document data.


Hi,

Yes, the metadata is sent to Solr, but it is being sent like this:

webapp=/solr path=/update/extract params={fmap.content=text&DDOCAUTHOR=sysadmin&uprefix=attr_&literal.id=/idc/groups/public/documents/sunil_layout/mfe001251.xml&lowernames=true&captureAttr=true}

As you can see, DDOCAUTHOR, which is a custom metadata field, is sent in the params part.

But to add custom data as metadata in Solr, we should pass it in a literal parameter along with the file.

So there will be a small change in the code: instead of sending the bare column name in the params, we should send

literal.<<columnname>> (column names in lower case)


Thanks & Regards,
Rohan G Patil
Cognizant  Programmer Analyst Trainee,Bangalore || Mob # +91 9535577001 
Rohan.GPatil@cognizant.com



RE: Other document data.

Posted by Ro...@cognizant.com.
Hi,

Yes, the metadata is sent to Solr, but it is being sent like this:

webapp=/solr path=/update/extract params={fmap.content=text&DDOCAUTHOR=sysadmin&uprefix=attr_&literal.id=/idc/groups/public/documents/sunil_layout/mfe001251.xml&lowernames=true&captureAttr=true}

As you can see, DDOCAUTHOR, which is a custom metadata field, is sent in the params part of the request.

But to add custom data as metadata in Solr, we should pass it in a literal parameter along with the file.

So, there will be a small change in the code.

Instead of sending the column name in the params, we should send

literal.<<columnname>> (column names in lower case)
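
A minimal sketch of that change in plain Java (the class, method, and variable names here are hypothetical, not the actual connector code):

import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

public class LiteralParamSketch {
  // Hypothetical helper: turn metadata column names into Solr
  // "literal.<columnname>" request parameters, lower-casing the names.
  public static Map<String,String> toLiteralParams(Map<String,String> metadata) {
    Map<String,String> params = new HashMap<String,String>();
    for (Map.Entry<String,String> entry : metadata.entrySet()) {
      params.put("literal." + entry.getKey().toLowerCase(Locale.ROOT),
        entry.getValue());
    }
    return params;
  }

  public static void main(String[] args) {
    Map<String,String> metadata = new HashMap<String,String>();
    metadata.put("DDOCAUTHOR", "sysadmin");
    // Prints {literal.ddocauthor=sysadmin}
    System.out.println(toLiteralParams(metadata));
  }
}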


Thanks & Regards,
Rohan G Patil
Cognizant Programmer Analyst Trainee, Bangalore || Mob # +91 9535577001
Rohan.GPatil@cognizant.com


RE: Other document data.

Posted by ka...@nokia.com.
That's hard to determine without looking at the Solr logs.  I am not familiar with the log options available but, unless I'm mistaken, the default configuration should be dumping every request to standard out.

Karl


Re: Other document data.

Posted by Jack Krupansky <ja...@lucidimagination.com>.
For future reference, it would be nice to have an LCF logging option (debug? 
or maybe a separate class for the actual POST) to log the text of the HTTP 
POST requests that are sent to Solr. Maybe display a maximum of 1,000 or 2,000 
characters of the actual content, with a notation of how many additional 
bytes/chars were not dumped.
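
A rough sketch of the kind of truncated dump being suggested (the logger name 
and the exact limit are just assumptions):

import java.util.logging.Logger;

public class PostDumpSketch {
  private static final Logger LOG = Logger.getLogger("lcf.solr.post");
  private static final int MAX_DUMP_CHARS = 1000;

  // Log at most MAX_DUMP_CHARS of the POST body, noting how many
  // characters were omitted.
  public static void dumpPost(String url, String body) {
    if (body.length() <= MAX_DUMP_CHARS) {
      LOG.info("POST " + url + "\n" + body);
    } else {
      LOG.info("POST " + url + "\n" + body.substring(0, MAX_DUMP_CHARS)
        + "\n... (" + (body.length() - MAX_DUMP_CHARS)
        + " more chars not dumped)");
    }
  }
}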

-- Jack Krupansky


RE: Other document data.

Posted by Ro...@cognizant.com.
Hi,

Yes, I get it. Thanks for the clarification.

I was doing a similar thing before and it used to run; now it doesn't, so I got confused.

Is there any way to check whether the metadata is actually sent to Solr? I am experiencing some problem there and can't figure out where it is going wrong.


Thanks & Regards,
Rohan G Patil
Cognizant Programmer Analyst Trainee, Bangalore || Mob # +91 9535577001
Rohan.GPatil@cognizant.com


RE: Other document data.

Posted by ka...@nokia.com.
LCF is an incremental crawler.  The version query is used to determine whether data needs to be refetched and reindexed.  If it returns the same thing each time the document is examined, the data query will not be run the second time.  I therefore suggest one of the following:

(1) Supply no version query at all.  That signals to the connector that there is no version information and the data must be reindexed on every job run.
(2) Supply a version query that properly reflects changes to the data.  For instance, if there's a timestamp in each record, you can use that by itself ONLY if any metadata changes are also associated with a change in that timestamp.  If not, you will need to glom the metadata into the version string as well as the timestamp (see the example query below).  Is this understood?
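
For example, assuming the connector's usual $(...) substitution tokens, a version query along these lines (the table and column names are hypothetical, and string concatenation syntax varies by database) would trigger a reindex whenever either the timestamp or the author changes:

select idfield as $(IDCOLUMN), (lastmodified || '+' || author) as $(VERSIONCOLUMN) from documenttable where idfield in $(IDLIST)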

If you want to FORCE a reindex, there is a link in the crawler-ui for the output connection which allows you to force reindexing of all data associated with that connection.

If this still doesn't seem to describe what you are seeing, please clarify further.

Thanks,
Karl



RE: Other document data.

Posted by Ro...@cognizant.com.
Hi,

When we specify the metadata content, it runs fine the first time, but the second time it doesn't run the data query at all. What could be the problem?

Thanks & Regards,
Rohan G Patil
Cognizant Programmer Analyst Trainee, Bangalore || Mob # +91 9535577001
Rohan.GPatil@cognizant.com


RE: Other document data.

Posted by ka...@nokia.com.
No.  The data query (the same one that returns the blob info) can now include additional columns.  These columns will be sent to Solr as metadata fields.
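
For example, a data query along these lines (table and column names are hypothetical) would send checkindate and author to Solr as metadata along with the blob:

select idfield as $(IDCOLUMN), urlfield as $(URLCOLUMN), blobfield as $(DATACOLUMN), checkindate, author from documenttable where idfield in $(IDLIST)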

Karl

________________________________________
From: ext Rohan.GPatil@cognizant.com [Rohan.GPatil@cognizant.com]
Sent: Friday, June 11, 2010 2:28 AM
To: connectors-user@incubator.apache.org
Subject: RE: Other document data.

Hi,

I see that the issue is resolved.

Now is there a new query where in we can specify the metadata fields ?

Thanks & Regards,
Rohan G Patil
Cognizant  Programmer Analyst Trainee,Bangalore || Mob # +91 9535577001
Rohan.GPatil@cognizant.com


-----Original Message-----
From: karl.wright@nokia.com [mailto:karl.wright@nokia.com]
Sent: Thursday, June 10, 2010 4:12 PM
To: connectors-user@incubator.apache.org
Subject: RE: Other document data.

It is not possible to properly glom other fields onto a BLOB unless you know that the blob's contents are always encoded text.  So I suggest you create a jira enhancement request in the Lucene Connector Framework project to describe this enhancement (adding metadata support to JDBC connector).

The url is: http://issues.apache.org/jira

You may need to create an account if you don't already have one.  Let me know if you have any difficulties.

Thanks,
Karl


-----Original Message-----
From: ext Rohan.GPatil@cognizant.com [mailto:Rohan.GPatil@cognizant.com]
Sent: Thursday, June 10, 2010 6:39 AM
To: connectors-user@incubator.apache.org
Subject: RE: Other document data.


Hi,

Using solution 1 was not a bad idea, but the problem is the content is stored as BLOB in the database and gluing other fields with BLOB is not possible (Is it ?) .

Regarding 2 : Yes I guess I can do that modification, and anyway it all depends on how we show it to the user.

Thanks & Regards,
Rohan G Patil
Cognizant  Programmer Analyst Trainee,Bangalore || Mob # +91 9535577001
Rohan.GPatil@cognizant.com

-----Original Message-----
From: karl.wright@nokia.com [mailto:karl.wright@nokia.com]
Sent: Thursday, June 10, 2010 3:19 PM
To: connectors-user@incubator.apache.org
Subject: RE: Other document data.

(1) The JDBC connector is currently relatively primitive and does not have any support for "document metadata" at this time.  You can, of course, glom together multiple fields into the content field with it, but that's pretty crude.
(2) The LCF convention for how to identify documents uniquely in the target index is to use the URL of the document.  All documents indexed with LCF have such a URL and it is likely to be both useful and unique.  This url is how LCF requests deletion of the document from the index, if necessary, and also overwrites the document.  So it maps pretty precisely to literal.id for the basic solr setup.  Now, it may be that this is too tied to the example, and that the solr connector should have a configuration setting to allow the name of the id field used to be changed - that sounds like a reasonable modification that would not be too difficult to do.  Is this something you are looking for?

Karl
________________________________________
From: ext Rohan.GPatil@cognizant.com [Rohan.GPatil@cognizant.com]
Sent: Thursday, June 10, 2010 4:52 AM
To: connectors-user@incubator.apache.org
Subject: Other document data.

I am using a JDBC connection to search for the documents in the database.

The issue is that some document data (check-in date, etc.) is present in other columns.  How do I send this data to Solr so that it gets indexed?

Why is the URL of the file taken as the ID in Solr?

Thanks & Regards,
Rohan G Patil
Cognizant Programmer Analyst Trainee, Bangalore || Mob # +91 9535577001
Rohan.GPatil@cognizant.com




RE: Other document data.

Posted by Ro...@cognizant.com.
Hi,

I see that the issue is resolved.

Now, is there a new query in which we can specify the metadata fields?

Thanks & Regards,
Rohan G Patil
Cognizant Programmer Analyst Trainee, Bangalore || Mob # +91 9535577001
Rohan.GPatil@cognizant.com


-----Original Message-----
From: karl.wright@nokia.com [mailto:karl.wright@nokia.com] 
Sent: Thursday, June 10, 2010 4:12 PM
To: connectors-user@incubator.apache.org
Subject: RE: Other document data.

It is not possible to properly glom other fields onto a BLOB unless you know that the blob's contents are always encoded text.  So I suggest you create a Jira enhancement request in the Lucene Connector Framework project to describe this enhancement (adding metadata support to the JDBC connector).

The URL is: http://issues.apache.org/jira

You may need to create an account if you don't already have one.  Let me know if you have any difficulties.

Thanks,
Karl


-----Original Message-----
From: ext Rohan.GPatil@cognizant.com [mailto:Rohan.GPatil@cognizant.com] 
Sent: Thursday, June 10, 2010 6:39 AM
To: connectors-user@incubator.apache.org
Subject: RE: Other document data.


Hi,

Using solution 1 is not a bad idea, but the problem is that the content is stored as a BLOB in the database, and gluing other fields onto a BLOB is not possible (is it?).

Regarding (2): yes, I guess I can make that modification; in any case, it all depends on how we present it to the user.

Thanks & Regards,
Rohan G Patil
Cognizant Programmer Analyst Trainee, Bangalore || Mob # +91 9535577001
Rohan.GPatil@cognizant.com

-----Original Message-----
From: karl.wright@nokia.com [mailto:karl.wright@nokia.com] 
Sent: Thursday, June 10, 2010 3:19 PM
To: connectors-user@incubator.apache.org
Subject: RE: Other document data.

(1) The JDBC connector is relatively primitive and has no support for "document metadata" at this time.  You can, of course, glom together multiple fields into the content field with it, but that's pretty crude.
(2) The LCF convention for identifying documents uniquely in the target index is to use the URL of the document.  All documents indexed with LCF have such a URL, and it is likely to be both useful and unique.  This URL is how LCF requests deletion of a document from the index when necessary, and it is also how LCF overwrites the document.  So it maps pretty precisely to literal.id for the basic Solr setup.  Now, it may be that this is too tied to the example, and that the Solr connector should have a configuration setting that allows the name of the ID field to be changed - that sounds like a reasonable modification that would not be too difficult to do.  Is this something you are looking for?

Karl
________________________________________
From: ext Rohan.GPatil@cognizant.com [Rohan.GPatil@cognizant.com]
Sent: Thursday, June 10, 2010 4:52 AM
To: connectors-user@incubator.apache.org
Subject: Other document data.

I am using a JDBC connection to search for the documents in the database.

The issue is that some document data (check-in date, etc.) is present in other columns.  How do I send this data to Solr so that it gets indexed?

Why is the URL of the file taken as the ID in Solr?

Thanks & Regards,
Rohan G Patil
Cognizant Programmer Analyst Trainee, Bangalore || Mob # +91 9535577001
Rohan.GPatil@cognizant.com


RE: Other document data.

Posted by ka...@nokia.com.
It is not possible to properly glom other fields onto a BLOB unless you know that the blob's contents are always encoded text.  So I suggest you create a Jira enhancement request in the Lucene Connector Framework project to describe this enhancement (adding metadata support to the JDBC connector).

The URL is: http://issues.apache.org/jira

You may need to create an account if you don't already have one.  Let me know if you have any difficulties.
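
For what it's worth: if you know the blob always holds encoded text, a database-specific cast can fake the glomming in the meantime.  Here is a PostgreSQL-flavored sketch (table and column names are made up; for true binary content such as PDFs this would simply corrupt the document):

  SELECT idfield AS $(IDCOLUMN),
         urlfield AS $(URLCOLUMN),
         -- decode the bytea blob (assumes it really is UTF-8 text),
         -- then append the other fields after it
         convert_from(datafield, 'UTF8')
           || ' ' || author
           || ' ' || checkindate::text AS $(DATACOLUMN)
    FROM documenttable
   WHERE idfield IN $(IDLIST)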

Thanks,
Karl


-----Original Message-----
From: ext Rohan.GPatil@cognizant.com [mailto:Rohan.GPatil@cognizant.com] 
Sent: Thursday, June 10, 2010 6:39 AM
To: connectors-user@incubator.apache.org
Subject: RE: Other document data.


Hi,

Using solution 1 is not a bad idea, but the problem is that the content is stored as a BLOB in the database, and gluing other fields onto a BLOB is not possible (is it?).

Regarding (2): yes, I guess I can make that modification; in any case, it all depends on how we present it to the user.

Thanks & Regards,
Rohan G Patil
Cognizant Programmer Analyst Trainee, Bangalore || Mob # +91 9535577001
Rohan.GPatil@cognizant.com

-----Original Message-----
From: karl.wright@nokia.com [mailto:karl.wright@nokia.com] 
Sent: Thursday, June 10, 2010 3:19 PM
To: connectors-user@incubator.apache.org
Subject: RE: Other document data.

(1) The JDBC connector is relatively primitive and has no support for "document metadata" at this time.  You can, of course, glom together multiple fields into the content field with it, but that's pretty crude.
(2) The LCF convention for identifying documents uniquely in the target index is to use the URL of the document.  All documents indexed with LCF have such a URL, and it is likely to be both useful and unique.  This URL is how LCF requests deletion of a document from the index when necessary, and it is also how LCF overwrites the document.  So it maps pretty precisely to literal.id for the basic Solr setup.  Now, it may be that this is too tied to the example, and that the Solr connector should have a configuration setting that allows the name of the ID field to be changed - that sounds like a reasonable modification that would not be too difficult to do.  Is this something you are looking for?

Karl
________________________________________
From: ext Rohan.GPatil@cognizant.com [Rohan.GPatil@cognizant.com]
Sent: Thursday, June 10, 2010 4:52 AM
To: connectors-user@incubator.apache.org
Subject: Other document data.

I am using a JDBC connection to search for the documents in the database.

The issue is that some document data (check-in date, etc.) is present in other columns.  How do I send this data to Solr so that it gets indexed?

Why is the URL of the file taken as the ID in Solr?

Thanks & Regards,
Rohan G Patil
Cognizant Programmer Analyst Trainee, Bangalore || Mob # +91 9535577001
Rohan.GPatil@cognizant.com


RE: Other document data.

Posted by Ro...@cognizant.com.
Hi,

Using solution 1 is not a bad idea, but the problem is that the content is stored as a BLOB in the database, and gluing other fields onto a BLOB is not possible (is it?).

Regarding (2): yes, I guess I can make that modification; in any case, it all depends on how we present it to the user.

Thanks & Regards,
Rohan G Patil
Cognizant Programmer Analyst Trainee, Bangalore || Mob # +91 9535577001
Rohan.GPatil@cognizant.com

-----Original Message-----
From: karl.wright@nokia.com [mailto:karl.wright@nokia.com] 
Sent: Thursday, June 10, 2010 3:19 PM
To: connectors-user@incubator.apache.org
Subject: RE: Other document data.

(1) The JDBC connector is relatively primitive and has no support for "document metadata" at this time.  You can, of course, glom together multiple fields into the content field with it, but that's pretty crude.
(2) The LCF convention for identifying documents uniquely in the target index is to use the URL of the document.  All documents indexed with LCF have such a URL, and it is likely to be both useful and unique.  This URL is how LCF requests deletion of a document from the index when necessary, and it is also how LCF overwrites the document.  So it maps pretty precisely to literal.id for the basic Solr setup.  Now, it may be that this is too tied to the example, and that the Solr connector should have a configuration setting that allows the name of the ID field to be changed - that sounds like a reasonable modification that would not be too difficult to do.  Is this something you are looking for?

Karl
________________________________________
From: ext Rohan.GPatil@cognizant.com [Rohan.GPatil@cognizant.com]
Sent: Thursday, June 10, 2010 4:52 AM
To: connectors-user@incubator.apache.org
Subject: Other document data.

I am using a JDBC connection to search for the documents in the database.

The issue is that some document data (check-in date, etc.) is present in other columns.  How do I send this data to Solr so that it gets indexed?

Why is the URL of the file taken as the ID in Solr?

Thanks & Regards,
Rohan G Patil
Cognizant Programmer Analyst Trainee, Bangalore || Mob # +91 9535577001
Rohan.GPatil@cognizant.com


RE: Other document data.

Posted by ka...@nokia.com.
(1) The JDBC connector is relatively primitive and has no support for "document metadata" at this time.  You can, of course, glom together multiple fields into the content field with it, but that's pretty crude; see the sketch below.
(2) The LCF convention for identifying documents uniquely in the target index is to use the URL of the document.  All documents indexed with LCF have such a URL, and it is likely to be both useful and unique.  This URL is how LCF requests deletion of a document from the index when necessary, and it is also how LCF overwrites the document.  So it maps pretty precisely to literal.id for the basic Solr setup.  Now, it may be that this is too tied to the example, and that the Solr connector should have a configuration setting that allows the name of the ID field to be changed - that sounds like a reasonable modification that would not be too difficult to do.  Is this something you are looking for?
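
To make the crude option in (1) concrete: the glomming is just concatenation in the data query itself.  A sketch only - documenttable and the column names are made up, and it assumes title, author, and body are plain text columns rather than BLOBs:

  SELECT idfield AS $(IDCOLUMN),
         urlfield AS $(URLCOLUMN),
         -- everything lands in the single content field
         title || ' ' || author || ' ' || body AS $(DATACOLUMN)
    FROM documenttable
   WHERE idfield IN $(IDLIST)

The obvious drawback is that Solr then sees one undifferentiated blob of text, so you cannot search or facet on author or date separately.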

Karl
________________________________________
From: ext Rohan.GPatil@cognizant.com [Rohan.GPatil@cognizant.com]
Sent: Thursday, June 10, 2010 4:52 AM
To: connectors-user@incubator.apache.org
Subject: Other document data.

I am using a JDBC connection to search for the documents in the database.

The issue is that some document data (check-in date, etc.) is present in other columns.  How do I send this data to Solr so that it gets indexed?

Why is the URL of the file taken as the ID in Solr?

Thanks & Regards,
Rohan G Patil
Cognizant Programmer Analyst Trainee, Bangalore || Mob # +91 9535577001
Rohan.GPatil@cognizant.com
