Posted to java-user@lucene.apache.org by Visual Logic <vi...@gmail.com> on 2010/05/30 19:33:19 UTC

Using JSON for index input and search output

Lucene,

JSON is the format used for all the configuration and property files in the RIA application we are developing. Is Lucene able to create a document from a given JSON file and index it? Is Lucene able to provide a JSON output response from a query made to an index? Does the Tika package provide this?

Local indexing and searching is needed on the local client so Solr is not a solution even though it does provide a search response in JSON format.

VL
---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Re: Using JSON for index input and search output

Posted by Visual Logic <vi...@gmail.com>.
This is interesting.

I'm coming into this with only the knowledge I gained from the first edition of the Lucene in Action book, which I read a few years ago. So when I think of embedded, I think that Solr would be integrated into my app like any API would be. Instead you say Solr is a separate process that could work alongside my app. In other words, in my mind it would be an extension module or even a plugin for my app. Would it be possible to start/stop it from my app as well? I hope I'm seeing the picture clearly now.

What I find most useful in Lucene is its searching capability. Since I have pre-structured data (the configurations that the different tools and tiers of my app use), the analysis stage is not that involved. Once data is indexed, likely in several indexes/cores, I would like to search for specific documents and set them to be the current context for different tools. There are a number of different contexts, with index documents specific to each of them. I guess what I am after is a searchable configuration engine for a RIA.

Some of my use cases can tolerate a bit of latency but for others it would not work so well. Here are two of the more dynamic use cases (slightly simplified to make them clear) where searching has to be blazingly fast:

1 - I have a draw tool that places different shape types while dragging (each drag forms a string of shapes) across a canvas. I track the types of shapes the current drag has placed so far and then want to recommend other shape strings (previously drawn strings are indexed as they are made) that are similar, in order to help the user complete the current draw string. The query would be created on the fly based on what shapes have been drawn so far and then used to search the index; the search would also need to be refreshed every time the user moves the mouse. This use case is similar to type-ahead phrase completion but completely mouse driven, with searches fired in the background and immediate graphics on the canvas. Will Solr running as a separate local process be able to keep up with the user-driven drawing?

2 - My app features a Painter with Renderers for different shapes. The many possible layout and style details for these shapes and their relation to each other are defined as configurations per Renderer. The index would contain multiple versions of a configuration for a given Renderer. A search would then be used to bring up a set of configurations; the set would have one configuration for each Renderer. This set would then be the active rules (context) for rendering each type of shape. Then, as the user creates shapes by drawing different shape types, they are rendered by their specific Renderer. The given Renderer in turn looks up all the rules for layout and styling it should have from the active set of configurations. The question is whether the lookups can be on-the-fly searches against an index, or whether it is better to pull up the set of configurations from the index and transfer them to hashmaps that the Renderers would query instead of querying the index directly. Would Solr be able to keep up with all the real-time lookups the Renderers would be making?
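
To make the hashmap option in use case 2 concrete, here is a minimal sketch in plain Java. It assumes that one search per context switch returns each configuration as a flat map of field names to values, and that each hit carries a "renderer" field naming the Renderer it belongs to. The class name, method names and the "renderer" field are all hypothetical; nothing here is Lucene or Solr API.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: these names do not come from Lucene, Solr or the thread.
final class RendererConfigCache {

    // Active context: renderer name -> (rule name -> rule value).
    private volatile Map<String, Map<String, String>> active = new HashMap<>();

    // Called once per context switch with the hits of a single index search.
    // Each hit is assumed to be a flat map that carries a "renderer" field.
    void activate(List<Map<String, String>> hits) {
        Map<String, Map<String, String>> next = new HashMap<>();
        for (Map<String, String> hit : hits) {
            String renderer = hit.get("renderer");
            if (renderer == null) {
                continue; // skip hits that are not renderer configurations
            }
            Map<String, String> rules = new HashMap<>(hit);
            rules.remove("renderer");
            next.put(renderer, rules);
        }
        active = next; // swap in one step so readers never see a half-built set
    }

    // Called by a Renderer while painting: an in-memory lookup, no index access.
    String rule(String renderer, String name, String fallback) {
        Map<String, String> rules = active.get(renderer);
        if (rules == null) {
            return fallback;
        }
        return rules.getOrDefault(name, fallback);
    }
}
```

With this shape, the index is searched only when the active context changes, and a call such as cache.rule("circle", "stroke", "1px") during painting is an ordinary map lookup.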

VL


On 2010-05-30, at 1:38 PM, Yonik Seeley wrote:

> On Sun, May 30, 2010 at 2:27 PM, Visual Logic <vi...@gmail.com> wrote:
>> Solr is embeddable but does that not just mean that SolrJ only provides the ability to call Solr running on some server?
> 
> Nope - embeddable as in running in the same JVM as your application.
> 
>> For some of my use cases using Solr on a remote server would work fine. For other cases it will not be quick enough,
> 
> Running as a separate server can be on the same host and be very
> quick.  Was it too slow when you tried it?
> It's a common misconception that HTTP is slow... it's really just a
> TCP socket (which can be reused with persistent connections) with some
> standardized headers.  Solr also has a binary protocol that works just
> fine over HTTP, so it's really not more overhead than doing something
> like talking to a database.
> 
> But the right solution probably depends on the details of your
> specific usecases - if you elaborate on them, people may be able to
> provide more specific recommendations.
> 
> -Yonik
> http://www.lucidimagination.com


Re: Using JSON for index input and search output

Posted by Yonik Seeley <yo...@lucidimagination.com>.
On Sun, May 30, 2010 at 2:27 PM, Visual Logic <vi...@gmail.com> wrote:
> Solr is embeddable but does that not just mean that SolrJ only provides the ability to call Solr running on some server?

Nope - embeddable as in running in the same JVM as your application.

> For some of my use cases using Solr on a remote server would work fine. For other cases it will not be quick enough,

Running as a separate server can be on the same host and be very
quick.  Was it too slow when you tried it?
It's a common misconception that HTTP is slow... it's really just a
TCP socket (which can be reused with persistent connections) with some
standardized headers.  Solr also has a binary protocol that works just
fine over HTTP, so it's really not more overhead than doing something
like talking to a database.
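
As a rough illustration of the point about persistent connections, the sketch below queries a Solr instance on the same host over HTTP and asks for a JSON response. It is a sketch under assumptions, not a recommended client: the base URL (port 8983, path /solr/select) is assumed to match a default install, the class and method names are made up, and it uses the java.net.http client from Java 11 rather than SolrJ. The q, rows and wt=json parameters are standard Solr query parameters.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;

// Hypothetical helper; not part of Solr or SolrJ.
final class LocalSolrQuery {

    // One client for the whole application, so connections can be reused
    // between searches instead of being reopened for every request.
    private static final HttpClient CLIENT = HttpClient.newHttpClient();

    // Assumed address of a Solr instance on the same host; adjust as needed.
    static final String BASE = "http://localhost:8983/solr/select";

    // Builds the request URI: URL-encoded query, row limit, JSON response writer.
    static URI selectUri(String query, int rows) {
        String q = URLEncoder.encode(query, StandardCharsets.UTF_8);
        return URI.create(BASE + "?q=" + q + "&rows=" + rows + "&wt=json");
    }

    // Sends the query and returns the raw JSON response body.
    static String search(String query, int rows) throws Exception {
        HttpRequest request = HttpRequest.newBuilder(selectUri(query, rows)).GET().build();
        HttpResponse<String> response =
                CLIENT.send(request, HttpResponse.BodyHandlers.ofString());
        return response.body();
    }
}
```

Timing a loop of LocalSolrQuery.search(...) calls against the real index would show whether the local round trip is fast enough for a given use case.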

But the right solution probably depends on the details of your
specific usecases - if you elaborate on them, people may be able to
provide more specific recommendations.

-Yonik
http://www.lucidimagination.com



Re: Using JSON for index input and search output

Posted by Visual Logic <vi...@gmail.com>.
Thanks for this helpful information.

Solr is embeddable, but does that not just mean that SolrJ only provides the ability to call Solr running on some server? As far as I know, embedding Solr is not really embedding it at all, only using it remotely, where an internet connection needs to be maintained in order to do so. If Solr were truly embeddable, how large a memory footprint would that add to my app?

For some of my use cases using Solr on a remote server would work fine. For other cases it will not be quick enough, plus I want the user and other tool aspects of the local client to be able to utilize Lucene search for a lot more than only accessing resources on a remote server.

Sorry to hear there is no out of the box support for JSON input/output in Lucene but that can be overcome with a bit of local code.
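
A minimal sketch of what that local code might look like, in plain Java. It assumes the JSON file has already been parsed by some JSON library into nested java.util.Map objects; the class and method names are hypothetical. flatten() turns nested objects into dotted field names, and each resulting name/value pair could then be added to a Lucene Document as a field. toJson() goes the other way and renders the stored fields of one search hit as a flat JSON object. Arrays are not treated specially here; a list value is simply stored as its string form.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical bridge between parsed JSON and flat name/value fields.
final class JsonFieldBridge {

    // Flattens nested maps into dotted names: {"style":{"color":"red"}}
    // becomes the single field style.color = red.
    static Map<String, String> flatten(Map<String, ?> json) {
        Map<String, String> fields = new LinkedHashMap<>();
        flattenInto("", json, fields);
        return fields;
    }

    private static void flattenInto(String prefix, Map<?, ?> node, Map<String, String> out) {
        for (Map.Entry<?, ?> e : node.entrySet()) {
            String name = prefix + e.getKey();
            Object value = e.getValue();
            if (value instanceof Map) {
                flattenInto(name + ".", (Map<?, ?>) value, out);
            } else if (value != null) {
                out.put(name, String.valueOf(value)); // null values are dropped
            }
        }
    }

    // Renders the fields of one hit as a flat JSON object with string values.
    static String toJson(Map<String, String> fields) {
        StringBuilder sb = new StringBuilder("{");
        boolean first = true;
        for (Map.Entry<String, String> e : fields.entrySet()) {
            if (!first) {
                sb.append(',');
            }
            first = false;
            sb.append(quote(e.getKey())).append(':').append(quote(e.getValue()));
        }
        return sb.append('}').toString();
    }

    // Quotes a string for JSON, escaping quotes, backslashes and control characters.
    private static String quote(String s) {
        StringBuilder sb = new StringBuilder("\"");
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == '"' || c == '\\') {
                sb.append('\\').append(c);
            } else if (c == '\n') {
                sb.append('\\').append('n');
            } else if (c == '\r') {
                sb.append('\\').append('r');
            } else if (c == '\t') {
                sb.append('\\').append('t');
            } else if (c < 0x20) {
                sb.append('\\').append(String.format("u%04x", (int) c));
            } else {
                sb.append(c);
            }
        }
        return sb.append('"').toString();
    }
}
```

Keeping the dotted names identical on the way in and the way out means a configuration that was indexed from JSON can be handed back to the client in a shape it can still recognise.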

VL


On 2010-05-30, at 12:46 PM, Yonik Seeley wrote:

> On Sun, May 30, 2010 at 1:33 PM, Visual Logic <vi...@gmail.com> wrote:
>> JSON is the format used for all the configuration and property files in the RIA application we are developing. Is Lucene able to create a document from a given JSON file and index it? Is Lucene able to provide a JSON output response from a query made to an index? Does the Tika package provide this?
> 
> No, and no.
> XML, JSON, etc, are out of scope for lucene, which is a core search library.
> Tika extracts text from documents like Word and PDF.
> 
>> Local indexing and searching is needed on the local client so Solr is not a solution even though it does provide a search response in JSON format.
> 
> Solr is embeddable as well, so you can directly index/search.  But why
> can't you run a separate server?
> 
> -Yonik
> http://www.lucidimagination.com


Re: Using JSON for index input and search output

Posted by Yonik Seeley <yo...@lucidimagination.com>.
On Sun, May 30, 2010 at 1:33 PM, Visual Logic <vi...@gmail.com> wrote:
> JSON is the format used for all the configuration and property files in the RIA application we are developing. Is Lucene able to create a document from a given JSON file and index it? Is Lucene able to provide a JSON output response from a query made to an index? Does the Tika package provide this?

No, and no.
XML, JSON, etc, are out of scope for lucene, which is a core search library.
Tika extracts text from documents like Word and PDF.

> Local indexing and searching is needed on the local client so Solr is not a solution even though it does provide a search response in JSON format.

Solr is embeddable as well, so you can directly index/search.  But why
can't you run a separate server?

-Yonik
http://www.lucidimagination.com



Re: Using JSON for index input and search output

Posted by Otis Gospodnetic <ot...@yahoo.com>.
VL,

Solr (not Lucene, but you can embed Solr) has JsonUpdateRequestHandler, which lets you send docs to Solr for indexing in JSON (instead of the usual XML):
http://search-lucene.com/c/Solr:/src/java/org/apache/solr/handler/JsonUpdateRequestHandler.java


And you can get Solr to respond with JSON, as you pointed out:
http://wiki.apache.org/solr/SolJSON

Otis
----
Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/



----- Original Message ----
> From: Visual Logic <vi...@gmail.com>
> To: java-user@lucene.apache.org
> Sent: Sun, May 30, 2010 1:33:19 PM
> Subject: Using JSON for index input and search output
> 
> Lucene,
> 
> JSON is the format used for all the configuration and property files in the RIA application we are developing. Is Lucene able to create a document from a given JSON file and index it? Is Lucene able to provide a JSON output response from a query made to an index? Does the Tika package provide this?
> 
> Local indexing and searching is needed on the local client so Solr is not a solution even though it does provide a search response in JSON format.
> 
> VL