Posted to user@couchdb.apache.org by Heiko Henning <he...@freenet.de> on 2009/06/12 09:54:27 UTC

Is this feasible with CouchDB?

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
 
Hello,

I have just listened to the CCC exp podcast and enjoyed it,
and I find your database very interesting.
I would like to use it for http://www.jepaa.com/ and want to
ask you whether the following is realistic:

One page looks like this:
{
  "domain" : "anzeigenmarkt.tel",
  "txt" : "bli bla blub",
  "contact" : [
    {   
        "name" : "Anfrage",
        "ort" : "work",
        "data" : "info@domain.de",
        "type" : "mail"
    },
    {
        "name" : "Support",
        "ort"  : "work",
        "data" : "support@domain.de",
        "type" : "mail"
    }
   ],
  "location" : {
        "lat" : 145.32423432,
        "lon" : 232.232
   },
  "keywords" : [
    {
     "ul" : "Geschäft",
     "fn" : "Max",
     "ln" : "Muster",
     "nn" : "musti",
     "st" : "mustergasse"
    },
    {
     "ul" : "Privat",
     "fn" : "Max",
     "ln" : "Muster",
     "nn" : "musti",
     "st" : "dorfstrasse"
    }
  ]
}
 
There are around 300,000 of these documents.

Now I would like to have a full-text index with ratings:

var stopwordfilter = ['the', 'or', 'for', 'them' /* ..... */];
// Split a text into words on whitespace and punctuation
// and drop empty strings and stop words.
function splitIt(txt)
{
 var data = txt.split(/[\s,.!?\-_]+/);
 return data.filter(function (word) {
  return word !== '' && stopwordfilter.indexOf(word) === -1;
 });
}

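// addToIndex(key, points) is assumed to be defined elsewhere.
// Split the domain into its labels, top-level domain first.
var domainParts = page.domain.split(".");
domainParts.reverse();
var points = 5;
// The loop starts at 1, so the top-level domain at index 0 is not indexed.
for(var i=1 ; i<domainParts.length ; i++)
{
 var basePoints = 15;
 // Index the full label first, then every shorter prefix with one base point less.
 for(var l = domainParts[i].length ; l>0 ; l--)
 {
  addToIndex(domainParts[i].substr(0, l), points*basePoints);
  basePoints--;
 }
 
 // Every further subdomain level counts half as much.
 points=points/2;
}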

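var words = splitIt(page.txt);
for(var i=0 ; i<words.length ; i++)
{
 if (!words[i]) continue; // skip empty or removed entries
 var points = 10;
 // Index the full word first, then every shorter prefix with one point less.
 for(var l = words[i].length ; l>0 ; l--)
 {
  addToIndex(words[i].substr(0, l), points);
  points--;
 }
}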

In other words, a full-text index over all text in the page, but with
different points per word.
A word in the domain gives 15 points, a word in txt gives 10 points, and so on.

If a word is only matched as a part (a prefix), it gives 90% of the points,
and so on.
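
One way the indexing described above might be expressed as a CouchDB view is a map function that calls emit(key, value) once per prefix. The following is only a sketch under assumptions: emitPrefixes is a hypothetical helper, the local emit is a stand-in so the code also runs outside CouchDB, the stop-word filter and the halving of points per subdomain level are left out for brevity, and prefixes lose one point per missing character as in the loops above rather than 10%.

```javascript
// Stand-in for CouchDB's emit(key, value) so the sketch runs outside the view server.
var emitted = [];
function emit(key, value) { emitted.push([key, value]); }

// Hypothetical helper: emit the full word with `full` points and every
// shorter prefix with one point less, stopping when the points reach zero.
function emitPrefixes(word, full) {
  for (var l = word.length, p = full; l > 0 && p > 0; l--, p--) {
    emit(word.substr(0, l), p);
  }
}

// Map function: one row per (prefix, points) for the domain labels and the words in txt.
function map(doc) {
  var labels = (doc.domain || '').toLowerCase().split('.').reverse();
  for (var i = 1; i < labels.length; i++) { // index 0 is the top-level domain
    emitPrefixes(labels[i], 15);
  }
  var words = (doc.txt || '').toLowerCase().split(/[\s,.!?\-_]+/);
  for (var j = 0; j < words.length; j++) {
    if (words[j] !== '') emitPrefixes(words[j], 10);
  }
}

map({ domain: 'anzeigenmarkt.tel', txt: 'bli bla blub' });
```

Inside CouchDB only the map function and its helper would go into the design document; the view would then be queried by key, for example with the search term as the key.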

Now to the search.
Search string: "the wet green grass"

It should search for "wet" or "green" or "grass", with page.location.lat
between 100 and 130 and page.location.lon between 200 and 220.

The ordering should work like this:
if both "wet" and "grass" are found in one page, their ratings are multiplied.
The results are then ordered by the total rating per page, descending.
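
The ordering rule can be made concrete with a small sketch that combines the per-term ratings outside the database. Everything here is an assumption for illustration: rank is a hypothetical function, hits is assumed to hold one "document id to rating" map per search term (for example built from view rows), and docs holds the location of each document. Whether the same can be done inside CouchDB is the open question.

```javascript
// Combine per-term ratings: filter by bounding box, multiply the ratings of
// all terms that match the same document, and sort by total rating, descending.
function rank(hits, docs, box) {
  var totals = {};
  Object.keys(hits).forEach(function (term) {
    Object.keys(hits[term]).forEach(function (id) {
      var loc = docs[id].location;
      var inBox = loc.lat >= box.latMin && loc.lat <= box.latMax &&
                  loc.lon >= box.lonMin && loc.lon <= box.lonMax;
      if (!inBox) return;
      var rating = hits[term][id];
      // The first matching term sets the rating, further terms multiply it.
      totals[id] = Object.prototype.hasOwnProperty.call(totals, id)
        ? totals[id] * rating
        : rating;
    });
  });
  return Object.keys(totals)
    .map(function (id) { return { id: id, rating: totals[id] }; })
    .sort(function (a, b) { return b.rating - a.rating; });
}

// Made-up example data: "the" is a stop word and therefore not searched.
var docs = {
  a: { location: { lat: 110, lon: 210 } },
  b: { location: { lat: 120, lon: 205 } },
  c: { location: { lat: 145, lon: 232 } } // outside the box
};
var hits = {
  wet:   { a: 10, b: 9 },
  green: { c: 10 },
  grass: { a: 8 }
};
var result = rank(hits, docs, { latMin: 100, latMax: 130, lonMin: 200, lonMax: 220 });
```

With this data, document a matches both "wet" and "grass" and ends up ahead of document b, while document c is dropped by the location filter.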

Is this realistic for CouchDB, and will it perform well?

Kind regards, Heiko
 
Please excuse my poor English. You can find the German version here:
http://mail-archives.apache.org/mod_mbox/couchdb-user/200906.mbox/%3C4A314597.7060102@freenet.de%3E
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.2 (MingW32)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org
 
iD8DBQFKMgmzNPIVS5vtVToRAse9AJ0TI2NQNvG7b+NgmbYWiHuD4HYoNQCffaGu
sN9RCCmiV47q7h0K/aLRjJI=
=rsEr
-----END PGP SIGNATURE-----