You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user@lucy.apache.org by Milind Gupta <mi...@gmail.com> on 2016/07/01 18:53:13 UTC

[lucy-user] Re: Simple Tutorial Example

Can anyone help me running the SimpleTutorial.

Thanks,
Milind

On Wed, Jun 1, 2016 at 3:53 PM, Milind Gupta <mi...@gmail.com> wrote:

> Hi,
>         I tried running the Simple Tutorial example given on teh page:
> http://lucy.apache.org/docs/perl/Lucy/Docs/Tutorial/SimpleTutorial.html.
> It compiles fine. When I ran it I got an error saying "Can't extract
> title/bodytext from amend1.txt". After I changed the pattern in the fscanf
> by removing the 2 m's after the % signs then it worked. But I don't see the
> index file being created. Running the Search program returns that it cannot
> find the index file. Is there some command missing to actually write the
> index file to the disk?
>         I am running this on Windows 10 and this is 0.5.1 version of
> Apache Lucy.
>
> Thanks,
> Milind
>
>

Re: [lucy-user] Re: Simple Tutorial Example

Posted by Milind Gupta <mi...@gmail.com>.
Thanks for pointing me in the right direction. I will open a issue to make
the C99 compliant.

Milind


On Fri, Jul 1, 2016 at 2:15 PM, Nick Wellnhofer <we...@aevum.de> wrote:

> On 01/07/2016 21:57, Milind Gupta wrote:
>
>> 2. I had to remove the 'm' characters in the fscan format to make it
>>  "%[^\r\n] %[\x01-\x7F]" from  "%m[^\r\n] %m[\x01-\x7F]". After removing
>> the 'm' the Release executable works but does nothing otherwise with the m
>> it just gives the error message that it cannot extract the title/bodytext
>>
>
> The `m` modifier is a POSIX 2008 extension to the scanf functions. It's
> not supported by the Microsoft CRT. Working around this limitation is a
> basic C programming exercise. We can't help you with this kind of questions
> on this list, but maybe this question on StackOverflow can point you in the
> right direction:
>
>     http://stackoverflow.com/questions/3911547
>
> It would be nice if the tutorial code was strictly C99 compliant, so feel
> free to open an issue in the Lucy bug tracker.
>
> Nick
>
>

Re: [lucy-user] Re: Simple Tutorial Example

Posted by Nick Wellnhofer <we...@aevum.de>.
On 01/07/2016 21:57, Milind Gupta wrote:
> 2. I had to remove the 'm' characters in the fscan format to make it
>  "%[^\r\n] %[\x01-\x7F]" from  "%m[^\r\n] %m[\x01-\x7F]". After removing
> the 'm' the Release executable works but does nothing otherwise with the m
> it just gives the error message that it cannot extract the title/bodytext

The `m` modifier is a POSIX 2008 extension to the scanf functions. It's not 
supported by the Microsoft CRT. Working around this limitation is a basic C 
programming exercise. We can't help you with this kind of questions on this 
list, but maybe this question on StackOverflow can point you in the right 
direction:

     http://stackoverflow.com/questions/3911547

It would be nice if the tutorial code was strictly C99 compliant, so feel free 
to open an issue in the Lucy bug tracker.

Nick


[lucy-user] Re: Simple Tutorial Example

Posted by Milind Gupta <mi...@gmail.com>.
As a reference here is my code listing. Some points to note are:
1. In the debug mode (compiler flag -g) the relative path does not work I
have to use the absolute path for opendir to work
2. I had to remove the 'm' characters in the fscan format to make it
 "%[^\r\n] %[\x01-\x7F]" from  "%m[^\r\n] %m[\x01-\x7F]". After removing
the 'm' the Release executable works but does nothing otherwise with the m
it just gives the error message that it cannot extract the title/bodytext
3. If I have the m characters removed the debug mode shows segmentation
fault at this line: String *value = Str_new_from_utf8(title, strlen(title));

Any help and pointers to help me get started would be very helpful.

Thanks,
Milind

-------------------------------------------------CODE
BELOW----------------------------------------------------------

/* Licensed to the Apache Software Foundation (ASF) under one or more
 * contributor license agreements.  See the NOTICE file distributed with
 * this work for additional information regarding copyright ownership.
 * The ASF licenses this file to You under the Apache License, Version 2.0
 * (the "License"); you may not use this file except in compliance with
 * the License.  You may obtain a copy of the License at
 *
 *     http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

#include <dirent.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define CFISH_USE_SHORT_NAMES
#define LUCY_USE_SHORT_NAMES
#include "Clownfish/String.h"
#include "Lucy/Simple.h"
#include "Lucy/Document/Doc.h"

const char path_to_index[] = "D:/lucy_index";
//const char uscon_source[]  = "E:/Milind/Technical/My Work/My
Programs/C_C++/__Playground/Apache
Lucy/SimpleTutorial/common/sample/us_constitution";
const char uscon_source[]  = "../../common/sample/us_constitution";

bool S_ends_with(const char *str, const char *postfix) {
    size_t len         = strlen(str);
    size_t postfix_len = strlen(postfix);

    return len >= postfix_len
           && memcmp(str + len - postfix_len, postfix, postfix_len) == 0;
}

Doc* S_parse_file(const char *filename) {
    size_t bytes = strlen(uscon_source) + 1 + strlen(filename) + 1;
    char *path = (char*)malloc(bytes);
    path[0] = '\0';
    strcat(path, uscon_source);
    strcat(path, "/");
    strcat(path, filename);

    FILE *stream = fopen(path, "rb");
    if (stream == NULL) {
        perror(path);
        exit(1);
    }

    char *title    = NULL;
    char *bodytext = NULL;
    if (fscanf(stream, "%[^\r\n] %[\x01-\x7F]", &title, &bodytext) != 2) {
        fprintf(stderr, "Can't extract title/bodytext from '%s'", path);
        exit(1);
    }

    Doc *doc = Doc_new(NULL, 0);

    {
        // Store 'title' field
        String *field = Str_newf("title");
        String *value = Str_new_from_utf8(title, strlen(title));
        Doc_Store(doc, field, (Obj*)value);
        DECREF(field);
        DECREF(value);
    }

    {
        // Store 'content' field
        String *field = Str_newf("content");
        String *value = Str_new_from_utf8(bodytext, strlen(bodytext));
        Doc_Store(doc, field, (Obj*)value);
        DECREF(field);
        DECREF(value);
    }

    {
        // Store 'url' field
        String *field = Str_newf("url");
        String *value = Str_new_from_utf8(filename, strlen(filename));
        Doc_Store(doc, field, (Obj*)value);
        DECREF(field);
        DECREF(value);
    }

    fclose(stream);
    free(bodytext);
    free(title);
    free(path);
    return doc;
}

int main() {
    // Initialize the library.
    lucy_bootstrap_parcel();

    String *folder   = Str_newf("%s", path_to_index);
    String *language = Str_newf("en");
    Simple *lucy     = Simple_new((Obj*)folder, language);

    DIR *dir = opendir(uscon_source);
    if (dir == NULL) {
        perror(uscon_source);
        return 1;
    }

    printf("Directory opened\n");

    for (struct dirent *entry = readdir(dir); entry; entry = readdir(dir)) {

        if (S_ends_with(entry->d_name, ".txt")) {
            Doc *doc = S_parse_file(entry->d_name);
            Simple_Add_Doc(lucy, doc); // ta-da!
            DECREF(doc);
        }
    }

    closedir(dir);

    DECREF(lucy);
    DECREF(language);
    DECREF(folder);
    return 0;
}

--------------------------------------------CODE
ENDS--------------------------------------------------

On Fri, Jul 1, 2016 at 11:53 AM, Milind Gupta <mi...@gmail.com>
wrote:

> Can anyone help me running the SimpleTutorial.
>
> Thanks,
> Milind
>
> On Wed, Jun 1, 2016 at 3:53 PM, Milind Gupta <mi...@gmail.com>
> wrote:
>
>> Hi,
>>         I tried running the Simple Tutorial example given on teh page:
>> http://lucy.apache.org/docs/perl/Lucy/Docs/Tutorial/SimpleTutorial.html.
>> It compiles fine. When I ran it I got an error saying "Can't extract
>> title/bodytext from amend1.txt". After I changed the pattern in the fscanf
>> by removing the 2 m's after the % signs then it worked. But I don't see the
>> index file being created. Running the Search program returns that it cannot
>> find the index file. Is there some command missing to actually write the
>> index file to the disk?
>>         I am running this on Windows 10 and this is 0.5.1 version of
>> Apache Lucy.
>>
>> Thanks,
>> Milind
>>
>>
>