Posted to user@manifoldcf.apache.org by Andrea Asta <as...@gmail.com> on 2015/05/07 16:44:56 UTC

File System output connector error

Hello,
I'm new to ManifoldCF and am having some issues performing a simple job:
crawling a website and storing the results in a file system folder.

The job crashes with an error when it tries to save a file for an article
whose name contains characters that are not allowed in file names (for
example ?). Is there a way to have the connector replace them so that the
job does not stop?

Example of error:
Error: Could not create file
'E:\ManifoldCF\http\nypost.com\2015\05\06\bloombergs-the-man-to-beat-hillary-for-democratic-nomination?msg=fail&shared=email':
E:\ManifoldCF\http\nypost.com\2015\05\06\bloombergs-the-man-to-beat-hillary-for-democratic-nomination?msg=fail&shared=email
(The filename, directory name, or volume label syntax is incorrect)

Thank you.
Andrea
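The replacement Andrea asks about could be done per path component before the file is created. The sketch below is hypothetical and is not ManifoldCF code: `sanitize_component` and the `_` replacement character are illustrative choices. It replaces the characters Windows does not allow in file names (`< > : " / \ | ? *` and ASCII control characters 0-31). The `?` in the path from the error above is one of these characters.

```python
# Characters Windows does not allow in file or directory names,
# plus the ASCII control characters 0-31.
WINDOWS_RESERVED = set('<>:"/\\|?*') | {chr(c) for c in range(32)}


def sanitize_component(name: str, replacement: str = "_") -> str:
    """Replace every Windows-reserved character in one path component.

    Hypothetical helper for illustration; not part of ManifoldCF.
    """
    return "".join(replacement if ch in WINDOWS_RESERVED else ch for ch in name)
```

For example, `sanitize_component("nomination?msg=fail&shared=email")` returns `"nomination_msg=fail&shared=email"`. The approach is simple but lossy: `a?b` and `a*b` both become `a_b`, so two different URLs could end up writing to the same file.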

Re: File System output connector error

Posted by Karl Wright <da...@gmail.com>.
Hi Andrea,

The file system output connector was intended to emulate wget.
Unfortunately, this has two major problems: (1) wget is a Unix utility, so
it follows Unix file-naming rules, and (2) wget has no formal
specification, so whenever someone finds an odd case we have to research
what wget does with it.
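As a starting point for that research: the wget manual documents a `--restrict-file-names=windows` mode. In that mode wget percent-escapes `\ | / : ? " * < >` and control characters, uses `+` instead of `:` between host and port, and uses `@` instead of `?` before the query string. The `http\nypost.com\...` layout in the error above also resembles what wget's `--protocol-directories` option produces. The sketch below approximates that mapping as we read the manual. It is not how the connector behaves today. `url_to_local_path` and `escape_segment` are hypothetical names, and the byte range 128-159 that the manual also lists is left out for simplicity.

```python
from urllib.parse import urlsplit

# Characters that wget's "windows" mode escapes, per its manual.
# The manual also lists the range 128-159; it is omitted in this sketch.
UNSAFE = set('\\|/:?"*<>') | {chr(c) for c in range(32)}


def escape_segment(segment: str) -> str:
    """Percent-encode each unsafe character in one path segment ('?' -> '%3F')."""
    return "".join(f"%{ord(ch):02X}" if ch in UNSAFE else ch for ch in segment)


def url_to_local_path(url: str) -> list[str]:
    """Map a URL to local path components, approximating
    `wget --protocol-directories --restrict-file-names=windows`.
    Hypothetical helper for illustration, not ManifoldCF code."""
    parts = urlsplit(url)
    host = parts.hostname or ""
    if parts.port is not None:
        host = f"{host}+{parts.port}"  # '+' replaces ':' between host and port
    segments = [s for s in parts.path.split("/") if s]
    if not segments or parts.path.endswith("/"):
        segments.append("index.html")  # wget's default name for directory URLs
    last = escape_segment(segments[-1])
    if parts.query:
        last += "@" + escape_segment(parts.query)  # '@' replaces '?' before the query
    middle = [escape_segment(s) for s in segments[:-1]]
    return [parts.scheme, escape_segment(host)] + middle + [last]
```

The result can be joined with `os.path.join(output_root, *url_to_local_path(url))`, where `output_root` is whatever folder the job writes to. Unlike plain replacement with `_`, percent-escaping keeps distinct names mostly distinct. Collisions are still possible, for example with a literal `@` already present in a path.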

We're open to any improvements that keep (or make) us compatible with
wget.  If you can do the research that identifies where we differ, we're
happy to make the changes needed to handle it.  It is probably also
possible to simply "skip" documents that the local OS can't handle, if
that's what you think is best in this case.  Please open whatever tickets
make sense, given that.

Karl
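The alternative Karl mentions, skipping documents the local OS cannot store, could start with a check before writing. The sketch below is hypothetical: `should_skip` and `windows_rejects` are not ManifoldCF APIs, and the check only covers Windows naming rules. Those rules include reserved characters, control characters, and reserved device names such as `CON` or `NUL`. A real connector might instead attempt the write, catch the failure, and record the document as skipped rather than failing the job.

```python
import re

# Characters Windows rejects in names, plus ASCII control characters 0-31.
RESERVED_CHARS = re.compile(r'[<>:"/\\|?*\x00-\x1f]')
# Device names Windows reserves even when an extension is added (e.g. "nul.txt").
RESERVED_NAMES = (
    {"CON", "PRN", "AUX", "NUL"}
    | {f"COM{i}" for i in range(1, 10)}
    | {f"LPT{i}" for i in range(1, 10)}
)


def windows_rejects(component: str) -> bool:
    """True if `component` cannot be used as an ordinary Windows file or directory name."""
    if RESERVED_CHARS.search(component):
        return True
    return component.split(".", 1)[0].upper() in RESERVED_NAMES


def should_skip(components: list[str]) -> bool:
    """Hypothetical policy: skip the document if any path component is unusable."""
    return any(windows_rejects(c) for c in components)
```

In a crawl loop, a `True` result from `should_skip` would mean logging the URL and moving on to the next document instead of raising the "Could not create file" error.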
