Posted to user@spark.apache.org by Amjad ALSHABANI <as...@gmail.com> on 2016/02/22 09:32:04 UTC

Loading file into executor classpath

Hello everybody,

I've implemented a log-analyzer program in Spark which takes the logs from
an Apache log file and translates them into a given object.

The log format is described with Grok patterns, so I'm using the Grok
library to extract the desired fields.

When running the application locally it succeeds without any problem, but
when deploying it to YARN (with multiple nodes) I'm having an issue: the
pattern file cannot be found:

file:/hadoop-disk1/yarn/local/usercache/hadoop/appcache/application_1454418114641_7429/container_1454418114641_7429_01_000002/./myFatJat-jar-with-dependencies.jar!/haproxy_pattern.txt

where haproxy_pattern.txt is the Grok pattern file.
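The `!` in that path suggests the pattern file is being opened as a regular file even though it sits inside the fat jar; a jar-internal entry cannot be read with `new File(...)` and is only reachable as a classpath resource. A minimal sketch of classpath loading (the class name here is hypothetical, not from the post):

```java
import java.io.BufferedReader;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

public class PatternLoader {

    // Reads a resource via the classloader, which works whether the file
    // sits on disk or is packed inside the fat jar -- unlike new File(...),
    // which fails for jar-internal paths like ...jar!/haproxy_pattern.txt
    public static String loadPattern(String resourceName) throws Exception {
        try (InputStream in = PatternLoader.class.getClassLoader()
                .getResourceAsStream(resourceName)) {
            if (in == null) {
                throw new IllegalStateException("Resource not found: " + resourceName);
            }
            StringBuilder sb = new StringBuilder();
            try (BufferedReader reader = new BufferedReader(
                    new InputStreamReader(in, StandardCharsets.UTF_8))) {
                String line;
                while ((line = reader.readLine()) != null) {
                    sb.append(line).append('\n');
                }
            }
            return sb.toString();
        }
    }
}
```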

I submit my jar as follows:

$ spark-submit --master yarn-client \
    --class com.vsct.dt.bigdata.cdn.app.MainRunner \
    --conf spark.driver.extraClassPath=conf/ \
    --conf spark.executor.extraClassPath=conf/ \
    myFatJat-jar-with-dependencies.jar

My haproxy_pattern.txt file exists in the sub-directory conf/.
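A relative extraClassPath like conf/ only resolves on machines where that directory actually exists next to the working directory, which is not the case inside YARN containers. An alternative that may apply here is spark-submit's --files option, which copies the listed files into the working directory of every container so they can be opened as ./haproxy_pattern.txt. A sketch based on the command above:

```shell
# --files ships conf/haproxy_pattern.txt to each YARN container's
# working directory; executors can then open it as ./haproxy_pattern.txt
spark-submit --master yarn-client \
    --class com.vsct.dt.bigdata.cdn.app.MainRunner \
    --files conf/haproxy_pattern.txt \
    myFatJat-jar-with-dependencies.jar
```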



More details:

The Grok API I'm using is:
            <groupId>io.thekraken</groupId>
            <artifactId>grok</artifactId>
            <version>0.1.1</version>


My code looks like this. The map code:



        JavaRDD<String> rawLog = sc.textFile(configuration.getInput());
        JavaRDD<LogEntry> logEntryRDD = rawLog.map(new Function<String, LogEntry>() {

            private static final long serialVersionUID = 1L;

            @Override
            public LogEntry call(String raw_line) throws Exception {
                // a new reader is built for every record
                SparkGrokReader grokReader = new SparkGrokReader(configuration);
                LogEntry logEntry = grokReader.read(raw_line);
                return logEntry;
            }
        }).cache();
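Since the map function constructs a new SparkGrokReader for every record, a per-partition variant (as mapPartitions would do) can build one reader and reuse it for the whole partition. A runnable stand-alone sketch of that pattern, with a mock Reader standing in for SparkGrokReader:

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

public class PartitionDemo {

    // Stand-in for SparkGrokReader: expensive to construct, cheap to reuse.
    static class Reader {
        static int constructions = 0;
        Reader() { constructions++; }
        String read(String line) { return line.toUpperCase(); }
    }

    // Mirrors what a mapPartitions function would do: build the reader
    // once per partition, then reuse it for every line in that partition.
    static List<String> processPartition(Iterator<String> lines) {
        Reader reader = new Reader();            // built once per partition
        List<String> out = new ArrayList<>();
        while (lines.hasNext()) {
            out.add(reader.read(lines.next()));  // reused per record
        }
        return out;
    }
}
```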


The method which extracts the fields with Grok:

    public LogEntry read(String raw_line) {
        LogEntry logEntry = null;
        try {
            Match gm = grok.match(raw_line);
            gm.captures();
            logEntry = buildLogentry(gm.toJson());
        } catch (NullPointerException npe) {
            logger.warn("Line could not be parsed by GROK: {}", raw_line);
        }
        return logEntry;
    }