Wednesday, 17 August 2016

Eclipse IDE setup for Apache Spark, Scala with Maven:


Hello all, in this post I will explore how to integrate Spark and Scala into Eclipse. I am assuming that everyone already has Eclipse set up and has some basic knowledge of it.

Scala IDE setup:
I have Eclipse Mars on my laptop and will use it to set up the Scala IDE. Open Eclipse and go to Help --> Eclipse Marketplace.

In the Eclipse Marketplace, search for Scala and click Install; it will take a few minutes. After that you will be prompted to confirm the Scala IDE features.

Note: After a successful installation, Eclipse may ask you to restart; restart so the changes take effect.

Once Eclipse restarts, we have to set up the Scala JDT settings as shown below (you can accept the JDT recommendation or choose other settings):


With this, the Scala IDE setup is done. A successful setup adds a Scala perspective to Eclipse, as shown below. Select the Scala perspective and then right-click in the Project Explorer.


Spark integration:

Continuing from above, we will create a Maven project and integrate Spark into it using pom.xml. Let's see how to create the Maven project:


After clicking Finish, we see the Maven project created as below (it may take a few minutes to create the project):



This project is created as a Java project. To make it a Scala project, we need to add the Scala nature to it: right-click the project --> Configure --> Add Scala Nature.



Let's integrate Spark into Eclipse using the Maven pom.xml:

Open pom.xml and add the Spark and Scala dependencies as shown below.
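A minimal sketch of the dependencies section, assuming Spark 1.3.1 built against Scala 2.10 (match the versions to your own Spark and Scala installation):

```xml
<dependencies>
  <!-- Scala standard library; version should match the Scala IDE compiler -->
  <dependency>
    <groupId>org.scala-lang</groupId>
    <artifactId>scala-library</artifactId>
    <version>2.10.4</version>
  </dependency>
  <!-- Spark core; the _2.10 suffix is the Scala version Spark was built against -->
  <dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-core_2.10</artifactId>
    <version>1.3.1</version>
  </dependency>
</dependencies>
```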


Save the pom.xml changes; Maven will then build the workspace and download the dependencies.

Now we will create a Scala object and give it a name:



I am pasting the LineCount.scala code here for your reference.


import org.apache.spark.SparkContext
import org.apache.spark.SparkConf

object LineCount {
  def main(args: Array[String]): Unit = {
    // On Windows, Spark needs this property to locate the Hadoop binaries (winutils)
    System.setProperty("hadoop.home.dir", "D:\\Spark\\")

    val logFile = "D:\\Softwares\\Spark_Binary\\spark-1.3.1-bin-hadoop2.4\\README.md" // Should be some file on your system
    val conf = new SparkConf().setAppName("Simple Application").setMaster("local")
    val sc = new SparkContext(conf)
    val logData = sc.textFile(logFile, 2).cache() // read the file as an RDD with 2 partitions and cache it
    val numAs = logData.filter(line => line.contains("a")).count()
    val numBs = logData.filter(line => line.contains("b")).count()
    println("Lines with a: %s, Lines with b: %s".format(numAs, numBs))
    sc.stop()
  }
}
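The filter-and-count logic above can be illustrated with plain Scala collections, no Spark required (LineCountDemo below is a hypothetical name used only for this sketch):

```scala
// Plain-Scala sketch of the filter/count logic Spark applies to the RDD.
object LineCountDemo {
  // Count the lines containing the given substring, just as
  // logData.filter(line => line.contains(ch)).count() does in Spark.
  def countLinesContaining(lines: Seq[String], ch: String): Long =
    lines.count(_.contains(ch))

  def main(args: Array[String]): Unit = {
    val sample = Seq("apache spark", "big data", "eclipse ide")
    println("Lines with a: %s, Lines with b: %s".format(
      countLinesContaining(sample, "a"),
      countLinesContaining(sample, "b"))) // prints Lines with a: 2, Lines with b: 1
  }
}
```

The difference in Spark is only that the filtering runs in parallel over the RDD's partitions instead of over an in-memory Seq.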

Now we can run this Scala program on Spark simply by clicking Run.

The final result looks as below:

