Using spark-shell and spark-submit
SnappyData, out-of-the-box, colocates Spark executors and the SnappyData store for efficient data intensive computations. You, however, may need to isolate the computational cluster for other reasons. For instance, a computationally intensive Map-reduce machine learning algorithm that needs to iterate over a cached data set repeatedly. Refer to SnappyData Smart Connector Mode for examples.
To support such cases it is also possible to run native Spark jobs that access a SnappyData cluster as a storage layer in a parallel fashion. To connect to the SnappyData store the
spark.snappydata.connection property should be provided while starting the Spark-shell.
To run all SnappyData functionalities, you need to create a SnappySession.
// from the SnappyData base directory // Start the Spark shell in local mode. Pass SnappyData's locators host:clientPort as a conf parameter. $ ./bin/spark-shell --master local[*] --conf spark.snappydata.connection=locatorhost:clientPort --conf spark.ui.port=4041 scala> // Try few commands on the spark-shell. Following command shows the tables created using the snappy-sql scala> val snappy = new org.apache.spark.sql.SnappySession(spark.sparkContext) scala> val airlineDF = snappy.table("airline").show scala> val resultset = snappy.sql("select * from airline")
Any Spark application can also use the SnappyData as store and Spark as a computational engine by providing the
spark.snappydata.connection property as mentioned below:
// Start the Spark standalone cluster from SnappyData base directory $ ./sbin/start-all.sh // Submit AirlineDataSparkApp to Spark Cluster with snappydata's locator host port. $ ./bin/spark-submit --class io.snappydata.examples.AirlineDataSparkApp --master spark://masterhost:7077 --conf spark.snappydata.connection=locatorhost:clientPort $SNAPPY_HOME/examples/jars/quickstart.jar // The results can be seen on the command line.