Spark

Apache Spark is widely considered the successor to MapReduce for general-purpose data processing on Apache Hadoop clusters. Like MapReduce applications, each Spark application is a self-contained computation that runs user-supplied code to compute a result.

Spark is particularly well suited to iterative computations on large data sets across a cluster of machines. Hadoop MapReduce can also run distributed jobs and recover from machine failures, but it writes intermediate results to disk between jobs. Spark can keep intermediate data in memory across steps, so it typically outperforms MapReduce significantly on iterative workloads.
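
The benefit for iterative work can be illustrated with a minimal PySpark sketch. It assumes a toy data set of (x, y) points and fits y = w * x by gradient descent; the names gradient, fit_slope, and main are hypothetical and chosen only for this illustration. The point is that the RDD is cached once and then reused by every iteration.

```python
import sys


def gradient(w, point):
    """Gradient of 0.5 * (w * x - y) ** 2 with respect to w for one point."""
    x, y = point
    return (w * x - y) * x


def fit_slope(points_rdd, iterations=20, step=0.05):
    """Fit y = w * x by gradient descent, running one Spark job per iteration."""
    points_rdd.cache()        # ask Spark to keep the partitions in memory
    n = points_rdd.count()    # first action: reads the data and fills the cache
    w = 0.0
    for _ in range(iterations):
        # Each pass reuses the cached partitions instead of re-reading the input.
        current_w = w
        total = points_rdd.map(lambda p: gradient(current_w, p)).sum()
        w -= step * total / n
    return w


def main(master):
    # Imported here so the helpers above can be loaded without a Spark install.
    from pyspark import SparkContext

    sc = SparkContext(master, "IterativeCachingSketch")
    try:
        # Toy data (an assumption for this sketch): points on the line y = 3x.
        points = sc.parallelize([(float(x), 3.0 * x) for x in range(1, 6)])
        print(fit_slope(points))
    finally:
        sc.stop()


if __name__ == "__main__" and len(sys.argv) > 1:
    main(sys.argv[1])   # master URL, for example local[2]
```

Without the cache() call the loop would still work, but each iteration would recompute the RDD from its source, which is closer to the repeated disk reads of a chain of MapReduce jobs.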

Developing Spark Applications

When you are ready to move beyond running core Spark applications in an interactive shell, you need best practices for building, packaging, and configuring applications and using the more advanced APIs. This section describes:

  • How to develop, package, and run Spark applications.
  • Aspects of using Spark APIs beyond core Spark.
  • How to access data stored in various file formats, such as Parquet and Avro.
  • Developing and Running a Spark Word Count Application
  • Using Spark Streaming
  • Using Spark SQL
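
As a rough idea of what the word count application in the list above might look like, here is a minimal PySpark sketch. The function names (tokenize, count_words, main) and the application name are assumptions made for illustration, and the input path is supplied by the caller.

```python
import re
import sys
from operator import add


def tokenize(line):
    """Split a line into lowercase words, dropping punctuation."""
    return re.findall(r"[a-z0-9']+", line.lower())


def count_words(lines_rdd):
    """Return an RDD of (word, count) pairs built from an RDD of text lines."""
    return (
        lines_rdd.flatMap(tokenize)      # one record per word
        .map(lambda word: (word, 1))     # pair each word with a count of 1
        .reduceByKey(add)                # sum the counts for each word
    )


def main(input_path):
    # Imported here so the helpers above can be loaded without a Spark install.
    from pyspark import SparkConf, SparkContext

    conf = SparkConf().setAppName("WordCountSketch")
    sc = SparkContext(conf=conf)
    try:
        counts = count_words(sc.textFile(input_path))
        # Print the ten most frequent words.
        for word, n in counts.takeOrdered(10, key=lambda kv: -kv[1]):
            print(word, n)
    finally:
        sc.stop()


if __name__ == "__main__" and len(sys.argv) > 1:
    main(sys.argv[1])   # path to a text file
```

Saved to a file, a script like this would typically be launched with spark-submit, passing the input path as its only argument.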

Usage of Spark Applications: As with MapReduce jobs, Spark applications can use the resources of multiple hosts. However, Spark has many advantages over MapReduce. In MapReduce, the highest-level unit of computation is a job. … In Spark, the highest-level unit of computation is an application.

How does a Spark application work? The SparkContext is used to create Spark RDDs, accumulators, and broadcast variables, to access Spark services, and to run jobs. It is a client of the Spark execution environment and acts as the master of the Spark application. The main tasks of the SparkContext include getting the current status of the Spark application.
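
The roles of the SparkContext described here can be sketched in PySpark as follows. The helper names (is_stop_word, filter_words, main) and the sample words are hypothetical; the sketch creates an RDD, a broadcast variable, and an accumulator from one context, runs a job, and prints some basic status properties.

```python
import sys


def is_stop_word(word, stop_words):
    """Return True when the lowercased word is in the stop-word set."""
    return word.lower() in stop_words


def filter_words(sc, words):
    """Drop stop words from a list and report how many were dropped."""
    stop_words = sc.broadcast({"a", "of", "the"})  # read-only copy shipped to executors
    dropped = sc.accumulator(0)                    # executors add to it, the driver reads it
    rdd = sc.parallelize(words)                    # RDD built from a local list

    def keep(word):
        if is_stop_word(word, stop_words.value):
            # Updates made inside a transformation can be applied again
            # if a task is retried, so treat this count as approximate.
            dropped.add(1)
            return False
        return True

    kept = rdd.filter(keep).collect()              # collect() is an action, so it runs a job
    return kept, dropped.value


def main(master):
    # Imported here so the helpers above can be loaded without a Spark install.
    from pyspark import SparkContext

    sc = SparkContext(master, "SparkContextSketch")
    try:
        # Basic status information exposed by the context.
        print(sc.applicationId, sc.master, sc.appName)
        print(filter_words(sc, ["The", "driver", "of", "a", "Spark", "application"]))
    finally:
        sc.stop()


if __name__ == "__main__" and len(sys.argv) > 1:
    main(sys.argv[1])   # master URL, for example local[2]
```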

What is the Spark Application Master? The Spark Application Master is responsible for negotiating the resource requests made by the driver with YARN and for finding a suitable set of hosts/containers in which to run the Spark application. There is one Application Master per application.
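
To make the resource negotiation more concrete, the sketch below shows one way a driver might state its resource requests when running on YARN in client mode. The helper name yarn_settings and the default values are assumptions for illustration; the property names are standard Spark configuration keys. The sketch also assumes a working YARN cluster with HADOOP_CONF_DIR (or YARN_CONF_DIR) pointing at its configuration.

```python
import sys


def yarn_settings(num_executors, executor_memory="2g", am_memory="512m"):
    """Build Spark properties that describe the resources to request from YARN."""
    return {
        "spark.master": "yarn",
        "spark.submit.deployMode": "client",
        "spark.yarn.am.memory": am_memory,               # Application Master memory (client mode)
        "spark.executor.instances": str(num_executors),  # number of executor containers
        "spark.executor.memory": executor_memory,        # memory per executor container
    }


def main(num_executors):
    # Imported here so the helper above can be loaded without a Spark install.
    from pyspark import SparkConf, SparkContext

    conf = SparkConf().setAppName("YarnSketch")
    conf.setAll(list(yarn_settings(num_executors).items()))
    sc = SparkContext(conf=conf)
    try:
        # On YARN this is expected to be the YARN application ID.
        print(sc.applicationId)
    finally:
        sc.stop()


if __name__ == "__main__" and len(sys.argv) > 1:
    main(int(sys.argv[1]))   # number of executors to request
```

The Application Master then asks YARN for containers that match these settings on behalf of the driver.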
