Spark with Scala Training

Spark with Scala Course:
Manipulating big data distributed over a cluster using functional concepts is widespread in industry, and is arguably one of the first large-scale industrial uses of functional ideas. This is evidenced by the popularity of MapReduce and Hadoop and, most recently, Apache Spark, a fast, in-memory distributed collections framework written in Scala. In this course, we'll see how the data-parallel paradigm can be extended to the distributed case, using Spark throughout. We'll cover Spark's programming model in detail, paying close attention to how and when it differs from familiar programming models such as shared-memory parallel collections or sequential Scala collections. Through hands-on examples in Spark and Scala, we'll learn when distribution-related issues such as latency and network communication should be considered, and how to address them effectively to improve performance.

Spark with Scala Course Curriculum

1. Introduction to Data Analysis with Spark

What Is Apache Spark?

A Unified Stack

Spark Core

Spark SQL

Spark Streaming

MLlib

GraphX

Cluster Managers

Who Uses Spark, and for What?

Data Science Tasks

Data Processing Applications

A Brief History of Spark

Spark Versions and Releases

Storage Layers for Spark

2. Downloading Spark and Getting Started

Downloading Spark

Introduction to Spark’s Python and Scala Shells

Introduction to Core Spark Concepts

Standalone Applications

Initializing a SparkContext

Building Standalone Applications

3. Programming with RDDs

RDD Basics

Creating RDDs

RDD Operations

Transformations

Actions

Lazy Evaluation

Passing Functions to Spark

Common Transformations and Actions

Basic RDDs

Converting Between RDD Types

Persistence (Caching)

4. Working with Key/Value Pairs

Motivation

Creating Pair RDDs

Transformations on Pair RDDs

Aggregations

Grouping Data

Joins

Sorting Data

Actions Available on Pair RDDs

Data Partitioning (Advanced)

Determining an RDD’s Partitioner

Operations That Benefit from Partitioning

Operations That Affect Partitioning

Example: PageRank

Custom Partitioners

5. Loading and Saving Your Data

Motivation

File Formats

Text Files

JSON

Comma-Separated Values and Tab-Separated Values

SequenceFiles

Object Files

Hadoop Input and Output Formats

File Compression

Filesystems

Local/“Regular” FS

Amazon S3

HDFS

Structured Data with Spark SQL

Apache Hive

JSON

Databases

Java Database Connectivity

Cassandra

HBase

Elasticsearch

6. Advanced Spark Programming

Introduction

Accumulators

Accumulators and Fault Tolerance

Custom Accumulators

Broadcast Variables

Optimizing Broadcasts

Working on a Per-Partition Basis

Piping to External Programs

Numeric RDD Operations

7. Running on a Cluster

Introduction

Spark Runtime Architecture

The Driver

Executors

Cluster Manager

Launching a Program

8. Deploying Applications with spark-submit

Packaging Your Code and Dependencies

A Java Spark Application Built with Maven

A Scala Spark Application Built with sbt

Dependency Conflicts

Scheduling Within and Between Spark Applications

Cluster Managers

Standalone Cluster Manager

Hadoop YARN

Apache Mesos

Amazon EC2

Which Cluster Manager to Use?

9. Tuning and Debugging Spark

Configuring Spark with SparkConf

Components of Execution: Jobs, Tasks, and Stages

Finding Information

Spark Web UI

Driver and Executor Logs

Key Performance Considerations

Level of Parallelism

Serialization Format

Memory Management

Hardware Provisioning

10. Spark SQL

Linking with Spark SQL

Using Spark SQL in Applications

Initializing Spark SQL

Basic Query Example

SchemaRDDs

Caching

Loading and Saving Data

Apache Hive

Parquet

JSON

From RDDs

JDBC/ODBC Server

Working with Beeline

Long-Lived Tables and Queries

User-Defined Functions

Spark SQL UDFs

Hive UDFs

Spark SQL Performance

Performance Tuning Options

11. Spark Streaming

A Simple Example

Architecture and Abstraction

Transformations

Stateless Transformations

Stateful Transformations

Output Operations

Input Sources

Core Sources

Additional Sources

Multiple Sources and Cluster Sizing

24/7 Operation

Checkpointing

Driver Fault Tolerance

Worker Fault Tolerance

Receiver Fault Tolerance

Processing Guarantees

Streaming UI

Performance Considerations

Batch and Window Sizes

Level of Parallelism

Garbage Collection and Memory Usage

12. Machine Learning with MLlib

Overview

System Requirements

Machine Learning Basics

Example: Spam Classification

Data Types

Working with Vectors

Algorithms

Feature Extraction

Statistics

Classification and Regression

Clustering

Collaborative Filtering and Recommendation

Dimensionality Reduction

Model Evaluation

Tips and Performance Considerations

Preparing Features

Configuring Algorithms

Caching RDDs to Reuse

Recognizing Sparsity

Level of Parallelism

Pipeline API

Frequently Asked Questions

What are the modes of training for the "Spark with Scala" course?

This "Spark with Scala" course is delivered as instructor-led training (ILT). The trainer travels to your office and delivers the training on your premises. If you need a training space, we can provide a fully equipped lab with all the required facilities. Online instructor-led training is also available if required. Online training is live: the instructor's screen is visible and their voice is audible. Participants' screens are also visible, and participants can ask questions during the live session.

Will I be provided with any study material during the "Spark with Scala" training?

Participants will be provided with study material specific to "Spark with Scala". Participants will have lifetime access to all the code and resources needed for this "Spark with Scala" course. Our public GitHub repository and the study material will also be shared with the participants.

What is the pedagogy of zekeLabs?

All zekeLabs courses are hands-on. The code and documents used in class will be provided to the participants. A cloud lab and virtual machines are provided to every participant during the "Spark with Scala" training.

What is the duration of this course?

The duration of the "Spark with Scala" training depends on several factors, including the team's prior knowledge of the subject, the team's objectives for the program, and how much the course needs to be customized. Contact us to learn more about the duration of the "Spark with Scala" course.

What would be the venue for the "Spark with Scala" training?

The "Spark with Scala" training is organised at the client's premises. We have delivered, and continue to deliver, "Spark with Scala" training in India, the USA, Singapore, Hong Kong, and Indonesia. We also have state-of-the-art training facilities, available based on client requirements.

Who is the trainer for the "Spark with Scala" training?

Our subject matter experts (SMEs) have more than ten years of industry experience, which helps make the program a holistic, 360-degree learning experience. The course has been designed in close collaboration with experts working at organizations such as Google, Microsoft, and Amazon.

Can we customize this course based on our requirements?

Yes, absolutely. For every training, we conduct a technical call between our subject matter expert (SME) and the technical lead of the team undergoing the training. The course is tailored to the participants' current expertise, the objectives of the team undergoing the training program, and the short-term and long-term objectives of the organisation.

How can I reach out to you if I have any other queries regarding the "Spark with Scala" course?

Email us at [email protected] or call us at +91 8041690175, and we will get back to you as soon as possible with answers to your queries about the "Spark with Scala" course.

Recommended Courses

	Big Data Processing with PySpark
	Hadoop - Mastering Big Data with Hadoop Ecosystem

More Courses

	Power BI
	Backbone.js
	Pivotal Cloud Foundry
	Oracle SQL
	AWS Certified SysOps Administrator
	Pega Systems Architect
	Big Data Processing with PySpark
	Java 8
	Best Blockchain Certification