Top 3 Applications of Apache Spark

Awantik Das
Posted on May 22, 2017, 12:07 p.m.


According to statistics published by Databricks, the top three applications of Apache Spark are:

  • BUSINESS / CUSTOMER INTELLIGENCE (68% of Spark users)
  • DATA WAREHOUSING (52% of Spark users)
  • REAL-TIME / STREAMING SOLUTIONS (45% of Spark users)

BUSINESS / CUSTOMER INTELLIGENCE

Business Intelligence (BI) is the practice of deriving presentable, actionable information that helps corporate executives, business managers, and other stakeholders make informed business decisions. The benefits of BI include faster and better decisions and the discovery of new opportunities. Customer Intelligence (CI) is the information derived from customer data that an organization can use to understand customer needs and serve customers better.

Before Spark, accessing a few days of data could take 24 hours. With Spark, a year’s worth of data can be processed in a 10-minute coffee break.

Because Spark makes BI and CI so much faster, businesses gain a real competitive edge. The simplest example: understanding your customers earlier than your rivals do is a hard-to-match advantage.

DATA WAREHOUSING

Traditional data warehouses are great for structured data. But today's data is characterized by the 4 V’s (Volume, Velocity, Variety, and Veracity). It arrives from many sources, such as smartphones, sensors, social media, logs, and transactions. Your competitive edge lies in processing it faster. Data warehouses built with Spark SQL can address the 4 V’s and give you an edge over competitors.

REAL-TIME / STREAMING SOLUTIONS

Organizations receive data in real time from many sources, such as sensors, mobile devices, IoT devices, Twitter, and online transactions. All of this data needs to be monitored and processed, so the need of the hour is large-scale, real-time data processing. One common pattern is streaming ETL, in which data is continuously cleaned and aggregated before being pushed to data stores. Companies like Pinterest use Spark Streaming to get live insight into how users around the world are engaging with Pins. Based on this, Pinterest’s recommendation engine shows more related Pins.
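
To make the streaming ETL idea concrete, here is a minimal sketch in plain Python. It does not use the Spark Streaming API. It only mimics the micro-batch pattern: take a small batch of events, clean it, aggregate it, and push the result to a store. The event format (`user,action,pin`), the sample data, and the helper names `clean`, `micro_batches`, and `run_etl` are all assumptions made up for this illustration.

```python
from collections import Counter
from typing import Dict, Iterable, Iterator, List, Optional, Tuple

# Hypothetical raw events in "user,action,pin" form; some are malformed on purpose.
RAW_EVENTS = [
    "u1,view,p10",
    "u2,SAVE ,p10",
    "garbage-line",
    "u3,view,p11",
    "u1,save,p10",
    ",view,p12",
]


def clean(line: str) -> Optional[Dict[str, str]]:
    """Normalize one raw event; return None if it is malformed."""
    parts = [p.strip().lower() for p in line.split(",")]
    if len(parts) != 3 or not all(parts):
        return None  # wrong field count or an empty field
    user, action, pin = parts
    return {"user": user, "action": action, "pin": pin}


def micro_batches(events: Iterable[str], batch_size: int) -> Iterator[List[str]]:
    """Group an incoming event stream into small fixed-size batches."""
    batch: List[str] = []
    for event in events:
        batch.append(event)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:  # flush the final partial batch
        yield batch


def run_etl(events: Iterable[str], batch_size: int,
            store: List[Dict[Tuple[str, str], int]]) -> None:
    """Clean and aggregate each batch, then push the result to the store."""
    for batch in micro_batches(events, batch_size):
        records = [r for r in map(clean, batch) if r is not None]
        counts = Counter((r["pin"], r["action"]) for r in records)
        store.append(dict(counts))  # stand-in for a write to a real data store


store: List[Dict[Tuple[str, str], int]] = []
run_etl(RAW_EVENTS, batch_size=3, store=store)
for i, batch_counts in enumerate(store):
    print(f"batch {i}: {batch_counts}")
```

In a real deployment the list of events would be replaced by a live source and the in-memory `store` by a database or file system, with Spark distributing each batch across a cluster.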

Other applications that use Apache Spark include RECOMMENDATION ENGINES, LOG PROCESSING, USER-FACING SERVICES, and FRAUD DETECTION.


Awantik Das is a Technology Evangelist currently working as a Corporate Trainer. He has trained more than 3,000 professionals from Fortune 500 companies, including Cognizant, Mindtree, HappiestMinds, CISCO, and others. He is also involved in Talent Acquisition Consulting for leading companies on niche technologies. Previously, he worked with technology companies such as CISCO, Juniper, and Rancore (a Reliance Group company).





