Big Data Processing with Deep Learning - Spark vs TensorFlow

Big Data Processing with Deep Learning - Spark vs TensorFlow  Awantik Das
Posted on July 29, 2019, 10:34 a.m.

 

More and more organizations are integrating big data pipeline with deep learning infrastructure. This is something, Spark & TensorFlow folks have in mind as well. Let’s have a quick glance about their journey so far. 

 

Spark

  • Came as an alternative to old school Hadoop’s Map-Reduce.

  • It’s fast because of in-memory computing.

  • Super social in nature, could interact with all Big Data technologies like Kafka, HDFS, Cassandra & many more.

  • Easy to deal with, developer-friendly, do more by writing less.

  • You can scale using your commodity hardware.

  • Spark has many machine learning algorithms implemented.

  • For deep learning it allows porting TensorFlow on spark using open source libraries from various sources.

  • Building a data pipeline using Spark looks like -

 

TensorFlow

  • TensorFlow came from Google & very soon become the most trusted AI technology adopted, industry-wide for deep learning.

  • Fundamentally, TensorFlow is something more generic - ‘open-source scientific computing library’.

  • List of other TensorFlow libraries

  • Now, TensorFlow as grown into much beyond deep learning library. One of the functionalities TensorFlow provides is the

  • Ability to build big data pipeline is just one of many functionalities that TensorFlow provides.

  • Tensorflow.io & tf.data allows you to read data from different sources.

  • Audio, image & text preprocessing infrastructure is now inbuilt with TensorFlow.

  • Building a data pipeline using TensorFlow looks like -

             

Conclusion

  • The journey of Spark & TensorFlow has been different. Spark started with big data ecosystem and then moved towards Machine learning whereas Tensorflow started as Machine learning library and now headed to provide full big data ecosystem.

  • Three things decide the winner in the increasing order of importance 

    • Performance

    • Stability

    • Ease of use

  • Over a period of time, there will be one absolute winner for big data processing with deep learning stack. Only time will answer.


Corporate Trainer & Consultant for Big Data Technologies, Machine Learning & Deep Learning. Have trained & consultant more than 40+ different companies for these technologies.




Keywords : Artificial Intelligence data-science spark Machine-Learning


Recommended Reading


The Machine Learning Pipeline - Essential things to know before getting started with machine learning

Starting from development to deployment of machine learning, the journey of a product can be broken down to 7 important stages: Business Understanding, Data Wrangling, Visualization, Preprocessing, Model Training, Model Validation, Deployment


What are Big Data, Hadoop & Spark ? What is the relationship among them ?

Big Data is a problem statement & what it means is the size of data under process has grown to 100's of petabytes ( 1 PB = 1000TB ). Yahoo mail generates some 40-50 PB of data every day. Yahoo has to read that 40-50 PB of data & filter out spans. E-commerce...