Apache Spark - Exeliq

Exeliq provides Apache Spark implementation services. We highly recommend Spark to enterprises worldwide. The framework offers great performance benefits and versatility. Enterprises are faced with a high volume and velocity of data coming from web and mobile apps. To stay ahead of the curve, it is critical that the speed of data processing and analysis should support the Big Data apps. Spark gives your business that advantage. It also offers multiple analytics options such as machine learning, streaming analytics and graph analytics.

CURRICULUM

Prerequisites:

Intermediate Python and Spark/Scala
Azure/AWS (S3, Redshift, Azure Blob Storage, Azure Data Lake Storage, Azure SQL Data Warehouse)
Hadoop/Hive

Target Audience:

This class is for:
Programmers, Developers, Technical Leads, Architects
Developers/Business Analysts aspiring to be a ‘Machine Learning Engineer’
Data Scientists/Data Analysts who want to gain expertise in Predictive Analytics
‘Python’ professionals who want to design automatic predictive models

Learning Outcomes:

Upon completion of this course, you will be able to:
Use Databricks for Python/Spark/R/Spark SQL development
Set up a job for a notebook and set up a Spark cluster
Set up a BI tool with Databricks
Integrate the command-line interface (CLI) with Databricks
Use the Databricks REST API
Use Databricks for data visualization
Use Databricks for ML/GraphX/predictive models

Spark Overview

Basic Spark Components


Spark Architecture


Low-Level API – RDDs & RDD Operations
(Transformations and Actions)


Distributed Variables – Broadcast Variables & Accumulators


RDD – Partitions and Shuffling

Spark SQL and DataFrames

Reading from CSV, JSON, Parquet Files, JDBC


Writing Data in CSV, JSON, Parquet Files, JDBC


Use of DataFrames


Use of Datasets


Spark SQL


SQL Joins with DataFrames


Broadcast Join


Aggregations


UDF


Catalyst Query Optimization (Theory)

Spark Internals

Jobs, Stages and Tasks (Theory)


Partitions and Shuffling

Structured Streaming

Streaming Sources and Sinks


Structured Streaming APIs


Windowing and Aggregation


Checkpointing


Watermarking


Reliability and Fault Tolerance (Theory)

Machine Learning

Basics of Spark ML


Linear and Logistic Regression ML Algorithms

Graph Processing with GraphFrames

Basic GraphFrames API

Contact us for the Data Science Foundations Online Course

The next Data Science Foundations Online Course will run from 2019-03-05 to 2019-04-25. Classes are generally held on Tuesdays and Thursdays from 6:30-9:30 PM ET / 3:30-6:30 PM PT, with some exceptions for holidays. The deadline for registration is 2019-02-22. The course tuition is $3495 with early-bird discounts available.

The exact dates for the next session will be: 3/5, 3/7, 3/12, 3/14, 3/19, 3/21, 3/25, 3/28, 4/2, 4/4, 4/9, 4/11, 4/16, 4/18, 4/23, 4/25

Contact us: https://exeliqconsulting.com/contact-us/

A few of our 250+ hiring and training partners: