PowerPoint presentation

 PowerPoint presentation on Microsoft Power BI 

Overview of ETL Tool:

RAJARAM GAUTAM
July 26, 2022

Executive Summary:

To present an overview of StreamSets and its development, its significance as a data integration platform, and the reasons to choose StreamSets over alternative products.

Timeline for Development History of StreamSets

● 2014: StreamSets Inc. established
● 2015: Data Collector launched
● 2017: StreamSets Control Hub launched
● 2019: Transformer Engine released
● 2021: StreamSets DataOps Platform launched

Agenda:

● Overview of StreamSets
● High-Level Architecture
● Strengths / Weaknesses
● Competitive Position

Overview: StreamSets

● By 2014, the flood of big data had made Hadoop-based data lakes the storage system of choice for raw, unstructured data.
● IT firms were struggling to keep up with data sources no longer under their control.
● From Hadoop to Apache Kafka, then to Databricks, open source software transformed the hardware-centered data center into a decentralized system of increasingly specialized applications connected across owned and virtual systems.
● StreamSets was founded in 2014 by Arvind Prabhakar, a former Cloudera engineer, and Girish Pancha, a former Informatica product leader, to manage data integration.

StreamSets is a modern data integration platform for building smart data pipelines for DataOps across multi-cloud architectures.

● It automates as much as possible, abstracting away the “how” of the data pipeline to focus on the “what” of the data, so data teams spend less time fixing and more time doing.
● The future of data infrastructure was not about schema and scale, but about managing change and automating as much as possible.
● In 2015, StreamSets launched Data Collector, an open source data execution engine for streaming data pipelines built with resilience to data drift.

● The Data Collector Engine tackled the problem of streaming data ingestion for systems such as Apache Kafka and Hadoop.
● Data Collector simplifies batch, streaming, and CDC pipelines, and the Data Collector Engine became the tool of choice for thousands of organizations worldwide.
● In 2017, StreamSets Control Hub was introduced, providing a single software-as-a-service platform to design, deploy, monitor, and manage smart data pipelines at scale on any cloud and on premises.
● In 2019, the StreamSets Transformer Engine was released, adding ETL capabilities on Apache Spark.
● Humana, BT Group, Shell, and IBM have made StreamSets a core technology in their DataOps practice.
● In 2021, StreamSets brought all the functionality of Control Hub and the Data Collector and Transformer Engines into a fully managed service called the StreamSets DataOps Platform.

StreamSets High-Level Architecture / Data Pipeline Architectures
There are three basic data pipeline architectures, depending on the nature of the data being gathered and how it is used.

Batch Data Pipeline
Batch data pipelines move large sets of data at a particular time or in response to a behavior or when a
threshold is met. A batch data pipeline is often used for bulk ingestion or ETL processing. A batch data
pipeline might be used to deliver data weekly or daily from a CRM system to a data warehouse for use in a
dashboard for reporting and business intelligence.
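
The batch behavior described above can be sketched in a few lines of plain Python. This is only an illustration, not StreamSets code: the `BatchPipeline` class, the `threshold` parameter, and the in-memory `warehouse` list are all hypothetical stand-ins. Records are buffered and delivered in bulk either when a threshold is met or when a scheduled flush runs.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class BatchPipeline:
    """Hypothetical batch pipeline: buffers records, delivers them in bulk."""
    load: Callable[[list], None]      # destination writer (assumed interface)
    threshold: int = 3                # flush once this many records are buffered
    _buffer: list = field(default_factory=list, init=False)

    def add(self, record: dict) -> None:
        self._buffer.append(record)
        if len(self._buffer) >= self.threshold:
            self.flush()

    def flush(self) -> None:
        """Deliver whatever is buffered, e.g. when a daily schedule fires."""
        if self._buffer:
            self.load(list(self._buffer))
            self._buffer.clear()


warehouse = []                        # stands in for a data warehouse table
pipeline = BatchPipeline(load=warehouse.append, threshold=3)

for i in range(7):
    pipeline.add({"customer_id": i})  # e.g. rows exported from a CRM system
pipeline.flush()                      # the scheduled run picks up the remainder

batch_sizes = [len(batch) for batch in warehouse]
print(batch_sizes)
```

Seven records with a threshold of three arrive at the destination as two full batches plus one final partial batch delivered by the scheduled flush.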

Streaming Data Pipeline
Streaming data pipelines flow data continuously from source to destination as it is created. Streaming data
pipelines are used to populate data lakes or as part of data warehouse integration, or to publish to a
messaging system or data stream. They are also used in event processing for real-time applications. For
example, streaming data pipelines might be used to provide real-time data to a fraud detection system and to
monitor quality of service.
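
As a rough illustration of the streaming case, the sketch below uses Python generators so that each event is processed as soon as it is produced rather than waiting for a batch. The event source, the `limit` value, and the fraud rule are invented for the example and are not taken from StreamSets.

```python
from typing import Iterable, Iterator


def card_events() -> Iterator[dict]:
    """Hypothetical source: yields each payment event as it is created."""
    yield {"card": "A", "amount": 25.0}
    yield {"card": "B", "amount": 9800.0}
    yield {"card": "A", "amount": 40.0}


def flag_suspicious(events: Iterable[dict], limit: float) -> Iterator[dict]:
    """Processor stage: annotates events one at a time as they flow through."""
    for event in events:
        yield {**event, "suspicious": event["amount"] > limit}


alerts = []
for event in flag_suspicious(card_events(), limit=5000.0):
    if event["suspicious"]:
        alerts.append(event["card"])  # would feed a fraud detection system

print(alerts)
```

Because both stages are generators, nothing is held back until a batch fills up; an alert can be raised while later events are still being produced.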

Change Data Capture Pipeline (CDC)
Change data capture pipelines are used to refresh data and keep multiple systems in sync. Instead of copying
the entire database, only changes to data since the last sync are shared. This can be particularly useful during
a cloud migration project when two systems are operating with the same data sets.
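
The idea of sharing only the changes since the last sync can be sketched as follows. This is a simplified, timestamp-based illustration with hypothetical names (`changes_since`, `apply_changes`, `updated_at`); production CDC tools commonly read database transaction logs instead, and this sketch does not handle deletes.

```python
def changes_since(source: dict, last_sync: int) -> list:
    """Return only the rows modified after the last sync point."""
    return [
        {"id": row_id, **row}
        for row_id, row in source.items()
        if row["updated_at"] > last_sync
    ]


def apply_changes(target: dict, changes: list) -> None:
    """Upsert each changed row into the target system."""
    for change in changes:
        row = dict(change)
        row_id = row.pop("id")
        target[row_id] = row


source = {
    1: {"name": "Ada", "updated_at": 100},
    2: {"name": "Grace", "updated_at": 100},
}
target = {}
apply_changes(target, changes_since(source, last_sync=0))  # initial full load
last_sync = 100

# Later, one row is updated and one is inserted in the source system.
source[2] = {"name": "Grace H.", "updated_at": 150}
source[3] = {"name": "Linus", "updated_at": 160}

delta = changes_since(source, last_sync)  # row 1 is not copied again
apply_changes(target, delta)
print([change["id"] for change in delta])
```

After the second sync the target matches the source, although only two of the three rows were transferred.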

Data Engineering Platforms

A data engineering platform builds smart data pipelines according to DataOps principles. Smart data pipelines abstract away the “how” so you can focus on the what, who, and where of the data. This is the fundamental difference between data integration and data engineering. Instead of being perpetually under construction, out of order, or limited to a single platform, smart data pipelines allow you to move fast with confidence that your data will continue to flow with little to no intervention. Data engineering platforms allow you to:

● Design and deploy data pipelines in hours, not weeks or months
● Build in as much resiliency as possible to handle changes
● Adapt to new platforms by pointing to them, a task that takes minutes, not months
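
One way to picture "adapting to a new platform by pointing to it" is a pipeline whose logic does not depend on where the data lands. The sketch below is a plain-Python illustration under that assumption; `run_pipeline`, the `DESTINATIONS` mapping, and the in-memory stores are hypothetical and do not represent the StreamSets API.

```python
from typing import Callable, Iterable

Destination = Callable[[dict], None]


def run_pipeline(records: Iterable[dict], destination: Destination) -> int:
    """Apply the same transformation regardless of the destination."""
    count = 0
    for record in records:
        destination({**record, "email": record["email"].lower()})
        count += 1
    return count


on_prem_store = []   # stands in for an on-premises database
cloud_store = []     # stands in for a cloud data warehouse
DESTINATIONS = {"on_prem": on_prem_store.append, "cloud": cloud_store.append}

records = [{"email": "Ada@Example.com"}]
run_pipeline(records, DESTINATIONS["on_prem"])
run_pipeline(records, DESTINATIONS["cloud"])  # only the target name changes
print(cloud_store)
```

The transformation code is untouched when the destination changes, which is the property the bullet about pointing to new platforms describes.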

StreamSets: Strengths

1. User friendly, with a gentle learning curve; non-technical personnel can also learn it quickly.
2. Runs both on premises and in cloud environments.
3. Easy to use when connecting enterprise data stores such as OLTP databases or messaging systems such as Kafka. It lets us create a data pipeline without coding knowledge, and there is no need to know every database or to code in order to work with StreamSets.
4. Built-in data drift resilience plays a role in ETL operations.
5. It helps resolve data sync issues coming from various sources and reduces the time needed to fix data drift breakages.
6. It offers many features for AWS, Azure, and Snowflake.
7. Saves licensing costs for some legacy software.
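
To make items 4 and 5 more concrete, the sketch below shows one simple way a pipeline stage might tolerate drift, taking "data drift" here to mean unexpected changes in field names or structure. It is an assumption-laden illustration, not how StreamSets implements drift handling; `normalize`, `ALIASES`, and `REQUIRED` are hypothetical names.

```python
def normalize(record: dict, aliases: dict, required: tuple) -> dict:
    """Map renamed fields to canonical names and fill in missing ones."""
    out = {aliases.get(key, key): value for key, value in record.items()}
    for name in required:
        out.setdefault(name, None)  # tolerate a field that disappeared
    return out


ALIASES = {"cust_id": "customer_id"}   # hypothetical upstream rename
REQUIRED = ("customer_id", "amount")

rows = [
    {"customer_id": 1, "amount": 10},
    {"cust_id": 2, "amount": 20, "channel": "web"},  # renamed and new field
    {"customer_id": 3},                              # missing field
]

clean = [normalize(row, ALIASES, REQUIRED) for row in rows]
for row in clean:
    print(row)
```

The renamed field is mapped back, the unexpected `channel` field passes through, and the missing `amount` becomes `None`, so the pipeline keeps flowing instead of breaking on the changed records.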

StreamSets: Strengths (Continued)
8. Templates can be reused for certain use cases when moving data to the cloud. We can create job templates for specific cases and reuse them by changing only the parameters.

9. Faster data transfer than in a Hadoop-based scenario.

10. Masking sensitive data such as PHI and PII is made easy.

11. Scheduling is easy in StreamSets.

12. Easy to connect to Hadoop using StreamSets.

13. It provides change data capture as soon as source data has changed. Technical customer support is good.

14. It is compatible with various source systems such as SQL Server, Oracle, and REST APIs.

15. Control Hub in the DataOps Platform manages load balancing.

16. Everything is in one place with StreamSets.

17. Good online documentation.
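
As an illustration of the kind of masking mentioned in item 10, the following sketch masks a US Social Security number pattern and pseudonymizes a name using only the Python standard library. The field names, the salt, and the masking rules are assumptions made for the example; they are not StreamSets processors or settings.

```python
import hashlib
import re

# Matches values shaped like 123-45-6789 and captures the last four digits.
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-(\d{4})\b")


def mask_ssn(text: str) -> str:
    """Hide all but the last four digits of anything shaped like an SSN."""
    return SSN_PATTERN.sub(r"***-**-\1", text)


def pseudonymize(value: str, salt: str) -> str:
    """Replace a value with a stable, non-reversible token."""
    return hashlib.sha256((salt + value).encode("utf-8")).hexdigest()[:12]


record = {"patient": "Jane Doe", "note": "SSN 123-45-6789 on file"}
masked = {
    "patient": pseudonymize(record["patient"], salt="demo-salt"),
    "note": mask_ssn(record["note"]),
}
print(masked["note"])
```

The same input always yields the same token, so masked records can still be joined on the pseudonymized field without exposing the original value.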

StreamSets: Weaknesses

1. Lack of folder structures for organizing pipelines and jobs in Control Hub.
2. The logging mechanism can be improved.
3. The visualization can be improved by adding a time dimension, so that changes can be seen over time.
4. Unable to read multiple tables from SAP HANA without querying them.
5. JDBC Lookup takes a long time to process.
6. Memory leak issues.

Competitive Positioning: StreamSets
Before deciding on the StreamSets DataOps Platform as a data integration tool, users also consider the following alternatives:

● Informatica PowerCenter
● SQL Server Integration Services
● Fivetran
● Alteryx Designer
● AWS Glue
● Oracle GoldenGate
● Qlik Replicate
● Talend Data Fabric
● IBM DataStage
● Denodo Platform

Despite its short development history compared with its alternatives, the StreamSets team has developed the tool at a rapid pace. It has done a particularly good job of making StreamSets data pipelines resilient to data drift.

Summary:
StreamSets has a short development history: the company was founded in 2014 and launched its first product in 2015.

It is a modern data integration platform for building smart data pipelines for DataOps across multi-cloud architectures.

It automates as much as possible, abstracting away the “how” of the data pipeline to focus on the “what” of the data, so more time is spent doing than fixing.

StreamSets Data Collector (2015) is an open source data execution engine for streaming data pipelines built with resilience to data drift.

StreamSets Control Hub (2017) is a single software-as-a-service platform to design, deploy, monitor, and manage smart data pipelines.

StreamSets Transformer Engine (2019) added ETL capabilities on Apache Spark.

In 2021, StreamSets brought all the functionality of Control Hub and the Data Collector and Transformer Engines together as the StreamSets DataOps Platform.

It offers a user-friendly UI, a gentle learning curve, no coding requirement, and built-in data drift resilience.

References:

Homepage


https://www.peerspot.com/products/streamsets-reviews#review_2482491
https://www.gartner.com/reviews/market/data-integration-tools/vendor/streamsets/product/streamsets-dataops-platform/alternatives
