m Hadoop

Key Points

  1. Data pipelines connect data sources to data targets and transform the data along the way, either in batches or as event streams
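
The batch versus event-stream distinction above can be sketched in a few lines of plain Python. This is only an illustration, not Hadoop code: the record fields, the `transform` function, and the in-memory lists standing in for a source and a target are all hypothetical.

```python
from typing import Dict, Iterable, Iterator, List

Record = Dict[str, object]

def transform(record: Record) -> Record:
    """Hypothetical transform: tidy the name and convert the amount to cents."""
    name = str(record["name"]).strip().title()
    return {"name": name, "amount_cents": round(float(record["amount"]) * 100)}

def run_batch(source: List[Record], target: List[Record]) -> None:
    """Batch mode: read the whole source, transform it, then load the target in one step."""
    target.extend([transform(r) for r in source])

def run_stream(events: Iterable[Record]) -> Iterator[Record]:
    """Stream mode: transform each event as it arrives and emit it immediately."""
    for event in events:
        yield transform(event)

# A list stands in for the data source (an assumption for this sketch).
source = [{"name": " alice ", "amount": 12.5}, {"name": "BOB", "amount": 3.0}]

batch_target: List[Record] = []
run_batch(source, batch_target)

# The same transform applied one event at a time.
stream_target = list(run_stream(iter(source)))
```

Both modes apply the same transform; they differ only in when records are processed, which is the design choice a pipeline tool has to expose.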


References

Reference description with linked URLs | Notes




https://projects.apache.org/projects.html?category

Apache Big Data projects




Hadoop

Apache Hadoop

Apache Hadoop docs
https://www.cloudera.com/downloads.html - Cloudera open-source distribution for Hadoop tools and more
https://www.cloudera.com/downloads.html - Cloudera Hadoop docs

C:\Users\Jim Mason\Google Drive\_books\tutorials\jp\hadoop

https://drive.google.com/open?id=0BxqKQGV-b4WQT2FpNU0tVGxydHc

Java Passion Hadoop courses
https://drive.google.com/open?id=0BxqKQGV-b4WQYXNnM3JHT1pzS2s - Java Passion Hadoop intro
https://towardsdatascience.com/what-happened-to-hadoop-what-should-you-do-now-2876f68dbd1d - Hadoop challenges article







Key Hadoop Concepts




Apache Hadoop


Java Passion Hadoop intro

https://drive.google.com/open?id=0BxqKQGV-b4WQYXNnM3JHT1pzS2s



Cloudera Distributions

https://www.cloudera.com/downloads.html



Cloudera DataFlow

Scalable, real-time streaming analytics platform that ingests, curates, and analyzes data for key insights and immediate actionable intelligence.

Cloudera Manager

A unified interface to manage your enterprise data hub. Express and Enterprise editions available.

Cloudera Altus Director

Self-service, reliable experience for CDH and Cloudera Enterprise in the cloud


Featured Downloads
Cloudera QuickStarts

Setting up your local machine using a QuickStart VM or Docker Image will give you examples of how to get started with some of the tools provided in CDH and how to manage your services via Cloudera Manager.


Cloudera Data Science Workbench

Cloudera Data Science Workbench enables fast, easy, and secure self-service data science for the enterprise.


Cloudera CDH

Cloudera's open source software distribution including Apache Hadoop and additional key open source projects


Hortonworks Sandbox

Hortonworks Sandbox can help you get started learning, developing, testing and trying out new features on HDP and HDF.


Hortonworks Data Platform (HDP)

Hortonworks Data Platform (HDP) helps enterprises gain insights from structured and unstructured data. It is an open source framework for distributed storage and processing of large, multi-source data sets.


Hortonworks DataFlow (HDF)

Hortonworks DataFlow (HDF) is a scalable, real-time streaming analytics platform that ingests, curates and analyzes data for key insights and immediate actionable intelligence.


Apache Spark 2

Apache Spark 2 is a new major release of the Apache Spark project, with notable improvements in its API, performance and stream processing capabilities.

Encryption-at-Rest Security

Additional software for encryption and key management, available to Cloudera Enterprise customers.


Navigator Key Trustee Server

Enterprise-grade key management, storing keys for HDFS encryption and Navigator Encrypt. A prerequisite for the three related components below: Navigator Encrypt, Navigator Key Trustee KMS, and Navigator Key HSM.


Navigator Encrypt

High-performance encryption for metadata, temp files, ingest paths and log files within Hadoop. Complements HDFS encryption for comprehensive protection of the cluster.

Navigator Key Trustee KMS

Connects HDFS Encryption to Navigator Key Trustee Server for production-ready key storage.

Navigator Key HSM

Integrates Navigator Key Trustee with existing Hardware Security Modules (HSMs), providing an optional additional layer of security.


Database Drivers

The Cloudera ODBC and JDBC Drivers for Hive and Impala enable your enterprise users to access Hadoop data through Business Intelligence (BI) applications with ODBC/JDBC support.

Hive ODBC Driver Downloads
Hive JDBC Driver Downloads
Impala ODBC Driver Downloads
Impala JDBC Driver Downloads

Oracle Instant Client

The Oracle Instant Client parcel for Hue enables Hue to be quickly and seamlessly deployed by Cloudera Manager with Oracle as its external database. For customers who have standardized on Oracle, this eliminates extra steps in installing or moving a Hue deployment on Oracle.

Data Transfer Connectors

Sqoop Connectors are used to transfer data between Apache Hadoop systems and external databases or Enterprise Data Warehouses. These connectors allow Hadoop and platforms like CDH to complement existing architecture with seamless data transfer.

Teradata Connector Downloads
Netezza Connector Downloads






Potential Value Opportunities

  1. distributed data management services
  2. batch and real-time processing options
  3. file-based (on-disk) storage, in contrast to the in-memory processing used by Spark
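
The file-based versus in-memory contrast in the last point can be illustrated with a toy two-stage word count in plain Python. This is a sketch under stated assumptions, not Hadoop or Spark code: the stage functions, the sample lines, and the `intermediate.json` file name are hypothetical. The file-based path writes the first stage's output to disk before the second stage reads it, roughly as Hadoop MapReduce does between stages, while the in-memory path passes the data directly, which is the approach Spark favors.

```python
import json
import os
import tempfile
from collections import Counter

# Hypothetical input lines for the toy job.
lines = ["big data big cluster", "data node"]

def map_stage(lines):
    """Emit (word, 1) pairs, as the map step of a word count would."""
    return [(word, 1) for line in lines for word in line.split()]

def reduce_stage(pairs):
    """Sum the counts per word."""
    totals = Counter()
    for word, count in pairs:
        totals[word] += count
    return dict(totals)

# File-based: stage 1 persists its output; stage 2 reads it back from disk.
with tempfile.TemporaryDirectory() as workdir:
    path = os.path.join(workdir, "intermediate.json")
    with open(path, "w") as f:
        json.dump(map_stage(lines), f)
    with open(path) as f:
        file_based = reduce_stage(json.load(f))

# In-memory: stage 2 consumes stage 1's output directly, with no disk round trip.
in_memory = reduce_stage(map_stage(lines))
```

Both paths produce the same counts. The trade-off is that the disk round trip costs I/O time but leaves a durable intermediate result, whereas the in-memory path is faster but must recompute if the data is lost.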



Potential Challenges



Candidate Solutions



Step-by-step guide for Example






Recommended Next Steps