m Hadoop
Key Points
- Data pipelines connect data sources to data targets and transform the data along the way, either in batches or as event streams
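The difference between batch and event-stream pipelines can be sketched in a few lines of plain Python. This is only an illustration: the `transform` function, the record layout, and the in-memory source and target lists are all hypothetical stand-ins, not part of Hadoop or any tool listed on this page.

```python
from typing import Dict, Iterable, Iterator, List

Record = Dict[str, object]


def transform(record: Record) -> Record:
    """Hypothetical transform: clean up the name and add a derived field."""
    clean_name = str(record["name"]).strip()
    return {**record, "name": clean_name.title(), "name_length": len(clean_name)}


def run_batch(source: List[Record], target: List[Record]) -> None:
    """Batch mode: read the whole source, transform it, load it in one pass."""
    target.extend(transform(r) for r in source)


def run_stream(events: Iterable[Record]) -> Iterator[Record]:
    """Stream mode: transform and emit each event as it arrives."""
    for event in events:
        yield transform(event)


# In-memory stand-ins for a data source and two data targets (assumptions).
source = [{"name": " ada "}, {"name": "grace"}]

batch_target: List[Record] = []
run_batch(source, batch_target)

stream_target = list(run_stream(iter(source)))
```

Both modes apply the same transform; they differ only in when records are processed (all at once versus one at a time as they arrive).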
References
Reference description with linked URLs | Notes |
---|---|
Apache Big Data projects | |
Hadoop | |
Apache Hadoop | |
Apache Hadoop docs | |
https://www.cloudera.com/downloads.html | Cloudera open-source distribution for Hadoop tools and more |
https://www.cloudera.com/downloads.html | Cloudera Hadoop docs |
C:\Users\Jim Mason\Google Drive\_books\tutorials\jp\hadoop https://drive.google.com/open?id=0BxqKQGV-b4WQT2FpNU0tVGxydHc | Java Passion Hadoop courses |
https://drive.google.com/open?id=0BxqKQGV-b4WQYXNnM3JHT1pzS2s | Java Passion Hadoop intro |
https://towardsdatascience.com/what-happened-to-hadoop-what-should-you-do-now-2876f68dbd1d | Hadoop challenges article |
Key Hadoop Concepts
Apache Hadoop
Java Passion Hadoop intro
https://drive.google.com/open?id=0BxqKQGV-b4WQYXNnM3JHT1pzS2s
Cloudera Distributions
https://www.cloudera.com/downloads.html
Cloudera DataFlow
Scalable, real-time streaming analytics platform that ingests, curates, and analyzes data for key insights and immediate actionable intelligence.
Cloudera Manager
A unified interface to manage your enterprise data hub. Express and Enterprise editions available.
Cloudera Altus Director
Self-service, reliable experience for CDH and Cloudera Enterprise in the cloud
Featured Downloads
Cloudera QuickStarts
Setting up your local machine using a QuickStart VM or Docker Image will give you examples of how to get started with some of the tools provided in CDH and how to manage your services via Cloudera Manager.
Download QuickStarts
Cloudera Data Science Workbench
Cloudera Data Science Workbench enables fast, easy, and secure self-service data science for the enterprise.
Download the Cloudera Data Science Workbench
Cloudera CDH
Cloudera's open source software distribution including Apache Hadoop and additional key open source projects
Download CDH
Download Phoenix for CDH
Hortonworks Sandbox
Hortonworks Sandbox can help you get started learning, developing, testing and trying out new features on HDP and HDF.
Download the Hortonworks Sandbox
Hortonworks Data Platform (HDP)
Hortonworks Data Platform (HDP) helps enterprises gain insights from structured and unstructured data. It is an open source framework for distributed storage and processing of large, multi-source data sets.
Download the Hortonworks Data Platform (HDP)
Hortonworks DataFlow (HDF)
Hortonworks DataFlow (HDF) is a scalable, real-time streaming analytics platform that ingests, curates and analyzes data for key insights and immediate actionable intelligence.
Download Hortonworks DataFlow (HDF)
Apache Spark 2
Apache Spark 2 is a new major release of the Apache Spark project, with notable improvements in its API, performance and stream processing capabilities.
Encryption-at-Rest Security
Additional software for encryption and key management, available to Cloudera Enterprise customers.
Navigator Key Trustee Server
Enterprise-grade key management, storing keys for HDFS encryption and Navigator Encrypt. Prerequisite for all three related downloads below.
Download Key Trustee Server
Navigator Encrypt
High-performance encryption for metadata, temp files, ingest paths and log files within Hadoop. Complements HDFS encryption for comprehensive protection of the cluster.
Download Navigator Encrypt
Navigator Key Trustee KMS
Connects HDFS Encryption to Navigator Key Trustee Server for production-ready key storage.
Download Navigator Key Trustee KMS
Navigator Key HSM
Integrates Navigator Key Trustee with existing Hardware Security Modules (HSMs), providing an optional additional layer of security.
Download Navigator Key HSM
Database Drivers
The Cloudera ODBC and JDBC Drivers for Hive and Impala enable your enterprise users to access Hadoop data through Business Intelligence (BI) applications with ODBC/JDBC support.
Hive ODBC Driver Downloads
Hive JDBC Driver Downloads
Impala ODBC Driver Downloads
Impala JDBC Driver Downloads
Oracle Instant Client
The Oracle Instant Client parcel for Hue enables Hue to be quickly and seamlessly deployed by Cloudera Manager with Oracle as its external database. For customers who have standardized on Oracle, this eliminates extra steps in installing or moving a Hue deployment on Oracle.
Oracle Instant Client for Hue Downloads
Data Transfer Connectors
Sqoop Connectors are used to transfer data between Apache Hadoop systems and external databases or Enterprise Data Warehouses. These connectors allow Hadoop and platforms like CDH to complement existing architecture with seamless data transfer.
Teradata Connector Downloads
Netezza Connector Downloads
Potential Value Opportunities
- distributed data management services
- batch and real-time processing options
- file-based storage (vs. in-memory processing in Spark)
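Hadoop's classic batch processing model is MapReduce. The word-count sketch below simulates its three phases (map, shuffle, reduce) in a single Python process. It is not the Hadoop API: the function names and the sample input lines are assumptions made for illustration, and a real cluster would run the map and reduce phases in parallel across nodes against files in HDFS.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    """Map: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle_phase(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle: group all emitted values by key."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce: collapse the values for one key into a single count."""
    return key, sum(values)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    """Run map, shuffle and reduce in sequence over the input lines."""
    pairs = (pair for line in lines for pair in map_phase(line))
    grouped = shuffle_phase(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


# Hypothetical input; on a cluster these lines would come from files in HDFS.
counts = word_count([
    "Hadoop stores data in files",
    "Spark keeps data in memory",
])
```

Because each map call sees only one line and each reduce call sees only one key, the work can be split across machines, which is what makes the model suitable for distributed batch jobs.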
Potential Challenges
Candidate Solutions