Cloudera Training

Training goals

This four-day Analyzing with Data Warehouse course will teach you to apply traditional data analytics and business intelligence skills to big data. This course presents the tools data professionals need to access, manipulate, transform, and analyze complex data sets using SQL and familiar scripting languages.

What you'll learn

Through instructor-led discussion and interactive, hands-on exercises, participants will navigate the ecosystem, learning how to:

  • Use Apache Hive and Apache Impala to access data through queries
  • Identify distinctions between Hive and Impala, such as differences in syntax, data formats, and supported features
  • Write and execute queries that use functions, aggregate functions, and subqueries
  • Use joins and unions to combine datasets
  • Create, modify, and delete tables, views, and databases
  • Load data into tables and store query results
  • Select file formats and develop partitioning schemes for better performance
  • Use analytic and windowing functions to gain insight into their data
  • Store and query complex or nested data structures
  • Process and analyze semi-structured and unstructured data
  • Optimize and extend the capabilities of Hive and Impala
  • Determine whether Hive, Impala, an RDBMS, or a mix of these is the best choice for a given task
  • Utilize the benefits of CDP Public Cloud Data Warehouse
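
As a taste of several of the skills above, the sketch below combines a join, an aggregate function, and a windowing function in the SQL dialect shared by Hive and Impala. The `orders` and `customers` tables are hypothetical and used for illustration only:

```sql
-- Hypothetical tables: orders(cust_id, total), customers(cust_id, name, country).
-- Rank each customer's total spend within their country.
SELECT c.country,
       c.name,
       SUM(o.total) AS spend,
       RANK() OVER (PARTITION BY c.country
                    ORDER BY SUM(o.total) DESC) AS spend_rank
FROM orders o
JOIN customers c ON o.cust_id = c.cust_id
GROUP BY c.country, c.name;
```

Queries of this shape run unchanged in both Hive and Impala; the course covers where the two dialects diverge.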

What to expect

This course is designed for data analysts, business intelligence specialists, developers, system architects, and database administrators. Some knowledge of SQL is assumed, as is basic Linux command-line familiarity.

Course outline

  • Foundations for Big Data Analytics
    • Big Data Analytics Overview
    • Data Storage: HDFS
    • Distributed Data Processing: YARN, MapReduce, and Spark
    • Data Processing and Analysis: Hive and Impala
    • Database Integration: Sqoop
    • Other Data Tools
    • Exercise Scenario Explanation
  • Introduction to Apache Hive and Impala
    • What Is Hive?
    • What Is Impala?
    • Why Use Hive and Impala?
    • Schema and Data Storage
    • Comparing Hive and Impala to Traditional Databases
    • Use Cases
  • Querying with Apache Hive and Impala
    • Databases and Tables
    • Basic Hive and Impala Query Language Syntax
    • Data Types
    • Using Hue to Execute Queries
    • Using Beeline (Hive's Shell)
    • Using the Impala Shell
  • Common Operators and Built-In Functions
    • Operators
    • Scalar Functions
    • Aggregate Functions
  • Data Management
    • Data Storage
    • Creating Databases and Tables
    • Loading Data
    • Altering Databases and Tables
    • Simplifying Queries with Views
    • Storing Query Results
  • Data Storage and Performance
    • Partitioning Tables
    • Loading Data into Partitioned Tables
    • When to Use Partitioning
    • Choosing a File Format
    • Using Avro and Parquet File Formats
  • Working with Multiple Datasets
    • UNION and Joins
    • Handling NULL Values in Joins
    • Advanced Joins
  • Analytic Functions and Windowing
    • Using Analytic Functions
    • Other Analytic Functions
    • Sliding Windows
  • Complex Data
    • Complex Data with Hive
    • Complex Data with Impala
  • Analyzing Text
    • Using Regular Expressions with Hive and Impala
    • Processing Text Data with SerDes in Hive
    • Sentiment Analysis and n-grams in Hive
  • Apache Hive Optimization
    • Understanding Query Performance
    • Cost-Based Optimization and Statistics
    • Bucketing
    • ORC File Optimizations
  • Apache Impala Optimization
    • How Impala Executes Queries
    • Improving Impala Performance
  • Extending Hive and Impala
    • User-Defined Functions
    • Parameterized Queries
  • Choosing the Best Tool for the Job
    • Comparing Hive, Impala, and Relational Databases
    • Which to Choose?
  • CDP Public Cloud Data Warehouse
    • Data Warehouse Overview
    • Auto-Scaling
    • Managing Virtual Warehouses
    • Querying Data Using CLI and Third-Party Integration
  • Appendix: Apache Kudu
    • What Is Kudu?
    • Kudu Tables
    • Using Impala with Kudu
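
To illustrate the partitioning and file-format topics from the outline, here is a minimal sketch in Hive/Impala SQL of a Parquet-backed, partitioned table. The table and column names (`sales`, `raw_sales`, `sale_year`) are hypothetical:

```sql
-- Hypothetical example: a Parquet-backed table partitioned by year.
CREATE TABLE sales (id BIGINT, amount DECIMAL(10,2))
PARTITIONED BY (sale_year INT)
STORED AS PARQUET;

-- Dynamic partition insert from a hypothetical staging table.
-- (In Hive this may require: SET hive.exec.dynamic.partition.mode=nonstrict;)
INSERT INTO sales PARTITION (sale_year)
SELECT id, amount, YEAR(sale_date) FROM raw_sales;

-- Filtering on the partition column prunes the scan
-- to the matching partition only.
SELECT SUM(amount) FROM sales WHERE sale_year = 2024;
```

Choosing the partition column so that common query filters match it is the key design decision; the course's "When to Use Partitioning" module addresses exactly this trade-off.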

Additional information

Prerequisites

Some knowledge of SQL is assumed, as is basic Linux command-line familiarity.

Duration: 4 days
Certificate

Participants will receive a Cloudera-signed certificate of course completion.

Upon completion of the course, attendees are encouraged to continue their studies and register for the CDP Data Analyst exam (https://www.cloudera.com/about/training/certification/cdp-dataanalyst-exam-cdp-4001.html) and/or the CDP Data Engineer exam (https://www.cloudera.com/about/training/certification/cdp-data-engineer-exam-guide-cdp-3002.html).

Certification is a great differentiator. It helps establish you as a leader in the field, providing employers and customers with tangible evidence of your skills and expertise.

Trainer

Certified Cloudera Instructor



Price: 3000 EUR

Language of the training: English

Traditional training

Sessions organised by Compendium CE are usually held at our locations in Kraków and Warsaw, but can also take place at venues designated by the client. The group participating in the training meets at a specific place and time with a trainer and actively takes part in laboratory sessions.

Dlearning training

You may participate from any place in the world. All you need is a computer (or a tablet or smartphone) connected to the Internet. Compendium CE provides each Distance Learning participant with the software needed to connect to the Data Center. For more information, please visit the dlearning.eu site.


Paper materials

The price includes standard materials issued as printed books or in another paper form, depending on arrangements with the manufacturer.

Electronic materials

Electronic training materials are made available through the application specified by the manufacturer (e.g. Skillpipe or eVantage) or as PDF documents.

Ctab materials

The price includes a ctab tablet together with electronic training materials, or traditional training materials with supplies provided electronically according to the manufacturer's specifications (in PDF or EPUB form). The materials provided are adapted for display on ctab tablets. For more information, visit the ctab website.
