Training Cloudera

Training goals

Code: CL-ST

Cloudera University’s three-day Search training course is for developers and data engineers who want to index data in Hadoop for more powerful real-time queries. Participants will learn to get more value from their data by integrating Cloudera Search with external applications. Cloudera Search brings full-text, interactive search and scalable, flexible indexing to Hadoop and an enterprise data hub. Powered by Apache Solr, Search delivers scale and reliability for a new generation of integrated, multi-workload queries.

Through instructor-led discussion and interactive, hands-on exercises, participants will navigate the Hadoop ecosystem and learn how to:

  • Perform batch indexing of data stored in HDFS and HBase
  • Perform indexing of streaming data in near-real-time with Flume
  • Index content in multiple languages and file formats
  • Process and transform incoming data with Morphlines
  • Create a user interface for your index using Hue
  • Integrate Cloudera Search with external applications
  • Improve the Search experience using features such as faceting, highlighting, and spelling correction

Course outline

  1. Introduction
  2. Overview of Cloudera Search
    • What is Cloudera Search?
    • Helpful Features
    • Use Cases
    • Basic Architecture
  3. Performing Basic Queries
    • Executing a Query in the Admin UI
    • Basic Syntax
    • Techniques for Approximate Matching
    • Controlling Output
  4. Writing More Powerful Queries
    • Relevancy and Filters
    • Query Parsers
    • Functions
    • Geospatial Search
    • Faceting
  5. Preparing to Index Documents
    • Overview of the Indexing Process
    • Understanding Morphlines
    • Generating Configuration Files
    • Schema Design
    • Collection Management
  6. Batch Indexing HDFS Data with MapReduce
    • Overview of the HDFS Batch Indexing Process
    • Using the MapReduce Indexing Tool
    • Testing and Troubleshooting
  7. Near-Real-Time Indexing with Flume
    • Overview of the Near-Real-Time Indexing Process
    • Introduction to Apache Flume
    • How to Perform Near-Real-Time Indexing with Flume
    • Testing and Troubleshooting
  8. Indexing HBase Data with Lily
    • What is Apache HBase?
    • Batch Indexing for HBase
    • Indexing HBase Tables in Near-Real-Time
  9. Indexing Data in Other Languages and Formats
    • Field Types and Analyzer Chains
    • Word Stemming, Character Mapping, and Language Support
    • Schema and Analysis Support in the Admin UI
    • Metadata and Content Extraction with Apache Tika
    • Indexing Binary File Types with SolrCell
  10. Improving Search Quality and Performance
    • Delivering Relevant Results
    • Helping Users Find Information
    • Query Performance and Troubleshooting
  11. Building User Interfaces for Search
    • Search UI Overview
    • Building a User Interface with Hue
    • Integrating Search into Custom Applications
  12. Considerations for Deployment
    • Planning for Deployment
    • Determining Hardware Needs
    • Security Overview
    • Collection Aliasing
  13. Conclusion

Additional information


This course is intended for developers and data engineers with at least basic familiarity with Hadoop and experience programming in a general-purpose language such as Java, C, C++, Perl, or Python. Participants should be comfortable with the Linux command line and should be able to perform basic tasks such as creating and removing directories, viewing and changing file permissions, executing scripts, and examining file output. No prior experience with Apache Solr or Cloudera Search is required, nor is any experience with HBase or SQL.

Duration: 3 days

Participants receive a certificate of training completion signed by Cloudera.


The course is delivered by a Certified Cloudera Instructor.



Information on data processing by Compendium - Centrum Edukacyjne Spółka z o.o.

Price: 1780 EUR


Discount codes

A discount code may apply to a specific training, vendor, or booking deadline. If you have a discount code, enter it in the appropriate field.



Traditional training

Compendium CE sessions are usually held at our locations in Kraków and Warsaw, but can also take place at a venue designated by the client. The group meets with the trainer at a set place and time and takes an active part in hands-on lab sessions.

Distance learning (dlearning) training

You can participate from anywhere in the world; all you need is a computer, tablet, or smartphone connected to the Internet. Compendium CE provides each distance learning participant with the software needed to connect to the Data Center. For more information, please visit the site.



Electronic materials

Electronic materials: electronic training materials are made available through an e-book application such as Skillpipe or eVantage, or as PDF documents.

Ctab materials

Ctab materials: the price includes a ctab tablet together with the training materials, either electronic materials or traditional materials and supplies delivered electronically in accordance with the manufacturer's specifications (as PDF or EPUB). The materials are adapted for display on ctab tablets. For more information, see the ctab website.



No sessions are currently scheduled for this training.

You can suggest your own date.