Cube Guardian, a Python tool for efficient Cube control

Michaël Scherding
Nov 27, 2023


Discover Cube Guardian, a Python tool inspired by Looker’s Spectacles, designed to automate data cube validation in Cube.js. Learn how to set it up, use its CLI for testing, and integrate it into CI/CD pipelines for reliable data analytics.

Introduction to Cube

Cube.js is an open-source framework for building analytical applications that handle large datasets. Its modular query engine streamlines data retrieval from diverse databases and services, which makes it a valuable tool for businesses and developers who want to extract insights from their data. The reliability of any such application, however, depends on the accuracy and integrity of the data it serves through Cube.js.

Ensuring Data Integrity

Errors or inaccuracies in data cubes can lead to flawed analyses and, in turn, to poor business decisions. Validating cubes by hand is time-consuming and prone to human error, so a more efficient and reliable approach is needed.

Cube Guardian was created to meet this need. It is a Python-based tool that automates and speeds up the validation of data cubes, so that the data served by Cube.js remains accurate and trustworthy.

The idea behind Cube Guardian came from seeing how effective Spectacles is for Looker. That prompted a simple question: could a similar approach be applied to Cube.js?

Installation and Configuration

Setting Up the Environment

Before using Cube Guardian, make sure Python is installed on your system. Cube Guardian is compatible with Python 3.7 and higher. Once Python is set up, you can fetch the project and install its dependencies.

Open your terminal and navigate to the directory where you want to store the Cube Guardian project. Then clone the repository:

git clone https://github.com/mchl-schrdng/cubeguardian.git

Configuration

To configure Cube Guardian for your specific Cube.js environment, you need to create a config.yaml file. This file should contain essential information such as the Cube.js API URL and API token. Cube Guardian relies on these details to access and validate the data cubes. Example:

api_url: "https://example-api-url.com"
api_token: "your_api_token_here"
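
To make the role of these two values concrete, here is a minimal sketch of how a script might read them and prepare an authenticated request. It is not Cube Guardian's actual code. The helper names (`parse_flat_config`, `build_meta_request`) are hypothetical, the tiny parser only handles flat `key: "value"` lines (a real project would more likely use a YAML library), and it assumes that `api_url` is the deployment root and that the token is sent as-is in the `Authorization` header to Cube's `/cubejs-api/v1/meta` endpoint.

```python
import urllib.request

REQUIRED_KEYS = ("api_url", "api_token")


def parse_flat_config(text):
    """Parse flat `key: "value"` lines (a stand-in for a real YAML loader)."""
    config = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        # Split on the first colon only, so URLs keep their "https://" part.
        key, _, value = line.partition(":")
        config[key.strip()] = value.strip().strip('"').strip("'")
    missing = [key for key in REQUIRED_KEYS if not config.get(key)]
    if missing:
        raise ValueError(f"config is missing required keys: {missing}")
    return config


def build_meta_request(config):
    """Build (but do not send) a request for the cube metadata.

    Assumption: api_url is the deployment root and the REST API is served
    under /cubejs-api/v1; adjust the path if your api_url already includes it.
    """
    url = config["api_url"].rstrip("/") + "/cubejs-api/v1/meta"
    return urllib.request.Request(url, headers={"Authorization": config["api_token"]})
```

Building the request without sending it keeps the sketch safe to run anywhere; passing the result to `urllib.request.urlopen` would perform the actual call.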

Running Cube Guardian

Command-Line Interface (CLI)

Cube Guardian operates primarily through its command-line interface (CLI). It provides a range of options to customize the validation process according to your needs. Here’s a brief overview of some key CLI options:

  • --fail-fast: Enabling this option stops testing further dimensions on the first failure, making it useful for quick validation.
  • --cubes: You can specify a list of cube names to test. If not set, Cube Guardian will validate all cubes.
  • --concurrency: Set the concurrency limit for testing. The default is 10 concurrent tests.
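
For readers who want to see how such options fit together, the three flags above can be sketched with Python's standard `argparse` module. This is an illustration under assumptions, not the tool's real source: the `build_parser` function and the help texts are hypothetical, and only the flag names and the default concurrency of 10 come from the description above.

```python
import argparse


def build_parser():
    """Return a parser mirroring the three documented options (sketch only)."""
    parser = argparse.ArgumentParser(
        prog="cubeguardian.py",
        description="Validate Cube.js cubes (illustrative sketch).",
    )
    parser.add_argument(
        "--fail-fast",
        action="store_true",
        help="stop testing further dimensions on the first failure",
    )
    parser.add_argument(
        "--cubes",
        nargs="+",      # one or more space-separated cube names
        default=None,   # None is taken to mean "validate all cubes"
        help="names of the cubes to test",
    )
    parser.add_argument(
        "--concurrency",
        type=int,
        default=10,
        help="maximum number of concurrent tests",
    )
    return parser
```

With `nargs="+"`, the names after `--cubes` are collected into a list until the next flag, which matches invocations of the form `--cubes cube1 cube2 --concurrency 5`.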

Initiating Validation

With Cube Guardian configured, you can start the validation process. To validate every cube with the default settings, run:

python cubeguardian.py

Or, to stop at the first failure, validate only two cubes, and lower the concurrency to 5:

python cubeguardian.py --fail-fast --cubes cube1 cube2 --concurrency 5

Output

Figure: example of Cube Guardian output.

Cube Validation Status

Cube Guardian logs the status of each cube that it validates. The status can be one of the following:

  • Passed: The cube passed validation, and its data is considered accurate and reliable.
  • Failed: Cube Guardian encountered issues during validation, which indicates a problem with the cube’s data.
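
The Passed/Failed logic, combined with the fail-fast and concurrency options, can be sketched as follows. This is one possible reading, not the tool's actual implementation: the function names are hypothetical, `run_query` stands for whatever call executes a test query for one dimension, and applying the concurrency limit per cube (rather than per dimension) is an assumption.

```python
from concurrent.futures import ThreadPoolExecutor


def validate_cube(cube_name, dimensions, run_query, fail_fast=False):
    """Test each dimension of one cube and return (status, errors)."""
    errors = []
    for dimension in dimensions:
        try:
            run_query(cube_name, dimension)  # expected to raise on failure
        except Exception as exc:
            errors.append(f"{cube_name}.{dimension}: {exc}")
            if fail_fast:
                break  # skip the remaining dimensions of this cube
    return ("Failed" if errors else "Passed", errors)


def validate_all(cubes, run_query, concurrency=10, fail_fast=False):
    """Validate several cubes in parallel.

    cubes maps each cube name to its list of dimension names.
    Assumption: the concurrency limit applies to cubes, not dimensions.
    """
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        futures = {
            name: pool.submit(validate_cube, name, dims, run_query, fail_fast)
            for name, dims in cubes.items()
        }
        return {name: future.result() for name, future in futures.items()}


def fake_run_query(cube_name, dimension):
    """Stand-in for a real query: any dimension named 'broken' fails."""
    if dimension == "broken":
        raise RuntimeError("query failed")
```

A cube is marked Failed as soon as at least one of its dimensions raises an error. With `fail_fast=True`, the dimensions after the first failing one are never queried, which trades completeness of the error report for speed.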

Detailed Error Messages

When Cube Guardian encounters issues during validation, it provides detailed error messages that pinpoint the specific problems found within the cube’s data. These messages are essential for diagnosing and fixing data inaccuracies.

Logging

Cube Guardian logs its activities as it progresses through the validation process. This includes information about which cubes were tested, their status, and any error messages generated. The log serves as a record of the validation process, making it useful for troubleshooting and auditing.

Duration

Cube Guardian reports how long the cube validation process took to complete. This information can be helpful for assessing the efficiency of the validation workflow.
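
As a rough illustration of the logging and duration reporting described above, the sketch below wraps a validation loop with Python's standard `logging` and `time` modules. The logger name, the message wording, and the `timed_validation` helper are all hypothetical; the tool's real log format may differ. The `validate` argument is assumed to be a callable that takes a cube name and returns a `(status, errors)` pair.

```python
import logging
import time

logger = logging.getLogger("cubeguardian_sketch")  # hypothetical logger name


def timed_validation(validate, cube_names):
    """Validate each cube, log its status and errors, and time the whole run."""
    start = time.perf_counter()  # monotonic clock, suited to measuring durations
    results = {}
    for name in cube_names:
        status, errors = validate(name)
        results[name] = status
        for message in errors:
            logger.error("%s: %s", name, message)
        logger.info("Cube %s: %s", name, status)
    duration = time.perf_counter() - start
    logger.info("Validated %d cube(s) in %.2f seconds", len(cube_names), duration)
    return results, duration
```

Calling `logging.basicConfig(level=logging.INFO)` before running the function would print these records to the console.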

Conclusion

Cube Guardian, inspired by Looker’s Spectacles, is your trusted partner for automating data cube validation in Cube.js. With its CLI, customization options, and CI/CD integration, it ensures data integrity, empowering data professionals to make confident, accurate decisions.

The GitHub repo is available at https://github.com/mchl-schrdng/cubeguardian.

Have fun 🤟
