Why I still think Looker is a damn exceptional BI Tool

Michaël Scherding
7 min read · Oct 28, 2023


[Header image generated on deepai.org]

In the ever-evolving world of Business Intelligence (BI) tools, getting caught up in the glitz and glam of complex visualizations is easy. While many BI tools compete in the race of intricate visuals, Looker chooses to tread a different path, embracing simplicity and efficiency. And here’s why I staunchly believe that this approach makes Looker an outstanding tool in the BI landscape.

The misunderstood virtue of simplified visualizations

As the adage goes, less is often more. In the realm of BI tools, this couldn’t be truer. Over-customized dashboards, while initially impressive, can lead to unnecessary complications and deviations from the core message. Looker’s choice to prioritize simplified visuals is not a shortcoming but a strategic decision. Unlike many of its competitors that focus on a plethora of visualization options, Looker emphasizes clarity and user-friendliness. The result? Dashboards deliver their intended message without the fluff, ensuring that users stay focused on the data that matters.

Infrastructure management with Terraform

One of Looker’s most commendable features is its seamless integration with Terraform. For those in the know, Terraform stands out as an indispensable tool for infrastructure as code, enabling users to provision and manage infrastructure resources systematically.

Through this integration, Looker offers:

  • Automated provisioning: Forget the tedious manual processes; with Looker and Terraform, infrastructure provisioning becomes a breeze.
  • Version control: Keep track of changes, roll back when necessary, and ensure everyone is on the same page.
  • Consistent environments: Whether it’s for development, testing, or production, maintain consistent environments effortlessly.

The robustness of Looker’s API

Having personally experimented with Looker’s API for over three years, I can attest to its robustness. It’s like a well-oiled machine — no matter how hard you push, it continues to perform optimally without hiccups. Its flexibility allows for a wide array of custom solutions, its scalability ensures it can handle growth efficiently, and its reliability means you can trust it to deliver when it matters most.

LookML dashboards: a game-changer

Enter LookML dashboards — Looker’s ingenious solution for those who appreciate the power of code. By allowing dashboards to be converted into LookML, Looker provides a structured, code-based representation of dashboards. This not only aids in professional versioning but also brings in the benefits of a code-centric approach, such as detailed documentation and easier collaboration.

Real-world application: finding duplicate dashboards

Now, let’s delve into a practical application to truly understand Looker’s prowess. In our quest to optimize our BI operations, we faced a challenge: identifying and eliminating near-duplicate dashboards. Our solution? Harness the power of LookML and vectorization.

By converting dashboards to LookML, we could represent each dashboard as a unique code structure. We then applied vectorization techniques to compare these code representations. The outcome? We were able to spot near-duplicates efficiently and prune our dashboard collection, keeping only the most relevant and unique dashboards.

Initial imports

import pandas as pd
import sqlite3
import time
import looker_sdk
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity
from joblib import Parallel, delayed
from concurrent.futures import ThreadPoolExecutor

Explanation: This section imports necessary libraries and modules.

  • pandas for data manipulation and analysis.
  • sqlite3 to work with an SQLite database.
  • looker_sdk to interact with Looker.
  • Several modules from sklearn and joblib for text processing and parallel processing.
  • ThreadPoolExecutor for concurrent execution.

Understanding the libraries

  1. TfidfVectorizer (from sklearn.feature_extraction.text):

What is it? TfidfVectorizer stands for Term Frequency-Inverse Document Frequency Vectorizer. It is a technique for quantifying the importance of words in documents: each word is assigned a weight reflecting how significant it is within a single document relative to the entire corpus.

Why use it? In the context of comparing LookML dashboard texts, this tool transforms the text data into a matrix of TF-IDF features, making it feasible to analyze and compare textual data mathematically.

  2. cosine_similarity (from sklearn.metrics.pairwise):

What is it? It is a metric used to determine the cosine of the angle between two non-zero vectors. This can be used to measure how similar two documents (or in this case, LookML dashboard texts) are irrespective of their size.

Why use it? By comparing the vectors of two documents (represented by the TF-IDF scores of their words), you get a measure of their similarity, which is precisely what we need when identifying near-duplicate dashboards.
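
To make these two ideas concrete, here is a toy sketch in plain Python of what TF-IDF weighting and cosine similarity compute. It is deliberately simplified (whitespace tokenization, raw counts, a basic IDF formula) compared with sklearn's defaults, but the intuition carries over:

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    tokenized = [doc.lower().split() for doc in docs]
    vocab = sorted({w for toks in tokenized for w in toks})
    n = len(docs)
    # idf(w) grows as the word gets rarer across the corpus
    df = {w: sum(w in toks for toks in tokenized) for w in vocab}
    idf = {w: math.log(n / df[w]) + 1 for w in vocab}  # +1 keeps ubiquitous words nonzero
    vectors = []
    for toks in tokenized:
        counts = Counter(toks)
        vectors.append([counts[w] * idf[w] for w in vocab])
    return vectors

def cosine(u, v):
    # cosine of the angle between two vectors: 1.0 means identical direction
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

docs = [
    "dashboard: sales revenue by month",
    "dashboard: sales revenue by week",
    "dashboard: support ticket volume",
]
vecs = tfidf_vectors(docs)
print(round(cosine(vecs[0], vecs[1]), 3))  # near-duplicates score high
print(round(cosine(vecs[0], vecs[2]), 3))  # unrelated dashboards score low
```

The example texts are invented for illustration; in the real script the "documents" are the LookML strings of whole dashboards.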

  3. Parallel and delayed (from joblib):

What are they? Parallel is a utility that helps execute functions concurrently, while delayed is a decorator that captures the arguments and function for later evaluation.

Why use them? In this script, they speed up the comparison of dashboards by processing them in parallel. This is especially useful when dealing with a large number of dashboards, as it optimizes processing time.
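
A minimal, self-contained sketch of the joblib pattern (with a toy function, nothing Looker-specific):

```python
from joblib import Parallel, delayed

def square(x):
    return x * x

# delayed(square)(i) packages up the call; Parallel fans the calls out to workers
results = Parallel(n_jobs=2)(delayed(square)(i) for i in range(5))
print(results)  # [0, 1, 4, 9, 16]
```

In main() below, the same shape is used with compare_dashboards in place of square and n_jobs=-1 to use every available core.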

  4. ThreadPoolExecutor (from concurrent.futures):

What is it? It’s a class that provides a high-level interface for asynchronously executing functions using threads. It creates a pool of threads and manages the execution of tasks in these threads.

Why use it? Here, ThreadPoolExecutor is used to fetch the LookML for multiple dashboards concurrently. This speeds up the data-fetching step considerably when there are many dashboards to process.
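
Here is a small sketch of that pattern, with a stand-in fetch() function simulating a slow API round-trip instead of the real Looker SDK call:

```python
import time
from concurrent.futures import ThreadPoolExecutor

def fetch(dashboard_id):
    time.sleep(0.1)  # simulate network latency of one API call
    return f"lookml for {dashboard_id}"

ids = [1, 2, 3, 4, 5]
start = time.perf_counter()
with ThreadPoolExecutor(max_workers=5) as executor:
    # map preserves input order, so results line up with ids
    results = list(executor.map(fetch, ids))
elapsed = time.perf_counter() - start
print(results[0])  # lookml for 1
print(elapsed)     # roughly 0.1s, not 0.5s: the five calls overlap
```

Because the work is I/O-bound (waiting on the API), threads give a near-linear speedup without needing multiple processes.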

SQLite database setup

def setup_sqlite_db():
    conn = sqlite3.connect('looker_dashboards.db')
    cursor = conn.cursor()
    cursor.execute("DROP TABLE IF EXISTS dashboards")
    cursor.execute('''
        CREATE TABLE dashboards (
            id TEXT PRIMARY KEY,
            title TEXT NOT NULL,
            lookml TEXT
        )
    ''')
    conn.commit()
    return conn, cursor

Explanation: Sets up an SQLite database and creates a table named dashboards to store Looker dashboard data.

Fetching LookML from Looker

def fetch_lookml(dashboard, sdk):
    if not dashboard.id.isnumeric():
        return None
    try:
        lookml = sdk.dashboard_lookml(dashboard.id).lookml
        return dashboard.id, dashboard.title, str(lookml)
    except looker_sdk.error.SDKError:
        print(f"Failed to fetch LookML for dashboard ID: {dashboard.id}. Skipping...")
        return None

Explanation: For a given dashboard, the function fetches its LookML representation using the Looker SDK.

Storing dashboards in SQLite

def store_dashboards_in_sqlite(all_dashboards, cursor, sdk):
    with ThreadPoolExecutor(max_workers=5) as executor:
        results = list(executor.map(lambda dashboard: fetch_lookml(dashboard, sdk), all_dashboards))
    for result in results:
        if result:
            dashboard_id, dashboard_title, lookml = result
            cursor.execute("INSERT OR REPLACE INTO dashboards (id, title, lookml) VALUES (?, ?, ?)",
                           (dashboard_id, dashboard_title, lookml))

Explanation: The function fetches LookML for dashboards concurrently and then stores the results in the SQLite database.
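
The INSERT OR REPLACE statement is what makes re-running the fetch safe: because id is the PRIMARY KEY, a second insert for the same dashboard overwrites the old row instead of raising a constraint error. A self-contained in-memory sketch of that behavior:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE dashboards (id TEXT PRIMARY KEY, title TEXT NOT NULL, lookml TEXT)")

# Insert the same dashboard id twice with different LookML payloads
cur.execute("INSERT OR REPLACE INTO dashboards VALUES (?, ?, ?)", ("1", "Sales", "- dashboard: v1"))
cur.execute("INSERT OR REPLACE INTO dashboards VALUES (?, ?, ?)", ("1", "Sales", "- dashboard: v2"))

cur.execute("SELECT COUNT(*), lookml FROM dashboards")
row = cur.fetchone()
print(row)  # (1, '- dashboard: v2') -- one row, holding the latest LookML
conn.close()
```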

Extracting data from SQLite

def get_dashboard_texts(cursor):
    cursor.execute("SELECT id, title, lookml FROM dashboards")
    rows = cursor.fetchall()
    return [row[0] for row in rows], [row[1] for row in rows], [row[2] for row in rows]

Explanation: Retrieves all dashboard IDs, titles, and LookML representations from the SQLite database.

Comparing dashboards

def compare_dashboards(i, dashboard_titles, similarity_matrix, threshold=0.9):
    similar_dashboards = []
    for j in range(i + 1, len(dashboard_titles)):
        if similarity_matrix[i][j] > threshold:
            similar_dashboards.append((i, j, similarity_matrix[i][j]))
    return similar_dashboards

Explanation: Compares a specific dashboard’s similarity with all other dashboards using the similarity matrix.
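
Note that j starts at i + 1, so each pair is examined only once. A toy run on a hand-written 3x3 similarity matrix (the scores are invented for illustration; the real matrix comes from cosine_similarity):

```python
def compare_dashboards(i, dashboard_titles, similarity_matrix, threshold=0.9):
    # Same logic as in the article: collect pairs above the similarity threshold
    similar_dashboards = []
    for j in range(i + 1, len(dashboard_titles)):
        if similarity_matrix[i][j] > threshold:
            similar_dashboards.append((i, j, similarity_matrix[i][j]))
    return similar_dashboards

titles = ["Sales (v1)", "Sales (v2)", "Support"]
matrix = [
    [1.00, 0.95, 0.10],
    [0.95, 1.00, 0.12],
    [0.10, 0.12, 1.00],
]
pairs = [p for i in range(len(titles)) for p in compare_dashboards(i, titles, matrix)]
print(pairs)  # [(0, 1, 0.95)] -- only the two Sales dashboards exceed 0.9
```

The 0.9 default threshold is a tuning knob: lower it to catch looser matches, raise it to flag only near-identical dashboards.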

def main():
    sdk = looker_sdk.init40()
    conn, cursor = setup_sqlite_db()
    all_dashboards = sdk.all_dashboards()
    store_dashboards_in_sqlite(all_dashboards, cursor, sdk)
    conn.commit()
    dashboard_ids, dashboard_titles, dashboard_texts = get_dashboard_texts(cursor)
    vectorizer = TfidfVectorizer()
    X = vectorizer.fit_transform(dashboard_texts)
    similarity_matrix = cosine_similarity(X)
    num_cores = -1  # -1 tells joblib to use all available cores
    data = Parallel(n_jobs=num_cores)(
        delayed(compare_dashboards)(i, dashboard_titles, similarity_matrix)
        for i in range(len(dashboard_titles))
    )
    similar_dashboards_flat = [item for sublist in data for item in sublist]
    df_data = {
        "Dash 1 Title": [dashboard_titles[i] for i, _, _ in similar_dashboards_flat],
        "Dash 1 ID": [dashboard_ids[i] for i, _, _ in similar_dashboards_flat],
        "Dash 2 Title": [dashboard_titles[j] for _, j, _ in similar_dashboards_flat],
        "Dash 2 ID": [dashboard_ids[j] for _, j, _ in similar_dashboards_flat],
        "Similarity Ratio": [score for _, _, score in similar_dashboards_flat]
    }
    df = pd.DataFrame(df_data)
    print(df)
    conn.close()

if __name__ == "__main__":
    main()

Explanation: The main function runs the entire process: initializing the SDK, setting up the SQLite database, storing and retrieving dashboard data, vectorizing the LookML, computing the similarity matrix, finding similar dashboards in parallel, and printing the results as a DataFrame.
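
The final reporting step of main() can be tried in isolation with toy data (the IDs, titles, and the single similar pair below are invented for illustration):

```python
import pandas as pd

dashboard_ids = ["10", "11", "12"]
dashboard_titles = ["Sales (v1)", "Sales (v2)", "Support"]
similar_dashboards_flat = [(0, 1, 0.95)]  # (index of dash 1, index of dash 2, score)

# Same dictionary-of-lists construction as in main()
df = pd.DataFrame({
    "Dash 1 Title": [dashboard_titles[i] for i, _, _ in similar_dashboards_flat],
    "Dash 1 ID": [dashboard_ids[i] for i, _, _ in similar_dashboards_flat],
    "Dash 2 Title": [dashboard_titles[j] for _, j, _ in similar_dashboards_flat],
    "Dash 2 ID": [dashboard_ids[j] for _, j, _ in similar_dashboards_flat],
    "Similarity Ratio": [score for _, _, score in similar_dashboards_flat],
})
print(df)  # one row per near-duplicate pair
```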

Results

[Screenshot of the resulting DataFrame of near-duplicate dashboard pairs omitted]

Precaution and future developments

Versioning note: The code shared above represents the initial version. As with all initial versions, it’s essential to understand its limitations and areas for improvement. We are actively deliberating on how to make this code more production-ready.

User scope: In subsequent versions, one of our main focuses will be to ensure that users can only find duplicates within their data scope, respecting data privacy and access rights.

Resource consumption: Fetching large amounts of data via APIs, especially in a big organization with many dashboards, can be resource-intensive. It might slow down other crucial operations or even trigger API throttling. To mitigate this, we are considering running this job only on weekends, ensuring minimal disruption during peak working hours.

Notification system: Instead of having users manually check for duplicate dashboards, we plan to automate notifications. This could be done by sending a one-time message to a designated Slack channel or by email. The notification will be directed to the team responsible for the project, alerting them about possible dashboard duplications.

Deepening the functionality: Going forward, we aim to refine the system further by introducing automated notifications for dashboard owners. If the system detects a dashboard that’s significantly similar to another, it can automatically alert the respective owners. This proactive approach empowers dashboard owners to manage duplications and redundancies better.

Remember, as with all BI tools and processes, continuous improvement and iteration are key. The objective is not just to ship a feature but to ensure it adds value and efficiency for end users.

Conclusion

Through the example provided, we’ve showcased just a fraction of the power Looker brings to the table. Its robust API enables users to craft bespoke solutions tailored to a company’s unique needs. Such custom solutions amplify an organization’s ability to manage its Business Intelligence (BI) efficiently, keeping its data landscape clutter-free and coherent.

However, the journey doesn’t stop here. There’s ample room to dive deeper, refine, and optimize, ensuring an even cleaner and more efficient BI instance. While it’s true that for many, Looker can be a challenging tool to grasp — often due to its depth and the breadth of possibilities it offers — it’s essential to see beyond its surface. It’s easy to focus predominantly on the final visualization, which, given its tangible nature, is understandable. But when one truly comprehends Looker’s underlying philosophy and harnesses its full potential, what emerges is a platform that any company would find invaluable: efficient, tidy, and effectively managed.

Remember, it’s not just about understanding the tool, but also about recognizing the vision behind it. And with Looker, that vision is a world where BI is not just accessible but also streamlined, logical, and powerfully effective.

See ya 🤟
