A step-by-step guide to maintaining a clean Looker instance

Michaël Scherding
6 min readApr 15, 2024

--

Cleaning an instance is crucial for maintaining optimal performance and resource utilization, reducing costs, and enhancing user experience. It ensures that users have access to accurate and relevant data, aids in compliance with data retention policies, and improves security by minimizing the risk of data breaches.

In the realm of Business Intelligence (BI), maintaining a clean and organized environment is crucial. Over time, unused dashboards and outdated reports can clutter the workspace, leading to decreased productivity and efficiency. Automating the clean-up process ensures that the BI tools remain streamlined and user-friendly, allowing teams to focus on deriving insights rather than navigating through the clutter.

In this article, we will walk through the process of creating an automated clean-up script for Looker, using Python. This script will notify dashboard creators about their inactive dashboards and delete the ones that have been inactive for an extended period.

First, create your necessary Looks

To generate the necessary results, you are required to create two distinct Looks utilizing the system activity:

First Look:

  • Navigate to the history explore.
  • Please include the following dimensions: dashboard creator name, ID of the dashboard, and incorporate the measure most recent query date.
    The purpose of employing the most recent query date is to determine whether or not the dashboard is in use.

Second Look:

  • Navigate to the users explore.
  • Include the dimensions: name, ID, and email.

Ideally, we could consolidate these two explores using a merge result. However, since creating a Look from a merge result isn’t feasible, we will instead execute the join/filter logic utilizing pandas and dataframes in our subsequent steps.

Importing Necessary Libraries

import smtplib
import ssl
import json
import pandas as pd
import looker_sdk
from datetime import datetime
  • smtplib and ssl are used for sending emails securely.
  • json is used for parsing JSON data.
  • pandas is used for handling data in tabular form.
  • datetime is used for working with dates and times.
  • looker_sdk is used for interacting with the Looker API.

Defining the send_mail Function

def send_mail(emails, subject, content):
port = 465 # For SSL
smtp_server_domain_name = "smtp.gmail.com"
sender_mail = "xxx" # Replace with your account
password = "xxx" # Replace with your App Password

ssl_context = ssl.create_default_context()
with smtplib.SMTP_SSL(smtp_server_domain_name, port, context=ssl_context) as service:
service.login(sender_mail, password)
for email in emails:
msg = f"Subject: {subject}\n{content}"
service.sendmail(sender_mail, email, msg.encode('utf-8')) # Encode message in UTF-8

This function sends an email with a given subject and content to a list of email addresses using a predefined sender email and password. It establishes a secure SMTP connection and logs in to the email service to send emails.

Defining the notify_dashboard_creators Function

def notify_dashboard_creators(data):
base_url = "xxx" # Replace with your Looker url
print(f"Sending notifications to {len(data)} dashboard creators...")
for index, row in data.iterrows():
creator_email = row['user.email']
creator_name = row['dashboard_creator.name']
dashboard_id = row['dashboard.id']
dashboard_url = base_url + str(dashboard_id)
subject = "LOOKER - Action Required: Dashboard Inactivity Notice"
content = f"""\
Dear {creator_name},

We have noticed that your dashboard with ID {dashboard_id} has not been used for more than 6 months. You can view the dashboard directly by following this link: {dashboard_url}

We kindly request you to either delete or update it to help us maintain a clean instance.

Please note that if no action is taken, the dashboard will be archived after 13 months of inactivity.

Thank you for your understanding and cooperation.

Best regards
"""
send_mail([creator_email], subject, content)
print(f"Notification sent to {creator_name} (ID: {dashboard_id}) at {creator_email}.")
print("Waiting for 2 seconds before sending the next email...")
time.sleep(2) # Introduce a delay of 2 seconds between each email

This function sends notifications to dashboard creators whose dashboards have been inactive for more than 6 months. It constructs the email subject and content and calls the send_mail function to send the emails. After each email, it waits for 2 seconds before proceeding to the next one.

The time.sleep is necessary in order to not hit the Gmail quota limitations of 50/100 emails per minute.

Defining the delete_dashboards function

def delete_dashboards(data):
print(f"Deleting {len(data)} dashboards...")
for index, row in data.iterrows():
dashboard_id = row['dashboard.id']
try:
sdk.update_dashboard(dashboard_id, models.WriteDashboard(deleted=True))
print(f"Dashboard with ID {dashboard_id} has been moved to trash.")
except error.SDKError:
print(f"Failed to delete dashboard with ID {dashboard_id}.")

This function deletes dashboards that have been inactive for a certain period by calling the Looker SDK’s update_dashboard method. If the deletion fails, it prints an error message.

Main Execution Block

if __name__ == '__main__':
# Fetch data from the first Look
print("Fetching data from the first Look...")
response1 = sdk.run_look(
look_id="xxx", # Id of you first look
result_format="json"
)
data1 = json.loads(response1)
df1 = pd.DataFrame(data1)

# Fetch data from the second Look
print("Fetching data from the second Look...")
response2 = sdk.run_look(
look_id="xxx", # Id of the second look
result_format="json"
)
data2 = json.loads(response2)
df2 = pd.DataFrame(data2)

# Join the two dataframes on the name columns
print("Joining the two dataframes on the name columns...")
joined_df = df1.merge(df2, left_on="dashboard_creator.name", right_on="user.name", how="left")

# Convert the 'most_recent_query_date' column to datetime type
print("Converting the 'most_recent_query_date' column to datetime type...")
joined_df['history.most_recent_query_date'] = pd.to_datetime(joined_df['history.most_recent_query_date'])

# Filter rows where the difference from today is greater than 180 days
print("Filtering rows where the difference from today is greater than 180 days...")
filtered_df = joined_df[joined_df['history.most_recent_query_date'] < datetime.now() - pd.Timedelta(days=180)]

# Send notifications
print("Sending notifications...")
notify_dashboard_creators(filtered_df)

# Delete dashboards not used since 390 days
print("Deleting dashboards not used since 390 days...")
to_delete_df = joined_df[joined_df['history.most_recent_query_date'] < datetime.now() - pd.Timedelta(days=390)]
delete_dashboards(to_delete_df)

print("Process completed successfully!")

This block is executed when the script is run. It fetches data from two different Looks using the Looker SDK, converts the fetched data to pandas DataFrames, joins the two DataFrames on the name columns, filters rows based on the last query date, sends notifications to the corresponding dashboard creators, deletes dashboards based on the filtered DataFrames, and finally prints a completion message.

Full code

import smtplib
import ssl
import json
import pandas as pd
from datetime import datetime
import looker_sdk
from looker_sdk import models, error

def send_mail(emails, subject, content):
port = 465 # For SSL
smtp_server_domain_name = "smtp.gmail.com"
sender_mail = "xxx" # Replace with account
password = "xxx" # Replace with your App Password

ssl_context = ssl.create_default_context()
with smtplib.SMTP_SSL(smtp_server_domain_name, port, context=ssl_context) as service:
service.login(sender_mail, password)
for email in emails:
msg = f"Subject: {subject}\n{content}"
service.sendmail(sender_mail, email, msg.encode('utf-8')) # Encode message in UTF-8

def notify_dashboard_creators(data):
base_url = "xxx" # Replace with your url
print(f"Sending notifications to {len(data)} dashboard creators...")
for index, row in data.iterrows():
creator_email = row['user.email']
creator_name = row['dashboard_creator.name']
dashboard_id = row['dashboard.id']
dashboard_url = base_url + str(dashboard_id)

subject = "LOOKER - Action Required: Dashboard Inactivity Notice"
content = f"""\
Dear {creator_name},

We have noticed that your dashboard with ID {dashboard_id} has not been used for more than 6 months. You can view the dashboard directly by following this link: {dashboard_url}

We kindly request you to either delete or update it to help us maintain a clean instance.

Please note that if no action is taken, the dashboard will be archived after 13 months of inactivity.

Thank you for your understanding and cooperation.

Best regards
"""
send_mail([creator_email], subject, content)
print(f"Notification sent to {creator_name} (ID: {dashboard_id}) at {creator_email}.")

print("Waiting for 2 seconds before sending the next email...")
time.sleep(2) # Introduce a delay of 2 seconds between each email

def delete_dashboards(data):
print(f"Deleting {len(data)} dashboards...")
for index, row in data.iterrows():
dashboard_id = row['dashboard.id']
try:
sdk.update_dashboard(dashboard_id, models.WriteDashboard(deleted=True))
print(f"Dashboard with ID {dashboard_id} has been moved to trash.")
except error.SDKError:
print(f"Failed to delete dashboard with ID {dashboard_id}.")

if __name__ == '__main__':
# Fetch data from the first Look
print("Fetching data from the first Look...")
response1 = sdk.run_look(
look_id="xxx", # Replace with Look id
result_format="json"
)
data1 = json.loads(response1)
df1 = pd.DataFrame(data1)

# Fetch data from the second Look
print("Fetching data from the second Look...")
response2 = sdk.run_look(
look_id="xxx", # Replace with Look id
result_format="json"
)
data2 = json.loads(response2)
df2 = pd.DataFrame(data2)

# Join the two dataframes on the name columns
print("Joining the two dataframes on the name columns...")
joined_df = df1.merge(df2, left_on="dashboard_creator.name", right_on="user.name", how="left")

# Convert the 'most_recent_query_date' column to datetime type
print("Converting the 'most_recent_query_date' column to datetime type...")
joined_df['history.most_recent_query_date'] = pd.to_datetime(joined_df['history.most_recent_query_date'])

# Filter rows where the difference from today is greater than 180 days
print("Filtering rows where the difference from today is greater than 180 days...")
filtered_df = joined_df[joined_df['history.most_recent_query_date'] < datetime.now() - pd.Timedelta(days=180)]

# Send notifications
print("Sending notifications...")
notify_dashboard_creators(filtered_df)

# Delete dashboards not used since 390 days
print("Deleting dashboards not used since 390 days...")
to_delete_df = joined_df[joined_df['history.most_recent_query_date'] < datetime.now() - pd.Timedelta(days=390)]
delete_dashboards(to_delete_df)

print("Process completed successfully!")

Result

Nota bene

The provided code is in a single .py file for simplicity and explanatory purposes. However, for practical and production use, it’s beneficial to split the code into multiple files or modules, ideally within a dedicated repository.

Summary

Each snippet of this code has a specific role, from importing necessary libraries, defining functions for sending emails, notifying dashboard creators, deleting dashboards, to executing the main block that orchestrates all these functions to manage Looker dashboards effectively.

Have fun 🤟

--

--

No responses yet