How to outsmart BigQuery storage costs and save $$$ like a pro
You’re a BigQuery ninja, slicing through massive datasets with lightning-fast queries. Life’s great… until you look at the bill. Ouch! Turns out, all those terabytes of data sitting in logical storage are costing you a fortune. But don’t worry, we’re about to flip the script and make BigQuery work smarter for you.
Welcome to the world of logical vs. physical storage, where data is king, and your wallet is on the line. In this guide, we’ll decode BigQuery’s storage magic and arm you with a SQL query so sharp it could cut through your bills.
The Good, the Logical, and the Physical
Before we jump into the code, let’s get one thing straight: logical storage is BigQuery’s default, but it’s not always your best friend. Why? Because logical storage bills you for the full, uncompressed size of your data. It’s like paying for a gallon of milk and getting two-thirds of it in air.
On the flip side, physical storage compresses your data, saving space and (more importantly) money. It’s perfect for long-term storage or datasets that aren’t actively being updated. But here’s the catch: switching to physical storage isn’t automatic, and making the wrong move could end up costing you.
The Master Plan
The goal is simple:
- Compare costs between logical and physical storage.
- Identify datasets where switching to physical storage will save you money.
- Generate a SQL command to make the switch with one click.
You’ll look like a genius, and your CFO will thank you.
The query
Understanding the Query’s Logic
Step 1: pricing setup
The query begins by defining the cost per GiB for both logical and physical storage. These values are declared as variables to make the query adaptable to your specific pricing structure.
Step 2: gathering dataset storage data
Using BigQuery’s INFORMATION_SCHEMA.TABLE_STORAGE_BY_PROJECT
, the query extracts storage metrics for all datasets in a region. It focuses on:
Logical storage: The raw, uncompressed size of the data (active_logical_bytes
and long_term_logical_bytes
).
Physical storage: The compressed size (active_physical_bytes
), separated into:
- Active physical size without time-travel data.
- Long-term physical size.
- Time travel and fail-safe physical data.
Step 3: calculating costs
For each dataset, the query estimates:
- Logical storage costs: Based on the size of active and long-term logical data multiplied by their respective costs.
- Physical storage costs: Includes active, long-term, time-travel, and fail-safe physical sizes, multiplied by their respective costs.
Step 4: cost comparison
The query compares the total logical storage costs with physical storage costs. The result is the forecast_total_cost_difference
, which quantifies the savings (or loss) from switching storage modes:
- Positive value → Switching to physical storage is cheaper.
- Negative value → Staying in logical storage is better.
Step 5: filtering for actionable insights
The query filters out datasets where switching to physical storage doesn’t save money. Only datasets with a positive forecast_total_cost_difference
are included in the results.
Step 6: Generating SQL Commands
For each cost-saving dataset, the query outputs an SQL command:
ALTER SCHEMA `<dataset_name>` SET OPTIONS(storage_billing_model = 'PHYSICAL');
Conclusion
BigQuery is a beast, fast, powerful, and hungry for your money. But now, you’ve got the ultimate tool to tame it: a smart query that knows when to say, “Hey, let’s save some cash.”
With this query, you’re no longer just a user; you’re a data ninja:
- Slicing through logical and physical storage costs like a samurai with a budget spreadsheet.
- Pinpointing the datasets where physical storage saves you real money — no more guesswork.
- Generating SQL commands so slick, they basically hit “Enter” for you.
So go ahead, run this query, flip some datasets to physical storage, and watch the savings roll in.
Take care