BigQuery ML or the Machine Learning for the dumbs

Michaël Scherding
3 min readMar 22, 2022

Hi all,

Hope you’re doing well. Today and want to discuss about BigQuery ML and how you can leverage Machine Learning with ‘’only’’ SQL knowledges.

Context

You all already know BigQuery, but since a couple a years you can call Machine Learning models only with SQL queries. You can call several models like:

  • linear regression, for previsions
  • binary linear regression, for classification
  • multicass logistic regression… and so on

You can have a look at all the models here.

In our example, we want to predict the number of covid cases in the next 15 days. And we will use Data Studio for visualization.

Step 1

We will use public dataset, you will find below the query used:

SELECT *
FROM `bigquery-public-data.covid19_open_data.compatibility_view`
WHERE country_region = 'France'

The results will look like:

Then we can jump directly on Data Studio in order to have a first overview:

Then we can start to analyse the result:

Step 2

Now it’s time to build our model, SQL will looks like:

CREATE MODEL bqml_tuto.confirmed_pred
OPTIONS(model_type='ARIMA', time_series_data_col='confirmed', time_series_timestamp_col='date') AS
SELECT date, LOG(SUM(confirmed)) confirmed
FROM `bigquery-public-data.covid19_open_data.compatibility_view`
WHERE country_region = 'France'
AND confirmed IS NOT NULL
GROUP BY date
ORDER BY date DESC

We will use the ARIMA’s algorithm. You will have to adjust 2options:

  • model_type, name of the model
  • time_series_data_col, name of the column where the date is stored

We will also use the LOG function. This function will helps to ‘’flat” series when they are not linear. It will looks like:

Step 4

Our model is ready, we can now use with:

SELECT *,
EXP(forecast_value) AS pred_confirmed
FROM ML.FORECAST(MODEL bqml_tuto.confirmed_pred,
STRUCT(15 AS horizon, 0.9 AS confidence_level))

We will use the ML.FORECAST function. STRUCT will define the number of days you want to predict (horizon) and the confidence level. In order to limit the number of columns you can use the query below:

You can use Data Studio for the visualization:

Conclusion

As you can see, it’s really to use BigQuery ML. It’s a great tool to start to think about ML when you are not confortable with Machine Learning.

Have fun!

--

--