How to ingest data from an API directly into a Google BigQuery table!

Michaël Scherding
4 min read · Nov 2, 2022


Hey 👋

I come from an analyst background with years of experience in SQL, but I always felt frustrated not understanding what is under the hood.

When I was learning Python, I sometimes felt frustrated by tutorials where I could not clearly see or understand what was going on behind the lines. So I decided to build my own small learning project around something simple.

Of course it’s not perfect, of course it’s simple, of course there are a ton of ways to do it better, but it’s an honest start to just… do something and understand the logic. If you have any suggestions for improvement, I would love to hear them in the comments section.

So let’s get our hands dirty 🤟

Context

Well, in the upcoming steps we will retrieve data from an API, write it to a .csv file and then load it into a Google BigQuery table. Basic, but as you will see, there are several steps to follow.

As I am a beginner and I love doing simple things in the most straightforward way, I’ll divide the Python script into several functions. The goal is simple: have small pieces of code that each do only one thing. Of course you can wrap everything in a single function, but I prefer the step-by-step method, which is easier to understand.

Step 1, import libraries

Nothing complicated, we will use (in this order; a sketch of the imports follows the list):

  • requests will support the API call
  • json for transforming the data coming from the API to JSON format
  • pandas will support the data structure through a DataFrame
  • bigquery from Google, which will support loading the data from a .csv file into a BigQuery table
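For reference, the import block could look like this (a minimal sketch, nothing more than the four libraries listed above):

```python
# Step 1: import the libraries used in the rest of the script
import requests                      # API call
import json                          # JSON handling
import pandas as pd                  # DataFrame structure
from google.cloud import bigquery    # load the .csv into BigQuery
```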

Step 2, get data from the API

The first step is to retrieve data from an API. My advice is to use RapidAPI; it’s a great way to find good API endpoints on a lot of great topics. You also get a number of free requests, which is perfect for your tests.

I decided to use the Game Prices API. It returns the current game prices from a variety of stores and tells you which store has the cheapest price.

Please find below the first function called get_data_from_api:
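Here is a minimal sketch of what that function could look like; the endpoint URL, headers and query parameters below are placeholders that you would replace with your own RapidAPI host, key and parameters:

```python
def get_data_from_api():
    """Call the API and return the parsed JSON payload."""
    # Placeholder endpoint and credentials: replace with your own RapidAPI values.
    url = "https://example-game-prices.p.rapidapi.com/games"
    headers = {
        "X-RapidAPI-Key": "YOUR_RAPIDAPI_KEY",
        "X-RapidAPI-Host": "example-game-prices.p.rapidapi.com",
    }
    params = {"title": "batman", "limit": "10"}

    response = requests.get(url, headers=headers, params=params)
    response.raise_for_status()

    # Only keep the part of the payload we actually need (see the note below).
    return response.json()
```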

nb: in response.json() I only keep a part of the data, not everything. It can be a good idea to limit what you ingest, especially when loading into a columnar format (like a BigQuery table). APIs can change and evolve over time, and playing on quicksand can be tricky, especially if you are not familiar with schema evolution.

Step 3, write to .csv

Next, we have to write the JSON to a .csv file.
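A possible sketch for that step, using pandas to flatten the JSON into a DataFrame before writing it out (the file name games.csv is just an example):

```python
def write_to_csv(data, filename="games.csv"):
    """Flatten the JSON payload into a DataFrame and write it to a .csv file."""
    df = pd.json_normalize(data)      # flatten nested JSON into columns
    df.to_csv(filename, index=False)  # header row included, no index column
    return filename
```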

nb: I chose the .csv format for simplicity and because I had already used Python code for that. But you have several options for the output format, like Parquet or Avro. Please find here a great article giving an overview of the best options.

Step 4, load to BigQuery

And finally we will load everything into a BigQuery table.

Please find below the logic:

  • client defines the user (or service account) used for the action. If you want to load data into a table, you need access to it, and of course with the right level of access.
  • table_id is basically the path to the table you want to load into
  • job_config defines the configuration: I am telling Google that my source format is .csv, that the first row should be skipped because it is the header of my table, and that I will use BigQuery’s autodetect feature. Autodetect is great when you want to let BigQuery manage the schema creation in your table. Most of the time I prefer to write the schema on my own, but for the example I wanted to keep it as simple as possible
  • with open will open the .csv file, and then we will define the load job to the table
  • finally, we will receive some information back from the job in order to build a smooth message in the print (see the sketch after this list)
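A sketch of the load function following that logic (the table_id is a placeholder for your own project.dataset.table path):

```python
def load_to_bigquery(filename="games.csv"):
    """Load the .csv file into a BigQuery table."""
    # The client uses the credentials of the active user or service account.
    client = bigquery.Client()

    # Placeholder: replace with your own project.dataset.table path.
    table_id = "my-project.my_dataset.game_prices"

    job_config = bigquery.LoadJobConfig(
        source_format=bigquery.SourceFormat.CSV,
        skip_leading_rows=1,  # the first row is the header
        autodetect=True,      # let BigQuery infer the schema
    )

    with open(filename, "rb") as source_file:
        load_job = client.load_table_from_file(
            source_file, table_id, job_config=job_config
        )
    load_job.result()  # wait for the load job to finish

    table = client.get_table(table_id)
    print(f"Loaded {table.num_rows} rows into {table_id}.")
```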

Step 5, wrapping it all together

Please find below all functions wrapped together:
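Chained together, the script simply calls the three functions in order (same placeholders as above):

```python
def main():
    data = get_data_from_api()      # Step 2: call the API
    filename = write_to_csv(data)   # Step 3: write the payload to a .csv file
    load_to_bigquery(filename)      # Step 4: load the .csv into BigQuery


if __name__ == "__main__":
    main()
```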

Next steps

  • how to manage security and avoid using a key or a .json credentials file directly in the code
  • maybe use the Parquet format for ingestion, or skip the file creation entirely and ingest the data directly from the JSON format

Conclusion

Python is really straightforward and easy to read. It’s clearly a good option as an introduction to coding. And for sure, the SQL + Python combo will give you many options in your daily analytics work.

If you have any questions, ideas or improvements, I would love to discuss them with you in the comments section.

See you soon and stay safe


