Learning data engineering with the Snowflake, Airflow, and dbt stack
Right now I am trying to delve deeper into data engineering. To do this I chose a stack comprising Snowflake as a cloud data warehouse, Airflow to orchestrate moving data from one point to another, and dbt to transform the data into the shape I want.
To that end I am developing a project built around a YouTube API data pipeline. I plan to extract semi-structured data from the YouTube API about currently popular videos, channels, and anything else the API offers, then use dbt to structure the data and load it into Snowflake. Since Snowflake is a paid service with a limited-time free trial, I intend to move to a local PostgreSQL instance later on, but I want to learn as much about cloud data warehousing as I can while the trial lasts.
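As a rough idea of what the extraction step looks like, here is a minimal sketch that pulls the current "most popular" chart from the YouTube Data API v3 and dumps the raw JSON to a file. The `YOUTUBE_API_KEY` environment variable and the output path are placeholders for my own setup, not part of the API itself.

```python
# Minimal sketch: fetch the "most popular" videos chart from the
# YouTube Data API v3 and keep the payload semi-structured for now.
import json
import os

import requests

YOUTUBE_API_KEY = os.environ["YOUTUBE_API_KEY"]  # placeholder: my own API key
VIDEOS_ENDPOINT = "https://www.googleapis.com/youtube/v3/videos"


def fetch_popular_videos(region_code: str = "US", max_results: int = 50) -> list[dict]:
    """Return raw video items for the 'most popular' chart in one region."""
    params = {
        "part": "snippet,statistics",
        "chart": "mostPopular",
        "regionCode": region_code,
        "maxResults": max_results,
        "key": YOUTUBE_API_KEY,
    }
    response = requests.get(VIDEOS_ENDPOINT, params=params, timeout=30)
    response.raise_for_status()
    return response.json()["items"]


if __name__ == "__main__":
    items = fetch_popular_videos()
    # Leave the data semi-structured here; dbt will flatten and model it later.
    with open("popular_videos.json", "w") as f:
        json.dump(items, f, indent=2)
    print(f"Fetched {len(items)} videos")
```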
What is interesting about studying YouTube data is that, as with any other social media platform, you can extract knowledge about almost any field you can think of. As a technical challenge I can also go further and run sentiment analysis on titles or comments, determine a video's genre, generate a summary of it, and so on.
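Just to illustrate the kind of downstream analysis I mean, a pretrained sentiment model can score video titles (or, later, comments) in a few lines. The model choice and the example titles below are only placeholders, not results from the actual pipeline.

```python
# Sketch: score example video titles with an off-the-shelf sentiment model.
from transformers import pipeline

sentiment = pipeline("sentiment-analysis")  # default pretrained model

titles = [
    "This new framework changed how I build data pipelines",
    "Why I regret switching cloud providers",
]

for title, result in zip(titles, sentiment(titles)):
    # Each result is a dict with a 'label' (POSITIVE/NEGATIVE) and a 'score'.
    print(f"{result['label']:>8}  {result['score']:.2f}  {title}")
```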
Right now I only have a very simple data pipeline that loads a video record into a SQL table in Snowflake. Next I want to structure the data into a star schema and build more processing on top of it. I believe the pace of the project will pick up now that I have a better grasp of the tools I am working with.
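For reference, here is a rough sketch of the Airflow DAG I have in mind: one task extracts the popular-videos payload and the next lands each record as raw JSON in Snowflake, ready for dbt to model into the star schema. The table name `RAW_VIDEOS`, the warehouse, database, and schema names, and the environment variables are assumptions about my own setup.

```python
# Sketch of a daily DAG: extract popular videos, load raw JSON into Snowflake.
from datetime import datetime
import json
import os

import requests
import snowflake.connector
from airflow.decorators import dag, task


@dag(schedule="@daily", start_date=datetime(2024, 1, 1), catchup=False)
def youtube_popular_videos():
    @task
    def extract() -> list[dict]:
        # Same call as the extraction sketch above.
        params = {
            "part": "snippet,statistics",
            "chart": "mostPopular",
            "regionCode": "US",
            "maxResults": 50,
            "key": os.environ["YOUTUBE_API_KEY"],
        }
        resp = requests.get(
            "https://www.googleapis.com/youtube/v3/videos", params=params, timeout=30
        )
        resp.raise_for_status()
        return resp.json()["items"]

    @task
    def load(items: list[dict]) -> None:
        # Land every item in a raw VARIANT column; dbt builds the star schema later.
        conn = snowflake.connector.connect(
            user=os.environ["SNOWFLAKE_USER"],
            password=os.environ["SNOWFLAKE_PASSWORD"],
            account=os.environ["SNOWFLAKE_ACCOUNT"],
            warehouse="COMPUTE_WH",  # placeholder names for my trial account
            database="YOUTUBE",
            schema="RAW",
        )
        try:
            cur = conn.cursor()
            for item in items:
                cur.execute(
                    "INSERT INTO RAW_VIDEOS (payload) SELECT PARSE_JSON(%s)",
                    (json.dumps(item),),
                )
            cur.close()
            conn.commit()
        finally:
            conn.close()

    load(extract())


youtube_popular_videos()
```

From there the dbt models would read from the raw table and split the payload into fact and dimension tables (videos, channels, categories), which is the star schema step I am tackling next.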
