Collecting Data with Apache Airflow on a Raspberry Pi

A Raspberry Pi is All You Need

Raspberry Pi Zero (2021 model). Image source: Wikipedia

Often, we’d like to gather some data over a certain period of time. It might be data from an IoT sensor, statistics from social networks, or something else. For example, the YouTube Data API allows us to get the number of views and subscribers for any channel at the present moment, but the analytics and historical data are available only to the channel owner. Thus, if we want weekly or monthly summaries for these channels, we need to collect this data ourselves. In the case of an IoT sensor, there may be no API at all, and we also have to collect and save the data on our own. In this article, I’ll show how to configure Apache Airflow on a Raspberry Pi, which allows running tasks over a long period of time without involving any cloud provider.
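As a rough illustration of the kind of data involved, here is a minimal sketch, assuming the public `channels.list` endpoint of the YouTube Data API v3 queried with `part=statistics`, which returns the current view, subscriber, and video counts. Each snapshot is appended as a timestamped row to a local CSV file, so a history accumulates over time. The API key, channel ID, file name, and the helper names `parse_statistics`, `fetch_channel_stats`, and `append_row` are hypothetical placeholders; this is not the author's code.

```python
import csv
import json
import os
import urllib.parse
import urllib.request
from datetime import datetime, timezone

API_URL = "https://www.googleapis.com/youtube/v3/channels"


def parse_statistics(response: dict) -> dict:
    """Extract the counters from a channels.list response (hypothetical helper)."""
    items = response.get("items", [])
    if not items:
        raise ValueError("channel not found")
    stats = items[0]["statistics"]
    # The API returns counts as strings; convert them to integers
    # and fall back to 0 if a field is missing (e.g. hidden subscribers).
    return {
        "views": int(stats.get("viewCount", 0)),
        "subscribers": int(stats.get("subscriberCount", 0)),
        "videos": int(stats.get("videoCount", 0)),
    }


def fetch_channel_stats(channel_id: str, api_key: str) -> dict:
    """Query the current statistics of one channel (requires network access)."""
    query = urllib.parse.urlencode(
        {"part": "statistics", "id": channel_id, "key": api_key}
    )
    with urllib.request.urlopen(f"{API_URL}?{query}", timeout=30) as resp:
        return parse_statistics(json.load(resp))


def append_row(path: str, channel_id: str, stats: dict) -> None:
    """Append one timestamped snapshot, writing a header if the file is new."""
    new_file = not os.path.exists(path)
    with open(path, "a", newline="") as f:
        writer = csv.writer(f)
        if new_file:
            writer.writerow(["timestamp", "channel_id", "views", "subscribers", "videos"])
        writer.writerow([
            datetime.now(timezone.utc).isoformat(),
            channel_id,
            stats["views"],
            stats["subscribers"],
            stats["videos"],
        ])
```

A scheduled task on the Pi would call `fetch_channel_stats("<CHANNEL_ID>", "<API_KEY>")` and pass the result to `append_row("stats.csv", "<CHANNEL_ID>", stats)`. Keeping the parsing separate from the network call makes it easy to test without an API key.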

Obviously, if you’re working for a big company, you will likely not need a Raspberry Pi. In that case, if you need an additional cloud instance, just create a Jira ticket for your MLOps department 😉 But for a pet project or a low-budget startup, it can be an interesting solution.

Let’s see how it works.

Raspberry Pi

What exactly is a Raspberry Pi? For readers who haven’t been interested in hardware over the last 10 years (the first Raspberry Pi model was introduced in 2012), I can briefly explain that it is a single-board computer running full-fledged Linux. Typically, a Raspberry Pi has a 1 GHz, 2–4-core ARM CPU and 1–8 GB of RAM. It’s small, low-cost, and silent; it has no fans and no disk drive (the OS runs from a microSD card). A Raspberry Pi needs only a standard USB power supply; it can be connected to a network via Wi-Fi or Ethernet and run different tasks for months or even years.

For my data science pet project, I wanted to collect YouTube channel statistics over 2 weeks. For a task that requires only 30–60 seconds twice per day, a serverless architecture could be a perfect solution, and we could use something like Google Cloud Functions for that. But every tutorial from Google began with the phrase “enable billing for your project”. There are free initial credits and free quotas provided by Google, but I didn’t want the extra headache of monitoring how much money I…
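To put this workload in perspective: a 30–60-second task running twice per day for 2 weeks adds up to 28 runs and at most about half an hour of compute in total. The sketch below simply enumerates those run times; the 09:00 and 21:00 UTC slots and the start date are arbitrary assumptions for illustration. In Airflow, the same cadence could be expressed with a cron string such as `0 9,21 * * *`.

```python
from datetime import datetime, timedelta, timezone

RUN_HOURS = (9, 21)       # assumed twice-daily slots (UTC), for illustration only
DAYS = 14                 # 2 weeks of collection
TASK_SECONDS = (30, 60)   # per-run duration range mentioned in the text


def run_times(start: datetime, days: int, hours=RUN_HOURS) -> list:
    """List every scheduled run, starting from midnight of the start date."""
    return [
        start + timedelta(days=d, hours=h)
        for d in range(days)
        for h in hours
    ]


start = datetime(2024, 1, 1, tzinfo=timezone.utc)  # hypothetical start date
runs = run_times(start, DAYS)
low = len(runs) * TASK_SECONDS[0]   # total seconds if every run takes 30 s
high = len(runs) * TASK_SECONDS[1]  # total seconds if every run takes 60 s
print(f"{len(runs)} runs, {low // 60}-{high // 60} minutes of compute in total")
# prints "28 runs, 14-28 minutes of compute in total"
```

Such a small total is why a serverless function looks attractive at first, and also why an always-on, low-power device like a Raspberry Pi is more than enough for the job.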
