✈️ Frankfurt (FRA) Aviation Data Pipeline

This serverless pipeline is triggered automatically by GCP Cloud Scheduler every hour.

About This Project

This application provides a dynamic analytics dashboard for outbound flights from Frankfurt Airport (FRA). It is backed by a fully automated, serverless Medallion architecture: flight data is fetched hourly from the Aviationstack API, and a containerized PySpark job running on Google Cloud Run transforms it through Bronze, Silver, and Gold layers, so the dashboard never processes raw data directly. The job fills in missing delay values using probability-based data augmentation and stores the final structured data in Google BigQuery. This setup keeps the data consistent, the visualizations realistic, and dashboard queries fast.
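The probability-based augmentation of missing delays can be illustrated with a minimal sketch. It assumes that "probability-based" means drawing each missing value from the empirical distribution of the delays that were observed. The project's actual method may differ, the function name `impute_missing_delays` is hypothetical, and plain Python stands in for PySpark to keep the example self-contained.

```python
import random
from collections import Counter


def impute_missing_delays(delays, seed=None):
    """Fill None entries by sampling from the observed delay distribution.

    Illustrative assumption: each missing delay is drawn with probability
    proportional to how often that delay value occurs among known delays.
    """
    rng = random.Random(seed)
    observed = [d for d in delays if d is not None]
    if not observed:
        # No observed delays to learn from, so leave the gaps untouched.
        return list(delays)

    counts = Counter(observed)
    values = list(counts.keys())
    weights = list(counts.values())

    return [
        d if d is not None else rng.choices(values, weights=weights, k=1)[0]
        for d in delays
    ]
```

Passing a fixed `seed` makes the fill reproducible across runs, which helps when comparing dashboard output between pipeline executions.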

The source code is available on GitHub: Aviation Pipeline.

Key Features

  • Serverless Medallion Architecture: Data flows seamlessly through Bronze, Silver, and Gold layers using Apache Spark (PySpark) inside a Docker container.
  • Fully Automated CI/CD: Every code push triggers a GitHub Actions workflow that securely builds and deploys the updated container to Google Cloud Run.
  • Zero Idle Costs: GCP Cloud Scheduler triggers the job hourly, so compute is billed only for the seconds the ETL process actually runs.
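As a rough illustration of what each Medallion layer is responsible for, the sketch below moves a small payload through Bronze, Silver, and Gold. It is not the project's code: the real pipeline uses PySpark, while this uses plain Python so it runs anywhere. The function names are hypothetical, and the nested field names (`flight.iata`, `airline.name`, `departure.delay`) are assumptions about the shape of the API response.

```python
from collections import defaultdict


def to_bronze(api_payload):
    """Bronze: keep the raw records exactly as received."""
    return list(api_payload.get("data", []))


def to_silver(bronze_rows):
    """Silver: flatten nested fields and drop rows without a flight code."""
    silver = []
    for row in bronze_rows:
        flight = (row.get("flight") or {}).get("iata")
        if not flight:
            continue  # unusable without an identifier
        departure = row.get("departure") or {}
        airline = row.get("airline") or {}
        delay = departure.get("delay")
        silver.append({
            "flight": flight,
            "airline": airline.get("name") or "Unknown",
            "delay_min": int(delay) if delay is not None else None,
        })
    return silver


def to_gold(silver_rows):
    """Gold: aggregate per airline, ready for a dashboard query."""
    grouped = defaultdict(list)
    for row in silver_rows:
        grouped[row["airline"]].append(row["delay_min"])

    gold = []
    for airline, delays in sorted(grouped.items()):
        known = [d for d in delays if d is not None]
        gold.append({
            "airline": airline,
            "flights": len(delays),
            "avg_delay_min": sum(known) / len(known) if known else None,
        })
    return gold
```

Calling `to_gold(to_silver(to_bronze(payload)))` yields one summary row per airline. Keeping the raw Bronze copy separate means the Silver and Gold steps can be re-run after a logic change without calling the API again.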

Technologies Used

  • BI / Visualization: Looker Studio
  • Data Processing: Python, Apache Spark (PySpark), Pandas
  • Cloud: Google Cloud Platform (Cloud Run, Cloud Scheduler, BigQuery)
  • DevOps: Docker, GitHub Actions

Serverless Aviation Data Pipeline Architecture

Figure: Serverless Aviation Pipeline with PySpark & GCP CI/CD