← Back to Main Portfolio

Weather Data ETL Pipeline

This project is an automated ETL (Extract, Transform, Load) pipeline built in Python. It fetches real-time weather data for a predefined list of cities from the OpenWeatherMap API, cleans and structures the data, and then loads it into a SQLite database for storage and future analysis.

Project Overview

The ETL pipeline executes the following core functions:

  1. Extract: The script sends GET requests to the OpenWeatherMap API to retrieve current weather data for several major cities. API keys and other configurations are securely managed using environment variables.
  2. Transform: The raw JSON responses from the API are parsed. Relevant fields like temperature, humidity, wind speed, and weather conditions are selected. Data is cleaned by converting temperature from Kelvin to Celsius and structuring it into a clean, tabular format using the Pandas library.
  3. Load: A connection is established to a SQLite database using SQLAlchemy. The transformed data, now in a Pandas DataFrame, is efficiently loaded into a dedicated `weather_data` table. The script handles table creation if it doesn't already exist.
  4. Logging: The script provides informative log messages throughout the process, indicating its progress, successful operations, and any potential errors.

Technologies Used

How to Run

1. First, clone the repository and navigate to the project directory from the repository root:

cd projects/weather_etl

2. Create a .env file in this directory to store your credentials. You will need a free API key from OpenWeatherMap. The database path is also set here.

# .env file
API_KEY='your_openweathermap_api_key'
DATABASE_URL='sqlite:///weather_data.db'

3. Install the required Python packages:

pip install -r requirements.txt

4. Run the ETL script:

python weather_etl.py

The script will execute the pipeline and load the latest weather data into the `weather_data.db` file in the same directory.

Sample Output

When you run the script, you will see console output similar to the following, confirming the process steps:


INFO:root:Starting weather ETL process...
INFO:root:Fetching data for London, GB...
INFO:root:Fetching data for New York, US...
INFO:root:Fetching data for Tokyo, JP...
INFO:root:Fetching data for Sydney, AU...
INFO:root:All data fetched successfully.
INFO:root:Transforming data...
INFO:root:Data transformation complete.
INFO:root:Loading data into SQLite database...
INFO:root:Data loaded successfully. 4 rows inserted into weather_data.
INFO:root:ETL process finished.