This project is an automated ETL (Extract, Transform, Load) pipeline built in Python. It fetches real-time weather data for a predefined list of cities from the OpenWeatherMap API, cleans and structures the data, and then loads it into a SQLite database for storage and future analysis.
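The transform stage can be sketched as follows. The input field names follow the public OpenWeatherMap current-weather JSON shape, but the output schema here is a hypothetical illustration, not necessarily the script's exact columns:

```python
# Sketch of the transform step: flatten one raw OpenWeatherMap
# current-weather response into a tidy record. The output keys
# (city, country, temperature_c, ...) are illustrative assumptions.
def transform_record(raw: dict) -> dict:
    return {
        "city": raw["name"],
        "country": raw["sys"]["country"],
        "temperature_c": raw["main"]["temp"],
        "humidity_pct": raw["main"]["humidity"],
        "description": raw["weather"][0]["description"],
    }
```

The real script applies a step like this to each city's response before loading the rows into SQLite.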
To set up and run the pipeline, follow these steps:
1. Clone the repository and navigate to the project directory from the repository root:

   ```bash
   cd projects/weather_etl
   ```
2. Create a `.env` file in this directory to store your credentials. You will need a free API key from OpenWeatherMap. The database path is also set here.

   ```ini
   # .env file
   API_KEY='your_openweathermap_api_key'
   DATABASE_URL='sqlite:///weather_data.db'
   ```
3. Install the required Python packages:

   ```bash
   pip install -r requirements.txt
   ```
4. Run the ETL script:

   ```bash
   python weather_etl.py
   ```
The script will execute the pipeline and load the latest weather data into the `weather_data.db` file in the same directory.
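Inside `weather_etl.py`, the credentials from step 2 need to be read into the environment. A stdlib-only loader could look like this sketch (the actual script may well use the `python-dotenv` package instead; this is an assumption):

```python
# Minimal .env loader using only the standard library.
# Existing environment variables take precedence (setdefault).
import os

def load_env(path: str = ".env") -> None:
    with open(path) as fh:
        for line in fh:
            line = line.strip()
            # Skip blanks, comments, and malformed lines.
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, _, value = line.partition("=")
            # Drop surrounding quotes, as used in the .env above.
            os.environ.setdefault(key.strip(), value.strip().strip("'\""))
```

After calling `load_env()`, the script can read `os.environ["API_KEY"]` and `os.environ["DATABASE_URL"]`.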
When you run the script, you will see console output similar to the following, confirming the process steps:

```
INFO:root:Starting weather ETL process...
INFO:root:Fetching data for London, GB...
INFO:root:Fetching data for New York, US...
INFO:root:Fetching data for Tokyo, JP...
INFO:root:Fetching data for Sydney, AU...
INFO:root:All data fetched successfully.
INFO:root:Transforming data...
INFO:root:Data transformation complete.
INFO:root:Loading data into SQLite database...
INFO:root:Data loaded successfully. 4 rows inserted into weather_data.
INFO:root:ETL process finished.
```
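To confirm the load, you can query the database directly. The table name `weather_data` is taken from the log output above; the rest of the schema is an assumption, so this sketch only counts rows:

```python
# Count the rows loaded into the weather_data table.
import sqlite3

def count_rows(db_path: str = "weather_data.db") -> int:
    conn = sqlite3.connect(db_path)
    try:
        (n,) = conn.execute("SELECT COUNT(*) FROM weather_data").fetchone()
    finally:
        conn.close()
    return n
```

After a single run over the four default cities, `count_rows()` should match the row count reported in the log.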