Machine Learning to Predict Day Zero

This project was completed as part of the EECS:349 Machine Learning course. Its goal was to raise awareness of the rapid depletion of water resources by predicting Day Zero, a situation of acute water shortage, for each country. We first built a dataset of 1968 examples with 11 attributes, shown in the table below, including rainwater harvesting awareness, water consumption per capita, and desalination capacity. The dataset covers 180 countries from 1960 to 2014 and was sourced from the AQUASTAT database. We then trained several machine learning models on this dataset with the water stress level as the target attribute, and used the best model to predict each country's stress level.

Attribute	Unit	Description
Rainwater Harvesting Awareness	Yes/No	Determined by whether or not rainwater harvesting is widely practiced
Water Consumption per Capita	m^3/year/inhabitant	Total amount of water withdrawn per capita
Desalination Capacity	km^3/year	Fresh water produced using brackish or salt water
Water Dependency Ratio	%	Percentage of water that comes from other countries
Agricultural Water Withdrawal	%	Percentage of total water withdrawn used for agriculture
Industrial Water Withdrawal	%	Percentage of total water withdrawn used for industrial purposes
Municipal Water Withdrawal	%	Percentage of total water withdrawn used for municipal purposes
Water Stress Level	%	Total water withdrawal divided by the water that remains after subtracting environmental flow requirements from the total water available. This was used to determine the class label for each sample
Total Land Cultivated	%	Percentage of the total land area of the country that has been cultivated
Annual Precipitation	mm/year	Total depth of precipitation per year
Total Renewable Water Resources per Capita	m^3/year/inhabitant	The maximum theoretical yearly amount of water available per person for a country at a given moment
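
The Water Stress Level row above describes a ratio that is then turned into a class label. A minimal sketch of that calculation follows. The function names, the class boundaries, and the example volumes are all assumptions made for illustration; the actual thresholds used in the project are not stated here.

```python
def water_stress_percent(total_withdrawal, total_renewable, environmental_flow):
    """Water stress (%) = withdrawal / (available water - environmental flow) * 100."""
    available = total_renewable - environmental_flow
    if available <= 0:
        raise ValueError("no water available after environmental flow requirements")
    return 100.0 * total_withdrawal / available


# Hypothetical class boundaries (upper bounds in %); the real ones are not given.
STRESS_CLASSES = [(25.0, "low"), (50.0, "medium"), (75.0, "high")]


def stress_class(stress_percent):
    """Map a stress percentage to a class label using the assumed boundaries."""
    for upper_bound, label in STRESS_CLASSES:
        if stress_percent < upper_bound:
            return label
    return "critical"


# Made-up volumes in km^3/year, for illustration only.
example = water_stress_percent(
    total_withdrawal=30.0, total_renewable=100.0, environmental_flow=40.0
)
print(example, stress_class(example))
```

All three volumes must be in the same unit so that the ratio is dimensionless before it is scaled to a percentage.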

[Figure: Weka algorithm vs. classification accuracy]

We used Weka to compare algorithms for building a model of our data. The classification accuracy of each model is shown in the graph above. Nearest neighbor (IBk) performed best, with 88.26% classification accuracy. Using the model built with the nearest neighbor algorithm, we predicted stress levels for all the countries. Then, with scikit-learn, we performed linear regression on the data to predict when each country's water stress level would cross a critical level. This project was completed by myself and Aamir Husain (MSR '18).
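
The two steps described above can be sketched in plain Python. This is not the project's code: the project used Weka's IBk and scikit-learn, while the sketch below uses a hand-written 1-nearest-neighbor lookup and a hand-written least-squares line fit as stand-ins. The function names, the feature values, and the critical level of 100% are all hypothetical.

```python
import math


def nearest_neighbor_label(train_rows, train_labels, query):
    """Return the label of the training row closest to `query` (1-NN, Euclidean)."""
    best_index = min(
        range(len(train_rows)),
        key=lambda i: math.dist(train_rows[i], query),
    )
    return train_labels[best_index]


def fit_line(xs, ys):
    """Ordinary least squares for a single feature; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def crossing_year(years, stress_levels, critical_level):
    """Year at which the fitted trend reaches `critical_level`.

    Returns None when the fitted stress level is not rising.
    """
    slope, intercept = fit_line(years, stress_levels)
    if slope <= 0:
        return None
    return (critical_level - intercept) / slope


# Made-up, already-scaled feature rows and labels, for illustration only.
rows = [(0.1, 0.9), (0.8, 0.2)]
labels = ["low", "high"]
print(nearest_neighbor_label(rows, labels, (0.7, 0.3)))

# Made-up stress history (%) for one country; 100% is an assumed critical level.
print(crossing_year([2000, 2005, 2010], [40.0, 50.0, 60.0], critical_level=100.0))
```

The sketch assumes features are already on comparable scales, since Euclidean distance is sensitive to units. Extrapolating a straight line far beyond the observed years is a strong assumption, so any crossing year obtained this way is best read as a rough indicator.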

Try out our interactive website here! The final project report can be downloaded from here.
