Machine Learning to Predict Day Zero

A project completed as part of the EECS:349 Machine Learning course to raise awareness of the rapid depletion of water resources by predicting Day Zero for each country. Day Zero refers to a situation of acute water shortage. The first stage of the project was building a large dataset (1,968 examples) with the 11 attributes shown in the table below, including rainwater harvesting awareness, water consumption per capita, and desalination capacity. The dataset covers 180 countries from 1960 to 2014 and was sourced from the AQUASTAT database. We then trained machine learning models on this dataset with the water stress level as the target attribute, and used the best-performing model to predict each country's stress level.
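
As a rough illustration of that first stage, the sketch below pivots long-format records (one `(country, year, variable, value)` tuple per line, an assumed layout rather than a documented AQUASTAT export format) into one example per country-year. The function name `build_examples` is hypothetical. Dropping any country-year that lacks one of the requested attributes is also an assumption, and not necessarily how the project handled missing data.

```python
from collections import defaultdict


def build_examples(records, attributes):
    """Pivot (country, year, variable, value) records into one example per country-year.

    Assumption: a country-year is kept only if every requested attribute is present.
    """
    by_key = defaultdict(dict)
    for country, year, variable, value in records:
        if variable in attributes:
            by_key[(country, year)][variable] = value

    examples = []
    for (country, year), values in sorted(by_key.items()):
        if all(name in values for name in attributes):  # drop incomplete rows
            row = {"country": country, "year": year}
            row.update({name: values[name] for name in attributes})
            examples.append(row)
    return examples
```

Passing the 11 attribute names from the table as `attributes` yields one flat row per country-year. Each row could then be written out in whatever format the training tool expects.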

| Attribute | Unit | Description |
|---|---|---|
| Rainwater Harvesting Awareness | Yes/No | Whether or not rainwater harvesting is widely practiced |
| Water Consumption per Capita | m^3/year/inhabitant | Total amount of water withdrawn per capita |
| Desalination Capacity | km^3/year | Fresh water produced from brackish or salt water |
| Water Dependency Ratio | % | Percentage of water that comes from other countries |
| Agricultural Water Withdrawal | % | Percentage of total water withdrawn that is used for agriculture |
| Industrial Water Withdrawal | % | Percentage of total water withdrawn that is used for industrial purposes |
| Municipal Water Withdrawal | % | Percentage of total water withdrawn that is used for municipal purposes |
| Water Stress Level | % | Total water withdrawal divided by (total water available minus the water needed for environmental flow). This was used to determine the class label for each sample |
| Total Land Cultivated | % | Percentage of the country's total land area that has been cultivated |
| Annual Precipitation | mm/year | Total depth of precipitation per year |
| Total Renewable Water Resources per Capita | m^3/year/inhabitant | Maximum theoretical yearly amount of water available per person in a country at a given moment |
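
The stress-level definition in the table can be written as stress (%) = 100 × withdrawal / (renewable resources − environmental flow). The sketch below computes this value and maps it to a class label. The class boundaries are an assumption modeled on FAO's SDG 6.4.2 severity bands: no stress below 25%, then low, medium, and high, with critical at 100% and above. The project's own bins are not stated here, and all names in the code are hypothetical.

```python
# Assumed class boundaries (upper bound exclusive), modeled on FAO's SDG 6.4.2
# severity bands; the project's actual bins may differ.
STRESS_BANDS = [(25.0, "no stress"), (50.0, "low"), (75.0, "medium"), (100.0, "high")]


def water_stress_level(total_withdrawal, total_renewable, environmental_flow):
    """Return water stress in percent: withdrawal / (renewable - environmental flow) * 100.

    All three inputs must share one unit, e.g. km^3/year.
    """
    available = total_renewable - environmental_flow
    if available <= 0:
        raise ValueError("no water left after environmental flow requirements")
    return 100.0 * total_withdrawal / available


def stress_class(level):
    """Map a stress percentage to a (hypothetical) class label."""
    for upper, label in STRESS_BANDS:
        if level < upper:
            return label
    return "critical"


# Toy numbers: 10 withdrawn out of 60 renewable, with 20 reserved for environmental flow.
level = water_stress_level(10.0, 60.0, 20.0)
print(level, stress_class(level))  # 25.0 low
```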

Figure: Weka algorithm vs. classification accuracy

Using this dataset, we compared several classifiers in Weka to find the algorithm best suited to our data. The classification accuracy of each model is shown in the graph above. The nearest-neighbor classifier (IBk) performed best, with 88.26% classification accuracy. Using the model built with IBk, we predicted stress levels for all the countries. Then, with scikit-learn, we performed linear regression on the data to predict when each country's water stress level would cross a critical level. I completed this project with Aamir Husain (MSR ‘18).
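
The two modeling steps can be sketched in plain Python. `knn_predict` is a minimal nearest-neighbor classifier in the spirit of Weka's IBk. IBk defaults to k = 1 and, by default, min-max normalizes numeric attributes in its Euclidean distance. `crossing_year` fits a least-squares line of stress level against year and solves for the year in which that line reaches a critical level. All function names and the critical threshold are hypothetical stand-ins rather than the project's code. The project itself used Weka and scikit-learn, where `sklearn.linear_model.LinearRegression` would be the library counterpart of `fit_line`.

```python
import math
from collections import Counter


def min_max_scaler(rows):
    """Return a function that scales each column to [0, 1] using the training rows' range."""
    lows = [min(col) for col in zip(*rows)]
    highs = [max(col) for col in zip(*rows)]

    def scale(row):
        # Constant columns carry no information, so map them to 0.
        return [(v - lo) / (hi - lo) if hi > lo else 0.0
                for v, lo, hi in zip(row, lows, highs)]

    return scale


def knn_predict(train_x, train_y, query, k=1):
    """Classify `query` by majority vote of its k nearest training examples.

    Numeric attributes only; a Yes/No attribute such as rainwater harvesting
    awareness is assumed to be encoded as 1/0 beforehand.
    """
    scale = min_max_scaler(train_x)
    scaled = [scale(row) for row in train_x]
    q = scale(query)
    nearest = sorted(range(len(scaled)), key=lambda i: math.dist(scaled[i], q))[:k]
    votes = Counter(train_y[i] for i in nearest)
    return votes.most_common(1)[0][0]


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def crossing_year(years, levels, critical):
    """Year at which the fitted trend reaches `critical`, or None if it never rises."""
    slope, intercept = fit_line(years, levels)
    if slope <= 0:
        return None
    return (critical - intercept) / slope
```

With toy values, `crossing_year([2000, 2005, 2010], [40.0, 45.0, 50.0], 100.0)` returns `2060.0`. A non-rising trend returns `None`, because the line never reaches the threshold. A country already above the threshold gets a year in the past.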

Try out our interactive website here! The final project report can be downloaded from here.
