Climate Science Hackathon Winner: Team 154
Xi Chen (ICS; Informatics):
I am a second-year master's student majoring in Informatics at ICS. I have worked on several data
science projects in which I was responsible for analyzing raw data and developing ideas and analyses from it.
My previous hands-on project experience taught me how to identify real-world questions, extract and
analyze the data, and then visualize the results and present a report. The B.S. and M.S. degrees in Electronic
Engineering and Biomedical Engineering that I earned from Fudan University were instrumental to my
knowledge of Internet technology during my time as a project manager at China Mobile. Along the way to
earning my second master's degree, in Informatics at UCI, I also developed a strong background in statistics and
machine learning that has served me well in my pursuit to advance the interests of Healthcare information
Reza Asadi (Computer Science):
I am a third-year PhD student in Computer Science at UC Irvine. I received my B.Sc. and M.Sc. in Computer
Science from Amirkabir University of Technology, Iran, in 2011 and 2013, respectively. I am working with Professor Amelia
Regan at UCI. My research focuses on distributed optimization algorithms, with applications in large-scale
machine learning, transportation systems, and power flow networks. I am working on distributed
optimization as a core component of large-scale machine learning problems, where distributed methods
increase system performance. In transportation systems, I work on data science projects in
academia and industry with the goal of designing efficient intelligent transportation systems. I also develop
optimization theory toward a distributed power flow system.
Ahmad Razavi (Computer Science):
Ahmad Razavi received his bachelor's and master's degrees in computer engineering in 2007 and 2010,
respectively. He joined the University of California, Irvine in 2014. He is currently a computer science PhD
candidate working on decentralized data fusion algorithms for multi-robot systems. His research interests
include embedded systems, robotics, and machine learning.
The California Drought dataset contains the storage levels of 83 reservoirs over the last 15 years. Apart from the storage level
and the corresponding date, it also includes geographical information such as longitude, latitude, and elevation.
After a quick overview and some pre-processing of the data, the seasonal trends of some reservoirs
caught our eye and led us to two hypotheses: first, that a seasonal trend is prevalent among most of the
reservoirs; and second, that geographical parameters are closely related to this trend. We then analyzed the dataset to determine
how these trends are distributed geographically across California.
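The writeup does not say how the raw readings were pre-processed. One plausible layout, sketched below under stated assumptions, keeps each reservoir's static geography next to its storage readings and collapses those readings into a monthly series for the temporal analysis. The class name `Reservoir`, its field names, the use of monthly averages, and the treatment of `None` as a missing reading are all hypothetical choices, not details taken from the team's work.

```python
from __future__ import annotations

from collections import defaultdict
from dataclasses import dataclass, field
from datetime import date
from statistics import mean


@dataclass
class Reservoir:
    """One reservoir: static geography plus raw storage readings (hypothetical layout)."""
    station_id: str
    latitude: float
    longitude: float
    elevation: float
    # Raw (date, storage) readings; None marks a missing reading (assumption).
    readings: list[tuple[date, float | None]] = field(default_factory=list)

    def monthly_series(self) -> list[tuple[tuple[int, int], float]]:
        """Average the raw readings into one value per (year, month), in time order."""
        buckets: dict[tuple[int, int], list[float]] = defaultdict(list)
        for day, storage in self.readings:
            if storage is not None:  # skip missing readings
                buckets[(day.year, day.month)].append(storage)
        return [(month, mean(values)) for month, values in sorted(buckets.items())]
```

Months with no valid reading are dropped rather than interpolated, so any analysis that assumes evenly spaced monthly values would need a gap-filling step first.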
We applied temporal autocorrelation and seasonal analysis to each reservoir's time series. We described each
seasonal trend by its phase and cycle length. The results show that most storage levels follow seasonal patterns and
change dramatically over the years. Some reservoirs have longer cycles than others, and some reach their peak earlier than
others. Based on the temporal analysis, we split the reservoirs into two groups: seasonal and non-seasonal.
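The temporal step can be sketched in plain Python. This is a minimal illustration under assumptions, not the team's code: it takes an evenly spaced (e.g. monthly) storage series, estimates the cycle length as the lag of the first local maximum of the sample autocorrelation function, estimates the phase as the offset of the peak after folding the series by that cycle length, and labels a reservoir seasonal when that autocorrelation peak exceeds a threshold. The function names and the `max_lag=24` and `threshold=0.3` defaults are hypothetical.

```python
from __future__ import annotations

from statistics import mean


def autocorrelation(series: list[float], lag: int) -> float:
    """Sample autocorrelation of an evenly spaced series at a non-negative lag."""
    mu = mean(series)
    denom = sum((x - mu) ** 2 for x in series)
    if denom == 0:
        return 0.0  # a constant series has no correlation structure
    num = sum((series[t] - mu) * (series[t + lag] - mu) for t in range(len(series) - lag))
    return num / denom


def cycle_length(series: list[float], max_lag: int) -> tuple[int | None, float]:
    """Lag of the first local maximum of the autocorrelation function, and its height."""
    acf = [autocorrelation(series, k) for k in range(max_lag + 1)]
    for k in range(2, max_lag):
        if acf[k] > acf[k - 1] and acf[k] >= acf[k + 1]:
            return k, acf[k]
    return None, 0.0


def phase(series: list[float], cycle: int) -> int:
    """Offset (0 .. cycle-1) at which the series, folded by its cycle length, peaks."""
    folded = [mean(series[i::cycle]) for i in range(cycle)]
    return folded.index(max(folded))


def classify(series: list[float], max_lag: int = 24, threshold: float = 0.3):
    """Return ('seasonal', cycle, phase) or ('non-seasonal', None, None)."""
    max_lag = min(max_lag, len(series) // 2)  # keep enough overlap at the largest lag
    cycle, strength = cycle_length(series, max_lag)
    if cycle is None or strength < threshold:
        return "non-seasonal", None, None
    return "seasonal", cycle, phase(series, cycle)
```

For a synthetic 10-year monthly cosine that peaks at month index 5, `classify` returns `('seasonal', 12, 5)`; a steadily rising series has no autocorrelation peak within the lag window and is labelled non-seasonal.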
Using the phase and cycle length (temporal correlation) obtained from the temporal analysis, we clustered the reservoirs with K-means
to illustrate how trends change geographically. The results show not only that trends vary by geographical area, which could
be due to factors outside the given dataset, but also that elevation has a strong impact on storage-level trends: for example,
high-elevation reservoirs are fed by snowmelt (in summer), while others are fed by rain (in fall and spring).
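The clustering step can likewise be sketched as plain Lloyd's K-means over per-reservoir features derived from phase and cycle. Two choices here are assumptions rather than details from the writeup: phase is mapped onto the unit circle so that offsets at opposite ends of a cycle (e.g. months 11 and 0) count as neighbours, and centroids are seeded with a deterministic farthest-point rule for reproducibility (k-means++ is the more common choice). The names `features`, `farthest_point_init`, and `kmeans`, and the scaling of cycle length by 12 (assuming monthly data), are hypothetical.

```python
import math


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def features(phase, cycle):
    """Hypothetical encoding: phase as an angle on the unit circle, plus cycle length in years."""
    angle = 2 * math.pi * phase / cycle
    return (math.cos(angle), math.sin(angle), cycle / 12.0)


def farthest_point_init(points, k):
    """Deterministic seeding: start at points[0], then repeatedly add the farthest point."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(max(points, key=lambda p: min(sq_dist(p, c) for c in centroids)))
    return centroids


def kmeans(points, k, iters=100):
    """Lloyd's algorithm; returns (labels, centroids)."""
    centroids = farthest_point_init(points, k)
    labels = None
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [min(range(k), key=lambda j: sq_dist(p, centroids[j])) for p in points]
        if new_labels == labels:
            break  # assignments are stable: converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return labels, centroids
```

With six synthetic reservoirs whose phases are 5, 6, 5, 0, 1, and 11 months (all on a 12-month cycle), `kmeans([features(p, 12) for p in (5, 6, 5, 0, 1, 11)], k=2)` returns labels `[0, 0, 0, 1, 1, 1]`; the circular encoding is what groups phase 11 with phases 0 and 1.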
In conclusion, temporal analysis explains the seasonal trends, while spatial analysis illustrates how reservoirs' trends
relate to geographical area. Together, both analyses lead to better prediction of water-level trends in California.
To explore the dataset further, we could investigate the causal relationship between the temporal and spatial trends. One could
also add more spatial information, such as rivers and mountain ranges, to the dataset, which might help identify
the factors that drive these trends. Identifying more factors related to storage levels would lead to better prediction and
understanding of storage-level changes.