A research summary was published in the Spring 2019 edition (pg. 25) of the MIT Undergraduate Research Journal.
What is the problem?
Analyzing hourly emissions data from power plants across the United States.
Why is it important?
We want to know how intermittency from renewable energy sources affects thermal power plants, so that we can assess grid response capabilities and total CO2 emissions, among other factors.
This data is being integrated into an energy systems model which will cover the majority of U.S. GHG emissions.
What did I do?
First, I took a step back from the code to ask my supervisors (postdocs) what the data was being used for and what format would be best for analysis. Then, I looked at how the data was being cross-referenced between various sources. Incorporating that feedback, I structured my code so that each step of the analysis was abstracted from the next. This allowed me to rewrite the slowest scripts in the overall workflow. The resulting speedup was approximately 50x.
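As a rough illustration of what separating the steps of an analysis can look like, here is a minimal Python sketch. It is not the project's actual code: the field names (plant_id, gen_mwh, co2_tons), the stage functions, and the sample rows are all hypothetical. The idea is that every stage takes and returns the same kind of record list, so a slow stage can be rewritten without touching the others.

```python
from typing import Any, Callable, Dict, List

# Hypothetical record: one plant-hour of data. Field names are assumptions.
Record = Dict[str, Any]
Stage = Callable[[List[Record]], List[Record]]


def drop_missing(records: List[Record]) -> List[Record]:
    """Keep only rows that report a CO2 value."""
    return [r for r in records if r.get("co2_tons") is not None]


def add_emission_rate(records: List[Record]) -> List[Record]:
    """Add tons of CO2 per MWh, skipping rows with zero generation."""
    out = []
    for r in records:
        if r["gen_mwh"] > 0:
            out.append({**r, "co2_rate": r["co2_tons"] / r["gen_mwh"]})
    return out


def run_pipeline(records: List[Record], stages: List[Stage]) -> List[Record]:
    """Apply each stage in order; stages only share the record format."""
    for stage in stages:
        records = stage(records)
    return records


# Made-up sample rows, for illustration only.
sample = [
    {"plant_id": 1, "gen_mwh": 100.0, "co2_tons": 50.0},
    {"plant_id": 2, "gen_mwh": 0.0, "co2_tons": 0.0},
    {"plant_id": 3, "gen_mwh": 80.0, "co2_tons": None},
]
result = run_pipeline(sample, [drop_missing, add_emission_rate])
```

Because run_pipeline only depends on the shared record format, replacing one stage with a faster version means changing a single entry in the list of stages.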
What was challenging about the work?
The initial challenge was confusion about the sources of the various pieces of information needed to create a holistic data point. An additional challenge has been learning to write well-documented code that anyone can understand and edit quickly. I don’t have formal training in software development, so I’ve been finding online resources and following them as best I can. This practice has helped me as well: when I need to work on a script I wrote weeks ago, the documentation reminds me why a certain line is there, which I would otherwise have forgotten.
What have I learned?
Having patience, finding the right documentation or examples, and keeping track of ways to improve the code while maintaining focus on urgent tasks.