Initiative in Environmental Data Science
Wiser through Data
Core team: Steve Shaw, Hyatt Green, Mary Collins, James Gibbs, Colin Beier, Lindi
Collaborating staff and faculty: Tim Morin, Chuck Kroll, Jim Sahm, Bahram Salehi, Brian Leydet
ESF has a long and successful history in the environmental science disciplines but has yet to embrace powerful data science tools and concepts. This initiative will create the Center for Environmental Data Science to leverage existing ESF strengths, address institutional weaknesses, and respond to emergent data science themes emphasized by NSF and NIH. The Center will also foster synergies with new hires supported by the Empire Innovation Program at ESF and SUNY Upstate. The Center will coordinate new course offerings, seed grants, graduate student support, campus computing (alongside ITS), innovative faculty training, and new partnerships with other academic institutions, government, and industry.
"Data Science [is] the multi-disciplinary field that combines data analysis with data processing methods and domain expertise, transforming data into understandable and actionable knowledge relevant for informed decision making..." (Gibert et al., 2018, Envir. Model. and Software, 106: 4-12)
Background
In the last five years, businesses and academic institutions have heralded a new era of data science. A May 2018 Bloomberg article by Michael Sasso titled "This is America's Hottest Job" noted that data science job postings on Indeed.com had increased 75% since 2015, with numerous companies unable to fill their openings. Academic institutions both nearby (University of Rochester, SUNY Binghamton) and distant (Duke, Stanford) have recently launched data science programs.
Data science encompasses data analysis in fields as diverse as health care, finance, energy systems, genomics, and marketing, as well as the environment. Underpinning all data science is expertise in computer science and applied mathematics. With limited reach into those two disciplines, ESF is at a disadvantage when competing in the broad realm of data science.
However, ESF has distinct strengths within the subfield of environmental data science, which relies more heavily on domain knowledge and analog data acquisition (Figure 1) than the broad field of data science, whose data likely originate in the digital realm (e.g., financial transactions, Facebook likes). A push for environmental data science would leverage ESF's exceptional domain knowledge, research properties beyond the Syracuse campus, and environmentally focused faculty and student body to build a competitive data science program. This initiative would build on existing ESF programs in aquatic health, geospatial analysis, hydrologic modeling, wildlife tracking, and others, and would connect to new data science-focused faculty lines in environmental health at both SUNY ESF and Upstate.
Specific challenges to conducting environmental data science include issues with data interoperability, treatment of data errors, data fusion, data storage and reproducibility, and selection of proper data streams (Gibert et al., 2018). Overcoming these challenges would both move the environmental data science field forward and develop new capacity at ESF.
The central goal of this initiative is to create a new Center for Environmental Data Science that broadly enhances environmental data science capacity at ESF in order to 1) be a leader in the field of environmental data science, 2) differentiate and enhance the academic programs ESF offers, 3) ensure our students remain highly sought after on the job market, 4) strengthen the research capabilities of ESF faculty, 5) be a sought-after partner to government, industry, and academic institutions, and 6) maximize the ability of ESF to solve real-world problems.