Excerpts taken from the ORIGINAL STORY (the SOURCE), published on October 8, 2019.

“It’s a problem all too familiar to modern researchers: They’re drowning in data. Say a sociologist in South Florida wants to study how flood risk affects the area’s vulnerable, low-income communities. That researcher would need to integrate reams of complicated data, from flood maps, to building stock, to income distribution by region.

That’s a tall order. Such data are collected by different agencies, stored in different formats, and protected by different levels of access. Urban sustainability issues in particular crisscross mountains of data, with no easy way for researchers to use that data efficiently.

Too much data, without a means to integrate it, is the motivation behind a newly funded project led by Shrideep Pallickara, professor in the Colorado State University Department of Computer Science. Pallickara and a multidisciplinary team of university and government partners have been awarded $3 million from the National Science Foundation to develop a system for streamlining and managing vast datasets that could advance research in urban sustainability.

The five-year award is from the NSF’s Cyberinfrastructure for Sustained Scientific Innovation division. It includes CSU collaborators Sangmi Pallickara and Sudipto Ghosh in the Department of Computer Science; Mazdak Arabi in the Department of Civil and Environmental Engineering; Jay Breidt in the Department of Statistics; and researchers from Arizona State University, the University of California, Irvine, and the University of Maryland, Baltimore County.

“The key here is to complement, enhance and catalyze what a researcher can do,” said Shrideep Pallickara, whose expertise is in cloud infrastructure and large-scale cyber-physical systems. “We want to meet them where they are, rather than tell them to come meet us.”

In other words, Pallickara and team want to help urban systems researchers do better science by clearing roadblocks caused by unwieldy datasets that don’t talk to each other. Their secret sauce is a spatiotemporal “sketching” algorithm that essentially decouples these reams of data from the useful information hidden inside it.”
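The article does not describe how the team’s sketching algorithm works, but the general idea behind spatiotemporal sketching is to replace raw records with a compact summary indexed by location and time, so that aggregate questions can be answered without scanning the original data. Below is a minimal illustrative sketch in Python, assuming a count-min-style summary keyed by a spatial grid cell and an hourly time bucket; the class name, parameters, and discretization choices are hypothetical placeholders, not the project’s actual implementation.

```python
# Illustrative only: a count-min-style spatiotemporal summary.
# Raw observations are folded into a small table of counters keyed by
# (spatial grid cell, time bucket), and estimates are read back from
# that table instead of the raw records.
import hashlib

class SpatioTemporalSketch:
    def __init__(self, depth=4, width=2048):
        self.depth = depth          # number of independent hash rows
        self.width = width          # counters per row
        self.table = [[0] * width for _ in range(depth)]

    def _key(self, lat, lon, timestamp, cell_size=0.1, bucket_seconds=3600):
        # Discretize space into ~0.1-degree grid cells and time into hourly buckets.
        cell = (round(lat / cell_size), round(lon / cell_size))
        bucket = int(timestamp // bucket_seconds)
        return f"{cell}|{bucket}"

    def _hashes(self, key):
        # One column index per row, derived from independent salted hashes.
        for row in range(self.depth):
            digest = hashlib.sha256(f"{row}:{key}".encode()).hexdigest()
            yield row, int(digest, 16) % self.width

    def add(self, lat, lon, timestamp, count=1):
        key = self._key(lat, lon, timestamp)
        for row, col in self._hashes(key):
            self.table[row][col] += count

    def estimate(self, lat, lon, timestamp):
        # The minimum across rows bounds the over-counting from hash collisions.
        key = self._key(lat, lon, timestamp)
        return min(self.table[row][col] for row, col in self._hashes(key))

# Example: summarize observations, then query without touching the raw data.
sketch = SpatioTemporalSketch()
sketch.add(26.12, -80.14, 1_570_000_000)   # e.g., a flood-related record
sketch.add(26.12, -80.14, 1_570_000_500)
print(sketch.estimate(26.12, -80.14, 1_570_000_000))  # ~2
```

The design point this toy example captures is the one the quote alludes to: the sketch is far smaller than the source datasets, so researchers can query the summarized information without integrating or rescanning the heterogeneous raw data behind it.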