What is special about mining spatial and spatio-temporal datasets?
Tuesday, January 24, 2017 - 1:25pm - 2:25pm
Shashi Shekhar (University of Minnesota, Twin Cities)
Spatial and spatio-temporal data mining is growing in importance as large datasets such as trajectories, maps, remote-sensing images, census data, and geo-social media become increasingly common. Applications include Public Health (e.g., monitoring the spread of disease, spatial disparity, food deserts), Public Safety (e.g., crime hot spots), Public Security (e.g., a common operational picture), Environment and Climate (e.g., change detection, land-cover classification), M(obile)-commerce (e.g., location-based services), and more.
Classical data mining techniques often perform poorly when applied to spatial and spatio-temporal datasets, for several reasons. First, these datasets are embedded in continuous space with implicit relationships, whereas classical datasets (e.g., transactions) are often discrete. Second, the cost of spurious patterns (e.g., false positives, chance patterns) is often high in spatial application domains. Third, a common assumption in classical statistical analysis is that data samples are generated independently. For spatial and spatio-temporal data, however, this independence assumption is generally false, because such data tends to be highly self-correlated. For example, people with similar characteristics, occupations, and backgrounds tend to cluster together in the same neighborhoods. In spatial statistics this tendency is called spatial autocorrelation. Ignoring autocorrelation when analyzing spatial and spatio-temporal data may produce hypotheses or models that are inaccurate or inconsistent with the dataset.
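The abstract does not name a specific statistic, but one widely used measure of spatial autocorrelation is global Moran's I: I = (n / W) * sum_ij w_ij z_i z_j / sum_i z_i^2, where z_i is the deviation of location i's value from the mean, w_ij is a spatial weight between locations i and j, and W is the sum of all weights. Values near +1 mean similar values sit next to each other, values near -1 mean neighbors tend to differ, and under independence the expected value is -1/(n - 1). The sketch below is illustrative only. It assumes a hypothetical layout of eight cells in a row with binary adjacency weights, and the helper names `morans_i` and `line_adjacency` are invented for this example.

```python
from typing import Sequence


def morans_i(values: Sequence[float], weights: Sequence[Sequence[float]]) -> float:
    """Global Moran's I for values observed at n locations.

    weights[i][j] is the spatial weight between locations i and j
    (here 1.0 for neighbours and 0.0 otherwise, with a zero diagonal).
    """
    n = len(values)
    if n < 2 or len(weights) != n:
        raise ValueError("need at least two locations and an n x n weight matrix")
    mean = sum(values) / n
    dev = [v - mean for v in values]  # z_i: deviation of each location from the mean
    denom = sum(d * d for d in dev)  # total variation, sum of z_i^2
    if denom == 0:
        raise ValueError("Moran's I is undefined for constant values")
    w_total = sum(sum(row) for row in weights)  # W: sum of all weights
    if w_total == 0:
        raise ValueError("weight matrix has no non-zero entries")
    # Weighted cross-products of deviations over every ordered pair (i, j).
    num = sum(weights[i][j] * dev[i] * dev[j] for i in range(n) for j in range(n))
    return (n / w_total) * (num / denom)


def line_adjacency(n: int) -> list[list[float]]:
    """Hypothetical layout: n cells in a row; cells sharing an edge are neighbours."""
    return [[1.0 if abs(i - j) == 1 else 0.0 for j in range(n)] for i in range(n)]


if __name__ == "__main__":
    w = line_adjacency(8)
    clustered = [1, 1, 1, 1, 0, 0, 0, 0]  # similar values are adjacent
    alternating = [1, 0, 1, 0, 1, 0, 1, 0]  # neighbours always differ
    print(f"clustered:   {morans_i(clustered, w):.3f}")  # 0.714
    print(f"alternating: {morans_i(alternating, w):.3f}")  # -1.000
    print(f"expected under independence: {-1 / (8 - 1):.3f}")  # -0.143
```

The clustered pattern scores well above the independence baseline of -1/7, which is the kind of dependence between nearby samples that breaks the independence assumption of classical analysis. Real analyses would use weights derived from actual geometry (contiguity, distance bands, or k-nearest neighbors) and a significance test, which this sketch omits.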