UNIQUE DATA SCIENCE PROJECTS IN MULTIPLE DOMAINS
Everglades National Park
Using hyperspectral imagery captured by UF's GatorEye Unmanned Flying Laboratory (GE-UFL), which carries a Headwall Photonics VNIR hyperspectral sensor with 270 spectral bands, we attempted to identify aquatic plant species using supervised and unsupervised classification. The steps included image processing, visual examination, and species determination. Principal component analysis captured 90.01% of the variation in the first component. Further investigation showed that the first component was dominated by green and infrared bands, and visualizations showed that this component did not overlap our water bodies, so we concluded that it primarily captured terrestrial vegetation. The second component captured 9.04% of the variation and reflected blue and yellow. When visualized, it overlapped our water bodies entirely. The yellow reflectance was likely the aquatic vegetation we were looking to identify. Further analysis using a neural network and ISODATA confirmed that we were looking at aquatic vegetation, but it became clear that identifying the species was unlikely to succeed. For now, manual determination remains the best approach for identifying aquatic vegetation species.
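The explained-variance figures above come from a standard principal component analysis over the spectral bands. As a rough illustration only, the following Python sketch runs PCA on a small synthetic image cube with NumPy. The function name `pca_explained_variance`, the cube dimensions, and the random data are all assumptions made for this example; they are not the project's actual code or imagery.

```python
import numpy as np


def pca_explained_variance(cube):
    """PCA over the band axis of a (rows, cols, bands) image cube.

    Returns the explained-variance ratio per component, the band loadings
    (one column per component), and the per-pixel component scores.
    """
    rows, cols, bands = cube.shape
    # Treat every pixel as one observation with `bands` features.
    pixels = cube.reshape(-1, bands).astype(float)
    centered = pixels - pixels.mean(axis=0)
    # Band-by-band covariance matrix.
    cov = np.cov(centered, rowvar=False)
    # eigh is appropriate because a covariance matrix is symmetric.
    eigvals, eigvecs = np.linalg.eigh(cov)
    # Sort components from largest to smallest variance.
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    ratios = eigvals / eigvals.sum()
    # Project pixels onto the components and restore the image layout.
    scores = (centered @ eigvecs).reshape(rows, cols, bands)
    return ratios, eigvecs, scores


# Synthetic stand-in for a hyperspectral cube (assumed: 20 x 30 pixels, 5 bands).
rng = np.random.default_rng(0)
cube = rng.normal(size=(20, 30, 5))
cube[..., 0] *= 10  # make one band dominate the variance

ratios, loadings, scores = pca_explained_variance(cube)
print("explained variance ratios:", np.round(ratios, 4))
print("dominant band in component 1:", int(np.argmax(np.abs(loadings[:, 0]))))
```

Inspecting the columns of `loadings` shows which bands drive each component, and mapping a slice such as `scores[..., 0]` back onto the image shows where that component is strong. This mirrors how the first component was linked to terrestrial vegetation and the second to the water bodies.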
PRINCIPAL COMPONENT ANALYSIS
The goal of this analysis was to determine whether a time trend and a spatial trend existed in a data set from the Florida Association of Land Surveyors. The data covered robbery, theft, or larceny of land surveying equipment throughout Florida. Exploratory analysis can be seen on the visualizations page, under the Kepler visuals. Because more than 70% of the crimes occurred in the Miami tri-county area, the analysis focused on crimes in that area. However, because the data were sparse, traditional statistical models that rely on MCMC would not converge. To resolve this issue, a spatial autoregressive (SAR) model was fitted using integrated nested Laplace approximations (INLA). Below are some of the outputs from the model. Unfortunately, a time trend was not identified, but a moderate effect from roadway circuits was identified.
SAR(INLA) TIME TREND
REMOTE SENSING SOFTWARE
Built with Java SE and the Swing library, this software application allows a user to analyze a raster image along three channels (red, green, blue). The application is specific to RGB imagery, although it was originally intended for satellite imagery; time constraints limited its function to RGB images instead of multispectral images. The application demonstrates object-oriented programming applied to geospatial analysis. The classes created are the main object, a GUI object, a backend object, and a processing object. The main object calls all of the other classes to build the program. The GUI object calls all of the Java Swing objects and methods for display. The backend and processing objects perform the necessary calculations on the raster image. The project link points to a GitHub repo where this software is available, along with a report about it.
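The application itself is written in Java. Purely to illustrate how a backend object and a processing object can divide the work of a per-channel analysis, here is a minimal Python sketch. The class names `RasterProcessor` and `RasterBackend`, the chosen statistics, and the tiny test image are hypothetical and are not taken from the project's repository.

```python
import numpy as np


class RasterProcessor:
    """Hypothetical processing object: computes per-channel statistics."""

    CHANNELS = ("red", "green", "blue")

    def channel_stats(self, image):
        stats = {}
        for index, name in enumerate(self.CHANNELS):
            # Select one channel of the (rows, cols, 3) array.
            band = image[..., index].astype(float)
            stats[name] = {
                "min": float(band.min()),
                "max": float(band.max()),
                "mean": float(band.mean()),
            }
        return stats


class RasterBackend:
    """Hypothetical backend object: holds the raster and delegates analysis."""

    def __init__(self, image, processor):
        if image.ndim != 3 or image.shape[2] != 3:
            raise ValueError("expected an RGB array of shape (rows, cols, 3)")
        self.image = image
        self.processor = processor

    def analyze(self):
        return self.processor.channel_stats(self.image)


# Tiny synthetic RGB image (assumed data, not a real raster).
image = np.zeros((4, 4, 3), dtype=np.uint8)
image[..., 0] = 200      # red is uniform
image[..., 1] = 100      # green is uniform
image[:2, :, 2] = 50     # blue covers only the top half

summary = RasterBackend(image, RasterProcessor()).analyze()
for channel, values in summary.items():
    print(channel, values)
```

Keeping the calculations in a separate processing class means a GUI layer only needs to call `analyze()` and display the result, which is the same separation of concerns the paragraph above describes.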
SCREENSHOT OF RASTER ANALYSIS SOFTWARE BEING USED FOR SATELLITE IMAGE ANALYSIS
Currently, I am developing an application that geoparses text using named entity recognition and named entity disambiguation. The methodology relates specifically to the decision-making analysis I am conducting for my PhD dissertation. If my geoparser methodology is successful, it will be an effective way to identify the geography associated with a group of texts and will allow additional analysis to be applied.
Research Decision Making
A continuation of the geoparser application is the analysis of the extracted data. By using the extracted geographies and blocking texts by geography, we are able to include features at the geographic unit level in the analysis. Currently, my PhD dissertation project uses public health research as case studies, at different geographic resolutions, to identify the decisions made surrounding research efforts. Ideally, this process will allow us to tease out the bias from the model error. The goal is to apply this to multiple research topics.