Center for Advanced Knowledge Enablement (CAKE)

Florida International University

Dubna International University

Florida Atlantic University

Last Reviewed: 04/07/2017

The Center's mission is to conduct industry-relevant studies in the representation, management, storage, analysis, search and social aspects of large and complex data sets, with particular applications in geospatial location-based data and healthcare.

Center Mission and Rationale

The explosive growth in the number and resolution of sensors and scientific instruments, in enterprise and scientific databases, and in Internet traffic and activity has engendered unprecedented volumes of data. The frameworks, metadata structures, algorithms, data sets, search and data mining solutions needed to manage the volumes of data in use today are largely ad hoc. The research carried out at universities in this area, and more broadly in information technology, underpins advances in virtually every other area of science and technology and provides new capacity for economic productivity.

The Center studies the representation, management, storage, analysis, search and social aspects of large and complex data. The research is applicable to biomedical, defense, disaster mitigation, homeland security, environmental concerns, real estate, health records management, finance, and technology service companies. The faculty carry out research in performance studies, benchmark evaluations, and the application of novel algorithms, routines, data models, network analyses and software tools to large-scale data sets.

Research Program

3D Data Environments using Multi-Touch Screens

With the increased availability and use of multi-touch devices such as smart phones, tablet computers and multi-touch desktop monitors, research and development into more natural, efficient and effective methods of using these devices has garnered increased interest. This is particularly true of the challenges encountered in navigating and manipulating 3D worlds through a 2D touch-screen interface. In this project, we are working toward visualization of geospatial data on 3D displays that do not require peripheral accessories or equipment. As such, the first implementation is focused on displays that create a three-dimensional sensation for the user without the need to wear polarized glasses or goggles. The team is developing a methodology and algorithms for interfacing between multi-dimensional data streams and the pixel-rendering protocols of 3D displays.

Our work has focused on the development of multi-touch gestures that can be used to command translations and rotations along three axes within a 3D environment. Our solution has been implemented using a 3M M2256PW 22” multi-touch monitor as the interaction device. In our definition of multi-touch gesture sets, we have established independent gestures for (1) each type of translation and (2) each type of rotation, so that we could study how users prefer to combine or concatenate basic gestures. In our most recent work, the team has designed a demonstration of the interface between our Geospatial Data Engine TerraFly and a 3D display. FIU I/UCRC also developed a projection method that takes a 2D distorted image captured with an omni-directional (360° fisheye) aerial photographic sensor and transforms it into a distortion-corrected 3D panoramic image. With many new and emerging devices and displays, this work is relevant to a number of our industry partners.
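The idea of independent gestures, one per basic transform, can be sketched as a gesture table that maps each recognized gesture to an axis rotation. The gesture names and the 90° rotation amounts below are hypothetical placeholders, not the Center's actual gesture vocabulary:

```python
import math

def rotation_matrix(axis, theta):
    """Return a 3x3 rotation matrix for a rotation of theta radians
    about the x, y, or z axis (right-handed coordinate system)."""
    c, s = math.cos(theta), math.sin(theta)
    if axis == "x":
        return [[1, 0, 0], [0, c, -s], [0, s, c]]
    if axis == "y":
        return [[c, 0, s], [0, 1, 0], [-s, 0, c]]
    if axis == "z":
        return [[c, -s, 0], [s, c, 0], [0, 0, 1]]
    raise ValueError(axis)

def apply(matrix, point):
    """Apply a 3x3 matrix to a 3-vector."""
    return [sum(m * p for m, p in zip(row, point)) for row in matrix]

# Hypothetical gesture table: each independent gesture commands one
# basic transform, so gestures can be concatenated in any order.
GESTURES = {
    "two_finger_twist": ("z", math.pi / 2),
    "three_finger_swipe_up": ("x", math.pi / 2),
}

def handle(gesture, point):
    axis, theta = GESTURES[gesture]
    return apply(rotation_matrix(axis, theta), point)
```

Because each gesture maps to a single matrix, concatenated gestures compose by matrix multiplication, which is what makes the "combine basic gestures" study tractable.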

Automated Querying and Analysis of Geolocated Scanned Documents

The demand for paperless management of business documents has led to the establishment of companies that focus on providing B2B solutions for efficient management of all types of documents. Examples include software solutions that compress PDF documents, OCR PDF files, and automate the processing of EOBs, invoices, and handwritten documents. An emerging challenge in this area is the ability to access a wide range of geo-located scanned documents with complete metadata.

A virtual counseling system for brief alcohol health interventions

Advances in the potential use of interpersonal technologies have garnered increased attention in the health fields in recent years. This is particularly true of areas requiring healthy lifestyle changes that involve relatively simple, easily targeted behavioral changes. Adaptations of motivational interviewing (AMIs) have mushroomed in the past few years, with the purpose of meeting the need for motivational interventions within medical and health care settings. In our prior work, we focused on infusing advanced technologies into AMIs so that they can interact with users and deliver personalized, tailored behavior change interventions in the MI communication style via multimodal verbal and non-verbal channels. We developed the On-demand Health Counselor System Architecture, an MI-based embodied conversational agent (ECA) intervention system aimed at helping people with excessive alcohol consumption become aware of, and potentially motivated to change, related unhealthy patterns of behavior.

In our most recent work, we built our dialog system on the Markov decision process framework and optimized it using reinforcement learning (RL) algorithms with data collected from real user interactions. The system learns optimal dialog strategies for initiative selection and for the type of confirmations it uses during the interaction. We compared the unoptimized system with the optimized system in terms of objective measures (e.g., task completion) and subjective measures (e.g., ease of use, future intention to use the system). Our evaluation showed that dialog managers optimized with RL have the potential to reach optimal behavior, given enough training data. This advances the healthcare domain with the first system to use speech as an input medium with an RL-based approach.
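A minimal sketch of the optimization idea: tabular Q-learning over a toy dialog MDP, where states are dialog stages, actions are initiative choices, and a simulated user rewards one strategy more than the other. The state names, actions, and reward values are invented for illustration and are far simpler than the Center's actual dialog model:

```python
import random

random.seed(0)

# Toy dialog MDP (illustrative only): the simulated user responds
# better to "mixed" initiative, so the learner should discover it.
STATES = ["greet", "assess", "close"]
ACTIONS = ["system", "mixed"]

def step(state, action):
    """Simulated user turn: return (next_state, reward)."""
    nxt = STATES[min(STATES.index(state) + 1, 2)]
    reward = 1.0 if action == "mixed" else 0.2
    if nxt == "close":
        reward += 5.0  # task-completion bonus
    return nxt, reward

Q = {(s, a): 0.0 for s in STATES for a in ACTIONS}
alpha, gamma, eps = 0.5, 0.9, 0.1     # learning rate, discount, exploration

for _ in range(500):                  # training episodes
    state = "greet"
    while state != "close":
        action = (random.choice(ACTIONS) if random.random() < eps
                  else max(ACTIONS, key=lambda a: Q[(state, a)]))
        nxt, r = step(state, action)
        best_next = max(Q[(nxt, a)] for a in ACTIONS)
        Q[(state, action)] += alpha * (r + gamma * best_next - Q[(state, action)])
        state = nxt

policy = {s: max(ACTIONS, key=lambda a: Q[(s, a)]) for s in STATES}
```

Given enough episodes, the greedy policy converges to the strategy the simulated user prefers, mirroring the paper's observation that RL-optimized dialog managers can reach optimal behavior with sufficient training data.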


CARMEL-TerraFly

The CARMEL-TerraFly System enables monitoring via airborne cameras. The project integrates cutting-edge Context Aware Rich Media Extensible Middleware (CARMEL) technology from IBM Research – Haifa with the TerraFly Geospatial System at the Center for Advanced Knowledge Enablement (CAKE). This integrated system offers innovative situational awareness technology while helping expand the Center’s international influence and connections. Built on IBM Haifa’s Geographic Information Systems (GIS) and streaming technology research, CARMEL is a geographically anchored, video-on-demand streaming infrastructure that provides: 1) scalable, end-to-end, low-delay and resilient streaming technologies; 2) on-demand bandwidth adaptation (transcoding); 3) highly accurate geographical searches; 4) real-time, geo-located notification; and 5) high-performance, service-oriented-architecture-enabled technologies. This work was published in the NSF Compendium of Industry-Nominated Technology Breakthroughs of NSF I/UCRCs.

Cloud Computing Based Patient Centric Global-Medical-Information-System

The goal of this project is to develop a cloud-computing-based Patient-Centric Global Medical Information System (PC-GMIS) framework that will allow authorized users to securely access patient records from various Care Delivery Organizations (CDOs), such as hospitals, urgent care centers, doctors, laboratories, and imaging centers, from any location. The system will seamlessly integrate all patient records, including images such as CT scans and MRIs, which can easily be accessed from any location and reviewed by any authorized user. The storage and transmission of medical records will be conducted in a secure environment with a very high standard of data integrity, protecting patient privacy and complying with all Health Insurance Portability and Accountability Act (HIPAA) regulations. The sharing of medical records, specifically radiology imaging databases, with CDOs will drastically reduce medical redundancies, radiation exposure, and costs to patients. The project will empower patients with automated ownership of their secure personal medical information. The use of cloud computing in this application would allow CDOs to address the challenge of sharing medical data that is overly complex and highly expensive to address with traditional technologies.

CORBI Collaboration with FAU & UMBC

This project is a Multi-Center Distributed Cloud Computing Collaborative Feasibility Study to Provide Massive 3-D Visualization Services for Climate Data on Demand (with the University of Maryland Baltimore County's CHMPR I/UCRC).  We are integrating UMBC datasets with TerraFly.

Toward a system of spatio-temporal visualization, the FIU site has developed and deployed a multi-variate, multi-dimensional scientific data animator. The animator/visualizer accepts the name of a scientific observation dataset as a parameter and presents a playable spatio-temporal animation of the spatial data over time. The user can stop the animator and manually browse data projections in space and time.

Economic Impact of Disasters Analysis

The main objective is to quantify the effect of a disaster on the value of any individual property in the proximity of the disaster zone. Our initial set of experiments involved studying the effects of Hurricanes Andrew (1992) and Wilma (2005) in Miami-Dade County, where we successfully identified geographical regions along the hurricanes' paths that were negatively affected with respect to property prices. We are using public records of property sales and geographical databases of properties to incorporate additional characteristics, e.g., property type and footage. The method performs geospatial and temporal analysis of massive property records with the intention of finding latent property-value change patterns among similar properties. This research line has been leveraged into a RAPID grant to study the effects of the Deepwater Horizon disaster.

This work has been generalized as an online tool for Geospatial Trends Analysis.

Effective Information Discovery on Electronic Medical Records

This project is exploring the generation of algorithms and tools to effectively search EMRs stored in the most recent XML-based standard format: the Health Level Seven (HL7) Clinical Document Architecture (CDA). The same techniques can be carried over to other hierarchical standard formats as well. The successful realization of information discovery on EMRs is expected to have a great impact on the quality of healthcare.
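Because CDA documents are XML, keyword search over their hierarchical structure can be sketched with the standard library's `xml.etree.ElementTree`. The fragment below is a drastically simplified stand-in for a real CDA document (the actual schema is far richer, with namespaces and coded entries):

```python
import xml.etree.ElementTree as ET

# Simplified stand-in for an HL7 CDA document: each <section>
# pairs a title with narrative text.
CDA = """
<ClinicalDocument>
  <component><section>
    <title>Medications</title>
    <text>Metformin 500 mg twice daily.</text>
  </section></component>
  <component><section>
    <title>Allergies</title>
    <text>No known drug allergies.</text>
  </section></component>
</ClinicalDocument>
"""

def find_sections(doc_xml, keyword):
    """Return titles of sections whose narrative mentions the keyword."""
    root = ET.fromstring(doc_xml)
    hits = []
    for section in root.iter("section"):
        text = section.findtext("text", default="")
        if keyword.lower() in text.lower():
            hits.append(section.findtext("title"))
    return hits
```

Real information discovery on EMRs must also rank results and respect the CDA's coded semantics; this sketch shows only the structural traversal that such tools build on.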

Exploration of using MapReduce for geospatial data

The amount of information in spatial databases is growing as more data is made available. Spatial databases mainly store two types of data: raster data (satellite/aerial digital images) and vector data (points, lines, polygons). The complexity and nature of spatial databases make them ideal candidates for parallel processing. MapReduce is an emerging massively parallel computing model proposed by Google. We have applied the MapReduce model to address five important spatial problems: (a) bulk construction of R-Trees (using vector data), (b) aerial image quality computation (using raster data), (c) improving geo-textual searches in large-scale databases, (d) non-negative matrix factorization, and (e) the all-pairs shortest-path problem in real road networks. Our results confirm the excellent scalability of the MapReduce framework in processing parallelizable problems.
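The MapReduce pattern for vector data can be sketched in a few lines: the map phase keys each point by the grid cell containing it, the shuffle groups values by key, and the reduce phase aggregates each cell. This single-process simulation is only an illustration of the model, not the Center's distributed implementation (the aggregation here is a simple count; R-tree bulk-loading would instead build a subtree per partition):

```python
from collections import defaultdict

def mapper(point, cell_size=1.0):
    """Map phase: emit (grid_cell, point), keyed by the containing cell."""
    x, y = point
    cell = (int(x // cell_size), int(y // cell_size))
    return cell, point

def reducer(cell, points):
    """Reduce phase: aggregate one cell's points (here, just count them)."""
    return cell, len(points)

def map_reduce(points):
    """Single-process simulation of the shuffle between map and reduce."""
    groups = defaultdict(list)
    for p in points:
        key, value = mapper(p)
        groups[key].append(value)
    return dict(reducer(k, v) for k, v in groups.items())

counts = map_reduce([(0.2, 0.7), (0.9, 0.1), (1.5, 0.5), (2.2, 2.8)])
```

The appeal for spatial workloads is that each cell's reduce runs independently, so the work spreads across machines with no coordination beyond the shuffle.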

Flexible Data Schema and Ingest Engine for Domain Specific Applications

As geospatial and location-based data become increasingly available to the public, companies increasingly need geospatial querying systems that can easily be integrated with their current data systems. This is particularly critical for companies whose approaches to innovation are heavily data-driven and reliant on Big Data. This project involves the development of a data schema and data ingest mechanism that will enable companies to index and perform geospatial queries relevant to their specific domain. In this project, we are working on approaches that will help companies identify and ingest the relevant location datasets on top of their current systems. A company-branded TerraFly front end would then be deployed to enable straightforward visualization of the data and analysis results, with drill-down and slice/dice.
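A minimal version of the "flexible schema plus ingest" idea, sketched with the standard library's `sqlite3`: every ingested record keeps indexed coordinates for spatial filtering plus a free-form payload for domain-specific fields. The table layout and record contents are hypothetical, not the project's actual schema:

```python
import sqlite3

# Hypothetical flexible schema: lat/lon for indexing, payload for
# whatever domain fields the company needs.
conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE records (
    id INTEGER PRIMARY KEY, lat REAL, lon REAL, payload TEXT)""")
conn.execute("CREATE INDEX idx_lat_lon ON records(lat, lon)")

def ingest(rows):
    """Bulk-load (lat, lon, payload) rows into the indexed table."""
    conn.executemany(
        "INSERT INTO records(lat, lon, payload) VALUES (?, ?, ?)", rows)

def bbox_query(south, west, north, east):
    """Bounding-box query over the indexed coordinates."""
    cur = conn.execute(
        "SELECT payload FROM records WHERE lat BETWEEN ? AND ? "
        "AND lon BETWEEN ? AND ?", (south, north, west, east))
    return [row[0] for row in cur]

ingest([(25.76, -80.19, "store #1"),
        (25.79, -80.13, "store #2"),
        (28.54, -81.38, "store #3")])
miami = bbox_query(25.0, -81.0, 26.0, -80.0)
```

A production system would use a true spatial index (R-tree or geohash) rather than a composite B-tree, but the ingest-then-query shape is the same.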

FRS Visualization of Geospatial Time Series and Dynamic Objects

Raster data time series: we have deployed the ability to view, animate, and morph imagery from different times. Vector data time series: we have deployed nationwide time functions of water level and flow as measured at USGS stream gauges.

Geolocating Non-Geotagged Social Media Information

The use of social media data by businesses and government agencies has increased dramatically in recent years. This type of data has been found to provide key information that is at times richer in detail and available in a more timely manner than other sources. For example, there have been several cases where a disaster-related incident, such as an earthquake, was reported via social media within seconds of its occurrence. In the majority of cases, however, a social media post, photograph or video is not geo-tagged, which makes it significantly more difficult to determine the source location of the data or image. We have been developing algorithms that find the approximate geo-location of social media data that does not contain location coordinates in its metadata. Specifically, non-geo-located social media data is mined and analyzed using automated text analysis tools such as named entity recognizers, as well as associated keyword queries. Identifying information is extracted from textual content such as tags, image descriptions and reader comments, and analytical techniques are employed to improve precision and reduce the error of estimated locations. For example, an image related to the Deepwater Horizon oil spill would likely be tagged with “oil spill” and have a time/date stamp corresponding to the period of the disaster. This would help identify the image as being related to the oil spill, but this alone is grossly imprecise, as the affected area of the spill is expansive. In this case, the algorithm searches for other keywords and combinations of keywords that help determine a more precise location. If the image is also tagged with “beach” and “Pensacola”, then we can determine that it originated from a beach in Pensacola, FL. As indicated in the Products section of this document, this work has resulted in the submission of a patent application for our algorithm for geolocating social media.
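The Pensacola example can be sketched as a gazetteer lookup: match place names found in the post's tags against a table of known locations and score each candidate. This is a deliberately naive stand-in for the patented algorithm; the three-entry gazetteer and its coordinates are illustrative only:

```python
# Toy gazetteer (hypothetical): place-name evidence mapped to coordinates.
GAZETTEER = {
    "pensacola": (30.42, -87.22),
    "miami": (25.76, -80.19),
    "key west": (24.56, -81.78),
}

def estimate_location(tags):
    """Score each gazetteer entry by whether the tags mention it and
    return the best match, or None when no place name is found."""
    scores = {}
    text = " ".join(t.lower() for t in tags)
    for place, coords in GAZETTEER.items():
        if place in text:
            scores[place] = scores.get(place, 0) + 1
    if not scores:
        return None
    best = max(scores, key=scores.get)
    return best, GAZETTEER[best]
```

The real pipeline layers named entity recognition, keyword combinations ("beach" + "Pensacola"), and error-reduction analytics on top of this basic evidence-scoring idea.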

Geospatial Data Engine interface to 3D and Embossing Printers

In recent years, the use of 3D and emboss printing technologies has dramatically increased in both volume and scope. These technologies are now being used for a wide range of applications, including providing innovative tools for active learning in the education field and the visualization of geographic data. Support for these types of printing beyond traditional CAD applications, however, is severely limited. In this project, researchers are working towards the development of an interface between the FIU Geospatial Data Engine TerraFly and embossing printers. This will expand both educational opportunities and business capabilities that are not currently available. The researchers expect this to be an exciting and innovative capability that will attract scientific, educational, and governmental interest in TerraFly.

Geospatial System to Disseminate Ultra-High-Resolution Aerial Imagery

The TerraFly Geospatial Data and Analytics system, developed over the last 11 years via NSF MRI, MII, CREST, IIS, and other awards, disseminates geospatial data to the public, researchers, and industry-specific applications. In recent years, there has been a dramatic increase in the availability of new sensor systems and types of location-based and geospatial data. Our researchers are transforming the TerraFly system to enable the mashup and dissemination of ultra-high-resolution (0.25 inch/pixel) aerial imagery and environmental data streams collected by these new sensor systems.

Stakeholders, including emergency managers needing immediate very-high-resolution imagery pre- and post-disaster, will access these new types of data via TerraFly, mashed up with thousands of other TerraFly datasets.



The Dubna site of I/UCRC-CAKE primarily engages in applied research in Geographic Information Systems, especially with respect to its flagship project GISIntegro. The Dubna effort is collaborative with, and complementary to, the core expertise of the FIU site of CAKE. By combining the strengths of the two groups, we are able to undertake major governmental and industrial applied research projects in GIS. These studies focus on developing methodologies and original algorithms for the integrated analysis of data about the processes and objects under study, and on developing intelligent user interfaces for each stage of the process: from data acquisition, georeferencing, and data integrity and quality assurance, through multi-level analysis, to pre-print preparation of published hard-copy maps and decision-support systems for mineral exploitation and environmental protection management.

Our work together has resulted in a number of cutting-edge findings and capabilities, and provided additional opportunities. Some of these are summarized below.

Development of pattern recognition algorithms: Holotype. We have developed algorithms to compute similarity-measure matrices between objects described by heterogeneous properties. We are currently fine-tuning these algorithms for higher precision and more complex data sets, and continuing to develop algorithms that solve a greater variety of recognition problems, including problems where only incomplete information is available. The work in this area is particularly dynamic, as the increased use of technologies in our everyday lives leads to new potential sources of data.
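One standard way to build a similarity matrix over objects with heterogeneous (mixed numeric and categorical) properties is Gower-style similarity, sketched below as a stand-in for the Holotype algorithms. The rock objects and their two properties are invented for illustration:

```python
def gower_similarity(a, b, ranges):
    """Similarity in [0, 1] between two objects described by a mix of
    numeric and categorical properties. `ranges` gives the observed
    value range of each numeric property (None marks a categorical)."""
    total = 0.0
    for key, rng in ranges.items():
        if rng is None:                       # categorical: exact match
            total += 1.0 if a[key] == b[key] else 0.0
        else:                                 # numeric: range-scaled distance
            total += 1.0 - abs(a[key] - b[key]) / rng
    return total / len(ranges)

def similarity_matrix(objects, ranges):
    """Pairwise similarity matrix over a list of objects."""
    return [[gower_similarity(x, y, ranges) for y in objects] for x in objects]

# Hypothetical geological objects with one numeric and one categorical property.
rocks = [
    {"density": 2.6, "magnetic": "no"},
    {"density": 3.0, "magnetic": "no"},
    {"density": 5.2, "magnetic": "yes"},
]
M = similarity_matrix(rocks, {"density": 2.6, "magnetic": None})
```

Handling incomplete information, as the Center's algorithms do, would extend this by averaging only over the properties both objects actually have.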

Multi-functional geo-information server. We have been developing algorithms to integrate remote geo-informational resources and to support spatial modeling of situations. Spatial modeling allows estimation of the current state of a situation and a prognosis of its change, based on a holistic evaluation of environmental properties and impacts.

Ecological modeling methodology. We are developing a methodology for modeling ecological data and the structure of an ecological information space, and for determining the natural and anthropogenic factors affecting the ecological situation of a subject region.

Health Information Technology (HIT) and Data Mining for Personalized Medicine

Currently, treatment decisions for diseases such as cancer are based primarily on clinical parameters, with little use of genomic data. Yet genomic data could provide insight into individual health status and disease risk. In our recent work, we have used various data mining and supervised machine learning techniques to generate a prediction model capable of distinguishing between cases and controls for initial screening of breast cancer. We statistically analyzed three different methods: a naive SNP selection approach, a feature selection approach, and a domain knowledge integration approach. We demonstrated the benefit of adding domain knowledge of single nucleotide polymorphisms (SNPs) to machine learning procedures. Our observations revealed that the model generated using both domain knowledge and feature selection performed slightly better than the naive classification approach.
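The feature selection step can be illustrated with the simplest univariate filter: rank SNPs by the difference in mean minor-allele count between cases and controls. The tiny genotype matrix below is fabricated, and this filter is only a sketch of the idea, not the study's actual statistical procedure:

```python
# Hypothetical genotype matrix: rows are subjects, columns are SNPs,
# entries count copies of the minor allele (0, 1, or 2).
cases = [[2, 0, 1], [2, 1, 0], [1, 0, 1]]
controls = [[0, 0, 1], [1, 1, 1], [0, 0, 0]]

def column_means(rows):
    """Per-SNP mean minor-allele count across subjects."""
    n = len(rows)
    return [sum(r[j] for r in rows) / n for j in range(len(rows[0]))]

def rank_snps(cases, controls):
    """Rank SNP indices by the absolute difference in mean minor-allele
    count between cases and controls (a simple univariate filter)."""
    mc, mn = column_means(cases), column_means(controls)
    scores = [abs(c - n) for c, n in zip(mc, mn)]
    return sorted(range(len(scores)), key=lambda j: -scores[j]), scores

ranking, scores = rank_snps(cases, controls)
```

Domain knowledge integration would then boost or prune candidate SNPs using known biology (e.g., pathway membership) before the classifier is trained, which is the combination the study found performed best.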

Intelligent Help Desk Assistant

This project aims to (i) develop technology for understanding and classifying email messages and electronic reports from clients/customers by capturing their semantic content; (ii) develop automated techniques for discovering similarities in reported problem descriptions, since much unnecessary work can be avoided by first determining whether a problem is already known; and (iii) develop cost-effective approaches for diagnosing customer problems in help desk applications.
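Goal (ii), matching a new report against known problems, can be sketched with cosine similarity over token-count vectors. The two known problems below are invented examples, and real help-desk matching would add TF-IDF weighting, stemming, and semantic features on top of this baseline:

```python
import math
from collections import Counter

def vectorize(text):
    """Bag-of-words token counts for a short problem description."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two token-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

# Hypothetical knowledge base of already-diagnosed problems.
KNOWN_PROBLEMS = [
    "printer driver fails to install on windows",
    "cannot connect to vpn from home network",
]

def most_similar(report):
    """Return the known problem most similar to a new report."""
    v = vectorize(report)
    return max(KNOWN_PROBLEMS, key=lambda p: cosine(v, vectorize(p)))

match = most_similar("vpn connection fails from my home office")
```

If the best match scores above a threshold, the ticket can be routed to the existing resolution instead of triggering a fresh diagnosis.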

LabQuest Sensor Data Acquisition

LabQuest is a standalone science education device that collects basic sensor data and visualizes it using built-in, real-time graphing and live sensor data display capabilities. It has a large, high-resolution touch screen and wireless connectivity that encourage collaboration and personalized learning. In our work, we are combining LabQuest device technology with other current FIU research projects and partners in an attempt to develop alternate uses for the device. Our initial focus is on environmental studies and sensor data acquisition in field research. By expanding the versatility of the LabQuest platform, the project would increase the platform's marketability and provide new approaches for collecting data directly from the environment. Tertiary benefits include an innovative approach to data collection, expanded market streams for the devices used, the promotion of environmental awareness and green technologies, and improved research and development.

Motion Estimation on Parallel Platforms

The focus of this project was the parallelization of motion estimation in video using the NVIDIA CUDA toolkit. The CUDA toolkit makes developing parallel programs on NVIDIA GPUs easier and enables access to the large-scale parallelism offered by the GPU. Video processing applications are especially suited to the NVIDIA platform because of the data-parallel nature of the problem. We evaluated the tools available in CUDA and applied them to the motion estimation problem. Motion estimation is widely studied and is the computationally expensive part of video coding. We developed and implemented a parallel motion estimation algorithm using CUDA. The results show a speedup of over 100 compared to a plain C implementation.
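The per-block work that CUDA parallelizes is classic block matching: for each block of the current frame, search a window in the reference frame for the candidate with the lowest sum of absolute differences (SAD). This serial Python sketch (toy 4x3 frames, 2x2 blocks) shows the computation; on the GPU, each block's search runs in its own thread since blocks are independent:

```python
def sad(frame, x, y, ref, rx, ry, bs):
    """Sum of absolute differences between a block in the current frame
    and a candidate block in the reference frame."""
    return sum(abs(frame[y + j][x + i] - ref[ry + j][rx + i])
               for j in range(bs) for i in range(bs))

def best_motion_vector(frame, ref, x, y, bs=2, search=2):
    """Full search over a small window; each block is independent,
    which is what makes the problem map well onto GPU threads."""
    best, best_mv = float("inf"), (0, 0)
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            rx, ry = x + dx, y + dy
            if 0 <= rx <= len(ref[0]) - bs and 0 <= ry <= len(ref) - bs:
                cost = sad(frame, x, y, ref, rx, ry, bs)
                if cost < best:
                    best, best_mv = cost, (dx, dy)
    return best_mv

# The bright 2x2 block at (0, 0) in `ref` moved right by 1 pixel in `frame`.
ref =   [[9, 9, 0, 0],
         [9, 9, 0, 0],
         [0, 0, 0, 0]]
frame = [[0, 9, 9, 0],
         [0, 9, 9, 0],
         [0, 0, 0, 0]]
mv = best_motion_vector(frame, ref, 1, 0)
```

The returned vector points back to where the block came from in the reference frame, so a rightward shift of one pixel yields (-1, 0).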

Private Location Centric Profiles for Online Social Networks

As the popularity of social networks has dramatically increased, so has the availability of a wider range of tools and technologies to support related activities. A recent addition to social networks are geosocial networks (GSNs), such as Yelp, Foursquare and Facebook Places, which provide detailed information on personal location through "check-ins" performed when users visit certain venues. This process, however, involves the collection of personal information from users and can thus expose them to significant risk. To mitigate these risks, we have been developing ProfilR, a framework that allows the construction of location centric profiles (LCPs) based on the profiles of present users, while ensuring the privacy of users and the correctness of the profiles. ProfilR relies on Benaloh's homomorphic cryptosystem and zero-knowledge proofs to enable oblivious and provably correct LCP computations. In our current work, we are investigating the viability and development of a venue-oriented ProfilR, involving the deployment of an inexpensive device at a business venue along with the implementation of snapshot LCPs that are built by user devices from the profiles of co-located users and communicated over ad hoc wireless connections. Snapshot LCPs are not bound to venues; instead, user devices can compute LCPs of neighbors at any location of interest.
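The privacy mechanism rests on additive homomorphism: ciphertexts can be combined so the venue learns only an aggregate, never any individual contribution. As an illustration, here is textbook Paillier (a related additively homomorphic scheme standing in for Benaloh's, which ProfilR actually uses) with deliberately tiny demo primes; real deployments use moduli thousands of bits long:

```python
import math
import random

# Textbook Paillier with toy parameters (illustration only).
p, q = 11, 13
n, n2 = p * q, (p * q) ** 2
g = n + 1
lam = math.lcm(p - 1, q - 1)

def L(u):
    return (u - 1) // n

mu = pow(L(pow(g, lam, n2)), -1, n)   # decryption constant

def encrypt(m):
    """Randomized encryption of m (0 <= m < n)."""
    r = random.randrange(1, n)
    while math.gcd(r, n) != 1:
        r = random.randrange(1, n)
    return (pow(g, m, n2) * pow(r, n, n2)) % n2

def decrypt(c):
    return (L(pow(c, lam, n2)) * mu) % n

# Homomorphic property: multiplying ciphertexts adds the plaintexts,
# so a venue can total user contributions without seeing any of them.
ages = [21, 34, 27]
total = decrypt(math.prod(encrypt(a) for a in ages) % n2)
```

In ProfilR, zero-knowledge proofs additionally convince the aggregator that each encrypted contribution is well-formed, which this sketch omits.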

Processing top-k Spatial Boolean Queries

Geographic information on the Internet is growing rapidly as more systems (software services, mobile phones, digital cameras) allow users to geotag resources such as Web pages, pictures, and videos by associating geographical metadata to textual descriptions, or via automated geotagging. In such large geographic collections, a key challenge is to efficiently support spatial queries with constraints on textual contents. We have developed a novel method to efficiently process top-k spatial queries with conjunctive Boolean constraints on keywords combined with logical connectives.
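A naive baseline for this query class, useful for seeing what the efficient method must beat, is to apply the conjunctive keyword filter first and then rank the survivors by distance with a bounded heap. The four-document collection is fabricated; the Center's actual contribution is an index that avoids scanning every document like this does:

```python
import heapq

# Toy geotagged collection (hypothetical): (x, y, keywords)
DOCS = [
    (1.0, 1.0, {"pizza", "delivery"}),
    (2.0, 2.0, {"pizza", "vegan"}),
    (9.0, 9.0, {"pizza", "delivery"}),
    (1.5, 0.5, {"sushi"}),
]

def top_k(query_point, required, k):
    """Keep only documents containing every required keyword (the
    conjunctive Boolean filter), then rank survivors by distance."""
    qx, qy = query_point
    matches = (d for d in DOCS if required <= d[2])
    return heapq.nsmallest(k, matches,
                           key=lambda d: (d[0] - qx) ** 2 + (d[1] - qy) ** 2)

nearest = top_k((0.0, 0.0), {"pizza", "delivery"}, k=2)
```

`heapq.nsmallest` keeps memory bounded at k entries regardless of how many documents match, which is the right shape even for the indexed version.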

Querying of Complex Multimodal Data

The visual world provides a rich and complex source of data that is often relevant and needed by many businesses, yet much of this data is inaccessible. With the increased proliferation of images, videos, and even 3D models, finding effective and accurate methods of extracting and organizing this rich source of data is critical. In this project, research is being performed on effective access to complex multimodal data. Our current work is conducted over a data set collected from the MIMIC II (Multiparameter Intelligent Monitoring in Intensive Care) database, with a focus on feature representation, feature selection and correlation discovery.

Semantic Wrapper

The Semantic Wrapper provides its users with easier access to a legacy relational database, in parallel with continued access via existing legacy application software.

Sensor Fusion and Signal Processing for Advanced Inertial Measurement Units (IMUs)

The use of new and innovative input devices for computing technologies raises many challenges for developers. One such challenge is the development of instrumented gloves capable of digitizing the movements of a human hand in real time, for applications such as computer input, virtual reality (VR), rehabilitation, and skills training. Recently, integrated circuits (ICs) including 3-axis accelerometers, 3-axis rate gyroscopes and 3-axis magnetometers have become available. Some research and development companies are pursuing the development of VR gloves instrumented with these ICs, also called Inertial Measurement Units (IMUs). A major challenge in developing this type of VR glove is devising fast, accurate signal processing approaches that use the full information available from the IMUs to achieve a proper representation of the orientation changes of the thumb around the palm of the hand (the orientation of the thumb had been particularly difficult to monitor with sufficient accuracy in previous VR glove models). To that end, our researchers have developed a sensor fusion algorithm that combines information obtained from the accelerometers, gyroscopes and magnetometers within the IMU chip (to be attached to the thumb of the glove), producing orientation estimates that are simultaneously less noisy and less affected by drift error than estimates generated from accelerometer data or gyroscope data alone. The researchers have tested the algorithms off-line, using data previously recorded from the IMU chip while executing well-specified motions, and the results have been encouraging. The next step is to study the best way to combine the estimation of position and orientation from two IMU chips.
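The trade-off the fusion exploits, gyroscopes are smooth but drift while accelerometers are noisy but drift-free, is captured in its simplest form by a complementary filter. This one-axis sketch is a stand-in for the Center's algorithm (which also fuses magnetometer data); the bias value, weights, and scenario are invented for illustration:

```python
def complementary_filter(angle, gyro_rate, accel_angle, dt, alpha=0.98):
    """Fuse a gyroscope rate (smooth but drifting) with an accelerometer
    angle estimate (noisy but drift-free). `alpha` weights the integrated
    gyro path against the accelerometer correction."""
    return alpha * (angle + gyro_rate * dt) + (1 - alpha) * accel_angle

# Simulated stream: the thumb is actually held still at 10 degrees, but
# the gyro reports a constant 0.5 deg/s bias (a typical drift source).
angle, dt = 10.0, 0.01
for _ in range(10_000):                     # 100 s of samples at 100 Hz
    angle = complementary_filter(angle, gyro_rate=0.5,
                                 accel_angle=10.0, dt=dt)
```

Pure gyro integration would have drifted to 60 degrees over these 100 seconds; the filter settles near the true angle, with a small steady-state offset set by the bias and `alpha`.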

Sign Language Animation Generator and Interpreter

Developed as assistive technology for deaf individuals and their families, the AcceleGlove™ is a nylon glove that measures motion and orientation of the hand, wrist, and fingers using six integrated 3-axis MEMS accelerometer sensors. One accelerometer is on the back of the hand, with the remaining five located on the back of each finger. The algorithms that determine the correct gesture, i.e. letter, word, or command, process the recorded hand trajectory and finger positions. The microcontroller converts the findings to ASCII characters, which can be sent to PCs, tablets, smart phones, or smart televisions.

TerraFly for Disaster Mitigation

The TerraFly Group, with the National Science Foundation's Industry/University Cooperative Research Center for Advanced Knowledge Enablement at Florida International University, specializes in the aggregation, amelioration, querying, and visualization of geospatial data, with applications in disaster mitigation. Users visualize and query aerial imagery and data layers, virtually "flying" over imagery via a web browser without installing any software or plug-ins. Tools include user-friendly geospatial querying, data drill-down, interfaces with real-time data suppliers, demographic analysis, annotation, route dissemination via autopilots, customizable applications, production of aerial atlases, and an application programming interface (API) for web sites.

The TerraFly project has been featured on TV news programs (including FOX TV News) and in the worldwide press, with coverage by the New York Times, USA Today, NPR, and the journals Science and Nature.

The 40TB TerraFly data collection includes, among others, 1-meter aerial photography of almost the entire United States and 3-inch to 1-foot full-color recent imagery of major urban areas. The TerraFly vector collection includes 400 million geolocated objects, 50 billion data fields, 40 million polylines, and 120 million polygons, including: all US and Canada roads, US Census demographic and socioeconomic datasets, 110 million parcels with property lines and ownership data, 15 million business records with company statistics and management roles and contacts, 2 million physicians with expertise detail, various public place databases (including the USGS GNIS and NGA GNS), Wikipedia, extensive global environmental data (including daily feeds from NASA and NOAA satellites and USGS water gauges), and hundreds of other datasets.


XML Delivery of TerraFly Raster and Vector Data

This project would enable users to directly access the TerraFly database via simple and complex queries and to receive data in a format that could be used in their own applications.

XML Querying of Geospatial Data

We have developed a real-time interface between third-party applications and TerraFly databases, allowing applications to generate intelligent geospatial queries, receive results in XML, and process the results before delivering them to the application's users. XML is a standardized format for data exchange between programs communicating via the Internet.


Florida International University

Florida International University
Modesto A Maidique Campus, SCIS, ECS 243

Miami, Florida, 33199

United States



Florida Atlantic University

Department of Computer & Electrical Engineering and Computer Science
777 Glades Road

Boca Raton, Florida, 33431

United States



Dubna International University

Institute of System Analysis and Management
Universitetskaya str., 19

Dubna, Moscow Region, 141980

Russia


+7 (49621) 22 683

+7 (49621) 28 548