Researchers Develop New Methods for Multiple Simultaneous Hierarchical Labeling

Greater search and retrieval efficiency, power for medical records, other big data

January 27, 2015—CHICAGO—To address the growing volume of data and reduce the corresponding search for the "needle in a haystack" in large data sets, researchers at the Illinois Institute of Technology in Chicago have secured three patents (US 8,209,358 B2, US 8,626,792, US 7,720,869 B2) for methods for multiple simultaneous hierarchical labeling.

Searching for needed documents and information is increasingly eating into workplace productivity.

Employees take up to eight tries to retrieve the right document or piece of information, according to SearchYourCloud in 2013. Information sits in silos and content is created by different users. Information might be tagged, but simple tagging does not provide multiple cross-sectional views of the data.

For large-scale data handling applications such as medical records, where tracking symptoms or the spread of a disease may be paramount, multiple simultaneous hierarchical labeling allows for rapid, easily repeatable, intuitive and accurate search and browsing of multiple streams of data at the same time. Users can tag each data item with multiple hierarchical labels. Previously, only simple tags were permitted. A subsequent search of categorized medical data, for example, could simultaneously pull up data categorized into multiple taxonomies using genome type, geographical location, age, and profession.
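The idea of tagging one item with several hierarchical labels and then searching across taxonomies at once can be sketched in a few lines of Python. This is a minimal illustration only, not the patented method: the LabelIndex class, the record ids, and the sample label values are all hypothetical.

```python
from collections import defaultdict


class LabelIndex:
    """Hypothetical index: each record is stored once and referenced by label."""

    def __init__(self):
        # (taxonomy, label path prefix) -> set of record ids
        self._index = defaultdict(set)

    def tag(self, record_id, taxonomy, path):
        """Attach a hierarchical label such as 'US/Illinois/Chicago' to a record."""
        parts = tuple(path.split("/"))
        # Register every prefix so a query on an ancestor matches its descendants.
        for depth in range(1, len(parts) + 1):
            self._index[(taxonomy, parts[:depth])].add(record_id)

    def search(self, **criteria):
        """Return the ids that satisfy every taxonomy=path criterion given."""
        result = None
        for taxonomy, path in criteria.items():
            ids = self._index.get((taxonomy, tuple(path.split("/"))), set())
            result = set(ids) if result is None else result & ids
        return result if result is not None else set()


# Made-up sample records, each labeled in more than one taxonomy.
index = LabelIndex()
index.tag("r1", "location", "US/Illinois/Chicago")
index.tag("r1", "age", "Adult/30-39")
index.tag("r1", "profession", "Healthcare/Nurse")
index.tag("r2", "location", "US/Illinois/Peoria")
index.tag("r2", "age", "Adult/40-49")
index.tag("r3", "location", "US/Indiana/Gary")
index.tag("r3", "age", "Adult/30-39")

illinois_adults = index.search(location="US/Illinois", age="Adult")
us_thirties = index.search(location="US", age="Adult/30-39")
```

Each query intersects the matches from every taxonomy it names, so a single search narrows by location, age, and profession at the same time, which a flat tag cannot express.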

The patents for the simultaneous hierarchical labeling schemes were issued to Sanjiv Kapoor, professor of computer science at IIT, and Ophir Frieder, former IITRI Chair Professor of Computer Science and director of the Information Retrieval Laboratory at IIT and now the Robert L. McDevitt, K.S.G., K.C.H.S. and Catherine H. McDevitt, L.C.H.S. Professor of Computer Science and Information Processing at Georgetown University, and professor, Department of Biostatistics, Bioinformatics, and Biomathematics, Georgetown University Medical Center.

TECHNICAL DESCRIPTION
The new patents describe the development of multiple simultaneous hierarchies that do not require data replication, yet support efficient access and search across distributed and cloud environments as well as web browsers. Hierarchy maintenance is simplified because no data replication is required. Changes in one hierarchy or in data values are seamlessly transferred to the other hierarchies. Experiments show that the underlying data structures make the search system remarkably efficient, affording greater freedom to maintain classifications and taxonomies as they change.

As an example, consider the following structure of files (Figure 1) illustrating disease details based on geographic location:

US/Illinois/Hepatitis/D0
US/Indiana/Hepatitis/D1
US/Illinois/Rotavirus/D2
US/Illinois/Norovirus/D3
US/Indiana/Rotavirus/D4
US/Indiana/Norovirus/D5

 

While an alternative view of these files, for example by disease rather than by location, may be partly accomplished by incorporating search keywords, simple keywords alone will not provide the desired hierarchical view orthogonal to the original hierarchy. Kapoor and Frieder created the notion of structured keywords and abstract directories so that files can be organized in any user-specified hierarchy. A file system based on this approach is also pictured below.
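To make the example concrete, the six files listed above can be regrouped into a disease-first hierarchy without copying any of them. The Python sketch below is only an illustration under stated assumptions: it models "structured keywords" as field/value pairs and an "abstract directory" as a tree computed on demand, and the names FIELDS, parse, and view are hypothetical rather than taken from the patents.

```python
# The six file paths from the example above.
PATHS = [
    "US/Illinois/Hepatitis/D0",
    "US/Indiana/Hepatitis/D1",
    "US/Illinois/Rotavirus/D2",
    "US/Illinois/Norovirus/D3",
    "US/Indiana/Rotavirus/D4",
    "US/Indiana/Norovirus/D5",
]

# Assumed meaning of each path level (not stated in the source).
FIELDS = ("country", "state", "disease")


def parse(path):
    """Split a path into (file name, {field: value}) structured keywords."""
    *values, name = path.split("/")
    return name, dict(zip(FIELDS, values))


# Each file appears exactly once, together with its keywords.
FILES = dict(parse(p) for p in PATHS)


def view(order):
    """Build a nested-dict hierarchy over FILES in the given field order."""
    root = {}
    for name, keywords in FILES.items():
        node = root
        for field in order[:-1]:
            node = node.setdefault(keywords[field], {})
        # The last level holds lists of file names, not copies of the files.
        node.setdefault(keywords[order[-1]], []).append(name)
    return root


by_location = view(("country", "state", "disease"))  # the original layout
by_disease = view(("disease", "state"))              # an orthogonal layout
```

Because both trees are derived from the same FILES table, relabeling a file there changes every view the next time it is built, which mirrors the no-replication property described in the technical description.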

The university is exploring the licensing of these patents and also has applied for two additional, related patents.

Founded in 1890, Illinois Institute of Technology, located in the historic Bronzeville community on Chicago’s South Side, is a private, technology-focused, research university offering undergraduate and graduate degrees in engineering, science, architecture, business, design, human sciences, applied technology, and law. One of 22 institutions that comprise the Association of Independent Technological Universities (AITU), Illinois Tech offers exceptional preparation for professions that require technological sophistication, an innovative mindset, and an entrepreneurial spirit. Visit www.iit.edu.