Computer Science Seminar by Valerie Hayot-Sasson: New Models for Effective Data Sharing and Communication
Speaker: Valerie Hayot-Sasson, postdoctoral researcher, University of Chicago
Title: New Models for Effective Data Sharing and Communication
Abstract:
With the increasing heterogeneity of computing infrastructure and the evolution of scientific applications from long-running simulations to artificial intelligence and machine learning workflows, new task-based computing frameworks (e.g., Ray, Dask, Globus Compute) have been created to take advantage of federated computing infrastructure. However, while these frameworks facilitate distributed execution, much of the responsibility for managing data remains with the user. For example, efficiently managing and transferring data in these federated environments is challenging, as data producers are unaware of their potential consumers and communication may be temporally and referentially decoupled. We developed ProxyStore, a Python library for just-in-time resolution of Python objects that enables fast federated communication through its Proxy abstraction. The ProxyStore Proxy object is a lightweight reference to the serialized data object located in one of the various connectors (e.g., Redis, local shared file system, P2P federated endpoints) supported by ProxyStore. Several high-level abstractions, such as futures, streaming, and ownership, build on the Proxy abstraction to facilitate the expression of complex data communication patterns commonly found in scientific computing. Benchmark results demonstrate that the use of ProxyStore in scientific applications reduces runtime and improves both CPU and GPU utilization. I will conclude by discussing important future challenges related to efficient communication in federated applications, specifically looking at the intersection of open data initiatives that have transformed scientific computing and the adoption of data streaming as a method for rapid and scalable data processing.
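As a rough illustration of the idea of a lightweight reference that is resolved just in time, here is a minimal sketch in plain Python. It does not use ProxyStore and is not its API: `DictConnector`, `LazyProxy`, and `to_proxy` are hypothetical names invented for this example, and an in-memory dictionary stands in for a connector such as Redis.

```python
import pickle


class DictConnector:
    """Hypothetical in-memory stand-in for a remote store such as Redis."""

    def __init__(self):
        self._data = {}
        self.get_calls = 0  # counts fetches so lazy resolution is observable

    def put(self, key, blob):
        self._data[key] = blob

    def get(self, key):
        self.get_calls += 1
        return self._data[key]


class LazyProxy:
    """Small reference that fetches and deserializes its target on first use."""

    def __init__(self, connector, key):
        self._connector = connector
        self._key = key
        self._resolved = False
        self._target = None

    def _resolve(self):
        # Fetch and deserialize only once, the first time the object is needed.
        if not self._resolved:
            self._target = pickle.loads(self._connector.get(self._key))
            self._resolved = True
        return self._target

    def __getattr__(self, name):
        # Only called for attributes not found on the proxy itself.
        # Private names are never forwarded, which avoids recursion.
        if name.startswith("_"):
            raise AttributeError(name)
        return getattr(self._resolve(), name)

    # Special methods bypass __getattr__, so they are forwarded explicitly.
    def __len__(self):
        return len(self._resolve())

    def __getitem__(self, index):
        return self._resolve()[index]


def to_proxy(connector, key, obj):
    """Serialize obj into the connector and return a lazy reference to it."""
    connector.put(key, pickle.dumps(obj))
    return LazyProxy(connector, key)


store = DictConnector()
proxy = to_proxy(store, "results", [3, 1, 2])  # producer side
fetches_before_use = store.get_calls           # still 0: nothing fetched yet
first_item = proxy[0]                          # consumer side: resolves here
```

In this sketch the producer hands out `proxy` without knowing who will consume it, and the data is fetched from the store only when the consumer first touches the object.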
Bio:
Valerie Hayot-Sasson is a postdoctoral scholar at the University of Chicago with a joint appointment at Argonne National Laboratory. Her research interests lie at the intersection of data management and distributed scientific computing, with a focus on developing strategies and solutions that minimize data transfer times and energy consumption and promote data sharing. She received her Ph.D. from Concordia University in 2022, where she studied the impacts of big data transfers on neuroimaging workflows.