Events
Memory Optimizations for Distributed Stream-based Applications
00:00 03-05-2007
Distributed stream-based applications manage large quantities of data and exhibit unique production and consumption patterns that set them apart from general-purpose applications. This dissertation examines possible ways to harness the unique characteristics of such applications to assist in creating efficient memory management schemes. Two complementary approaches are suggested:
1. Garbage Identification
2. Adaptive Resource Utilization
Garbage Identification analyzes dynamic data dependencies to infer which items the application will no longer access. Several garbage identification algorithms are examined. Each algorithm uses a set of application properties, possibly different from those used by the others, to reduce the memory consumption of the application. The performance of these algorithms is compared to that of an ideal garbage collector, using a novel logging/post-mortem analyzer. The results indicate that the algorithms that achieve a low memory footprint (close to that of an ideal garbage collector) make their garbage identification decisions locally; however, they base these decisions on best-effort global information obtained from other components of the distributed application.
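As a rough illustration of a local decision driven by best-effort global information, the sketch below assumes items carry timestamps and that each consumer occasionally reports the lowest timestamp it may still request. The `Channel` class, its methods, and the timestamp model are hypothetical and are not taken from the dissertation; the sketch also assumes at least one registered consumer.

```python
class Channel:
    """Hypothetical buffer of timestamped stream items."""

    def __init__(self, consumers):
        # Last known "lowest timestamp still needed" per consumer (assumed model).
        self.low_marks = {c: 0 for c in consumers}
        self.items = {}

    def put(self, ts, item):
        self.items[ts] = item

    def report(self, consumer, low_mark):
        # Reports may arrive late or out of order; never move a mark backwards.
        self.low_marks[consumer] = max(self.low_marks[consumer], low_mark)

    def collect(self):
        # Local decision: free every item older than the smallest reported mark.
        horizon = min(self.low_marks.values())
        dead = [ts for ts in self.items if ts < horizon]
        for ts in dead:
            del self.items[ts]
        return sorted(dead)
```

A stale report only makes the channel keep items longer than necessary, so under this assumed model best-effort information costs memory but never frees an item that a consumer still needs.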
The Adaptive Resource Utilization (ARU) algorithm analyzes the dynamic relationships between the production and consumption of data items. It uses this information to infer the capacity of the system to process data items and adjusts data generation by the application accordingly. The ARU algorithm makes local capacity decisions based on global information. This algorithm is found to be as effective as the most successful garbage identification algorithm in reducing the memory footprint of stream-based applications, thus confirming the earlier observation that using global information to make local decisions is fundamental to reducing memory consumption in stream-based applications.
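The idea of matching data generation to the system's processing capacity can be sketched as follows. This is only an illustration under stated assumptions, not the ARU algorithm itself: the `PacedSource` class and its methods are hypothetical, and downstream stages are assumed to report their observed seconds per item back to the source.

```python
class PacedSource:
    """Hypothetical source that paces itself to the slowest reported stage."""

    def __init__(self, min_period):
        self.min_period = min_period  # fastest the source can produce (s/item)
        self.reported = {}            # stage name -> observed seconds per item

    def report(self, stage, seconds_per_item):
        # Global information: the latest processing cost seen by a stage.
        self.reported[stage] = seconds_per_item

    def period(self):
        # Local decision: the slowest stage bounds the useful production rate.
        return max([self.min_period, *self.reported.values()])


src = PacedSource(min_period=0.01)
src.report("detect", 0.05)
src.report("track", 0.03)
print(src.period())  # the source slows to the 0.05 s/item stage
```

Producing faster than the slowest stage can consume only creates items that sit in buffers, which is why pacing the source in this way reduces the memory footprint.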