Automate: Automatic Anomaly Detection and Root Cause Analysis Framework for Hadoop
Tarih
Dergi Başlığı
Dergi ISSN
Cilt Başlığı
Yayıncı
Erişim Hakkı
Özet
Big data frameworks, such as Hadoop, unlock immense potential. Yet, they come in handy with complex challenges, like illusive faults and straggler tasks, that disrupt the execution workflow and reduce resource utilisation. Traditional techniques such as the median analysis, struggle to identify and locate these faults. To address this issue, this paper presents Automate:automatic anomaly detection and root cause analysis framework, which makes a two-fold contribution. First, Automatehas improved the monitoring methods of Hadoop clusters and implements AUtool to monitor cluster resources and task progress. With the enhanced monitoring, we further leverage machine learning algorithms to analyse system logs, aiming to detect outliers and determine their root causes. Automate targets the issue of slow process execution, focusing on the combined impact of server heterogeneity and data locality, thereby offering a comprehensive analysis of factors affecting system efficiency. Our experimental findings demonstrate that the proposed method significantly enhances the accuracy in identifying system outliers and analysing root causes, offering an automated and more effective solution for monitoring and optimising big data system performance. © 2025 Elsevier B.V., All rights reserved.










