HTwitt: A Hadoop-based platform for analysis and visualization of streaming Twitter data

Demirbaga, Ümit

dc.contributor.author	Demirbaga, Ümit
dc.date.accessioned	2021-06-07T12:06:37Z
dc.date.available	2021-06-07T12:06:37Z
dc.date.issued	2021-05-05
dc.identifier.uri	https://link.springer.com/article/10.1007/s00521-021-06046-y
dc.identifier.uri	http://hdl.handle.net/11772/6651
dc.description.abstract	Owing to its popularity, Twitter produces a massive amount of data, which is one of the sources of big data problems. One of those problems is the classification of tweets: their sophisticated and complex language makes current tools insufficient. We present HTwitt, a framework built on top of the Hadoop ecosystem that consists of a MapReduce algorithm and a set of machine learning techniques embedded within a big data analytics platform. It efficiently addresses the following problems: (1) traditional data processing techniques are inadequate for handling big data; (2) data preprocessing needs substantial manual effort; (3) domain knowledge is required before classification; (4) semantic explanation is ignored. In this work, these challenges are overcome by combining different algorithms with a Naïve Bayes classifier to ensure reliability and highly precise recommendations in virtualization and cloud environments. These features distinguish HTwitt from other systems through an effective and practical design for text classification in big data analytics. The main contribution of the paper is a framework for building landslide early warning systems by pinpointing useful tweets and visualizing them along with the processed information. We report experiments that quantify the level of overfitting in the training stage of the model, using real-world datasets of different sizes in the machine learning phases. Our results demonstrate that the proposed system provides high-quality results, with a score of nearly 95%, and meets the requirements of a Hadoop-based classification system.	tr_TR
dc.description.sponsorship	Newcastle University	tr_TR
dc.language.iso	eng	tr_TR
dc.publisher	Springer	tr_TR
dc.relation.isversionof	10.1007/s00521-021-06046-y	tr_TR
dc.rights	info:eu-repo/semantics/openAccess	tr_TR
dc.subject	Big data	tr_TR
dc.subject	MapReduce	tr_TR
dc.subject	Machine learning	tr_TR
dc.subject	Classification	tr_TR
dc.subject	Monitoring	tr_TR
dc.subject	Visualization	tr_TR
dc.title	HTwitt: A Hadoop-based platform for analysis and visualization of streaming Twitter data	tr_TR
dc.type	article	tr_TR
dc.relation.journal	Neural Computing and Applications	tr_TR
dc.contributor.department	Bartın University, Faculty of Engineering, Architecture and Design, Department of Computer Engineering	tr_TR
dc.contributor.authorID	0000-0001-5159-0723	tr_TR