HTwitt: A Hadoop-based platform for analysis and visualization of streaming Twitter data

dc.contributor.authorDemirbaga, Ümit
dc.date.accessioned2021-06-07T12:06:37Z
dc.date.available2021-06-07T12:06:37Z
dc.date.created2021
dc.date.issued2021
dc.date.issuedyyyymmdd2021-05-05
dc.departmentFaculties, Faculty of Engineering, Architecture and Design, Department of Computer Engineering
dc.description.abstractTwitter's popularity makes it a source of massive amounts of data, which is one of the reasons underlying big data problems. One such problem is the classification of tweets, whose sophisticated and complex language makes current tools insufficient. We present HTwitt, a framework built on top of the Hadoop ecosystem that consists of a MapReduce algorithm and a set of machine learning techniques embedded within a big data analytics platform. It efficiently addresses the following problems: (1) traditional data processing techniques are inadequate for handling big data; (2) data preprocessing requires substantial manual effort; (3) domain knowledge is required before classification; (4) semantic explanation is ignored. In this work, these challenges are overcome by combining different algorithms with a Naïve Bayes classifier to ensure reliability and highly precise recommendations in virtualization and cloud environments. These features distinguish HTwitt from other systems by giving it an effective and practical design for text classification in big data analytics. The main contribution of the paper is a framework for building landslide early warning systems that pinpoints useful tweets and visualizes them along with the processed information. We report experiments that quantify the level of overfitting in the training stage of the model, using real-world datasets of different sizes in the machine learning phases. Our results demonstrate that the proposed system provides high-quality results, with a score of nearly 95%, and meets the requirements of a Hadoop-based classification system.
dc.description.sponsorshipNewcastle University
dc.identifier.doi10.1007/s00521-021-06046-y
dc.identifier.orcid0000-0001-5159-0723
dc.identifier.scopus2-s2.0-85105351157
dc.identifier.scopusqualityQ1
dc.identifier.urihttps://link.springer.com/article/10.1007/s00521-021-06046-y
dc.identifier.urihttps://hdl.handle.net/11772/6651
dc.identifier.urihttps://doi.org/10.1007/s00521-021-06046-y
dc.identifier.wosWOS:000647349100002
dc.identifier.wosqualityQ2
dc.indekslendigikaynakWeb of Science
dc.indekslendigikaynakScopus
dc.language.isoen
dc.publisherSpringer
dc.relation.ispartofNeural Computing and Applications
dc.rightsinfo:eu-repo/semantics/openAccess
dc.subjectBig data
dc.subjectMapReduce
dc.subjectMachine learning
dc.subjectClassification
dc.subjectMonitoring
dc.subjectVisualization
dc.titleHTwitt: A Hadoop-based platform for analysis and visualization of streaming Twitter data
dc.typeArticle
dspace.entity.typePublication
relation.isAuthorOfPublication6197518d-2220-4e55-aa0a-5fc7d5c6606d
relation.isAuthorOfPublication.latestForDiscovery6197518d-2220-4e55-aa0a-5fc7d5c6606d

Files

Original bundle

Name:
HTwitt- a hadoop-based platform for analysis and visualization of streaming Twitter data (published).pdf
Size:
1.73 MB
Format:
Adobe Portable Document Format
Description:
Main article

License bundle

Name:
license.txt
Size:
1.59 KB
Format:
Item-specific license agreed upon to submission