Constructing a Turkish Corpus for Paraphrase Identification and Semantic Similarity

dc.contributor.authorEyecioglu, Asli
dc.contributor.authorKeller, Bill
dc.contributor.authorÖzmutlu, Aslı Eyecioğlu
dc.date.accessioned2025-10-18T13:23:00Z
dc.date.created2018
dc.date.issued2018
dc.departmentBartın Üniversitesi
dc.description17th International Conference on Intelligent Text Processing and Computational Linguistics (CICLing) -- APR 03-09, 2016 -- Mevlana Univ, Konya, TURKEY
dc.description.abstractThe paraphrase identification (PI) task has practical importance for work in Natural Language Processing (NLP) because of the problem of linguistic variation. Accurate PI methods should help improve the performance of key NLP applications. Paraphrase corpora are important resources for developing and evaluating PI methods. This paper describes the construction of a paraphrase corpus for Turkish. The corpus comprises pairs of sentences with semantic similarity scores based on human judgments, permitting experimentation with both PI and semantic similarity. We believe this is the first such corpus for Turkish. The data collection and scoring methodology is described, and initial PI experiments with the corpus are reported. Our approach to PI is novel in using 'knowledge-lean' methods (i.e., no use of manually constructed knowledge bases or of processing tools that rely on these). We have previously achieved excellent results using such techniques on the Microsoft Research Paraphrase Corpus, and close to state-of-the-art performance on the Twitter Paraphrase Corpus.
dc.identifier.doi10.1007/978-3-319-75477-2_42
dc.identifier.endpage599
dc.identifier.isbn978-3-319-75477-2
dc.identifier.isbn978-3-319-75476-5
dc.identifier.issn0302-9743
dc.identifier.issn1611-3349
dc.identifier.orcidEYECIOGLU OZMUTLU, ASLI/0000-0001-8817-3851
dc.identifier.scopus2-s2.0-85044412562
dc.identifier.scopusqualityQ2
dc.identifier.startpage588
dc.identifier.urihttps://doi.org/10.1007/978-3-319-75477-2_42
dc.identifier.urihttps://hdl.handle.net/11772/22614
dc.identifier.volume9623
dc.identifier.wosWOS:000540380100042
dc.identifier.wosqualityN/A
dc.indekslendigikaynakWeb of Science
dc.indekslendigikaynakScopus
dc.language.isoen
dc.publisherSpringer International Publishing Ag
dc.relation.ispartofComputational Linguistics and Intelligent Text Processing, (Cicling 2016), Pt I
dc.relation.publicationcategoryConference Item - International - Institutional Faculty Member
dc.rightsinfo:eu-repo/semantics/closedAccess
dc.snmzWoS_20251016
dc.subjectParaphrase Identification
dc.subjectTurkish
dc.subjectCorpora Construction
dc.subjectKnowledge-Lean
dc.subjectParaphrasing
dc.subjectSentential Semantic Similarity
dc.titleConstructing a Turkish Corpus for Paraphrase Identification and Semantic Similarity
dc.typeConference Object
dspace.entity.typePublication
relation.isAuthorOfPublication0e3fb570-2f38-4b32-b68c-479beeb84c2e
relation.isAuthorOfPublication.latestForDiscovery0e3fb570-2f38-4b32-b68c-479beeb84c2e
