Constructing a Turkish Corpus for Paraphrase Identification and Semantic Similarity

Eyecioglu, Asli; Keller, Bill; Özmutlu, Aslı Eyecioğlu

doi:10.1007/978-3-319-75477-2_42

dc.contributor.author	Eyecioglu, Asli
dc.contributor.author	Keller, Bill
dc.contributor.author	Özmutlu, Aslı Eyecioğlu
dc.date.accessioned	2025-10-18T13:23:00Z
dc.date.created	2018
dc.date.issued	2018
dc.department	Bartın Üniversitesi
dc.description	17th International Conference on Intelligent Text Processing and Computational Linguistics (CICLing) -- APR 03-09, 2016 -- Mevlana Univ, Konya, TURKEY
dc.description.abstract	The paraphrase identification (PI) task has practical importance for work in Natural Language Processing (NLP) because of the problem of linguistic variation. Accurate methods should help improve the performance of key NLP applications. Paraphrase corpora are important resources in developing and evaluating PI methods. This paper describes the construction of a paraphrase corpus for Turkish. The corpus comprises pairs of sentences with semantic similarity scores based on human judgments, permitting experimentation with both PI and semantic similarity. We believe this is the first such corpus for Turkish. The data collection and scoring methodology is described, and initial PI experiments with the corpus are reported. Our approach to PI is novel in using 'knowledge-lean' methods (i.e., no use of manually constructed knowledge bases or processing tools that rely on these). We have previously achieved excellent results using such techniques on the Microsoft Research Paraphrase Corpus, and close to state-of-the-art performance on the Twitter Paraphrase Corpus.
dc.identifier.doi	10.1007/978-3-319-75477-2_42
dc.identifier.endpage	599
dc.identifier.isbn	978-3-319-75477-2
dc.identifier.isbn	978-3-319-75476-5
dc.identifier.issn	0302-9743
dc.identifier.issn	1611-3349
dc.identifier.orcid	EYECIOGLU OZMUTLU, ASLI/0000-0001-8817-3851
dc.identifier.scopus	2-s2.0-85044412562
dc.identifier.scopusquality	Q2
dc.identifier.startpage	588
dc.identifier.uri	https://doi.org/10.1007/978-3-319-75477-2_42
dc.identifier.uri	https://hdl.handle.net/11772/22614
dc.identifier.volume	9623
dc.identifier.wos	WOS:000540380100042
dc.identifier.wosquality	N/A
dc.indekslendigikaynak	Web of Science
dc.indekslendigikaynak	Scopus
dc.language.iso	en
dc.publisher	Springer International Publishing Ag
dc.relation.ispartof	Computational Linguistics and Intelligent Text Processing, (Cicling 2016), Pt I
dc.relation.publicationcategory	Conference Item - International - Institutional Faculty Member
dc.rights	info:eu-repo/semantics/closedAccess
dc.snmz	WoS_20251016
dc.subject	Paraphrase Identification
dc.subject	Turkish
dc.subject	Corpora Construction
dc.subject	Knowledge-Lean
dc.subject	Paraphrasing
dc.subject	Sentential Semantic Similarity
dc.title	Constructing a Turkish Corpus for Paraphrase Identification and Semantic Similarity
dc.type	Conference Object
dspace.entity.type	Publication
relation.isAuthorOfPublication	0e3fb570-2f38-4b32-b68c-479beeb84c2e
relation.isAuthorOfPublication.latestForDiscovery	0e3fb570-2f38-4b32-b68c-479beeb84c2e

Collection

WoS Indexed Publications Collection
Scopus Indexed Publications Collection