Constructing a Turkish Corpus for Paraphrase Identification and Semantic Similarity

Eyecioglu, Asli; Keller, Bill; Özmutlu, Aslı Eyecioğlu

doi:10.1007/978-3-319-75477-2_42

Date

2018

Authors

Eyecioglu, Asli

Keller, Bill

Özmutlu, Aslı Eyecioğlu

Publisher

Springer International Publishing AG

Access Rights

info:eu-repo/semantics/closedAccess

Abstract

The paraphrase identification (PI) task has practical importance for work in Natural Language Processing (NLP) because of the problem of linguistic variation. Accurate methods should help improve the performance of key NLP applications. Paraphrase corpora are important resources in developing and evaluating PI methods. This paper describes the construction of a paraphrase corpus for Turkish. The corpus comprises pairs of sentences with semantic similarity scores based on human judgments, permitting experimentation with both PI and semantic similarity. We believe this is the first such corpus for Turkish. The data collection and scoring methodology is described, and initial PI experiments with the corpus are reported. Our approach to PI is novel in using 'knowledge-lean' methods (i.e., no use of manually constructed knowledge bases or processing tools that rely on these). We have previously achieved excellent results using such techniques on the Microsoft Research Paraphrase Corpus, and close to state-of-the-art performance on the Twitter Paraphrase Corpus.

Description

17th International Conference on Intelligent Text Processing and Computational Linguistics (CICLing), April 3-9, 2016, Mevlana University, Konya, Turkey

Keywords

Paraphrase Identification, Turkish, Corpora Construction, Knowledge-Lean, Paraphrasing, Sentential Semantic Similarity

Source

Computational Linguistics and Intelligent Text Processing (CICLing 2016), Part I

WoS Q Value

N/A

Scopus Q Value

Q2

Volume

9623

Link

https://doi.org/10.1007/978-3-319-75477-2_42
https://hdl.handle.net/11772/22614

Collection

WoS Indexed Publications Collection
Scopus Indexed Publications Collection
