Constructing a Turkish Corpus for Paraphrase Identification and Semantic Similarity

Publisher

Springer International Publishing Ag

Access Rights

info:eu-repo/semantics/closedAccess

Abstract

The paraphrase identification (PI) task has practical importance for work in Natural Language Processing (NLP) because of the problem of linguistic variation. Accurate methods should help improve the performance of key NLP applications. Paraphrase corpora are important resources in developing and evaluating PI methods. This paper describes the construction of a paraphrase corpus for Turkish. The corpus comprises pairs of sentences with semantic similarity scores based on human judgments, permitting experimentation with both PI and semantic similarity. We believe this is the first such corpus for Turkish. The data collection and scoring methodology is described, and initial PI experiments with the corpus are reported. Our approach to PI is novel in using 'knowledge-lean' methods (i.e., no use of manually constructed knowledge bases or processing tools that rely on these). We have previously achieved excellent results using such techniques on the Microsoft Research Paraphrase Corpus, and close to state-of-the-art performance on the Twitter Paraphrase Corpus.
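
The abstract does not list the features used in the experiments. Purely as an illustration of what a 'knowledge-lean' similarity signal can look like, the sketch below scores a sentence pair by character trigram overlap (Jaccard similarity), which needs no lexicon, parser, or other hand-built resource. The function names, the trigram size, and the example sentences are assumptions made for this sketch and are not taken from the paper.

```python
from __future__ import annotations


def normalize(text: str) -> str:
    # Lowercase and collapse whitespace. Note: str.lower() is not
    # Turkish-locale-aware (e.g. "I" becomes "i", not "ı"); a real
    # pipeline for Turkish would likely need locale-specific casing.
    return " ".join(text.lower().split())


def char_ngrams(text: str, n: int = 3) -> set[str]:
    # Set of overlapping character n-grams of the normalized text.
    s = normalize(text)
    return {s[i:i + n] for i in range(len(s) - n + 1)}


def jaccard_similarity(a: str, b: str, n: int = 3) -> float:
    # Jaccard overlap of the two n-gram sets, in the range [0, 1].
    grams_a, grams_b = char_ngrams(a, n), char_ngrams(b, n)
    union = grams_a | grams_b
    if not union:
        # Both texts are shorter than n characters: fall back to equality.
        return 1.0 if normalize(a) == normalize(b) else 0.0
    return len(grams_a & grams_b) / len(union)


if __name__ == "__main__":
    # Hypothetical sentence pair, not drawn from the corpus.
    print(jaccard_similarity("Bugün hava çok güzel.", "Hava bugün çok güzel."))
```

A score like this could serve as one input feature to a classifier, or be compared against the human similarity judgments in a corpus of this kind; how the authors actually use such signals is described in the paper itself.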

Description

17th International Conference on Intelligent Text Processing and Computational Linguistics (CICLing), April 3-9, 2016, Mevlana University, Konya, Turkey

Keywords

Paraphrase Identification, Turkish, Corpora Construction, Knowledge-Lean, Paraphrasing, Sentential Semantic Similarity

Source

Computational Linguistics and Intelligent Text Processing (CICLing 2016), Part I

Volume

9623
