Knowledge-lean Paraphrase Identification Using Character-Based Features

Eyecioglu, Asli; Keller, Bill; Özmutlu, Aslı Eyecioğlu

doi:10.1007/978-3-319-71746-3_21

Date

2018

Authors

Eyecioglu, Asli

Keller, Bill

Özmutlu, Aslı Eyecioğlu

Publisher

Springer-Verlag Berlin

Access Rights

info:eu-repo/semantics/closedAccess

Abstract

The paraphrase identification task has practical importance in the NLP community because of the need to deal with the pervasive problem of linguistic variation. Accurate methods should help improve the performance of NLP applications, including machine translation, information retrieval, question answering, text summarization, document clustering and plagiarism detection, amongst others. We consider an approach to paraphrase identification that may be considered knowledge-lean. Our approach minimizes the need for data transformation and avoids the use of knowledge-based tools and resources. Candidate paraphrase pairs are represented using combinations of word- and character-based features. We show that SVM classifiers may be trained to distinguish paraphrase and non-paraphrase pairs across a number of different paraphrase corpora with good results. Analysis shows that features derived from character bigrams are particularly informative. We also describe recent experiments in identifying paraphrases in Russian, a language with rich morphology and free word order that presents a particularly interesting challenge for our knowledge-lean approach. We are able to report good results on a three-way paraphrase classification task.
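As a rough illustration of the kind of character-based representation the abstract describes, the sketch below computes overlap features between the character bigrams of two sentences. It is a minimal sketch only: the function names (`char_ngrams`, `overlap_features`) and the particular feature set are hypothetical and are not taken from the paper. Feature vectors like these could then be passed to an SVM classifier.

```python
from collections import Counter


def char_ngrams(text: str, n: int = 2) -> Counter:
    """Count overlapping character n-grams of a lowercased string."""
    s = text.lower()
    return Counter(s[i:i + n] for i in range(len(s) - n + 1))


def overlap_features(a: str, b: str, n: int = 2) -> dict:
    """Overlap features between the n-gram multisets of two sentences.

    NOTE: this feature set is an assumption chosen for illustration,
    not the feature set used by the authors.
    """
    ga, gb = char_ngrams(a, n), char_ngrams(b, n)
    inter = sum((ga & gb).values())  # multiset intersection size
    union = sum((ga | gb).values())  # multiset union size
    return {
        "intersection": inter,
        "union": union,
        "jaccard": inter / union if union else 0.0,
        "only_a": sum(ga.values()) - inter,  # n-grams left over in a
        "only_b": sum(gb.values()) - inter,  # n-grams left over in b
    }


# Example: features for one candidate paraphrase pair.
features = overlap_features("The cat sat.", "A cat was sitting.")
```

No tokenization, stemming, or external lexical resource is involved, which is what makes a representation of this kind knowledge-lean.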

Description

6th Conference on Artificial Intelligence and Natural Language (AINL), September 20-23, 2017, Saint Petersburg, Russia

Keywords

Paraphrase Identification, Paraphrase Corpora, Character N-Grams, Lexical Overlap, Support Vector Machines

Source

Artificial Intelligence and Natural Language

WoS Quartile

N/A

Scopus Quartile

Q3

Volume

789

Link

https://doi.org/10.1007/978-3-319-71746-3_21
https://hdl.handle.net/11772/20515

Collection

WoS Indexed Publications Collection
Scopus Indexed Publications Collection
