Knowledge-lean Paraphrase Identification Using Character-Based Features

Eyecioglu, Asli; Keller, Bill; Özmutlu, Aslı Eyecioğlu

doi:10.1007/978-3-319-71746-3_21

dc.contributor.author	Eyecioglu, Asli
dc.contributor.author	Keller, Bill
dc.contributor.author	Özmutlu, Aslı Eyecioğlu
dc.date.accessioned	2025-10-18T10:02:18Z
dc.date.created	2018
dc.date.issued	2018
dc.department	Bartın Üniversitesi
dc.description	6th Conference on Artificial Intelligence and Natural Language (AINL) -- SEP 20-23, 2017 -- Saint Petersburg, RUSSIA
dc.description.abstract	The paraphrase identification task has practical importance in the NLP community because of the need to deal with the pervasive problem of linguistic variation. Accurate methods should help improve the performance of NLP applications, including machine translation, information retrieval, question answering, text summarization, document clustering and plagiarism detection, amongst others. We consider an approach to paraphrase identification that may be considered knowledge-lean. Our approach minimizes the need for data transformation and avoids the use of knowledge-based tools and resources. Candidate paraphrase pairs are represented using combinations of word- and character-based features. We show that SVM classifiers may be trained to distinguish paraphrase and nonparaphrase pairs across a number of different paraphrase corpora with good results. Analysis shows that features derived from character bigrams are particularly informative. We also describe recent experiments in identifying paraphrase for Russian, a language with rich morphology and free word order that presents a particularly interesting challenge for our knowledge-lean approach. We are able to report good results on a three-way paraphrase classification task.
dc.description.sponsorship	NLP Seminar, ITMO Univ
dc.identifier.doi	10.1007/978-3-319-71746-3_21
dc.identifier.endpage	276
dc.identifier.isbn	978-3-319-71746-3
dc.identifier.isbn	978-3-319-71745-6
dc.identifier.issn	1865-0929
dc.identifier.issn	1865-0937
dc.identifier.orcid	EYECIOGLU OZMUTLU, ASLI/0000-0001-8817-3851
dc.identifier.scopus	2-s2.0-85037534755
dc.identifier.scopusquality	Q3
dc.identifier.startpage	257
dc.identifier.uri	https://doi.org/10.1007/978-3-319-71746-3_21
dc.identifier.uri	https://hdl.handle.net/11772/20515
dc.identifier.volume	789
dc.identifier.wos	WOS:000437301200021
dc.identifier.wosquality	N/A
dc.indekslendigikaynak	Web of Science
dc.indekslendigikaynak	Scopus
dc.language.iso	en
dc.publisher	Springer-Verlag Berlin
dc.relation.ispartof	Artificial Intelligence and Natural Language
dc.relation.publicationcategory	Conference Item - International - Institutional Academic Staff
dc.rights	info:eu-repo/semantics/closedAccess
dc.snmz	WoS_20251016
dc.subject	Paraphrase Identification
dc.subject	Paraphrase Corpora
dc.subject	Character N-Grams
dc.subject	Lexical Overlap
dc.subject	Support Vector Machines
dc.title	Knowledge-lean Paraphrase Identification Using Character-Based Features
dc.type	Conference Object
dspace.entity.type	Publication
relation.isAuthorOfPublication	0e3fb570-2f38-4b32-b68c-479beeb84c2e
relation.isAuthorOfPublication.latestForDiscovery	0e3fb570-2f38-4b32-b68c-479beeb84c2e
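
The abstract describes representing candidate paraphrase pairs with character-based features, with character bigrams noted as particularly informative. The record does not give the exact feature set, so the following is only a minimal sketch under assumptions: the function names (`char_ngrams`, `overlap_features`, `jaccard`) and the choice of four set-size features are hypothetical illustrations, not the authors' implementation.

```python
def char_ngrams(text, n=2):
    """Return the set of character n-grams in text (spaces are kept as characters)."""
    return {text[i:i + n] for i in range(len(text) - n + 1)}


def overlap_features(s1, s2, n=2):
    """Set-overlap features for a sentence pair (an assumed, illustrative choice).

    Returns [|A union B|, |A intersect B|, |A|, |B|], where A and B are the
    character n-gram sets of the two sentences.
    """
    a, b = char_ngrams(s1, n), char_ngrams(s2, n)
    return [len(a | b), len(a & b), len(a), len(b)]


def jaccard(s1, s2, n=2):
    """Jaccard similarity of the two character n-gram sets (0.0 if both are empty)."""
    union, inter, _, _ = overlap_features(s1, s2, n)
    return inter / union if union else 0.0


if __name__ == "__main__":
    pair = ("the cat sat on the mat", "a cat was sitting on the mat")
    print(overlap_features(*pair))
    print(round(jaccard(*pair), 3))
```

Feature vectors of this kind could then be passed to an SVM classifier (for example, scikit-learn's `SVC`, which is an assumption here; the record does not name a toolkit). No tokenization, stemming, or lexical resource is needed to compute them, which is in the spirit of the knowledge-lean approach the abstract describes.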

Collections

WoS Indexed Publications Collection
Scopus Indexed Publications Collection