Knowledge-lean Paraphrase Identification Using Character-Based Features

dc.contributor.author: Eyecioglu, Asli
dc.contributor.author: Keller, Bill
dc.contributor.author: Özmutlu, Aslı Eyecioğlu
dc.date.accessioned: 2025-10-18T10:02:18Z
dc.date.created: 2018
dc.date.issued: 2018
dc.department: Bartın Üniversitesi
dc.description: 6th Conference on Artificial Intelligence and Natural Language (AINL) -- SEP 20-23, 2017 -- Saint Petersburg, RUSSIA
dc.description.abstract: The paraphrase identification task has practical importance in the NLP community because of the need to deal with the pervasive problem of linguistic variation. Accurate methods should help improve the performance of NLP applications, including machine translation, information retrieval, question answering, text summarization, document clustering and plagiarism detection, amongst others. We describe an approach to paraphrase identification that may be considered knowledge-lean. Our approach minimizes the need for data transformation and avoids the use of knowledge-based tools and resources. Candidate paraphrase pairs are represented using combinations of word- and character-based features. We show that SVM classifiers may be trained to distinguish paraphrase and non-paraphrase pairs across a number of different paraphrase corpora with good results. Analysis shows that features derived from character bigrams are particularly informative. We also describe recent experiments in paraphrase identification for Russian, a language with rich morphology and free word order that presents a particularly interesting challenge for our knowledge-lean approach. We are able to report good results on a three-way paraphrase classification task.
dc.description.sponsorship: NLP Seminar, ITMO Univ
dc.identifier.doi: 10.1007/978-3-319-71746-3_21
dc.identifier.endpage: 276
dc.identifier.isbn: 978-3-319-71746-3
dc.identifier.isbn: 978-3-319-71745-6
dc.identifier.issn: 1865-0929
dc.identifier.issn: 1865-0937
dc.identifier.orcid: EYECIOGLU OZMUTLU, ASLI/0000-0001-8817-3851
dc.identifier.scopus: 2-s2.0-85037534755
dc.identifier.scopusquality: Q3
dc.identifier.startpage: 257
dc.identifier.uri: https://doi.org/10.1007/978-3-319-71746-3_21
dc.identifier.uri: https://hdl.handle.net/11772/20515
dc.identifier.volume: 789
dc.identifier.wos: WOS:000437301200021
dc.identifier.wosquality: N/A
dc.indekslendigikaynak: Web of Science
dc.indekslendigikaynak: Scopus
dc.language.iso: en
dc.publisher: Springer-Verlag Berlin
dc.relation.ispartof: Artificial Intelligence and Natural Language
dc.relation.publicationcategory: Conference Item - International - Institutional Faculty Member
dc.rights: info:eu-repo/semantics/closedAccess
dc.snmz: WoS_20251016
dc.subject: Paraphrase Identification
dc.subject: Paraphrase Corpora
dc.subject: Character N-Grams
dc.subject: Lexical Overlap
dc.subject: Support Vector Machines
dc.title: Knowledge-lean Paraphrase Identification Using Character-Based Features
dc.type: Conference Object
dspace.entity.type: Publication
relation.isAuthorOfPublication: 0e3fb570-2f38-4b32-b68c-479beeb84c2e
relation.isAuthorOfPublication.latestForDiscovery: 0e3fb570-2f38-4b32-b68c-479beeb84c2e

Dosyalar