File
A novel Naive Bayes classifier for detecting AI-generated text using word pair probabilities
Digital Document
Abstract
With AI language models now capable of generating text without human intervention, distinguishing human-written content from AI-generated content has become crucial in areas such as education, blogging, and the validation of information sources. Common text classification approaches often rely on simple strategies such as the bag-of-words model or word lists as features. Because they concentrate on single terms, these approaches fail to identify complex patterns and semantic relationships in natural language. This thesis introduces a new text classification method that uses term pairs as features within the framework of the original Multinomial Naive Bayes model to enhance its effectiveness. Naive Bayes classifiers are simple yet effective for classification; however, they treat each word independently, which makes it difficult to capture the complex relationships in language. For example, terms like "New York" or "artificial intelligence" have meanings that extend beyond their individual words. By incorporating term pairs, the method captures these relationships more effectively. The proposed AI detector distinguishes between human-written and AI-generated texts and also identifies the specific AI source: it determines whether a text was generated by ChatGPT or Gemini, or assigns it to "OtherAI" when it comes from a less popular AI model. Unlike known tools such as GPTZero or QuillBot, which only detect AI-generated content without identifying the model, this method provides a more detailed classification.
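The idea of using term pairs as Multinomial Naive Bayes features can be illustrated with a minimal sketch. This is not the thesis implementation: it assumes that a "term pair" is a pair of adjacent words, that tokenization is a lowercase whitespace split, and that add-one (Laplace) smoothing is used. The names `word_pairs` and `PairNaiveBayes` and the toy sentences are hypothetical and exist only for illustration.

```python
import math
from collections import Counter, defaultdict


def word_pairs(text):
    """Return adjacent word pairs from lowercased, whitespace-split text.

    Assumption: a "term pair" is two neighbouring words; the thesis may
    define pairs differently.
    """
    tokens = text.lower().split()
    return list(zip(tokens, tokens[1:]))


class PairNaiveBayes:
    """Multinomial Naive Bayes over word-pair counts (illustrative only)."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)      # documents per class
        self.pair_counts = defaultdict(Counter)  # pair counts per class
        self.vocab = set()                       # all pairs seen in training
        for text, label in zip(texts, labels):
            pairs = word_pairs(text)
            self.pair_counts[label].update(pairs)
            self.vocab.update(pairs)
        self.total_docs = len(labels)
        return self

    def log_scores(self, text):
        """Return log P(class) + sum of log P(pair | class) for each class."""
        vocab_size = len(self.vocab)
        scores = {}
        for label, n_docs in self.class_counts.items():
            score = math.log(n_docs / self.total_docs)
            total = sum(self.pair_counts[label].values())
            for pair in word_pairs(text):
                if pair not in self.vocab:
                    continue  # ignore pairs never seen during training
                count = self.pair_counts[label][pair]
                # Add-one smoothing (an assumption of this sketch).
                score += math.log((count + 1) / (total + vocab_size))
            scores[label] = score
        return scores

    def predict(self, text):
        scores = self.log_scores(text)
        return max(scores, key=scores.get)


# Toy training data, invented for illustration; not taken from the thesis.
train_texts = [
    "i think the movie was kinda boring",
    "we went to new york last summer",
    "as a language model i can help with that",
    "certainly here is a summary of the key points",
]
train_labels = ["Human", "Human", "ChatGPT", "ChatGPT"]

model = PairNaiveBayes().fit(train_texts, train_labels)
print(model.predict("here is a summary"))
print(model.predict("new york last summer"))
```

Because each feature is a pair such as `("new", "york")` rather than a single word, the classifier can weight a phrase differently from its individual words. Adding further labels such as "Gemini" or "OtherAI" to the training data would extend the same sketch to source identification.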
Persons
Author (aut): Golchoubian, Seyedeharezou
Thesis advisor (ths): Chen, Liang
Degree committee member (dgc): Jiang, Fan
Degree committee member (dgc): Li, Jianbing
DOI
https://doi.org/10.24124/2024/59574
Organizations
Degree granting institution (dgg): University of Northern British Columbia
Extent
1 online resource (vii, 70 pages)
Physical Description Note
PUBLISHED
Use and Reproduction
author
unbc_59574.pdf (2.11 MB)
20243-Extracted Text.txt (107.83 KB)
| Field | Value |
|---|---|
| Language | English |
| Name | A novel Naive Bayes classifier for detecting AI-generated text using word pair probabilities |
| MIME type | application/pdf |
| File size | 2208324 bytes |