Entropy of printed Bengali language texts.
Digital Document
Abstract
One of the most important sources of information is written and spoken human language. The language that is spoken, written, or signed by humans for general-purpose communication is referred to as natural language. Determining the entropy of natural language text is a fundamentally important problem in natural language processing. The study and analysis of the entropy of a language can be a meaningful resource for researchers in linguistics and communication theory. For this research we have taken printed Bengali text as our source of natural language. We collected a sufficient number of printed Bengali text samples and divided them into two classes, newspaper and literature. We studied each class to determine the entropy specific to each category and analyzed its characteristics. In a separate study, we collected printed religious Bengali texts, divided them into two classes, Islamic and Hindu, computed their entropy, and analyzed their characteristics. From our research, we have found the zero- and first-order entropy of the Bengali language to be 5.52 and 4.55, respectively. The uncertainty and redundancy of the language are 0.8242 and 17.58%, respectively. These entropy and redundancy results should help researchers find better text compression methods for the Bengali language.
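The abstract's figures are linked by simple arithmetic: the reported uncertainty (0.8242) and redundancy (17.58%) sum to 1, which is consistent with the standard definitions uncertainty = H1/H0 and redundancy = 1 - H1/H0. The sketch below illustrates those definitions, assuming they are the ones the thesis uses. The function names are hypothetical, and `first_order_entropy` simply counts single Unicode code points in a Python string, which may not match how the thesis treats Bengali conjuncts and vowel signs.

```python
import math
from collections import Counter


def zero_order_entropy(alphabet_size: int) -> float:
    # H0 = log2(N): every symbol of an N-symbol alphabet assumed equally likely
    return math.log2(alphabet_size)


def first_order_entropy(text: str) -> float:
    # H1 = -sum(p_i * log2(p_i)) over observed single-code-point frequencies
    counts = Counter(text)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def uncertainty_and_redundancy(h0: float, h1: float) -> tuple[float, float]:
    # uncertainty (relative entropy) = H1 / H0; redundancy = 1 - H1 / H0
    ratio = h1 / h0
    return ratio, 1.0 - ratio


if __name__ == "__main__":
    rel, red = uncertainty_and_redundancy(5.52, 4.55)
    print(f"uncertainty = {rel:.4f}, redundancy = {red:.2%}")
    # prints "uncertainty = 0.8243, redundancy = 17.57%"
    print(first_order_entropy("বাংলা"))  # H1 of a single word's code points
```

Using the rounded published values gives 0.8243 and 17.57%; the small gap from the reported 0.8242 and 17.58% is presumably due to the thesis working with unrounded entropies.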
Persons
Author (aut): Pramanik, Subrata
Thesis advisor (ths): Zahir, Saif
DOI
https://doi.org/10.24124/2008/bpgub560
Organizations
Degree granting institution (dgg): University of Northern British Columbia
Library of Congress Classification
Q370 .P73 2008
Extent
Number of pages in document: 72
ISBN
978-0-494-48804-1
Use and Reproduction
Copyright retained by the author.
unbc_15992.pdf (872.13 KB)
23253-Extracted Text.txt (91.55 KB)
Language
English
Name
Entropy of printed Bengali language texts.
MIME type
application/pdf
File size
893066 bytes