Study of document retrieval using Latent Semantic Indexing (LSI) on a very large data set.
Digital Document
Abstract
The primary purpose of an information retrieval system is to retrieve all documents relevant to the user's query. The Latent Semantic Indexing (LSI) based ad hoc document retrieval task investigates the performance of retrieval systems that search a static set of documents using new questions or queries. The performance of LSI has been tested on several smaller datasets (e.g., MED and CISI abstracts); however, LSI has not been tested on a large dataset. In this research, we concentrated on the performance of LSI on a large dataset. The stop word list and the term weighting scheme are two key parameters in information retrieval. We investigated the performance of LSI using three different stop word lists and also without removing stop words from the test collection. We also applied three different term weighting schemes (raw term frequency, log-entropy, and tf-idf) to measure the retrieval performance of LSI. We observed, first, that for an LSI based ad hoc information retrieval system, a tailored stop word list must be assembled for every unique large dataset. Second, the tf-idf term weighting scheme shows better retrieval performance than the log-entropy and raw term frequency weighting schemes, even when the test collection becomes large. (P. ii.)
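As a rough illustration of the three term weighting schemes named in the abstract, the sketch below applies each one to a toy term-by-document count matrix and ranks documents with a rank-k LSI projection. It is not the thesis's implementation: the function names, the toy matrix, the specific tf-idf and log-entropy variants, and the query fold-in step are all assumptions chosen for brevity, and only NumPy is used.

```python
import numpy as np

# Hypothetical helpers; rows are terms, columns are documents.

def raw_tf(counts):
    """Raw term frequency: the counts themselves."""
    return counts.astype(float)

def tf_idf(counts):
    """tf * log(N / df), one common tf-idf variant (assumed here)."""
    n_docs = counts.shape[1]
    df = np.count_nonzero(counts, axis=1)          # documents containing each term
    idf = np.log(n_docs / np.maximum(df, 1))       # guard against df == 0
    return counts * idf[:, None]

def log_entropy(counts):
    """log(1 + tf) local weight times an entropy-based global weight."""
    n_docs = counts.shape[1]                       # must be > 1
    gf = counts.sum(axis=1, keepdims=True)         # global frequency of each term
    p = np.divide(counts, gf,
                  out=np.zeros(counts.shape, dtype=float), where=gf > 0)
    safe_p = np.where(p > 0, p, 1.0)               # log(1) = 0 where p == 0
    entropy = 1.0 + (p * np.log(safe_p)).sum(axis=1) / np.log(n_docs)
    return np.log1p(counts) * entropy[:, None]

def lsi_rank(weighted, query_vec, k):
    """Rank documents by cosine similarity in a rank-k LSI space.

    Assumes k does not exceed the rank of `weighted`.
    """
    U, s, Vt = np.linalg.svd(weighted, full_matrices=False)
    Uk, sk, docs = U[:, :k], s[:k], Vt[:k, :].T    # docs: one row per document
    q = (query_vec @ Uk) / sk                      # fold the query into the space
    norms = np.linalg.norm(docs, axis=1) * np.linalg.norm(q) + 1e-12
    return np.argsort(-(docs @ q) / norms)         # best match first

# Toy data: 4 terms x 4 documents; the query uses only the first term.
counts = np.array([[2, 1, 0, 0],
                   [1, 2, 0, 0],
                   [0, 0, 2, 1],
                   [0, 0, 1, 2]])
query = np.array([1.0, 0.0, 0.0, 0.0])

for scheme in (raw_tf, tf_idf, log_entropy):
    print(scheme.__name__, lsi_rank(scheme(counts), query, k=2))
```

On a collection of realistic size, a sparse truncated SVD would replace the dense `np.linalg.svd` call, and the stop word list would be applied before the count matrix is built.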
Persons
Author (aut): Zaman, A. N. K.
Thesis advisor (ths): Chen, Liang
Thesis advisor (ths): Brown, Charles
DOI
https://doi.org/10.24124/2010/bpgub700
Organizations
Degree granting institution (dgg): University of Northern British Columbia
Library of Congress Classification
QA76.9.T48 Z36 2010
Extent
Number of pages in document: 103
ISBN
978-0-494-60849-4
Use and Reproduction
Copyright retained by the author.
unbc_16069.pdf (1.96 MB)
Language
English
Name
Study of document retrieval using Latent Semantic Indexing (LSI) on a very large data set.
MIME type
application/pdf
File size
2055075 bytes