A MULTIAGENT ARCHITECTURE FOR SEMANTIC QUERY ACCESS TO LEGACY RELATIONAL DATABASES

by Mohammad Zubayer
B.Sc., International Islamic University Malaysia, 2005

THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF MASTER OF SCIENCE IN MATHEMATICAL, COMPUTER, AND PHYSICAL SCIENCES (COMPUTER SCIENCE)

UNIVERSITY OF NORTHERN BRITISH COLUMBIA
November 2011
©Mohammad Zubayer, 2011

Library and Archives Canada, Published Heritage Branch, 395 Wellington Street, Ottawa ON K1A 0N4, Canada
ISBN: 978-0-494-87590-2

NOTICE: The author has granted a non-exclusive license allowing Library and Archives Canada to reproduce, publish, archive, preserve, conserve, communicate to the public by telecommunication or on the Internet, loan, distribute and sell theses worldwide, for commercial or non-commercial purposes, in microform, paper, electronic and/or any other formats. The author retains copyright ownership and moral rights in this thesis. Neither the thesis nor substantial extracts from it may be printed or otherwise reproduced without the author's permission. In compliance with the Canadian Privacy Act some supporting forms may have been removed from this thesis.
While these forms may be included in the document page count, their removal does not represent any loss of content from the thesis.

Abstract

This thesis proposes a novel approach to accessing information stored in legacy relational databases (RDB), based on Semantic Web and multiagent systems technologies. It introduces an architectural model of the Semantic Report Generation System (SRGS), designed to address the rising demand for flexible access to information in decision support systems. SRGS is composed of server Database Subsystems (DBS) and client User Subsystems (US). In a DBS, an agent interacts with the administrator to build a reference ontology from the RDB schema, which enables semantic queries without modifying the database. In a US, the decision-making user accesses the system through a simplified natural language interface, using customized extensions to the reference ontology that was imported from DBS; an agent helps build the custom ontology, and facilitates query formulation and report generation. The proposed approach is illustrated by several scenarios that highlight the key behavioral aspects of accessing information and developing ontologies.
Contents

1 Introduction
2 Background and Related Work
    2.1 Relational database systems
        2.1.1 Knowledge representation using RDB model
        2.1.2 The SQL query language
    2.2 The Semantic Web
        2.2.1 Semantic Web technologies
        2.2.2 Knowledge representation using the RDF model
        2.2.3 The SPARQL query language
        2.2.4 Examples of Semantic Web projects
    2.3 Building ontologies from relational structures
        2.3.1 Converting relational structure to RDF
        2.3.2 Upgrading converted structures to full ontologies
    2.4 Multiagent systems
        2.4.1 Agent-oriented software engineering
        2.4.2 Agents and Semantic Web
        2.4.3 Human-agent interactions
3 Semantic Query Access to Legacy Relational Databases using Intelligent Middleware
4 The Architectural Model
    4.1 The system requirements
        4.1.1 The generic system
        4.1.2 Legacy RDB system
        4.1.3 The Semantic Report Generation System (SRGS)
    4.2 The multiagent architecture of SRGS
        4.2.1 The basic architecture
        4.2.2 Multiple Users accessing single User Subsystem
        4.2.3 Multiple User Subsystems to single Database Subsystem
        4.2.4 Single User Subsystem to multiple Database Subsystems
        4.2.5 Multiple User Subsystems to multiple Database Subsystems
    4.3 Agent roles
    4.4 Incorporation of existing system components
5 Modeling and Accessing Information in SRGS
    5.1 Ontology development by software agents
    5.2 Scenario 1: Accessing information in SRGS
    5.3 Scenario 2: Developing reference ontology
    5.4 Scenario 3: Developing custom ontology
6 Analysis and Evaluation
7 Conclusions and Future Work
Bibliography
A The D2RQ Platform
    A.1 The D2RQ Mapping File
    A.2 D2RQ extension

List of Tables

2.1 Sample RDB tables
2.2 ETL vs on-demand mapping
2.3 Performance comparison between Jena2 and D2RQ Platform
2.4 The main primitives of OWL

List of Figures

2.1 An SQL query
2.2 Semantic Web stack layer (Reproduced from Wikipedia Semantic Web entry)
2.3 RDF graph showing an instructor's record
2.4 A SPARQL query
4.1 The actors and high-level use cases of the generic system
4.2 Legacy RDB system
4.3 Accept request for information and present report
4.4 Manage ontology
4.5 Single User Subsystem to single Database Subsystem
4.6 The User Subsystem
4.7 The Database Subsystem
4.8 Multiple Users accessing SRGS
4.9 Customized User Subsystem for multiple users
4.10 Single User Subsystem attached to multiple Database Subsystems
4.11 Customized User Subsystem for multiple Database Subsystems
4.12 The role of D2RQ Platform in the DBS
5.1 The SNL request for report
5.2 The SPARQL script
5.3 The SQL query
5.4 The SPARQL results
5.5 The formatted report
5.6 (a) Prefix and RDB details in Mapping File (b) XML namespaces and ontology header in reference ontology
5.7 Class definition in: (a) Mapping File and (b) reference ontology
5.8 Subclass definition in reference ontology
5.9 Property definition in: (a) Mapping File and (b) reference ontology
5.10 Class relation definition in: (a) Mapping File and (b) reference ontology
5.11 Class synonym definition in reference ontology
5.12 Reference ontology graph
5.13 Definition of transfer student in custom ontology
A.1 D2R Server error
A.2 The extension
A.3 The D2RQ Platform with the extension
A.4 Binary log processing method
A.5 Query interception method

Chapter 1

Introduction

The importance and impact of computer-based information systems in the progress of human society is well acknowledged in all disciplines.
These systems enable users to create, store, organize, and access large volumes of information. Individuals and organizations increasingly rely on them for problem solving, decision making, and forecasting. The requests for information are increasing in complexity and sophistication, while the time to produce the results is tightening. These trends in using information systems compel researchers to look for more effective access techniques that meet modern requirements.

Computer-based systems rely on databases for storing information. The relational database model (Codd, 1970) has been dominant for more than three decades. In order to extract the necessary information from relational databases (RDB), non-technical users require the technical assistance of database programmers, report writers, and application software developers. These support tasks may be time consuming and involve multiple technical experts, resulting in delays and costs. In order to speed up access and give users more control, decision support systems rely on data warehousing techniques. Those techniques require information to be extracted from operational databases, reorganized in terms of facts and dimensions, and stored in data warehouses (Olszak and Ziemba, 2007). Operational databases are designed to support typical day-to-day operations, whereas data warehouses are designed for analytical processing of large volumes of information accumulated over time. That approach still relies on human technical expertise and may require weeks to effect the restructuring of data. Moreover, it requires accurate foresight as to what information might be needed.

In the meantime, two relevant technologies have developed in the realm of the World Wide Web and Artificial Intelligence. One is the Semantic Web, which is a web of data that enables computers to understand the semantics, or meaning, of information on the Web (Berners-Lee, 1998a).
The development of the Semantic Web entails structuring of information using a set of tools and standards recommended by the World Wide Web Consortium (W3C). This process requires formal representation of human knowledge in the form of a hierarchy of ontologies that correspond to knowledge domains at different levels of abstraction. An ontology is an explicit specification of a conceptualization; a conceptualization is an abstract, simplified view of the world that is represented for some purpose (Gruber, 1993). The Semantic Web infrastructure is in an early stage of construction; it is developing through numerous current projects.

The other novel technology is based on intelligent software agents and multiagent software systems. A software agent is a computer program that is situated in a specific environment and can act autonomously in that environment in order to meet its delegated objectives (Wooldridge, 2009). Multiple interacting agents can form a single system, known as a multiagent system (MAS). Agent technology brings together research results from the last few decades in several disciplines, mainly artificial intelligence, distributed systems, software engineering, economics, psychology, and social sciences. Following decades of research, this is becoming a major trend in mainstream computing and a likely successor to the currently dominant object-oriented software engineering paradigm (Lind, 2000).

Software agents are expected to assist humans in many tasks, including searching and reasoning with information in a decision support environment. However, the information underlying the decision process needs to be organized following Semantic Web structures such as ontologies. Software agents can reason with information in a knowledge base once it is organized using ontologies.
In reasoning with information stored in a database, software agents operate on the Closed World Assumption (CWA), which states that the information in the database is complete, and that what is not asserted as true is false (Russell and Norvig, 2003).

In this thesis, I explore how a combination of these two technologies can be applied to overcome some of the issues arising in the context of traditional decision support environments. I propose a system architecture, the Semantic Report Generation System (SRGS), that relies on Semantic Web tools and software agents to enable effective user access to information in RDB systems without depending on report writers and database programmers. In SRGS, direct access to information is achieved by building a layer of semantic information structures on top of the existing legacy RDB system, and allowing users to interact with the system using a Simplified Natural Language (SNL). SRGS employs a software agent to help the Database Administrator in building ontologies using the structure of information stored in the RDB system, human domain knowledge, and knowledge resources on the Semantic Web. SRGS employs another software agent to help the users create their own layers of ontology by defining user-specific concepts that may not exist in the reference ontology developed from the RDB. This agent also assists the users in formulating requests for information.

My research started with the formulation of system requirements followed by an analysis leading to a preliminary definition of the system architecture, and proceeded to selective modeling of existing Semantic Web and MAS tools that fit into the architecture of SRGS. I carried out these two tasks in an iterative manner, in which I identified existing software components that could be integrated, and then refined the architectural definition so that it could rely on the identified components.
SRGS consists of two subsystems: the Database Subsystem (DBS) and the User Subsystem (US). The DBS facilitates creating, storing, and organizing information while the US allows accessing this information. The US and DBS can reside on different machines and communicate through a network. Within each subsystem there is a software agent that assists the human users in organizing and accessing information. I show three possible configurations of SRGS with regard to the multiplicity of the subsystems: multiple USs to single DBS, single US to multiple DBSs, and multiple USs to multiple DBSs. I also show how the multiplicity of users affects the architecture of SRGS. For instance, when multiple users access the US, some elements in the system are customized to suit each user's preferences for interacting with the system.

In the Database Subsystem (DBS), a software agent named Database Interface Agent (DBIA) assists a Database Administrator (DBA) in gradually building a reference ontology from the RDB structure; this is the common core ontology for all users of SRGS. The software agent possesses the technical know-how of the ontology development process. In addition, the software agent refers to external resources on the Web, such as publicly available libraries of ontologies and online lexical dictionaries, as sources of conceptual, lexical, and domain-specific knowledge. Without the Semantic Web in place, the software agent is limited to its technical knowledge of the ontology development process alone. Thus, the software agent's role evolves from being a technical assistant to a knowledgeable partner in the ontology development process as more domain-specific ontological resources become available with the development of the Semantic Web. The DBS is also responsible for retrieving the necessary information from the RDB system and presenting this information to the US in response to the request.
The User Subsystem (US) accepts a user's request for information, asks the DBS to retrieve the information, and presents it as a formatted report. The user develops a custom ontology, which complements the reference ontology by defining user-specific terms with the assistance of a software agent called User Interface Agent (UIA). The user-system interaction occurs in a simplified natural language. The user formulates requests for information and introduces new terms in the custom ontology using the simplified natural language. The ontologies are used to formulate and verify requests for information, construct semantic queries, and format the extracted information as reports.

The feasibility of the proposed approach is then substantiated using three scenarios illustrating the behavioral aspects of SRGS in accessing information in an RDB system and developing ontologies from the RDB structure. The scenarios show the interactions that occur between the agents and the human actors, the actions performed by the agents, and the tasks executed by the system components. To help develop ontologies, the agents are equipped with meta-ontological knowledge and the know-how of methodological steps for guiding the ontology construction process. In addition, agents refer to external knowledge resources on the Semantic Web. The agents provide the technical knowledge while the human actors make decisions. The ontology construction process begins in the Database Subsystem with a rudimentary version of the reference ontology generated through automatic conversion of an RDB structure to a Semantic Web structure. The converted structure, called the base ontology, then serves as a starting point from which the DBIA, in interaction with the DBA, incrementally develops a full reference ontology.

The first scenario illustrates how the user of SRGS can request information using the SNL.
It illustrates the specific tasks performed by each system component in formulating a request for information, retrieving the requested information from the underlying database, and presenting it to the user as a formatted report. The second scenario is aimed at showing the interactions between the DBIA and the DBA, and the activities that occur within the DBS in the process of constructing the reference ontology. Once the construction completes, the DBS exports a copy of the reference ontology to the attached US. The third scenario focuses on showing how the user can complement the reference ontology by defining user-specific concepts in a custom ontology.

The remaining chapters of the thesis cover the background and related work (Chapter 2), the approach for query access to legacy RDB systems using Semantic Web tools (Chapter 3), the architectural model (Chapter 4), the behavioral aspects of SRGS (Chapter 5), analysis and evaluation (Chapter 6), and conclusions and future work (Chapter 7).

Chapter 2

Background and Related Work

This chapter presents the background and an overview of previous research work in Relational Database systems (Section 2.1), the Semantic Web (Section 2.2), developing ontologies from relational structures (Section 2.3), and multiagent systems (Section 2.4).

2.1 Relational database systems

The wealth of data that populates the Web is stored in legacy information systems; they are socio-technical computer-based systems that were developed in the past using older or obsolete technology. It may be risky to replace a legacy system, because an organization and its organizational policies can be critically dependent on its structure and function. Legacy information systems contain immense volumes of data accumulated over the lifetime of the system (Sommerville, 2004).
Common sources of legacy data include relational, hierarchical, network, and object databases, as well as XML documents and flat files, such as comma-delimited text files (Ambler, 2003). My study focuses on accessing information in relational databases, because of their prevalence in present legacy information systems.

A relational database (RDB) is implemented using the relational data model invented by Codd (1970). It is based on the mathematical term relation, which is represented as a table in the database context. Each relation (table) represents an entity and is made up of named attributes (columns) of that entity, and each row contains one value per attribute (Connolly and Begg, 2001). The relational model has been the dominant data modeling technique because of its track record of scalability, reliability, efficient storage, and optimized query execution (Sahoo et al., 2009). However, one major limitation of the relational model is its inability to capture semantic relationships between data units. In addition, non-technical users of relational databases require the technical assistance of database programmers and database administrators in order to access the necessary information.

The remainder of this section is organized as follows: knowledge representation using the relational model is discussed in Subsection 2.1.1, and a language for querying information in RDB systems is presented in 2.1.2.

2.1.1 Knowledge representation using RDB model

In relational data modeling, an entity is represented as a table, and each attribute of the entity becomes a column in that table. Each row is an instance of the entity and can be uniquely identified by a primary key. Relationships between entities are represented by foreign keys. This logical structure of a database is called the database schema (Connolly and Begg, 2001). Table 2.1 shows a subset of an RDB containing two tables: Department and Instructor. Each department is uniquely identified by Department_Name and each instructor by Instructor_ID. An instructor belongs to only one department and a department can have one to many instructors. Therefore, the Department_Name column is the primary key in the Department table and a foreign key in the Instructor table. This model is for illustrative purposes only, and does not represent any real database.

Table 2.1: Sample RDB tables

Table: Department
Department_Name   Building   Budget
Comp. Sci.        Taylor     100000
Biology           Watson     90000
Finance           Painter    120000

Table: Instructor
Instructor_ID   Last_Name   Salary   Department_Name
12121           Wu          90000    Finance
45565           Katz        75000    Comp. Sci.

2.1.2 The SQL query language

Structured Query Language (SQL) is the standard language for defining and manipulating data stored in RDB systems (Chamberlin and Boyce, 1974; IBM, 2006). Common SQL commands include schema creation and modification, data insert, query, update, and delete. Writing SQL queries requires understanding of the underlying database schema, in addition to knowledge of SQL itself. Figure 2.1 shows a sample SQL query that returns all department names and budgets from the table Department.

    SELECT Department_Name, Budget
    FROM Department;

Figure 2.1: An SQL query

2.2 The Semantic Web

The Semantic Web is a web of data that enables computer systems to understand the semantics, or meaning, of information that populates the Web (Berners-Lee, 1998a). This is in contrast to the current Web, which is a web of documents. The objective of the Semantic Web is to drive the evolution of the current web of documents into a web of data in which users can easily find, share, and combine information. The Semantic Web will enable machines to understand the meaning of information, thus allowing machines to assist human users in finding the right information.
Currently there are many individual projects underway towards developing Semantic Web infrastructure and applications using common formats and technologies recommended by the World Wide Web Consortium (W3C) (Baker et al., 2009). Many of these applications are intended to eventually connect with each other and share information between them.

The information underlying the Semantic Web must be organized according to the meaning of the represented contents. Human knowledge can be represented by organizing information as ontologies. Ontologies are considered as one of the essential parts of the Semantic Web. A university ontology, for instance, would define concepts such as faculty, student, department, course, project, etc., and how they are related to each other. At the most basic level of an ontology, concepts are represented as classes, and various attributes of a concept are represented as properties, and a class can have subclasses representing concepts that are more specific than the parent class. For example, a student class may have two subclasses: undergraduate student and graduate student. In addition to classification, one can define relationships between classes in an ontology. Thus, ontologies provide the structural framework for organizing and reasoning with information within a particular domain. In addition, upper ontologies, which describe general concepts that are the same across all knowledge domains, provide the functionality of semantic interoperability between multiple domain ontologies (Noy and McGuinness, 2001).

In the rest of this section, I briefly review the main technical concepts underlying the Semantic Web architecture as envisioned in the W3C standards. Subsection 2.2.1 provides the basic definitions. Subsection 2.2.2 describes the knowledge representation model, and 2.2.3 presents a corresponding query language for retrieving information.
Finally, 2.2.4 outlines three early Semantic Web projects.

2.2.1 Semantic Web technologies

The development of the Semantic Web entails restructuring of information using a set of languages and standards. The Semantic Web architecture is illustrated in Figure 2.2. The main components in the Semantic Web stack are Uniform Resource Identifier (URI), Resource Description Framework (RDF), RDF Schema (RDFS), SPARQL Protocol and RDF Query Language (SPARQL), and Web Ontology Language (OWL).

[Figure 2.2: Semantic Web stack layer (Reproduced from Wikipedia Semantic Web entry). Layers shown: User Interface and Applications; Trust; Proof; Unifying Logic; Ontologies: OWL; Querying: SPARQL; Rules: RIF/SWRL; Taxonomies: RDFS; Data interchange: RDF; Syntax: XML; Identifiers: URI; Character set: UNICODE.]

URI provides a mechanism for uniquely identifying each information resource on the Web. A special type of URI is the Uniform Resource Locator (URL), which uniquely identifies the location of a resource, such as a web page, within the World Wide Web (Sauermann et al., 2008). Internationalized Resource Identifier (IRI) is a generalization of URI that may contain characters from the Universal Character Set, including Chinese, Japanese, and Korean characters. RDF is a data modeling language which conceptualizes a data unit as a resource in terms of its properties and property values. Each resource is uniquely identified by its URI (Manola and Miller, 2004). RDFS provides the basic primitives such as classes and properties for structuring RDF resources. SPARQL is a language for querying RDF data, analogous to the way SQL is used for querying relational data (Prud'hommeaux and Seaborne, 2007). OWL is a knowledge representation language for authoring ontologies. It also facilitates reasoning and inference. OWL is based on RDF and RDFS (McGuinness and Harmelen, 2004).
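
To make the role of the RDFS class primitives more concrete, the following is a minimal Python sketch of subclass reasoning over a handful of triples, reusing the student example from earlier in this section. It is an illustration only: the `ex:` names (including `ex:Person` and `ex:alice`) are hypothetical, the triples are plain tuples rather than real RDF, and the two helper functions are not part of any Semantic Web library.

```python
RDF_TYPE = "rdf:type"
SUBCLASS_OF = "rdfs:subClassOf"

# Hypothetical university vocabulary; all ex: names are illustrative only.
triples = {
    ("ex:UndergraduateStudent", SUBCLASS_OF, "ex:Student"),
    ("ex:GraduateStudent", SUBCLASS_OF, "ex:Student"),
    ("ex:Student", SUBCLASS_OF, "ex:Person"),
    ("ex:alice", RDF_TYPE, "ex:GraduateStudent"),
}


def superclasses(cls, triples):
    """Return every class reachable from cls by following rdfs:subClassOf."""
    found = set()
    frontier = [cls]
    while frontier:
        current = frontier.pop()
        for s, p, o in triples:
            if s == current and p == SUBCLASS_OF and o not in found:
                found.add(o)
                frontier.append(o)
    return found


def types_of(resource, triples):
    """Return the asserted classes of a resource plus all their superclasses."""
    asserted = {o for s, p, o in triples if s == resource and p == RDF_TYPE}
    inferred = set()
    for cls in asserted:
        inferred |= superclasses(cls, triples)
    return asserted | inferred


alice_types = types_of("ex:alice", triples)
print(sorted(alice_types))
```

Only one type is asserted for `ex:alice`, yet the sketch also reports her membership in the parent classes. This is the kind of inference that a class hierarchy makes available to a software agent.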
Discussing all the components of the Semantic Web in detail is beyond the scope of this chapter. I focus on the components that are relevant to my research objective with regard to representing RDB structures in Semantic Web structures. In the following subsection, I illustrate how some of these components can be used to represent and access information in the Semantic Web compliant format.

2.2.2 Knowledge representation using the RDF model

RDF is based on the idea of describing an entity in terms of properties and property values. An RDF statement consists of an entity (the subject), a property (the predicate), and a property value (the object). This subject-predicate-object expression, also written as (S, P, O), is known as a triple. The complete description of an entity consists of a collection of triples called an RDF graph. The entity's class, which is the table name in the relational model, is described by an RDF triple containing the rdf:type predicate. rdf:type is used to state that a resource is an instance of a class. For example, a triple of the form R rdf:type C states that R is an instance of C, and C is an instance of rdfs:Class (Brickley and Guha, 2004). RDF mandates that each subject and predicate must be URIs; the object can be a URI or an actual value.

Relational databases can be converted into RDF triples by following the core guidelines outlined by Berners-Lee (1998b). He proposed the following direct mappings between RDB and RDF:

• An RDB record (row) is an RDF subject
• The column name of an RDB is an RDF predicate
• An RDB table cell is an RDF object

Figure 2.3 shows the RDF representation of an instructor's record from the previous example of the relational model. The row for instructor 45565 in the Instructor table can be written as "45565 has Last_Name which is Katz". This statement becomes an RDF triple when written in the form (S, P, O), in other words, (45565, Last_Name, Katz).
However, S and P must be URIs; hence the URI http://localhost:8080/resource/Instructor/45565 is assigned to 45565, and http://localhostvocab/resource/Instructor_Last_Name to Last_Name. Therefore, the correct triple is

    (http://localhost:8080/resource/Instructor/45565, http://localhostvocab/resource/Instructor_Last_Name, "Katz")

In Figure 2.3, each triple corresponds to a single arc with its beginning node as the subject, arc label as the predicate, and ending node as the object.

RDF refers to a set of URIs as a vocabulary (Manola and Miller, 2004). An organization may define its own vocabulary consisting of the terms it uses in its business. In our example, such terms can be http://localhostvocab/resource/Instructor for Instructor, and http://localhostvocab/resource/Instructor_Last_Name for Last_Name.

[Figure 2.3: RDF graph showing an instructor's record. The graph depicts instructor 45565 (rdf:type Instructor, Last_Name Katz, Salary 75000) and the department Comp. Sci. (rdf:type Department, Budget 100000).]

An organization might as well take advantage of an external vocabulary instead of defining its own. Friend of a Friend's (FOAF) Vocabulary Specification (Brickley and Miller, 2005) and Dublin Core's Metadata Terms (Powell et al., 2007) are examples of such vocabularies. In our example of the RDF model, one can use FOAF's http://xmlns.com/foaf/spec/#term_lastName instead of using our own http://localhostvocab/resource/Instructor_Last_Name to refer to the term Last_Name.

Constructing RDF statements with URI predicates instead of character strings offers two main benefits. First, it minimizes the practice of using different terms to refer to the same thing.
For instance, a database designer may use attribute names such as Family Name or Second Name to refer to someone's last name. In order to avoid the use of multiple attribute names, FOAF's Vocabulary Specification provides a unique identifier, http://xmlns.com/foaf/spec/#term_lastName, to refer to a person's last name. This mechanism forces a designer to use the same URI predicate for all occurrences of a last name. Second, the use of URIs in RDF triples supports development and use of shared vocabularies on the Web.

2.2.3 The SPARQL query language

SPARQL (Prud'hommeaux and Seaborne, 2007) is the standard query language for RDF data. A SPARQL query is made up of a set of triple patterns, each containing a subject, a predicate, and an object. Each of the subjects, predicates, or objects in a query can be a variable. A SPARQL query processor searches for a set of triples that match the triple patterns specified in a query, binding the variables in the query to the corresponding part of each triple. Figure 2.4 shows a SPARQL query that returns all department names and budgets from the table Department.

    PREFIX vocab:
    SELECT ?department_name ?budget
    WHERE {
        ?department a vocab:department.
        ?department vocab:Department_Department_Name ?department_name.
        ?department vocab:Department_Budget ?budget.
    }

Figure 2.4: A SPARQL query

SPARQL supports querying semi-structured and ragged data (data in unpredictable and unreliable structure) and querying disparate data sources in a single query. However, SPARQL as specified by Prud'hommeaux and Seaborne (2007) does not support aggregate and group functions. SPARQL is a very young query language compared to SQL and is still maturing. There are other alternative query languages such as RDF Data Query Language (RDQL) (Seaborne, 2004) and RDF Query Language (RQL) (Karvounarakis et al., 2002).
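
The two ideas above, the direct row/column/cell mapping and the matching of triple patterns, can be sketched together in a few lines of Python. This is a simplified illustration under stated assumptions, not the mechanism of any particular tool: the data is that of Table 2.1 loaded into an in-memory SQLite database, the URI prefixes under `http://example.org/` are hypothetical, and `table_to_triples`, `unify`, and `match` are helper names invented for this sketch rather than functions of a SPARQL engine.

```python
import sqlite3

# Hypothetical URI prefixes chosen for this sketch only.
RESOURCE = "http://example.org/resource/"
VOCAB = "http://example.org/vocab/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# Load the Department table of Table 2.1 into an in-memory database.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Department (
        Department_Name TEXT PRIMARY KEY, Building TEXT, Budget INTEGER);
    INSERT INTO Department VALUES
        ('Comp. Sci.', 'Taylor', 100000),
        ('Biology', 'Watson', 90000),
        ('Finance', 'Painter', 120000);
""")


def table_to_triples(conn, table, key):
    """Direct mapping: row -> subject, column name -> predicate, cell -> object."""
    cursor = conn.execute(f"SELECT * FROM {table}")
    columns = [description[0] for description in cursor.description]
    triples = []
    for row in cursor:
        record = dict(zip(columns, row))
        subject = f"{RESOURCE}{table}/{str(record[key]).replace(' ', '_')}"
        triples.append((subject, RDF_TYPE, f"{VOCAB}{table}"))
        for column, value in record.items():
            triples.append((subject, f"{VOCAB}{table}_{column}", value))
    return triples


def unify(pattern, triple, binding):
    """Extend binding so that pattern matches triple; return None on mismatch."""
    result = dict(binding)
    for term, value in zip(pattern, triple):
        if isinstance(term, str) and term.startswith("?"):
            # A variable: bind it, or check it against its earlier binding.
            if result.setdefault(term, value) != value:
                return None
        elif term != value:
            return None
    return result


def match(triples, patterns):
    """Return all variable bindings that satisfy every triple pattern."""
    solutions = [{}]
    for pattern in patterns:
        extended = []
        for binding in solutions:
            for triple in triples:
                candidate = unify(pattern, triple, binding)
                if candidate is not None:
                    extended.append(candidate)
        solutions = extended
    return solutions


triples = table_to_triples(conn, "Department", "Department_Name")
solutions = match(triples, [
    ("?department", RDF_TYPE, VOCAB + "Department"),
    ("?department", VOCAB + "Department_Department_Name", "?department_name"),
    ("?department", VOCAB + "Department_Budget", "?budget"),
])
result = sorted((s["?department_name"], s["?budget"]) for s in solutions)
print(result)
```

The three patterns mirror the structure of the query in Figure 2.4, and the bindings they produce contain the same names and budgets that the SQL query of Figure 2.1 returns from the relational table. A real query processor would use indexes and query planning instead of the nested loops shown here.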
2.2.4 Examples of Semantic Web projects

I discuss three projects that have made significant contributions to the development of the Semantic Web. These projects, however, do not constitute the entire Semantic Web as depicted in Figure 2.2.

FOAF

One of the earliest implementations of a Semantic Web application is the Friend of a Friend (FOAF) project (Graves et al., 2007). FOAF creates a web of machine-readable pages that describe people, the links between them, and the things that they are interested in. Brickley and Miller (2005) have defined the FOAF vocabulary specification, which includes the basic classes of entities, such as person, organization, group, and document, and the types of links that exist between these entities. FOAF also takes advantage of Dublin Core (DC) metadata (Powell et al., 2007) for adding semantic annotation to its entities. FOAF continues to solve several problems of identity management on the Web. The University of North Carolina at Chapel Hill has applied the FOAF approach to model the structure of its IT department, making it possible to search staff-related information in seconds (Graves et al., 2007).

DBpedia

The traditional Web is based upon the idea of linking documents through hyperlinks. The Semantic Web, on the other hand, is based upon the idea of linking data units. Bizer (2009) shows that links at a lower level of granularity, i.e., the data level, make it possible to crawl the data space and provide expressive query capabilities, much as a database is queried today. Bizer's point is well demonstrated in the DBpedia project (Bizer et al., 2007), a community effort to extract structured information from Wikipedia and make this information accessible in such a way that users can ask complex questions such as "List all scientists that were born in the 20th century in Canada." The DBpedia knowledge base currently describes more than 2.6 million entities.
It has been linked to other data sources on the Web, which has made it a central interlinking hub for the emerging Web of data (Bizer et al., 2009).

Kngine

Kngine (ElFadeel and ElFadeel, 2008), known as a Web 3.0 search engine, is a semantic search engine designed to understand the meaning of users' queries and return precise results. Depending on the nature of the query, Kngine shows results in visual representations such as graphs, comparison tables, maps, and images. For example, searching for '3G cellphones' returns a list of all 3G network compatible mobile phones along with their pictures and relevant details. The results can be filtered by selecting one or more properties of 3G cellphones: year of release, brand name, operating system, camera resolution, and CPU type. Kngine does this by discovering the relationships between the keywords and concepts, and by linking different types of information together.

These examples show how Semantic Web technologies add new dimensions to the way one can access and use information on the Web. In order to do this at a large scale, the vast majority of the relational data that powers the Web needs to be exposed as Semantic Web structures.

2.3 Building ontologies from relational structures

We have seen in Section 2.1 that one of the major limitations of the relational model is its inability to capture any semantic relationships between data. Section 2.2 shows that Semantic Web knowledge representation techniques can overcome this limitation. Data presented in RDF structures includes the semantic relationships; however, it does not capture the domain knowledge that is associated with the data. To include the domain knowledge, the converted RDF structure must evolve into an ontology. Relevant research on converting relational to RDF structures is discussed in Section 2.3.1, and the development of full ontologies is discussed in Section 2.3.2.
2.3.1 Converting relational structure to RDF

There has been a great deal of research on mapping information stored in relational databases to RDF. Sahoo et al. (2009) list two main approaches to mapping relational data to RDF: Extract-Transform-Load (ETL) mapping and on-demand mapping. The ETL process takes relational data as source input and delivers equivalent RDF triples as output. On-demand mapping takes a SPARQL query as input, translates it to an equivalent SQL query, executes the SQL query on relational data, and translates the SQL query results to SPARQL query results (in the form of RDF triples) as output. The strengths and weaknesses of both approaches are summarized in Table 2.2 (Sahoo et al., 2009).

Table 2.2: ETL vs on-demand mapping

ETL mapping
  Strengths:
    1. Faster query execution.
    2. Reduced arbitrary performance demand on the source RDB.
  Weaknesses:
    1. Querying a large RDF dataset may not be as fast as querying an equal amount of RDB data.
    2. SPARQL query results may not reflect the most recent data.
    3. Duplicate copies of the data must be managed in two models.

On-demand mapping
  Strengths:
    1. Query results are based on the most recent data values.
    2. Data retrieval is based on the RDB, and RDB outperforms RDF for analytic queries.
  Weaknesses:
    1. Arbitrary performance demand on the source RDB may affect the performance of legacy information systems.

On-demand mapping is the widely preferred method, primarily because it allows access to the most up-to-date information and does not burden one with the task of maintaining another version of the same information. An example of on-demand mapping is Virtuoso Universal Server (Erling and Mikhailov, 2007), which converts all primary keys and foreign keys of an RDB into Internationalized Resource Identifiers (IRIs), assigns a predicate IRI to each column, and assigns an rdf:type predicate to each row, linking it to an RDF class IRI corresponding to the table.
It then takes each column that is part of neither a primary key nor a foreign key, and creates a triple consisting of the primary key IRI as subject, the column IRI as predicate, and the column's value as object. This mapping process allows relational data to be rendered as virtual RDF graphs and accessed using SPARQL queries.

A second example of on-demand mapping is SquirrelRDF (Steer, 2009), a prototype tool that allows SPARQL queries on non-RDF data sources such as RDBs or Lightweight Directory Access Protocol (LDAP) servers. It automatically generates a mapping file that exposes an RDB schema as an RDF view. Gray et al. (2009) note that the automatically generated RDF views require manual editing to maintain their referential integrity.

There exist tools that provide both ETL and on-demand mapping services. One of these tools is the D2RQ Platform developed by Bizer and Seaborne (2004), which allows one either to convert an entire RDB into a set of RDF triples, or to access an RDB as virtual, read-only RDF triples. The D2RQ Platform consists of two main components: the D2RQ Engine and the D2R Server. The D2RQ Engine relies on Jena (McBride et al., 2010) and Sesame (Broekstra et al., 2002), which are frameworks for storing and querying RDF data. The D2RQ Engine works as a plug-in for the Jena or Sesame Semantic Web toolkits by rewriting Jena or Sesame API calls and SPARQL queries to SQL queries using its D2RQ mapping language. The results of these SQL queries are then transformed into RDF triples and passed on to the D2R Server for publishing on the Semantic Web.

D2RQ performance was compared to the performance of the Jena2 database back end using a dataset of 200,000 paper descriptions from the DBLP Computer Science Bibliography. Jena2, which is a subsystem of Jena, stores RDF triples using a relational database. Query execution time was measured in milliseconds. The find(s p o) query was run on both platforms.
This query is a minimal SPARQL query used for experimental purposes. The parameter 's' represents the subject, 'p' represents the predicate, and 'o' represents the object. The '?' parameter matches any value in that slot. For instance, if 's' denotes 'books', 'p' denotes 'has authors', and 'o' denotes 'authors names', find(? ? Tanenbaum) will return all books authored by Tanenbaum. The results from the performance comparison test by Bizer and Seaborne (2004) are shown in Table 2.3.

Table 2.3: Performance comparison between Jena2 and D2RQ Platform

find(s p o) query    Jena2       D2RQ
1. find(s ? ?)       1.83 ms     0.01 ms
2. find(? p o)       1.94 ms     0.97 ms
3. find(? p ?)       42431 ms    72 ms
4. find(? ? o)       1.72 ms     3.23 ms

As seen in the performance results, D2RQ executes three of the four query patterns faster than the Jena2 implementation, most notably find(? p ?); Jena2 is faster only for find(? ? o). Sequeda et al. (2008), however, note that in the D2RQ approach, mapping between a relational schema and an existing ontology requires one to manually specify the classes and the hierarchies between classes using an ontology editor. In addition, Bizer and Cyganiak (2007) listed a number of limitations of the D2RQ Platform: it does not support integration of multiple RDBs or other data sources; it does not allow data manipulation; and it does not provide any inference capability.

SquirrelRDF and the D2RQ Platform still remain prototype tools for RDB-to-RDF mapping. Gray et al. (2009) show that these prototypes fail to expose large science archives stored in relational format. The authors tested several RDB-to-RDF mapping tools, including D2RQ and SquirrelRDF, by executing queries over a sample of a large astronomical data set, and concluded that more research and improvements are required before SPARQL and RDB-to-RDF mapping tools can expose science archive data. They tested 18 standard scientific SQL queries, of which only 9 could be expressed in SPARQL.
SPARQL also does not support mathematical functions such as aggregate and trigonometric functions.

The mapping tools and techniques that have been discussed so far are intended for converting data from relational structure to RDF structure. Semantic Web researchers have also attempted to create tools for converting an RDB schema to an ontology. For instance, Sequeda et al. (2009) created Ultrawrap, an automatic wrapping system that generates an OWL ontology from an RDB schema. The authors, however, doubt whether the result of a purely syntax-driven translation of an RDB schema to OWL can qualify as a comprehensive ontology. Therefore, they use the term putative ontology to describe the resulting OWL ontology. The putative ontology works as a basis for the user to write SPARQL queries. Ultrawrap then natively translates SPARQL queries to SQL queries and uses an SQL optimizer to execute the SQL queries on the RDB system. It does not provide any reasoning capabilities.

Mapping a relational database to an ontology is a challenging task. Cullot et al. (2007) have developed a prototype tool called DB2OWL to create an ontology from a relational database. Its mapping process classifies RDB tables into three different cases to determine which ontology structures are to be created from which database components. Tables in the RDB are mapped to OWL classes and subclasses; the table case determines whether a table is mapped to an OWL class or a subclass. RDB columns are then mapped to OWL properties. Primary key and foreign key relationships are mapped to object properties in order to preserve referential integrity. During the mapping process, a mapping file is generated and used to translate ontological queries into SQL queries and retrieve the corresponding instances.

The previous examples mostly show general mapping of relational to RDF structure regardless of the domain of the data.
There has also been research in building tools for mapping domain-specific data. For example, Byrne (2008) shows a general mechanism for converting cultural heritage data from relational databases to RDF triples. The author created a triplestore (a specialized database for the storage and retrieval of RDF triples) named Tether from the Royal Commission on the Ancient and Historical Monuments of Scotland (RCAHMS) database of around 250,000 historical sites with 1.5 million archives and bibliographic materials.

2.3.2 Upgrading converted structures to full ontologies

The previous section discussed techniques for converting relational to RDF structures, from which a list of terms and their properties can be extracted. However, the converted structures do not capture the inherent knowledge in the data that often resides in a data dictionary or in the mind of the DBA. In order to capture the domain knowledge, the converted structures need to be upgraded into full ontologies.

Noy and McGuinness (2001) state that there is no prescribed way or methodology for developing ontologies; the best solution always depends on the application in mind and the extensions that follow. The authors recommend the following seven guiding steps in developing an ontology:

Step 1: Determine the domain and scope of the ontology
The development process of an ontology starts with the definition of its domain and scope. In this step, the developer can ask some basic questions such as: (i) What is the domain that the ontology will cover? (ii) For what purpose is the ontology going to be used? (iii) For what types of questions should the information in the ontology provide answers? (iv) Who will use and maintain the ontology? The answers to these questions help determine the domain and limit the scope of the ontology.
Step 2: Consider reusing existing ontologies
The developer should check what others have done and whether it is possible to refine and extend existing ontologies instead of creating one from scratch. Reusing ontologies is important because creating an ontology for each application defeats the purpose of sharing knowledge. There are ontology libraries publicly available on the Web, such as Ontolingua Server (2008) and DAML Ontology Library (2004), whose ontologies can be imported into an ontology development environment.

Step 3: Enumerate important terms in the ontology
Create a list of all terms that one would like to make statements about. The terms can be formulated by asking some basic questions such as: (i) What are the terms that one would like to talk about? (ii) What properties do these terms have? (iii) What would one like to say about these terms? It is important to create a comprehensive list of all terms.

Step 4: Define classes and the class hierarchy
There are three approaches to defining a set of classes: a top-down development process starts with the definition of the most general concepts in the domain, followed by subsequent specialization of the general concepts; a bottom-up development process starts with the definition of the most specific concepts, followed by subsequent grouping of these concepts into more general concepts; and a combination development process is a blend of the top-down and bottom-up processes, which starts with the definition of the more notable concepts and then proceeds with appropriate generalization and specialization of the remaining concepts.
Whichever approach is followed, one should start by defining classes. From the terms listed in Step 3, those that describe concepts having independent existence are defined as classes in the ontology. The classes are then organized into a hierarchical structure.
The class hierarchy represents an 'is-a' relation: a class A is a subclass of B if every instance of A is also an instance of B.

Step 5: Define the properties of the classes (slots)
The properties of a class describe the internal structure of the class. Some of the terms formulated in Step 3 are defined as classes in Step 4; most of the remaining terms are then defined as properties of those classes. For a given property, one must determine which class it describes. Thus properties become slots attached to their respective classes. All subclasses inherit the slots of their parent class. For example, if a slot called Last name is added to the class Person, and Student is a subclass of Person, then Student will inherit the slot Last name.

Step 6: Define the facets of the slots
A slot can have different facets describing the value types, allowable values, cardinality, etc. Value types describe what types of values can fill in the slot. Common value types are string, number, and boolean. Slot cardinality defines the minimum and maximum number of values a slot can have.

Step 7: Create instances
In the final step, individual instances of each class are created. This is done by first choosing a class; second, creating an individual instance of that class; and third, filling in the slot values.

The ontology development process should not stop with completing the final step. Rather, it should be an iterative one in which a basic ontology is first created, and then revised and refined to fill in the missing pieces.

There are a variety of languages for developing ontologies. RDF Schema (RDFS), which is an extension of the RDF language, introduces basic ontological primitives such as class, subclass, domain, and range that are used to define concepts and their relationships in an ontology.
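As a rough illustration of Steps 4 to 7 above, and of the class and subclass primitives just mentioned, the following Python sketch models classes, inherited slots, slot facets, and instances with plain data structures. It is only a sketch under stated assumptions: the names OntClass and Slot, the slot names, and the sample values are hypothetical and do not correspond to any ontology language or library.

```python
# Sketch of Steps 4 to 7 using plain Python data structures (illustration
# only; OntClass and Slot are hypothetical names, not a library API).
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Slot:
    name: str
    value_type: type          # facet: value type (Step 6)
    min_cardinality: int = 0  # facet: minimum number of values
    max_cardinality: int = 1  # facet: maximum number of values


@dataclass
class OntClass:
    name: str
    parent: Optional["OntClass"] = None
    own_slots: dict = field(default_factory=dict)

    def add_slot(self, slot):
        """Step 5: attach a slot to this class."""
        self.own_slots[slot.name] = slot

    def all_slots(self):
        """Own slots plus the slots inherited from all ancestors."""
        inherited = self.parent.all_slots() if self.parent else {}
        return {**inherited, **self.own_slots}

    def is_a(self, other):
        """'is-a' relation: True if self is `other` or one of its descendants."""
        cls = self
        while cls is not None:
            if cls is other:
                return True
            cls = cls.parent
        return False

    def create_instance(self, **values):
        """Step 7: create an instance after checking the slot facets.

        Each slot value is given as a list so that cardinality can be checked.
        """
        slots = self.all_slots()
        unknown = set(values) - set(slots)
        if unknown:
            raise ValueError(f"unknown slots: {sorted(unknown)}")
        for name, slot in slots.items():
            filled = values.get(name, [])
            if not slot.min_cardinality <= len(filled) <= slot.max_cardinality:
                raise ValueError(f"cardinality violated for slot {name}")
            for value in filled:
                if not isinstance(value, slot.value_type):
                    raise TypeError(f"slot {name} expects {slot.value_type.__name__}")
        return {"class": self.name, **values}


# Step 4: classes and hierarchy; Step 5: slots; Step 6: facets.
person = OntClass("Person")
person.add_slot(Slot("last_name", str, min_cardinality=1))
student = OntClass("Student", parent=person)
student.add_slot(Slot("student_id", int))

# Step 7: an instance of Student fills the inherited slot last_name.
adam = student.create_instance(last_name=["Katz"], student_id=[45565])
```

Because Student is declared with Person as its parent, the required slot last_name is checked for every Student instance even though it was attached only to Person.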
The W3C has adopted OWL (McGuinness and Harmelen, 2004), which uses RDF and XML syntax and provides Description Logic (DL) based reasoning support, as the standard ontology language for the Semantic Web. McGuinness and Harmelen provide a comprehensive list of OWL primitives. The main primitives are summarized in Table 2.4.

There exist a number of tools for developing and managing ontologies. Protege (Horridge et al., 2009) is an open source ontology editor which allows one to create ontologies and export them into various formats, including RDFS and OWL. Fensel et al. (2001) extended RDFS to the Ontology Inference Layer or Ontology Interchange Language (OIL), an ontology infrastructure which includes the definition of a formal semantics based on DL, an ontology editor, and inference engines for reasoning capabilities.

Protege and OIL are widely used tools for authoring ontologies from scratch or modifying existing ontologies; however, they do not provide any integrative function to work with an RDB-to-RDF mapping tool. TopBraid Composer (TBC) (TopQuadrant Inc., 2007) provides an integrative function for connecting to an RDB through the D2RQ Platform. TBC is an application development framework which provides a comprehensive set of tools covering the life cycle of semantic application development. TBC's integrative feature allows one to import an RDB structure into TBC as a base ontology and modify it towards building a more comprehensive ontology. RDB table names are represented as classes and column names are represented as datatype properties. Primary key and foreign key relationships are represented as object properties. One can extend existing classes with superclasses and subclasses and specify the properties that the subclasses inherit from the superclasses. Ontologies created using TBC can also be exported to RDFS and OWL formats.

Table 2.4: The main primitives of OWL

owl:Class: A class is a group of entities that share some common characteristics. Classes can be organized in a hierarchical order ranging from general to specialized classes. In a class hierarchy, a general class is known as a parent class, and a specialized class is called a subclass.
rdfs:subClassOf: A specialized class can be defined as a subclass of its parent class. For example, the class Student can be stated as a subclass of Person. This subclass definition allows a reasoner to deduce that if an entity is a Student then it is also a Person.
rdf:Property: A property asserts general facts about the members of classes and specific facts about individuals. There are two types of properties: datatype properties and object properties. Datatype properties are relations between instances of classes and RDF literals or XML Schema datatypes, whereas object properties are relations between instances of two classes.
rdfs:subPropertyOf: Property hierarchies can be created by stating that a property is a subproperty of another property.
rdfs:domain: A domain of a property limits the individuals to which the property can be applied. If a property relates an individual to another individual, and the property has a class as one of its domains, then the individual must belong to the class.
rdfs:range: The range of a property limits the individuals that the property may have as its value. If a property relates an individual to another individual, and the property has a class as its range, then the other individual must belong to the range class.
rdf:ID: The instances of classes are declared using this primitive.
owl:equivalentClass: Two classes may be stated as equivalent classes if they have the same instances. Equality can be used to create synonymous classes.
owl:equivalentProperty: Two properties may be stated as equivalent properties. Equality may be used to create synonymous properties.
owl:sameAs: Two instances may be stated to be the same. An instance can be identified by a number of different names using this primitive.

Ontology languages use certain types of logic to support reasoning. A reasoning engine can infer logical consequences from a set of asserted facts, and the inference results vary depending on which of two assumptions is in place. The Closed World Assumption (CWA) states that "databases (and people) assume that the information provided is complete, so that ground atomic sentences not asserted to be true are assumed to be false" (Russell and Norvig, 2003). For example, assuming that a student database contains information about all students, if the name 'Adam' is not found in the database, a reasoning engine will conclude that 'Adam' is not a student. The CWA is often complemented by the Unique Name Assumption (UNA), which assumes that names in a knowledge base are unique and refer to distinct instances. The Open World Assumption (OWA), on the other hand, assumes that the descriptions of resources are not confined to a single knowledge base or scope (Smith et al., 2004). In other words, a reasoner cannot conclude that a statement is false from the absence of that statement alone.

OWL is designed for the purpose of defining Web ontologies. The Web is an open and dynamic environment in which information continues to evolve, and at any point in time one cannot assume its completeness. Therefore, the OWA is more appropriate when reasoning with information presented on the Web. Ricca et al. (2009) argue that OWL's OWA is unsuited for modeling enterprise ontologies because they evolve from relational databases, where both the CWA and the UNA are mandatory. In addition, the presence of naming conventions in most enterprises can guarantee uniqueness of names, which makes the
UNA relevant. The authors have proposed an ontology representation language called OntoDLP, which extends Answer Set Programming (ASP) with the main features that are relevant to ontologies, such as classes, inheritance, relations, and axioms. ASP is a kind of logic programming with negation as failure that works by translating the logic program into ground form and then searching for answer sets (Russell and Norvig, 2003). OntoDLP is used by OntoDLV, a system that facilitates the specification of, and reasoning with, enterprise ontologies.

The mapping tools and techniques that I have presented in Section 2.3.1 can fully automate the process of converting relational to RDF structure. Beyond this point, there is little or no automated assistance for upgrading the converted RDF structures to full ontologies. The available tools for upgrading RDF structures to ontologies require human expertise with considerable understanding of the ontology development process.

2.4 Multiagent systems

Wooldridge (2009) defines an agent as "a computer system that is situated in some environment, and that is capable of autonomous action in this environment in order to meet its delegated objectives". Although an agent may operate alone in an environment and, when necessary, interact with its users, in most cases agent systems consist of multiple agents. These multiagent systems can model complex software systems in which individual agents interact and collaborate to achieve a common goal or compete to serve their self-interests (Bellifemine et al., 2007).

Wooldridge and Jennings (1995) suggest three capabilities of an intelligent agent:

1. Reactivity: Intelligent agents are able to perceive their environment, and respond in a timely fashion to changes that occur in it in order to satisfy their design objectives.
2. Proactiveness: Intelligent agents are able to exhibit goal-directed behavior by taking the initiative in order to satisfy their design objectives.
3. Social Ability: Intelligent agents are capable of interacting with other agents (and possibly humans) in order to satisfy their design objectives.

In order to achieve these capabilities, agents are modeled with mental abilities such as beliefs, desires, and intentions. This is because humans use these concepts as an abstraction mechanism for understanding the properties of a complex system. Developing machines with such mental qualities was first proposed by McCarthy (1979). Shoham (1993) then articulated the idea of programming software systems in terms of mental states.

The remainder of this section is organized as follows: agent-oriented software engineering is discussed in Section 2.4.1; how the Semantic Web relates to agents is presented in Section 2.4.2; and issues related to agent-human interactions are discussed in Section 2.4.3.

2.4.1 Agent-oriented software engineering

The agent-oriented approach in software engineering is a new software paradigm that models a complex system as a collection of autonomous, proactive, and social agents. Shoham (1993) first introduced the concept of Agent Oriented Programming (AOP) as a new software development paradigm which can be viewed as a specialization of Object Oriented Programming (OOP). Though both paradigms may appear similar from a theoretical perspective, they have visible differences. Wooldridge (2009) lists three distinctions between AOP and OOP. First, agents exhibit autonomous behaviors while objects depend on external invocation. Agents enjoy the freedom to make their own decision whether or not to perform an action. Second, agents are reactive, proactive, and social, whereas the object-oriented model has nothing to do with these behaviors. Third, in a multiagent system, each agent is considered to have its own thread of control, whereas in a standard object-oriented model the system has a single thread of control.
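The first of these distinctions, autonomy, can be illustrated with a small Python sketch. It contrasts an object whose method runs whenever it is invoked with an agent-like entity that decides for itself whether to act on a request. The class names ReportObject and ReportAgent, the permission rule, and the topics are all hypothetical; the sketch does not model the reactive, proactive, or social behaviors, nor separate threads of control.

```python
# Illustrative contrast between a passive object and an agent-like entity.
# Class names and the refusal rule are hypothetical.


class ReportObject:
    """Object-oriented style: the caller controls execution."""

    def generate(self, topic):
        # The method runs whenever it is invoked; the object has no say.
        return f"report on {topic}"


class ReportAgent:
    """Agent-oriented style: the executor decides whether to act."""

    def __init__(self, permitted_topics):
        self.permitted_topics = set(permitted_topics)

    def request(self, topic):
        # The agent checks the request against its own state before acting
        # and may refuse, returning its reason to the requester.
        if topic not in self.permitted_topics:
            return ("refuse", f"no permission to report on {topic}")
        return ("agree", f"report on {topic}")


obj = ReportObject()
agent = ReportAgent(permitted_topics=["budgets"])

object_result = obj.generate("salaries")   # always executed
agent_refusal = agent.request("salaries")  # the agent declines
agent_reply = agent.request("budgets")     # the agent agrees and acts
```

The caller of request receives an answer in both cases, but only the agent holds the knowledge that determines whether the action is carried out.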
Jennings and Wooldridge (2000) argue that agent-oriented techniques are well-suited to developing complex software systems. The authors compared agent-oriented techniques with the object-oriented approach. In the object-oriented approach, an object performs its actions only when it is instructed by an external invocation. This approach may work for smaller applications in cooperative and well-controlled environments; however, it is not suited to complex or competitive environments because it gives the control to execute an action to the client requesting that action and not to the action executor. Thus objects are obedient to one another. Agent-oriented approaches allow the action executor, the agent, to decide whether or not to perform an action: because it is more intimate with the details of the actions to be performed, it may know a good reason for executing or refusing to perform an action. The object-oriented approach also fails to provide an adequate set of mechanisms for modeling a complex system that comprises inter-related subsystems. Agent-oriented techniques provide problem-solving abstractions for modeling the dependencies and interactions that exist in such complex systems.

There are a number of platforms available for developing multiagent systems, such as Jason (Bordini et al., 2007), Jadex (Pokahr et al., 2005), and JADE (Bellifemine et al., 2007). Jason, which is a Java-based platform, uses the AgentSpeak agent-oriented programming language to program the behavior of individual agents. Jadex is a popular open source platform for programming intelligent software agents using XML and Java. JADE is an agent-oriented middleware that provides a domain-independent infrastructure for developing multiagent systems. Telecom Italia distributes JADE as open source software under the terms of the LGPL (Lesser General Public License) Version 2.

2.4.2 Agents and Semantic Web

Berners-Lee et al.
(2001) envision a Semantic Web agent that communicates with other agents to set up a doctor's appointment:

    At the doctor's office, Lucy instructed her Semantic Web agent through her handheld Web browser. The agent promptly retrieved information about Mom's prescribed treatment from the doctor's agent, looked up several lists of providers, and checked for the ones in-plan for Mom's insurance within a 20-mile radius of her home and with a rating of excellent or very good on trusted rating services. It then began trying to find a match between available appointment times (supplied by the agents of individual providers through their Web sites) and Pete's and Lucy's busy schedules. (The emphasized keywords indicate terms whose semantics, or meaning, were defined for the agent through the Semantic Web.)

As information on the Semantic Web is presented with semantic annotations and in the form of ontologies, agents are able to understand the meaning of this information and take appropriate actions based on the perceived meaning. Thus, the entire Web becomes part of the agents' environment, in which agents can proactively search for the information necessary to serve their intended goals. However, realizing Berners-Lee's ambitious vision of Semantic Web agents has been highly challenging due to the dynamic and heterogeneous nature of the Semantic Web (Tamma and Payne, 2008). The authors have formulated some challenges faced by Semantic Web agents. These challenges can be summarized as follows: discovering resources; determining ontology identity; ontology reconciliation; dynamic evolution of agent ontologies; describing dialogues and protocols using ontologies; and representing and reasoning with uncertain information.

The W3C, in collaboration with Semantic Web researchers, continues to develop tools and standards that may overcome some of the challenges identified by Tamma and Payne.
2.4.3 Human-agent interactions

Human-agent interactions can be regarded as a specialization of human-computer interactions. So far, there is no standard language or communication mechanism for humans to interact with agents. It is left to the designer to decide how agents are instructed and in which format agents report back to the users.

Of particular interest is that, when communicating with others, humans place a high value on qualities such as manner of speaking. Agents, when interacting with humans, should also follow a certain code of communication ethics. Bradshaw et al. (2011) outline a number of characteristics of a good agent with regard to joint activity in the following maxims:

• A good agent is observable. It makes its pertinent state and intentions obvious.
• A good agent is attuned to the requirement of progress appraisal. It enables others to stay informed about the status of its tasks and identifies any potential trouble spots ahead.
• A good agent is informative and polite. It knows enough about others and their situations so that it can tailor its messages to be helpful, opportune, and appropriately presented.
• A good agent knows its limits. It knows when to take the initiative on its own, and when it needs to wait for outside direction. It respects policy-based constraints on its behavior, but will consider exceptions and workarounds when appropriate.
• A good agent is predictable and dependable. It can be counted on to do its part.
• A good agent is directable at all levels of the sense-plan-act cycle. It can be retasked in a timely way by a recognized authority whenever circumstances require.
• A good agent is selective. It helps others focus attention on what is most important in the current context.
• A good agent is coordinated. It helps communicate, manage, and deconflict dependencies among activities, knowledge, and resources that are prerequisites to effective task performance and the maintenance of common ground.

The set of characteristics that makes an agent good or bad is a subjective choice. What is considered good manners in one culture may not be in another. Therefore, a designer should take into account the cultural sensitivities that agents may encounter while engaging with other agents and human users.

Chapter 3

Semantic Query Access to Legacy Relational Databases using Intelligent Middleware

In this thesis, I propose and investigate a novel approach to accessing information stored in legacy relational database (RDB) systems. This approach is motivated by the rising demand for flexible access to information through user-friendly interfaces for human users, as well as standardized software interfaces for intelligent agents that act on their behalf. The Semantic Web project envisions a world-wide infrastructure providing such services through the adoption of associated standards, and the use of tools and ontologies that are being developed towards realizing the objective of the Semantic Web. In that context, I explore how an institutional decision support system, based on legacy RDB systems, could employ a combination of Semantic Web and multiagent systems (MAS) technologies to evolve towards flexible semantic access to information within its own operational scope.

Legacy RDB systems are typically used for decision support purposes by a limited number of users with considerable knowledge about the domain of the information stored in relational databases. The users explain their information requirements to report writers through natural language communication, which refers to domain knowledge that is not explicitly captured within the database itself.
The report writers use their acquired domain knowledge and technical expertise to translate the users' requests into formal SQL queries for extracting the necessary information. Thus, the users of legacy RDB systems always depend on report writers for bridging the gap between domain-level discourse and RDB queries.

Software agents can assist human users in a decision support environment by reasoning with information underlying the decision process; however, this requires the information to be structured using ontologies that formally capture the domain knowledge associated with that information. By developing the necessary ontologies within the system itself, and by introducing a simplified natural language interface, the proposed approach enables the user to directly access information without depending on the assistance of human intermediaries such as report writers.

This implies that the structures of the information stored in RDB systems need to be represented in the form of ontologies. As such, I explore the possibility of converting an RDB schema to ontologies while keeping the RDB system's original design and functionalities intact. An RDB schema can be converted to ontologies in two steps: first, transforming the RDB schema to an RDF structure; and second, upgrading the RDF structure to an ontology. The first step has been fully automated by several RDB-to-RDF conversion tools, including Virtuoso Universal Server, SquirrelRDF, and the D2RQ Platform. The second step requires human involvement for manually upgrading the converted structures to ontologies using ontology editors such as Protege, TopBraid Composer, and OntoDLV.

To overcome some of the issues raised above, I propose a system architecture called the Semantic Report Generation System (SRGS). In defining the architecture of SRGS, I focus on the following tasks:

1. Develop a definition of a distributed system architecture that combines Semantic Web and MAS technologies in an intelligent middleware layer, which supports the building of ontologies from RDB structures, and provides the user with effective semantic query access to information in RDB systems.
2. Examine to what extent one can realize such a system architecture using the existing software systems that have already been developed or envisioned in the context of Semantic Web and MAS research.

Tasks one and two are carried out in an iterative manner, in which I identify existing software systems that can be integrated as components in the new system architecture and then refine the architectural definition so that it can rely on the identified components.

The proposed system will allow the Database Administrator (DBA) to incrementally build ontologies from an RDB schema with the assistance of a software agent. The architecture takes advantage of the existing automated process for converting an RDB schema to an RDF structure, and introduces a software agent to assist the DBA in developing a reference ontology from the RDF structure, human domain knowledge, and knowledge resources on the Semantic Web. Another software agent assists the user in developing a custom ontology, which defines user-specific concepts using entries from the reference ontology. In this approach, the agents interact with human actors throughout the ontology development process. The agents perform some of the technical tasks while the human actors make decisions. This role in ontology development adds a new dimension to the traditional user and DBA profiles. However, it does not require them to become technical experts fully specialized in the ontology development process, because the agents are responsible for executing some of the technical tasks.
The ontologies create a layer of semantic information structures on top of the existing legacy RDB system that enables semantic queries and allows agents to reason about the information stored in the RDB. In addition, the system includes a simplified natural language interface which enables the users to directly communicate their requests for information stored in the underlying RDB systems without depending on any human intermediaries. In this process, a software agent assists the users in formulating requests for information using the simplified natural language.

Chapter 4
The Architectural Model

In defining the architecture of the Semantic Report Generation System (SRGS), I begin by abstracting the aspects of the system that are relevant to my study topic. I am primarily interested in the viewpoint of the user who accesses information in an existing relational database (RDB) system. The structure of the database and its contents are not controlled by the user. The user is aware that the structure and contents of the database may evolve over time. However, the requirements that cause such changes to be made are beyond the scope of my current interest. If the present user can influence such requirements, those influences occur outside of my model. I assume that the user has some knowledge about the domain as well as the source of information.

The system requirements are described in Section 4.1; the global architecture of the system is presented in Section 4.2; agent roles are discussed in Section 4.3; and the incorporation of existing Semantic Web and MAS components is discussed in Section 4.4.

4.1 The system requirements

The system requirements are developed in three steps. Subsection 4.1.1 describes the requirements for a generic system that represents the basic functionality of user access to information in an RDB system in a way that is common to its many possible implementations.
Subsection 4.1.2 focuses on user-system interaction in a legacy RDB system. Subsection 4.1.3 describes the requirements for user-system interaction in SRGS.

The system requirements are represented in the form of use cases and actors. Rumbaugh et al. (2004) define a use case as a coherent unit of functionality expressed as a transaction among actors and the system. An actor may be a person, organization, or other external entity that interacts with the system.

4.1.1 The generic system

The actors and high-level use cases of the generic system are shown in Figure 4.1. The actor of primary interest is the User. The other two actors, the Database Administrator (DBA) and the Data Entry Operator (DEO), maintain the RDB structure and content respectively. The top four use cases of Figure 4.1 capture the generic system functions performed on behalf of the user, regardless of how these functions are executed. For example, in a legacy system the user typically performs these functions through a human intermediary who in turn accesses the computer system. In SRGS, the user performs the same functions through direct interaction with the computer system.

Figure 4.1: The actors and high-level use cases of the generic system

Manage System Access and User Profile
System access is based on user authentication, which verifies the user's identity and specific access rights. The user can also specify a set of preferences, contained in the user profile, with regard to the various options available in the user interface. This general use case includes system help and tutorial assistance.

Accept Request for Information and Present Report
In this use case, the system accepts a request in which the user specifies what information should be retrieved and how it should be presented; the system then retrieves the information and presents it in the requested format.
Manage Reports
This use case allows the user to save, delete, reformat, and retrieve reports.

Manage Ontology
As the usage of data and the environment evolve, there is a need to introduce new terms and modify some definitions of terms in the ontology. The ontology itself consists of two components. The first one, which I call the reference ontology, is incrementally developed from the structure of the RDB. It describes concepts and the semantic relationships between the concepts in an application domain. The second component, called the custom ontology, describes the conceptual framework specific to a particular user, which can be directly translated to the reference ontology. Modifications of the reference ontology occur when the DBA changes the structure of the database, or when the represented knowledge is updated due to external factors, such as changes in organizational policies. Modifications of the custom ontology mainly occur when the user introduces new concepts and their relationships, which are defined using constructs in the reference ontology.

Maintain RDB data
This use case allows the DEO to insert, delete, and modify data in the RDB system.

Maintain RDB schema
This use case enables the DBA to modify the structure of the RDB system. When necessary, the DBA may add, remove, or change RDB tables and the columns within the tables.

4.1.2 Legacy RDB system

In a legacy RDB system, some of the functionalities of the generic system shown in Figure 4.1 are performed by the human intermediary, the report writer, on behalf of the user, while the other functionalities are performed by the system. The DBA and the DEO actors play the same roles as in the generic system, and the use cases are executed in similar ways by the system. The actors and the high-level use cases of a legacy RDB system are shown in Figure 4.2. Below I briefly describe the four use cases from the perspective of a legacy RDB system.
Manage System Access and User Profile
The user delegates access rights to the report writer, who accesses the system. System access is based on user authentication, which verifies the report writer's identity, and authorization. This general use case may include system help and tutorial assistance.

Accept Request for Information and Present Report
In this use case, the report writer accepts a request in which the user specifies what information should be retrieved and how it should be presented; the report writer then queries the RDB to retrieve the information and presents it to the user in the requested format. If the request is not clear, the report writer engages with the user to further clarify the request through natural language communication.

Figure 4.2: Legacy RDB system

Manage Reports
This use case allows the report writer to save, delete, reformat, and retrieve reports on behalf of the user. The user typically receives printed copies of the report.

Manage Ontology
The management of the reference ontology occurs between the report writer and the DBA; the management of the custom ontology occurs between the user and the report writer. The ontologies represent knowledge and personal experience of the actors, informally recorded in electronic and paper-based documents or simply remembered by the actors. Managing both ontologies involves natural language communication between the actors. For instance, the report writer informs the user about modifications of concepts in the reference ontology; the user informs the report writer about a new concept the user wants to introduce to the custom ontology.

Maintain RDB data
This use case provides the same services as specified in the generic system.
Maintain RDB schema
This use case provides the same services as specified in the generic system.

This process of negotiation and delegation between the user and the report writer is often time consuming, resulting in delays and costs.

4.1.3 The Semantic Report Generation System (SRGS)

In SRGS, the user directly interacts with the system, which performs the functionalities in the top four use cases shown in Figure 4.1. Short descriptions of the use cases are as follows.

Manage System Access and User Profile
System access is based on user authentication, which verifies the user's identity, and authorization. The user-system communication occurs in an interactive manner. The user can specify personal preferences for communicating with the system. The system helps the user to specify personal preferences and manages them. This general use case includes system help and tutorial assistance.

Accept Request for Information and Present Report
This use case allows the user to directly communicate report requests to the system. In the request, the user specifies what information should be retrieved and how it should be presented. If the user's request is not clear, the system asks the user to further clarify the request. This clarification process is an interactive one in which the system ensures that it understands the user's request. The system tries to mimic the role of the report writer in a legacy RDB system. It then retrieves the information and presents it in the requested format.

Manage Reports
This use case allows the user to save, delete, reformat, and retrieve reports.

Manage Ontology
In SRGS, the reference ontology formally represents knowledge originating from the underlying relational database, human actors in the system, and ontological knowledge available on the Semantic Web. The DBA interacts with the system in building and maintaining the reference ontology.
Thus, the DBA's actor profile now includes the new role of managing the reference ontology in addition to the traditional role of managing RDB systems. The custom ontology allows the user to introduce user-specific concepts and their relationships on top of the reference ontology. The user now has the additional role of managing the custom ontology.

Maintain RDB data
This use case provides the same services as specified in the generic system.

Maintain RDB schema
This use case provides the same services as specified in the generic system.

The functions of each high-level use case can be further specified by decomposing it into simpler use cases. I discuss the decomposition of the following two high-level use cases.

1. Accept Request for Information and Present Report

The decomposition of the Accept Request for Information and Present Report use case is shown in Figure 4.3. The decomposed use cases are grouped by their functionalities into the use cases that directly communicate with the user, the Front End, and the use cases that communicate with the legacy RDB system, the Back End. The Front End and the Back End can reside on different machines and communicate through a network. In general, a Back End can support multiple Front Ends, and a Front End can interact with multiple Back Ends.

Figure 4.3: Accept request for information and present report

In the Front End, the Accept Request for Information (SNL) and Present Results use case allows the user to formulate a request for information in a Simplified Natural Language (SNL). The request contains domain-specific terms that specify the information to be retrieved, and keywords that describe the format for presenting the retrieved information. Once the request is accepted, the Parse and Interpret SNL Request use case produces an intermediate representation of the user request, and the Verify Request Ontology use case ensures that each statement as a whole in the request is semantically correct. If the SNL request is valid, the Generate SPARQL Script use case creates a SPARQL script from the intermediate representation of the SNL request. The Front End then sends the SPARQL script to the Back End for further processing. The Front End receives the SPARQL results sent from the Back End. The SPARQL results are then formatted as a report and presented by the Format and Display Report use case.

In the Back End, the Accept Request for Information (SPARQL) and Present Results use case receives the SPARQL script and translates it to equivalent SQL queries by the functions in the Convert SPARQL Script to SQL Queries use case. The Query RDB and Present Results use case then executes the SQL queries on the RDB system and sends the SQL results to the Convert SQL Query Results to SPARQL Query Results use case, which translates the SQL query results to SPARQL results. The Accept Request for Information (SPARQL) and Present Results use case sends the SPARQL results to the Front End.

2. Manage Ontology

The decomposition of the Manage Ontology use case is shown in Figure 4.4. The use cases are grouped by their functionalities into use cases that directly communicate with the user, the Front End; and use cases that communicate with the DBA, the Back End.
Figure 4.4: Manage ontology

The Front End initially imports the reference ontology when it connects to the Back End. While importing the reference ontology, the functions of the Import Reference Ontology use case in the Front End rely on the functions of the Export Reference Ontology use case in the Back End. The Initialize Custom Ontology use case allows the user to create a conceptual framework specific to the user. Through the functions in the Update Custom Ontology use case, the user can modify the definition of user-specific concepts in the custom ontology. When the reference ontology is updated in the Back End, the Maintain Consistency of Reference Ontology use case ensures that the updates are also applied to the reference ontology in the Front End. The reference ontology updates are displayed to the user by the Display Ontology Changes use case.

In the Back End, the Export Reference Ontology use case sends a copy of the reference ontology to the attached Front End. The Maintain RDB Schema use case allows the DBA to modify the structure of the RDB by changing the RDB schema. When the DBA changes the RDB schema, the Maintain Reference Ontology use case incorporates the schema changes into the reference ontology with the help of the Update Reference Ontology use case.

4.2 The multiagent architecture of SRGS

At the high level, SRGS consists of two subsystems: User Subsystem (US) and Database Subsystem (DBS). The primary functions of the US include accepting the user's request for information, presenting reports, and developing the custom ontology.
The DBS is responsible for retrieving information from the legacy RDB system and developing the reference ontology. In the basic architecture, each subsystem is composed of an agent and an environment which contains several software components. The agent perceives the behaviors of the components in the environment and influences their future actions.

Subsection 4.2.1 describes the basic architecture of the system, in which a single user is connected to the US, which is attached to one DBS. Subsections 4.2.2 through 4.2.5 present three different versions of the system architecture with regard to the multiplicity of users and subsystems.

4.2.1 The basic architecture

The basic architecture is described using a very simple configuration of SRGS. It consists of a single US and a single DBS, connected through a wide area network, with a single user accessing the system. This configuration is depicted in Figure 4.5.

Figure 4.5: Single User Subsystem to single Database Subsystem

The User Subsystem

The architectural structure of the US is shown in Figure 4.6. The User Interface Environment (UIE) comprises the components that provide the main subsystem functions. The primary purpose of the UIE is to execute routine user requests efficiently, without the need to engage in reasoning in the sense of artificial intelligence techniques. The UIE components can be designed and implemented using the conventional Object-Oriented Software Engineering (OOSE) methodology, with an emphasis on efficient performance.

The User Interface Agent (UIA) can observe the events in the environment, including the behavior of individual components, and act on the environment to influence the behavior of its components. The agent provides the practical reasoning (i.e., deliberation and planning) capabilities to the subsystem, enabling it to autonomously resolve arising problems without the intervention of human experts. Its presence introduces the qualities of flexibility, adaptability, tolerance to variations in user preferences and practices, and evolution of the subsystem behavior according to changing user requirements. Those qualities are necessary in order for the system to meet the requirements formulated in Chapter 3 without additional human assistance.

Figure 4.6: The User Subsystem

User Interface Environment

All the components that communicate with the user and the UIA are grouped into the UIE. In Figure 4.6, the solid lines represent direct communication that occurs between the user, the components, and the UIA. The dashed line represents communication that occurs between the user and the UIA. The UIE consists of the following main components:

User Interface
The User Interface (UI) enables all communications between the user and the system. It provides the functionalities with regard to accessing the system as formulated in Subsection 4.1.3.

SNL Processor
The SNL Processor component enables the user to interact with the system using the SNL. The user formulates requests for information and modifications to the custom ontology using SNL statements. The SNL Processor generates intermediate representations from these statements in three steps.
First, it performs a lexical analysis in which it breaks the statements into smaller pieces called tokens, which are atomic units of the statements such as words and symbols. Second, it performs a syntax analysis by parsing the token sequence to identify the syntactic structure of the statements. Third, it performs a semantic analysis by checking the statements against the custom and reference ontologies to ensure that the tokens are positioned according to their semantic relationships, so that each statement as a whole is meaningful. Once the intermediate representations are successfully generated, the SNL Processor forwards them to the relevant component. If the statements concern a request for information, the SNL Processor forwards the statements regarding what information is to be extracted to the SPARQL Generator, and the statements for formatting the extracted information to the Report Manager. Otherwise, it forwards the statements to the Ontology Manager.

SPARQL Generator
The SPARQL Generator constructs a SPARQL script from the intermediate representation of user requests for information received from the SNL Processor. While constructing a script, it refers to the Ontology Manager for the RDB-specific names of terms used in the requests. Once a SPARQL script is generated, the UIA sends it to the DBS for further processing.

Report Manager
The Report Manager presents requested information in the form of reports. It receives SPARQL query results from the DBS and formats the results according to the user's formatting preferences. It communicates with the Ontology Manager to replace any database-specific name in the report with its primary name if the database-specific name is not the primary name. The Report Manager allows the user to view, reformat, save, and delete reports.

Ontology Manager
The Ontology Manager is responsible for maintaining the custom ontology and providing ontological services to the SNL Processor and SPARQL Generator.
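As an illustration of one such ontological service, the following Python sketch shows how a SPARQL script could be assembled once user terms have been resolved to RDB-specific names. It is a minimal sketch under stated assumptions, not the SRGS implementation: the lookup table, the example terms, the namespace URI, and the function name `generate_sparql` are all hypothetical.

```python
# Hypothetical lookup table of the kind an Ontology Manager could maintain:
# user-facing term -> RDB-specific name. All names are invented for illustration.
RDB_NAMES = {
    "student": "STUDENT",
    "name": "STU_NAME",
    "program": "STU_PROGRAM",
}


def generate_sparql(concept, attributes, names=RDB_NAMES,
                    namespace="http://example.org/rdb#"):
    """Build a SPARQL SELECT script for `concept` and its `attributes`."""
    # Refuse terms that the ontology does not define instead of guessing.
    unknown = [term for term in [concept, *attributes] if term not in names]
    if unknown:
        raise KeyError(f"terms not defined in the ontology: {unknown}")

    variables = " ".join(f"?{attribute}" for attribute in attributes)
    lines = [
        f"PREFIX db: <{namespace}>",
        f"SELECT {variables}",
        "WHERE {",
        f"  ?row a db:{names[concept]} .",
    ]
    # One triple pattern per requested attribute, using the RDB-specific name.
    lines += [f"  ?row db:{names[a]} ?{a} ." for a in attributes]
    lines.append("}")
    return "\n".join(lines)
```

Calling `generate_sparql("student", ["name", "program"])` yields a script whose triple patterns use the RDB-specific names, while the query variables keep the terms the user wrote.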
The custom ontology defines user-specific concepts and their relationships using constructs from the reference ontology. In order to keep the custom ontology consistent with the reference ontology, any update in the reference ontology needs to be reflected in the custom ontology if the update affects the definitions of any concepts in the custom ontology. When there is an update in the reference ontology, the UIA checks whether this update affects the custom ontology. If it does, the UIA makes the required changes in the custom ontology. If the update requires a simple operation, such as renaming a reference-ontological construct, the UIA performs this action without involving the user. If it requires more complex operations, such as restructuring certain relationships, the UIA engages with the user in the process of updating the custom ontology.

Natural Language Lexical Knowledge Representation
This component provides the meaning of, and the semantic relations between, natural language concepts in both machine-processable and human-readable formats. In principle, it contains a language ontology that could be enhanced through ontology development by a software agent, but doing so is beyond the scope of this thesis. The UIA and the SNL Processor communicate with this component to look up the meaning of, and relationships between, natural language terms.

Communication Service, Access Control and Security
This component facilitates communications that occur between the US and the DBS. User privilege and security features are enforced by this component. System access is based on user authentication, which verifies the user's identity and authorization to access specific resources.

User Interface Agent

The UIA interacts with the user through the UIE. The user formulates requests for information and develops a custom ontology through the User Interface. The user interacts with the system using the SNL.
The UIA assists the user in the process of formulating requests for information as well as developing and maintaining a custom ontology. The UIA invokes different components in the UIE in order to carry out its tasks. It perceives the behaviors of the components, and through its actions it can influence the behavior of the components.

The Database Subsystem

The architectural structure of the DBS is shown in Figure 4.7. The Database Interface Environment (DBIE) comprises the components that provide the main subsystem functions. The primary purpose of the DBIE is to extract requested information from the legacy RDB system, without the need to engage in reasoning in the sense of artificial intelligence techniques. The DBIE components can also be designed and implemented in the same way as the UIE components.

Figure 4.7: The Database Subsystem

Database Interface Environment

All the components that communicate with the DBIA, the DBA, and the DEO are grouped into the DBIE. In Figure 4.7, the solid lines represent direct communication that occurs between the components. The dashed line represents communication that occurs between the DBA and the DBIA. The DBIE consists of the following main components:

User Interface
The User Interface (UI) provides an access point through which the DBA interacts with the DBS. Through the UI, the DBA builds and maintains the reference ontology with the assistance of the DBIA and modifies the structure of the RDB system.
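The SNL Processors in both subsystems apply the same three-step analysis (lexical, syntactic, semantic). A minimal Python sketch of those steps follows. The SNL grammar is not defined in this section, so the one-sentence toy grammar used here (`show ATTR [and ATTR]* of CONCEPT`), the ontology fragment, and all function names are assumptions made purely for illustration.

```python
import re

# Hypothetical ontology fragment: concept -> attributes it may be asked about.
ONTOLOGY = {
    "student": {"name", "program"},
    "course": {"title", "credits"},
}

USAGE = "expected: show <attribute> [and <attribute>]* of <concept>"


def tokenize(statement):
    """Lexical analysis: split a statement into word and symbol tokens."""
    return re.findall(r"[a-z_]+|\S", statement.lower())


def parse(tokens):
    """Syntax analysis for the toy grammar; returns an intermediate representation."""
    if len(tokens) < 4 or tokens[0] != "show" or tokens[-2] != "of":
        raise SyntaxError(USAGE)
    body, concept = tokens[1:-2], tokens[-1]
    attributes, separators = body[0::2], body[1::2]
    well_formed = (
        len(body) % 2 == 1
        and all(separator == "and" for separator in separators)
        and all(re.fullmatch(r"[a-z_]+", word) for word in attributes + [concept])
    )
    if not well_formed:
        raise SyntaxError(USAGE)
    return {"concept": concept, "attributes": attributes}


def verify(representation, ontology=ONTOLOGY):
    """Semantic analysis: every attribute must belong to the named concept."""
    concept = representation["concept"]
    if concept not in ontology:
        raise ValueError(f"unknown concept: {concept}")
    unknown = [a for a in representation["attributes"] if a not in ontology[concept]]
    if unknown:
        raise ValueError(f"{concept} has no attribute(s): {unknown}")
    return representation


def process(statement):
    """Run the three steps in order and return the verified representation."""
    return verify(parse(tokenize(statement)))
```

Keeping the steps as separate functions mirrors the description: a statement such as "show credits of student" passes the lexical and syntax steps and is rejected only by the semantic check, because in this toy ontology `credits` belongs to `course`.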
SNL Processor
The SNL Processor component enables the DBA to interact with the system using SNL statements. The DBA interacts with the DBIA in developing and maintaining the reference ontology. The DBA enters SNL statements through the User Interface. The SNL Processor then generates an intermediate representation from the DBA's statements in the same three steps that the SNL Processor in the US follows. Once the intermediate representation is generated, the SNL Processor forwards it to the DBIA if the interaction concerns developing and maintaining the reference ontology, or to the RDB system if the interaction concerns managing the database.

Ontology Manager
The Ontology Manager is responsible for the coordination and maintenance of the reference ontology. The DBS exports a copy of the reference ontology to the attached US. Thus the reference ontology is replicated in both subsystems. The Ontology Manager ensures that any modifications to the reference ontology in the DBS are propagated to the instances of the reference ontology in all participating USs.

Natural Language Lexical Knowledge Representation
This component is identical to the Natural Language Lexical Knowledge Representation in the US. The DBIA communicates with this component while developing and maintaining the reference ontology.

Translator
The Translator generates SPARQL query results from RDB data in three steps. First, it converts the SPARQL script to SQL queries; second, it executes the SQL queries on the RDB system and retrieves the SQL query results; and finally, it converts the SQL query results to SPARQL query results. The DBIA then sends the SPARQL results to the US.

Schema to Base Ontology Mapper
This component automatically generates a base ontology from the underlying RDB schema. The base ontology represents an RDB table name as a class and the column names of the corresponding table as properties of the class.
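The table-to-class and column-to-property rule just stated can be sketched in a few lines of Python. This is an illustrative sketch only and not the mapper used in SRGS or the output of any named conversion tool: the dictionary-based schema description, the namespace URI, the `Table.Column` naming of properties, and the choice of `owl:Class`, `owl:DatatypeProperty`, and `rdfs:domain` as vocabulary are assumptions.

```python
def schema_to_base_ontology(schema, namespace="http://example.org/rdb#"):
    """Map {table: [columns]} to (subject, predicate, object) triples.

    Each table becomes a class; each column becomes a property whose
    domain is the class of its table.
    """
    triples = []
    for table, columns in schema.items():
        table_class = f"{namespace}{table}"
        triples.append((table_class, "rdf:type", "owl:Class"))
        for column in columns:
            # Qualify the property with its table so equal column names
            # in different tables do not collide.
            column_property = f"{namespace}{table}.{column}"
            triples.append((column_property, "rdf:type", "owl:DatatypeProperty"))
            triples.append((column_property, "rdfs:domain", table_class))
    return triples
```

For a hypothetical schema such as `{"STUDENT": ["ID", "NAME"]}`, the function returns one class triple for the table and two triples per column.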
It also captures the relationships between RDB tables. The base ontology serves as a rudimentary ontology from which the reference ontology is incrementally developed.

Schema Monitor

The Schema Monitor continuously listens for changes made by the DBA to the RDB schema. When it detects a schema change, it notifies the Schema to Base Ontology Mapper component so that the modifications are reflected in the reference ontology.

Communication Service, Access Control and Security

This component facilitates all communications between the US and the DBS. By enforcing security features, it ensures that no unauthorized access to the RDB system occurs.

RDB System

The RDB system contains the relational data in which the user of SRGS is interested. The Data Entry Operator (DEO) may insert, delete, or modify data in the RDB system. SRGS is not affected by such modifications.

Database Interface Agent

The Database Interface Agent (DBIA) communicates with the DBA and the DBIE. The DBIA is primarily responsible for assisting the DBA in developing and maintaining the reference ontology. It invokes different components in the DBIE. It perceives the behaviors of the components, and through its actions it can influence those behaviors.

4.2.2 Multiple Users accessing single User Subsystem

In this version of the system architecture, several users may access the US, which is attached to only one DBS. The multiplicity of users affects the architecture in the following ways:

1. When multiple users access the US at the same time, a single UIA is inadequate for attending to all users simultaneously. The preferred solution is to have an agent serving each user. The UIA can dynamically create an agent, or awaken an inactive agent from the background, and assign the agent to the additional user. The architecture of the system is illustrated in Figure 4.8.

2.
Some elements in the UI are customized to each user's preferences for interacting with SRGS. Elements that serve functions common to all users remain the same as in the basic configuration of the architecture. The customization of the UI is depicted in Figure 4.9.

3. A custom ontology needs to be created for each user. The Ontology Manager maintains a user's custom ontology by defining the terms that the user may introduce in the custom ontology.

4. The Report Manager maintains each user's history of preferences for report formats.

[Figure 4.8: Multiple Users accessing SRGS. Components shown: Users; UIA; User Subsystem (US) with User Interface Environment (UIE); Database Subsystem (DBS) with Database Interface Agent (DBIA), Database Interface Environment (DBIE), and RDB System.]

The DBS is not affected by this change, as the DBIA's tasks remain the same as in the basic architecture, regardless of the multiplicity of the users or the USs. The DBIA's workload may increase, in which case more resources can be added in the actual implementation.

[Figure 4.9: Customized User Subsystem for multiple users. Components shown: UIA; User; User Interface; Report Manager with per-user Formatting Preferences; Ontology Manager; Reference Ontology; per-user Custom Ontology.]

4.2.3 Multiple User Subsystems to single Database Subsystem

In this version of the system architecture, multiple USs interact with a single DBS. This configuration has no effect on the internal design of each US. Implementing this version of the architecture may require certain hardware configurations, which are beyond the scope of this thesis.

4.2.4 Single User Subsystem to multiple Database Subsystems

In this version of the system architecture, one US is attached to multiple DBSs. The requirement is that the multiplicity of the DBSs is transparent to the user accessing the US.
This transparency can be achieved in the following manner:

1. I assume that there are n DBSs attached to one US. The SPARQL Generator decomposes the SPARQL script into up to n SPARQL scripts. The UIA then sends each SPARQL script to its respective DBS. Once the scripts are processed, the UIA receives a set of SPARQL results from the DBSs and forwards them to the Report Manager. Note that the UIA must receive SPARQL query results from all the DBSs. The Report Manager aggregates the set of SPARQL results into one result set. This scenario is illustrated in Figure 4.10.

2. The reference ontology in the UIE is now a union of n reference ontologies, where n is the number of DBSs. The Ontology Manager updates the reference ontology whenever the reference ontology in a DBS is changed. The modified US is illustrated in Figure 4.11.

[Figure 4.10: Single User Subsystem attached to multiple Database Subsystems. Components shown: User; User Subsystem (US) with User Interface Agent (UIA) and User Interface Environment (UIE); WAN; two Database Subsystems (DBS), each with a Database Interface Agent (DBIA), a Database Interface Environment (DBIE), and an RDB System; DBA; DEO.]

[Figure 4.11: Customized User Subsystem for multiple Database Subsystems. Components shown: User Subsystem (US); UIA; SPARQL Generator; Ontology Manager; Report Manager; multiple Reference Ontologies.]

4.2.5 Multiple User Subsystems to multiple Database Subsystems

In this version of the system architecture, several users access the US, which is attached to more than one DBS. It is a combination of the scenarios discussed in Subsection 4.2.2 and Subsection 4.2.4. Several components in the UIE need to be changed in order to accommodate the multiplicity of both subsystems. The User Interface is customized for each user.
The Report Manager maintains each user's history of preferences for report formats. The Ontology Manager maintains a custom ontology for each user. The reference ontology is a union of n ontologies, where n is the number of DBSs.

4.3 Agent roles

This section describes the specific roles performed by the UIA and the DBIA in SRGS.

User Interface Agent

1. Assistance in SNL dialogue

The UIA assists the user in formulating requests for information and developing a custom ontology. The user interacts with the system using SNL statements. The SNL Processor generates an intermediate representation from these statements in three steps. In each step, the SNL Processor may produce a warning or an error when the statements are formulated incorrectly. If the SNL Processor produces a warning, the UIA perceives this warning and reconciles with the SNL Processor to resolve any issues arising in the statements. If the SNL Processor produces an error, the UIA engages with the user to correct the error in the statements.

2. Searching the Semantic Web

The agent can search the Semantic Web to look for two things. First, it can search for language ontologies such as WordNet to look up synonyms and hypernyms of terms. Second, the agent can search for domain ontologies to include terms that are acceptable in a wider context.

3. Development of custom ontology

The UIA assists the user in developing a custom ontology, which describes the conceptual framework specific to the user and can be directly translated to the reference ontology. In order to keep the custom ontology consistent with the reference ontology, the UIA ensures that any update to the reference ontology is also reflected in the custom ontology if the update affects any definitions of concepts and their relationships in the custom ontology. The UIA has the technical knowledge of the ontology development process.
In addition, the UIA refers to publicly available ontological resources on the Semantic Web.

4. Customizing the behavior of User Interface

While assisting the user with SNL dialogue, the UIA can observe whether the user typically makes a certain type of choice, and offer this choice first in its next interaction. This choice can be an explicit one, in which the user specifies certain preferences, or an implicit one, in which the agent continuously learns by observing the user's course of actions.

5. Coordination of reference ontologies

There may be multiple reference ontologies if multiple DBSs are attached to the US. The UIA needs to resolve any conflicts that may arise due to the presence of multiple reference ontologies in the US. Two terms may be identical across two different ontologies yet have completely different meanings in their respective contexts. While developing the custom ontology, the user needs to see these differences conveniently. The UIA disambiguates terms that are seemingly identical but have different meanings. It does this by assigning unique tags to the constructs when they are retrieved from different reference ontologies.

Database Interface Agent

1. Assistance in SNL dialogue

The DBIA assists the DBA in interacting with the system while developing and maintaining the reference ontology. The DBA interacts with the system using SNL statements. The SNL Processor generates an intermediate representation from these statements in three steps. In each step, the SNL Processor may produce a warning or an error when the statements are formulated incorrectly. If the SNL Processor produces a warning, the DBIA perceives this warning and reconciles with the SNL Processor to resolve any issues arising in the statements. If the SNL Processor produces an error, the DBIA engages with the DBA to rectify the error in the statements.

2.
Searching the Semantic Web

The DBIA can search the Web to look for two things. First, it can search for language ontologies such as WordNet (Miller, 1995) to look up synonyms and hypernyms of terms. Second, it can search for domain ontologies to include terms that are acceptable in a wider context.

3. Development of reference ontology

The DBIA interacts with the DBA in developing and maintaining a reference ontology. The Schema to Base Ontology Mapper component analyzes the RDB schema and generates a Mapping File, which contains RDF models of the RDB schema. The Mapping File then serves as a base ontology from which the DBA incrementally builds a full reference ontology with the assistance of the agent. The DBIA is equipped with the technical knowledge of the ontology development process. In addition, it refers to ontological resources on the Semantic Web.

4. Customizing the behavior of User Interface

While assisting the DBA with SNL dialogue, the DBIA can observe whether the DBA typically makes a certain type of choice, and offer this choice first in its next interaction. This choice can be an explicit one, in which the DBA specifies certain preferences, or an implicit one, in which the agent continuously learns by observing the DBA's course of actions.

In a multiagent environment, several instances of the same role can be assigned to multiple agents. It is also possible to assign several instances of different roles to the same agent. Some of the roles described above are demonstrated in the use case scenarios presented in Chapter 5.

4.4 Incorporation of existing system components

The proposed architecture relies on existing software tools that have been developed as results of research in the Semantic Web and MAS. This section presents the main Semantic Web and MAS tools that can be used as components in the proposed architecture.
The D2RQ Platform (Bizer and Seaborne, 2004) can provide the functionalities of the Translator, Schema to Base Ontology Mapper, and Schema Monitor components in SRGS. Its front-end, the D2R Server, accepts SPARQL queries and presents SPARQL results (RDF triples). The D2RQ Engine is the core of the platform and provides the conversion service. It analyzes the structure of the RDB and generates a Mapping File, which is an RDF representation of the RDB schema. In SRGS, the Mapping File is considered the base ontology. The Mapping File is further described in Appendix A.1. The D2RQ Engine uses the Mapping File to convert RDB data to RDF triples.

The D2RQ Platform is an on-demand mapping tool, which dynamically translates RDB data to RDF triples instead of completely transforming an RDB into an RDF triplestore. If a data value in the database is changed, the D2RQ Platform can instantly display the new value. However, when a column or a table in the RDB is altered or dropped, it does not reflect the change. In other words, the D2RQ Platform fails to detect any change in the RDB schema. In order to overcome this limitation, I have added an extension to the D2RQ Platform. The extension enables the D2RQ Engine to detect any RDB schema change and display the modified values. The extension is further discussed in Appendix A.2. The three DBS components whose functions the D2RQ Platform provides are shown in the dashed box in Figure 4.12.

JADE (Bellifemine et al., 2007) is an agent-oriented middleware that provides domain-independent infrastructure for developing multiagent systems. JADE complies with the Foundation for Intelligent Physical Agents (FIPA) specifications and includes a set of tools that support debugging and deployment tasks. The agent platform can be distributed across multiple computers, and the configuration can be controlled via a remote Graphical User Interface.
The configuration can be changed at run-time by creating new agents.

[Figure 4.12: Database Subsystem (DBS) diagram. Components shown: Database Interface Agent (DBIA); Database Interface Environment (DBIE) with Natural Language Lexical Knowledge Representation, Ontology Manager, SNL Processor, User ...]

[...] tag.

CHAPTER 5. MODELING AND ACCESSING INFORMATION IN SRGS

It then constructs the body of the query, consisting of a SELECT clause and a WHERE clause. The SELECT clause identifies the variables to appear in the query results. In the SELECT clause, variables are taken from the technical words appearing in the second statement of the SNL request. The generator prepends a "?" symbol to each base name to make it a variable. In the example SPARQL query shown in Figure 5.2, the variables in the SELECT clause are ?StudentID, ?FirstName, ?LastName, ?DOB, and ?CGPA.

In the WHERE clause, a number of triple patterns are constructed. A triple pattern consists of a subject, a predicate, and an object. The subject is a variable created by prepending the "?" symbol to the class name from the basic-technical-words set. The predicate is a technical word written in the URI format (PREFIX:Class_Property), which is constructed in two steps. First, the SPARQL Generator concatenates a class name and a property name with an underscore symbol (_) between them. The class name comes from the basic-technical-words set and the property name comes from the SELECT clause. Second, it concatenates the prefix (base) and the previously created segment (Class_Property) with a colon symbol (:) between them. The object variable is constructed using the property name. Following this method, the generator constructs a triple pattern for each variable appearing in the SELECT clause.

The generator then constructs a triple pattern for each property name from the conditional-technical-words set.
This time it uses the words from the conditional-technical-words set to create the variables. These two groups of triple patterns are then linked with a third triple pattern whose predicate has the property StudentID, which is a common property between the class in the basic-technical-words set and the class in the conditional-technical-words set. The SPARQL script in Figure 5.2 is constructed from the example SNL request. Once the SPARQL script is constructed, the Communication Service, Access Control & Security (CSACS) sends it to the destination DBS.

PREFIX base:
SELECT ?StudentID ?FirstName ?LastName ?DOB ?CGPA
WHERE {
    ?student a vocab:Student.
    ?registration a vocab:Registration.
    ?student base:Student_StudentID ?StudentID.
    ?student base:Student_FirstName ?FirstName.
    ?student base:Student_LastName ?LastName.
    ?student base:Student_DOB ?DOB.
    ?student base:Student_CGPA ?CGPA.
    ?registration base:Registration_StudentID ?student.
    ?registration base:Registration_Semester "Fall".
    ?registration base:Registration_Year "2011".
}

Figure 5.2: The SPARQL script

The CSACS in the DBS receives the SPARQL script. By verifying the credentials of the sender, it ensures that no unauthorized access to the RDB system occurs. It then passes the SPARQL script to the Translator component, which decomposes the script into one or more SPARQL queries. The D2RQ Engine within the Translator generates equivalent SQL queries by rewriting the SPARQL queries into RDB-specific SQL queries. The SQL query shown in Figure 5.3 is generated from the SPARQL query in Figure 5.2.
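The construction rules described for the SELECT and WHERE clauses can be sketched in Python. This is only a minimal illustration under stated assumptions, not the SPARQL Generator's actual implementation: the function name build_sparql, its parameters, and the way the linking property is supplied are hypothetical, and the PREFIX declaration is omitted because its IRI is not given in the text.

```python
def build_sparql(basic_class, select_props, cond_class, conditions, link_prop):
    """Assemble a SPARQL SELECT query following the steps described in the text.

    All parameter names are hypothetical:
      basic_class  - class from the basic-technical-words set, e.g. "Student"
      select_props - property names requested in the SNL statement
      cond_class   - class from the conditional-technical-words set
      conditions   - {property name: literal value} restrictions
      link_prop    - property of cond_class that refers to basic_class
    """
    subject = "?" + basic_class.lower()
    cond_subject = "?" + cond_class.lower()

    # SELECT clause: put "?" in front of each property name to form a variable.
    variables = ["?" + prop for prop in select_props]
    lines = ["SELECT " + " ".join(variables), "WHERE {"]

    # Type patterns for the two classes involved.
    lines.append(f"  {subject} a vocab:{basic_class}.")
    lines.append(f"  {cond_subject} a vocab:{cond_class}.")

    # One triple pattern per selected variable; predicate is base:Class_Property.
    for prop in select_props:
        lines.append(f"  {subject} base:{basic_class}_{prop} ?{prop}.")

    # Linking pattern that joins the two groups of triple patterns.
    lines.append(f"  {cond_subject} base:{cond_class}_{link_prop} {subject}.")

    # One triple pattern per condition, with the literal value as object.
    for prop, value in conditions.items():
        lines.append(f'  {cond_subject} base:{cond_class}_{prop} "{value}".')

    lines.append("}")
    return "\n".join(lines)
```

Calling build_sparql("Student", ["StudentID", "FirstName", "LastName", "DOB", "CGPA"], "Registration", {"Semester": "Fall", "Year": "2011"}, "StudentID") yields triple patterns in the same order as the script in Figure 5.2.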
SELECT Student.StudentID, Student.FirstName, Student.LastName, Student.DOB, Student.CGPA
FROM Student, Registration
WHERE Student.StudentID = Registration.StudentID
    AND Registration.Semester = 'Fall'
    AND Registration.Year = '2011'

Figure 5.3: The SQL query

The D2RQ query engine executes the SQL queries on the RDB system and retrieves the SQL results. The Translator then converts the retrieved results from SQL format to SPARQL format. Note that the SQL results and the SPARQL results contain the same information retrieved from the database; the Translator performs the conversion for compatibility reasons. Once the SPARQL results are generated, the CSACS sends them to the US. A subset of the generated SPARQL results is shown in Figure 5.4.

StudentID  FirstName  LastName  DOB         CGPA
98988      Shen       Ming      1988-12-22  3.25
44553      Phill      Cody      1990-05-10  3.7
98765      Emily      Brandt    1978-10-29  2.85
70665      Jie        Zhang     1990-08-26  3.4
76543      Lisa       Brown     1992-06-01  3.7
19991      Shankar    Patel     1986-02-17  3.65
70557      Amanda     Snow      1989-01-17  3.1
76653      Tom        Anderson  1984-03-20  3.5

Figure 5.4: The SPARQL results

The Report Manager in the US receives the SPARQL results from the DBS. It then formats the results according to the formatting instructions provided by Henry in the request. It adds the report title Registered Students and the subtitle Date for the report generation date. A user-selected template (format-k) is used for displaying the report. It also sorts the SPARQL results alphabetically by LastName. The Report Manager refers to the Ontology Manager to replace any base name with its primary name or user-specific name. It then displays the formatted report to Henry. The report generated from the SPARQL results is shown in Figure 5.5.
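The Report Manager's formatting steps (adding the title and date, replacing base names, and sorting by LastName) can be sketched in Python. This is a minimal sketch under assumptions: the function format_report is hypothetical, the results are assumed to arrive as a list of dictionaries keyed by base name, and the primary_names dictionary stands in for the lookup that the Ontology Manager performs.

```python
def format_report(rows, title, date, primary_names, sort_key="LastName"):
    """Return the report as a list of text lines.

    rows          - list of dicts keyed by base name (assumed result format)
    primary_names - hypothetical stand-in for the Ontology Manager lookup,
                    mapping a base name to its primary or user-specific name
    """
    lines = [title, "Date: " + date]
    if not rows:
        return lines

    columns = list(rows[0].keys())
    # Replace each base name with its primary name when one is defined.
    header = [primary_names.get(col, col) for col in columns]
    # Sort the result rows alphabetically by the chosen column.
    ordered = sorted(rows, key=lambda row: row[sort_key])
    table = [header] + [[str(row[col]) for col in columns] for row in ordered]

    # Pad every column to the width of its longest cell.
    widths = [max(len(r[i]) for r in table) for i in range(len(columns))]
    for r in table:
        cells = [cell.ljust(width) for cell, width in zip(r, widths)]
        lines.append("  ".join(cells).rstrip())
    return lines
```

For instance, passing primary_names={"DOB": "Date Of Birth"} changes only that column heading and leaves the other base names unchanged.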
Registered Students
Date: September 15, 2011

Student ID  First Name  LastName  Date Of Birth  CGPA
76653       Tom         Anderson  1984-03-20     3.5
98765       Emily       Brandt    1978-10-29     2.85
76543       Lisa        Brown     1992-06-01     3.7
44553       Phill       Cody      1990-05-10     3.7
98988       Shen        Ming      1988-12-22     3.25
19991       Shankar     Patel     1986-02-17     3.65
70557       Amanda      Snow      1989-01-17     3.1
70665       Jie         Zhang     1990-08-26     3.4

Figure 5.5: The formatted report

5.3 Scenario 2: Developing reference ontology

This scenario presents the specific steps involved, and the interactions that occur between the Database Interface Agent (Adam) and the Database Administrator (Helen), in developing a reference ontology from an RDB schema. The steps are presented in the following order: determining ontology domain and scope, defining classes and class hierarchies, defining properties of classes, and defining relations between classes.

The construction of the reference ontology begins with determining a domain name that accurately reflects its scope. Adam extracts the name of the database from the Mapping File and displays it to Helen, asking whether it represents the domain of the application. Helen approves either by accepting the displayed name or by entering a different name. Adam then defines the approved name as the domain of the reference ontology. In general there can be several relevant names to describe the ontology domain, but in this case we show one.

Once the ontology domain name has been established, Adam uses it to search for publicly available ontologies in the same domain on the Semantic Web. For example, if the application domain is university, Adam searches for a university ontology on the Semantic Web. With Helen's approval, Adam may include the URI reference link to such an ontology in the reference ontology header. This allows the agent to later selectively import certain constructs from the external university ontology with Helen's approval.
Similarly, Helen can specify other relevant domains for which external knowledge bases may be helpful; for instance, importing an external ontology of calendar structures or time zones may be preferable to developing one's own. In the context of a developed Semantic Web, review of external ontologies can play a significant role in constructing one's own reference ontology. In the current scenario this process is illustrated only through the use of an external ontology of the English language (such as WordNet (Miller, 1995)).

Once the domain is determined, Adam asks Helen to provide any general comments about the ontology being developed and includes them in the reference ontology. Adam also extracts the prefix statements from the Mapping File (Figure 5.6a) and defines them as Extensible Markup Language (XML) namespaces above the reference ontology header (Figure 5.6b). Using the prefix vocab:, Adam defines the base namespace xmlns:base = "http://localhost:2020/vocab/resource/", which provides a means to unambiguously distinguish constructs in the reference ontology from constructs in an imported ontology (which come with their own base namespace prefix). The remaining namespace definitions enable writing names in shorter forms, such as rdf instead of http://www.w3.org/1999/02/22-rdf-syntax-ns#.

The class names and their synonyms are defined next. The set of class names is formed in two steps: first, the names of base classes are extracted from the base ontology represented by the Mapping File; second, the names of higher level classes are introduced in interaction between Adam and Helen. In the first step, Adam extracts base class definition entries, such as map:Student a d2rq:ClassMap;, from the Mapping File and defines corresponding classes in the reference ontology. Since the base class names are RDB table names, they may not always be sufficiently descriptive for user level communication.
Therefore, Adam presents each base class to Helen, who can respond in three ways. First, she may decide that the base class name can adequately serve as the primary class name and approve it as such; second, Helen may provide an alternative primary name and approve it immediately; third, Helen may provide a tentative choice of primary name, asking Adam to conduct an external synonym search, and decide which primary name to approve after reviewing the synonym choices. (At this point Helen does not have the option of introducing other synonyms for the class name: although class name synonyms are allowed in the reference ontology, their use is restricted to the maintenance of backward compatibility between ontology versions.) Figure 5.7 illustrates the definition of the class Department in (a) the Mapping File and in (b) the reference ontology.

(a) D2RQ Mapping
@prefix rdf: .
@prefix rdfs: .
@prefix xsd: .
map:database a d2rq:Database;
    d2rq:jdbcDriver "com.mysql.jdbc.Driver";
    d2rq:jdbcDSN "jdbc:mysql://localhost/University";

(b) OWL
xmlns:xsd = "http://www.w3.org/2001/XMLSchema#"
Reference ontology developed from the university RDB schema
<owl:versionInfo> V1.1 2011/09/15
University Ontology

Figure 5.6: (a) Prefix and RDB details in Mapping File (b) XML namespaces and ontology header in reference ontology

(a) D2RQ Mapping
map:Department a d2rq:ClassMap;

(b) OWL

Figure 5.7: Class definition in: (a) Mapping File and (b) reference ontology

Once the base classes are defined, Adam and Helen define the more general classes. The superclasses can be defined in four ways. First, Helen identifies several existing classes that can be generalized into a new superclass; she provides the primary name for the superclass and Adam creates it.
Second, Helen identifies the existing classes and provides a tentative primary name for the superclass; Adam then searches for synonyms of that name in external lexical ontologies, and Helen approves the new class name after reviewing the choices. Third, Helen identifies the existing classes and asks Adam to suggest possible superclass names; Adam searches for hypernyms of each existing class name and reports to Helen the intersection of the hypernym sets resulting from the searches; Helen then approves the primary name of the new superclass after reviewing the choices. Fourth, Adam finds the hypernym set of each existing class, forms all possible intersections, and reports to Helen each nonempty intersection; in each case, Helen decides whether a new superclass is needed and, if so, approves its primary name after reviewing the choices.

For example, for the classes Student and Faculty Member, Adam finds the common hypernym Person and, with Helen's approval, creates a new class Person as well as subclass relationships between Student and Person, and between Faculty Member and Person. The OWL definition of the hierarchical relationship between Student and Person is shown in Figure 5.8.

Figure 5.8: Subclass definition in reference ontology

The properties of the base classes are defined next. Property definition starts from the base classes and proceeds to their superclasses. For each base class defined, Adam extracts the property definition entries, such as map:Student_FirstName a d2rq:PropertyBridge;, from the Mapping File and defines corresponding base properties of their respective base class. Since the base property names are RDB column names, they may not always be sufficiently descriptive for user level communication.
Therefore, Adam presents each base property name to Helen, who can respond in the same three ways as described for base class names, and the rest of the process is the same. The Mapping File fragment for the properties FirstName, LastName, and DOB is shown in Figure 5.9a; the OWL definitions of these properties, including a meaningful name for DOB, are shown in Figure 5.9b.

(a) D2RQ Mapping
map:Student_FirstName a d2rq:PropertyBridge;
    d2rq:belongsToClassMap map:Student;
    d2rq:property vocab:Student_FirstName;
    d2rq:propertyDefinitionLabel "Student FirstName";
    d2rq:column "Student.FirstName";
map:Student_LastName a d2rq:PropertyBridge;
    d2rq:belongsToClassMap map:Student;
    d2rq:property vocab:Student_LastName;
    d2rq:propertyDefinitionLabel "Student LastName";
    d2rq:column "Student.LastName";
map:Student_DOB a d2rq:PropertyBridge;
    d2rq:belongsToClassMap map:Student;
    d2rq:property vocab:Student_DOB;
    d2rq:propertyDefinitionLabel "Student DOB";
    d2rq:column "Student.DOB";

(b) OWL
<owl:DatatypeProperty rdf:ID="bn.DOB">

Figure 5.9: Property definition in: (a) Mapping File and (b) reference ontology

Once the properties of the base classes are defined, Adam and Helen next define the properties of the superclasses. The properties of the superclasses can be defined in two ways. First, Adam compares property names to see whether identical property names occur among all the subclasses of each superclass; if so, he takes the primary names of these properties and defines them as properties of the superclass. Second, for each superclass, Adam displays the subclasses along with their properties, asking Helen to specify whether any properties are common among the base classes. These common properties may have different primary names even though they refer to the same property of their respective class. Helen may respond in two ways to resolve this name conflict.
She may suggest that one of the property names be defined as a property of the superclass, or she may suggest a new name representing the common properties. Adam defines the suggested names as properties of the superclass. He then removes the primary names of the common properties from the subclasses, because the subclasses inherit these properties from their superclass. However, the base property names remain attached to the subclasses. For example, Person is a parent class of both Student and Faculty Member. The properties FirstName, LastName, and DOB are common between the subclasses Student and Faculty Member; therefore, the primary names of these properties are moved up the hierarchy and defined as properties of the parent class Person.

Finally, the relations between classes are defined. In the Mapping File, a relation between RDB tables is represented by the word join followed by the base class names separated by a directed arrow symbol (=>). Adam extracts the base class names appearing in each relation from the Mapping File and creates a graphical picture showing each pair of base classes. Adam displays the graphical picture to Helen, asking her to provide a name for each relation. Adam then defines each relation name as an ObjectProperty with the corresponding base class names as the domain and range. (This domain, defined in Section 2.4, differs from domain as an area of knowledge.) For example, Helen provides the word Offers to describe the relation between Department and Course. The Mapping File fragment of this relation is shown in Figure 5.10a, and the OWL definition is shown in Figure 5.10b.
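The extraction of class pairs from join entries can be sketched in Python. This is a minimal sketch, assuming the join string has the form "Table.Column => Table.Column" as in the Mapping File fragment of Figure 5.10a. The function names and the dictionary layout are hypothetical, and taking the referred table as the domain is an assumption chosen to match the Offers example (Department offers Course), not a rule stated in the text.

```python
import re

# Matches expressions such as "Course.DepartmentName => Department.DepartmentName".
JOIN_PATTERN = re.compile(r"^\s*(\w+)\.(\w+)\s*=>\s*(\w+)\.(\w+)\s*$")


def parse_join(join):
    """Split 'A.col => B.col' into the referring and the referred table names."""
    match = JOIN_PATTERN.match(join)
    if match is None:
        raise ValueError("unrecognized join expression: " + join)
    referring, _, referred, _ = match.groups()
    return referring, referred


def define_relation(name, join):
    """Describe an ObjectProperty for a relation name supplied by the DBA.

    Assumption: the referred table becomes the domain and the referring
    table becomes the range, as in the Offers example.
    """
    referring, referred = parse_join(join)
    return {
        "type": "ObjectProperty",
        "name": name,
        "domain": referred,
        "range": referring,
    }
```

With the join from Figure 5.10a and the name Offers, define_relation gives Department as the domain and Course as the range.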
(a) D2RQ Mapping
map:Course_DepartmentName a d2rq:PropertyBridge;
    d2rq:belongsToClassMap map:Course;
    d2rq:property vocab:Course_DepartmentName;
    d2rq:refersToClassMap map:Department;
    d2rq:join "Course.DepartmentName => Department.DepartmentName";

(b) OWL
<rdfs:range rdf:resource="#pn.Course">

Figure 5.10: Class relation definition in: (a) Mapping File and (b) reference ontology

Synonyms of names are allowed only for the maintenance of backward compatibility between ontology versions. For example, a new policy in the university requires Program to be called Department. As a result, Helen instructs Adam to define a new class name Department and designate it as the primary name instead of Program. However, the name Program may still be referred to by the users; therefore, Helen asks Adam to define Program as a synonym of the new primary name Department. The OWL definition of the synonym Program is shown in Figure 5.11.

Figure 5.11: Class synonym definition in reference ontology

In the next step, Adam creates a graphical representation of the entire reference ontology and displays it to Helen for final approval. A fragment of a university reference ontology illustrating the examples in this scenario is shown in Figure 5.12. Browsing and editing of the reference ontology is facilitated by an ontology editor such as Protégé. By looking at the global picture of the reference ontology, Helen may approve it or suggest modifications. If Helen approves, Adam completes the construction of the reference ontology and saves it in the Ontology Manager. Otherwise, Helen suggests modifications by editing the ontology graph. Adam follows the relevant steps to apply the suggested modifications and completes the construction of the reference ontology.
MODELING AND ACCESSING INFORMATION IN SRGS 98 University Ontology syn~> LastName Family-Name FifstName prop prop - - rop~-»\ DOB / syn >(^Date-Of-Birth^) (^"^Budget C^^^uidling StudentID Program prop prop \ Name Student •prop\\ syn V Major ^^epartmeniT^) CGPA ^^partroent-Name^^ Enrolls > Offers Has Course 'prop" CRN Relationships ^ Constructs -is_a- - - • Generalization —prop • Property • —ret • Relation syn • Synonym :*. Primary Class Name Class Property c Primary Property Name Relation c. Primary Relation Name Figure 5.12: Reference ontology graph 5.4 Scenario 3: Developing custom ontology This scenario shows the processes involved, and the interactions that occur between the user (Henry) and the User Interface Agent (Alice), in developing a custom ontology using the constructs from the reference ontology. CHAPTER 5. MODELING AND ACCESSING INFORMATION IN SRGS 99 Henry formulates a request for report using the Simplified Natural Language (SNL) and submits it through the User Interface. While processing the request, the SNL Pro­ cessor performs lexical, syntactic, and semantic analysis, and recognizes the actions that need to be performed and invokes those actions in the appropriate system components. In the case of a query if the request successfully passes through all of the analysis stages, the SNL Processor forwards the statements specifying what information is to be retrieved to the SPARQL Generator, and the statements describing how the retrieved information is to be formatted to the Report Manager. Assuming that Henry uses a new term called "transfer student" in the request, during semantic analysis the Ontology Manager cannot find the term in the custom or reference ontology; therefore, it posts the following error message in the User Interface Environ­ ment (UIE): New term "transfer student" does not exist. Upon perceiving this error, Alice engages in a conversation with Henry to clarify the request. 
Alice displays the following message to Henry:

    The term "transfer student" is not found in the ontologies; Please define "transfer student".

In response, Henry enters the following definition:

    Transfer student is a student who has transferred credits from previous institution.

In the next step, Alice verifies Henry's definition using the ontologies. Alice extracts the words from the definition and temporarily stores them in a word set called new-term-set. Using the name of each element in new-term-set, Alice searches for matching constructs in the reference ontology. For "transfer student", assume that Alice finds the class construct Student and the property construct Transferred Credits and displays them to Henry.

If Henry approves the displayed constructs, he also specifies any restriction associated with any of the constructs. In this scenario, Henry approves the class construct Student, and the property construct Transferred Credits with a restriction that its value cannot be empty. Alice then defines Transfer Student as a subclass of the class Student and Transferred Credits as its property with the restriction that it must have a value (of type number). Note that the class Student is in the reference ontology; Alice uses the Uniform Resource Identifier (URI) of Student as a reference link from the custom ontology. The OWL statements for defining Transfer Student in the custom ontology are shown in Figure 5.13. Alice replaces the blank space in Transfer Student with a hyphen (-) and prepends the tag "un" followed by a period (.) to denote that Transfer Student is a user-specific name. If Henry rejects the displayed constructs, Alice asks him to enter a different definition for "transfer student" and follows identical steps in defining the new term.

Once the new term is defined, the Ontology Manager creates a graphical representation of the custom ontology and Alice displays it to Henry for final approval.
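The matching and naming steps described above can be illustrated with a small sketch. It is written under stated assumptions rather than taken from SRGS: the thesis does not specify how words are compared, so exact case-insensitive word matching is assumed here, and the function names find_matching_constructs and user_specific_name are hypothetical.

```python
def find_matching_constructs(definition, construct_names):
    """Return the construct names whose words all occur in the definition.

    The set of words taken from the definition plays the role of
    new-term-set; exact, case-insensitive matching is an assumption.
    """
    new_term_set = {word.strip(".,;:").lower() for word in definition.split()}
    matches = []
    for name in construct_names:
        if all(part.lower() in new_term_set for part in name.split()):
            matches.append(name)
    return matches


def user_specific_name(term):
    """Replace blanks with hyphens and prefix the user-specific tag 'un.'."""
    return "un." + "-".join(term.split())


if __name__ == "__main__":
    definition = ("Transfer student is a student who has transferred "
                  "credits from previous institution.")
    # Hypothetical list of construct names from the reference ontology.
    constructs = ["Student", "Transferred Credits", "Course", "Department"]
    print(find_matching_constructs(definition, constructs))
    print(user_specific_name("Transfer Student"))
```

A real agent would likely need stemming or synonym lookup to relate, for instance, "transfer" and "transferred"; the sketch only shows the shape of the lookup and the naming convention.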
By looking at the global picture of the custom ontology, Henry may approve it or suggest modifications. If Henry approves, Alice saves it in the Ontology Manager. Otherwise, he suggests modifications by editing the ontology graph. Alice follows the relevant steps to apply the suggested modifications in the custom ontology.

Figure 5.13: Definition of transfer student in custom ontology

In this chapter, I have presented a number of scenarios to demonstrate the behavioral aspects of SRGS. The scenarios have been developed to show how the architecture of SRGS supports the activities involved in accessing information stored in the RDB system and developing ontologies from the RDB schema. In the next chapter, I analyze my work with regard to the main objectives of this thesis.

Chapter 6 Analysis and Evaluation

The main objective of this thesis has been the definition of a system architecture, Semantic Report Generation System (SRGS), that allows developing ontologies from a legacy RDB schema, and accessing information stored in the legacy RDB system. In this chapter, I analyze and evaluate the proposed architecture with respect to these objectives.

The definition of the architectural model presented in Chapter 4 begins with a set of system requirements represented as use case diagrams. The system requirements have been carefully developed to guide me in abstracting the aspects of the system that are relevant to my study topic. The basic configuration of the architecture includes a User Subsystem (US) and a Database Subsystem (DBS), each comprising a software agent and an environment. The software agents assist their human partners in building ontologies and accessing information stored in an RDB system. The environments contain system components designed to provide the functionalities stipulated in the system requirements.
The US and the DBS can reside on different machines and communicate through a network.

The basic architecture of SRGS is complete, yet scalable in several aspects. Multiple USs can interact with a single DBS, and multiple DBSs can be attached to a single US. In the US, more instances of the software agent can be created in the event of multiple users accessing the system simultaneously. The presence of multiple agents allows customized assistance for each user's unique requirements in terms of user-system interaction and system behavior.

SRGS allows development of ontologies from an RDB structure. In the DBS, the construction of a reference ontology begins with a rudimentary version of the reference ontology generated through automatic conversion of the RDB structure to a Semantic Web structure. The converted structure then serves as a starting point from which the Database Interface Agent (DBIA), in interaction with the Database Administrator (DBA), incrementally develops a full reference ontology. In the US, the User Interface Agent (UIA) assists the user in developing a custom ontology, which defines user-specific concepts using constructs from the reference ontology.

Thus, SRGS features a novel approach in which software agents assist human partners in developing ontologies. The agents are equipped with the requisite knowledge of how to build an ontology, which includes an understanding of the semantics of general ontological notions, such as class and relationship. In addition, the agents refer to external knowledge resources publicly available on the Semantic Web. The human partners only make decisions; they need a good understanding of the ontology building process but, owing to the agent assistance, need not have the level of expertise of specialist ontology developers.

The quality of ontologies developed in SRGS depends on two main factors.
The first factor is the human partners' level of understanding of the ontology development process. Though they are not required to become technical experts fully specialized in this process, their level of understanding can influence the quality of ontologies to a certain degree. Unintended and accidental errors committed by human partners can be identified and resolved by the agents. However, poor decisions owing to a lack of understanding of the ontology development process may result in misrepresentation of knowledge.

The other factor influencing ontology quality is the availability of knowledge resources on the Semantic Web. The agents rely on external knowledge resources, such as lexical dictionaries and libraries of ontologies, on the Semantic Web. The Semantic Web is in an early stage of development; as such, these knowledge resources are yet to be realized at a large scale. The more such resources are available for agents to exploit, the more refined and comprehensive the ontologies that can be developed.

In Chapter 5, I introduce three scenarios to illustrate the main behavioral aspects of SRGS. The first scenario illustrates how the user of SRGS can access information stored in an RDB using a Simplified Natural Language; the second scenario shows the interactions between the DBIA and the DBA, and the activities that occur within the DBS in the process of developing a reference ontology; and the third scenario demonstrates how the user can complement the reference ontology by introducing user-specific concepts in a custom ontology.

The above analysis suggests that it is possible to combine Semantic Web technologies and the MAS approach to create a system architecture for accessing information stored in the RDB system without relying on human intermediaries, and for developing ontologies from an RDB schema. However, these observations remain to be further confirmed through studies involving implementation of and experimentation with SRGS.
Chapter 7 Conclusions and Future Work

This thesis proposes and investigates a novel approach to modeling and accessing information stored in legacy RDB systems. The preliminary research has included a review of literature in several areas: legacy RDB systems and their use in decision support; the Semantic Web project and its presently available technology; converting relational data to Semantic Web structures; and multiagent systems (MAS).

Those preliminary studies led to several observations. The first observation was that the increasing demands on decision support systems that rely on legacy RDB systems have compelled researchers to look for more effective techniques that meet modern requirements. The second observation, from the study of the Semantic Web, was that information stored in legacy RDB systems can be represented using Semantic Web structures and searched by semantic queries in the SPARQL language; moreover, this can be done on demand, without any modifications of the RDB itself. However, high-level semantic interactions between the user and the system require a domain knowledge ontology that is more developed than the rudimentary ontology represented by the RDB schema. The third observation was that MAS technology holds the promise of development of intelligent decision support systems, with software agents that understand the nature of decision processes; however, this requires that the knowledge underlying the decisions be formally represented as an ontology. The final observation was that the agents themselves can be equipped with a meta-ontology and assist the human partner in the building of a domain ontology, which in turn will enable both semantic queries and agent reasoning.

These observations have led to the main line of research in this thesis, namely the definition of an architectural model of the Semantic Report Generation System (SRGS) that combines Semantic Web and MAS technologies.
The first step towards defining the architectural model was to formulate a set of system requirements in the form of use cases. These use cases then led to the preliminary definition of the global architecture of SRGS. At a high level, SRGS comprises client User Subsystems (US) and server Database Subsystems (DBS). A US consists of a User Interface Environment and a User Interface Agent (UIA). It facilitates user access, processing of users' requests for information, and developing and maintaining custom ontologies. The DBS consists of a Database Interface Environment, a Database Interface Agent (DBIA), and the legacy RDB system. It facilitates developing and maintaining a reference ontology, and retrieving information from the RDB system. The US and DBS can reside on different machines and communicate through a network. Multiple users can simultaneously access the US, which can interact with multiple DBSs. Multiple USs can interact with a single DBS or with multiple DBSs.

The behavioral aspects of the architecture were then demonstrated through the development of three characteristic scenarios. One scenario shows how the user can directly access information stored in an RDB system through a semantic query, without requiring the technical assistance of database programmers and report writers. The other two scenarios illustrate the intelligent assistance of software agents and the specific tasks executed by system components in developing ontologies from the RDB schema.

The analysis of the scenarios suggests that SRGS has met the objective of defining a system architecture that capitalizes on Semantic Web and MAS technologies to create a layer of Semantic Web structures on top of a legacy RDB system in order to facilitate access to information stored in the underlying RDB system. This has been achieved through an innovative combination of Semantic Web and MAS technologies in which agents assist in ontology development.
The next step in the current line of research concerns the possibility of implementing an SRGS prototype that can be used for practical verification of the presented architecture. There are several other issues that can be further researched in order to advance the proposed approach. The software agents can be trained to be able to make more independent decisions and further reduce human involvement in the development of ontologies. The specific steps involved in ontology mediation with regard to importing constructs from external ontologies need to be further studied and elaborated.

Appendix A The D2RQ Platform

The D2RQ Mapping File is explained in A.1, and the proposed extension to the D2RQ Platform is described in A.2.

A.1 The D2RQ Mapping File

The Mapping File contains the RDF representation of an RDB schema. Its file format is W3C standard format Notation 3 (N3), which is a compact alternative to the RDF syntax, intended for human readability and designed to optimize expression of data and logic in the same language.
The Mapping File can be generated by running the generate-mapping script available in the D2RQ Platform software package. When this script is run, the D2RQ Engine analyzes the schema of the database and creates an RDF representation of the schema. The D2RQ Platform then uses the Mapping File every time it translates RDB data to RDF triples. An excerpt of the Mapping File generated from an RDB called university is given below.

    @prefix map: .
    @prefix vocab: .
    @prefix rdf: .
    @prefix rdfs: .
    @prefix xsd: .
    @prefix d2rq: .
    @prefix jdbc: .

    map:database a d2rq:Database;
        d2rq:jdbcDriver "com.mysql.jdbc.Driver";
        d2rq:jdbcDSN "jdbc:mysql://localhost/university";
        d2rq:username "root";
        d2rq:password "123456";
        jdbc:autoReconnect "true";
        jdbc:zeroDateTimeBehavior "convertToNull";
        .

    # Table Course
    map:Course a d2rq:ClassMap;
        d2rq:dataStorage map:database;
        d2rq:uriPattern "Course/@@Course.CRN|urlify@@";
        d2rq:class vocab:Course;
        d2rq:classDefinitionLabel "Course";
        .
    map:Course_CRN a d2rq:PropertyBridge;
        d2rq:belongsToClassMap map:Course;
        d2rq:property vocab:Course_CRN;
        d2rq:propertyDefinitionLabel "Course CRN";
        d2rq:column "Course.CRN";
        .
    map:Course_Title a d2rq:PropertyBridge;
        d2rq:belongsToClassMap map:Course;
        d2rq:property vocab:Course_Title;
        d2rq:propertyDefinitionLabel "Course Title";
        d2rq:column "Course.Title";
        .
    map:Course_DepartmentName a d2rq:PropertyBridge;
        d2rq:belongsToClassMap map:Course;
        d2rq:property vocab:Course_DepartmentName;
        d2rq:refersToClassMap map:Department;
        d2rq:join "Course.DepartmentName => Department.DepartmentName";
        .

    # Table Department
    map:Department a d2rq:ClassMap;
        d2rq:dataStorage map:database;
        d2rq:uriPattern "Department/@@Department.DepartmentName|urlify@@";
        d2rq:class vocab:Department;
        d2rq:classDefinitionLabel "Department";
        .
    map:Department_DepartmentName a d2rq:PropertyBridge;
        d2rq:belongsToClassMap map:Department;
        d2rq:property vocab:Department_DepartmentName;
        d2rq:propertyDefinitionLabel "Department DepartmentName";
        d2rq:column "Department.DepartmentName";
        .
    map:Department_Building a d2rq:PropertyBridge;
        d2rq:belongsToClassMap map:Department;
        d2rq:property vocab:Department_Building;
        d2rq:propertyDefinitionLabel "Department Building";
        d2rq:column "Department.Building";
        .

The Mapping File begins with the declaration of a number of prefixes for the common Uniform Resource Identifiers (URI). Of particular interest is the base URI, which establishes a vocabulary namespace for the constructs in the Mapping File. When each vocabulary is given a namespace, the ambiguity between identically named elements across multiple vocabularies can be resolved. The d2rq:Database tag specifies the JDBC connection to the database and the login credentials for accessing the database. The d2rq:ClassMap tag represents an RDB table as a class, and d2rq:PropertyBridge represents an RDB column as a property. A relationship between two RDB tables is specified by the d2rq:join tag. For example, the tag d2rq:join "Course.DepartmentName => Department.DepartmentName" defines the relation between the Course and Department tables, i.e., department offers course.

A.2 D2RQ extension

I installed the D2RQ Platform and tested it against a small database named university. I ran a find(s p o) SPARQL query, and the D2R Server displayed correct results. While the server was running, I changed a data value in a table and the D2R Server showed the new value in real time.
This confirmed that the D2RQ Platform provides on-demand Relational Database (RDB) to Resource Description Framework (RDF) mapping. However, my test revealed a flaw in the platform: it failed to detect any changes that were made to the database schema. For instance, when I added a new column to a table and populated it with new data, the find(s p o) query results did not include the newly added column and its values. When I dropped or renamed a column, the D2R Server gave the following error:

    Unknown column 'Department.Budget' in 'field list': SELECT DISTINCT 'Department'.'Budget', 'Department'.'Department-Name' FROM 'Department' (E0)

In this case, I dropped the Budget column from the Department table. Creating or dropping a table resulted in similar errors. Moreover, I encountered the error message shown in Figure A.1 when I stopped the D2R Server and tried to launch it again.

Figure A.1: D2R Server error (terminal output from server startup: "Failed startup of context", caused by de.fuberlin.wiwiss.d2rq.D2RQException: Column @@Department.Budget@@ not found in database (E0), followed by a java.lang.NullPointerException in de.fuberlin.wiwiss.d2rs.JettyLauncher.start)

After I had replaced the old Mapping File with a new version, I was able to launch the D2R Server. I also noticed that my schema changes appeared in the find(s p o) query results. Therefore, I came to the conclusion that in order to achieve real-time consistency between RDB data and SPARQL query results, the Mapping File needs to be recreated whenever the RDB schema is modified.
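The two failure modes observed in this test (a mapped column that no longer exists, and a new column that the mapping does not know about) amount to a difference between two sets of column names. The following sketch illustrates that idea only; it is not part of the D2RQ Platform, and the function names mapping_drift and needs_regeneration are hypothetical.

```python
def mapping_drift(mapped_columns, database_columns):
    """Compare the columns named in the Mapping File with the live schema.

    Both arguments are iterables of "Table.Column" strings. How they are
    obtained (parsing d2rq:column values, querying the database catalog)
    is outside this sketch.
    """
    mapped = set(mapped_columns)
    actual = set(database_columns)
    return {
        # Mapped but gone from the database: leads to "column not found" errors.
        "missing_in_database": sorted(mapped - actual),
        # Present in the database but unmapped: absent from query results.
        "missing_in_mapping": sorted(actual - mapped),
    }


def needs_regeneration(mapped_columns, database_columns):
    """Return True when the Mapping File no longer matches the schema."""
    drift = mapping_drift(mapped_columns, database_columns)
    return bool(drift["missing_in_database"] or drift["missing_in_mapping"])


if __name__ == "__main__":
    mapped = ["Department.DepartmentName", "Department.Building",
              "Department.Budget"]
    live = ["Department.DepartmentName", "Department.Building"]
    print(mapping_drift(mapped, live))
    print(needs_regeneration(mapped, live))
```

In the example, Department.Budget has been dropped from the database but is still mapped, which corresponds to the error reported above.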
In order to automate this process, I have added an extension to the D2RQ Platform. The conceptual picture of the proposed extension is shown in Figure A.2, and the processes involved in mapping RDB data to RDF triples with the extension in place are illustrated in Figure A.3.

[Diagram labels: Extension; D2RQ Engine; RDB; "1b: Reads RDB Schema"; decision "Has RDB Schema Changed?" with a Yes branch; RDBS -> RDFS.]
Figure A.2: The extension

[Diagram labels: Console User Interface; SPARQL Endpoint (D2R Server); D2RQ Engine; Extension; RDB; "Uses Mapping"; "2c: SQL Query"; "3a: SQL Result"; RDBS, RDFS; "Call generatemapping"; a schema-change decision with a Yes branch.]
Figure A.3: The D2RQ Platform with the extension

The extension can be implemented using one of the following two methods.

Binary log processing

MySQL generates binary log files that record every transaction that modifies the databases. A log file can be associated with a database, and MySQL updates the log file every time such a transaction is executed on that database. My proposed extension analyzes the log file contents to find out whether the most recent transaction has modified the database schema. I take the most recently executed query from the log file and run it through a string tokenizer to search for the string CREATE, ALTER or DROP, because a query with one of these SQL statements is one that modifies the database schema. When a match is found, the D2RQ Platform is invoked to update the Mapping File. This process is illustrated in Figure A.4.

[Flowchart steps: listen for change on the MySQL binary log; "Has Binary Log Changed?" (No: keep listening; Yes: continue); convert the binary log to a text file (New_Log); find the difference between New_Log and Old_Log; copy New_Log to Old_Log; search the log difference for the string CREATE, ALTER or DROP; "Match found?" (Yes: update the Mapping File).]
Figure A.4: Binary log processing method
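The binary log processing steps can be sketched in Python as follows. This is an illustrative sketch rather than the extension's actual code: it assumes the binary log has already been converted to text (for example with the mysqlbinlog utility) and that the text log only grows by appending. The names new_log_entries, modifies_schema, process_log_change and the update_mapping_file callback are hypothetical.

```python
import re

# Keywords that, per the text, mark a schema-modifying statement.
SCHEMA_KEYWORDS = {"CREATE", "ALTER", "DROP"}


def new_log_entries(old_log: str, new_log: str) -> str:
    """Return the text appended to the log since the previous snapshot.

    Assumes the text log only grows by appending; if the old snapshot is
    not a prefix of the new one (e.g. after log rotation), the whole new
    log is treated as unseen.
    """
    if new_log.startswith(old_log):
        return new_log[len(old_log):]
    return new_log


def modifies_schema(log_difference: str) -> bool:
    """Tokenize the log difference and look for CREATE, ALTER or DROP."""
    tokens = re.findall(r"[A-Za-z_]+", log_difference)
    return any(token.upper() in SCHEMA_KEYWORDS for token in tokens)


def process_log_change(old_log: str, new_log: str, update_mapping_file) -> str:
    """One pass of the Figure A.4 loop; returns the snapshot to keep as Old_Log."""
    if modifies_schema(new_log_entries(old_log, new_log)):
        update_mapping_file()  # hypothetical hook that regenerates the Mapping File
    return new_log
```

Because whole tokens are compared, a word such as CREATED does not trigger an update, but a keyword inside a string literal still would; in that case the Mapping File is merely regenerated unnecessarily.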
Query interception

MySQL Proxy is a free and open source application that can intercept all queries and responses between a MySQL client and server. The extension uses MySQL Proxy to intercept all incoming queries; it then checks whether a query contains the string CREATE, ALTER or DROP. If the extension finds a match, it invokes the D2RQ Platform to update the Mapping File. This method is illustrated in Figure A.5.

[Diagram labels: Client; MySQL Proxy; MySQL Server; Result; "a: Listen for CREATE, ALTER or DROP statement"; decision "Match Found?" (Yes: Update Mapping File; No).]
Figure A.5: Query interception method
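The decision made at the proxy can be sketched as follows. MySQL Proxy itself is scripted in Lua, so this Python fragment only illustrates the logic and is not proxy code. The names is_schema_statement and intercept, and the two callbacks, are hypothetical. As a variation on searching the whole query, the sketch tests only the leading keyword of the statement, so that a value such as 'DROP' inside a string literal does not trigger an update.

```python
import re

# Leading keywords treated as schema-changing, following Figure A.5.
SCHEMA_STATEMENT = re.compile(r"^\s*(CREATE|ALTER|DROP)\b", re.IGNORECASE)


def is_schema_statement(query: str) -> bool:
    """True when the statement itself begins with CREATE, ALTER or DROP."""
    return SCHEMA_STATEMENT.match(query) is not None


def intercept(query, forward_to_server, update_mapping_file):
    """Forward a client query and refresh the mapping after schema changes.

    Both callables are hypothetical stand-ins: forward_to_server for the
    proxy passing the query on to MySQL, update_mapping_file for invoking
    the D2RQ Platform.
    """
    result = forward_to_server(query)
    if is_schema_statement(query):
        # Regenerate only after the query has been passed to the server.
        update_mapping_file()
    return result
```

A statement preceded by an SQL comment would not be recognized by this pattern; a fuller implementation would need to strip comments first.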