Towards Context-aware Mobile Web 2.0 Augmented Reality

by

Rahim P. Khajei

MSc., Azad Qazvin University, 2011

THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF MASTER OF SCIENCE IN COMPUTER SCIENCE

UNIVERSITY OF NORTHERN BRITISH COLUMBIA

September 2017

© Rahim P. Khajei, 2017

Abstract

Augmented reality (AR) is a context-aware service which allows users to have an enhanced perception of the real world through a composition of virtual and actual objects. In recent years, AR has received tremendous attention from both the academic and industry sectors. However, developers and end users still suffer from a lack of standard formats and protocols. We believe the obstacles stopping AR from flourishing are partially inherited from context-aware services and partially stem from the architecture of current AR applications. Here, we aimed to develop a new model that can support an AR framework for sharing content between AR applications and communication between AR users. By incorporating Web 2.0 standards into the client-server architecture, we designed a new architecture for AR named Client Federated Servers (CFS). We implemented an AR application named Scratcher as a proof of concept. Scratcher allows users to search and share targets as well as communicate with each other.

TABLE OF CONTENTS

Abstract ii
Table of Contents iii
List of Figures vii
List of Tables ix
Acknowledgements xi
Glossary xii

1 Introduction 1
  1.1 Overview 1
  1.2 Motivation 5
  1.3 Research Problem 6
  1.4 Purpose of Study 6
  1.5 Objectives of Study 7
  1.6 Research Questions 7
  1.7 Contributions 8
  1.8 Thesis Structure 9

2 Background and Literature Review 10
  2.1 Augmented Reality 10
    2.1.1 Brief History of AR 12
    2.1.2 How AR Works 13
    2.1.3 Applications 16
      2.1.3.1 Annotation 16
      2.1.3.2 Medical 17
      2.1.3.3 Entertainment 17
      2.1.3.4 Military 17
    2.1.4 AR Research Areas 18
    2.1.5 AR Enabling Technologies 19
      2.1.5.1 Tracking 19
      2.1.5.2 Interaction and Interface 21
      2.1.5.3 Display Methods 22
    2.1.6 Challenges in AR 23
  2.2 Client-Server Architecture 26
    2.2.1 AR Architecture 27
  2.3 Web 2.0 and Social Services 32
    2.3.1 Web 2.0 32
    2.3.2 Social Networking 33
    2.3.3 Social Media 34
  2.4 Related Work 35
    2.4.1 Target and Content Related Issues 39
    2.4.2 Web 2.0 in Augmented Reality 43
  2.5 Chapter Summary 49

3 Proposed Framework 50
  3.1 Basic Idea 50
  3.2 Limitations with Current Structure 52
  3.3 Client Federated Servers 54
    3.3.1 Description of Components 55
    3.3.2 Benefits of Client Federated Servers Framework 58
    3.3.3 Practical Scenarios 60
  3.4 Data Flow and Connection between Components 62
  3.5 Main Processes 66
    3.5.1 Registering in Target Hub 67
    3.5.2 Sharing Targets and Contents and Subscription 68
    3.5.3 Searching and Loading Targets 69
    3.5.4 Chat Rooms and Communication Handling 70
      3.5.4.1 Subscription and Notification Handling 72
  3.6 Specifications 73
  3.7 Chapter Summary 75

4 Scratcher - Proof of Concept 77
  4.1 Mobile Application Implementation 78
    4.1.1 How Does It Work? 79
    4.1.2 Chat System 84
    4.1.3 Storing and Retrieving 84
    4.1.4 Sending and Receiving a Message 86
  4.2 Server Side Implementation 88
  4.3 Target Hub Implementation 93
  4.4 Web APIs 98
  4.5 Expiration and Activation Tags 100
    4.5.1 Test Case 102
  4.6 Chapter Summary 102

5 Conclusion and Future Directions 105
  5.1 Limitations 106
  5.2 Future Work 108

Bibliography 110

LIST OF FIGURES

2.1 Simple AR application 11
2.2 Reality-Virtuality (RV) Continuum [59] 12
2.3 A simple AR system [72] 14
2.4 Components in AR architecture [34] 16
2.5 AR in military application [30] 18
2.6 A modern HMD - Microsoft's HoloLens [9] 23
2.7 Client Server processing environment 27
2.8 AR using Client-Server Architecture 28
2.9 Flow of information in client-server framework [71] 30
2.10 Context provisioning ecosystem [77] 37
2.11 Infrastructure of Cyber-Physical Web [36] 40
2.12 System architecture for AR application [28] 41
2.13 System architecture for mobile AR application [75] 42
2.14 Visualization of the trail marks in AR [39] 45
2.15 Visualization in Link2U [25] 46
2.16 Vision sharing feature for SOAR proof of concept [63] 47
2.17 Bridging augmented reality and augmented virtuality [41] 48
3.1 Client federated servers architecture for AR applications 55
3.2 Searching for a target and list of available contents for a target 61
3.3 Data flow diagram of the system 64
3.4 Information flow for end user's interaction 65
3.5 Use cases of the target hub 66
3.6 Registration - Activity diagram 67
3.7 Target Sharing - Activity diagram 68
3.8 Search and load target - Activity diagram 70
3.9 Communication process - Activity diagram 71
4.1 Working environment of Unity 78
4.2 Interaction hierarchy between client and app server 79
4.3 Log in page of the Scratcher 80
4.4 Activating the AR scene 82
4.5 Chips has been detected but Stones has not 83
4.6 Target search and load page 83
4.7 Chips and Tarmac both detected 84
4.8 Connecting by a common target 85
4.9 Chat history of Tarmac 86
4.10 Class of Message 86
4.11 Chat room scene 87
4.12 Polling vs. long polling 88
4.13 Interaction hierarchy between app server and target hub 89
4.14 Web methods of the app server 90
4.15 Registration method of the app server 91
4.16 Requesting the target hub for the list of targets 92
4.17 App server forwards update message to the target hub 93
4.18 Activity diagram of the chatting system 94
4.19 Entity relationship model of the target hub 96
4.20 Register method in server controller 97
4.21 Forwarding a chat message to the subscribers 98
4.22 Only the sphere is in the scene 103
4.23 Both the sphere and the cube are in the scene 103
4.24 Sphere expired and disappeared 104

LIST OF TABLES

4.1 Document of searching for targets in the target hub 98
4.2 Document of downloading a target from the target hub 99
4.3 Document of uploading a target to the target hub 99
4.4 Document of message passing in the target hub 100
4.5 Document of registration to the target hub 100
4.6 Document of showing the servers of the target hub 101

Acknowledgements

“My favourite things in life don't cost any money. It's really clear that the most precious resource we all have is time.” — Steve Jobs

Firstly, I would like to express my sincere gratitude to my wife Mona Aminorroayaee for her continuous support during my studies. I would like to thank my supervisor Dr.
Alex Aravind for his input, guidance, and feedback. The door to Prof. Aravind's office was always open when I had questions. Dear friends, your support surely helped me to finish my thesis and better its quality. My sincere gratitude to my friends and lab mates, namely, Behrooz Dalvandi for helping me in the application development, Mani Samani for helping me in the documentation and his feedback, Conan Veitch for thesis proofreading and his feedback, and Nahid Taheri, Shanthini Rajendran, Darshik Shirodaria, Raja Gunasekaran, Arthi Babu, Gurpreet Lakha, and Braemen Stoltz for their passionate participation in my presentations and their feedback. I would also like to thank my committee members Dr. Luke Harris and Dr. Samuel Walters for their very valuable comments on this thesis and their guidance. I also thank my external examiner Dr. Balbinder Deo for his feedback and comments, and Dr. Ian Hartley for chairing my thesis defence.

Glossary

Activity diagram  UML activity diagram. 66
Adoptability  The likelihood of a product being accepted and used by other developers. 77
All-in-one systems  Systems that have all of their components in one place. 31
API  An Application Programming Interface (API) specifies how to interact with software components. 3
AR resource  Context and content information of an augmented reality system. 67
Augmented reality browser  A type of browser that uses the camera to detect and display information related to locations around the user. 59
Client Federated Servers  The framework proposed in this thesis, an enhancement over the client-server model. ii
Client-server  A computational model in which service providers, called servers, and service requesters, called clients, are scattered over a network. ii, xii
Context Provider  A service that collects and shares context information. In augmented reality, the context is a target or a point of interest. 99
Co-routine  A type of function that can pause its execution, return control to the caller, and resume from where it stopped if control ever returns to the function. 80
Content  The information that is delivered to the user in a context-aware service. ii
Content providers  Services that collect and share virtual objects. 51
Context  Any information that can be used to characterize the situation of an entity. 1
Context-aware services  Services that try to provide relevant information to the user by recognizing a situation and adapting to changes. ii, xii
Cross-platform  The ability of software to run on multiple computer platforms. xiv, 4
Data flow diagram  UML data flow diagram. 63
End user  The ultimate intended user of a product. 63
Entity  A person, place, or object that is considered relevant to the interaction between a user and an application, including the user and applications themselves. xiii, 1
HTTP requests  Requests in a communication protocol that allows a computer to send a request and a server to respond. 33
Hub  A center for data exchange and routing. 43
KHARMA  An open architecture for augmented reality that allows user contribution through HTML and JavaScript. 40
Open systems  Systems that combine flexibility, interoperability, and portability. 26
Participatory AR  A type of augmented reality with multiple users interacting in a shared space. 106
Points of interest  Locations for which virtual content is to be registered; the equivalent of targets for location-based AR applications. xiv, 13
Proprietary standards  Protocols and specifications for software or hardware that are controlled by a company rather than a standards organization. 31
Prototype  An incomplete version of a software product that serves a special purpose in software development. 8
Push notification service  A service whose messages are sent by servers and pop up on a mobile device. It does not require the user to be in the application or using it. 71
Registration  Aligning virtual content on targets. 11
Scalability  The capability of a system to handle growth in work or number of clients. 30
Scratcher  An AR prototype application developed as a proof of concept. ii
SDK  A Software Development Kit is a set of application development tools that facilitates developing an application. xiv, 4
Target Hub  The structure that collects and shares targets with clients. 54
Targets  Visual patterns that are to be recognized by an AR application. ii, xiii, xiv
Token  An ID issued by a server to verify the client in future requests. 66
Unity  A cross-platform game engine developed by Unity Technologies. 4
Use case diagram  UML use case diagram. 64
User profile  A representation of a user model or user identity. 33
Virtual content  Computer-generated objects such as text, images, video, and audio that are delivered to the user upon detection of targets and/or points of interest. xiii, xiv
Vuforia  A Software Development Kit (SDK) for enabling augmented reality functions. 4
WebClient class  A .NET class that provides functions for sending and receiving data from a URI. 89
WWW class  A small Unity utility module for simple access to web pages. 86
XML  Extensible Markup Language, a markup language that defines rules for encoding documents. 40

Chapter 1

Introduction

1.1 Overview

The Web has affected our lives in many respects, from communication and information sharing to business models. However, we have never stopped asking for more. As such, a new era of the Web, called Web 2.0, emerged with particular emphasis on two main aspects: user contribution and treating the Web as a platform [61]. User contribution means enabling users to create and share content through participation. This pattern of two-way traffic, from developers to users and from users to developers, has been encouraged by Web 2.0 and has been adopted in social networks and social media.
Web 2.0 also expects a multifunctional mash-up of services created by combining services offered by different service providers. This allows us to consider the Web as a platform for application development instead of detached islands of scattered services. On the other hand, context-aware services try to provide relevant information to the user by recognizing a situation and adapting to changes. In this regard, Dey et al. define context as “any information that can be used to characterize the situation of an entity. An entity is a person, place, or object that is considered relevant to the interaction between a user and an application, including the user and applications themselves” [11]. It is believed that the availability and quality of contextual information will be crucial factors in future context-aware services [52, 77]. One important issue in context-aware services is a shortage of standards for contexts. The problem is magnified when it comes to fusing several types of context to offer a particular service. There have been efforts and research studies toward the integration of Web 2.0 with context-aware services, and many of the suggested platforms have been adopted over the past few years. For instance, a mobile middleware component has been proposed in [49] as a platform to collect user context and authentication information. This platform allows Web services to subscribe to user context and utilize the services offered by other subscribers. It would be beneficial to bring Web 2.0 and context awareness together. However, in different technologies, context, content, and user contribution may have different meanings and requirements. Therefore, a platform that is intended to deliver both Web 2.0 standards and context awareness should be tailored and customized according to those requirements. With the advancement of computing and communication technologies, virtual worlds have been created.
A virtual world uses digital objects such as sound, images, video, and graphics, and it has become part of our everyday life. Virtual reality technology is an attempt to create a fully virtual world that gives users a real-life or “make believe” experience. Augmented Reality (AR) takes this idea even further by “augmenting” digital objects into the physical environment to enrich the users' perception of the real world. That is, AR technology enhances the physical environment by augmenting digital objects and enables users to interact with this enhanced new world. Augmented reality is a cutting-edge, still-developing technology in the realm of context-aware services. It allows computer-generated content in the form of images, text, video, audio, and 3D objects to be superimposed on physical objects. AR technology enables users to experience new possibilities that are not feasible in either a real-world environment alone or a virtual world alone. In theory, AR technology forms a spectrum of possibilities, touching an entirely real (physical) world environment at one end and an entirely virtual (digital) world at the other. The goal of augmented reality is to enhance users' perception of reality by providing information related to the context. A significant number of AR applications have been developed over the past few years. Nonetheless, only a few of them have proven successful; the most famous among them is Pokémon Go [10]. In practice, there are issues and obstacles in augmented reality which stop it from fully achieving its goals. These problems stem partially from context-aware services and partially from the architecture of current AR applications. For instance, proprietary interfaces and formats have been a huge obstacle to the widespread adoption of this technology [36]. Moreover, there is no standard way for users to contribute, whether by creating or sharing content or by communicating and sharing ideas.
This not only makes it difficult to support user contribution inside AR technology, but it also becomes a bigger problem when it is desired to expose users' contributions as a service to other Web services. The existing implementations of AR applications use one of the popular models of distributed systems, the client-server model. We believe that this traditional implementation of AR using the client-server model has some significant limitations. Especially with respect to Web 2.0 and context awareness, the need for a platform that can provide Web 2.0 features such as user contribution together with a rich, high-quality context-aware experience is highly perceivable. The main contribution of this thesis is to propose an alternate model of AR implementation to address the limitations of current AR implementations and offer added benefits to AR users. We refer to the proposed alternate model for future AR applications as “client federated servers.” The key objective of this new model is to enhance user experience and also broaden the applications of AR technology. This model combines elements of context awareness and Web 2.0 standards tailored to meet AR needs. It allows collaboration between the servers of different AR apps so they can share their resources, namely targets and virtual objects. The target hub is the server-side, backend infrastructure of the proposed model. The target hub gathers user context and AR content information, such as communicated messages and shared contents, and makes them available through an offered API to its subscribed servers. In the client federated servers model, users of an AR app can communicate not only with each other but also with the users of other AR applications. Joining the databases of targets and virtual objects, and allowing communication between users, can make the AR environment more intuitive and consistent. This will also enrich the experience of AR users.
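The federation step just described can be sketched in a few lines. The following Python sketch is purely illustrative: the class and method names (TargetHub, register, share_target, search) are assumptions made for exposition, not the thesis implementation, which exposes these operations through Web APIs.

```python
# Illustrative sketch only: models the target hub as an in-memory object.
# A real deployment would expose register/share/search as Web API endpoints.

class TargetHub:
    def __init__(self):
        self.servers = {}   # app-server name -> issued token
        self.targets = []   # target records shared by all subscribed servers

    def register(self, server_name):
        # An app server registers once and receives a token for later requests.
        token = "token-%d" % len(self.servers)
        self.servers[server_name] = token
        return token

    def share_target(self, token, name, tags):
        # A registered app server publishes one of its targets to the hub.
        if token not in self.servers.values():
            raise PermissionError("unknown server")
        self.targets.append({"name": name, "tags": set(tags)})

    def search(self, tag):
        # Any subscribed server can find targets shared by the others.
        return [t["name"] for t in self.targets if tag in t["tags"]]


hub = TargetHub()
tok_a = hub.register("ScratcherServer")   # server of AR app A
tok_b = hub.register("OtherARAppServer")  # server of AR app B
hub.share_target(tok_a, "tarmac", ["road", "texture"])
hub.share_target(tok_b, "statue", ["park", "landmark"])

# A user of app A now discovers a target contributed by app B:
print(hub.search("park"))   # -> ['statue']
```

The essential point the sketch captures is the federation: once both servers publish to the common hub, a search issued through either app server can return targets originating from the other.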
In this thesis, we have implemented a proof of concept that we refer to as “Scratcher”. The name stems from the analogy between scratchcards and AR applications. To reveal the information behind the opaque covering of a scratchcard, we have to scratch the cover. In the same way, Scratcher reveals the information (content) hidden behind the physical object (target). Scratcher has been implemented in a cross-platform environment using Unity, a powerful game engine, and Vuforia, an SDK for developing AR applications. Users can log into the application and start browsing. They can search for a target by its name or by its assigned tags. A target is any object, or image of an object, that is recognizable by the app. Targets are stored either locally on the device or in a cloud. Using the target hub, it is possible to share targets with other applications. It is also possible to search for a target using its assigned tags and download it to a device. Upon recognition of a target, users can click on it and enter its chat room, where they can start to share their ideas. The platform allows users to communicate not just with Scratcher's users, but with everyone subscribed to the target hub. This project enables AR functionalities as well as user contribution at multiple levels, including sharing content, exchanging ideas, and popularity voting.

1.2 Motivation

My first experience with augmented reality started with an NSERC Engage grant project to develop an AR mobile framework for teaching First Nations language and culture. During the project, I learned that AR holds high potential to bring us together and help us share our stories. As Alida Gersie has mentioned in [33], stories are a source of inspiration; through them, we can understand how we value and devalue our planet. The project finally finished, but my work had just begun.
I wanted to know, with all the potential that AR holds and all the exciting experiences it can offer: why is AR not popular, where is the problem, and what is lacking? Pursuing these questions familiarized me with augmented reality. I then noticed that the key to AR's success is partially lacking in all three aspects: the right data, the right time, and the right place. We have all heard the famous quote that “the key to success is to be in the right place at the right time.” Augmented reality is supposed to be all about the right data in the right place at the right time, except that it is not. In augmented reality, the right data would be high-quality content. The right place would be the targets or points of interest upon which the content is overlaid, and the right time is the time of need, which is inferred from the context information. In general, I noticed that it is not just a lack of data but also a lack of cooperation among providers and consumers that affects AR. Typically, AR resources are stored in proprietary databases with proprietary formats. One major motivation of this work is to bring all these resources under a common framework and make them accessible to all AR users. My other key motivator for this work is the belief in the wisdom of crowds: the many are smarter than the few. We need to bring the power of crowds into augmented reality. One way to do that is to enable user contribution by supporting bi-directional interactions between users and developers. Users of augmented reality should be able to contribute to the collective experience by sharing their side of the story. This requires that users be able to create AR contents and targets and share them with others. Also, users of AR should be able to communicate and share their ideas through a provided medium. I intend to deal with these issues.

1.3 Research Problem

Suppose two persons A and B are walking in a park.
User A is using his AR application and browsing a tree. User B is using her AR application and browsing flowers. When user A turns his smartphone to the flowers, he cannot browse them. Likewise, when user B points her phone at the tree, she cannot see any superimposed information over it. Although both users are browsing the same environment at the same time, they lack sharing capability. How can these users share their virtual contents with each other? As another example, suppose users A and B are browsing a famous statue in the park. User A wants to share his ideas about it with other interested persons (e.g. user B). User B wants to chat about the history of the statue with other interested users. How can these users communicate? The research problem is to investigate the causes of the lack of communication and sharing and their consequences for mobile AR applications.

1.4 Purpose of Study

Since virtual content is retrieved from a server, the server side of the application should support any structure necessary for sharing and communication. This leads to the following questions. From a developer's point of view, is there a way to support such a sharing and communication capability? What are the requirements? The purpose of the study is to understand the causes of the lack of communication and sharing in AR, with the aim of developing a new model that can support an AR framework for sharing contents between AR applications and communication between AR users.

1.5 Objectives of Study

Consider the two users A and B mentioned in the research problem; we are investigating the possibility for these two users to see more than they browse now. To be exact, suppose user A can browse the targets TA1, TA2, ..., TAn and user B can browse the targets TB1, TB2, ..., TBn.

• We want to enable user A to browse any subset of {TB1, TB2, ..., TBn} and, vice versa, user B to browse any subset of {TA1, TA2, ..., TAn}.
• We want to enable user A to send and receive messages from user B and, vice versa, user B to send and receive messages from user A.

1.6 Research Questions

• What kind of software models and protocols would enable user A to browse any subset of {TB1, TB2, ..., TBn} and, vice versa, user B to browse any subset of {TA1, TA2, ..., TAn}?

• What kind of software models and protocols would make it possible for user A to send and receive messages from user B and, vice versa, for user B to send and receive messages from user A?

• If we propose a new AR system to solve the above two research questions, how can we justify its feasibility in practice?

1.7 Contributions

Our attempt to answer the research questions resulted in the following contributions.

• The main contribution of this thesis is an alternate model of AR implementation that addresses the limitations of current AR implementations and offers added benefits to AR users.

• Using the proposed framework, we can connect users of different AR applications. Chat rooms for targets connect the users of various applications around a common topic of interest.

• Our third contribution is expanding the scope of AR applications by providing the ability to share targets among applications on different platforms. Users are also able to add targets and contribute to authoring AR targets and contents, which will lead to a jump in the number of targets. Contents can also be shared using this framework; however, in the project implementation, we have implemented only target sharing.

• Last but not least, we implemented the proposed model as a prototype AR mobile application. Our implementation is not the only way to implement the client federated servers offered in this thesis. However, it encompasses all the necessary modules, and it functions well enough to be considered a proof of concept for our proposed model.
1.8 Thesis Structure

The structure of the thesis is as follows. Chapter 2 offers the necessary background to the concepts used in this thesis, such as social networks, social media, client-server, and augmented reality and its applications. The literature review section then covers the current implementation of AR and the problems and limitations it imposes on current AR applications. Chapter 2 also surveys previous attempts at bringing social services into AR, along with the advantages and limitations of each. Chapter 3 gives a detailed technical view of what we are offering as a framework, the components of the proposed framework, and their tasks in the system. The information flow from an AR user to the servers and finally to the target hub is also discussed in Chapter 3. Chapter 4 evaluates the feasibility and practicality of the proposed framework through a prototype application named Scratcher. The blueprint for the implementation of the AR application, the server, and the target hub is presented in detail. Chapter 5 discusses the limitations of the work and concludes the thesis by offering guidelines for interested researchers who want to continue this vision.

Chapter 2

Background and Literature Review

In this chapter, we first give a background to augmented reality: its evolution, applications, and technological aspects such as challenges and limitations. We then discuss the client-server architecture adopted for augmented reality, with its advantages and disadvantages. Next, we provide a brief background to the Web 2.0 concept, its features, and how it is helping augmented reality. We finalize this chapter by reviewing previous research on employing Web 2.0 features in augmented reality. We are particularly interested in research studies that propose AR resource sharing and user contribution in augmented reality.
2.1 Augmented Reality

Informally, AR technology allows computer-generated contents in the form of images, text, video, audio, and 3D objects to be superimposed on the physical environment. Consider the two simple example applications of AR given in Figure 2.1a and Figure 2.1b.

Figure 2.1: Simple AR applications [46]

In Figure 2.1a, the physical environment has a speed board and speed-detection equipment displaying the speed of the moving vehicles on the road. Here, based on the speed detected by the equipment, a digital image (the speed in numbers) is created and displayed on the display board. This simple AR application gives an ordinary user the impression that she is driving in a smart environment that can alert her with useful information about her car's speed. In the example shown in Figure 2.1b, the camera of a cell phone detects a portion of a human body and displays a digital image of the internal parts (organs) of that portion of the body. In this application, by focusing the camera on different body parts, one can visualize the internal structure of each part. This can be an excellent application of an AR system for educational purposes.

Formally, as defined by Azuma in [16], AR is an interactive space created through computer-generated images capable of 3D registration and rendering, with the display of a combination of real and virtual objects. This definition has three basic elements: 1) AR mixes real and virtual objects; 2) AR is interactive in real time; and 3) AR registers virtual contents with physical objects in the real world. Augmented reality is the middle ground between virtual reality, an entirely artificial world, and telepresence, a wholly real world [59]. Figure 2.2 shows a continuum with the real environment at one end and the virtual environment at the other, encompassing augmented reality and augmented virtuality.
The difference between virtual reality and augmented reality lies in the environment in which the user is positioned. While in virtual reality the user is immersed entirely in a different, virtual world, in augmented reality virtual images and information are delivered to the user in his or her real physical world.

Figure 2.2: Reality-Virtuality (RV) Continuum [59]

2.1.1 Brief History of AR

Ivan Sutherland started work on a virtual reality system named "the ultimate display," and about half a century ago, in 1968, he succeeded in building the first augmented reality and virtual reality head-mounted display system [81, 82]. Since then, for a long period, AR was considered a subset of mixed reality or a variation of virtual reality rather than a separate field. The focus of that era was on visual sensing and display, both inherited from virtual reality and mixed reality. Later, in 1998, with virtual reality fading into the background and new concepts such as Weiser's ubiquitous computing (UC) [85] emerging, the focus of AR started to shift toward user experience rather than visual display. This shift is understandable considering what UC proposes: instead of taking computation into a virtual world, UC brings computational power into the real world. In the same way, AR started a new trend toward smart and networked objects. For instance, Mackay in [56, 57] introduced a wider definition of AR which considers interactive virtual objects as well as smart networked objects. This trend of AR tries to enhance everyday objects with memory, computational power, and a sense of awareness, which would augment the user experience. By then AR had turned into a field of study in its own right, and the first AR conference, the International Workshop on Augmented Reality (IWAR), was held in San Francisco in October 1998. After two decades of active research on augmented reality, today's AR is rather different from what we had expected to experience.
There can be many reasons for that; as Evan Barba et al. [17] speculated, one reason could be the lack of technology for smart objects. Major AR technology today boils down to smartphones and, of course, Google Glass or other companies' head-worn devices. These devices alone cannot support tangible or smart objects. Meeting users' needs is the other important factor in shaping today's AR. Looking back at the proposed AR systems, it is easy to see that they either were not beneficial enough or were not easy to adopt in everyday life. The important outcome of this is identifying the key elements in today's AR. From a technological point of view, smartphones and head-worn glasses are the key elements in developing current AR applications. Also, the main source of computational power is the cloud, and applications should be tailored to the user.

2.1.2 How AR Works

Typically, AR systems follow a general flow that is initiated with an input sensory device, such as a camera, capturing predefined scenes of the real world. The AR application reads this input and matches it against a database of patterns that are to be detected. These patterns are formatted images or locations, and are often called targets or points of interest (POIs). When there is a match in the database of targets, the location and orientation of the camera are calculated, and virtual content is aligned with the target. This aligning action is called registration. The virtual contents are often labels, images, or 3D models, which can be stored locally on the device or on a server. The real scene captured by the camera and the virtual content are then rendered (combined) into a new displayable image. Finally, the augmented image is displayed on the user's device. Figure 2.3 shows a diagram of these actions in the described AR system. Another important process of any AR application is tracking.
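Before turning to tracking, the capture, match, register, and render steps just described can be sketched as a minimal Python pipeline. All names here (Target, detect_target, register, render), the string-based "signatures," and the textual "rendering" are purely illustrative stand-ins under assumed simplifications, not the API of any real AR toolkit:

```python
from dataclasses import dataclass

@dataclass
class Target:
    name: str
    signature: str   # simplified stand-in for formatted image features
    content: str     # virtual content to overlay upon detection

# Hypothetical target database (in a real system: formatted images or POIs)
TARGET_DB = [
    Target("poster", "sig-poster", "3D model of the building"),
    Target("qr-menu", "sig-qr", "Today's menu"),
]

def detect_target(frame_signature):
    """Match the camera frame against the database of targets."""
    for t in TARGET_DB:
        if t.signature == frame_signature:
            return t
    return None

def register(target, camera_pose):
    """Registration: align the virtual content with the target's pose."""
    return {"content": target.content, "pose": camera_pose}

def render(frame, overlay):
    """Rendering: combine the real frame and virtual overlay into one image."""
    return f"{frame} + [{overlay['content']} @ {overlay['pose']}]"

# One iteration of the AR loop: capture -> detect -> register -> render
frame, signature, pose = "camera-frame", "sig-qr", (1.0, 2.0, 0.0)
target = detect_target(signature)
if target:
    print(render(frame, register(target, pose)))
```

In a real system the matching step would operate on image features and the rendering step on video frames; how these steps are divided between client and server is an implementation choice discussed later in this chapter.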
Once a target is detected, its location is tracked through the relative changes of locations and angles calculated from previous scenes.

Figure 2.3: A simple AR system [72]

In general, AR systems are composed of the following components.

Input: A target or POI from the physical environment and its position. Target detection is one of the most computationally difficult tasks. Therefore, to alleviate the computational complexity of this task, targets from the environment are assumed to have easily identifiable features. These identifiable features are referred to as fiducial markers. From the locations of fiducial markers, the positions where the virtual objects must be augmented are computed. A fiducial marker is a unique pattern that essentially tells what virtual image must be displayed and from what point of view. Fiducial markers are physical entities with unique features; they can come from the actual environment, landmarks, or objects artificially attached for the purpose of identification. Commonly used artificial fiducial markers in AR applications are unique drawings, images, and "quick response" (QR) codes. Typically, sensors and cameras are used to identify fiducial markers. Locations are calculated based on readings from sensor devices such as a compass, GPS, gyroscope, accelerometer, etc. Buttons, touchscreens, keyboards, and mice on user devices can also be considered sensors that capture user inputs.

Registration: The spatial alignment of virtual objects with physical objects. Registration requires knowing where the physical things are in space in real time, so that the virtual objects can be aligned with the real objects accurately.

Tracking: Keeping track of the objects' current locations and positions, which is necessary for an accurate registration process. This operation becomes demanding if the device or the fiducial markers are mobile.

Target and virtual objects repository: This is a database of targets and virtual objects.
Target objects are normally formatted source images of the targets or their signatures. Virtual objects are a predefined set of computer-made graphics, images, or texts, ready to be rendered along with physical objects upon the detection of the targets.

Graphics/Rendering: When a target is detected, the corresponding virtual objects are first fetched, then aligned with the real scene, and finally compiled into one "image" that can be displayed.

Display: The display technology on which the rendered scene (the final image) is displayed.

Communication: The messages that are relayed between the AR components mentioned above when they reside on different systems.

From the implementation perspective, these components can physically reside in one system or in multiple systems, based on the application requirements and performance metrics. Jens Grubert and Raphael Grasset [34] have considered AR applications in three layers: (a) the application layer; (b) the AR layer; and (c) the OS/third-party layer, as shown in Figure 2.4. The application layer implements the logic of the application. For example, in an AR game, the application layer handles the characters and their behaviors (such as movement). The AR layer includes the main components needed for any AR application, namely display, registration, and interaction between the other AR components. The OS layer provides the required essential services, such as processing sensory input from the camera, GPS, etc.

Figure 2.4: Components in AR architecture [34]

2.1.3 Applications

Over the past decades, several areas have been identified as potential application areas for AR. In some of those areas AR has flourished much more than in others. For instance, the military is using AR in pilots' helmets, but in education we have not seen much yet. This section briefly introduces those areas.

2.1.3.1 Annotation

Annotation is perhaps the most typical and elementary type of AR in practice.
Given a very large database of objects and information, a user can point a smartphone toward different objects. As soon as the objects are detected, the related information is overlaid on them in real time. This can be helpful in navigation or in any guidance system [29, 68]. One example of this application has been implemented in [69], called the augmented library, which assists the user in finding a book or answers questions about the books in the library.

2.1.3.2 Medical

It is possible to collect a 3D dataset of a patient through several types of sensors and then combine and render these images to make compelling virtual content. Doctors can have "X-ray" vision of the patient in real time. Other potential applications are in the surgery room, for example providing a view of the needle inside the patient's body [31, 78]. This, of course, would need very precise registration and tracking.

2.1.3.3 Entertainment

Many AR games are already in use, and developers all over the world have started to use AR in their games. A unique perspective, interacting with game objects directly and in 3D, and mixing the real environment with the game environment are among the features that AR has brought into this wonderland. "ARQuake," created by Piekarski and Thomas [64], is one example of such an environment, in which the player plays in the real environment superimposed with virtual enemies.

2.1.3.4 Military

One very famous example is pilots' helmets. For instance, the F-35's helmet-mounted display system superimposes necessary information such as airspeed, heading, altitude, targeting information, and warnings (see Figure 2.5) [30].
There are many other application areas, which we only name here, as we do not intend to survey AR in this section:

Figure 2.5: AR in military application [30]

• Manufacturing and repair
• Robot path planning
• Personal information systems
• Advertisement
• Industry
• Education
• Simulation

2.1.4 AR Research Areas

From the technology surveys in [15, 16, 90], one can infer that there are a number of areas to be considered for a successful AR application. Zhou et al. in [90] have listed the major areas in AR as follows:

Graphical hardware and software: Creating and rendering complex 3D virtual content, as well as overlaying that content on video streams, requires suitable graphical hardware and software.

Tracking and registration: Virtual contents should be related to at least one aspect of the real world; this is called registration. Furthermore, real-world objects and locations should be tracked so that virtual contents are properly adjusted to changes.

Display hardware: The results of tracking and overlaying need to be displayed, which is why display hardware is required. It can be a monitor, a projector, a cell phone, etc.

Processing unit: A computational unit to run the AR application code, which might be distributed.

Interface: Any human-computer interaction needs an interface to convey commands between users and the application. This gives users the capability of manipulating contents.

Although AR researchers usually focus on one or a few of the above topics, it is important to note that a typical AR application involves all of these areas.

2.1.5 AR Enabling Technologies

2.1.5.1 Tracking

Tracking is the most popular research topic in the AR context. Based on the tools and techniques used for tracking the targets (real-world objects), tracking methods fall into one of the following categories:

A) Sensor-based tracking

The main idea is to use sensors, such as magnetic, acoustic, optical, and other types of sensors, to detect and track the targets.
There are only a few research studies in this context, mainly because of the major disadvantages of this method. For example, distortion is a common problem with magnetic sensors. There have been attempts to combine different types of sensors to reach more accurate tracking. For instance, Klinker et al. tried to combine a local monitoring system installed on the human body with fixed global tracking [48].

B) Vision-based tracking

Instead of sensors, computer vision along with image processing is used to detect and track the targets, as well as to dynamically calculate the pose and orientation of the camera and objects. This is the main research area in tracking techniques. Vision-based tracking is divided into two groups: marker-based and feature-based (or markerless) [65]. The marker-based approach uses fiducial markers to calculate the camera pose. One of the dominant tracking techniques used square markers; Stricker et al. investigated a method for finding the coordinates of the four corners of a marker [79]. Famous approaches in this area have been thoroughly reviewed in [89]. The feature-based technique, introduced by Park [62], tries to find targets using natural information extracted from the edges and lines in the image of the target.

C) Hybrid tracking

Both of the methods above have advantages and disadvantages. Vision-based tracking has low jitter and no drift, but it is sensitive to swift and fast motions, which might lead to tracking failure, and it is time-consuming to resume tracking once the target is lost. Sensor-based tracking, on the other hand, is vulnerable to distortion and drift, and errors can accumulate, leading to inaccuracy. A combination of both methods appears to be a better approach: for example, vision-based tracking along with GPS localization and acceleration sensors for calculating rotation and camera pose.
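As a rough illustration of the hybrid idea, the sketch below fuses a vision-based pose estimate with a sensor-based one, falling back to the sensors when vision loses the target. The fuse function, its fixed weighting, and the 2D poses are simplified assumptions for illustration, not a production sensor-fusion algorithm:

```python
def fuse(vision_pose, sensor_pose, vision_ok, alpha=0.9):
    """Complementary-filter style fusion of two pose estimates.

    Trust the (accurate but failure-prone) vision estimate while tracking
    succeeds; fall back to the (drift-prone but always available) sensor
    estimate otherwise. alpha weights vision against the sensor reading.
    """
    if not vision_ok:          # vision lost the target (e.g. fast motion)
        return sensor_pose     # sensors keep the track alive until reacquisition
    return tuple(alpha * v + (1 - alpha) * s
                 for v, s in zip(vision_pose, sensor_pose))

# Vision available: the fused pose stays close to the vision estimate
print(fuse((10.0, 20.0), (12.0, 19.0), vision_ok=True))
# Vision lost: the sensor estimate bridges the gap
print(fuse((0.0, 0.0), (12.0, 19.0), vision_ok=False))
```

Real hybrid trackers weight the two sources by their estimated uncertainty (e.g. with filtering) rather than a fixed alpha, but the division of labor is the same.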
2.1.5.2 Interaction and Interface

Interactivity and interfaces are important aspects of AR that have gained increasing attention. Interactable virtual contents and user-friendly interfaces are major milestones for AR technology on its path of evolution. It is a delicate mode of interaction in which users can interact with the virtual contents, and through them with the virtual world, without dealing with traditional computer interfaces: only through real-world objects. The basic idea in AR interaction is to bridge the virtual and the real world through the manipulation of features of physical objects. Toward this goal, tangible augmented reality has been introduced in [45]. Considering that every action in the real world can be interpreted as an interface command to the virtual world, it is understandable why tangible augmented reality is popular. Hand gestures and finger hints are among the popular ways of interacting [58]. One challenging issue for developers was how to instruct users to make the right motion to activate the desired command, for which a nice solution is proposed in [86]: the authors augmented visual hints on the real object to guide the user toward the proper action. Another interesting aspect of AR interaction is collaborative interaction, which mostly happens between multiple users in a shared space. The beauty of this work lies in the intuitiveness of the interactions, which is based on already established social protocols. In [37], Henrysson showed how AR can support collaborative interaction through a virtual tennis game. In this work, the phone acts as a tennis racket that vibrates when the virtual ball hits it. In another attempt, Stafford et al. introduced an interesting way of interaction between indoor and outdoor users [76]. In this study, an indoor user pointed at a location on a map, which triggered an augmentation at the same location for the outdoor user as a finger coming from the sky (called god-like interaction).
This can facilitate interaction between indoor and outdoor users.

2.1.5.3 Display Methods

In the end, any AR experience must be visually presented to its users, except for audio augmented reality, which is beyond the scope of this study. To this end, there are several methods, or in other words several instruments, each of which has its advantages and disadvantages. Three main categories of displays have been recognized in [47]:

1. Mobile handheld displays
2. Video spatial displays and spatial augmented reality
3. Wearable displays

Handheld devices

Handheld displays are the most popular AR displays for several reasons: their small size, ease of mobility, lower prices compared to other types, minimal intrusiveness, and accessibility, since they are already present in social life. Cell phones and tablets are the most popular devices used in this technique. Considering recent advances in cell phone technology, including embedded cameras, GPS, various sensors, and high-resolution screens, this type of display holds great promise. This goes so far that many researchers tend to study AR only in the smartphone ecosystem (handheld devices generally) [17]. Although there has been significant technological advancement in handheld devices in recent years, slow processors and low memory can still be drawbacks in many AR applications. Their main limitation might be tracking, which is mainly based on image processing tools such as ARToolKit, Vuforia, Metaio, and similar instruments.

Video spatial displays and spatial augmented reality

Spatial augmented reality refers to a type of device that displays the virtual contents directly on the physical objects. These devices include projector-based displays and holographic optical devices or half-silvered mirrors. The distinguishing feature of this technology is the natural view and feeling it provides to its users.
However, the need for an extra and usually expensive device is the disadvantage of this technique. Projector-based displays are most suitable for applications with several users who want to share their AR experiences, such as a teaching class or a surgery room. The projection light should be registered with a physical object and illuminated on the object, for which a projector illumination method has been proposed in [22].

Head-mounted devices (HMDs)

Wearable devices are goggles, such as head-mounted displays or glasses, which augment the virtual contents in a more natural way. This type of display is composed of a real-time video stream on which virtual contents are overlaid. Owing to the abundance of image processing techniques, handling occlusions, color contrast, and other lighting difficulties is much easier compared to optical displays. Modern head-mounted devices allow six degrees of freedom of movement and monitoring (Figure 2.6). Although HMDs seem to be partially successful, wearing HMDs for too long can make users uncomfortable.

Figure 2.6: A modern HMD - Microsoft's HoloLens [9]

2.1.6 Challenges in AR

The core idea of augmented reality is to surround the user with a mixed world of real and virtual objects, leading to a more desirable world. To achieve this goal, the following challenges and limitations should be addressed and overcome.

So far, data have been categorized by application. The extreme form of this classification appears in smartphones, where each application has its own data space (sandbox). AR, in contrast, tends to group data based on location and environment. Generally speaking, the current cell phone ecosystem could serve AR applications much better than it does now. This ecosystem can lead to a state in which there is a huge amount of virtual content for a desired object, but distributed across several applications. In this situation, we might be able to experience each of these contents only separately.
However, we probably will not get the killer application, and AR's ultimate potential, unless all those contents are experienced together. These contents together are worth more than the sum of them experienced separately; hence, experiencing them together is a richer experience than experiencing each individually. Another challenge is removing a significant implicit assumption. Strangely, users are expected to take out their cell phones once in a while and point them in a random direction, hoping that there is going to be virtual content; otherwise, it must be assumed that users already know where they should look for virtual contents. This assumption is not only absurd but in many cases against the spontaneous nature of augmented reality. For users to enjoy the spontaneous nature of AR, new forms of display technology are needed. Such technologies should contribute to our social life rather than being intrusive. Google Glass could be an example of this technology, even though it has its problems. The two challenges mentioned above relate mainly to IT technologies on a larger scale and not necessarily to AR. There are also a few AR-specific challenges whose resolution would pave the way toward AR's goals. The most significant AR challenges are:

Accurate registration: When it comes to outdoor applications, especially in open areas, registration becomes a major problem [48]. Integrating virtual contents into the real world depends heavily on the accurate calculation of the camera pose and the physical object's position and orientation. Another problem in this context is switching between different registration techniques, such as GPS, fiducial markers, and others.

Virtual content quality and quantity: AR applications rely on virtual contents, and virtual contents depend on the density of points of interest (POIs).
While the density of POIs is high in some locations, such as city downtowns, POIs become scattered in rural areas. This creates an unpleasant experience for AR users. The other factor is the quality of the contents, especially in comparison with the Web [79]. Technology itself imposes limits on AR from different perspectives. These limits are partly hardware limits imposed by the frame rate; there are also algorithmic limits imposed by computational complexity. Scene complexity and calculating the state of virtual contents are two examples of such limits.

Scene complexity: Tracking moving objects depends highly on several factors, including the frame rate, the motion of the target object, and the prediction algorithms. The speed of the target might cause tracking to fail. Sensor-based tracking can be helpful here; however, sensors need maintenance, and besides, they have short ranges and can only be tracked when they are in the scene. This makes them unsuitable for outdoor applications. Another approach to overcoming this problem is to use natural features of the target objects, "edges" for instance. To do so, different prediction algorithms must be exploited, and predictions are usually error-prone. Hence, these types of algorithms are computationally heavy and not always applicable. Kalman filters [42], for example, are used to handle the uncertainties of predictions, but these filters apply only to mostly linear systems that can be described by unimodal distributions, which often is not the case for outdoor AR applications.

Calculating the state of virtual contents: For interaction purposes, AR needs an accurate calculation of the state of virtual contents. In many AR applications, users interact using physical objects and the virtual information attached to them. A tangible surface as the interface has two main constraints. First, it is difficult to recognize the state of the virtual contents relative to the physical object.
(The result of a study done by Grubert et al. in 2011 indicates that content and registration issues are among the causes of discontinuing the use of augmented reality browsers [35].) Secondly, dimension calculation in tangible settings depends on the surface of the tangible object. Although it is possible to exploit markers to mitigate the issue, hand occlusion can easily defeat marker-based solutions.

2.2 Client-Server Architecture

Though an entire AR application can be implemented in a user device, today's AR applications are typically implemented on a distributed system consisting of client devices, a communication network, and a server. Hence, AR applications are generally distributed applications. Figure 2.7 shows the client-server model that is used to implement distributed systems.

Figure 2.7: Client-server processing environment

In the client-server architecture, a distributed application is structured with two main components, namely clients and a server. The clients reside on the user devices, and the server with its associated databases resides elsewhere in the network. A communication network such as the Internet connects the server and the clients. The server provides a set of services to the clients. Often, the clients initiate service requests through the network, and the server responds by offering the appropriate services. According to Alex Berson [20], the client-server approach offers many advantages, such as leveraging desktop computing technology (and, recently, mobile computing technology), reducing network traffic by placing the processing close to the source of data, facilitating the use of graphical user interfaces (GUIs), and, above all, encouraging open systems. However, it is aptly emphasized that a client-server architecture must be founded on standards-based architectures to fulfil interoperability and application portability requirements.
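As a toy illustration of this request-response pattern, the following Python sketch stands up a minimal "AR content server" and a client that, upon detecting a target, fetches the corresponding virtual content over HTTP. The target IDs, the content database, and the JSON format are hypothetical; a real AR server would expose a richer protocol:

```python
import json
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.request import urlopen

# Hypothetical server-side database of targets and their virtual contents
CONTENT_DB = {"target-42": {"label": "Cafe menu", "model": "menu.obj"}}

class ARServer(BaseHTTPRequestHandler):
    """Server role: return the virtual contents for a detected target."""
    def do_GET(self):
        target_id = self.path.lstrip("/")
        content = CONTENT_DB.get(target_id)
        body = json.dumps(content if content else {"error": "unknown target"})
        self.send_response(200 if content else 404)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(body.encode())
    def log_message(self, *args):  # silence per-request logging
        pass

def fetch_content(port, target_id):
    """Client role: after detecting a target, request its virtual contents."""
    with urlopen(f"http://127.0.0.1:{port}/{target_id}") as resp:
        return json.load(resp)

server = HTTPServer(("127.0.0.1", 0), ARServer)  # port 0: pick any free port
threading.Thread(target=server.serve_forever, daemon=True).start()
print(fetch_content(server.server_address[1], "target-42"))
server.shutdown()
```

The client stays light (detection and display), while the database and content lookup live on the server; this is the division of labor the following subsection examines for AR specifically.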
2.2.1 AR Architecture

Although several frameworks have been proposed and implemented for AR applications, AR applications are currently typically modeled with a client-server architecture [32, 71, 74, 84]. In this model of AR, when a user is close to a target (i.e., a client detects a target), the corresponding client software requests that the server provide the predefined virtual contents. Then the client, with the help of the server when required, registers those virtual contents with the real world of the user. The virtual contents are then rendered and displayed along with real-world objects by the client. Some implementations use separate databases for target objects, virtual contents, and subscriber information. Gassmann et al. split AR into two main tasks [32]: object recognition, which is handled on the server, and tracking the object, which is on the client side. They have efficiently implemented their platform on the Android system. The proposed platform provided an AR-compatible target detection service for 300,000 object clusters with a response time of less than 2 seconds. Another implementation of AR using the client-server architecture is shown in Figure 2.8. The tasks of detection and tracking have been implemented on both the client and server sides, and their final location depends on the application requirements. The choice of implementation is based on the application's requirements and the available resources, such as the computing and storage capacity of the client devices and the network delay and bandwidth. Similarly, the virtual contents that are to be superimposed on real scenes are stored on the client devices.

Figure 2.8: AR using Client-Server Architecture

Since the focus of our research is on the server side, we are more interested in the role of the server in the client-server framework.
Sobota and Janoso [74] have listed the main modules of an AR server application as follows:

A) Data management
This module is responsible for providing AR data, including 2D and 3D models, texts, graphs, and photos, as well as target data, including markers and targets. This module stores and retrieves the requested data from the database.

B) User management
User management includes user authentication, access controls, user data sharing, user profile management, etc.

C) Calculation of position (content management)
The location and direction of the user are needed for marker identification, camera position calculation, and gathering all marker-related data. In general, this module manages content information.

D) Network connection
All client-server interactions use the network connection, and AR data, target models, and marker data are transferred over the network. Establishing and maintaining the network connection is a crucial factor for any client-server based AR application.

E) User interface
The main role of the server interface is to allow users to modify and edit all types of AR-related data. This includes markers, models, targets, labels, texts, clients' information, etc.

Shen et al. have proposed a client-server architecture and mechanisms to support product design in a collaborative AR environment [71]. Figure 2.9 shows the flow of information and interaction between the clients and the server.

Figure 2.9: Flow of information in client-server framework [71]

There are several benefits of dividing AR applications into a client-server system instead of an all-in-one system [74]. Briefly, those benefits are:

A) Mobility
An important aspect of an AR application is mobility. Since many AR applications are used in the field, it is desirable to have a light, easy-to-carry device. Splitting AR into client and server allows developers to use the server's computational power and memory storage.
This makes the client-side device more affordable in terms of weight, size, and price.

B) Centralized data
Since the client-server architecture provides centralized management of data, it ultimately increases performance compared to an all-in-one system. Updating or modifying data is centralized and simplified.

C) Scalability
Scalability can be understood in different ways; here we consider it the ability of the system to work efficiently with a large number of clients and a significant amount of data. In that sense, a client-server approach outperforms an all-in-one system, owing to its larger resources and the easier way of increasing these resources. For example, it is difficult for users to increase their smartphone's memory beyond 128 GB; therefore, an AR database of 1 TB would be out of reach for a large portion of users.

D) Cost
Computational power and memory storage are two costly elements of computer systems. By outsourcing part of these two elements, the client-server approach is more cost-effective in comparison to all-in-one systems.

Although there are many benefits of a client-server architecture for AR applications, with the current implementation of this model each application mostly has its own server and proprietary databases [36, 50]. The database maintains the target and virtual objects and the subscribed users. In this regard, Alex Berson [20] has summarized the disadvantages of the vast proliferation of proprietary standards for a client-server architecture as follows:

• The very high cost of switching for customers who are locked into one vendor's system.
• In the case of migrating to another environment, the cost of documentation and user training.
• Software developers tend to develop for vendors with larger systems, and thus larger systems will always have a competitive advantage, even when they no longer meet users' needs.
Therefore, in the next chapter, we discuss in more detail the restrictions that a typical client-server approach imposes on AR applications, and then we propose a new framework for AR applications.

2.3 Web 2.0 and Social Services

2.3.1 Web 2.0

Web 2.0 refers to standards that enhance the freedom of sharing and reusing Web content by using open communication standards and decentralization of authority [18]. Web 2.0 is not about technical specifications, but about enhancing the way users utilize the Web. The focus of Web 2.0 is creativity, communication, secure information sharing, collaboration, and the functionality of the Web [53]. Web 2.0 supports the wisdom of crowds [80] and the idea that large groups of people are smarter than an elite few, no matter how gifted those few may be. This way of thinking has led developers to come up with solutions that shift the paradigm of considering users merely as consumers of information to a more interactive and cooperative paradigm in which users are also producers of information. Web 2.0 is distinguished from Web 1.0 by users' ability to create Web content; in Web 1.0, only developers and authors created Web content, and users accessed it without the capacity to modify it. This unidirectional way of accessing Web content in Web 1.0 has been changed to bi-directional communication, allowing users not only to access information but also to create and modify it. The other simple yet powerful concept introduced by Web 2.0 is keyword tagging [53, 70]. Tagging can stand in for sophisticated Web semantics and allows a broad audience to search and access the contents of the Web easily. Tagging also provides an efficient way of organizing and sorting content for both developers and users.
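Keyword tagging as described above can be modelled with a simple inverted index. The following Python sketch is our own illustration of the idea: user-supplied tags support search and organization without any formal semantics.

```python
from collections import defaultdict

# Minimal inverted index: tag -> set of content IDs. Illustrative only.
tag_index = defaultdict(set)

def tag(content_id, *tags):
    """Attach one or more user-supplied keywords to a content item."""
    for t in tags:
        tag_index[t.lower()].add(content_id)

def search(*tags):
    """Return the content items carrying ALL of the given keywords."""
    sets = [tag_index[t.lower()] for t in tags]
    return set.intersection(*sets) if sets else set()

tag("photo42", "Vancouver", "bridge")
tag("photo43", "Vancouver", "sunset")
print(search("vancouver", "bridge"))  # {'photo42'}
```

Nothing more than set intersection is needed: the "semantics" emerge from the crowd's choice of keywords rather than from a formal ontology.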
2.3.2 Social Networking

Social networking sites work by creating and managing user profiles, which are a fundamental concept in all social networking services. These profiles and their data can be shared among members of the social network. The social network itself is formed by linking the profile pages of the members. This linking of pages is a function under the users' control and part of their contribution; users link pages based on shared interests, shared friends, and so on. The social network allows searching for content and friends' profiles through a vast number of pages [18]. Social networking and crowdsourced content can be seen as a result of the Web 2.0 concept, especially because the open standards of Web 2.0 provide the functions necessary to create, share, and search for data at the massive scale of social networks. Through Web 2.0 APIs, information from different sources is combined, and users experience this enriched environment in social networks. Once again, the simplicity of Web 2.0 lies in reusing existing protocols such as HTTP requests, JSON, and AJAX calls to implement its infrastructure, which handles all of the required functionality of Web 2.0 [70]. Social networking is a Web-based service with three key features: a) the capability of having a user profile, b) connecting to other users and showing the list of connected users, and c) viewing and traversing the lists of connections of other users. The core element of social networking is not just the networking, but the capability of showing and sharing one's profile and network of friends who are members of the system. Several social networking services (SNSs) have been developed; their main differences lie in the structural variations of the visibility of profiles and networks and in access to content.
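The reuse of existing Web protocols described above can be made concrete with a small sketch: a hypothetical SNS endpoint that returns a user's profile and connection list as a JSON body, the kind of payload an AJAX call in a Web 2.0 client would consume. The endpoint shape, field names, and data are our own illustration, not drawn from any cited system.

```python
import json

# Hypothetical profile store for a tiny SNS; fields are illustrative only.
profiles = {
    "alice": {"name": "Alice", "connections": ["bob", "carol"]},
    "bob":   {"name": "Bob",   "connections": ["alice"]},
}

def handle_get_profile(user_id):
    """Simulates GET /api/users/<user_id>: returns an HTTP-style status code
    and a JSON body, exactly what an AJAX caller would parse."""
    if user_id not in profiles:
        return 404, json.dumps({"error": "not found"})
    return 200, json.dumps(profiles[user_id])

status, body = handle_get_profile("alice")
print(status, json.loads(body)["connections"])  # 200 ['bob', 'carol']
```

Because the payload is plain JSON over plain HTTP semantics, any client, browser script, or third-party mashup can consume it, which is precisely the openness the Web 2.0 infrastructure relies on.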
There are also functionality differences among SNSs, such as the capabilities of video and photo sharing, built-in blogging, and instant messaging [23]. Research in social networking is mainly about analyzing the behavior of users, such as the ways people communicate, whom they communicate with, whom they share information with, etc. [24, 27]. DiMicco et al. have summarized the motivation behind sharing information on a social network in the three factors of caring, climbing, and campaigning [27]. Caring is about connecting on a social level, which is a source of personal satisfaction. Climbing is about career advancement, which does not seem to hold true for all types of social networking sites. The last theme in sharing content in social networks is campaigning, which is about sharing ideas and seeking support for them. Their results show that the most shared content consists of comments that users have written, with a 20.3% contribution, followed by adding connections with 11.2%, and then photos, status messages, about-you's, and list sharing.

2.3.3 Social Media

It seems that social media logic has blended with mass media logic; therefore, it is imperative to understand mass media's strategies and tactics first. Mass media considers the world as a continuous flow of events, a stream of things and people out there. Although mass media applies filters and adjustments to the level of exposure of the covered items, it tries to present itself as a neutral platform that covers different voices and opinions fairly and justly. To legitimize its independence, mass media uses ratings, polls, and surveys as evidence of audience demand. Framing reality and claims of media neutrality or independence have been reported as elements of mass media's logic by Altheide and Snow [12]. Like mass media, social media also employs polls, surveys, and ratings.
The difference is in the capacity of social media platforms to seamlessly integrate those processes into their architecture. According to Dijck and Poell [26], social media logic is about channeling social traffic. More precisely, social media logic refers to "the processes, principles, and practices through which social media platforms process information, news, and communication." A two-way traffic of data, from producers to consumers and from consumers to producers, is the most significant functional difference between the logic of social media and that of mass media. As mentioned before, this two-way traffic is the key concept and distinguishing factor of Web 2.0. In this regard, Kaplan [44] defined social media as "a group of Internet-based applications that build on the ideological and technological foundations of Web 2.0, and that allow the creation and exchange of User Generated Content." The contribution of consumers to the process of social media has changed their role from observers to actors who can affect and shape the results and processes of social media. This difference alone has reshaped all of the main principles of social media, namely programmability, popularity, connectivity, and datafication. For instance, programmability in mass media is an editorial strategy channels use to keep their audiences moving from one item to the next as a continuous flow through content manipulation [12]. In social media, however, code and users take the place of content and audience, and the one-way traffic is replaced with two-way traffic [83]. In this environment, users can shape the stream of information by posting content and voting to raise or lower the priority of items.

2.4 Related Work

In dealing with a context-aware service, it is important to understand the nature of the context in the application in which it is going to be used.
A better understanding of the context helps designers provide better support for the required behaviors. This discussion of context is relevant to handheld computing because handheld computing increases users' freedom of mobility. The growth of mobility makes users' environments much more dynamic, and users' context, such as their location and surrounding objects, changes more frequently. Therefore, supporting this dynamic environment requires an adaptive service that can provide the necessary information related to the user's context whenever necessary [11]. According to the definition of context given by Dey and Abowd [11], any information that is useful in the evaluation of an entity's situation can be considered context. In this definition, the entity is any object that is relevant to the interaction between a user and an application. Therefore, a context-aware application uses context information to provide relevant services to its users. With this definition in mind, we went back to augmented reality to see what context means in an AR application. Augmented reality refers to a technology that overlays virtual content such as images, texts, and graphical 3D models on real objects [16]. As with any context-aware service, the overlaid information is related to the context of the user's task. Therefore, in any AR application, we have the two concepts of context and content: AR applications use context information to deliver content information that meets users' needs. Generally, in AR applications the context consists of elements of a physical object, typically the object's location or a visual pattern. The visual pattern can be a natural pattern of the object or a QR code attached to it. In some platforms, such as Vuforia, the visual pattern of an object is referred to as a target.
Both the target and the POI are context information of an object (or a user) in an AR application. For simplicity, we will use "target" to refer to both unless we need to distinguish a target from a POI. Content, on the other hand, is the computer-generated graphics or information that is superimposed on top of the physical object. The separation of content and context does not seem clear in some previous works. For instance, Grubert et al. have mentioned content availability as a source of complaints from users in their report [35], yet they used POIs (which are context information) to refer to content availability. That said, Slabeva et al. [77] have classified augmented reality as a context-aware service and have given a clear separation between context and content in a context-aware provisioning ecosystem. They have conceptualized a future context provisioning ecosystem, shown in Figure 2.10, consisting of three clusters: a context provisioning cluster, a content provisioning cluster, and a network operator cluster. The network operator cluster handles the data connection to the end user. Location information providers, as well as social network site operators and sensor network operators, belong to the context cluster. This cluster also has a context aggregator, which fuses all the context data coming from a variety of providers to provide comprehensive context information on a specific user, which is considered a significant added value. Content provisioning, on the other hand, includes a variety of content providers, from broadcasters and newspapers to user-generated content and consumer opinion platforms. The content cluster also includes a content aggregator that bridges the content providers to service providers by delivering the content from different sources to its consumers across several service providers.
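The separation of context (targets/POIs) from content argued above can be made concrete with a small sketch: one target can map to any number of contents from different providers, so a content provider never has to own or manage the target itself. This is our own illustration; all names and data are hypothetical.

```python
# Decoupled stores: the context side knows only targets; the content side
# registers any number of contents against a target ID. Illustrative only.
targets = {}             # target_id -> context descriptor
contents_by_target = {}  # target_id -> list of (provider, content)

def register_target(target_id, descriptor):
    """Context side: a target provider publishes a target once."""
    targets[target_id] = descriptor

def publish_content(target_id, provider, content):
    """Content side: any provider attaches content to an existing target."""
    contents_by_target.setdefault(target_id, []).append((provider, content))

def contents_for(target_id, profile=None):
    """A content aggregator could filter by the user's profile here;
    this sketch simply returns every registered content."""
    return contents_by_target.get(target_id, [])

register_target("bridge-01", {"type": "visual", "pattern": "bridge.pat"})
publish_content("bridge-01", "museumApp", "3D model of the old bridge")
publish_content("bridge-01", "tourApp", "Audio guide, stop 4")
print(len(contents_for("bridge-01")))  # 2
```

Note that the two stores never reference each other except through the target ID, which is exactly the decoupling a context aggregator and a content aggregator would exploit.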
Figure 2.10: Context provisioning ecosystem [77]

The quality of experience of a context-aware service such as augmented reality relies heavily on the quality and quantity of context and content information. The survey report of Grubert et al. shows that content quantity and quality is one of the major sources of users' complaints and their reason for abandoning an augmented reality application [35]. Hence, it is vital to investigate the reasons behind the lack of context and content information, the mechanisms for sharing this information among providers and consumers, and the methods of contributing to authoring and provisioning it. We categorized our study and investigation into three general sets of problems. The first category is the lack of formats and standards for context, together with the shortage of context information itself. As with context, there is no widely adopted standard format for content either, which makes content platform-dependent [36, 40, 50]. This problem affects AR by increasing the cost of application development and by reducing the number of targets available to a single AR application. Consequently, AR applications lose users due to the shortage of targets. The second category of problems relates to architectures that do not provide effective ways of sharing context and content information. As described in [77], the future ecosystem of context-aware services includes context and content aggregators; AR infrastructure, as a context-aware service, should include context and content aggregators too. Not being able to share AR resources does not directly affect the overall number of resources in AR; however, it reduces the accessibility of information for a single application and subsequently leads to losing AR users. Finally, bringing user contribution into AR is a recent research area. In previous sections, we discussed that user contribution is one of the pillars of Web 2.0 and how much it has affected networks and media.
In this regard, Schmalstieg et al. [40] have proposed AR 2.0, the integration of Web 2.0 and AR. AR 2.0 aims to enable user-generated content, information sharing, and massive-scale deployment of information. In this work, the researchers name five key technologies needed for AR 2.0 to fully emerge: a low-cost display platform, mobility for AR, a backend infrastructure for the distribution of AR content and applications, authoring tools, and real-time AR tracking solutions. Although AR 2.0 was not the intention of this thesis in the first place, our investigation of the limitations of AR brought us to the same conclusion: the current backend infrastructure does not provide an efficient way of distributing and sharing AR resources. We also agree on the need for real-time tracking solutions. Therefore, in the next chapter, we propose a new architecture capable of distributing and sharing AR resources while providing real-time search. There have been several research studies on each of the mentioned problems of AR, and we could not cover all of them in this work. In the following, however, we survey a subset of the previous works based on the attention they have received in the literature (by their number of citations), the novelty of the works, and the influence they have had on our work.

2.4.1 Target and Content Related Issues

Gu et al. have emphasized proprietary formats, standards, and architectures as the primary source of problems both for communicating and sharing information and for the widespread adoption of AR applications [36]. Hence, they have proposed a new open solution which includes an open content format and a flexible framework. The solution is based on the physical attributes of objects and is called the Cyber-Physical Markup Language (CPML). In analogy to the Web, where a URI is used to identify a resource, here a geographic location is used to identify a physical object.
A sophisticated representation of the visual features of an object is also supported. The main features of their proposed work are: a) the ability to group multiple objects in one CPML page, which allows scalability and flexibility; b) a convenient way of creating and editing CPML pages; and c) the adoption of conventional Web protocols, which eases navigation among pages and reduces the development cost of the cyber-physical Web. The infrastructure of CPML is shown in Figure 2.11.

Figure 2.11: Infrastructure of Cyber-Physical Web [36]

Although CPML builds on top of existing Web protocols, its proprietary format for visual representation adds yet another format to the long list of proposed formats and exacerbates the situation. Another aspect of the proposed framework is the tight coupling of targets and contents, which is not necessarily a good idea. For instance, for a single target, each user may get different content depending on their application or profile, and this is not addressed in CPML. Also, its reliance on location-based services such as GPS, which is not always available, especially indoors, is a major drawback. Another problem is the constant updating of CPML pages required by the dynamic nature of AR resources. For example, the location of a mobile target has to be updated constantly in the corresponding CPML page; hence, the structure of the page must be revised as soon as the target leaves one group and joins another. According to Applin and Fischer [14], the majority of AR applications are based on geolocation. Nonetheless, there are plenty of AR SDKs, such as Vuforia, Wikitude, ARToolkit, ARmedia, and D'Fusion [2, 13], and recent standards such as ARML 2.0 [1, 50], which support natural-feature and visual search and detection of targets on a variety of devices, from PCs to smartphones and tablets.
Therefore, any proposed standard or platform for AR should consider both location-based and visual AR. Besides the problem of proprietary formats, the creation, storage, and sharing of targets and contents are another set of issues, addressed in [28]. According to the authors, a major challenge for current applications is their need for a massive database of targets to recognize objects, while the current infrastructure does not support dynamically adding and sharing information (i.e., targets). They have offered a framework for AR on handheld devices, shown in Figure 2.12. Using their proposed framework, users can create, share, and import POIs, which are basically XML files in the KHARMA format [38]. Users can share the XML file and notify friends on their social network page of the newly created POI.

Figure 2.12: System architecture for AR application [28]

Fanjiang et al. provided a rapid way of developing AR applications and sharing POIs [28]. However, they considered only location-based services for AR and did not address other types of AR. Also, it is not clear how the proposed design would support already existing AR applications. It is worth mentioning that the framework does not support communication and messaging between different AR platforms. The problem of lacking targets and of target database scalability has also been aptly emphasized in [75]. Song et al. have used social networking services to gather more images as targets. They have implemented a client-server based framework. The server side has three modules: an image recognition module, a social network service crawling module, and a database of image contents used as targets. Their proposed architecture is shown in Figure 2.13. In this architecture, the server side gathers images with textual annotations from social networks.
The image recognition module communicates with the client to respond to the client's image-based queries. The client-side app is a tool for querying an image on the server side. Their proposed structure does not support social networking in AR; rather, it uses social networking services (SNSs) to help the scalability of the target database in AR. The work is inspiring because it presents a new approach toward integrating SNSs into AR as an open platform, and it also leverages user participation for content quality and quantity.

Figure 2.13: System architecture for mobile AR application [75]

2.4.2 Web 2.0 in Augmented Reality

In the case of integrating Web 2.0 into AR, many researchers have tried to enable social services such as social navigation, social networking, and social media. Some researchers have enriched the AR experience by bringing more content and information from social networks into AR. Kang and Hong defined AR as "technology used to make expressions by combining medium (text, image, sound, video, etc.) linked to the real world." They argued that such expressions put the human in the periphery and, because of their medium-centered nature, essentially limit the expressions to the relation map around the objects. The authors believed that expressions that put the human in the center would be more context-aware; media and objects would then be expressed through contextual reasoning. According to the authors, services based on AR do not have the capacity to support dynamic behavioral changes made by users. Therefore, they suggested using social networks' capabilities, especially the information stored in SNS profiles, to enrich AR. They have developed a system capable of linking to Facebook, LinkedIn, and other SNSs, fetching location information, and showing AR objects related to the users' location information [43].
The similarity to our work is that we also agree that AR alone is deficient when it comes to supporting communication and dynamic behavioral changes. However, we want to enable AR to overcome those defects by restructuring its framework: our approach is to embed social networking services into AR rather than borrow them from SNSs. They have also considered only location-based technology, which is a limitation, because not all AR applications use location-based techniques. Another difference between that work and ours is that they considered the IT device (a handheld cell phone) as a hub for information sharing and communication; in our framework, we use a middleware between all app servers and content providers to play the role of an information hub. AR has also been used to connect people, either through some form of social networking or by pointing to friends' locations on a 2D/3D map. This way of using AR is interesting for us since we have also offered this feature in our proposed framework. A platform named SPORANGIUM has been presented in [54], with which it is possible to create ad-hoc networks and support the creation of sporadic social networks. The goal was to get the most from the people and resources of the surrounding environment. SPORANGIUM provides a broad range of functionalities at different levels, namely application, knowledge management, mobile cloud computing, and ad-hoc communication. The platform relies on an ad-hoc network at the first level and therefore tries to establish connections proactively. Another aspect is the number of value-added services that can be provided and shared, even without the Internet, just through ad-hoc networks for sporadic social networking. The researchers used a museum as an example application: users only need to install one application upon entering the museum, and they can then engage with people who are physically close (in the museum).
However, the context is restricted to location only. For example, if two people in different museums are interested in the same historical location or object, they cannot engage in communication using SPORANGIUM's design. This is one of the most important points we are trying to address in our proposed framework. Another problem with SPORANGIUM is that it only connects people who are using its platform (application), whereas in our proposed framework we are trying to bridge different platforms and applications. MeetYou is another example of an application that uses AR to connect people in close geographical range [73]. The software offers functionalities such as registration, login/logout, friend management, grouping users and assigning different parameters to each group, and notification when a member is close. Users can "check in" to the application to let their friends be notified of their frequently visited places. The novelty in MeetYou was that it notifies its users if a member of a group is nearby. Hoang et al. used augmented reality to visualize the trails of locations visited by friends [39]. A blue cone icon indicates a location visited by a specific person, as illustrated in Figure 2.14. If the user steps into the blue icon, he can call the person who visited that location before and start a conversation using VOIP (voice over IP).

Figure 2.14: Visualization of the trail marks in AR [39]

The researchers tried to support mobile 3D AR information for Web 2.0. The assumption was that the locations of friends would be available, either shared by the friends themselves or mined from Twitter, Flickr, etc. This work brings information mined from social media, i.e., the locations of users, into AR; the mined information was then used to contact users who were online and reachable through VOIP. Although the proposed method of using SNSs in augmented reality is exciting, it only considers location-based AR.
Also, it was not used for networking or communication with a group of people. The other aspect concerns the implementation of their proposed method: it has been implemented on a wearable device, which is not very convenient, and the technology is not prevalent. In studying previous researchers' works, we saw two approaches to mixing social networks and augmented reality. The first approach tries to enable user communication and user profiling inside augmented reality; this requires enhancing the AR backend infrastructure to be capable of supporting SNSs. This is the approach we have adopted. The other approach uses AR features to improve the social networking environment and the social networking experience. For instance, De Chiara et al. have followed the latter [25]. Instead of enabling some form of communication between users in AR, they argued that it is possible to offer new interaction and communication techniques to social networks thanks to mobile devices and hence mobile users. They have focused on the mobility aspect of users with mobile devices. Their work presented Link2U as an integrated solution that tries to combine augmented reality with social networking to answer users' need for information about their surrounding environment. It offers functionalities such as messaging, route calculation on a map, and identification of other social network users and POIs inside AR. In Link2U, users were divided into contact lists. Given a user's presence and location availability, other members could see the user on a map of the environment (shown in Figure 2.15a) or in live mode (illustrated in Figure 2.15b).

(a) Link2U map mode visualization (b) Link2U live mode visualization

Figure 2.15: Visualization in Link2U [25]

The functionalities Link2U offers are visualization of the connected people and route calculation toward a specific POI, which can be a user.
However, the system does not provide a communication module. Link2U is another example of a client-server based application. Social services in AR have been used for learning purposes too. Social augmented reality (SoAR) is a framework designed to enhance learning in construction work. SoAR improves social interaction among peers with a focus on augmenting synchronous communication in response to new contexts [63]. The authors have categorized the identified challenges into emergent context (material shortage and access to context), synchronous communication (synchronous communication with the responsible people at the time of need), bi-directional content authoring (users should be able to generate and publish content), and social interaction. SoAR enables communication among coworkers and augments that communication in the form of drawings on the screen. The functionalities SoAR provides are professional profile building, instant messaging, and vision sharing. The proof of concept for their framework was a Web-based application that works on mobile browsers, shown in Figure 2.16.

Figure 2.16: Vision sharing feature for SoAR proof of concept [63]

Mixing the real with the virtual is a continuum, and there are spaces in between, such as augmented reality and augmented virtuality, shown in Figure 2. Jang et al. have tried to connect the virtual world with the real world through augmented reality and augmented virtuality [41] (illustrated in Figure 2.17). Their project provides the following functionalities: it maps real-world space and users into a virtual world, and it augments the real world with the locations of the avatars in the virtual world. Message passing has also been enabled between users in reality and avatars in the virtual world.
Similar works have been done before, such as cAR/PE [66], in which users from different worlds can interact through a video conference.

Figure 2.17: Bridging augmented reality and augmented virtuality [41]

Other studies, such as XIM [19] and TwinSpace [67], have tried to build an integrated world of the real and the virtual. The problem with all these systems is that they use a dedicated environment, which makes them difficult to adopt widely. The researchers in [41] have designed a prototype system called SyncIS (Synchronized Indoor Space). SyncIS supports location-based social networking for public users, which differentiates this work from the works before it. From what we reviewed, we see a need for an effective framework that can support user contribution in AR while being capable of sharing targets and contents between different AR platforms. Although standards have been proposed for content [38, 50, 55], the majority of content is still in proprietary formats. Not being able to share content contributes to one of the most glaring issues of AR applications, which is the paucity of targets in the AR world. The shortage of targets does not entice people to use AR applications in their daily lives. Many applications, such as Life360 [7], LOCiMobile [8], FOURSQUARE [4], and Glympse [5], offer location-based services that allow users to share locations, share their paths, and even communicate about places. On the other hand, applications like Layar [6] and Aurasma [3] offer augmented reality. However, so far we have not seen an application that has combined both effectively. Although communication by itself is not a new concept, forming communication around a POI in augmented reality space is a novelty.

2.5 Chapter Summary

In the first two sections of this chapter, we overviewed augmented reality by providing its history, the way it functions, and its applications and challenges.
We showed the backend infrastructure of a typical AR application and how it incorporates a client-server architecture to implement an AR application. The problems of such a structure include the high cost of operation and maintenance, in addition to an unfair competitive advantage for certain vendors. In the rest of the chapter, we covered Web 2.0 and social services. We explained how Web 2.0 and social services promote the freedom of sharing and reusing Web content. Regarding social services, the effect of having two-way traffic of data, from producers to consumers and from consumers to producers, has been explained in detail. The rest of the chapter surveyed the research studies that have tackled many of the problems related to the lack of standards, AR architecture, and incorporating Web 2.0 and social services into AR. What we present in the next chapter is a framework that allows users and developers to share targets across different platforms and provides an efficient way of communication among users of various applications.

Chapter 3 Proposed Framework

"It is not the beauty of a building you should look at; it's the construction of the foundation that will stand the test of time." — David Allan Coe

3.1 Basic Idea

Considering the client-server architecture and the works reviewed in the previous section, there are important aspects worth highlighting.

1. AR, like any context-aware service, relies on the availability of contexts (targets in our case). These targets are currently stored either locally, on the storage of the device running an AR application, or on a remote server of the AR application. One major motivation of our work is to make it possible to gather all these targets under a common framework and make them accessible to all AR applications. Obviously, we are not going to propose another proprietary format; therefore, our intention is to offer all of the services provided by the framework through Web APIs.

2.
Currently, in most AR applications, context and content developers set the stage, and AR users are merely consumers of data. Although it is possible to implement a client-server architecture capable of bidirectional requests and interactions, this is not what happens regularly in current architectures. We intend to make it possible for users to contribute to creating and sharing targets and contents.

3. Social services have gained much success and attention recently. However, a reliable way of communication between users of different AR platforms has not been proposed. Users should be able to communicate with each other, review and vote, and join and leave AR social groups. User communication is another aspect of the intention behind our proposal.

4. Interestingly, all three points mentioned above relate to a greater concept on the Web, named Web 2.0. For more than a decade now, Web developers have been trying to harness the power of users' contribution in the form of comments, likes, user profiling, content sharing, etc. Web 2.0 emphasizes user contribution and seeing the Web as a programming platform. Services scattered around the Web are being integrated, which adds value to the information. In this regard, there are efforts to integrate Web 2.0 into context-aware services. Seen from this angle, our proposal becomes an example of integrating Web 2.0 with a context-aware service (i.e., augmented reality).

5. Last but not least, we noticed subtle assumptions, or misconceptions, coupling the concepts of target and virtual content in much AR-related research. Though this is not the case with all of the studies, here we want to emphasize that target and content can be decoupled, and each of them can reside in a separate place in the architecture of an AR application. Another assumption is that the provider of a target and the developer of content for that target are necessarily the same.
We want to show that by decoupling target and content, content providers are able to develop their desired contents and deliver them to their consumers without worrying about creating and managing a target. Hence, for one target there can be various contents and content providers.

3.2 Limitations of the Current Structure

There are limitations imposed on current AR applications due to the aspects mentioned above and the present client-server architecture and implementation styles adopted by AR developers. These limitations are as follows:

(A) User interaction and contribution

As an example of user interaction, imagine a person at a famous scene (like Vancouver's suspension bridge) who wants to send a message to users interested in the same target, share his idea, or ask them to join him in a special activity. These kinds of interactions are not supported at the moment. According to Applin and Fischer [14], stories are fixed, single narratives, and a group-oriented social experience of augmented reality is missing. One reason for that, as discussed before, is the storing of targets in proprietary databases, either locally or on cloud servers.

(B) Limited number of targets

AR applications rely on targets and the virtual contents that are superimposed on the detected targets. Research conducted by Grubert et al. shows that users complain not just about the shortage of targets around them, but also about targets not meeting their expectations [35]. Even the existing contents are not up-to-date in many cases. Currently, for example, every browser implements its own client-server architecture for the same application. Each implements its own set of targets and virtual objects. We explain this problem by analogy with Web browsers. Suppose a person uses Firefox (an Internet browser) to reach a Web address. The same address returns a "Server not found" error when used in Google Chrome.
This is what happens with current AR applications. Even though each AR application has a database of targets, in general there is no standard way of sharing the targets. This is a huge drawback and plays a significant role in the lack of targets in the AR world.

(C) Lack of contents and content sharing

Since it is difficult for individuals to create virtual contents on a global scale, there is not an adequate number of virtual contents in the AR world in general. This shortage of virtual contents and content authoring tools is one of the major drawbacks of AR systems [21, 87, 88]. Take the previous example and suppose that after enjoying the suspension bridge you want to put a note in the virtual diary of the place, or you want to read other visitors' opinions about it. Since the diary is on a proprietary server, only a limited number of users are able to access it. One way of coping with this problem would be to develop the same content in several formats to support as many AR applications as possible. This approach would increase the cost of development [36]. Furthermore, it is not scalable if there are many AR applications.

(D) Target and content naming convention

Consider a situation in which a user has joined an AR experience and wants to comment about his experience or put some likes under the target. The user would need a method to reference the target. To the best of our knowledge, there is currently no naming convention for targets and contents in AR that can capture all of the resources in AR. In analogy with the Internet, a URL (uniform resource locator) is used to reference and access a resource over the Internet. In the same way, we need a method to refer to an AR resource. We are aware that referencing a resource and providing a mechanism to access that resource is not just about a naming convention. However, to access a resource, of course, the first step is to have a unique address (or name) for that resource.
The above limitations motivate us to look for a better model of AR implementation that could enhance the user experience and expand AR applications. The enhancement in the proposed model is mostly on the server side. The servers of the applications, especially applications with a common platform, must have the ability to interact among themselves and provide a comprehensive set of services to the users. Managing a federated set of servers is a challenging task, but its advantages, we believe, outweigh the challenge. Several implementation frameworks of the proposed architecture are possible. One such implementation framework is given in the next chapter.

3.3 Client Federated Servers

Here, we propose a framework for AR applications, shown in Figure 3.1. The main objective of the proposed framework is to eliminate or alleviate the aforementioned limitations of the current client-server model. It is an improvement, and in some senses a generalization, of the client-server architecture. We refer to this new improved architecture as the Client Federated Servers (CFS) architecture. The proposed framework focuses on the server side, and the client side can be the same as in a typical client-server AR application. However, the proposed architecture has been designed to serve a broader range of clients, including AR applications, target providers, content developers, and any other Web services. This architecture uses Web APIs to receive and respond to requests. AR applications have their own target and content databases. However, using an intermediate target server, i.e., the "Target Hub" (TH), a server can reach other targets and contents that have been shared by other applications and developers. It is important to mention that content providers do not need to develop a whole application to share their AR contents. What they need to do is subscribe and upload their contents to the target hub to make them globally accessible.
Figure 3.1: Client federated servers architecture for AR applications

3.3.1 Description of Components

The proposed framework has the following components.

Web API: An interface with multiple predefined methods that exposes services and data, and provides a way of communication between the target hub and AR resource providers and consumers. It receives and responds over the HTTP protocol using the JSON format. Using an API allows third parties to access AR resources easily and plays a major role in realizing the intention of the proposed architecture. Documentation of these APIs is provided in the next chapter.

Request Manager: It plays the controller role of the architecture. It includes the logic, algorithms, and rules of the system. When a request is received by the Web API, the request is translated into a command for the request manager. The request manager initiates a chain of method calls, message passing, data requests, and data storing to fulfill the request. Since the provided service is customized to the service requestor, any authorization, billing, or customization happens in this layer. Therefore, it is vital to maintain a profile of clients, and for any service request, the client profile manager should be consulted.

Client Profile Manager: This component keeps the profiles of the clients that have registered in the target hub. A registration process starts with an HTTP request, which is supported by the Web API. The registration request will finally come to the profile manager to check for name availability and other requirements. Clients can build their profiles by providing basic information, including a unique name and a password. Clients can also specify whether they allow their contents and targets to be stored in the target hub (for licensing reasons). Keeping a record of client profiles allows the target hub to customize its services in a smarter way.
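To make the Web API and registration flow more concrete, the following is a minimal sketch, assuming a dict-based router and an in-memory client store. The endpoint `/clients`, the JSON field names, and the status codes are illustrative assumptions, not the framework's actual API (which is documented in the next chapter).

```python
import json
import secrets

CLIENTS = {}  # profile name -> {"password": ..., "token": ...}

def register_client(payload):
    """Profile-manager logic: check name availability, then create a profile."""
    name = payload.get("name")
    if not name or not payload.get("password"):
        return 400, {"error": "name and password required"}
    if name in CLIENTS:
        return 409, {"error": "profile name already taken"}
    token = secrets.token_hex(16)
    CLIENTS[name] = {"password": payload["password"], "token": token}
    return 201, {"registered": name, "token": token}

ROUTES = {("POST", "/clients"): register_client}

def handle_request(method, path, body_json):
    """Web API entry point: translate an HTTP + JSON request into a command."""
    handler = ROUTES.get((method, path))
    if handler is None:
        return 404, json.dumps({"error": "unknown endpoint"})
    status, result = handler(json.loads(body_json))
    return status, json.dumps(result)
```

A client would register once, e.g. `handle_request("POST", "/clients", '{"name": "layar-app", "password": "s3cret"}')`, and keep the returned token for all subsequent requests.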
For example, in case there are multiple contents to be forwarded for a server request, the target hub decides which content should be sent to the server based on the provided priority list.

Target Manager: It handles requests for targets and includes a database of targets and records of the requests for each target. Target providers can share their targets in the target hub under a unique name. A target's name is composed of two parts: the target provider's profile name, which is a fixed and unique name for all the targets uploaded by that provider, and a name given by the target provider. The target manager keeps records of requested targets, the time and number of times a target has been requested, the number of servers, etc. This information allows AR developers to have a better understanding of what users are more interested in and what their preferences are. These statistics are shared by the target hub as a value-added service.

Content Manager: It is a database of the contents and information about each content. Content developers can upload their contents to the target hub using the Web API. Each content should have at least one target upon which the content is going to be overlaid. The content's name, like the target's name, is composed of two parts: the content developer's profile name, which is a fixed and unique name for all the contents uploaded by that developer, and a name given by the developer. The content manager also keeps records of the requested contents, how many times a content has been requested, how many servers have requested a particular content, etc. This information is reachable by third parties for their analysis. By handling the requests and providing the virtual contents, the target hub plays the role of a content aggregator, which is an essential element in any context-aware service. Content consumers send their requests over HTTP.
The content manager is responsible for retrieving the content and updating the record of the content in the database.

User Interaction Module: All forms of communication between users, including users' interactions with contents and messaging between users, are handled in this module. The user interaction module keeps the history of communication for each target. End users start their communications through their service providers (an AR application, for example). In case the target hub has been adopted by the service provider, end users have the chance to communicate with other users around the world who are augmenting the same target. The interaction between users can be in voice, image, or text form. This module receives the interaction requests from the request manager and processes them.

Synchronizer: Any update to a client's profile, a target's record, or a virtual content's record has the possibility of creating inconsistency. For instance, for any content there should be at least one registered target. If there is any content that has not been registered for a target, it is an inconsistency in the target hub. It is the synchronizer's job to keep the records consistent. The communication history also needs to be synchronized among all parties of the communication. This is another responsibility of the synchronizer. The synchronizer also interacts with the file system to categorize and aggregate contents and targets.

3.3.2 Benefits of the Client Federated Servers Framework

In the big picture, the proposed framework is an intermediary that connects AR resource providers and consumers, and it also has a mechanism to support a limited interaction among AR end users. This framework also plays the role of context and content aggregator, meaning that several types of context information of AR applications, such as points of interest and visual patterns in the form of targets, are stored and provided to AR resource consumers.
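The synchronizer's first consistency rule described above (every content must reference at least one registered target) can be sketched minimally as follows. The record shapes and the `/`-separated names are illustrative assumptions, not the framework's actual schema.

```python
def find_orphan_contents(targets, contents):
    """Return names of contents whose target is not registered in the hub."""
    registered = {t["name"] for t in targets}
    return [c["name"] for c in contents if c["target"] not in registered]

# Example data: one registered target, one content consistent, one orphaned.
targets = [{"name": "acme/xray-2000"}]
contents = [
    {"name": "jane/xray-guide", "target": "acme/xray-2000"},
    {"name": "bob/manual", "target": "acme/xray-3000"},  # inconsistency
]
```

A periodic sweep like this is one simple design choice; a real synchronizer could instead validate the target reference at upload time and reject the request.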
The same is true for contents: the proposed framework provides a way for content developers to share their contents independent of the targets. On the other hand, user-generated contents are also stored and provided to any content consumer. Besides considering the Web as a platform, user contribution has been emphasized as one of the main objectives of Web 2.0. The new architecture makes it possible for users of different platforms to communicate with each other and share ideas about targets and contents. At the same time, users' contributions in the form of interactions with targets and contents, such as voting and rating, can be reached by any interested third party. In particular, popularity rating has been mentioned as a key element of social media. Enabling this feature in AR increases the chances of using AR as a platform for social media. The two main materials in AR are targets and contents. As mentioned in previous sections, the lack of targets and contents has been a problem in AR for a long time now. Since app servers can communicate and register their targets and contents in the target hub, the most important immediate benefit of the proposed framework would be increasing the number of targets and contents. We believe there is no need to tightly couple a content to a target. We have emphasized this conceptual separation by having a separate component for each of them in the proposed architecture. The proposed architecture allows content developers to deliver their contents to the content consumers without worrying about the complexity of developing a full AR application. Simultaneously, target providers do not necessarily need to develop contents for their targets to be reached by end users. A popular place is worth adding as a target to the target hub, which motivates content developers to provide contents for that popular target.
On the other hand, a popular content such as Pokémons (Pokémon Go game's monsters) can raise interest in a target, which in this case is a location. As we know, people go to these places to hunt Pokémons. This environment would create a positive synergy among end users, target providers, and content developers, encouraging each other to create more contents, provide more targets, and use AR more than before. Having a way to know which targets and contents are more popular than others would potentially encourage developers to adopt the formats of those targets and contents. This, in turn, would open a way toward converging to a limited number of AR formats based on their usability and popularity. The fact that each application has only its own database of targets and contents gives a very restricted view of the world, especially for outdoor applications. AR users would need to switch between applications from place to place. The availability of more targets and contents in one application can reduce the need for switching between AR applications and results in a more intuitive way of using AR technology. Another outcome of the proposed architecture is its naming convention for AR resources, which provides a unique reference to any AR resource in the target hub. AR is a context-aware service with many resource providers on one hand and many resource consumers on the other. Still, there is no resource referencing method that can identify a unique AR resource the way a URL does. Finally, we believe our work has the potential to open up new paths for the future of AR. Some of the concepts that are introduced or significantly emphasized by this work are a naming convention (or uniform referencing method) for AR resources, decoupling targets and contents, and the analogy between the augmented reality browser (ARB) and Internet browsers.
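The naming convention just mentioned (a provider's unique profile name plus a name given by the provider, as described in Section 3.3.1) might be sketched as follows. The `/` separator is an assumption for illustration; the framework does not prescribe a specific delimiter here.

```python
def make_name(profile, given):
    """Compose a unique AR resource name from its two parts."""
    if "/" in profile or "/" in given:
        raise ValueError("name parts must not contain '/'")
    return f"{profile}/{given}"

def split_name(name):
    """Recover (provider profile, given name) from a resource name."""
    profile, _, given = name.partition("/")
    return profile, given

def targets_by_provider(profile, names):
    """Group lookup: all targets shared under one provider's profile."""
    return [n for n in names if split_name(n)[0] == profile]
```

As with a URL's domain name, the profile prefix groups all resources of one provider, which is what enables the provider-scoped searches described in Section 3.6.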
In short, the outcomes of the proposed architecture are as follows:
• Sharing targets
• Sharing contents
• Connecting app servers, developers, and Web services
• Separating content development complexity from AR application development
• Decoupling target and content
• A unique naming convention for AR resource referencing
• Context and content aggregation
• Web 2.0 in AR (the Web as a platform and user contribution)
• Communication among users of all platforms
• Recognizing popular formats and platforms and a way to converge toward them
• Potential to open up several new paths for future AR, including a naming convention for AR resources, decoupling targets and contents, and comparing ARBs and Internet browsers

3.3.3 Practical Scenarios

Here, we want to show the problems we are concerned with and the solutions we propose in the form of practical scenarios that can happen in real life. The following scenarios are from different perspectives.

Scenario A: Jules is working with an X-ray machine with which she is not familiar. She needs help with the instructions for the machine. She is using an AR app that has helped her with other machines before. She runs the AR application and points her phone at the machine, but apparently there is no information available for this model. With the target hub implemented and adopted by the app developer, she can now search a couple of tags, such as the machine's make and model and "instruction". What she receives is shown in Figure 3.2. For each target that she selects, there can be various contents from different content providers. Each of these contents has its own features and specifications, such as price, popularity (number of stars), and a description of the content (Figure 3.2). Jules can install and preview any of them, and if she likes one, she can buy and use it. She can also enter the target's social room and read or write comments about her experience.
Figure 3.2: Searching for a target and the list of available contents for a target

Scenario B: Jane is a computer graphics developer. Recently, she has decided to develop for AR applications. She wants to start her work by developing for popular platforms and targets. She also wants to advertise her designs and get feedback from users. Using the target hub, she can get a list of the most popular contents and targets for each platform. She now knows that there are requests for a newly developed X-ray machine, but the contents developed for it are not very helpful. She decides to develop good, user-friendly content. The content that Jane is developing here will probably be used by Jules.

Scenario C: John is a technician in a medical hardware manufacturing company. His company has recently introduced a new X-ray machine with higher capabilities than the previous models. The problem with the new machine is that it has a complex instruction guideline due to having so many features. John is very familiar with the machine, and he can guide the users interactively. He decides to use his AR application to make a target of the machine. Since John's AR application uses the target hub, he can now read comments about the machine, answer questions, and help the users.

3.4 Data Flow and Connections between Components

To get into the details of the system, we start by analyzing the processes and information flow. The flow of information between the main processes of the system is shown in Figure 3.3. Target providers and content providers share their products using HTTP requests. On the other hand, AR applications use those targets and contents by providing information about their requested targets and contents. AR applications also send and receive end users' communicated messages. All of the requests initiate a request handling process. This process interacts with different modules of the system to handle the request, as discussed later.
If the request adds a new content or target to the database (or makes basically any update to the database), the request needs to be passed to the synchronizing process. A requested target or content is provided to the AR application directly by the target or content managing module. The communication history between all of the communicators needs to be monitored and synchronized. Therefore, any update to the messaging history goes through a synchronizing process. This process checks the message info of the last message and the number of communicated messages to keep every party synchronized. Third parties are those who want to get context and content information or reports for their own purposes. Third parties need to provide their client info, and after their request is processed, they get the report. Such reports are generated from the databases. Generally, for reporting purposes there are replicated databases, so as not to hinder the production database. However, for simplicity, we are not showing the replicated databases. The admin of the system can configure the system and generate reports about it. For the sake of simplicity, we did not put the end user's interaction in the data flow diagram shown in Figure 3.3, because in a conceptual view that type of interaction is handled within AR applications. However, in a less abstract view, the end user's interaction is shown in Figure 3.4. End users use their AR applications to browse the world for targets. On the other side, AR applications provide targets and contents to the end users. In order to have an exciting AR experience, there needs to be a sufficient number of high-quality targets and contents. AR applications upload their targets and contents to the target hub. Simultaneously, AR applications subscribe to the provided services. When there is no content for a target that has been requested by the user, the AR application server sends a request to the target hub.
The request has information about the target, including tags for the target, platform-specific information, and some requirements for the content. One important module in the proposed framework is the module that plays an intermediary role between the app servers of different applications. The target manager receives requests for targets from app servers. It keeps a database of the targets. Records in this database include ID, target name, target provider's info, target files, target platform, popularity, description, tags, subscribed clients, active connections, and a list of provided virtual contents. The content manager keeps records of information about the contents provided by content developers or AR applications. When a new content is uploaded to the content manager, a record is created with information about the content and the targets it supports. This record includes ID, the name of the content, the supported platform info, the list of targets, popularity, description, price, tags, and a list of active connections. One of the services that subscribed clients receive is notification of a new content. Each time a new content is uploaded for a target, all the clients that have subscribed to that target are notified of the update.

Figure 3.3: Data flow diagram of the system

When a user starts to put some comments on a target or content, the message is sent under the profile of the AR application from which the end user is sending his message. This creates an active connection between the AR application as a client and the target that the user is augmenting at that moment. Any new interaction is forwarded to all active connections. This way, when a new comment is received in a target's chat room, all of the users that are augmenting that target immediately receive the message.
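The two record layouts listed above can be sketched as data classes. The field names follow the description in the text; the types and defaults are assumptions for illustration, not the framework's actual storage schema.

```python
from dataclasses import dataclass, field

@dataclass
class TargetRecord:
    id: int
    name: str                     # "<provider profile>/<given name>"
    provider_info: str
    files: list = field(default_factory=list)
    platform: str = ""
    popularity: int = 0
    description: str = ""
    tags: list = field(default_factory=list)
    subscribed_clients: list = field(default_factory=list)
    active_connections: list = field(default_factory=list)
    contents: list = field(default_factory=list)  # provided virtual contents

@dataclass
class ContentRecord:
    id: int
    name: str                     # "<developer profile>/<given name>"
    platform_info: str
    targets: list = field(default_factory=list)
    popularity: int = 0
    description: str = ""
    price: float = 0.0
    tags: list = field(default_factory=list)
    active_connections: list = field(default_factory=list)
```

Keeping `subscribed_clients` on the target record is what makes the new-content notification described above a simple iteration over that list.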
Figure 3.4: Information flow for the end user's interaction

3.5 Main Processes

So far, we have introduced the target hub and given an overview of the system and the main idea behind it. We also presented the entities interacting with the system, the main processes, and the information flow with a high-level data flow diagram. In the following sections, we get into the details of each main process of the proposed architecture. We try to explain how each part of the architecture cooperates to deliver a requested resource or service. The main processes of the system are shown in the use case diagram in Figure 3.5. Communication between the target hub and clients consists of HTTP messages. The details of the HTTP requests and responses are covered in the API documentation section of the next chapter.

Figure 3.5: Use cases of the target hub

3.5.1 Registering in the Target Hub

To get any service from the target hub, the service requester should first register in the target hub. The process starts by sending an HTTP request with specific parameters to the target hub's address. Typically, a client would use an Internet browser or a custom-developed function on his server to generate and send the HTTP request. The target hub receives the request and checks the client database to see whether a client with the provided info is already registered. If there is no such client, the client profile manager creates a new profile and returns a registration confirmation and a token ID. The token is going to be used for all future requests coming from that client. Aside from minor technical details, this covers the registration process, which is depicted in the activity diagram in Figure 3.6.

Figure 3.6: Registration - Activity diagram

3.5.2 Sharing Targets and Contents and Subscription

One main objective of the target hub is to create a bridge between AR resource providers and resource consumers. For this to happen, the resources (i.e.
targets and contents) need to be uploaded to the target hub and shared. A registered client can start this process with an HTTP POST request to which the target or content is attached. Since the processes of uploading and sharing a target and a content are fairly similar, we present only the target sharing process, for simplicity. A target can be uploaded without being shared. This means that other clients can see that the target exists and read its description, but they cannot download or use it. However, when a target is not shared, it cannot be subscribed to for notification services. If an AR app developer wants to share his targets and get a notification whenever one is downloaded or a new message has arrived for it, he follows the process shown in Figure 3.7.

Figure 3.7: Target sharing - Activity diagram

3.5.3 Searching and Loading Targets

Clients can search targets by name, which is a unique identifier, or by a target's tags, which are assigned by its provider. The result of the search for a target is a list of JSON objects. Each of the JSON objects is a target, composed of information about the target's name, size, platforms, shared status, etc. Clients might have different purposes for such a list of targets. One scenario is an AR application's end user who is browsing a target and tries to get some new contents for it. She would start her quest by entering some tags, such as keywords for the target she is augmenting. Those tags would be sent to the application server. When a request for a target comes from a client to its app server, the server looks for the target in its own local target DB. Generally, a target is stored in the form of patterns and a descriptive language such as XML. If there is no such target in the local target DB, the server generates a request for the target by wrapping the requested target info into JSON objects and sends it to the target hub.
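The lookup order just described (local target DB first, target hub as fallback) can be sketched as follows. Both stores are modeled as plain dicts for illustration; in the real system the second branch would be an HTTP request to the target hub.

```python
def find_target(name, local_db, target_hub):
    """Return (target, source), where source records where it was found."""
    if name in local_db:
        return local_db[name], "local"
    if name in target_hub:
        # In the real system this branch wraps the request info in JSON
        # and sends it to the target hub over HTTP.
        return target_hub[name], "hub"
    return None, "not found"

# Example stores (names and fields are assumptions for demonstration).
local_db = {"acme/xray-2000": {"name": "acme/xray-2000", "shared": True}}
hub = {"acme/xray-3000": {"name": "acme/xray-3000", "shared": True}}
```

This fallback is what lets an app server answer most requests locally while still reaching the wider pool of shared targets when needed.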
After authentication and authorization, the target hub returns the list of matched targets' information. Now the user knows about the targets that she can get. Furthermore, she can read descriptions of the targets and their corresponding contents, as well as price tags and reviews. The target hub's clients can directly request to download a target, though whether they get the target depends on the sharing status of the target and the client's authorization. If the end user decides to download any of the provided targets, her request reaches the AR app server first; in case the target does not exist in the local database, the request is relayed to the target hub. The target hub updates the target's record first. One record for each request should be logged, and then the requesting client should be added to the target's subscription list. Then the target's information and all of its files (if there are any) are sent back to the requester (i.e., the application server). This process is illustrated in detail with an activity diagram in Figure 3.8.

Figure 3.8: Search and load target - Activity diagram

3.5.4 Chat Rooms and Communication Handling

There are two types of interaction messages. The first is between users, and the other is for chat rooms. Typically, the AR application sends both types of messages to the target hub. The target hub uses the parameter settings of the message to determine whether the message is meant for an end user of an application or for the chat room of a target. A communication message carries information about the history of a chat, such as the number of communicated messages, the ID of the last message, and some other information. All this information helps the synchronizer keep the history of the communicated messages synchronized between all of the communicating parties.
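The history check just mentioned, comparing the count of communicated messages and the ID of the last message, can be sketched as follows. The history representation (a list of dicts with an `id` field) is an assumption for demonstration.

```python
def in_sync(local_history, reported_count, reported_last_id):
    """True if a peer's reported chat state matches our local history."""
    if len(local_history) != reported_count:
        return False
    local_last = local_history[-1]["id"] if local_history else None
    return local_last == reported_last_id

# Example local history for one target's chat.
history = [{"id": 1, "text": "hi"}, {"id": 2, "text": "anyone here?"}]
```

When the check fails, the synchronizer would replay the missing messages to the lagging party; that repair step is omitted here.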
If the message is for a target chat room, the message synchronizer updates the chat history of the target's chat room and sends a notification message to the subscribed clients. If the message is meant for an end user, the synchronizer updates the communication history of the two AR applications. Then the communication module sends a notification to the destination AR application with the information of the source AR application. The source is the sender application and the end user (of that application) from whom the message is coming. The destination is the end user for whom the message is meant. This message-passing process is depicted in Figure 3.9.

Figure 3.9: Communication process - Activity diagram

3.5.4.1 Subscription and Notification Handling

In the previous sections, we used the subscription and notification concepts without going into the details of their meaning. The general idea is to signal a client about an event in which the client is interested. The event can be the availability of a resource or of information. Clients can be any AR application, target provider, content developer, or third party. In general, signaling a client can take many forms, such as setting a flag, sending a text message, or using a push notification service. The problem with flag-setting techniques is the waste of bandwidth and computation power: if the target hub sets a flag to signal the availability of a resource, the clients need to constantly poll that flag. SMS might be a good idea, but not when the client is not a person, and not for fast decision-making situations. Pushing notifications seems like a good approach. However, we need an approach that supports all of our clients. So, at the implementation level, one of these approaches, or a combination of them, should be chosen for the signaling system.
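The two message routes described at the start of this section (a target's chat room with fan-out to subscribed clients, versus a direct message to one end user) can be sketched together. The message shape and field names are assumptions for illustration.

```python
def route_message(msg, chat_rooms, notifications):
    """Dispatch a message and record who gets notified."""
    if msg.get("room"):  # target chat-room message
        room = chat_rooms.setdefault(msg["room"], {"history": [], "subscribers": []})
        room["history"].append(msg["text"])
        for client in room["subscribers"]:
            notifications.append((client, msg["room"]))
    else:                # direct end-user message
        notifications.append((msg["to_app"], msg["to_user"]))

# Example: two applications are subscribed to one target's chat room.
chat_rooms = {"acme/xray-2000": {"history": [],
                                 "subscribers": ["layar-app", "aurasma-app"]}}
notifications = []
route_message({"room": "acme/xray-2000", "text": "anyone here?"},
              chat_rooms, notifications)
```

The `notifications` list stands in for whichever signaling mechanism (push, callback, etc.) is chosen at the implementation level.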
In any case, a client needs to inform the target hub of its interest in being notified of the desired event; this is called being subscribed to that event. For an application subscription, the best approach is to provide the application server's address and to support callback APIs for notification and message passing. Clients of the target hub can subscribe to several types of events, with the following meanings:

• Target subscription: by subscribing to a target, the client allows the target hub to notify it if:
– A new message arrives in the target's chat room
– A new content is available for that target
– A new target with the same tags becomes available (less likely)
– The target is downloaded

• Content subscription: by subscribing to a content, the client receives notifications if:
– The content is updated by its owner (e.g., its price changes)
– A new content for the same target is uploaded
– The content is downloaded

• Messaging subscription: only an application can subscribe to this service. The application must provide a callback address with the parameters specified by the target hub. Any new message arriving for the application is authenticated and redirected to the provided address.

3.6 Specifications

Here we summarize the specifications and features of the proposed architecture, along with the reasons explaining their necessity and benefits.

1. Profile name is unique in the target hub
Profile names must be unique in order to identify clients.

2. Target names and content names are unique
To address and identify AR resources in the target hub, a unique identifier and a naming convention are needed. Inspired by URLs, a target's name is composed of two parts: the target provider's name and a unique name given by the provider. The same applies to contents, but for simplicity we cover only targets here.
The target provider's name is its profile name, which is unique in the target hub. Using the profile name in target names groups all the targets from the same provider under a common prefix. Grouping targets and contents enriches the search method and improves their accessibility; the analogy is that all the resources of a website share a common domain name. For instance, if there are multiple matching targets and the user is only looking for those provided by a particular developer, she can search by the developer's name to find only his products. In another scenario, when many contents are suggested and the user is not familiar with them, seeing the provider's name before each content is a useful piece of information.

3. Communications are authenticated by token
After registering with the target hub, the client receives a token. In all future interactions, that token is sent to the target hub for authentication. The client can reset its token by calling the corresponding API.

4. Multiple contents can be registered for one target
This gives the user the option of choosing which contents to download. From a developer's perspective, it creates more opportunity to develop contents for targets that already have one. From a target provider's perspective, it creates a competitive environment among content developers, which in turn helps the target provider get the best contents for its targets.

5. A content is registered for only one target
A content is subject to constant updates to satisfy its users and is customized to its target; therefore it is not a good idea to assign one content to multiple targets. However, the same content can be assigned to multiple targets under different names.

6.
There is one chat room for every target
For every target in the target hub, there can be a chat room under the target's name. However, the chat room is not created until the target hub receives the first message for that target.

7. There is one chat record for any two AR applications
Any two AR applications have the option of supporting communication between their end users. The communication history between any two applications is stored in the target hub and is reachable by both applications. The notification system also supports this type of communication. However, the record in the database is not created until the first message is received from an application.

8. All database updates go through the synchronizer
The synchronizer generates the corresponding changes that must be applied to keep the database consistent. All the necessary updates are sent to the update method in the controllers, which applies the changes to the corresponding model. The reason for aggregating the changes into one method is that either all of the updates or none of them should be committed to the database; this keeps the database consistent.

3.7 Chapter Summary

This chapter started with the basic idea of what is missing in AR and how we intend to supply it. The goals of the thesis are as follows:
• Bringing AR resources under a common framework and making them accessible to AR applications
• Having bi-directional traffic so that users can create and share targets and contents
• User communication, content review, polling, tagging, and joining and leaving AR social groups
• Incorporating Web 2.0 in AR
• Clarification of AR's basic elements, namely content and context

In addition to the goals, we listed the limitations of current AR applications under four categories: user interaction and contribution, a limited number of targets, lack of contents and content sharing, and target and content naming conventions.
We introduced the client federated-server model as the main contribution of this thesis. The practical scenarios in this chapter show the efficiency of the proposed approach, from which users would benefit more than they do from current AR systems. We discussed the details of the main components of the client federated-server model, and the static and dynamic relations of the different modules were illustrated with class and activity diagrams. Finally, we listed the specifications of the proposed model. We show the feasibility and functionality of the proposed structure by implementing an application as a proof of concept. The next chapter introduces the projects that implement our framework: an AR application that works with the target hub, and a subproject that shows how the expiration tag can be implemented and exploited in the system.

Chapter 4
Scratcher - Proof of Concept

The validity and capability of the proposed framework are demonstrated here through a prototype application that we have named "Scratcher." Scratcher exercises the main functionalities of the proposed architecture, including user interactions, target searching, sharing, chatting, and notifications. The application has been implemented on the most common platform in the current AR industry: smartphones. Although the proposed framework is a general design covering other types of platforms, mobile augmented reality is the focus of the AR industry at the moment. We also target Android, which holds a great portion of the smartphone market. Scratcher was built in Unity with the Vuforia plugin; for the server side of the application we used ASP.NET with a Microsoft SQL Server database. Unity allows us to compile the application for different platforms, including Android, iOS, and Windows. We implemented the target hub using the Model View Controller (MVC) framework.
We used ASP.NET, C#, and Microsoft SQL Server in the implementation of the target hub. In the following, we present the details of the whole system divided into three sections: the target hub, the app server, and the mobile application. We have tried to keep each of these entities independent from the others at all levels of the implementation. This separation allows for the adaptability and scalability of the proposed architecture.

4.1 Mobile Application Implementation

We implemented the application in Unity, one of the most powerful and popular game engines. Unity 3D is a cross-platform environment that can compile the final product for different platforms such as Android, iOS, and Windows. Figure 4.1 shows the working environment of Unity with a green box as a 3D model; this model represents a content that can be augmented on a target. In the lower part of Figure 4.1, the scripts are outlined in red. Scripts are code that defines functions and behaviors; by attaching a script to an object, we give the object behavior. A developer can write scripts in either JavaScript or C#; we used C# because we were more familiar with it.

Figure 4.1: Working environment of Unity

Unity alone does not provide the functions necessary for augmented reality, but several plugins add AR support. We chose Vuforia, a software development kit (SDK) for augmented reality that allows positioning a complex 3D object on an image target. The client-server architecture of the application lets it request a new target database, and the app server is capable of responding with one. The application also provides a way of searching for and selecting targets using their tags and names.
The lowest level of interaction between the application and the application server occurs when the user starts to send and receive messages in the targets' chat rooms. The hierarchy of interactions between the application and the app server is shown in Figure 4.2.

Figure 4.2: Interaction hierarchy between client and app server

4.1.1 How Does It Work?

The application uses a client-server architecture: it interacts with the end user on one side and with the application server on the other. As with any AR application, there must be a set of targets and some contents to augment over those targets. We created a default database of targets that is downloaded to the device as soon as a user logs into the application. When the user runs the application, she is asked for her username and password (Figure 4.3a). The application checks for internet connectivity and for username and password matching, with prompting messages to guide the user (Figure 4.3b). As soon as the first page loads, the application pings the app server; if the server is not reachable, the user is notified to check internet connectivity or that the app server is down.

(a) Username checking (b) Password checking
Figure 4.3: Log in page of the Scratcher

Clicking the start button loads the next page, which is an AR camera. As soon as the page loads, a coroutine starts to download the application's default database. The user can now browse objects, particularly images, to see the models augmented upon them. The whole process of activating the AR scene is illustrated in the activity diagram in Figure 4.4. Vuforia provides some very popular benchmark targets, namely stones, chips, and tarmac, which we used in our demo execution. In the scenario presented here, the application detects the chips target and augments a red box upon it.
However, in the same scene there is another target (the stones target) to which the application shows no reaction, meaning it is not being detected (Figure 4.5). In the scene shown in Figure 4.5, if the user taps on the scene, she enters the chat room of the chips target; chat room functionality is discussed later. If the user decides to find a target, all she needs to do is open the search page. To load the search page, the user taps anywhere on the AR scene while no target is detected. There she can enter the appropriate tags and search for a target. The list of matching targets is loaded into the drop-down menu under the search button. If any target is returned, she can select it and load it. She can also go back to the AR scene without loading any target, or quit the application from this page. The tags that the user enters are combined with "OR" logic, so if any of the tags hits a target, that target is listed. Figure 4.6 shows the user having searched with "lab" as a tag; the target named "tarmac" has been retrieved from the app server and selected, ready to be loaded. Clicking the "Load Target" button sends an HTTP request for the target to the app server. The important point is that neither the AR application nor the end user is aware of the source of the target, whether it is the app server or the target hub. As soon as the new target is downloaded, the app can detect both targets, as shown in Figure 4.7, where the tarmac target is on the left with a green sphere overlaid upon it.

Figure 4.4: Activating the AR scene
Figure 4.5: Chips has been detected but stones has not
Figure 4.6: Target search and load page
Figure 4.7: Chips and tarmac have both been detected

4.1.2 Chat System

The main idea behind the chat room for a target is to connect users that are augmenting a common target.
For instance, if two people are looking at the same target, they should be able to start a conversation (Figure 4.8). When the user taps on a target, the chat room of that target is loaded. Implementing and managing a chat room has multiple aspects, including sending and receiving messages, storing and retrieving messages, synchronizing the chat history among the parties of the communication, and notifications. We cover these aspects in the following sections.

Figure 4.8: Connecting by a common target

4.1.3 Storing and Retrieving

All chat history is stored in an XML file on the local device under the target's name. A sample chat history for a target named tarmac is shown in Figure 4.9. The number of messages stored in the XML file (three in this example) and the content of the last communicated message are also stored in the file; this information is used for synchronization. We recognize that if two chat files become corrupted while both retaining the same message count and the same last message, our implementation cannot detect or fix the corruption. However, given that this is only a prototype and given the low probability of such a scenario, we consider this acceptable. On entering the chat room of a target, the communication history is loaded into a list of messages, where each message becomes an object of the message class shown in Figure 4.10. Then the chat room page loads on the screen, the user sees the messages, and she can send a text message. We load the entire chat history into a scrolling view, but it would also be possible to load only the most recent n messages (Figure 4.11A).
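The per-target history file can be sketched with the Python standard library's XML support; the element and attribute names below are illustrative and do not reproduce the exact schema of Figure 4.9.

```python
import xml.etree.ElementTree as ET

def save_history(messages):
    """Serialize a chat history (list of (sender, body) pairs) together
    with the sync metadata the synchronizer relies on: the message
    count and the last communicated message."""
    root = ET.Element("ChatHistory", count=str(len(messages)))
    for sender, body in messages:
        msg = ET.SubElement(root, "Message", sender=sender)
        msg.text = body
    if messages:
        ET.SubElement(root, "LastMessage").text = messages[-1][1]
    return ET.tostring(root, encoding="unicode")

def load_history(xml_text):
    """Rebuild the (sender, body) list from the stored XML."""
    root = ET.fromstring(xml_text)
    return [(m.get("sender"), m.text) for m in root.iter("Message")]
```

Storing the count and last message redundantly in the file is what lets the corruption caveat above be stated precisely: only corruptions that preserve both fields go undetected.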
Figure 4.9: Chat history of tarmac
Figure 4.10: Class of Message

4.1.4 Sending and Receiving a Message

When a user sends a message, it is shown in gray with a smaller font than the other messages until it reaches the server and an acknowledgement is received in the app. Figure 4.11B shows a "test" message that has not been acknowledged yet. We use Unity's WWW class to send chat data to the server; this class can send both GET and POST requests. A chat message has multiple fields, including the message type, target name, sender, receiver, body, the last message of the chat history, the number of messages in the history, and other control fields.

Figure 4.11: Chat room scene

To receive a message, one approach is to poll the server to see whether any new message has arrived. Polling means sending a request to the server at short intervals; the server replies with an update message if there is a new message, or with an empty message otherwise. This method is resource-consuming and not real-time. The other approach is push technology, in which the server sends the update to the client as soon as it becomes available. The problem with push technologies is that they are either not supported across all platforms or too complex for our prototype system. We implemented our notification system using long polling, which sits between simple polling and server push. Long polling is not a push technology, but it emulates the push mechanism and has more flexibility in supporting HTTPS and security policies. The sequence diagram in Figure 4.12 shows the difference between polling and long polling. With long polling we get near-real-time communication, simplicity, and flexibility at the same time, at a slight cost in traffic and reconnection handling.
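The client side of this scheme can be sketched as a loop that issues a blocking update request and reconnects immediately after each reply or expiry; `fetch_updates` stands in for the HTTP round trip and is an illustrative name, not part of the implementation.

```python
def long_poll(fetch_updates, timeout=30.0, on_update=None):
    """One long-poll cycle: ask the server for updates since the last
    message. The server holds the request until an update arrives or
    the shared expiry time passes (modeled as returning None).

    fetch_updates(timeout) models the blocking HTTP request."""
    update = fetch_updates(timeout)  # blocks up to `timeout` seconds
    if update is not None and on_update:
        on_update(update)
    return update

def poll_until(fetch_updates, cycles, timeout=30.0):
    """Reconnect immediately after each cycle, as long polling requires,
    and collect every update that arrives."""
    received = []
    for _ in range(cycles):
        long_poll(fetch_updates, timeout, received.append)
    return received
```

The key difference from simple polling is visible in the sketch: the wait happens inside `fetch_updates` on the server side, so an empty cycle costs one held connection rather than many short requests.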
Figure 4.12: Polling vs. long polling

4.2 Server Side Implementation

The server is a web application that communicates with the AR application on one side and the target hub on the other. The interactions between the app server and the target hub follow the three-level hierarchical model shown in Figure 4.13. The first level is the interface, which handles registration and authentication; the second level is target handling; and the lowest level is the communication that happens over each target.

Figure 4.13: Interaction hierarchy between app server and target hub

We implemented our server using ASP.NET. To show that the target hub is truly capable of connecting different AR applications and their users, we use two different app servers, each with its own AR application interacting with its corresponding server. We uploaded our servers to the Microsoft Azure platform at the following two addresses:
Server A: http://arapp.azurewebsites.net
Server B: http://gece-ar.azurewebsites.net

The server uses web methods to interact with the AR application. The first method an AR application calls after launching is the ping method, which checks whether the app server is reachable. The next method checks the username and password. Figure 4.14 shows the "Ping" and "CheckPass" methods.

Figure 4.14: Web methods of the app server

The server uses the "WebClient" class to implement the interaction methods. The first interaction of any server with the target hub is registration. The target hub requires three parameters: a server name, a server identifier, and a server address. We use the server address as a callback address; the target hub uses it to send its requests and notifications. The target hub returns an ID to the server.
The returned ID, along with the server name, is used to authenticate the server in the target hub in all future interactions. It is possible that an application developer wants to register his server in the target hub twice with different configurations. To reuse the same server name and server address, all he needs to do is register again with the same name and address but a different identifier. The target hub then generates another unique ID for this server and keeps two separate profiles with the same server name and address but different configurations. This feature allows the app developer to categorize his users and provide profile-based service: for instance, requests from free-service users are sent to the target hub with a different ID than those from paid-service users. In the target hub, request authorization is handled based on the profile of the requester (i.e., the ID of the user). Therefore, if any service incurs a cost for the app owner, he will not be charged for his free users' access. The implementation of the registration function is shown in Figure 4.15.

Figure 4.15: Registration method of the app server

If a user wants a target that does not exist in his app, he searches for it using some keywords as tags. The app server searches for those tags in its own target database and also forwards the request to the target hub. The target hub replies with the names of the targets that have any of those tags. The server then returns all the found targets, both local targets and targets from the target hub, to the app. The implementation of the search request from the app server to the target hub is shown in Figure 4.16. If the user chooses to download any of the targets, the target is first downloaded to the server (in case it is on the hub) and then forwarded to the application.
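The search path just described, checking the local target database and forwarding the tags to the target hub, can be sketched as follows; the two lookup callables stand in for the local database query and the hub API call, and their names are illustrative.

```python
def search_targets(tags, local_search, hub_search):
    """Return local matches followed by hub matches, without
    duplicates, so the app cannot tell which source a target
    came from."""
    results = list(local_search(tags))
    for name in hub_search(tags):
        if name not in results:
            results.append(name)
    return results
```

Merging the two result sets on the server is what keeps the AR application and the end user unaware of whether a target came from the app server or the target hub.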
Figure 4.16: Requesting the list of targets from the target hub

The last important part of the server-side implementation is the chatting module, which is implemented in three sections. The first is an interface that interacts with the application. The second handles updating the communication histories, subscriptions, and notifications, implemented, as discussed before, with long polling. The last is responsible for forwarding messages to the target hub when needed. The server expects two types of messages from the application: "last message" and "sent message." A "sent message" informs the server that a new message is coming from the application, meaning one of its users has sent a new text message. The server updates the communication history of that chat room and then sends the update to all subscribers of that target using long polling. If the target is shared with the target hub, an update message is also sent to the target hub to refresh its copy of the chat history. Figure 4.17 shows how the app server sends the new message to the target hub.

Figure 4.17: App server forwards update message to the target hub

The "last message" is the long-polling request message, which carries the last communicated message in a chat room and asks the server for any update. The server does not reply to this message unless there is an update for it; otherwise, it holds the request until it expires. The expiration time for an update request is a predefined, equal amount of time on both the server and the application. When the application sends an update request, it sets the expiry timer for that request; likewise, when the server receives an update request, it replies with the update if there is a new message and otherwise sets the expiry timer.
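The server-side hold described above can be modeled as a record with an expiry deadline: reply immediately if the client is behind, otherwise park the request until a message arrives or the deadline passes. This is a simplified, single-threaded Python sketch with illustrative names, not the ASP.NET implementation.

```python
class PendingRequest:
    """One held 'last message' request from a client."""
    def __init__(self, last_message, expiry_time):
        self.last_message = last_message
        self.expiry_time = expiry_time
        self.reply = None  # filled in when an update is sent

class UpdateQueue:
    """Single-threaded model of the long-poll hold on the app server."""
    def __init__(self):
        self.pending = []

    def on_request(self, request, latest_message):
        # Reply immediately only if the client is behind the chat room.
        if latest_message is not None and latest_message != request.last_message:
            request.reply = latest_message
        else:
            self.pending.append(request)  # hold until update or expiry

    def on_new_message(self, message):
        # A new chat message releases every held request at once.
        for req in self.pending:
            req.reply = message
        self.pending.clear()

    def expire(self, now):
        # Drop requests whose shared expiry deadline has passed.
        self.pending = [r for r in self.pending if r.expiry_time > now]
```

A production version would need locking (or an async event loop) and would send HTTP responses instead of setting a field, but the hold-then-release shape is the same.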
The activity diagram in Figure 4.18 shows the whole chatting process, which spans the user, the AR app, the app server, and the target hub.

Figure 4.18: Activity diagram of the chatting system

4.3 Target Hub Implementation

The target hub is the heart of our proposed architecture. There can, of course, be many ways to implement an idea; the most important factor in our implementation was to provide the necessary functions of the architecture to show that it is feasible, i.e., practical and functional. We used the MVC framework, which separates the model (M), the application's data, from the controller (C), the application's logic, and separates both from the view (V), the application's interface. Our target hub has no view, but the methods in the controllers expose the APIs necessary to interact with the hub. Our prototype target hub is hosted on the Microsoft Azure platform at the following address:

Target hub address: http://arconnect.azurewebsites.net

To manage the models of the system without being concerned about the underlying database tables and columns, we adopted Entity Framework. Entity Framework is part of the .NET framework, although it has been separated from .NET after version 6. With Entity Framework, our main concern is the logic of the application: how data is processed and manipulated, and what the relations between entities are. For instance, when an object of a model is updated, Entity Framework updates the database accordingly. The main models in the target hub are target, server, tag, subscription, target request, target request type, server request, and server request type. We have not implemented content, because the mechanism that manages targets works the same way for contents; therefore we implemented only target management and the target model.
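The main models and their relations can be sketched as plain data classes; the field names below are illustrative, and in the real implementation the tables are generated from the C# model classes by Entity Framework.

```python
from dataclasses import dataclass, field

@dataclass
class Target:
    name: str
    tags: list = field(default_factory=list)      # many tags per target
    requests: list = field(default_factory=list)  # one log entry per request

@dataclass
class Server:
    identifier: str
    name: str
    address: str  # callback address for notifications

@dataclass
class Subscription:
    """Normalizes the many-to-many relation between servers and targets."""
    server: Server
    target: Target

def subscribers_of(subscriptions, target_name):
    """Which servers must be notified when this target is updated."""
    return [s.server for s in subscriptions if s.target.name == target_name]
```

The `Subscription` rows are what the notification path walks: when a chat message arrives for a target, every server returned by `subscribers_of` gets an update.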
The models are represented as classes, and for each class there is a table in the database. The relations between the entities are shown in Figure 4.19. Each target can have multiple tags, which are used as keywords to search for it. There can be many ways to search for a target, but because targets come in various formats, the only uniform search method is tags; a tag is nothing but a label attached to a target. Each target can also have multiple incoming requests: to generate reports and keep track of the targets, we log each request, and there are different types of requests. The server is the other important model of the system. Each server can subscribe to multiple targets, and multiple servers can subscribe to one target; therefore we normalize this relationship with a subscription model. The subscription is used when an update happens on a target: for example, if a new message is received for a target's chat room, all of the subscribed servers must be notified of it. Each server can also have multiple requests of different types, including download, upload, register, unregister, update, message, and so on.

Figure 4.19: Entity relationship model of the target hub

The controllers for the models are responsible for client interaction and for working with the models; we implemented the APIs in the controllers. For example, to register a server in the target hub, a request is sent to the server controller, whose register method checks the parameters and then adds the server to the list of servers. The code for the register method in Figure 4.20 shows how the server controller exposes the register API and handles the request.

Figure 4.20: Register method in server controller

The other important controller of the system is the target controller.
It exposes APIs such as "GetTargets," "Download," "Upload," and "ForwardMessage." One of the important modules of the target hub handles the messages of the chat rooms, discussed in the previous sections. Figure 4.21 shows the ForwardMessage method, whose input parameters are the identifier and ID to authenticate the request, the target name to find the target, the username of the sender, and the body of the message. The method checks whether the server has been registered, then finds the right target in the database, and finally forwards the message to each of the target's subscribed servers. The method replies with a string describing the result of the request. Thanks to the message handling of the target hub, users of the Scratcher application on server A can communicate with users of a different application on server B.

Figure 4.21: Forwarding a chat message to the subscribers

So far we have discussed all three levels of our implementation: Scratcher, the AR application; the AR application server; and the target hub. We showed the architecture, activity diagrams, entity relationships, and source code of the implementation wherever necessary.

4.4 Web APIs

As discussed before, the app servers and the target hub interact using HTTP requests. Here we list the APIs exposed by the target hub in our prototype implementation. The target hub APIs are accessed at http://arconnect.azurewebsites.net/api. We use 'Show targets' (Table 4.1) to look for targets by tags.

Table 4.1: Document of searching for targets in the target hub
Title: Show targets
URL: /Target/GetTargets?Identifier=:identifier&ID=:id
Method: GET
URL Parameters: Identifier = [integer], ID = [integer]
Data Parameters: None
Success Response: Code: 200, Content: {1: "stones", 2: "tarmac"}
Error Response: Null
Sample Call: Target/GetTargets?Identifier=7568123&ID=20&tags[]=lab&tags[]=park
Notes: Shows the names of the targets that have any of the tags specified in the parameters.

Once we have the target name, we can download the target using 'Download target' (Table 4.2).

Table 4.2: Document of downloading a target from the target hub
Title: Download target
URL: /target/Download?Identifier=:identifier&ID=:id&TargetName=:name&format=:format
Method: GET
URL Parameters: Identifier = [integer], ID = [integer], TargetName = [string], format = [string]
Data Parameters: None
Success Response: Code: 200, Content: [file data]
Error Response: Code: 404, Content: {Message: "This file does not exist in the Hub!"}; Code: 400, Content: {Message: "Server is not registered!"}
Sample Call: /target/Download?Identifier=7568123&ID=20&TargetName=tarmac&format=xml
Notes: Downloads a target with the specified name and format.

Users, developers, and applications can upload targets to the target hub using the 'Upload target' API (Table 4.3).

Table 4.3: Document of uploading a target to the target hub
Title: Upload target
URL: /target/Upload?Identifier=:identifier&ID=:id&TargetName=:name&tags[]=:tag
Method: POST
URL Parameters: Identifier = [integer], ID = [integer], TargetName = [string], tags[] = [array of strings]
Data Parameters: File: [media type file]
Success Response: Code: 200, Content: {Message: "Target added"}
Error Response: Code: 400, Content: {Message: "Server is not registered!"}
Sample Call: target/Upload?Identifier=7568123&ID=20&TargetName=bottle&tags[]=glass&tags[]=sport
Notes: Uploads a target with the specified name, tags, and the attached file.

Sending and receiving text messages is supported by the 'Message passing' API, detailed in Table 4.4.
All of the services in the target hub are provided only to registered clients. The 'Register a server' API is used to register with the target hub; the necessary information is provided in Table 4.5. Content and context providers can use the target hub to collect information for their own purposes; for this we provide an API called 'GetServers' that lists the names of the servers already registered in the target hub. Details and a sample call of the API are provided in Table 4.6.

Table 4.4: Document of message passing in the target hub
Title: Message passing
URL: target/ForwardMessage?Identifier=:identifier&ID=:id&TargetName=:name&UserName=:username&SentMessage=:message
Method: GET
URL Parameters: Identifier = [integer], ID = [integer], TargetName = [string], UserName = [string], SentMessage = [string]
Data Parameters: None
Success Response: Code: 200, Content: {Message: "Message forwarded"}
Error Response: Code: 400, Content: {Message: "Server is not registered!"}
Sample Call: target/ForwardMessage?Identifier=7568123&ID=20&TargetName=bottle&UserName=rahim&SentMessage=Hi
Notes: Receives a message from an application server under a target's name and forwards it to all servers subscribed to the specified target.

Table 4.5: Document of registration to the target hub
Title: Register a server
URL: /server/register?server=:servername&Identifier=:identifier&Address=:address
Method: GET
URL Parameters: Identifier = [integer], servername = [string], Address = [string]
Data Parameters: None
Success Response: Code: 200, Content: {Message: "ID:id"}
Error Response: Code: 400, Content: {Message: "server name or id is null!"}
Sample Call: /server/register?server=ServerA&Identifier=123354&Address=http://gece-ar.azurewebsites.net
Notes: Registers a server to the target hub under the specified name and address and returns the ID of the server. This ID is going to be used as a token for future requests.
4.5 Expiration and Activation Tags

Previously, we discussed that the rapid growth of targets and contents should be controlled in the proposed architecture. One way is to keep a target or a content in the hub only until it expires. Several AR architectures and platforms, such as ARML, Argon, and Wikitude, use tags very similar to XML tags to manage virtual contents. A tag typically describes an attribute of a content, including its type, ID, location, orientation, etc.

Table 4.6: Document of showing the servers of the target hub
Title: Show all registered servers
URL: /server/GetServers
Method: GET
URL Parameters: None
Data Parameters: None
Success Response: Code: 200, Content: [{"Identifier": "7568123", "Name": "ServerB", "Requests": [{"Id": 18, "Type": "Register"}]}, {"Identifier": "123354", "Name": "ServerA", "Requests": [{"Id": 19, "Type": "Register"}]}]
Error Response: Null
Sample Call: /server/getservers
Notes: Shows the list of all registered servers, with all of the requests they have made.

In this section, we propose two essential attributes that many AR contents need when dealing with time. We name them the "Activation Tag" and the "Expiration Tag." Although similar concepts can be found in several other areas (for instance, the time-to-live tag on a network packet), they have been overlooked in the target and content formats of AR architectures.

• Activation Tag
This tag shows the time after which the content is enabled. Before this time the content will be treated as if it does not exist.

• Expiration Tag
This tag shows the time after which the content is disabled. After this time the content will be treated as if it does not exist.

We implemented both the expiration and activation tags to show their feasibility and functionality. We developed a component for the "Activation" and "Expiration" tags.
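The rule the two tags impose can be sketched as a small visibility predicate. This is an illustrative Python sketch; the actual implementation is a Unity (C#) component, and the function name and `None`-means-disabled convention are assumptions.

```python
from datetime import datetime

def is_visible(now, activation=None, expiration=None):
    # Before the activation time, or at/after the expiration time, the
    # content is treated as if it does not exist; a disabled tag is None.
    if activation is not None and now < activation:
        return False
    if expiration is not None and now >= expiration:
        return False
    return True

# The cube of the test case in Section 4.5.1 activates at 12/29/14 15:10.
cube_on = datetime(2014, 12, 29, 15, 10)
print(is_visible(datetime(2014, 12, 29, 15, 9), activation=cube_on))   # False
print(is_visible(datetime(2014, 12, 29, 15, 10), activation=cube_on))  # True
```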
As soon as the component is added to the asset, it is possible to set the times. There is an enable check mark for each tag with which the user can disable or enable the tag.

4.5.1 Test Case

We have a cube as an AR content with an activation tag set to a time in the future: 12/29/14 15:10. We also have a sphere with the expiration tag 12/29/14 15:11. In this scenario, the image targets are the stones and chips models that are used in the Vuforia SDK sample examples. Figure 4.22 shows that the target has been found and the sphere has been augmented, but the box is not in the scene. The reason is that the activation tag for the box has been set to 15:10, but the time is 15:09. There are many ways to implement time-triggered events. We have used Unity's Update method to check the time on every scene update. In Figure 4.23, the time is 15:10 and we can see that the box has appeared in the scene. The Update method in Unity is invoked on every frame; therefore, expiration of the contents will be checked and detected at the frame rate. As soon as a content expires, it disappears from the scene. Figure 4.24 shows that at 15:11 the sphere disappears because it has passed its expiration time and is no longer in the scene.

4.6 Chapter Summary

In this chapter, we showed how we validated our work by implementing a software application and the proposed framework as a proof of concept. The details of the implementation for each main module were discussed.

Figure 4.22: Only the sphere is in the scene
Figure 4.23: Both the sphere and the cube are in the scene
Figure 4.24: The sphere expired and disappeared

We also covered the technologies used in the implementation, such as the MVC framework, long polling and simple polling, and the HTTP POST and GET methods. We implemented the proof-of-concept application using Web APIs, for which the necessary documentation was provided.
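The difference between the two polling styles mentioned above can be sketched with a blocking queue standing in for the hub. This is an illustrative Python sketch, not the actual implementation; the queue-based "hub" and function names are assumptions.

```python
import queue
import threading

def long_poll(inbox, timeout=2.0):
    # Long polling: the request is held open until a message arrives or
    # the timeout elapses; simple polling would return immediately instead.
    try:
        return inbox.get(timeout=timeout)
    except queue.Empty:
        return None  # no message: the client simply re-issues the request

inbox = queue.Queue()
# Simulate the hub forwarding a chat message shortly after the poll starts.
threading.Timer(0.1, inbox.put, args=("Hi",)).start()
print(long_poll(inbox))  # blocks briefly, then prints "Hi"
```

Long polling trades a few held-open connections for far fewer requests than simple polling, which is why it suits chat-style message delivery.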
The next chapter concludes the thesis by providing a summary of the research objectives and results, the limitations of our work, and future directions.

Chapter 5
Conclusion and Future Directions

Augmented reality is about superimposing contextually relevant information onto the real world. The technology has captured researchers' and developers' imaginations for a long time. In recent years, we have witnessed rises and falls in different aspects of AR: the problems of head-worn devices such as Google Glass and Microsoft HoloLens, despite the initial excitement, and also the rise of hordes of Pokémon monsters. However, the question of what AR experience people would want or need remains open [51]. Developers and end users still cannot properly benefit from AR due to proprietary formats, lack of standards, and structural problems. Moreover, the existing AR architectures are generally not designed to enable user contribution, which prevents the widespread adoption of AR. User contribution, on the other hand, has been the center of attention for many Internet-based services, including social networks and social media; in fact, user contribution is one of the basic pillars of Web 2.0. The ability to involve users in creating and sharing AR resources to enrich the AR application's experience forms the premise of investigation of this thesis. The thesis intended to answer these research questions:

• What kind of software models and protocols would enable user A to browse any subset of the targets {TB1, TB2, ..., TBn} which user B is augmenting?

• What kind of software models and protocols would make it possible for user A to send messages to and receive messages from user B, and vice versa?
The attempt to answer these research questions led us to design a new architecture for implementing the AR technology, which we named "Client Federated Servers." With user contribution in mind, we designed the client federated server model to be capable of handling AR resource sharing and user communication. Using the proposed model, users can share their targets and communicate with each other. The client federated server architecture uses Web APIs to handle requests, which makes it platform independent. To demonstrate the validity and feasibility of the proposed architecture, we developed a mobile application called "Scratcher." Scratcher allows AR users to communicate with a target as the focal point; users can share their experiences in targets' chat rooms. Also, using Scratcher, the targets of different applications can be shared and augmented among Scratcher users.

5.1 Limitations

Although we have tried to address the drawbacks of previous works, there are still a number of shortcomings in the proposed architecture. We do not regard these issues as trivial; however, we believe that the system is functional enough to be adopted and implemented. We divide these limitations into the boundaries of the AR application and the obstacles of functionality. The areas of concern are as follows:

(A) Participatory AR experience
A participatory AR application is an AR experience built around multiple users' collective interaction environment [60]. Such interactions include the interaction of the user with other users, with physical aspects of the target, and with the virtual content superimposed over the target. The target hub supports user-level communication. However, interaction between users and contents remains a problem for future work.

(B) Content-level interaction
To the best of our knowledge, content interaction has not been scrupulously examined among AR contents with different formats.
A standard way for contents to interact, both among contents of the same framework and among contents belonging to different frameworks, is needed. By content interaction, we mean the ability of AR contents to communicate with each other independently of a user's intervention. For clarification, imagine a virtual ball moving in one AR application, while a wall in another application's environment lies in the trajectory of that ball. The ball should be able to hit the wall and be redirected without any outside intervention. With current AR frameworks, content interaction is possible only among proprietary contents.

(C) Redevelopment problem
The target hub gives a way to share targets and contents, but in the end, a content will be used on the platform for which it has been developed. Hence, a popular target or content needs to be redeveloped for each desired platform. A standard for all targets and a standard for contents remain unresolved issues.

Regarding the obstacles of functionality, there are a few potential problems with the proposed framework that should be taken into consideration.

(A) Content and target management
We think the most significant potential problem that the new design can introduce is a rapid increase in the number of targets and contents, especially since most of these targets and contents will have only temporary usefulness to users. This increase will have two impacts on servers. Firstly, it will slow down both servers while looking up targets and contents, and with them the overall response time of the system. Secondly, servers will have memory problems, which can also damage the scalability of the system. One solution to the problem is to implement targets and contents with expiration dates: targets and contents that have expired are deleted from the system. Purging the unnecessary data would effectively decrease the volume of the contents and targets.
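The expiration-date purge suggested in (A) can be sketched as a periodic cleanup pass over the hub's store. This Python sketch is illustrative only; the in-memory dictionary layout and names are assumptions about a possible hub implementation.

```python
from datetime import datetime

# Hypothetical in-memory store: target name -> expiration time (None = keep).
targets = {
    "tarmac": datetime(2014, 12, 29, 15, 11),
    "stones": None,
}

def purge_expired(store, now):
    # Delete every entry whose expiration date has passed; shrinking the
    # store keeps lookups fast and bounds the hub's memory use.
    expired = [name for name, exp in store.items()
               if exp is not None and exp <= now]
    for name in expired:
        del store[name]
    return expired

print(purge_expired(targets, datetime(2014, 12, 29, 16, 0)))  # ['tarmac']
```

Such a pass could run on a timer or be piggybacked on lookup requests; either way, never-expiring entries (`None`) are left untouched.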
(B) Increase of traffic
With the new design, there is going to be an increase in network traffic. The traffic induced by target detection requests on the target hub, as well as the traffic coming from clients due to search requests, should be effectively managed. One solution for managing traffic on the target hub is to give the hub a distributed structure. A hierarchical tree structure for the target hub, similar to the Domain Name System (DNS), could resolve requests for a target in the lower levels of the tree rather than relaying them to the root servers.

(C) Adaptability
If the proposed framework is not adaptable to current frameworks, it can be rejected by the business sector and become an island of its own. Adaptability is the capability of supporting current frameworks in terms of software and hardware. The proposed framework should not require radical changes in hardware. What we are looking for is to build on top of the existing frameworks and increase functionality. Therefore, we use Web APIs to be able to support our clients with minimal adoption effort.

5.2 Future Work

The contributions of this work could be further improved in the following areas:

• Participatory AR has not been supported.
To enable participatory AR, complex behaviors of the targets and contents should be supported in the target hub. This needs further investigation.

• Platform-independent targets and contents are lacking.
One way of implementing platform-independent targets and contents is using a middleware, such as the Java Virtual Machine (JVM), that would reside on the clients' machines. The middleware should be able to render the targets and contents that are shared by all other users and developers.

• A social network in AR has not been implemented.
The main aspect of a social network is the ability to traverse the graph of friends. We did not incorporate all of the aspects of social networks in the thesis.
It would be exciting to see a full integration of social networking in AR. 109 Bibliography [1] Arml 2.0 swg, http://www.opengeospatial.org/projects/groups/arml2.0swg. [2] Augmented reality sdk comparison, http://socialcompare.com/en/comparison/ augmented-reality-sdks. [3] Aurasma, https://www.aurasma.com/. [4] Foursquare, https://foursquare.com/. [5] Glympse, http://www.glympse.com/. [6] Layar, https://www.layar.com/. [7] Life360, https://www.life360.com/. [8] Locimobile, http://www.locimobile.com/. [9] Microsoft hololens, https://www.microsoft.com/en-us/hololens. [10] Pokémon go, http://www.pokemongo.com/en-ca/. [11] Gregory D Abowd, Anind K Dey, Peter J. Brown, Nigel Davies, Mark Smith, and Pete Steggles, Towards a better understanding of context and context-awareness, vol. 40, pp. 304–307, 1999. [12] David L. Altheide and Robert P. Snow, Media logic and culture: Reply to oakes, International Journal of Politics, Culture and Society 5 (1992), no. 3, 465–472. [13] Dhiraj Amin and Sharvari Govilkar, Comparative study of augmented reality sdk’s, International Journal on Computational Science and Applications 5 (2015), no. 1, 11–26. [14] Sally A. Applin and Michael D. Fischer, Toward a multiuser social augmented reality experience: Shared pathway experiences via multichannel applications, IEEE Consumer Electronics Magazine 4 (2015), no. 2, 100–106. [15] R. Azuma, Y. Baillot, R. Behringer, S. Feiner, S. Julier, and B. MacIntyre, Recent advances in augmented reality, IEEE Computer Graphics and Applications 21 (2001), no. 6, 34–47. [16] Ronald T. Azuma, A survey of augmented reality, Presence: Teleoperators and Virtual Environments 6 (1997), no. 4, 355–385. 110 [17] Evan Barba, Blair MacIntyre, and Elizabeth D. Mynatt, Here we are! where are we? locating mixed reality in the age of the smartphone, Proceedings of the IEEE 100 (2012), no. 4, 929–936. 
[18] Eugene Barsky and Michelle Purdon, Introducing web 2.0: social networking and social bookmarking for health librarians, Journal of the Canadian Health Libraries Association 27 (2006), no. 3, 65–67. [19] Ulysses Bernardet, Sergi Bermdez i Badia, and Paul FMJ Verschure, The experience induction machine and its role in the research on presence, pp. 329–333, 2007. [20] Alex. Berson, Client/server architecture, McGraw-Hill, 1996. [21] Mark Billinghurst and Andreas Duenser, Augmented Reality in the Classroom, Computer 45 (2012), no. 7, 56–63. [22] Oliver Bimber and Bernd Frohlich, Occlusion shadows: using projected light to generate realistic occlusion effects for view-dependent optical see-through displays, pp. 186–319, IEEE Comput. Soc, 2002. [23] danah m. boyd and Nicole B. Ellison, Social network sites: Definition, history, and scholarship, Journal of Computer-Mediated Communication 13 (2007), no. 1, 210– 230. [24] E.F. Churchill and C.a. Halverson, Guest editors’ introduction: Social networks and social networking, IEEE Internet Computing 9 (2005), no. 5, 14–19. [25] Davide De Chiara, Luca Paolino, Marco Romano, Monica Sebillo, Genoveffa Tortora, and Giuliana Vitiello, Link2u: Connecting social network users through mobile interfaces, vol. 6298 LNCS, pp. 583–594, 2010. [26] Jos van Dijck and Thomas Poell, Understanding social media logic, vol. 1, Aug 2013. [27] Joan DiMicco, David R Millen, Werner Geyer, Casey Dugan, Beth Brownholtz, and Michael Muller, Motivations for social networking at work, no. April 2016, pp. 711–720, ACM Press, 2008. [28] Yong-Yi Fanjiang, Shih-Chieh Lin, and Yu-Zuo Lin, Design of an augmented reality application framework to mobile device, pp. 177–179, IEEE, Aug 2012. [29] George W. Fitzmaurice, Situated information spaces and spatially aware palmtop computers, Communications of the ACM 36 (1993), no. 7, 39–49. [30] Mauricio A. Frigo, Ethel C. C. da Silva, and Gustavo F. 
Barbosa, Augmented reality in aerospace manufacturing: A review, Journal of Industrial and Intelligent Information 4 (2016), no. 2, 125–130. [31] Henry Fuchs, Mark A. Livingston, Ramesh Raskar, D'nardo Colucci, Kurtis Keller, Andrei State, Jessica R. Crawford, Paul Rademacher, Samuel H. Drake, and Anthony A. Meyer, Augmented reality visualization for laparoscopic surgery, pp. 934–943, 1998. 111 [32] Stephan Gammeter, Alexander Gassmann, Lukas Bossard, Till Quack, and Luc Van Gool, Server-side object recognition and client-side object tracking for mobile augmented reality, no. C, pp. 1–8, IEEE, Jun 2010. [33] Alida Gersie, Earthtales: storytelling in times of change, Green Print, 1992. [34] Jens Grubert and Raphael Grasset, Augmented reality for android application development: learn how to develop advanced augmented reality applications for android, Packt Publishing, 2013. [35] Jens Grubert, Tobias Langlotz, and R. Grasset, Augmented reality browser survey, Technical Report (2011), no. ICG-TR-1101. [36] Xiaoling Gu, Lidan Shou, Hua Lu, and Gang Chen, A generic framework for cyberphysical web, Proceedings of the First International Workshop on Middleware for Cloud-enabled Sensing - MCS '13 (2013), 1–6. [37] Anders Henrysson, Mark Billinghurst, and Mark Ollila, Face to face collaborative ar on mobile phones, vol. 1, pp. 80–89, IEEE, 2005. [38] Alex Hill, Blair MacIntyre, Maribeth Gandy, Brian Davidson, and Hafez Rouzati, Kharma: An open kml/html architecture for mobile augmented reality applications, pp. 233–234, IEEE, Oct 2010. [39] Thuong N. Hoang, Shane R. Porter, Benjamin Close, and Bruce H. Thomas, Web 2.0 meets wearable augmented reality, Proceedings - International Symposium on Wearable Computers, ISWC (2009), 151–152. [40] Tobias Hollerer, Dieter Schmalstieg, and Mark Billinghurst, Ar 2.0: Social augmented reality - social computing meets augmented reality, pp. 229–230, IEEE, Oct 2009.
[41] Daesung Jang, Joon-Seok Kim, Ki-Joune Li, and Chi-Hyun Joo, Overlapping and synchronizing two worlds, Proceedings of the 19th ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems - GIS ’11 (2011), 493–496. [42] Rudolph Emil Kalman et al., A new approach to linear filtering and prediction problems, Journal of basic Engineering 82 (1960), no. 1, 35–45. [43] Jang Mook Kang and Bong Hwa Hong, A study on the sns (social network service) based on location model combining mobile context-awareness and real-time ar (augmented reality) via smartphone, Communications in Computer and Information Science 184 CCIS (2011), no. PART 1, 299–307. [44] Andreas M Kaplan and Michael Haenlein, Users of the world, unite! the challenges and opportunities of social media, Business Horizons 53 (2010), no. 1, 59–68. [45] H. Kato, M. Billinghurst, I. Poupyrev, K. Imamoto, and K. Tachibana, Virtual object manipulation on a table-top ar environment, pp. 111–119, IEEE, 2000. 112 [46] Wee Sim Khor, Benjamin Baker, Kavit Amin, Adrian Chan, Ketan Patel, and Jason Wong, Augmented and virtual reality in surgerythe digital surgical environment: applications, limitations and legal pitfalls, Annals of translational medicine 4 (2016), no. 23. [47] Greg. Kipper and Joseph. Rampolla, Augmented reality : an emerging technologies guide to ar, Syngress, 2012. [48] G. Klinker, R. Reicher, and B. Brugge, Distributed user tracking concepts for augmented reality applications, pp. 37–44, IEEE, 2000. [49] Timo Koskela, Nonna Kostamo, Otso Kassinen, Juuso Ohtonen, and Mika Ylianttila, Towards context-aware mobile web 2.0 service architecture, Mobile Ubiquitous Computing, Systems, Services and Technologies, 2007. UBICOMM’07. International Conference on, IEEE, 2007, pp. 41–48. [50] Martin Lechner, Arml 2.0 in the context of existing ar data formats, pp. 41–47, IEEE, Mar 2013. [51] Peter Lee, The 50 years of the acm turing award celebration, https://www.facebook. 
com/AssociationForComputingMachinery/videos/10154936964433152/, 2017, Accessed 06/26/17. [52] Jing Li, The design of context-aware service system in web 2.0, Advances in Technology and Management (2012), 145–152. [53] Lara Lomicka and Gillian Lord, Introduction to social networking, collaboration, and web 2.0 tools, The Next Generation: Social Networking and Online Collaboration in Foreign Language Learning (2009), 1–12. [54] Martin Lopez-Nores, Yolanda Blanco-Fernandez, Alberto Gil-Solla, Manuel RamosCabrer, Jorge Garcia-Duque, and Jose Juan Pazos-Arias, Leveraging short-lived social networks in museums to engage people in history learning, pp. 83–88, IEEE, Dec 2013. [55] Blair MacIntyre, Alex Hill, Hafez Rouzati, Maribeth Gandy, and Brian Davidson, The argon ar web browser and standards-based ar application environment, pp. 65–74, IEEE, Oct 2011. [56] Wendy E Mackay, Augmenting reality: A new paradigm for interacting with computers, La Recherche (1996), no. Mar, 13–21. [57] , Augmented reality: linking real and virtual worlds: a new paradigm for interacting with computers, pp. 13–21, ACM Press, 1998. [58] S. Malik, C. McDonald, and Gerhard Roth, Hand tracking for interactive pattern-based augmented reality, pp. 117–126, IEEE Comput. Soc, 2002. [59] Paul Milgram, Haruo Takemura, Akira Utsumi, and Fumio Kishino, Augmented reality: a class of displays on the reality-virtuality continuum, vol. 2351, pp. 282–292, Dec 1995. [60] Yun Tae Nam and Je-ho Oh, Participatory Mixed Reality Space: Collective Memories, 2016 IEEE International Symposium on Mixed and Augmented Reality (ISMARAdjunct), IEEE, sep 2016, pp. 353–354. 113 [61] Tim O’Reilly, What is web 2.0 - o’reilly media, 2005. [62] Jun Park, Suya You, and Ulrich Neumann, Natural feature tracking for extendible robust augmented realities, IEEE Transactions on Multimedia 1 (1999), no. 1, 53–64. 
[63] Jana Pejoska, Merja Bauters, Jukka Purma, and Teemu Leinonen, Social augmented reality: Enhancing context-dependent communication and informal learning at work, British Journal of Educational Technology 47 (2016), no. 3, 474–483. [64] Wayne Piekarski and Bruce Thomas, ARQuake: the outdoor augmented reality gaming system, Communications of the ACM 45 (2002), no. 1, 36–38. [65] Muriel Pressigout and Eric Marchand, Hybrid tracking algorithms for planar and non-planar structures subject to illumination changes, Mixed and Augmented Reality, 2006. ISMAR 2006. IEEE/ACM International Symposium on, IEEE, 2006, pp. 52–55. [66] H. Regenbrecht, C. Ott, M. Wagner, T. Lum, P. Kohler, W. Wilke, and E. Mueller, An augmented virtuality approach to 3d videoconferencing, pp. 290–291, IEEE Comput. Soc, 2003. [67] Derek F. Reilly, Hafez Rouzati, Andy Wu, Jee Yeon Hwang, Jeremy Brudvik, and W. Keith Edwards, Twinspace: an infrastructure for cross-reality team spaces, pp. 119– 128, ACM Press, 2010. [68] Jun Rekimoto, Navicam:a magnifying glass approach to augmented reality, Presence: Teleoperators and Virtual Environments 6 (1997), no. 4, 399–412. [69] Jun Rekimoto and Katashi Nagao, The world through the computer: Computer augmented interaction with real world environments, pp. 29–36, ACM Press, 1995. [70] Dieter Schmalstieg, Tobias Langlotz, and Mark Billinghurst, Augmented reality 2.0, pp. 13–37, Springer Vienna, 2011. [71] Y. Shen, S.K. Ong, and A.Y.C. Nee, Augmented reality for collaborative product design and development, Design Studies 31 (2010), no. 2, 118–145. [72] Sanni Siltanen, Theory and applications of marker-based augmented reality, 2012. [73] Alexandra Mihaela Siriteanu and Adrian Iftene, Meetyou - social networking on android, Proceedings - RoEduNet IEEE International Conference (2013). [74] Branislav Sobota and Radovan Janošo, 3d interface based on augmented reality in client server environment, Journal of information, control and management systems 8 (2010), no. 
3, 247–256. [75] Injun Song, Ig-Jae Kim, Jae-in Hwang, Sang Chul Ahn, Hyoung-gon Kim, and Heedong Ko, Social network service based mobile ar, pp. 175–178, ACM Press, 2010. [76] Aaron Stafford, Wayne Piekarski, and Bruce Thomas, Implementation of god-like interaction techniques for supporting collaboration between outdoor ar and indoor tabletop users, pp. 165–172, IEEE, Oct 2006. 114 [77] Katarina Stanoevska-Slabeva, Thomas Wozniak, Christian Mannweiler, Isabella Hoffend, and Hans D. Schotten, Emerging context market and context-aware services, 2010 Future Network and Mobile Summit (2010), 1–8. [78] Andrei State, Mark A. Livingston, William F. Garrett, Gentaro Hirota, Mary C. Whitton, Etta D. Pisano, and Henry Fuchs, Techniques for augmented-reality systems: Realizing ultrasound-guided needle biopsies, pp. 439–446, ACM Press, 1996. [79] D. Stricker, G. Klinker, and D. Reiners, A fast and robust line-based optical tracker for augmented reality applications, Proc. 1st International Workshop on Augmented Reality (IWAR'98) (1998), 31–46. [80] James Surowiecki, The wisdom of crowds, Anchor Books, 2005. [81] Ivan E. Sutherland, The ultimate display, Proceedings of the IFIP Congress 2 (1965), 506–508. [82] Ivan E. Sutherland, A head-mounted three dimensional display, pp. 757–764, ACM Press, 1968. [83] William Uricchio, Television's next generation: technology/interface culture/flow, in Spigel, L. and Olsson, J. (Eds.), Television After TV: Essays on a Medium in Transition (2004), 163–183. [84] Tim Verbelen, Tim Stevens, Pieter Simoens, Filip De Turck, and Bart Dhoedt, Dynamic deployment and quality adaptation for mobile augmented reality applications, Journal of Systems and Software 84 (2011), no. 11, 1871–1882. [85] Mark Weiser, Some computer science issues in ubiquitous computing, Communications of the ACM 36 (1993), no. 7, 75–84. [86] Sean White, Levi Lister, and Steven Feiner, Visual hints for tangible gestures in augmented reality, pp. 1–4, IEEE, Nov 2007.
[87] Jason Wither, Stephen DiVerdi, and Tobias Höllerer, Annotation in outdoor augmented reality, Computers & Graphics 33 (2009), no. 6, 679–689. [88] Zornitza Yovcheva, Dimitrios Buhalis, Christos Gatzidis, and Corné PJM van Elzakker, Empirical evaluation of smartphone augmented reality browsers in an urban tourism destination context, International Journal of Mobile Human Computer Interaction (IJMHCI) 6 (2014), no. 2, 10–31. [89] Xiang Zhang, Stephan Fronz, and Nassir Navab, Visual marker detection and decoding in ar systems: a comparative study, pp. 97–106, IEEE Comput. Soc, 2002. [90] Feng Zhou, Henry Been-lirn Duh, and Mark Billinghurst, Trends in augmented reality tracking, interaction and display: A review of ten years of ismar, pp. 193–202, IEEE, Sep 2008. 115