A Token-Based Local Help Platform with NLP Support

by Jiaen Tao
B.Sc., Zhejiang International Studies University, 2015

PROJECT SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF MASTER OF SCIENCE IN COMPUTER SCIENCE

THE UNIVERSITY OF NORTHERN BRITISH COLUMBIA
October 2025
© Jiaen Tao, 2025

Abstract

The motivation behind this thesis arises from the high labor costs commonly observed in Canadian communities, where residents are often forced to acquire multiple skills to cope with everyday needs. A web-based skill-exchange platform, where two people trade services using their respective skills, would be valuable. However, population sparsity often makes matching difficult. To address this challenge, we explore a novel approach to the sharing economy: a local mutual-aid platform. Within this platform, users can consume services provided by others through virtual tokens, while the only way to earn tokens is by offering services themselves. Since these tokens are purely virtual, mutual-aid activities do not incur legal liabilities, nor do they risk creating full-time workers motivated solely by financial profit, which could undermine the spirit of reciprocity. On the implementation side, this thesis leverages an optimized Retrieval-Augmented Generation (RAG) approach to enable query handling under sparse data conditions, ensuring that even limited datasets can yield accurate and explainable recommendations. Furthermore, a self-developed distributed transaction manager based on the Saga pattern ensures the integrity of user data across distributed environments, supporting consistent balance updates, order confirmations, and notifications. The prototype platform we developed demonstrates how combining modern AI techniques with lightweight distributed systems can provide both practical utility and long-term sustainability for local help ecosystems.
Contents

Abstract
List of Tables
List of Figures
Acknowledgement
1 Introduction
  1.1 Research Background and Motivation
  1.2 Problem Statement
    1.2.1 Feasibility of the Sharing Economy
    1.2.2 Feasibility of Token-Based Incentives
    1.2.3 Feasibility of Search Under Sparse Data
  1.3 Thesis Structure
2 Related Work
  2.1 Geographical distribution characteristics of Canadian Communities
  2.2 Demographic and Social Characteristics of Canadian Communities
  2.3 Sharing Economy
    2.3.1 History of Sharing Economy
    2.3.2 Implementation of a Sharing Economy Platform
    2.3.3 Distributed Transactions
  2.4 Token-Based Incentive Systems
  2.5 Recommend under Sparse Data Conditions
    2.5.1 Traditional Recommendation
    2.5.2 Limitations of Traditional Recommendation
    2.5.3 Recommendation Based on LLMs
3 Methodology
  3.1 The Implementation of Local Help Platform
    3.1.1 Home Page
    3.1.2 Registration
    3.1.3 Login
    3.1.4 Profile Management
    3.1.5 Profile Search
    3.1.6 Service Selection
  3.2 The Deployment of Local Help Platform
4 Evaluation
  4.1 Performance Testing
  4.2 Recommendation Quality Testing
5 Conclusion and Discussion
Bibliography
Appendix: RAGAS Evaluation Samples

List of Tables

2.1 Comparison of Median Age in Selected Canadian Regions (2024)
3.1 Order log row data (id = 2033).
4.1 Latency statistics under 1200 QPS load
4.2 System resource usage during benchmark
4.3 RAGAS Evaluation Scores of Clear Question.
4.4 RAGAS Evaluation Scores of Ambiguous Question.

List of Figures

2.1 Remoteness classification of Canadian census subdivisions based on the Remoteness Index.
2.2 Recommendation Workflow
3.1 Positive feedback loop between the service system and the token system
3.2 Home Page
3.3 The Components in Home Page
3.4 Registration entry
3.5 Workflow of the registration process.
3.6 Successful login & registration
3.7 User login with a standard username and password.
3.8 Feature overview for uploading service information
3.9 Workflow across User, Server, and Search Service
3.10 CSRF token issuance and submission
3.11 Embedding Workflow
3.12 Example embedding records
3.13 Structure of the search results
3.14 Workflow of server and search server interaction
3.15 Overall RAG workflow extend → retrieve → rerank → generate.
3.16 Successfully Matched to “I Want to Pursue a PhD” (Feedback Requested)
3.17 Show the reason for the empty result
3.18 product
3.19 freezed time slot
3.20 End-to-end purchase workflow
3.21 Begin Check Out
3.22 Address Configuration
3.23 Notification format sent via the email API
3.24 Saga workflow with TM
3.25 Dependencies
4.1 RAG-based evaluation workflow

Acknowledgement

I would like to express my sincere gratitude to Professor Chen for his invaluable guidance throughout the entire business workflow design. His insights on compliance and technical architecture provided critical direction for this project. I am also deeply thankful to Professor Li for his helpful suggestions on the data slicing strategy, which significantly improved the effectiveness of the system. Moreover, I would like to thank Professor Jiang for his constructive advice on academic standardization, which made the overall structure of my thesis more rigorous and well-organized. As an international student, I acknowledge that ChatGPT (GPT-5), a generative AI system, was used for grammar correction and language polishing in this thesis.
In addition, the synthetic evaluation dataset used in the RAGAS experiments was generated with the assistance of ChatGPT (GPT-5).

Chapter 1
Introduction

1.1 Research Background and Motivation

This work is motivated by an innovative idea proposed by my supervisor, Dr. Liang Chen. In Canada, and increasingly worldwide, labour and service costs are high; therefore, a web-based skill-exchange platform, analogous to a product-exchange marketplace, may offer a compelling solution in which people trade services using their respective skills. Two challenges arise: (i) simultaneous, one-to-one matching is difficult, especially in sparsely populated regions; and (ii) certain categories of work require quality and risk management. In Dr. Chen’s design, the platform addresses these issues in two ways. First, it employs a token system: users earn tokens by completing services for others and redeem them later for services they need. Second, for selected categories, it arranges group insurance through licensed insurers to manage quality and risk, thereby providing legal protection for both providers and recipients.

Consequently, a new business scenario emerged: the concept of a local help platform. Such a platform would allow residents to find local partners with specific skills to help with daily tasks, while also enabling individuals to showcase their abilities and assist others. However, no platform currently exists that is explicitly designed for the local help scenario. Existing systems fall into two categories: sharing economy platforms and crowdsourcing platforms. Sharing economy platforms such as Airbnb and Uber [1] operate as intermediaries, matching demand with service providers. Yet Airbnb focuses on short-term rentals and Uber on ride-sharing, and neither addresses the skill-sharing needs of local help.
Crowdsourcing platforms [2], such as gig platforms and Freelancer, are closer to a C2B structure, where individuals provide services to businesses, rather than facilitating true peer-to-peer skill exchange, and thus do not align with the authentic needs of local help. It is also worth noting that both types of platforms involve cash transactions, which still carry significant economic and legal risks.

This raises an important question: could we design a platform based on the principles of the sharing economy, but where incentives are provided through non-cash mechanisms? In such a model, mutual aid between users would be motivated by goodwill or by personal interest in practicing and applying one’s skills. This type of platform would not only provide practical services that improve users’ lives but also, through sustained participation, strengthen interpersonal relationships within the community.

1.2 Problem Statement

1.2.1 Feasibility of the Sharing Economy

From the above discussion, it can be seen that local help essentially represents a form of the sharing economy that does not rely on cash incentives. Therefore, we can expect to encounter business and technical challenges similar to those faced by existing sharing economy systems. Moreover, we need to explore which functions must be implemented in the first version of the platform under the sharing economy model to ensure overall usability.

Handling Diverse Website Interactions. Given the limited human resources available within local communities, the system design must emphasize minimizing operational and maintenance costs, as well as reducing expenses associated with handling user complaints and dispute resolution. It is foreseeable that the platform will involve a large number of user interactions, such as searches and clicks, and will also need to interact with numerous external systems, such as email delivery.
Therefore, we have also studied the implementation of distributed transactions and compared the trade-off between integration cost and user experience across different approaches, in order to select a solution that balances development cost and usability.

1.2.2 Feasibility of Token-Based Incentives

Another key consideration is the incentive mechanism. Since our platform explicitly avoids using money as a stimulus, a natural question arises: can tokens be used to encourage user participation? Is token-based stimulation theoretically feasible? Most importantly, how should tokens be issued and utilized so as to maximize community engagement at the lowest possible cost and legal risk? The goal is for tokens to gradually attract users and encourage participation and interaction.

1.2.3 Feasibility of Search Under Sparse Data

Finally, since the platform is targeted at local communities, it is foreseeable that the needs arising from local community life will be highly diverse. If we rely on traditional inverted index approaches to perform keyword-based searches, it will be very difficult to retrieve relevant data. Therefore, we must confront the challenge of sparse data, for example, cases where user profiles or service information are limited. The critical technical issue is how to ensure that such limited data can still be effectively retrieved and presented with sufficient accuracy, so that the platform remains useful under data-scarce conditions. Another issue is user experience: due to data scarcity, users may still fail to find relevant information. Hence, we need to design interaction mechanisms that allow users to adjust and refine their inputs.
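
To make the keyword-matching difficulty concrete, the following is a minimal sketch of an inverted index with AND-style keyword search over three made-up service profiles. The profile texts, function names, and queries are illustrative assumptions and are not taken from the platform's implementation. It shows that a query phrased in different words from the stored profile returns nothing, even though the underlying need is the same.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it on non-alphanumeric characters."""
    return [t for t in re.split(r"[^a-z0-9]+", text.lower()) if t]


def build_inverted_index(docs):
    """Map each term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index


def keyword_search(index, query):
    """Return ids of documents that contain every query term (AND semantics)."""
    terms = tokenize(query)
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


# Hypothetical service profiles, invented for this illustration only.
profiles = {
    1: "Licensed plumber, faucet and pipe repair",
    2: "Math tutoring for high school students",
    3: "Snow shovelling and yard work",
}
index = build_inverted_index(profiles)

# The query shares vocabulary with profile 1, so it is found.
exact = keyword_search(index, "faucet repair")
# Same need, different wording: no term overlaps, so nothing is returned.
paraphrase = keyword_search(index, "fix leaking tap")
print(exact, paraphrase)
```

With only a handful of profiles per community, such vocabulary mismatches are likely to be the common case rather than the exception, which is why keyword search alone is considered insufficient here.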

1.3 Thesis Structure

The Related Work chapter analyzes the geographic distribution and social background of Canadian communities to demonstrate that the demand for local help aligns with current societal conditions. It then provides an overview of the origins and development of the sharing economy, discusses potential challenges such platforms may face, such as trust, governance, and long-term user engagement, and identifies the core functionalities required for an initial version, including reliable service matching, transparent reputation systems, and user-friendly interfaces. In addition, it examines the theoretical feasibility of token-based incentive mechanisms in fostering participation, rewarding contributions, and mitigating the free-rider problem, and reviews existing research on low-cost query handling in sparse data environments, with particular attention to RAG techniques, distributed transaction management, and adaptive indexing to ensure scalability and efficiency under resource-constrained conditions. The Methodology chapter then presents the proposed implementation of the platform, covering both functional design and the underlying technical architecture. The Evaluation chapter evaluates the system’s performance and recommendation accuracy, confirming its low-cost deployment and its recommendation capability under sparse data conditions. Finally, the Conclusion summarizes how the proposed platform addresses the aforementioned challenges, evaluates its effectiveness, and outlines possible directions for future iterations and technical enhancements.

Chapter 2
Related Work

At present, there is no platform specifically dedicated to mutual aid. Therefore, this chapter first analyzes the geographic distribution and cultural background of Canadian communities in order to explore the feasibility of local mutual aid. As introduced in Chapter 1, the essence of this platform is a sharing platform.
The website merely serves as an information provider and matchmaking intermediary, while the actual completion of services still depends on interactions between users. Accordingly, in terms of theoretical feasibility and risk assessment, this thesis draws primarily on business models related to the sharing economy. This chapter also examines some of the issues that sharing economy platforms encountered as they matured and proposes strategies to avoid them. The platform also incorporates several unique features, such as a token-based incentive mechanism and natural language interaction capabilities, which require independent theoretical exploration.

2.1 Geographical distribution characteristics of Canadian Communities

Figure 2.1: Remoteness classification of Canadian census subdivisions based on the Remoteness Index. Source: Statistics Canada, Remoteness Index Map (Remote/Very remote vs. Accessible areas) [3].

First, as the second-largest country in the world by land area, Canada has a very sparse population. However, the following analysis shows that most communities are clustered along rivers or transportation routes, making inter-community mutual aid geographically feasible.

Extreme Distance from Urban Centres. According to Statistics Canada’s Remoteness Index, most Inuit Nunangat communities are situated more than 1,000 kilometres away from major urban centres, with no road access and reliance exclusively on air transportation. Such extreme isolation severely limits access to external human resources and diminishes the attractiveness of these communities for long-term settlement. Statistics from 2016 further indicate that a majority of Inuit lived in areas classified as very remote (57%) or remote (23%), compared to only about 3% of the non-Indigenous population [4]. This sharp contrast underscores the structural challenges of sustaining population growth and labour markets in these regions.

Low Population Density and Internal Dispersion. Remote Canadian settlements typically have extremely low population densities, often fewer than 0.5 persons per square kilometre, compared to over 4,700 persons per square kilometre in metropolitan Toronto. Combined with a strong cultural emphasis on individual privacy, this spatial dispersion hinders residents’ awareness of each other’s skills and capacities, creating barriers to effective resource sharing and mutual support in times of need.

Feasible Inter-Community Proximity. While intra-community dispersion is significant, many remote communities in provinces such as British Columbia and Ontario are located within moderate driving distances of one another. In practical terms, this often translates into travel times of a few hours, suggesting the possibility of inter-community support and cooperation across neighbouring settlements. At the national level, Statistics Canada’s Remoteness Index (Figure 2.1) further demonstrates that many Canadian communities are categorized as “remote” rather than “very remote,” indicating that they remain within acceptable reach of neighbouring settlements. This is particularly true in provincial contexts, where road and ferry connections facilitate inter-community travel.

2.2 Demographic and Social Characteristics of Canadian Communities

This section analyzes the economic situation and cultural background of Canadian communities. We observe that the vast majority of Canadian communities, even those in remote areas, benefit from solid infrastructure and reliable Internet connectivity. Moreover, residents generally possess relatively high levels of education, with most being able to use computers and access the Internet. We also found that Canadian communities exhibit a highly diverse age distribution, which implies that the needs of community residents are also highly diverse.
Therefore, from a cultural perspective, a local help platform can effectively address and fulfill the needs of these residents.

Digital Literacy and Internet Use in Communities. Communities located in large cities naturally enjoy advanced infrastructure and thus are not the focus of this discussion. Instead, we turn our attention to remote communities. Despite geographic isolation, remote Canadian communities benefit from the country’s strong educational foundations and widespread digital infrastructure. As of 2020, nearly 94% of Canadian households had fixed-broadband Internet access, and this availability continues to expand into remote and rural areas, enabling residents to engage with various online services [5]. Furthermore, most residents in these communities are capable of using electronic devices. Younger populations demonstrate significantly higher digital literacy: 77% of Canadians aged 15–34 are classified as “Proficient or Advanced” Internet users, whereas the majority of older individuals fall into the “Non-user or Basic” categories [6]. Historical initiatives such as the Community Access Program (CAP) have also delivered essential digital exposure and training to underserved communities by providing access points in local schools, libraries, and community centres [7]. Taken together, these data suggest that even in remote regions of Canada, the population is generally capable of using computers and accessing the Internet. These communities are therefore fully able to adopt and benefit from the local help platform.

Community Mutual-Aid Tradition. Canada has a long-standing tradition of mutual aid, which is clearly reflected in national data on volunteering and charitable giving. According to Statistics Canada’s Survey on Giving, Volunteering and Participating (SGVP), both formal volunteering through organizations and informal neighbour-to-neighbour assistance are widely embedded in daily life.
The SGVP data collected in 2023, during the COVID-19 period, show that although the national volunteering rate declined compared to 2018, nearly 73% of Canadians still engaged in various forms of volunteering. On average, each volunteer contributed 173 hours annually, which underscores that mutual aid in Canadian society is not a temporary response to crises but rather a deeply rooted and sustained practice. Moreover, the scale of contributions is substantial. In 2023, Canadians devoted approximately 4.1 billion hours to formal and informal volunteering combined, a figure that, while lower than in 2018, remains impressive. Notably, the top 10% of volunteers alone contributed more than 60% of total volunteer hours, highlighting the presence of highly committed individuals. Taken together, these findings support the view that, even without monetary incentives, a significant proportion of local residents can be mobilized to support others [8].

Diverse Community Needs. The diverse age composition of Canadian communities is a key factor contributing to the heterogeneity of local help demands. Table 2.1 summarizes the comparative median ages across selected regions. In the far northern territories, such as Nunavut, the population is remarkably young, with a median age of only 26.8 years, the lowest in the country. This contrasts sharply with rural regions in provinces such as Ontario, where the median age approaches 47 years. Meanwhile, the Northwest Territories (36.0 years), Yukon (38.4 years), and the national average (40.3 years) illustrate a spectrum of intermediate age structures. This demographic variation inevitably leads to diverse community needs.

In addition, cultural diversity is also a defining characteristic of Canadian communities. Canada’s population has long been marked by multicultural features.
From the early English and French settlements, to the large influx of European immigrants in the early 20th century, and later, since the 1970s, to newcomers from Asia, Africa, and Latin America, the country has gradually developed into a society that includes “visible minorities” and a wide variety of ethnic groups. This structure implies significant differences among community members in terms of language, dietary practices, religious beliefs, educational traditions, and social interactions. Such cultural diversity translates directly into distinct everyday needs: for example, Asian and South Asian groups may require bilingual or multilingual public services, Arab or Muslim communities may prioritize access to halal food and religious facilities, while Indigenous communities emphasize the preservation of land, language, and traditional practices.

Based on these findings [9], we can argue that a local-help platform must be capable of supporting highly diverse queries. Whether in healthcare, education, or social services, the design and implementation of such platforms must account for ethnic differences and cultural sensitivity; otherwise, true equity and inclusiveness in communities cannot be achieved.

Table 2.1: Comparison of Median Age in Selected Canadian Regions (2024)

Region                          Median Age (years)
Nunavut                         26.8 (youngest nationally)
Northwest Territories (NWT)     36.0
Yukon                           38.4
Canada (national average)       40.3
Rural regions (e.g., Ontario)   ∼47

Source: Statistics Canada, Median age on July 1, 2024 [10]; rural median age data from Rural Ontario Institute [11].

2.3 Sharing Economy

The preceding sections show that there is indeed a demand for a local help platform. Moreover, the business model of this platform is similar to that of an agent, enabling two users with corresponding needs to be matched quickly. The overall operating model resembles that of a sharing-economy platform.
Therefore, we can expect similar risks that need to be mitigated, as well as comparable functional features that must be implemented.

2.3.1 History of Sharing Economy

The generally acknowledged starting point of the sharing economy is around 2008, when Airbnb and Uber, now regarded as the pioneers of this domain, were founded in San Francisco [1]. Their initial core challenge was that supply and demand had to coincide in time and place. For instance, in the case of Uber, when passengers required a ride in the morning, a driver had to be present at the same time and place, heading in the same direction. If time or location did not coincide, the transaction could not be completed. Hence, a platform was necessary to match service providers with service seekers. Airbnb faced the same situation: a guest needed accommodation at a given time in a given location, and a host in that location had to have availability during that period; only under such conditions could the transaction be realized.

Challenges in the Later Stage of the Sharing Economy. Over time, the original emphasis on idle resource exchange and sharing within sharing economy platforms has gradually faded. As platforms such as Airbnb and Uber expanded in scale, professional service providers began to emerge, such as multi-listing Airbnb hosts or organized Uber fleets. This trend fundamentally transformed both the business models and community atmosphere of these platforms: services were no longer primarily provided by individual users but instead dominated by professional sellers. Prior research has shown that such professionalization not only affects pricing structures and supply but may also erode the initial spirit of mutual aid, leading to deteriorated user experience and increased regulatory risks [12].

2.3.2 Implementation of a Sharing Economy Platform

A sharing economy platform must not only process large amounts of user-generated content (e.g., service postings, resource descriptions, and availability updates), but also provide an efficient and optimized query and retrieval mechanism that enables users to quickly locate relevant resources. This dual requirement implies that even a minimum viable sharing economy platform must implement a range of fundamental and critical functions, as noted by [13, 14]. Specifically, the platform must support a complete set of user interaction features, ranging from registration and content creation to matchmaking and transaction completion. Furthermore, since such platforms inevitably involve the storage of users’ personal information, security must be ensured. In addition, the platform should be capable of notifying users of changes related to their transactions and interacting with a wide range of external systems. To reduce maintenance costs, it is essential to incorporate fault-tolerance mechanisms that can maintain system stability in the presence of external failures. In summary, the platform must satisfy the following technical requirements:

• Platform Availability (Ability): At the MVP stage, the most critical aspect is that the system can successfully support user registration, resource posting, and basic matchmaking. Moreover, in the event of failures in external dependencies (e.g., messaging APIs), the platform should ensure that user data remains consistent without requiring manual intervention.
• Basic Data Security (Integrity): Users must at least trust that the platform will not leak their basic information (e.g., account credentials). Therefore, the immediate priority is to implement minimal data security mechanisms, such as encrypted password storage and secure session management.
• Basic Resource Description (Product Trust): The platform should provide a clear and transparent resource description interface to reduce cognitive discrepancies between users, thereby improving the likelihood of successful matches and supporting long-term retention.

2.3.3 Distributed Transactions

As noted in [13], sharing economy platforms inevitably involve extensive user interactions and integrations with external systems such as payment gateways and notification services. To ensure a seamless user experience, we must introduce the concept of distributed transactions [15]. This is essential to guarantee data integrity even when dependent services experience outages. Once those services recover, the system should be able to resume normal operations without manual intervention. Currently, most distributed transaction solutions incur significant operational and development overhead. For example:

• XA protocol [16]: It ensures strong consistency but requires tight coupling with resource managers and can negatively impact performance.
• TCC (Try-Confirm-Cancel): The underlying idea was first proposed in 2007 and later elaborated in the 2016 ACM Queue version [15]. The TCC pattern formalized this line of thought; it provides fine-grained control over transaction stages but demands complex business logic and typically relies on independently deployed transaction coordinators.
• Saga [17]: A more lightweight approach that decomposes long-running transactions into a sequence of local transactions, each with a corresponding compensation action. Saga is easier to implement and can be integrated via SDKs without the need for standalone coordination services.

Although XA and TCC offer robust guarantees, their complexity and operational costs make them less suitable for high-concurrency, high-interaction environments typical of sharing economy platforms.
In contrast, the Saga pattern strikes a balance between consistency and simplicity, making it a practical choice for systems built on a microservices architecture.

2.4 Token-Based Incentive Systems

Platforms involving cash-based or contribution-based economies often raise concerns related to insurance or economic disputes [18], as well as the deterioration of service quality brought about by the emergence of full-time providers (e.g., professional Airbnb hosts) [12]. Therefore, it is also necessary to investigate the feasibility of a purely token-incentivized platform. The following discussion does not concern blockchain-based designs; it applies to ordinary platform tokens.

Although token-based incentives are typically weaker than direct monetary rewards, prior studies have shown that, as long as stable acquisition and usage rules are in place, tokens can substantially shape user behavior [19]. In the context of a local help platform, users can redeem tokens for services provided by others, which in turn motivates them to actively provide services in order to earn tokens, thereby forming a positive feedback loop. Such a mechanism is not only theoretically sound but also supported by practical cases.

A notable real-world analogy is Stack Overflow, one of the world’s largest online Q&A communities, with over 29 million registered users, millions of questions and answers, and hundreds of thousands of active contributors each year. Empirical research has shown that its badge system (a form of virtual tokens) significantly enhances user engagement: after obtaining a badge, users become more active not only in the activities directly associated with the badge but also exhibit spillover effects in unrelated activities. More importantly, even badges with seemingly negative connotations (such as the Tumbleweed badge, awarded for unanswered questions) can motivate users to improve their reputation and increase contributions.
In other words, badges, as non-monetary tokens, influence user motivation through social identity, reputation, and psychological cues, ultimately fostering the production and circulation of community knowledge [20, 21].

Fundamentally, badge and token systems provide a form of non-monetary incentive whose impact extends beyond short-term engagement to the long-term cultivation of habitual behavior. Researchers have noted that users may initially be motivated by external rewards, but over time they gradually internalize these external drivers, transforming them into a sense of belonging, social identity, and even self-actualization. Thus, tokens and badges are not merely a “points system,” but rather a dual mechanism that combines external incentives with the cultivation of internal motivation.

A similar phenomenon can also be observed in GitHub, the world’s largest open-source platform. Mechanisms such as stars, watches, and contribution logs do not involve direct economic benefits but serve as public records of participation. Developers are motivated to increase their activity precisely because their contributions are made visible to others [22]. Likewise, although token-based incentives cannot rival monetary rewards in direct economic value, their role in sustaining community governance and user participation should not be overlooked. In the design of a local help platform, carefully incorporating badge and token mechanisms similar to those of Stack Overflow can encourage continuous user engagement and also maintain the platform’s long-term vitality and healthy development through positive feedback loops.

2.5 Recommend under Sparse Data Conditions

As mentioned earlier, local help platforms must face the challenge of data sparsity, while keeping solution costs reasonably low.
In our investigation, we found that although traditional recommendation and search solutions are widely used across various systems, they are not well-suited for local-help platforms.

2.5.1 Traditional Recommendation

Figure 2.2: Recommendation Workflow

Traditional recommender systems typically follow a multi-stage pipeline, as illustrated in Fig. 2.2. The first step is to collect user–item interaction data, such as explicit feedback (ratings, likes, purchases). These data are then transformed into a sparse user–item matrix. In the candidate generation stage, classical methods include collaborative filtering (user-based or item-based) and matrix factorization, which project users and items into a shared latent space to perform similarity matching [23]. In addition, user behavior and item attributes can be incorporated to improve coverage. The retrieved candidates are subsequently scored and ranked, often relying on linear models or heuristic functions. Hand-crafted rules may also be added to improve precision, and finally the Top-K results are presented to users.

There are also search-based solutions, such as the inverted index systems used in Solr and Elasticsearch, which rely on keyword matching. However, these approaches are even less suitable for the local-help business scenario and are therefore not discussed further.

2.5.2 Limitations of Traditional Recommendation

The aforementioned approaches have clear limitations. In particular, they are highly dependent on user behavioral data, but our local-help system is unlikely to have a sufficiently large user base in its early stages. To address such limitations, more advanced recommendation systems, such as YouTube’s approach [24], have been developed. In the candidate generation stage, user profiles and possible input queries are transformed into vectors, while all item information is also vectorized.
The matching task is then formulated as an approximate nearest neighbor (ANN) search problem in the joint vector space. In the scoring and ranking stage, deep neural models are applied to return the top-ranked results. This vector-based matching approach is still adopted in our local-help scenario, as the development of efficient ANN algorithms, such as HNSW [25], together with embedding models, has made ANN search extremely fast and effective [26]. By leveraging deep neural networks for the final scoring, this approach partially alleviates the data sparsity problem. Nevertheless, in order to continuously improve recommendation accuracy, models must be frequently retrained, which results in high overall maintenance costs. This raises the question: could large language models (LLMs) be leveraged as the final reranking and filtering mechanism, thereby achieving strong recommendation performance without extensive retraining?

2.5.3 Recommendation Based on LLMs

Compared with traditional deep recommendation models, large language models (LLMs) can achieve comparable recommendation accuracy without relying on large-scale training data, particularly in cold-start or few-shot scenarios [27]. A similar phenomenon is also observed in [28], where even LLMs without task-specific training demonstrate strong performance in retrieval tasks. This observation is especially relevant for local help platforms with limited user bases and sparse interaction data, as it implies that such platforms do not need to bear the high cost of model training. Moreover, given the relatively small user base, there is also no need to be overly concerned about the high inference costs of LLMs in large-scale online systems, as pointed out in the study.

For query processing under sparse data conditions, solutions have also been proposed. In particular, [29] introduces methods to improve query matching with LLMs through query expansion.
Two strategies are discussed: Generative Query Rewriting (GQR), which produces multiple synonymous variants of the original query, and Generative Query Expansion (GQE), which generates additional content relevant to the query to enhance retrieval. Empirical evidence shows that GQE significantly outperforms GQR, achieving substantial improvements in metrics such as NDCG and Recall across multiple datasets. However, query expansion must be combined with appropriate parameter settings, such as temperature adjustment. While higher temperatures may yield more diverse expansions, in lightweight community-based recommendation scenarios such as local help platforms, excessive diversity often reduces recommendation precision, leading to expanded queries that deviate from the platform’s core service orientation [30]. Furthermore, as highlighted in [31], temperature adjustment alone is insufficient; prompts must explicitly specify that all expanded queries are grounded in the local help context in order to effectively constrain the LLM’s tendencies at the final query expansion and filtering stages. Therefore, the existing literature suggests that with carefully set temperatures and explicitly defined prompts, LLMs without task-specific training can already meet the search and recommendation requirements of local help platforms under sparse data conditions.

Chapter 3 Methodology

This chapter describes the main functions of the Local Help Platform and the corresponding technical solutions.

3.1 The Implementation of Local Help Platform

Figure 3.1: Positive feedback loop between the service system and the token system

The functional design of this system revolves around two core components: the service system and the token system. Through the service system, users can quickly find suitable services and spend tokens. The only way to earn tokens, apart from the initial gift granted upon registration, is by helping others.
This mechanism incentivizes users to provide services for others. As illustrated in Figure 3.1, this process forms a positive feedback loop, which continuously strengthens the atmosphere of mutual assistance within the community.

3.1.1 Home Page

Figure 3.2: Home Page
Figure 3.3: The Components in Home Page

Figure 3.2 shows the homepage, which serves as the entry point of the entire platform. The upper section hosts the entry point for the search feature, while the lower section mainly displays community activities and highlights residents who are willing to offer help; this content is configured manually. The detailed composition of the modules is illustrated in Figure 3.3. The search and shopping cart modules in the figure are introduced in detail in the following sections.

3.1.2 Registration

Figure 3.4: Registration entry

User registration in this platform is based on the user’s email address. As shown in Figure 3.4, an email address that has not been registered before is treated as a new user; otherwise, the request is recognized as a duplicate registration. As illustrated in the workflow in Figure 3.5, once the user successfully registers, the system automatically grants 20 tokens as a welcome bonus, enabling the user to enjoy community services and encouraging them to participate in community activities (see Figure 3.6).

Figure 3.5: Workflow of the registration process.
Figure 3.6: Successful login & registration

3.1.3 Login

Figure 3.7: User login with a standard username and password.

As shown in Figure 3.7, the login process adopts a standard username and password mechanism. After a successful login, the right side of the homepage displays the number of tokens currently owned by the user (see Figure 3.6). Once logged in, the user can immediately enjoy the services provided by other community members. Furthermore, for a considerable period of time after a successful login, the user’s authentication token is stored in the browser’s cookies, allowing seamless access
without the need to log in repeatedly.

3.1.4 Profile Management

This feature is a core part of the service system. It covers (1) uploading a user’s service information and (2) parsing that information so it can be easily discovered and used by other users.

Functional Overview
As shown in Figure 3.8, users provide their skills and recent availability. The availability format is yyyy-MM-dd hh:mm, indicating the specific hour during which the user is free.

Figure 3.8: Feature overview for uploading service information

Avatar Upload
Users first upload an avatar from their local device. The server stores the image in a local filesystem directory and returns the resulting avatar URL to the user. If satisfied, the user clicks Submit to send their basic profile information.

Server-Side Processing
Upon receiving the submission, the server:
1. Validates that all required fields exist and are well-formed (skills, availability, avatar URL, etc.).
2. Persists the record to the local database.
3. Forwards name and skills to the downstream search service, which updates the user’s resume/profile index.

The indexing update flow is summarized under Profile Feature Engineering. The end-to-end interaction among the three services is illustrated in Figure 3.9. After processing completes, the user can be found via the search module.

Security Considerations: CSRF Protection
User identity artifacts are stored in browser cookies, and the corresponding session state persists on the server (e.g., mapped to files or memory). If a cookie were stolen, or if a malicious page triggered a request that the browser sends with the cookie attached, an attacker could attempt to forge submissions. To mitigate this, the server issues a CSRF token after each successful login and stores it in server memory. Whenever a form submission is required, the server also sends the CSRF token to the client (Figure 3.10); on submit, the client includes this token so the server can verify the request originated from a legitimate user action.
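The issue-and-verify cycle described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the platform's actual implementation: the in-memory dictionary, the function names `issue_csrf_token` and `verify_csrf_token`, and the use of a session identifier as the lookup key are all hypothetical.

```python
import hmac
import secrets

# Hypothetical in-memory store: session id -> CSRF token (illustrative only).
_csrf_tokens: dict[str, str] = {}


def issue_csrf_token(session_id: str) -> str:
    """Create a fresh random token after a successful login and remember it."""
    token = secrets.token_urlsafe(32)
    _csrf_tokens[session_id] = token
    return token


def verify_csrf_token(session_id: str, submitted: str) -> bool:
    """Return True only if the submitted token matches the stored one."""
    expected = _csrf_tokens.get(session_id)
    if expected is None:
        return False
    # Constant-time comparison avoids leaking token contents through timing.
    return hmac.compare_digest(expected.encode(), submitted.encode())
```

A form handler would call `verify_csrf_token` before performing any write and reject the request when it returns False.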
This mechanism also applies to other write workflows, such as payment and address updates.

The search service performs feature engineering on the received profile fields (e.g., skills normalization, keyword extraction) and updates the associated search index to ensure fast and accurate retrieval.

Figure 3.9: Workflow across User, Server, and Search Service
Figure 3.10: CSRF token issuance and submission

Profile Feature Engineering
When the search server detects that a UserId has new updates (see Figure 3.11), it performs the following high-level steps:
1. Purge existing vectors for the user: Remove any prior embeddings and documents for this UserId from the vector database to avoid duplication or stale content.
2. Collect service data: The server retrieves the latest service records associated with the user.
3. Aggregate to plain text: The collected fields are normalized and concatenated into plain text to form the indexing corpus for this user.
4. Run the indexing pipeline: Proceed with the stages listed under Indexing Pipeline.

Figure 3.11: Embedding Workflow

Indexing Pipeline
The indexing pipeline consists of the following stages:
1. Cleanup: Remove special characters and anomalous content (e.g., stray control symbols) from the aggregated text.
2. Chunk: Split the text into slices by either fixed length (e.g., 100 characters) or sentence boundaries to preserve semantic coherence.
3. Embedding: Convert both the original (full) text and each chunk into vector representations and store them in the vector database. Alongside the raw text, persist rich metadata such as the service provider’s name and the corresponding UserId. An example of embedded records is shown in Figure 3.12.

All stages persist execution logs and checkpoints. Therefore, if a transient failure (e.g., network interruption) occurs mid-pipeline, a subsequent run resumes from the last failed step instead of re-executing the completed stages.
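The cleanup, chunk, and embedding stages can be illustrated with a short Python sketch. It is only one plausible reading of the stages above: the function names, the record layout, and the `embed` callable (standing in for whatever embedding model is used) are assumptions, and the checkpointing and vector-database writes are omitted.

```python
import re
import unicodedata
from typing import Callable


def cleanup(text: str) -> str:
    """Replace control/format characters with spaces, then collapse whitespace."""
    kept = "".join(
        " " if unicodedata.category(ch).startswith("C") else ch for ch in text
    )
    return re.sub(r"\s+", " ", kept).strip()


def chunk(text: str, max_len: int = 100) -> list[str]:
    """Pack whole sentences into chunks of at most max_len characters.

    A sentence longer than max_len is cut at fixed length as a fallback.
    """
    chunks: list[str] = []
    current = ""
    for sentence in re.split(r"(?<=[.!?])\s+", text):
        while len(sentence) > max_len:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:max_len])
            sentence = sentence[max_len:]
        if not sentence:
            continue
        if current and len(current) + 1 + len(sentence) > max_len:
            chunks.append(current)
            current = sentence
        else:
            current = f"{current} {sentence}".strip()
    if current:
        chunks.append(current)
    return chunks


def build_records(
    user_id: int,
    name: str,
    raw_text: str,
    embed: Callable[[str], list[float]],  # hypothetical embedding function
) -> list[dict]:
    """Embed the full text and every chunk, attaching provider metadata."""
    text = cleanup(raw_text)
    pieces = [text] + chunk(text)
    return [
        {"user_id": user_id, "name": name, "text": p, "vector": embed(p)}
        for p in pieces
    ]
```

For a short profile the full text and its single chunk coincide, so the same text is embedded twice; a real pipeline might skip the duplicate.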
Model & Vector Dimension
Given that the overall text size is small and each single-user service description is under 1 KB, we use the sentence-transformer model all-MiniLM-L6-v2 [26]. To balance accuracy, throughput, and cost, we store embeddings at a vector dimension of 384.

Figure 3.12: Example embedding records

3.1.5 Profile Search

After the user uploads the services they can provide, other users can search for and use them. As shown in Figure 3.3, the entry point is located on the homepage. Any user, including strangers, can use this function. The structure of the search results is shown in Figure 3.13, where in addition to the search results, the reasons for recommending these users are also given. The specific interaction is shown in Figure 3.14: the user sends a request to the server, the search server searches for suitable services and returns the corresponding user IDs, and then the main server assembles the data and returns it.

Figure 3.13: Structure of the search results
Figure 3.14: Workflow of server and search server interaction

As mentioned earlier, the data in our resume database is not particularly large. Since the user, as a service provider, is comparable to a product, we use the concept of an SKU from e-commerce to represent the user’s service, with skuId serving as the identifier for the service. In order to support diverse queries under such a dataset, we adopted the following solution.

Search Pipeline

Figure 3.15: Overall RAG workflow extend → retrieve → rerank → generate.

Our search approach is based on Retrieval-Augmented Generation (RAG) [32]. The end-to-end workflow is illustrated in Figure 3.15, and consists of four stages: extend, retrieve, rerank, and generate. The underlying LLM is gpt-4o-mini. To maximize the likelihood of matching relevant data, the following search pipeline iterates up to 3 rounds until suitable results are found.
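The iterative control flow can be sketched as follows. This is a hedged outline rather than the platform's code: the four stage functions are passed in as hypothetical callables, each candidate is assumed to be a dictionary with a `sku_id` key, and the generate stage is assumed to return a dictionary with `sku_id_list` and `reason` fields.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable


def search(
    question: str,
    extend: Callable[[str, list[str]], list[str]],    # (question, exclude) -> variants
    retrieve: Callable[[str], list[dict]],            # variant -> candidates
    rerank: Callable[[str, list[dict]], list[dict]],  # best candidates first
    generate: Callable[[str, list[dict]], dict],      # final LLM selection
    max_rounds: int = 3,
    top_k: int = 10,
) -> dict:
    """Run extend -> retrieve -> rerank -> generate for up to max_rounds rounds."""
    asked: list[str] = []  # variants from earlier rounds, excluded next time
    reason = ""
    for _ in range(max_rounds):
        variants = extend(question, asked)
        asked.extend(variants)
        # Expanded questions are retrieved concurrently.
        with ThreadPoolExecutor() as pool:
            batches = list(pool.map(retrieve, variants))
        # Deduplicate candidates by sku_id, keeping the first occurrence.
        unique: dict[str, dict] = {}
        for batch in batches:
            for cand in batch:
                unique.setdefault(cand["sku_id"], cand)
        ranked = rerank(question, list(unique.values()))[:top_k]
        result = generate(question, ranked)
        if result["sku_id_list"]:
            return result
        reason = result["reason"]
    # No round produced a match: still return the last explanation.
    return {"sku_id_list": "", "reason": reason}
```

Passing the stages as parameters keeps the loop independent of any particular LLM client or vector store, which also makes it easy to exercise with stubs.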
Also, during implementation, we must carefully adjust the prompts to control how the LLM expands and interprets the query. For instance, using a standard RAG-style prompt to expand the query “Today is too hot” yields the following:

Standard RAG Prompt
1. What are some ways to cope with the heat today?
2. How can I stay cool during this hot weather?
3. What tips do you have for dealing with high temperatures?
4. What are effective methods to beat the heat today?
...

Community-Oriented Prompt
1. What local resources are available to help me stay cool during this heatwave?
2. Are there any community centers or cooling stations open today to escape the heat?
3. Can you recommend any nearby parks or shaded areas where I can relax in this hot weather?
4. Is there a local mutual-aid group that provides assistance for those struggling with the heat?
...

It can be seen that without constraining the direction of prompt expansion, the questions become overly divergent, which is detrimental to recommendation. The following are the details of each step in the pipeline.

Extend
The purpose of this stage is to expand the original query to avoid matching failures caused by underspecified questions. As noted above, the prompts must be controlled to ensure expansions move in the intended direction. Furthermore, since the pipeline may be executed iteratively, the LLM must be instructed to exclude questions generated in previous rounds. The detailed template is shown in Expansion Prompt 3.1.5.

Retrieve
All expanded questions are processed concurrently. Because retrieval is parallelized, increasing the number of expansions does not materially impact end-to-end latency.

Rerank
We deduplicate candidates (e.g., by sku_id or name) and retain the top 10 most relevant to the original question.
Since the candidate pool is usually around 30 and LLM-based ranking may be slower or less precise, we adopt a cross-encoder reranker to ensure accuracy and efficiency [33].

Generate
In the final generation stage, the prompts must enforce that answers are grounded in community-oriented scenarios, while preventing the LLM from over-focusing on keyword overlap. For instance, the query “I want to pursue a PhD” (Figure 3.16) should still return the SKU “I am a doctor of computer science,” even with low keyword overlap. Similarly, user capability levels must be respected (e.g., senior-level requests such as “I need a plumber with 10+ years of experience”). Additionally, if no suitable candidate is found, the system still returns a clear reason so the user can adjust the query (Figure 3.17). The complete prompt is provided in Final Selection Prompt 3.1.5. Regarding temperature, based on [30], accuracy differences in the range 0 ≤ temperature ≤ 1 are negligible for selection-style tasks. However, to maximize stability and reproducibility, we set temperature=0.

Cache
The process above is expected to be time-consuming, especially when no answer can be found, since all rounds are then executed. To ensure a good user experience, we added a Least Recently Used (LRU) cache at the interface generation stage so that identical queries can be answered quickly.

Figure 3.16: Successfully Matched to “I Want to Pursue a PhD”: Feedback Requested
Figure 3.17: Show the reason for the empty result

Expansion Prompt
You are a helpful HR assistant in a local community mutual-aid system. Your task is to generate {expand_to_n} alternative versions of the given user question. The goal is to make the questions better reflect local community scenarios, so they can retrieve more relevant documents from a vector database. When rephrasing, think about what kind of help the user might actually need in a neighborhood or local worker context, and adapt the variations accordingly.
Each variation should stay faithful to the user’s intent, but can expand naturally to include possible services or assistance they might be seeking. Provide the variations separated by ’{separator}’. Strictly Avoid and Do not generate questions similar to: {exclude}
Original question: {question}

Final Selection Prompt
You are a local community HR specialist. Your responsibility is to help residents find suitable candidates who can solve their **practical problems** in daily life or personal development. This includes but is not limited to: health, home repair, childcare, transportation, and academic or educational support. Always match based on the actual intent and domain of the user’s question, not on superficial keyword overlap. When the user requests professional expertise or seniority, prefer candidates whose name or description reflects that level. Select only the most appropriate candidate(s); partial matches or loosely related candidates should not be chosen. Use only the provided context to evaluate candidates and infer the most appropriate one(s) based on the user’s question. Each candidate has a unique sku_id. If suitable candidate(s) exist, output **only** a JSON object in the exact format below. If no candidate is suitable, still output a JSON object with an empty sku_id_list and provide the reason why.
JSON format:
{{
"reason": "Explain clearly why the selected candidate(s) can reasonably and practically address the user’s question. If none are suitable, explain why.",
"sku_id_list": "comma-separated sku_id(s), or empty string if none"
}}
Do not add any extra text, explanation, or formatting. Base your judgment strictly on the provided context, the user’s question, and practical relevance.
User question: {query}
Context: {context}

3.1.6 Service Selection

Figure 3.18: product

After the search completes, the user clicks a service to open its detail page (Figure 3.18), where an appointment list is available.
The user selects a preferred timeslot and then places an order. Upon placing the order, the request is first added to the shopping cart; the user then navigates to the cart to check out and pays with tokens. After a successful payment, notification emails are sent to both parties for follow-up communication, and the service timeslot is locked (see Figure 3.19). This flow is designed to minimize complaints caused by worker scheduling conflicts; see Figure 3.20 for the overall process.

Figure 3.19: Frozen time slot
Figure 3.20: End-to-end purchase workflow

Place Order
Next, we describe the order placement process in detail. The shopping cart is visible on the homepage (see Figure 3.3) and on every other page; clicking checkout (Figure 3.21) starts the order flow. During checkout, the user must provide an exact address to enable subsequent service delivery (see Figure 3.22). After submission, the system performs token deduction and sends notifications.

Figure 3.21: Begin Check Out
Figure 3.22: Address Configuration
Figure 3.23: Notification format sent via the email API

Notice
Notifications are sent via an external email API. The email contains two links. The first indicates that the order has been successfully completed; after it is clicked, the tokens are credited to the worker’s account. The second indicates that there is an issue with the order; after it is clicked, the tokens are returned. Ultimately, the decision of whether to release the tokens is left to the user. A consistency risk exists because token deduction uses our internal database, while notices are delivered by an external API. If deduction succeeds but the notice fails, the system enters an inconsistent state and must compensate the user to avoid complaints. Since the help community functions as a sharing-economy platform, user trust is fragile in the
Therefore, it is critical to guarantee the correctness of token data to prevent the trust from breaking down. Therefore, we employ the mechanism below to guarantee that—without manual intervention—user data is eventually correct. The notification format is shown in 41 Figure 3.23. Transaction Field order id goods context retry time status step release time Value billing-151 {”attribute”: {”amount”:”6.99”, ”} 1 2 3 0 Table 3.1: Order log row data (id = 2033). Our solution is implemented using the Saga pattern [17] and is delivered as an SDK (no separate service deployment required). All user actions are recorded in a journal table; once a record is successfully written, downstream steps automatically retry until a business-correct outcome is reached. The journal interface (Table 3.1) includes the userId and the execution context for subsequent steps. After logging, the SDK begins processing the record. The SDK has two roles: the Transaction Manager (TM), which orchestrates the flow, and actions, which encapsulate business logic. Here, the actions are token consume and notice send. Actions are executed sequentially: after token consume succeeds, notice send runs. If an action (e.g., notice send) fails definitively (e.g., API blocked), the TM triggers compensation in the previous action (refund). The end-to-end Saga flow is illustrated in Figure 3.24. 42 Figure 3.24: Saga workflow with TM 43 3.2 The Deployment of Local Help Platform The system relies only on Qdrant and MySQL, and uses just two languages. As shown in Figure 3.25, it has very few dependencies, so even without a dedicated operations team the deployment can be completed smoothly. Figure 3.25: Dependencies 44 Chapter 4 Evaluation Our evaluation is organized around two primary goals. First, the system should run with minimal human intervention and low resource consumption. Second, the quality of recommendations must be accurate and trustworthy. 
4.1 Performance Testing

This part focuses on ensuring low cost and stable performance under normal user traffic. We identify high-traffic endpoints by feature usage and subject them to load testing. We use wrk as the load generator. Metrics include QPS, latency (P50/P95/P99 [34]), CPU usage, and memory usage. The results below summarize representative endpoints.

API Endpoint          QPS     P50 (ms)   P95 (ms)   P99 (ms)
/homepage             1214    23         39         157
/new employee         1125    51         79         178
/checkout/complete    1198    129        151        279

Table 4.1: Latency statistics under 1200 QPS load

API Endpoint          CPU Usage (%)   Memory Usage (MB)
/homepage             73              2780
/new employee         77              2721
/checkout/complete    79              2842

Table 4.2: System resource usage during benchmark

Metrics
Given that our system targets a broad user base, we pay special attention to maintaining a consistent user experience under high-concurrency scenarios. As highlighted by Dean and Barroso in The Tail at Scale [34], tail latency plays a decisive role in overall system performance and user-perceived responsiveness in large-scale distributed systems. Even a small fraction of high-latency requests can significantly degrade overall user satisfaction.

To comprehensively evaluate system performance under realistic workloads, we selected three representative endpoints for testing: /homepage, /new employee, and /checkout/complete. The /homepage endpoint serves as the entry point for most users upon accessing the platform, /new employee represents recommendation logic for new users, and /checkout/complete is the universal endpoint for all order completion operations. Together, these endpoints reflect the core user journey and cover the system’s critical business paths under concurrent load.

Based on this setup, our performance testing and monitoring activities do not rely solely on the traditional mean latency metric.
We have introduced P95 and P99 latency statistics to more precisely characterize the system’s extreme response behavior under heavy load. These high-percentile indicators allow us to identify hidden performance bottlenecks and assess the potential impact of load conditions on user experience.

In addition, we also monitor CPU and memory consumption to evaluate overall resource utilization during load testing. These data help us determine the efficiency of system resource usage and identify potential optimization margins while maintaining a stable quality of service.

Result Analysis
It can be observed that during the stress tests of each endpoint, CPU utilization remained below 80%, and memory consumption did not exceed 3 GB. This indicates that the overall system has relatively low resource usage and no frequent JVM garbage collection (GC) events occurred, suggesting that the resource configuration is well-balanced and efficient.

Next, we analyze the response time. The /checkout/complete endpoint shows the highest latency since it involves multiple validation steps and data writing operations. The /new employee endpoint requires rendering data from several services, resulting in slightly higher latency than /homepage. Nevertheless, the P50, P95, and P99 latencies all remain within normal ranges and do not negatively affect user experience. The slightly higher P95 compared to P50 is mainly due to cache misses in certain requests, while the P99 increase is attributed to database connection reinitialization and GC during the test.

4.2 Recommendation Quality Testing

We adopt the RAGAS evaluation scheme [35]. The key advantage is that it does not require pre-authored ground truth; instead, an LLM is prompted to assess answer grounding and relevance.
Test Flow
As explained in Section 3.1.5, we represent each service using an SKU identifier. In our platform, the testing flow is as follows: the user calls the API and receives an answer; the test script extracts sku_id_list from the answer, concatenates it into a SKU query string, and then enters the RAG pipeline (retrieve → augment → generate). Finally, RAGAS metrics are computed. Figure 4.1 illustrates the workflow. The test set used by the platform was generated with GPT-5 and consists of 40 everyday life questions, 20 with explicit requests and 20 with vague ones, which are tested separately. The test data are provided in the Appendix.

Figure 4.1: RAG-based evaluation workflow

Metrics
We report two of the three metrics defined in RAGAS: Faithfulness and Context Relevance. Since the primary requirement of the local mutual-aid platform is to verify whether the recommended SKUs (services) are accurate, and the responses mainly consist of reasons for selecting these users rather than being generated strictly based on keywords, the Answer Relevance metric cannot be used as a reliable reference.

Faithfulness
Inputs: question q, answer a, and the retrieved context sku detail.
1. Use an LLM to decompose the reasoning/answer content of a into an atomic statement set $S = \{s_1, \dots, s_n\}$. The statement-extraction prompt follows the RAGAS paper [35].
2. For each $s_i \in S$, use an LLM verifier to check whether $s_i$ is supported by sku detail. Let $V \subseteq S$ be the subset judged as supported and denote $f_s = |V|$.
3. The Faithfulness score is
$$F = \frac{f_s}{|S|} = \frac{|V|}{|S|} \in [0, 1].$$

Context Relevance
Inputs: question q, answer a, retrieved context sku detail (a set of sentences/items).
1. As above, sku detail provides the candidate evidence for answering q.
2. Prompt an LLM to extract only the sentences/items from sku detail that are necessary to answer q; denote this essential subset by R.
3.
The Context Relevance score is
$$CR = \frac{|R|}{|\text{sku detail}|} \in [0, 1],$$
where $|\text{sku detail}|$ is the total number of sentences/items in the context.

Aggregate RAGAS Results
The reported values are obtained by averaging the RAGAS metrics across all test questions. A total of 40 questions were prepared for evaluation, consisting of 20 clear questions (Table 4.3) and 20 ambiguous questions (Table 4.4). The tables present the average metrics for clear and ambiguous questions separately. The overall results indicate that the recommendation accuracy is sufficient to meet user needs. One notable point is that the Context Relevance score for clear questions is lower than that for ambiguous questions (0.8 versus 0.9). A possible explanation is that when the question is ambiguous, the model can still achieve a high score even if it arbitrarily extends the user’s query, since RAGAS has difficulty discriminating between different SKUs. In contrast, when the question is clearly defined, unnecessary responses can be explicitly filtered out.

Metric               Score (0–1)
Faithfulness         0.925
Context Relevance    0.8

Table 4.3: RAGAS Evaluation Scores of Clear Question.

Metric               Score (0–1)
Faithfulness         0.808
Context Relevance    0.9

Table 4.4: RAGAS Evaluation Scores of Ambiguous Question.

Chapter 5 Conclusion and Discussion

We have successfully implemented a local help platform that, in terms of its business model, satisfies the criteria of a minimal sharing economy platform. It can be launched quickly and avoids entanglement with cash transactions, thus posing relatively low legal risks. One final point worth mentioning is that Canada has a long-standing tradition of mutual aid [8], which further supports the value and relevance of this platform.

From a technical perspective, as described in the Methodology section, the optimized RAG pipeline enables flexible natural language queries, resulting in a very low entry barrier for end-users.
During the testing phase, we also observed that setting temperature=0 is more effective in our filtering-oriented tasks, whereas higher temperature values cause the LLM’s evaluation criteria to become overly divergent [30]. Furthermore, due to the presence of the incentive mechanism, our system must ensure data consistency when interacting with external services. This paper demonstrates that it is indeed feasible to implement a lightweight transaction manager at a relatively low cost. Nevertheless, there remain several avenues for future work, which we discuss below.

Field Work

As our web application is still at the prototype stage, the most important next step is to recruit a group of volunteers to try it out and provide practical feedback based on their real usage. This feedback will help us identify issues and opportunities to improve, and guide us in refining the design to enhance the overall user experience.

Insurance

In practice, workers may still get injured while providing assistance. Although the entire system functions merely as an intermediary platform, we still have an obligation to offer insurance to our clients. The insurance model can be similar to the Airbnb Host Guarantee [36], where users purchase the corresponding insurance at the time they post their service order.

Prompt Engineering

There is also room for improvement in our prompting strategies. We observed that when users’ queries contain specialized terms (e.g., insomnia, Polysomnography), the LLM may fail to retrieve relevant data. In future versions, query expansion could be improved by incorporating domain-specific vocabulary based on the query’s intent. Another optimization lies in reducing reliance on prompt-level constraints to guide the LLM. Instead, future iterations could first analyze all data within the service profiles, extract technical keywords, and then allow the LLM to use these keywords for query expansion and final validation.
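The keyword-driven expansion idea described above can be illustrated with a minimal Python sketch. This is not part of the implemented platform: the function names `build_vocabulary` and `expand_query` are hypothetical, the three profile rows are copied from the appendix excerpt, and a simple lexical lookup stands in for the LLM-driven expansion and validation step.

```python
import re

# Excerpt of service profiles; the fields mirror the illustrative schema
# in the appendix (sku_id, domain, skills).
PROFILES = [
    {"sku_id": 1, "domain": "Electrical",
     "skills": "wiring; panel upgrades; lighting installation; troubleshooting"},
    {"sku_id": 12, "domain": "Automotive",
     "skills": "on-site repair; battery replacement; diagnostics; brakes"},
    {"sku_id": 13, "domain": "Plumbing",
     "skills": "low pressure; clogs; pipe issues; bathroom fixtures"},
]

MIN_WORD_LENGTH = 4  # assumption: skip short function words such as "on"


def build_vocabulary(profiles):
    """Map each keyword to the set of skill phrases that contain it."""
    vocab = {}
    for profile in profiles:
        for phrase in profile["skills"].split(";"):
            phrase = phrase.strip().lower()
            for word in re.findall(r"[a-z]+", phrase):
                if len(word) >= MIN_WORD_LENGTH:
                    vocab.setdefault(word, set()).add(phrase)
    return vocab


def expand_query(query, vocab):
    """Return the skill phrases whose keywords appear in the user query."""
    expansions = set()
    for word in re.findall(r"[a-z]+", query.lower()):
        if len(word) >= MIN_WORD_LENGTH:
            expansions |= vocab.get(word, set())
    return sorted(expansions)


vocab = build_vocabulary(PROFILES)
print(expand_query("My car battery is dead", vocab))
```

In a fuller version, the extracted vocabulary would be handed to the LLM as candidate expansion terms rather than matched word by word, so that a specialized term in the query could still be linked to a related skill phrase.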
In addition, the RecPrompt [37] mechanism could be introduced, enabling the LLM to generate prompts based on the data distribution and gradually converge toward an optimal prompt. This approach could significantly enhance both the efficiency and accuracy of the LLM’s operations.

Bibliography

[1] C. Öberg. Towards a typology of sharing economy business model transformation. Technovation, 123:102722, May 2023.
[2] F. D. A. Alauddin, A. Aman, M. F. Ghazali, and S. Daud. The influence of digital platforms on gig workers: A systematic literature review. Heliyon, 11(1):e41491, January 2025. Online available: Dec. 26, 2024. Open Access under CC BY 4.0.
[3] Statistics Canada. Remoteness index map by census subdivision. https://www150.statcan.gc.ca/n1/pub/11-633-x/11-633-x2020002-eng.htm, 2020. Accessed: 2025-08-31.
[4] Statistics Canada. Distance as a factor for First Nations, Métis, and Inuit high school completion. https://www150.statcan.gc.ca/n1/pub/81-595-m/81-595-m2023002-eng.htm, 2023. Accessed: 2025-08-31.
[5] Statistics Canada. Access to the internet in Canada, 2020. https://www150.statcan.gc.ca/n1/daily-quotidien/210531/dq210531d-eng.htm, 2021. Accessed: 2025-08-31.
[6] Statistics Canada. Internet-use typology of Canadians: Online activities and digital skills. https://www150.statcan.gc.ca/n1/pub/11f0019m/11f0019m2021008-eng.htm, 2021. Accessed: 2025-08-31.
[7] Industry Canada. Community access program: Proposal guide. Technical report, Industry Canada, Ottawa, ON, Canada, August 1997. Cat. No. C23271-1-1997, ISBN 0-662-63122-6.
[8] Statistics Canada. Volunteering and charitable giving in Canada, 2018 to 2023. Technical Report Catalogue no. 11-001-X, Statistics Canada, June 2025. Released in The Daily, June 23, 2025.
[9] Peter S. Li. Cultural diversity in Canada: The social construction of racial differences. Technical Report rp02-8e, Department of Justice Canada, Research and Statistics Division, Ottawa, 2000. Research Paper.
[10] Statistics Canada.
Population estimates by age and median age, Canada, provinces and territories, July 1 2024. https://www150.statcan.gc.ca/n1/daily-quotidien/240925/dq240925a-eng.htm, 2025. Accessed: 2025-08-31.
[11] Rural Ontario Institute. Age factsheet: Rural and urban median age comparison in Ontario. Technical Report Factsheet No. 19, Rural Ontario Institute, 2022. Accessed: 2025-08-31.
[12] K. Xie, C. Y. Heo, and Z. E. Mao. Do professional hosts matter? Evidence from multi-listing and full-time hosts in Airbnb. Journal of Hospitality and Tourism Management, 47:413–421, June 2021.
[13] M. J. Pouri and L. M. Hilty. The digital sharing economy: A confluence of technical and social sharing. Environmental Innovation and Societal Transitions, 38:127–139, March 2021.
[14] K. Stanoevska-Slabeva, V. Lenz-Kesekamp, and V. Suter. Platforms and the sharing economy: An analysis. Report from the EU H2020 research project Ps2Share: Participation, privacy, and power in the sharing economy. Technical Report D5.1, Univ. St. Gallen, November 2017. [Online]. Available: https://www.bi.edu/globalassets/forskning/h2020/ps2share_platform-analysis-paper_final.pdf.
[15] P. Helland. Life beyond distributed transactions. Queue, 14(5):69–98, October 2016.
[16] X/Open Company Ltd. Distributed transaction processing: The XA specification. X/Open CAE Specification XO/CAE/91/300, X/Open Company Ltd., Reading, UK, December 1991.
[17] H. Garcia-Molina and K. Salem. Sagas. ACM SIGMOD Record, 16(3):249–259, 1987.
[18] R. Calo and A. Rosenblat. The taking economy: Uber, information, and power. Columbia Law Review, 117(6):1623–1690, 2017.
[19] A. E. Kazdin and R. R. Bootzin. The token economy: An evaluative review. Journal of Applied Behavior Analysis, 5(3):343–372, 1972.
[20] Z. Li, K.-W. Huang, and H. Cavusoglu. Quantifying the impact of badges on user engagement in online Q&A communities.
In Proceedings of the Thirty Third International Conference on Information Systems (ICIS), Orlando, FL, USA, 2012. Research-in-Progress.
[21] A. Anderson, D. Huttenlocher, J. Kleinberg, and J. Leskovec. Steering user behavior with badges. In Proc. 22nd Int. Conf. World Wide Web (WWW), pages 95–106. ACM, 2013.
[22] L. Dabbish, C. Stuart, J. Tsay, and J. Herbsleb. Social coding in GitHub: Transparency and collaboration in an open software repository. In Proc. ACM Conf. Computer Supported Cooperative Work (CSCW), pages 1277–1286, Seattle, WA, USA, February 2012. ACM.
[23] Badrul Sarwar, George Karypis, Joseph Konstan, and John Riedl. Item-based collaborative filtering recommendation algorithms. In Proceedings of the 10th International Conference on World Wide Web (WWW ’01), pages 285–295, New York, NY, USA, 2001. ACM.
[24] Paul Covington, Jay Adams, and Emre Sargin. Deep neural networks for YouTube recommendations. In Proceedings of the 10th ACM Conference on Recommender Systems, pages 191–198, 2016.
[25] Yu A Malkov and Dmitry A Yashunin. Efficient and robust approximate nearest neighbor search using hierarchical navigable small world graphs. IEEE Transactions on Pattern Analysis and Machine Intelligence, 42(4):824–836, 2018.
[26] A. Rao, H. Alipour, and N. Pendar. Rethinking hybrid retrieval: When small embeddings and LLM re-ranking beat bigger models. arXiv preprint arXiv:2506.00049, 2025. [Online]. Available: https://arxiv.org/abs/2506.00049 (accessed Sep. 13, 2025).
[27] J. Huang, S. Wang, L. Ning, W. Fan, S. Wang, D. Yin, and Q. Li. Towards next-generation recommender systems: A benchmark for personalized recommendation assistant with LLMs. arXiv preprint arXiv:2503.09382, 2025. [Online]. Available: https://arxiv.org/abs/2503.09382 (accessed Sep. 13, 2025).
[28] T. Shen, G. Long, X. Geng, C. Tao, T. Zhou, and D. Jiang. Large language models are strong zero-shot retriever. arXiv preprint arXiv:2304.14233, 2023. [Online].
Available: https://arxiv.org/abs/2304.14233 (accessed Sep. 13, 2025).
[29] M. A. K. Ayoub, Z. Su, and Q. Li. A case study of enhancing sparse retrieval using LLMs. In Companion Proceedings of the ACM Web Conference 2024 (WWW ’24 Companion), pages 1609–1615, Singapore, Singapore, May 2024. ACM.
[30] M. Renze. The effect of sampling temperature on problem solving in large language models. In Findings of the Association for Computational Linguistics: EMNLP 2024, pages 7346–7356. Association for Computational Linguistics, November 2024.
[31] C. Yang, Y. Shi, Q. Ma, M. X. Liu, C. Kästner, and T. Wu. What prompts don’t say: Understanding and managing underspecification in LLM prompts. arXiv preprint arXiv:2505.13360, 2025. [Online]. Available: https://arxiv.org/abs/2505.13360 (accessed Sep. 13, 2025).
[32] P. Lewis, E. Perez, A. Piktus, F. Petroni, V. Karpukhin, N. Goyal, H. Küttler, M. Lewis, W. Yih, T. Rocktäschel, S. Riedel, and D. Kiela. Retrieval-augmented generation for knowledge-intensive NLP tasks. In Proc. 34th Conf. Neural Information Processing Systems (NeurIPS), Vancouver, Canada, 2020.
[33] H. Déjean, S. Clinchant, and T. Formal. A thorough comparison of cross-encoders and LLMs for reranking SPLADE. arXiv preprint arXiv:2403.10407, 2024. [Online]. Available: https://arxiv.org/abs/2403.10407 (accessed Sep. 13, 2025).
[34] J. Dean and L. A. Barroso. The tail at scale. Communications of the ACM, 56(2):74–80, February 2013.
[35] S. Es, J. James, L. Espinosa Anke, and S. Schockaert. Ragas: Automated evaluation of retrieval augmented generation. In Proc. 18th Conf. of the European Chapter of the Association for Computational Linguistics: System Demonstrations (EACL Demo), pages 150–158, St. Julians, Malta, March 2024. Association for Computational Linguistics.
[36] C. Marzen, D. A. Prum, and R. J. Aalberts. The new sharing economy: The role of property, tort and contract law for managing the Airbnb model. SSRN Electronic Journal, 2016.
[37] D. Liu, B.
Yang, H. Du, D. Greene, N. Hurley, A. Lawlor, R. Dong, and I. Li. RecPrompt: A self-tuning prompting framework for news recommendation using large language models. In Proc. 33rd ACM Int. Conf. Inf. Knowl. Manage. (CIKM), pages 3902–3906, Boise, ID, USA, 2024. ACM.

Appendix: RAGAS Evaluation Samples

This appendix provides a selection of prompts and explanations used in the RAGAS-based evaluation. Only a subset of the test questions is included here for illustration purposes, as the full dataset is too large to present in this document.

Context note. During testing, all cases shared the same retrieval context: a unified résumé/profile database. To avoid redundancy, we do not repeat the full context for each example. Below is a compact SQL-style schema and a small sample of rows to illustrate the format of the context.

    -- Schema (illustrative)
    CREATE TABLE profiles (
      sku_id INTEGER PRIMARY KEY,
      title  TEXT,
      domain TEXT,
      skills TEXT
    );

    -- Sample rows (excerpt)
    INSERT INTO profiles (sku_id, title, domain, skills) VALUES
      (1, 'Residential Electrician', 'Electrical', 'wiring; panel upgrades; lighting installation; troubleshooting'),
      (14, 'Lighting Installation Electrician', 'Electrical', 'ceiling lights; wall lamps; energy-saving systems; breakers'),
      (100, 'Home Lighting Electrician', 'Electrical', 'home lighting; LED retrofit; ceiling/wall lamps'),
      (10, 'Smart Home Electrician', 'Electrical', 'smart lighting; thermostats; IoT; home automation'),
      (12, 'Mobile Auto Repair Technician', 'Automotive', 'on-site repair; battery replacement; diagnostics; brakes'),
      (5, 'Mobile Car Technician', 'Automotive', 'battery; tires; minor engine diagnostics (on-site)'),
      (2, 'Auto Mechanic', 'Automotive', 'diagnostics; oil changes; brake repairs; maintenance'),
      (8, 'Motorcycle Mechanic', 'Automotive', 'engine tuning; brake adjustment; chain service'),
      (13, 'Bathroom Plumbing Specialist', 'Plumbing', 'low pressure; clogs; pipe issues; bathroom fixtures'),
      (3, 'Emergency Plumber', 'Plumbing', '24/7; leaks; drains; water heater issues');

Example 1: Clear Questions

USER INPUT
User question: Can you replace the brake pads and rotors on my car this weekend?
RETRIEVED CONTEXTS
['Experienced in replacing brake pads, calipers, and rotors. Ensures your vehicle stops safely and smoothly.', 'Replaces brake pads, fluids, and rotors. Ensures responsive braking.']
EVALUATION
Faith: 1    Cont. Rel: 1

USER INPUT
User question: Please install a smart thermostat and ensure it’s connected to my HVAC system.
RETRIEVED CONTEXTS
['Installs smart thermostats, lighting systems, and security devices. Ensures modern electrical integration in your home.', 'Installs smart lighting, thermostats, and home automation systems. Ensures safe integration with existing wiring.']
EVALUATION
Faith: 1    Cont. Rel: 1

USER INPUT
User question: I need a licensed electrician to upgrade my breaker panel to handle more circuits.
RETRIEVED CONTEXTS
['Upgrades outdated electrical panels to modern circuit breakers. Ensures safety and increased capacity.', 'Handles residential and commercial wiring projects. Experienced in circuit installation, breaker panel upgrades, and fault diagnosis.', 'Upgrades outdated fuse boxes to modern breaker panels, increasing electrical capacity and home safety.']
EVALUATION
Faith: 1    Cont. Rel: 1

USER INPUT
User question: Can you unclog my kitchen sink drain and check for any leaks?
RETRIEVED CONTEXTS
['Fixes leaky pipes, replaces corroded plumbing, and handles emergency indoor water issues efficiently.', 'Unclogs kitchen, bathroom, and floor drains using mechanical and chemical methods. Fast and affordable.']
EVALUATION
Faith: 0.333    Cont. Rel: 0.5

USER INPUT
User question: Please install a ceiling fan in my bedroom and ensure proper wiring.
RETRIEVED CONTEXTS
['Handles residential and commercial wiring projects. Experienced in circuit installation, breaker panel upgrades, and fault diagnosis.', 'Installs smart lighting, thermostats, and home automation systems.
Ensures safe integration with existing wiring.', 'Specialist in installing ceiling lights, wall fixtures, and energy-efficient LED systems for homes and offices.']
EVALUATION
Faith: 1    Cont. Rel: 0

USER INPUT
User question: My water heater stopped working, can you repair or replace it?
RETRIEVED CONTEXTS
['Repairs and installs water heaters. Handles both tank and tankless systems.']
EVALUATION
Faith: 1    Cont. Rel: 1

USER INPUT
User question: I need someone to install an EV charging station in my garage.
RETRIEVED CONTEXTS
['Sets up electric vehicle home charging stations with proper grounding and capacity checks.', 'Installs home EV charging stations compatible with Tesla, Nissan Leaf, and other electric vehicles.']
EVALUATION
Faith: 1    Cont. Rel: 1

USER INPUT
User question: Can you replace the old light switches with dimmer switches in my living room?
RETRIEVED CONTEXTS
['Provides safe and efficient installation of ceiling lights, wall lamps, and energy-saving lighting systems.', 'Specialist in installing ceiling lights, wall fixtures, and energy-efficient LED systems for homes and offices.']
EVALUATION
Faith: 1    Cont. Rel: 0

USER INPUT
User question: Please install a new dishwasher and connect it to the existing plumbing.
RETRIEVED CONTEXTS
['Specialist in sink, faucet, and dishwasher installation and repair. Keeps your kitchen running smoothly.']
EVALUATION
Faith: 0.5    Cont. Rel: 0

USER INPUT
User question: My car’s A/C isn’t cooling, can you recharge the refrigerant and check for leaks?
RETRIEVED CONTEXTS
['Diagnoses and repairs car air conditioning systems. Services include refrigerant recharge and compressor replacement.', 'Repairs vehicle AC systems including refrigerant top-up and compressor fixes.']
EVALUATION
Faith: 1    Cont. Rel: 1

USER INPUT
User question: Can you install LED recessed lighting in my kitchen ceiling?
RETRIEVED CONTEXTS
['Specialist in installing ceiling lights, wall fixtures, and energy-efficient LED systems for homes and offices.']
EVALUATION
Faith: 1    Cont. Rel: 1

USER INPUT
User question: I need someone to replace the corroded pipes under my bathroom sink.
RETRIEVED CONTEXTS
['Replaces corroded or leaking water pipes in kitchens, bathrooms, and basements. Uses durable materials and precise fitting.', 'Installs and repairs toilets, bathtubs, and showers. Solves low pressure, clogging, and pipe issues efficiently.']
EVALUATION
Faith: 0.667    Cont. Rel: 1

USER INPUT
User question: Please install a garbage disposal unit in my kitchen sink.
RETRIEVED CONTEXTS
['Installs and seals new sinks, faucets, and garbage disposals in kitchens.']
EVALUATION
Faith: 1    Cont. Rel: 1

USER INPUT
User question: Can you repair the electrical wiring for my oven? It’s not heating properly.
RETRIEVED CONTEXTS
['Handles residential and commercial wiring projects. Experienced in circuit installation, breaker panel upgrades, and fault diagnosis.', 'Fixes electrical issues in ovens, washing machines, dryers, and other household appliances. In-home repair visits available.']
EVALUATION
Faith: 1    Cont. Rel: 1

USER INPUT
User question: I need someone to install a smart lighting system throughout my home.
RETRIEVED CONTEXTS
['Installs smart lighting, thermostats, and home automation systems. Ensures safe integration with existing wiring.']
EVALUATION
Faith: 1    Cont. Rel: 1

USER INPUT
User question: Can you replace the suspension struts on my vehicle?
RETRIEVED CONTEXTS
['Fixes vehicle suspension systems for a smoother and safer ride.', 'Fixes worn-out shocks, struts, and suspension components. Improves ride comfort and vehicle stability.']
EVALUATION
Faith: 1    Cont. Rel: 1

USER INPUT
User question: Please install a surge protector at the main electrical panel.
RETRIEVED CONTEXTS
['Installs whole-home surge protection devices to prevent appliance damage during voltage spikes.']
EVALUATION
Faith: 1    Cont. Rel: 1

USER INPUT
User question: My toilet keeps leaking, can you replace the internal components?
RETRIEVED CONTEXTS
['Installs and replaces standard and smart toilets. Handles leaks, flushing issues, and low water pressure.', 'Responds quickly to pipe bursts, leaks, and severe clogs. Available 24/7.']
EVALUATION
Faith: 1    Cont. Rel: 1

USER INPUT
User question: I need someone to install outdoor waterproof lighting along my walkway.
RETRIEVED CONTEXTS
['Installs garden and pathway lighting with waterproof systems. Enhances aesthetics and security.']
EVALUATION
Faith: 1    Cont. Rel: 1

USER INPUT
User question: Can you perform a full electrical safety inspection for my home?
RETRIEVED CONTEXTS
['Performs safety inspections for residential and rental properties. Identifies fire hazards and code violations.', 'Inspects residential and commercial wiring for safety compliance and issues.']
EVALUATION
Faith: 1    Cont. Rel: 0.5

Example 2: Ambiguous Questions

USER INPUT
User question: My car is making a strange noise, can someone check it?
RETRIEVED CONTEXTS
['Experienced in replacing brake pads, calipers, and rotors. Ensures your vehicle stops safely and smoothly.', 'Performs diagnostics and repairs on engine misfires, leaks, overheating, and timing belt replacements.', 'Uses OBD-II and advanced tools to find check engine issues, sensor faults, and system inefficiencies.', 'Skilled mechanic providing car diagnostics, oil changes, brake repairs, and general vehicle maintenance services.']
EVALUATION
Faith: 1    Cont. Rel: 1

USER INPUT
User question: The lights in my house keep flickering. Any idea why?
RETRIEVED CONTEXTS
['Handles residential and commercial wiring projects. Experienced in circuit installation, breaker panel upgrades, and fault diagnosis.', 'Certified residential electrician with experience in wiring, panel upgrades, lighting installation, and electrical troubleshooting for homes.', 'Provides safe and efficient installation of ceiling lights, wall lamps, and energy-saving lighting systems.']
EVALUATION
Faith: 1    Cont. Rel: 1

USER INPUT
User question: I have a leak in the bathroom. Can anyone help?
RETRIEVED CONTEXTS
['Responds quickly to pipe bursts, leaks, and severe clogs. Available 24/7.', 'Fixes leaky pipes, replaces corroded plumbing, and handles emergency indoor water issues efficiently.', 'Handles plumbing for full bathroom renovations including showers and vanities.', 'Installs and repairs toilets, bathtubs, and showers. Solves low pressure, clogging, and pipe issues efficiently.']
EVALUATION
Faith: 0.667    Cont. Rel: 1

USER INPUT
User question: My dishwasher isn’t cleaning dishes properly.
RETRIEVED CONTEXTS
['Specialist in sink, faucet, and dishwasher installation and repair. Keeps your kitchen running smoothly.']
EVALUATION
Faith: 1    Cont. Rel: 1

USER INPUT
User question: The water pressure in my shower is really low.
RETRIEVED CONTEXTS
['Installs and replaces standard and smart toilets. Handles leaks, flushing issues, and low water pressure.', 'Installs and replaces toilets, showers, faucets, and vanity sinks. Also handles related water pressure issues.', 'Installs and repairs toilets, bathtubs, and showers. Solves low pressure, clogging, and pipe issues efficiently.']
EVALUATION
Faith: 1    Cont. Rel: 1

USER INPUT
User question: My thermostat isn’t working as expected.
RETRIEVED CONTEXTS
['Installs smart thermostats, lighting systems, and security devices. Ensures modern electrical integration in your home.']
EVALUATION
Faith: 1    Cont. Rel: 1

USER INPUT
User question: One of my breakers keeps tripping randomly.
RETRIEVED CONTEXTS
['Troubleshoots tripping or overheating breakers. Offers repair or replacement services with full testing.']
EVALUATION
Faith: 1    Cont. Rel: 1

USER INPUT
User question: My kitchen sink is draining very slowly.
RETRIEVED CONTEXTS
['Unclogs drains in kitchens, bathrooms, and basements using mechanical tools.', 'Unclogs kitchen, bathroom, and floor drains using mechanical and chemical methods. Fast and affordable.']
EVALUATION
Faith: 1    Cont. Rel: 1

USER INPUT
User question: The A/C in my car isn’t cooling well.
RETRIEVED CONTEXTS
['Diagnoses and repairs car air conditioning systems. Services include refrigerant recharge and compressor replacement.', 'Repairs vehicle AC systems including refrigerant top-up and compressor fixes.']
EVALUATION
Faith: 0.5    Cont. Rel: 1

USER INPUT
User question: My washing machine stopped spinning.
RETRIEVED CONTEXTS
['Fixes electrical issues in ovens, washing machines, dryers, and other household appliances. In-home repair visits available.']
EVALUATION
Faith: 1    Cont. Rel: 1

USER INPUT
User question: The ceiling light in my living room doesn’t turn on.
RETRIEVED CONTEXTS
['Handles residential and commercial wiring projects. Experienced in circuit installation, breaker panel upgrades, and fault diagnosis.', 'Provides safe and efficient installation of ceiling lights, wall lamps, and energy-saving lighting systems.', 'Specialist in installing ceiling lights, wall fixtures, and energy-efficient LED systems for homes and offices.']
EVALUATION
Faith: 1    Cont. Rel: 1

USER INPUT
User question: My toilet keeps running after flushing.
RETRIEVED CONTEXTS
['Responds quickly to pipe bursts, leaks, and severe clogs. Available 24/7.', 'Installs and repairs toilets, bathtubs, and showers. Solves low pressure, clogging, and pipe issues efficiently.']
EVALUATION
Faith: 0    Cont. Rel: 0

USER INPUT
User question: There’s water under my kitchen sink.
RETRIEVED CONTEXTS
['Repairs or replaces leaking, dripping, or stiff kitchen faucets. Handles both modern and traditional fixtures.', 'Specialist in sink, faucet, and dishwasher installation and repair. Keeps your kitchen running smoothly.']
EVALUATION
Faith: 0.5    Cont. Rel: 1

USER INPUT
User question: My car won’t start this morning.
RETRIEVED CONTEXTS
['Provides vehicle repair services at your location, including battery replacement, minor engine diagnostics, and brake service.']
EVALUATION
Faith: 1    Cont. Rel: 1

USER INPUT
User question: The oven isn’t heating up properly.
RETRIEVED CONTEXTS
['Fixes electrical issues in ovens, washing machines, dryers, and other household appliances. In-home repair visits available.']
EVALUATION
Faith: 1    Cont. Rel: 1

USER INPUT
User question: My motorcycle brakes feel soft.
RETRIEVED CONTEXTS
['Experienced in repairing motorcycles: engine tuning, brake adjustments, chain servicing, and general maintenance.', 'Experienced in replacing brake pads, calipers, and rotors. Ensures your vehicle stops safely and smoothly.', 'Replaces brake pads, fluids, and rotors. Ensures responsive braking.']
EVALUATION
Faith: 1    Cont. Rel: 1

USER INPUT
User question: The garage door opener isn’t responding.
RETRIEVED CONTEXTS
['Troubleshoots tripping or overheating breakers. Offers repair or replacement services with full testing.']
EVALUATION
Faith: 1    Cont. Rel: 1

USER INPUT
User question: My dryer is making a loud noise.
RETRIEVED CONTEXTS
['Fixes electrical issues in ovens, washing machines, dryers, and other household appliances. In-home repair visits available.']
EVALUATION
Faith: 1    Cont. Rel: 1

USER INPUT
User question: The smart light bulbs aren’t connecting to the app.
RETRIEVED CONTEXTS
['Installs smart lighting, thermostats, and home automation systems. Ensures safe integration with existing wiring.']
EVALUATION
Faith: 0.5    Cont. Rel: 1

USER INPUT
User question: My bathroom fan stopped working.
RETRIEVED CONTEXTS
['Handles plumbing for full bathroom renovations including showers and vanities.', 'Installs and replaces toilets, showers, faucets, and vanity sinks. Also handles related water pressure issues.']
EVALUATION
Faith: 0    Cont. Rel: 0
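
The per-sample scores listed in this appendix can be related to the averages reported in Tables 4.3 and 4.4 with a short calculation. The following is a minimal Python sketch, not the evaluation script used in this work: the function and variable names are illustrative assumptions, and the LLM-based statement extraction and verification steps are replaced by plain counts. It implements the two ratios from Chapter 4, F = |V| / |S| and CR = |R| / |sku_detail|, and averages the scores transcribed from the samples above in their order of appearance.

```python
from statistics import mean


def faithfulness(num_supported: int, num_statements: int) -> float:
    """F = |V| / |S|: share of answer statements supported by the context."""
    if num_statements <= 0:
        raise ValueError("the answer must contain at least one statement")
    return num_supported / num_statements


def context_relevance(num_essential: int, num_context_items: int) -> float:
    """CR = |R| / |sku_detail|: share of context items needed to answer q."""
    if num_context_items <= 0:
        raise ValueError("the context must contain at least one item")
    return num_essential / num_context_items


# Per-sample scores transcribed from the appendix (20 clear, 20 ambiguous).
CLEAR_FAITH = [1, 1, 1, 0.333, 1, 1, 1, 1, 0.5, 1,
               1, 0.667, 1, 1, 1, 1, 1, 1, 1, 1]
CLEAR_CR = [1, 1, 1, 0.5, 0, 1, 1, 0, 0, 1,
            1, 1, 1, 1, 1, 1, 1, 1, 1, 0.5]
AMBIG_FAITH = [1, 1, 0.667, 1, 1, 1, 1, 1, 0.5, 1,
               1, 0, 0.5, 1, 1, 1, 1, 1, 0.5, 0]
AMBIG_CR = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
            1, 0, 1, 1, 1, 1, 1, 1, 1, 0]


def average(scores) -> float:
    """Mean score rounded to three decimals, as reported in the tables."""
    return round(mean(scores), 3)


print("clear:    ", average(CLEAR_FAITH), average(CLEAR_CR))    # 0.925 0.8
print("ambiguous:", average(AMBIG_FAITH), average(AMBIG_CR))    # 0.808 0.9
```

The printed averages agree with the values in Tables 4.3 and 4.4. The hypothetical call `faithfulness(2, 3)` gives 0.667 after rounding, which shows how a fractional per-sample score can arise when one of three extracted statements is not supported by the context.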