Utilizing Machine Learning to Forecast the Charging Patterns of Electric Vehicles

by

Saeedeh Goodarzvand Chegini

M.Sc., Islamic Azad University, 2019
B.Sc., Islamic Azad University, 2016

PROJECT SUBMITTED IN PARTIAL FULFILMENT OF THE REQUIREMENT FOR THE DEGREE OF MASTER OF SCIENCE IN COMPUTER SCIENCE

UNIVERSITY OF NORTHERN BRITISH COLUMBIA

October 2024

© Saeedeh Goodarzvand Chegini, 2024

Abstract

This project involves the application of advanced machine learning techniques to forecast the charging behaviors of electric vehicles (EVs), addressing the growing demand for a robust and efficient charging infrastructure as EV adoption accelerates. Utilizing historical data from NL Hydro’s public EV charging network, this research aims to develop predictive models that can optimize charging schedules, reduce peak demand on the power grid, and enhance overall charging efficiency. This study applies a variety of machine learning algorithms, including Isolation Forest for anomaly detection, Support Vector Regression for precise regression tasks, Random Forest for robust predictive modeling, XGBoost for high-efficiency gradient boosting, and ensemble methods such as Stacking Regressor to improve predictive accuracy by combining multiple models. These algorithms help analyze key factors such as the starting state of charge (SOC), energy consumption during charging sessions, and the duration of charging events. The models are designed to predict charging behavior patterns, providing insights into how EV users interact with charging infrastructure. The findings reveal that EV users mainly engage in short, frequent charging sessions, typically beginning when the SOC is at a medium level and concluding when it reaches a high level. This pattern suggests a strategic approach to optimizing driving range while reducing concerns about running out of battery.
The project contributes to the advancement of intelligent transportation systems by offering data-driven insights that can guide policymakers, utility companies, and the car industry. By optimizing EV charging infrastructure, the study supports the broader goal of sustainable mobility, facilitating the transition to electric transportation while achieving long-term environmental and economic benefits.

Contents

Abstract
List of Tables
List of Figures
Acknowledgement and Dedication
Chapter 1: Introduction
  1.1 The Rise of Electric Vehicles
  1.2 Environmental and Economic Implications
  1.3 Challenges in EV Charging Infrastructure
  1.4 Importance of Coordinated Charging
  1.5 Technological Solutions
  1.6 Research and Development
  1.7 Challenges of the Research
  1.8 Research Objective
Chapter 2: Background
  2.1 Forecasting EV Charging Loads for Sustainable Transportation
  2.2 Optimizing EV Charging with Predictive Algorithms for Grid Stability
  2.3 Smart Home Energy Management with Predictive EV Charging
  2.4 Analyzing Rapid Charging Patterns of BEVs for Improved Infrastructure
  2.5 Predicting EV Charging Loads with Enhanced Random Forest Algorithm
  2.6 Predicting EV Arrival and Departure Times Using Support Vector Machines (SVM) for Grid Management
  2.7 Intelligent Charging and Load Management for EV Integration
  2.8 California Blackouts
    2.8.1 California to Ban the Sale of New Gasoline Cars
    2.8.2 Flex Alert
Chapter 3: Methodology
  3.1 Data Description
    3.1.1 Inputs and Output
  3.2 Purposes and Background
  3.3 Data preprocessing
    3.3.1 Conversion of Time
    3.3.2 Selection of features
  3.4 Exploratory Data Analysis
    3.4.1 Descriptive Statistics
    3.4.2 Data Visualization
      3.4.2.2 Histograms
  3.5 Modeling
    3.5.1 What does Machine Learning mean?
    3.5.2 What does Data Mining mean?
    3.5.3 Isolation Forest for Anomaly Detection
  3.6 Neural Network
  3.7 Random Forest Description
    3.7.1 Basic Principles of Random Forest
    3.7.2 Construction of Random Forest
    3.7.3 Handling of Feature Importance
    3.7.4 Majority Voting Mechanism
  3.8 Support Vector Regression (SVR)
    3.8.1 Fundamental Concept
    3.8.2 Loss Function
    3.8.3 Optimization Problem
    3.8.4 Kernel Trick
    3.8.5 Advantages and Challenges
    3.8.6 Practical Implementation
  3.9 XGBoost
    3.9.1 Dealing with Missing Values
    3.9.2 Advantages of XGBoost
  3.10 Stacking Regressor
    3.10.1 Structure of the model
    3.10.2 Process of Training
    3.10.3 Primary models
    3.10.4 Meta-Regressor
    3.10.5 Benefits
  3.11 Voting Regressor
  3.12 Structure of the model
  3.13 Categories of Averaging
  3.14 Process of Training
  3.15 Benefits
  3.16 Model Evaluation
    3.16.1 Mean Absolute Error (MAE)
    3.16.2 Root Mean Squared Error (RMSE)
    3.16.3 Symmetric Mean Absolute Percentage Error (SMAPE)
    3.16.4 Challenges and Implementation Strategies in Model Development
Chapter 4: Experiment Design
  4.1 Data Pre-Processing
    4.1.1 Handling Missing Data
    4.1.2 Data Normalization
    4.1.3 Time Data Conversion
    4.1.4 Feature and Target Variable Preparation
  4.2 Energy Consumption
  4.3 Start SOC (State of Charge)
  4.4 End SOC
  4.5 Charging Duration
  4.6 Insights
  4.7 Box Plot
  4.8 Energy Consumption Patterns
  4.9 Starting State of Charge
  4.10 Ending State of Charge
  4.11 Charging Duration
  4.12 Insights
  4.13 Isolation Forest Outlier Detection
    4.13.1 Isolation Forest Analysis of Outlier Detection
    4.13.2 Comprehension of the Plot
    4.13.3 Major Observations
    4.13.4 Insights
    4.13.5 Conclusion
  4.14 Analysis of Neural Network Model
  4.15 Analysis of the model's framework
    4.15.1 Metrics for evaluating performance
    4.15.2 Analysis and explanation of findings
  4.16 Analysis of Random Forest Model
  4.17 Analysis of Support Vector Regression Model
  4.18 Analysis of XGBoost Model
  4.19 Analysis of Voting Regressor Model
    4.19.1 Outcome
    4.19.2 Example Predictions
  4.20 Analysis of Stacking Regressor Model
  4.21 Result Comparison
    4.21.1 Explanation of Performance Metrics
    4.21.2 Detailed Comparison
Chapter 5: Conclusion
  5.1 The Journey of Discovery
  5.2 Achievements in Predictive Modeling
  5.3 Real-World Applications and Implications
  5.4 Future Directions and Opportunities
  5.5 Broader Impact and Societal Contributions
  5.6 Final Reflections and the Road Ahead
References

List of Tables

Table 4.1: Real and predicted values by Voting Regressor
Table 4.2: Evaluation Metrics
Table 4.3: Stacking Regressor and Voting Regressor Comparison

List of Figures

Figure 4.1: Energy and Duration Histogram
Figure 4.2: Energy and Duration Boxplots
Figure 4.3: Isolation forest scatter plot

Acknowledgement and Dedication

I would like to express my deepest gratitude to my supervisor, Dr. Fan Jiang, for his invaluable guidance, support, and encouragement throughout the course of this project. His expertise and patience have been instrumental in shaping both my research and academic journey. I am truly fortunate to have had the opportunity to learn from him.

I would also like to extend my heartfelt thanks to my parents for their unwavering love and support. Their belief in me and constant encouragement have been my greatest source of strength. I am forever grateful for the sacrifices they have made and for always standing by my side.

Chapter 1: Introduction

With the increasing popularity of EVs, the transportation sector is undergoing significant transformation. Reducing greenhouse gas emissions and reducing transportation's environmental impact depend on this transformation. Electric vehicles have potential as a climate solution due to their lower emissions compared to traditional internal combustion engine (ICE) cars.
Studies suggest that EVs can reduce carbon emissions by as much as 45% compared to ICE vehicles [1]. However, the rapid adoption of electric vehicles brings significant challenges, particularly in developing a reliable and efficient charging infrastructure, which is essential for widespread use [2].

1.1 The Rise of Electric Vehicles

Over the last 10 years, the electric vehicle market has grown notably, driven by better battery technology, greater concern for the environment, and government support. Initially, limited driving range and battery reliability were major obstacles to electric vehicle adoption, but technological advancements have largely addressed these issues, making EVs more appealing to a broader range of consumers [3]. This shift has led to a notable increase in the market share of electric vehicles, allowing them to challenge the dominance of ICE vehicles [3].

Although progress has been made, important challenges remain for EVs, one of which is grid infrastructure. Because EV charging has high power requirements, high electric vehicle uptake can overload the current distribution networks of power grids. To prevent grid failures, charging stations must be managed efficiently, which leads to a more stable electricity supply [4].

1.2 Environmental and Economic Implications

The migration from ICE vehicles to electric power is not only a technological matter but also an environmental one. A large share of transportation emissions comes from road vehicles around the world. Traditional ICE vehicles contribute to global environmental problems such as air pollution and rising average global temperatures.
Electric vehicles, by contrast, run on electric power rather than on fossil fuels such as oil or coal, so they avoid the CO2 emissions caused by burning those fuels in the vehicle [5].

Additionally, adopting EVs has substantial economic consequences. Promoting electric mobility can stimulate new businesses and create employment in fields such as battery production, renewable energy, and smart grid technologies. However, these economic benefits will only be realized if reliable support systems, particularly charging stations, are in place. The relevant issues include the availability of charging stations, their charging speed, and how renewable energy is integrated into electricity grids [5].

1.3 Challenges in EV Charging Infrastructure

Achieving an effective charging infrastructure for electric vehicles remains a considerable challenge. The current charging network is underdeveloped and has not kept pace with increasing demand, in some cases resulting in long queues at public charging stations and an uneven distribution of charging facilities. This is not only inconvenient for EV users but also poses a potential danger to the stability of the power distribution system [5].

A key solution to these problems lies in improving scheduling algorithms for public EV charging stations. Efficient schedules use all available charging resources while reducing waiting times for EV users and decreasing the burden on the electricity grid. Uncoordinated charging, on the other hand, can cause demand peaks that affect grid stability and lead to occasional blackouts. A well-designed scheduling system could help achieve a balanced distribution of charging load over time, supporting stable power generation and delivery [2].

1.4 Importance of Coordinated Charging

Coordinated charging behavior is highly important.
When multiple EVs charge at the same time, they can overload the grid, leading to high operational costs because the grid must be reinforced; charging sessions therefore need to be spread over different times. Managing the schedule of charging stations is thus more than a matter of convenience; it plays a key role in keeping the power system efficient and reliable. Advanced scheduling algorithms supported by predictive models of charging behavior have the potential to greatly enhance the operation of charging networks [6].

1.5 Technological Solutions

Among the technical options under research for optimizing the management of EV charging infrastructure, machine learning algorithms are being developed to forecast EV charging behavior and thereby optimize charging schedules. Deep reinforcement learning (DRL), for example, has been applied to control multiple EV chargers at the distribution level, demonstrating the potential of learning and adaptation in highly unstable environments [7]. Furthermore, machine learning methods such as random forests and ensemble learning have been used to forecast home or community charging needs, improving the effectiveness of such systems [8].

In addition, intelligent charging systems that account for the different charging profiles of different electric vehicles help make the most of the available charging resources. Data-driven models that forecast specific charge profiles without requiring knowledge of a vehicle's internal characteristics can produce more precise schedules and enhance overall charging network performance. Incorporating these predictive models into the operation of public EV charging stations may thus considerably enhance user experience and power system stability [9].
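The effect of coordination described in Sections 1.4 and 1.5 can be illustrated with a minimal sketch. It is not the scheduling method used in this project: the sessions, the 7 kW charger rating, and the greedy "least-loaded hour" rule are all assumptions chosen for illustration. The sketch compares the peak load when every EV charges immediately on arrival with the peak load when each EV is assigned the least-loaded hours of its stay.

```python
from math import ceil

# Hypothetical sessions: (arrival hour, departure hour, energy needed in kWh).
SESSIONS = [(17, 23, 14.0), (17, 22, 21.0), (18, 24, 7.0), (18, 23, 14.0)]
CHARGER_KW = 7.0  # assumed fixed charger power
HOURS = 24        # one-hour time slots over a single day


def slots_needed(energy_kwh, power_kw=CHARGER_KW):
    """Number of one-hour slots required to deliver at least the requested energy."""
    return ceil(energy_kwh / power_kw)


def uncoordinated(sessions):
    """Every EV starts charging at full power as soon as it arrives."""
    load = [0.0] * HOURS
    for arrival, _departure, energy in sessions:
        for hour in range(arrival, arrival + slots_needed(energy)):
            load[hour] += CHARGER_KW
    return load


def coordinated(sessions):
    """Greedy valley filling: each EV takes the least-loaded hours of its stay."""
    load = [0.0] * HOURS
    # Serve the least flexible EVs (smallest slack between stay and need) first.
    by_slack = sorted(sessions, key=lambda s: (s[1] - s[0]) - slots_needed(s[2]))
    for arrival, departure, energy in by_slack:
        window = range(arrival, departure)
        # Pick the hours with the lowest current load; ties go to the earlier hour.
        chosen = sorted(window, key=lambda h: (load[h], h))[: slots_needed(energy)]
        for hour in chosen:
            load[hour] += CHARGER_KW
    return load


peak_before = max(uncoordinated(SESSIONS))
peak_after = max(coordinated(SESSIONS))
print(f"peak load: {peak_before} kW uncoordinated, {peak_after} kW coordinated")
```

Both strategies deliver the same total energy; only its distribution over time differs. In practice the arrival times, stay durations, and energy needs would come from predictive models rather than being known in advance, which is where the forecasting work in this project would fit.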
1.6 Research and Development

The main aim of this project is to develop prediction models that closely predict the charging behavior of electric vehicles, thereby contributing to the advancement of intelligent transportation systems. By analyzing historical data from NL Hydro-owned charging stations with advanced machine learning methods, the project focuses on achieving optimized schedules for public EV charging networks. Primarily, it aims to predict the charging time of a single session, estimate the energy consumed during that session, and improve the algorithms that make the charging infrastructure efficient and user-friendly.

A comprehensive methodology that includes data preprocessing, feature engineering, model selection, and evaluation is employed in this project. The aim is to provide practical insights, based on historical data and current machine learning technologies, for improving EV charging management; generalizing the results could support wider use of these vehicles across the globe. Ultimately, through its research objectives, the project should help drive the shift towards environmentally friendly and affordable means of transportation, and thereby support global efforts against climate change.

The electric vehicle market has grown rapidly in recent years, which provides many opportunities as well as several challenges. On the one hand, electric vehicles offer hope for reducing carbon emissions through reduced tailpipe pollution; on the other hand, a reliable fast-charging network is greatly needed for their wide acceptance.
By solving the non-coordination problem among chargers, optimizing the location of public stations (e.g., shared parking lots with offices) and their capacity utilization (to avoid long queues), and providing integrated technical solutions or regulatory support, it is possible to make EVs competitive with internal combustion engine cars and to promote the switch to sustainable transport modes. This research aspires to support the development of intelligent transportation systems and broader sustainability by designing models that predict electric vehicle charging accurately.

1.7 Challenges of the Research

At various points during this project, I came across challenges that tested my technical skills and my ability to manage complexity. The first was working with numerous machine learning algorithms, which was hard because each required different tuning and optimization procedures. Getting these models to perform well across diverse datasets was no small task; striking this balance took a lot of trial and error.

Another big issue was making sure the models are ready for the future. The EV models I built should still work regardless of the changes in battery technology and user behavior that may occur. This was difficult because I had to consider how predictions would be affected by developments in energy storage technology and customer behavior.

Managing time was always a problem area. Data preprocessing alone involves cleaning, organizing, and preparing raw data for analysis, and takes a lot of time. In addition, the different algorithms had to be implemented and tested within limited timelines, which required careful organization and a high level of discipline.
Interpreting and integrating the results from the models into a coherent framework that could be used by stakeholders such as policymakers, utility companies, and industries where electricity is widely consumed was another complex task. The results had to be not only accurate but also easy to understand and act on, which added a layer of complexity to the study.

Managing a large amount of data was another challenge. Advanced machine learning models require significant computational power when applied to large datasets such as the one in this study, and ensuring that processing is efficient and effective remains an ongoing challenge.

It was also not easy to stay current with the latest developments in machine learning and EV technology. The field is always moving forward, so I kept myself updated with new techniques and trends throughout my study.

These challenges were nevertheless important to my growth as a professional who is both technically capable and well grounded. They also taught me how important it is to be flexible, persistent, and precise while still keeping practicality in mind.

1.8 Research Objective

The primary objective of this research is to develop and implement advanced machine learning models to predict the charging behaviors of EVs based on historical data.
By using data from NL Hydro’s public EV charging network, this study aims to create predictive models that optimize charging schedules, reduce peak demand on the power grid, and enhance the overall efficiency of EV charging infrastructure. The research focuses on addressing the variability in charging patterns across different users and locations, ensuring that the models developed are both accurate and adaptable to future changes in EV technology and user behavior. Ultimately, the goal is to provide actionable insights that can support the development of a sustainable and efficient EV charging network, benefiting policymakers, utility companies, and stakeholders in the automotive industry.

Chapter 2: Background

2.1 Forecasting EV Charging Loads for Sustainable Transportation

Comparisons of EV, HEV, PHEV, and ICE vehicles have been used to examine how electric vehicles can help achieve net-zero emissions by 2050 [10]. The rise of electric vehicles in modern transport systems has created a need for new methods of forecasting their power consumption in general and their charging load in particular. In this line of research, the random forest algorithm is used to develop an EV charging load forecasting model. The growing number of EVs has increased charging demand, making it necessary to investigate how this increase may overload existing power grids, especially during peak periods. An accurate and reliable model that can forecast the load of individual stations is essential for load balancing and capacity planning. To this end, the study applies the Classification and Regression Tree (CART) algorithm to make short-term predictions for single stations [1]. The authors also designed an algorithm that predicts the daily charging capacity of stations of different sizes and locations.
The approach uses both regression and classification models trained on a large historical set of charging data, so it can handle tasks beyond classification alone. The data were divided at a level of detail that allows solid models to be built and variations in charging current to be predicted [10].

Charging station data for Shenzhen were analyzed with an emphasis on time and place. The temporal analysis concerns the patterns of energy consumption over the year and within each day; it indicates that demand increases greatly from winter to summer and that holidays show further differences. The spatial analysis describes the charging loads of different stations, taking into account factors such as where each station is located and what features surround it [10].

The suggested Random Forest model combines several machine learning models to improve prediction accuracy. It considers factors along which electric power demand varies, such as the time of day and the day of the week, as well as station-specific information [10].

The study in [11] is primarily concerned with designing data-driven strategies for the smart charging of diverse electric vehicle fleets. It notes that EVs charge with nonlinear profiles, such as the constant-current, constant-voltage (CCCV) mode, which leads to less efficient use of the charging infrastructure if not managed well [11]. Consequently, smart charging seeks to optimize the use of such infrastructure by creating effective charging schedules for each EV based on charging patterns [11].

In [11], the authors suggest using machine learning to forecast power consumption during EV charging. The study relies on a dataset collected from 2016 to 2018 consisting of 10,595 charging events from 1,001 EVs of 18 different models. The preprocessed dataset contains 1.2 million data points.
Several machine learning models, such as linear regression, neural networks, and XGBoost, were trained to predict charging profiles. XGBoost demonstrated the highest performance, achieving an MAE of 126 W and a relative MAE of 0.06 [11]. According to simulations, integrating the XGBoost model into a smart charging algorithm makes the charging infrastructure more efficient, allowing up to 21% more energy to be charged. The study also illustrates that developers should account for the actual behavior of batteries during charging when designing smart charging algorithms. When this is done, data-driven predictions make smart charging more efficient and help ensure that charging resources are distributed fairly. Workplace charging, where several EVs have to be charged at once, stands to benefit from this approach. Incorporating machine learning models improves the overall performance of smart charging algorithms, leading to better battery utilization as well as higher SOC for EVs [11].

The study examines real-world data from public AC charging stations in the Netherlands to analyze how different types of EVs charge. The research emphasizes that the wide adoption of EVs could strain the power system, particularly during peak hours. It also highlights the importance of delayed charging, which allows the system to be optimized and relieves the pressure that electricity usage puts on the grid. Unlike many other studies, which treat the EV charging load as constant, this research observes actual behavior during each session, with the aim of making sessions more convenient and improving the efficiency of smart charging algorithms [11].
2.2 Optimizing EV Charging with Predictive Algorithms for Grid Stability

Following the wide adoption of battery electric vehicles, numerous researchers have started studying how to ensure smooth integration between the grid and EV charging. Unlike conventional vehicles, EVs offer an environmentally friendly means of transport and therefore help reduce the rate at which greenhouse gases are emitted into the atmosphere. However, the rise in EV usage increases electricity demand, which challenges the stability and efficiency of the power grid. Effective management of EV charging is essential to reduce these issues, ensuring that the benefits of EV adoption are maximized while negative impacts on the power system are minimized [6].

User behavior prediction is essential in EV charging management. This study employs various machine learning algorithms to forecast EV users' stay duration and energy consumption based on historical charging data. The algorithms discussed in this report are Multiple Linear Regression, Support Vector Regression, Decision Tree (DT) Regression, Random Forest (RF) Regression, and K-Nearest Neighbor (KNN) Regression. Each of these algorithms has strengths and weaknesses that depend on the specific nature of the data and the user behavior patterns [6].

Incorporating accurate behavioral predictions into EV charging scheduling considerably improves the efficiency of power distribution. Predicting when and for how long EVs will be charged helps grid operators allocate resources more effectively, balance load, and prevent peaks that may destabilize power grids. This approach also reduces the waiting time experienced by EV users and makes the charging process more user friendly.

After the desired goal is defined, several steps need to be taken before an optimal predictive model can be developed. First, it is important to clean the data and eliminate discrepancies.
Data preparation also involved aligning timestamps with weather data from an external source. Additional features were engineered to capture factors that shape charging patterns, such as time-related characteristics of the charging sessions and location information about the charging environment. The preliminary results on the applicability of the proposed algorithm for improving EV charging predictions are then examined. The study evaluates the performance of various machine learning techniques and identifies which are most effective for different types of charging patterns. Several algorithms are used together to improve the accuracy of the predictions and make them consistent [6].

The results also show that the proposed algorithm can predict EV users’ behavior accurately, leading to better scheduling and distribution of resources. Nevertheless, these predictions may vary with the type of data and the algorithms employed. The study also highlights the importance of continuous data collection and model refinement to adapt to changing patterns in EV usage.

This research concludes that accurate prediction of EV user behavior is needed if charging schedules are to be optimized and the load on the power grid is to be managed. Integrating machine learning models makes these estimates more robust and enhances the efficiency and reliability of EV infrastructure. The study therefore provides insights into how grid operators and policymakers can facilitate the transition towards sustainable and efficient transportation systems [6].

In [12], researchers study electric vehicle charging behavior in order to optimize EV charging schedules using machine learning algorithms.
The authors studied actual charging data for 252 users and discovered patterns in electric vehicle charging behavior such as stay duration and power consumption. The prediction accuracy for these behaviors is influenced by the entropy and sparsity of the data, which motivates a ratio (R) that combines entropy and sparsity and serves as a primary indicator for algorithm selection [12].

The three main algorithms discussed in [12] are:

Support Vector Regression: It has high precision in predicting stay duration when the entropy/sparsity ratio (R) is low [12].

Random Forest Regression: It is more accurate for predicting energy consumption under low R conditions [12].

Diffusion-based Kernel Density Estimator (DKDE): It performs best among all methods at high values of R for both stay duration and energy consumption prediction [12].

The authors developed the Ensemble Predicting Algorithm (EPA) by integrating the above algorithms, which greatly enhances forecasting consistency. In this study, EPA demonstrated a reduction of 11% in prediction errors for stay duration and 22% for energy consumption [12].

Applying EPA at a charging station of any scale assumes that records of electric vehicles are kept. For optimal scheduling across a distribution grid, integration with the Open Charge Point Protocol (OCPP) and the use of real-time data for better predictions are proposed [12].

Additionally, the study highlights the practical implications of this work for both power suppliers and EV users, suggesting that better load predictions can lead to more efficient energy management and cost savings. The results validate the effectiveness of the EPA in managing EV charging loads and optimizing the overall performance of the charging infrastructure [12].
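The selection rule summarized above for [12] can be illustrated with a small Python sketch. This is only an illustration under stated assumptions: the ratio R is taken as an already computed input (its exact definition in [12] is not reproduced here), and the function name `select_model`, the target labels, and the threshold value `r_threshold` are hypothetical.

```python
def select_model(target: str, r: float, r_threshold: float = 1.0) -> str:
    """Pick a predictor name from the target variable and the ratio R.

    Assumptions (not from [12]): `r` is precomputed, and `r_threshold`
    is a hypothetical cut-off separating "low" from "high" R.
    """
    if target not in ("stay_duration", "energy_consumption"):
        raise ValueError(f"unknown target: {target}")
    if r >= r_threshold:
        # High R: the kernel density estimator is reported to perform best.
        return "DKDE"
    # Low R: SVR for stay duration, Random Forest for energy consumption.
    return "SVR" if target == "stay_duration" else "RF"


# Example with made-up R values for one user.
choice_duration = select_model("stay_duration", r=0.4)
choice_energy = select_model("energy_consumption", r=0.4)
choice_high_r = select_model("energy_consumption", r=2.5)
```

In this sketch the threshold is a single number for simplicity; a per-target threshold would be an equally plausible design.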
2.3 Smart Home Energy Management with Predictive EV Charging

[13] presents the opportunity that the global rise in electric vehicle usage offers for developing micro-grids and smart communities within the energy internet. Those micro-grids and smart communities can serve locally concentrated energy demands. Managing electric vehicle charging through smart devices is a promising solution, but it depends heavily on predicting the actual charging demand precisely [13]. Despite its significance, household-based EV charging demand prediction has not been extensively investigated. Most current research concentrates on charging station solutions, and general models ignore individual charging behavior [13].

[13] fills this gap by using various well-known machine learning algorithms to forecast when the next day’s household EV charging will take place, as well as whether it will occur at all. These algorithms include Random Forest, Gradient Boosting, Adaptive Boosting, Naive Bayes, K-Nearest Neighbors, and Artificial Neural Networks. The authors developed a two-layer hybrid stacking ensemble method that combines different types of algorithms with the aim of achieving better prediction results.

Furthermore, this work suggests that accurate forecasting of household vehicle-to-grid (V2G) activity is an essential component of efficient Home Energy Management Systems (HEMS) design, because HEMS should be able to schedule energy use efficiently. When individual algorithms are combined in an ensemble model, they outperform the same algorithms used independently, owing to their complementary nature. To validate the model, empirical data were collected from a house equipped with a typical private charger; the results revealed that the model could predict both the time of charging and the occurrence of no-charge events with increased accuracy.
The authors argue that EV charging prediction should be implemented as part of larger smart home solutions so that it improves the flexibility and efficiency of household energy management. In summary, [13] proposes a method in which ensemble learning is used to better forecast EV charging demand, which is vital for creating intelligent EV charging systems that support future energy internet infrastructures [13].

2.4 Analyzing Rapid Charging Patterns of BEVs for Improved Infrastructure

[14] investigates the rapid charging patterns of privately owned battery electric vehicles (BEVs) and analyzes the elements that influence these patterns. Drivers frequently encounter limited cruising ranges and extended charging periods as a result of the limited performance of BEV batteries. Furthermore, the uneven allocation of charging infrastructure leads to local congestion and waiting at charging stations, further complicating the charging procedure.

To tackle these obstacles, the research examines rapid charging patterns using data from 130 privately owned BEVs in Beijing, gathered over a period of seven months. The dataset consists of 15,752 trajectories, of which 2,161 involve rapid charging.

Several crucial factors that affect fast-charging behavior are discussed in [14]. These are the SOC at the start of the trip, departure time, trip duration, distance travelled, speed, weather conditions (for example rain, snowstorms, or fog, which reduce visibility and adversely affect driving behavior), and past charging records. According to the study, the lower the SOC at the beginning of a trip, the more likely fast charging becomes. In addition, longer trips and higher speeds call for fast charging more frequently.

Weather conditions significantly impact fast charging behavior.
Both high and low temperatures increase the demand for fast charging, as additional energy is used for heating or cooling the vehicle. Wind strength is another factor, with higher wind speeds leading to increased energy consumption due to higher resistance, thereby necessitating more frequent fast charging [14]. There are findings on day-of-week effects too: charging behavior barely changes between weekdays and weekends, which suggests similar travel habits and charging demands throughout the week.

A binary logistic regression model is employed to forecast whether fast charging will occur, based on these factors. The model includes parameters for start SOC, departure time, travel duration, driving distance, driving speed, wind strength, temperature, and the last fast-charging event. According to the regression results, a lower starting SOC and a lower speed contribute negatively to the likelihood of fast charging, while longer travel duration, longer distance, and extreme temperatures increase it [14].

The model’s predictive accuracy is validated using 30% of the data, with a prediction rate of 89.36%. This high prediction rate shows how effective the model can be in projecting outcomes when the dependent variable can assume only two values. Furthermore, the study compares the logistic regression model with other models, such as univariate linear regression (ULR) and multivariate linear regression, using receiver operating characteristic (ROC) curves and areas under the curve (AUC). The logistic regression model outperforms the other models, indicating its superior predictive performance [14].

Finally, if the major determinants of fast charging are known, the utilization of fast-charging stations can be improved, and this information can be useful for BEV users seeking to optimize their charging decisions.
Proper positioning and operation of fast-charging stations can reduce queuing and idle times at charging stations, thus enhancing the user experience and helping to spread the adoption of BEVs. Policymakers, researchers, and industry players are likely to benefit from the outcomes of this study, which can guide innovations aimed at optimizing BEV infrastructure [14].

The data in [15] were collected from the Municipality of Amsterdam and the energy providers EVNET and NUON. The dataset is composed of detailed charging session records and specific meter values collected every 15 minutes, making it a comprehensive database for analysis. Several primary variables contribute to changes in charging profiles. They include environmental factors such as temperature and peak times, charge point characteristics such as whether both sockets of a charge point are in use, and EV-specific parameters such as battery degradation and voltage levels [15].

The main results are:

1. Environmental Effects: Contrary to expectations, charging during peak hours (17:00–21:00) is quicker, implying that the power grid is robust and has a large capacity [15]. However, daytime charging takes longer due to high power loss and voltage fluctuations. Ambient temperature is positively related to charging rate, meaning that warmer conditions promote faster charging, particularly among 230V EVs [15].

2. Charge Point Characteristics: Charging rate is influenced by the presence of another EV at the same charging point. For example, a 230V EV will be charged faster if there is a 400V EV at the other socket. However, if both sockets are in use, the charging speed decreases because of increased power loss and voltage drop [15].

3. EV Characteristics: Charging speed is significantly affected by the voltage system: charging is roughly three times faster for 400V EVs than for 230V EVs.
Battery degradation also reduces charging speed over time, with a clear association observed after multiple charging cycles [15].

[15] employs multiple linear regression models to determine how these variables affect charging rate. The model indicates that dynamic charging profiles, rather than static loads, should be considered in order to optimize EV charging infrastructure efficiently. This understanding distinguishes the various factors that affect EV charging behavior and therefore contributes to the development of smarter charging systems, which in turn reduces stress on the power grid while improving the overall user experience. These results could also be beneficial for policymakers and power system experts who are interested in supporting environmentally friendly electric vehicles [15].

2.5 Predicting EV Charging Loads with Enhanced Random Forest Algorithm

An improved Random Forest algorithm for predicting EV charging load is explored in [16]. As the adoption of EVs grows, the power system faces the difficult task of balancing loads, planning capacity, and maintaining power quality. Therefore, forecasting EV charging load accurately is critical to efficient management and future planning [16]. Traditional methods based on user behavior struggle with the randomness inherent in EV charging loads.

This paper seeks to enhance the accuracy and reliability of EV charging load predictions using machine learning techniques, specifically Random Forest (RF). The RF algorithm was chosen for its high efficiency and its multi-tree structure, which reduces the effect of correlations in the dataset. The study employs data from various charging stations in Shenzhen, combining single-station predictions with station-group predictions [16]. The data contain charging records from 2016 to 2018, covering both temporal and spatial distribution characteristics [16].
The methodology includes:

Temporal and Spatial Analysis: The temporal distribution of charging is analyzed, showing that the load is higher in summer than in winter and varies during holidays. The spatial analysis shows that economically developed areas, which include the Nanshan, Futian, Longgang, and Baoan districts, have high electricity loads at charging facilities [16].

RF Algorithm Application: The RF algorithm is applied to predict the short-term charging load for both single stations and groups of stations. Feature data such as date, time, location, and previous charging amounts act as inputs. The algorithm is assessed using metrics such as Mean Absolute Percentage Error (MAPE) and Root Mean Square Error (RMSE) [16].

Model Training and Validation: The RF model is trained on 90% of the data and tested on the remaining 10%. The model is compared against Support Vector Regression and decision tree algorithms. The RF model demonstrates superior accuracy, particularly for predicting charging load at individual stations and station groups [16].

Feature Importance: Among the predictors of charging load, the most important are the previous day’s charge, an activity indicator, and time indicators. This information helps in tuning the model and thereby improving its prediction accuracy [16].

The results show that the RF algorithm is highly accurate in predicting charging load. For single stations, the RF model has an average MAPE of 9.76% and an RMSE of 2.27, while for station groups it achieves a MAPE of 10.83% and an RMSE of 39.59. The study concludes that the RF-based prediction method is practical and reliable and can help manage EV charging infrastructure more effectively, giving power suppliers and policymakers valuable insights [16].
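To make the two evaluation metrics named above concrete, the following Python sketch computes MAPE and RMSE with the standard library only. It uses the usual textbook definitions; the function names and the sample load values are hypothetical and are not taken from [16].

```python
import math


def mape(actual, predicted):
    """Mean Absolute Percentage Error in percent.

    Assumes every actual value is non-zero, since each error is
    divided by the corresponding actual value.
    """
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    total = sum(abs((a - p) / a) for a, p in zip(actual, predicted))
    return 100.0 * total / len(actual)


def rmse(actual, predicted):
    """Root Mean Square Error, in the same unit as the inputs."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    return math.sqrt(squared / len(actual))


# Hypothetical station loads (illustrative values only).
actual_load = [100.0, 200.0, 400.0]
predicted_load = [110.0, 180.0, 400.0]

station_mape = mape(actual_load, predicted_load)
station_rmse = rmse(actual_load, predicted_load)
```

MAPE is scale-free, which is why it can be compared between single stations and station groups, whereas RMSE grows with the magnitude of the load and is therefore much larger for station groups.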
2.6 Predicting EV Arrival and Departure Times Using Support Vector Machines (SVM) for Grid Management

The main focus of [17] is on predicting the times at which electric cars enter and leave the University of California, San Diego campus using Support Vector Machines. Proper electric load forecasting is essential for effective grid management and power distribution, especially with the increasing number of electric vehicles, particularly in California. To better allocate electric power on the grid, the study predicts the arrival and departure times of EVs using historical data obtained between 2012 and 2014 [17].

The research applies the support vector machine (SVM), a machine learning algorithm recognized for its performance in classification and regression tasks, to EV data showing connection and disconnection times as well as cumulative energy consumed. Overnight parking events and very short or negligible charging sessions were eliminated during preprocessing, and only sessions between 6:00 am and 9:00 pm were considered in order to concentrate on the day-to-day travel patterns of EV owners [17].

The data were segmented by week, and arrival and departure times were categorized by hour within this range. Each year (2012, 2013, …) has its own training, validation, and testing sets, which are used when selecting the features that help the SVM learn arrival and departure patterns accurately. Specifically, 45 weeks of data are used for training, whereas 5 weeks each are set aside as testing and validation sets. The attributes that are essential in the SVM model include week number, day of the week, hour, arrival time, previous arrival time, departure time, and previous departure time, which teach the model important trends about when to expect a car on campus [17].
To assess how well the model makes predictions, the study uses Mean Absolute Percentage Error (MAPE), Root Mean Square Error (RMSE), and Mean Bias Error (MBE). These metrics indicate how accurate the forecast of EV arrival or departure times is likely to be within a specified area. The distribution patterns changed significantly over the years, because many more electric vehicles were entering or leaving the UCSD campus between 2012 and 2014. The SVM model is trained on data collected over 50 weeks and tested on 5 weeks, with models also run for separate weeks representing different seasons to ensure broad data coverage [17].

Results from the study show that the SVM model has relatively high prediction accuracy with low error rates. The research compares the performance of the SVM model against a persistence-based reference forecast, which uses data from the previous week to forecast the following week. The SVM model exhibits fewer errors than this reference, which means it is more precise and reliable. The findings also suggest that increasing the size of the training dataset leads to better forecasts, given the declining values of MAPE and RMSE over time [17].

Accurate short-term load prediction for EVs is thus vital in achieving optimal power distribution within the grid. Predicting arrival and departure times with the SVM model also makes it easier to manage the risk of grid instability caused by heavy EV charging loads. The method applied here can be used in the future for any university campus or city with high power usage by many electric vehicles [17].

2.7 Intelligent Charging and Load Management for EV Integration

[18] discusses how the integration of electric vehicles impacts power systems, as well as the role played by intelligent charging schemes.
This paper recognizes the environmental advantages of EVs; however, they put extra load on electrical grids, especially during peak hours. The proposed solution is the use of delayed charging options, which optimize the charging system and relieve pressure on power systems [18].

Using actual data from charging sessions of individual EVs in the Netherlands, this study examines the factors that influence charging profiles. It argues that mainstream approaches view the charging profile as a static load, neglecting dynamic behavior during the charging process. To address this issue, the study examines how different external factors affect the progression from an empty to a fully charged battery. This includes assessing the influence of other parameters on charging time as well as on the overall charging profile [18].

The methodology requires the collection and analysis of charging information containing variables such as the time of day when charging started, the duration of each session, and the initial State of Charge. The main objective of this examination is to help improve the timing of EV charging, either by enabling cost savings through lower consumer electricity bills or by shaving peak demand, among other benefits, during off-peak night-time periods [18].

According to the major findings, smart charging management is necessary because it helps reduce the negative impacts of EVs on the power system. The study therefore suggests that more effective electric vehicle charging schemes can be devised, which are likely to improve the efficiency of charging infrastructure while keeping power grids stable. The research ultimately aims to provide viable methods for managing the increase in electrical load caused by the growth in EVs [18].
[19] focuses on the effect of the widespread adoption of electric vehicles on power systems and shows the relevance of intelligent load management for charging points. A larger number of EVs brings environmental benefits, but EVs overload the power grid during peak hours. Therefore, there is a need for delayed charging strategies that make the most of the charging systems and relieve pressure on grids [19].

To investigate this, the researchers used real-world data from public AC charging points in the Netherlands [19]. The study was meant to show how charging proceeds from an empty battery to a full one and how an individual EV’s charging profile differs over time based on various external factors. This allows for optimization analysis of EV smart charging schemes [19].

The data collection included detailed session-level records, including start and end timestamps and the electricity supplied, as well as specific meter readings taken every quarter of an hour during each session [19]. The time of day, particularly daytime hours, was an important factor, alongside the characteristics of the charge point; EV-specific variables such as battery degradation and voltage level were also considered significant [19].

According to the research results:

Environmental Effects: At peak hours (17:00–21:00), charging speed increases, which indicates a strong power grid with high capacity, whereas during daytime hours charging rates tend to be slower due to increased power losses and deviations in voltage levels. Also, during hot seasons, charging is faster, especially for 230V EVs [19].

Charge Point Characteristics: Charging speed is affected by the presence of another electric vehicle at the same charging point. For instance, if a 400V EV is plugged into one socket and a 230V EV into the other, then the latter charges faster than when there are no other cars at the point.
However, simultaneous charging on both sockets reduces speeds because more power is lost and there are voltage drops [19].

A multiple linear regression model was employed in the study to quantify the effects of these variables on charging rate. The model indicated that dynamic charging profiles, rather than static loads, should be considered in order to optimize EV charging infrastructure effectively. This reveals how different factors shape EV charging behavior, thereby contributing to the creation of better smart load systems for electric vehicles so that power system constraints are reduced [19].

In summary, the research affirms why intelligent use of energy in EV charging is necessary for protecting power systems. Understanding when and how much energy a given electric car will require makes it possible to develop strategies that improve the efficiency of charging infrastructure while ensuring that electricity grids remain stable amid growing loads from such cars. Such findings are therefore essential for supporting the sustainable growth of electric car uptake and offering workable solutions to power system load management challenges [19].

2.8 California Blackouts

During the West-wide extreme heat wave on August 14 and 15, 2020, the California Independent System Operator Corporation (CAISO) had to implement rotating electrical outages in California [20]. After these emergency events, Governor Gavin Newsom asked the CAISO, CPUC, and CEC to investigate and report on the underlying reasons for the August outages, once they had taken steps to prevent any future outages. The Final Root Cause Analysis (Final Analysis) includes supplementary data analyses that were unavailable when the Preliminary Analysis was published.
However, it does not significantly alter previous findings and affirms that the three primary causal factors behind the August outages were extreme weather conditions, resource adequacy and planning processes, and market practices [20]. To summarize, the factors were as follows:

1. The significant heat wave caused by climate change in the western United States led to a higher demand for power than what was available and planned for.

2. The resource planning objectives have not kept up with the need for dependable, clean, and inexpensive resources that can match the demand during the early evening hours. This made it difficult to match demand and supply during the intense heat wave.

3. Certain behaviors in the day-ahead energy market worsened the supply difficulties during extremely strained circumstances [20].

2.8.1 California to Ban the Sale of New Gasoline Cars

California authorities have approved a comprehensive plan to impose restrictions on and eventually prohibit the sale of automobiles fueled by gasoline, according to state officials. The governor of California has characterized this decision as the initial step towards phasing out the internal combustion engine [21].

The California Air Resources Board has approved a regulation mandating that all new automobiles sold in the state by 2035 must be free of greenhouse gas emissions, such as carbon dioxide. The regulation also establishes intermediate benchmarks, mandating that by 2026, 35 percent of newly sold passenger vehicles must produce zero emissions. The requirement increases to 68 percent by 2030 [21].

According to state officials, the new regulation in California would reduce greenhouse gas emissions from passenger vehicles by almost 50 percent in 2040 compared to the projected levels without the program.
Liane Randolph, head of the California Air Resources Board, stated that this would result in the elimination of 395 million metric tons of greenhouse gas emissions, which is comparable to burning 915 million barrels of oil [21].

2.8.2 Flex Alert

A Flex Alert is a request for customers to voluntarily reduce their electricity usage when there is an expected shortfall of energy supply, particularly if the California Independent System Operator (ISO) has to tap into reserves to ensure the stability of the power system. Californians may avert more severe emergency measures, such as rotating power outages, by reducing electricity usage during a Flex Alert [22].

The following actions should be taken in response to a Flex Alert:

Minimize electricity consumption during the period from 4 p.m. to 9 p.m. [22]. This period corresponds to the peak of energy consumption, during which the availability of renewable energy sources such as solar power is relatively low [22].

Adjust thermostats to a temperature of 78 degrees Fahrenheit or above. Raising the thermostat temperature decreases the burden on air conditioning systems, which are a substantial contributor to energy usage during periods of extreme heat [22].

Refrain from using large household appliances: Delay the use of household appliances such as dishwashers, washing machines, and dryers until after 9 p.m. in order to reduce the strain on the electricity grid during periods of high demand [22].

Minimize unnecessary lighting: Reducing lighting lowers the total consumption of power, particularly during periods of high demand [22].

Restrict the charging of electric vehicles during periods of high demand: Electric vehicle owners are requested to charge their vehicles outside the time frame of 4 p.m. to 9 p.m. in order to prevent additional strain on the power system during these crucial hours [22].
Chapter 3: Methodology

The objective of this work is to comprehensively analyze and predict the charging behaviors of electric vehicles using modern machine learning techniques. This research focuses on converting charging times into durations and normalizing the dataset. It involves machine learning models such as Isolation Forest for anomaly detection, Neural Networks, Support Vector Regression, Random Forest, and Stacking Regressor ensemble methods. Through the analysis of key factors such as the starting state of charge (SOC), energy consumption during the charging session, and charging duration, variations in charging behavior can be identified. This will enable accurate forecasts of charging time and an evaluation of the performance of various predictive models.

This study is significant in enhancing the understanding of how individuals charge their electric vehicles, which is essential for optimizing the usage of electric vehicle infrastructure. By identifying patterns and anomalies in the charging data, more efficient ways of using charging stations can be found, resulting in shorter waiting times and less congestion, and therefore less consumer dissatisfaction. Furthermore, accurate time forecasts could improve the scheduling of resource distribution in EV networks, boosting their overall sustainability.

Finally, this project assesses several machine learning algorithms that are practical in real-world applications by examining their strengths and drawbacks in a human-centered manner. The results are anticipated to provide valuable guidance to policymakers and other stakeholders in the automotive sector for the creation of data-driven plans to expand the EV infrastructure. Ultimately, this will support the acceptance of electric vehicles by addressing significant issues related to consumer satisfaction and the overall charging infrastructure.
3.1 Data Description

The dataset used in this project was provided by NL Hydro and contains detailed records from the public electric vehicle charging network for the entire year of 2022. The dataset includes charging session information from various public charging stations, all equipped with both 62.5 kW direct current fast chargers (DCFC) and 7 kW Level 2 chargers. These charging stations support both CCS and CHAdeMO connector types, ensuring compatibility with a broad range of EV models. The chargers are integrated into the ChargePoint network, allowing for precise station location tracking via ChargePoint’s driver map and open-source platforms like PlugShare. The data offers a valuable snapshot of charging behavior across multiple stations, capturing patterns over time, across locations, and across different user types.

Dataset Overview

Source: NL Hydro
Year of Data: 2022
Types of Chargers:
62.5 kW DCFC Chargers (with CCS and CHAdeMO connections): Primarily used for fast charging, these stations offer high power output to quickly replenish EV batteries.
7 kW Level 2 Chargers: Slower, more common chargers typically used for longer charging sessions or when fast charging is not available.

3.1.1 Inputs and Output

Start SOC (State of Charge): Input - This feature indicates the initial charge level (as a percentage) of the EV battery when the charging session begins. It provides insights into when users typically plug in their vehicles for a recharge, highlighting their driving habits or charging strategies.
End SOC: Input - This feature records the battery charge level (as a percentage) at the conclusion of the charging session, indicating how much charge the user prefers to accumulate before unplugging.
Charging Time: Input - The total duration of the charging session, recorded in hours, minutes, and seconds.
This feature is vital for understanding how long users typically spend at charging stations and helps in identifying fast versus slow charging patterns.
Energy (kWh): Output - This records the total amount of energy transferred to the vehicle during the charging session. Energy consumption is one of the most critical features for understanding demand on the charging network, as well as for analyzing the load EVs place on the power grid.

3.2 Purposes and Background

NL Hydro's objective is to effectively handle the heightened demand on the electrical grid resulting from electrification initiatives, such as electric vehicle charging, household hot water heating, and space heating. Efficiently handling these additional demands is essential for utilities and customers in order to prevent wasteful investments in the electrical infrastructure, which could lead to increased power costs for consumers.

NL Hydro is preparing to initiate a Residential EV smart charging trial in order to assess the possibility of moving EV charging demand away from peak hours. This pilot project aims to investigate two distinct control methods: direct communication with smart chargers, and telematics, which entails direct communication with the electric vehicle's internal charging management logic.

The dataset is crucial for this research as it offers factual data essential for analyzing and forecasting EV charging behaviors. The research intends to use this data to enhance the efficiency of EV charging infrastructure and to support NL Hydro's goals of effective grid management and cost-efficient electrification.

3.3 Data preprocessing

Data preparation is an essential step to ensure the quality and uniformity of the dataset prior to any analysis. The EV charging data was preprocessed programmatically.
The following steps were carried out:

3.3.1 Conversion of Time

The Charging Time (hh:mm:ss) column is transformed from the format of hours, minutes, and seconds into a cumulative duration measured in seconds. This conversion standardizes the time format, making it more convenient for manipulation and analysis. The conversion function splits the time string into hours, minutes, and seconds, and then combines these elements into a total count of seconds.

3.3.2 Selection of features

The relevant features were selected and extracted for analysis. The selected main features consisted of Start SOC, Energy (kWh), End SOC, and the newly created duration (converted from Charging Time (hh:mm:ss)). These features were selected because of their direct relation to the research aims, particularly their influence on understanding and forecasting EV charging trends.

Normalization: In this context, normalization refers to rescaling numerical features to a common range, rather than to the database design technique of the same name. The data was normalized by scaling numerical features to a standard range, typically between 0 and 1. This was done by subtracting the minimum value of each feature and dividing the result by the range (maximum value minus minimum value). Normalization puts all features on a comparable scale, so that no feature dominates simply because of its units, and is crucial for models sensitive to the scale of input data.

Dealing with Missing Data: The dataset was examined for any missing values, especially in crucial attributes such as Start SOC, Energy (kWh), End SOC, and Charging Time (hh:mm:ss). Records containing missing values in these attributes were examined and removed if they were considered too incomplete for analysis. This ensured that the dataset used for analysis was complete and dependable, reducing the possibility of distorted outcomes.
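The preprocessing steps above can be illustrated with a minimal sketch. It is not the code used in this project: the function names (`hms_to_seconds`, `min_max_scale`) and the sample sessions are hypothetical, and only the three operations described in the text (dropping incomplete records, converting hh:mm:ss to seconds, and min-max scaling) are shown.

```python
def hms_to_seconds(hms: str) -> int:
    """Convert an 'hh:mm:ss' string into a total number of seconds."""
    hours, minutes, seconds = (int(part) for part in hms.strip().split(":"))
    return hours * 3600 + minutes * 60 + seconds


def min_max_scale(values):
    """Rescale values to [0, 1] using (x - min) / (max - min)."""
    lo, hi = min(values), max(values)
    if hi == lo:  # constant column: avoid division by zero
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical sessions: (Start SOC %, End SOC %, Energy kWh, Charging Time)
sessions = [
    (40.0, 80.0, 24.5, "00:35:10"),
    (55.0, 90.0, 20.1, "00:42:00"),
    (30.0, None, 18.0, "00:25:30"),  # missing End SOC, so this record is dropped
    (20.0, 85.0, 41.3, "01:05:45"),
]

# Step 1: remove records with a missing value in any key attribute.
complete = [row for row in sessions if all(value is not None for value in row)]

# Step 2: convert the charging time into a duration in seconds.
durations = [hms_to_seconds(row[3]) for row in complete]

# Step 3: min-max normalize the duration feature to the range [0, 1].
scaled_durations = min_max_scale([float(d) for d in durations])

print(durations)         # [2110, 2520, 3945]
print(scaled_durations)  # first value is 0.0, last value is 1.0
```

The same `min_max_scale` helper would be applied to each numerical feature in turn; a library scaler could be used instead, which is a design choice rather than something stated in the text.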
3.4 Exploratory Data Analysis

Exploratory Data Analysis (EDA) involves summarizing the main characteristics of the dataset and visualizing data distributions and relationships. The following steps were undertaken to perform EDA on the EV charging data:

3.4.1 Descriptive Statistics

Key statistics of the dataset were summarized to provide an overview of the data. This included measures such as mean, median, standard deviation, and range for numerical features like Start SOC, Energy (kWh), End SOC, and duration. These statistics helped in understanding the central tendency, dispersion, and overall distribution of the data.

3.4.2 Data Visualization

Visualization libraries such as Seaborn and Matplotlib were used to create various plots that depict the distribution and relationships of the data.

3.4.2.1 Box Plots

Box plots were created to visualize the distribution of the duration and Energy (kWh) features [23]. Box plots help in identifying the presence of outliers and understanding the spread and skewness of the data [24]. A box plot, also known as a box-and-whisker plot, displays the distribution of quantitative data so that comparisons can be made among variables as well as across levels of a categorical variable [23] [24]. The box contains the quartiles of the dataset, while the whiskers extend to show the rest of the distribution, except for points classified as outliers, which are identified using a rule based on the interquartile range [23] [24].

3.4.2.2 Histograms

In order to visualize the distribution of each numerical feature, a histogram was drawn for each of them. Histograms depict the frequency distribution of data and make it possible to recognize patterns such as a normal distribution, skewness, or bimodality.
They organize the data points into contiguous ranges, known as bins, and each bar in the histogram shows the frequency of the data points that fall inside the corresponding bin. This reveals the shape of the data distribution and helps identify outliers [25].

3.4.2.3 Scatter Plots

Scatter plots were used to provide a visual representation of the relationships between the numerical features. A scatter plot is a type of graph that uses dots to represent values for two distinct numeric variables. The position of each dot on the horizontal and vertical axes indicates the values for a single data point. This type of figure is helpful for identifying relationships, correlations, and possible outliers between variables [26]. For example, plotting duration against Energy (kWh) helps to observe any potential correlations or trends between the amount of energy consumed and the duration of the charging session.

These visualization tools improved the understanding of the dataset, uncovering patterns, trends, and possible anomalies that informed further analysis and modeling. Together, the descriptive statistics and visualizations offered a thorough depiction of the data, establishing a foundation for predictive modeling.

3.5 Modeling

Machine learning and data mining are changing many parts of our lives by helping us find useful information in the huge amounts of data that are created every day. These tools are very important for making sense of data and using it to solve problems and make informed choices [27].

3.5.1 What does Machine Learning mean?

Machine learning, treated in [27] as a part of data mining, aims to improve performance and predict outcomes by analyzing past data [27]. It involves teaching computers to learn from examples and to make decisions or predictions without being explicitly programmed.
Two main types of machine learning are supervised learning and unsupervised learning. Supervised learning uses labeled data to help the system learn, while unsupervised learning lets the system find patterns on its own [27].

3.5.2 What does Data Mining mean?

The goal of data mining is to find patterns and useful information in large sets of data. For instance, companies use data mining to understand how their customers behave, anticipate future trends, and make important business decisions. It involves going through huge amounts of data to find meaningful trends that can support decision-making [27].

Data Gathering: Data should be gathered from various sources, such as clients' past purchases or sensor data from instruments [27].
Pattern Detection: Data mining techniques uncover patterns and relationships in the information [27]. It may be found, for example, that people who usually purchase bread also buy margarine [27].
Learning from Data: Machine learning enables decision-making based on trends. If somebody purchases bread, the computer system may suggest buying some butter during the next visit [27].

These technologies are significant because they allow us to comprehend and exploit the large volumes of data created daily. They transform raw data into meaningful information that guides firms in making better decisions, helps doctors detect diseases earlier, and enables engineers to design improved products [27]. Machine learning, in turn, relies on feature recognition, supervised learning, and unsupervised learning. In a way similar to how human beings learn, machine learning algorithms concentrate on the factors considered most important when reaching conclusions [28].

3.5.3 Isolation Forest for Anomaly Detection

The Isolation Forest (iForest) is an anomaly detection method that relies on ensemble learning [29].
The approach builds a collection of isolation trees (iTrees) by repeatedly dividing the data space using randomly chosen features and split values [30].

3.5.3.1 Isolation Mechanism

iForest generates iTrees by iteratively picking a feature and a split value at random, repeating this process until each data point is individually separated [29]. Distinct anomalies are isolated with fewer partitions, leading to shorter paths in the trees [29] [30].

Anomaly Score: Anomaly scores are calculated by averaging the path length from the root to the leaf nodes across the trees. Shorter average path lengths suggest a greater probability that a point is an anomaly [29]. Anomalies are isolated quickly because of their distinct and exceptional attributes [30].

3.5.3.2 Enhancements

The enhanced approach described in [30] uses k-means clustering to automatically establish anomaly thresholds, eliminating the need for manual threshold configuration and improving the objectivity and consistency of anomaly detection [30].

Top-K Anomaly Detection: This refers to the process of identifying the K most significant anomalies in a given dataset [30]. The k-nearest neighbor distance is used to compute anomaly scores, ensuring a consistent ranking of anomalies across repeated runs [29] [30]. This aids in accurately recognizing the top-K anomalies, which [30] reports as essential for the interpretation of hydrological data.

3.6 Neural Network

Artificial neural networks (ANN) are a category of information processing systems that draw inspiration from the structure and functioning of the human brain. They are used for nonlinear system simulation and control across many domains, including medicine, biology, mathematics, physics, philosophy, computer science, and information science [28] [29]. Artificial neural networks are composed of a multitude of interconnected processing units known as neurons. Each neuron receives input signals and generates an output depending on these inputs.
The connections (synapses) between neurons carry weights that are adjusted throughout the learning process in order to improve the performance of the neural network [29].

An Artificial Neural Network is generally composed of an input layer, one or more hidden layers, and an output layer. Neurons at the input layer receive signals from the external world, which are further processed through the hidden layers before being sent to the output layer [28]. The presence of hidden layers enables the network to capture complex patterns and correlations present in the data [29].

The learning process in artificial neural networks entails modifying the connection weights between neurons based on the discrepancy between the network's predicted output and the true output [28]. This is commonly accomplished with algorithms such as backpropagation [28] [29]. Backpropagation computes the gradient of the error with respect to each weight and then adjusts the weights to minimize the error [28] [29].

Artificial Neural Networks are known for their capacity to represent complex nonlinear relationships and to learn from data, which makes them well suited for tasks such as pattern recognition, classification, and system control [29]. They can process and analyze substantial volumes of data and identify significant patterns. This ability is particularly important for applications like image recognition, voice processing, and autonomous systems [28].

To summarize, Artificial Neural Networks are effective tools that imitate the architecture and capabilities of the human brain to analyze data, learn from it, and make informed decisions [29].
ANNs are used extensively across different fields to address complex problems through their adaptive learning and their capacity to represent nonlinear interactions [28] [29].

We used this model following the steps described in [31]. First, the input features, namely Start SOC, End SOC, and Charging Time, were selected to provide relevant data about the charging behavior of electric vehicles. The output feature, Energy (kWh), was chosen as the target variable for predicting the total energy consumption during each session. This approach allowed us to model and analyze the relationships between these inputs and energy consumption, providing valuable insights into charging habits and patterns. We employed various machine learning techniques, trained the models, and evaluated their performance using established metrics to assess the effectiveness of our predictions.

3.7 Random Forest Description

Random Forest (RF) is an ensemble learning method used for classification and regression tasks. The algorithm creates multiple decision trees during training and outputs the class that is the mode of the classes (classification) or the mean prediction (regression) of the individual trees [32].

For the Random Forest model, we selected Start SOC, End SOC, and Charging Time as the input features, with Energy (kWh) as the output variable to predict. In [33], Random Forest’s ensemble approach, which builds multiple decision trees, was particularly useful for capturing the diverse interactions between the input variables. By averaging the predictions of multiple trees, Random Forest improved the model’s robustness and reduced overfitting, leading to more reliable predictions of energy consumption during EV charging sessions. Its ability to estimate feature importance and to tolerate noisy data further strengthened the accuracy of the results.
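As an illustration of the setup just described, the following minimal sketch trains a Random Forest regressor on the three inputs and predicts energy. It assumes scikit-learn and NumPy are available and uses synthetic data generated inside the block; the column values, the relation between SOC gain and energy, and the hyperparameters are all hypothetical and do not come from the NL Hydro dataset or from the project's own code.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

# Synthetic stand-in for charging sessions (hypothetical values, not NL Hydro data).
rng = np.random.default_rng(0)
n = 400
start_soc = rng.uniform(10, 70, n)                              # percent
end_soc = np.minimum(start_soc + rng.uniform(10, 50, n), 100)   # percent
duration_s = (end_soc - start_soc) * 60 + rng.normal(0, 120, n) # seconds
energy_kwh = 0.6 * (end_soc - start_soc) + rng.normal(0, 1.0, n)

# Inputs: Start SOC, End SOC, Charging Time (as seconds). Output: Energy (kWh).
X = np.column_stack([start_soc, end_soc, duration_s])
y = energy_kwh

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Each tree is fit on a bootstrap sample; the forest averages the tree outputs.
model = RandomForestRegressor(n_estimators=100, random_state=0)
model.fit(X_train, y_train)

predictions = model.predict(X_test)
r2 = model.score(X_test, y_test)  # coefficient of determination on held-out data

# Impurity-based importances as reported by scikit-learn (they sum to 1).
importances = dict(
    zip(["start_soc", "end_soc", "duration_s"], model.feature_importances_)
)
print(f"R^2 on held-out sessions: {r2:.3f}")
print(importances)
```

Note that `feature_importances_` in scikit-learn is impurity-based; a perturbation-based measure would require a separate permutation step.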
3.7.1 Basic Principles of Random Forest

The Random Forest algorithm uses the bootstrap resampling method to build a decision tree model for each sample set. During modeling, each decision tree randomly selects features to split the attributes of internal nodes, thus forming a random forest. The final output is derived by combining the decision trees. For regression tasks, the final prediction is the average of the outputs from all decision trees [32].

3.7.2 Construction of Random Forest

Random Forest constructs different training sets to increase the variation between classification models, enhancing the extrapolation prediction ability of the combined classification models. Through k rounds of training, a sequence of classification models is obtained and used to form a multi-classification model system. The final classification result of the system is determined using a simple majority voting method [34].

3.7.3 Handling of Feature Importance

In Random Forest prediction, noise can be added to a feature to judge its importance. The importance is determined based on whether the prediction accuracy decreases significantly when the feature's data is perturbed. This capability allows the algorithm to calculate the importance of feature variables while maintaining accuracy even in the presence of outliers and noise [32].

3.7.4 Majority Voting Mechanism

The final classification decision in Random Forest is made using a majority voting mechanism. Each tree in the forest gives a classification, and the class with the most votes becomes the model's prediction. This mechanism helps reduce overfitting and improves the robustness of the model [34]. For regression tasks, such as the energy prediction in this study, the tree outputs are averaged instead, as described in Section 3.7.1.

3.8 Support Vector Regression (SVR)

Support Vector Regression is an extension of the Support Vector Machine tailored for regression tasks. The primary goal of SVR is to predict continuous output values based on input features [35].
For the Support Vector Regression (SVR) model, we used Start SOC, End SOC, and Charging Time as input features, with Energy (kWh) as the output to predict. In [33], SVR was chosen for its capacity to handle complex relationships by finding the optimal hyperplane that minimizes prediction errors. This model enabled us to predict energy consumption accurately, especially in cases with non-linear relationships between inputs and the target variable. The use of the kernel trick further allowed us to model these non-linearities effectively, making SVR a valuable tool in understanding charging behavior patterns.

3.8.1 Fundamental Concept

SVR constructs a hyperplane in a high-dimensional space to predict continuous values. The method is an adaptation of the Support Vector Machine (SVM), which is typically used for classification tasks [35].

3.8.2 Loss Function

SVR employs an ε-insensitive loss function [36]. This loss function measures the quality of the estimation by ignoring errors within a certain margin (ε) of the true value. Deviations beyond this margin are penalized. The ε-insensitive loss function allows SVR to find a balance between the complexity of the model and the precision of predictions [35] [36].

3.8.3 Optimization Problem

The objective of SVR is to minimize a function that balances model complexity (measured by the norm of the weight vector) and the sum of the errors exceeding the ε margin. This balance is achieved by introducing slack variables to handle deviations outside the ε-insensitive zone. The optimization problem is formulated to minimize the norm of the weight vector while also accounting for the slack variables that capture these errors [35].

3.8.4 Kernel Trick

Similar to SVM, SVR can handle non-linear relationships by employing the kernel trick [18]. This technique involves mapping the input features into a high-dimensional space using a kernel function [18].
Common kernel functions include the Radial Basis Function (RBF) and polynomial kernels. The kernel trick allows SVR to model complex, non-linear relationships effectively [35].

3.8.5 Advantages and Challenges

SVR is capable of producing robust regression models that can handle high-dimensional data and complex relationships. However, the performance of SVR depends heavily on the choice of kernel and the tuning of hyperparameters such as the regularization parameter (C) and the ε margin. Proper tuning of these parameters is crucial for the effectiveness of the SVR model [35].

3.8.6 Practical Implementation

In practical applications, such as facial expression recognition, SVR is used to predict continuous values representing different expressions. The effectiveness of SVR in these applications demonstrates its versatility and robustness in handling regression tasks with high-dimensional and complex data. SVR's ability to model complex relationships makes it suitable for various real-world applications where continuous output prediction is required [35].

3.9 XGBoost

XGBoost, also known as Extreme Gradient Boosting, is a powerful machine learning method used for supervised learning problems [37]. XGBoost is an optimized implementation of the gradient boosting framework, designed for fast execution and high efficiency, which enables it to process large volumes of data [37].

The model structure of XGBoost consists of a tree ensemble, in which many decision trees are used to create predictions. The final prediction is determined by aggregating the results of the individual trees in the ensemble. This approach helps capture complex patterns within the data [37] [38].

The objective function in XGBoost comprises a loss function and a regularization term [37].
The loss function evaluates how closely the model's predictions match the true target values, while the regularization term controls the model's complexity to avoid overfitting [37] [38].

Greedy method: XGBoost employs a greedy method to identify the optimal split points for the decision trees. This method systematically assesses every potential split for each feature and chooses the one that reduces the loss function the most. To enhance efficiency, XGBoost pre-sorts the data and accesses it sequentially while searching for splits [37].

Regularization: Regularization is an important aspect of XGBoost, as it is incorporated directly into the objective function [38]. Regularization mitigates overfitting by imposing a penalty on the complexity of the model [37]. This is accomplished by including terms that depend on the number of leaf nodes and the magnitude of the weights assigned to each leaf node [37] [38].

The model can be represented by:

ŷᵢ = f₁(xᵢ) + f₂(xᵢ) + ... + fₖ(xᵢ)    (3.1)

where fₖ represents the k-th decision tree [37].

For the XGBoost model, we followed a systematic approach by using Start SOC, End SOC, and Charging Time as input features, while Energy (kWh) was the output target. In [18], XGBoost's ability to handle large datasets and capture complex relationships made it an ideal choice for predicting energy consumption. We employed this gradient boosting technique to iteratively improve model accuracy by minimizing prediction errors at each step. XGBoost’s regularization and efficiency in handling missing values further enhanced our ability to model the intricacies of EV charging behavior and deliver precise energy consumption predictions.

3.9.1 Dealing with Missing Values

An advantage of XGBoost is its capability to effectively manage missing values.
During the training process, the XGBoost algorithm learns how to handle missing values, enabling it to provide accurate predictions even when certain data points are lacking [38].

Parallel Processing: XGBoost is specifically engineered to exploit the benefits of parallel processing. By partitioning the data and executing calculations simultaneously, XGBoost achieves a notable reduction in training time compared to classic gradient boosting algorithms [39].

3.9.2 Advantages of XGBoost

Efficiency: XGBoost is highly efficient and can handle very large datasets and high-dimensional data [40].
Accuracy: XGBoost frequently achieves high predictive accuracy owing to its use of techniques such as second-order Taylor expansion and regularization [39] [40].
Flexibility: The algorithm exposes a diverse set of hyperparameters, enabling thorough customization and fine-tuning to enhance performance for particular tasks [38].

3.10 Stacking Regressor

The Stacking Regressor is an ensemble learning method that improves prediction accuracy by combining multiple base regression models [41] [42].

For the Stacking Regressor model, we followed a similar process by using Start SOC, End SOC, and Charging Time as inputs and Energy (kWh) as the target output. The Stacking Regressor combines the predictions of multiple base models by using a meta-regressor to make the final prediction. In [31], this ensemble approach was applied to leverage the diverse strengths of different regression models, enhancing the predictive accuracy of energy consumption.
This method allowed us to gain deeper insights by effectively combining information from various models and minimizing errors in forecasting.

3.10.1 Structure of the model

The stacking regressor consists of several base regressors and a meta-regressor [42]. The base regressors are trained independently, and their predictions are used as input features for the meta-regressor [41]. In [42], for example, a stacking regressor is used for forest height estimation, where combining the predictions of several base models improves the accuracy of the estimates.

3.10.2 Process of Training

A stacking regressor is trained in two distinct steps [24]. In the first step, every base regressor is trained using the original training data. In the second step, the meta-regressor is trained using the predictions made by the base regressors as input features [41].

3.10.3 Primary models

The base regressors can encompass a range of regression techniques, including linear regression, random forest, adaptive boosting, support vector regression, and ridge regression [41] [42]. Each of these models captures distinct facets of the data, which makes the overall prediction more robust [41].

3.10.4 Meta-Regressor

The meta-regressor is often a simple model that learns how to combine the predictions made by the base regressors. Ridge regression is commonly employed as the meta-regressor because of its capacity to address multicollinearity among the input features [41].

3.10.5 Benefits

Stacking regressors can enhance prediction accuracy by capitalizing on the strengths of several regression algorithms, often surpassing the performance of the individual base models. This ensemble strategy reduces the likelihood of overfitting and improves the model's capacity for generalization [41] [42].
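The two-level structure described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the project's implementation: it assumes scikit-learn and NumPy, uses synthetic data with hypothetical values, and picks three base regressors and a ridge meta-regressor only as examples of the model families named in the text. One difference from the plain two-step description is worth noting: scikit-learn's `StackingRegressor` trains the meta-regressor on cross-validated predictions of the base models rather than on their predictions for the data they were fit on.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor, StackingRegressor
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

# Synthetic stand-in for charging sessions (hypothetical values).
rng = np.random.default_rng(1)
n = 300
start_soc = rng.uniform(10, 70, n)
end_soc = np.minimum(start_soc + rng.uniform(10, 50, n), 100)
duration_s = (end_soc - start_soc) * 60 + rng.normal(0, 120, n)
energy_kwh = 0.6 * (end_soc - start_soc) + rng.normal(0, 1.0, n)

X = np.column_stack([start_soc, end_soc, duration_s])
y = energy_kwh
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=1
)

# Level 0: base regressors. SVR is scale-sensitive, so it gets its own scaler.
base_models = [
    ("linear", LinearRegression()),
    ("forest", RandomForestRegressor(n_estimators=50, random_state=1)),
    ("svr", make_pipeline(StandardScaler(), SVR(kernel="rbf", C=10.0, epsilon=0.1))),
]

# Level 1: a ridge meta-regressor that learns how to combine the base predictions.
stack = StackingRegressor(estimators=base_models, final_estimator=Ridge(alpha=1.0))
stack.fit(X_train, y_train)

stack_r2 = stack.score(X_test, y_test)
print(f"Stacked R^2 on held-out sessions: {stack_r2:.3f}")
```

The choice of base models and the values of `C`, `epsilon`, and `alpha` are placeholders; in practice they would be tuned on validation data.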
3.11 Voting Regressor

The Voting Regressor is an ensemble learning method designed for regression problems [43]. It enhances overall performance by aggregating the predictions of several independent regression models into a final prediction [43]. It draws on the collective knowledge of a group of models, either by averaging their predictions or by assigning weights to the predictions made by its component models [44].

For the Voting Regressor model, we selected Start SOC, End SOC, and Charging Time as input features and Energy (kWh) as the output. By combining predictions from multiple models, the Voting Regressor aggregates the strengths of each individual model to improve the accuracy of energy consumption forecasts. Following [31], we applied this technique by training several base models and averaging their predictions, allowing us to capture different aspects of the charging behavior and ultimately deliver more reliable and robust results.

3.12 Structure of the model

The Voting Regressor is composed of several base regressors. Each base model is trained separately on the same data, and their predictions are combined to obtain the final prediction. The aggregation can be performed either by a simple average or by giving distinct weights to each model's prediction, taking into account their expected performance [44].

3.13 Categories of Averaging

Simple Averaging: Each model is given the same weight, and the final prediction is calculated by taking the average of the predictions from all models [43].
Weighted Averaging: Different weights are assigned to each model's prediction based on their respective performance [43].
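The difference between the two averaging schemes can be shown with a short sketch in plain Python. The function name `combine_predictions`, the three sets of model predictions, and the weights are hypothetical and chosen only to make the arithmetic easy to follow.

```python
def combine_predictions(predictions, weights=None):
    """Combine per-model predictions by simple or weighted averaging.

    predictions: list of lists, one inner list of predictions per model.
    weights: optional list with one non-negative weight per model;
             if omitted, every model gets the same weight (simple averaging).
    """
    n_models = len(predictions)
    if weights is None:
        weights = [1.0] * n_models
    if len(weights) != n_models:
        raise ValueError("one weight per model is required")
    total = sum(weights)
    n_samples = len(predictions[0])
    return [
        sum(w * preds[i] for w, preds in zip(weights, predictions)) / total
        for i in range(n_samples)
    ]


# Hypothetical energy predictions (kWh) from three base models for three sessions.
model_a = [20.0, 30.0, 10.0]
model_b = [22.0, 27.0, 13.0]
model_c = [18.0, 33.0, 10.0]

simple = combine_predictions([model_a, model_b, model_c])
weighted = combine_predictions([model_a, model_b, model_c], weights=[2.0, 1.0, 1.0])

print(simple)    # [20.0, 30.0, 11.0]
print(weighted)  # [20.0, 30.0, 10.75]
```

In the weighted case, model A counts twice as much as the others, so the third prediction moves from 11.0 towards model A's value of 10.0. scikit-learn's `VotingRegressor` offers the same choice through its `weights` parameter.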
3.14 Process of Training
Each base regressor is trained separately using the training data. During the prediction phase, each model generates its own forecast. These predictions are then merged using either simple or weighted averaging, according to the selected aggregation technique [44].

3.15 Benefits
Improved Accuracy: The Voting Regressor can enhance accuracy by combining the predictions of several models, resulting in better performance than the individual models [43].
Reduced Overfitting: Averaging predictions from different models helps reduce overfitting, because the ensemble is less prone to capturing irrelevant details of the training data [44].
Versatility: The Voting Regressor is compatible with any regression model, which allows a broad range of combinations and freedom in the choice of models [44].

3.16 Model Evaluation

3.16.1 Mean Absolute Error (MAE)
MAE measures the average magnitude of errors in a set of predictions, without considering their direction. It is calculated as the mean of the absolute differences between predicted and actual values, making it easily interpretable because it is on the same scale as the data being predicted. MAE is particularly useful because it gives a straightforward measure of prediction accuracy [45].

3.16.2 Root Mean Squared Error (RMSE)
RMSE is a quadratic scoring rule that also measures the average magnitude of the error. It is the square root of the average of squared differences between predicted and actual values. RMSE gives a higher weight to larger errors, making it more sensitive to outliers than MAE [46].

R-squared (R²)
R-squared is a statistical measure that represents the proportion of the variance in the dependent variable that is explained by the independent variable or variables in a regression model. It indicates the goodness of fit of the model. Higher values indicate better model performance [47].
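The definitions above can be written out directly. The sketch below uses plain Python and made-up numbers; it is not the evaluation code of this project. SMAPE, described in Section 3.16.3, is included for completeness using one common definition (absolute error divided by the mean of the absolute actual and predicted values), which is an assumption since several variants exist.

```python
import math


def mae(actual, predicted):
    """Mean of the absolute differences."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Square root of the mean squared difference."""
    mse = sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)
    return math.sqrt(mse)


def r_squared(actual, predicted):
    """1 minus residual sum of squares over total sum of squares."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


def smape(actual, predicted):
    """Symmetric MAPE in percent (assumed variant; undefined if a pair is 0, 0)."""
    terms = [
        abs(p - a) / ((abs(a) + abs(p)) / 2.0)
        for a, p in zip(actual, predicted)
    ]
    return 100.0 * sum(terms) / len(terms)


# Hypothetical actual and predicted energy values in kWh.
actual = [10.0, 20.0, 30.0]
predicted = [12.0, 18.0, 33.0]

mae_value = mae(actual, predicted)
rmse_value = rmse(actual, predicted)
r2_value = r_squared(actual, predicted)
smape_value = smape(actual, predicted)
print(mae_value, rmse_value, r2_value, smape_value)
```

Because squaring emphasizes large errors, RMSE is never smaller than MAE on the same predictions, which makes the pair a useful consistency check.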
3.16.3 Symmetric Mean Absolute Percentage Error (SMAPE)
SMAPE is an accuracy measure based on percentage errors. It is calculated as the mean of the absolute errors between predicted and actual values, each divided by the average of the absolute predicted and actual values. Because it is scale-free, it is useful for comparing model performance across different datasets [48].

3.16.4 Challenges and Implementation Strategies in Model Development
During the implementation of the models, several obstacles emerged that required careful handling to ensure the accuracy and reliability of the results.
The first issue encountered was the presence of missing data in crucial columns like Start SOC, End SOC, and Energy (kWh). These gaps in the data could have significantly undermined the models' ability to learn effectively. To maintain the quality of the dataset, it was decided to remove rows with missing values in these key columns. While this approach reduced the overall size of the dataset, it allowed the models to be trained on complete and consistent data, ultimately enhancing their predictive performance.
Another aspect that demanded attention was the handling of the Charging Time (hh:mm:ss) feature. Initially, this data was in a time format, which posed a challenge for direct use in most machine learning models. The solution was to convert this time data into a numerical format, specifically seconds, which allowed it to be used as a continuous variable in the models. This transformation enabled the models to better interpret and utilize the time-related information.
Overfitting was a concern throughout the process, particularly given the smaller dataset after the removal of incomplete records. Overfitting occurs when a model performs exceptionally well on training data but fails to generalize to unseen data. To address this, various strategies were employed.
In the case of the Random Forest and XGBoost models, careful tuning of hyperparameters, such as the number of estimators and the depth of trees, was implemented to prevent the models from becoming too complex. The Random Forest model, in particular, was monitored using the Out-of-Bag (OOB) score, which provided an additional measure of performance on unseen data during training. To further reduce the risk of overfitting, ensemble techniques such as stacking and voting regressors were employed. By combining multiple models, these techniques used the strengths of each individual model, resulting in more robust predictions and reducing the likelihood that any single model would dominate and potentially overfit the data.
Computational resources also played a significant role in the model implementation. Training models like neural networks, Random Forest, and XGBoost can be demanding, especially when fine-tuning hyperparameters or using ensemble methods. Given these constraints, training was conducted with careful attention to the balance between computational efficiency and model performance. This included optimizing the number of estimators and epochs, and incorporating more computationally efficient models, like Support Vector Regression, into the ensemble to lighten the overall load.
Finally, the choice of evaluation metrics was crucial in ensuring a comprehensive assessment of model performance. Multiple metrics were used, including Mean Absolute Error, R-squared (R²) score, Symmetric Mean Absolute Percentage Error, and Root Mean Squared Error. This multi-faceted approach provided a well-rounded evaluation of the models, capturing not only their accuracy but also their potential biases and variances.
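The out-of-bag monitoring mentioned above can be sketched with scikit-learn's RandomForestRegressor. The data here is synthetic, and the number of trees and the depth limit are illustrative assumptions rather than the settings used in this project.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Synthetic stand-in data (hypothetical): three features and a noisy target.
rng = np.random.default_rng(1)
X = rng.uniform(0.0, 1.0, size=(400, 3))
y = 30.0 * X[:, 1] - 15.0 * X[:, 0] + 4.0 * X[:, 2] + rng.normal(0.0, 1.0, 400)

# Each tree is fitted on a bootstrap sample; the rows a tree never saw
# (its out-of-bag rows) act as a built-in validation set for that tree.
forest = RandomForestRegressor(
    n_estimators=100,   # assumed value, limits variance
    max_depth=8,        # assumed value, limits tree complexity
    bootstrap=True,
    oob_score=True,
    random_state=0,
)
forest.fit(X, y)

oob_r2 = forest.oob_score_  # R-squared computed from out-of-bag predictions
print(f"OOB R^2: {oob_r2:.3f}")
```

With very few trees, some rows may never fall out of bag and scikit-learn warns that the OOB estimate is unreliable, so the OOB score is most informative when the forest is reasonably large.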
Chapter 4: Experiment Design
The dataset of EV charging sessions provides a rich narrative about user behaviors and charging patterns. By examining the histograms of key numerical features (Energy (kWh), Start SOC, End SOC, and Duration), we can uncover important trends and insights.

4.1 Data Pre-Processing

4.1.1 Handling Missing Data
Handling missing data is a vital part of data preprocessing to ensure the integrity and accuracy of the model's predictions. The following approach was taken:
• Column Selection: Initially, columns that were not essential for the analysis were removed from the dataset. This was done to focus on the key variables: Start SOC, End SOC, Charging Time (hh:mm:ss), and Energy (kWh). Removing irrelevant columns reduces the dimensionality of the data and simplifies the subsequent analysis.
• Row Filtering: The dataset was then processed to identify and handle missing values. Specifically, rows that contained missing values (NaN) in any of the critical columns (Start SOC, End SOC, Energy (kWh)) were excluded from the final dataset. This approach, known as listwise deletion, was chosen to ensure that only complete cases were used for model training, thereby avoiding the potential biases introduced by imputation techniques.
By removing rows with missing data, the dataset was cleaned and prepared for accurate modeling, with the understanding that sufficient data remained to maintain the robustness of the analysis.

4.1.2 Data Normalization
Normalization is a standard preprocessing step, especially when dealing with features of varying scales. In this analysis, a normalization function was implemented to scale data values to a range between 0 and 1. Although this function was not utilized in the final model, its inclusion highlights the importance of normalization in machine learning, as it ensures that no single feature disproportionately influences the model due to its scale.
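The listwise deletion and the 0-to-1 scaling described above can be sketched as follows. This is an illustration in plain Python, not the project's code: the helper names and the five sample sessions are hypothetical, and missing values are represented by None.

```python
REQUIRED_COLUMNS = ("Start SOC", "End SOC", "Energy (kWh)")


def drop_incomplete(rows, required=REQUIRED_COLUMNS):
    """Listwise deletion: keep only rows with a value in every required column."""
    return [
        row for row in rows
        if all(row.get(column) is not None for column in required)
    ]


def min_max_scale(values):
    """Scale values linearly to the range 0 to 1."""
    lowest, highest = min(values), max(values)
    if highest == lowest:
        return [0.0 for _ in values]  # constant column: no spread to scale
    return [(v - lowest) / (highest - lowest) for v in values]


# Hypothetical sessions; None marks a missing value.
sessions = [
    {"Start SOC": 20.0, "End SOC": 80.0, "Energy (kWh)": 30.0},
    {"Start SOC": None, "End SOC": 90.0, "Energy (kWh)": 12.0},
    {"Start SOC": 50.0, "End SOC": 70.0, "Energy (kWh)": 10.0},
    {"Start SOC": 35.0, "End SOC": 85.0, "Energy (kWh)": None},
    {"Start SOC": 40.0, "End SOC": 95.0, "Energy (kWh)": 20.0},
]

complete = drop_incomplete(sessions)
scaled_energy = min_max_scale([row["Energy (kWh)"] for row in complete])
print(len(complete), scaled_energy)
```

With pandas, the same row filter is usually expressed as DataFrame.dropna with the subset argument listing the critical columns.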
4.1.3 Time Data Conversion
The dataset included a time-related feature, Charging Time (hh:mm:ss), which needed to be converted into a numerical format suitable for modeling. This was achieved by transforming the time into seconds, allowing the model to process and learn from the time data effectively.

4.1.4 Feature and Target Variable Preparation
The preprocessed dataset was then divided into features (X) and the target variable (y). The features included Start SOC, End SOC, and the converted Charging Time (in seconds), while the target variable was Energy (kWh). This clear separation ensured that the model could be trained specifically to predict the target variable based on the provided features.
In the experimental setup, the dataset was first split into training and testing sets using the train_test_split function from the sklearn.model_selection module. Specifically, 33% of the data was allocated to the testing set, while the remaining 67% was used for training. The split was made with a fixed random_state to ensure reproducibility.
No cross-validation techniques were directly applied in this setup. However, the fixed random_state meant that every model was trained and evaluated on the same partition, which made the results repeatable and comparable across models. Had cross-validation been used, the training data would have been divided into multiple folds to validate the model's performance across various subsets, providing a more thorough evaluation.

4.2 Energy Consumption
Examining the distribution of energy consumption reveals a right skew, with a skewness value of 0.78. This indicates that most charging sessions are characterized by lower energy usage, with a smaller number of sessions requiring higher energy inputs. The histogram depicts a large concentration of sessions at the lower end of the energy spectrum, gradually tapering off towards higher values.
This pattern suggests that users often engage in short charging sessions, potentially opting for partial charges rather than full ones. Such behavior might be influenced by the widespread availability of fast-charging stations, allowing drivers to quickly top up their batteries as needed.

4.3 Start SOC (State of Charge)
The Start SOC histogram displays a somewhat uniform distribution with a slight increase towards the mid-range, followed by a decline. With a skewness of 0.43, this pattern indicates that many charging sessions begin with the battery at a mid-level state of charge. It appears that drivers typically start their charging sessions when their battery is neither fully depleted nor overly charged, hovering around a comfortable mid-range. This could be a deliberate strategy to maintain battery health, as frequent deep discharges and charges can reduce battery longevity. By starting charges at a mid-range SOC, users might be aiming to prolong their battery's lifespan while ensuring they have enough charge for their next journey.

4.4 End SOC
In contrast to the Start SOC, the End SOC histogram is left-skewed, with a skewness of -1.22. This indicates that most charging sessions conclude with a high SOC, often nearing full charge. The histogram shows a significant concentration of sessions at the upper end of the SOC spectrum, suggesting that users prefer to charge their vehicles to a high level before concluding the session. This behavior is likely driven by the desire to maximize driving range and reduce the frequency of charging stops. By ending their sessions with a high SOC, drivers ensure they have sufficient battery capacity for their upcoming trips, reducing range anxiety and enhancing convenience.

4.5 Charging Duration
The distribution of charging durations is strikingly right-skewed, with a skewness of 7.07. The histogram reveals a large number of sessions with very short durations, accompanied by a long tail extending towards longer charging times.
This extreme skewness suggests that the majority of charging events are quick top-ups, likely facilitated by the presence of fast-charging infrastructure. Users seem to prefer short, frequent charging sessions to maintain their SOC, capitalizing on the speed and convenience of modern charging stations. However, the long tail also indicates occasional longer sessions, which might be necessary for deeper charges or when slower charging speeds are utilized.

4.6 Insights
The patterns observed in the dataset weave a compelling story about the charging habits of EV users. The tendency towards lower energy consumption in most sessions highlights a preference for partial charges, likely driven by the convenience of fast-charging stations. Starting charges at a mid-range SOC and ending them with a high SOC reflects a strategic approach to battery management, aimed at balancing longevity with readiness for the next trip. The predominance of short charging durations further underscores the impact of fast-charging technology, allowing users to quickly replenish their batteries and get back on the road.
Together, these insights paint a vivid picture of the evolving landscape of EV charging. Users are increasingly leveraging advanced charging infrastructure to optimize their charging routines, maintaining flexibility and minimizing downtime. As the adoption of EVs continues to grow, understanding these behaviors will be crucial for enhancing charging infrastructure, improving user experience, and supporting the broader transition to sustainable transportation.

Figure 4.1: Energy and Duration Histogram

4.7 Box Plot
The box plots generated from the dataset provide an overview of how users interact with EV charging infrastructure, revealing patterns in energy consumption, state of charge (SOC), and charging duration.

4.8 Energy Consumption Patterns
The box plot for Energy (kWh) paints a clear picture of how much energy users typically consume during their charging sessions.
The interquartile range (IQR), represented by the green box, captures the middle 50% of the data, showing that most charging sessions consume between approximately 5 kWh and 30 kWh. The median energy consumption, which is around 16.58 kWh, indicates a common tendency towards moderate energy use.
However, the presence of outliers (data points beyond the whiskers) tells another part of the story. These outliers, extending up to 100 kWh, suggest that while most sessions are moderate, there are instances of significantly higher energy consumption. This could be attributed to longer charging sessions or to vehicles with larger battery capacities, highlighting the diverse needs of EV users.

4.9 Starting State of Charge
The box plot for Start SOC reveals how charged the vehicles are when they begin a session. Most sessions start with the SOC in the range of 20% to 50%, as indicated by the green box's position. The median start SOC is around 35%, suggesting that drivers often begin charging when their battery is somewhat depleted, likely to avoid running too low on charge.
Interestingly, the whiskers and outliers show a few sessions starting with a very low SOC, close to 0%, implying that some users charge only when their battery is nearly empty. This behavior might be influenced by the availability of charging stations or individual charging habits. The upper whisker extends to around 90%, indicating that some users start charging even when their battery is relatively full, perhaps to maintain a high state of readiness.

4.10 Ending State of Charge
The End SOC box plot shifts the focus to how charged the vehicles are by the end of the session. Here, the green box lies predominantly in the higher range of SOC, between 55% and 85%, with the median at around 75%. This indicates a strong preference among users to charge their vehicles to a high level, ensuring ample driving range for subsequent journeys.
The few outliers on the lower end suggest that occasionally, charging sessions are cut short, possibly due to time constraints or the immediate need for the vehicle. Nevertheless, the general trend towards a high end SOC reflects a cautious approach, aiming to reduce range anxiety. The lower whisker reaches down to 13%, which marks the lowest ending SOC that is not treated as an outlier.

4.11 Charging Duration
Finally, the duration box plot, now represented in seconds for granularity, reveals the length of typical charging sessions. The IQR shows that most sessions last between approximately 667 seconds (0.185 hours) and 2698 seconds (0.75 hours). The median duration, at around 1584 seconds (0.44 hours), confirms that quick top-ups are common practice.
However, the long tail of outliers extending far beyond the whiskers indicates the presence of longer charging sessions, some exceeding 45,000 seconds (about 12.5 hours). These could be due to slower charging rates or situations where a full charge is necessary. This variety in session lengths underscores the flexible nature of EV charging, accommodating both quick stops and longer charging needs.

4.12 Insights
Users generally prefer moderate energy consumption, starting charges when their SOC is between 20% and 50%, and aiming for an end SOC of around 75%, all while capitalizing on the convenience of short charging durations with a median of approximately 26 minutes. The outliers in each plot remind us of the diversity in charging needs and behaviors, painting a comprehensive picture of the dynamic world of EV charging.
Understanding these patterns not only helps in optimizing charging infrastructure but also in designing policies and strategies that align with user behavior, ultimately supporting the broader adoption of electric vehicles.
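The quartiles reported in Sections 4.10 and 4.11 can be turned into outlier thresholds with a few lines of arithmetic. The sketch below assumes the common 1.5 × IQR whisker convention; the source does not state which whisker rule its box plots use, so the fences are illustrative only. The helper name tukey_fences is hypothetical.

```python
def tukey_fences(q1, q3, k=1.5):
    """Return (lower, upper) limits beyond which points count as outliers.

    Assumes the k * IQR convention with k = 1.5 (an assumption here).
    """
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


# Charging duration quartiles in seconds, as reported in Section 4.11.
q1_s, median_s, q3_s = 667.0, 1584.0, 2698.0
lower_s, upper_s = tukey_fences(q1_s, q3_s)

median_min = median_s / 60.0   # median duration in minutes
upper_min = upper_s / 60.0     # upper fence in minutes

# End SOC quartiles in percent, as reported in Section 4.10.
end_soc_lower, end_soc_upper = tukey_fences(55.0, 85.0)

print(f"Duration fences: {lower_s} s to {upper_s} s")
print(f"Median duration: {median_min:.1f} min, upper fence: {upper_min:.1f} min")
print(f"End SOC fences: {end_soc_lower}% to {end_soc_upper}%")
```

Under this assumption, sessions longer than roughly 96 minutes would be drawn as outliers, and the End SOC lower fence of 10% sits just below the reported lower whisker of 13%. Fences that fall outside the possible range (negative durations, SOC above 100%) simply mean that no outliers can occur on that side.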
Figure 4.2: Energy and Duration Boxplots

4.13 Isolation Forest Outlier Detection

4.13.1 Isolation Forest Analysis of Outlier Detection
The following scatter plot shows the result of outlier detection with an Isolation Forest algorithm on the EV charging data. We examine what the plot reveals about the data and the behavior of the charging sessions.

4.13.2 Comprehension of the Plot
This plot presents normalized values for two main variables:
Normalized Energy (kWh): The x-axis shows the energy consumed by each charging session, scaled to a range between 0 and 1 to allow comparison.
Normalized Duration (hours): The y-axis shows how long each charging session took, also scaled between 0 and 1.
The data points in the plot are color-coded based on the predictions of the Isolation Forest model:
Blue points: Indicate normal charging sessions.
Red points: Indicate anomalous charging sessions (outliers).

4.13.3 Major Observations

4.13.3.1 Cluster of Normal Points:
There is a large group of blue points in the bottom-left part of the diagram. This indicates that most charging sessions consume relatively little energy and have short durations, and the Isolation Forest model treats these sessions as normal. The concentration makes clear that most users complete their charging sessions quickly and draw little energy while doing so.
Scattered anomalies: Red points denoting anomalies are spread throughout the plot but become more frequent as both energy consumption and duration increase. Several red points lie at higher normalized energy and duration values, suggesting that unusually long or high-energy charging sessions are flagged as outliers.

4.13.3.2 Behavior Patterns:
Most users usually have short, low-energy charging sessions, although some users occasionally have long sessions that draw more energy.
These outlier sessions might be the result of slower charging rates, the need for a complete charge, or other user-specific activities.

4.13.4 Insights:
By clearly separating typical from abnormal sessions, the scatter plot reveals both the outliers in the dataset and the prevailing charging patterns. Normal user behavior, usually driven by convenience and efficiency, consists of brief, low-energy charges. The anomalies, however, highlight rare situations in which consumers engage in longer or more energy-intensive sessions. This might be the result of long trips or other particular demands requiring a complete charge.

4.13.5 Conclusion:
Isolation Forest effectively separates typical charging behavior from anomalies. A better understanding of these trends can help optimize the charging infrastructure and tailor services to customers' needs. For instance, knowing that most sessions are short can inform the placement and type of charging stations, while recognizing the need for occasional long charges can ensure that facilities are available for those situations.

Figure 4.3: Isolation forest scatter plot

4.14 Analysis of Neural Network Model
We used a neural network model constructed using TensorFlow and Keras to forecast the energy consumption during these charging sessions. The neural network is composed of three hidden layers that use ReLU (Rectified Linear Unit) activation functions, along with an output layer containing just one neuron. The model was compiled with the Adam optimizer and trained with the objective of minimizing the mean squared error (MSE).

4.15 Analysis of the model's framework:
The input layer receives three features: Start SOC, End SOC, and Charging Time. The neural network consists of three hidden layers, containing 64, 32, and 16 neurons, respectively.
All of these neurons utilize the Rectified Linear Unit (ReLU) activation function. The output layer consists of a single neuron that generates the predicted energy consumption. The training process consisted of 15 epochs.

4.15.1 Metrics for evaluating performance
In order to assess the efficacy of the model, we employed multiple metrics:
The Mean Absolute Error is 5.87. The model's forecasts have an average deviation of 5.87 kWh. This provides us with a direct quantification of the average magnitude of the error. If the actual energy consumption was 40 kWh, the model's prediction would typically fall within the range of approximately 34.13 to 45.87 kWh.
The R-squared score is 0.76. The R-squared score quantifies the proportion of the variability in energy usage that the model is able to account for. A score of 0.76 indicates that the model accounts for 76% of the variation in the data. This suggests that the model captures the majority of the significant patterns and trends present in the data.
The Symmetric Mean Absolute Percentage Error is 26.66. SMAPE quantifies the accuracy of our predictions by averaging the absolute differences between the predicted and actual values relative to their magnitude. A SMAPE of 26.66% indicates that, on average, the prediction error is approximately 26.66% of the true energy use. This indicates that although the model has satisfactory performance, there is still potential for better results.
The Root Mean Squared Error is 60.93. RMSE is a metric that places greater importance on larger errors. It calculates the square root of the average of the squared discrepancies between predicted and actual values. A root mean square error of 60.93 kWh indicates that the model's predictions deviate significantly from the actual values, suggesting the occurrence of occasional substantial errors.
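A short arithmetic check helps put the MAE and RMSE figures above side by side. RMSE is never smaller than MAE, and the gap widens when a few errors are much larger than the rest. The sketch below first shows this on a hypothetical set of errors and then applies it to the two reported values. The final step rests on an assumption the source does not confirm: that 60.93 might be a mean squared error (the loss the network was trained on) rather than its square root.

```python
import math

# Hypothetical absolute errors in kWh: four small ones and one large one.
errors = [1.0, 1.0, 1.0, 1.0, 20.0]
sample_mae = sum(errors) / len(errors)
sample_rmse = math.sqrt(sum(e ** 2 for e in errors) / len(errors))

# Values reported in Section 4.15.1.
reported_mae = 5.87
reported_rmse = 60.93

# How many times larger than the MAE the reported RMSE is.
ratio = reported_rmse / reported_mae

# Assumption only: if 60.93 were an MSE, the RMSE would be its square root.
rmse_if_value_were_mse = math.sqrt(reported_rmse)

print(f"Sample MAE {sample_mae:.2f}, sample RMSE {sample_rmse:.2f}")
print(f"Reported RMSE / MAE ratio: {ratio:.2f}")
print(f"Square root of 60.93: {rmse_if_value_were_mse:.2f}")
```

A ratio above 10 would require very rare but extremely large errors, so the reported figure is worth re-checking against the original evaluation output before drawing conclusions from it.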
4.15.2 Analysis and explanation of findings
The neural network model has exhibited a reasonable capacity to forecast energy consumption during electric vehicle charging sessions. The MAE of 5.87 kWh suggests that the average forecast error is fairly small, which is a positive outcome. The R-squared score of 0.76 indicates that the model is able to account for a significant portion of the variation in the data, demonstrating its ability to capture the fundamental patterns.
Nevertheless, the SMAPE of 26.66% suggests that the prediction errors are still noticeable, with an average deviation of approximately one-fourth of the actual values. This indicates that the model's predictions are reasonably precise, but there is room for improvement. The root mean square error of 60.93 kWh indicates the existence of substantial errors in the model's predictions, implying that although the model is often precise, it occasionally produces notable inaccuracies.
To summarize, the neural network model offers a reasonable forecast of energy use during electric vehicle charging sessions. It explains a large share of the variability in the data and maintains a relatively low average error. Nevertheless, there is potential for enhancement, particularly in reducing large errors and improving overall precision. The findings indicate that with further tuning, and possibly with additional input features, the model could become more precise and dependable.

4.16 Analysis of Random Forest Model
The Random Forest algorithm is a machine learning technique that constructs several decision trees and combines them to achieve a more precise and robust prediction.
For this study, we applied the Random Forest algorithm with 10 trees to forecast the energy consumption during electric vehicle charging sessions. The results indicate an out-of-bag score of 0.82.
This score estimates the performance of the model on unseen data. An out-of-bag (OOB) score of 0.82 indicates that the model is dependable and performs well on data that it has not been trained on, accounting for 82% of the variation in energy usage.
The Mean Absolute Error is 3.74. MAE measures the average size of the errors in the predictions, without considering whether the predictions are too high or too low. Here, an MAE of 3.74 means that, on average, the model's predictions are off by 3.74 kWh. This gives us a straightforward understanding of the prediction accuracy.
The R-squared score is 0.89. The R-squared score tells us how well the model explains the variation in the data. An R² score of 0.89 means the model can explain 89% of the variability in energy usage. This is a high score, indicating that the model fits the data very well.
The Symmetric Mean Absolute Percentage Error is 17.56. SMAPE measures the accuracy of the predictions relative to the actual values. A SMAPE of 17.56% means that the average error in the model's predictions is about 17.56% of the actual energy values. This percentage helps us understand the error size in a way that is relative to the size of the values being predicted.
The Root Mean Squared Error is 5.44. RMSE is a metric that is similar to MAE but places greater emphasis on larger errors. It is the square root of the mean squared deviation between the predicted and actual values. A root mean square error of 5.44 kWh indicates that the typical deviation between the predicted and actual values, with large errors weighted more heavily, is 5.44 kWh. This measure is valuable when large errors are especially undesirable.
In summary, the Random Forest model offers a robust and precise forecast of energy consumption during electric vehicle charging sessions, exhibiting small average errors and a high capacity to explain the variation in the data.
4.17 Analysis of Support Vector Regression Model
Support Vector Regression is a type of Support Vector Machine (SVM) used for regression problems.
The Mean Absolute Error is 4.71. MAE is a metric that quantifies the mean magnitude of errors in the predictions, regardless of their direction. An MAE of 4.71 indicates that, on average, the model's forecasts deviate by 4.71 kWh. This offers a clear assessment of how accurately predictions are made.
The R-squared score is 0.81. The R-squared score quantifies the degree to which the model accounts for the variability in the data. An R² value of 0.81 indicates that the model accounts for 81% of the variance in energy usage, demonstrating a robust performance and capturing the majority of significant patterns in the data.
The Symmetric Mean Absolute Percentage Error is 22.24. SMAPE quantifies the precision of the forecasts in relation to the real values, taking into account the size of the errors but not their direction. A SMAPE of 22.24% indicates that the average inaccuracy in predictions is approximately 22.24% of the true energy values. This suggests a moderate degree of precision, indicating that although the model performs satisfactorily, there is still potential for enhancement.
The Root Mean Squared Error is 7.08. RMSE emphasizes larger errors by calculating the square root of the average of the squared discrepancies between predicted and actual values. A root mean square error of 7.08 kWh indicates that the typical deviation between the predicted and actual values, with large errors weighted more heavily, is 7.08 kWh. This statistic is valuable for understanding the magnitude of the errors, particularly when larger errors have greater significance.
In summary, the Support Vector Regression model offers a reasonably precise forecast of energy consumption during electric vehicle charging sessions, striking a reasonable balance between the mean magnitude of errors and the ability to account for variations in the data. The model demonstrates satisfactory performance, but it can still be enhanced to minimize prediction errors and improve overall accuracy.

4.18 Analysis of XGBoost Model
XGBoost (Extreme Gradient Boosting) is a powerful machine learning algorithm that is used for regression and classification tasks.
The Mean Absolute Error is 3.75. MAE is the average magnitude of the prediction errors, regardless of their direction. A mean absolute error of 3.75 indicates that, on average, the model's predictions differ from the actual values by 3.75 kilowatt-hours (kWh). This metric provides a concise measure of the accuracy of predictions.
The R-squared score is 0.89. A score of 0.89 indicates that the model accounts for 89% of the variability in energy usage, suggesting a good fit.
The Symmetric Mean Absolute Percentage Error is 18.81. It indicates that the average error in predictions is roughly 18.81% of the true energy values.
The Root Mean Squared Error is 5.49. RMSE is the square root of the average squared discrepancy between predicted and actual values.
The XGBoost model, in general, is able to reliably predict the amount of energy that will be consumed during charging sessions for electric vehicles. It achieves this by finding an acceptable balance between the average error magnitude and the ability to explain variability. Although the model displays satisfactory performance, further improvement is needed to reduce errors and increase overall precision.

4.19 Analysis of Voting Regressor Model
We predicted the amount of energy utilized during electric vehicle charging sessions using a Voting Regressor model.
To increase overall accuracy, the Voting Regressor integrates predictions from multiple models. In particular, we used Random Forest, XGBoost, Support Vector Regressor, and Linear Regression. Our goal in employing this ensemble of models was to produce more dependable forecasts than could be produced by a single model.

4.19.1 Outcome:
After completing the training of the model, we proceeded to assess its performance using the test data. The following are the essential measurements:
The Mean Absolute Error is 3.87. This provides a concise measure of the average magnitude of the prediction errors.
The R-squared score is 0.89 and shows that the model is capable of accounting for 89% of the variation in energy consumption. The high score suggests that the model is a good fit for the data.
A SMAPE of 19.15% indicates that our estimates, on average, differ by 19.15% from the actual data. This indicates that the model possesses a satisfactory level of accuracy.
The Root Mean Squared Error is 5.38. This aids in understanding the overall precision of the model, particularly in instances where there are large discrepancies.

4.19.2 Example Predictions:
Below are several instances illustrating the extent to which the model's forecasts align with the real values:

Actual Value    Predicted Value
39.975          43.27
30.154          26.82
1.755           1.55
15.632          12.90
65.439          55.20
Table 4.1: Real and predicted values by Voting Regressor

These examples demonstrate that the model typically produces predictions that are near the actual values, with occasional larger deviations. The findings suggest that our Voting Regressor model is proficient at forecasting energy consumption during electric vehicle charging sessions.
In conclusion, the Voting Regressor model offers precise and dependable prediction of energy use during EV charging sessions. By harnessing the capabilities of several models, we attained the lowest RMSE among the models considered up to this point, although Random Forest and XGBoost achieved a slightly lower MAE.
This methodology showcases the value of ensemble methods for improving the dependability and precision of predictions in practical scenarios.

4.20 Analysis of Stacking Regressor Model

The Stacking Regressor is an ensemble learning technique that combines multiple regression models to improve prediction accuracy. The technique first trains several base regression models, here Linear Regression, Support Vector Regressor, XGBoost, and Random Forest. A meta-model, also known as a final model, is then trained on the predictions generated by the base models. The meta-model learns how to weight the base models' predictions, producing a final prediction that is typically more accurate than any single base model.

The Stacking Regressor model showed excellent performance, demonstrating high accuracy and dependability in forecasting energy use during electric vehicle charging sessions. Through the integration of diverse models, we captured a broad spectrum of patterns present in the data.

The R² value of 0.90 shows that the model explains 90% of the variability observed in the data. The Mean Absolute Error of 3.63 kWh indicates that the average discrepancy between our estimates and the actual values is small. The SMAPE of 18.20% suggests that our forecasts are reasonably accurate relative to the actual values. The Root Mean Squared Error of 5.14 kWh, which is noticeably larger than the MAE, indicates that although the majority of errors are small, there are a few significant discrepancies.

The Stacking Regressor model, which integrates Linear Regression, Support Vector Regression, XGBoost, and Random Forest, yielded precise forecasts of energy use during electric vehicle charging sessions. This strategy takes advantage of the strengths of several models, which results in better overall performance.
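The difference between the two ensemble rules described in Sections 4.19 and 4.20 can be illustrated with a small, dependency-free sketch. It is not the project's implementation: the helper names are hypothetical, the numbers are made up rather than taken from the NL Hydro data, only two base models are combined, and the meta-model is a plain least-squares weighting without an intercept. Voting averages the base predictions with equal weights, while stacking learns the weights from data.

```python
# Toy sketch of the two ensemble combination rules; not the project's code.
# All numbers are invented for illustration, and the helper names
# (voting_predict, fit_meta_weights, ...) are hypothetical.


def voting_predict(base_predictions):
    """Voting: average the base models' predictions for each sample."""
    n_models = len(base_predictions)
    return [sum(sample) / n_models for sample in zip(*base_predictions)]


def fit_meta_weights(pred_a, pred_b, target):
    """Stacking meta-model for two base models.

    Finds (w_a, w_b) minimising sum((w_a*pred_a + w_b*pred_b - target)**2)
    by solving the 2x2 normal equations (no intercept, to keep it short).
    """
    s_aa = sum(a * a for a in pred_a)
    s_ab = sum(a * b for a, b in zip(pred_a, pred_b))
    s_bb = sum(b * b for b in pred_b)
    s_at = sum(a * t for a, t in zip(pred_a, target))
    s_bt = sum(b * t for b, t in zip(pred_b, target))
    det = s_aa * s_bb - s_ab * s_ab
    if abs(det) < 1e-12:
        raise ValueError("base predictions are collinear; weights are not unique")
    w_a = (s_at * s_bb - s_bt * s_ab) / det
    w_b = (s_aa * s_bt - s_ab * s_at) / det
    return w_a, w_b


def stacking_predict(pred_a, pred_b, weights):
    """Stacking: combine base predictions with the learned weights."""
    w_a, w_b = weights
    return [w_a * a + w_b * b for a, b in zip(pred_a, pred_b)]


def sum_squared_error(target, predicted):
    return sum((p - t) ** 2 for t, p in zip(target, predicted))


# Invented energy values (kWh) and two base models' predictions for them.
target_kwh = [10.0, 20.0, 30.0, 40.0]
model_a_kwh = [11.0, 19.0, 31.0, 39.0]  # small errors in both directions
model_b_kwh = [14.0, 26.0, 37.0, 49.0]  # systematically too high

vote = voting_predict([model_a_kwh, model_b_kwh])
weights = fit_meta_weights(model_a_kwh, model_b_kwh, target_kwh)
stack = stacking_predict(model_a_kwh, model_b_kwh, weights)

print("voting SSE:  ", round(sum_squared_error(target_kwh, vote), 3))
print("stacking SSE:", round(sum_squared_error(target_kwh, stack), 3))
print("learned weights:", weights)
```

Because equal weighting is one of the candidate weightings the meta-model can choose, the learned weights never give a larger squared error than voting on the data they were fitted to. In this sketch the weights are fitted and evaluated on the same samples for brevity; in practice the meta-model is usually trained on cross-validated predictions of the base models so that it does not reward overfitted base models.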
4.21 Result Comparison:

The rapid increase in the use of electric vehicles has drawn significant attention to the efficient operation and optimization of electric vehicle charging infrastructure. To ensure a charging network that is both dependable and effective, it is essential for utilities, legislators, and owners of electric vehicles to estimate the amount of energy consumed during charging sessions. In this project, we investigated several machine learning algorithms with the goal of accurately predicting that energy consumption. Among the models examined were Linear Regression, Neural Network, Support Vector Regression, XGBoost, Random Forest, Voting Regressor, and Stacking Regressor.

4.21.1 Explanation of Performance Metrics:

Mean Absolute Error: Lower values indicate more accurate predictions. The Stacking Regressor has the lowest MAE, indicating it provides the most accurate predictions on average.

R-squared Score: Higher values indicate better model fit. The Stacking Regressor has the highest R² score, suggesting it explains the most variance in the data.

Symmetric Mean Absolute Percentage Error: Lower values indicate better relative prediction accuracy. Random Forest has the lowest SMAPE (17.56%), with the Stacking Regressor second (18.20%).

Root Mean Squared Error: Lower values indicate smaller errors overall, with large errors penalized most. The Stacking Regressor has the lowest RMSE, indicating it makes the fewest large errors.

4.21.2 Detailed Comparison:

Model                        MAE (kWh)   R²     SMAPE    RMSE (kWh)
Neural Network               5.87        0.76   26.66%   60.93
Random Forest                3.74        0.89   17.56%   5.44
Support Vector Regression    4.71        0.81   22.24%   7.08
XGBoost                      3.75        0.89   18.81%   5.49
Voting Regressor             3.87        0.89   19.15%   5.38
Stacking Regressor           3.63        0.90   18.20%   5.14

Table 4.2: Evaluation Metrics

Mean Absolute Error: Measures the average magnitude of the errors in a set of predictions, without considering their direction.
It is a straightforward measure of accuracy.
Lowest MAE: Stacking Regressor (3.63), the most accurate model on average.
Highest MAE: Neural Network (5.87), the least accurate model on average.

R-squared: Indicates how well the model explains the variance in the dependent variable. Values closer to 1 signify better explanatory power.
Highest R²: Stacking Regressor (0.90), which explains 90% of the variance, indicating a strong model.
Lowest R²: Neural Network (0.76), which explains only 76% of the variance, indicating weaker performance.

Symmetric Mean Absolute Percentage Error: Measures accuracy based on relative errors. It is especially useful for comparing performance across values on different scales.
Lowest SMAPE: Random Forest (17.56%), the smallest relative error.
Highest SMAPE: Neural Network (26.66%), the largest relative error.

Root Mean Squared Error: The square root of the average squared difference between predicted and actual values. It gives higher weight to larger errors, thus highlighting models with larger discrepancies.
Lowest RMSE: Stacking Regressor (5.14), the least overall error magnitude.
Highest RMSE: Neural Network (60.93), the largest overall error magnitude.

Best Overall Model: Stacking Regressor
Strengths: Lowest MAE and RMSE, highest R², and second-lowest SMAPE. This model balances accuracy and explanatory power, making it the best performer on three of the four metrics.

Second Best Model: Random Forest
Strengths: Second-lowest MAE, high R², and the lowest SMAPE; its RMSE (5.44) is third-lowest, behind the Stacking and Voting Regressors. This model is highly accurate and reliable.

Model to Improve: Neural Network
Weaknesses: Highest MAE and RMSE, lowest R², and highest SMAPE. This model has significant room for improvement in all metrics.

The Stacking Regressor emerged as the most effective model for predicting energy usage during EV charging sessions.
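The four metrics compared above can be computed as in the following minimal sketch. It is an illustration rather than the project's evaluation code: the function names are hypothetical, and the SMAPE variant shown, which divides each absolute error by the mean of the absolute actual and predicted values, is an assumption, since several SMAPE definitions are in use. The inputs are the five example rows of Table 4.1, not the full test set.

```python
import math

# Hypothetical helper names; a sketch of the standard metric definitions.


def mae(actual, predicted):
    """Mean Absolute Error: average of |predicted - actual|."""
    return sum(abs(p - a) for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root Mean Squared Error: square root of the mean squared error."""
    mse = sum((p - a) ** 2 for a, p in zip(actual, predicted)) / len(actual)
    return math.sqrt(mse)


def r2_score(actual, predicted):
    """R-squared: 1 - (residual sum of squares / total sum of squares)."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


def smape(actual, predicted):
    """SMAPE in percent (assumed variant): mean of |p - a| / ((|a| + |p|) / 2).

    Undefined when an actual value and its prediction are both zero.
    """
    terms = [abs(p - a) / ((abs(a) + abs(p)) / 2.0)
             for a, p in zip(actual, predicted)]
    return 100.0 * sum(terms) / len(terms)


# The five example rows from Table 4.1 (Voting Regressor), in kWh.
actual_kwh = [39.975, 30.154, 1.755, 15.632, 65.439]
predicted_kwh = [43.27, 26.82, 1.55, 12.90, 55.20]

print("MAE  :", round(mae(actual_kwh, predicted_kwh), 3))
print("RMSE :", round(rmse(actual_kwh, predicted_kwh), 3))
print("R2   :", round(r2_score(actual_kwh, predicted_kwh), 3))
print("SMAPE:", round(smape(actual_kwh, predicted_kwh), 2), "%")
```

On these five rows the MAE works out to about 3.96 kWh, which differs from the 3.87 reported for the Voting Regressor because five rows are not representative of the whole test set. The RMSE on the same rows is larger than the MAE, driven mainly by the last row, whose error exceeds 10 kWh.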
The Stacking Regressor achieved the best MAE, R², and RMSE and the second-best SMAPE, demonstrating its ability to leverage the strengths of multiple models and reduce their weaknesses. The success of the Stacking Regressor highlights the importance of ensemble methods in achieving higher accuracy and reliability in predictive modeling tasks.

Metric       Stacking Regressor   Voting Regressor   Better Model
MAE (kWh)    3.63                 3.87               Stacking Regressor
R²           0.90                 0.89               Stacking Regressor
SMAPE        18.20%               19.15%             Stacking Regressor
RMSE (kWh)   5.14                 5.38               Stacking Regressor

Table 4.3: Stacking Regressor and Voting Regressor Comparison

Chapter 5: Conclusion

This research represents a significant step forward in the field of electric vehicle infrastructure management, offering a comprehensive analysis of EV charging behaviors and the application of advanced machine learning techniques to predict and optimize these behaviors. By analyzing the complex patterns of how and when people charge their EVs, we have gained valuable insights that not only enhance our understanding of current usage trends but also pave the way for future improvements in EV infrastructure and energy management.

5.1 The Journey of Discovery

The journey of this research began with a deep dive into data preprocessing, where careful attention was given to transforming raw data into a form that could be effectively analyzed. This involved converting charging times into durations, normalizing the data to ensure consistency across different variables, and carefully handling missing values to maintain the quality of the dataset. These steps were crucial, as they laid the groundwork for the subsequent analysis and modeling efforts, ensuring that the models built upon this data were as accurate and reliable as possible.

As we moved into the analysis phase, the data began to reveal fascinating stories about how people interact with EV charging stations.
We discovered that many users prefer short, frequent charging sessions, likely a reflection of the convenience offered by the increasing availability of fast-charging stations. This behavior suggests a shift in how drivers approach charging, opting for quick top-ups rather than waiting for a full charge, which aligns with the fast-paced nature of modern life.

Furthermore, the data showed that users tend to start charging when their battery's state of charge (SOC) is at a mid-level, avoiding both deep discharges and starting with a nearly full battery. This finding indicates a strategic approach to battery management, where drivers are mindful of maintaining battery health while ensuring they have enough charge for their next journey. By ending their sessions with a high SOC, drivers also minimize range anxiety, ensuring they have ample battery capacity for upcoming trips.

5.2 Achievements in Predictive Modeling

One of the most significant contributions of this research lies in the development and evaluation of various machine learning models designed to predict energy consumption during EV charging sessions. The use of models such as Isolation Forest, Neural Networks, Support Vector Regression, Random Forest, XGBoost, and ensemble methods like Stacking and Voting Regressors allowed us to explore different approaches to predictive modeling, each offering unique strengths and insights.

The Stacking Regressor emerged as the most effective model, achieving the best Mean Absolute Error, R-squared (R²) score, and Root Mean Squared Error, and the second-best Symmetric Mean Absolute Percentage Error (SMAPE) after Random Forest. This model's success highlights the power of ensemble learning, where combining the strengths of multiple models leads to more accurate and robust predictions. The Stacking Regressor's ability to leverage diverse inputs and methodologies makes it particularly well-suited for the complex task of predicting energy usage in EV charging scenarios.
The predictive power of these models offers benefits for both EV drivers and the broader infrastructure that supports them. For drivers, accurate predictions mean less time spent waiting at charging stations, as well as greater confidence in the availability of charging options when needed. For utilities and infrastructure managers, these models provide a valuable tool for optimizing the distribution of energy resources, helping to prevent grid overloads and ensure that charging stations are used efficiently.

5.3 Real-World Applications and Implications

The practical implications of this research are far-reaching. As electric vehicles continue to gain popularity, the demand for efficient and reliable charging infrastructure will only grow. This research equips stakeholders with the tools they need to manage this growing demand effectively. By understanding charging behaviors and predicting energy consumption, utilities can better manage the load on the power grid, ensuring that energy is available where and when it is needed most.

For policymakers, the insights gained from this research can inform the development of data-driven strategies to expand EV infrastructure in a way that aligns with user behavior. For instance, the preference for short, frequent charging sessions suggests that more fast-charging stations should be placed in high-traffic areas, while also ensuring that slower, Level 2 chargers are available for those who need longer, deeper charges. These strategies can help ease the transition to electric vehicles, making it smoother and more appealing for the general public.

Moreover, the ability to detect anomalies in charging behavior using models like Isolation Forest can be used to identify potential issues with charging infrastructure or to tailor services to meet the specific needs of different user groups.
For example, detecting unusual patterns in charging behavior could indicate the need for maintenance or the addition of new charging stations in underserved areas.

5.4 Future Directions and Opportunities

While this research has made significant strides, it also opens up several exciting avenues for future work. One promising direction is the incorporation of additional features into the predictive models. Factors such as weather conditions, time of day, geographical location, and individual driving habits could be integrated to enhance the accuracy and relevance of the predictions. These additional variables could help refine the models, making them more sensitive to the details of real-world charging behavior.

Another critical area for future exploration is the development of models capable of real-time predictions. As the adoption of EVs continues to rise, the ability to provide instantaneous feedback based on live data from charging stations could prove invaluable. Real-time models could be integrated with smart grids and other advanced infrastructure to dynamically manage charging loads, prevent grid overloads, and optimize the distribution of resources in real time.

The personalization of predictions also represents an exciting frontier. By leveraging user-specific data, such as individual charging habits and preferences, future models could offer personalized recommendations and forecasts. This approach would not only improve the accuracy of predictions but also enhance the user experience by providing tailored insights that align with each driver's unique needs and circumstances.

The integration of these models with smart technology, such as IoT devices and smart meters, is yet another promising area for exploration. Such integration would allow for seamless communication between vehicles, chargers, and the power grid, facilitating more efficient energy management and enabling the development of more sophisticated charging strategies.
For instance, smart meters could provide real-time data on household energy usage, which could be factored into the timing and intensity of EV charging, thereby reducing strain on the grid during peak hours.

Long-term testing and validation of these models are also critical. While the models developed in this study have shown strong performance, their reliability over extended periods and across different contexts needs to be tested. Long-term studies that track the performance of these models over time would provide valuable insights into their robustness and adaptability. Such studies could also help identify any degradation in model performance and guide the development of strategies to reduce it.

5.5 Broader Impact and Societal Contributions

The broader impact of this research extends beyond the immediate improvements in EV charging infrastructure. As electric vehicles become increasingly central to global efforts to reduce carbon emissions and combat climate change, the ability to efficiently manage charging infrastructure will be critical. The insights and models developed in this study can contribute to the broader goal of creating a more sustainable and resilient energy system.

Policymakers can leverage the findings of this research to craft regulations that support the expansion of EV infrastructure in a way that maximizes efficiency and user satisfaction. For example, policies that encourage the development of fast-charging networks in strategic locations, or that promote the integration of smart grid technologies, could be informed by the charging behaviors and patterns identified in this study. Additionally, the ability to predict and manage charging demand could help stabilize energy prices and reduce the need for expensive upgrades to the power grid, ultimately benefiting consumers.

Furthermore, as the market for electric vehicles continues to grow, the demand for accurate and reliable predictive models will only increase.
This research contributes to this emerging field, providing a foundation upon which future innovations can be built. The development of models that can adapt to new technologies, such as wireless charging or vehicle-to-grid systems, will be essential as the EV landscape evolves.

In addition, this research has the potential to influence broader societal shifts towards more sustainable transportation solutions. As governments and industries worldwide seek to reduce their carbon footprints, the findings of this study can inform policies and practices that support the transition to electric vehicles. By improving the efficiency and accessibility of EV charging infrastructure, we can encourage more people to make the switch to electric vehicles, contributing to global efforts to mitigate climate change.

5.6 Final Reflections and the Road Ahead

In reflecting on the journey of this research, it is clear that we have made significant progress in understanding and predicting EV charging behavior. The models and insights developed here offer practical tools for improving the efficiency of EV infrastructure, enhancing user experience, and supporting the broader transition to electric mobility. However, the road ahead is still full of opportunities for further exploration and innovation.

As we look to the future, the potential for continued advancements in this field is great. By continuing to refine our models, exploring new technologies, and fostering collaboration across industries and disciplines, we can create a more sustainable, efficient, and user-friendly EV infrastructure. This is not just about making things work better; it is about creating a future where clean, electric transportation is accessible and convenient for everyone.

Ultimately, this research is about more than just technology; it's about making a positive impact on the world.
By improving the way we manage and use energy, we can help build a more sustainable future for generations to come. The insights and tools developed in this study are a step in that direction, and with continued effort and innovation, we can continue to make progress toward a cleaner, greener world.