One of the driving forces that pushes a normal cell towards a cancerous state is the loss of regulation over its cell division, caused by the suppression of tumour suppressor genes and the overexpression of oncogenes. These cancer-driver genes have been implicated in the advancement of different cancer types, and knowing how they are represented in a cancer’s gene expression profiles, and which genes they are associated with, helps in understanding that cancer’s development. A novel way of modelling these gene expression profiles is graphlet-based network analysis, a data mining technique that allows for the identification, understanding, and prediction of the functionality, emergent properties, and potential controllability of the resulting networks. This thesis aims to identify patterns in the gene co-expression networks of cancer census genes shared by breast cancer and lung cancer, using cancer gene census data provided by COSMIC, the Catalogue of Somatic Mutations in Cancer.
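To make the idea of graphlet counting concrete, the following is a minimal sketch that counts the two connected 3-node graphlets (open paths and triangles) in a small undirected network. It is an illustration only, not the thesis's pipeline: the function name `count_three_node_graphlets`, the adjacency-dictionary representation, and the placeholder gene labels are all assumptions, and the toy edges are made up rather than taken from COSMIC or any expression data.

```python
from itertools import combinations


def count_three_node_graphlets(adjacency):
    """Count connected 3-node graphlets in an undirected graph.

    `adjacency` maps each node to the set of its neighbours.
    Returns the number of open paths (a-b-c with a and c not linked)
    and the number of triangles.
    """
    paths = 0
    triangle_hits = 0
    for centre, neighbours in adjacency.items():
        # Every pair of neighbours of `centre` forms a 3-node subgraph.
        for a, b in combinations(sorted(neighbours), 2):
            if b in adjacency[a]:
                triangle_hits += 1  # seen once from each of the 3 corners
            else:
                paths += 1  # open path with `centre` in the middle
    return {"path": paths, "triangle": triangle_hits // 3}


# Hypothetical toy co-expression network (labels and edges are made up).
toy_network = {
    "geneA": {"geneB", "geneC"},
    "geneB": {"geneA", "geneC"},
    "geneC": {"geneA", "geneB", "geneD"},
    "geneD": {"geneC"},
}
counts = count_three_node_graphlets(toy_network)
```

In a real co-expression network, an edge would typically link two genes whose expression levels are strongly correlated. Graphlet-based analyses generally use larger graphlets and per-node orbit counts, which this sketch does not attempt.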
Quantum computing is a rapidly advancing field of computer science that is becoming increasingly practical. As these devices become more realistic, frameworks are needed by which the hardware resources of quantum computers, both quantum and classical, can be utilized more efficiently. This research aims to fill gaps in the literature on the effectiveness of hardware scheduling on the current generation of quantum computers. A hardware scheduling strategy is implemented that uses the A* search algorithm to route qubits so that programs conform to hardware limitations, and it is tested against a wide variety of quantum programs and devices. The effectiveness of the scheduler is determined through analysis of metrics obtained from the scheduling process. This particular scheduler proved to be effective for most of the tested algorithms and efficient for some, making it useful for general purposes, though several potential improvements could increase the number of algorithms for which it is efficient.
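As a rough illustration of how A* can be used for qubit routing, the sketch below finds a shortest path between two physical qubits on a device whose couplings form a rectangular grid. It is not the scheduler described above: the grid topology, the unit cost per hop, the Manhattan-distance heuristic, and the names `a_star_path`, `grid_neighbours`, and `manhattan` are all assumptions made for the example.

```python
import heapq


def a_star_path(neighbours, start, goal, heuristic):
    """Return a shortest path from `start` to `goal`, or None if unreachable.

    `neighbours(node)` yields the nodes coupled to `node`; every hop costs 1.
    `heuristic(node, goal)` must never overestimate the remaining hops.
    """
    frontier = [(heuristic(start, goal), 0, start)]  # (f = g + h, g, node)
    came_from = {start: None}
    cost = {start: 0}
    while frontier:
        _, g, node = heapq.heappop(frontier)
        if node == goal:
            path = []
            while node is not None:
                path.append(node)
                node = came_from[node]
            return path[::-1]
        if g > cost[node]:
            continue  # stale queue entry; a cheaper route was found already
        for nxt in neighbours(node):
            new_cost = g + 1
            if new_cost < cost.get(nxt, float("inf")):
                cost[nxt] = new_cost
                came_from[nxt] = node
                heapq.heappush(
                    frontier, (new_cost + heuristic(nxt, goal), new_cost, nxt)
                )
    return None


def grid_neighbours(rows, cols):
    """Build a neighbour function for a hypothetical rows x cols qubit grid."""
    def neighbours(qubit):
        r, c = qubit
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols:
                yield (nr, nc)
    return neighbours


def manhattan(a, b):
    """Grid distance between two qubits; admissible on a grid coupling map."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


# Route between opposite corners of a 3 x 3 grid.
path = a_star_path(grid_neighbours(3, 3), (0, 0), (2, 2), manhattan)
swaps_needed = len(path) - 2  # SWAPs to make the two qubits adjacent
```

In this simplified picture, a two-qubit gate between non-adjacent qubits is enabled by swapping one qubit along the path until it sits next to the other. A practical router would also have to account for gate dependencies and for the effect of each SWAP on the other qubits.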
Increasingly, artificial neural networks are being explored to learn relationships among temporal sequence data for purposes of classification, prediction, and anomaly detection, with the hope of exceeding the performance of more traditional machine learning algorithms. While Long Short-Term Memory and Gated Recurrent Unit networks are still the preferred choice of many researchers, such recurrent networks are sub-optimal for learning relationships within and across longer sequences. Transformer neural networks, originally designed to improve performance on natural language processing tasks, pose an interesting alternative, as their attention mechanisms are more capable of capturing context and meaning within longer sequences. Such features present opportunities to apply transformer networks to temporal sequence data of financial asset prices as well. This thesis introduces an extension of the original transformer neural network that is capable of multivariate time series representation learning in a supervised learning context, and attempts to train it on temporal sequences of financial asset prices. The prediction accuracy of the transformer extension exceeds that of two of the most popular recurrent neural networks used for temporal sequence data prediction. The experiments are conducted in the context of a trading algorithm that showcases the practical potential of the model and its implications. As the model is not specific to its input data, opportunities exist to transfer the enhancements to other domains.
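The attention mechanism referred to above can be illustrated with the scaled dot-product attention of the original transformer, softmax(QK^T / sqrt(d_k)) V. The sketch below is a dependency-free version for small inputs. It shows only this standard building block, not the thesis's extension, and the function names and list-of-lists representation are choices made for the example.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def scaled_dot_product_attention(queries, keys, values):
    """Compute softmax(Q K^T / sqrt(d_k)) V for lists of row vectors.

    Each output row is a weighted average of the value rows, where the
    weights reflect how well the query matches each key.
    """
    d_k = len(keys[0])
    outputs = []
    for q in queries:
        scores = [
            sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k) for k in keys
        ]
        weights = softmax(scores)
        outputs.append(
            [
                sum(w * v[j] for w, v in zip(weights, values))
                for j in range(len(values[0]))
            ]
        )
    return outputs


# Two time steps with identical keys, so both receive equal weight.
attended = scaled_dot_product_attention(
    queries=[[1.0, 0.0]],
    keys=[[1.0, 0.0], [1.0, 0.0]],
    values=[[2.0], [4.0]],
)
```

Because every query is compared with every key directly, a distant time step can contribute to the output as easily as a neighbouring one. This is the property that makes attention attractive for longer sequences.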
Due to an exponential increase in the number of electronic documents and easy access to information on the Internet, the need for text summarization has become obvious. An ideal summary contains the important parts of the original document, eliminates redundant information, and can be generated from a single document or from multiple documents. Several online text summarizers exist, but they have limited accessibility and generate somewhat incoherent summaries. We propose a Graph-based Automatic Summarizer (GAUTOSUMM), which consists of a pre-processing module, control features, and a post-processing module. For evaluation, two datasets, Opinosis and DUC 2007, are used, and the generated summaries are evaluated using ROUGE metrics. The results show that GAUTOSUMM outperforms the online text summarizers in eight out of ten topics in terms of both summary quality and time performance. A user interface has also been built to collect the original text and the desired number of sentences in the summary.
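The abstract does not describe GAUTOSUMM's internals, so the following is only a generic sketch of one common graph-based approach to extractive summarization, not the proposed system. Sentences are treated as nodes, edges are weighted by word overlap (Jaccard similarity), and the sentences with the highest total edge weight are kept. The function name `summarize`, the similarity measure, and the scoring rule are all assumptions.

```python
import re


def summarize(text, n_sentences):
    """Return the `n_sentences` most central sentences, in original order."""
    # Naive sentence splitting on ., ! or ? followed by whitespace.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    word_sets = [set(re.findall(r"[a-z']+", s.lower())) for s in sentences]

    def similarity(a, b):
        """Jaccard overlap between two sets of words."""
        union = a | b
        return len(a & b) / len(union) if union else 0.0

    # Weighted degree centrality: sum of similarities to all other sentences.
    scores = [
        sum(
            similarity(word_sets[i], word_sets[j])
            for j in range(len(sentences))
            if j != i
        )
        for i in range(len(sentences))
    ]
    top = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)
    return [sentences[i] for i in sorted(top[:n_sentences])]
```

The two arguments mirror the inputs that the user interface collects: the original text and the desired number of sentences. A fuller system would add pre-processing such as stop-word removal, and post-processing to reduce redundancy among the selected sentences.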